cover image

Why Anthropic’s integration into hyperscaler silicon programs may give it a lasting advantage in the economics of frontier AI.

cover image
Analyzing Nvidia GB10's GPU
14 Mar 2026
chipsandcheese.com

Looking at Nvidia's latest effort to make a big iGPU

cover image

Taalas is replacing programmable GPUs with hardwired AI chips to achieve 17,000 tokens per second for ubiquitous inference

Amazon shortened GPU depreciation while Meta extended it—same month, same technology. The $3.6B divergence exposes the accounting discretion behind AI infrastructure economics.

cover image

NVIDIA is formally announcing its Rubin AI platform today, which will be the heart of next-gen data centers, with a 5x upgrade over Blackwell.

cover image
Solving The Problems of HBM-on-Logic
18 Dec 2025
morethanmoore.substack.com

Future AI Accelerators Might Need To Be Slower To Be Faster

cover image

TPUv7 shows that a viable alternative to the GPU-centric AI stack has already arrived, one with real implications for the economics and architecture of frontier-scale training.

cover image
README | GPU Glossary
11 Nov 2025
modal.com
cover image

Nvidia announced the Rubin CPX, a solution that is specifically designed to be optimized for the prefill phase, with the single-die Rubin CPX heavily emphasizing compute FLOPS over memory bandwidth…

cover image

NVIDIA has surprisingly unveiled a rather 'new class' of AI GPUs, featuring the Rubin CPX AI chip that offers immense inferencing power.

cover image

The idea isn't novel, but presents major challenges. Tensordyne thinks it has solved them, and promises massive speed and efficiency gains as a result.

cover image

NVIDIA has provided an in-depth look at its fastest chip for AI, the Blackwell GB300, which is 50% faster than GB200 & packs 288 GB memory.

cover image

80x Faster Python? Discover How One Line Turns Your Code Into a GPU Beast!

cover image
RDNA 4's "Out-of-Order" Memory Accesses
11 Aug 2025
chipsandcheese.com

Examining RDNA 4's out-of-order memory accesses in detail, and investigating with testing

cover image

Graphics Processing Units (GPUs) have become a de facto solution for accelerating high-performance computing (HPC) applications. Understanding their memory error behavior is an essential step toward achieving efficient and reliable HPC systems. In this work, we present a large-scale cross-supercomputer study to characterize GPU memory reliability, covering three supercomputers - Delta, Polaris, and Perlmutter - all equipped with NVIDIA A100 GPUs. We examine error logs spanning 67.77 million GPU device-hours across 10,693 GPUs. We compare error rates and mean-time-between-errors (MTBE) and highlight both shared and distinct error characteristics among these three systems. Based on these observations and analyses, we discuss the implications and lessons learned, focusing on the reliable operation of supercomputers, the choice of checkpointing interval, and the comparison of reliability characteristics with those of previous-generation GPUs. Our characterization study provides valuable insights into fault-tolerant HPC system design and operation, enabling more efficient execution of HPC applications.
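The abstract's MTBE figure feeds directly into the checkpoint-interval question it raises. As a rough sketch, only the 67.77M device-hours figure below comes from the abstract; the error count, job size, and checkpoint cost are hypothetical, and the interval uses Young's classic approximation:

```python
import math

# Hypothetical worked example: only the 67.77M device-hours figure comes
# from the abstract; the error count and checkpoint cost are made up.
device_hours = 67_770_000
errors = 5_000                        # hypothetical uncorrectable-error count
mtbe_hours = device_hours / errors    # per-device mean time between errors

# A job spanning many GPUs fails when any one of them does,
# so job-level MTBF shrinks with job size.
gpus_in_job = 1024
job_mtbf_hours = mtbe_hours / gpus_in_job

# Young's approximation: near-optimal checkpoint interval
# ~ sqrt(2 * checkpoint_cost * MTBF)
checkpoint_cost_hours = 0.05          # hypothetical: ~3 min to write a checkpoint
interval_hours = math.sqrt(2 * checkpoint_cost_hours * job_mtbf_hours)

print(round(mtbe_hours, 1), round(job_mtbf_hours, 2), round(interval_hours, 2))
```

The point of the sketch: a per-device MTBE that sounds comfortable collapses to hours at cluster scale, which is why the paper ties reliability characterization to checkpointing policy.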

cover image

Table of Contents: Motivation; Optimization goal of GPUs; Key concepts of GPUs - software and...

cover image

The NVIDIA Collective Communication Library (NCCL) is a critical software layer enabling high-performance collectives on large-scale GPU clusters. Despite being open source with a documented API, i…

cover image

This review compares wafer-scale AI accelerators and single-chip GPUs in terms of performance, energy efficiency, and cost for high-performance AI applications. It highlights enabling technologies, such as CoWoS, and explores future directions including 3D integration, photonic chips, and emerging semiconductor materials.

cover image

Explore the Google vs OpenAI AI ecosystem battle post-o3. Deep dive into Google's huge cost advantage (TPU vs GPU), agent strategies & model risks for enterprise

cover image

The Future of AI Accelerators: A Roadmap of Industry Leaders

The AI hardware race is heating up, with major players like NVIDIA, AMD, Intel, Google, Amazon, and more unveiling their upcoming AI accelerators. Here’s a quick breakdown of the latest trends.

Key Takeaways:
- NVIDIA Dominance: NVIDIA continues to lead with a robust roadmap, extending from H100 to future Rubin and Rubin Ultra chips with HBM4 memory by 2026-2027.
- AMD’s Competitive Push: AMD’s MI300 series is already competing, with MI350 and future MI400 models on the horizon.
- Intel’s AI Ambitions: Gaudi accelerators are growing, with Falcon Shores on track for a major memory upgrade.
- Google & Amazon’s Custom Chips: Google’s TPU lineup expands rapidly, while Amazon’s Trainium & Inferentia gain traction.
- Microsoft & Meta’s AI Expansion: Both companies are pushing their AI chip strategies with Maia and MTIA projects, respectively.
- Broadcom & ByteDance Join the Race: New challengers are emerging, signaling increased competition in AI hardware.

What This Means: With the growing demand for AI and LLMs, companies are racing to deliver high-performance AI accelerators with advanced HBM (High Bandwidth Memory) configurations. The next few years will be crucial in shaping the AI infrastructure landscape. $NVDA $AMD $INTC $GOOGL $AMZN $META $AVGO $ASML $BESI

cover image
AMD's Strix Halo - Under the Hood
15 Mar 2025
chipsandcheese.com

Hello you fine Internet folks,

cover image

Parallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the…

cover image
We Were Wrong About GPUs
15 Feb 2025
fly.io

Do my tears surprise you? Strong CEOs also cry.

cover image
Demystifying GPU Compute Architectures
28 Jan 2025
open.substack.com

Getting 'low level' with Nvidia and AMD GPUs

cover image

AMD acquired ATI in 2006, hoping ATI's GPU expertise would combine with AMD's CPU know-how to create integrated solutions worth more than the sum of their parts.

cover image

Apple's latest machine learning research could make creating models for Apple Intelligence faster, by coming up with a technique to almost triple the rate of generating tokens when using Nvidia GPUs.

cover image

Intel's first Arc B580 GPUs based on the Xe2 "Battlemage" architecture have been leaked & they look quite compelling.

cover image

No matter how elegant and clever the design is for a compute engine, the difficulty and cost of moving existing – and sometimes very old – code from the

cover image

NVIDIA's Blackwell AI servers to witness a massive shipment volume in Q4 2024, with Microsoft being the most "aggressive" acquirer.

cover image

Speed and efficiency are crucial in computer graphics and simulation. It can be challenging to create high-performance simulations that can run smoothly on various hardware setups. Traditional methods can be slow and may not fully utilize the power of modern graphics processing units (GPUs). This creates a bottleneck for real-time or near-real-time feedback applications, such as video games, virtual reality environments, and scientific simulations. Existing solutions for this problem include using general-purpose computing on graphics processing units (GPGPU) frameworks like CUDA and OpenCL. These frameworks allow developers to write programs that can run on GPUs, but they often require a

cover image

Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large language models and long-context applications. FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is now used by most libraries to accelerate Transformer training and inference. This has contributed to a massive increase in LLM context length in the last two years, from 2-4K (GPT-3, OPT) to 128K (GPT-4), or even 1M (Llama 3). However, despite its success, FlashAttention has yet to take advantage of new capabilities in modern hardware, with FlashAttention-2 achieving only 35% utilization of theoretical max FLOPs on the H100 GPU. In this blogpost, we describe three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) incoherent processing that leverages hardware support for FP8 low-precision.
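The memory-minimizing idea underlying the FlashAttention line of work, computing attention block by block with an online softmax so the full score matrix never materializes, can be sketched in NumPy. This illustrates only the algorithmic trick; the blog's Hopper-specific techniques (warp-specialization, TMA overlap, FP8) are hardware-level and not represented here:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full n x n score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def blockwise_attention(Q, K, V, block=4):
    # Streaming variant: visit K/V in blocks, keeping a running row-max (m)
    # and softmax denominator (l), so only a block of scores exists at a time.
    d = Q.shape[-1]
    out = np.zeros_like(Q)
    m = np.full(Q.shape[0], -np.inf)
    l = np.zeros(Q.shape[0])
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)            # rescale previous accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        out = out * scale[:, None] + P @ V[j:j + block]
        m = m_new
    return out / l[:, None]
```

Both functions produce the same result; the blockwise version is what makes the GPU kernel's memory traffic proportional to the block size rather than the full sequence length.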

cover image

Nscale has tested AMD's flagship Instinct MI300X AI accelerator utilizing the GEMM tuning framework, achieving 7x faster performance.

cover image
Nvidia Conquers Latest AI Tests
13 Jun 2024
spectrum.ieee.org

GPU maker tops new MLPerf benchmarks on graph neural nets and LLM fine-tuning

cover image

A New Annual Cadence for ML

cover image

It is not a coincidence that the companies that got the most “Hopper” H100 allocations from Nvidia in 2023 were also the hyperscalers and cloud builders,

cover image

Datacenter GPUs and some consumer cards now exceed performance limits

cover image

Beijing will be thrilled by this nerfed silicon

cover image

Today is the ribbon-cutting ceremony for the “Venado” supercomputer, which was hinted at back in April 2021 when Nvidia announced its plans for its first

cover image

Intel claims 50% more speed when running AI language models vs. the market leader.

cover image

GPT-4 Profitability, Cost, Inference Simulator, Parallelism Explained, Performance TCO Modeling In Large & Small Model Inference and Training
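The TCO arithmetic this kind of modeling rests on can be sketched with a toy serving-cost formula. All numbers here are hypothetical illustrations, not figures from the article:

```python
def cost_per_million_tokens(gpu_hourly_usd, num_gpus, tokens_per_sec):
    # Toy model: fleet cost per hour divided by token throughput per hour.
    fleet_per_hour = gpu_hourly_usd * num_gpus
    tokens_per_hour = tokens_per_sec * 3600
    return fleet_per_hour / tokens_per_hour * 1_000_000

# Illustrative only: 8 GPUs at $2/hr serving 5,000 tokens/s
cost = cost_per_million_tokens(2.0, 8, 5000)
print(round(cost, 3))  # 0.889 dollars per million tokens
```

Real models of this kind add utilization, batching, parallelism overheads, and prefill-vs-decode asymmetry on top of this skeleton, which is exactly what the article digs into.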

cover image

While a lot of people focus on the floating point and integer processing architectures of various kinds of compute engines, we are spending more and more

cover image

AMD plans to open-source portions of its ROCm software stack and hardware documentation in a future update to refine its ecosystem.

cover image

Lenovo, the firm emerging as a driving force behind AI computing, has expressed tremendous optimism about AMD's Instinct MI300X accelerator.

cover image

We like datacenter compute engines here at The Next Platform, but as the name implies, what we really like are platforms – how compute, storage,

cover image

While there have been efforts by AMD over the years to make it easier to port codebases targeting NVIDIA's CUDA API to run atop HIP/ROCm, it still requires work on the part of developers.

cover image

Chafing at their dependence, Amazon, Google, Meta and Microsoft are racing to cut into Nvidia’s dominant share of the market.

cover image
How AMD May Get Across the CUDA Moat
7 Oct 2023
hpcwire.com

When discussing GenAI, the term "GPU" almost always enters the conversation and the topic often moves toward performance and access. Interestingly, the word "GPU" is assumed to mean "Nvidia" products. (As an aside, the popular Nvidia hardware used in GenAI are not technically...

cover image
AMD’s Radeon Instinct MI210: GCN Lives On
28 Jul 2023
chipsandcheese.com

AMD, Nvidia, and Intel have all diverged their GPU architectures to separately optimize for compute and graphics.

Installation — Triton documentation
27 Jul 2023
triton-lang.org
cover image

We’re releasing Triton 1.0, an open-source Python-like programming language which enables researchers with no CUDA experience to write highly efficient GPU code—most of the time on par with what an expert would be able to produce.

cover image

In this article we will learn about its definition, differences and how to calculate FLOPs and MACs using Python packages.
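The counting itself is simple enough to do by hand for a single layer; a sketch of the arithmetic the packages automate (layer sizes here are arbitrary examples):

```python
def linear_layer_macs(batch, in_features, out_features):
    # Each output element is a dot product of length in_features:
    # one multiply-accumulate (MAC) per input feature.
    return batch * in_features * out_features

def macs_to_flops(macs):
    # Common convention: 1 MAC = 2 FLOPs (one multiply plus one add).
    return 2 * macs

macs = linear_layer_macs(batch=1, in_features=768, out_features=3072)
print(macs, macs_to_flops(macs))  # 2359296 4718592
```

The 2x relationship between FLOPs and MACs is the usual source of confusion the article addresses; some tools report one, some the other.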

cover image

Quarterly Ramp for Nvidia, Broadcom, Google, AMD, AMD Embedded (Xilinx), Amazon, Marvell, Microsoft, Alchip, Alibaba T-Head, ZTE Sanechips, Samsung, Micron, and SK Hynix

cover image
Micron to Introduce GDDR7 Memory in 1H 2024
30 Jun 2023
tomshardware.com

GDDR7 is getting closer, says Micron.

cover image

Though it'll arrive just in time for mid-cycle refresh from AMD, Nvidia, and Intel, it's unclear if there will be any takers just yet.

cover image

Micron $MU looks very weak in AI

cover image
The Third Time Charm Of AMD’s Instinct GPU
14 Jun 2023
nextplatform.com

The great thing about the Cambrian explosion in compute that has been forced by the end of Dennard scaling of clock frequencies and Moore’s Law lowering

cover image
AMD’s RX 7600: Small RDNA 3 Appears
5 Jun 2023
chipsandcheese.com

Editor’s Note (6/14/2023): We have a new article that reevaluates the cache latency of Navi 31, so please refer to that article for some new latency data.

cover image

GPUs may dominate, but CPUs could be perfect for smaller AI models

cover image

Google's new machines combine Nvidia H100 GPUs with Google’s high-speed interconnections for AI tasks like training very large language models.

cover image
Wtf is a kdf?
26 Apr 2023
blog.dataparty.xyz

Earlier this week a letter from an activist imprisoned in France was posted to the internet. Contained within Ivan Alococo’s dispatch from the Villepinte prison

cover image

Faster masks, less power.

cover image

As computing systems become more complex, it is becoming harder for programmers to keep their codes optimized as the hardware gets updated. Autotuners try to alleviate this by hiding as many archite…

cover image

The $10,000 Nvidia A100 has become one of the most critical tools in the artificial intelligence industry,

cover image
20 Jan 2023
timdettmers.com

Here, I provide an in-depth analysis of GPUs for deep learning/machine learning and explain what is the best GPU for your use-case and budget.

cover image
CUDA Toolkit 12.0 Released for General Availability
13 Dec 2022
developer.nvidia.com

NVIDIA announces the newest CUDA Toolkit software release, 12.0. This release is the first major release in many years and it focuses on new programming models and CUDA application acceleration…

cover image
How to Accelerate your PyTorch GPU Training with XLA
20 Oct 2022
towardsdatascience.com

The Power of PyTorch/XLA and how Amazon SageMaker Training Compiler Simplifies its use

cover image

A new video making the rounds purports to show Vietnamese crypto miners preparing used GPUs for resale by blasting them with a pressure washer.

cover image

There are two types of packaging that represent the future of computing, and both will have validity in certain domains: Wafer scale integration and

cover image
GPUCC - An Open-Source GPGPU Compiler
11 Dec 2021
research.google
cover image
3D Stacking Could Boost GPU Machine Learning
8 Dec 2021
nextplatform.com

Nvidia has staked its growth in the datacenter on machine learning. Over the past few years, the company has rolled out features in its GPUs aimed at neural

cover image

Nallatech doesn't make FPGAs, but it does have several decades of experience turning FPGAs into devices and systems that companies can deploy to solve

cover image

In this work, we analyze the performance of neural networks on a variety of heterogenous platforms. We strive to find the best platform in terms of raw benchmark performance, performance per watt a…

cover image
baidu-research/warp-ctc
7 Dec 2021
github.com

Fast parallel CTC.

cover image

One of the breakthrough moments in computing, which was compelled by necessity, was the advent of symmetric multiprocessor, or SMP, clustering to make two

cover image

The modern GPU compute engine is a microcosm of the high performance computing datacenter at large. At every level of HPC – across systems in the

cover image

The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. GPU continues to remain the most widely used accelerator for DL applications. We present a survey of architectur…

cover image

Today many servers contain 8 or more GPUs. In principle then, scaling an application from one to many GPUs should provide a tremendous performance boost. But in practice, this benefit can be difficult…
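One reason the benefit is hard to realize in practice: any per-step cost that does not shrink with GPU count (communication, kernel launch, stragglers) caps the achievable speedup, Amdahl-style. A toy strong-scaling model, with a made-up communication fraction:

```python
def speedup(n_gpus, comm_fraction):
    # Toy model: compute time divides across GPUs, while a fixed
    # fraction of each step (communication etc.) does not shrink.
    t_n = (1.0 - comm_fraction) / n_gpus + comm_fraction
    return 1.0 / t_n

print(round(speedup(8, 0.00), 2))  # 8.0 (ideal linear scaling)
print(round(speedup(8, 0.10), 2))  # 4.71 (10% fixed cost halves the win)
```

Even a 10% non-scaling fraction roughly halves the speedup on 8 GPUs, which is why libraries like NCCL work so hard to overlap communication with compute.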

cover image

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be...

1804
2 Dec 2021
arxiv.org
cover image

Unified Memory on NVIDIA Pascal GPUs enables applications to run out-of-the-box with larger memory footprints and achieve great baseline performance.

cover image

Like its U.S. counterpart, Google, Baidu has made significant investments to build robust, large-scale systems to support global advertising programs. As

cover image
Mythic Resizes its AI Chip
26 Jun 2021
eetimes.com

Its second analog AI chip is optimized for different card sizes, but still aimed at computer vision workloads at the edge.

cover image

Current custom AI hardware devices are built around super-efficient, high performance matrix multiplication. This category of accelerators includes the

cover image
How to Accelerate Signal Processing in Python
9 Apr 2021
developer.nvidia.com

This post is the seventh installment of the series of articles on the RAPIDS ecosystem. The series explores and discusses various aspects of RAPIDS that allow its users to solve ETL (Extract, Transform…

cover image

Rice University computer scientists have demonstrated artificial intelligence (AI) software that runs on commodity processors and trains deep neural networks 15 times faster than platforms based on graphics ...

cover image

See how to build end-to-end NLP pipelines in a fast and scalable way on GPUs — from feature engineering to inference.

cover image

What makes a GPU a GPU, and when did we start calling it that? Turns out that’s a more complicated question than it sounds.

cover image
The Rise, Fall and Revival of AMD (2020)
19 Mar 2021
techspot.com

AMD is one of the oldest designers of large scale microprocessors and has been the subject of polarizing debate among technology enthusiasts for nearly 50 years. Its...

cover image

One of the main tenets of the hyperscalers and cloud builders is that they buy what they can and they only build what they must. And if they are building

AMD ROCm documentation

cover image
Using RAPIDS with PyTorch
15 Mar 2021
developer.nvidia.com

In this post we take a look at how to use cuDF, the RAPIDS dataframe library, to do some of the preprocessing steps required to get the mortgage data in a format that PyTorch can process so that we…

cover image

Historically speaking, processing large amounts of structured data has been the domain of relational databases. Databases, consisting of tables that can be joined together or aggregated…

cover image

This series on the RAPIDS ecosystem explores the various aspects that enable you to solve extract, transform, load (ETL) problems, build machine learning (ML) and deep learning (DL) models…

cover image
Speculation Grows As AMD Files Patent for GPU Design
4 Jan 2021
hardware.slashdot.org

Long-time Slashdot reader UnknowingFool writes: AMD filed a patent on using chiplets for a GPU with hints on why it has waited this long to extend their CPU strategy to GPUs. The latency between chiplets poses more of a performance problem for GPUs, and AMD is attempting to solve the problem with a ...

cover image

Most of the modern Linux desktop systems come with an Nvidia driver pre-installed in the form of the Nouveau open-source graphics device driver for Nvidia video cards. Hence depending on your needs and in…

cover image
Which GPUs to get for deep learning
3 Nov 2020
timdettmers.com

Here, I provide an in-depth analysis of GPUs for deep learning/machine learning and explain what is the best GPU for your use-case and budget.

cover image

Micron's GDDR6X is one of the star components in Nvidia's RTX 3070, 3080, and 3080 video cards. It's so fast it should boost gaming past the 4K barrier.

cover image

When you have 54.2 billion transistors to play with, you can pack a lot of different functionality into a computing device, and this is precisely what

cover image
CUDA 11 Features Revealed
14 May 2020
devblogs.nvidia.com

The new NVIDIA A100 GPU based on the NVIDIA Ampere GPU architecture delivers the greatest generational leap in accelerated computing. The A100 GPU has revolutionary hardware capabilities and we’re…

cover image

In this tutorial, you will learn how to get started with your NVIDIA Jetson Nano, including installing Keras + TensorFlow, accessing the camera, and performing image classification and object detection.

cover image

This post has been split into a two-part series to work around Reddit’s per-post character limit. Please find Part 2 in the…

cover image

AI won't replace you, but someone using AI will — so it’s time to embrace AI, and it’s possible to do so even on a low budget.

cover image

Lots. Definitions of the term "data centre" tend to vary. Some would label a small machine room with 2 or 3 racks a data centre, but that is not really a large facility by any stretch of the imagination. Most such installations are never going to hit the usual problems which dat...

cover image

Making Waves in Deep Learning: how deep learning applications will map onto a chip.

cover image
Memory is the Next Platform
10 Oct 2016
nextplatform.com

A new crop of applications is driving the market along some unexpected routes, in some cases bypassing the processor as the landmark for performance and

cover image

H100s used to be $8/hr if you could get them. Now there's 7 different places sometimes selling them under $2. What happened?

cover image

NVIDIA's "Blackwell" series of GPUs, including B100, B200, and GB200, are reportedly sold out for the next 12 months. This means that a customer ordering a new Blackwell GPU now faces a 12-month waitlist. Morgan Stanley analyst Joe Moore c...

cover image

Large Language Models (LLMs) have gained significant prominence in recent years, driving the need for efficient GPU utilization in machine learning tasks. However, researchers face a critical challenge in accurately assessing GPU performance. The commonly used metric, GPU Utilization, accessed through nvidia-smi or integrated observability tools, has proven to be an unreliable indicator of actual computational efficiency. Surprisingly, 100% GPU utilization can be achieved merely by reading and writing to memory without performing any computations. This revelation has sparked a reevaluation of performance metrics and methodologies in the field of machine learning, prompting researchers to seek more accurate ways to
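A commonly proposed alternative to the nvidia-smi utilization number is MFU, model FLOPs utilization: achieved FLOP/s divided by the hardware's theoretical peak. A minimal sketch with hypothetical numbers; the ~6 FLOPs per parameter per token rule of thumb applies to training (forward plus backward):

```python
def model_flops_utilization(tokens_per_sec, n_params, peak_flops_per_sec):
    # ~6 FLOPs per parameter per token for training (forward + backward pass).
    achieved_flops_per_sec = 6 * n_params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec

# Hypothetical numbers: a 70B-parameter model training at 1,000 tokens/s
# on hardware with a made-up 1 PFLOP/s peak.
print(model_flops_utilization(1000, 70e9, 1e15))  # 0.42
```

Unlike the memory-traffic-fooled utilization metric, MFU can only be high when the chip is actually doing the matmul work the model requires.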