sram | The Mud Dauber Chronicles

Impact Of On-Chip SRAM Size And Frequency On Energy Efficiency And Performance of LLM Inference (Uppsala Univ.)

4 Jan 2026

semiengineering.com

A new technical paper titled “Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling” was published by researchers at Uppsala University. Abstract “Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energy efficiency and performance of...

The Competitive Advantage of SRAM PUF Technology

9 Oct 2025

synopsys.com

Discover SRAM PUF’s security benefits and how Synopsys combines it with OTP memory for advanced, secure key storage in embedded systems.

Beyond Von Neumann: Toward a unified deterministic architecture

6 Oct 2025

venturebeat.com

Characterizing and Optimizing Realistic Workloads on a Commercial...

4 Oct 2025

arxiv.org

Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. However, prior evaluations have largely relied on simulators or small prototypes, limiting the understanding of their real-world potential. In this work, we present a comprehensive performance and energy characterization of a commercial compute-in-SRAM device, the GSI APU, under realistic workloads. We compare the GSI APU against established architectures, including CPUs and GPUs, to quantify its energy efficiency and performance potential. We introduce an analytical framework for general-purpose compute-in-SRAM devices that reveals fundamental optimization principles by modeling performance trade-offs, thereby guiding program optimizations. Exploiting the fine-grained parallelism of tightly integrated memory-compute architectures requires careful data management. We address this by proposing three optimizations: communication-aware reduction mapping, coalesced DMA, and broadcast-friendly data layouts. When applied to retrieval-augmented generation (RAG) over large corpora (10GB–200GB), these optimizations enable our compute-in-SRAM system to accelerate retrieval by 4.8×–6.6× over an optimized CPU baseline, improving end-to-end RAG latency by 1.1×–1.8×. The shared off-chip memory bandwidth is modeled using a simulated HBM, while all other components are measured on the real compute-in-SRAM device. Critically, this system matches the performance of an NVIDIA A6000 GPU for RAG while being significantly more energy-efficient (54.4×–117.9× reduction). These findings validate the viability of compute-in-SRAM for complex, real-world applications and provide guidance for advancing the technology.

LIDAR, optical distance & time of flight sensors | ams OSRAM

22 Sep 2025

ams-osram.com

Fully integrated dToF modules and iToF VCSEL illuminators for short range applications. Laser sources for long range LIDAR systems.

Tensordyne Claims 8x AI Efficiency Boost Over NVIDIA Using Logarithmic Math

8 Sep 2025

hothardware.com

The idea isn't novel, but presents major challenges. Tensordyne thinks it has solved them, and promises massive speed and efficiency gains as a result.

Energy-Efficient Signal Detectors For Massive MIMO Using SRAM-Based IMCs

15 Aug 2025

semiengineering.com

A new technical paper titled “Energy-Accuracy Trade-Offs in Massive MIMO Signal Detection Using SRAM-Based In-Memory Computing” was published by researchers at the University of Illinois at Urbana–Champaign. Abstract “This paper investigates the use of SRAM-based in-memory computing (IMC) architectures for designing energy efficient and accurate signal detectors for massive multi-input multi-output (MIMO) systems. SRAM-based IMCs...

Understanding the Landscape of Ampere GPU Memory Errors

11 Aug 2025

arxiv.org

X-pSRAM: A Photonic SRAM with Embedded XOR Logic for Ultra-Fast...

28 Jul 2025

arxiv.org

Traditional von Neumann architectures suffer from fundamental bottlenecks due to continuous data movement between memory and processing units, a challenge that worsens with technology scaling as electrical interconnect delays become more significant. These limitations impede the performance and energy efficiency required for modern data-intensive applications. In contrast, photonic in-memory computing presents a promising alternative by harnessing the advantages of light, enabling ultra-fast data propagation without length-dependent impedance, thereby significantly reducing computational latency and energy consumption. This work proposes a novel differential photonic static random access memory (pSRAM) bitcell that facilitates electro-optic data storage while enabling ultra-fast in-memory Boolean XOR computation. By employing cross-coupled microring resonators and differential photodiodes, the XOR-augmented pSRAM (X-pSRAM) bitcell achieves at least 10 GHz read, write, and compute operations entirely in the optical domain. Additionally, wavelength-division multiplexing (WDM) enables n-bit XOR computation in a single-shot operation, supporting massively parallel processing and enhanced computational efficiency. Validated on GlobalFoundries' 45SPCLO node, the X-pSRAM consumed 13.2 fJ energy per bit for XOR computation, representing a significant advancement toward next-generation optical computing with applications in cryptography, hyperdimensional computing, and neural networks.

SRAM Has No Chill: Exploiting Power Domain Separation to Steal On-Chip Secrets – Communications of the ACM

26 Jul 2025

cacm.acm.org

Demystifying GPUs: From Core Architecture to Scalable Systems

20 Jul 2025

dev.to

Table of Contents Motivation Optimization goal of GPUs Key concepts of GPUs - software and...

First-Time Silicon Success Plummets

27 Mar 2025

semiengineering.com

Number of designs that are late increases. Rapidly rising complexity is the leading cause, but tools, training, and workflows need to improve.

Intel, Synopsys, TSMC All Unveil Record Memory Densities

3 Mar 2025

spectrum.ieee.org

The move to nanosheet transistors is a boon for SRAM

AMD Reveals Real Reason It Won't Put 3D V-Cache On Multiple CCDs

8 Jan 2025

hothardware.com

After persistent rumors refused to recede, AMD steps in with a clear explanation why dual-CCD V-Cache doesn't exist.

Why AI language models choke on too much text

22 Dec 2024

arstechnica.com

Compute costs scale with the square of the input size. That’s not great.

AMD Ryzen 7 9800X3D Uses A Thick Dummy Silicon That Comprises 93% Of The CCD Stack And Has No Performance Purpose

21 Dec 2024

wccftech.com

The CCD stack with 3D V-Cache on the AMD Ryzen 7 9800X3D is only 40-45µm in total, but the rest of the layers add up to a whopping 750µm.

Slim-Llama: An Energy-Efficient LLM ASIC Processor Supporting 3-Billion Parameters at Just 4.69mW

21 Dec 2024

marktechpost.com

Large Language Models (LLMs) have become a cornerstone of artificial intelligence, driving advancements in natural language processing and decision-making tasks. However, their extensive power demands, resulting from high computational overhead and frequent external memory access, significantly hinder their scalability and deployment, especially in energy-constrained environments such as edge devices. This escalates the cost of operation while also limiting accessibility to these LLMs, which therefore calls for energy-efficient approaches designed to handle billion-parameter models. Current approaches to reduce the computational and memory needs of LLMs are based either on general-purpose processors or on GPUs, with a combination of weight quantization and

TSMC Lifts the Curtain on Nanosheet Transistors

15 Dec 2024

spectrum.ieee.org

And Intel shows how far these devices could go

Is In-Memory Compute Still Alive?

12 Dec 2024

semiengineering.com

It hasn’t achieved commercial success, but there is still plenty of development happening; analog IMC is getting a second chance.

eugeneyan/llm-paper-notes: Notes from the Latent Space paper club. Follow along or start your own!

26 Nov 2024

github.com

Notes from the Latent Space paper club. Follow along or start your own! - eugeneyan/llm-paper-notes

Predictive PDK (ASAP) – ASU Engineering

25 Nov 2024

asap.asu.edu

Gate-All-Around (GAA): The Ultimate Solution to Reduce Leakage - EE Times

25 Oct 2024

eetimes.com

As awareness of environmental, social, and governance (ESG) issues grows, companies are adopting strategies for sustainable operations.

Surveying the Landscape of Smartphone Processors

2 Aug 2024

eetimes.com

There are many chip partitioning and placement tradeoffs when comparing top-tier smartphone processor designs.

Tenstorrent Launches Wormhole AI Processors: 466 FP8 TFLOPS at 300W

20 Jul 2024

anandtech.com

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-preci

14 Jul 2024

pytorch.org

Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large language models and long-context applications. FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is now used by most libraries to accelerate Transformer training and inference. This has contributed to a massive increase in LLM context length in the last two years, from 2-4K (GPT-3, OPT) to 128K (GPT-4), or even 1M (Llama 3). However, despite its success, FlashAttention has yet to take advantage of new capabilities in modern hardware, with FlashAttention-2 achieving only 35% utilization of theoretical max FLOPs on the H100 GPU. In this blogpost, we describe three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) incoherent processing that leverages hardware support for FP8 low-precision.

Addressing Quantum Computing Threats With SRAM PUFs

8 Jun 2024

semiengineering.com

The impact of quantum algorithms on different cryptographic techniques and what can be done about it.

TSMC's Roadmap at a Glance: N3X, N2P, A16 Coming in 2025/2026

23 May 2024

anandtech.com

How to Put a Data Center in a Shoebox

16 May 2024

spectrum.ieee.org

Imec’s plan to use superconductors to shrink computers

SRAM Security Concerns Grow

10 May 2024

semiengineering.com

Volatile memory threat increases as chips are disaggregated into chiplets, making it easier to isolate memory and slow data degradation.

ASAP5: A predictive PDK for the 5 nm node

24 Feb 2024

sciencedirect.com

We present a predictive process design kit (PDK) for the 5 nm technology node, the ASAP5 PDK. ASAP5 is not related to a particular foundry and the ass…

Grokking Groq’s Groqness

22 Feb 2024

blocksandfiles.com

Startup Groq has developed a machine learning processor that it claims blows GPUs away in large language model workloads – 10x faster than an Nvidia GPU at 10 percent of the cost, and needing a tenth of the electricity. Update: Groq model compilation time and time from access to getting it up and running clarified. […]

Groq Inference Tokenomics: Speed, But At What Cost?

22 Feb 2024

semianalysis.com

Faster than Nvidia? Dissecting the economics

Downfall Attacks

9 Aug 2023

downfall.page

Downfall attacks target a critical weakness found in billions of modern processors used in personal and cloud computers.

Atomera Plans to Breathe New Life into Older Chip Manufacturing

28 Jul 2023

spectrum.ieee.org

Atom-thin layers of oxygen in a chip’s silicon can make devices speedier and more reliable

ELI5: FlashAttention

24 Jul 2023

gordicaleksa.medium.com

Step-by-step explanation of how one of the most important MLSys breakthroughs works, in gory detail.

Comparing Analog and Digital SRAM In-Memory Computing Architectures (KU Leu

24 Jul 2023

semiengineering.com

A technical paper titled “Benchmarking and modeling of analog and digital SRAM in-memory computing architectures” was published by researchers at KU Leuven. Abstract: “In-memory-computing is emerging as an efficient hardware paradigm for deep neural network accelerators at the edge, enabling to break the memory wall and exploit massive computational parallelism. Two design models have surged:...

The Secret Sauce behind 100K context window in LLMs: all tricks in one plac

23 Jul 2023

blog.gopenai.com

tldr; techniques to speed up training and inference of LLMs to use large context window up to 100K input tokens during training and…

AMD’s RX 7600: Small RDNA 3 Appears

5 Jun 2023

chipsandcheese.com

Editor’s Note (6/14/2023): We have a new article that reevaluates the cache latency of Navi 31, so please refer to that article for some new latency data.

The Ultimate Guide for Optimal SoC Floorplan

2 Jun 2023

anysilicon.com

Floorplanning plays a crucial role in the physical design of an SoC and lays the foundation for an efficient and high-performance ASIC layout. In this article, we will discuss ten essential floorplanning commandments that physical design engineers can follow to ensure a correct-by-construction design. Design Partitioning Design Partitioning refers to dividing a large

TSMC Details 3nm Evolution: N3E On Schedule, N3P and N3X To Deliver 5% Perf

27 Apr 2023

anandtech.com

Memory Roundup: Ultra-low-power SRAM, ULTRARAM, & 3D Flash Hit the Scene

25 Apr 2023

allaboutcircuits.com

New memory technologies have emerged to push the boundaries of conventional computer storage.

Growth of 300mm fab capacity picks up pace again - Bits&Chips

5 Apr 2023

bits-chips.nl

After dipping this year, the growth of 300mm semiconductor manufacturing capacity is set to gain momentum.

Why AI Inference Will Remain Largely On The CPU

5 Apr 2023

nextplatform.com

Sponsored Feature: Training an AI model takes an enormous amount of compute capacity coupled with high bandwidth memory. Because the model training can be

Taking a look at the ReRAM state of play

16 Mar 2023

blocksandfiles.com

ReRAM startup Intrinsic Semiconductor Technologies has raised $9.73 million to expand its engineering team and bring its product to market.

Security IP Cores: Ultimate Guide - AnySilicon

22 Jan 2023

anysilicon.com

Security IP cores are blocks that provide security features for integrated circuits (ICs) and systems-on-chips (SoCs). It includes encryption, decryption, authentication, and key management functions that protect against unauthorized access or hacking. The IP core can be integrated into a larger IC design to provide enhanced security for applications such as IoT devices, payment systems,

TSMC Might Cut 3nm Prices to Lure AMD, Nvidia

14 Jan 2023

tomshardware.com

Industry sources say TSMC is considering lowering 3nm prices to stimulate interest from chip designers

aolofsson/awesome-opensource-hardware: List of awesome open source hardware tools, generators, and reusable designs

22 Dec 2022

github.com

List of awesome open source hardware tools, generators, and reusable designs - aolofsson/awesome-opensource-hardware

Safeguarding SRAMs From IP Theft (Best Paper Award)

18 Dec 2022

semiengineering.com

A technical paper titled “Beware of Discarding Used SRAMs: Information is Stored Permanently” was published by researchers at Auburn University. The paper won “Best Paper Award” at the IEEE International Conference on Physical Assurance and Inspection of Electronics (PAINE) Oct. 25-27 in Huntsville. Abstract: “Data recovery has long been a focus of the electronics industry...

Cerebras Reveals Andromeda, a 13.5 Million Core AI Supercomputer

15 Nov 2022

tomshardware.com

The world's largest chip scales to new heights.

How Memory Design Optimizes System Performance

26 Sep 2022

semiengineering.com

Changes are steady in the memory hierarchy, but how and where that memory is accessed is having a big impact.

Ultimate Guide: Clock Tree Synthesis

24 Sep 2022

anysilicon.com

A vast majority of modern digital integrated circuits are synchronous designs. They rely on storage elements called registers or flip-flops, all of which change their stored data in a lockstep manner with respect to a control signal called the clock. In many ways, the clock signal is like blood flowing through the veins of a

DRAM Thermal Issues Reach Crisis Point

18 Jul 2022

semiengineering.com

Increased transistor density and utilization are creating memory performance issues.

CXL: Protocol for Heterogenous Datacenters

8 Jul 2022

fabricatedknowledge.com

Let's learn more about the world's most important manufactured product. Meaningful insight, timely analysis, and an occasional investment idea.

Nvidia Research Plots A Course To Multiple Multichip GPU Engines

6 Jan 2022

nextplatform.com

There are two types of packaging that represent the future of computing, and both will have validity in certain domains: Wafer scale integration and

Ten Lessons From Three Generations Shaped Google’s TPUv4i - 2021-jouppi.pdf

5 Jan 2022

gwern.net

SRAM vs. DRAM: The Future of Memory - EE Times

11 Dec 2021

eetimes.com

EE Times Compares SRAM vs. DRAM, Common Issues With Each Type Of Memory, And Takes A Look At The Future For Computer Memory.

SweRV - An Annotated Deep Dive

10 Dec 2021

tomverbeure.github.io

An Introduction to Semiconductor Economics

6 Dec 2021

adapteva.com

This blog post is in response to a recent topic on the Parallella forum regarding Adapteva’s chip cost efficiency (GFLOPS/$): [forum discussion thread]. I had to be a little vague on some poi…

Synopsys Blog | Latest Insights on EDA, IP & Systems Design

4 Dec 2021

blogs.synopsys.com

Explore Synopsys Blog for the latest insights and trends in EDA, IP, and Systems Design. Stay updated with expert articles and industry news.

Domain-Specific Hardware Accelerators – Communications of the ACM

4 Dec 2021

cacm.acm.org

Effect of Design on Transistor Density - Semiwiki

3 Dec 2021

semiwiki.com

I have written a lot of articles looking at leading…

How to make your own deep learning accelerator chip!

3 Dec 2021

towardsdatascience.com

Currently there are more than 100 companies all over the world building ASIC’s (Application specific integrated circuit) or SOC’s (System…

Using Memory Differently To Boost Speed

3 Dec 2021

semiengineering.com

Getting data in and out of memory faster is adding some unexpected challenges.

DRAM Tradeoffs: Speed Vs. Energy

3 Dec 2021

semiengineering.com

Experts at the Table: Which type of DRAM is best for different applications, and why performance and power can vary so much.

TOPS, Memory, Throughput And Inference Efficiency

3 Dec 2021

semiengineering.com

Evaluate inference accelerators to find the best throughput for the money.

Next-Gen Chips Will Be Powered From Below

28 Aug 2021

spectrum.ieee.org

Buried interconnects will help save Moore's Law

Impact Of GAA Transistors At 3/2nm

17 Aug 2021

semiengineering.com

Some things will get better from a design perspective, while others will be worse.

Bumps Vs. Hybrid Bonding For Advanced Packaging

23 Jun 2021

semiengineering.com

New interconnects offer speed improvements, but tradeoffs include higher cost, complexity, and new manufacturing challenges.

AMD 3D Stacks SRAM Bumplessly

12 Jun 2021

fuse.wikichip.org

AMD recently unveiled 3D V-Cache, their first 3D-stacked technology-based product. Leapfrogging contemporary 3D bonding technologies, AMD jumped directly into advanced packaging with direct bonding and an order of magnitude higher wire density.

11 Ways To Reduce AI Energy Consumption

13 May 2021

semiengineering.com

Pushing AI to the edge requires new architectures, tools, and approaches.

Overcoming Challenges In Next-Generation SRAM Cell Architectures

19 Mar 2021

coventor.com

SVT: Six Stacked Vertical Transistors

18 Mar 2021

semiengineering.com

SRAM cell architecture introduction: design and process challenges assessment.

List of semiconductor fabrication plants - Wikipedia

18 Dec 2020

en.wikipedia.org

This is a list of semiconductor fabrication plants. A semiconductor fabrication plant is where integrated circuits (ICs), also known as microchips, are manufactured. They are either operated by Integrated Device Manufacturers (IDMs) that design and manufacture ICs in-house and may also manufacture designs from design-only (fabless firms), or by pure play foundries that manufacture designs from fabless companies and do not design their own ICs. Some pure play foundries like TSMC offer IC design services, and others, like Samsung, design and manufacture ICs for customers, while also designing, manufacturing and selling their own ICs.

New And Innovative Supply Chain Threats Emerging

5 Nov 2020

semiengineering.com

But so are better approaches to deal with thorny counterfeiting issues.

Making Full Memory IP Robust During Design - Semiwiki

3 Nov 2020

semiwiki.com

Looking at a typical SoC design today it's likely to…

Intel Networking: Not Just A Bag Of Parts

16 Oct 2020

nextplatform.com

What is the hardest job at Intel, excepting whoever is in charge of the development of chip etching processes and the foundries that implement it? We

Domain-Specific Hardware Accelerators | July 2020 | Communications of the A

23 Jun 2020

cacm.acm.org

TSMC Details 5 nm

23 Mar 2020

fuse.wikichip.org

TSMC details its 5-nanometer node for mobile and HPC applications. The process features the industry's highest density transistors with a high-mobility channel and highest-density SRAM cells.

96-Core Processor Made of Chiplets

19 Feb 2020

spectrum.ieee.org

Brian Piercy on LinkedIn: The Surprising Value of Obvious Insights

28 Dec 2019

linkedin.com

"...Google's people analytics experts had been studying how to onboard new hires effectively. They came back with a list of tips. Here’s the one that jumped…

A Look at Cerebras Wafer-Scale Engine: Half Square Foot Silicon Chip

23 Dec 2019

fuse.wikichip.org

A look at Cerebras Wafer-Scale Engine (WSE), a chip the size of a wafer, packing over 400K tiny AI cores using 1.2 trillion transistors on a half square foot of silicon.

Building An MRAM Array

17 Oct 2019

semiengineering.com

Why MRAM is so attractive.

New chips for machine intelligence

7 Oct 2019

jameswhanlon.com

AI Inference Memory System Tradeoffs

29 Aug 2019

semiengineering.com

TOPS isn't all you need to know about an inference chip.

Buried Power Lines Make Memory Faster - IEEE Spectrum

26 Jul 2019

spectrum.ieee.org

Researchers at imec explore strategy that could make memory more efficient and pack in more transistors

Startup Runs AI in Novel SRAM

22 Jul 2019

eetimes.com

Areanna claims that a custom SRAM delivers 100 TOPS/W on deep learning, but it’s early days for the startup.

Use Inference Benchmarks Similar To Your Application

7 Feb 2019

semiengineering.com

How the wrong benchmark can lead to incorrect conclusions.

Emerging Memories Today: Understanding Bit Selectors - The Memory Guy Blog

28 Nov 2018

thememoryguy.com

The previous post in this series (excerpted from the Objective Analysis and Coughlin Associates Emerging Memory report) explained why emerging memories are necessary. Oddly enough, this series will explain bit selectors before defining all of the emerging memory technologies themselves. The reason why is that the bit selector determines how small a bit cell can

Processing In Memory

6 Sep 2018

semiengineering.com

Growing volume of data and limited improvements in performance create new opportunities for approaches that never got off the ground.

Imperfect Silicon, Near-Perfect Security

7 Feb 2018

semiengineering.com

Physically unclonable functions (PUF) seem tailor-made for IoT security.

The Northwest-AI-Hub which is researching hybrid gain cell memory that combines DRAM's density with

24 Oct 2013

techmeme.com

Katherine Bourzac / IEEE Spectrum: The Northwest-AI-Hub, which is researching hybrid gain cell memory that combines DRAM's density with SRAM's speed, gets a $16.3M CHIPS Act grant via the US DOD

Hybrid Memory Designed to Cut AI Energy Use

24 Oct 2010

spectrum.ieee.org

Researchers developing dense, speedy hybrid gain cell memory recently got a boost from CHIPS Act funding

Clash of the Foundries: Gate All Around + Backside Power at 2nm

24 Oct 2002

open.substack.com

Fab Cost, WFE Implications, Backside Power Details
