arxiv

LLM Research Papers: The 2025 List (July to December)

2 Jan 2026

magazine.sebastianraschka.com

In June, I shared a bonus article with my curated and bookmarked research paper lists to the paid subscribers who make this Substack possible.

LLM Research Papers: The 2025 List (July to December)

31 Dec 2025

sebastianraschka.com

A curated list of LLM research papers from July–December 2025, organized by reasoning models, inference-time scaling, architectures, training efficiency, and...

An Introduction to Multisets

16 Oct 2025

arxiv.org

Multisets are sets that allow repetition of elements. As such, multisets pave the way to a number of interesting possibilities of theoretical and applied nature. In the present work, after revising the main aspects of traditional sets, we introduce some of the main concepts and characteristics of multisets, followed by their generalization to take into account vectors and matrices. An approach is also proposed in which the real, negative multiplicities are allowed, implying the multiset universe to become finite and well-defined, corresponding to the multiset with null multiplicities. The complement operation in multisets is then defined, which allows properties involving complement -- including the De Morgan theorem -- to be recovered in multisets. In addition, it becomes possible to extend multisets to functions (which become multifunctions), scalar fields and other continuous mathematical structure, therefore achieving an enhanced space endowed with all algebraic operations plus set theoretical operations including union, intersection, and complementation. The possibility to define a set operation between mfunctions, namely the common product, that is analogous to the traditional inner product is also proposed, paving the way to obtaining respective mfunction transformations, and it is argued that the Walsh functions provide an orthogonal basis for the mfunctions space under the common product. This result also allowed the proposal of performing integrated signal processing operations on mset mfunctions, including filtering and enhanced template matching. Relationships between the cosine similarity index and the Jaccard index are also identified, including the presentation of an intersection-based variation of the cosine index. The potential of multisets in pattern recognition and deep learning is also briefly characterized and illustrated.

Characterizing and Optimizing Realistic Workloads on a Commercial...

4 Oct 2025

arxiv.org

Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. However, prior evaluations have largely relied on simulators or small prototypes, limiting the understanding of their real-world potential. In this work, we present a comprehensive performance and energy characterization of a commercial compute-in-SRAM device, the GSI APU, under realistic workloads. We compare the GSI APU against established architectures, including CPUs and GPUs, to quantify its energy efficiency and performance potential. We introduce an analytical framework for general-purpose compute-in-SRAM devices that reveals fundamental optimization principles by modeling performance trade-offs, thereby guiding program optimizations. Exploiting the fine-grained parallelism of tightly integrated memory-compute architectures requires careful data management. We address this by proposing three optimizations: communication-aware reduction mapping, coalesced DMA, and broadcast-friendly data layouts. When applied to retrieval-augmented generation (RAG) over large corpora (10 GB to 200 GB), these optimizations enable our compute-in-SRAM system to accelerate retrieval by 4.8× to 6.6× over an optimized CPU baseline, improving end-to-end RAG latency by 1.1× to 1.8×. The shared off-chip memory bandwidth is modeled using a simulated HBM, while all other components are measured on the real compute-in-SRAM device. Critically, this system matches the performance of an NVIDIA A6000 GPU for RAG while being significantly more energy-efficient (54.4× to 117.9× reduction). These findings validate the viability of compute-in-SRAM for complex, real-world applications and provide guidance for advancing the technology.

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

8 Sep 2025

arxiv.org

The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provides a principled way to build future architectures yet to be invented.

The Future of Memory: Limits and Opportunities

5 Sep 2025

arxiv.org

Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large numbers of CPUs. We argue two practical engineering challenges, scaling and signaling, limit such designs. We propose the opposite approach. Rather than create large, shared, homogenous memories, systems explicitly break memory up into smaller slices more tightly coupled with compute elements. Leveraging advances in 2.5D/3D integration, this compute-memory node provisions private local memory, enabling accesses of node-exclusive data through micrometer-scale distances, and dramatically reduced access cost. In-package memory elements support shared state within a processor, providing far better bandwidth and energy-efficiency than DRAM, which is used as main memory for large working sets and cold data. Hardware making memory capacities and distances explicit allows software to efficiently compose this hierarchy, managing data placement and movement.

Understanding the Landscape of Ampere GPU Memory Errors

11 Aug 2025

arxiv.org

Hierarchical Reasoning Model

28 Jul 2025

arxiv.org

Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities. These results underscore HRM's potential as a transformative advancement toward universal computation and general-purpose reasoning systems.

LLM Research Papers: The 2025 List (January to June)

19 Jul 2025

sebastianraschka.com

The latest in LLM research with a hand-curated, topic-organized list of over 200 research papers from 2025.

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

17 Apr 2025

arxiv.org

Beyond Human Intervention: Algorithmic Collusion through Multi-Agent Learning Strategies

29 Jan 2025

freakonometrics.hypotheses.org

Our paper, Beyond Human Intervention: Algorithmic Collusion through Multi-Agent Learning Strategies, with Suzie Grondin and Philipp Ratz is now available online Collusion in market pricing is a concept associated with human actions to raise market prices through artificially limited supply. Recently, the idea of algorithmic collusion was put forward, where the human action in the … Continue reading Beyond Human Intervention: Algorithmic Collusion through Multi-Agent Learning Strategies →

Noteworthy LLM Research Papers of 2024

23 Jan 2025

sebastianraschka.com

This article covers 12 influential AI research papers of 2024, ranging from mixture-of-experts models to new LLM scaling laws for precision.

The 2025 AI Engineering Reading List

14 Jan 2025

latent.space

We picked 50 papers/models/blogs across 10 fields in AI Eng: LLMs, Benchmarks, Prompting, RAG, Agents, CodeGen, Vision, Voice, Diffusion, Finetuning. If you're starting from scratch, start here.

100 Must-Read Generative AI Papers from 2024

12 Jan 2025

open.substack.com

A comprehensive list of some of the most impactful generative papers from last year

An Opinionated Evals Reading List — Apollo Research

7 Jan 2025

apolloresearch.ai

A long reading list of evals papers with recommendations and comments by the evals team.

LLM Research Papers: The 2024 List

22 Dec 2024

magazine.sebastianraschka.com

A curated list of interesting LLM-related research papers from 2024, shared for those looking for something to read over the holidays.

eugeneyan/llm-paper-notes: Notes from the Latent Space paper club. Follow along or start your own!

26 Nov 2024

github.com

Notes from the Latent Space paper club. Follow along or start your own! - eugeneyan/llm-paper-notes

How to Run a Paper Club (also: LIVE at NeurIPS 2024!)

24 Nov 2024

open.substack.com

Your ultimate Paper Club Starter Kit, from your friends at the Latent Space Paper Club, where we have now read 100 papers. Also: Announcing Latent Space Paper Club LIVE! at NeurIPS 2024! Join us!

Analyzing the homerun year for LLMs: the top-100 most cited AI papers in 2023, with all medals for open models.

11 Nov 2024

zeta-alpha.com

9 October 2024, Mathias Parisot, Jakub Zavrel. Even in the red hot global race for AI dominance, you publish and you perish, unless your peers pick up your work, build further on it, and you manage to drive real progress in the field. And of course, we are all very curious who is currently having that kind of impact. Are the billions of dollars spent on AI R&D paying off in the long run? So here is, in continuation of our popular publication impact analysis of last year, Zeta Alpha's ranking of t

Aman's AI Journal • Primers • Ilya Sutskever's Top 30

21 Oct 2024

aman.ai

Aman's AI Journal | Course notes and learning material for Artificial Intelligence and Deep Learning Stanford classes.

Summary of Ilya Sutskever's AI Reading List · Tensor Labbet

19 Oct 2024

tensorlabbet.com

[2406.01506] The Geometry of Categorical and Hierarchical Concepts in Large

11 Jun 2024

arxiv.org

The linear representation hypothesis is the informal idea that semantic concepts are encoded as linear directions in the representation spaces of large language models (LLMs). Previous work has...

You Only Cache Once: Decoder-Decoder Architectures for Language Model

11 May 2024

arxiv.org

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a...

[2404.19737] Better & Faster Large Language Models via Multi-token Predicti

8 May 2024

arxiv.org

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results...

Collusive Outcomes Without Collusion

30 Apr 2024

d.repec.org

Auctions with Dynamic Scoring

29 Apr 2024

d.repec.org

Algorithmic Information Disclosure in Optimal Auctions

29 Apr 2024

d.repec.org

Equitable Pricing in Auctions

23 Apr 2024

d.repec.org

Algorithmic Collusion and Price Discrimination: The Over-Usage of Data

23 Apr 2024

d.repec.org

[2404.09818] Error Detection and Correction Codes for Safe In-Memory Comput

16 Apr 2024

arxiv.org

In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators. However, the non-idealities and...

Tips for LLM Pretraining and Evaluating Reward Models

15 Apr 2024

magazine.sebastianraschka.com

Discussing AI Research Papers in March 2024

10 Noteworthy AI Research Papers of 2023

7 Jan 2024

magazine.sebastianraschka.com

This year has felt distinctly different. I've been working in, on, and with machine learning and AI for over a decade, yet I can't recall a time when these fields were as popular and rapidly evolving as they have been this year. To conclude an eventful 2023 in machine learning and AI research, I'm excited to share 10 noteworthy papers I've read this year. My personal focus has been more on large language models, so you'll find a heavier emphasis on large language model (LLM) papers than computer vision papers this year.

[2302.07730] Transformer models: an introduction and catalog

5 Oct 2023

arxiv.org

In the past few years we have seen the meteoric appearance of dozens of foundation models of the Transformer family, all of which have memorable and sometimes funny, but not self-explanatory,...

A Prompt Pattern Catalog

3 Oct 2023

arxiv.org

Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets

22 Jul 2023

d.repec.org

Distilling Step-by-Step! Outperforming Larger Language Models with...

5 May 2023

arxiv.org

Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific...

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

5 May 2023

arxiv.org

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of...

A Cookbook of Self-Supervised Learning

25 Apr 2023

arxiv.org

Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high...

Eight Things to Know about Large Language Models

21 Apr 2023

arxiv.org

The widespread public deployment of large language models (LLMs) in recent months has prompted a wave of new attention and engagement from advocates, policymakers, and scholars from many fields....

When do you need Chain-of-Thought Prompting for ChatGPT?

15 Apr 2023

arxiv.org

Chain-of-Thought (CoT) prompting can effectively elicit complex multi-step reasoning from Large Language Models (LLMs). For example, by simply adding CoT instruction "Let's think step-by-step"...

A Survey of Large Language Models

14 Apr 2023

arxiv.org

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and...

Top Machine Learning Papers to Read in 2023 - KDnuggets

31 Mar 2023

kdnuggets.com

These curated papers would step up your machine-learning knowledge.

Must read: the 100 most cited AI papers in 2022

20 Mar 2023

zeta-alpha.com

Who is publishing the most impactful AI research right now? With the breakneck pace of innovation in AI, it is crucial to pick up some signal as soon as possible. No one has the time to read everything, but these 100 papers are sure to bend the road as to where our AI technology is going. The real test of impact of R&D teams is of course how the technology appears in products, and OpenAI shook the world by releasing ChatGPT at the end of November 2022, following fast on their March 2022 paper “T

2012.03854.pdf

16 Mar 2023

arxiv.org

2003.05689.pdf

16 Mar 2023

arxiv.org

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning

16 Mar 2023

arxiv.org

2108.02497.pdf

16 Mar 2023

arxiv.org

Forecast Evaluation for Data Scientists: Common Pitfalls and Best Practices

16 Mar 2023

arxiv.org

Top ML Papers of the Week - by elvis - NLP Newsletter

14 Mar 2023

nlpnews.substack.com

The top ML Papers of the Week (Mar 6 - Mar 12)

Why People Skip Music? On Predicting Music Skips using Deep...

9 Feb 2023

arxiv.org

Music recommender systems are an integral part of our daily life. Recent research has seen a significant effort around black-box recommender based approaches such as Deep Reinforcement Learning...

[1702.04680v1] Visual Discovery at Pinterest

21 Dec 2022

arxiv.org

Over the past three years Pinterest has experimented with several visual search and recommendation services, including Related Pins (2014), Similar Looks (2015), Flashlight (2016) and Lens (2017)....

2212.03551.pdf

11 Dec 2022

arxiv.org

[2206.14007] The Importance of (Exponentially More) Computing Power

30 Jul 2022

arxiv.org

Denizens of Silicon Valley have called Moore's Law "the most important graph in human history," and economists have found that Moore's Law-powered I.T. revolution has been one of the most...

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for...

18 Jul 2022

arxiv.org

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30...

Machine Learning Operations (MLOps): Overview, Definition, and Architecture

13 Jul 2022

arxiv.org

The final goal of all industrial machine learning (ML) projects is to develop ML products and rapidly bring them into production. However, it is highly challenging to automate and operationalize...

Mastering the Game of Stratego with Model-Free Multiagent...

11 Jul 2022

arxiv.org

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board...

Another Firing Among Google’s A.I. Brain Trust, and More Discord (Published 2022)

2 May 2022

nytimes.com

The researchers are considered a key to the company’s future. But they have had a hard time shaking infighting and controversy over a variety of issues.

The Modern Mathematics of Deep Learning

1 May 2022

arxiv.org

We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning...

A Comprehensive Benchmark of Deep Learning Libraries on Mobile Devices

20 Feb 2022

arxiv.org

Computer Science and Game Theory authors/titles recent submissions

3 Feb 2022

arxiv.org

Detecting Twenty-thousand Classes using Image-level Supervision

12 Jan 2022

arxiv.org

Current object detectors are limited in vocabulary size due to the small scale of detection datasets. Image classifiers, on the other hand, reason about much larger vocabularies, as their datasets...

ArXiv.org Reaches a Milestone and a Reckoning

10 Jan 2022

scientificamerican.com

Runaway success and underfunding have led to growing pains for the preprint server

louisfb01/best_AI_papers_2021: A curated list of the latest breakthroughs in AI (in 2021) by release date with a clear video explanation, link to a more in-depth article, and code.

3 Dec 2021

github.com

A curated list of the latest breakthroughs in AI (in 2021) by release date with a clear video explanation, link to a more in-depth article, and code. - louisfb01/best_AI_papers_2021

Applications and Techniques for Fast Machine Learning in Science

29 Oct 2021

arxiv.org

In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental...
