cover image

Google published a research blog post on Tuesday about a new compression algorithm for AI models. Within hours, memory stocks were falling. Micron dropped 3 per cent, Western Digital ...

Release: llm 0.29
23 Mar 2026
simonwillison.net

Access large language models from the command-line

cover image

Safely Deploying Machine Learning Models to Production: Four Controlled Strategies (A/B, Canary, Interleaved, Shadow Testing)

cover image
LLM Architecture Gallery
21 Mar 2026
sebastianraschka.com

A gallery that collects architecture figures from The Big LLM Architecture Comparison and related articles, with fact sheets and links back to the original sections.

cover image

Mistral's Small 4 combines reasoning, multimodal analysis and agentic coding in a single open-source model with configurable inference effort, offering enterprises a lower-cost alternative to running separate models for each task.

cover image

LLM calls are black boxes in production. Learn how to add structured observability to your RubyLLM-powered app with OpenTelemetry.

cover image

Nvidia's KV Cache Transform Coding (KVTC) compresses LLM key-value cache by 20x without model changes, cutting GPU memory costs and time-to-first-token by up to 8x for multi-turn AI applications.
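To see why a 20x KV-cache reduction matters, a back-of-envelope memory calculation helps. The sketch below is illustrative only: the model dimensions are assumptions, not the configuration Nvidia benchmarked, and the 20x divisor simply reflects the ratio claimed in the article.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys and values, stored per layer, per KV head, per position.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# A hypothetical 32-layer model with 8 KV heads of dim 128, 32k context, FP16:
baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)
compressed = baseline / 20  # the article's claimed 20x compression ratio

print(f"baseline:   {baseline / 2**30:.1f} GiB")   # 4.0 GiB
print(f"compressed: {compressed / 2**30:.2f} GiB") # 0.20 GiB
```

At these (assumed) dimensions a single 32k-token conversation drops from 4 GiB of cache to roughly 200 MiB, which is the kind of saving that changes how many concurrent sessions fit on one GPU.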

cover image
Unsloth Docs | Unsloth Documentation
17 Mar 2026
unsloth.ai

Train your own model with Unsloth, an open-source framework for LLM fine-tuning and reinforcement learning.

cover image
New LLM Architecture Gallery
15 Mar 2026
sebastianraschka.com

I put together a new LLM Architecture Gallery that collects the architecture figures from my recent comparison articles in one place, together with compact fact sheets and links.

cover image

Detect your hardware and find out which AI models you can run locally. GPU, CPU, and RAM analysis in your browser.

cover image
Billion-Parameter Theories
10 Mar 2026
worldgov.org

We assumed good theories are small. But the minimum viable compression of a complex system might be billions of parameters large.

cover image
The Sword of Damocles in Software
8 Mar 2026
tomtunguz.com

"GitHub Copilot had 20 million users. First to market. Then Claude Code arrived and installs peaked within six months. If the sword can cut the leader, no one is safe."

cover image

Learn five Python decorators, drawn from diverse libraries, that take on particular significance in LLM-based applications.

cover image

OpenAI has indicated that a new version of its large language model, GPT-5.4, is in development following a post on

cover image

AI writes the code now. The skill that matters is controlling what it builds.

cover image

Cambridge Researchers Introduce SymTorch: A PyTorch Library that Translates Deep Learning Models into Human-Readable Equations

cover image

Whether it is a 0.8B model running on a smartphone or a 9B model powering a coding terminal, the Qwen3.5 series is effectively democratizing the "agentic era."

microgpt
2 Mar 2026
karpathy.github.io

Musings of a Computer Scientist.

cover image

Transfer your preferences, projects, and context from other AI providers into Claude. Switch without losing what makes your AI useful.

cover image

This leap is made possible by near-lossless accuracy under 4-bit weight and KV cache quantization, allowing developers to process massive datasets without server-grade infrastructure.

cover image

Investors wiped $40 billion from IBM's market cap after Anthropic released COBOL translation tools. Analysts say the market got the news right and the conclusion wrong.

cover image
Getting Started
24 Feb 2026
rubyllm.com

Start building AI apps in Ruby in 5 minutes. Chat, generate images, create embeddings - all with one gem.

cover image

A pragmatic, code-first argument for Ruby as the best language to ship AI products in 2026.

cover image

Taalas is replacing programmable GPUs with hardwired AI chips to achieve 17,000 tokens per second for ubiquitous inference

cover image

BinaryAudit benchmarks AI agents using Ghidra to find backdoors in compiled binaries of real open-source servers, proxies, and network infrastructure.

cover image

Alibaba on Monday unveiled a new artificial intelligence model, Qwen 3.5, designed to execute complex tasks independently, with big improvements in performance and cost that the Chinese tech giant claims beat major U.S. rival models on several benchmarks.

cover image

Nvidia researchers developed dynamic memory sparsification (DMS), a technique that compresses the KV cache in large language models by up to 8x while maintaining reasoning accuracy — and it can be retrofitted onto existing models in hours.

cover image

Peter Steinberger is the creator of OpenClaw, an open-source AI agent framework that's the fastest-growing project in GitHub history. Thank you for listening ...

cover image

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows - ComposioHQ/awesome-claude-skills

cover image

While standard models suffer from context rot as data grows, MIT’s new Recursive Language Model (RLM) framework treats prompts like code variables, unlocking infinite context without the retraining costs.

cover image

Argos improves multimodal RL by evaluating whether an agent’s reasoning aligns with what it observes over time. The approach reduces visual hallucinations and produces more reliable, data-efficient agents for real-world applications:

cover image

Zhipu AI Releases GLM-4.7-Flash: A 30B-A3B MoE (Mixture of Experts) Model for Efficient Local Coding and Agents

cover image

Goose, Block’s open-source AI coding agent, is emerging as a free alternative to Anthropic’s Claude Code, as developers weigh offline control, rate limits, and the rising cost of AI coding tools.

cover image

In this article, I’ll walk you through a guided project to add reasoning skills to your LLM apps.

cover image

Anthropic’s Cowork brings Claude Code–style AI agents to the desktop, letting Claude access and manage local files and browse the web—boosting productivity while raising new security and trust risks.

cover image

New from Anthropic today is Claude Cowork, a “research preview” that they describe as “Claude Code for the rest of your work”. It’s currently available only to Max subscribers ($100 …

Sampling at negative temperature
12 Jan 2026
cavendishlabs.org
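The title alone suggests the experiment: in softmax sampling, p_i ∝ exp(logit_i / T), so a negative temperature inverts the ranking and the model's least likely tokens become its most likely. A minimal sketch (the logits are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    # p_i ∝ exp(logit_i / T); T < 0 flips the preference ordering.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]          # token 0 is the model's favorite
hot = softmax_with_temperature(logits, 1.0)
cold = softmax_with_temperature(logits, -1.0)

print(hot.index(max(hot)))   # 0: most probable token at T=1
print(cold.index(max(cold))) # 2: the least likely token wins at T=-1
```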
cover image
GitHub Copilot
11 Jan 2026
github.com

AI that builds with you

cover image
2025 LLM Year in Review from Andrej Karpathy
4 Jan 2026
open.substack.com

Training GPT-2 on a budget from Vishwanath Sangale

cover image
The Big LLM Architecture Comparison
3 Jan 2026
magazine.sebastianraschka.com

From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design

cover image

The problem with Nano Banana Pro is that it’s too good.

cover image

Nano Banana allows 32,768 input tokens and I’m going to try to use them all dammit.

cover image
Proximal Policy Optimization
3 Jan 2026
openai.com

We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably or better than state-of-the-art approaches while being much simpler to implement and tune. PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance.
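The "simpler to implement" claim holds up: PPO's clipped surrogate objective fits in a few lines. This is a single-sample sketch with made-up numbers; real implementations work on batches of log-probability ratios and add value and entropy terms.

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    # PPO's clipped surrogate: take the pessimistic (min) of the unclipped
    # and clipped policy-ratio terms, so large policy updates earn no extra
    # reward beyond the clip range [1 - eps, 1 + eps].
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped * advantage)

# With a positive advantage, pushing the ratio past 1.2 earns nothing extra:
print(ppo_clip_objective(ratio=1.5, advantage=1.0))  # 1.2
```

The clip is what keeps each policy update small without the trust-region machinery of earlier approaches, which is where the ease-of-use advantage comes from.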

cover image
LLM Research Papers: The 2025 List (July to December)
2 Jan 2026
magazine.sebastianraschka.com

In June, I shared a bonus article with my curated and bookmarked research paper lists to the paid subscribers who make this Substack possible.

cover image
2025: The year in LLMs
1 Jan 2026
simonwillison.net

This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about …

cover image

A curated list of LLM research papers from July–December 2025, organized by reasoning models, inference-time scaling, architectures, training efficiency, and...

cover image
The State of Reinforcement Learning for LLM Reasoning
31 Dec 2025
magazine.sebastianraschka.com

Understanding GRPO and New Insights from Reasoning Model Papers

cover image

A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026.

cover image

We run a multi-tenant Rails application with sensitive data and layered authorization. In this post, I walk through how I added the first AI agent tool using RubyLLM, Pundit policies, and our existing Algolia search, without introducing a parallel system or loosening constraints.

cover image

In this article, I’ll walk you through a complete guide to LangGraph from the ground up.

cover image
2025 LLM Year in Review
19 Dec 2025
karpathy.bearblog.dev

2025 Year in Review of LLM paradigm changes

cover image
Mistral 3 Live!
12 Dec 2025
open.substack.com

Frontier AI by hand ✍️

cover image
The Concise Guide to Perplexity
26 Nov 2025
statology.org
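Perplexity is just the exponentiated average negative log-likelihood of the tokens; a minimal sketch over assumed per-token probabilities:

```python
import math

def perplexity(token_probs):
    # PPL = exp(-(1/N) * sum(log p_i)); lower is better.
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that assigns every token probability 0.25 has perplexity 4,
# i.e. it is as uncertain as a fair choice among 4 options:
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ≈ 4.0
```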
cover image

A step-by-step practical guide on building AI agents using Gemini 3 Pro, covering tool integration, context management, and best practices for creating effective and reliable agents.

cover image

Learn how different memory systems affect multi-agent planning. Comparing Memory Systems for LLM Agents highlights key performance metrics.

cover image

Compare the top 7 large language models and systems for coding in 2025. Discover which ones excel for software engineering tasks.

cover image

If language is what makes us human, what does it mean now that large language models have gained “metalinguistic” abilities?

cover image
How I Use Every Claude Code Feature
2 Nov 2025
blog.sshh.io

A brain dump of all the ways I've been using Claude Code.

cover image
The Big LLM Architecture Comparison
28 Oct 2025
open.substack.com

From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design

cover image

Codes/Notebooks for AI Projects. - Marktechpost/AI-Tutorial-Codes-Included

cover image

Learn the 5 common LLM parameters explained with examples to optimize your model's performance and generate desired results.

cover image
Machine Learning Mastery
14 Oct 2025
MachineLearningMastery.com

Making developers awesome at machine learning.

cover image
A Guide to Fine-Tuning LLMs using LoRA
14 Oct 2025
amanxai.com

In this article, I'll take you through a step-by-step guide to fine-tuning LLMs with LoRA.

cover image

Seven LLM generation parameters: max tokens, temperature, top-p, top-k, penalties, stop sequences, tuning guidance, defaults
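Two of those parameters, top-k and top-p (nucleus) filtering, are easy to sketch. This is a simplified version operating on an explicit probability list with toy numbers; real samplers work on logits over the full vocabulary.

```python
def top_k_filter(probs, k):
    # Keep only the k most probable tokens, then renormalize.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability >= p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))   # tokens 0 and 1, renormalized
print(top_p_filter(probs, 0.9)) # tokens 0, 1, 2 (cumulative 0.95 >= 0.9)
```

The practical difference: top-k keeps a fixed-size candidate set, while top-p adapts the set size to how peaked the distribution is, which is why the two are usually tuned together.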

cover image

Understand LLM reasoning by creating your own reasoning model, from scratch! LLM reasoning models have the power to tackle truly challenging problems that require finding the right path through multiple steps. In Build A Reasoning Model (From Scratch) you’ll learn how to build a working reasoning model from the ground up. You will start with an existing pre-trained LLM and then implement reasoning-focused improvements from scratch. Sebastian Raschka, the bestselling author of Build a Large Language Model (From Scratch), is your guide on this exciting journey. Sebastian mentors you every step of the way with clear explanations, practical code, and a keen focus on what really matters.

In Build A Reasoning Model (From Scratch) you’ll learn how to:

- Implement core reasoning improvements for LLMs
- Evaluate models using judgment-based and benchmark-based methods
- Improve reasoning without updating model weights
- Use reinforcement learning to integrate external tools like calculators
- Apply distillation techniques to learn from larger reasoning models
- Understand the full reasoning model development pipeline

Reasoning models break problems into steps, producing more reliable answers in math, logic, and code. These improvements aren’t just a curiosity: they’re already integrated into top models like Grok 4 and GPT-5. Build A Reasoning Model (From Scratch) demystifies these complex models with a simple philosophy: the best way to learn how something works is to build it yourself! You’ll begin with a pre-trained LLM, adding and improving its reasoning capabilities in ways you can see, test, and understand.

cover image

Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples

cover image
LoRA Without Regret
3 Oct 2025
thinkingmachines.ai

How LoRA matches full training performance more broadly than expected
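The core idea of LoRA is small enough to sketch without a framework: freeze W and learn a rank-r update BA, so the adapted layer computes Wx + B(Ax). A toy pure-Python version (the dimensions and scaling are arbitrary choices for illustration):

```python
import random

def matvec(M, x):
    # Plain matrix-vector product over nested lists.
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    # LoRA: y = Wx + alpha * B(Ax), where A is (r x d) and B is (d x r).
    return [w + alpha * b for w, b in zip(matvec(W, x), matvec(B, matvec(A, x)))]

d, r = 64, 4
random.seed(0)
W = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]  # frozen weights
A = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]  # B starts at zero, so the update begins as a no-op

x = [1.0] * d
assert lora_forward(W, A, B, x) == matvec(W, x)  # zero-init B leaves the layer unchanged

# Trainable parameters: 2*d*r for A and B vs d*d for full fine-tuning.
print(d * r * 2, "vs", d * d)  # 512 vs 4096
```

The zero-initialized B is the standard trick that makes training start from the base model's behavior; the parameter count comparison is where the efficiency argument in the article begins.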

cover image

The new A.I. app generated videos of store robberies and home intrusions — even bomb explosions on city streets — that never happened.

cover image

How we spent under half a million dollars to build a 30 petabyte data storage cluster in downtown San Francisco

cover image

Compare top local LLMs for 2025: context windows, VRAM tiers, licenses, quantization, runtimes, dense vs MoE tradeoffs clearly

cover image

In this article, we discuss five cutting-edge NLP trends that will shape 2026.

cover image
GPT-5-Codex
24 Sep 2025
simonwillison.net

OpenAI half-released this model earlier this month, adding it to their Codex CLI tool but not their API. Today they've fixed that - the new model can now be accessed …

cover image
Four new releases from Qwen
22 Sep 2025
simonwillison.net

It's been an extremely busy day for team Qwen. Within the last 24 hours (all links to Twitter, which seems to be their preferred platform for these announcements): Qwen3-Next-80B-A3B-Instruct-FP8 and …

cover image
Understanding and Implementing Qwen3 From Scratch
6 Sep 2025
magazine.sebastianraschka.com

A Detailed Look at One of the Leading Open-Source LLMs

cover image

Study shows how patterns in LLM training data can lead to “parahuman” responses.

cover image

[WIP] Resources for AI engineers. Also contains supporting materials for the book AI Engineering (Chip Huyen, 2025) - chiphuyen/aie-book

cover image
Open Source LLM Tools
31 Aug 2025
huyenchip.com

Best viewed on desktops. On a phone screen, some columns are hidden. When a new repo is indexed, changes in stars in the last day/week default to 0. Full analysis: What I learned from...

cover image

TLDR: Method Iteration is a prompting technique that gives better responses to hard problems. …

cover image

In January 2025, DeepSeek, a Chinese AI startup, launched R1, an AI model that rivaled top-tier LLMs from OpenAI and Anthropic. Built at a fraction of the

cover image

What is Artificial Intelligence AI Inference? A Technical Deep Dive and Top 9 AI Inference Providers (2025 Edition)

cover image

Transcribe PDFs with local LLMs

cover image

Discover top AI red teaming tools for robust AI security. Learn how adversarial testing protects machine learning models

The Timmy Trap – Scott Jenson
15 Aug 2025
jenson.org
cover image

New research reveals open-source AI models use up to 10 times more computing resources than closed alternatives, potentially negating cost advantages for enterprise deployments.

cover image

Comparison and analysis of AI models and API hosting providers. Independent benchmarks across key performance metrics including quality, price, output speed & latency.

cover image

A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization. - google/langextract

cover image

Google’s active learning method fine-tunes LLMs with 10,000x less data using high-fidelity expert-labeled examples

cover image

A new study from Anthropic introduces "persona vectors," a technique for developers to monitor, predict and control unwanted LLM behaviors.

cover image

Context engineering for large language models—frameworks, architectures, and strategies to optimize AI reasoning, and scalability

cover image
Hierarchical Reasoning Model
28 Jul 2025
arxiv.org

Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities. These results underscore HRM's potential as a transformative advancement toward universal computation and general-purpose reasoning systems.

cover image

How Language Models Turn Text into Meaning, From Traditional

cover image
The Complete LLM Tech Stack
25 Jul 2025
amanxai.com

In this article, I'll take you through the complete LLM tech stack you should know to develop & deploy real-world LLM applications.

cover image
LLM Research Papers: The 2025 List (January to June)
19 Jul 2025
sebastianraschka.com

The latest in LLM research with a hand-curated, topic-organized list of over 200 research papers from 2025.

cover image
I sent ChatGPT Agent out to shop for me
19 Jul 2025
theverge.com

We tested OpenAI’s ChatGPT Agent, currently only available via its $200-per-month Pro subscription.

cover image

Discover OpenAI's red team blueprint: How 110 coordinated attacks and 7 exploit fixes created ChatGPT Agent's revolutionary 95% security defense system.

cover image
ChatGPT agent System Card | OpenAI
19 Jul 2025
openai.com

ChatGPT agent System Card: OpenAI’s agentic model unites research, browser automation, and code tools with safeguards under the Preparedness Framework.

cover image

An inquiry into emergent collusion in Large Language Models. Agent S2 to Agent S3: “Let's set all asks at 63 next cycle… No undercutting ensur…

cover image

A practical handbook for engineers building, optimizing, scaling and operating LLM inference systems in production.

cover image
Shirin Khosravi Jam on Substack
13 Jul 2025
substack.com

I taught myself how to build RAG + AI Agents in production. Been running them live for over a year now. Here are 4 steps + the only resources you really need to do the same. … Ugly truth: most “AI Engineers” shouting on social media haven’t built a single real production AI Agent or RAG system. If you want to be different - actually build and ship these systems: here’s a laser-focused roadmap from my own journey. .. 🚀 𝗦𝘁𝗮𝗿𝘁 𝘄𝗶𝘁𝗵 𝗳𝘂𝗻𝗱𝗮𝗺𝗲𝗻𝘁𝗮𝗹𝘀 Because no matter how fast LLM/GenAI evolves, your ML & software foundations keep you relevant. ✅ Hands-On ML with TensorFlow & Keras: https://lnkd.in/dWrf5pbS ✅ ISLR: https://lnkd.in/djGPVVwJ ✅ Machine Learning for Beginners by Microsoft (free curriculum): https://lnkd.in/d8kZA3es … 1️⃣ 𝗠𝗮𝘀𝘁𝗲𝗿 𝗟𝗟𝗠𝘀 & 𝗚𝗲𝗻𝗔𝗜 𝗦𝘆𝘀𝘁𝗲𝗺𝘀 → Learn to build & deploy LLMs, understand system design tradeoffs, and handle real constraints. 📚 Must-reads: ✅ Designing ML Systems – Chip Huyen: https://lnkd.in/guN-UhXA ✅ The LLM Engineering Handbook – Iusztin & Labonne: https://lnkd.in/gyA4vFXz ✅ Build a LLM (From Scratch) – Raschka: https://lnkd.in/gXNa-SPb ✅ Hands-On LLMs GitHub: https://lnkd.in/eV4qrgNW … 2️⃣ 𝗚𝗼 𝗯𝗲𝘆𝗼𝗻𝗱 𝘁𝗵𝗲 𝗵𝘆𝗽𝗲 𝗼𝗻 𝗔𝗜 𝗔𝗴𝗲𝗻𝘁𝘀 → Most demos = “if user says hello, return hello.” Actual agents? Handle memory, tools, workflows, costs. 
✅ AI Agents for Beginners (GitHub): https://lnkd.in/eik2btmq ✅ GenAI Agents – build step by step: https://lnkd.in/dnhwk75V ✅ OpenAI’s guide to agents: https://lnkd.in/guRfXsFK ✅ Anthropic’s Building Effective Agents: https://lnkd.in/gRWKANS4 … 3️⃣ 𝗥𝗔𝗚 𝗶𝘀 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗮 𝘃𝗲𝗰𝘁𝗼𝗿 𝗗𝗕 Real Retrieval-Augmented Generation requires: → Chunking, hybrid BM25 + vectors, reranking → Query routing & fallback → Evaluating retrieval quality, not just LLM output ✅ RAG Techniques repo: https://lnkd.in/dD4S8Cq2 ✅ Advanced RAG: https://lnkd.in/g2ZHwZ3w ✅ Cost-efficient retrieval with Postgres/OpenSearch/Qdrant ✅ Monitoring with Langfuse / Comet … 4️⃣ 𝗚𝗲𝘁 𝘀𝗲𝗿𝗶𝗼𝘂𝘀 𝗼𝗻 𝗦𝗼𝗳𝘁𝘄𝗮𝗿𝗲 & 𝗜𝗻𝗳𝗿𝗮 → FastAPI, async Python, Pydantic → Docker, CI/CD, blue-green deploys → ETL orchestration (Airflow, Step Functions) → Logs + metrics (CloudWatch, Prometheus) ✅ Move to production: https://lnkd.in/dnnkrJbE ✅ Made with ML (full ML+infra): https://lnkd.in/e-XQwXqS ✅ AWS GenAI path: https://lnkd.in/dmhR3uPc … 5️⃣ 𝗪𝗵𝗲𝗿𝗲 𝗱𝗼 𝗜 𝗹𝗲𝗮𝗿𝗻 𝗳𝗿𝗼𝗺? → Stanford CS336 / CS236 / CS229 (Google it) → MIT 6.S191, Karpathy’s Zero to Hero: https://lnkd.in/dT7vqqQ5 → Google Kaggle GenAI sprint: https://lnkd.in/ga5X7tVJ → NVIDIA’s end-to-end LLM stack: https://lnkd.in/gCtDnhni → DeepLearning.AI’s short courses: https://lnkd.in/gAYmJqS6 … 💥 𝗞𝗲𝗲𝗽 𝗶𝘁 𝗿𝗲𝗮𝗹: Don’t fall for “built in 5 min, dead in 10 min” demos. In prod, it’s about latency, cost, maintainability, guardrails. ♻️ Let's repost to help more people on this journey 💚

cover image

Your complete playbook for transforming how you research with AI's most powerful search engine

cover image
Inside India’s scramble for AI independence
8 Jul 2025
technologyreview.com

Structural challenges and the nation’s many languages have made it tough to develop foundational AI models. But the government is keen not to be left behind.

cover image

Coders' Colaboratory mini-hackathon on `llm` by simonw - llm-hackathon.md

cover image

Christopher Smith ran a mini hackathon in Albany, New York, at the weekend around uses of my LLM - the first in-person event I'm aware of dedicated to that project! …

cover image

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off your exam → https://ibm.biz/Bdnd3d Learn more ...

cover image

I presented a three hour workshop at PyCon US yesterday titled Building software on top of Large Language Models. The goal of the workshop was to give participants everything they …

Usage
4 Jul 2025
llm.datasette.io
cover image

Get a quick overview of GPT, BERT, LLaMA, and more!

cover image

A study by Ohio State University investigates whether large language models can represent human concepts without physically experiencing them.

cover image
What I learned trying seven coding agents
28 Jun 2025
understandingai.org

There's still room for improvement, but don't underestimate this technology.

cover image
Gemini CLI
25 Jun 2025
simonwillison.net

First there was Claude Code in February, then OpenAI Codex (CLI) in April, and now Gemini CLI in June. All three of the largest AI labs now have their own …

cover image

TL;DR: I developed a simple, open-source benchmark to test if LLM agents follow high-level safety principles when they conflict with a given task acc…

cover image
Building Effective AI Agents
17 Jun 2025
anthropic.com

Discover how Anthropic approaches the development of reliable AI agents. Learn about our research on agent capabilities, safety considerations, and technical framework for building trustworthy AI.

cover image

Explore list of top MCP servers that enable seamless integration of LLMs with tools like databases, APIs, communication platforms, and more, helping you automate workflows and enhance AI applications.

LLM 0.26 is out with the biggest new feature since I started the project: support for tools. You can now use the LLM CLI tool—and Python library—to grant LLMs from …

cover image
Highlights from the Claude 4 system prompt
27 May 2025
simonwillison.net

Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude …

cover image
System Card: Claude Opus 4 & Claude Sonnet 4
25 May 2025
simonwillison.net

Direct link to a PDF on Anthropic's CDN because they don't appear to have a landing page anywhere for this document. Anthropic's system cards are always worth a look, and …

cover image

Learn about turning your notes and sources into a personalized, AI-powered tutor with NotebookLM.

llm-anthropic 0.16
22 May 2025
simonwillison.net

New release of my LLM plugin for Anthropic adding the new Claude 4 Opus and Sonnet models. You can see pelicans on bicycles generated using the new plugin at the …

cover image

View recent discussion. Abstract: The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale. This paper presents an in-depth analysis of the DeepSeek-V3/R1 model architecture and its AI infrastructure, highlighting key innovations such as Multi-head Latent Attention (MLA) for enhanced memory efficiency, Mixture of Experts (MoE) architectures for optimized computation-communication trade-offs, FP8 mixed-precision training to unlock the full potential of hardware capabilities, and a Multi-Plane Network Topology to minimize cluster-level network overhead. Building on the hardware bottlenecks encountered during DeepSeek-V3's development, we engage in a broader discussion with academic and industry peers on potential future hardware directions, including precise low-precision computation units, scale-up and scale-out convergence, and innovations in low-latency communication fabrics. These insights underscore the critical role of hardware and model co-design in meeting the escalating demands of AI workloads, offering a practical blueprint for innovation in next-generation AI systems.

cover image

Confused by AI agent frameworks? This article makes sense of A2A and MCP.

cover image

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. - mendableai/firecrawl

22365_3_Prompt Engineering_v7 (1).pdf
7 May 2025
drive.google.com
cover image
Prompt Engineering | Kaggle
7 May 2025
kaggle.com

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.

cover image

Qwen3 surpassed R1 in LiveBench tests that gauge open-source AI models’ capabilities including coding, maths and data analysis.

cover image

Understanding architectural differences between large language models (LLMs) remains challenging, particularly at academic-scale pretraining (e.g., 1.3B

Dummy’s Guide to Modern LLM Sampling
4 May 2025
simonwillison.net

This is an extremely useful, detailed set of explanations by [@AlpinDale](https://x.com/AlpinDale) covering the various different sampling strategies used by modern LLMs. LLMs return a set of next-token probabilities for every …

Creating an MCP Server Using Go
3 May 2025
eltonminetto.dev

In November 2024, Anthropic published a blog post announcing what may be its most significant contribution to the AI ecosystem so far: the Model Context Protocol.

cover image
ollama with docker compose
3 May 2025
geshan.com.np

Learn how to use Ollama and Open WebUI inside Docker with Docker compose to run any open LLM and create your own mini ChatGPT.

cover image
ollama APIs
3 May 2025
geshan.com.np

Learn how to use Ollama APIs like generate, chat and more like list model, pull model, etc with cURL and Jq with useful examples

cover image

Learn what Ollama is, its features and how to run it on your local machine with DeepSeek R1 and Smollm2 models

cover image

Learn about the important Ollama commands to run Ollama on your local machine with Smollm2 and Qwen 2.5 models

cover image
LLM Projects with Python
3 May 2025
thecleverprogrammer.com

In this article, I'll take you through a list of 10 hands-on LLM projects with Python you should try to master LLMs.

cover image

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining - XiaomiMiMo/MiMo

cover image
The Leaderboard Illusion
30 Apr 2025
arxiv.org

Measuring progress is fundamental to the advancement of any scientific field. As benchmarks play an increasingly central role, they also grow more susceptible to distortion. Chatbot Arena has emerged as the go-to leaderboard for ranking the most capable AI systems. Yet, in this work we identify systematic issues that have resulted in a distorted playing field. We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release and retract scores if desired. We establish that the ability of these providers to choose the best score leads to biased Arena scores due to selective disclosure of performance results. At an extreme, we identify 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release. We also establish that proprietary closed models are sampled at higher rates (number of battles) and have fewer models removed from the arena than open-weight and open-source alternatives. Both these policies lead to large data access asymmetries over time. Providers like Google and OpenAI have received an estimated 19.2% and 20.4% of all data on the arena, respectively. In contrast, a combined 83 open-weight models have only received an estimated 29.7% of the total data. We show that access to Chatbot Arena data yields substantial benefits; even limited additional data can result in relative performance gains of up to 112% on the arena distribution, based on our conservative estimates. Together, these dynamics result in overfitting to Arena-specific dynamics rather than general model quality. The Arena builds on the substantial efforts of both the organizers and an open community that maintains this valuable evaluation platform. We offer actionable recommendations to reform the Chatbot Arena's evaluation framework and promote fairer, more transparent benchmarking for the field

cover image

DeepSeek is set to drop another model pretty soon, as details about their next DeepSeek R2 model have surfaced on the internet

cover image

We insist that large language models repeatedly translate their mathematical processes into words. There may be a better way.

cover image

A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to these releases were relatively muted. Why? One reason could be that GPT-4.5 and Llama 4 remain conventional models, which means they were trained without explicit reinforcement learning for reasoning. However, OpenAI's recent release of the o3 reasoning model demonstrates there is still considerable room for improvement when investing compute strategically, specifically via reinforcement learning methods tailored for reasoning tasks. While reasoning alone isn't a silver bullet, it reliably improves model accuracy and problem-solving capabilities on challenging tasks (so far). And I expect reasoning-focused post-training to become standard practice in future LLM pipelines. So, in this article, let's explore the latest developments in reasoning via reinforcement learning.

cover image
How To Build An Agent | Amp
16 Apr 2025
ampcode.com

Building a fully functional, code-editing agent in less than 400 lines.

cover image
humanlayer/12-factor-agents
13 Apr 2025
github.com

What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? - humanlayer/12-factor-agents

cover image

Slopsquatting is a new supply chain threat where AI-assisted code generators recommend hallucinated packages that attackers register and weaponize.

An LLM Query Understanding Service
10 Apr 2025
simonwillison.net

Doug Turnbull recently wrote about how [all search is structured now](https://softwaredoug.com/blog/2025/04/02/all-search-structured-now): Many times, even a small open source LLM will be able to turn a search query into reasonable …

cover image
The Man Out to Prove How Dumb AI Still Is
10 Apr 2025
theatlantic.com

François Chollet has constructed the ultimate test for the bots.

cover image

MCP, short for Model Context Protocol, is the hot new standard behind how Large Language Models (LLMs) like Claude, GPT, or Cursor integrate with tools and data. It’s been described as the “USB-C for…

cover image

we explore how combining LightThinker and Multi-Head Latent Attention cuts memory and boosts performance

cover image

We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first built using a mixture-of-experts (MoE) architecture.

cover image
Model Context Protocol (MCP) an overview
6 Apr 2025
philschmid.de

Overview of the Model Context Protocol (MCP) how it works, what are MCP servers and clients, and how to use it.

cover image
Use MCP servers in VS Code (Preview)
6 Apr 2025
code.visualstudio.com

Learn how to configure and use Model Context Protocol (MCP) servers with GitHub Copilot in Visual Studio Code.

cover image

The brother goes on vision quests. The sister is a former English major. Together, they defected from OpenAI, started Anthropic, and built (they say) AI’s most upstanding citizen, Claude.

cover image

The past few years have witnessed the rise in popularity of generative AI and large language models (LLMs), as part of a broad AI revolution.

cover image

Deploying LLMs presents challenges, particularly in optimizing efficiency, managing computational costs, and ensuring high-quality performance. LLM routing has emerged as a strategic solution to these challenges, enabling intelligent task allocation to the most suitable models or tools. Let’s delve into the intricacies of LLM routing, explore various tools and frameworks designed for its implementation, and […]

cover image
First Look at Reasoning From Scratch: Chapter 1
29 Mar 2025
sebastianraschka.com

As you know, I've been writing a lot lately about the latest research on reasoning in LLMs. Before my next research-focused blog post, I wanted to offer something special to my paid subscribers as a thank-you for your ongoing support. So, I've started writing a new book on how reasoning works in LLMs, and here I'm sharing the first Chapter 1 with you. This ~15-page chapter is an introduction to reasoning in the context of LLMs and provides an overview of methods like inference-time scaling and reinforcement learning. Thanks for your support! I hope you enjoy the chapter, and stay tuned for my next blog post on reasoning research!

cover image

Thanks to KiwiCo for sponsoring today’s video! Go to https://www.kiwico.com/welchlabs and use code WELCHLABS for 50% off your first monthly club crate or for...

Tracing the thoughts of a large language model
28 Mar 2025
simonwillison.net

In a follow-up to the research that brought us the [delightful Golden Gate Claude](https://simonwillison.net/2024/May/24/golden-gate-claude/) last year, Anthropic have published two new papers about LLM interpretability: - [Circuit Tracing: Revealing Computational …

cover image

What they found challenges some basic assumptions about how this technology really works.

cover image
10 Must-Know Python Libraries for LLMs in 2025
26 Mar 2025
machinelearningmastery.com

In this article, we explore 10 of the Python libraries every developer should know in 2025.

Function calling with Gemma
26 Mar 2025
simonwillison.net

Google's Gemma 3 model (the 27B variant is particularly capable, I've been trying it out [via Ollama](https://ollama.com/library/gemma3)) supports function calling exclusively through prompt engineering. The official documentation describes two recommended …

cover image
Putting Gemini 2.5 Pro through its paces
26 Mar 2025
simonwillison.net

There’s a new release from Google Gemini this morning: the first in the Gemini 2.5 series. Google call it “a thinking model, designed to tackle increasingly complex problems”. It’s already …

cover image
Introducing 4o Image Generation
25 Mar 2025
openai.com

At OpenAI, we have long believed image generation should be a primary capability of our language models. That’s why we’ve built our most advanced image generator yet into GPT‑4o. The result—image generation that is not only beautiful, but useful.

cover image
What is the hallucination index?
25 Mar 2025
dataconomy.com

The Hallucination Index is a benchmark that measures the frequency of inaccuracies in large language models, indicating their reliability and contextual understanding.

cover image
Quickstart | Mistral AI Large Language Models
23 Mar 2025
docs.mistral.ai

https://console.mistral.ai/

cover image

Model architectures, data generation, training paradigms, and unified frameworks inspired by LLMs.

cover image

Anthropic launches real-time web search for Claude AI, challenging ChatGPT's dominance while securing $3.5 billion in funding at a $61.5 billion valuation.

cover image

Paris-based artificial intelligence startup Mistral AI has announced the open-source release of its lightweight AI model, Mistral Small 3.1, which the company

Mistral Small 3.1
17 Mar 2025
simonwillison.net

Mistral Small 3 [came out in January](https://simonwillison.net/2025/Jan/30/mistral-small-3/) and was a notable, genuinely excellent local model that used an Apache 2.0 license. Mistral Small 3.1 offers a significant improvement: it's multi-modal …

cover image

The ellmer package for using LLMs with R is a game changer for scientists. Why is ellmer a game changer for scientists? In this tutorial we’ll look at how we can access LLM agents through API calls. We’ll use this skill for creating structured data fro...

cover image

Catastrophic Forgetting is a phenomenon where neural networks lose previously learned information when trained on new data, similar to human memory loss.

cover image
Top 7 Open-Source LLMs in 2025 - KDnuggets
13 Mar 2025
kdnuggets.com

These models are free to use, can be fine-tuned, and offer enhanced privacy and security since they can run directly on your machine, while matching the performance of proprietary solutions like o3-mini and Gemini 2.0.

cover image
What are model cards? - Dataconomy
12 Mar 2025
dataconomy.com

Model cards are documentation tools in machine learning that provide essential information about models, promoting transparency, trust, and ethical considerations in AI systems.

cover image
How I use LLMs to help me write code
11 Mar 2025
open.substack.com

Plus CSS view transitions and a major update to llm-openrouter

cover image
On GPT-4.5
8 Mar 2025
thezvi.substack.com

It’s happening.

cover image
The State of LLM Reasoning Models
8 Mar 2025
open.substack.com

Part 1: Inference-Time Compute Scaling Methods

cover image
Mistral OCR
7 Mar 2025
simonwillison.net

New closed-source specialist OCR model by Mistral - you can feed it images or a PDF and it produces Markdown with optional embedded images. It's available [via their API](https://docs.mistral.ai/api/#tag/ocr), or …

cover image
Mistral OCR | Mistral AI
6 Mar 2025
mistral.ai

Introducing the world’s best document understanding API.

llm-ollama 0.9.0
4 Mar 2025
simonwillison.net

This release of the `llm-ollama` plugin adds support for [schemas](https://simonwillison.net/2025/Feb/28/llm-schemas/), thanks to a [PR by Adam Compton](https://github.com/taketwo/llm-ollama/pull/36). Ollama provides very robust support for this pattern thanks to their [structured outputs](https://ollama.com/blog/structured-outputs) …

cover image
Claude 3.7 Sonnet and Claude Code
26 Feb 2025
anthropic.com

Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model generally available on the market.

cover image

OpenAI’s Deep Research is built for me, and I can’t use it. It’s another amazing demo, until it breaks. But it breaks in really interesting ways.

cover image
5 Principles for Writing Effective Prompts (2025 Update)
24 Feb 2025
blog.tobiaszwingmann.com

Solid techniques to get really good results from any LLM

cover image

OpenAI's president Greg Brockman recently shared this cool template for prompting their reasoning models o1/o3. Turns out, this is great for ANY reasoning…

cover image
LLM Leaderboard
21 Feb 2025
artificialanalysis.ai

Comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics including quality, price, speed (output speed in tokens per second, and latency to first token, TTFT), context window, and others.

cover image
Here Are My Go-To AI Tools
17 Feb 2025
open.substack.com

I share my preferences for LLMs, image models, AI video, AI music, AI-powered research, and more. These are the AI tools I regularly use or recommend to others.

cover image

A Step-by-Step Guide to Setting Up a Custom BPE Tokenizer with Tiktoken for Advanced NLP Applications in Python

cover image
We Were Wrong About GPUs
15 Feb 2025
fly.io

Do my tears surprise you? Strong CEOs also cry.

cover image

I just released llm-smollm2, a new plugin for LLM that bundles a quantized copy of the SmolLM2-135M-Instruct LLM inside of the Python package. This means you can now pip install …

cover image
Understanding Reasoning LLMs
5 Feb 2025
sebastianraschka.com

In this article, I will describe the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities. I hope this p...

cover image

Check out this comparison of 5 AI frameworks to determine which you should choose.

cover image

The Reinforcement Learning from Human Feedback Book

cover image

In our previous tutorial, we built an AI agent capable of answering queries by surfing the web. However, when building agents for longer-running tasks, two critical concepts come into play: persistence and streaming. Persistence allows you to save the state of an agent at any given point, enabling you to resume from that state in future interactions. This is crucial for long-running applications. On the other hand, streaming lets you emit real-time signals about what the agent is doing at any moment, providing transparency and control over its actions. In this tutorial, we’ll enhance our agent by adding these powerful
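
The persistence idea described there, reduced to its essence, is just serializing the agent's state so a later process can resume it. A minimal sketch under stated assumptions: `save_state`/`load_state` are illustrative names, not any framework's API, though real checkpointers have the same shape.

```python
import json
import os
import tempfile

def save_state(state: dict, path: str) -> None:
    # Persist the agent's conversation state to disk mid-run.
    with open(path, "w") as f:
        json.dump(state, f)

def load_state(path: str) -> dict:
    # Resume from a previously saved checkpoint.
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "agent_state.json")
state = {"messages": [{"role": "user", "content": "find flights"}], "step": 3}
save_state(state, path)
resumed = load_state(path)
print(resumed["step"])
```

Streaming is the complementary half: instead of returning only the final state, the agent emits each intermediate event (tool call, model token) as it happens.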

cover image

Aidan Bench attempts to measure in LLMs. - aidanmclaughlin/AidanBench

OpenAI o3-mini, now available in LLM
31 Jan 2025
simonwillison.net

o3-mini is out today. As with other o-series models it’s a slightly difficult one to evaluate—we now need to decide if a prompt is best run using GPT-4o, o1, o3-mini …

cover image

How a Key-Value (KV) cache reduces Transformer inference time by trading memory for computation

cover image

The field of artificial intelligence is evolving rapidly, with increasing efforts to develop more capable and efficient language models. However, scaling these models comes with challenges, particularly regarding computational resources and the complexity of training. The research community is still exploring best practices for scaling extremely large models, whether they use a dense or Mixture-of-Experts (MoE) architecture. Until recently, many details about this process were not widely shared, making it difficult to refine and improve large-scale AI systems. Qwen AI aims to address these challenges with Qwen2.5-Max, a large MoE model pretrained on over 20 trillion tokens and further refined

cover image

The unusual timing of Qwen 2.5-Max's release points to the pressure DeepSeek's meteoric rise in the past three weeks has placed on overseas rivals and domestic competition.

cover image
On MLA
28 Jan 2025
planetbanatt.net
cover image
The Illustrated DeepSeek-R1
27 Jan 2025
newsletter.languagemodels.co

A recipe for reasoning LLMs

cover image

AI has entered an era of the rise of competitive and groundbreaking large language models and multimodal models. The development has two sides, one with open source and the other being propriety models. DeepSeek-R1, an open-source AI model developed by DeepSeek-AI, a Chinese research company, exemplifies this trend. Its emergence has challenged the dominance of proprietary models such as OpenAI’s o1, sparking discussions on cost efficiency, open-source innovation, and global technological leadership in AI. Let’s delve into the development, capabilities, and implications of DeepSeek-R1 while comparing it with OpenAI’s o1 system, considering the contributions of both spaces. DeepSeek-R1 DeepSeek-R1 is

cover image

Developers have tricks to stop artificial intelligence from making things up, but large language models are still struggling to tell the truth, the whole truth and nothing but the truth.

cover image
Noteworthy LLM Research Papers of 2024
23 Jan 2025
sebastianraschka.com

This article covers 12 influential AI research papers of 2024, ranging from mixture-of-experts models to new LLM scaling laws for precision.

LLM 0.20
23 Jan 2025
simonwillison.net

New release of my [LLM](https://llm.datasette.io/) CLI tool and Python library. A bunch of accumulated fixes and features since the start of December, most notably: - Support for OpenAI's [o1 model](https://platform.openai.com/docs/models#o1) …

cover image

The company built a cheaper, competitive chatbot with fewer high-end computer chips than U.S. behemoths like Google and OpenAI, showing the limits of chip export control.

cover image

DeepSeek are the Chinese AI lab who dropped the best currently available open weights LLM on Christmas day, DeepSeek v3. That model was trained in part using their unreleased R1 …

cover image

The rapid advancement and widespread adoption of generative AI systems across various domains have increased the critical importance of AI red teaming for evaluating technology safety and security. While AI red teaming aims to evaluate end-to-end systems by simulating real-world attacks, current methodologies face significant challenges in effectiveness and implementation. The complexity of modern AI systems, with their expanding capabilities across multiple modalities including vision and audio, has created an unprecedented array of potential vulnerabilities and attack vectors. Moreover, integrating agentic systems that grant AI models higher privileges and access to external tools has substantially increased the attack surface and

New paper from Microsoft describing their top eight lessons learned red teaming (deliberately seeking security vulnerabilities in) 100 different generative AI models and products over the past few years. …

cover image

This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm, which is used in models like GPT-2 to GPT-4, Llama 3,...
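
The core of BPE is a simple loop: count adjacent token pairs, merge the most frequent pair into a single token, and repeat. A minimal sketch of one such loop (the linked notebook implements the full algorithm with a learned vocabulary):

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent token pairs; BPE repeatedly merges the most frequent one.
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge(tokens, pair):
    # Replace every occurrence of `pair` with a single fused token.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")
for _ in range(2):  # two merge rounds: "l"+"o", then "lo"+"w"
    tokens = merge(tokens, most_frequent_pair(tokens))
print(tokens)
```

After two rounds the frequent substring "low" has become a single token, which is exactly how BPE builds subword vocabularies from character statistics.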

cover image
This Rumor About GPT-5 Changes Everything
17 Jan 2025
open.substack.com

Let’s start the year on an exciting note

cover image
The 2025 AI Engineering Reading List
14 Jan 2025
latent.space

We picked 50 paper/models/blogs across 10 fields in AI Eng: LLMs, Benchmarks, Prompting, RAG, Agents, CodeGen, Vision, Voice, Diffusion, Finetuning. If you're starting from scratch, start here.

cover image
Agents
12 Jan 2025
huyenchip.com

Intelligent agents are considered by many to be the ultimate goal of AI. The classic book by Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, 1995), defines the field of AI research as “the study and design of rational agents.”

cover image
100 Must-Read Generative AI Papers from 2024
12 Jan 2025
open.substack.com

A comprehensive list of some of the most impactful generative papers from last year

cover image

Two powerful workflows that unlock everything else. Intro: Golden Age of AI Tools and AI agent frameworks begins in 2025.

cover image

A long reading list of evals papers with recommendations and comments by the evals team.

cover image
Things we learned about LLMs in 2024
31 Dec 2024
simonwillison.net

A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past …

cover image
How to Build a Graph RAG App
30 Dec 2024
towardsdatascience.com

Using knowledge graphs and AI to retrieve, filter, and summarize medical journal articles

cover image
Gemini 2.0 Flash "Thinking Mode"
24 Dec 2024
open.substack.com

Plus building Python tools with a one-shot prompt using uv run and Claude Projects

cover image
LLM Research Papers: The 2024 List
22 Dec 2024
magazine.sebastianraschka.com

A curated list of interesting LLM-related research papers from 2024, shared for those looking for something to read over the holidays.

cover image
Why AI language models choke on too much text
22 Dec 2024
arstechnica.com

Compute costs scale with the square of the input size. That’s not great.
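
The quadratic scaling in that subtitle is easy to make concrete: self-attention compares every token with every other token, so the score matrix alone has one entry per token pair. A toy back-of-the-envelope:

```python
def attention_score_entries(n_tokens: int) -> int:
    # Self-attention compares every token with every other token,
    # so the score matrix holds n * n entries: doubling the input
    # quadruples the work (and the memory for the scores).
    return n_tokens * n_tokens

for n in (1_000, 2_000, 4_000):
    print(f"{n:>5} tokens -> {attention_score_entries(n):>12,} pairwise scores")
```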

cover image

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step - rasbt/LLMs-from-scratch

cover image

Large Language Models (LLMs) have become a cornerstone of artificial intelligence, driving advancements in natural language processing and decision-making tasks. However, their extensive power demands, resulting from high computational overhead and frequent external memory access, significantly hinder their scalability and deployment, especially in energy-constrained environments such as edge devices. This escalates the cost of operation while also limiting accessibility to these LLMs, which therefore calls for energy-efficient approaches designed to handle billion-parameter models. Current approaches to reduce the computational and memory needs of LLMs are based either on general-purpose processors or on GPUs, with a combination of weight quantization and

cover image

The artificial intelligence start-up said the new system, OpenAI o3, outperformed leading A.I. technologies on tests that rate skills in math, science, coding and logic.

cover image
Building effective agents \ Anthropic
19 Dec 2024
anthropic.com

A post for developers with advice and workflows for building effective AI agents

cover image

Large Language Models (LLMs) have achieved remarkable advancements in natural language processing (NLP), enabling applications in text generation, summarization, and question-answering. However, their reliance on token-level processing—predicting one word at a time—presents challenges. This approach contrasts with human communication, which often operates at higher levels of abstraction, such as sentences or ideas. Token-level modeling also struggles with tasks requiring long-context understanding and may produce outputs with inconsistencies. Moreover, extending these models to multilingual and multimodal applications is computationally expensive and data-intensive. To address these issues, researchers at Meta AI have proposed a new approach: Large Concept Models (LCMs). Large Concept

cover image

Large language models (LLMs) can understand and generate human-like text by encoding vast knowledge repositories within their parameters. This capacity enables them to perform complex reasoning tasks, adapt to various applications, and interact effectively with humans. However, despite their remarkable achievements, researchers continue to investigate the mechanisms underlying the storage and utilization of knowledge in these systems, aiming to enhance their efficiency and reliability further. A key challenge in using large language models is their propensity to generate inaccurate, biased, or hallucinatory outputs. These problems arise from a limited understanding of how such models organize and access knowledge. Without clear

cover image

This blog explores a detailed comparison between the OpenAI API and LangChain, highlighting key differences in performance and developer experience and the low level code for why these differences exist.

cover image
Transformers Key-Value (KV) Caching Explained
12 Dec 2024
towardsdatascience.com

Speed up your LLM inference

cover image

There has been an increasing amount of fear, uncertainty and doubt (FUD) regarding AI Scaling laws. A cavalcade of part-time AI industry prognosticators have latched on to any bearish narrative the…

cover image

It’s largely up to companies to test whether their AI is capable of superhuman harm. At Anthropic, the Frontier Red Team assesses the risk of catastrophe.

cover image

In large language models (LLMs), “hallucination” refers to instances where models generate semantically or syntactically plausible outputs but are factually incorrect or nonsensical. For example, a hallucination occurs when a model provides erroneous information, such as stating that Addison's disease causes “bright yellow skin” when, in fact, it causes fatigue and low blood pressure. This phenomenon is a significant concern in AI, as it can lead to the spread of false or misleading information. The issue of AI hallucinations has been explored in various research studies. A survey in “ACM Computing Surveys” describes hallucinations as “unreal perceptions that feel real.”

cover image
Countless.dev | AI Model Comparison
7 Dec 2024
countless.dev

Compare AI models easily! All providers in one place.

cover image

LLMs are driving major advances in research and development today. A significant shift has been observed in research objectives and methodologies toward an LLM-centric approach. However, they are associated with high expenses, making LLMs for large-scale utilization inaccessible to many. It is, therefore, a significant challenge to reduce the latency of operations, especially in dynamic applications that demand responsiveness. KV cache is used for autoregressive decoding in LLMs. It stores key-value pairs in multi-headed attention during the pre-filling phase of inference. During the decoding stage, new KV pairs get appended to the memory. KV cache stores the intermediate key and
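
The append-and-reuse pattern that paragraph describes can be sketched in a few lines of NumPy. This is an illustrative single-head toy (no scaling, batching, or real model weights), not any particular framework's implementation:

```python
import numpy as np

def attend(q, k_cache, v_cache):
    # Attend the new query over every cached position (single head).
    scores = k_cache @ q                    # (t,) one score per cached token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over cached positions
    return weights @ v_cache                # weighted sum of cached values

d = 4
k_cache = np.empty((0, d))                  # grows by one row per decoded token
v_cache = np.empty((0, d))
rng = np.random.default_rng(0)

for step in range(3):                       # decoding loop: one new token per step
    k_new, v_new, q = rng.normal(size=(3, d))
    k_cache = np.vstack([k_cache, k_new])   # append new K/V instead of recomputing
    v_cache = np.vstack([v_cache, v_new])
    out = attend(q, k_cache, v_cache)

print(k_cache.shape, out.shape)
```

The trade-off is exactly as stated: each step does only the new token's projections, at the cost of a cache that grows linearly with the sequence.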

cover image
How to Build a General-Purpose LLM Agent
5 Dec 2024
towardsdatascience.com

A Step-by-Step Guide

cover image
Treemap
5 Dec 2024
aiworld.eu

Navigate Tomorrow's Intelligence Today

cover image

Kapa.ai turns your knowledge base into a reliable and production-ready LLM-powered AI assistant that answers technical questions instantly. Trusted by 100+ startups and enterprises incl. OpenAI, Docker, Mapbox, Mixpanel and NextJS.

cover image

Psst, kid, want some cheap and small LLMs?

cover image

The advent of LLMs has propelled advancements in AI for decades. One such advanced application of LLMs is Agents, which replicate human reasoning remarkably. An agent is a system that can perform complicated tasks by following a reasoning process similar to humans: think (solution to the problem), collect (context from past information), analyze (the situations and data), and adapt (based on the style and feedback). Agents encourage the system through dynamic and intelligent activities, including planning, data analysis, data retrieval, and utilizing the model's past experiences. A typical agent has four components: Brain: An LLM with advanced processing capabilities, such as

cover image

Notes from the Latent Space paper club. Follow along or start your own! - eugeneyan/llm-paper-notes

cover image
Understanding Multimodal LLMs
21 Nov 2024
magazine.sebastianraschka.com

An introduction to the main techniques and latest models

cover image
Something weird is happening with LLMs and chess
17 Nov 2024
open.substack.com

Are they good or bad?

9 October 2024, Mathias Parisot, Jakub Zavrel. Even in the red hot global race for AI dominance, you publish and you perish, unless your peers pick up your work, build further on it, and you manage to drive real progress in the field. And of course, we are all very curious who is currently having that kind of impact. Are the billions of dollars spent on AI R&D paying off in the long run? So here is, in continuation of our popular publication impact analysis of last year, Zeta Alpha's ranking of t

cover image

LLM Chunking, Indexing, Scoring and Agents, in a Nutshell. The new PageRank of RAG/LLM. With details on building relevancy scores.

cover image
Developing a computer use model
28 Oct 2024
anthropic.com

A discussion of how Anthropic's researchers developed Claude's new computer use skill, along with some relevant safety considerations

cover image
5 LLM Tools I Can’t Live Without
19 Oct 2024
kdnuggets.com

In this article, I share the five essential LLM tools that I currently find indispensable, and which have the potential to help revolutionize the way you work.

cover image

Anthropic, the AI vendor second in size only to OpenAI, has a powerful family of generative AI models called Claude. These models can perform a range of

cover image

Nvidia quietly launched a groundbreaking AI model that surpasses OpenAI’s GPT-4 and Anthropic’s Claude 3.5, signaling a major shift in the competitive landscape of artificial intelligence.

cover image
dpo-from-scratch.ipynb
4 Aug 2024
github.com

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step - rasbt/LLMs-from-scratch

cover image

Understanding the mechanistic interpretability research problem and reverse-engineering these large language models

cover image

Llama 3.1 is the latest version of Meta's large language models, with a new model size of 405 billion parameters, the biggest model it has trained.

cover image

The newly unveiled Llama 3.1 collection of 8B, 70B, and 405B large language models (LLMs) is narrowing the gap between proprietary and open-source models. Their open nature is attracting more…

cover image

Meta announced the release of Llama 3.1, the most capable model in the LLama Series. This latest iteration of the Llama series, particularly the 405B model, represents a substantial advancement in open-source AI capabilities, positioning Meta at the forefront of AI innovation.  Meta has long advocated for open-source AI, a stance underscored by Mark Zuckerberg’s assertion that open-source benefits developers, Meta, and society. Llama 3.1 embodies this philosophy by offering state-of-the-art capabilities in an openly accessible model. The release aims to democratize AI, making cutting-edge technology available to various users and applications. The Llama 3.1 405B model stands out for

cover image

Meta llama 3.1 405b kicks off a fresh chapter for open-source language models. This breakthrough brings unmatched skills to AI

cover image

A deep dive into absolute, relative, and rotary positional embeddings with code examples
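
As a taste of the absolute variant covered in that deep dive, here is the classic sinusoidal encoding from the original Transformer paper in a few lines of NumPy (a sketch, not the article's own code):

```python
import numpy as np

def sinusoidal_positions(n_pos: int, d_model: int) -> np.ndarray:
    # Each dimension pair (sin, cos) oscillates at a different wavelength,
    # giving every position a unique, smoothly varying fingerprint.
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.zeros((n_pos, d_model))
    enc[:, 0::2] = np.sin(angles)  # even dimensions: sine
    enc[:, 1::2] = np.cos(angles)  # odd dimensions: cosine
    return enc

pe = sinusoidal_positions(8, 16)
print(pe.shape)  # (8, 16): one d_model-sized vector per position
```

Relative and rotary schemes differ in *where* position enters (added to embeddings vs. rotated into queries and keys), which is the comparison the article walks through.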

cover image
Claude 3.5 Sonnet
15 Jul 2024
anthropic.com

Introducing Claude 3.5 Sonnet—our most intelligent model yet. Sonnet now outperforms competitor models and Claude 3 Opus on key evaluations, at twice the speed.

cover image

In addition to its practical implications, recent work on “meaning representations” could shed light on some old philosophical questions.

cover image

Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly.

cover image

We would like to thank Voltage Park, Dell, H5, and NVIDIA for their invaluable partnership and help with setting up our cluster. A special…

cover image

Experience the leading models to build enterprise generative AI apps now.

cover image

AI startup Gradient and cloud platform Crusoe teamed up to extend the context window of Meta's Llama 3 models to 1 million tokens.

cover image

In the developing field of Artificial Intelligence (AI), the ability to think quickly has become increasingly significant. The necessity of communicating with AI models efficiently becomes critical as these models get more complex. In this article we will explain a number of sophisticated prompt engineering strategies, simplifying these difficult ideas through straightforward human metaphors. The techniques and their examples have been discussed to see how they resemble human approaches to problem-solving. Chaining Methods Analogy: Solving a problem step-by-step. Chaining techniques are similar to solving an issue one step at a time. Chaining techniques include directing the AI via a systematic
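
The chaining analogy above amounts to two sequential model calls, where the first call's output feeds the second. A minimal sketch in which `call_model` is a hypothetical stand-in for a real LLM API, not any provider's actual client:

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    return f"<model answer to: {prompt!r}>"

def chain(question: str) -> str:
    # Step 1: ask the model to break the problem into steps.
    plan = call_model(f"List the steps needed to answer: {question}")
    # Step 2: feed that plan back in and ask for the final answer.
    return call_model(f"Following these steps:\n{plan}\nAnswer: {question}")

print(chain("How many weekdays are there in March?"))
```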

cover image

Evaluating Large Language Models (LLMs) is a challenging problem in language modeling, as real-world problems are complex and variable. Conventional benchmarks frequently fail to fully represent LLMs' all-encompassing performance. A recent LinkedIn post has emphasized a number of important measures that are essential to comprehend how well new models function, which are as follows. MixEval Achieving a balance between thorough user inquiries and effective grading systems is necessary for evaluating LLMs. Conventional standards based on ground truth and LLM-as-judge benchmarks encounter difficulties such as biases in grading and possible contamination over time.  MixEval solves these problems by combining real-world user

cover image

In the rapidly advancing field of Artificial Intelligence (AI), effective use of web data can lead to unique applications and insights. A recent tweet has brought attention to Firecrawl, a potent tool in this field created by the Mendable AI team. Firecrawl is a state-of-the-art web scraping program made to tackle the complex problems involved in getting data off the internet. Web scraping is useful, but it frequently requires overcoming various challenges like proxies, caching, rate limitations, and material generated with JavaScript. Firecrawl is a vital tool for data scientists because it addresses these issues head-on. Even without a sitemap,

cover image
Let's reproduce GPT-2 (124M)
19 Jun 2024
m.youtube.com

We reproduce GPT-2 (124M) from scratch. This video covers the whole process: first we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 papers and their hyperparameters, then we hit run, come back the next morning to see our results, and enjoy some amusing model generations. Keep in mind that in some places this video builds on knowledge from earlier videos in the Zero to Hero playlist (see my channel). You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.

Links:
- build-nanogpt GitHub repo, with all the changes in this video as individual commits: https://github.com/karpathy/build-nanogpt
- nanoGPT repo: https://github.com/karpathy/nanoGPT
- llm.c repo: https://github.com/karpathy/llm.c
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- our Discord channel: https://discord.gg/3zy8kqD9Cp

Supplementary links:
- Attention Is All You Need paper: https://arxiv.org/abs/1706.03762
- OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
- OpenAI GPT-2 paper: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- The GPU I'm training the model on is from Lambda GPU Cloud, which I think is the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: https://lambdalabs.com

Chapters:
00:00:00 intro: Let’s reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let’s train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
01:48:15 torch.compile, Python overhead, kernel fusion, 130ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperparameters, AdamW, gradient clipping
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo

Corrections: I will post all errata and follow-ups to the build-nanogpt GitHub repo (link above).

SuperThanks: I experimentally enabled them on my channel yesterday. Totally optional and only use if rich. All revenue goes to supporting my work in AI + Education.
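One of the chapters above covers the warmup + cosine decay learning-rate schedule from the GPT-3 paper. A minimal sketch of that schedule (the hyperparameter values here are illustrative, not necessarily the video's exact settings):

```python
import math

def get_lr(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=10, max_steps=50):
    """Linear warmup followed by cosine decay, GPT-3 style."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps        # linear warmup
    if step > max_steps:
        return min_lr                                    # floor after decay ends
    ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))      # goes from 1 down to 0
    return min_lr + coeff * (max_lr - min_lr)
```

The rate ramps up linearly to `max_lr`, then follows a half-cosine down to `min_lr`, which is what the "warmup + cosine decay" chapter implements in the training loop.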

cover image

Run an open source language model on your local machine and remotely.

cover image

Midjourney model personalization is now live, offering you a more tailored image generation experience by teaching the AI your preferences.

cover image
How to use Perplexity in your PM work
12 Jun 2024
lennysnewsletter.com

27 examples (with actual prompts) of how product managers are using Perplexity today

cover image

The linear representation hypothesis is the informal idea that semantic concepts are encoded as linear directions in the representation spaces of large language models (LLMs). Previous work has...

cover image

The ability to discern relevant and essential information from noise is paramount in AI, particularly within large language models (LLMs). With the surge of information and the complexity of tasks, there's a need for efficient mechanisms to enhance the performance and reliability of these models. Let’s explore the essential tools & techniques for refining LLMs and delivering precise, actionable insights. The focus will be on Retrieval-Augmented Generation (RAG), agentic functions, Chain of Thought (CoT) prompting, few-shot learning, prompt engineering, and prompt optimization. Retrieval-Augmented Generation (RAG): Providing Relevant Context RAG combines the power of retrieval mechanisms with generative models, ensuring that

cover image

Choosing large language models (LLMs) tailored for specific tasks is crucial for maximizing efficiency and accuracy. With natural language processing (NLP) advancements, different models have emerged, each excelling in unique domains. Here is a comprehensive guide to the most suitable LLMs for various activities in the AI world. Hard Document Understanding: Claude Opus Claude Opus excels at tasks requiring deep understanding and interpretation of complex documents. This model excels in parsing dense legal texts, scientific papers, and intricate technical manuals. Claude Opus is designed to handle extensive context windows, ensuring it captures nuanced details and complicated relationships within the text.

cover image
Three Things to Know About Prompting LLMs
11 Jun 2024
sloanreview.mit.edu

Apply these techniques when crafting prompts for large language models to elicit more relevant responses.

cover image

In most cases, Perplexity produced the desired Pages, but what we found missing was the option to edit the content manually.

cover image

We tested OpenAI’s ChatGPT against Microsoft’s Copilot and Google’s Gemini, along with Perplexity and Anthropic’s Claude. Here’s how they ranked.

cover image

if the centralizing forces of data and compute hold, open and closed-source AI cannot both dominate long-term

cover image

Vision-language models (VLMs), capable of processing both images and text, have gained immense popularity due to their versatility in solving a wide range of tasks, from information retrieval in scanned documents to code generation from screenshots. However, the development of these powerful models has been hindered by a lack of understanding regarding the critical design choices that truly impact their performance. This knowledge gap makes it challenging for researchers to make meaningful progress in this field. To address this issue, a team of researchers from Hugging Face and Sorbonne Université conducted extensive experiments to unravel the factors that matter the

cover image

What goes on in artificial neural networks is largely a mystery, even to their creators. But researchers from Anthropic have caught a glimpse.

cover image
naklecha/llama3-from-scratch
21 May 2024
github.com

llama3 implementation one matrix multiplication at a time - naklecha/llama3-from-scratch

cover image

Artificial intelligence (AI) has revolutionized various fields by introducing advanced models for natural language processing (NLP). NLP enables computers to understand, interpret, and respond to human language in a valuable way. This field encompasses text generation, translation, and sentiment analysis applications, significantly impacting industries like healthcare, finance, and customer service. The evolution of NLP models has driven these advancements, continually pushing the boundaries of what AI can achieve in understanding and generating human language. Despite these advancements, developing models that can effectively handle complex multi-turn conversations remains a persistent challenge. Existing models often fail to maintain context and coherence over

cover image

Now that LLMs can retrieve 1 million tokens at once, how long will it be until we don’t need retrieval augmented generation for accurate AI responses?

cover image

What a month! We had four major open LLM releases: Mixtral, Meta AI's Llama 3, Microsoft's Phi-3, and Apple's OpenELM. In my new article, I review and discus...

cover image

The capacity of large language models (LLMs) to produce adequate text in various application domains has caused a revolution in natural language creation. These models come in essentially two types: 1) Most model weights and data sources are open source. 2) All model-related information is publicly available, including training data, data sampling ratios, training logs, intermediate checkpoints, and assessment methods (Tiny-Llama, OLMo, and StableLM 1.6B). Full access to open language models for the research community is vital for thoroughly investigating these models' capabilities and limitations and understanding their inherent biases and potential risks. This is necessary despite the continued breakthroughs in

cover image

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a...

cover image

Generative AI (GenAI) tools have come a long way. Believe it or not, the first generative AI tools were introduced in the 1960s in a chatbot. Still, it was only in 2014 that generative adversarial networks (GANs) were introduced, a type of Machine Learning (ML) algorithm that allowed generative AI to finally create authentic images, videos, and audio of real people. In 2024, we can create anything imaginable using generative AI tools like ChatGPT, DALL-E, and others. However, there is a problem. We can use those AI tools but cannot get the most out of them or use them

cover image
Cleaning
11 May 2024
docs.unstructured.io

As part of data preparation for an NLP model, it’s common to need to clean up your data prior to passing it into the model. If there’s unwanted content in your output, for example, it could impact the quality of your NLP model. To help with this, the `unstructured` library includes cleaning functions to help users sanitize output before sending it to downstream applications.
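As a quick illustration of the kind of sanitization the entry describes — collapsing stray whitespace and stripping bullet characters from extracted text — here is a rough plain-Python sketch of what such cleaners do (these are my own illustrative implementations, not the `unstructured` library's actual functions; see its docs for the real API):

```python
import re

def clean_extra_whitespace(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def clean_bullets(text: str) -> str:
    """Drop a leading bullet character left over from document extraction."""
    return re.sub(r"^[\u2022\u25cf\-\*]\s*", "", text)

print(clean_extra_whitespace("ITEM 1A.\n\n   Risk Factors"))  # → "ITEM 1A. Risk Factors"
print(clean_bullets("- Section summary"))                     # → "Section summary"
```

Running extracted chunks through a pipeline of small, composable cleaners like these before indexing or training is the pattern the library encourages.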

cover image

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results...

cover image

The rapid evolution in AI demands models that can handle large-scale data and deliver accurate, actionable insights. Researchers in this field aim to create systems capable of continuous learning and adaptation, ensuring they remain relevant in dynamic environments. A significant challenge in developing AI models lies in overcoming the issue of catastrophic forgetting, where models fail to retain previously acquired knowledge when learning new tasks. This challenge becomes more pressing as applications increasingly demand continuous learning capabilities. For instance, models must update their understanding of healthcare, financial analysis, and autonomous systems while retaining prior knowledge to make informed decisions. The

cover image
Hugging Face - Documentation
5 May 2024
huggingface.co

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

cover image

Are you curious about the intricate world of large language models (LLMs) and the technical jargon that surrounds them? Understanding the terminology, from the foundational aspects of training and fine-tuning to the cutting-edge concepts of transformers and reinforcement learning, is the first step towards demystifying the powerful algorithms that drive modern AI language systems. In this article, we delve into 25 essential terms to enhance your technical vocabulary and provide insights into the mechanisms that make LLMs so transformative. Heatmap representing the relative importance of terms in the context of LLMs Source: marktechpost.com 1. LLM (Large Language Model) Large Language

cover image

Prompt Fuzzer: The Prompt Fuzzer is an interactive tool designed to evaluate the security of GenAI application system prompts by simulating various dynamic LLM-based attacks. It assesses security by analyzing the results of these simulations, helping users fortify their system prompts accordingly. This tool specifically customizes its tests to fit the unique configuration and domain of the user's application. The Fuzzer also features a Playground chat interface, allowing users to refine their system prompts iteratively, enhancing their resilience against a broad range of generative AI attacks. Users should be aware that using the Prompt Fuzzer will consume tokens. Garak: Garak

cover image

The models have some pretty good general knowledge.

cover image

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude. - anthropics/anthropic-cookbook

cover image

Deep learning architectures have revolutionized the field of artificial intelligence, offering innovative solutions for complex problems across various domains, including computer vision, natural language processing, speech recognition, and generative models. This article explores some of the most influential deep learning architectures: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), Transformers, and Encoder-Decoder architectures, highlighting their unique features, applications, and how they compare against each other. Convolutional Neural Networks (CNNs) CNNs are specialized deep neural networks for processing data with a grid-like topology, such as images. A CNN automatically detects the important features without any human supervision.

cover image
Tips for LLM Pretraining and Evaluating Reward Models
15 Apr 2024
magazine.sebastianraschka.com

Discussing AI Research Papers in March 2024

My startup Truss (gettruss.io) released a few LLM-heavy features in the last six months, and the narrative around LLMs that I read on Hacker News is now starting to diverge from my reality, so I thought I’d share some of the more “surprising” lessons after churning through just north of 500 million tokens, by my […]

cover image
5 Ways To Use LLMs On Your Laptop
13 Apr 2024
kdnuggets.com

Run large language models on your local PC for customized AI capabilities with more control, privacy, and personalization.

cover image

Gemini 1.5 Pro launch, new version of GPT-4 Turbo, new Mistral model, and more.

cover image
Peter Gostev’s Post
10 Apr 2024
linkedin.com

We are seeing some clear categories emerge in the world of LLMs - 1) affordable (~$1 per million tokens); 2) mid-range ($8/m) and 3) top end ($25-50/m)… | 32 comments on LinkedIn

cover image

In the world of LLMs, there is a phenomenon known as "hallucinations." These hallucinations are...

cover image

The top open source Large Language Models available for commercial use are as follows. Llama 2 Meta released Llama 2, a set of pretrained and refined LLMs, along with Llama 2-Chat, a fine-tuned version of Llama 2. These models scale up to 70 billion parameters. Extensive testing on safety- and helpfulness-focused benchmarks found that Llama 2-Chat models perform better than current open-source models in most cases. Human evaluations have shown that they align well with several closed-source models.  The researchers have even taken a few steps to guarantee the security of these models. This includes annotating

cover image
LLaMA Now Goes Faster on CPUs
2 Apr 2024
justine.lol

I wrote 84 new matmul kernels to improve llamafile CPU performance.

cover image

Researchers find large language models use a simple mechanism to retrieve stored knowledge when they respond to a user prompt. These mechanisms can be leveraged to see what the model knows about different subjects and possibly to correct false information it has stored.

cover image
ChatGPT vs Perplexity AI: AI App Comparison
1 Apr 2024
marktechpost.com

What is ChatGPT? ChatGPT, developed by OpenAI, is an AI platform renowned for its conversational AI capabilities. Leveraging the power of the Generative Pre-trained Transformer models, ChatGPT generates human-like text responses across various topics, from casual conversations to complex, technical discussions. Its ability to engage users with coherent, contextually relevant dialogues stands out, making it highly versatile for various applications, including content creation, education, customer service, and more. Its integration with tools like DALL-E for image generation from textual descriptions and its continual updates for enhanced performance showcase its commitment to providing an engaging and innovative user experience. ChatGPT Key

cover image
Mamba Explained
30 Mar 2024
thegradient.pub

Is Attention all you need? Mamba, a novel AI model based on State Space Models (SSMs), emerges as a formidable alternative to the widely used Transformer models, addressing their inefficiency in processing long sequences.
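The state space models behind Mamba replace attention with a linear recurrence over the sequence, which is how they sidestep the quadratic cost on long inputs. A minimal (non-selective, single-input-channel) sketch of that recurrence, with made-up matrices:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """y_t = C h_t, where h_t = A h_{t-1} + B x_t (a discretized linear SSM)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # O(L) in sequence length, vs attention's O(L^2)
        h = A @ h + B * x_t       # state update
        ys.append(C @ h)          # readout
    return np.array(ys)

x = np.array([1.0, 0.0, 0.0])     # an impulse input
A = np.diag([0.5, 0.9])           # stable (decaying) state dynamics
B = np.array([1.0, 1.0])
C = np.array([1.0, 1.0])
print(ssm_scan(x, A, B, C))       # impulse response: [2.0, 1.4, 1.06]
```

Mamba's contribution is making B, C, and the discretization step functions of the input (the "selective" part), and computing this scan with a hardware-efficient parallel algorithm.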

cover image

We like datacenter compute engines here at The Next Platform, but as the name implies, what we really like are platforms – how compute, storage,

cover image

Large language models do better at solving problems when they show their work. Researchers are beginning to understand why.

cover image
Why and How to Achieve Longer Context Windows for LLMs
11 Mar 2024
towardsdatascience.com

Large language models (LLMs) have revolutionized the field of natural language processing (NLP) over the last few years, achieving…

cover image

Reference architecture patterns and mental models for working with Large Language Models (LLMs)

cover image

We’re releasing an open source system, based on FSDP and QLoRA, that can train a 70b model on two 24GB GPUs.

cover image

Training a specialized LLM over your own data is easier than you think…

cover image

The search giant is unifying its AI-assistant efforts under one name and trying to show it can match rivals.

cover image
Anthropic’s Post
5 Mar 2024
linkedin.com

Today, we're announcing the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three… | 429 comments on LinkedIn

cover image

The Amazon-backed AI startup said its "most intelligent model" outperformed OpenAI's powerful GPT-4

cover image
rasbt/LLMs-from-scratch
29 Feb 2024
github.com

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step - rasbt/LLMs-from-scratch

cover image

Understanding how well they comprehend and organize information is crucial in advanced language models. A common challenge arises in visualizing the intricate relationships between different document parts, especially when using complex setups like Retrieval-Augmented Generation (RAG). Existing tools cannot always provide a clear picture of how chunks of information relate to each other and to specific queries. Several attempts have been made to address this issue, but they often fail to provide an intuitive and interactive solution. Such tools struggle to break documents down into manageable pieces and to visualize their semantic landscape effectively. As a

cover image

Step into the future of video creation with Google Lumiere, the latest breakthrough from Google Research that promises to redefine

cover image

Keep up with the latest ML research

cover image
The killer app of Gemini Pro 1.5 is video
29 Feb 2024
simonwillison.net

Last week Google introduced Gemini Pro 1.5, an enormous upgrade to their Gemini series of AI models. Gemini Pro 1.5 has a 1,000,000 token context size. This is huge—previously that …

cover image
Understanding Direct Preference Optimization
29 Feb 2024
towardsdatascience.com

This blog post will look at the “Direct Preference Optimization: Your Language Model is Secretly a Reward Model” paper and its findings.
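The paper's central idea — that the preference objective can be optimized directly, with the reward implicit in the policy/reference log-ratio — fits in one formula. A numpy sketch with made-up log-probabilities (β and all values here are illustrative):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses."""
    # implicit reward = beta * log-ratio of policy to reference model
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))  # -log sigmoid(beta * margin)

# policy prefers the chosen answer more than the reference does → small loss
print(dpo_loss(logp_w=-10.0, logp_l=-20.0, ref_logp_w=-12.0, ref_logp_l=-15.0))
```

When the margin is zero the loss sits at log 2, and it shrinks as the policy increasingly favors the chosen response relative to the reference.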

cover image

When it comes to context windows, size matters

cover image

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single...
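The "1.58-bit" name comes from constraining every weight to {-1, 0, +1} (log2(3) ≈ 1.58 bits). A sketch of the absmean-style quantizer in the spirit of the paper (simplified; the real method applies this during training with straight-through gradients):

```python
import numpy as np

def quantize_ternary(W, eps=1e-8):
    """Round weights to {-1, 0, +1} after scaling by their mean absolute value."""
    scale = np.abs(W).mean() + eps
    Wq = np.clip(np.round(W / scale), -1, 1)
    return Wq, scale   # keep the scale to rescale matmul outputs

W = np.array([[0.9, -0.05, -1.2], [0.1, 0.7, -0.6]])
Wq, s = quantize_ternary(W)
print(Wq)   # every entry is -1, 0, or 1
```

With ternary weights, the matmul degenerates into additions and subtractions, which is where the claimed memory and energy savings come from.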

cover image

Are you looking for news every day about Sora early access, like us? Well, you are absolutely right, because OpenAI's

cover image

Mistral Large is our flagship model, with top-tier reasoning capacities. It is also available on Azure.

cover image
Claude
22 Feb 2024
claude.ai

Talk with Claude, an AI assistant from Anthropic

A deep dive into the internals of a small transformer model to learn how it turns self-attention calculations into accurate predictions for the next token.

cover image

We will dive deep into understanding how transformer models like BERT work (a non-mathematical explanation, of course!), and into a system design that uses a transformer to build sentiment analysis.

cover image

Faster than Nvidia? Dissecting the economics

cover image

In artificial intelligence, the capacity of Large Language Models (LLMs) to negotiate mirrors a leap toward achieving human-like interactions in digital negotiations. At the heart of this exploration is the NEGOTIATION ARENA, a pioneering framework devised by researchers from Stanford University and Bauplan. This innovative platform delves into the negotiation prowess of LLMs, offering a dynamic environment where AI can mimic, strategize, and engage in nuanced dialogues across a spectrum of scenarios, from splitting resources to intricate trade and price negotiations. The NEGOTIATION ARENA is a tool and a gateway to understanding how AI can be shaped to think, react,

cover image
Sora
17 Feb 2024
openai.com

Sora is an AI model that can create realistic and imaginative scenes from text instructions.

cover image

LoRA (Low-Rank Adaptation) is a popular technique to finetune LLMs more efficiently. This Studio explains how LoRA works by coding it from scratch, which is an excellent exercise for looking under …
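The core of LoRA really does fit in a few lines: freeze the pretrained weight W and learn only a low-rank update BA. A numpy sketch (shapes and initialization scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 8, 2                    # rank r << min(d, k) is where the savings come from

W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                 # trainable, zero init → BA starts at 0

def lora_forward(x, alpha=1.0):
    """y = x W^T + (alpha/r) * x (BA)^T; identical to the base model at init."""
    return x @ W.T + (alpha / r) * (x @ (B @ A).T)

x = rng.normal(size=(1, k))
assert np.allclose(lora_forward(x), x @ W.T)   # no change until B, A are trained
```

Because B starts at zero, finetuning begins exactly at the pretrained model, and only r*(d+k) parameters are updated instead of d*k.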

cover image

The AI community is once again filled with excitement, as Bard is now Gemini, with Gemini Advanced offering users an exceptional

Ask HN: What have you built with LLMs?
11 Feb 2024
news.ycombinator.com
cover image

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language...

cover image

Zephyr is a series of Large Language Models released by Hugging Face trained using distilled supervised fine-tuning (dSFT) on larger models with significantly improved task accuracy.

cover image

LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models (LLMs).

cover image

This article will teach you about self-attention mechanisms used in transformer architectures and large language models (LLMs) such as GPT-4 and Llama.
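The mechanism the article covers can be sketched in a handful of numpy lines: project the inputs to queries, keys, and values, compare every query against every key, and mix the values accordingly (single head, no masking, toy sizes):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for a single head."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (T, T) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # each row: weighted mix of values

rng = np.random.default_rng(0)
T, d = 4, 8
X = rng.normal(size=(T, d))
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (4, 8)
```

GPT-style models add a causal mask to the scores and run many such heads in parallel, but the core computation is exactly this.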

cover image
Dashboard - SciSummary
16 Jan 2024
scisummary.com

AI Driven tools for researchers and students. Use AI to summarize and understand scientific articles and research papers.

cover image

Autoregressive language models have excelled at predicting the subsequent subword in a sentence without the need for any predefined grammar or parsing concepts. This method has been expanded to include continuous data domains like audio and image production, where data is represented as discrete tokens, much like language model vocabularies. Due to their versatility, sequence models have attracted interest for use in increasingly complicated and dynamic contexts, such as behavior. Road users are compared to participants in a continuous conversation when driving since they exchange actions and replies. The question is whether similar sequence models may be used to forecast

cover image

As Midjourney rolls out new features, it continues to make some artists furious.

cover image
10 Noteworthy AI Research Papers of 2023
7 Jan 2024
magazine.sebastianraschka.com

This year has felt distinctly different. I've been working in, on, and with machine learning and AI for over a decade, yet I can't recall a time when these fields were as popular and rapidly evolving as they have been this year. To conclude an eventful 2023 in machine learning and AI research, I'm excited to share 10 noteworthy papers I've read this year. My personal focus has been more on large language models, so you'll find a heavier emphasis on large language model (LLM) papers than computer vision papers this year.

cover image

Large Language Models (LLMs) have unlocked a new era in natural language processing. So why not learn more about them? Go from learning what large language models are to building and deploying LLM apps in 7 easy steps with this guide.

cover image

The emergence of Large Language Models (LLMs) in natural language processing represents a groundbreaking development. These models, trained on vast amounts of data and leveraging immense computational resources, promise to transform human interactions with the digital world. As they evolve through scaling and rapid deployment, their potential use cases become increasingly intricate and complex. They extend their capabilities to tasks such as analyzing dense, knowledge-rich documents, enhancing chatbot experiences to make them more genuine and engaging, and assisting human users in iterative creative processes like coding and design. One crucial feature that empowers this evolution is the capacity to effectively

cover image

In a comparative study, Researchers from Nvidia investigated the impact of retrieval augmentation and context window size on the performance of large language models (LLMs) in downstream tasks. The findings reveal that retrieval augmentation consistently enhances LLM performance, irrespective of context window size. Their research sheds light on the effectiveness of retrieval mechanisms in optimizing LLMs for various applications. Researchers delve into the domain of long-context language models, investigating the efficacy of retrieval augmentation and context window size in enhancing LLM performance across various downstream tasks. It conducts a comparative analysis of different pretrained LLMs, demonstrating that retrieval mechanisms significantly

cover image

LoRA is one of the most widely used, parameter-efficient finetuning techniques for training custom LLMs. From saving memory with QLoRA to selecting the optimal LoRA settings, this article provides practical insights for those interested in applying it.

cover image

As a machine learning engineer who has witnessed the rise of Large Language Models (LLMs), I find it daunting to comprehend how the ecosystem surrounding LLMs is developing.

cover image

Unlock the power of GPT-4 summarization with Chain of Density (CoD), a technique that attempts to balance information density for high-quality summaries.

cover image

Our weekly selection of must-read Editors’ Picks and original features

cover image

In this guide, we will learn how to develop and productionize a retrieval augmented generation (RAG) based LLM application, with a focus on scale and evaluation.

cover image

The definitive guide for choosing the right method for your use case

cover image

Discuss the concept of large language models (LLMs) and how they are implemented with a set of data to develop an application. Joas compares a collection of no-code and low-code apps designed to help you get a feel for not only how the concept works but also to get a sense of what types of models are available to train AI on different skill sets.

cover image
Augmenting LLMs with RAG
20 Oct 2023
towardsdatascience.com

An End to End Example Of Seeing How Well An LLM Model Can Answer Amazon SageMaker Related Questions

cover image

Explore how the Skeleton-of-Thought prompt engineering technique enhances generative AI by reducing latency, offering structured output, and optimizing projects.

cover image

In the past few years we have seen the meteoric appearance of dozens of foundation models of the Transformer family, all of which have memorable and sometimes funny, but not self-explanatory,...

cover image
Hey, Computer, Make Me a Font
4 Oct 2023
serce.me

This is a story of my journey learning to build generative ML models from scratch and teaching a computer to create fonts in the process.

Eliciting product feedback elegantly is a competitive advantage for LLM software. Over the weekend, I queried Google’s Bard, & noticed the elegant feedback loop the product team has incorporated into their product. I asked Bard to compare the 3rd-row leg room of the leading 7-passenger SUVs. At the bottom of the post is a little G button, which double-checks the response using Google searches. I decided to click it. This is what I would be doing in any case; spot-checking some of the results.

cover image

Participants rated Bing Chat as less helpful and trustworthy than ChatGPT or Bard. These results can be attributed to Bing’s richer yet imperfect UI and to its poorer information aggregation.

cover image
Bard
3 Oct 2023
bard.google.com

Bard is now Gemini. Get help with writing, planning, learning, and more from Google AI.

cover image
The State of Large Language Models
3 Oct 2023
scientificamerican.com

We present the latest updates on ChatGPT, Bard and other competitors in the artificial intelligence arms race.

cover image

Tools to go from prototype to production

cover image
How to Build an LLM from Scratch
25 Sep 2023
towardsdatascience.com

Data Curation, Transformers, Training at Scale, and Model Evaluation

cover image

Learn how to use GPT / LLMs to create complex summaries such as for medical text

cover image

Track, rank and evaluate open LLMs and chatbots

cover image
Llama from scratch
25 Sep 2023
blog.briankitano.com

I want to provide some tips from my experience implementing a paper. I'm going to cover my tips so far from implementing a dramatically scaled-down versio...

cover image
Cracking Open the OpenAI (Python) API
25 Sep 2023
towardsdatascience.com

A complete beginner-friendly introduction with example code

cover image
Cracking Open the Hugging Face Transformers Library
25 Sep 2023
towardsdatascience.com

A quick-start guide to using open-source LLMs

Asking 60+ LLMs a set of 20 questions
25 Sep 2023
benchmarks.llmonitor.com

Human-readable benchmarks of 60+ open-source and proprietary LLMs.

cover image

In a significant technological leap, OpenAI has announced the launch of DALL·E 3, the latest iteration in their groundbreaking text-to-image generation technology. With an unprecedented capacity to understand nuanced and detailed descriptions, DALL·E 3 promises to revolutionize the creative landscape by allowing users to translate their textual ideas into astonishingly accurate images effortlessly. DALL·E 3 is currently in research preview, offering a tantalizing glimpse into its capabilities. However, the broader availability of this cutting-edge technology is set for early October, when it will be accessible to ChatGPT Plus and Enterprise customers through the API and Labs later in the fall.

cover image
Comparison: DALL-E 3 vs Midjourney
24 Sep 2023
dataconomy.com

DALL-E 3, the latest version of OpenAI's ground-breaking generative AI visual art platform, was just announced with groundbreaking features, including

cover image
What OpenAI Really Wants
17 Sep 2023
wired.com

The young company sent shock waves around the world when it released ChatGPT. But that was just the start. The ultimate goal: Change everything. Yes. Everything.

cover image

If you're a developer or simply someone passionate about technology, you've likely encountered AI...

cover image

Seamlessly integrate LLMs into scikit-learn.

cover image

7 prompting tricks, Langchain, and Python example code

cover image
A Beginner’s Guide to LLM Fine-Tuning
30 Aug 2023
towardsdatascience.com

How to fine-tune Llama and other LLMs with one tool

cover image

A multifaceted challenge has arisen in the expansive realm of natural language processing: the ability to adeptly comprehend and respond to intricate and lengthy instructions. As communication nuances become more complicated, the shortcomings of prevailing models in dealing with extensive contextual intricacies have been laid bare. Within these pages, an extraordinary solution crafted by the dedicated minds at Together AI comes to light—a solution that holds the promise of reshaping the very fabric of language processing. This innovation has profound implications, especially in tasks requiring an acute grasp of extended contextual nuances. Contemporary natural language processing techniques rely heavily on

cover image
A Practical Introduction to LLMs
25 Aug 2023
towardsdatascience.com

3 levels of using LLMs in practice

cover image

Word embedding vector databases have become increasingly popular due to the proliferation of massive language models. Using the power of sophisticated machine learning techniques, data is stored in a vector database. It allows for very fast similarity search, essential for many AI uses such as recommendation systems, picture recognition, and NLP. The essence of complicated data is captured in a vector database by representing each data point as a multidimensional vector. Quickly retrieving related vectors is made possible by modern indexing techniques like k-d trees and hashing. To transform big data analytics, this architecture generates highly scalable, efficient solutions for
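The "very fast similarity search" the entry describes boils down to nearest-neighbor lookup under a similarity metric, typically cosine. A brute-force numpy sketch of the core operation (real vector databases replace the linear scan with indexes such as k-d trees or HNSW graphs):

```python
import numpy as np

def top_k_cosine(query, vectors, k=2):
    """Return indices of the k stored vectors most similar to the query."""
    q = query / np.linalg.norm(query)
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = V @ q                      # cosine similarity against every stored vector
    return np.argsort(-sims)[:k]      # highest similarity first

db = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
print(top_k_cosine(np.array([1.0, 0.05]), db))  # → [0 2]
```

Everything else a vector database adds — indexing, sharding, metadata filters — exists to make this lookup fast at millions of vectors.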

cover image

Use these text extraction techniques to get quality data for your LLM models

cover image

A user-friendly platform for operating large language models (LLMs) in production, with features such as fine-tuning, serving, deployment, and monitoring of any LLMs.

cover image

Recent language models can take long contexts as input, but little is known about how well they use longer contexts. Can LLMs be extended to longer contexts? This is an unanswered question. Researchers at Abacus AI conducted multiple experiments involving different schemes for developing the context-length ability of Llama, which is pre-trained on context length 2048. They linearly rescaled these models with IFT at scales 4 and 16. Scaled to 16, the model can perform real-world tasks up to 16k context length, or even up to 20-24k context length.  Different methods of extending context length are Linear
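The linear rescaling mentioned here (often called position interpolation) compresses positions into the pretrained range before the rotary embedding is computed. A sketch, under the assumption that the model uses RoPE (dim and base values illustrative):

```python
import numpy as np

def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    """RoPE rotation angles for one position; scale > 1 linearly interpolates."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return (position / scale) * inv_freq   # position 4096 at scale 4 ≡ position 1024

# an interpolated long position reuses angles the model already saw in pretraining
assert np.allclose(rope_angles(4096, scale=4.0), rope_angles(1024, scale=1.0))
```

Dividing positions by the scale keeps every rotation angle inside the range seen during pretraining, which is why a short round of finetuning (IFT here) suffices to make the longer window usable.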

cover image
How to use LLMs for PDF parsing
6 Aug 2023
nanonets.com

Using ChatGPT & OpenAI's GPT API, this code tutorial teaches how to chat with PDFs, automate PDF tasks, and build PDF chatbots.

cover image

Complete guide to building an AI assistant that can answer questions about any file

cover image

Practical Advice from Experts: Fine-Tuning, Deployment, and Best Practices

cover image

LangChain is a Python library that helps you build GPT-powered applications in minutes. Get started with LangChain by building a simple question-answering app.

cover image

Latest blogs from the team at Mosaic Research

cover image

Navigating the maze of pricing plans for digital services can sometimes be a daunting task. Today, we are unveiling Midjourney...

cover image

Exploring the Development of the 3 Leading Open LLMs and Their Chatbot Derivatives

cover image
Chain of Thought Prompting for LLMs
28 Jul 2023
towardsdatascience.com

A practical and simple approach for “reasoning” with LLMs

cover image

Anthropic released Claude 2, a new iteration of its AI model, to take on ChatGPT and Google Bard...

cover image

A reference architecture for the LLM app stack. It shows the most common systems, tools, and design patterns used by AI startups and tech companies.

cover image
ELI5: FlashAttention
24 Jul 2023
gordicaleksa.medium.com

Step by step explanation of how one of the most important MLSys breakthroughs work — in gory detail.

cover image

Organizations are in a race to adopt Large Language Models. Let’s dive into how you can build industry-specific LLMs through RAG

cover image
Free Full Stack LLM Bootcamp
24 Jul 2023
kdnuggets.com

Want to learn more about LLMs and build cool LLM-powered applications? This free Full Stack LLM Bootcamp is all you need!

cover image

The model quickly topped the Open LLM Leaderboard, which ranks the performance of open-source LLMs.

cover image

tl;dr: techniques to speed up training and inference of LLMs to use a large context window of up to 100K input tokens during training and…

cover image
All You Need to Know to Build Your First LLM App
23 Jul 2023
towardsdatascience.com

A step-by-step tutorial to document loaders, embeddings, vector stores and prompt templates

cover image

The Observe.AI contact center LLM showed a 35% increase in accuracy compared to GPT-3.5 when automatically summarizing conversations.

cover image

With the release of PyTorch 2.0 and ROCm 5.4, we are excited to announce that LLM training works out of the box on AMD MI250 accelerators with zero code changes and at high performance!

cover image

This article provides a series of techniques that can lower memory consumption in PyTorch (when training vision transformers and LLMs) by approximately 20x without sacrificing modeling performance and prediction accuracy.

cover image
Deploying Falcon-7B Into Production
23 Jul 2023
towardsdatascience.com

Running Falcon-7B in the cloud as a microservice

cover image

Anthropic, the AI startup founded by ex-OpenAI execs, has released its newest chatbot, Claude 2. It's ostensibly improved in several ways.

cover image

Google is launching its AI-backed note-taking tool to "a small group of users in the US," the company said in a blog post. Formerly referred to as Project Tailwind at Google I/O earlier this year, the new app is now known as NotebookLM (the LM stands for Language Model). The Verge reports: The core...

Ecosystem Graphs for Foundation Models
23 Jul 2023
crfm.stanford.edu
cover image
Meet LMQL: An Open Source Query Language for LLMs
23 Jul 2023
thesequence.substack.com

Developed by ETH Zürich, the language explores new paradigms for LLM programming.

cover image
Leandro von Werra’s Post
23 Jul 2023
linkedin.com

It's crazy how far the ML field has come when it comes to fine-tuning LLMs. A year ago it was challenging to fine-tune GPT-2 (1.5B) on a single GPU without…

cover image

A comprehensive guide on how to use Meta's LLaMA 2, the new open-source AI model challenging OpenAI's ChatGPT and Google's Bard.

cover image
Beyond LLaMA: The Power of Open LLMs
22 Jul 2023
towardsdatascience.com

How LLaMA is making open-source cool again

cover image

Not only has LLaMA been trained on more data with more parameters; the model also performs better than its predecessor, according to Meta.

cover image

MosaicML claims that the MPT-7B-8K LLM exhibits exceptional proficiency in summarization and answering tasks compared to previous models.

cover image

The founders of Anthropic quit OpenAI to make a safe AI company. It’s easier said than done.

cover image

This article delves into the concept of Chain-of-Thought (CoT) prompting, a technique that enhances the reasoning capabilities of large language models (LLMs). It discusses the principles behind CoT prompting, its application, and its impact on the performance of LLMs.
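In practice, CoT prompting just means showing the model worked-out reasoning before asking the new question. A minimal sketch, with an invented example question (nothing here is from the article itself):

```python
def cot_prompt(question):
    """Build a chain-of-thought prompt: one worked example, then the
    new question with a step-by-step trigger phrase."""
    example = (
        "Q: A cafe sells 3 muffins for $5. How much do 12 muffins cost?\n"
        "A: 12 muffins is 4 groups of 3, and 4 x $5 = $20. The answer is 20.\n"
    )
    return example + f"Q: {question}\nA: Let's think step by step."

prompt = cot_prompt("A train travels 60 km in 40 minutes. How far in 2 hours?")
print(prompt)
```

Sent to an LLM, the worked example and trigger phrase elicit intermediate reasoning steps rather than a bare answer.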

cover image

A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers) - Mooler0410/LLMsPracticalGuide

cover image

Falcon LLM is the new large language model that has taken the crown from LLaMA.

cover image

Get started using Falcon-7B, Falcon-40B, and their instruct versions

cover image
Meet FinGPT: An Open-Source Financial Large Language Model (LLMs)
18 Jun 2023
www-marktechpost-com.cdn.ampproject.org

Large language models have proliferated with the ongoing development of artificial intelligence, profoundly reshaping natural language processing across many fields. Their potential use in the financial sector has attracted intense attention in light of this upheaval. However, constructing an effective and efficient open-source financial language model depends on gathering high-quality, relevant, and current data. Applying language models in finance raises many barriers, from challenges in acquiring data, handling varied data forms and types, and coping with inconsistent data quality to the crucial...

cover image

Welcome to the LLM garden! A searchable list of open-source and off-the-shelf LLMs available to ML practitioners. Know of a new LLM? Add it

cover image
iryna-kondr/scikit-llm
8 Jun 2023
github.com

Seamlessly integrate LLMs into scikit-learn.

cover image

GPUs may dominate, but CPUs could be perfect for smaller AI models

cover image

Learn how standard greedy tokenization introduces a subtle and powerful bias that can have all kinds of unintended consequences.

cover image

AI companies are using LangChain to supercharge their LLM apps. Here is a comprehensive guide to resources for your LangChain + LLM journey.

cover image
The Non-Silence of the LLMs
19 May 2023
informationisbeautiful.net

AI is getting very chatty! Here’s a visualisation charting the rise of Large Language Models like GPT4, LaMDA, LLaMa, PaLM and their bots...

cover image

A new AI Bard powered by PaLM V2 that can write, translate, and code better than ChatGPT.

cover image
Edge 291: Reinforcement Learning with Human Feedback
18 May 2023
thesequence.substack.com

1) Reinforcement Learning with Human Feedback (RLHF); 2) the RLHF paper; 3) the transformer reinforcement learning framework.

cover image

Google's new machines combine Nvidia H100 GPUs with Google’s high-speed interconnections for AI tasks like training very large language models.

cover image

Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific...

cover image

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of...
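The paper's one-shot pruning method is more sophisticated than this, but a simple magnitude-pruning sketch shows what "50% sparsity" means: half the weights set exactly to zero, keeping the largest by absolute value (all names and values here are illustrative):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of the weights.

    Illustrates unstructured sparsity only; the paper's one-shot
    method uses a more careful, layer-wise reconstruction.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k]  # k-th smallest magnitude
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

w = np.array([[0.1, -2.0], [0.05, 1.5]])
pruned = magnitude_prune(w, sparsity=0.5)
print(pruned)  # the two small weights become 0.0
```

Sparse weights can then be stored and multiplied more cheaply, which is what makes pruning attractive at GPT scale.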

cover image

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset - openlm-research/open_llama

cover image

A guidance language for controlling large language models. - guidance-ai/guidance

Blog | Anyscale
29 Apr 2023
anyscale.com

Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly.

cover image

In the rapidly evolving field of AI, using large language models in an efficient and effective manner is becoming more and more important. In this article, y...

cover image

Created by researchers from UC Berkeley, CMU, Stanford, and UC San Diego, Vicuna is part of the new wave of models that use Meta's LLaMA as its foundation.

cover image

Many intelligent robots have come and gone, failing to become a commercial success. We’ve lost Aibo, Romo, Jibo, Baxter—even Alexa is reducing staff. Perhaps they failed to reach their potential because you can’t have a meaningful conversation with them. We are now at an inflection point: AI...

cover image
Data Machina #198
25 Apr 2023
datamachina.substack.com

Your own LLM. MiniGPT-4. WebGPT on WebGPU. Transformers from scratch. ChatGPT Plugins live demo. Whisper JAX. LLaVA. MetaAI DINO SoTA Computer Vision. Autonomous agents in LangChain. RedPajama.

cover image
Finetuning Large Language Models
25 Apr 2023
magazine.sebastianraschka.com

An introduction to the core ideas and approaches

cover image

On Sundays, The Sequence Scope brings a summary of the most important research papers, technology releases, and VC funding deals in the artificial intelligence space.

Stanford CRFM
21 Apr 2023
crfm.stanford.edu
cover image

Facebook’s parent company is inviting researchers to pore over and pick apart the flaws in its version of GPT-3

cover image

The widespread public deployment of large language models (LLMs) in recent months has prompted a wave of new attention and engagement from advocates, policymakers, and scholars from many fields....

cover image

Introducing the new fully autonomous task manager that can create, track and prioritize your company's projects using artificial intelligence.

cover image
Hacker News
19 Apr 2023
magazine.sebastianraschka.com

A Cross-Section of the Most Relevant Literature To Get Up to Speed

cover image

In this guest post, Filip Haltmayer, a Software Engineer at Zilliz, explains how LangChain and Milvus can enhance the usefulness of Large Language Models (LLMs) by allowing for the storage and retrieval of relevant documents. By integrating Milvus, a vector database, with LangChain, LLMs can process more tokens and improve their conversational abilities.

Prompt Engineering
14 Apr 2023
lilianweng.github.io

Prompt Engineering, also known as In-Context Prompting, refers to methods for how to communicate with LLM to steer its behavior for desired outcomes without updating the model weights. It is an empirical science and the effect of prompt engineering methods can vary a lot among models, thus requiring heavy experimentation and heuristics. This post only focuses on prompt engineering for autoregressive language models, so nothing with Cloze tests, image generation or multimodality models.

cover image
A Survey of Large Language Models
14 Apr 2023
arxiv.org

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and...

cover image

Explore what LLMs are, how they work, and gain insights into real-world examples, use cases, and best practices.

cover image
The Magic of LLMs — Prompt Engineering
13 Apr 2023
towardsdatascience.com

Garbage in, garbage out has never been more true.

cover image

If you're looking for a way to improve the performance of your large language model (LLM) application while reducing costs, consider utilizing a semantic cache to store LLM responses.

cover image

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

cover image
OpenAI Platform
10 Feb 2023
platform.openai.com

Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.

cover image

Large Language Models (LLMs) have gained significant prominence in modern machine learning, largely due to the attention mechanism. This mechanism employs a sequence-to-sequence mapping to construct context-aware token representations. Traditionally, attention relies on the softmax function (SoftmaxAttn) to generate token representations as data-dependent convex combinations of values. However, despite its widespread adoption and effectiveness, SoftmaxAttn faces several challenges. One key issue is the tendency of the softmax function to concentrate attention on a limited number of features, potentially overlooking other informative aspects of the input data. Also, the application of SoftmaxAttn necessitates a row-wise reduction along the input sequence length...
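A minimal NumPy sketch of standard softmax attention makes the "row-wise reduction" concrete: each query's scores are normalized across the whole input sequence, so the output rows are convex combinations of the value vectors (the shapes and random inputs here are illustrative):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard SoftmaxAttn: outputs are data-dependent convex
    combinations of the value vectors."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # The row-wise reduction: softmax normalizes each query's scores
    # across the full input sequence length.
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = softmax_attention(Q, K, V)
print(out.shape)  # → (4, 8)
```

That per-row normalization is exactly what couples every token to the full sequence length, which is why alternatives to SoftmaxAttn are of interest.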

cover image

The challenge of managing and recalling facts from complex, evolving conversations is a key problem for many AI-driven applications. As information grows and changes over time, maintaining accurate context becomes increasingly difficult. Current systems often struggle to handle the evolving nature of relationships and facts, leading to incomplete or irrelevant results when retrieving information. This can affect the effectiveness of AI agents, especially when dealing with user memories and context in real-time applications. Some existing solutions have attempted to address this problem. One common approach is using a Retrieval-Augmented Generation (RAG) pipeline, which involves storing extracted facts and using techniques...

cover image

Retrieval-Augmented Generation (RAG) is a machine learning framework that combines the advantages of both retrieval-based and generation-based models. The RAG framework is highly regarded for its ability to handle large amounts of information and produce coherent, contextually accurate responses. It leverages external data sources by retrieving relevant documents or facts and then generating an answer or output based on the retrieved information and the user query. This blend of retrieval and generation leads to better-informed outputs that are more accurate and comprehensive than models that rely solely on generation. The evolution of RAG has led to various types and approaches...
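The retrieve-then-generate loop described above can be sketched end to end. Here the "embedding" is a toy bag-of-characters vector and the generation step is left as the prompt a real system would send to an LLM; every name and document in this sketch is invented for illustration:

```python
import numpy as np

def embed(text):
    """Toy embedding: normalized letter counts (stand-in for a real model)."""
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            v[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(v)
    return v / norm if norm else v

docs = ["The Eiffel Tower is in Paris.",
        "Python is a programming language.",
        "RAG retrieves documents before generating."]

def rag_answer(query, k=1):
    # Retrieval step: rank documents by embedding similarity to the query.
    scores = [embed(d) @ embed(query) for d in docs]
    context = [docs[i] for i in np.argsort(scores)[::-1][:k]]
    # Generation step: a real system would now call an LLM with this prompt.
    return f"Context: {' '.join(context)}\nQuestion: {query}\nAnswer:"

print(rag_answer("Where is the Eiffel Tower?"))
```

Swapping the toy embedding for a real model and the final string for an LLM call turns this skeleton into a working RAG pipeline.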

LlamaIndex
24 Sep 2009
docs.llamaindex.ai
cover image

Combining LLM reasoning for text-based models in Scikit-Learn.

cover image

Large Language Models (LLMs) have gained significant prominence in recent years, driving the need for efficient GPU utilization in machine learning tasks. However, researchers face a critical challenge in accurately assessing GPU performance. The commonly used metric, GPU Utilization, accessed through nvidia-smi or integrated observability tools, has proven to be an unreliable indicator of actual computational efficiency. Surprisingly, 100% GPU utilization can be achieved merely by reading and writing to memory without performing any computations. This revelation has sparked a reevaluation of performance metrics and methodologies in the field of machine learning, prompting researchers to seek more accurate ways to...

cover image

Large language models (LLMs) have advanced significantly in recent years, but their real-world applications are restricted by substantial processing power and memory requirements. The need to make LLMs more accessible on smaller, resource-limited devices drives the development of more efficient frameworks for model inference and deployment. Existing methods for running LLMs include hardware acceleration techniques and optimizations like quantization and pruning, but these often fail to balance model size, performance, and usability in constrained environments. To address the challenge of efficiently deploying LLMs, researchers developed LightLLM, an efficient, scalable, and lightweight framework for LLM inference...

cover image

Nvidia has released NVLM 1.0, a powerful open-source AI model that rivals GPT-4 and Google’s systems, marking a major breakthrough in multimodal language models for vision and text tasks.

cover image

Large Language Models (LLMs) have become a cornerstone in artificial intelligence, powering everything from chatbots and virtual assistants to advanced text generation and translation systems. Despite their prowess, one of the most pressing challenges associated with these models is the high cost of inference. This cost includes computational resources, time, energy consumption, and hardware wear. Optimizing these costs is paramount for businesses and researchers aiming to scale their AI operations without breaking the bank. Here are ten proven strategies to reduce LLM inference costs while maintaining performance and accuracy. Quantization: a technique that decreases the precision of model...
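In its simplest form, quantization maps float weights to 8-bit integers plus a single scale factor, shrinking memory roughly 4x versus float32. A symmetric per-tensor sketch (toy weights, illustrative names; production schemes quantize per channel or per group):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization."""
    scale = np.abs(w).max() / 127.0   # map the largest weight to +/-127
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(q.dtype, np.max(np.abs(w - w_hat)))
```

The rounding error is bounded by half the scale, which is why well-chosen scales preserve accuracy while cutting inference memory and bandwidth.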