llms | The Mud Dauber Chronicles

Google's new compression algorithm cut memory stocks within hours of publication

25 Mar 2026

thenextweb.com

Google published a research blog post on Tuesday about a new compression algorithm for AI models. Within hours, memory stocks were falling. Micron dropped 3 per cent, Western Digital ...

Release: llm 0.29

23 Mar 2026

simonwillison.net

Access large language models from the command-line

Safely Deploying ML Models to Production: Four Controlled Strategies (A/B, Canary, Interleaved, Shadow Testing) - MarkTechPost

22 Mar 2026

marktechpost.com

Safely Deploying Machine Learning Models to Production: Four Controlled Strategies (A/B, Canary, Interleaved, Shadow Testing)

LLM Architecture Gallery

21 Mar 2026

sebastianraschka.com

A gallery that collects architecture figures from The Big LLM Architecture Comparison and related articles, with fact sheets and links back to the original sections.

Mistral's Small 4 consolidates reasoning, vision and coding into one model — at a fraction of the inference cost

20 Mar 2026

venturebeat.com

Mistral's Small 4 combines reasoning, multimodal analysis and agentic coding in a single open-source model with configurable inference effort, offering enterprises a lower-cost alternative to running separate models for each task.

Observability for your LLM-powered apps: OTel Instrumentation for RubyLLM

18 Mar 2026

thoughtbot.com

LLM calls are black boxes in production. Learn how to add structured observability to your RubyLLM-powered app with OpenTelemetry.

Nvidia says it can shrink LLM memory 20x without changing model weights

18 Mar 2026

venturebeat.com

Nvidia's KV Cache Transform Coding (KVTC) compresses LLM key-value cache by 20x without model changes, cutting GPU memory costs and time-to-first-token by up to 8x for multi-turn AI applications.

Unsloth Docs | Unsloth Documentation

17 Mar 2026

unsloth.ai

Train your own model with Unsloth, an open-source framework for LLM fine-tuning and reinforcement learning.

New LLM Architecture Gallery

15 Mar 2026

sebastianraschka.com

I put together a new LLM Architecture Gallery that collects the architecture figures from my recent comparison articles in one place, together with compact fact sheets and links.

CanIRun.ai — Can your machine run AI models?

14 Mar 2026

canirun.ai

Detect your hardware and find out which AI models you can run locally. GPU, CPU, and RAM analysis in your browser.

Billion-Parameter Theories

10 Mar 2026

worldgov.org

We assumed good theories are small. But the minimum viable compression of a complex system might be billions of parameters large.

The Sword of Damocles in Software

8 Mar 2026

tomtunguz.com

"GitHub Copilot had 20 million users. First to market. Then Claude Code arrived and installs peaked within six months. If the sword can cut the leader, no one is safe."

5 Powerful Python Decorators to Optimize LLM Applications

6 Mar 2026

kdnuggets.com

Learn five Python decorators, drawn from diverse libraries, that take on particular significance in LLM-based applications.

New GPT-5.4 Model To Feature "extreme" Reasoning - Dataconomy

5 Mar 2026

dataconomy.com

OpenAI has indicated that a new version of its large language model, GPT-5.4, is in development following a post on

Harness Engineering: How to Supervise Code You Can’t Read

4 Mar 2026

open.substack.com

AI writes the code now. The skill that matters is controlling what it builds.

Meet SymTorch: A PyTorch Library that Translates Deep Learning Models into Human-Readable Equations - MarkTechPost

4 Mar 2026

marktechpost.com

Cambridge Researchers Introduce SymTorch: A PyTorch Library that Translates Deep Learning Models into Human-Readable Equations

Alibaba's small, open source Qwen3.5-9B beats OpenAI's gpt-oss-120B and can run on standard laptops

2 Mar 2026

venturebeat.com

Whether it is a 0.8B model running on a smartphone or a 9B model powering a coding terminal, the Qwen3.5 series is effectively democratizing the "agentic era."

microgpt

2 Mar 2026

karpathy.github.io

Musings of a Computer Scientist.

Switch to Claude without starting over | Claude

2 Mar 2026

claude.com

Transfer your preferences, projects, and context from other AI providers into Claude. Switch without losing what makes your AI useful.

Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers | Hacker News

1 Mar 2026

news.ycombinator.com

Alibaba's new open source Qwen3.5 Medium model offers near Sonnet 4.5 performance on local computers

1 Mar 2026

venturebeat.com

This leap is made possible by near-lossless accuracy under 4-bit weight and KV cache quantization, allowing developers to process massive datasets without server-grade infrastructure.

IBM's $40B stock wipeout is built on a misconception: Translating COBOL isn't the same as modernizing it

24 Feb 2026

venturebeat.com

Investors wiped $40 billion from IBM's market cap after Anthropic released COBOL translation tools. Analysts say the market got the news right and the conclusion wrong.

Getting Started

24 Feb 2026

rubyllm.com

Start building AI apps in Ruby in 5 minutes. Chat, generate images, create embeddings - all with one gem.

Ruby Is the Best Language for Building AI Apps

24 Feb 2026

paolino.me

A pragmatic, code-first argument for Ruby as the best language to ship AI products in 2026.

Taalas is replacing programmable GPUs with hardwired AI chips to achieve 17,000 tokens per second for ubiquitous inference - MarkTechPost

23 Feb 2026

marktechpost.com

Taalas is replacing programmable GPUs with hardwired AI chips to achieve 17,000 tokens per second for ubiquitous inference

We hid backdoors in ~40MB binaries and asked AI + Ghidra to find them - Quesma Blog

22 Feb 2026

quesma.com

BinaryAudit benchmarks AI agents using Ghidra to find backdoors in compiled binaries of real open-source servers, proxies, and network infrastructure.

https://garryslist.org/posts/half-the-ai-agent-market-is-one-category-the-rest-is-wide-open

22 Feb 2026

garryslist.org

🐣(Claude Code Beginners Guide) Multi-Agent Orchestration: Run Claude Code Like a 5-Person Team | Notion

21 Feb 2026

notion.so

Quick note before you jump in:

Alibaba unveils new Qwen3.5 model for 'agentic AI era'

16 Feb 2026

reuters.com

Alibaba on Monday unveiled a new artificial intelligence model, Qwen 3.5, designed to execute complex tasks independently, with big improvements in performance and cost that the Chinese tech giant claims beat major U.S. rival models on several benchmarks.

Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy

12 Feb 2026

venturebeat.com

Nvidia researchers developed dynamic memory sparsification (DMS), a technique that compresses the KV cache in large language models by up to 8x while maintaining reasoning accuracy — and it can be retrofitted onto existing models in hours.

OpenClaw: The Viral AI Agent that Broke the Internet - Peter Steinberger | Lex Fridman Podcast #491

12 Feb 2026

youtu.be

Peter Steinberger is the creator of OpenClaw, an open-source AI agent framework that's the fastest-growing project in GitHub history. Thank you for listening ...

ComposioHQ/awesome-claude-skills: A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

5 Feb 2026

github.com

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows - ComposioHQ/awesome-claude-skills

MIT’s new ‘recursive’ framework lets LLMs process 10 million tokens without context rot

20 Jan 2026

venturebeat.com

While standard models suffer from context rot as data grows, MIT’s new Recursive Language Model (RLM) framework treats prompts like code variables, unlocking infinite context without the retraining costs.

Multimodal reinforcement learning with agentic verifier for AI agents - Microsoft Research

20 Jan 2026

microsoft.com

Argos improves multimodal RL by evaluating whether an agent’s reasoning aligns with what it observes over time. The approach reduces visual hallucinations and produces more reliable, data-efficient agents for real-world applications:

Zhipu AI Releases GLM-4.7-Flash: A 30B-A3B MoE Model for Efficient Local Coding and Agents - MarkTechPost

20 Jan 2026

marktechpost.com

Zhipu AI Releases GLM-4.7-Flash: A 30B-A3B MoE (Mixture of Experts) Model for Efficient Local Coding and Agents

Claude Code costs up to $200 a month. Goose does the same thing for free.

20 Jan 2026

venturebeat.com

Goose, Block’s open-source AI coding agent, is emerging as a free alternative to Anthropic’s Claude Code, as developers weigh offline control, rate limits, and the rising cost of AI coding tools.

Add Reasoning Skills to Your LLM Apps | Aman Kharwal

20 Jan 2026

amanxai.com

In this article, I’ll walk you through a guided project to add reasoning skills to your LLM apps.

This dead simple prompt technique boosts accuracy on LLMs by up to 76%

13 Jan 2026

venturebeat.com

Anthropic launches Cowork, a Claude Desktop agent that works in your files — no coding required

13 Jan 2026

venturebeat.com

Anthropic’s Cowork brings Claude Code–style AI agents to the desktop, letting Claude access and manage local files and browse the web—boosting productivity while raising new security and trust risks.

First impressions of Claude Cowork, Anthropic’s general agent

13 Jan 2026

simonwillison.net

New from Anthropic today is Claude Cowork, a “research preview” that they describe as “Claude Code for the rest of your work”. It’s currently available only to Max subscribers ($100 …

Sampling at negative temperature

12 Jan 2026

cavendishlabs.org

GitHub Copilot

11 Jan 2026

github.com

AI that builds with you

Why your LLM bill is exploding — and how semantic caching can cut it by 73%

10 Jan 2026

venturebeat.com

2025 LLM Year in Review from Andrej Karpathy

4 Jan 2026

open.substack.com

Training GPT-2 on a budget from Vishwanath Sangale

The Big LLM Architecture Comparison

3 Jan 2026

magazine.sebastianraschka.com

From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design

Nano Banana Pro is the best AI image generator, with caveats

3 Jan 2026

minimaxir.com

The problem with Nano Banana Pro is that it’s too good.

Nano Banana can be prompt engineered for extremely nuanced AI image generation

3 Jan 2026

minimaxir.com

Nano Banana allows 32,768 input tokens and I’m going to try to use them all dammit.

Proximal Policy Optimization

3 Jan 2026

openai.com

We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably or better than state-of-the-art approaches while being much simpler to implement and tune. PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance.

LLM Research Papers: The 2025 List (July to December)

2 Jan 2026

magazine.sebastianraschka.com

In June, I shared a bonus article with my curated and bookmarked research paper lists to the paid subscribers who make this Substack possible.

2025: The year in LLMs

1 Jan 2026

simonwillison.net

This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about …

LLM Research Papers: The 2025 List (July to December)

31 Dec 2025

sebastianraschka.com

A curated list of LLM research papers from July–December 2025, organized by reasoning models, inference-time scaling, architectures, training efficiency, and...

The State of Reinforcement Learning for LLM Reasoning

31 Dec 2025

magazine.sebastianraschka.com

Understanding GRPO and New Insights from Reasoning Model Papers

The State Of LLMs 2025: Progress, Problems, and Predictions

31 Dec 2025

sebastianraschka.com

A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026.

Building an AI agent inside a 7-year old Rails application

26 Dec 2025

catalinionescu.dev

We run a multi-tenant Rails application with sensitive data and layered authorization. In this post, I walk through how I added the first AI agent tool using RubyLLM, Pundit policies, and our existing Algolia search, without introducing a parallel system or loosening constraints.

LangGraph Explained from Scratch | Aman Kharwal

23 Dec 2025

amanxai.com

In this article, I’ll walk you through a complete guide to LangGraph from the ground up.

2025 LLM Year in Review

19 Dec 2025

karpathy.bearblog.dev

2025 Year in Review of LLM paradigm changes

Mistral 3 Live!

12 Dec 2025

open.substack.com

Frontier AI by hand ✍️

The 70% factuality ceiling: why Google’s new ‘FACTS’ benchmark is a wake-up call

11 Dec 2025

venturebeat.com

The Concise Guide to Perplexity

26 Nov 2025

statology.org

Practical Guide on how to build an Agent from scratch with Gemini 3

23 Nov 2025

philschmid.de

A step-by-step practical guide on building AI agents using Gemini 3 Pro, covering tool integration, context management, and best practices for creating effective and reliable agents.

Comparing Memory Systems for LLM Agents: Vector, Graph, and Event Logs

10 Nov 2025

marktechpost.com

Learn how different memory systems affect multi-agent planning. Comparing Memory Systems for LLM Agents highlights key performance metrics.

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

4 Nov 2025

marktechpost.com

Compare the top 7 large language models and systems for coding in 2025. Discover which ones excel for software engineering tasks.

In a First, AI Models Analyze Language As Well As a Human Expert | Quanta Magazine

3 Nov 2025

quantamagazine.org

If language is what makes us human, what does it mean now that large language models have gained “metalinguistic” abilities?

How I Use Every Claude Code Feature

2 Nov 2025

blog.sshh.io

A brain dump of all the ways I've been using Claude Code.

The Big LLM Architecture Comparison

28 Oct 2025

open.substack.com

From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design

Marktechpost/AI-Tutorial-Codes-Included: Codes/Notebooks for AI Projects

27 Oct 2025

github.com

Codes/Notebooks for AI Projects. Contribute to Marktechpost/AI-Tutorial-Codes-Included development by creating an account on GitHub.

5 Common LLM Parameters Explained with Examples

26 Oct 2025

marktechpost.com

Learn 5 common LLM parameters, explained with examples, to optimize your model's performance and generate the desired results.

https://venturebeat.com/ai/researchers-find-adding-this-one-simple-sentence-to-prompts-makes-ai-models

17 Oct 2025

venturebeat.com

Machine Learning Mastery

14 Oct 2025

MachineLearningMastery.com

Making developers awesome at machine learning.

A Guide to Fine-Tuning LLMs using LoRA

14 Oct 2025

amanxai.com

In this article, I'll take you through a step-by-step guide to fine-tuning LLMs with LoRA.

7 LLM Generation Parameters—What They Do and How to Tune Them?

14 Oct 2025

marktechpost.com

Seven LLM generation parameters: max tokens, temperature, top-p, top-k, penalties, stop sequences, tuning guidance, defaults

Build a Reasoning Model (From Scratch) - Sebastian Raschka

5 Oct 2025

mng.bz

Understand LLM reasoning by creating your own reasoning model from scratch! LLM reasoning models have the power to tackle truly challenging problems that require finding the right path through multiple steps. In Build a Reasoning Model (From Scratch) you’ll learn how to build a working reasoning model from the ground up. You will start with an existing pre-trained LLM and then implement reasoning-focused improvements from scratch. Sebastian Raschka, the bestselling author of Build a Large Language Model (From Scratch), is your guide on this exciting journey. Sebastian mentors you every step of the way with clear explanations, practical code, and a keen focus on what really matters. In Build a Reasoning Model (From Scratch) you’ll learn how to: implement core reasoning improvements for LLMs; evaluate models using judgment-based and benchmark-based methods; improve reasoning without updating model weights; use reinforcement learning to integrate external tools like calculators; apply distillation techniques to learn from larger reasoning models; and understand the full reasoning model development pipeline. Reasoning models break problems into steps, producing more reliable answers in math, logic, and code. These improvements aren’t just a curiosity: they’re already integrated into top models like Grok 4 and GPT-5. Build a Reasoning Model (From Scratch) demystifies these complex models with a simple philosophy: the best way to learn how something works is to build it yourself! You’ll begin with a pre-trained LLM, adding and improving its reasoning capabilities in ways you can see, test, and understand.

Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)

5 Oct 2025

magazine.sebastianraschka.com

Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples

LoRA Without Regret

3 Oct 2025

thinkingmachines.ai

How LoRA matches full training performance more broadly than expected

OpenAI’s Sora Makes Disinformation Extremely Easy and Extremely Real

3 Oct 2025

nytimes.com

The new A.I. app generated videos of store robberies and home intrusions — even bomb explosions on city streets — that never happened.

Building the heap: racking 30 petabytes of hard drives for pretraining

2 Oct 2025

si.inc

How we spent under half a million dollars to build a 30 petabyte data storage cluster in downtown San Francisco

Top 10 Local LLMs (2025): Context Windows, VRAM Targets, and Licenses Compared

28 Sep 2025

marktechpost.com

Compare top local LLMs for 2025: context windows, VRAM tiers, licenses, quantization, runtimes, dense vs MoE tradeoffs clearly

5 Cutting-Edge Natural Language Processing Trends Shaping 2026

24 Sep 2025

kdnuggets.com

In this article, we discuss five cutting-edge NLP trends that will shape 2026.

GPT-5-Codex

24 Sep 2025

simonwillison.net

OpenAI half-released this model earlier this month, adding it to their Codex CLI tool but not their API. Today they've fixed that - the new model can now be accessed …

Four new releases from Qwen

22 Sep 2025

simonwillison.net

It's been an extremely busy day for team Qwen. Within the last 24 hours (all links to Twitter, which seems to be their preferred platform for these announcements): Qwen3-Next-80B-A3B-Instruct-FP8 and …

Understanding and Implementing Qwen3 From Scratch

6 Sep 2025

magazine.sebastianraschka.com

A Detailed Look at One of the Leading Open-Source LLMs

These psychological tricks can get LLMs to respond to “forbidden” prompts

3 Sep 2025

arstechnica.com

Study shows how patterns in LLM training data can lead to “parahuman” responses.

chiphuyen/aie-book: [WIP] Resources for AI engineers. Also contains supporting materials for the book AI Engineering (Chip Huyen, 2025)

31 Aug 2025

github.com

[WIP] Resources for AI engineers. Also contains supporting materials for the book AI Engineering (Chip Huyen, 2025) - chiphuyen/aie-book

Open Source LLM Tools

31 Aug 2025

huyenchip.com

Best viewed on desktops. On a phone screen, some columns are hidden. When a new repo is indexed, changes in stars in the last day/week default to 0. Full analysis: What I learned from...

Method Iteration: An LLM Prompting Technique

30 Aug 2025

lesswrong.com

TLDR: Method Iteration is a prompting technique that gives better responses to hard problems. …

DeepSeek V3.1 Rivals GPT-5 With 685B Parameter Model - Dataconomy

22 Aug 2025

dataconomy.com

In January 2025, DeepSeek, a Chinese AI startup, launched R1, an AI model that rivaled top-tier LLMs from OpenAI and Anthropic. Built at a fraction of the

What is AI Inference? A Technical Deep Dive and Top 9 AI Inference Providers (2025 Edition)

18 Aug 2025

marktechpost.com

What is Artificial Intelligence (AI) Inference? A Technical Deep Dive and Top 9 AI Inference Providers (2025 Edition)

ngafar/llama-scan: Transcribe PDFs with local LLMs

18 Aug 2025

github.com

Transcribe PDFs with local LLMs

What Is AI Red Teaming? Top 18 AI Red Teaming Tools (2025)

17 Aug 2025

marktechpost.com

Discover top AI red teaming tools for robust AI security. Learn how adversarial testing protects machine learning models

The Timmy Trap – Scott Jenson

15 Aug 2025

jenson.org

That ‘cheap’ open-source AI model is actually burning through your compute budget

15 Aug 2025

venturebeat.com

New research reveals open-source AI models use up to 10 times more computing resources than closed alternatives, potentially negating cost advantages for enterprise deployments.

AI Model & API Providers Analysis | Artificial Analysis

14 Aug 2025

artificialanalysis.ai

Comparison and analysis of AI models and API hosting providers. Independent benchmarks across key performance metrics including quality, price, output speed & latency.

google/langextract: A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

12 Aug 2025

github.com

A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization. - google/langextract

From 100,000 to Under 500 Labels: How Google AI Cuts LLM Training Data by Orders of Magnitude

10 Aug 2025

marktechpost.com

Google’s active learning method fine-tunes LLMs with 10,000x less data using high-fidelity expert-labeled examples

New ‘persona vectors’ from Anthropic let you decode and direct an LLM’s personality

7 Aug 2025

venturebeat.com

A new study from Anthropic introduces "persona vectors," a technique for developers to monitor, predict and control unwanted LLM behaviors.

A Technical Roadmap to Context Engineering in LLMs: Mechanisms, Benchmarks, and Open Challenges

3 Aug 2025

marktechpost.com

Context engineering for large language models—frameworks, architectures, and strategies to optimize AI reasoning and scalability

Hierarchical Reasoning Model

28 Jul 2025

arxiv.org

Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities. These results underscore HRM's potential as a transformative advancement toward universal computation and general-purpose reasoning systems.

LLM Embeddings Explained: A Visual and Intuitive Guide

28 Jul 2025

huggingface.co

How Language Models Turn Text into Meaning, From Traditional

The Complete LLM Tech Stack

25 Jul 2025

amanxai.com

In this article, I'll take you through the complete LLM tech stack you should know to develop & deploy real-world LLM applications.

LLM Research Papers: The 2025 List (January to June)

19 Jul 2025

sebastianraschka.com

The latest in LLM research with a hand-curated, topic-organized list of over 200 research papers from 2025.

I sent ChatGPT Agent out to shop for me

19 Jul 2025

theverge.com

We tested OpenAI’s ChatGPT Agent, currently only available via its $200-per-month Pro subscription.

How OpenAI’s red team made ChatGPT agent into an AI fortress

19 Jul 2025

venturebeat.com

Discover OpenAI's red team blueprint: How 110 coordinated attacks and 7 exploit fixes created ChatGPT Agent's revolutionary 95% security defense system.

ChatGPT agent System Card | OpenAI

19 Jul 2025

openai.com

ChatGPT agent System Card: OpenAI’s agentic model unites research, browser automation, and code tools with safeguards under the Preparedness Framework.

Emergent Price-Fixing by LLM Auction Agents

16 Jul 2025

lesswrong.com

An inquiry into emergent collusion in Large Language Models. Agent S2 to Agent S3: “Let's set all asks at 63 next cycle… No undercutting ensur…

Introduction | LLM Inference in Production

14 Jul 2025

bentoml.com

A practical handbook for engineers building, optimizing, scaling and operating LLM inference systems in production.

Shirin Khosravi Jam on Substack

13 Jul 2025

substack.com

I taught myself how to build RAG + AI Agents in production. Been running them live for over a year now. Here are 4 steps + the only resources you really need to do the same.
Ugly truth: most “AI Engineers” shouting on social media haven’t built a single real production AI Agent or RAG system. If you want to be different - actually build and ship these systems: here’s a laser-focused roadmap from my own journey.
🚀 Start with fundamentals
Because no matter how fast LLM/GenAI evolves, your ML & software foundations keep you relevant.
✅ Hands-On ML with TensorFlow & Keras: https://lnkd.in/dWrf5pbS
✅ ISLR: https://lnkd.in/djGPVVwJ
✅ Machine Learning for Beginners by Microsoft (free curriculum): https://lnkd.in/d8kZA3es
1️⃣ Master LLMs & GenAI Systems
→ Learn to build & deploy LLMs, understand system design tradeoffs, and handle real constraints.
📚 Must-reads:
✅ Designing ML Systems – Chip Huyen: https://lnkd.in/guN-UhXA
✅ The LLM Engineering Handbook – Iusztin & Labonne: https://lnkd.in/gyA4vFXz
✅ Build a LLM (From Scratch) – Raschka: https://lnkd.in/gXNa-SPb
✅ Hands-On LLMs GitHub: https://lnkd.in/eV4qrgNW
2️⃣ Go beyond the hype on AI Agents
→ Most demos = “if user says hello, return hello.” Actual agents? Handle memory, tools, workflows, costs.
✅ AI Agents for Beginners (GitHub): https://lnkd.in/eik2btmq
✅ GenAI Agents – build step by step: https://lnkd.in/dnhwk75V
✅ OpenAI’s guide to agents: https://lnkd.in/guRfXsFK
✅ Anthropic’s Building Effective Agents: https://lnkd.in/gRWKANS4
3️⃣ RAG is not just a vector DB
Real Retrieval-Augmented Generation requires:
→ Chunking, hybrid BM25 + vectors, reranking
→ Query routing & fallback
→ Evaluating retrieval quality, not just LLM output
✅ RAG Techniques repo: https://lnkd.in/dD4S8Cq2
✅ Advanced RAG: https://lnkd.in/g2ZHwZ3w
✅ Cost-efficient retrieval with Postgres/OpenSearch/Qdrant
✅ Monitoring with Langfuse / Comet
4️⃣ Get serious on Software & Infra
→ FastAPI, async Python, Pydantic
→ Docker, CI/CD, blue-green deploys
→ ETL orchestration (Airflow, Step Functions)
→ Logs + metrics (CloudWatch, Prometheus)
✅ Move to production: https://lnkd.in/dnnkrJbE
✅ Made with ML (full ML+infra): https://lnkd.in/e-XQwXqS
✅ AWS GenAI path: https://lnkd.in/dmhR3uPc
5️⃣ Where do I learn from?
→ Stanford CS336 / CS236 / CS229 (Google it)
→ MIT 6.S191, Karpathy’s Zero to Hero: https://lnkd.in/dT7vqqQ5
→ Google Kaggle GenAI sprint: https://lnkd.in/ga5X7tVJ
→ NVIDIA’s end-to-end LLM stack: https://lnkd.in/gCtDnhni
→ DeepLearning.AI’s short courses: https://lnkd.in/gAYmJqS6
💥 Keep it real: Don’t fall for “built in 5 min, dead in 10 min” demos. In prod, it’s about latency, cost, maintainability, guardrails.
♻️ Let's repost to help more people on this journey 💚

🔍 Perplexity 101: Ultimate Guide to Deep Search, Labs, Templates & 53 Pro Prompts

13 Jul 2025

sidsaladi.substack.com

Your complete playbook for transforming how you research with AI's most powerful search engine

Inside India’s scramble for AI independence

8 Jul 2025

technologyreview.com

Structural challenges and the nation’s many languages have made it tough to develop foundational AI models. But the government is keen not to be left behind.

Coders' Colaboratory mini-hackathon on `llm` by simonw

8 Jul 2025

gist.github.com

Coders' Colaboratory mini-hackathon on `llm` by simonw - llm-hackathon.md

Become a command-line superhero with Simon Willison’s llm tool

7 Jul 2025

simonwillison.net

Christopher Smith ran a mini hackathon in Albany, New York at the weekend around uses of my LLM - the first in-person event I'm aware of dedicated to that project! …

What is Ollama? Running Local LLMs Made Simple

4 Jul 2025

youtube.com

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam → https://ibm.biz/Bdnd3d Learn more ...

Building software on top of Large Language Models

4 Jul 2025

simonwillison.net

I presented a three hour workshop at PyCon US yesterday titled Building software on top of Large Language Models. The goal of the workshop was to give participants everything they …

Usage

4 Jul 2025

llm.datasette.io

7 Popular LLMs Explained in 7 Minutes - KDnuggets

2 Jul 2025

kdnuggets.com

Get a quick overview of GPT, BERT, LLaMA, and more!

why AI language models like chatGPT can’t understand flowers

30 Jun 2025

designboom.com

a study by ohio state university investigates whether large language models can represent human concepts without physically experiencing them.

Getting started with Gemini Command Line Interface (CLI)

28 Jun 2025

marktechpost.com

What I learned trying seven coding agents

28 Jun 2025

understandingai.org

There's still room for improvement, but don't underestimate this technology.

Gemini CLI

25 Jun 2025

simonwillison.net

First there was Claude Code in February, then OpenAI Codex (CLI) in April, and now Gemini CLI in June. All three of the largest AI labs now have their own …

I Tested LLM Agents on Simple Safety Rules. They Failed in Surprising and Informative Ways. — LessWrong

25 Jun 2025

lesswrong.com

TL;DR: I developed a simple, open-source benchmark to test if LLM agents follow high-level safety principles when they conflict with a given task acc…

Building Effective AI Agents

17 Jun 2025

anthropic.com

Discover how Anthropic approaches the development of reliable AI agents. Learn about our research on agent capabilities, safety considerations, and technical framework for building trustworthy AI.

32 MCP Servers You Need To Check Out Now - KDnuggets

28 May 2025

kdnuggets.com

Explore a list of top MCP servers that enable seamless integration of LLMs with tools like databases, APIs, communication platforms, and more, helping you automate workflows and enhance AI applications.

Large Language Models can run tools in your terminal with LLM 0.26

27 May 2025

simonwillison.net

LLM 0.26 is out with the biggest new feature since I started the project: support for tools. You can now use the LLM CLI tool—and Python library—to grant LLMs from …

Highlights from the Claude 4 system prompt

27 May 2025

simonwillison.net

Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude …

System Card: Claude Opus 4 & Claude Sonnet 4

25 May 2025

simonwillison.net

Direct link to a PDF on Anthropic's CDN because they don't appear to have a landing page anywhere for this document. Anthropic's system cards are always worth a look, and …

The Ultimate Guide to Learning Anything with NotebookLM - KDnuggets

22 May 2025

kdnuggets.com

Learn about turning your notes and sources into a personalized, AI-powered tutor with NotebookLM.

llm-anthropic 0.16

22 May 2025

simonwillison.net

New release of my LLM plugin for Anthropic adding the new Claude 4 Opus and Sonnet models. You can see pelicans on bicycles generated using the new plugin at the …

Meta Introduces KernelLLM: An 8B LLM that Translates PyTorch Modules into Efficient Triton GPU Kernels

20 May 2025

marktechpost.com

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | alphaXiv

19 May 2025

alphaxiv.org

View recent discussion. Abstract: The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale. This paper presents an in-depth analysis of the DeepSeek-V3/R1 model architecture and its AI infrastructure, highlighting key innovations such as Multi-head Latent Attention (MLA) for enhanced memory efficiency, Mixture of Experts (MoE) architectures for optimized computation-communication trade-offs, FP8 mixed-precision training to unlock the full potential of hardware capabilities, and a Multi-Plane Network Topology to minimize cluster-level network overhead. Building on the hardware bottlenecks encountered during DeepSeek-V3's development, we engage in a broader discussion with academic and industry peers on potential future hardware directions, including precise low-precision computation units, scale-up and scale-out convergence, and innovations in low-latency communication fabrics. These insights underscore the critical role of hardware and model co-design in meeting the escalating demands of AI workloads, offering a practical blueprint for innovation in next-generation AI systems.

Building AI Agents? A2A vs. MCP Explained Simply - KDnuggets

15 May 2025

kdnuggets.com

Confused by AI agent frameworks? This article makes sense of A2A and MCP.

mendableai/firecrawl: 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

11 May 2025

github.com

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. - mendableai/firecrawl

22365_3_Prompt Engineering_v7 (1).pdf

7 May 2025

drive.google.com

Prompt Engineering | Kaggle

7 May 2025

kaggle.com

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.

Alibaba’s Qwen3 topples DeepSeek’s R1 as world’s highest-ranked open-source AI model

6 May 2025

scmp.com

Qwen3 surpassed R1 in LiveBench tests that gauge open-source AI models’ capabilities including coding, maths and data analysis.

Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

5 May 2025

papers.ssrn.com

Understanding architectural differences between large language models (LLMs) remains challenging, particularly at academic-scale pretraining (e.g., 1.3B

Dummy’s Guide to Modern LLM Sampling

4 May 2025

simonwillison.net

This is an extremely useful, detailed set of explanations by [@AlpinDale](https://x.com/AlpinDale) covering the various different sampling strategies used by modern LLMs. LLMs return a set of next-token probabilities for every …

Creating an MCP Server Using Go

3 May 2025

eltonminetto.dev

In November 2024, Anthropic published a blog post announcing what may be its most significant contribution to the AI ecosystem so far: the Model Context Protocol.

ollama with docker compose

3 May 2025

geshan.com.np

Learn how to use Ollama and Open WebUI inside Docker with Docker compose to run any open LLM and create your own mini ChatGPT.

ollama APIs

3 May 2025

geshan.com.np

Learn how to use Ollama APIs such as generate and chat, as well as list model, pull model, and more, with cURL and jq, including useful examples.

What is Ollama and how to use it: a quick guide [part 1]

3 May 2025

geshan.com.np

Learn what Ollama is, its features and how to run it on your local machine with DeepSeek R1 and Smollm2 models

Ollama commands: How to use Ollama in the command line [Part 2]

3 May 2025

geshan.com.np

Learn about the important Ollama commands to run Ollama on your local machine with Smollm2 and Qwen 2.5 models

LLM Projects with Python

3 May 2025

thecleverprogrammer.com

In this article, I'll take you through a list of 10 hands-on LLM projects with Python you should try to master LLMs. LLM Projects with Python.

XiaomiMiMo/MiMo: MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

1 May 2025

github.com

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining - XiaomiMiMo/MiMo

Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots

1 May 2025

lmarena.ai

The Leaderboard Illusion

30 Apr 2025

arxiv.org

Measuring progress is fundamental to the advancement of any scientific field. As benchmarks play an increasingly central role, they also grow more susceptible to distortion. Chatbot Arena has emerged as the go-to leaderboard for ranking the most capable AI systems. Yet, in this work we identify systematic issues that have resulted in a distorted playing field. We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release and retract scores if desired. We establish that the ability of these providers to choose the best score leads to biased Arena scores due to selective disclosure of performance results. At an extreme, we identify 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release. We also establish that proprietary closed models are sampled at higher rates (number of battles) and have fewer models removed from the arena than open-weight and open-source alternatives. Both these policies lead to large data access asymmetries over time. Providers like Google and OpenAI have received an estimated 19.2% and 20.4% of all data on the arena, respectively. In contrast, a combined 83 open-weight models have only received an estimated 29.7% of the total data. We show that access to Chatbot Arena data yields substantial benefits; even limited additional data can result in relative performance gains of up to 112% on the arena distribution, based on our conservative estimates. Together, these dynamics result in overfitting to Arena-specific dynamics rather than general model quality. The Arena builds on the substantial efforts of both the organizers and an open community that maintains this valuable evaluation platform. We offer actionable recommendations to reform the Chatbot Arena's evaluation framework and promote fairer, more transparent benchmarking for the field.

DeepSeek R2 AI Model Rumors Begin to Swirl Online; Reported to Feature 97% Lower Costs Compared to GPT-4…

26 Apr 2025

wccftech.com

DeepSeek is set to drop another model pretty soon, as details about their next DeepSeek R2 model have surfaced on the internet

To Make Language Models Work Better, Researchers Sidestep Language | Quanta Magazine

20 Apr 2025

quantamagazine.org

We insist that large language models repeatedly translate their mathematical processes into words. There may be a better way.

The State of Reinforcement Learning for LLM Reasoning

20 Apr 2025

sebastianraschka.com

A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to these releases were relatively muted. Why? One reason could be that GPT-4.5 and Llama 4 remain conventional models, which means they were trained without explicit reinforcement learning for reasoning. However, OpenAI's recent release of the o3 reasoning model demonstrates there is still considerable room for improvement when investing compute strategically, specifically via reinforcement learning methods tailored for reasoning tasks. While reasoning alone isn't a silver bullet, it reliably improves model accuracy and problem-solving capabilities on challenging tasks (so far). And I expect reasoning-focused post-training to become standard practice in future LLM pipelines. So, in this article, let's explore the latest developments in reasoning via reinforcement learning.

OpenAI Releases a Practical Guide to Building LLM Agents for Real-World Applications

18 Apr 2025

marktechpost.com

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

17 Apr 2025

arxiv.org

How To Build An Agent | Amp

16 Apr 2025

ampcode.com

Building a fully functional, code-editing agent in less than 400 lines.

humanlayer/12-factor-agents

13 Apr 2025

github.com

What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? - humanlayer/12-factor-agents

The Rise of Slopsquatting: How AI Hallucinations Are Fueling...

13 Apr 2025

socket.dev

Slopsquatting is a new supply chain threat where AI-assisted code generators recommend hallucinated packages that attackers register and weaponize.

12-factor-agents: Principles to build LLM-powered software good enough to put in the hands of production customers

11 Apr 2025

lobste.rs

An LLM Query Understanding Service

10 Apr 2025

simonwillison.net

Doug Turnbull recently wrote about how [all search is structured now](https://softwaredoug.com/blog/2025/04/02/all-search-structured-now): Many times, even a small open source LLM will be able to turn a search query into reasonable …

The Man Out to Prove How Dumb AI Still Is

10 Apr 2025

theatlantic.com

François Chollet has constructed the ultimate test for the bots.

The “S” in MCP Stands for Security - Elena Cross - Medium

8 Apr 2025

elenacross7.medium.com

MCP, short for Model Context Protocol, is the hot new standard behind how Large Language Models (LLMs) like Claude, GPT, or Cursor integrate with tools and data. It’s been described as the “USB-C for…

Topic 31: How to Reduce Memory Use in Reasoning Models

8 Apr 2025

turingpost.com

We explore how combining LightThinker and Multi-Head Latent Attention cuts memory and boosts performance.

Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers?

7 Apr 2025

huggingface.co

A Blog post by Ksenia Se on Hugging Face

A look at the ARC-AGI exam designed by French computer scientist François Chollet to show the gulf between AI models' memorized answers and “fluid intelligence”

7 Apr 2025

techmeme.com

By Matteo Wong / The Atlantic.

The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation

6 Apr 2025

ai.meta.com

We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first built using a mixture-of-experts (MoE) architecture.

Model Context Protocol (MCP) an overview

6 Apr 2025

philschmid.de

Overview of the Model Context Protocol (MCP): how it works, what MCP servers and clients are, and how to use it.

Use MCP servers in VS Code (Preview)

6 Apr 2025

code.visualstudio.com

Learn how to configure and use Model Context Protocol (MCP) servers with GitHub Copilot in Visual Studio Code.

If Anthropic Succeeds, a Nation of Benevolent AI Geniuses Could Be Born

5 Apr 2025

wired.com

The brother goes on vision quests. The sister is a former English major. Together, they defected from OpenAI, started Anthropic, and built (they say) AI’s most upstanding citizen, Claude.

A Code Implementation to Building a Context-Aware AI Assistant in Google Colab Using LangChain, LangGraph, Gemini Pro, and Model Context Protocol (MCP) Principles with Tool Integration Support

5 Apr 2025

marktechpost.com

LLM Benchmarking: Fundamental Concepts | NVIDIA Technical Blog

2 Apr 2025

developer.nvidia.com

The past few years have witnessed the rise in popularity of generative AI and large language models (LLMs), as part of a broad AI revolution.

A Comprehensive Guide to LLM Routing: Tools and Frameworks

2 Apr 2025

marktechpost.com

Deploying LLMs presents challenges, particularly in optimizing efficiency, managing computational costs, and ensuring high-quality performance. LLM routing has emerged as a strategic solution to these challenges, enabling intelligent task allocation to the most suitable models or tools. Let’s delve into the intricacies of LLM routing, explore various tools and frameworks designed for its implementation, and […]

First Look at Reasoning From Scratch: Chapter 1

29 Mar 2025

sebastianraschka.com

As you know, I've been writing a lot lately about the latest research on reasoning in LLMs. Before my next research-focused blog post, I wanted to offer something special to my paid subscribers as a thank-you for your ongoing support. So, I've started writing a new book on how reasoning works in LLMs, and here I'm sharing Chapter 1 with you. This ~15-page chapter is an introduction to reasoning in the context of LLMs and provides an overview of methods like inference-time scaling and reinforcement learning. Thanks for your support! I hope you enjoy the chapter, and stay tuned for my next blog post on reasoning research!

How DeepSeek Rewrote the Transformer [MLA]

28 Mar 2025

youtube.com

Tracing the thoughts of a large language model

28 Mar 2025

simonwillison.net

In a follow-up to the research that brought us the [delightful Golden Gate Claude](https://simonwillison.net/2024/May/24/golden-gate-claude/) last year, Anthropic have published two new papers about LLM interpretability: - [Circuit Tracing: Revealing Computational …

Anthropic can now track the bizarre inner workings of a large language model

27 Mar 2025

technologyreview.com

What they found challenges some basic assumptions about how this technology really works.

10 Must-Know Python Libraries for LLMs in 2025

26 Mar 2025

machinelearningmastery.com

In this article, we explore 10 of the Python libraries every developer should know in 2025.

Function calling with Gemma

26 Mar 2025

simonwillison.net

Google's Gemma 3 model (the 27B variant is particularly capable, I've been trying it out [via Ollama](https://ollama.com/library/gemma3)) supports function calling exclusively through prompt engineering. The official documentation describes two recommended …

Putting Gemini 2.5 Pro through its paces

26 Mar 2025

simonwillison.net

There’s a new release from Google Gemini this morning: the first in the Gemini 2.5 series. Google call it “a thinking model, designed to tackle increasingly complex problems”. It’s already …

Introducing 4o Image Generation

25 Mar 2025

openai.com

At OpenAI, we have long believed image generation should be a primary capability of our language models. That’s why we’ve built our most advanced image generator yet into GPT‑4o. The result—image generation that is not only beautiful, but useful.

What is the hallucination index?

25 Mar 2025

dataconomy.com

The Hallucination Index is a benchmark that measures the frequency of inaccuracies in large language models, indicating their reliability and contextual understanding.

Quickstart | Mistral AI Large Language Models

23 Mar 2025

docs.mistral.ai

Improving Recommender Systems & Search in the Age of LLMs

22 Mar 2025

eugeneyan.com

Model architectures, data generation, training paradigms, and unified frameworks inspired by LLMs.

Anthropic just gave Claude a superpower: real-time web search. Here’s why it changes everything

20 Mar 2025

venturebeat.com

Anthropic launches real-time web search for Claude AI, challenging ChatGPT's dominance while securing $3.5 billion in funding at a $61.5 billion valuation.

Mistral Small 3.1 runs on a MacBook and beats giants - Dataconomy

18 Mar 2025

dataconomy.com

Paris-based artificial intelligence startup Mistral AI has announced the open-source release of its lightweight AI model, Mistral Small 3.1, which the company

Mistral Small 3.1

17 Mar 2025

simonwillison.net

Mistral Small 3 [came out in January](https://simonwillison.net/2025/Jan/30/mistral-small-3/) and was a notable, genuinely excellent local model that used an Apache 2.0 license. Mistral Small 3.1 offers a significant improvement: it's multi-modal …

https://www.r-bloggers.com/2025/03/the-ellmer-package-for-using-llms-with-r-is-a-game-changer-for-scientists-2/

16 Mar 2025

r-bloggers.com

The ellmer package for using LLMs with R is a game changer for scientists. Why is ellmer a game changer for scientists? In this tutorial we’ll look at how we can access LLM agents through API calls. We’ll use this skill for creating structured data fro...

What is catastrophic forgetting? - Dataconomy

13 Mar 2025

dataconomy.com

Catastrophic Forgetting is a phenomenon where neural networks lose previously learned information when trained on new data, similar to human memory loss.

Top 7 Open-Source LLMs in 2025 - KDnuggets

13 Mar 2025

kdnuggets.com

These models are free to use, can be fine-tuned, offer enhanced privacy and security since they can run directly on your machine, and match the performance of proprietary solutions like o3-mini and Gemini 2.0.

What are model cards? - Dataconomy

12 Mar 2025

dataconomy.com

Model cards are documentation tools in machine learning that provide essential information about models, promoting transparency, trust, and ethical considerations in AI systems.

How I use LLMs to help me write code

11 Mar 2025

open.substack.com

Plus CSS view transitions and a major update to llm-openrouter

On GPT-4.5

8 Mar 2025

thezvi.substack.com

It’s happening.

The State of LLM Reasoning Models

8 Mar 2025

open.substack.com

Part 1: Inference-Time Compute Scaling Methods

Mistral OCR

7 Mar 2025

simonwillison.net

New closed-source specialist OCR model by Mistral - you can feed it images or a PDF and it produces Markdown with optional embedded images. It's available [via their API](https://docs.mistral.ai/api/#tag/ocr), or …

Mistral OCR | Mistral AI

6 Mar 2025

mistral.ai

Introducing the world’s best document understanding API.

llm-ollama 0.9.0

4 Mar 2025

simonwillison.net

This release of the `llm-ollama` plugin adds support for [schemas](https://simonwillison.net/2025/Feb/28/llm-schemas/), thanks to a [PR by Adam Compton](https://github.com/taketwo/llm-ollama/pull/36). Ollama provides very robust support for this pattern thanks to their [structured outputs](https://ollama.com/blog/structured-outputs) …

Claude 3.7 Sonnet and Claude Code

26 Feb 2025

anthropic.com

Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model generally available on the market.

The Deep Research problem — Benedict Evans

26 Feb 2025

ben-evans.com

OpenAI’s Deep Research is built for me, and I can’t use it. It’s another amazing demo, until it breaks. But it breaks in really interesting ways.

5 Principles for Writing Effective Prompts (2025 Update)

24 Feb 2025

blog.tobiaszwingmann.com

Solid techniques to get really good results from any LLM

Greg Brockman shared this template for prompting

24 Feb 2025

linkedin.com

OpenAI's president Greg Brockman recently shared this cool template for prompting their reasoning models o1/o3. Turns out, this is great for ANY reasoning… | 32 comments on LinkedIn

LLM Leaderboard

21 Feb 2025

artificialanalysis.ai

Comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics including quality, price, performance and speed (output speed - tokens per second & latency - TTFT), context window & others.

Here Are My Go-To AI Tools

17 Feb 2025

open.substack.com

I share my preferences for LLMs, image models, AI video, AI music, AI-powered research, and more. These are the AI tools I regularly use or recommend to others.

A Step-by-Step Guide to Setting Up a Custom BPE Tokenizer with Tiktoken for Advanced NLP Applications in Python

17 Feb 2025

marktechpost.com

We Were Wrong About GPUs

15 Feb 2025

fly.io

Do my tears surprise you? Strong CEOs also cry.

Using pip to install a Large Language Model that’s under 100MB

7 Feb 2025

simonwillison.net

I just released llm-smollm2, a new plugin for LLM that bundles a quantized copy of the SmolLM2-135M-Instruct LLM inside of the Python package. This means you can now pip install …

Understanding Reasoning LLMs

5 Feb 2025

sebastianraschka.com

In this article, I will describe the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities. I hope this p...

5 AI Agent Frameworks Compared - KDnuggets

3 Feb 2025

kdnuggets.com

Check out this comparison of 5 AI frameworks to determine which you should choose.

(WIP) A Little Bit of Reinforcement Learning from Human Feedback

2 Feb 2025

rlhfbook.com

The Reinforcement Learning from Human Feedback Book

Creating an AI Agent-Based System with LangGraph: Adding Persistence and Streaming (Step by Step Guide)

2 Feb 2025

marktechpost.com

In our previous tutorial, we built an AI agent capable of answering queries by surfing the web. However, when building agents for longer-running tasks, two critical concepts come into play: persistence and streaming. Persistence allows you to save the state of an agent at any given point, enabling you to resume from that state in future interactions. This is crucial for long-running applications. On the other hand, streaming lets you emit real-time signals about what the agent is doing at any moment, providing transparency and control over its actions. In this tutorial, we’ll enhance our agent by adding these powerful

aidanmclaughlin/AidanBench: Aidan Bench attempts to measure in LLMs.

1 Feb 2025

github.com

Aidan Bench attempts to measure in LLMs. - aidanmclaughlin/AidanBench

OpenAI o3-mini, now available in LLM

31 Jan 2025

simonwillison.net

o3-mini is out today. As with other o-series models it’s a slightly difficult one to evaluate—we now need to decide if a prompt is best run using GPT-4o, o1, o3-mini …

Multi-Head Latent Attention and Other KV Cache Tricks

29 Jan 2025

pyspur.dev

How a Key-Value (KV) cache reduces Transformer inference time by trading memory for computation

Qwen AI Introduces Qwen2.5-Max: A large MoE LLM Pretrained on Massive Data and Post-Trained with Curated SFT and RLHF Recipes

29 Jan 2025

marktechpost.com

The field of artificial intelligence is evolving rapidly, with increasing efforts to develop more capable and efficient language models. However, scaling these models comes with challenges, particularly regarding computational resources and the complexity of training. The research community is still exploring best practices for scaling extremely large models, whether they use a dense or Mixture-of-Experts (MoE) architecture. Until recently, many details about this process were not widely shared, making it difficult to refine and improve large-scale AI systems. Qwen AI aims to address these challenges with Qwen2.5-Max, a large MoE model pretrained on over 20 trillion tokens and further refined

Alibaba releases AI model it says surpasses DeepSeek

29 Jan 2025

reuters.com

The unusual timing of the Qwen 2.5-Max's release points to the pressure DeepSeek's meteoric rise in the past three weeks has placed on overseas rivals and domestic competition.

On MLA

28 Jan 2025

planetbanatt.net

The Illustrated DeepSeek-R1

27 Jan 2025

newsletter.languagemodels.co

A recipe for reasoning LLMs

DeepSeek-R1 vs. OpenAI’s o1: A New Step in Open Source and Proprietary Models

26 Jan 2025

marktechpost.com

AI has entered an era of the rise of competitive and groundbreaking large language models and multimodal models. The development has two sides, one with open source and the other being proprietary models. DeepSeek-R1, an open-source AI model developed by DeepSeek-AI, a Chinese research company, exemplifies this trend. Its emergence has challenged the dominance of proprietary models such as OpenAI’s o1, sparking discussions on cost efficiency, open-source innovation, and global technological leadership in AI. Let’s delve into the development, capabilities, and implications of DeepSeek-R1 while comparing it with OpenAI’s o1 system, considering the contributions of both spaces. DeepSeek-R1 DeepSeek-R1 is

AI hallucinations can’t be stopped — but these techniques can limit their damage

25 Jan 2025

nature.com

Developers have tricks to stop artificial intelligence from making things up, but large language models are still struggling to tell the truth, the whole truth and nothing but the truth.

Noteworthy LLM Research Papers of 2024

23 Jan 2025

sebastianraschka.com

This article covers 12 influential AI research papers of 2024, ranging from mixture-of-experts models to new LLM scaling laws for precision.

LLM 0.20

23 Jan 2025

simonwillison.net

New release of my [LLM](https://llm.datasette.io/) CLI tool and Python library. A bunch of accumulated fixes and features since the start of December, most notably: - Support for OpenAI's [o1 model](https://platform.openai.com/docs/models#o1) …

How Chinese A.I. Start-Up DeepSeek Is Competing With OpenAI and Google

23 Jan 2025

nytimes.com

The company built a cheaper, competitive chatbot with fewer high-end computer chips than U.S. behemoths like Google and OpenAI, showing the limits of chip export control.

DeepSeek-R1 and exploring DeepSeek-R1-Distill-Llama-8B

20 Jan 2025

simonwillison.net

DeepSeek are the Chinese AI lab who dropped the best currently available open weights LLM on Christmas day, DeepSeek v3. That model was trained in part using their unreleased R1 …

Microsoft Presents a Comprehensive Framework for Securing Generative AI Systems Using Lessons from Red Teaming 100 Generative AI Products

18 Jan 2025

marktechpost.com

The rapid advancement and widespread adoption of generative AI systems across various domains have increased the critical importance of AI red teaming for evaluating technology safety and security. While AI red teaming aims to evaluate end-to-end systems by simulating real-world attacks, current methodologies face significant challenges in effectiveness and implementation. The complexity of modern AI systems, with their expanding capabilities across multiple modalities including vision and audio, has created an unprecedented array of potential vulnerabilities and attack vectors. Moreover, integrating agentic systems that grant AI models higher privileges and access to external tools has substantially increased the attack surface and

Lessons From Red Teaming 100 Generative AI Products

18 Jan 2025

simonwillison.net

New paper from Microsoft describing their top eight lessons learned red teaming (deliberately seeking security vulnerabilities in) 100 different generative AI models and products over the past few years. …

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch

18 Jan 2025

sebastianraschka.com

This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm, which is used in models like GPT-2 to GPT-4, Llama 3,...

This Rumor About GPT-5 Changes Everything

17 Jan 2025

open.substack.com

Let’s start the year on an exciting note

The 2025 AI Engineering Reading List

14 Jan 2025

latent.space

We picked 50 paper/models/blogs across 10 fields in AI Eng: LLMs, Benchmarks, Prompting, RAG, Agents, CodeGen, Vision, Voice, Diffusion, Finetuning. If you're starting from scratch, start here.

Agents

12 Jan 2025

huyenchip.com

Intelligent agents are considered by many to be the ultimate goal of AI. The classic book by Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, 1995), defines the field of AI research as “the study and design of rational agents.”

100 Must-Read Generative AI Papers from 2024

12 Jan 2025

open.substack.com

A comprehensive list of some of the most impactful generative papers from last year

7 Next-Generation Prompt Engineering Techniques - MachineLearningMastery.com

9 Jan 2025

machinelearningmastery.com

How to use NotebookLM for personalized knowledge synthesis

8 Jan 2025

open.substack.com

Two powerful workflows that unlock everything else. Intro: Golden Age of AI Tools and AI agent frameworks begins in 2025.

An Opinionated Evals Reading List — Apollo Research

7 Jan 2025

apolloresearch.ai

A long reading list of evals papers with recommendations and comments by the evals team.

Things we learned about LLMs in 2024

31 Dec 2024

simonwillison.net

A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past …

How to Build a Graph RAG App

30 Dec 2024

towardsdatascience.com

Using knowledge graphs and AI to retrieve, filter, and summarize medical journal articles

Gemini 2.0 Flash "Thinking Mode"

24 Dec 2024

open.substack.com

Plus building Python tools with a one-shot prompt using uv run and Claude Projects

LLM Research Papers: The 2024 List

22 Dec 2024

magazine.sebastianraschka.com

A curated list of interesting LLM-related research papers from 2024, shared for those looking for something to read over the holidays.

Why AI language models choke on too much text

22 Dec 2024

arstechnica.com

Compute costs scale with the square of the input size. That’s not great.

rasbt/LLMs-from-scratch: Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

21 Dec 2024

github.com

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step - rasbt/LLMs-from-scratch

Slim-Llama: An Energy-Efficient LLM ASIC Processor Supporting 3-Billion Parameters at Just 4.69mW

21 Dec 2024

marktechpost.com

Large Language Models (LLMs) have become a cornerstone of artificial intelligence, driving advancements in natural language processing and decision-making tasks. However, their extensive power demands, resulting from high computational overhead and frequent external memory access, significantly hinder their scalability and deployment, especially in energy-constrained environments such as edge devices. This escalates the cost of operation while also limiting accessibility to these LLMs, which therefore calls for energy-efficient approaches designed to handle billion-parameter models. Current approaches to reduce the computational and memory needs of LLMs are based either on general-purpose processors or on GPUs, with a combination of weight quantization and

OpenAI Unveils o3 System That Reasons Through Math, Science Problems

21 Dec 2024

nytimes.com

The artificial intelligence start-up said the new system, OpenAI o3, outperformed leading A.I. technologies on tests that rate skills in math, science, coding and logic.

Building effective agents \ Anthropic

19 Dec 2024

anthropic.com

A post for developers with advice and workflows for building effective AI agents

Meta AI Proposes Large Concept Models (LCMs): A Semantic Leap Beyond Token-based Language Modeling

16 Dec 2024

marktechpost.com

Large Language Models (LLMs) have achieved remarkable advancements in natural language processing (NLP), enabling applications in text generation, summarization, and question-answering. However, their reliance on token-level processing—predicting one word at a time—presents challenges. This approach contrasts with human communication, which often operates at higher levels of abstraction, such as sentences or ideas. Token-level modeling also struggles with tasks requiring long-context understanding and may produce outputs with inconsistencies. Moreover, extending these models to multilingual and multimodal applications is computationally expensive and data-intensive. To address these issues, researchers at Meta AI have proposed a new approach: Large Concept Models (LCMs). Large Concept

How LLMs Store and Use Knowledge? This AI Paper Introduces Knowledge Circuits: A Framework for Understanding and Improving Knowledge Storage in Transformer-Based LLMs

15 Dec 2024

marktechpost.com

Large language models (LLMs) can understand and generate human-like text by encoding vast knowledge repositories within their parameters. This capacity enables them to perform complex reasoning tasks, adapt to various applications, and interact effectively with humans. However, despite their remarkable achievements, researchers continue to investigate the mechanisms underlying the storage and utilization of knowledge in these systems, aiming to enhance their efficiency and reliability further. A key challenge in using large language models is their propensity to generate inaccurate, biased, or hallucinatory outputs. These problems arise from a limited understanding of how such models organize and access knowledge. Without clear

LangChain vs OpenAI API: When Simplicity Meets Scalability | Aditya Bhattacharya | Blogs Website

13 Dec 2024

blogs.adityabh.is-a.dev

This blog explores a detailed comparison between the OpenAI API and LangChain, highlighting key differences in performance and developer experience, and the low-level code that explains why these differences exist.

Transformers Key-Value (KV) Caching Explained

12 Dec 2024

towardsdatascience.com

Speed up your LLM inference

Scaling Laws – O1 Pro Architecture, Reasoning Training Infrastructure, Orion and Claude 3.5 Opus “Failures”

12 Dec 2024

semianalysis.com

There has been an increasing amount of fear, uncertainty and doubt (FUD) regarding AI Scaling laws. A cavalcade of part-time AI industry prognosticators have latched on to any bearish narrative the…

The AI Researchers Pushing Computers to Launch Nightmare Scenarios

11 Dec 2024

wsj.com

It’s largely up to companies to test whether their AI is capable of superhuman harm. At Anthropic, the Frontier Red Team assesses the risk of catastrophe.

What are Hallucinations in LLMs and 6 Effective Strategies to Prevent Them

9 Dec 2024

marktechpost.com

In large language models (LLMs), “hallucination” refers to instances where models generate semantically or syntactically plausible outputs but are factually incorrect or nonsensical. For example, a hallucination occurs when a model provides erroneous information, such as stating that Addison's disease causes “bright yellow skin” when, in fact, it causes fatigue and low blood pressure. This phenomenon is a significant concern in AI, as it can lead to the spread of false or misleading information. The issue of AI hallucinations has been explored in various research studies. A survey in “ACM Computing Surveys” describes hallucinations as “unreal perceptions that feel real.”

Countless.dev | AI Model Comparison

7 Dec 2024

countless.dev

Compare AI models easily! All providers in one place.

CPU-GPU I/O-Aware LLM Inference Reduces Latency in GPUs by Optimizing CPU-GPU Interactions

7 Dec 2024

marktechpost.com

LLMs are driving major advances in research and development today. A significant shift has been observed in research objectives and methodologies toward an LLM-centric approach. However, they are associated with high expenses, making LLMs for large-scale utilization inaccessible to many. It is, therefore, a significant challenge to reduce the latency of operations, especially in dynamic applications that demand responsiveness. KV cache is used for autoregressive decoding in LLMs. It stores key-value pairs in multi-headed attention during the pre-filling phase of inference. During the decoding stage, new KV pairs get appended to the memory. KV cache stores the intermediate key and
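The prefill-then-append behaviour described in this excerpt can be sketched as a toy single-head cache. This is a minimal illustration in plain Python, not the method from the linked article: the class `KVCache`, the helpers `prefill` and `decode_step`, and all vectors are hypothetical, and real implementations work on batched multi-head tensors.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class KVCache:
    """Toy cache (assumption): one key and one value vector per token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def attend(self, query):
        # Scaled dot-product attention over every cached position.
        scale = math.sqrt(len(query))
        weights = softmax([dot(query, k) / scale for k in self.keys])
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values))
                for i in range(dim)]

def prefill(cache, prompt_kv):
    # Pre-filling phase: store the K/V pairs of all prompt tokens once.
    for key, value in prompt_kv:
        cache.append(key, value)

def decode_step(cache, query, key, value):
    # Decoding phase: append the new token's K/V pair, then attend,
    # so earlier keys and values are never recomputed.
    cache.append(key, value)
    return cache.attend(query)

cache = KVCache()
prefill(cache, [([1.0, 0.0], [1.0, 2.0]), ([0.0, 1.0], [3.0, 4.0])])
out = decode_step(cache, [1.0, 0.0], [1.0, 1.0], [5.0, 6.0])
```

The cache grows by one entry per generated token, which is why its memory footprint scales with sequence length.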

How to Build a General-Purpose LLM Agent

5 Dec 2024

towardsdatascience.com

A Step-by-Step Guide

Treemap

5 Dec 2024

aiworld.eu

Navigate Tomorrow's Intelligence Today

AI Hallucinations: Why Large Language Models Make Things Up (And How to Fix It) - kapa.ai - Instant AI answers to technical questions

5 Dec 2024

kapa.ai

Kapa.ai turns your knowledge base into a reliable and production-ready LLM-powered AI assistant that answers technical questions instantly. Trusted by 100+ startups and enterprises incl. OpenAI, Docker, Mapbox, Mixpanel and NextJS.

llama.cpp guide - Running LLMs locally, on any hardware, from scratch

29 Nov 2024

steelph0enix.github.io

Psst, kid, want some cheap and small LLMs?

Four Cutting-Edge Methods for Evaluating AI Agents and Enhancing LLM Performance

28 Nov 2024

marktechpost.com

The advent of LLMs has propelled advancements in AI for decades. One such advanced application of LLMs is Agents, which replicate human reasoning remarkably. An agent is a system that can perform complicated tasks by following a reasoning process similar to humans: think (solution to the problem), collect (context from past information), analyze (the situations and data), and adapt (based on the style and feedback). Agents encourage the system through dynamic and intelligent activities, including planning, data analysis, data retrieval, and utilizing the model's past experiences. A typical agent has four components: Brain: An LLM with advanced processing capabilities, such as

eugeneyan/llm-paper-notes: Notes from the Latent Space paper club. Follow along or start your own!

26 Nov 2024

github.com

Notes from the Latent Space paper club. Follow along or start your own! - eugeneyan/llm-paper-notes

Understanding Multimodal LLMs

21 Nov 2024

magazine.sebastianraschka.com

An introduction to the main techniques and latest models

Something weird is happening with LLMs and chess

17 Nov 2024

open.substack.com

Are they good or bad?

Analyzing the homerun year for LLMs: the top-100 most cited AI papers in 2023, with all medals for open models.

11 Nov 2024

zeta-alpha.com

9 October 2024, Mathias Parisot, Jakub Zavrel. Even in the red hot global race for AI dominance, you publish and you perish, unless your peers pick up your work, build further on it, and you manage to drive real progress in the field. And of course, we are all very curious who is currently having that kind of impact. Are the billions of dollars spent on AI R&D paying off in the long run? So here is, in continuation of our popular publication impact analysis of last year, Zeta Alpha's ranking of t

LLM Chunking, Indexing, Scoring and Agents, in a Nutshell - DataScienceCentral.com

31 Oct 2024

datasciencecentral.com

LLM Chunking, Indexing, Scoring and Agents, in a Nutshell. The new PageRank of RAG/LLM. With details on building relevancy scores.

Developing a computer use model

28 Oct 2024

anthropic.com

A discussion of how Anthropic's researchers developed Claude's new computer use skill, along with some relevant safety considerations

5 LLM Tools I Can’t Live Without

19 Oct 2024

kdnuggets.com

In this article, I share the five essential LLM tools that I currently find indispensable, and which have the potential to help revolutionize the way you work.

Claude: Everything you need to know about Anthropic's AI | TechCrunch

19 Oct 2024

techcrunch.com

Anthropic, the AI vendor second in size only to OpenAI, has a powerful family of generative AI models called Claude. These models can perform a range of

Nvidia just dropped a new AI model that crushes OpenAI’s GPT-4—no big launch, just big results

17 Oct 2024

venturebeat.com

Nvidia quietly launched a groundbreaking AI model that surpasses OpenAI’s GPT-4 and Anthropic’s Claude 3.5, signaling a major shift in the competitive landscape of artificial intelligence.

dpo-from-scratch.ipynb

4 Aug 2024

github.com

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step - rasbt/LLMs-from-scratch

What We Learned from a Year of Building with LLMs (Part I)

4 Aug 2024

oreilly.com

Towards Monosemanticity: A step towards understanding large language models

1 Aug 2024

towardsdatascience.com

Understanding the mechanistic interpretability research problem and reverse-engineering these large language models

Meta unleashes its most powerful AI model, Llama 3.1, with 405B parameters

24 Jul 2024

venturebeat.com

Llama 3.1 is the latest version of Meta's large language models, with a new model weight, 405 billion parameters, the biggest model it's trained.

Customize Generative AI Models for Enterprise Applications with Llama 3.1

24 Jul 2024

developer.nvidia.com

The newly unveiled Llama 3.1 collection of 8B, 70B, and 405B large language models (LLMs) is narrowing the gap between proprietary and open-source models. Their open nature is attracting more…

Llama 3.1 Released: Meta’s New Open-Source AI Model that You can Fine-Tune,

24 Jul 2024

marktechpost.com

Meta announced the release of Llama 3.1, the most capable model in the Llama series. This latest iteration of the Llama series, particularly the 405B model, represents a substantial advancement in open-source AI capabilities, positioning Meta at the forefront of AI innovation. Meta has long advocated for open-source AI, a stance underscored by Mark Zuckerberg’s assertion that open-source benefits developers, Meta, and society. Llama 3.1 embodies this philosophy by offering state-of-the-art capabilities in an openly accessible model. The release aims to democratize AI, making cutting-edge technology available to various users and applications. The Llama 3.1 405B model stands out for

Meta Llama 3.1 405b is outperforming private models with open access

24 Jul 2024

dataconomy.com

Meta Llama 3.1 405B kicks off a fresh chapter for open-source language models. This breakthrough brings unmatched skills to AI

Understanding Positional Embeddings in Transformers: From Absolute to Rotar

20 Jul 2024

towardsdatascience.com

A deep dive into absolute, relative, and rotary positional embeddings with code examples

Claude 3.5 Sonnet

15 Jul 2024

anthropic.com

Introducing Claude 3.5 Sonnet—our most intelligent model yet. Sonnet now outperforms competitor models and Claude 3 Opus on key evaluations, at twice the speed.

Do large language models understand the world?

13 Jul 2024

amazon.science

In addition to its practical implications, recent work on “meaning representations” could shed light on some old philosophical questions.

Building an LLM Router for High-Quality and Cost-Effective Responses

4 Jul 2024

anyscale.com

Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly.

From bare metal to a 70B model: infrastructure set-up and scripts - imbue

3 Jul 2024

imbue.com

We would like to thank Voltage Park, Dell, H5, and NVIDIA for their invaluable partnership and help with setting up our cluster. A special…

StarCoder2-15B: A Powerful LLM for Code Generation, Summarization, and Docu

2 Jul 2024

nvda.ws

Experience the leading models to build enterprise generative AI apps now.

How Gradient created an open LLM with a million-token context window

27 Jun 2024

venturebeat.com

AI startup Gradient and cloud platform Crusoe teamed up to extend the context window of Meta's Llama 3 models to 1 million tokens.

Some Commonly Used Advanced Prompt Engineering Techniques Explained Using S

22 Jun 2024

marktechpost.com

In the developing field of Artificial Intelligence (AI), the ability to think quickly has become increasingly significant. The necessity of communicating with AI models efficiently becomes critical as these models get more complex. In this article we will explain a number of sophisticated prompt engineering strategies, simplifying these difficult ideas through straightforward human metaphors. The techniques and their examples have been discussed to see how they resemble human approaches to problem-solving. Chaining Methods Analogy: Solving a problem step-by-step. Chaining techniques are similar to solving an issue one step at a time. Chaining techniques include directing the AI via a systematic

Key Metrics for Evaluating Large Language Models (LLMs)

20 Jun 2024

marktechpost.com

Evaluating Large Language Models (LLMs) is a challenging problem in language modeling, as real-world problems are complex and variable. Conventional benchmarks frequently fail to fully represent LLMs' all-encompassing performance. A recent LinkedIn post has emphasized a number of important measures that are essential to comprehend how well new models function, which are as follows. MixEval Achieving a balance between thorough user inquiries and effective grading systems is necessary for evaluating LLMs. Conventional standards based on ground truth and LLM-as-judge benchmarks encounter difficulties such as biases in grading and possible contamination over time. MixEval solves these problems by combining real-world user

Firecrawl: A Powerful Web Scraping Tool for Turning Websites into Large Lan

20 Jun 2024

marktechpost.com

In the rapidly advancing field of Artificial Intelligence (AI), effective use of web data can lead to unique applications and insights. A recent tweet has brought attention to Firecrawl, a potent tool in this field created by the Mendable AI team. Firecrawl is a state-of-the-art web scraping program made to tackle the complex problems involved in getting data off the internet. Web scraping is useful, but it frequently requires overcoming various challenges like proxies, caching, rate limitations, and material generated with JavaScript. Firecrawl is a vital tool for data scientists because it addresses these issues head-on. Even without a sitemap,

Let's reproduce GPT-2 (124M)

19 Jun 2024

m.youtube.com

We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations. Keep in mind that in some places this video builds on the knowledge from earlier videos in the Zero to Hero Playlist (see my channel). You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.
Links:
- build-nanogpt GitHub repo, with all the changes in this video as individual commits: https://github.com/karpathy/build-nanogpt
- nanoGPT repo: https://github.com/karpathy/nanoGPT
- llm.c repo: https://github.com/karpathy/llm.c
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- our Discord channel: https://discord.gg/3zy8kqD9Cp
Supplementary links:
- Attention is All You Need paper: https://arxiv.org/abs/1706.03762
- OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
- OpenAI GPT-2 paper: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- The GPU I'm training the model on is from Lambda GPU Cloud, I think the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: https://lambdalabs.com
Chapters:
00:00:00 intro: Let’s reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let’s train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
01:48:15 torch.compile, Python overhead, kernel fusion, 130ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperparameters, AdamW, gradient clipping
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo
Corrections: I will post all errata and followups to the build-nanogpt GitHub repo (link above)
SuperThanks: I experimentally enabled them on my channel yesterday. Totally optional and only use if rich. All revenue goes to supporting my work in AI + Education.

How to use an open source LLM model locally and remotely

19 Jun 2024

thoughtbot.com

Run an open source language model in your local machine and remotely.

“The” Midjourney model personalization guide

12 Jun 2024

dataconomy.com

Midjourney model personalization is now live, offering you a more tailored image generation experience by teaching the AI your preferences.

How to use Perplexity in your PM work

12 Jun 2024

lennysnewsletter.com

27 examples (with actual prompts) of how product managers are using Perplexity today

[2406.01506] The Geometry of Categorical and Hierarchical Concepts in Large

11 Jun 2024

arxiv.org

The linear representation hypothesis is the informal idea that semantic concepts are encoded as linear directions in the representation spaces of large language models (LLMs). Previous work has...

What We Learned from a Year of Building with LLMs (Part II)

11 Jun 2024

oreilly.com

Sharpening LLMs: The Sharpest Tools and Essential Techniques for Precision

11 Jun 2024

marktechpost.com

The ability to discern relevant and essential information from noise is paramount in AI, particularly within large language models (LLMs). With the surge of information and the complexity of tasks, there's a need for efficient mechanisms to enhance the performance and reliability of these models. Let’s explore the essential tools & techniques for refining LLMs and delivering precise, actionable insights. The focus will be on Retrieval-Augmented Generation (RAG), agentic functions, Chain of Thought (CoT) prompting, few-shot learning, prompt engineering, and prompt optimization. Retrieval-Augmented Generation (RAG): Providing Relevant Context RAG combines the power of retrieval mechanisms with generative models, ensuring that

List of Activities and Their Corresponding Suitable LLMs in the Artificial

11 Jun 2024

marktechpost.com

Choosing large language models (LLMs) tailored for specific tasks is crucial for maximizing efficiency and accuracy. With natural language processing (NLP) advancements, different models have emerged, each excelling in unique domains. Here is a comprehensive guide to the most suitable LLMs for various activities in the AI world. Hard Document Understanding: Claude Opus Claude Opus excels at tasks requiring deep understanding and interpretation of complex documents. This model excels in parsing dense legal texts, scientific papers, and intricate technical manuals. Claude Opus is designed to handle extensive context windows, ensuring it captures nuanced details and complicated relationships within the text.

Three Things to Know About Prompting LLMs

11 Jun 2024

sloanreview.mit.edu

Apply these techniques when crafting prompts for large language models to elicit more relevant responses.

Perplexity goes beyond AI search, launches publishing platform ‘Pages’

31 May 2024

venturebeat.com

In most cases, Perplexity produced the desired Pages, but what we found missing was the option to edit the content manually.

The Great AI Chatbot Challenge: ChatGPT vs. Gemini vs. Copilot vs. Perplexi

28 May 2024

wsj.com

We tested OpenAI’s ChatGPT against Microsoft’s Copilot and Google’s Gemini, along with Perplexity and Anthropic’s Claude. Here’s how they ranked.

The future of foundation models is closed-source

26 May 2024

thediff.co

if the centralizing forces of data and compute hold, open and closed-source AI cannot both dominate long-term

Demystifying Vision-Language Models: An In-Depth Exploration

24 May 2024

marktechpost.com

Vision-language models (VLMs), capable of processing both images and text, have gained immense popularity due to their versatility in solving a wide range of tasks, from information retrieval in scanned documents to code generation from screenshots. However, the development of these powerful models has been hindered by a lack of understanding regarding the critical design choices that truly impact their performance. This knowledge gap makes it challenging for researchers to make meaningful progress in this field. To address this issue, a team of researchers from Hugging Face and Sorbonne Université conducted extensive experiments to unravel the factors that matter the

AI Is a Black Box. Anthropic Figured Out a Way to Look Inside

22 May 2024

wired.com

What goes on in artificial neural networks is largely a mystery, even to their creators. But researchers from Anthropic have caught a glimpse.

naklecha/llama3-from-scratch

21 May 2024

github.com

llama3 implementation one matrix multiplication at a time - naklecha/llama3-from-scratch

Abacus AI Releases Smaug-Llama-3-70B-Instruct: The New Benchmark in Open-So

21 May 2024

marktechpost.com

Artificial intelligence (AI) has revolutionized various fields by introducing advanced models for natural language processing (NLP). NLP enables computers to understand, interpret, and respond to human language in a valuable way. This field encompasses text generation, translation, and sentiment analysis applications, significantly impacting industries like healthcare, finance, and customer service. The evolution of NLP models has driven these advancements, continually pushing the boundaries of what AI can achieve in understanding and generating human language. Despite these advancements, developing models that can effectively handle complex multi-turn conversations remains a persistent challenge. Existing models often fail to maintain context and coherence over

Do Enormous LLM Context Windows Spell the End of RAG?

13 May 2024

thenewstack.io

Now that LLMs can retrieve 1 million tokens at once, how long will it be until we don’t need retrieval augmented generation for accurate AI responses?

How Good Are the Latest Open LLMs? And Is DPO Better Than PPO?

13 May 2024

sebastianraschka.com

What a month! We had four major open LLM releases: Mixtral, Meta AI's Llama 3, Microsoft's Phi-3, and Apple's OpenELM. In my new article, I review and discus...

ChuXin: A Fully Open-Sourced Language Model with a Size of 1.6 Billion Para

12 May 2024

marktechpost.com

The capacity of large language models (LLMs) to produce adequate text in various application domains has caused a revolution in natural language creation. These models are essentially two types: 1) Most model weights and data sources are open source. 2) All model-related information is publicly available, including training data, data sampling ratios, training logs, intermediate checkpoints, and assessment methods (Tiny-Llama, OLMo, and StableLM 1.6B). Full access to open language models for the research community is vital for thoroughly investigating these models' capabilities and limitations and understanding their inherent biases and potential risks. This is necessary despite the continued breakthroughs in

You Only Cache Once: Decoder-Decoder Architectures for Language Model

11 May 2024

arxiv.org

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a...

Anthropic AI Launches a Prompt Engineering Tool that Generates Production-R

11 May 2024

marktechpost.com

Generative AI (GenAI) tools have come a long way. Believe it or not, the first generative AI tools were introduced in the 1960s in a chatbot. Still, it was only in 2014 that generative adversarial networks (GANs) were introduced, a type of Machine Learning (ML) algorithm that allowed generative AI to finally create authentic images, videos, and audio of real people. In 2024, we can create anything imaginable using generative AI tools like ChatGPT, DALL-E, and others. However, there is a problem. We can use those AI tools but cannot get the most out of them or use them

Cleaning

11 May 2024

docs.unstructured.io

As part of data preparation for an NLP model, it’s common to need to clean up your data prior to passing it into the model. If there’s unwanted content in your output, for example, it could impact the quality of your NLP model. To help with this, the `unstructured` library includes cleaning functions to help users sanitize output before sending it to downstream applications.

[2404.19737] Better & Faster Large Language Models via Multi-token Predicti

8 May 2024

arxiv.org

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results...

Researchers at NVIDIA AI Introduce ‘VILA’: A Vision Language Model that can

7 May 2024

marktechpost.com

The rapid evolution in AI demands models that can handle large-scale data and deliver accurate, actionable insights. Researchers in this field aim to create systems capable of continuous learning and adaptation, ensuring they remain relevant in dynamic environments. A significant challenge in developing AI models lies in overcoming the issue of catastrophic forgetting, where models fail to retain previously acquired knowledge when learning new tasks. This challenge becomes more pressing as applications increasingly demand continuous learning capabilities. For instance, models must update their understanding of healthcare, financial analysis, and autonomous systems while retaining prior knowledge to make informed decisions. The

Hugging Face - Documentation

5 May 2024

huggingface.co

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Understanding Key Terminologies in Large Language Model (LLM) Universe

25 Apr 2024

marktechpost.com

Are you curious about the intricate world of large language models (LLMs) and the technical jargon that surrounds them? Understanding the terminology, from the foundational aspects of training and fine-tuning to the cutting-edge concepts of transformers and reinforcement learning, is the first step towards demystifying the powerful algorithms that drive modern AI language systems. In this article, we delve into 25 essential terms to enhance your technical vocabulary and provide insights into the mechanisms that make LLMs so transformative. 1. LLM (Large Language Model) Large Language

Top 15 AI Libraries/Frameworks for Automatically Red-Teaming Your Generativ

25 Apr 2024

marktechpost.com

Prompt Fuzzer: The Prompt Fuzzer is an interactive tool designed to evaluate the security of GenAI application system prompts by simulating various dynamic LLM-based attacks. It assesses security by analyzing the results of these simulations, helping users fortify their system prompts accordingly. This tool specifically customizes its tests to fit the unique configuration and domain of the user's application. The Fuzzer also features a Playground chat interface, allowing users to refine their system prompts iteratively, enhancing their resilience against a broad range of generative AI attacks. Users should be aware that using the Prompt Fuzzer will consume tokens. Garak: Garak

Meta says Llama 3 beats most other models, including Gemini - The Verge

19 Apr 2024

theverge.com

The models have some pretty good general knowledge.

anthropics/anthropic-cookbook: A collection of notebooks/recipes showcasing

17 Apr 2024

github.com

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude. - anthropics/anthropic-cookbook

Deep Learning Architectures From CNN, RNN, GAN, and Transformers To Encoder

15 Apr 2024

marktechpost.com

Deep learning architectures have revolutionized the field of artificial intelligence, offering innovative solutions for complex problems across various domains, including computer vision, natural language processing, speech recognition, and generative models. This article explores some of the most influential deep learning architectures: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), Transformers, and Encoder-Decoder architectures, highlighting their unique features, applications, and how they compare against each other. Convolutional Neural Networks (CNNs) CNNs are specialized deep neural networks for processing data with a grid-like topology, such as images. A CNN automatically detects the important features without any human supervision.

Tips for LLM Pretraining and Evaluating Reward Models

15 Apr 2024

magazine.sebastianraschka.com

Discussing AI Research Papers in March 2024

Lessons after a half-billion GPT tokens - Ken Kantzer's Blog

14 Apr 2024

kenkantzer.com

My startup Truss (gettruss.io) released a few LLM-heavy features in the last six months, and the narrative around LLMs that I read on Hacker News is now starting to diverge from my reality, so I thought I’d share some of the more “surprising” lessons after churning through just north of 500 million tokens, by my […]

5 Ways To Use LLMs On Your Laptop

13 Apr 2024

kdnuggets.com

Run large language models on your local PC for customized AI capabilities with more control, privacy, and personalization.

Words are flowing out like endless rain: Recapping a busy week of LLM news

13 Apr 2024

arstechnica.com

Gemini 1.5 Pro launch, new version of GPT-4 Turbo, new Mistral model, and more.

Gemini: A Family of Highly Capable Multimodal Models

12 Apr 2024

dev.to

Peter Gostev’s Post

10 Apr 2024

linkedin.com

We are seeing some clear categories emerge in the world of LLMs - 1) affordable (~$1 per million tokens); 2) mid-range ($8/m) and 3) top end ($25-50/m)… | 32 comments on LinkedIn
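To make the three price tiers in this excerpt concrete, here is a small arithmetic sketch. The per-million prices come from the post as quoted; the monthly token volume is a made-up figure for illustration, and `cost_usd` is a hypothetical helper, not any provider's billing API.

```python
# Tier prices in USD per million tokens, as quoted in the excerpt.
# The top end is a range, so both bounds are listed.
TIERS_USD_PER_MILLION = {
    "affordable": 1.0,
    "mid-range": 8.0,
    "top end (low)": 25.0,
    "top end (high)": 50.0,
}

def cost_usd(tokens, usd_per_million):
    # Cost scales linearly with the number of tokens processed.
    return tokens / 1_000_000 * usd_per_million

monthly_tokens = 20_000_000  # hypothetical workload
costs = {tier: cost_usd(monthly_tokens, price)
         for tier, price in TIERS_USD_PER_MILLION.items()}

for tier, cost in costs.items():
    print(f"{tier}: ${cost:,.2f}")
```

At this assumed volume the spread between the cheapest and the most expensive tier is a factor of 50, which is the gap the post is pointing at.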

Detecting Hallucinations in Large Language Models with Text Similarity Metr

5 Apr 2024

dev.to

In the world of LLMs, there is a phenomenon known as "hallucinations." These hallucinations are...

Top Open Source Large Language Models (LLMs) Available For Commercial Use

5 Apr 2024

marktechpost.com

The top open source Large Language Models available for commercial use are as follows. Llama - 2 Meta released Llama 2, a set of pretrained and refined LLMs, along with Llama 2-Chat, a version of Llama 2. These models are scalable up to 70 billion parameters. It was discovered after extensive testing on safety and helpfulness-focused benchmarks that Llama 2-Chat models perform better than current open-source models in most cases. Human evaluations have shown that they align well with several closed-source models. The researchers have even taken a few steps to guarantee the security of these models. This includes annotating

LLaMA Now Goes Faster on CPUs

2 Apr 2024

justine.lol

I wrote 84 new matmul kernels to improve llamafile CPU performance.

Large language models use a surprisingly simple mechanism to retrieve some

2 Apr 2024

news.mit.edu

Researchers find large language models use a simple mechanism to retrieve stored knowledge when they respond to a user prompt. These mechanisms can be leveraged to see what the model knows about different subjects and possibly to correct false information it has stored.

Introducing DBRX: A New State-of-the-Art Open LLM

2 Apr 2024

databricks.com

ChatGPT vs Perplexity AI: AI App Comparison

1 Apr 2024

marktechpost.com

What is ChatGPT? ChatGPT, developed by OpenAI, is an AI platform renowned for its conversational AI capabilities. Leveraging the power of the Generative Pre-trained Transformer models, ChatGPT generates human-like text responses across various topics, from casual conversations to complex, technical discussions. Its ability to engage users with coherent, contextually relevant dialogues stands out, making it highly versatile for various applications, including content creation, education, customer service, and more. Its integration with tools like DALL-E for image generation from textual descriptions and its continual updates for enhanced performance showcase its commitment to providing an engaging and innovative user experience. ChatGPT Key

Mamba Explained

30 Mar 2024

thegradient.pub

Is Attention all you need? Mamba, a novel AI model based on State Space Models (SSMs), emerges as a formidable alternative to the widely used Transformer models, addressing their inefficiency in processing long sequences.

How Nvidia Blackwell Systems Attack 1 Trillion Parameter AI Models

29 Mar 2024

nextplatform.com

We like datacenter compute engines here at The Next Platform, but as the name implies, what we really like are platforms – how compute, storage,

How Chain-of-Thought Reasoning Helps Neural Networks Compute

29 Mar 2024

quantamagazine.org

Large language models do better at solving problems when they show their work. Researchers are beginning to understand why.

Why and How to Achieve Longer Context Windows for LLMs

11 Mar 2024

towardsdatascience.com

Language models (LLMs) have revolutionized the field of natural language processing (NLP) over the last few years, achieving…

Generative AI Design Patterns: A Comprehensive Guide | by Vincent Koc | Feb

11 Mar 2024

towardsdatascience.com

Reference architecture patterns and mental models for working with Large Language Models (LLMs)

You can now train a 70b language model at home

11 Mar 2024

answer.ai

We’re releasing an open source system, based on FSDP and QLoRA, that can train a 70b model on two 24GB GPUs.

Easily Train a Specialized LLM: PEFT, LoRA, QLoRA, LLaMA-Adapter, and More

11 Mar 2024

towardsdatascience.com

Training a specialized LLM over your own data is easier than you think…

Google Bard is called Gemini now and expands to mobile, paid versions

7 Mar 2024

axios.com

The search giant is unifying its AI-assistant efforts under one name and trying to show it can match rivals.

Anthropic’s Post

5 Mar 2024

linkedin.com

Today, we're announcing the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three… | 429 comments on LinkedIn

OpenAI's ChatGPT may have its first true rival in Anthropic's new chatbot

5 Mar 2024

qz.com

The Amazon-backed AI startup said its "most intelligent model" outperformed OpenAI's powerful GPT-4

rasbt/LLMs-from-scratch

29 Feb 2024

github.com

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step - rasbt/LLMs-from-scratch

Meet RAGxplorer: An interactive AI Tool to Support the Building of Retrieva

29 Feb 2024

marktechpost.com

Understanding how well advanced language models comprehend and organize information is crucial. A common challenge arises in visualizing the intricate relationships between different document parts, especially when using complex pipelines like Retrieval-Augmented Generation (RAG). Existing tools only sometimes provide a clear picture of how chunks of information relate to each other and to specific queries. Several attempts have been made to address this issue, but they often fail to provide an intuitive and interactive solution. These tools struggle to break documents down into manageable pieces and to visualize their semantic landscape effectively. As a

Meet Google Lumiere AI, Bard’s video maker cousin

29 Feb 2024

dataconomy.com

Step into the future of video creation with Google Lumiere, the latest breakthrough from Google Research that promises to redefine

How To Build an LLM-Powered App To Chat with PapersWithCode

29 Feb 2024

towardsdatascience.com

Keep up with the latest ML research

The killer app of Gemini Pro 1.5 is video

29 Feb 2024

simonwillison.net

Last week Google introduced Gemini Pro 1.5, an enormous upgrade to their Gemini series of AI models. Gemini Pro 1.5 has a 1,000,000 token context size. This is huge—previously that …

Understanding Direct Preference Optimization

29 Feb 2024

towardsdatascience.com

This blog post will look at the “Direct Preference Optimization: Your Language Model is Secretly a Reward Model” paper and its findings.

I Spent a Week With Gemini Pro 1.5—It’s Fantastic

29 Feb 2024

every.to

When it comes to context windows, size matters

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

29 Feb 2024

arxiv.org

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single...

Sora early access: Your guide to securing a spot

29 Feb 2024

dataconomy.com

Are you looking for news every day about Sora early access, like us? Well, you are absolutely right, because OpenAI's

Au Large | Mistral AI | Frontier AI in your hands

29 Feb 2024

mistral.ai

Mistral Large is our flagship model, with top-tier reasoning capacities. It is also available on Azure.

Claude

22 Feb 2024

claude.ai

Talk with Claude, an AI assistant from Anthropic

Beyond Self-Attention: How a Small Language Model Predicts the Next Token

22 Feb 2024

shyam.blog

A deep dive into the internals of a small transformer model to learn how it turns self-attention calculations into accurate predictions for the next token.

How do transformers work?+Design a Multi-class Sentiment Analysis for Custo

22 Feb 2024

open.substack.com

We will take a deep dive into how transformer models like BERT work (a non-mathematical explanation, of course!), followed by a system design that uses the transformer to build a Sentiment Analysis

Groq Inference Tokenomics: Speed, But At What Cost?

22 Feb 2024

semianalysis.com

Faster than Nvidia? Dissecting the economics

How Well Can LLMs Negotiate? Stanford Researchers Developed ‘NegotiationAre

20 Feb 2024

marktechpost.com

In artificial intelligence, the capacity of Large Language Models (LLMs) to negotiate mirrors a leap toward achieving human-like interactions in digital negotiations. At the heart of this exploration is the NEGOTIATION ARENA, a pioneering framework devised by researchers from Stanford University and Bauplan. This innovative platform delves into the negotiation prowess of LLMs, offering a dynamic environment where AI can mimic, strategize, and engage in nuanced dialogues across a spectrum of scenarios, from splitting resources to intricate trade and price negotiations. The NEGOTIATION ARENA is a tool and a gateway to understanding how AI can be shaped to think, react,

Sora

17 Feb 2024

openai.com

Sora is an AI model that can create realistic and imaginative scenes from text instructions.

Code LoRA from Scratch - a Lightning Studio by sebastian

15 Feb 2024

lightning.ai

LoRA (Low-Rank Adaptation) is a popular technique to finetune LLMs more efficiently. This Studio explains how LoRA works by coding it from scratch, which is an excellent exercise for looking under …

Bard is now Gemini and Gemini Advanced is amazing

15 Feb 2024

dataconomy.com

The AI community is once again filled with excitement as Bard is now Gemini, with Gemini Advanced offering users an exceptional

Ask HN: What have you built with LLMs?

11 Feb 2024

news.ycombinator.com

BloombergGPT: A Large Language Model for Finance

4 Feb 2024

arxiv.org

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language...

Exploring the Zephyr 7B: A Comprehensive Guide to the Latest Large Language

24 Jan 2024

kdnuggets.com

Zephyr is a series of Large Language Models released by Hugging Face trained using distilled supervised fine-tuning (dSFT) on larger models with significantly improved task accuracy.

Mastering PDFs: Extracting Sections, Headings, Paragraphs, and Tables with

17 Jan 2024

blog.llamaindex.ai

LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models (LLMs).

Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attent

16 Jan 2024

magazine.sebastianraschka.com

This article will teach you about self-attention mechanisms used in transformer architectures and large language models (LLMs) such as GPT-4 and Llama.

Dashboard - SciSummary

16 Jan 2024

scisummary.com

AI Driven tools for researchers and students. Use AI to summarize and understand scientific articles and research papers.

Meet Waymo’s MotionLM: The State-of-the-Art Multi-Agent Motion Prediction A

7 Jan 2024

marktechpost.com

Autoregressive language models have excelled at predicting the subsequent subword in a sentence without the need for any predefined grammar or parsing concepts. This method has been expanded to include continuous data domains like audio and image production, where data is represented as discrete tokens, much like language model vocabularies. Due to their versatility, sequence models have attracted interest for use in increasingly complicated and dynamic contexts, such as behavior. Road users are compared to participants in a continuous conversation when driving since they exchange actions and replies. The question is whether similar sequence models may be used to forecast

How much detail is too much? Midjourney v6 attempts to find out

7 Jan 2024

arstechnica.com

As Midjourney rolls out new features, it continues to make some artists furious.

10 Noteworthy AI Research Papers of 2023

7 Jan 2024

magazine.sebastianraschka.com

This year has felt distinctly different. I've been working in, on, and with machine learning and AI for over a decade, yet I can't recall a time when these fields were as popular and rapidly evolving as they have been this year. To conclude an eventful 2023 in machine learning and AI research, I'm excited to share 10 noteworthy papers I've read this year. My personal focus has been more on large language models, so you'll find a heavier emphasis on large language model (LLM) papers than computer vision papers this year.

7 Steps to Mastering Large Language Models (LLMs)

20 Oct 2023

kdnuggets.com

Large Language Models (LLMs) have unlocked a new era in natural language processing. So why not learn more about them? Go from learning what large language models are to building and deploying LLM apps in 7 easy steps with this guide.

Meta AI Researchers Propose Advanced Long-Context LLMs: A Deep Dive into Up

20 Oct 2023

marktechpost.com

The emergence of Large Language Models (LLMs) in natural language processing represents a groundbreaking development. These models, trained on vast amounts of data and leveraging immense computational resources, promise to transform human interactions with the digital world. As they evolve through scaling and rapid deployment, their potential use cases become increasingly intricate and complex. They extend their capabilities to tasks such as analyzing dense, knowledge-rich documents, enhancing chatbot experiences to make them more genuine and engaging, and assisting human users in iterative creative processes like coding and design. One crucial feature that empowers this evolution is the capacity to effectively

This AI Paper from NVIDIA Explores the Power of Retrieval-Augmentation vs.

20 Oct 2023

marktechpost.com

In a comparative study, Researchers from Nvidia investigated the impact of retrieval augmentation and context window size on the performance of large language models (LLMs) in downstream tasks. The findings reveal that retrieval augmentation consistently enhances LLM performance, irrespective of context window size. Their research sheds light on the effectiveness of retrieval mechanisms in optimizing LLMs for various applications. Researchers delve into the domain of long-context language models, investigating the efficacy of retrieval augmentation and context window size in enhancing LLM performance across various downstream tasks. It conducts a comparative analysis of different pretrained LLMs, demonstrating that retrieval mechanisms significantly

Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments - Lightning AI

20 Oct 2023

lightning.ai

LoRA is one of the most widely used, parameter-efficient finetuning techniques for training custom LLMs. From saving memory with QLoRA to selecting the optimal LoRA settings, this article provides practical insights for those interested in applying it.

Getting Started with Large Language Models: Key Things to Know

20 Oct 2023

flyte.org

As a machine learning engineer who has witnessed the rise of Large Language Models (LLMs), I find it daunting to comprehend how the ecosystem surrounding LLMs is developing.

Unlocking GPT-4 Summarization with Chain of Density Prompting

20 Oct 2023

kdnuggets.com

Unlock the power of GPT-4 summarization with Chain of Density (CoD), a technique that attempts to balance information density for high-quality summaries.

The Ins and Outs of Retrieval-Augmented Generation (RAG)

20 Oct 2023

towardsdatascience.com

Our weekly selection of must-read Editors’ Picks and original features

Building RAG-based LLM Applications for Production (Part 1)

20 Oct 2023

anyscale.com

In this guide, we will learn how to develop and productionize a retrieval augmented generation (RAG) based LLM application, with a focus on scale and evaluation.

RAG vs Finetuning: Which Is the Best Tool to Boost Your LLM Application?

20 Oct 2023

towardsdatascience.com

The definitive guide for choosing the right method for your use case

A High-Level Overview Of Large Language Model Concepts, Use Cases, And Tool

20 Oct 2023

smashingmagazine.com

Discuss the concept of large language models (LLMs) and how they are implemented with a set of data to develop an application. Joas compares a collection of no-code and low-code apps designed to help you get a feel for not only how the concept works but also to get a sense of what types of models are available to train AI on different skill sets.

Augmenting LLMs with RAG

20 Oct 2023

towardsdatascience.com

An End to End Example Of Seeing How Well An LLM Model Can Answer Amazon SageMaker Related Questions

Parallel Processing in Prompt Engineering: The Skeleton-of-Thought Techniqu

7 Oct 2023

kdnuggets.com

Explore how the Skeleton-of-Thought prompt engineering technique enhances generative AI by reducing latency, offering structured output, and optimizing projects.

[2302.07730] Transformer models: an introduction and catalog

5 Oct 2023

arxiv.org

In the past few years we have seen the meteoric appearance of dozens of foundation models of the Transformer family, all of which have memorable and sometimes funny, but not self-explanatory,...

Hey, Computer, Make Me a Font

4 Oct 2023

serce.me

This is a story of my journey learning to build generative ML models from scratch and teaching a computer to create fonts in the process.

SaaS Competitive Advantage Through Elegant LLM Feedback Mechanisms

4 Oct 2023

tomtunguz.com

Eliciting product feedback elegantly is a competitive advantage for LLM-software. Over the weekend, I queried Google’s Bard, & noticed the elegant feedback loop the product team has incorporated into their product. I asked Bard to compare the 3rd-row leg room of the leading 7-passenger SUVs. At the bottom of the post is a little G button, which double-checks the response using Google searches. I decided to click it. This is what I would be doing in any case: spot-checking some of the results.

ChatGPT, Bard, or Bing Chat? Differences Among 3 Generative-AI Bots

3 Oct 2023

nngroup.com

Participants rated Bing Chat as less helpful and trustworthy than ChatGPT or Bard. These results can be attributed to Bing’s richer yet imperfect UI and to its poorer information aggregation.

Bard

3 Oct 2023

bard.google.com

Bard is now Gemini. Get help with writing, planning, learning, and more from Google AI.

The State of Large Language Models

3 Oct 2023

scientificamerican.com

We present the latest updates on ChatGPT, Bard and other competitors in the artificial intelligence arms race.

10 Ways to Improve the Performance of Retrieval Augmented Generation System

25 Sep 2023

towardsdatascience.com

Tools to go from prototype to production

How to Build an LLM from Scratch

25 Sep 2023

towardsdatascience.com

Data Curation, Transformers, Training at Scale, and Model Evaluation

Large Language Model Prompt Engineering for Complex Summarization - ISE Dev

25 Sep 2023

devblogs.microsoft.com

Learn how to use GPT / LLMs to create complex summaries such as for medical text

Open LLM Leaderboard : a Hugging Face Space by HuggingFaceH4

25 Sep 2023

huggingface.co

Track, rank and evaluate open LLMs and chatbots

Llama from scratch

25 Sep 2023

blog.briankitano.com

I want to provide some tips from my experience implementing a paper. I'm going to cover my tips so far from implementing a dramatically scaled-down versio...

Cracking Open the OpenAI (Python) API

25 Sep 2023

towardsdatascience.com

A complete beginner-friendly introduction with example code

Cracking Open the Hugging Face Transformers Library

25 Sep 2023

towardsdatascience.com

A quick-start guide to using open-source LLMs

Asking 60+ LLMs a set of 20 questions

25 Sep 2023

benchmarks.llmonitor.com

Human-readable benchmarks of 60+ open-source and proprietary LLMs.

OpenAI Unveils DALL·E 3: A Revolutionary Leap in Text-to-Image Generation

24 Sep 2023

marktechpost.com

In a significant technological leap, OpenAI has announced the launch of DALL·E 3, the latest iteration in their groundbreaking text-to-image generation technology. With an unprecedented capacity to understand nuanced and detailed descriptions, DALL·E 3 promises to revolutionize the creative landscape by allowing users to translate their textual ideas into astonishingly accurate images effortlessly. DALL·E 3 is currently in research preview, offering a tantalizing glimpse into its capabilities. However, the broader availability of this cutting-edge technology is set for early October, when it will be accessible to ChatGPT Plus and Enterprise customers through the API and Labs later in the fall.

Comparison: DALL-E 3 vs Midjourney

24 Sep 2023

dataconomy.com

DALL-E 3, the latest version of OpenAI's ground-breaking generative AI visual art platform, was just announced with groundbreaking features, including

What OpenAI Really Wants

17 Sep 2023

wired.com

The young company sent shock waves around the world when it released ChatGPT. But that was just the start. The ultimate goal: Change everything. Yes. Everything.

A Beginner’s Guide to Building LLM-Powered Applications with LangChain!

12 Sep 2023

dev.to

If you're a developer or simply someone passionate about technology, you've likely encountered AI...

iryna-kondr/scikit-llm: Seamlessly integrate LLMs into scikit-learn.

31 Aug 2023

github.com

Seamlessly integrate LLMs into scikit-learn.

Prompt Engineering — How to trick AI into solving your problems

31 Aug 2023

towardsdatascience.com

7 prompting tricks, Langchain, and Python example code

A Beginner’s Guide to LLM Fine-Tuning

30 Aug 2023

towardsdatascience.com

How to fine-tune Llama and other LLMs with one tool

Together AI Unveils Llama-2-7B-32K-Instruct: A Breakthrough in Extended-Con

27 Aug 2023

marktechpost.com

A multifaceted challenge has arisen in the expansive realm of natural language processing: the ability to adeptly comprehend and respond to intricate and lengthy instructions. As communication nuances become more complicated, the shortcomings of prevailing models in dealing with extensive contextual intricacies have been laid bare. Within these pages, an extraordinary solution crafted by the dedicated minds at Together AI comes to light—a solution that holds the promise of reshaping the very fabric of language processing. This innovation has profound implications, especially in tasks requiring an acute grasp of extended contextual nuances. Contemporary natural language processing techniques rely heavily on

A Practical Introduction to LLMs

25 Aug 2023

towardsdatascience.com

3 levels of using LLMs in practice

Meet Chroma: An AI-Native Open-Source Vector Database For LLMs: A Faster Wa

20 Aug 2023

marktechpost.com

Word embedding vector databases have become increasingly popular due to the proliferation of massive language models. Using the power of sophisticated machine learning techniques, data is stored in a vector database. It allows for very fast similarity search, essential for many AI uses such as recommendation systems, picture recognition, and NLP. The essence of complicated data is captured in a vector database by representing each data point as a multidimensional vector. Quickly retrieving related vectors is made possible by modern indexing techniques like k-d trees and hashing. To transform big data analytics, this architecture generates highly scalable, efficient solutions for

How to Extract Text from Any PDF and Image for Large Language Model | by Zo

7 Aug 2023

towardsdatascience.com

Use these text extraction techniques to get quality data for your LLM models

Introducing OpenLLM: Open Source Library for LLMs

7 Aug 2023

kdnuggets.com

A user-friendly platform for operating large language models (LLMs) in production, with features such as fine-tuning, serving, deployment, and monitoring of any LLMs.

Abacus AI Introduces A New Open Long-Context Large Language Model LLM: Meet

7 Aug 2023

marktechpost.com

Recent language models can take long contexts as input, but more needs to be known about how well they use longer contexts. Can LLMs be extended to longer contexts? This is an unanswered question. Researchers at Abacus AI conducted multiple experiments involving different schemes for developing the context length ability of Llama, which is pre-trained on context length 2048. They linearly rescaled these models with IFT at scales 4 and 16. The model scaled to 16 can perform world tasks up to 16k context length or even up to 20-24k context length. Different methods of extending context length are Linear

How to use LLMs for PDF parsing

6 Aug 2023

nanonets.com

Using ChatGPT & OpenAI's GPT API, this code tutorial teaches how to chat with PDFs, automate PDF tasks, and build PDF chatbots.

How to Chat With Any File from PDFs to Images Using Large Language Models —

6 Aug 2023

towardsdatascience.com

Complete guide to building an AI assistant that can answer questions about any file

How to Leverage Open-Source LLMs in Your Project

6 Aug 2023

turingpost.com

Practical Advice from Experts: Fine-Tuning, Deployment, and Best Practices

LangChain 101: Build Your Own GPT-Powered Applications

2 Aug 2023

kdnuggets.com

LangChain is a Python library that helps you build GPT-powered applications in minutes. Get started with LangChain by building a simple question-answering app.

MPT-30B: Raising the bar for open-source foundation models

28 Jul 2023

mosaicml.com

Latest blogs from the team at Mosaic Research

Midjourney pricing plans and free alternatives to try

28 Jul 2023

dataconomy.com

Navigating the maze of pricing plans for digital services can sometimes be a daunting task. Today, we are unveiling Midjourney

A Deep Dive Into LLaMA, Falcon, Llama 2 and Their Remarkable Fine-Tuned Ver

28 Jul 2023

turingpost.com

Exploring the Development of the 3 Leading Open LLMs and Their Chatbot Derivatives

Chain of Thought Prompting for LLMs

28 Jul 2023

towardsdatascience.com

A practical and simple approach for “reasoning” with LLMs

Is Anthropic's Claude 2 model ready to take down GPT-4? We put them to the

28 Jul 2023

dev.to

Anthropic released Claude 2, a new iteration of its AI model, to take on ChatGPT and Google Bard...

Emerging Architectures for LLM Applications

24 Jul 2023

a16z.com

A reference architecture for the LLM app stack. It shows the most common systems, tools, and design patterns used by AI startups and tech companies.

ELI5: FlashAttention

24 Jul 2023

gordicaleksa.medium.com

Step-by-step explanation of how one of the most important MLSys breakthroughs works, in gory detail.

Build Industry-Specific LLMs Using Retrieval Augmented Generation

24 Jul 2023

towardsdatascience.com

Organizations are in a race to adopt Large Language Models. Let’s dive into how you can build industry-specific LLMs Through RAG

Free Full Stack LLM Bootcamp

24 Jul 2023

kdnuggets.com

Want to learn more about LLMs and build cool LLM-powered applications? This free Full Stack LLM Bootcamp is all you need!

Edge 300: Meet Falcon LLM: The Most Powerful Open Source LLM Released to Da

24 Jul 2023

thesequence.substack.com

The model quickly topped the Open LLM Leaderboard, which ranks the performance of open source LLMs.

The Secret Sauce behind 100K context window in LLMs: all tricks in one plac

23 Jul 2023

blog.gopenai.com

tldr; techniques to speed up training and inference of LLMs to use large context window up to 100K input tokens during training and…

All You Need to Know to Build Your First LLM App

23 Jul 2023

towardsdatascience.com

A step-by-step tutorial to document loaders, embeddings, vector stores and prompt templates

Observe.ai unveils 30-billion-parameter contact center LLM and a generative

23 Jul 2023

venturebeat.com

The Observe.AI contact center LLM showed a 35% increase in accuracy compared to GPT-3.5 when automatically summarizing conversations.

Training LLMs with AMD MI250 GPUs and MosaicML

23 Jul 2023

mosaicml.com

With the release of PyTorch 2.0 and ROCm 5.4, we are excited to announce that LLM training works out of the box on AMD MI250 accelerators with zero code changes and at high performance!

Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorc

23 Jul 2023

lightning.ai

This article provides a series of techniques that can lower memory consumption in PyTorch (when training vision transformers and LLMs) by approximately 20x without sacrificing modeling performance and prediction accuracy.

Deploying Falcon-7B Into Production

23 Jul 2023

towardsdatascience.com

Running Falcon-7B in the cloud as a microservice

Anthropic releases Claude 2, its second-gen AI chatbot

23 Jul 2023

techcrunch.com

Anthropic, the AI startup founded by ex-OpenAI execs, has released its newest chatbot, Claude 2. It's ostensibly improved in several ways.

Google Launches AI-Powered Notes App Called NotebookLM

23 Jul 2023

tech.slashdot.org

Google is launching its AI-backed note-taking tool to "a small group of users in the US," the company said in a blog post. Formerly referred to as Project Tailwind at Google I/O earlier this year, the new app is now known as NotebookLM (the LM stands for Language Model). The Verge reports: The core...

Ecosystem Graphs for Foundation Models

23 Jul 2023

crfm.stanford.edu

Meet LMQL: An Open Source Query Language for LLMs

23 Jul 2023

thesequence.substack.com

Developed by ETH Zürich, the language explores new paradigms for LLM programming.

Leandro von Werra’s Post

23 Jul 2023

linkedin.com

It's crazy how far the ML field has come when it comes to fine-tuning LLMs. A year ago: it was challenging to fine-tune GPT-2 (1.5B) on a single GPU without… | 76 comments on LinkedIn

LLaMA 2: How to access and use Meta’s versatile open-source chatbot right n

23 Jul 2023

venturebeat.com

A comprehensive guide on how to use Meta's LLaMA 2, the new open-source AI model challenging OpenAI's ChatGPT and Google's Bard.

Beyond LLaMA: The Power of Open LLMs

22 Jul 2023

towardsdatascience.com

How LLaMA is making open-source cool again

Facebook parent Meta unveils LLaMA 2 open-source AI model for commercial us

22 Jul 2023

venturebeat.com

Not only has LLaMA been trained on more data, with more parameters, the model also performs better than its predecessor, according to Meta.

MosaicML launches MPT-7B-8K, a 7B-parameter open-source LLM with 8k context

22 Jul 2023

venturebeat.com

MosaicML claims that the MPT-7B-8K LLM exhibits exceptional proficiency in summarization and answering tasks compared to previous models.

The $1 billion gamble to ensure AI doesn’t destroy humanity

22 Jul 2023

thediff.co

The founders of Anthropic quit OpenAI to make a safe AI company. It’s easier said than done.

Unraveling the Power of Chain-of-Thought Prompting in Large Language Models

12 Jul 2023

kdnuggets.com

This article delves into the concept of Chain-of-Thought (CoT) prompting, a technique that enhances the reasoning capabilities of large language models (LLMs). It discusses the principles behind CoT prompting, its application, and its impact on the performance of LLMs.

GitHub - Mooler0410/LLMsPracticalGuide: A curated list of practical guide r

12 Jul 2023

github.com

A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers) - Mooler0410/LLMsPracticalGuide

Falcon LLM: The New King of Open-Source LLMs

19 Jun 2023

kdnuggets.com

Falcon LLM is the new large language model that has taken the crown from LLaMA.

Introduction to the Open LLM Falcon-40B: Performance, Training Data, and Ar

19 Jun 2023

towardsdatascience.com

Get started using Falcon-7B, Falcon-40B, and their instruct versions

Meet FinGPT: An Open-Source Financial Large Language Model (LLMs)

18 Jun 2023

www-marktechpost-com.cdn.ampproject.org

Large language models have increased due to the ongoing development and advancement of artificial intelligence, which has profoundly impacted the state of natural language processing in various fields. The potential use of these models in the financial sector has sparked intense attention in light of this radical upheaval. However, constructing an effective and efficient open-source economic language model depends on gathering high-quality, pertinent, and current data. The use of language models in the financial sector exposes many barriers. These vary from challenges in getting data, maintaining various data forms and kinds, and coping with inconsistent data quality to the crucial

LMM Garden | Discover, search, and compare LLMs

9 Jun 2023

llm.garden

Welcome to the LMM garden! A searchable list of open-source and off-the-shelf LLMs available to ML practitioners. Know of a new LLM? Add it

iryna-kondr/scikit-llm

8 Jun 2023

github.com

Seamlessly integrate LLMs into scikit-learn.

The Case for Running AI on CPUs Isn’t Dead Yet

2 Jun 2023

spectrum.ieee.org

GPUs may dominate, but CPUs could be perfect for smaller AI models

The Art of Prompt Design: Prompt Boundaries and Token Healing

28 May 2023

towardsdatascience.com

Learn how standard greedy tokenization introduces a subtle and powerful bias that can have all kinds of unintended consequences.

Sonali Pattnaik on LinkedIn: #generativeai #ai | 45 comments

21 May 2023

linkedin.com

AI companies are using LangChain to supercharge their LLM apps. Here is a comprehensive guide of resources to build your LangChain + LLM journey. 🔗 What is… | 45 comments on LinkedIn

The Non-Silence of the LLMs

19 May 2023

informationisbeautiful.net

AI is getting very chatty! Here’s a visualisation charting the rise of Large Language Models like GPT4, LaMDA, LLaMa, PaLM and their bots...

Super Bard: The AI That Can Do It All and Better

19 May 2023

kdnuggets.com

A new AI Bard powered by PaLM V2 that can write, translate, and code better than ChatGPT.

Edge 291: Reinforcement Learning with Human Feedback

18 May 2023

thesequence.substack.com

1) Reinforcement Learning with Human Feedback (RLHF), 2) the RLHF paper, 3) the transformer reinforcement learning framework.

Google dives into the ‘supercomputer’ game by knitting together purpose-bui

12 May 2023

venturebeat.com

Google's new machines combine Nvidia H100 GPUs with Google’s high-speed interconnections for AI tasks like training very large language models.

Distilling Step-by-Step! Outperforming Larger Language Models with...

5 May 2023

arxiv.org

Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific...

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

5 May 2023

arxiv.org

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of...

openlm-research/open_llama: OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

5 May 2023

github.com

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset - openlm-research/open_llama

guidance-ai/guidance: A guidance language for controlling large language models.

3 May 2023

github.com

A guidance language for controlling large language models. - guidance-ai/guidance

Blog | Anyscale

29 Apr 2023

anyscale.com

Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly.

Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)

29 Apr 2023

sebastianraschka.com

In the rapidly evolving field of AI, using large language models in an efficient and effective manner is becoming more and more important. In this article, y...

Edge 286: Vicuna, the LLaMA-Based Model that Matches ChatGPT Performance

29 Apr 2023

thesequence.substack.com

Created by researchers from UC Berkeley, CMU, Stanford, and UC San Diego, Vicuna is part of the new wave of models that use Meta's LLaMA as their foundation.

Grounding Large Language Models in a Cognitive Foundation: How to Build Som

26 Apr 2023

thegradient.pub

Many intelligent robots have come and gone, failing to become a commercial success. We’ve lost Aibo, Romo, Jibo, Baxter—even Alexa is reducing staff. Perhaps they failed to reach their potential because you can’t have a meaningful conversation with them. We are now at an inflection point: AI

Data Machina #198

25 Apr 2023

datamachina.substack.com

Your own LLM. MiniGPT-4. WebGPT on WebGPU. Transformers from scratch. ChatGPT Plugins demo live. Whisper JAX. LLaVA. MetaAI DINO SoTA Computer Vision. Autonomous agents in LangChain. RedPajama.

Finetuning Large Language Models

25 Apr 2023

magazine.sebastianraschka.com

An introduction to the core ideas and approaches

The LLama Effect: How an Accidental Leak Sparked a Series of Impressive Ope

21 Apr 2023

thesequence.substack.com

Sundays, The Sequence Scope brings a summary of the most important research papers, technology releases and VC funding deals in the artificial intelligence space.

Stanford CRFM

21 Apr 2023

crfm.stanford.edu

Meta has built a massive new language AI—and it’s giving it away for free

21 Apr 2023

technologyreview.com

Facebook’s parent company is inviting researchers to pore over and pick apart the flaws in its version of GPT-3

Eight Things to Know about Large Language Models

21 Apr 2023

arxiv.org

The widespread public deployment of large language models (LLMs) in recent months has prompted a wave of new attention and engagement from advocates, policymakers, and scholars from many fields....

Baby AGI: The Birth of a Fully Autonomous AI

19 Apr 2023

kdnuggets.com

Introducing the new fully autonomous task manager that can create, track and prioritize your company's projects using artificial intelligence.

Hacker News

19 Apr 2023

magazine.sebastianraschka.com

A Cross-Section of the Most Relevant Literature To Get Up to Speed

📝 Guest Post: How to Enhance the Usefulness of Large Language Models*

17 Apr 2023

thesequence.substack.com

In this guest post, Filip Haltmayer, a Software Engineer at Zilliz, explains how LangChain and Milvus can enhance the usefulness of Large Language Models (LLMs) by allowing for the storage and retrieval of relevant documents. By integrating Milvus, a vector database, with LangChain, LLMs can process more tokens and improve their conversational abilities.

Prompt Engineering

14 Apr 2023

lilianweng.github.io

Prompt Engineering, also known as In-Context Prompting, refers to methods for communicating with an LLM to steer its behavior toward desired outcomes without updating the model weights. It is an empirical science, and the effect of prompt engineering methods can vary a lot among models, thus requiring heavy experimentation and heuristics. This post focuses only on prompt engineering for autoregressive language models, so nothing with Cloze tests, image generation or multimodality models.

A Survey of Large Language Models

14 Apr 2023

arxiv.org

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and...

New Ebook: A Beginner’s Guide to Large Language Models

14 Apr 2023

nvidia.com

Explore what LLMs are, how they work, and gain insights into real-world examples, use cases, and best practices.

Maximizing the Potential of LLMs: A Guide to Prompt Engineering

13 Apr 2023

ruxu.dev

The Magic of LLMs — Prompt Engineering

13 Apr 2023

towardsdatascience.com

Garbage in, garbage out has never been more true.

📝 Guest Post: Caching LLM Queries for Improved Performance and Cost Savings*

12 Apr 2023

thesequence.substack.com

If you're looking for a way to improve the performance of your large language model (LLM) application while reducing costs, consider utilizing a semantic cache to store LLM responses.

StackLLaMA: A hands-on guide to train LLaMA with RLHF

8 Apr 2023

huggingface.co

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

OpenAI Platform

10 Feb 2023

platform.openai.com

Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.

FlashSigmoid: A Hardware-Aware and Memory-Efficient Implementation of Sigmoid Attention Yielding a 1

14 Sep 2024

marktechpost.com

Large Language Models (LLMs) have gained significant prominence in modern machine learning, largely due to the attention mechanism. This mechanism employs a sequence-to-sequence mapping to construct context-aware token representations. Traditionally, attention relies on the softmax function (SoftmaxAttn) to generate token representations as data-dependent convex combinations of values. However, despite its widespread adoption and effectiveness, SoftmaxAttn faces several challenges. One key issue is the tendency of the softmax function to concentrate attention on a limited number of features, potentially overlooking other informative aspects of the input data. Also, the application of SoftmaxAttn necessitates a row-wise reduction along the input sequence length,

Graphiti: A Python Library for Building Temporal Knowledge Graphs Using LLMs

14 Sep 2024

marktechpost.com

The challenge of managing and recalling facts from complex, evolving conversations is a key problem for many AI-driven applications. As information grows and changes over time, maintaining accurate context becomes increasingly difficult. Current systems often struggle to handle the evolving nature of relationships and facts, leading to incomplete or irrelevant results when retrieving information. This can affect the effectiveness of AI agents, especially when dealing with user memories and context in real-time applications. Some existing solutions have attempted to address this problem. One common approach is using a Retrieval-Augmented Generation (RAG) pipeline, which involves storing extracted facts and using techniques

Top 9 Different Types of Retrieval-Augmented Generation (RAGs)

14 Sep 2024

marktechpost.com

Retrieval-Augmented Generation (RAG) is a machine learning framework that combines the advantages of both retrieval-based and generation-based models. The RAG framework is highly regarded for its ability to handle large amounts of information and produce coherent, contextually accurate responses. It leverages external data sources by retrieving relevant documents or facts and then generating an answer or output based on the retrieved information and the user query. This blend of retrieval and generation leads to better-informed outputs that are more accurate and comprehensive than models that rely solely on generation. The evolution of RAG has led to various types and approaches,

Building a Simple RAG Application Using LlamaIndex - MachineLearningMastery.com

14 Aug 2024

machinelearningmastery.com

LlamaIndex

9 Sep 2024

docs.llamaindex.ai

Integrating LLMs with Scikit-Learn Using Scikit-LLM

7 Oct 2024

kdnuggets.com

Combining LLM reasoning for text-based models in Scikit-Learn.

Why GPU Utilization Falls Short: Understanding Streaming Multiprocessor (SM) Efficiency for Better L

3 Sep 2024

marktechpost.com

Large Language Models (LLMs) have gained significant prominence in recent years, driving the need for efficient GPU utilization in machine learning tasks. However, researchers face a critical challenge in accurately assessing GPU performance. The commonly used metric, GPU Utilization, accessed through nvidia-smi or integrated observability tools, has proven to be an unreliable indicator of actual computational efficiency. Surprisingly, 100% GPU utilization can be achieved merely by reading and writing to memory without performing any computations. This revelation has sparked a reevaluation of performance metrics and methodologies in the field of machine learning, prompting researchers to seek more accurate ways to

LightLLM: A Lightweight Scalable and High-Speed Python Framework for LLM Inference and Serving

2 Oct 2024

marktechpost.com

Large language models (LLMs) have advanced significantly in recent years. However, their real-world applications are restricted due to substantial processing power and memory requirements. The need to make LLMs more accessible on smaller and resource-limited devices drives the development of more efficient frameworks for model inference and deployment. Existing methods for running LLMs include hardware acceleration techniques and optimizations like quantization and pruning. However, these methods often fail to provide a balance between model size, performance, and usability in constrained environments. Researchers developed an efficient, scalable, and lightweight framework for LLM inference, LightLLM, to address the challenge of efficiently deploying

Nvidia just dropped a bombshell: Its new AI model is open, massive and ready to rival GPT-4

2 Oct 2024

venturebeat.com

Nvidia has released NVLM 1.0, a powerful open-source AI model that rivals GPT-4 and Google’s systems, marking a major breakthrough in multimodal language models for vision and text tasks.

Ten Effective Strategies to Lower Large Language Model (LLM) Inference Costs

1 Oct 2024

marktechpost.com

Large Language Models (LLMs) have become a cornerstone in artificial intelligence, powering everything from chatbots and virtual assistants to advanced text generation and translation systems. Despite their prowess, one of the most pressing challenges associated with these models is the high cost of inference. This cost includes computational resources, time, energy consumption, and hardware wear. Optimizing these costs is paramount for businesses and researchers aiming to scale their AI operations without breaking the bank. Here are ten proven strategies to reduce LLM inference costs while maintaining performance and accuracy: Quantization Quantization is a technique that decreases the precision of model

llms — my Raindrop.io articles