reinforcement-learning

Multimodal reinforcement learning with agentic verifier for AI agents - Microsoft Research

20 Jan 2026

microsoft.com

Argos improves multimodal RL by evaluating whether an agent’s reasoning aligns with what it observes over time. The approach reduces visual hallucinations and produces more reliable, data-efficient agents for real-world applications.

The State of Reinforcement Learning for LLM Reasoning

31 Dec 2025

magazine.sebastianraschka.com

Understanding GRPO and New Insights from Reasoning Model Papers

A Gentle Introduction to Q-Learning

13 Aug 2025

machinelearningmastery.com

Get a gentle introduction to one of the most widely used reinforcement learning algorithms to learn optimal courses of action through trial and error.

AI-Driven Antitrust and Competition Law: Algorithmic Collusion, Self-Learning Pricing Tools, and Legal Challenges in the US and EU

10 Aug 2025

marktechpost.com

AI algorithmic collusion challenges traditional antitrust law. Legal AI frameworks adapt to prosecute self-learning pricing tools.

The State of Reinforcement Learning for LLM Reasoning

20 Apr 2025

sebastianraschka.com

A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to these releases were relatively muted. Why? One reason could be that GPT-4.5 and Llama 4 remain conventional models, which means they were trained without explicit reinforcement learning for reasoning. However, OpenAI's recent release of the o3 reasoning model demonstrates there is still considerable room for improvement when investing compute strategically, specifically via reinforcement learning methods tailored for reasoning tasks. While reasoning alone isn't a silver bullet, it reliably improves model accuracy and problem-solving capabilities on challenging tasks (so far). And I expect reasoning-focused post-training to become standard practice in future LLM pipelines. So, in this article, let's explore the latest developments in reasoning via reinforcement learning.

What is Q-learning? - Dataconomy

28 Mar 2025

dataconomy.com

Q-learning is a model-free reinforcement learning algorithm that enables agents to learn optimal actions through interaction with their environment.
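As a rough illustration of the description above, here is a minimal sketch of tabular Q-learning. The five-state chain environment, the `step` and `train` helpers, and all hyperparameter values are assumptions made up for this example; they do not come from the linked article.

```python
import random

N_STATES = 5        # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical environment: reward 1.0 only when the last state is reached."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table purely from interaction; no model of `step` is used."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore; break ties randomly.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = train()
# Greedy action per non-terminal state, read off the learned table.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

The update never consults the environment's transition rules directly, which is what "model-free" refers to here: the agent only sees the states and rewards that `step` returns.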

GitHub - MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning: This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

11 Mar 2025

github.com

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning." - MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

(WIP) A Little Bit of Reinforcement Learning from Human Feedback

2 Feb 2025

rlhfbook.com

The Reinforcement Learning from Human Feedback Book

Policy Gradient Methods in Reinforcement Learning

30 May 2024

towardsdatascience.com

Teaching a Car to Cross a Mountain using Policy Gradient Methods in Python: A Mathematical Deep Dive into Reinforcement Learning

Reinforcement Learning: Training AI Agents Through Rewards and Penalties

7 May 2024

marktechpost.com

Reinforcement learning (RL) is a fascinating field of AI focused on training agents to make decisions by interacting with an environment and learning from rewards and penalties. RL differs from supervised learning because it involves doing rather than learning from a static dataset. Let’s delve into the core principles of RL and explore its applications in game playing, robot control, and resource management.
Principles of Reinforcement Learning
Agent and Environment: In RL, the agent is the learner or decision-maker interacting with the environment. The environment provides context to the agent, affecting its decisions and providing feedback through rewards or penalties.
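The agent and environment roles described above can be sketched as a simple interaction loop. The `CoinFlipEnv` and `AveragingAgent` classes below are hypothetical names invented for this sketch, and the reward scheme (+1 for a correct guess, -1 otherwise) is an assumption rather than anything taken from the linked article.

```python
import random
from dataclasses import dataclass


@dataclass
class CoinFlipEnv:
    """Hypothetical environment: the agent guesses a biased coin (1 = heads)."""
    p_heads: float = 0.8
    seed: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)

    def step(self, action):
        outcome = 1 if self.rng.random() < self.p_heads else 0
        # Feedback: a reward for a correct guess, a penalty for a wrong one.
        return 1.0 if action == outcome else -1.0


class AveragingAgent:
    """Hypothetical agent: tracks the average reward of each action."""

    def __init__(self, n_actions=2, epsilon=0.1, seed=1):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions

    def act(self):
        # Explore occasionally, otherwise pick the action with the best average.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental running mean of the rewards seen for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


env = CoinFlipEnv()
agent = AveragingAgent()
for _ in range(2000):
    action = agent.act()          # agent decides
    reward = env.step(action)     # environment responds with feedback
    agent.learn(action, reward)   # agent updates from the reward or penalty
```

There is no labelled dataset anywhere in this loop; the agent's only training signal is the reward it receives for the actions it actually takes.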

Edge 291: Reinforcement Learning with Human Feedback

18 May 2023

thesequence.substack.com

1) Reinforcement Learning with Human Feedback (RLHF), 2) the RLHF paper, 3) the transformer reinforcement learning framework.

How reinforcement learning with human feedback is unlocking the power of ge

25 Apr 2023

venturebeat.com

How reinforcement learning with human feedback helps ensure that businesses are building ethical generative AI models.

Ending an Ugly Chapter in Chip Design

6 Apr 2023

spectrum.ieee.org

Study tries to settle a bitter disagreement over Google’s chip design AI

Why People Skip Music? On Predicting Music Skips using Deep...

9 Feb 2023

arxiv.org

Music recommender systems are an integral part of our daily life. Recent research has seen a significant effort around black-box recommender based approaches such as Deep Reinforcement Learning...

6 Reinforcement Learning Algorithms Explained

28 Nov 2022

towardsdatascience.com

Introduction to reinforcement learning terminologies, basics, and concepts (model-free, model-based, online, offline RL)

Hands on introduction to reinforcement learning in Python

18 Jul 2022

towardsdatascience.com

One of the biggest barriers to traditional machine learning is that most supervised and unsupervised machine learning algorithms need huge amounts of data to be useful in real world use cases. Even…

Mastering the Game of Stratego with Model-Free Multiagent...

11 Jul 2022

arxiv.org

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board...

scikit-and-tensorflow-workbooks/ch16-reinforcement-learning.ipynb at master · bjpcjp/scikit-and-tensorflow-workbooks

16 Jan 2022

github.com

based on "Hands-On Machine Learning with Scikit-Learn & TensorFlow" (O'Reilly, Aurelien Geron) - bjpcjp/scikit-and-tensorflow-workbooks

Reinforcement learning: The next great AI tech moving from the lab to the r

30 Mar 2021

venturebeat.com

Reinforcement learning (RL) is a powerful type of AI technology that can learn strategies to optimally control large, complex systems.

Reinforcement Learning Explained Visually (Part 6): Policy Gradients, step-

15 Jan 2021

towardsdatascience.com

A Gentle Guide to the REINFORCE algorithm, in Plain English

N-step Bootstrapping. This is part 7 of the RL tutorial… | by Sagi Shaier |

29 Nov 2020

towardsdatascience.com

This is part 7 of the RL tutorial series that will provide an overview of the book “Reinforcement Learning: An Introduction. Second…

Introduction to Various Reinforcement Learning Algorithms - DataScienceCentral.com

19 Feb 2020

datasciencecentral.com

This article was written by Steeve Huang. Reinforcement Learning (RL) refers to a kind of Machine Learning method in which the agent receives a delayed reward in the next time step to evaluate its previous action. It was mostly used in games (e.g. Atari, Mario), with performance on par with or even exceeding humans. Recently,…