reinforcement-learning

A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to these releases were relatively muted. Why? One reason could be that GPT-4.5 and Llama 4 remain conventional models, which means they were trained without explicit reinforcement learning for reasoning. However, OpenAI's recent release of the o3 reasoning model demonstrates there is still considerable room for improvement when investing compute strategically, specifically via reinforcement learning methods tailored for reasoning tasks. While reasoning alone isn't a silver bullet, it has so far reliably improved model accuracy and problem-solving capabilities on challenging tasks. I expect reasoning-focused post-training to become standard practice in future LLM pipelines. So, in this article, let's explore the latest developments in reasoning via reinforcement learning.

Q-learning is a model-free reinforcement learning algorithm that enables agents to learn optimal actions through interaction with their environment.
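
To make the blurb above concrete, here is a minimal sketch of tabular Q-learning. It is not taken from the linked article: the 5-state chain environment, the function name `q_learning_chain`, and all hyperparameter values are assumptions chosen only for illustration. The agent never sees a model of the environment; it only observes the next state and reward after each action, which is what "model-free" refers to.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    The agent starts in state 0 and receives reward 1 on reaching the last
    state; every other transition gives reward 0. (Illustrative setup only.)
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[s][a] holds the action-value estimate; a = 0 moves left, a = 1 moves right.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on steps per episode
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Environment step: the agent only observes s_next and r.
            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            done = s_next == goal
            r = 1.0 if done else 0.0

            # Q-learning update: bootstrap from the best next-state action.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if done:
                break
    return q


q_table = q_learning_chain()
# Greedy action for each non-terminal state (0 = left, 1 = right).
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a])
                 for s in range(len(q_table) - 1)]
print(greedy_policy)
```

Because the update bootstraps from `max(q[s_next])` rather than from the action actually taken next, the learned values track the greedy policy even while the agent explores, which is why Q-learning is usually described as off-policy.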

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning." - MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

The Reinforcement Learning from Human Feedback Book

Teaching a Car to Cross a Mountain using Policy Gradient Methods in Python: A Mathematical Deep Dive into Reinforcement Learning

Reinforcement learning (RL) is a fascinating field of AI focused on training agents to make decisions by interacting with an environment and learning from rewards and penalties. RL differs from supervised learning because the agent learns by acting rather than from a static dataset. Let’s delve into the core principles of RL and explore its applications in game playing, robot control, and resource management.
Principles of Reinforcement Learning
Agent and Environment: In RL, the agent is the learner or decision-maker interacting with the environment. The environment provides context to the agent, affecting its decisions and providing feedback through rewards or penalties.

1) Reinforcement Learning with Human Feedback (RLHF), 2) The RLHF paper, 3) The transformer reinforcement learning framework.

How reinforcement learning with human feedback helps ensure that businesses are building ethical generative AI models.

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Study tries to settle a bitter disagreement over Google’s chip design AI

Music recommender systems are an integral part of our daily life. Recent research has seen a significant effort around black-box recommender based approaches such as Deep Reinforcement Learning...

Introduction to reinforcement learning terminologies, basics, and concepts (model-free, model-based, online, offline RL)

One of the biggest barriers to traditional machine learning is that most supervised and unsupervised machine learning algorithms need huge amounts of data to be useful in real world use cases. Even…

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board...

based on "Hands-On Machine Learning with Scikit-Learn & TensorFlow" (O'Reilly, Aurelien Geron) - bjpcjp/scikit-and-tensorflow-workbooks

Reinforcement learning (RL) is a powerful type of AI technology that can learn strategies to optimally control large, complex systems.

A Gentle Guide to the REINFORCE algorithm, in Plain English

This is part 7 of the RL tutorial series that will provide an overview of the book “Reinforcement Learning: An Introduction. Second…

Introduction to Various Reinforcement Learning Algorithms: This article was written by Steeve Huang. Reinforcement Learning (RL) refers to a kind of Machine Learning method in which the agent receives a delayed reward in the next time step to evaluate its previous action. It was mostly used in games (e.g. Atari, Mario), with performance on par with or even exceeding humans. Recently,…

Reinforcement Learning (RL) refers to a kind of Machine Learning method in which the agent receives a delayed reward in the next time step…

I recently became interested in learning more about deep reinforcement learning. Recently, big news headlines were made as deep…