
Google’s active learning method fine-tunes LLMs with 10,000x less data using high-fidelity expert-labeled examples

Thinking about High-Quality Human Data | Lil'Log
22 Feb 2024
lilianweng.github.io

[Special thank you to Ian Kivlichan for many useful pointers (e.g. the 100+ year old Nature paper “Vox populi”) and nice feedback. 🙏 ] High-quality data is the fuel for modern deep learning model training. Most task-specific labeled data comes from human annotation, such as classification tasks or RLHF labeling (which can be framed as a classification task) for LLM alignment training. Many of the ML techniques in the post can help with data quality, but fundamentally human data collection requires attention to detail and careful execution.


Python library for computer vision labeling tasks. The core functionality is to translate bounding box annotations between different formats, for example from COCO to YOLO. - GitHub - pylabel-proj...
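
As a rough illustration of what such a format translation involves, here is a minimal sketch in plain Python. It does not use the library's API; the function names `coco_to_yolo` and `yolo_to_coco` are hypothetical. It assumes the usual conventions: a COCO box is `[x_min, y_min, width, height]` in pixels, and a YOLO box is `[x_center, y_center, width, height]` normalized by the image size.

```python
def coco_to_yolo(bbox, img_width, img_height):
    """Convert a COCO box (x_min, y_min, width, height) in pixels
    to a YOLO box (x_center, y_center, width, height) in [0, 1]."""
    x_min, y_min, width, height = bbox
    x_center = (x_min + width / 2) / img_width
    y_center = (y_min + height / 2) / img_height
    return (x_center, y_center, width / img_width, height / img_height)


def yolo_to_coco(bbox, img_width, img_height):
    """Inverse conversion: normalized YOLO box back to a pixel COCO box."""
    x_center, y_center, width, height = bbox
    box_width = width * img_width
    box_height = height * img_height
    x_min = x_center * img_width - box_width / 2
    y_min = y_center * img_height - box_height / 2
    return (x_min, y_min, box_width, box_height)


if __name__ == "__main__":
    # A 200x100 px box whose top-left corner is at (100, 50) in a 400x200 image.
    yolo_box = coco_to_yolo((100, 50, 200, 100), img_width=400, img_height=200)
    print(yolo_box)
    print(yolo_to_coco(yolo_box, img_width=400, img_height=200))
```

A real annotation file also carries class ids and image metadata, which a dedicated library handles; the sketch covers only the coordinate arithmetic.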


If an AI model can make decisions on the company’s behalf through products and services, that model is essentially their competitive edge.

Add Labels to a Dataset for Sentiment Analysis
28 Nov 2021
thecleverprogrammer.com

In this article, I will present a tutorial on how to add labels to a dataset for sentiment analysis using Python. Adding labels to a dataset.


How does Semi-Supervised Machine Learning work, and how to use it in Python?


One of the best labelling tools I have ever used.

Layered Label Propagation Algorithm
19 Apr 2020
towardsdatascience.com

An algorithm for community finding


Learn about different types of annotations, annotation formats and annotation tools

Using Snorkel For Multi-Label Annotation.
18 Mar 2020
towardsdatascience.com

How to use Snorkel’s multi-class implementation to create multi-labels

How to Label 1M Data Points per Week
14 Dec 2019
scale.com

Hi, Reddit. I'm excited to share confident learning for characterizing, finding, and learning with label errors in datasets…
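
To give a feel for the idea, here is a simplified sketch of how confident learning can flag likely label errors. It is not the linked post's code or any library's API. It assumes you already have out-of-sample predicted probabilities for each example, and the helper names `class_thresholds` and `find_label_issues` are hypothetical. Each class gets a threshold equal to the mean predicted probability of that class over the examples labeled with it. An example is flagged when some class other than its given label clears its threshold and is the most probable such class.

```python
def class_thresholds(probs, labels, num_classes):
    """Per-class threshold: mean predicted probability of class j
    over the examples whose given label is j."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for p, y in zip(probs, labels):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled examples gets threshold 1.0 (an assumption
    # of this sketch), so it is almost never counted as "confident".
    return [sums[j] / counts[j] if counts[j] else 1.0 for j in range(num_classes)]


def find_label_issues(probs, labels):
    """Return indices of examples whose given label disagrees with
    the class the model is confident about."""
    num_classes = len(probs[0])
    thresholds = class_thresholds(probs, labels, num_classes)
    issues = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        confident = [j for j in range(num_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # the model is not confident about any class here
        likely = max(confident, key=lambda j: p[j])
        if likely != y:
            issues.append(i)
    return issues


if __name__ == "__main__":
    # Toy data: the last example is labeled 1, but the model strongly predicts 0.
    probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.95, 0.05]]
    labels = [0, 0, 1, 1, 1]
    print(find_label_issues(probs, labels))
```

The full method also estimates the joint distribution of given and true labels and offers several pruning strategies; this sketch keeps only the thresholding step.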