nlp | The Mud Dauber Chronicles

What exactly does word2vec learn?

7 Nov 2025

bair.berkeley.edu

The BAIR Blog

5 Cutting-Edge Natural Language Processing Trends Shaping 2026

24 Sep 2025

kdnuggets.com

In this article, we discuss five cutting-edge NLP trends that will shape 2026.

How to Build a Complete End-to-End NLP Pipeline with Gensim: Topic Modeling, Word Embeddings, Semantic Search, and Advanced Text Analysis

5 Sep 2025

marktechpost.com

explosion/spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python

27 Aug 2025

github.com

💫 Industrial-strength Natural Language Processing (NLP) in Python - explosion/spaCy

Where Does Meaning Live in a Sentence? Math Might Tell Us. | Quanta Magazine

9 Apr 2025

quantamagazine.org

The mathematician Tai-Danae Bradley is using category theory to try to understand both human and AI-generated language.

What is METEOR score? - Dataconomy

2 Apr 2025

dataconomy.com

METEOR Score is a metric used to evaluate the quality of machine translation based on precision, recall, word alignment, and linguistic flexibility.

What is stemming? - Dataconomy

13 Mar 2025

dataconomy.com

Stemming is the linguistic process of reducing words to their base form, ignoring prefixes and suffixes, to enhance clarity and information retrieval.
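
The suffix-stripping idea behind stemming can be sketched in a few lines. This is a minimal illustration, not the algorithm from the linked article: the suffix list and the `naive_stem` helper are hypothetical, it handles suffixes only, and real stemmers such as Porter's apply ordered rule sets with many more conditions.

```python
# Hypothetical suffix list, for illustration only; checked in this order.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters of stem."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


print([naive_stem(w) for w in ["jumping", "jumped", "jumps", "quickly", "bus"]])
# → ['jump', 'jump', 'jump', 'quick', 'bus']
```

The minimum-length guard is what keeps a short word like "bus" from being reduced to "bu"; crude rules like these still over- and under-stem many words.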

A Step by Step Guide to Build a Trend Finder Tool with Python: Web Scraping, NLP (Sentiment Analysis & Topic Modeling), and Word Cloud Visualization

9 Mar 2025

marktechpost.com


Text Preprocessing and Feature Engineering with spaCy

12 Feb 2025

statology.org

In this article, we’ll focus on how to prepare text data for machine learning and statistical modeling using spaCy.

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch

18 Jan 2025

sebastianraschka.com

This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm, which is used in models like GPT-2 to GPT-4, Llama 3,...

The 2025 AI Engineering Reading List

14 Jan 2025

latent.space

We picked 50 paper/models/blogs across 10 fields in AI Eng: LLMs, Benchmarks, Prompting, RAG, Agents, CodeGen, Vision, Voice, Diffusion, Finetuning. If you're starting from scratch, start here.

Understanding the BM25 full text search algorithm

20 Nov 2024

emschwartz.me

BM25 is a widely used algorithm for full text search. I wanted to understand how it works, so here is my attempt at understanding by re-explaining.

Hugging Face Releases Sentence Transformers v3.3.0: A Major Leap for NLP Efficiency

11 Nov 2024

marktechpost.com

Natural Language Processing (NLP) has rapidly evolved in the last few years, with transformers emerging as a game-changing innovation. Yet, there are still notable challenges when using NLP tools to develop applications for tasks like semantic search, question answering, or document embedding. One key issue has been the need for models that not only perform well but also work efficiently on a range of devices, especially those with limited computational resources, such as CPUs. Models tend to require substantial processing power to yield high accuracy, and this trade-off often leaves developers choosing between performance and practicality. Additionally, deploying large models

Python libs for sentiment analysis

11 Nov 2024

marktechpost.com

Sentiment analysis, i.e., determining the emotional tone of a text, has become a crucial tool for researchers, developers, and businesses to comprehend social media trends, consumer feedback, and other topics. With its robust library ecosystem, Python provides a vast choice of tools to improve and streamline sentiment analysis processes. Through the use of these libraries, data scientists can easily create precise sentiment models using pre-trained models and sophisticated machine learning frameworks. In this post, the top 12 Python sentiment analysis libraries have been discussed, emphasizing their salient characteristics, advantages, and uses. TextBlob A popular Python sentiment analysis toolkit, TextBlob is

CS388: Natural Language Processing

12 May 2024

cs.utexas.edu

Cleaning

11 May 2024

docs.unstructured.io

As part of data preparation for an NLP model, it’s common to need to clean up your data prior to passing it into the model. If there’s unwanted content in your output, for example, it could impact the quality of your NLP model. To help with this, the `unstructured` library includes cleaning functions to help users sanitize output before sending it to downstream applications.

Unlocking the Best Tokenization Strategies: How Greedy Inference and SaGe L

18 Mar 2024

marktechpost.com

The inference method is crucial for NLP models in subword tokenization. Methods like BPE, WordPiece, and UnigramLM offer distinct mappings, but their performance differences must be better understood. Implementations like Huggingface Tokenizers often need to be clearer or limit inference choices, complicating compatibility with vocabulary learning algorithms. Whether a matching inference method is necessary or optimal for tokenizer vocabularies is uncertain. Previous research focused on developing vocabulary construction algorithms such as BPE, WordPiece, and UnigramLM, exploring optimal vocabulary size and multilingual vocabularies. Some studies examined the effects of vocabularies on downstream performance, information theory, and cognitive plausibility. Limited work on

Speech and Language Processing

12 Mar 2024

web.stanford.edu

Beyond Self-Attention: How a Small Language Model Predicts the Next Token

22 Feb 2024

shyam.blog

A deep dive into the internals of a small transformer model to learn how it turns self-attention calculations into accurate predictions for the next token.

Primers • AI

24 Sep 2023

aman.ai

Aman's AI Journal | Course notes and learning material for Artificial Intelligence and Deep Learning Stanford classes.

A Taxonomy of Natural Language Processing

24 Sep 2023

towardsdatascience.com

An overview of different fields of study and recent developments in NLP

Introduction to Vector Similarity Search

14 Jul 2023

zilliz.com

Learn what vector search is and the metrics pertinent to decide the distance (or similarity) between objects.
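
Two of the metrics commonly used for this kind of comparison, Euclidean distance and cosine similarity, can be sketched in plain Python. The function names and sample vectors below are assumptions made for illustration and are not taken from the linked article.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between vectors; ignores magnitude."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # same direction as query, twice the length
orthogonal = [0.0, 3.0]       # perpendicular to query

print(cosine_similarity(query, same_direction))   # → 1.0
print(euclidean_distance(query, same_direction))  # → 1.0
print(cosine_similarity(query, orthogonal))       # → 0.0
```

The first pair shows why the choice of metric matters: the two vectors point the same way (cosine similarity 1.0) yet are still one unit apart in Euclidean terms.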

4 Ways to Do Question Answering in LangChain

14 Apr 2023

towardsdatascience.com

Chat with your long PDF docs: load_qa_chain, RetrievalQA, VectorstoreIndexCreator, ConversationalRetrievalChain

Hacker News

31 Mar 2023

johanwind.github.io

I explain what is so unique about the RWKV language model.

Dalai

15 Mar 2023

cocktailpeanut.github.io

Dead simple way to run LLaMA on your computer

Meta unveils a new large language model that can run on a single GPU

25 Feb 2023

arstechnica.com

LLaMA-13B reportedly outperforms ChatGPT-like tech despite being 10x smaller.

How to Generate GPT Output in JSON Format for Ruby developers

14 Jan 2023

dev.to

I was playing around with OpenAI (GPT-3) today, building a reasonably complicated email parser for a...

Dense Vectors | Pinecone

28 Dec 2022

pinecone.io

ChatGPT and the Imagenet moment — Benedict Evans

16 Dec 2022

ben-evans.com

The wave of enthusiasm around generative networks feels like another Imagenet moment - a step change in what ‘AI’ can do that could generalise far beyond the cool demos. What can it create, and where are the humans in the loop?

2212.03551.pdf

11 Dec 2022

arxiv.org

AI Homework – Stratechery by Ben Thompson

7 Dec 2022

stratechery.com

The first obvious casualty of large language models is homework: the real training for everyone, though, and the best way to leverage AI, will be in verifying and editing information.

Beginner’s Guide to Diffusion Models

7 Dec 2022

towardsdatascience.com

An intuitive understanding of how AI-generated art is made by Stable Diffusion, Midjourney, or DALL-E

What do countries talk about at the UN General Debate — Topic modelings using LDA.

23 Nov 2022

towardsdatascience.com

The intuition behind LDA and its limitations, along with a Python implementation using Gensim.

5 Linguistics Courses for NLP Practitioners

21 Nov 2022

kdnuggets.com

This collection of 5 courses is intended to help NLP practitioners or hopefuls acquire some of their lacking linguistics knowledge.

Converting Text Documents to Token Counts with CountVectorizer

19 Oct 2022

kdnuggets.com

The post explains the significance of CountVectorizer and demonstrates its implementation with Python code.

Interview: Why Mastering Language Is So Difficult for AI

17 Oct 2022

undark.org

Scientist Gary Marcus argues that “deep learning” is not the only path to true artificial intelligence.

Topic Modeling with LSA, pLSA, LDA, NMF, BERTopic, Top2Vec: a Comparison

14 Oct 2022

towardsdatascience.com

A comparison between different topic modeling strategies including practical Python examples

7 spaCy Features To Boost Your NLP Pipelines And Save Time

24 Aug 2022

towardsdatascience.com

I’ve never used spaCy beyond simple named entity recognition tasks. Boy was I wrong.

Visualizing Part-of-Speech Tags with NLTK and SpaCy

19 Aug 2022

towardsdatascience.com

Customizing displaCy’s entity visualizer

Minerva: Solving Quantitative Reasoning Problems with Language Models

5 Jul 2022

ai.googleblog.com

Posted by Ethan Dyer and Guy Gur-Ari, Research Scientists, Google Research, Blueshift Team Language models have demonstrated remarkable performance...

Generating Children's Stories Using GPT-3 and DALL·E

5 Jul 2022

surgehq.ai

We used GPT-3 and DALL·E to generate a children's storybook about Ash and Pikachu vs. Team Rocket. Read the story and marvel at the AI-generated visuals!

GitHub Copilot · Your AI pair programmer

22 Jun 2022

github.com

GitHub Copilot works alongside you directly in your editor, suggesting whole lines or entire functions for you.

snakers4/silero-models: Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

21 Jun 2022

github.com

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple - snakers4/silero-models

YouTube

27 May 2022

t.co

Share your videos with friends, family, and the world

Natural Language Processing with Transformers Book

23 Mar 2022

transformersbook.com

AI Virtual Assistant Technology Guide 2022

21 Mar 2022

dev.to

They can help you get an appointment or order a pizza, find the best ticket deals and bring your...

Text Summarization with NLP: TextRank vs Seq2Seq vs BART

17 Mar 2022

towardsdatascience.com

Natural Language Processing with Python, Gensim, Tensorflow, Transformers

Topic Modeling in Python | Toptal

11 Feb 2022

toptal.com

Topic modeling can bring NLP to the next level. Here’s how.

NLP_workshop/NLP_demo.ipynb at master · bjpcjp/NLP_workshop

17 Jan 2022

github.com

NLP demo code, largely based on https://github.com/hundredblocks/concrete_NLP_tutorial - bjpcjp/NLP_workshop

bjpcjp/gensim-lessons

16 Jan 2022

github.com

Working files for gensim NLP tutorials.

spaCy_hello_world/spaCy_101.ipynb at master · bjpcjp/spaCy_hello_world

16 Jan 2022

github.com

Beginner's tour of spaCy v2.0.

Add Labels to a Dataset for Sentiment Analysis

28 Nov 2021

thecleverprogrammer.com

In this article, I will present a tutorial on how to add labels to a dataset for sentiment analysis using Python. Adding labels to a dataset.

INTRODUCTION TO SPACY 3 — Introduction to spaCy 3

1 Oct 2021

spacy.pythonhumanities.com

Mastering spaCy | Data | eBook

7 Sep 2021

packtpub.com

An end-to-end practical guide to implementing NLP applications using the Python ecosystem.

Tokenization Algorithms Explained

5 Aug 2021

towardsdatascience.com

A one-stop-shop for all your tokenization needs

Semantic Search: Measuring Meaning From Jaccard to Bert

3 Jul 2021

pinecone.io

Similarity search is one of the fastest-growing domains in AI and machine learning. At its core, it is the process of matching relevant pieces of information together.

Wu Dao 2.0: A Monster of 1.75 Trillion Parameters | by Alberto Romero | Med

7 Jun 2021

towardsdatascience.com

BAAI conference presented Wu Dao 2.0. The most powerful AI to date.

Sentiment Analysis — Comparing 3 Common Approaches: Naive Bayes, LSTM, and

31 May 2021

towardsdatascience.com

Sentiment Analysis, or Opinion Mining, is a subfield of NLP (Natural Language Processing) that aims to extract attitudes, appraisals, opinions, and emotions from text. Inspired by the rapid migration…

Redirect

31 May 2021

fast.ai

http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

29 May 2021

wildml.com

Attention and Memory in Deep Learning and NLP – WildML

29 May 2021

wildml.com

Understanding Transformers, the machine learning model behind GPT-3

22 May 2021

thenextweb.com

How this novel neural network architecture changes the way we analyze complex data types, and powers revolutionary models like GPT-3 and BERT.

Language models like GPT-3 could herald a new type of search engine

18 May 2021

technologyreview.com

The way we search online hasn’t changed in decades. A new idea from Google researchers could make it more like talking to a human expert

How Transformers work in deep learning and NLP: an intuitive introduction | AI Summer

18 May 2021

theaisummer.com

An intuitive understanding of Transformers and how they are used in Machine Translation. After analyzing all subcomponents one by one, such as self-attention and positional encodings, we explain the principles behind the Encoder and Decoder and why Transformers work so well.

Deploy an NLP pipeline. Flask Heroku Bert.

1 May 2021

towardsdatascience.com

A simple, quick solution for deploying an NLP project and the challenges you may face during the process.

Arabic NLP: Unique Challenges and Their Solutions

8 Apr 2021

towardsdatascience.com

Pre-processing Arabic text for machine-learning using the camel-tools Python package

Foundations of NLP Explained Visually: Beam Search, How it Works | by Ketan

4 Apr 2021

towardsdatascience.com

A Gentle Guide to how Beam Search enhances predictions, in Plain English

State of the art NLP at scale with RAPIDS, HuggingFace and Dask

4 Apr 2021

medium.com

See how to build end-to-end NLP pipelines in a fast and scalable way on GPUs — from feature engineering to inference.

OpenAI’s text-generating system GPT-3 is now spewing out 4.5 billion words

30 Mar 2021

theverge.com

Text-generation is the next big thing in AI.

Using spaCy 3.0 to build a custom NER model

19 Mar 2021

gist.github.com

blog.md · GitHub

GPT-3: We’re at the very beginning of a new app ecosystem

27 Feb 2021

venturebeat.com

The NLP application ecosystem is in its earliest stages, and it's not yet clear whether GPT-3 or a different model will be the foundation.

How to Tell Stories with Sentiment Analysis

10 Feb 2021

towardsdatascience.com

A journalist’s attempt at introducing math to the newsroom while analyzing QAnon

10 NLP Terms Every Data Scientist Should Know

7 Feb 2021

towardsdatascience.com

Knowing the terminology is essential to understanding any tutorial.

Achieving High-Quality Search and Recommendation Results with DeepNLP

5 Feb 2021

developer.nvidia.com

Speech and natural language processing (NLP) have become the foundation for most of the AI development in the enterprise today, as textual data represents a significant portion of unstructured content.

12 Twitter Sentiment Analysis Algorithms Compared

1 Feb 2021

towardsdatascience.com

12 sentiment analysis algorithms were compared on the accuracy of tweet classification. The fastText deep learning system was the winner.

Release v3.0.0: Transformer-based pipelines, new training system, project t

1 Feb 2021

github.com

📣 NEW: Want to make the transition from spaCy v2 to spaCy v3 as smooth as possible for you and your organization? We're now offering commercial migration support for your spaCy pipelines! We'...

Cross-Topic Argument Mining: Learning How to Classify Texts

27 Jan 2021

towardsdatascience.com

Classifying cross-topic natural language texts based on their argumentative structure using deep learning

6 NLP Techniques Every Data Scientist Should Know

21 Jan 2021

towardsdatascience.com

Towards more efficient natural language processing

Browse the State-of-the-Art in Machine Learning | Papers With Code

21 Dec 2020

paperswithcode.com

Sentiment Analysis is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either "positive", "negative", or "neutral". Given the text and accompanying labels, a model can be trained to predict the correct sentiment. Sentiment Analysis techniques can be categorized into machine learning approaches, lexicon-based approaches, and even hybrid methods. Some subcategories of research in sentiment analysis include: multimodal sentiment analysis, aspect-based sentiment analysis, fine-grained opinion analysis, language specific sentiment analysis. More recently, deep learning techniques, such as RoBERTa and T5, are used to train high-performing sentiment classifiers that are evaluated using metrics like F1, recall, and precision. To evaluate sentiment analysis systems, benchmark datasets like SST, GLUE, and IMDB movie reviews are used. Further reading: Sentiment Analysis Based on Deep Learning: A Comparative Study (https://paperswithcode.com/paper/sentiment-analysis-based-on-deep-learning-a)

A Beginner’s Guide to Use BERT for the First Time

18 Dec 2020

towardsdatascience.com

From predicting single sentence to fine-tuning using custom dataset to finding the best hyperparameter configuration.

Linguistic Fundamentals for Natural Language Processing: 100 Essentials fro

18 Dec 2020

kdnuggets.com

Algorithms for text analytics must model how language works to incorporate meaning in language—and so do the people deploying these algorithms. Bender & Lascarides 2019 is an accessible overview of what the field of linguistics can teach NLP about how meaning is encoded in human languages.

A version of the BERT language model that’s 20 times as fast

10 Dec 2020

amazon.science

Determining the optimal architectural parameters reduces network size by 84% while improving performance on natural-language-understanding tasks.

Overview of tokenization algorithms in NLP

29 Nov 2020

towardsdatascience.com

Introduction to tokenization methods, including subword, BPE and SentencePiece

Amazon shifts some Alexa and Rekognition computing to its own Inferentia ch

13 Nov 2020

venturebeat.com

Amazon said it shifted part of the computing for its Alexa voice assistant to its own custom-designed chips.

GPT-3, transformers and the wild world of NLP

3 Nov 2020

towardsdatascience.com

A review of 20+ deep learning NLP models and how to use them well

Compare Amazon Textract with Tesseract OCR — OCR & NLP Use Case

3 Nov 2020

towardsdatascience.com

Comparison of two known engines for optical character recognition (OCR) and Natural Language Processing

AI devs created a lean, mean, GPT-3-beating machine that uses 99.9% fewer p

3 Nov 2020

thenextweb.com

AI researchers from the Ludwig Maximilian University (LMU) of Munich have developed a bite-sized text generator capable of besting OpenAI’s state of the art GPT-3 using only a tiny fraction of its parameters. GPT-3 is a monster of an AI sys

AI Democratization in the Era of GPT-3

3 Nov 2020

thegradient.pub

What does Microsoft getting an "exclusive license" to GPT-3 mean for the future of AI democratization?

3 Natural Language Processing Tools From AWS to Python

3 Nov 2020

dev.to

Photo by Eric Krull on Unsplash. Parsing and processing documents can provide a lot of value for alm...

Hedonometer

10 Aug 2020

hedonometer.org

Hedonometer.org is an instrument that measures the happiness of large populations in real time. The hedonometer is based on people’s online expressions, capitalizing on data-rich social media, and measures how people present themselves to the outside world.

Part 4: Semantic Segmentation

24 Jun 2020

towardsdatascience.com

Identifying Change Points in Time Series Data with FLUSS and FLOSS

Comment Ranking Algorithms: Hacker News vs. YouTube vs. Reddit

1 Jun 2020

amacfie.github.io

Short technical information about Word2Vec, GloVe and Fasttext

1 Jun 2020

towardsdatascience.com

Introduction

Evolution of Language Models: N-Grams, Word Embeddings, Attention & Transfo

20 May 2020

towardsdatascience.com

This post collates research on the advancements of Natural Language Processing (NLP) over the years.

Good Grams: How to Find Predictive N-Grams for your Problem

15 May 2020

towardsdatascience.com

Figuring out what words are predictive for your problem is easy!

Natural Language Processing Recipes: Best Practices and Examples

6 May 2020

kdnuggets.com

Here is an overview of another great natural language processing resource, this time from Microsoft, which demonstrates best practices and implementation guidelines for a variety of tasks and scenarios.

Python Libraries for Natural Language Processing - Towards Data Science

28 Apr 2020

towardsdatascience.com

An overview of popular Python libraries for Natural Language Processing

The Cost of Training NLP Models: A Concise Overview

24 Apr 2020

arxiv.org

We review the cost of training large-scale language models, and the drivers of these costs. The intended audience includes engineers and scientists budgeting their model-training experiments, as...

Twitter Sentiment Analysis with Node.js

19 Apr 2020

towardsdatascience.com

How people are talking about your brand

Topic Modeling Articles with NMF

19 Apr 2020

towardsdatascience.com

Extracting topics is a good unsupervised data-mining technique to discover the underlying relationships between texts. There are many…

TLDR This - Article Summarizer & Online Text Summarizing Tool

1 Apr 2020

tldrthis.com

TLDR This is a Free online text summarizing tool that automatically condenses long articles, documents, essays, or papers into key summary paragraphs using state-of-the-art AI.

google-research/bert: TensorFlow code and pre-trained models for BERT

1 Apr 2020

github.com

TensorFlow code and pre-trained models for BERT.

The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) – J

1 Apr 2020

jalammar.github.io

2021 Update: I created this brief and highly accessible video intro to BERT. The year 2018 has been an inflection point for machine learning models handling text (or more accurately, Natural Language Processing or NLP for short). Our conceptual understanding of how best to represent words and sentences in a way that best captures underlying meanings and relationships is rapidly evolving. Moreover, the NLP community has been putting forward incredibly powerful components that you can freely download and use in your own models and pipelines (It’s been referred to as NLP’s ImageNet moment, referencing how years ago similar developments accelerated the development of machine learning in Computer Vision tasks).

[Project] nlp-tutoral repository who is studying NLP(Natural Language Proce

24 Mar 2020

reddit.com

219 votes, 26 comments. Hello. This is my first post in reddit I created nlp-tutoral repository who is studying NLP(Natural Language Processing)…

Why BERT Fails in Commercial Environments - Intel AI

24 Mar 2020

intel.ai

Build a BERT Sci-kit Transformer

20 Mar 2020

towardsdatascience.com

BERT can get you state-of-the-art results on many NLP tasks and it only takes a few lines of code.

Beating the baseline recommender using Graph and NLP techniques in PyTorch

9 Mar 2020

towardsdatascience.com

Decoding NLP Attention Mechanisms

9 Mar 2020

topbots.com

Arguably more famous today than Michael Bay’s Transformers, the transformer architecture and transformer-based models have been breaking all kinds of state-of-the-art records. They are (rightfully) getting the attention of a big portion of the deep learning community and researchers in Natural Language Processing (NLP) since their introduction in 2017 by the Google Translation Team. This architecture has set […]

Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found

9 Mar 2020

medium.com

By popular demand, I’ve updated this article with the latest tutorials from the past 12 months. Check it out here

The Big Bad NLP Database: Access Nearly 300 Datasets

9 Mar 2020

kdnuggets.com

Check out this database of nearly 300 freely-accessible NLP datasets, curated from around the internet.

Vincent Boucher on LinkedIn: #transformer #bert #nlp

9 Mar 2020

linkedin.com

Pre-training SmallBERTa - A tiny model to train on a tiny dataset An end to end colab notebook that allows you to train your own LM (using HuggingFace…

Quick Introduction to Sentiment Analysis

9 Mar 2020

towardsdatascience.com

What is sentiment analysis, how to perform it, and how it can help your business.

A Comprehensive Guide to Natural Language Generation

19 Feb 2020

kdnuggets.com

Follow this overview of Natural Language Generation covering its applications in theory and practice. The evolution of NLG architecture is also described from simple gap-filling to dynamic document creation along with a summary of the most popular NLG models.

How to train a new language model from scratch using Transformers and Tokenizers

19 Feb 2020

huggingface.co

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

A list of beginner-friendly NLP projects–using pre-trained models

19 Feb 2020

towardsdatascience.com

Build software with machine learning — no math required.

Top NLP Research Papers With Business Applications From ACL 2019

19 Feb 2020

topbots.com

This year’s annual meeting of the Association for Computational Linguistics (ACL 2019) was bigger than ever. Although the conference received 75% more submissions than last year, the quality of the research papers remained high, and so the acceptance rates are almost the same. It is becoming more and more challenging to keep track of the […]

[N] HuggingFace releases ultra-fast tokenization library for deep-learning

19 Feb 2020

reddit.com

336 votes, 25 comments. Huggingface, the NLP research company known for its transformers library, has just released a new open-source library for…

An Introductory Guide to NLP for Data Scientists with 7 Common Techniques

19 Feb 2020

kdnuggets.com

Data Scientists work with tons of data, and many times that data includes natural language text. This guide reviews 7 common techniques with code examples to introduce you to the essentials of NLP, so you can begin performing analysis and building models from textual data.

Exploratory Data Analysis for Natural Language Processing: A Complete Guide to Python Tools

19 Feb 2020

neptune.ai

Explore NLP EDA with Python tools: learn about text statistics, ngrams, topic modeling with pyLDAvis, sentiment analysis, and more

Turing-NLG: A 17-billion-parameter language model by Microsoft

19 Feb 2020

microsoft.com

This figure was adapted from a similar image published in DistilBERT. Turing Natural Language Generation (T-NLG) is a 17 billion parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a demo of the model, including its freeform generation, question answering, and summarization capabilities, to academics […]

Top NLP Algorithms & Concepts - DataScienceCentral.com

19 Feb 2020

datasciencecentral.com

Today, one of the most popular tasks in Data Science is processing information presented in the text form. Exactly this is text representation in the form of mathematical equations, formulas, paradigms, patterns in order to understand the text semantics (content) for its further processing: classification, fragmentation, etc. The general area which solves the described problems…

10 Common NLP Terms Explained for the Text Analysis Novice - DataScienceCentral.com

19 Feb 2020

datasciencecentral.com

If you’re relatively new to the NLP and Text Analysis world, you’ll more than likely have come across some pretty technical terms and acronyms, that are challenging to get your head around, especially, if you’re relying on scientific definitions for a plain and simple explanation. We decided to put together a list of 10 common…

Serving GPT-2 in Google Cloud Platform

16 Feb 2020

medium.com

A CloudOps Journey

nlp-recipes/README.md at master · microsoft/nlp-recipes

14 Dec 2019

github.com

Natural Language Processing Best Practices & Examples - microsoft/nlp-recipes

Lit BERT: NLP Transfer Learning In 3 Steps - Towards Data Science

14 Dec 2019

towardsdatascience.com

In this tutorial we learn to quickly train Huggingface BERT using PyTorch Lightning for transfer learning on any NLP task

Hugging Face – The AI community building the future.

14 Dec 2019

huggingface.co

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

BERT Explained: A Complete Guide with Theory and Tutorial

5 Dec 2019

towardsml.com

Unless you have been out of touch with the Deep Learning world, chances are that you have heard about BERT — it has been the talk of the town for the last one year. At the end of 2018 researchers …

How ‘Embeddings’ Encode What Words Mean — Sort Of

24 Sep 2019

quantamagazine.org

Machines work with words by embedding their relationships with other words in a string of numbers.

Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters

30 Aug 2019

towardsdatascience.com

From BERT’s tangled web of attention, some intuitive patterns emerge.

Advanced NLP with spaCy · A free online course

29 Aug 2019

course.spacy.io

spaCy is a modern Python library for industrial-strength Natural Language Processing. In this free and interactive online course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.

Text Analytics

29 Aug 2019

monkeylearn.com

Medallia's text analytics software tool provides actionable insights via customer and employee experience sentiment data analysis from reviews & comments.

A tour of awesome features of spaCy (part 1/2) – Eliiza-AI – Medium

27 Aug 2019

medium.com

A few weeks ago I started working on a text summarisation project and I needed a Natural Language Processing library with comprehensive…

Word2vec: fish music = bass | graceavery

20 Aug 2019

graceavery.com

A tour of awesome features of spaCy (part 2/2) - Eliiza-AI - Medium

20 Aug 2019

medium.com

In the first part of this overview of spaCy we went over the features of the large English pretrained model that spaCy comes with. In this…

Introducing spaCy v2.1 · Blog · Explosion AI

1 Apr 2019

explosion.ai

Version 2.1 of the spaCy Natural Language Processing library includes a huge number of features, improvements and bug fixes. In this post, we highlight some of the things we're especially pleased with, and explain some of the most challenging parts of preparing this big release.

The Illustrated Word2vec

29 Mar 2019

jalammar.github.io

“There is in all things a pattern that is part of our universe. It has symmetry, elegance, and grace - those qualities you find always in that which the true artist captures. You can find it in the turning of the seasons, in the way sand trails along a ridge, in the branch clusters of the creosote bush or the pattern of its leaves. We try to copy these patterns in our lives and our society, seeking the rhythms, the dances, the forms that comfort. Yet, it is possible to see peril in the finding of ultimate perfection. It is clear that the ultimate pattern contains its own fixity. In such perfection, all things move toward death.” ~ Dune (1965) I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea that has become central to Natural Language Processing models. There has been quite a development over the last couple of decades in using embeddings for neural models (Recent developments include contextualized word embeddings leading to cutting-edge models like BERT and GPT2). Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. Companies like Airbnb, Alibaba, Spotify, and Anghami have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines.
In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?
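The idea in this excerpt, that a short list of numbers can stand in for a person, can be sketched as follows. This is a minimal illustration, not code from the linked post: the five trait scores and the names `me`, `person_1`, and `person_2` are invented for the example, and cosine similarity is assumed here as the comparison measure.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical five-trait personality scores, each in [-1, 1].
# These values are made up purely for illustration.
me = [-0.4, 0.8, 0.5, -0.2, 0.3]
person_1 = [-0.3, 0.2, 0.3, -0.4, 0.9]
person_2 = [-0.5, -0.4, -0.2, 0.7, -0.1]

# Rank the other people by how closely their vectors point in the same
# direction as mine; a higher cosine similarity means more alike.
candidates = {"person_1": person_1, "person_2": person_2}
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(me, candidates[name]),
    reverse=True,
)
print(ranked)
```

The same comparison applies unchanged to word embeddings, where each vector has hundreds of dimensions instead of five.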

Measuring the varied sentiments of good and bad words

12 Oct 2018

flowingdata.com

There was a survey a while back that asked people to provide a 0 to 100 percent value to probabilistic words like “usually” and “likely”. YouGov did something similar for wo…

[P] Using T-SNE and word2vec embeddings to create clusters in wordclouds

5 Sep 2018

reddit.com

2.9M subscribers in the MachineLearning community. Beginners -> /r/mlquestions , AGI -> /r/singularity, career advices -> /r/cscareerquestions…

mukund109/word-mesh: A context-preserving word cloud generator

5 Sep 2018

github.com

A context-preserving word cloud generator.

Emotion and Sentiment Analysis: A Practitioner’s Guide to NLP

30 Aug 2018

kdnuggets.com

Sentiment analysis is widely used, especially as a part of social media analysis for any domain, be it a business, a recent movie, or a product launch, to understand its reception by the people and what they think of it based on their opinions or, you guessed it, sentiment!

NLP's ImageNet moment has arrived

15 Jul 2018

thegradient.pub

The time is ripe for practical transfer learning to make inroads into NLP.

Deep Meaning Beyond Thought Vectors

8 Jun 2018

machinethoughts.wordpress.com

I ended my last post by saying that I might write a follow-up post on current work that seems to exhibit progress toward natural language understanding. I am going to discuss a couple sampled pap…

How to solve 90% of NLP problems: a step-by-step guide

8 Jun 2018

blog.insightdatascience.com

Using Machine Learning to understand and leverage text.

Deep Learning Research Review Week 3: Natural Language Processing – Adit De

8 Jun 2018

adeshpande3.github.io

Engineering at Forward | UCLA CS '19

NLP Concepts with spaCy. Code examples released under CC0 https://creativec

8 Jun 2018

gist.github.com

NLP Concepts with spaCy. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/

agnusmaximus/Word2Bits: Quantized word vectors that take 8x-16x less space than regular word vectors

8 Jun 2018

github.com

Quantized word vectors that take 8x-16x less space than regular word vectors - agnusmaximus/Word2Bits

5 Fantastic Practical Natural Language Processing Resources

8 Jun 2018

kdnuggets.com

This post presents 5 practical resources for getting a start in natural language processing, covering a wide array of topics and approaches.

NLTK 3.3 is out

8 Jun 2018

reddit.com

NLTK 3.3 has been released. NLTK 3.3 includes the following: support for Python 3.6; a new interface to CoreNLP; support for synset retrieval by sense key; Minor…

Topic Modeling with Gensim (Python) - A Practical Guide

12 May 2018

machinelearningplus.com

Topic Modeling is a technique to understand and extract the hidden topics from large volumes of text. Latent Dirichlet Allocation (LDA) is an algorithm for topic modeling, which has excellent implementations in Python's Gensim package. This tutorial tackles the problem of finding the optimal number of topics.

Understanding what is behind Sentiment Analysis (Part II)

1 May 2018

building.lang.ai

Fine-tuning our sentiment classifier

Understanding Feature Engineering (Part 3) — Traditional Methods for Text D

9 Apr 2018

towardsdatascience.com

Traditional strategies for taming unstructured, textual data

Google's trained Word2Vec model in Python · Chris McCormick

12 Feb 2018

mccormickml.com

Gensim: topic modelling for humans

2 Feb 2018

radimrehurek.com

Efficient topic modelling in Python

concrete_NLP_tutorial/NLP_notebook.ipynb at master · hundredblocks/concrete

1 Feb 2018

github.com

An NLP workshop about concrete solutions to real problems - hundredblocks/concrete_NLP_tutorial

Prodigy - Radically efficient machine teaching

1 Feb 2018

prodi.gy

A downloadable annotation tool for LLMs, NLP and computer vision tasks such as named entity recognition, text classification, object detection, image segmentation, evaluation and more.

spacy-notebooks/lightning_tour.ipynb at master · explosion/spacy-notebooks

1 Feb 2018

github.com

💫 Jupyter notebooks for spaCy examples and tutorials - explosion/spacy-notebooks

Word Tensors

28 Dec 2017

multithreaded.stitchfix.com

Counting and tensor decompositions are elegant and straightforward techniques. But these methods are grossly underrepresented in business contexts. In this p...

Topic Modeling with LDA Introduction

27 Dec 2017

opendatascience.com

Stay up-to-date on the latest data science and AI news in the worlds of artificial intelligence, machine learning, deep learning, implementation, and more.

Using word embedding to enable semantic queries on relational databases

27 Dec 2017

blog.acolyer.org

nlp — my Raindrop.io articles