nlp | Perfectly Awesome

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models 📄

Where Does Meaning Live in a Sentence? Math Might Tell Us. | Quanta Magazine

The mathematician Tai-Danae Bradley is using category theory to try to understand both human and AI-generated language.

What is METEOR score? - Dataconomy

METEOR Score is a metric used to evaluate the quality of machine translation based on precision, recall, word alignment, and linguistic flexibility.
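
To make the precision/recall part of that description concrete, here is a minimal sketch of a recall-weighted harmonic mean over exact unigram matches. It is a simplification, not the full metric: METEOR also matches stems and synonyms and applies a penalty for fragmented alignments, both of which are left out here. The function name `meteor_fmean` and the `alpha` parameter are illustrative choices, not taken from the linked article.

```python
from collections import Counter


def meteor_fmean(candidate: str, reference: str, alpha: float = 0.9) -> float:
    """Recall-weighted harmonic mean over exact unigram matches (simplified)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each reference token can be matched at most once.
    matches = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(cand_tokens)
    recall = matches / len(ref_tokens)
    # With alpha close to 1, recall counts for more than precision.
    return precision * recall / (alpha * precision + (1 - alpha) * recall)
```

With `alpha = 0.9` this reduces to `10PR / (R + 9P)`, so a short candidate that is fully correct but covers little of the reference still scores low.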

A Step by Step Guide to Build a Trend Finder Tool with Python: Web Scraping, NLP (Sentiment Analysis & Topic Modeling), and Word Cloud Visualization

What is stemming? - Dataconomy

Stemming is the linguistic process of reducing words to their base form, ignoring prefixes and suffixes, to enhance clarity and information retrieval.
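
As a rough illustration of the idea, the sketch below strips a few common English suffixes. It is a toy, assuming a hand-picked suffix list and a minimum stem length; real stemmers such as Porter's apply ordered rewrite rules with many more conditions. `naive_stem`, `SUFFIXES`, and `min_stem_len` are hypothetical names used only for this example.

```python
# Hand-picked suffixes for illustration only; not a complete rule set.
SUFFIXES = ("ing", "edly", "ed", "ly", "es", "s")


def naive_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    # Try longer suffixes first so "edly" wins over "ly".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word
```

Because it only looks at surface strings, a stemmer like this can map unrelated words to the same stem or leave irregular forms ("ran", "better") untouched, which is the usual trade-off against lemmatization.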

Text Preprocessing and Feature Engineering with spaCy

In this article, we’ll focus on how to prepare text data for machine learning and statistical modeling using spaCy.

The 2025 AI Engineering Reading List

We picked 50 papers/models/blogs across 10 fields in AI Eng: LLMs, Benchmarks, Prompting, RAG, Agents, CodeGen, Vision, Voice, Diffusion, Finetuning. If you're starting from scratch, start here.

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch

This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm, which is used in models like GPT-2 to GPT-4, Llama 3,...
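
For orientation, the core training loop of BPE can be sketched in a few lines: count adjacent symbol pairs, merge the most frequent pair, repeat. This is a character-level sketch under simplifying assumptions (no byte-level handling, no special tokens, no pre-tokenization), not the notebook's code; the function names are illustrative.

```python
from collections import Counter


def get_pair_counts(words: dict) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words: dict, pair: tuple) -> dict:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus: list, num_merges: int) -> list:
    """Learn up to num_merges merge rules from a list of words."""
    # Each word starts as a tuple of single characters.
    words = dict(Counter(tuple(word) for word in corpus))
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        words = merge_pair(words, best)
    return merges
```

Calling `train_bpe(["low", "low", "lower", "lowest"], 2)` learns the merges `l + o` and then `lo + w`, since those pairs occur in every word.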

Hugging Face Releases Sentence Transformers v3.3.0: A Major Leap for NLP Efficiency

Natural Language Processing (NLP) has rapidly evolved in the last few years, with transformers emerging as a game-changing innovation. Yet, there are still notable challenges when using NLP tools to develop applications for tasks like semantic search, question answering, or document embedding. One key issue has been the need for models that not only perform well but also work efficiently on a range of devices, especially those with limited computational resources, such as CPUs. Models tend to require substantial processing power to yield high accuracy, and this trade-off often leaves developers choosing between performance and practicality. Additionally, deploying large models

Python libs for sentiment analysis

Sentiment analysis, i.e., determining the emotional tone of a text, has become a crucial tool for researchers, developers, and businesses to comprehend social media trends, consumer feedback, and other topics. With its robust library ecosystem, Python provides a vast choice of tools to improve and streamline sentiment analysis processes. Through the use of these libraries, data scientists can easily create precise sentiment models using pre-trained models and sophisticated machine learning frameworks. In this post, the top 12 Python sentiment analysis libraries have been discussed, emphasizing their salient characteristics, advantages, and uses. TextBlob A popular Python sentiment analysis toolkit, TextBlob is

How ‘Embeddings’ Encode What Words Mean — Sort Of

Machines work with words by embedding their relationships with other words in a string of numbers.

CS388: Natural Language Processing

Cleaning

As part of data preparation for an NLP model, it’s common to need to clean up your data prior to passing it into the model. If there’s unwanted content in your output, for example, it could impact the quality of your NLP model. To help with this, the `unstructured` library includes cleaning functions to help users sanitize output before sending it to downstream applications.

Unlocking the Best Tokenization Strategies: How Greedy Inference and SaGe L

The inference method is crucial for NLP models in subword tokenization. Methods like BPE, WordPiece, and UnigramLM offer distinct mappings, but their performance differences must be better understood. Implementations like Huggingface Tokenizers often need to be clearer or limit inference choices, complicating compatibility with vocabulary learning algorithms. Whether a matching inference method is necessary or optimal for tokenizer vocabularies is uncertain. Previous research focused on developing vocabulary construction algorithms such as BPE, WordPiece, and UnigramLM, exploring optimal vocabulary size and multilingual vocabularies. Some studies examined the effects of vocabularies on downstream performance, information theory, and cognitive plausibility. Limited work on

Speech and Language Processing

Beyond Self-Attention: How a Small Language Model Predicts the Next Token

A deep dive into the internals of a small transformer model to learn how it turns self-attention calculations into accurate predictions for the next token.

Primers • AI

Aman's AI Journal | Course notes and learning material for Artificial Intelligence and Deep Learning Stanford classes.

A Taxonomy of Natural Language Processing

An overview of different fields of study and recent developments in NLP

Introduction to Vector Similarity Search

Learn what vector search is and the metrics pertinent to decide the distance (or similarity) between objects.
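
The metrics most often used for this are the dot product, Euclidean distance, and cosine similarity. The sketch below writes them out in plain Python for small lists of floats; it is meant only to show the arithmetic, and a real vector search system would typically rely on a numerical library and an approximate index instead.

```python
import math


def dot(a, b):
    """Dot product: larger means more aligned (and/or longer) vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot(a, b) / (norm_a * norm_b)
```

Cosine similarity treats `[1, 2]` and `[2, 4]` as identical in direction, whereas Euclidean distance does not, which is why the choice of metric depends on whether vector magnitude carries meaning.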

4 Ways to Do Question Answering in LangChain

Chat with your long PDF docs: load_qa_chain, RetrievalQA, VectorstoreIndexCreator, ConversationalRetrievalChain

Hacker News

I explain what is so unique about the RWKV language model.

Dalai

Dead simple way to run LLaMA on your computer

Meta unveils a new large language model that can run on a single GPU

LLaMA-13B reportedly outperforms ChatGPT-like tech despite being 10x smaller.

How to Generate GPT Output in JSON Format for Ruby developers

I was playing around with OpenAI (GPT-3) today, building a reasonably complicated email parser for a...

Dense Vectors | Pinecone

ChatGPT and the Imagenet moment — Benedict Evans

The wave of enthusiasm around generative networks feels like another Imagenet moment - a step change in what ‘AI’ can do that could generalise far beyond the cool demos. What can it create, and where are the humans in the loop?

2212.03551.pdf 📄

AI Homework – Stratechery by Ben Thompson

The first obvious casualty of large language models is homework: the real training for everyone, though, and the best way to leverage AI, will be in verifying and editing information.

Beginner’s Guide to Diffusion Models

An intuitive understanding of how AI-generated art is made by Stable Diffusion, Midjourney, or DALL-E

What do countries talk about at the UN General Debate — Topic modelings using LDA.

The intuition behind LDA and its limitations, along with a Python implementation using Gensim.

5 Linguistics Courses for NLP Practitioners

This collection of 5 courses is intended to help NLP practitioners or hopefuls fill gaps in their linguistics knowledge.

Converting Text Documents to Token Counts with CountVectorizer

The post explains the significance of CountVectorizer and demonstrates its implementation with Python code.
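
The underlying idea, turning each document into a row of token counts over a shared vocabulary, can be sketched without any dependencies. The code below is not scikit-learn's implementation; `count_vectorize` is a hypothetical helper that imitates the default behaviour (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary) and returns plain lists rather than a sparse matrix.

```python
import re
from collections import Counter

# Tokens of two or more word characters, similar to CountVectorizer's default.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def count_vectorize(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix
```

For the two documents `"the cat sat"` and `"the cat saw the dog"`, the vocabulary is `['cat', 'dog', 'sat', 'saw', 'the']` and the second row records `the` twice.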

Interview: Why Mastering Language Is So Difficult for AI

Scientist Gary Marcus argues that “deep learning” is not the only path to true artificial intelligence.

Topic Modeling with LSA, pLSA, LDA, NMF, BERTopic, Top2Vec: a Comparison

A comparison between different topic modeling strategies including practical Python examples

7 spaCy Features To Boost Your NLP Pipelines And Save Time

I’ve never used spaCy beyond simple named entity recognition tasks. Boy was I wrong.

Visualizing Part-of-Speech Tags with NLTK and SpaCy

Customizing displaCy’s entity visualizer

Minerva: Solving Quantitative Reasoning Problems with Language Models

Posted by Ethan Dyer and Guy Gur-Ari, Research Scientists, Google Research, Blueshift Team. Language models have demonstrated remarkable performance...

Generating Children's Stories Using GPT-3 and DALL·E

We used GPT-3 and DALL·E to generate a children's storybook about Ash and Pikachu vs. Team Rocket. Read the story and marvel at the AI-generated visuals!

GitHub Copilot · Your AI pair programmer

GitHub Copilot works alongside you directly in your editor, suggesting whole lines or entire functions for you.

snakers4/silero-models: Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple - snakers4/silero-models

Natural Language Processing with Transformers Book

AI Virtual Assistant Technology Guide 2022

They can help you get an appointment or order a pizza, find the best ticket deals and bring your...

Text Summarization with NLP: TextRank vs Seq2Seq vs BART

Natural Language Processing with Python, Gensim, Tensorflow, Transformers

Topic Modeling in Python | Toptal

Topic modeling can bring NLP to the next level. Here’s how.

NLP_workshop/NLP_demo.ipynb at master · bjpcjp/NLP_workshop

NLP demo code, largely based on https://github.com/hundredblocks/concrete_NLP_tutorial - bjpcjp/NLP_workshop

bjpcjp/gensim-lessons

Working files for gensim NLP tutorials.

spaCy_hello_world/spaCy_101.ipynb at master · bjpcjp/spaCy_hello_world

Beginner's tour of spaCy v2.0.

Add Labels to a Dataset for Sentiment Analysis

In this article, I will present a tutorial on how to add labels to a dataset for sentiment analysis using Python.

INTRODUCTION TO SPACY 3 — Introduction to spaCy 3

Mastering spaCy | Data | eBook

An end-to-end practical guide to implementing NLP applications using the Python ecosystem.

Tokenization Algorithms Explained

A one-stop-shop for all your tokenization needs

Semantic Search: Measuring Meaning From Jaccard to Bert

Similarity search is one of the fastest-growing domains in AI and machine learning. At its core, it is the process of matching relevant pieces of information together.

Wu Dao 2.0: A Monster of 1.75 Trillion Parameters | by Alberto Romero | Med

BAAI conference presented Wu Dao 2.0. The most powerful AI to date.

Sentiment Analysis — Comparing 3 Common Approaches: Naive Bayes, LSTM, and

Sentiment Analysis, or Opinion Mining, is a subfield of NLP (Natural Language Processing) that aims to extract attitudes, appraisals, opinions, and emotions from text. Inspired by the rapid migration…

Understanding Convolutional Neural Networks for NLP – WildML

http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

Attention and Memory in Deep Learning and NLP – WildML

Understanding Transformers, the machine learning model behind GPT-3

How this novel neural network architecture changes the way we analyze complex data types, and powers revolutionary models like GPT-3 and BERT.

Language models like GPT-3 could herald a new type of search engine

The way we search online hasn’t changed in decades. A new idea from Google researchers could make it more like talking to a human expert

How Transformers work in deep learning and NLP: an intuitive introduction | AI Summer

An intuitive introduction to Transformers and how they are used in machine translation. After analyzing the subcomponents one by one, such as self-attention and positional encodings, we explain the principles behind the encoder and decoder and why Transformers work so well.

Deploy an NLP pipeline. Flask Heroku Bert.

A simple, quick solution for deploying an NLP project, and the challenges you may face during the process.

Arabic NLP: Unique Challenges and Their Solutions

Pre-processing Arabic text for machine-learning using the camel-tools Python package

Foundations of NLP Explained Visually: Beam Search, How it Works | by Ketan

A Gentle Guide to how Beam Search enhances predictions, in Plain English

State of the art NLP at scale with RAPIDS, HuggingFace and Dask

See how to build end-to-end NLP pipelines in a fast and scalable way on GPUs — from feature engineering to inference.

OpenAI’s text-generating system GPT-3 is now spewing out 4.5 billion words

Text-generation is the next big thing in AI.

Using spaCy 3.0 to build a custom NER model

blog.md · GitHub

GPT-3: We’re at the very beginning of a new app ecosystem

The NLP application ecosystem is in its earliest stages, and it's not yet clear whether GPT-3 or a different model will be the foundation.

How to Tell Stories with Sentiment Analysis

A journalist’s attempt at introducing math to the newsroom while analyzing QAnon

10 NLP Terms Every Data Scientist Should Know

Knowing the terminology is essential to understanding any tutorial.

Achieving High-Quality Search and Recommendation Results with DeepNLP

Speech and natural language processing (NLP) have become the foundation for most of the AI development in the enterprise today, as textual data represents a significant portion of unstructured content.

12 Twitter Sentiment Analysis Algorithms Compared

12 sentiment analysis algorithms were compared on the accuracy of tweet classification. The fastText deep learning system was the winner.

Release v3.0.0: Transformer-based pipelines, new training system, project t

📣 NEW: Want to make the transition from spaCy v2 to spaCy v3 as smooth as possible for you and your organization? We're now offering commercial migration support for your spaCy pipelines! We...

Cross-Topic Argument Mining: Learning How to Classify Texts

Classifying cross-topic natural language texts based on their argumentative structure using deep learning

6 NLP Techniques Every Data Scientist Should Know

Towards more efficient natural language processing

Browse the State-of-the-Art in Machine Learning | Papers With Code

**Sentiment Analysis** is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either "positive", "negative", or "neutral". Given the text and accompanying labels, a model can be trained to predict the correct sentiment. **Sentiment Analysis** techniques can be categorized into machine learning approaches, lexicon-based approaches, and even hybrid methods. Some subcategories of research in sentiment analysis include: multimodal sentiment analysis, aspect-based sentiment analysis, fine-grained opinion analysis, language specific sentiment analysis. More recently, deep learning techniques, such as RoBERTa and T5, are used to train high-performing sentiment classifiers that are evaluated using metrics like F1, recall, and precision. To evaluate sentiment analysis systems, benchmark datasets like SST, GLUE, and IMDB movie reviews are used. Further readings: - [Sentiment Analysis Based on Deep Learning: A Comparative Study](https://paperswithcode.com/paper/sentiment-analysis-based-on-deep-learning-a)

A Beginner’s Guide to Use BERT for the First Time

From predicting a single sentence to fine-tuning on a custom dataset to finding the best hyperparameter configuration.

Linguistic Fundamentals for Natural Language Processing: 100 Essentials fro

Algorithms for text analytics must model how language works to incorporate meaning in language—and so do the people deploying these algorithms. Bender & Lascarides 2019 is an accessible overview of what the field of linguistics can teach NLP about how meaning is encoded in human languages.

A version of the BERT language model that’s 20 times as fast

Determining the optimal architectural parameters reduces network size by 84% while improving performance on natural-language-understanding tasks.

Overview of tokenization algorithms in NLP

Introduction to tokenization methods, including subword, BPE and SentencePiece

Amazon shifts some Alexa and Rekognition computing to its own Inferentia ch

Amazon said it shifted part of the computing for its Alexa voice assistant to its own custom-designed chips.

GPT-3, transformers and the wild world of NLP

A review of 20+ deep learning NLP models and how to use them well

Compare Amazon Textract with Tesseract OCR — OCR & NLP Use Case

Comparison of two well-known engines for optical character recognition (OCR) and Natural Language Processing

AI devs created a lean, mean, GPT-3-beating machine that uses 99.9% fewer p

AI researchers from the Ludwig Maximilian University (LMU) of Munich have developed a bite-sized text generator capable of besting OpenAI’s state of the art GPT-3 using only a tiny fraction of its parameters. GPT-3 is a monster of an AI sys

AI Democratization in the Era of GPT-3

What does Microsoft getting an "exclusive license" to GPT-3 mean for the future of AI democratization?

3 Natural Language Processing Tools From AWS to Python

Parsing and processing documents can provide a lot of value for alm...

Hedonometer

Hedonometer.org is an instrument that measures the happiness of large populations in real time. The hedonometer is based on people’s online expressions, capitalizing on data-rich social media, and measures how people present themselves to the outside world.

Part 4: Semantic Segmentation

Identifying Change Points in Time Series Data with FLUSS and FLOSS

Comment Ranking Algorithms: Hacker News vs. YouTube vs. Reddit

Short technical information about Word2Vec, GloVe and Fasttext

Introduction

Evolution of Language Models: N-Grams, Word Embeddings, Attention & Transfo

This post collates research on the advancements of Natural Language Processing (NLP) over the years.

Good Grams: How to Find Predictive N-Grams for your Problem

Figuring out what words are predictive for your problem is easy!

Natural Language Processing Recipes: Best Practices and Examples

Here is an overview of another great natural language processing resource, this time from Microsoft, which demonstrates best practices and implementation guidelines for a variety of tasks and scenarios.

Python Libraries for Natural Language Processing - Towards Data Science

An overview of popular Python libraries for Natural Language Processing

The Cost of Training NLP Models: A Concise Overview

We review the cost of training large-scale language models, and the drivers of these costs. The intended audience includes engineers and scientists budgeting their model-training experiments, as...

Twitter Sentiment Analysis with Node.js

How people are talking about your brand

Topic Modeling Articles with NMF

Extracting topics is a good unsupervised data-mining technique to discover the underlying relationships between texts. There are many…

TLDR This - Article Summarizer & Online Text Summarizing Tool

TLDR This is a Free online text summarizing tool that automatically condenses long articles, documents, essays, or papers into key summary paragraphs using state-of-the-art AI.

google-research/bert: TensorFlow code and pre-trained models for BERT

TensorFlow code and pre-trained models for BERT.

The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) – J

The year 2018 has been an inflection point for machine learning models handling text (or more accurately, Natural Language Processing or NLP for short). Our conceptual understanding of how best to represent words and sentences in a way that best captures underlying meanings and relationships is rapidly evolving. Moreover, the NLP community has been putting forward incredibly powerful components that you can freely download and use in your own models and pipelines (it’s been referred to as NLP’s ImageNet moment, referencing how years ago similar developments accelerated the development of machine learning in Computer Vision tasks).

[Project] nlp-tutorial repository for those who are studying NLP(Natural Language Proce

Hello. This is my first post on Reddit. I created the nlp-tutorial repository for those who are studying NLP (Natural Language Processing)…

Why BERT Fails in Commercial Environments - Intel AI

Build a BERT Sci-kit Transformer

BERT can get you state-of-the-art results on many NLP tasks and it only takes a few lines of code.

Vincent Boucher on LinkedIn: #transformer #bert #nlp

Pre-training SmallBERTa - A tiny model to train on a tiny dataset An end to end colab notebook that allows you to train your own LM (using HuggingFace…

The Big Bad NLP Database: Access Nearly 300 Datasets

Check out this database of nearly 300 freely-accessible NLP datasets, curated from around the internet.

Beating the baseline recommender using Graph and NLP techniques in PyTorch

Decoding NLP Attention Mechanisms

Arguably more famous today than Michael Bay’s Transformers, the transformer architecture and transformer-based models have been breaking all kinds of state-of-the-art records. They are (rightfully) getting the attention of a big portion of the deep learning community and researchers in Natural Language Processing (NLP) since their introduction in 2017 by the Google Translation Team. This architecture has set […]

Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found

By popular demand, I’ve updated this article with the latest tutorials from the past 12 months.

Quick Introduction to Sentiment Analysis

What is sentiment analysis, how to perform it, and how it can help your business.

A Comprehensive Guide to Natural Language Generation

Follow this overview of Natural Language Generation covering its applications in theory and practice. The evolution of NLG architecture is also described from simple gap-filling to dynamic document creation along with a summary of the most popular NLG models.

How to train a new language model from scratch using Transformers and Tokenizers

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

A list of beginner-friendly NLP projects–using pre-trained models

Build software with machine learning — no math required.

Top NLP Research Papers With Business Applications From ACL 2019

This year’s annual meeting of the Association for Computational Linguistics (ACL 2019) was bigger than ever. Although the conference received 75% more submissions than last year, the quality of the research papers remained high, and so the acceptance rates are almost the same. It is becoming more and more challenging to keep track of the […]

Exploratory Data Analysis for Natural Language Processing: A Complete Guide to Python Tools

Explore NLP EDA with Python tools: learn about text statistics, ngrams, topic modeling with pyLDAvis, sentiment analysis, and more

[N] HuggingFace releases ultra-fast tokenization library for deep-learning

Hugging Face, the NLP research company known for its transformers library, has just released a new open-source library for…

An Introductory Guide to NLP for Data Scientists with 7 Common Techniques

Data scientists work with tons of data, and many times that data includes natural language text. This guide reviews 7 common techniques with code examples to introduce you to the essentials of NLP, so you can begin performing analysis and building models from textual data.

Turing-NLG: A 17-billion-parameter language model by Microsoft

Turing Natural Language Generation (T-NLG) is a 17 billion parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a demo of the model, including its freeform generation, question answering, and summarization capabilities, to academics […]

Top NLP Algorithms & Concepts - DataScienceCentral.com

Today, one of the most popular tasks in data science is processing information presented in text form. This means representing text as mathematical equations, formulas, paradigms, and patterns in order to understand its semantics (content) for further processing: classification, fragmentation, etc. The general area which solves the described problems…

10 Common NLP Terms Explained for the Text Analysis Novice - DataScienceCentral.com

If you’re relatively new to the NLP and text analysis world, you’ll more than likely have come across some pretty technical terms and acronyms that are challenging to get your head around, especially if you’re relying on scientific definitions for a plain and simple explanation. We decided to put together a list of 10 common…

Serving GPT-2 in Google Cloud Platform

A CloudOps Journey

nlp-recipes/README.md at master · microsoft/nlp-recipes

Natural Language Processing Best Practices & Examples - microsoft/nlp-recipes

Lit BERT: NLP Transfer Learning In 3 Steps - Towards Data Science

In this tutorial we learn to quickly train Huggingface BERT using PyTorch Lightning for transfer learning on any NLP task

Hugging Face – The AI community building the future.

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

BERT Explained: A Complete Guide with Theory and Tutorial

Unless you have been out of touch with the Deep Learning world, chances are that you have heard about BERT — it has been the talk of the town for the last year. At the end of 2018 researchers …

Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters

From BERT’s tangled web of attention, some intuitive patterns emerge.

Advanced NLP with spaCy · A free online course

spaCy is a modern Python library for industrial-strength Natural Language Processing. In this free and interactive online course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.

Text Analytics

Medallia's text analytics software tool provides actionable insights via customer and employee experience sentiment data analysis from reviews & comments.

A tour of awesome features of spaCy (part 1/2) – Eliiza-AI – Medium

A few weeks ago I started working on a text summarisation project and I needed a Natural Language Processing library with comprehensive…

Word2vec: fish music = bass | graceavery

A tour of awesome features of spaCy (part 2/2) - Eliiza-AI - Medium

In the first part of this overview of spaCy we went over the features of the large English pretrained model that spaCy comes with. In this…

Introducing spaCy v2.1 · Blog · Explosion AI

Version 2.1 of the spaCy Natural Language Processing library includes a huge number of features, improvements and bug fixes. In this post, we highlight some of the things we're especially pleased with, and explain some of the most challenging parts of preparing this big release.

The Illustrated Word2vec

“There is in all things a pattern that is part of our universe. It has symmetry, elegance, and grace - those qualities you find always in that which the true artist captures. You can find it in the turning of the seasons, in the way sand trails along a ridge, in the branch clusters of the creosote bush or the pattern of its leaves. We try to copy these patterns in our lives and our society, seeking the rhythms, the dances, the forms that comfort. Yet, it is possible to see peril in the finding of ultimate perfection. It is clear that the ultimate pattern contains its own fixity. In such perfection, all things move toward death.” ~ Dune (1965)
I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even a smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea that has become central to Natural Language Processing models. There has been quite a development over the last couple of decades in using embeddings for neural models (recent developments include contextualized word embeddings leading to cutting-edge models like BERT and GPT2). Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. Companies like Airbnb, Alibaba, Spotify, and Anghami have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines.
In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?

Measuring the varied sentiments of good and bad words

There was a survey a while back that asked people to provide a 0 to 100 percent value to probabilistic words like “usually” and “likely”. YouGov did something similar for wo…

[P] Using T-SNE and word2vec embeddings to create clusters in wordclouds

mukund109/word-mesh: A context-preserving word cloud generator

A context-preserving word cloud generator.

Emotion and Sentiment Analysis: A Practitioner’s Guide to NLP

Sentiment analysis is widely used, especially as a part of social media analysis for any domain, be it a business, a recent movie, or a product launch, to understand its reception by the people and what they think of it based on their opinions or, you guessed it, sentiment!

NLP's ImageNet moment has arrived

The time is ripe for practical transfer learning to make inroads into NLP.

Deep Meaning Beyond Thought Vectors

I ended my last post by saying that I might write a follow-up post on current work that seems to exhibit progress toward natural language understanding. I am going to discuss a couple sampled pap…

How to solve 90% of NLP problems: a step-by-step guide

Using Machine Learning to understand and leverage text.

Deep Learning Research Review Week 3: Natural Language Processing – Adit De

Engineering at Forward | UCLA CS '19

NLP Concepts with spaCy. Code examples released under CC0 https://creativec

NLP Concepts with spaCy. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/ · GitHub

agnusmaximus/Word2Bits: Quantized word vectors that take 8x-16x less space than regular word vectors

Quantized word vectors that take 8x-16x less space than regular word vectors - agnusmaximus/Word2Bits

5 Fantastic Practical Natural Language Processing Resources

This post presents 5 practical resources for getting a start in natural language processing, covering a wide array of topics and approaches.

NLTK 3.3 is out

NLTK 3.3 has been released. It includes the following: support for Python 3.6; a new interface to CoreNLP; support for synset retrieval by sense key; minor…

Topic Modeling with Gensim (Python) - A Practical Guide

Topic modeling is a technique to understand and extract the hidden topics from large volumes of text. Latent Dirichlet Allocation (LDA) is an algorithm for topic modeling that has an excellent implementation in Python's Gensim package. This tutorial tackles the problem of finding the optimal number of topics.

Understanding what is behind Sentiment Analysis (Part II)

Fine-tuning our sentiment classifier

Understanding Feature Engineering (Part 3) — Traditional Methods for Text D

Traditional strategies for taming unstructured, textual data

Google's trained Word2Vec model in Python · Chris McCormick

Gensim: topic modelling for humans

Efficient topic modelling in Python

concrete_NLP_tutorial/NLP_notebook.ipynb at master · hundredblocks/concrete

An NLP workshop about concrete solutions to real problems - hundredblocks/concrete_NLP_tutorial

Prodigy - Radically efficient machine teaching

A downloadable annotation tool for LLMs, NLP and computer vision tasks such as named entity recognition, text classification, object detection, image segmentation, evaluation and more.

spacy-notebooks/lightning_tour.ipynb at master · explosion/spacy-notebooks

💫 Jupyter notebooks for spaCy examples and tutorials - explosion/spacy-notebooks

Word Tensors

Counting and tensor decompositions are elegant and straightforward techniques. But these methods are grossly underrepresented in business contexts. In this p...

Topic Modeling with LDA Introduction

Stay up-to-date on the latest data science and AI news in the worlds of artificial intelligence, machine learning, deep learning, implementation, and more.

Using word embedding to enable semantic queries on relational databases