A blog post by Ksenia Se on Hugging Face
Compute costs scale with the square of the input size. That’s not great.
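For self-attention, the canonical example of this quadratic cost, a minimal PyTorch sketch (with purely illustrative sizes) shows where the square comes from: the query-key score matrix has one entry per pair of tokens, so compute and memory grow with the sequence length squared.

```python
import torch

n, d = 4096, 64              # sequence length, head dimension (illustrative values)
q = torch.randn(n, d)        # queries
k = torch.randn(n, d)        # keys

scores = q @ k.T / d ** 0.5  # (n, n) score matrix: n^2 entries to compute and store
print(scores.shape)          # torch.Size([4096, 4096])
```

Doubling the sequence length quadruples the size of that score matrix, which is the cost being complained about.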
Deep learning architectures have revolutionized the field of artificial intelligence, offering innovative solutions for complex problems across various domains, including computer vision, natural language processing, speech recognition, and generative models. This article explores some of the most influential deep learning architectures: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), Transformers, and Encoder-Decoder architectures, highlighting their unique features, applications, and how they compare against each other.

Convolutional Neural Networks (CNNs)

CNNs are specialized deep neural networks for processing data with a grid-like topology, such as images. A CNN automatically detects important features without human supervision.
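As a concrete illustration, here is a minimal CNN sketch in PyTorch (layer sizes and the 28x28 grayscale input are just example choices, not anything prescribed by the article). The convolutions slide learned filters over the image grid and the pooling layers downsample, so local patterns are detected without hand-engineered features:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN for 28x28 grayscale images (e.g. MNIST-sized input)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # learn 16 local filters
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 32 higher-level filters
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = TinyCNN()(torch.randn(8, 1, 28, 28))  # batch of 8 images
print(logits.shape)                            # torch.Size([8, 10])
```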
Large language models do better at solving problems when they show their work. Researchers are beginning to understand why.
We will take a deep dive into how transformer models such as BERT work (a non-mathematical explanation, of course!), and then look at a system design that uses a transformer to build a sentiment-analysis application.
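A sketch of the simplest version of that system, using the Hugging Face `transformers` pipeline API (the model loaded here is just the library's default fine-tuned BERT-family checkpoint, not necessarily the one used in the article):

```python
from transformers import pipeline

# Downloads a default sentiment-analysis model (a fine-tuned BERT-family
# checkpoint) on first use, then runs inference locally.
classifier = pipeline("sentiment-analysis")

print(classifier("The new release fixed every bug I cared about."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

A production design would wrap this in a service, batch requests, and pin a specific model version, but the core inference step is this one call.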
Large Language Models (LLMs) have gained significant prominence in modern machine learning, largely due to the attention mechanism. This mechanism employs a sequence-to-sequence mapping to construct context-aware token representations. Traditionally, attention relies on the softmax function (SoftmaxAttn) to generate token representations as data-dependent convex combinations of values. However, despite its widespread adoption and effectiveness, SoftmaxAttn faces several challenges. One key issue is the tendency of the softmax function to concentrate attention on a limited number of features, potentially overlooking other informative aspects of the input data. Also, the application of SoftmaxAttn necessitates a row-wise reduction along the input sequence length, which introduces a normalization step over the whole sequence and complicates chunked or distributed implementations of attention.
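A minimal single-head PyTorch sketch of SoftmaxAttn (no masking or batching, purely for illustration) makes both points concrete: the softmax is applied row by row over the sequence axis, which is the row-wise reduction, and each output token is a convex combination of the value vectors because the weights in each row are non-negative and sum to one.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    """q, k, v: (seq_len, d). Returns context-aware token representations."""
    scores = q @ k.T / q.shape[-1] ** 0.5  # (seq_len, seq_len) similarity scores
    weights = F.softmax(scores, dim=-1)    # row-wise reduction over sequence length
    return weights @ v                     # convex combination of the value rows

n, d = 6, 8
out = softmax_attention(torch.randn(n, d), torch.randn(n, d), torch.randn(n, d))
print(out.shape)  # torch.Size([6, 8])
```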