Multimodal Reasoning
Grounding language in perception.
Author: Pantelis Monogioudis
Contents
Text Tokenization
Word2Vec Embeddings
Word2Vec from scratch
Word2Vec Tensorflow Tutorial
Language Models
Transformers and Self-Attention
Single-head self-attention
Multi-head self-attention
Positional Embeddings
Batch Normalization (BN)
Layer Normalization (LN)
Vision Transformer
Vision Transformer Paper
Vision Transformer Tutorial
CLIP: Connecting Text and Images
Language Models
These notes borrow heavily from the CS224N set of notes on Language Models.
Multi-head self-attention
Earlier we saw examples with the token bear appearing in multiple grammatical patterns that also influence its meaning. To capture such multiplicity we can use multiple…
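To make the idea concrete, below is a minimal NumPy sketch of multi-head self-attention under stated assumptions: the helper name softmax, the weight shapes, and the requirement that d_model be divisible by n_heads are illustrative choices, not the notation used in the notes themselves.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    # X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)
    seq_len, d_model = X.shape
    d_head = d_model // n_heads          # assumes d_model % n_heads == 0
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # split each projection into heads: (n_heads, seq_len, d_head)
    split = lambda M: M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # scaled dot-product attention, computed independently per head
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores) @ Vh         # (n_heads, seq_len, d_head)
    # concatenate the heads and mix them with the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))              # 5 tokens, d_model = 8
Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) for _ in range(4))
print(multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads=2).shape)  # (5, 8)
```

Each head applies its own query, key, and value projections, so different heads are free to specialize in different patterns around the same token before their outputs are concatenated and mixed by Wo.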
Single-head self-attention
In the simple attention mechanism, the attention weights are computed deterministically from the input context. We call the combination of context-free embedding (eg…
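As a rough sketch of the deterministic computation the teaser describes, here is single-head scaled dot-product self-attention in NumPy; the variable names and shapes are assumptions for illustration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X holds the context-free token embeddings, shape (seq_len, d).
    # The attention weights below are a deterministic function of X:
    # nothing is sampled or gated, only matrix products and a softmax.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # context-aware embeddings
```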
Transformers and Self-Attention
For an explanation of decoder-based architectures such as those used by GPT, please see the repo https://github.com/pantelis/femtotransformers and the embedded comments…
Vision Transformer
This section describes the Vision Transformer (ViT) architecture, which is a transformer-based model for image classification. The ViT architecture was introduced in the…
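A minimal sketch, assuming a NumPy setting, of the patch-embedding step that lets a transformer consume images: the image is carved into non-overlapping patches and each flattened patch is linearly projected to a token embedding. The function and parameter names here are hypothetical, not taken from the tutorial.

```python
import numpy as np

def image_to_patch_embeddings(img, patch, W_embed):
    # img: (H, W, C) with H and W divisible by `patch`;
    # W_embed: (patch * patch * C, d_model) linear projection.
    H, W, C = img.shape
    # carve the image into non-overlapping patch x patch tiles,
    # then flatten each tile into a single vector
    tiles = (img.reshape(H // patch, patch, W // patch, patch, C)
                .transpose(0, 2, 1, 3, 4)
                .reshape(-1, patch * patch * C))   # (n_patches, patch*patch*C)
    return tiles @ W_embed                         # (n_patches, d_model)

rng = np.random.default_rng(0)
img = rng.normal(size=(224, 224, 3))               # ImageNet-sized input
W_embed = rng.normal(size=(16 * 16 * 3, 64))       # 16x16 patches, d_model = 64
print(image_to_patch_embeddings(img, 16, W_embed).shape)  # (196, 64)
```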
Vision Transformer Paper
Word2Vec Embeddings
In so-called classical NLP, words were treated as atomic symbols, e.g. hotel, conference, walk, and they were represented with one-hot encoded (sparse) vectors, e.g.
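To illustrate the contrast the teaser sets up, here is a toy sketch of one-hot (sparse, mutually orthogonal) vectors versus the dense embedding table that word2vec learns; the three-word vocabulary and the 4-dimensional embedding size are made-up assumptions.

```python
import numpy as np

vocab = ["hotel", "conference", "walk"]
idx = {w: i for i, w in enumerate(vocab)}

# classical NLP: one-hot encoded (sparse) vectors, one dimension per word;
# every pair of distinct words is orthogonal, so no similarity is captured
one_hot = np.eye(len(vocab))
print(one_hot[idx["hotel"]])              # [1. 0. 0.]

# word2vec instead learns a dense, low-dimensional vector per word; here we
# only initialize a random table of the shape such a model would train
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 4))      # (vocab_size, embedding_dim)
print(E[idx["walk"]])                     # a dense 4-dimensional vector
```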