Engineering AI Agents
Vision Transformer Paper
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale