Multimodal Reasoning
Grounding language in perception.
Author: Pantelis Monogioudis
Contents
Text Tokenization
Word2Vec Embeddings
Word2Vec from scratch
Word2Vec Tensorflow Tutorial
Language Models
Transformers and Self-Attention
Single-head self-attention
Multi-head self-attention
Positional Embeddings
Batch Normalization (BN)
Layer Normalization (LN)
Vision Transformer
Vision Transformer Paper
Vision Transformer Tutorial
CLIP: Connecting Text and Images
Language Models
These notes borrow heavily from the CS224N set of notes on Language Models.
Multi-head self-attention
Earlier we saw examples with the token bear appearing in multiple grammatical patterns that also influence its meaning. To capture such multiplicity we can use multiple…
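To make the idea concrete, below is a minimal NumPy sketch of multi-head self-attention under stated assumptions: the helper name softmax, the weight shapes, and the requirement that d_model be divisible by n_heads are illustrative choices, not the notation used in the notes themselves.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    # X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)
    seq_len, d_model = X.shape
    d_head = d_model // n_heads          # assumes d_model % n_heads == 0
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # split each projection into heads: (n_heads, seq_len, d_head)
    split = lambda M: M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # scaled dot-product attention, computed independently per head
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores) @ Vh         # (n_heads, seq_len, d_head)
    # concatenate the heads and mix them with the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))              # 5 tokens, d_model = 8
Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) for _ in range(4))
print(multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads=2).shape)  # (5, 8)
```

Each head applies its own query, key, and value projections, so different heads are free to specialize in different patterns around the same token before their outputs are concatenated and mixed by Wo.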
Single-head self-attention
In the simple attention mechanism, the attention weights are computed deterministically from the input context. We call the combination of context-free embedding (eg…
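As a rough sketch of the deterministic computation the teaser describes, here is single-head scaled dot-product self-attention in NumPy; the variable names and shapes are assumptions for illustration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X holds the context-free token embeddings, shape (seq_len, d).
    # The attention weights below are a deterministic function of X:
    # nothing is sampled or gated, only matrix products and a softmax.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # context-aware embeddings
```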
Transformers and Self-Attention
For an explanation of decoder-based architectures such as those used by GPT, please see the repo https://github.com/pantelis/femtotransformers and the embedded comments…
Vision Transformer
This section describes the Vision Transformer (ViT) architecture, which is a transformer-based model for image classification. The ViT architecture was introduced in the…
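A minimal sketch, assuming a NumPy setting, of the patch-embedding step that lets a transformer consume images: the image is carved into non-overlapping patches and each flattened patch is linearly projected to a token embedding. The function and parameter names here are hypothetical, not taken from the tutorial.

```python
import numpy as np

def image_to_patch_embeddings(img, patch, W_embed):
    # img: (H, W, C) with H and W divisible by `patch`;
    # W_embed: (patch * patch * C, d_model) linear projection.
    H, W, C = img.shape
    # carve the image into non-overlapping patch x patch tiles,
    # then flatten each tile into a single vector
    tiles = (img.reshape(H // patch, patch, W // patch, patch, C)
                .transpose(0, 2, 1, 3, 4)
                .reshape(-1, patch * patch * C))   # (n_patches, patch*patch*C)
    return tiles @ W_embed                         # (n_patches, d_model)

rng = np.random.default_rng(0)
img = rng.normal(size=(224, 224, 3))               # ImageNet-sized input
W_embed = rng.normal(size=(16 * 16 * 3, 64))       # 16x16 patches, d_model = 64
print(image_to_patch_embeddings(img, 16, W_embed).shape)  # (196, 64)
```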
Vision Transformer Paper
Word2Vec Embeddings
In so-called classical NLP, words were treated as atomic symbols, e.g. hotel, conference, walk, and they were represented with one-hot encoded (sparse) vectors, e.g.
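To illustrate the contrast the teaser sets up, here is a toy sketch of one-hot (sparse, mutually orthogonal) vectors versus the dense embedding table that word2vec learns; the three-word vocabulary and the 4-dimensional embedding size are made-up assumptions.

```python
import numpy as np

vocab = ["hotel", "conference", "walk"]
idx = {w: i for i, w in enumerate(vocab)}

# classical NLP: one-hot encoded (sparse) vectors, one dimension per word;
# every pair of distinct words is orthogonal, so no similarity is captured
one_hot = np.eye(len(vocab))
print(one_hot[idx["hotel"]])              # [1. 0. 0.]

# word2vec instead learns a dense, low-dimensional vector per word; here we
# only initialize a random table of the shape such a model would train
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 4))      # (vocab_size, embedding_dim)
print(E[idx["walk"]])                     # a dense 4-dimensional vector
```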