Engineering AI Agents
  • BOOK
    • Foundations
    • Training Deep Networks
    • Perception
    • State Estimation
    • Large Language Models
    • Multimodal Reasoning
    • Planning
    • Markov Decision Processes
    • Reinforcement Learning
  • COURSES
    • Introduction to AI
    • AI for Robotics
    • Deep Learning for Computer Vision
    • DATA MINING - BEING PORTED
  • ABOUT ME

Welcome!

Learn the concepts and engineer AI agents with real-time perception and language understanding abilities.

  • Use Jupyter notebooks to learn the concepts from scratch.
  • Simulate AI agents with egomotion using the Robot Operating System (ROS2).
  • Build real-time perception pipelines.
  • Use AIOps tools to engineer data pipelines that create large training datasets for LLMs & LVMs.

Start Here

Topics

  • Foundations
  • Perception
  • LLMs
  • Logical Reasoning
  • Planning
  • Acting
  • Reinforcement Learning

ai agents

AI Agents
Agents

We will cover the agent-environment interface, rational and learning agent architectures, and a practical example of a robotic agent.

Pantelis Monogioudis
Oct 20, 2022

A systems approach to AI
As is evident from all existing approaches towards AI, it is a multidisciplinary science that aims to create agents that can think and act humanly or rationally. This course starts…

The four approaches towards AI
A 5-min behavioral intelligence test, where an interrogator chats with the player and at the end guesses whether the conversation is with a human or a programmed machine.…

nautical analogy

Rules, rule the world
Agents

We provide a historical perspective on AI development and the role of rules in mission-critical systems.

Pantelis Monogioudis
Oct 20, 2021

Data Science 360
What are the disciplines that we need to cross-fertilize to get a system that possesses intelligence?

ai architecture

The four approaches towards AI
Agents

Ultimately AI will be a cloud of reasoning systems.

Pantelis Monogioudis
Feb 21, 2022

The Learning Problem
Statistical Learning Theory

The supervised learning problem statement.


classification-detection

Introduction to Scene Understanding
In the previous chapters we treated the perception subsystem starting from the first principles that govern supervised learning through to the deep learning…
 
Mask R-CNN Semantic Segmentation
The semantic segmentation approach described in this section is based on the Mask R-CNN paper. Mask R-CNN is an extension of Faster R-CNN that adds a mask head to the detector. The mask…

Detectron2 Beginner’s Tutorial
Welcome to detectron2! This is the official colab tutorial of detectron2. Here, we will go through some basic usage of detectron2, including the following: * Run inference…

Mask R-CNN: A detailed guide with Detectron2
Welcome to the Mask R-CNN with Detectron2 tutorial!

MaskRCNN Inference
import torch
from torchvision import datasets, transforms, models, ops, io
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
from torchvision.models.detec…

Mask R-CNN Demo
A quick intro to using the pre-trained model to detect and segment objects.

Mask R-CNN - Inspect Training Data
Inspect and visualize data loading and pre-processing code.

Mask R-CNN - Inspect Trained Model
Code and visualizations to test, debug, and evaluate the Mask R-CNN model.

Mask R-CNN - Inspect Weights of a Trained Model
This notebook includes code and visualizations to test, debug, and evaluate the Mask R-CNN model.
 
UNet Semantic Segmentation

Finetuning Language Models for Text Classification - Patent Dataset
This notebook was submitted by NYU student Sky Achitoff
Pantelis Monogioudis

national-library-greece

Natural Language Processing
“You shall know a word by the company it keeps” (J. R. Firth 1957: 11) - many modern discoveries are in fact rediscoveries of ideas from other works, sometimes decades old. NLP is…
 
Language Models Workshop
The following notebook is a from-scratch attempt at character-level language modeling. It is instructive for you to go through it first and then go through the corresponding…
 
CNN Language Model
The following was developed by Harini Appansrinivasan, NYU as part of an assignment submission.

language-model-google-search

Language Models
These notes borrow heavily from the CS224N set of notes on Language Models.
 
LLM Inference

LSTM Language Model from scratch
This notebook was borrowed from Christina Kouridis’ github. The notation is different from the notation used in the LSTM section of the notes and will be changed in a next…

rnn-language-model

RNN Language Models
When we focus on making predictions based on a fixed window of context (i.e. the \(n\) previous words), in some cases, the window may not be sufficient to capture the…
 
Example of an RNN Language Model
Our aim is to predict the next character given a set of previous characters from our data string. For our RNN implementation, we would take a sequence of length 25…
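
A minimal sketch (with a made-up data string; names like char_to_ix are purely illustrative) of how such input/target pairs are built for a 25-character window:

data = "hello world, this is a tiny corpus for a character level model. " * 4
seq_length = 25
chars = sorted(set(data))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

# Each training example pairs 25 input characters with the 25 characters
# that follow them, i.e. the targets are the inputs shifted by one.
p = 0
inputs  = [char_to_ix[ch] for ch in data[p:p + seq_length]]
targets = [char_to_ix[ch] for ch in data[p + 1:p + seq_length + 1]]
print(inputs[:5], targets[:5])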

pos-classification

Introduction to NLP Pipelines
In this chapter, we will introduce the topic of processing language in general with neural architectures. This includes natural language, code, etc.

Text Tokenization
In earlier chapters we have limited the discussion to tokenizers that either produce a list of words or a list of characters. It is very important though to understand the…

distributional-similarity

Word2Vec Embeddings
In so-called classical NLP, words were treated as atomic symbols, e.g. hotel, conference, walk, and they were represented with one-hot encoded (sparse) vectors, e.g.
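
A tiny numpy illustration of the contrast (a hypothetical five-word vocabulary and a 3-dimensional embedding, purely for illustration): one-hot vectors treat every pair of distinct words as equally unrelated, while dense embeddings can place related words near each other.

import numpy as np

vocab = ["the", "a", "hotel", "conference", "walk"]   # hypothetical vocabulary
one_hot = np.zeros(len(vocab))
one_hot[vocab.index("hotel")] = 1.0   # sparse, atomic symbol: [0 0 1 0 0]

# Dot products between distinct one-hot vectors are always 0,
# so they carry no notion of similarity.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), 3))   # 3-dim, for illustration
dense = embedding_matrix[vocab.index("hotel")]        # learned, dense vector
print(one_hot, dense)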
 
Word2Vec from scratch
This self-contained implementation is instructive and you should go through it to understand the word2vec embedding.
 
Word2Vec Tensorflow Tutorial
word2vec is not a single algorithm; rather, it is a family of model architectures and optimizations that can be used to learn word embeddings from large datasets.…

rosetta-stone

RNN-based Neural Machine Translation
These notes borrow heavily from the CS229N 2019 set of notes on NMT.

The BLEU Score
In 2002, IBM researchers developed the BiLingual Evaluation Understudy (BLEU) metric that remains, with its many variants to this day, one of the most quoted metrics for…
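
A from-scratch sketch of the core computation (uniform n-gram weights, a single reference, and crude smoothing; an illustrative simplification rather than the exact IBM formulation):

import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())            # clipped n-gram counts
        precisions.append(max(overlap, 1e-9) / max(sum(cand.values()), 1))
    log_avg = sum(math.log(p) for p in precisions) / max_n
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(log_avg)

print(bleu("the cat sat on the mat".split(), "the cat is on the mat".split()))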

Attention in RNN-based NMT
When you hear the sentence “the soccer ball is on the field,” you don’t assign the same importance to all 7 words. You primarily take note of the words “ball,” “on,” and “field…
 
Character-level recurrent sequence-to-sequence model
Author: fchollet
Date created: 2017/09/29
Last modified: 2020/04/26
Description: Character-level recurrent sequence-to-sequence model.

The Annotated Transformer
Attention is All You Need
 
Understanding the Division by √d in the Attention Mechanism
In this notebook, we explore why the dot-product attention mechanism includes a scaling factor of \(1/\sqrt{d}\). We use an example with embedding dimension \(d = 4\), sequence length…
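
A quick numpy check of the motivation (assuming i.i.d. standard-normal query and key entries): the variance of a raw dot product grows linearly with \(d\), while dividing by \(\sqrt{d}\) keeps it near 1 and the softmax out of its saturated regime.

import numpy as np

rng = np.random.default_rng(0)
for d in (4, 64, 1024):
    q = rng.normal(size=(10_000, d))      # 10k random query vectors
    k = rng.normal(size=(10_000, d))      # 10k random key vectors
    raw = (q * k).sum(axis=1)             # raw dot products q . k
    scaled = raw / np.sqrt(d)             # the 1/sqrt(d) factor in attention
    print(f"d={d:4d}  var(raw)={raw.var():7.1f}  var(scaled)={scaled.var():.2f}")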

Multi-head self-attention
Earlier we have seen examples with the token bear being in multiple grammatical patterns that also influence its meaning. To capture such multiplicity we can use multiple…
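
A numpy sketch of the mechanism (dimensions, head count, and random weights are illustrative assumptions): the model dimension is split across heads, each head attends independently, and the outputs are concatenated and projected.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (seq, d_model); all weight matrices: (d_model, d_model)."""
    seq, d_model = X.shape
    d_head = d_model // n_heads
    # Project, then split the model dimension into n_heads heads.
    Q = (X @ Wq).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    heads = softmax(scores) @ V                           # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ Wo                                    # output projection

rng = np.random.default_rng(0)
seq, d_model, n_heads = 5, 8, 2
X = rng.normal(size=(seq, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads).shape)   # (5, 8)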
 
Positional Embeddings
In the RNN architectures, the decoder state at time step \(t\) was a function of the decoder state at time step \(t-1\) and the input token at time step \(t\). In other…
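
A numpy sketch of the sinusoidal scheme from the Transformer paper, which injects the absolute position information that the attention layers otherwise lack:

import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(...)."""
    pos = np.arange(seq_len)[:, None]           # (seq, 1)
    i = np.arange(d_model // 2)[None, :]        # (1, d/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                # even dimensions
    pe[:, 1::2] = np.cos(angles)                # odd dimensions
    return pe

print(sinusoidal_positions(seq_len=6, d_model=8).shape)   # (6, 8)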

Scaling
import numpy as np
import matplotlib.pyplot as plt

# Creating an 8-element numpy vector with random gaussian values
# vector = np.random.randn(8)
vector = np.array([0.17148…

Single-head self-attention
In the simple attention mechanism, the attention weights are computed deterministically from the input context. We call the combination of context-free embedding (e.g.…
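
A numpy sketch of the single-head computation (random weights and small dimensions, purely illustrative): the attention weights are a softmax over scaled query-key dot products, and the output mixes the value vectors accordingly.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single head: X is (seq, d); the weight matrices are (d, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # attention weights
    return weights @ V                                  # contextual embeddings

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))                 # 4 tokens, d = 8 (illustrative)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)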

Transformers and Self-Attention
For the explanation of decoder-based architectures such as those used by GPT, please see the repo https://github.com/pantelis/femtotransformers and the embedded comments…

    abandoned-backpack

    Automated Reasoning
    In an earlier chapter, where we introduced a dynamical system governing the state evolution of the environment, we saw that a state is composed of variables and such fact…
     
    Logical Agents
    In this chapter we see how agents equipped with the ability to represent internally the state of the environment and reason about the effectiveness of possible actions using…

    wumpus-entailment

    Logical Inference
    The wumpus world, despite its triviality, contains some deeper abstractions that are worth summarizing.

    wumpus-world

    World Models
    For each problem we can define a number of world models each representing every possible state (configuration) of the environment that the agent may be in.

    Automated Planning
    Planning combines two major areas of AI: logic and search.

    Specifying the engine name
    The unified_planning.plot package provides useful functions to visually plot many objects

    The Unified Planning Library
    In this demo we will scratch the surface of the UP: we will set it up, manually create the blocksworld domain using the UP API, and create a problem using a bit of Python…

    Planning Domain Definition Language (PDDL)
    In the chapter on propositional logic we saw the combinatorial explosion problem that results from the need to include in the reasoning / inference step all possible…
     
    Logistics Planning in PDDL
    We will use the Logistics domain to illustrate how to represent a planning task in PDDL.

    Manufacturing Robot Planning in PDDL
    This is a real case that we tackled for a manufacturing company. This company devises supply chains to make pieces of medical equipment. A supply chain consists of…

    The A* Algorithm
    Dijkstra’s algorithm is closely related to the Uniform Cost Search algorithm; in fact they are logically equivalent, as the algorithm uniformly explores all nodes that…
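
    A compact sketch of \(A^*\) (a hypothetical 5x5 grid with a Manhattan-distance heuristic; with \(h \equiv 0\) it degenerates to uniform cost search, i.e. Dijkstra):

    import heapq

    def astar(blocked, start, goal, size=5):
        """A* on a 4-connected size x size grid; blocked is a set of cells."""
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # admissible heuristic
        frontier = [(h(start), 0, start, [start])]                # (f, g, node, path)
        seen = set()
        while frontier:
            f, g, node, path = heapq.heappop(frontier)
            if node == goal:
                return path
            if node in seen:
                continue
            seen.add(node)
            r, c = node
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in blocked:
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
        return None

    print(astar(blocked={(1, 1), (1, 2), (1, 3)}, start=(0, 0), goal=(4, 4)))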

    depth-first

    Forward Search Algorithms
    If you are missing some algorithmic background, fear not. There is an excellent free book to help you with the background behind this chapter. Read Chapters 3 and 4 for…

    Planning with Search
    In the PDDL section we saw that a sequence of actions that the agent needs to execute to reach the goal can be obtained using domain-independent planners. This section…

    \(A^*\) Interactive Demo
    This demo illustrates the various search algorithms we will cover here. You can use your mouse to introduce obstacles in the canvas and see how the various search…

    state-value-tree

    Bellman Expectation Backup
    In this section we describe how to calculate the value functions by establishing a recursive relationship similar to the one we derived for the return. We replace the…
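
    For reference, the Bellman expectation backup for the state-value function, in the notation used throughout these notes:

    \[ v_\pi(s) = \sum_{a} \pi(a|s) \sum_{s', r} p(s', r | s, a) \left[ r + \gamma v_\pi(s') \right] \]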

    Bellman Optimality Backup
    Now that we can calculate the value functions efficiently via the Bellman expectation recursions, we can solve the MDP, which requires maximizing either of the two…
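
    The corresponding Bellman optimality backups replace the expectation over the policy with a maximization over actions:

    \[ v_*(s) = \max_a \sum_{s', r} p(s', r | s, a) \left[ r + \gamma v_*(s') \right] \qquad q_*(s, a) = \sum_{s', r} p(s', r | s, a) \left[ r + \gamma \max_{a'} q_*(s', a') \right] \]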

    Policy Iteration
    In this section we start developing dynamic programming algorithms that solve a perfectly known MDP. In the Bellman expectation backup section we have derived the equations…
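
    A minimal tabular sketch of the two interacting steps (not from the notes; it assumes a transition model stored as P[s][a] = list of (prob, next_state, reward) tuples, a hypothetical format):

    import numpy as np

    def policy_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
        policy, V = np.zeros(n_states, dtype=int), np.zeros(n_states)
        while True:
            # 1. Policy evaluation: sweep the Bellman expectation backup.
            while True:
                delta = 0.0
                for s in range(n_states):
                    v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
                    delta, V[s] = max(delta, abs(v - V[s])), v
                if delta < tol:
                    break
            # 2. Policy improvement: act greedily with respect to V.
            stable = True
            for s in range(n_states):
                q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                     for a in range(n_actions)]
                best = int(np.argmax(q))
                if best != policy[s]:
                    policy[s], stable = best, False
            if stable:
                return policy, V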

    Policy Iteration Gridworld
    This notebook implements policy iteration for the classic 4x3 grid world example in Artificial Intelligence: A Modern Approach, Figure 17.2.
     
    Policy Iteration
    # Uncomment to run the code locally 
    # !git clone https://github.com/dennybritz/reinforcement-learning.git reinforcement_learning
    Cloning into…

    Value Iteration
    We have already seen, in the Gridworld example of the policy iteration section, that we may not need to reach the optimal state value function \(v_*(s)\) to obtain an…
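
    A minimal sketch (same hypothetical P[s][a] transition format as the policy iteration sketch above) that sweeps the Bellman optimality backup directly, skipping the full evaluation step:

    import numpy as np

    def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
        V = np.zeros(n_states)
        while True:
            delta = 0.0
            for s in range(n_states):
                best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                           for a in range(n_actions))
                delta, V[s] = max(delta, abs(best - V[s])), best
            if delta < tol:
                break
        # Extract the greedy policy from the converged value function.
        policy = np.array([int(np.argmax([sum(p * (r + gamma * V[s2])
                           for p, s2, r in P[s][a]) for a in range(n_actions)]))
                           for s in range(n_states)])
        return policy, V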
     
    Value Iteration Gridworld
    Two

    Value vs Policy Iteration for a trivial two state MDP
    State | Action A | Action B
    \(s_1\) | \(s_2\), +2 | \(s_1\), +0
    \(s_2\) | \(s_1\), +0 | \(s_2\), +0
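
    Running value iteration on this table (a quick sketch assuming a discount factor \(\gamma = 0.9\), which is not specified above) shows that choosing Action A in both states is optimal:

    # Deterministic transitions from the table: P[s][action] = (next_state, reward)
    P = {0: {"A": (1, 2.0), "B": (0, 0.0)},    # s1: A -> s2 (+2), B -> s1 (+0)
         1: {"A": (0, 0.0), "B": (1, 0.0)}}    # s2: A -> s1 (+0), B -> s2 (+0)
    gamma, V = 0.9, [0.0, 0.0]
    for _ in range(200):                       # synchronous value iteration sweeps
        V = [max(r + gamma * V[s2] for s2, r in P[s].values()) for s in (0, 1)]
    print([round(v, 3) for v in V])            # -> [10.526, 9.474]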

    Markov Decision Processes
    Many of the algorithms presented here like policy and value iteration have been developed in older repos such as this and this. This site is being migrated to be compatible…

    agent-env-interface

    Introduction to MDP
    We start by reviewing the agent-environment interface with this evolved notation and provide additional definitions that will help in grasping the concepts behind DRL. We…
     
    Jack’s Car Rental
    https://github.com/zy31415/jackscarrental

    Cleaning Robot - Deterministic MDP

    Cleaning Robot - Stochastic MDP
    The following code shows the estimation of the q value function for a policy, the optimal q_star, and the optimal policy for the cleaning robot problem in the stochastic case.

    Non-deterministic outcomes

    Finding optimal policies in Gridworld

    POMDP Example
    source

    recycling-robot-fsm

    Applying the Bellman Optimality Backup
    Finite state machine of a recycling robot and the MDP dynamics LUT

    marginal-value-multi-class

    Optimal Capacity Control
    In this section we outline a capacity control policy that is routinely used in various industries (airlines, car rentals, hospitality) to make reservations towards a…
     
    # aima_gridworld_env.py
    
    import gymnasium as gym
    from gymnasium import spaces
    from minigrid.core.grid import Grid
    from minigrid.minigrid_env import MiniGridEnv
    
    class AIMAGr…

    policy-evaluation-tree

    Policy Evaluation (Prediction)
    The policy \(\pi\) is evaluated when we have produced the state-value function \(v_\pi(s)\) for all states. In other words, when we know the expected discounted returns that…
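
    Iterative policy evaluation applies the Bellman expectation backup repeatedly, with iteration index \(k\):

    \[ v_{k+1}(s) = \sum_a \pi(a|s) \sum_{s', r} p(s', r | s, a) \left[ r + \gamma v_k(s') \right] \]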
     
    Policy Improvement (Control)
    In the policy improvement step we are given the value function and simply apply the greedy heuristic to it.
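
    Concretely, the greedy improvement step picks, in each state, the action that looks best one step ahead under the current value function:

    \[ \pi'(s) = \arg\max_a \sum_{s', r} p(s', r | s, a) \left[ r + \gamma v_\pi(s') \right] \]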

      Reinforcement Learning
      We started by looking at different agent behavior architectures, beginning with planning agents, where the model of the environment is known and there is no interaction with it…

      Generalized Policy Iteration
      As we saw in the dynamic programming (DP) solution to the MDP problem, policy iteration is an algorithm that consists of two simultaneous, interacting processes: one making the…

      \(\epsilon\)-greedy Monte-Carlo (MC) Control
      In this section we outline methods that can result in optimal policies when the MDP is unknown and we need to learn its underlying functions / models - also known as the mode…

      sarsa-gridworld

      SARSA Gridworld Example
      SARSA Gridworld

      The SARSA Algorithm
      SARSA implements a \(Q(s,a)\) value-based GPI and follows naturally as an enhancement of the \(\epsilon\)-greedy policy improvement step of MC control.
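
      The update rule behind the algorithm uses the quintuple \((S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1})\) that gives SARSA its name:

      \[ Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right] \]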
       
      Policy Gradient - Pong Game
      """ Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
      import numpy as np
      import pickle  # cPickle in the original Python 2 script
      import gym
      
      # hyperparameters
      H = 200 #…

      Policy Gradient Algorithms - REINFORCE
      Given that RL can be posed as an MDP, in this section we continue with a policy-based algorithm that learns the policy directly by optimizing the objective function and can…

      Monte-Carlo Prediction
      In this chapter we find optimal policy solutions when the MDP is unknown and we need to learn its underlying value functions - also known as the model free prediction…

      Example of \(Q(s,a)\) Prediction
      Suppose an agent is learning to play the toy environment shown above. This is essentially a corridor and the agent has to learn to navigate to the end of the corridor to…

      MC vs. TD(0)
      It is instructive to see the difference between MC and TD approaches in the following example.

      Temporal Difference (TD) Prediction
      If one had to identify one idea as central and novel to reinforcement learning, it would undoubtedly be temporal-difference (TD) learning. TD learning is a combination of…
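
      The TD(0) update bootstraps: instead of waiting for the full Monte-Carlo return, it moves \(V(S_t)\) toward the one-step target \(R_{t+1} + \gamma V(S_{t+1})\):

      \[ V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right] \]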