Simple RNN Language Model


Our aim is to predict the next character given the previous characters of our data string. In our RNN implementation, we feed in a sequence of 25 characters and, at each position, predict the character that follows.

The notation used here was first introduced here. This minimal character-level vanilla RNN model was originally written by Andrej Karpathy (@karpathy) and was annotated with the forward and backprop equations by students of the CS-GY-6613 course as part of an assignment.

data = 'Chios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and arid, with a ridge of mountains running the length of the island. The two largest of these mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The center of the island is divided between east and west by a range of smaller peaks, known as Provatas.'

# you can also replace the data with any other txt file you want to use
# data = open('input.txt', 'r').read() # should be simple plain text file - you can use any (small) file in txt format from the web or type your own. 
import numpy as np
# creating a vocabulary of unique characters
chars = list(set(data))                                                   
data_size, vocab_size = len(data), len(chars)
print('data has %d characters, %d unique.' % (data_size, vocab_size))
data has 508 characters, 43 unique.
# Data pre-processing
# creating a dictionary, mapping characters to index and index to characters
char_to_ix = { ch:i for i,ch in enumerate(chars) }
print(char_to_ix)
ix_to_char = { i:ch for i,ch in enumerate(chars) }
print(ix_to_char)
{'9': 0, '(': 1, ')': 2, 'q': 3, 's': 4, ',': 5, 'a': 6, 'd': 7, 'P': 8, 'E': 9, 'c': 10, 'e': 11, '4': 12, '0': 13, '.': 14, ' ': 15, 'p': 16, 'm': 17, 'f': 18, 'k': 19, 'o': 20, 'l': 21, 'b': 22, '2': 23, '3': 24, 'v': 25, 'y': 26, 'n': 27, 'u': 28, '5': 29, 'h': 30, 't': 31, '7': 32, 'r': 33, 'i': 34, 'T': 35, 'C': 36, 'w': 37, '[': 38, ']': 39, '8': 40, 'g': 41, '1': 42}
{0: '9', 1: '(', 2: ')', 3: 'q', 4: 's', 5: ',', 6: 'a', 7: 'd', 8: 'P', 9: 'E', 10: 'c', 11: 'e', 12: '4', 13: '0', 14: '.', 15: ' ', 16: 'p', 17: 'm', 18: 'f', 19: 'k', 20: 'o', 21: 'l', 22: 'b', 23: '2', 24: '3', 25: 'v', 26: 'y', 27: 'n', 28: 'u', 29: '5', 30: 'h', 31: 't', 32: '7', 33: 'r', 34: 'i', 35: 'T', 36: 'C', 37: 'w', 38: '[', 39: ']', 40: '8', 41: 'g', 42: '1'}
https://drive.google.com/uc?id=12ha59vACcd8eCEPAQdbZ4-axPGPKnn7F

Inputs to RNN

  • \(x_1\) to \(x_{25}\) is the input sequence of 25 characters, one character given as input to RNN at each time step

Hidden state of RNN

  • The state consists of a single ‘hidden’ vector h

  • At every time step, a recurrence function \(f_W\) with parameters \(W_{xh}\), \(W_{hh}\) and \(b_h\) is applied to the input \(x_t\) and the output from the previous hidden state \(h_{t-1}\), to generate \(h_t\)

\( \qquad \qquad \qquad \qquad h_t = f_W (h_{t-1},x_t)\)

\( \qquad \qquad \qquad \qquad \; \; \; \; = \tanh (W_{hh}h_{t-1} + W_{xh}x_t + b_h)\)
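
As a minimal sketch of the recurrence above, the snippet below applies one step of \(f_W\) to a one-hot input. The sizes (43 and 100) are the ones used later in this notebook; the function name `rnn_step`, the token index 7 and the fixed random seed are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, chosen only for reproducibility
vocab_size, hidden_size = 43, 100     # sizes used later in this notebook

# small random weights and a zero bias (illustrative initialisation)
Wxh = rng.standard_normal((hidden_size, vocab_size)) * 0.01
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
bh = np.zeros((hidden_size, 1))

def rnn_step(h_prev, x):
    """One application of h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
    return np.tanh(Whh @ h_prev + Wxh @ x + bh)

x = np.zeros((vocab_size, 1))
x[7] = 1                              # one-hot input for an arbitrary token index
h0 = np.zeros((hidden_size, 1))       # initial hidden state
h1 = rnn_step(h0, x)
print(h1.shape)                       # (100, 1)
```

With a zero initial state and a one-hot input, the new hidden state is simply the tanh of one column of \(W_{xh}\), which is a quick way to convince yourself that the shapes line up.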

Outputs of the RNN

  • \(\hat y_T\) is the character that our network would predict after \(T=25\) time steps

  • At each time step, an output \(o_t\) is calculated as

\( \qquad \qquad \qquad \qquad o_t = W_{hy}h_t + b_y\)

  • The softmax of \(o_t\) gives the probability of each unique character in the data being the next character

\( \qquad \qquad \qquad \qquad \hat y_t = \mathtt{softmax}(o_t)\)

  • At each time step, from \(t=1\) to \(25\), loss is calculated from the set of predicted probabilities.

\( \qquad \qquad \qquad \qquad L_t = \mathtt{CE}(\hat y_t, y_t)\)

\( \qquad \) where \(y_t\) is the character that follows \(x_t\) in the data string

  • The total loss is the sum of the per-step losses over the unrolled sequence

\( \qquad \qquad \qquad \qquad L = \sum_{t=1}^{25}L_t\)

All the weights \(W_{xh}\), \(W_{hh}\), \(b_h\), \(W_{hy}\) and \(b_y\) are reused at each time step.
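
The softmax and cross-entropy steps above can be sketched on a toy example. The three-character vocabulary and the logit values are made up for illustration, and subtracting the maximum before exponentiating is a common numerical-stability trick that is swapped in here; the notebook code below exponentiates the raw scores directly.

```python
import numpy as np

def softmax(o):
    """Column-vector softmax; subtracting the max avoids overflow in exp."""
    e = np.exp(o - np.max(o))
    return e / np.sum(e)

def cross_entropy(y_hat, target_ix):
    """Negative log-probability assigned to the correct next character."""
    return -np.log(y_hat[target_ix, 0])

o_t = np.array([[2.0], [1.0], [0.1]])    # toy logits for a 3-character vocabulary
y_hat = softmax(o_t)                     # probabilities, summing to 1
L_t = cross_entropy(y_hat, target_ix=0)  # loss if character 0 is the true next one
print(y_hat.ravel(), L_t)
```

The loss is small when the model puts most of its probability on the correct character and grows without bound as that probability approaches zero.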

Hyperparameters

  • the size of the hidden state (the number of hidden neurons)

  • the sequence length or the time steps to unroll, which is 25 in our case

  • optimizer we use here is Adagrad

  • the learning rate for Adagrad optimizer

# hyperparameters
hidden_size = 100           # size of hidden state (number of RNN simple neurons)
seq_length = 25               # number of time steps to unroll the RNN for, taking 25 previous characters to predict the next
learning_rate = 1e-1
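
To make the role of the Adagrad learning rate concrete, here is a minimal sketch of the Adagrad update on a toy quadratic objective. The objective, the starting point and the names `w`, `mem` and `grad` are assumptions for illustration; the update rule itself has the same form as the one in the training loop of this notebook.

```python
import numpy as np

learning_rate = 1e-1
w = np.array([5.0, -3.0])              # toy parameter vector (hypothetical)
mem = np.zeros_like(w)                 # running sum of squared gradients

def grad(w):
    """Gradient of the toy objective f(w) = 0.5 * ||w||^2."""
    return w

f_start = 0.5 * np.sum(w ** 2)
for _ in range(500):
    g = grad(w)
    mem += g * g                                    # accumulate squared gradients
    w += -learning_rate * g / np.sqrt(mem + 1e-8)   # per-parameter scaled step
f_end = 0.5 * np.sum(w ** 2)
print(f_start, f_end)
```

Because each parameter's step is divided by the root of its accumulated squared gradients, the effective step size shrinks over time, and parameters that have seen large gradients move more cautiously.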

Dimensions of tensors

Input:

  • Each character from the data string is pre-processed before being fed into the RNN

  • From each sequence of 25 characters (for 25 time-steps) from the data string, we create an ‘inputs’ list of tokenized integer values

  • Each character is converted to an integer token index using the ‘char_to_ix’ dictionary, which maps each character to a number between 0 and 42 (as there are 43 unique characters in our data)

  • The integer tokens from the ‘inputs’ list are then one-hot encoded in 1-of-k representations, i.e., into vectors of size 43 (k=43 unique characters in our data), which are fed as inputs to the RNN

\( \qquad \) => dimension of input \(x_t =\) (43,1)
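
The tokenization and one-hot encoding steps above can be sketched on a short toy string. The string `"hello"` and the names `toy_chars`, `toy_char_to_ix` and `one_hot` are illustrative assumptions; the vocabulary is sorted here only so that the mapping is deterministic.

```python
import numpy as np

text = "hello"                                    # toy string, not the notebook's data
toy_chars = sorted(set(text))                     # ['e', 'h', 'l', 'o']
toy_char_to_ix = {ch: i for i, ch in enumerate(toy_chars)}

def one_hot(ix, k):
    """Return a (k, 1) column vector with a single 1 at row ix."""
    v = np.zeros((k, 1))
    v[ix] = 1
    return v

tokens = [toy_char_to_ix[ch] for ch in text]          # one integer token per character
xs = [one_hot(ix, len(toy_chars)) for ix in tokens]   # one (k, 1) vector per time step
print(tokens)                                         # [1, 0, 2, 2, 3]
print(xs[0].shape)                                    # (4, 1)
```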

Targets (\(y_t\)):

  • For each input in the ‘inputs’ list, we create a ‘target’ list consisting of the subsequent character’s integer token

  • Our targets list, which is used during the cross-entropy loss calculation, is of length 25
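
The relationship between inputs and targets is a shift by one character. A minimal sketch with a toy string and a toy sequence length of 4 (both assumptions; the notebook uses its own data string and a length of 25):

```python
toy_data = "hello world"
toy_seq_length = 4                                   # toy value, the notebook uses 25
p = 0                                                # data pointer

toy_inputs = toy_data[p:p + toy_seq_length]          # 'hell'
toy_targets = toy_data[p + 1:p + toy_seq_length + 1] # 'ello'

# each input character is paired with the character that follows it
pairs = list(zip(toy_inputs, toy_targets))
print(pairs)   # [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]
```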

Predicted output:

  • The predicted outputs are the probabilities of the next characters

  • Since there are k=43 unique characters, the vector of unnormalized logits for the next character has

\( \qquad \) => dimension of output \(o_t =\) (43,1)

  • The \(\mathtt{softmax}(o_t)\) gives the class probabilities for the next character

  • A next-character index is then chosen from these probabilities, either greedily with \(\arg \max\) or, as in the sampling code below, by drawing at random according to the probabilities

  • The chosen integer token is one-hot encoded to serve as the next input, and converted to a single character using the ‘ix_to_char’ dictionary
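
The difference between picking the most likely character and sampling from the predicted distribution can be sketched as follows. The three-character vocabulary, the probability values and the name `toy_ix_to_char` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                     # fixed seed for reproducibility
probs = np.array([[0.1], [0.6], [0.3]])            # toy softmax output, shape (3, 1)
toy_ix_to_char = {0: 'a', 1: 'b', 2: 'c'}          # hypothetical index-to-character map

greedy_ix = int(np.argmax(probs))                  # always the most likely index
sampled_ix = int(rng.choice(len(probs), p=probs.ravel()))  # random draw according to probs

print(toy_ix_to_char[greedy_ix])                   # b
print(toy_ix_to_char[sampled_ix])                  # any of a, b, c, with b most likely
```

Greedy decoding is deterministic, while sampling produces varied text and can occasionally pick a less likely character.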

Hidden layers:

  • Since we have chosen 100 neurons in the hidden layer,

\( \qquad \) => dimension of hidden state \(h_t =\) (100,1)

Model parameters:

  • Given the hidden_size=100, input x dimension=(43,1) and output y dimension=(43,1):

\( \qquad \) => dimension of \(W_{xh} =\) (100,43),

\( \qquad \) => dimension of \(W_{hh} =\) (100,100),

\( \qquad \) => dimension of \(b_{h} =\) (100,1),

\( \qquad \) => dimension of \(W_{hy} =\) (43,100),

\( \qquad \) => dimension of \(b_{y} =\) (43,1)
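
As a quick check of the dimensions listed above, the following sketch counts the trainable parameters. The dictionary `shapes` is a hypothetical helper introduced only for this calculation.

```python
hidden_size, vocab_size = 100, 43

# parameter shapes as listed above
shapes = {
    "Wxh": (hidden_size, vocab_size),
    "Whh": (hidden_size, hidden_size),
    "bh": (hidden_size, 1),
    "Why": (vocab_size, hidden_size),
    "by": (vocab_size, 1),
}

n_params = {name: rows * cols for name, (rows, cols) in shapes.items()}
total = sum(n_params.values())
print(n_params)
print(total)   # 18743
```

Most of the parameters sit in the hidden-to-hidden matrix, which grows quadratically with the hidden size.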

# model parameters
# we draw the initial weights from a normal distribution (scaled by 0.01) and set all the biases to zero

Wxh = np.random.randn(hidden_size, vocab_size)*0.01   # input to hidden, shape = (hidden_size, vocab_size) = (100,43)
Whh = np.random.randn(hidden_size, hidden_size)*0.01  # hidden to hidden, shape = (hidden_size, hidden_size) = (100,100)
Why = np.random.randn(vocab_size, hidden_size)*0.01   # hidden to output, shape = (vocab_size, hidden_size) = (43,100)
bh = np.zeros((hidden_size, 1))       # hidden bias, shape  = (hidden_size, 1) = (100,1)
by = np.zeros((vocab_size, 1))         # output bias, shape  = (vocab_size, 1) = (43,1)

Forward Pass

  • Forward through entire sequence \(x_1\) to \(x_{25}\) to compute loss

  • Calculate hidden states at each time step

\( \qquad \qquad \qquad \qquad h_t = \tanh (W_{hh}h_{t-1} + W_{xh}x_t + b_h)\)

  • Calculate the output \(o_t\)

\( \qquad \qquad \qquad \qquad o_t = W_{hy}h_t + b_y\)

  • The softmax of \(o_t\) gives the probability of each unique character in the data being the next character

\( \qquad \qquad \qquad \qquad \hat y_t = \mathtt{softmax}(o_t)\)

  • Calculate loss at each time step

\( \qquad \qquad \qquad \qquad L_t = \mathtt{CE}(\hat y_t, y_t)\)

  • Calculate the total loss, which is the negative log likelihood of our model

\( \qquad \qquad \qquad \qquad L = \sum_{t=1}^{25}L_t\)

\( \qquad \qquad \qquad \qquad \; \; \; \; = - \sum_{t=1}^{25} \log \; p_{model} (y_t | x_1,...,x_t)\)
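
A useful sanity check on the total loss is its value for a model that knows nothing, i.e. one that assigns probability 1/43 to every character at each of the 25 steps. This short calculation uses only the vocabulary size and sequence length from this notebook.

```python
import math

vocab_size, seq_length = 43, 25

per_step = -math.log(1.0 / vocab_size)   # loss of a uniform prediction at one step
total = seq_length * per_step            # summed over the unrolled sequence
print(round(total, 2))                   # 94.03
```

This is the value the training loop uses to initialise its smoothed loss, and it agrees with the loss of about 94.03 logged at iteration 0.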

Backpropagation Through Time

  • Backward through entire sequence to compute gradient

  • The nodes include parameters \(W_{xh}\), \(W_{hh}\), \(b_h\), \(W_{hy}\) and \(b_y\)

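  • The inputs and outputs of nodes are \(x_t\), \(h_t\), \(y_t\), \(p_t\) and \(L_t\) at time-step \(t\); in this section \(y_t\) denotes the unnormalized output (written \(o_t\) in the forward pass) and \(p_t\) its softmax (written \(\hat y_t\) in the forward pass)

  • We’ll use the subscript \((i)\) to indicate the \(i^{th}\) element of a vector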

Gradients on the internal nodes:

  • We’ll be computing the gradients recursively starting with the nodes immediately preceding the final loss

\( \qquad \qquad \qquad \qquad \frac{\partial L}{\partial L_t} = 1 \)

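  • The gradient with respect to the scores \(y_t\) fed into the softmax layer is the predicted probability, minus 1 at the entry of the target character:

\( \qquad \qquad \qquad \qquad \frac{\partial L}{\partial y_{(i)t}} = p_{(i)t} - \mathbb{1}[i = c_t] \)

\( \qquad \qquad \) where \(c_t\) is the index of the target character at time-step \(t\)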

  • At t=25, the gradient with respect to the hidden layer would be:

\( \qquad \qquad \qquad \qquad \frac{\partial L}{\partial h_{t=25}} = {W_{hy}}^T \frac{\partial L}{\partial y_{t=25}} \)

  • We can now iterate backwards from t=24 down to t=1:

\( \qquad \qquad \qquad \qquad \frac{\partial L}{\partial h_{t}} = \bigg(\frac{\partial h_{t+1}}{\partial h_{t}}\bigg)^T \frac{\partial L}{\partial h_{t+1}} + \bigg(\frac{\partial y_{t}}{\partial h_{t}}\bigg)^T \frac{\partial L}{\partial y_{t}} \)

\( \qquad \qquad \qquad \qquad \qquad = (W_{hh})^T \; diag\bigg(1-(h_{t+1})^2\bigg) \frac{\partial L}{\partial h_{t+1}} + (W_{hy})^T \frac{\partial L}{\partial y_{t}} \)

\( \qquad \qquad \) where \(diag\bigg(1-(h_{t+1})^2\bigg)\) indicates the diagonal matrix containing the elements \(1-(h_{(i)t+1})^2\)
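
The step from the first line to the second is the chain rule applied to the recurrence. As a short worked step, writing \(a_{t+1}\) for the pre-activation (a symbol introduced here only for illustration):

\( \qquad \qquad \qquad \qquad a_{t+1} = W_{hh}h_{t} + W_{xh}x_{t+1} + b_h, \qquad h_{t+1} = \tanh(a_{t+1}) \)

\( \qquad \qquad \qquad \qquad \frac{\partial h_{t+1}}{\partial h_{t}} = diag\bigg(1-(h_{t+1})^2\bigg) W_{hh} \qquad \) since \(\frac{d}{da}\tanh(a) = 1-\tanh^2(a)\)

\( \qquad \qquad \qquad \qquad \bigg(\frac{\partial h_{t+1}}{\partial h_{t}}\bigg)^T = (W_{hh})^T \; diag\bigg(1-(h_{t+1})^2\bigg) \qquad \) since a diagonal matrix equals its own transpose

Likewise, \(y_t = W_{hy}h_t + b_y\) gives \(\frac{\partial y_{t}}{\partial h_{t}} = W_{hy}\), which accounts for the second term.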

Gradients on the parameter nodes:

  • Gradients with respect to \(W_{hy}\):

\( \qquad \qquad \qquad \qquad \frac{\partial L}{\partial W_{hy}} = \sum_t \sum_i \frac{\partial L}{\partial y_{(i)t}} \frac{\partial y_{(i)t}}{\partial W_{hy}} \)

\( \qquad \qquad \qquad \qquad \qquad = \sum_t \frac{\partial L}{\partial y_t} (h_t)^T\)

  • Gradients with respect to \(b_y\):

\( \qquad \qquad \qquad \qquad \frac{\partial L}{\partial b_{y}} = \sum_t \bigg(\frac{\partial y_t}{\partial b_{y}}\bigg)^T \frac{\partial L}{\partial y_{t}} \)

\( \qquad \qquad \qquad \qquad \qquad = \sum_t \frac{\partial L}{\partial y_{t}} \)

  • Gradients with respect to \(W_{hh}\):

\( \qquad \qquad \qquad \qquad \frac{\partial L}{\partial W_{hh}} = \sum_t \sum_i \frac{\partial L}{\partial h_{(i)t}} \frac{\partial h_{(i)t}}{\partial W_{hh}} \)

\( \qquad \qquad \qquad \qquad \qquad = \sum_t diag\bigg(1-(h_{t})^2\bigg) \frac{\partial L}{\partial h_t} (h_{t-1})^T\)

  • Gradients with respect to \(b_h\):

\( \qquad \qquad \qquad \qquad \frac{\partial L}{\partial b_{h}} = \sum_t \bigg(\frac{\partial h_t}{\partial b_{h}}\bigg)^T \frac{\partial L}{\partial h_{t}} \)

\( \qquad \qquad \qquad \qquad \qquad = \sum_t diag\bigg(1-(h_{t})^2\bigg) \frac{\partial L}{\partial h_{t}} \)

  • Gradients with respect to \(W_{xh}\):

\( \qquad \qquad \qquad \qquad \frac{\partial L}{\partial W_{xh}} = \sum_t \sum_i \frac{\partial L}{\partial h_{(i)t}} \frac{\partial h_{(i)t}}{\partial W_{xh}} \)

\( \qquad \qquad \qquad \qquad \qquad = \sum_t diag\bigg(1-(h_{t})^2\bigg) \frac{\partial L}{\partial h_t} (x_t)^T\)
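
One way to gain confidence in the gradient formulas above is a numerical gradient check: compare the analytic gradients with central finite differences of the loss. The sketch below does this on a tiny RNN; the sizes, the toy token lists, the `params` dictionary and the function name `loss_and_grads` are assumptions for illustration, and no gradient clipping is applied because clipping would make the two quantities differ.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, T = 5, 4, 3                          # toy vocabulary, hidden size, sequence length
params = {
    "Wxh": rng.standard_normal((H, V)) * 0.1,
    "Whh": rng.standard_normal((H, H)) * 0.1,
    "Why": rng.standard_normal((V, H)) * 0.1,
    "bh": np.zeros((H, 1)),
    "by": np.zeros((V, 1)),
}
inputs, targets = [0, 3, 1], [3, 1, 4]     # toy token sequences

def loss_and_grads(p):
    """Forward pass and backpropagation through time, following the equations above."""
    xs, hs, ps = {}, {-1: np.zeros((H, 1))}, {}
    loss = 0.0
    for t in range(T):
        xs[t] = np.zeros((V, 1))
        xs[t][inputs[t]] = 1
        hs[t] = np.tanh(p["Whh"] @ hs[t - 1] + p["Wxh"] @ xs[t] + p["bh"])
        y = p["Why"] @ hs[t] + p["by"]
        e = np.exp(y - y.max())
        ps[t] = e / e.sum()
        loss += -np.log(ps[t][targets[t], 0])
    grads = {k: np.zeros_like(v) for k, v in p.items()}
    dhnext = np.zeros((H, 1))
    for t in reversed(range(T)):
        dy = ps[t].copy()
        dy[targets[t]] -= 1                      # dL/dy_t = p_t - one_hot(target)
        grads["Why"] += dy @ hs[t].T
        grads["by"] += dy
        dh = p["Why"].T @ dy + dhnext            # dL/dh_t
        dhraw = (1 - hs[t] ** 2) * dh            # diag(1 - h_t^2) dL/dh_t
        grads["bh"] += dhraw
        grads["Wxh"] += dhraw @ xs[t].T
        grads["Whh"] += dhraw @ hs[t - 1].T
        dhnext = p["Whh"].T @ dhraw
    return loss, grads

_, analytic = loss_and_grads(params)
eps, max_err = 1e-5, 0.0
for name, W in params.items():
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        loss_plus, _ = loss_and_grads(params)
        W[idx] = old - eps
        loss_minus, _ = loss_and_grads(params)
        W[idx] = old                             # restore the parameter
        numeric = (loss_plus - loss_minus) / (2 * eps)
        max_err = max(max_err, abs(numeric - analytic[name][idx]))
print(max_err)   # should be tiny if the analytic gradients are right
```

If any of the formulas were implemented with a wrong sign or a missing transpose, the maximum discrepancy would be of the same order as the gradients themselves rather than close to zero.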

def lossFun(inputs, targets, hprev):
  """
  inputs, targets are both lists of integers.
  hprev is an Hx1 array holding the initial hidden state
  performs the forward and backward pass
  returns the loss, the gradients on the model parameters, and the last hidden state
  """

  xs, hs, os, ps = {}, {}, {}, {}
  hs[-1] = np.copy(hprev)
  loss = 0

  # forward pass: compute loss going forward
  for t in range(len(inputs)):                                         # looping for t timesteps, which is the size of the length of inputs
    xs[t] = np.zeros((vocab_size,1))                                   # xs = one-hot encode in 1-of-k representation
    xs[t][inputs[t]] = 1
    hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh)    # hs_t = tanh(W_hh.hs_t-1 + W_xh.xs_t + b_h) -> hidden state
    os[t] = np.dot(Why, hs[t]) + by                                    # os = W_hy.hs_t + b_y -> unnormalized log probabilities for next chars
    ps[t] = np.exp(os[t]) / np.sum(np.exp(os[t]))                      # ps = softmax(os) -> probabilities for next chars
    loss += -np.log(ps[t][targets[t],0])                               # cross-entropy loss

  # backward pass: compute gradients going backwards
  dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)      # gradient accumulators with the same shapes as the weights
  dbh, dby = np.zeros_like(bh), np.zeros_like(by)                                    # gradient accumulators with the same shapes as the biases
  dhnext = np.zeros_like(hs[0])                                                      # gradient flowing into h_t from time step t+1; zero at the last time step
  for t in reversed(range(len(inputs))):
    dy = np.copy(ps[t])
    dy[targets[t]] -= 1                      # backprop into y by taking gradient for softmax (http://cs231n.github.io/neural-networks-case-study/#grad)
    dWhy += np.dot(dy, hs[t].T)              # gradient for Why
    dby += dy                                # gradient for by
    dh = np.dot(Why.T, dy) + dhnext          # backprop into h
    dhraw = (1 - hs[t] * hs[t]) * dh         # backprop through tanh nonlinearity
    dbh += dhraw                             # gradient for bh
    dWxh += np.dot(dhraw, xs[t].T)           # gradient for Wxh
    dWhh += np.dot(dhraw, hs[t-1].T)         # gradient for Whh
    dhnext = np.dot(Whh.T, dhraw)            # gradient passed back to h_t-1, used in the next (earlier) iteration
  for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
    np.clip(dparam, -5, 5, out=dparam)                              # clip gradients to mitigate exploding gradients
  return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]
def sample(h, seed_ix, n):
  """ 
  sample a sequence of integers from the model 
  h is memory state, seed_ix is seed letter for first time step
  at each step, computes the probabilities for the next character and draws an index from that distribution
  returns the list of sampled indices
  """
  # at test-time sample characters one at a time, feed back to model for next character prediction
  x = np.zeros((vocab_size, 1))
  x[seed_ix] = 1                                              # x = one-hot encode the input for seed_ix letter in 1-of-k representation
  ixes = []
  for t in range(n):
    # predicting the next character
    h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)         # h_t = tanh(W_hh.h_t-1 + W_xh.x_t + b_h) -> hidden state                    
    y = np.dot(Why, h) + by                                   # y = W_hy.h_t + b_y -> unnormalized log probabilities for next chars
    p = np.exp(y) / np.sum(np.exp(y))                         # p = softmax(y) -> probabilities for next chars
    ix = np.random.choice(range(vocab_size), p=p.ravel())     # draw the next index at random according to p (sampling, not argmax)
    x = np.zeros((vocab_size, 1))  
    x[ix] = 1                                                 # one-hot encode the sampled index; it becomes the next input
    ixes.append(ix)
  return ixes                                                 # return all the indices to convert them into characters and print the predictions
# p-data pointer, n-iteration counter
n, p = 0, 0                        # setting both to zero in the beginning

# memory variables for Adagrad, initialized to all zeros
mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)    
mbh, mby = np.zeros_like(bh), np.zeros_like(by) 

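# initial smoothed loss: the loss of a uniform predictor summed over seq_length steps
smooth_loss = -np.log(1.0/vocab_size)*seq_length 

# while True:
# running for up to 80000 iterations (one sequence of seq_length characters per iteration)
for i in range(80000):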

  # Data pre-processing to prepare inputs and targets
  if p+seq_length+1 >= len(data) or n == 0:                        # sweeping from left to right in steps seq_length=25 long
    hprev = np.zeros((hidden_size,1))                              # reset RNN memory
    p = 0                                                          # go from start of data
  inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]         # inputs: list of seq_length=25 integer tokens
  targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]    # targets: tokens of the character that follows each input character

  # Model testing
  if n % 1000 == 0:
    sample_ix = sample(hprev, inputs[0], 200)                   # sample from the model and predict characters every 1000 iterations
    txt = ''.join(ix_to_char[ix] for ix in sample_ix)           # convert the sampled tokens into characters and join them into one string
    print('----\n %s \n----' % (txt, ))                         # print model predictions

  # Model training
  loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev)   # forward seq_length characters through the net and fetch gradient
  smooth_loss = smooth_loss * 0.999 + loss * 0.001                            # exponential moving average of the loss, to smooth the printed values
  if n % 1000 == 0: print('iter %d, loss: %f' % (n, smooth_loss))             # print progress
  
  # parameter update with Adagrad
  for param, dparam, mem in zip([Wxh, Whh, Why, bh, by], 
                                [dWxh, dWhh, dWhy, dbh, dby], 
                                [mWxh, mWhh, mWhy, mbh, mby]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8)              # adagrad parameter update

  p += seq_length                                                       # move data pointer
  n += 1                                                                # iteration counter 
----
 0yhE4P39uyheem T2lv)gkr2b 2h)2 Pcye]C)uea,mh2b(cd(Ct3qgnP[mibm5wi.(fenaw.8e7b5r5li1P2l]ctEu3i1h( Tt8EdPrE12107Tvwd[)((fPk1]g )om[n 3t37prpl5qg8h 3d)tbgdkt3)e ]PP9Pm3u7ar9[[Cboug7q4v11ss)eg45P8uEp0geqk 
----
iter 0, loss: 94.030002
----
 hios isntwd antwees, (4,297m seth aed 2.8 (1,.8 ind ang, kmiund celon thes, anlsn and.leng cos ind om rar aad lade aed, Teea nin ite miunlinesce and mThe mannd cind be ong cot, andverrl (3E(1,29t8 e n 
----
iter 1000, loss: 67.914614
----
 hios is and in ast andaleng the ted aridge orth of the islane lest staa nt rrted or (4219t t r of owe ist or the onof thest ty anda nd as krheEnof 89  br is aid  untat, artais, (18.[29 Theed leng phes 
----
iter 2000, loss: 37.206784
----
 hios island and Epos (1,25uth  fovering ff moestaiss, Peling and. (31898 m (3,295 m (3,898 ft)), and. The cende or moftts a ridge of mountaits won (4255 th, and 29 km (18 mi).[2] The sorrennd is ast b 
----
iter 3000, loss: 18.941566
----
 hios island is crescent or (4,.289 The terrain east are witust, the long fromin the irleng covernd is st, wid in the is knd(4,298 fthe orlan ain ridge of movnnous mgestaiseng the ling and wunt is divi 
----
iter 4000, loss: 14.128117
----
 hios island is crescent or kidney shaped, 50 km (325.210 sq mi).[2] The terrain is mountainounta f Peleng sheperain is mof Pean ist, aro sides wituasesislaid (1 theg area or kndeseas, are wor kinland  
----
iter 5000, loss: 6.929551
----
 hios island is cred, 50 km (31 mT) long the length of the island. The center of the island is divided between east and. kndad aris ledeated in the norta of the islath wist by a range of smaller peaks, 
----
iter 6000, loss: 4.229557
----
 hios islaing and west by a range of sma dib and between east and west ty  length of the innd s, The center of the island is divided between east and west by a f west by a range of smaller peaks, known 
----
iter 7000, loss: 2.343361
----
 hios island is and Efom s, (m,.2k ft)) and ftresnd rertn is th, and 29 km (1, mi) at itn enof mountaend w9 km (18 m (18 mi) at its wnorth of the island. The two largest of these mountains, Pelineon (1 
----
iter 8000, loss: 1.702504
----
 hios islann or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq moustlis the island. The center of the istcerrain is  
----
iter 9000, loss: 1.119716
----
 hios islan eisland is divided between east and west by a range of smaller peaks, known as Pr The center of the of the island. The two laets an nd, mi).[2] The terrinna (m,99 mi) at its widest, coverin 
----
iter 10000, loss: 0.826397
----
 hios island is crescent of the island. The two largest of these mountains, Pelineon and Epos (1,188 m (3,898 ft)), anda frtween enouth rof two  (4,255 ft)) and Efte rorth of the cent or kidney c Perin 
----
iter 11000, loss: 0.664560
----
 hisos known as2 28a of the island Epos (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are side ridge of mountains runnon mi) 8t bt a d, 50 km (31 mi) long from north to south, and 29 km (18 mi) a 
----
iter 12000, loss: 0.565176
----
 hion ino mi) at island. The center of the island is divided between east and west by a range of smaller peaks, known as Pr douthe con getuth te south, and 29 km (18 mi) at its widest, covering an arra 
----
iter 13000, loss: 0.497673
----
 hios island is crescent of 842.289 km2 (325.210 sq mi).[2] The  mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The center of the islan 
----
iter 14000, loss: 0.447981
----
 hios island is crescent or kndeEpraino the island. The two largest of these mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The two lar 
----
iter 15000, loss: 0.409268
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long feof sorth of the island. The two largest of these mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in  
----
iter 16000, loss: 0.377955
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 17000, loss: 0.351873
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area island island. The two largest of these mountains, Pelineon (1,297 m 
----
iter 18000, loss: 0.329921
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 19000, loss: 0.310971
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long f of touthe center of the island is divided between east and west by a range or kinleresh ft of thereathe ente covering an area of 842.289  
----
iter 20000, loss: 0.294932
----
 hios i(4,.289 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and arreng an area of 842.289 km2 (325. 10 sq mi).[2] The terrain is mountainous  
----
iter 21000, loss: 0.280158
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long the length of the island. The center of the island is di) ling from north to south, and 29 km (18 mi) at its widest, covering an area of 84 
----
iter 22000, loss: 0.267610
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 8425. m4,.[2] The terrain is mountainous and arid, with a ridge o 
----
iter 23000, loss: 0.256335
----
 hios island is crescent or kidney shaped, kmountains runteins, Pelineon (1,297 m (4,255 ft)) and Eposh at ape lang the length of the island. The center of the island is divided between tast and west b 
----
iter 24000, loss: 0.246315
----
 hios island is crescent or kidney shaped, 50 km (31 mi) lonn ridge of mountains running the length of the island. The twitunon shunt is ftthe conlerarea of 842.289 km2 (325.210 sq mi).[2] The terrain  
----
iter 25000, loss: 0.237299
----
 hios island is crescent or kidney shaped, .42.289 km2e ft by a range of smaller peaks, known as Pr ind of mountains running the length of the island. The two largest of these mountains, Pelineon (1,29 
----
iter 26000, loss: 0.229051
----
 hios ithe island. The two largest of these mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The two laridge of mountains running the len 
----
iter 27000, loss: 0.221695
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 28000, loss: 0.214969
----
 hios island is Pe inland is divided between east and west by a range of smaller peaks, known as Prlanin mist its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous  
----
iter 29000, loss: 0.208756
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] the om (4,255 ft)) and Epos (m,18 
----
iter 30000, loss: 0.203045
----
 hios island is crescent or The co1[t its wist by a range of smaller peaks, known as Prinnd sfthe island. The two largest of these mountains, Perineon (1,297 m slano the island. The two largest of thes 
----
iter 31000, loss: 0.197785
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 32000, loss: 0.192925
----
 hios island is crescent of the island. The center of the island is divided between east and west by a o 25 long frof Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the no 
----
iter 33000, loss: 0.188420
----
 hios island is creslent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 34000, loss: 0.184230
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 35000, loss: 0.180323
----
 hios island is Perth of the island. The center of the island is dividwd a of t)), are situated in the north of the island. The center of the island is divided between east and west by a range of small 
----
iter 36000, loss: 0.176668
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 37000, loss: 0.173240
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 38000, loss: 0.170019
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.589 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 39000, loss: 0.166985
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 40000, loss: 0.164122
----
 hios island isl rfathese ft)), are situated in the north of the island. The center of the island is divided between east and west by a range of smaller peaks, known as Pr Toe cown is,crescent os (1,29 
----
iter 41000, loss: 0.161418
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 42000, loss: 0.158860
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 43000, loss: 0.156438
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terea north row area nouth ro 
----
iter 44000, loss: 0.154141
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long frof Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The two largest iy  rea indaind. 
----
iter 45000, loss: 0.151962
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area risland. The two largest of these mountains running the length of th 
----
iter 46000, loss: 0.149890
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 47000, loss: 0.147919
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 48000, loss: 0.146039
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 49000, loss: 0.144243
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 50000, loss: 0.142517
----
 hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar 
----
iter 51000, loss: 0.140829
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
/workspaces/artificial_intelligence/artificial_intelligence/aiml-common/lectures/nlp/language-models/simple-rnn-language-model/index.ipynb Cell 17 line 2
     26   print('----\n %s \n----' % (txt, ))                         # print model predictions
     28 # Model training
---> 29 loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev)   # forward seq_length characters through the net and fetch gradient
     30 smooth_loss = smooth_loss * 0.999 + loss * 0.001                            # RNN adds all the losses from the previously unrolled steps
     <a href='vscode-notebook-cell://dev-container%2B7b22686f737450617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e6365222c226c6f63616c446f636b6572223a66616c73652c2273657474696e6773223a7b22686f7374223a22756e69783a2f2f2f7661722f72756e2f646f636b65722e736f636b227d2c22636f6e66696746696c65223a7b22246d6964223a312c22667350617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c2265787465726e616c223a2266696c653a2f2f2f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c2270617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c22736368656d65223a2266696c65227d7d/workspaces/artificial_intelligence/artificial_intelligence/aiml-common/lectures/nlp/language-models/simple-rnn-language-model/index.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=30'>31</a> if n % 1000 == 0: print('iter %d, loss: %f' % (n, smooth_loss))             # print progress

/workspaces/artificial_intelligence/artificial_intelligence/aiml-common/lectures/nlp/language-models/simple-rnn-language-model/index.ipynb Cell 17 line 3
     34   dWxh += np.dot(dhraw, xs[t].T)           # gradient for Wxh
     35   dWhh += np.dot(dhraw, hs[t-1].T)         # gradient for Whh
---> 36   dhnext = np.dot(Whh.T, dhraw)            # calculate h_t-1 for the next iteration
     37 for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
     38   np.clip(dparam, -5, 5, out=dparam)                              # clip gradients to mitigate exploding gradients

File <__array_function__ internals>:180, in dot(*args, **kwargs)

KeyboardInterrupt:
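
The traceback above shows the training loop being stopped by hand. Two of the lines visible in it can be checked in isolation: the in-place gradient clipping with `np.clip(..., out=...)` and the exponentially smoothed loss. The following is a minimal sketch with made-up values; the names `dparam_demo`, `smooth`, and `smooth_loss_demo`, as well as the sample losses, are hypothetical and not taken from the notebook.

```python
import numpy as np

# Hypothetical gradient array with some values outside [-5, 5].
dparam_demo = np.array([-12.0, -3.0, 0.5, 7.0])

# Clip in place, as in the training loop: out=dparam_demo overwrites the input array.
np.clip(dparam_demo, -5, 5, out=dparam_demo)
print(dparam_demo)

def smooth(prev, loss, decay=0.999):
    """Exponential moving average of the loss (same form as the smooth_loss update)."""
    return prev * decay + loss * (1.0 - decay)

# Hypothetical starting value and per-step losses.
smooth_loss_demo = 100.0
for loss in [90.0, 80.0, 70.0]:
    smooth_loss_demo = smooth(smooth_loss_demo, loss)
print(smooth_loss_demo)
```

With a decay of 0.999 the smoothed value moves slowly, so the loss printed every 1000 iterations lags behind the per-step loss rather than tracking it exactly.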