Howdy! We continue our experiments with AI chatbots at ElifTech CPD (Cool Projects Department). If you've been following our progress, you already know that we built a chatbot using a third-party service, Rasa Core, for user intent recognition and dialog flow management.

This time, we decided to build our own models using Google's TensorFlow and Python 3.5. One extracts user intents from utterances, and the other (an LSTM neural network) manages the dialog flow, i.e., predicts the bot's next action or response. Keep on reading to learn how we did it.

Chatbot UI and Flow

We really didn't want to make the interface overly complicated. So, we went with a simple, intelligent bot that greets you, introduces itself and shares some basic info regarding your private financial status.

Here's the look we ended up with:

Finance Bot. UI and Flow

NOTE: There are no if/else statements in the code. Instead, our LSTM model will decide when to respond to the user and what response to use.

NOTE: We developed a "Slack connector" to test the bot in Slack chat. More on that later.

You can find the code on GitHub.

Tools

Like we've said before, we used Python 3.5 (in a virtual environment) and TensorFlow 1.12, since it has excellent high-level APIs, especially Keras. We store all dependencies in requirements.txt; running pip install -r requirements.txt installs them into your Python virtual environment.

We also downloaded the GloVe vectors and placed them at finbot-2/app/main/ai/data/glove/glove.6B.100d.txt.

For demo purposes, we used pickle for object serialization, storing NLU and dialog data as objects in files. No databases were used here.
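Saving and loading those objects can be as simple as this (a minimal sketch; save_object and load_object are our own helper names, not necessarily the repo's):

import pickle

def save_object(obj, path):
    # Serialize any training artifact (tokenizer, dialog state, ...) to a file
    with open(path, 'wb') as f:
        pickle.dump(obj, f)

def load_object(path):
    # Restore the previously pickled object
    with open(path, 'rb') as f:
        return pickle.load(f)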

User Intent Classification

The NLU training data are labeled samples of different intents, stored in /finbot-2/app/main/ai/data/intents.json. Every intent tag starts with the "intent_" prefix, which makes it easy to spot; we use the same intent names in the dialog flow training data. We also list every intent name, alongside every bot utterance and action, in the domain.yml file (finbot-2/app/main/ai/domain.yml, under "actions_list"), because that's where the chatbot's world is described.
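We won't reproduce the full file here, but conceptually it is a list of intent tags with labeled sample utterances, something like this (the field names are purely illustrative; check intents.json in the repo for the exact schema):

{
  "intents": [
    { "tag": "intent_greeting", "samples": ["hi there", "hello", "hey"] },
    { "tag": "intent_goodbye", "samples": ["bye", "see you later"] }
  ]
}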

Here’s our domain.yml:

actions_list:
  - intent_greeting
  - intent_goodbye
  - intent_thanks
  - intent_purse_total
  - intent_help
  - intent_affirm
  - intent_deny
  - intent_get_spend_planned
  - intent_new_buy
  - intent_inform
  - intent_get_spent

  - utter_goodbye
  - utter_my_pleasure
  - utter_spent
  - utter_cannot_buy
  - utter_ask_price
  - utter_spend_planned
  - utter_affirm
  - utter_help_info
  - utter_how_can_help

  - action_claculate_new_expence

The Trainer

Our general approach to embedding was to use GloVe word vectors. Although we built on the TensorFlow Tokenizer's "word_index," we still had to deal with unvectorized words at testing time. Eventually, we had about ten examples for each intent.
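To give an idea of the preprocessing, here's a minimal sketch of how utterances can be tokenized and padded with tf.keras (variable names like train_sentences and max_length are ours for illustration):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Fit the tokenizer on all training utterances from intents.json
tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_sentences)

# Turn each utterance into a padded sequence of word indices
sequences = tokenizer.texts_to_sequences(train_sentences)
x_train = pad_sequences(sequences, maxlen=max_length, padding='post')

vocab_size = len(tokenizer.word_index) + 1  # +1 for the reserved 0 index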

Here's what our NLU model code looks like:

import tensorflow as tf
from tensorflow.keras import Sequential, layers

model = Sequential()

# Embedding layer initialized with the pre-trained GloVe matrix, frozen during training
embed = layers.Embedding(vocab_size, glove_dimension, weights=[embed_matrix],
                         input_length=max_length, trainable=False)

model.add(embed)
model.add(layers.Flatten())
model.add(layers.Dense(30, activation=tf.nn.relu))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation=tf.nn.softmax))

And here’s the model:

Finance Bot. NLU model code

The Sequential Structure

Our model's sequential structure starts with the "Embedding" layer holding the GloVe word matrix. The "Flatten" layer turns its output into one flat data vector. Next comes a "Dense" layer with 30 neurons, followed by a "Dropout" layer to avoid overfitting, and a final dense layer whose number of neurons equals the number of predictable classes.

Then, we passed the GloVe embedding matrix to our model. Since it ties our tf-tokenized word sequences to the GloVe word vectors, it transforms each word index into a vector during training. Note that because the GloVe vectors are already trained values, we set the embedding layer to trainable=False.
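For reference, here's roughly how such a matrix can be assembled from the downloaded GloVe file and the Tokenizer's word_index (a sketch under our assumptions; words missing from GloVe simply keep zero vectors):

import numpy as np

def build_embed_matrix(glove_path, word_index, glove_dimension=100):
    # Parse the GloVe text file into a {word: vector} lookup
    glove = {}
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            glove[parts[0]] = np.asarray(parts[1:], dtype='float32')

    # Row i holds the vector for the word with token id i
    embed_matrix = np.zeros((len(word_index) + 1, glove_dimension))
    for word, i in word_index.items():
        vector = glove.get(word)
        if vector is not None:
            embed_matrix[i] = vector
    return embed_matrix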

When we make a prediction, we simply pass word numbers, which are tf.tokenized utterances. In this way, the model automatically associates tokens with embedded (learned) vectors and makes a proper prediction.

Data Structures

The shape of the data your model accepts (x and y) is decisive. We learned this the hard way on this project, because we messed up the labels. In particular, the classes were [1 2 3 4], but we built the first model configured for binary classification, so it only ever produced 1 or 0. So, here's a piece of advice: take your time to understand the dimensions of your input and output data. It was a minor mistake, but it cost us a lot of time and effort.

In our case, the x_train looked like this:

Finance Bot. x_train

And the labels, y_train, were the following:

Finance Bot. y_train

Unfortunately, we didn't have a dataset large enough to divide it into training and validation sets. Therefore, we skipped the validation step, and only used x_train and y_train to train the model.
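The compile-and-fit step then looks roughly like this (hyperparameters are illustrative, not the repo's exact values; with integer class labels like [1 2 3 4], sparse categorical crossentropy is the matching loss):

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer class labels
              metrics=['accuracy'])

# No validation split: the whole (small) dataset goes into training
model.fit(x_train, y_train, epochs=300, verbose=2)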

NOTE: Whenever intents.json changes, the NLU model has to be trained again.

The Predictor

The predictor is a method that receives the user utterance as raw text and uses the trained model for prediction. After a prediction, we take the class with the maximum score as the most probable user intent. We also implemented a prediction threshold: if the maximum prediction score falls below it, we skip any further actions and have the bot respond with something like, "I didn't get you, please repeat that."
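A minimal sketch of such a predictor (the function name and the 0.7 threshold are our illustration):

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def predict_intent(model, tokenizer, utterance, max_length, threshold=0.7):
    # Tokenize and pad the raw utterance exactly like the training data
    seq = tokenizer.texts_to_sequences([utterance])
    padded = pad_sequences(seq, maxlen=max_length, padding='post')

    probs = model.predict(padded)[0]
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return None  # too uncertain: answer "I didn't get you, please repeat that."
    return best      # index of the most probable intent class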

Dialog Management

Working on dialog management was even more engaging for us! We taught our bot how to react to user intents, in particular, when to say something and what exactly to say.

In general, we had a set of dialog examples stored in /finbot-2/app/main/ai/data/dialogs.yaml. It’s just a sample of user-bot conversations. The trainer performs the following tasks:

  • dividing each dialog into smaller sequences (where each label is the bot's next action)
  • converting each dialog, bot, or user action from dialogs.yaml into tokens
  • padding the sample sequences to equal length
  • one-hot encoding the labels
  • compiling the LSTM neural network model (a sketch follows this list)
  • fitting the training data
  • saving the necessary data to the file storage
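
Here's a minimal sketch of what such an LSTM dialog model can look like in tf.keras (reusing the imports from the NLU snippet; num_actions, embed_dim, max_dialog_length, and the layer sizes are our illustrative names and values, not the repo's exact code):

dialog_model = Sequential()

# Dialog actions are tokenized from domain.yml, so this embedding is learned
# from scratch, unlike the NLU model's frozen GloVe embedding
dialog_model.add(layers.Embedding(num_actions, embed_dim, input_length=max_dialog_length))
dialog_model.add(layers.LSTM(64))
dialog_model.add(layers.Dense(num_actions, activation=tf.nn.softmax))

# Labels are one-hot encoded, hence categorical crossentropy
dialog_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])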

Here’s the dialog file structure:

# example dialogs.yaml
dialogs:
  - flow:
    - intent_1
    - utter_1
    - intent_2
  - flow:
    - intent_1
    - utter_2
    - intent_3

We used the domain.yml file (shown above) for proper tokenization of dialog actions. It holds the list of all user intent, bot action, and utterance names, and we built the dialog token dictionary from this data.
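Building such a token dictionary can be as simple as this (a sketch assuming PyYAML; action_tokens is our own name for the dictionary):

import yaml

with open('finbot-2/app/main/ai/domain.yml') as f:
    domain = yaml.safe_load(f)

# Map each action name to an integer token; 0 stays reserved for padding
action_tokens = {name: i + 1 for i, name in enumerate(domain['actions_list'])}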

NOTE: User intent names start with "intent_," bot utterances with "utter_," and bot actions with "action_."

Aggregating the Training Set for Dialog Neural Network

Ah, this was a tough one! We spent several days preparing dialog training data for the LSTM neural network. We wanted it to make predictions based on more than just the previous user intent; it had to predict using the last several interactions between the bot and the user. To achieve this, we had to prepare the data properly.

Eventually, the training set before tokenization looked like this:

sample = [
    'intent_1',
    'utter_1',
    'intent_2',
    'utter_2',
    'intent_3',
    'utter_3'
]

x_train                                                     labels
['intent_1']                                                'utter_1'
['intent_1', 'utter_1', 'intent_2']                         'utter_2'
['intent_1', 'utter_1', 'intent_2', 'utter_2', 'intent_3']  'utter_3'
      

In the beginning, we had an issue with prediction: instead of predicting the next "utter_ ...," the bot almost always predicted an "intent_ ..." To fix this, we changed the model a bit, using the embedding layer and one-hot encoded labels. We also changed the training data by splitting the sequences so that each label was a bot action (either "utter_ ..." or "action_ ..."). As a result, the bot could no longer predict user intents and only predicted its own actions. We did this for every dialog flow example.
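A sketch of that splitting logic, in our own formulation and with a hypothetical helper name:

def build_dialog_samples(flow):
    # Split one dialog flow into (history, next bot action) pairs.
    # Only bot actions ('utter_...' or 'action_...') become labels,
    # so the model never learns to predict a user intent.
    x, y = [], []
    for i, action in enumerate(flow):
        if action.startswith(('utter_', 'action_')):
            x.append(flow[:i])  # everything said up to this point
            y.append(action)    # the bot action that followed
    return x, y

# e.g. build_dialog_samples(['intent_1', 'utter_1', 'intent_2', 'utter_2'])
# -> x: [['intent_1'], ['intent_1', 'utter_1', 'intent_2']]
#    y: ['utter_1', 'utter_2']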

NOTE: When you change dialogs.yaml or domain.yml, you have to run the dialog training again.

Dialog Flow Predictor

The role of the predictor is to get the current sequence from dialog_state.pkl and use it to predict the next bot action. Here's how you can approach it: after the model predicts the next action, you build the response and append the predicted action to the dialog flow sequence for further predictions.
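Sketched in code, under the same illustrative names as above (token_to_action would be the inverse of the action_tokens dictionary):

def predict_next_action(model, dialog_state, token_to_action, max_dialog_length):
    # Use only the last few interactions as the prediction window
    recent = dialog_state[-max_dialog_length:]
    padded = pad_sequences([recent], maxlen=max_dialog_length, padding='post')

    probs = model.predict(padded)[0]
    next_token = int(np.argmax(probs))

    dialog_state.append(next_token)  # feed the prediction back for the next turn
    return token_to_action[next_token]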

Getting the Chatbot’s Response

We used hardcoded responses, with some extra variables substituted in, to make the bot responsive, e.g., "utter_my_pleasure": "My pleasure to help you".
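In code, this can be as simple as a dictionary keyed by action name (the first entry is from the article; the second, with its placeholder, is our own illustration):

# Response templates keyed by the predicted action name;
# placeholders like {total} are filled in before sending
responses = {
    "utter_my_pleasure": "My pleasure to help you",
    "utter_spent": "You have spent {total} so far",  # illustrative template
}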

Putting All the Parts Together

We developed a request handler to manage the flow (a sketch follows the list):

  1. get an HTTPS request with the user's text
  2. let the NLU model extract the intent from the text
  3. append the intent token to the Dialog State array
  4. take a dialog sample of a particular size (e.g., 5) from the Dialog State array
  5. make a dialog prediction, which returns an Action token
  6. append the predicted Action token to the Dialog State and save the Dialog State to a file
  7. respond to the user or perform a custom bot action
  8. wait for the next user sentence
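
Put together, the handler can look roughly like this. It's a sketch reusing the illustrative helpers from earlier sections; STATE_PATH stands for the pickled Dialog State file, and we assume NLU intent classes map into the same token space as the dialog actions:

def handle_message(text):
    # Steps 1-2: classify the user's intent from the raw text
    intent_token = predict_intent(nlu_model, tokenizer, text, max_length)
    if intent_token is None:
        return "I didn't get you, please repeat that."

    # Step 3: extend the persisted Dialog State
    dialog_state = load_object(STATE_PATH)
    dialog_state.append(intent_token)

    # Steps 4-6: predict the next bot action from the last interactions and save
    action = predict_next_action(dialog_model, dialog_state, token_to_action, 5)
    save_object(dialog_state, STATE_PATH)

    # Step 7: send the hardcoded response (or run a custom action)
    return responses.get(action, action)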

Connecting the Slack App

For testing purposes, we connected the chatbot to a Slack channel using a Slack app, taking all the steps described here. Then, we used the Slack API to post messages from the bot to the user, following this.

What’s Next?

Our chatbot turned out to be just as cool as we expected. Still, it's not perfect, and there's a long way to go. Our next step is connecting it to Google Assistant to upgrade the chatbot to a "speakable" version. We can't wait to hear it talk to us.

As you can see, we're up for further experiments, and we encourage you to follow along with ElifTech's Cool Projects Department. Don't be afraid to experiment, and if you run into trouble, we've got your back.