How to Prepare NLP Datasets for Training and Testing

This step requires the most effort because the machine learning algorithms used in NLP are only as good as the data they are trained on. Both the quality and quantity of this data are crucial. Most state-of-the-art NLP engines rely on machine learning, and Botium provides tools to help gather and enhance datasets for training and testing.

Note: While it's not technically ideal to use training data for testing, this is usually the first step in Botium. It's easy to do with just a few clicks and provides initial insights into how well the NLP engine performs on its training data. This process can also reveal any flaws in the training data itself.

Option 1: Use Training Data for Testing

Instead of annotating test cases manually, you can use the Conversation Model Downloader in the Test Data Wizard. It downloads the conversation model of an NLP provider and converts it to BotiumScript test cases, which Botium can use immediately. Follow the steps here to download your NLP conversation model.

Note: See NLP Analytics Support for an overview of which Botium connectors are supported by the Test Data Wizard.

Option 2: Use Included Botium Datasets

Botium comes with built-in datasets that you can use right away for testing and training your NLP engine.
  • Over 70,000 user examples
  • More than 20 languages (e.g., English, German, French, Spanish)
  • More than 40 domains (e.g., smalltalk, banking, travel, insurance, customer support, security...)
Navigate to Botium Tools & Settings > Test Sets. Here you will find hundreds of sample test sets to choose from.

Remember: To use these test sets with Botium, you need to rename the utterance lists to match the intent names used by your NLP engine.
Figure 1. Botium Sample Test Set - Customer Support
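
If you keep copies of the sample utterance lists as text files, the renaming can also be scripted. The following is a minimal sketch, not a Botium feature. It assumes an utterance list is a flat text file whose first line is the list name (as in the Source Editor example later in this article). The function name and the mapping are hypothetical.

```python
def rename_utterance_list(text, intent_names):
    """Replace the list name on the first line using a name mapping.

    text         -- content of one utterance list (name on the first line)
    intent_names -- dict mapping sample list names to your own intent names
    """
    lines = text.splitlines()
    if not lines:
        return text
    old_name = lines[0].strip()
    # Names without an entry in the mapping are left unchanged.
    lines[0] = intent_names.get(old_name, old_name)
    return "\n".join(lines) + "\n"


# Hypothetical names: the sample list is called "customer_support",
# while the NLP engine under test calls the same intent "support".
sample = "customer_support\nI need help with my order\nplease call me back\n"
renamed = rename_utterance_list(sample, {"customer_support": "support"})
print(renamed)
```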

Option 3: Bring Your Own Data (The Ideal Scenario)

As a rule of thumb, you should never use training data for testing: correctly predicting the intent of a user example it already knows is no challenge for an NLP engine. The purpose of all NLP training is to make predictions for user examples that the engine has never seen before.

That’s why it is recommended to always strictly separate the data you use for training your NLU engine from the data you use for testing.
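
One way to keep the two sets separate is to split your labelled user examples once, per intent, before anything is uploaded for training. The sketch below is an illustration under stated assumptions, not part of Botium. The function name, the 20% test ratio, and the fixed seed are all hypothetical choices.

```python
import random
from collections import defaultdict


def split_by_intent(examples, test_ratio=0.2, seed=42):
    """Split (utterance, intent) pairs into disjoint training and test sets.

    The split is done per intent, so every intent with at least two
    distinct examples ends up in both sets.
    """
    by_intent = defaultdict(list)
    for utterance, intent in examples:
        # Drop exact duplicates within an intent so that the same text
        # cannot land in both the training and the test set.
        if utterance not in by_intent[intent]:
            by_intent[intent].append(utterance)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for intent, utterances in sorted(by_intent.items()):
        shuffled = utterances[:]
        rng.shuffle(shuffled)
        # Hold out at least one example per intent, but never all of them.
        n_test = max(1, round(len(shuffled) * test_ratio))
        n_test = min(n_test, len(shuffled) - 1)
        test += [(u, intent) for u in shuffled[:n_test]]
        train += [(u, intent) for u in shuffled[n_test:]]
    return train, test


examples = [
    ("I want to travel from Berlin to Vienna", "travel"),
    ("go to vienna, from berlin", "travel"),
    ("book a flight from berlin", "travel"),
    ("book a ticket to vienna", "travel"),
    ("book a ticket to vienna", "travel"),  # duplicate, dropped by the split
]
train, test = split_by_intent(examples)
print(len(train), "training examples,", len(test), "test examples")
```

An intent with only a single example is kept entirely in the training set, which is a signal that more data is needed for that intent.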

Annotate Existing Test Cases with NLP Asserters

If you are already using Botium for conversational flow testing, you can annotate your test cases with NLP Asserters so that Botium knows the expected outcome and can compare it with the NLP engine's predictions.

Here is an example test case from Botium, involving a chatbot from the tourism domain:

Note: In the Source Editor, the BotiumScript for this test case looks like this:
T01_Travel_Berlin_Vienna

#me
I want to travel from Berlin to Vienna.

#bot
Im happy to hear it. And where are you now?

#me
in Munich

#bot
So you are in Munich, and want to travel from Berlin to Vienna?

You can annotate the expected NLP intent by editing the bot conversation step and adding the NLP Intent Asserter and the NLP Intent Confidence Asserter.
The annotated test case then looks like this:

Note: In the Source Editor, the BotiumScript now looks like this:
T01_Travel_Berlin_Vienna

#me
I want to travel from Berlin to Vienna.

#bot
Im happy to hear it. And where are you now?
INTENT travel

#me
in Munich

#bot
So you are in Munich, and want to travel from Berlin to Vienna?
INTENT travel
ENTITY_VALUES Berlin|Vienna|Munich

Add User Examples to Utterance Lists (Recommended)

  1. Botium works best for simple question-and-answer conversations: a user question is sent to the NLP engine, and Botium processes the response. To test, register a new test set in Botium, add an utterance list for each NLP intent (named exactly like the intent), and include the user examples you want to test.

    Note: In the Source Editor, the BotiumScript is a flat text file:
    travel
    I want to travel from Berlin to Vienna
    go to vienna, from berlin
    book a flight from berlin
    book a ticket to vienna
  2. As a final step, tell Botium that this test set contains only question-and-answer conversations. Go to Botium Tools & Settings > Test Sets > Your Test Set > Configuration > Scripting and enable both the Expand Utterances to Conversations option and the Use Utterance Name as NLU Intent option.
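
If your user examples already exist in a spreadsheet or database, the flat text files can be generated instead of typed. The following is a minimal sketch, assuming the format shown above: the intent name on the first line, then one user example per line. The function name and the "<intent>.utterances.txt" file naming are assumptions made for illustration, so check them against what your Botium installation expects.

```python
import tempfile
from pathlib import Path


def write_utterance_lists(examples_by_intent, target_dir):
    """Write one flat text file per intent: intent name first, then examples."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for intent, examples in examples_by_intent.items():
        # Skip empty lines so every line after the first is a user example.
        lines = [intent] + [e.strip() for e in examples if e.strip()]
        # Assumption: the file name pattern below is illustrative only.
        path = target / f"{intent}.utterances.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written


examples_by_intent = {
    "travel": [
        "I want to travel from Berlin to Vienna",
        "go to vienna, from berlin",
        "book a flight from berlin",
        "book a ticket to vienna",
    ],
}

# A temporary directory keeps the example free of side effects.
with tempfile.TemporaryDirectory() as tmp:
    for path in write_utterance_lists(examples_by_intent, tmp):
        print(path.name)
        print(path.read_text(encoding="utf-8"))
```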

Tip: Use the Paraphraser tool to quickly generate new user examples.

Summary

The benefit of annotating existing conversational test cases is that you can re-use existing test data. The drawback is that the analytic results will be distorted if you have multi-step conversations, because a Botium test case exits as soon as the first asserter fails and all following conversation steps are ignored.

Advanced Challenges

“The art of challenging chatbots” is the Botium tag line. If you need some special challenges for your chatbot, then read on.

Multi-Language Testing

Many chatbots are built to serve users in multiple languages, so you need training and test datasets in multiple languages. English dominates the internet, and most public domain datasets for training and testing chatbots are available only in English. That’s why we included a Test Set Translator in the Botium Test Data Wizard.

Before using the translator, you must first configure the Google Translate Service Account Key in the Botium System Settings. This is a quick process that should only take a minute or two.

Humanification Testing

Humanification in Botium stands for simulation of human behaviour or habits. It is an important step to recognize the need for automation in this area.

BotiumScript makes it easy to verify the chatbot’s ability to follow a conversation flow. However, in the real world, you cannot expect human users to act like a computer script. Human input differs in many ways, for example:
  • typographic errors
  • different typing speeds
  • sausage finger syndrome
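
To give a feel for what simulated typing mistakes look like, here is a rough sketch that injects swapped, mistyped, and dropped characters into a user example. It is an illustration only and not the algorithm Botium uses. The function name, the error rate, and the small keyboard-neighbour table are all assumptions.

```python
import random

# Assumption: a small hand-picked excerpt of QWERTY neighbours; extend as needed.
NEIGHBOURS = {
    "a": "qwsz", "e": "wrsd", "i": "uojk", "n": "bhjm",
    "o": "iplk", "r": "etdf", "t": "ryfg",
}


def add_typos(utterance, rate=0.1, seed=None):
    """Return a copy of the utterance with simulated typing mistakes."""
    rng = random.Random(seed)
    chars = list(utterance)
    result = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if rng.random() < rate:
            kind = rng.choice(["swap", "neighbour", "drop"])
            if kind == "swap" and i + 1 < len(chars):
                # Transpose this character with the next one.
                result += [chars[i + 1], c]
                i += 2
                continue
            if kind == "neighbour" and c.lower() in NEIGHBOURS:
                # Hit an adjacent key instead of the intended one.
                result.append(rng.choice(NEIGHBOURS[c.lower()]))
                i += 1
                continue
            if kind == "drop":
                # Skip the character entirely.
                i += 1
                continue
        # No mistake for this character (or the chosen mistake did not apply).
        result.append(c)
        i += 1
    return "".join(result)


original = "I want to travel from Berlin to Vienna"
for seed in range(3):
    print(add_typos(original, rate=0.15, seed=seed))
```

Passing a seed makes the generated variants reproducible, which helps when a failing test case has to be re-run.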

Check out the following article to help you get started with the Humanification of Test Sets, and begin adding some real-world human behaviors to your test cases.
