This step requires the most effort because the machine learning algorithms used in NLP are only as good as the data they are trained on. Both the quality and quantity of this data are crucial. Most state-of-the-art NLP engines rely on machine learning, and Botium provides tools to help gather and enhance datasets for training and testing.
Option 1: Use Training Data for Testing
Instead of annotating test cases manually, you can use the Conversation Model Downloader in the Botium Test Data Wizard. It downloads the conversation model of an NLP provider and converts it to BotiumScript test cases, which Botium can then use right away. Follow the steps here to download your NLP conversation model.
Option 2: Use Included Botium Datasets
- Over 70,000 user examples
- More than 20 languages (e.g., English, German, French, Spanish)
- More than 40 domains (e.g., smalltalk, banking, travel, insurance, customer support, security...)
The datasets are available under Botium Tools & Settings > Test Sets. Here you will find hundreds of sample test sets to choose from.
Option 3 - The Ideal Scenario: Bring Your Own Data
As a general rule of thumb, you should never use training data for testing: it is not a challenge for an NLP engine to correctly predict the intent for a user example it already knows. The purpose of all NLP training is, in the end, to make predictions for user examples the engine has never seen before.
That’s why it is recommended to always strictly separate the data you use for training your NLP engine from the data you use for testing.
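One way to keep the two sets apart is to split your labelled user examples once, per intent, before any training happens. The following is a minimal sketch in plain Python, not a Botium feature: the function name `split_by_intent`, the 20% test ratio and the `greeting` examples are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def split_by_intent(examples, test_ratio=0.2, seed=42):
    """Split (intent, utterance) pairs into disjoint train and test lists.

    The split is done per intent, so every intent with at least two
    distinct examples ends up in both sets.
    """
    by_intent = defaultdict(list)
    for intent, utterance in examples:
        by_intent[intent].append(utterance)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for intent, utterances in sorted(by_intent.items()):
        unique = sorted(set(utterances))  # drop duplicates so nothing leaks
        rng.shuffle(unique)
        if len(unique) > 1:
            n_test = max(1, round(len(unique) * test_ratio))
        else:
            n_test = 0  # a single example can only go to training
        test += [(intent, u) for u in unique[:n_test]]
        train += [(intent, u) for u in unique[n_test:]]
    return train, test


examples = [
    ("travel", "I want to travel from Berlin to Vienna"),
    ("travel", "go to vienna, from berlin"),
    ("travel", "book a flight from berlin"),
    ("travel", "book a ticket to vienna"),
    ("greeting", "hello"),
    ("greeting", "good morning"),
]
train, test = split_by_intent(examples)
print(len(train), "training examples,", len(test), "test examples")
```

Removing duplicates before the split matters here: an utterance that appears twice could otherwise land in both sets and quietly inflate the test score.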
Annotate Existing Test Cases with NLP Asserters
If you are already using Botium for conversational flow testing, you can annotate your test cases with NLP Asserters so that Botium knows the expected outcome and can compare it with the predictions. Below, a plain test case is followed by its annotated version:
T01_Travel_Berlin_Vienna
#me
I want to travel from Berlin to Vienna.
#bot
Im happy to hear it. And where are you now?
#me
in Munich
#bot
So you are in Munich, and want to travel from Berlin to Vienna?
T01_Travel_Berlin_Vienna
#me
I want to travel from Berlin to Vienna.
#bot
Im happy to hear it. And where are you now?
INTENT travel
#me
in Münich.
#bot
So you are in Münich, and want to travel from Berlin to Vienna?
INTENT travel
ENTITY_VALUES Berlin|Vienna|Münich
- Botium works best for simple question-and-answer conversations: a user
question is sent to the NLP engine, and Botium processes the response. To
test, register a new test set in Botium, add an utterance list for each NLP
intent (named exactly like the intent), and include the user examples you
want to test.
Note: In the Source Editor, the BotiumScript is a flat text file:
travel
I want to travel from Berlin to Vienna
go to vienna, from berlin
book a flight from berlin
book a ticket to vienna
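If your user examples already live in a spreadsheet or a script, the flat text form above can be generated rather than typed. This sketch assumes the layout shown in the example (the intent name on the first line, then one user example per line). The helper names `to_utterance_list` and `parse_utterance_list` are made up for illustration and are not part of Botium.

```python
def to_utterance_list(intent, user_examples):
    """Render one utterance list: intent name first, then one example per line."""
    lines = [intent] + [example.strip() for example in user_examples]
    return "\n".join(lines) + "\n"


def parse_utterance_list(text):
    """Read an utterance list back into (intent, examples), skipping blank lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("utterance list is empty")
    return lines[0], lines[1:]


content = to_utterance_list(
    "travel",
    [
        "I want to travel from Berlin to Vienna",
        "go to vienna, from berlin",
        "book a flight from berlin",
        "book a ticket to vienna",
    ],
)
print(content)
```

The resulting text can be pasted into the Source Editor. Parsing it back is a cheap way to check that an intent name has not slipped into the list of examples.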
- As a final step, you have to tell Botium that this test set is only for
question/answer conversations. In the Configuration menu, click
Scripting and enable the Expand Utterances to Conversations and the
Use Utterance Name as NLU Intent options:
Botium Tools & Settings > Test Sets > Your Test Set > Configuration > Scripting
Summary
The benefit of annotating existing conversational test cases is that you can re-use existing test data. The drawback is that the analytic results will be distorted if you have multi-step conversations: a Botium test case exits as soon as the first asserter fails, so all following conversation steps are ignored.
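The distortion is easy to see in a small simulation. The sketch below is not Botium code. It only mimics the stop-at-first-failure behaviour described above, and the intent names `booking` and `confirm` are invented for the example.

```python
def evaluated_steps(expected, predicted, stop_on_first_failure=True):
    """Return (step index, passed) for every step that actually gets scored."""
    results = []
    for index, (want, got) in enumerate(zip(expected, predicted)):
        passed = want == got
        results.append((index, passed))
        if stop_on_first_failure and not passed:
            break  # later steps are never looked at
    return results


# A three-step conversation where only the first prediction is wrong.
expected = ["travel", "travel", "confirm"]
predicted = ["booking", "travel", "confirm"]

fail_fast = evaluated_steps(expected, predicted)
all_steps = evaluated_steps(expected, predicted, stop_on_first_failure=False)

print(len(fail_fast), "of", len(expected), "steps scored when stopping early")
print(sum(passed for _, passed in all_steps), "correct predictions overall")
```

With early exit, the two correct predictions in steps two and three never reach the statistics. This is why single-step question/answer test sets give a cleaner picture of NLP quality.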
Advanced Challenges
“The art of challenging chatbots” is the Botium tagline. If your chatbot needs some special challenges, read on.
Multi-Language Testing
Many chatbots are built to serve users in multiple languages, so you need training and test datasets in multiple languages as well. English dominates the internet, and most public domain datasets for training and testing chatbots are available only in English. That’s why we included a Test Set Translator in the Botium Test Data Wizard.
Before using the translator, you must first configure the Google Translate Service Account Key in the Botium System Settings. This is a quick process that should only take a minute or two.
Humanification Testing
Humanification in Botium stands for the simulation of human behaviour or habits. Recognizing the need for automation in this area is an important step. Typical behaviours to simulate include:
- typographic errors
- different typing speeds
- sausage finger syndrome
- etc…
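To make the idea concrete, here is a rough sketch of how two of these behaviours could be simulated on a user example: swapped characters (a typographic error) and hitting a neighbouring key (sausage fingers). It is an illustration only and does not reproduce Botium's own humanification logic. The function names and the partial keyboard map `NEIGHBOURS` are assumptions, and typing speed is not covered.

```python
import random

# Hypothetical, deliberately incomplete QWERTY neighbour map.
NEIGHBOURS = {
    "a": "qsz", "e": "wrd", "i": "uok", "n": "bm", "o": "ipl",
    "r": "etf", "s": "adw", "t": "ryg",
}


def swap_adjacent(text, rng):
    """Transpose two neighbouring characters, a common typing slip."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def hit_neighbour_key(text, rng):
    """Replace one character with a key next to it on the keyboard."""
    positions = [i for i, ch in enumerate(text) if ch.lower() in NEIGHBOURS]
    if not positions:
        return text  # nothing in the map to replace
    i = rng.choice(positions)
    return text[:i] + rng.choice(NEIGHBOURS[text[i].lower()]) + text[i + 1:]


def humanify(text, seed=None):
    """Apply one randomly chosen typo to the utterance."""
    rng = random.Random(seed)
    return rng.choice([swap_adjacent, hit_neighbour_key])(text, rng)


original = "book a flight from berlin"
for seed in range(3):
    print(humanify(original, seed=seed))
```

Passing a seed makes each variant reproducible, which helps when a humanified test case fails and you need to re-run exactly the same input.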
Check out the following article to help you get started with the Humanification of Test Sets, and begin adding some real-world human behaviors to your test cases.