A Confusion Matrix shows an overview of the predicted intent versus the expected intent. It answers questions like “When sending user example X, I expect the NLU to predict intent Y; what did it actually predict?”
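To illustrate how such a matrix is put together, here is a minimal sketch in Python. It is not Botium's implementation; the function name `confusion_matrix`, the intent names and the sample results are all made up for the example. Rows are expected intents, columns are predicted intents, and each cell counts how often that combination occurred.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Build a confusion matrix from (expected, predicted) intent pairs.

    Returns the sorted list of intent labels and a matrix where
    matrix[i][j] counts the examples with expected intent labels[i]
    that were predicted as labels[j].
    """
    counts = Counter(pairs)
    labels = sorted({label for pair in pairs for label in pair})
    matrix = [
        [counts[(expected, predicted)] for predicted in labels]
        for expected in labels
    ]
    return labels, matrix


# Hypothetical test results: (expected intent, predicted intent)
results = [
    ("greeting", "greeting"),
    ("greeting", "goodbye"),
    ("goodbye", "goodbye"),
    ("order_pizza", "order_pizza"),
    ("order_pizza", "greeting"),
    ("order_pizza", "order_pizza"),
]

labels, matrix = confusion_matrix(results)

# Print one row per expected intent
for expected, row in zip(labels, matrix):
    print(expected, row)
```

Values on the diagonal are correct predictions; any non-zero value off the diagonal marks a pair of intents that the NLU mixes up.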
Confusion Matrix with Column Selection Grid
If you have a larger test set, you may notice an additional grid to the left of the
confusion matrix. Use this grid to navigate large test sets and quickly focus on
key areas.
Precision / Recall / F1-Score
The most commonly used statistical measures of NLU performance are precision and recall.
The F1-score combines the two as their harmonic mean.
Read more about them here: Quality Metrics for NLU/Chatbot Training Data.
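As a rough illustration of how these measures are derived for a single intent, the following Python sketch uses the standard definitions. It is an assumption-laden example, not Botium's own calculation; the function name `precision_recall_f1` and the sample data are hypothetical.

```python
def precision_recall_f1(pairs, intent):
    """Compute precision, recall and F1-score for one intent.

    pairs is a list of (expected, predicted) intent tuples.
    """
    # True positives: expected the intent and predicted it
    tp = sum(1 for e, p in pairs if e == intent and p == intent)
    # False positives: predicted the intent although another was expected
    fp = sum(1 for e, p in pairs if e != intent and p == intent)
    # False negatives: expected the intent but predicted something else
    fn = sum(1 for e, p in pairs if e == intent and p != intent)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical test results: (expected intent, predicted intent)
sample_results = [
    ("greeting", "greeting"),
    ("greeting", "goodbye"),
    ("goodbye", "goodbye"),
    ("order_pizza", "order_pizza"),
    ("order_pizza", "greeting"),
    ("order_pizza", "order_pizza"),
]

for name in ("greeting", "goodbye", "order_pizza"):
    p, r, f = precision_recall_f1(sample_results, name)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Precision drops when other intents are wrongly predicted as this one; recall drops when examples of this intent are predicted as something else.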
Download
Botium provides a download link to export the complete test results as a JSON, CSV
or Excel file for further processing.