A Confusion Matrix gives an overview of the predicted intents versus the expected intents. It answers questions like: "When sending user example X, I expect the NLU to predict intent Y; what did it actually predict?"
Expected intents are shown as rows, predicted intents as columns. Each user example is sent to the NLU engine, and the cell value at the expected-intent row and predicted-intent column is increased by 1. Whenever the predicted and expected intents match, a cell value on the diagonal is increased; these are our successful test cases. All cell values off the diagonal are our failed test cases.
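The counting described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation; the function and variable names are hypothetical:

```python
from collections import defaultdict

def build_confusion_matrix(test_cases):
    """Count (expected, predicted) intent pairs.

    test_cases: iterable of (expected_intent, predicted_intent) tuples,
    e.g. one per user example sent to the NLU engine.
    """
    matrix = defaultdict(int)
    for expected, predicted in test_cases:
        # Diagonal cells (expected == predicted) are successful test cases;
        # all other cells are failures.
        matrix[(expected, predicted)] += 1
    return dict(matrix)

# Example: two correct predictions (on the diagonal), one failure.
cases = [("greeting", "greeting"), ("order", "order"), ("order", "greeting")]
print(build_confusion_matrix(cases))
# {('greeting', 'greeting'): 1, ('order', 'order'): 1, ('order', 'greeting'): 1}
```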
The most commonly used statistical measures of NLU performance are precision and recall:
The question answered by the precision score is: how many predictions of an intent are correct?
The question answered by the recall score is: how many examples of an intent are correctly predicted?
Read here to know more: Confusion Matrix/Precision/Recall/F1-Score
The confidence threshold is the lowest accepted confidence score. If the NLU engine is not sure enough when classifying an intent (its confidence score is below the confidence threshold), it answers with the incomprehension intent to signal that it does not understand. This chart helps you find the best confidence threshold for your use case: it visualizes the balance between the precision and recall scores, and depending on your use case one or the other may take priority.
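The fallback behavior described above can be sketched as follows. This is an illustrative snippet, not the engine's actual code; the fallback intent name and function signature are assumptions:

```python
# Hypothetical name for the fallback intent returned on low confidence.
INCOMPREHENSION_INTENT = "none"

def resolve_intent(prediction, threshold):
    """Return the predicted intent, or the fallback intent when the
    engine's confidence score is below the confidence threshold.

    prediction: (intent, confidence) pair as returned by the NLU engine.
    """
    intent, confidence = prediction
    return intent if confidence >= threshold else INCOMPREHENSION_INTENT

print(resolve_intent(("order", 0.91), threshold=0.7))  # order
print(resolve_intent(("order", 0.45), threshold=0.7))  # none
```

Raising the threshold tends to increase precision (fewer low-confidence guesses are accepted) at the cost of recall (more examples fall through to the incomprehension intent), which is the trade-off the chart visualizes.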
Read here to know more: Confidence Threshold.