Precision / Recall / F1-Score
The most widely used statistical measures of NLU performance are precision and recall.
Read more about them here: Quality Metrics for NLU/Chatbot Training Data.
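As a rough illustration of how these metrics relate to each other, the sketch below computes precision, recall and F1-score per intent from a list of (expected, predicted) intent pairs. It is not Botium's implementation and does not read Botium's export format. The function name `per_intent_metrics` and the intent names are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def per_intent_metrics(pairs):
    """Compute precision, recall and F1 per intent.

    pairs: iterable of (expected_intent, predicted_intent) tuples.
    Hypothetical helper for illustration only, not part of Botium.
    """
    tp = defaultdict(int)  # predicted intent matches the expected one
    fp = defaultdict(int)  # intent was predicted, but another was expected
    fn = defaultdict(int)  # intent was expected, but another was predicted

    for expected, predicted in pairs:
        if expected == predicted:
            tp[expected] += 1
        else:
            fn[expected] += 1
            fp[predicted] += 1

    metrics = {}
    for intent in set(tp) | set(fp) | set(fn):
        predicted_total = tp[intent] + fp[intent]
        expected_total = tp[intent] + fn[intent]
        # Fall back to 0.0 when a denominator would be zero.
        precision = tp[intent] / predicted_total if predicted_total else 0.0
        recall = tp[intent] / expected_total if expected_total else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[intent] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Made-up sample data: (expected intent, predicted intent)
sample = [
    ("greet", "greet"),
    ("greet", "greet"),
    ("greet", "bye"),
    ("bye", "bye"),
    ("bye", "greet"),
    ("order", "order"),
]

results = per_intent_metrics(sample)
for name in sorted(results):
    print(name, results[name])
```

Precision answers "of all utterances predicted as this intent, how many were correct?", recall answers "of all utterances that should have resolved to this intent, how many did?", and the F1-score is the harmonic mean of the two.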
Botium provides a download link to export the confusion matrix and those metrics as an Excel file.
The complete test results can also be downloaded as a JSON or CSV file for further processing: