NLP Quality Metrics

The dashboard of an NLP Project is made up of several sections that together contain all of your test results. Each of those sections is summarized below.

  • Confidence Threshold

    Confidence Threshold is the lowest accepted confidence. Threshold purpose: In some chatbot engines you can set a Confidence Threshold. If the engine is not sure enough when classifying an intent (its confidence is below Confidence Thresh...

  • Confusion Matrix

    A Confusion Matrix gives an overview of the predicted intent versus the expected intent. It answers questions like “When sending user example X, I expect the NLU to predict intent Y. What did it actually predict?” Confusion Matr...

  • Training Progress

    The Training Progress shows several KPIs of the test session with regard to NLP analytics: number of test cases, conversation steps and asserters; predicted intents and entities; precision/recall/F1-score; number of expected vs. predicted ...

  • Entity Confidence Risks

  • Entity Confidence Deviation Risks

    The confidence deviation is a measure of the spread of the predicted confidence scores across all utterances of an entity. It is calculated as the standard deviation of the confidence scores. Alerts: A high deviation of confidenc...

  • Entity Utterance Distribution

    This chart shows the number of utterances per predicted entity. Alerts: As a rule of thumb, there should be at least 5 training utterances per entity. Actions: If there are fewer than 5 training utterances for an entity, add additional user ...

  • Intent Utterance Distribution

    This chart shows the number of utterances per predicted intent. Alerts: As a rule of thumb, there should be at least 15 training utterances per intent. Actions: If there are fewer than 15 training utterances for an intent, add additional user...

  • Intent Confidence Distribution

    This chart shows the average intent confidence score of all utterances in the test session, grouped by ranges. Alerts: The lower the confidence, the higher the failure probability. Depending on the NLU engine in question, a conf...

  • Intent Confidence Risks

    This chart shows the intents of your test set with the weakest average confidence score. A low average confidence score indicates either that there are single utterances with a very low confidence score having a high impact on ...

  • Intent Confidence Deviation Risks

    The confidence deviation is a measure of the spread of the predicted confidence scores across all user examples of an intent. The chart presents the top 10 risks. Deviation risk is ...

  • Intent Mismatch Probability Risks

    This section shows charts visualizing the risk that some intents will be mismatched, meaning that the NLU engine predicts the correct intent, but with a confidence score very close to that of another intent.

  • Alerts / Attention

    Whenever Botium identifies anything that requires attention, it is shown in the Attention box. Among others, these conditions are: Botium detected test cases where unexpected intents and/or entities have been recogni...

  • Suggestions

    Botium detects issues in the test results and suggests actions that will improve the overall NLU performance. It tells you which intents require more training data, and if test data is not suitable for performin...
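Confidence Threshold, sketched: a prediction is accepted only if its confidence reaches the threshold, otherwise a fallback is returned. This is a minimal illustration of the idea, not how any particular chatbot engine or Botium implements it; the function name `resolve_intent`, the default threshold of 0.7, and the fallback value are hypothetical assumptions.

```python
from typing import Optional


def resolve_intent(intent: str, confidence: float, threshold: float = 0.7,
                   fallback: Optional[str] = None) -> Optional[str]:
    """Accept the predicted intent only if its confidence reaches the threshold.

    Hypothetical helper: threshold=0.7 and the fallback value are illustrative
    assumptions; real engines use their own defaults and fallback intent names.
    """
    # "Lowest accepted confidence": a score equal to the threshold is accepted.
    if confidence >= threshold:
        return intent
    # Below the threshold the engine is "not sure enough", so return the fallback.
    return fallback


print(resolve_intent("order_pizza", 0.82))                       # order_pizza
print(resolve_intent("order_pizza", 0.55, fallback="fallback"))  # fallback
```

Raising the threshold trades more fallback answers for fewer wrong intents, which is why the confidence charts in this dashboard matter when choosing it.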
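Confusion Matrix, sketched: counting (expected, predicted) intent pairs yields the matrix, where each row is an expected intent and each column a predicted one. This is a generic illustration, assuming test results are available as a list of such pairs; the function name `confusion_matrix` and the nested-dict layout are hypothetical, not Botium's data format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def confusion_matrix(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count how often each expected intent was predicted as each intent.

    pairs holds (expected, predicted) tuples; the result is indexed as
    matrix[expected][predicted]. Hypothetical helper for illustration.
    """
    counts = Counter(pairs)
    # Every intent seen as either expected or predicted gets a row and a column.
    labels = sorted({label for pair in counts for label in pair})
    # Counter returns 0 for pairs that never occurred.
    return {exp: {pred: counts[(exp, pred)] for pred in labels} for exp in labels}


results = [("greet", "greet"), ("greet", "goodbye"), ("goodbye", "goodbye")]
matrix = confusion_matrix(results)
print(matrix["greet"])  # {'goodbye': 1, 'greet': 1}
```

The diagonal holds correct predictions; any non-zero off-diagonal cell answers the question "what did it actually predict instead?"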
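Precision, recall and F1-score (listed under Training Progress) follow from the same (expected, predicted) pairs: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2PR / (P + R). The sketch computes them per intent using these standard definitions; whether Botium reports per-intent, macro- or micro-averaged values is not stated here, and `per_intent_scores` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def per_intent_scores(pairs: List[Tuple[str, str]]) -> Dict[str, Tuple[float, float, float]]:
    """Return {intent: (precision, recall, f1)} from (expected, predicted) pairs."""
    labels = sorted({label for pair in pairs for label in pair})
    scores = {}
    for label in labels:
        tp = sum(1 for e, p in pairs if e == label and p == label)
        fp = sum(1 for e, p in pairs if e != label and p == label)  # wrongly predicted as label
        fn = sum(1 for e, p in pairs if e == label and p != label)  # label missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


pairs = [("greet", "greet"), ("greet", "goodbye"),
         ("goodbye", "goodbye"), ("goodbye", "goodbye")]
scores = per_intent_scores(pairs)
print(tuple(round(v, 3) for v in scores["greet"]))  # (1.0, 0.5, 0.667)
```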
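The Entity and Intent Confidence Deviation Risks are described as the standard deviation of the confidence scores per entity or intent, with the top 10 risks charted. The sketch below ranks labels by that deviation. It assumes scores are grouped per label in a dict and uses the population standard deviation (`statistics.pstdev`); whether Botium uses the population or sample variant is an assumption, and `deviation_risks` is a hypothetical name.

```python
import statistics
from typing import Dict, List, Tuple


def deviation_risks(scores: Dict[str, List[float]], top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank labels by the standard deviation of their confidence scores, highest first."""
    deviations = {
        label: statistics.pstdev(values)
        for label, values in scores.items()
        if len(values) > 1  # a single score has no spread to measure
    }
    ranked = sorted(deviations.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


scores = {"greet": [0.9, 0.9, 0.9], "order_pizza": [0.2, 0.9], "goodbye": [0.5]}
print([(label, round(dev, 2)) for label, dev in deviation_risks(scores)])
# [('order_pizza', 0.35), ('greet', 0.0)]
```

A label like `order_pizza` here, with one confident and one weak example, is exactly the case a high deviation is meant to surface.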
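The Entity and Intent Utterance Distribution rules of thumb (at least 5 training utterances per entity, at least 15 per intent) can be checked by counting labels. This is a minimal sketch assuming one label string per utterance; `underrepresented` and the constant names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Rule-of-thumb minimums from the Intent and Entity Utterance Distribution descriptions.
MIN_UTTERANCES_PER_INTENT = 15
MIN_UTTERANCES_PER_ENTITY = 5


def underrepresented(labels: Iterable[str], minimum: int) -> List[Tuple[str, int]]:
    """Return (label, count) pairs with fewer than `minimum` utterances, fewest first."""
    counts = Counter(labels)
    # A label that never occurs is not counted at all, so it cannot appear here.
    return sorted(((label, n) for label, n in counts.items() if n < minimum),
                  key=lambda item: item[1])


intents = ["greet"] * 20 + ["order_pizza"] * 3
entities = ["city"] * 5 + ["date"] * 4
print(underrepresented(intents, MIN_UTTERANCES_PER_INTENT))   # [('order_pizza', 3)]
print(underrepresented(entities, MIN_UTTERANCES_PER_ENTITY))  # [('date', 4)]
```

Note that the charts count utterances per predicted label, while the rule of thumb is about training utterances; the same counting applies to either list.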
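The Intent Confidence Distribution groups confidence scores by ranges. The sketch buckets scores into equal-width ranges between 0 and 1; the bucket count of 10 and the label format are assumptions, since the actual ranges used by the chart are not given here, and `confidence_histogram` is a hypothetical name.

```python
from typing import Dict, List


def confidence_histogram(confidences: List[float], bins: int = 10) -> Dict[str, int]:
    """Count confidence scores per equal-width range between 0 and 1."""
    counts = [0] * bins
    for c in confidences:
        # Clamp into [0, bins - 1] so that a score of exactly 1.0 lands in the top range.
        index = max(0, min(int(c * bins), bins - 1))
        counts[index] += 1
    width = 1.0 / bins
    return {f"{i * width:.2f}-{(i + 1) * width:.2f}": n for i, n in enumerate(counts)}


print(confidence_histogram([0.05, 0.55, 0.95, 1.0]))
```

Many scores in the low ranges signal a higher failure probability, in line with the alert described for this chart.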
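Intent Confidence Risks lists the intents with the weakest average confidence score. A minimal sketch, assuming confidence scores are grouped per intent; the `top_n` default of 10 (mirroring the deviation chart) and the name `weakest_intents` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def weakest_intents(scores: Dict[str, List[float]], top_n: int = 10) -> List[Tuple[str, float]]:
    """Return the top_n intents with the lowest average confidence, weakest first."""
    averages = {intent: mean(values) for intent, values in scores.items() if values}
    return sorted(averages.items(), key=lambda item: item[1])[:top_n]


scores = {"greet": [0.9, 0.95], "order_pizza": [0.4, 0.6], "goodbye": [0.8]}
print([intent for intent, _ in weakest_intents(scores, top_n=2)])  # ['order_pizza', 'goodbye']
```

An average alone cannot tell a single very weak utterance apart from uniformly low scores, which is why it is worth reading alongside the deviation risks.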
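Intent Mismatch Probability Risks concern correct predictions whose confidence is very close to that of another intent. One simple way to approximate this is the margin between the top and runner-up confidence. The margin measure, the `max_margin` of 0.1, the input shape and the name `mismatch_risks` are all assumptions; how Botium actually computes mismatch probability is not described in this overview.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[str, str, Dict[str, float]]  # (utterance, expected intent, confidence per intent)


def mismatch_risks(predictions: List[Prediction],
                   max_margin: float = 0.1) -> List[Tuple[str, str, str, float]]:
    """Flag correct predictions whose runner-up intent is within max_margin."""
    risky = []
    for utterance, expected, ranking in predictions:
        ordered = sorted(ranking.items(), key=lambda item: item[1], reverse=True)
        if len(ordered) < 2 or ordered[0][0] != expected:
            continue  # need a runner-up, and only correct predictions count here
        (top, top_conf), (runner_up, runner_conf) = ordered[0], ordered[1]
        margin = top_conf - runner_conf
        if margin < max_margin:
            risky.append((utterance, top, runner_up, margin))
    # Smallest margin first: these are the most likely to flip to the wrong intent.
    return sorted(risky, key=lambda item: item[3])


preds = [
    ("hi there", "greet", {"greet": 0.52, "goodbye": 0.48}),
    ("order a pizza", "order_pizza", {"order_pizza": 0.9, "greet": 0.05}),
]
print([(u, t, r, round(m, 2)) for u, t, r, m in mismatch_risks(preds)])
# [('hi there', 'greet', 'goodbye', 0.04)]
```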