This is the step that takes the most effort. The machine learning algorithms behind NLP (and most state-of-the-art NLP engines are based on some kind of machine learning) are only as good as the data they have been trained on. Both the quality and the quantity of that data matter.
Botium has tools to support gathering and augmenting datasets for training and testing.
Note: Although from a technical perspective it doesn’t make a lot of sense to use training data for testing, this is usually the first step in Botium:

- It can be done with a few clicks in Botium
- It gives you first insights into how the NLP engine is performing on the data it has been trained on
- It reveals flaws within the training data itself
**Don’t underestimate the importance of clean training data for the real-life performance of your NLP engine!**
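
To make the idea of "flaws within the training data" concrete, here is a minimal sketch in plain Python. It does not use Botium or any NLP engine. It assumes the training data is available as simple (utterance, intent) pairs, and the helper names `normalize`, `find_conflicts` and `find_sparse_intents` are hypothetical, chosen only for this illustration. It looks for two common problems: the same utterance labeled with more than one intent (a quality issue) and intents with very few examples (a quantity issue).

```python
from collections import defaultdict


def normalize(utterance):
    """Lowercase and collapse whitespace so near-identical utterances compare equal."""
    return " ".join(utterance.lower().split())


def find_conflicts(examples):
    """Return normalized utterances that are labeled with more than one intent.

    `examples` is an iterable of (utterance, intent) pairs (an assumed format).
    """
    intents_by_utterance = defaultdict(set)
    for utterance, intent in examples:
        intents_by_utterance[normalize(utterance)].add(intent)
    return {
        utterance: sorted(intents)
        for utterance, intents in intents_by_utterance.items()
        if len(intents) > 1
    }


def find_sparse_intents(examples, min_examples=2):
    """Return intents that have fewer than `min_examples` training utterances."""
    counts = defaultdict(int)
    for _, intent in examples:
        counts[intent] += 1
    return sorted(intent for intent, n in counts.items() if n < min_examples)


# Made-up sample data, for illustration only.
training_data = [
    ("What is my balance", "check_balance"),
    ("what is my  balance", "account_info"),
    ("Show my balance", "check_balance"),
    ("Transfer money", "transfer"),
]

conflicts = find_conflicts(training_data)
sparse = find_sparse_intents(training_data)
print(conflicts)
print(sparse)
```

On this sample, the first two utterances differ only in case and spacing but carry different intents, so they are reported as a conflict, and the two intents with a single example each are reported as sparse. A real dataset would likely need a stricter normalization step and a higher `min_examples` threshold.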