This is the step that takes the most effort. Most state-of-the-art NLP engines are based on some kind of machine learning, and machine learning algorithms are only as good as the data they have been trained on. This is a question of both quality and quantity.
Botium has tools to support gathering and augmenting datasets for training and testing.
**Don’t underestimate the importance of clean training data for the real-life performance of your NLP engine!**
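
To make "clean" and "quantity" a little more concrete, here is a minimal sketch of a dataset audit in plain Python. It is not part of Botium and does not use any Botium API. The function names `normalize` and `audit_dataset`, and the `(utterance, intent)` pair format, are assumptions made for illustration only. The sketch removes exact duplicates after normalization, flags utterances that are labeled with more than one intent, and counts the examples per intent.

```python
from collections import Counter, defaultdict


def normalize(utterance):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(utterance.lower().split())


def audit_dataset(examples):
    """Audit a list of (utterance, intent) pairs (hypothetical format).

    Returns a tuple of:
      cleaned   - normalized pairs with exact duplicates removed, in input order
      conflicts - dict mapping an utterance to its intents when it has several
      counts    - Counter of remaining examples per intent
    """
    seen = set()
    intents_by_text = defaultdict(set)
    cleaned = []
    for utterance, intent in examples:
        text = normalize(utterance)
        intents_by_text[text].add(intent)
        if (text, intent) not in seen:
            seen.add((text, intent))
            cleaned.append((text, intent))
    # The same utterance under different intents usually points to a labeling error.
    conflicts = {
        text: sorted(intents)
        for text, intents in intents_by_text.items()
        if len(intents) > 1
    }
    # Very small counts for an intent suggest it needs more training examples.
    counts = Counter(intent for _, intent in cleaned)
    return cleaned, conflicts, counts
```

Calling `audit_dataset` on the training data before each training run gives a quick view of both aspects: `conflicts` points at quality problems, and `counts` shows which intents are thin on examples.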