What questions can be answered with a Performance test?
Functional tests answer a simple question: is it working or not? Performance tests simulate human users, so they can answer more nuanced questions:
How can the chatbot deal with many users?
Before publishing a bot, it is good to know its limits and how it behaves beyond them. Is it possible to kill it? Does it recover after a heavy load?
A Stress test can help you answer these questions.
Do we have some (memory) leak?
The conversation is stateless, so executing the same conversation on the same server should take about the same time. If the chatbot starts a never-ending process, or does not release a resource like memory or background storage, then the system will become slower over time, or will die suddenly.
A Load test is for detecting this.
Are the conversations stateless?
Chatbots are nowadays mostly static: we get the same answer for the same question. So it is a good test to repeat a question. It can even happen that it works well the first time, and the second time we get an error.
Every Performance test can detect this problem, because the nature of a Performance test is repeating a convo.
Are the conversations thread safe?
Many problems can be detected by executing a conversation in parallel: for example deadlocks, wrong resource sharing, or incorrect synchronization.
A Performance test is not the ultimate tool for finding those mistakes. It can happen that everything goes well 10 times in parallel, and the next time we get an error. The system is too complex to do exactly the same thing every time. So let's say Botium can detect such problems accidentally, as a side effect.
Every Performance test can detect this problem, but a Stress test fits best.
Walkthrough - A First Load Test
You can imagine a Performance test as a simple loop which repeats a conversation to simulate human behavior.
First Stress Test
By default all parallel users are having a very simple conversation with the bot, saying just ‘hello’ for example. They wait for an answer and repeat the conversation until the end of the test is reached. Any response coming from the chatbot is accepted. So if the chatbot answers sometimes with ‘Hello User’ and sometimes with ‘I don’t understand’, it has no impact on the performance results. The main goal here is to get an answer; an HTTP error code or a missing answer is interpreted as a failure, of course.
In order to start a Stress Test, we have to select the “Performance Tests” tab in our Test Project, and choose Stress Test there.
The duration can be 1 minute. For parallel users, let's use 5 to start and 1000 to end. (You can enter 1000 by editing the field directly.)
“Required percentage of Successful Users?” can be 0 percent, because we don’t want to stop the test on failed responses (at the moment).
I have two goals with these settings: getting a first impression of the maximum number of parallel users the chatbot is able to serve, and discovering how the chatbot breaks under extreme load. In production your chatbot could receive much more load than expected, so you have to test it.
The first chart in the stress test results shows that there were errors.
We see that the bot can easily handle 5 parallel users. But on the next step, when the number of parallel users was increased to over 200, the chatbot started to fail already.
For more insights into the failures you can download a detailed report.
This report contains all errors detected by Botium Box. The types of failures are:
- Timeout (Chatbot does not answer at all, or too slow)
- Invalid response (API limit reached, server down, invalid credentials)
- Response with unexpected content (We send ‘hi’, and expect ‘Hello!’, but the chatbot answers with ‘Sorry some error detected come back later!’)
In my case there is just one kind of error in the report:
2168/Line 6: error waiting for bot — Bot did not respond within 10000ms
After 10s Botium Box gave up waiting for the response. The chatbot's answer time was over the limit, or it did not answer at all. Slowing down is common under extreme load, but receiving no answer is more critical, and we can't ignore it.
Checking the Response Times chart in our stress test results, I notice that the chatbot is simply too slow. The response time keeps increasing until it reaches 10s, and most of the responses fail. The next step would be to check chatbot-side metrics/logs, or to increase the timeout in Botium Box. For the sake of simplicity I assume that my impression is correct.
The first stress test showed me that the maximum number of parallel users is somewhere around 200, and that under heavy load the chatbot won't break, but will slow down significantly.
Performance Test Parametrization
Example 1 - How can we simulate a constant number of users, for example 5 users for 7 steps, using a Load test?
First you have to choose a Test Project with just one convo. If you chose a Test Project with two convos, you would emulate 10, 10, 10… users. That is not wrong, but not what we want. (You can use more convos if you want to emulate different conversations.)
Set Test Duration to 1 minute. Botium executes a step every 10 seconds. If we set the duration to 1 minute, Botium will execute 7 steps. (7 steps is the minimum.)
Set Test Set Multiplicator to 5. This means that in the first step Botium will execute our Test Project 5 times. We have our starting number! And even better: because a Load test works with constant load, we have everything we need.
If we start the test and wait for 1 minute, the Total Convos Processed should be 35.
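The arithmetic behind this example can be sketched in a few lines (an illustrative calculation, not Botium API code; the variable names are made up):

```python
# Load test: constant load on every test step.
# Values taken from the walkthrough above.
convos_per_test_project = 1   # the Test Project contains a single convo
test_set_multiplicator = 5    # 5 parallel users per step
steps = 7                     # 1 minute, one step every 10 seconds

users_per_step = convos_per_test_project * test_set_multiplicator
total_convos = users_per_step * steps
print(users_per_step, total_convos)  # 5 35
```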
Example 2 - How can we simulate an increasing number of users, for example 1, 3, 5, 7, 9, 11, 13 in a row, using a Stress test?
First you have to choose a Test Project with just one convo, as in the Load test.
Set Test Duration to 1 minute, as in the Load test.
Set Test Set Multiplicator to 1, as in the Load test.
And if we set Increase Test Set Multiplicator to 2, we get all the user counts from 1 to 13. (It is calculated this way: ConvoCountPerTestProject * (Multiplicator + (i - 1) * IncreaseMultiplicator))
And we have to set Maximum Parallel Users to 13. If Botium reaches this limit, it does not start a new user until a running one finishes. This parameter protects the agent from overload, so set it with care!
If we start the test and wait for 1 minute, the Total Convos Processed should be 49.
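The per-step user counts and the total can be checked with the formula from the text (again an illustrative sketch, not Botium code):

```python
# Stress test: load increases on every step, following
# ConvoCountPerTestProject * (Multiplicator + (i - 1) * IncreaseMultiplicator)
convo_count = 1               # one convo in the Test Project
multiplicator = 1             # Test Set Multiplicator
increase_multiplicator = 2    # Increase Test Set Multiplicator
steps = 7                     # 1 minute, one step every 10 seconds

users_per_step = [convo_count * (multiplicator + (i - 1) * increase_multiplicator)
                  for i in range(1, steps + 1)]
print(users_per_step)       # [1, 3, 5, 7, 9, 11, 13]
print(sum(users_per_step))  # 49
```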
Example 3 - How can we simulate a heavy load of 400 users per second for 5 minutes?
You have a test set with 1 test case, typically a test case simulating the greeting process. You want to simulate 400 users per second talking to your chatbot.
Set Test Duration to 5 (minutes).
This load can only be generated with multiple Botium Box Agents - make sure you have at least 2 Botium Box Agents installed and connected to Botium Box. Switch to the Advanced mode and set the Parallel Jobs Count to 2.
Now comes the math:
You have 2 parallel jobs running so you have to make sure that there are roughly 200 test cases per second to come up with the load of the requested 400 per second
Each test step has a duration of 10 seconds, so you have to make sure to have 2000 test cases in the queue for each test step (10 * 200)
As the test set has exactly 1 test case, set the Test Case Multiplicator to 2000 to let Botium generate a load of 2000 test cases every 10 seconds, and this is done in 2 jobs in parallel, leading to a total load of 4000 every 10 seconds or 400 per second
To let Botium execute 400 test cases in parallel in 2 jobs, set the Parallel Convo Count to 200.
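The math above can be condensed into a short sketch (illustrative only; the variable names are not Botium parameters):

```python
# Example 3: a sustained load of 400 users per second, shared by 2 agents.
target_users_per_second = 400
parallel_jobs = 2                  # Botium Box Agents each generating the same load
test_step_seconds = 10             # default Test Step Duration

# Each job must generate half of the target load.
per_job_users_per_second = target_users_per_second // parallel_jobs   # 200

# Test cases queued per job per 10-second step -> Test Case Multiplicator.
test_case_multiplicator = per_job_users_per_second * test_step_seconds  # 2000

# Worker threads per agent -> Parallel Convo Count.
parallel_convo_count = per_job_users_per_second                         # 200

print(test_case_multiplicator, parallel_convo_count)  # 2000 200
```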
Test Duration
The number of minutes Botium generates the given load.
Test Duration determines only when the last test step is started, not the end of the test.
Test Step Duration
Botium generates the load in iterations called test steps. On each test step, Botium adds the given load to the processing queue and the Botium Box Agents are running them as fast as your Chatbot allows.
Load in this context actually means the convos to perform - the convos that are part of the Test Set connected to the Test Project.
In Load/Stress tests the steps are executed every 10 seconds. In Advanced mode you can change this.
Test Set Multiplicator
On each test step, the content of the Test Set is added to the processing queue. If you want to add it multiple times on each test step, basically repeating the same convos over and over again, you can increase the test set multiplicator.
Increase Test Set Multiplicator
You can simulate increasing load over time by increasing the test set multiplicator.
The multiplicator is calculated this way: ConvoCountPerTestProject * (Multiplicator + (i - 1) * IncreaseMultiplicator)
Cancel on Failed Convos
Convos can fail in Botium for two reasons:
Chatbot returns a different text than expected, or nothing at all
Any of the asserters triggers a failure
You can decide to accept a certain amount of test case failures during the performance test. If the failure percentage is higher than the given threshold, the performance test is cancelled.
Parallel Convos Count / Maximum Parallel Users
This is the number of worker threads each Botium Box Agent is launching for generating the load. It roughly corresponds to the number of parallel user sessions coming from a single Botium Box Agent your Chatbot will see.
Each worker thread generates the load sequentially: if all available worker threads are waiting for a chatbot response, no more load is generated until responses are received and worker threads are freed.
Parallel Jobs Count
The given load is generated by the Botium Box Agents. You can multiply the generated load by telling Botium to run it from more than one Botium Box Agent.
You have to install and connect multiple Botium Box Agents.
If you want to generate a heavy load on your API, the test environment should not be the bottleneck. If you set this to 2, then two agents will work on the Performance test. They won't share the tasks; both will execute the same convos. The number of users, and so the load, is doubled on the Chatbot API, but not on the test environment.
If there are not enough agents, you won't get an error message. If an agent finishes all convos (and so a job), it starts a new job.
If the agents are running on a single PC, it is possible that the bottleneck remains in the test environment.
This is the data sampling ratio. It determines how fine-grained the chart and the exported data are. A value that is too low can put heavy load on the Botium Box server. To protect the server, the data is truncated at 1000 records.
Shared Botium Core Session (Botium Box > 2.8.2)
By default, for each single convo execution a separate Botium Core session is started. Depending on the connector technology this will take additional time for session setup, which slows down the total test execution duration (but it is not included in the measured response times). If you don’t care about measuring performance for individual user sessions and if the connector technology is not depending on building individual user sessions then you can enable this switch to speed up the performance testing process.
Charts are not the only output of a Performance test. The test itself can fail, even if the functional tests are executed without error.
Chatbot Response Time chart
It is the most important chart for performance questions like “Do we have some (memory) leak?”.
What you see there depends on the parameters. If you started a Stress test, a flat line means that the response time does not depend on the number of users.
If the response time decreases, there must be some optimization at work, like caching.
If it increases, you have to decide whether it is acceptable or not.
Convo Processing Delay chart
The Convo Processing Delay chart is a general-purpose chart to detect performance problems in the test environment. It cannot tell you what the problem is, it just indicates that the test environment is the bottleneck.
Possible problems are:
There are not enough Agents (can happen only if you set Parallel Jobs Count above 1)
Agent is overloaded (the solution is vertical or horizontal scaling)
Test Step Duration is too small (see next section).
Processed Convo Count chart
To understand the Processed Convo Count chart, you have to know that every convo is delayed a little. So if Delayed Convos follows Processed Convos, everything is fine. But if Delayed Convos stays above Processed Convos, then two steps are overlapping each other: the Test Step Duration is not enough to execute all convos of the step. You can try to increase it.
Some known limitations remain:
- The "not enough agents" problem could be indicated more clearly.
- It is not guaranteed that all users within a step are actually executed in parallel as intended; this depends on many conditions. For example, if there are many convos and the chatbot is fast, not all convos of a test set run in parallel. It would be good to detect how many are actually executed in parallel.
- The Processed Convo Count chart could be cleaner. It is confusing that every convo is delayed; perhaps simply dividing the two counts would help.