Introduction
Once you're comfortable with basic SIP testing, it's time to explore advanced load testing capabilities. This tutorial covers high call rates, performance optimization, distributed testing, and capacity planning.
Understanding Load Testing Metrics
Key Performance Indicators
- CPS (Calls Per Second): Rate at which new calls are initiated
- Concurrent Calls: Number of active calls at any given moment
- Success Rate: Percentage of completed calls vs. attempted
- Response Time: Time between request and response
- Call Duration: Average length of completed calls
- Failure Rate: Percentage of attempted calls that failed
- Retransmissions: Number of retried messages (indicates network issues)
- Timeouts: Messages that never received responses
Calculating Capacity
# Concurrent calls = CPS × Average Call Duration (seconds)
# Example: 10 CPS × 60 seconds = 600 concurrent calls
#
# System capacity = Max CPS before failure rate exceeds threshold
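The concurrency formula above can be sketched as a small helper. This is only an illustration of the arithmetic; the function name `concurrent_calls` is hypothetical and not part of easySIPp or SIPp.

```python
def concurrent_calls(cps, avg_call_duration_s):
    """Steady-state concurrent calls = call rate x average call duration (seconds)."""
    return cps * avg_call_duration_s


# Example from the text: 10 CPS with 60-second calls
print(concurrent_calls(10, 60))  # 600
```

The result is a steady-state estimate: it only holds once the test has run for at least one full call duration.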
Planning Your Load Test
Define Test Objectives
Before starting, clearly define what you're testing:
- Baseline Performance: Normal operating capacity (80% of max)
- Peak Load: Maximum expected traffic during busy hour
- Stress Test: Breaking point identification (find the limit)
- Endurance Test: Sustained load over time (4-8 hours)
- Spike Test: Sudden traffic increases (2x-3x normal)
Test Environment Setup
Critical: Always test in a non-production environment that mirrors production as closely as possible.
Ensure you have:
- Adequate hardware resources (CPU, RAM, network)
- Isolated network to avoid impacting production
- Monitoring tools in place (easySIPp, Wireshark, or other tracing/packet capture tools)
- Clear understanding of expected results
- Rollback plan if test impacts other systems
Understanding the CheckOutput Screen
When you run a test in easySIPp, each running SIPp process gets a CheckOutput button that opens its real-time statistics. This screen shows the output of SIPp's real-time statistics screen, plus control buttons for managing your tests.
Screen 1: Scenario Screen
The main view showing call flow progress and key metrics:
------------------------------ Scenario Screen --------
Call rate (length) Port Total-time Total-calls Remote-host
5.0(0 ms)/1.000s 5061 72.63 s 138 127.0.0.1:5062(UDP)
0 new calls during 0.514 s period
15 calls (limit 15) Peak was 15 calls, after 13 s
0 Running, 33 Paused, 15 Woken up
Key Metrics Explained:
- Call rate: Current CPS (5.0 in example)
- Total-time: Test duration (72.63 seconds)
- Total-calls: Calls attempted (138)
- Peak calls: Maximum concurrent calls (15)
- Running: Currently executing calls (0 = test paused/finished)
- Paused: Calls in pause/media state (33)
Message Statistics Table:
                              Messages  Retrans  Timeout  Unexpected-Msg
0 : INVITE ---------->        138       190      14
1 : 100    <----------        0         0        0        0
2 : 180    <----------        109       0        0        0
3 : 200    <---------- E-RTD1 109       0        0        0
4 : ACK    ---------->        109       0
5 : Pause  [ 0ms]             109                         0
6 : BYE    ---------->        109       9        1
7 : 200    <----------        108       0        0        0
Understanding the Columns:
- Messages: Total messages sent/received at this step
- Retrans: Retransmissions (⚠️ High = network issues)
- Timeout: Messages that timed out (❌ Very bad)
- Unexpected-Msg: Received but not expected (check scenario)
Red Flag: In the example, INVITE has 190 retransmissions and 14 timeouts out of 138 calls. This indicates serious network or server capacity issues!
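To compare runs of different sizes, it can help to normalize the Retrans and Timeout counters by the number of messages sent. The sketch below does that for the INVITE row of the example; the helper name `message_health` is hypothetical, and the counters are typed in by hand rather than read from SIPp.

```python
def message_health(sent, retrans, timeouts):
    """Normalize Retrans/Timeout counters by the number of messages sent."""
    return {
        "retrans_per_message": round(retrans / sent, 2),
        "timeout_pct": round(100.0 * timeouts / sent, 1),
    }


# INVITE row from the example: 138 sent, 190 retransmissions, 14 timeouts
print(message_health(138, 190, 14))
# {'retrans_per_message': 1.38, 'timeout_pct': 10.1}
```

More than one retransmission per INVITE means that, on average, every call had to resend its first message at least once.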
Screen 2: Statistics Screen
This screen is stacked just below Screen 1.
----------------------------- Statistics Screen -------
Counter Name | Periodic value | Cumulative value
-------------------------+---------------------------+--------------------------
Elapsed Time | 00:01:12:635000 | 00:01:12:635000
Call Rate | 0.000 cps | 1.900 cps
-------------------------+---------------------------+--------------------------
Outgoing calls created | 0 | 138
Total Calls created | | 138
Current Calls | 15 |
-------------------------+---------------------------+--------------------------
Successful call | 0 | 108
Failed call | 0 | 15
-------------------------+---------------------------+--------------------------
Response Time 1 | 00:00:02:134000 | 00:00:02:134000
Call Length | 00:00:05:736000 | 00:00:05:736000
Critical Metrics:
- Call Rate (cumulative): Average CPS over entire test (1.900)
- Successful call: 108 out of 138 = 78% success rate (❌ Below acceptable)
- Failed call: 15 failures (11% failure rate)
- Response Time 1: Average time to first response (2.134s)
- Call Length: Average call duration (5.736s)
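The success and failure percentages can be reproduced from the cumulative counters. A minimal sketch, assuming the three counters are copied manually from the Statistics Screen (the function name `call_summary` is hypothetical):

```python
def call_summary(total_created, successful, failed):
    """Derive success/failure percentages and calls still in progress."""
    return {
        "success_pct": round(100.0 * successful / total_created, 1),
        "failure_pct": round(100.0 * failed / total_created, 1),
        # Calls that have neither succeeded nor failed yet
        "in_progress": total_created - successful - failed,
    }


# Counters from the example: 138 created, 108 successful, 15 failed
print(call_summary(138, 108, 15))
# {'success_pct': 78.3, 'failure_pct': 10.9, 'in_progress': 15}
```

The 15 calls still in progress match the "Current Calls" counter, which is why successful and failed calls do not add up to 138.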
Screen 3: Repartition Screen
Stacked below Screen 2:
---------------------------- Repartition Screen -------
Average Response Time Repartition 1
0 ms <= n < 10 ms : 94
10 ms <= n < 20 ms : 0
n >= 200 ms : 15
Average Call Length Repartition
0 ms <= n < 10 ms : 93
n >= 10000 ms : 30
This shows that most calls (94) had very fast responses (< 10 ms), but 15 calls took 200 ms or longer, which points to slow or timed-out responses.
Real-Time Control During Tests
Interactive Controls
While a test is running, you can use the control buttons at the top of the screen:
- Pause/Start (p) - Pause/Start traffic (toggle)
- +10 CPS (*) - Increase rate by 10 CPS
- +1 CPS (+) - Increase rate by 1 CPS
- -1 CPS (-) - Decrease rate by 1 CPS
- -10 CPS (/) - Decrease rate by 10 CPS
- Quit (q) - Quit gracefully (finish active calls)
- Kill - Force stop immediately
Pro Tip: Use these buttons during a test to find the exact breaking point without running multiple tests. Watch the "Retrans" and "Timeout" columns - when these start increasing rapidly, you've reached capacity!
1. Progressive Load Testing
Step 1: Baseline Test
Start with a low call rate to establish baseline:
Total Calls: 100
CPS: 1
Duration: ~100 seconds
Expected Result: 100% success rate
Step 2: Incremental Increase
Gradually increase load:
Test 1: CPS 1 → Monitor
Test 2: CPS 5 → Monitor
Test 3: CPS 10 → Monitor
Test 4: CPS 25 → Monitor
Test 5: CPS 50 → Monitor
Step 3: Find the Breaking Point
Continue increasing until you see:
- Success rate drops below 95%
- Response times increase significantly
- System errors or timeouts
- Resource exhaustion (CPU, memory, network)
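If you record the outcome of each incremental test, the breaking point can be picked out mechanically. The sketch below uses only the success-rate criterion with the 95% guideline; the function name and the sample results are hypothetical, not measured data.

```python
def find_breaking_point(results, min_success_pct=95.0):
    """Return the lowest CPS whose success rate is below the threshold, or None.

    results: iterable of (cps, successful_calls, attempted_calls) tuples.
    """
    for cps, successful, attempted in sorted(results):
        if 100.0 * successful / attempted < min_success_pct:
            return cps
    return None


# Hypothetical results for the five incremental tests
runs = [
    (1, 100, 100),
    (5, 500, 500),
    (10, 995, 1000),
    (25, 2400, 2500),
    (50, 3900, 5000),
]
print(find_breaking_point(runs))  # 50
```

In practice you would also check response times and resource usage at each step, since those can degrade before the success rate does.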
Interpreting Test Results
📊 Important Note: All metrics, percentages, and thresholds mentioned in this tutorial (such as "95% success rate" or "5% retransmission limit") are general guidelines and examples, not absolute rules. Your acceptable thresholds may vary based on your specific environment, use case, and business requirements. Use these numbers as starting points and adjust based on your system's characteristics and testing objectives.
Success Criteria
✅ PASS - Healthy System:
Successful call: 1970 / 2000 = 98.5%
Failed call: 30
Retrans: < 50 total
Timeout: 0
Response Time: < 200ms
⚠️ WARNING - Needs Investigation:
Successful call: 1820 / 2000 = 91%
Failed call: 180
Retrans: 50-100
Timeout: 1-5
Response Time: 200-500ms
❌ FAIL - System Overloaded:
Successful call: 1560 / 2000 = 78%
Failed call: 440
Retrans: 190 ← Very high!
Timeout: 14 ← Critical!
Response Time: > 2000ms
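The three categories above can be expressed as a simple classifier. The retransmission, timeout, and response-time bands follow the examples above; the success-rate cutoffs (95% and 90%) are assumptions chosen to fit those examples, and the function name `classify` is hypothetical. Adjust all thresholds to your own environment.

```python
def classify(success_pct, retrans, timeouts, response_ms):
    """Classify a test run as PASS, WARNING, or FAIL using guideline thresholds."""
    # Assumed cutoffs: anything beyond the WARNING bands counts as FAIL
    if success_pct < 90 or retrans > 100 or timeouts > 5 or response_ms > 500:
        return "FAIL"
    if success_pct < 95 or retrans >= 50 or timeouts > 0 or response_ms >= 200:
        return "WARNING"
    return "PASS"


print(classify(98.5, 30, 0, 150))     # PASS
print(classify(91.0, 75, 3, 300))     # WARNING
print(classify(78.0, 190, 14, 2134))  # FAIL
```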
Common Bottlenecks and Solutions
High Retransmissions
Symptom: Retrans column shows high numbers
Causes:
- Network packet loss
- Target system slow to respond
- Firewall dropping packets
Solutions:
- Check network connectivity (ping, packet capture)
- Reduce CPS
- Verify firewall rules
- Try TCP instead of UDP
Timeouts
Symptom: Timeout column > 0
Causes:
- Target system completely unresponsive
- Network completely blocked
- Wrong IP/port configuration
Solutions:
- Verify target system is running
- Check IP addresses and ports
- Review target system logs
2. High Volume Testing Techniques
Using Call Duration to Control Concurrency
# Example: Testing 1000 concurrent calls
# Option 1: High CPS, short duration
CPS: 100
Call Duration: 10 seconds
Concurrent: ~1000 calls
Risk: High stress on call setup
# Option 2: Medium CPS, medium duration
CPS: 50
Call Duration: 20 seconds
Concurrent: ~1000 calls
Risk: Balanced
# Option 3: Low CPS, long duration
CPS: 17
Call Duration: 60 seconds
Concurrent: ~1000 calls
Risk: Tests endurance, not burst capacity
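The CPS values in the three options follow from dividing the target concurrency by the call duration and rounding up. A minimal sketch of that calculation (the helper name `required_cps` is hypothetical):

```python
import math


def required_cps(target_concurrent, call_duration_s):
    """Smallest whole CPS that reaches the target concurrency at steady state."""
    return math.ceil(target_concurrent / call_duration_s)


for duration in (10, 20, 60):
    cps = required_cps(1000, duration)
    print(f"{duration}s calls: {cps} CPS -> {cps * duration} concurrent")
# 10s calls: 100 CPS -> 1000 concurrent
# 20s calls: 50 CPS -> 1000 concurrent
# 60s calls: 17 CPS -> 1020 concurrent
```

Rounding up is why Option 3 lands slightly above the target, at about 1020 concurrent calls.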
Monitoring During High-Load Tests
What to watch in CheckOutput:
- First 30 seconds: Watch for immediate failures
  - Check "Running" count increases properly
  - Verify no timeouts in first batch
- After 1 minute: Check statistics screen [2]
  - Success rate should be >95%
  - Current Calls should match expected concurrent
- After 5 minutes: Look for degradation
  - Response time increasing?
  - Retransmissions creeping up?
  - Peak concurrent calls steady or declining?
3. Stress Testing Patterns
Ramp-Up Test
Minutes 0-2: CPS 10 (warm-up, establish baseline)
Minutes 2-4: CPS 25 (press [*] once, then [+] five times)
Minutes 4-6: CPS 50 (press [*] twice, then [+] five times)
Minutes 6-8: CPS 75 (press [*] twice, then [+] five times)
Minutes 8-10: CPS 100 (press [*] twice, then [+] five times; watch for failures)
Spike Test
Minutes 0-3: CPS 10 (baseline)
Minutes 3-5: CPS 100 (press [*] 9 times rapidly - spike!)
Minutes 5-8: CPS 10 (press [/] 9 times - recovery)
Check: Did system recover? Are there lingering issues?
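Because the control buttons change the rate in steps of 10 and 1 CPS, the presses needed for any rate change can be worked out in advance. A small sketch, using the key labels from the Interactive Controls list (the function name `button_presses` is hypothetical):

```python
def button_presses(current_cps, target_cps):
    """Return how many +/-10 and +/-1 presses move the rate to the target."""
    delta = target_cps - current_cps
    tens, ones = divmod(abs(delta), 10)
    if delta >= 0:
        return {"*": tens, "+": ones}  # [*] = +10 CPS, [+] = +1 CPS
    return {"/": tens, "-": ones}      # [/] = -10 CPS, [-] = -1 CPS


print(button_presses(10, 100))  # {'*': 9, '+': 0}
print(button_presses(100, 10))  # {'/': 9, '-': 0}
print(button_presses(10, 25))   # {'*': 1, '+': 5}
```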
Endurance Test
# Run at 80% of max capacity
If max = 50 CPS, run at 40 CPS for 4-8 hours
Monitor every 30 minutes:
- Success rate still >95%?
- Response time stable?
- Retransmissions not increasing?
- Memory leaks? (check target system)
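Before starting an endurance run, it is worth estimating how many calls it will generate and how many monitoring checkpoints it involves. The sketch below assumes the 80% load level and 30-minute interval from the text; the function name `endurance_plan` is hypothetical.

```python
def endurance_plan(max_cps, hours, load_pct=80, check_interval_min=30):
    """Return (test CPS, total calls generated, number of monitoring checks)."""
    cps = max_cps * load_pct / 100
    total_calls = cps * hours * 3600
    checks = int(hours * 60 // check_interval_min)
    return cps, total_calls, checks


# Max capacity 50 CPS, 4-hour run
print(endurance_plan(50, 4))  # (40.0, 576000.0, 8)
```

A 4-hour run at 40 CPS produces over half a million calls, so make sure the target system's logs and call records have room for that volume.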
4. Distributed Load Testing
Multi-Machine Setup
When a single machine can't generate enough load:
- Deploy multiple easySIPp instances
  - Machine 1: UAC 1 (CPS 150)
  - Machine 2: UAC 2 (CPS 150)
  - Machine 3: UAC 3 (CPS 150)
  - Total: 450 CPS
- Coordinate test execution
  - Start all UACs simultaneously
  - Use synchronized clocks
  - Aggregate results manually
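Manual aggregation amounts to summing the per-machine counters and recomputing the success rate over the totals. A minimal sketch, assuming you copy each machine's counters from its Statistics Screen by hand; the dictionary keys and the sample numbers are hypothetical.

```python
def aggregate(per_machine):
    """Combine per-machine counters into overall CPS and success rate."""
    total_cps = sum(m["cps"] for m in per_machine)
    attempted = sum(m["attempted"] for m in per_machine)
    successful = sum(m["successful"] for m in per_machine)
    return {
        "total_cps": total_cps,
        "attempted": attempted,
        "successful": successful,
        "success_pct": round(100.0 * successful / attempted, 1),
    }


# Hypothetical counters from three UAC machines
machines = [
    {"cps": 150, "attempted": 9000, "successful": 8950},
    {"cps": 150, "attempted": 9000, "successful": 8900},
    {"cps": 150, "attempted": 9000, "successful": 8700},
]
print(aggregate(machines))
# {'total_cps': 450, 'attempted': 27000, 'successful': 26550, 'success_pct': 98.3}
```

Compute the overall rate from the summed counters rather than averaging per-machine percentages, which would be skewed if the machines attempted different numbers of calls.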
Managing System Resources
CPU Optimization
- Minimize logging on your SIP server during high-load tests (logging management is currently not supported in easySIPp)
- Close unnecessary applications
easySIPp Memory and CPU Monitoring
# Monitor memory and CPU usage during tests
docker stats easysipp
Network Tuning
# Increase network buffer sizes (Linux)
sudo sysctl -w net.core.rmem_max=26214400
sudo sysctl -w net.core.wmem_max=26214400
# Increase file descriptors
ulimit -n 65536
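To confirm that the file descriptor limit actually applies to a given process, you can read it back with Python's standard `resource` module (Unix-like systems only). This is a sketch for checking the value on the load generator host; the threshold constant is taken from the `ulimit` example above, and the helper name is hypothetical.

```python
import resource

RECOMMENDED_NOFILE = 65536  # value from the ulimit example above


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = nofile_limits()
if soft != resource.RLIM_INFINITY and soft < RECOMMENDED_NOFILE:
    print(f"Soft limit {soft} is below {RECOMMENDED_NOFILE}; consider raising it")
else:
    print(f"Soft limit {soft} is sufficient")
```

Note that `ulimit -n` only affects the current shell and its children, so run the check from the same shell (or container) that starts the test.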
Best Practices
- Start small: Always begin with low rates
- Monitor everything: Both load generator and target
- Test incrementally: Don't jump from 10 to 1000 CPS
- Document results: Keep records of all tests
- Replicate production: Match real-world scenarios
- Plan for failure: Know your rollback strategy
- Test regularly: Capacity changes over time
Troubleshooting Checklist
When Tests Fail Immediately
- Check IP addresses and ports (most common issue!)
- Verify target system is running
- Test connectivity:
ping <target-ip>
- Check firewall rules
- Capture a Wireshark pcap trace and analyze it
- Verify that the XML scenario matches the target's expectations
When Performance Degrades
- Check "Retrans" column in CheckOutput - increasing?
- Check "Timeout" column - any appearing?
- Monitor target system CPU/memory
- Check network utilization
- Review target system logs for errors
When Actual CPS < Target CPS
- Look for timeouts (system waiting for responses)
- Check if "Failed call" is increasing
- Reduce target CPS to achievable level
- Fix underlying issues (timeouts, retrans) first
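The gap between target and actual CPS can be quantified from the Scenario Screen values. The sketch below uses the sample output shown earlier (138 calls in 72.63 s against a 5.0 CPS target); the function names are hypothetical.

```python
def achieved_cps(total_calls, elapsed_s):
    """Average call rate actually achieved over the test."""
    return total_calls / elapsed_s


def achieved_pct_of_target(total_calls, elapsed_s, target_cps):
    """Achieved rate as a percentage of the configured target rate."""
    return 100.0 * achieved_cps(total_calls, elapsed_s) / target_cps


print(round(achieved_cps(138, 72.63), 1))                 # 1.9
print(round(achieved_pct_of_target(138, 72.63, 5.0), 1))  # 38.0
```

An achieved rate this far below the target usually means new calls are being held back, for example by a concurrent-call limit while existing calls wait on retransmissions and timeouts.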
Summary
You now know how to:
- ✅ Read and interpret the SIPp statistics screen
- ✅ Identify warning signs (retrans, timeouts, failures)
- ✅ Use real-time controls to adjust load
- ✅ Apply various VoIP/SIP load testing techniques
- ✅ Diagnose common performance issues