Advanced SIP Load Testing - easySIPp Tutorial

Introduction

Once you're comfortable with basic SIP testing, it's time to explore advanced load testing capabilities. This tutorial covers high call rates, performance optimization, distributed testing, and capacity planning.

Understanding Load Testing Metrics

Key Performance Indicators

CPS (Calls Per Second): Rate at which new calls are initiated
Concurrent Calls: Number of active calls at any given moment
Success Rate: Percentage of attempted calls that completed successfully
Response Time: Time between a request and its response
Call Duration: Average length of completed calls
Failure Rate: Percentage of attempted calls that failed
Retransmissions: Number of retried messages (indicates network issues)
Timeouts: Messages that never received responses

Calculating Capacity

# Concurrent calls = CPS × Average Call Duration (seconds)
# Example: 10 CPS × 60 seconds = 600 concurrent calls
#
# System capacity = Max CPS before failure rate exceeds threshold
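
The concurrency formula above is an application of Little's law (items in the system = arrival rate × time in the system). A minimal Python sketch of the arithmetic follows; the function name is illustrative and is not part of easySIPp or SIPp.

```python
def concurrent_calls(cps: float, avg_call_duration_s: float) -> float:
    """Estimate steady-state concurrent calls from call rate and average duration."""
    if cps < 0 or avg_call_duration_s < 0:
        raise ValueError("cps and duration must be non-negative")
    return cps * avg_call_duration_s


# The example from the text: 10 CPS with 60-second calls.
print(concurrent_calls(10, 60))
```

The estimate only holds once the test has run for at least one full call duration; before that, the number of active calls is still ramping up.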

Planning Your Load Test

Define Test Objectives

Before starting, clearly define what you're testing:

Baseline Performance: Normal operating capacity (80% of max)
Peak Load: Maximum expected traffic during busy hour
Stress Test: Breaking point identification (find the limit)
Endurance Test: Sustained load over time (4-8 hours)
Spike Test: Sudden traffic increases (2x-3x normal)

Test Environment Setup

Critical: Always test in a non-production environment that mirrors production as closely as possible.

Ensure you have:

Adequate hardware resources (CPU, RAM, network)
Isolated network to avoid impacting production
Monitoring tools in place (easySIPp, Wireshark, or other tracing/packet capture tools)
Clear understanding of expected results
Rollback plan if test impacts other systems

Understanding the CheckOutput Screen

When you run a test in easySIPp, each running SIPp process gets a CheckOutput button that opens its real-time SIPp statistics. This screen shows the output of SIPp's real-time statistics screens, along with control buttons for managing the test.

Figure: easySIPp real-time statistics screen and control buttons

Screen 1: Scenario Screen

The main view showing call flow progress and key metrics:

------------------------------ Scenario Screen --------
  Call rate (length)   Port   Total-time  Total-calls  Remote-host
  5.0(0 ms)/1.000s   5061      72.63 s          138  127.0.0.1:5062(UDP)

  0 new calls during 0.514 s period
  15 calls (limit 15)                     Peak was 15 calls, after 13 s
  0 Running, 33 Paused, 15 Woken up

Key Metrics Explained:

Call rate: Current CPS (5.0 in example)
Total-time: Test duration (72.63 seconds)
Total-calls: Calls attempted (138)
Peak calls: Maximum concurrent calls (15)
Running: Currently executing calls (0 = test paused/finished)
Paused: Calls in pause/media state (33)

Message Statistics Table:

                                 Messages  Retrans   Timeout   Unexpected-Msg
0 :      INVITE ---------->         138       190       14
1 :         100 <----------         0         0         0         0
2 :         180 <----------         109       0         0         0
3 :         200 <----------  E-RTD1 109       0         0         0
4 :         ACK ---------->         109       0
5 :       Pause [      0ms]         109                           0
6 :         BYE ---------->         109       9         1
7 :         200 <----------         108       0         0         0

Understanding the Columns:

Messages: Total messages sent/received at this step
Retrans: Retransmissions (⚠️ High = network issues)
Timeout: Messages that timed out (❌ Very bad)
Unexpected-Msg: Received but not expected (check scenario)

Red Flag: In the example, INVITE has 190 retransmissions and 14 timeouts out of 138 calls. This indicates serious network or server capacity issues!
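
To put the red flag in proportion, the raw counts can be turned into per-message ratios. The sketch below is plain arithmetic on the INVITE row of the example table; the function name is hypothetical and nothing here is a SIPp or easySIPp API.

```python
def retrans_and_timeout_ratios(messages_sent: int, retrans: int, timeouts: int):
    """Return (retransmissions per message, timeout fraction) for one scenario step."""
    if messages_sent <= 0:
        raise ValueError("messages_sent must be positive")
    return retrans / messages_sent, timeouts / messages_sent


# INVITE row from the example table: 138 sent, 190 retransmissions, 14 timeouts.
retrans_ratio, timeout_fraction = retrans_and_timeout_ratios(138, 190, 14)
print(f"{retrans_ratio:.2f} retransmissions per INVITE, {timeout_fraction:.1%} timed out")
```

More than one retransmission per INVITE on average means a typical request had to be sent more than once before it was answered.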

Screen 2: Statistics Screen

This screen is stacked just below Screen 1.

----------------------------- Statistics Screen -------
  Counter Name           | Periodic value            | Cumulative value
-------------------------+---------------------------+--------------------------
  Elapsed Time           | 00:01:12:635000           | 00:01:12:635000
  Call Rate              |    0.000 cps              |    1.900 cps
-------------------------+---------------------------+--------------------------
  Outgoing calls created |        0                  |      138
  Total Calls created    |                           |      138
  Current Calls          |       15                  |
-------------------------+---------------------------+--------------------------
  Successful call        |        0                  |      108
  Failed call            |        0                  |       15
-------------------------+---------------------------+--------------------------
  Response Time 1        | 00:00:02:134000           | 00:00:02:134000
  Call Length            | 00:00:05:736000           | 00:00:05:736000

Critical Metrics:

Call Rate (cumulative): Average CPS over entire test (1.900)
Successful call: 108 out of 138 = 78% success rate (❌ Below acceptable)
Failed call: 15 failures (11% failure rate)
Response Time 1: Average response time for timer 1, which ends at the 200 response marked E-RTD1 in the message table (2.134s)
Call Length: Average call duration (5.736s)
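
If you copy the Statistics Screen text out of CheckOutput, the percentages above can be recomputed rather than worked out by hand. The following is a rough sketch that assumes the three-column, pipe-separated layout shown in the example; the parser and its names are hypothetical, and a different SIPp version may lay the screen out differently.

```python
import re

# A few rows copied from the example Statistics Screen above.
SAMPLE_SCREEN = """\
  Counter Name           | Periodic value            | Cumulative value
-------------------------+---------------------------+--------------------------
  Call Rate              |    0.000 cps              |    1.900 cps
  Total Calls created    |                           |      138
  Successful call        |        0                  |      108
  Failed call            |        0                  |       15
"""


def parse_cumulative_counters(screen_text: str) -> dict:
    """Return {counter name: cumulative value} for rows whose cumulative cell is an integer."""
    counters = {}
    for line in screen_text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # separator lines and anything that is not a 3-column row
        name, _periodic, cumulative = parts
        if re.fullmatch(r"\d+", cumulative):
            counters[name] = int(cumulative)
    return counters


counters = parse_cumulative_counters(SAMPLE_SCREEN)
attempted = counters["Total Calls created"]
success_pct = 100.0 * counters["Successful call"] / attempted
failure_pct = 100.0 * counters["Failed call"] / attempted
print(f"success {success_pct:.1f}%, failure {failure_pct:.1f}%")
```

Success and failure do not add up to 100% while a test is still running, because calls that are still in progress count as attempted but are neither successful nor failed yet.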

Screen 3: Repartition Screen

Stacked below Screen 2:

---------------------------- Repartition Screen -------
  Average Response Time Repartition 1
             0 ms <= n <         10 ms :         94
            10 ms <= n <         20 ms :          0
                   n >=        200 ms :         15

  Average Call Length Repartition
             0 ms <= n <         10 ms :         93
                   n >=      10000 ms :         30

This shows that most calls (94) received very fast responses (< 10 ms), but 15 calls took 200 ms or longer, which is a sign of retransmissions or timeouts.

Real-Time Control During Tests

Interactive Controls

While a test is running, you can use the control buttons at the top of the screen:

Pause/Start (p) - Pause/Start traffic (toggle)
+10 CPS (*) - Increase rate by 10 CPS
+1 CPS (+) - Increase rate by 1 CPS
-1 CPS (-) - Decrease rate by 1 CPS
-10 CPS (/) - Decrease rate by 10 CPS
Quit (q) - Quit gracefully (finish active calls)
Kill - Force stop immediately

Pro Tip: Use these buttons during a test to find the exact breaking point without running multiple tests. Watch the "Retrans" and "Timeout" columns - when these start increasing rapidly, you've reached capacity!
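
Because the controls only move the rate in steps of 1 or 10 CPS, it can help to work out the number of presses before the test starts. The sketch below assumes whole-number rates and the step sizes listed above; the function name is illustrative.

```python
def key_presses(current_cps: int, target_cps: int) -> dict:
    """Count presses of the 10-CPS and 1-CPS controls needed to move between two rates."""
    delta = target_cps - current_cps
    tens, ones = divmod(abs(delta), 10)
    if delta >= 0:
        return {"*": tens, "+": ones}  # increase: +10 and +1 buttons
    return {"/": tens, "-": ones}      # decrease: -10 and -1 buttons


print(key_presses(10, 25))   # going up by 15 CPS
print(key_presses(100, 10))  # going down by 90 CPS
```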

1. Progressive Load Testing

Step 1: Baseline Test

Start with a low call rate to establish baseline:

Total Calls: 100
CPS: 1
Duration: ~100 seconds

Expected Result: 100% success rate

Step 2: Incremental Increase

Gradually increase load:

Test 1: CPS 1   → Monitor
Test 2: CPS 5   → Monitor
Test 3: CPS 10  → Monitor
Test 4: CPS 25  → Monitor
Test 5: CPS 50  → Monitor

Step 3: Find the Breaking Point

Continue increasing until you see:

Success rate drops below 95%
Response times increase significantly
System errors or timeouts
Resource exhaustion (CPU, memory, network)

Interpreting Test Results

📊 Important Note: All metrics, percentages, and thresholds mentioned in this tutorial (such as "95% success rate" or "5% retransmission limit") are general guidelines and examples, not absolute rules. Your acceptable thresholds may vary based on your specific environment, use case, and business requirements. Use these numbers as starting points and adjust based on your system's characteristics and testing objectives.

Success Criteria

✅ PASS - Healthy System:

Successful call:  1970 / 2000 = 98.5%
Failed call:      30
Retrans:          < 50 total
Timeout:          0
Response Time:    < 200ms

⚠️ WARNING - Needs Investigation:

Successful call:  1820 / 2000 = 91%
Failed call:      180
Retrans:          50-100
Timeout:          1-5
Response Time:    200-500ms

❌ FAIL - System Overloaded:

Successful call:  1560 / 2000 = 78%
Failed call:      440
Retrans:          190  ← Very high!
Timeout:          14   ← Critical!
Response Time:    > 2000ms
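
The three bands above can be expressed as a small classifier. This is a sketch under stated assumptions: the retransmission, timeout, and response-time boundaries come from the bands above, the 95% pass threshold comes from the general guideline, and the 90% floor for WARNING is an assumption, since the text gives only 91% and 78% as examples. The absolute retransmission counts are sized for a 2000-call test, so scale them for other test sizes.

```python
def classify(success_pct: float, retrans: int, timeouts: int, response_ms: float) -> str:
    """Classify one test run as PASS, WARNING, or FAIL using the example thresholds."""
    # Any single metric beyond the WARNING band makes the run a FAIL.
    if success_pct < 90 or retrans > 100 or timeouts > 5 or response_ms > 500:
        return "FAIL"
    # Every metric must be inside the healthy band for a PASS.
    if success_pct >= 95 and retrans < 50 and timeouts == 0 and response_ms < 200:
        return "PASS"
    return "WARNING"


print(classify(98.5, 30, 0, 150))
print(classify(91.0, 75, 3, 350))
print(classify(78.0, 190, 14, 2134))
```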

Common Bottlenecks and Solutions

High Retransmissions

Symptom: Retrans column shows high numbers

Causes:

Network packet loss
Target system slow to respond
Firewall dropping packets

Solutions:

Check network connectivity (ping, packet capture)
Reduce CPS
Verify firewall rules
Try TCP instead of UDP

Timeouts

Symptom: Timeout column > 0

Causes:

Target system completely unresponsive
Network completely blocked
Wrong IP/port configuration

Solutions:

Verify target system is running
Check IP addresses and ports
Review target system logs

2. High Volume Testing Techniques

Using Call Duration to Control Concurrency

# Example: Testing 1000 concurrent calls

# Option 1: High CPS, short duration
CPS: 100
Call Duration: 10 seconds
Concurrent: ~1000 calls
Risk: High stress on call setup

# Option 2: Medium CPS, medium duration
CPS: 50
Call Duration: 20 seconds
Concurrent: ~1000 calls
Risk: Balanced

# Option 3: Low CPS, long duration
CPS: 17
Call Duration: 60 seconds
Concurrent: ~1000 calls
Risk: Tests endurance, not burst capacity
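
The three options above all come from rearranging the concurrency formula: CPS = target concurrent calls / call duration, rounded up. A minimal sketch, with an illustrative function name:

```python
import math


def required_cps(target_concurrent: int, call_duration_s: float) -> int:
    """Smallest whole-number CPS that sustains the target concurrency."""
    if call_duration_s <= 0:
        raise ValueError("call_duration_s must be positive")
    return math.ceil(target_concurrent / call_duration_s)


# The three options for 1000 concurrent calls.
for duration_s in (10, 20, 60):
    print(f"{duration_s:>2} s calls -> {required_cps(1000, duration_s)} CPS")
```

Rounding up is why Option 3 uses 17 CPS: 1000 / 60 is about 16.7, and 17 CPS yields roughly 1020 concurrent calls.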

Monitoring During High-Load Tests

What to watch in CheckOutput:

First 30 seconds: Watch for immediate failures
- Check "Running" count increases properly
- Verify no timeouts in first batch
After 1 minute: Check statistics screen [2]
- Success rate should be >95%
- Current Calls should match expected concurrent
After 5 minutes: Look for degradation
- Response time increasing?
- Retransmissions creeping up?
- Peak concurrent calls steady or declining?

3. Stress Testing Patterns

Ramp-Up Test

Minutes 0-2:   CPS 10  (warm-up, establish baseline)
Minutes 2-4:   CPS 25  (press [*] once, [+] five times)
Minutes 4-6:   CPS 50  (press [*] twice, [+] five times)
Minutes 6-8:   CPS 75  (press [*] twice, [+] five times)
Minutes 8-10:  CPS 100 (press [*] twice, [+] five times; watch for failures)

Spike Test

Minutes 0-3:   CPS 10  (baseline)
Minutes 3-5:   CPS 100 (press [*] 9 times rapidly - spike!)
Minutes 5-8:   CPS 10  (press [/] 9 times - recovery)
Check: Did system recover? Are there lingering issues?
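
Before running a stepped profile such as the ramp-up or spike test, it is worth estimating how many calls it will generate, so that the total-calls setting does not end the test early. The sketch below assumes each step holds its rate for the full interval; the profile lists simply restate the two schedules above.

```python
def total_calls(profile) -> int:
    """Sum the calls attempted over a list of (duration_minutes, cps) steps."""
    return sum(minutes * 60 * cps for minutes, cps in profile)


RAMP_UP = [(2, 10), (2, 25), (2, 50), (2, 75), (2, 100)]
SPIKE = [(3, 10), (2, 100), (3, 10)]

print("ramp-up:", total_calls(RAMP_UP))
print("spike:", total_calls(SPIKE))
```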

Endurance Test

# Run at 80% of max capacity
If max = 50 CPS, run at 40 CPS for 4-8 hours

Monitor every 30 minutes:
- Success rate still >95%?
- Response time stable?
- Retransmissions not increasing?
- Memory leaks? (check target system)
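
The 30-minute checks can be recorded and compared against the start of the run to spot gradual degradation. In the sketch below, the 95% success threshold follows the checklist above, while the 1.5x response-time growth limit and all sample values are assumptions chosen for illustration.

```python
def find_degradation(samples, min_success_pct=95.0, max_rt_growth=1.5):
    """Flag samples with low success rate or response time well above the first sample.

    samples: list of (minutes_elapsed, success_pct, response_ms), oldest first.
    Returns a list of (minutes_elapsed, reason) tuples.
    """
    issues = []
    if not samples:
        return issues
    baseline_ms = samples[0][2]
    for minutes, success_pct, response_ms in samples:
        if success_pct < min_success_pct:
            issues.append((minutes, "success rate below threshold"))
        if response_ms > baseline_ms * max_rt_growth:
            issues.append((minutes, "response time drifting up"))
    return issues


# Hypothetical readings taken every 30 minutes.
samples = [(0, 99.0, 120), (30, 98.8, 125), (60, 98.5, 140), (90, 97.0, 200), (120, 94.0, 260)]
for minutes, reason in find_degradation(samples):
    print(f"{minutes} min: {reason}")
```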

4. Distributed Load Testing

Multi-Machine Setup

When a single machine can't generate enough load:

Deploy multiple easySIPp instances
- Machine 1: UAC 1 (CPS 150)
- Machine 2: UAC 2 (CPS 150)
- Machine 3: UAC 3 (CPS 150)
- Total: 450 CPS
Coordinate test execution
- Start all UACs simultaneously
- Use synchronized clocks
- Aggregate results manually
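
Manual aggregation mostly means summing the counters from each machine and recomputing the success rate from the totals, rather than averaging per-machine percentages. A minimal sketch follows; the dictionary keys and the per-machine figures are hypothetical.

```python
def aggregate(results) -> dict:
    """Combine per-machine counters into overall totals and success rate."""
    attempted = sum(r["attempted"] for r in results)
    successful = sum(r["successful"] for r in results)
    return {
        "total_cps": sum(r["cps"] for r in results),
        "attempted": attempted,
        "successful": successful,
        "success_pct": 100.0 * successful / attempted if attempted else 0.0,
    }


# Made-up counters from three load generators running at 150 CPS each.
machines = [
    {"cps": 150, "attempted": 9000, "successful": 8900},
    {"cps": 150, "attempted": 9000, "successful": 8950},
    {"cps": 150, "attempted": 9000, "successful": 8800},
]
summary = aggregate(machines)
print(summary["total_cps"], f"{summary['success_pct']:.1f}%")
```

Recomputing from totals matters when machines attempt different numbers of calls, because a plain average of percentages would weight them equally.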

Managing System Resources

CPU Optimization

Minimize logging on your SIP server during high-load tests (logging management is currently not supported in easySIPp)
Close unnecessary applications

easySIPp Memory and CPU Monitoring

# Monitor CPU and memory usage during tests
docker stats easysipp

Network Tuning

# Increase network buffer sizes (Linux)
sudo sysctl -w net.core.rmem_max=26214400
sudo sysctl -w net.core.wmem_max=26214400

# Increase file descriptors
ulimit -n 65536

Best Practices

Start small: Always begin with low rates
Monitor everything: Both load generator and target
Test incrementally: Don't jump from 10 to 1000 CPS
Document results: Keep records of all tests
Replicate production: Match real-world scenarios
Plan for failure: Know your rollback strategy
Test regularly: Capacity changes over time

Troubleshooting Checklist

When Tests Fail Immediately

Check IP addresses and ports (most common issue!)
Verify target system is running
Test connectivity: ping <target-ip>
Check firewall rules
Capture a Wireshark pcap trace and analyze it
Verify that the XML scenario matches what the target expects

When Performance Degrades

Check "Retrans" column in CheckOutput - increasing?
Check "Timeout" column - any appearing?
Monitor target system CPU/memory
Check network utilization
Review target system logs for errors

When Actual CPS < Target CPS

Look for timeouts (system waiting for responses)
Check if "Failed call" is increasing
Reduce target CPS to achievable level
Fix underlying issues (timeouts, retrans) first
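
To quantify the gap, compare the achieved rate (total calls divided by elapsed time, which is what the cumulative Call Rate counter reports) with the configured target. The sketch below uses the figures from the example screens earlier in this tutorial; the function names are illustrative.

```python
def achieved_cps(total_calls: int, elapsed_s: float) -> float:
    """Average call rate over the whole test."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return total_calls / elapsed_s


def shortfall_pct(target_cps: float, actual_cps: float) -> float:
    """How far the achieved rate falls below the target, as a percentage of the target."""
    if target_cps <= 0:
        raise ValueError("target_cps must be positive")
    return 100.0 * (target_cps - actual_cps) / target_cps


# Example screens: 138 calls in 72.635 s against a configured rate of 5.0 CPS.
actual = achieved_cps(138, 72.635)
print(f"achieved {actual:.3f} CPS, {shortfall_pct(5.0, actual):.0f}% below target")
```

Note that in the example screens the call limit of 15 concurrent calls also caps the achieved rate, so check the limit setting before attributing a shortfall to the target system.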

Summary

You now know how to:

✅ Read and interpret the SIPp statistics screen
✅ Identify warning signs (retrans, timeouts, failures)
✅ Use real-time controls to adjust load
✅ Apply various VoIP/SIP load testing techniques
✅ Diagnose common performance issues