When a telephony call feature is failing, there are several possible categories of causes:
- physical (e.g. unplugged cable)
- misconfiguration (e.g. feature not enabled)
- software defect (e.g. mishandled scenario)
- protocol exchange (e.g. interpreting a field value differently)
- network / transport (e.g. firewall)
The early stages of troubleshooting should try to rule out entire categories at once (or find the cause along the way, whichever happens first).
First, and usually easiest, make sure it is not a network connectivity problem. Ping is your friend. If you can’t ping the equipment involved in the call, you may have a physical problem. Are the cables plugged in? Is everything powered on?
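If there are more than a couple of devices in the call path, a small script can ping them all in one go. This is a minimal sketch that assumes a Linux host with iputils `ping` (`-c` for the packet count, `-W` for the timeout in seconds); Windows `ping` uses different flags. The function names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send a single ICMP echo with the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # iputils ping exits with status 0 only when a reply came back.
    return result.returncode == 0


def unreachable(hosts: Iterable[str],
                probe: Callable[[str], bool] = ping_once) -> List[str]:
    """Return every host that did not answer; an empty list means all replied."""
    return [host for host in hosts if not probe(host)]
```

Usage would be something like `unreachable(["192.0.2.10", "192.0.2.20", "192.0.2.1"])` with the addresses of your own phones, gateways, and call server (192.0.2.0/24 is a documentation range). The `probe` parameter makes it easy to swap in a different reachability check.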
Does any voice work between the two endpoints? If you’re running automated tests, is your call generator reporting failures for every test in the suite or just for a particular feature? Quick A->B calls between all involved parties prove a lot. Do other features work, especially similar features and those that exercise the same lines? If you can’t make a simple two-way voice call between each of the endpoint pairings involved, there’s a more fundamental problem; troubleshoot that simpler problem first before tackling the more complex feature. If basic calls work but advanced features don’t, make sure the failing features are enabled on the switch and activated for the lines in question.
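To make sure the quick A->B calls really cover every pairing, it can help to enumerate them and then look for an endpoint that shows up in every failure. This is a hypothetical helper: the endpoint names and the shape of `results` (a dict from a caller/callee pair to pass/fail) are assumptions, not the output format of any particular call generator.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def basic_call_plan(endpoints: Sequence[str]) -> List[Pair]:
    """Every ordered A->B pairing, so each endpoint both places and receives a call."""
    return list(permutations(endpoints, 2))


def summarize(results: Dict[Pair, bool]) -> Tuple[List[Pair], List[str]]:
    """results maps (caller, callee) to True when a basic two-way call worked."""
    failed = [pair for pair, ok in results.items() if not ok]
    if not failed:
        return [], []
    involvement = Counter(endpoint for pair in failed for endpoint in pair)
    # An endpoint that appears in every failed call is the first place to look.
    suspects = [ep for ep, count in involvement.items() if count == len(failed)]
    return failed, suspects
```

Ordered pairs are used because a call that works from A to B can still fail from B to A.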
When you have eliminated network and physical problems as the cause, check the configuration. Does the feature work on any lines? Do logs indicate errors like “permission denied” or “feature unavailable”? Do you get error tones or announcement messages? If the feature works on other lines, compare the config of the working lines to the non-working lines.
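Comparing a working line against a failing one is easiest once both configurations are in a comparable form. The sketch below assumes each line’s settings have already been exported and parsed into a flat dictionary; the setting names in the test are hypothetical, since every switch exports its configuration differently.

```python
from typing import Any, Dict, Tuple

MISSING = "<not set>"  # marks a setting that exists on only one of the two lines


def config_diff(working: Dict[str, Any],
                failing: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (working_value, failing_value)} for every setting that differs."""
    diffs = {}
    for key in sorted(set(working) | set(failing)):
        good = working.get(key, MISSING)
        bad = failing.get(key, MISSING)
        if good != bad:
            diffs[key] = (good, bad)
    return diffs
```

Settings that are present on only one line are reported too, since a feature that was never provisioned is a common culprit.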
Once you’re sure that neither of those categories is the culprit, it’s time to dig deeper. If you have access to the entire transport network, a network traffic capture can be very useful. (Even if you only have visibility into part of the network, captures can still help, but they limit what you can uncover.) If at all feasible, capture all traffic, not just the protocol you expect to troubleshoot (e.g. SIP or MGCP). I’ve seen failures caused by a misconfigured router that was sending ICMP redirects all over the place, and that won’t show up in a capture filtered down to the call signaling protocol.
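If the capture is taken with tcpdump on a Linux host, the important part is simply not passing a filter expression. This sketch only builds the argument list; the interface and file names are examples, and capturing normally requires root or equivalent capabilities.

```python
from typing import List, Optional


def capture_command(interface: str, outfile: str,
                    bpf_filter: Optional[str] = None) -> List[str]:
    """Build a tcpdump argument list that writes whole packets to a pcap file."""
    cmd = [
        "tcpdump",
        "-i", interface,  # "any" captures on all interfaces on Linux
        "-s", "0",        # don't truncate packets (use the default maximum snaplen)
        "-w", outfile,    # write raw packets for later analysis in Wireshark
    ]
    # Leaving the filter off is deliberate: ARP, ICMP, DHCP and so on stay in the capture.
    if bpf_filter:
        cmd.append(bpf_filter)
    return cmd
```

Running it would look like `subprocess.run(capture_command("any", "call.pcap"))`, stopped with Ctrl-C once the failing call has been reproduced.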
Armed with a traffic capture, start looking for the unexpected. ARP storms, floods of ICMP redirects, and rapid SIP re-registrations are all symptoms I’ve seen that pointed to the real problem. (These were mostly misconfigurations that weren’t caught in the previous troubleshooting step.)
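Once a capture has been decoded (for example exported from Wireshark or parsed with a pcap library), the symptoms above can be scanned for mechanically. This sketch works on a simplified, hypothetical `Frame` record rather than raw pcap data, and its thresholds are arbitrary starting points to tune for your network, not standard values.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional

ICMP_REDIRECT = 5  # ICMP type 5 is Redirect (RFC 792)


@dataclass
class Frame:
    """One decoded packet; this record format is an assumption for the sketch."""
    ts: float                         # seconds since the start of the capture
    proto: str                        # "arp", "icmp", "sip", ...
    src: str                          # source address
    icmp_type: Optional[int] = None
    sip_method: Optional[str] = None


def find_anomalies(frames: Iterable[Frame], arp_per_sec: int = 50,
                   registers_per_min: int = 6) -> List[str]:
    """Flag ICMP redirects, ARP bursts, and endpoints re-registering too often."""
    redirects = 0
    arp_by_second = Counter()
    registers = Counter()
    for f in frames:
        if f.proto == "icmp" and f.icmp_type == ICMP_REDIRECT:
            redirects += 1
        elif f.proto == "arp":
            arp_by_second[int(f.ts)] += 1
        elif f.proto == "sip" and f.sip_method == "REGISTER":
            registers[(f.src, int(f.ts // 60))] += 1

    findings = []
    if redirects:
        findings.append(f"{redirects} ICMP redirects")
    peak = max(arp_by_second.values(), default=0)
    if peak > arp_per_sec:
        findings.append(f"ARP peak of {peak} packets/s")
    # The per-minute threshold leaves room for a REGISTER re-sent after a 401 challenge.
    for (src, minute), count in sorted(registers.items()):
        if count > registers_per_min:
            findings.append(f"{src} sent {count} REGISTERs in minute {minute}")
    return findings
```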
In the absence of anything unusual, look at the traffic that makes up the call trace. Filtering is useful here, as are tools that generate a graphical view of the call flow. First look for error packets. If present, these usually carry a specific error code or even a text explanation of why the call is failing. If there are no error packets, check that the call is being set up as expected. Is audio fully enabled, or is an endpoint simply idled? Check for a voice path: is RTP flowing in both directions as expected? Does the RTP payload (codec) match what is expected, and do both directions match each other? (If the media isn’t RTP, check whatever equivalent protocol is in your trace.)
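The checks above can also be written down as code. This is a sketch over hypothetical, already-decoded `SipMessage` and `RtpStream` records, not a real library’s types. Not every 4xx is fatal: a 401 or 407 is normally an authentication challenge followed by a retried request. The direction check reads the SDP `a=sendrecv`/`sendonly`/`recvonly`/`inactive` attribute and treats a missing attribute as `sendrecv`, the RFC 4566 default for ordinary sessions; it keeps the last value found, which is only right for single-stream SDP. The payload comparison looks at payload type numbers only, so dynamic types that map to the same codec under different numbers would be flagged.

```python
from dataclasses import dataclass
from typing import List, Optional

DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


@dataclass
class SipMessage:
    src: str
    dst: str
    status: Optional[int] = None  # None for requests such as INVITE or BYE
    reason: str = ""


@dataclass
class RtpStream:
    src: str
    dst: str
    payload_type: int
    packets: int


def sip_errors(messages: List[SipMessage]) -> List[str]:
    """Every 4xx/5xx/6xx response, with its code and reason phrase."""
    return [f"{m.status} {m.reason} ({m.src} -> {m.dst})"
            for m in messages if m.status is not None and m.status >= 400]


def media_direction(sdp: str) -> str:
    """Direction attribute of a single-stream SDP body; sendrecv if absent."""
    direction = "sendrecv"
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=") and line[2:] in DIRECTIONS:
            direction = line[2:]  # a later (media-level) value overrides
    return direction


def voice_path_problems(streams: List[RtpStream], a: str, b: str) -> List[str]:
    """Check that RTP flows both ways between a and b with matching payload types."""
    forward = {s.payload_type for s in streams if (s.src, s.dst) == (a, b) and s.packets}
    reverse = {s.payload_type for s in streams if (s.src, s.dst) == (b, a) and s.packets}
    problems = []
    if not forward:
        problems.append(f"no RTP {a} -> {b}")
    if not reverse:
        problems.append(f"no RTP {b} -> {a}")
    if forward and reverse and forward != reverse:
        problems.append(f"payload types differ: {sorted(forward)} vs {sorted(reverse)}")
    return problems
```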
When signaling and RTP look good, it is often useful to extract the audio streams. In Wireshark, you can choose to extract either direction or both directions. I usually start with both directions and then listen on my PC’s speakers. Problems with the audio here will point you to one end or the other, and often I have to backtrack to examining configs or traffic for that end of the connection.
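If you ever need to script the extraction instead of using Wireshark, G.711 μ-law (PCMU, static payload type 0 in RFC 3551) is simple to decode by hand. This sketch assumes you already have the RTP packets for one direction as raw bytes, in sequence order, with no loss or jitter handling; it strips the RTP header, expands each μ-law byte to a 16-bit sample, and writes an 8 kHz mono WAV file with Python’s standard `wave` module. The function names are hypothetical.

```python
import struct
import wave
from typing import BinaryIO, Iterable, Tuple, Union

PCMU = 0  # static RTP payload type for G.711 mu-law (RFC 3551)


def rtp_payload(packet: bytes) -> Tuple[int, bytes]:
    """Return (payload_type, payload) from one RTP packet (RFC 3550 header)."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    has_padding = packet[0] & 0x20
    has_extension = packet[0] & 0x10
    csrc_count = packet[0] & 0x0F
    payload_type = packet[1] & 0x7F
    offset = 12 + 4 * csrc_count
    if has_extension:
        # Extension header: 16-bit profile field, then its length in 32-bit words.
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    # With padding, the last byte says how many padding bytes to drop.
    end = len(packet) - (packet[-1] if has_padding else 0)
    return payload_type, packet[offset:end]


def ulaw_to_linear(code: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def write_pcmu_wav(packets: Iterable[bytes], dest: Union[str, BinaryIO]) -> int:
    """Decode PCMU packets into an 8 kHz mono WAV; returns the sample count."""
    pcm = bytearray()
    for packet in packets:
        payload_type, payload = rtp_payload(packet)
        if payload_type != PCMU:
            continue  # skip other codecs and events such as RFC 4733 telephone-event
        for code in payload:
            pcm += struct.pack("=h", ulaw_to_linear(code))  # native order, as wave expects
    with wave.open(dest, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)      # 16-bit samples
        out.setframerate(8000)   # G.711 sampling rate
        out.writeframes(bytes(pcm))
    return len(pcm) // 2
```

Writing one file per direction mirrors the “either direction” option; mixing both directions into one file would need the two streams aligned by RTP timestamp, which this sketch doesn’t attempt.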
At this point I’ve usually found the cause of the problem. If not, it’s time to ask for help. I’ve gotten help from coworkers, partner/vendor tech support, and occasionally partner/vendor developers. Another set of eyes looking at the problem is helpful. Sometimes just explaining the issue and symptoms to a new person will trigger an “Aha!” moment.
Do you have any troubleshooting techniques I’ve missed? Leave a comment. Thanks!