VoIP Network Architecture
A modern VoIP network replaces the PSTN’s dedicated switching equipment and TDM circuits with software-based call control and IP packet transport. This article covers the components of a carrier-grade VoIP network and how they fit together to deliver voice service.
Core Components
Softswitch
A softswitch is the software-based replacement for a hardware telephone switch. It handles call control — processing SIP signaling, making routing decisions based on NPA/NXX data, managing call state, and generating CDRs (Call Detail Records) for billing.
Softswitches are classified similarly to traditional switches:
- Class 5 softswitch: Serves end users. Provides dial tone, voicemail, caller ID, call forwarding, and other subscriber features. Examples: Metaswitch, BroadSoft (now Cisco BroadWorks), FreeSWITCH, Asterisk.
- Class 4 softswitch: Routes transit traffic between carriers. Optimized for high throughput and least cost routing. Examples: Odin (Alcatel-Lucent), Sonus/Ribbon, Netsapiens.
Many modern platforms combine Class 4 and Class 5 functionality in a single system.
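The least cost routing step of a Class 4 softswitch can be sketched as a longest-prefix match followed by a sort on price. This is a minimal illustration rather than any vendor's algorithm: the `Route` type, the carrier names, and the per-minute rates are all hypothetical, and a production rate deck would also weigh quality metrics and capacity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Route:
    carrier: str         # hypothetical carrier name
    prefix: str          # dialed-digit prefix the rate applies to
    rate_per_min: float  # hypothetical price per minute


def least_cost_routes(dialed: str, routes: List[Route]) -> List[Route]:
    """Pick each carrier's longest matching prefix, then order cheapest first."""
    best_per_carrier: Dict[str, Route] = {}
    for route in routes:
        if not dialed.startswith(route.prefix):
            continue
        current = best_per_carrier.get(route.carrier)
        # A more specific (longer) prefix overrides a carrier's general rate.
        if current is None or len(route.prefix) > len(current.prefix):
            best_per_carrier[route.carrier] = route
    return sorted(best_per_carrier.values(), key=lambda r: r.rate_per_min)


deck = [
    Route("CarrierA", "1", 0.005),
    Route("CarrierA", "1212", 0.010),
    Route("CarrierB", "1", 0.007),
    Route("CarrierC", "44", 0.020),
]
ranked = least_cost_routes("12125551234", deck)
```

The ranked list doubles as a failover order: if the cheapest carrier rejects the call, the softswitch can try the next entry.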
Session Border Controller (SBC)
The SBC sits at the boundary of every VoIP network. It is the gatekeeper — controlling what traffic enters and leaves the network. SBCs are typically deployed as appliances or virtual machines at every interconnection point.
SBC functions:
- Security: Protect against SIP-based attacks (toll fraud attempts, denial of service, malformed messages), rate limiting, access control
- NAT traversal: Fix SIP and RTP addressing for endpoints behind NAT (Network Address Translation) — one of the most persistent headaches in VoIP
- Topology hiding: Mask internal network IP addresses and architecture from external peers
- Protocol normalization: Reconcile SIP dialect differences between carriers (different implementations often have subtle incompatibilities in header formatting, codec negotiation, or DTMF handling)
- Media anchoring: Force RTP media through the SBC for quality monitoring, lawful intercept compliance, and recording
- Transcoding: Convert between codecs when two endpoints do not share a common codec (e.g., G.729 on one side, G.711 on the other)
- STIR/SHAKEN: Sign and verify caller ID authentication tokens
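As one illustration of the rate limiting mentioned under Security, the sketch below applies a token bucket per source IP address to incoming SIP requests. The class names and the rate and burst values are assumptions made for this example; real SBCs expose this as configuration rather than code.

```python
from typing import Dict


class TokenBucket:
    def __init__(self, rate: float, burst: int, now: float):
        self.rate = rate            # tokens (requests) replenished per second
        self.burst = burst          # maximum bucket size
        self.tokens = float(burst)  # start full so a short burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SipRateLimiter:
    """Hypothetical per-source limiter; one bucket per source IP address."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, src_ip: str, now: float) -> bool:
        if src_ip not in self.buckets:
            self.buckets[src_ip] = TokenBucket(self.rate, self.burst, now)
        return self.buckets[src_ip].allow(now)


limiter = SipRateLimiter(rate=1.0, burst=2)
decisions = [limiter.allow("192.0.2.10", now=0.0) for _ in range(3)]
```

Passing `now` explicitly keeps the sketch deterministic; a deployed limiter would read a monotonic clock and expire idle buckets.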
Media Gateway
A media gateway converts between TDM (PSTN) and IP (VoIP). It has TDM interfaces (T1/PRI, DS3) on one side and IP interfaces on the other. The gateway:
- Converts SS7 ISUP signaling to SIP (or vice versa)
- Transcodes audio between PCM (64 kbps TDM) and IP codecs
- Provides the physical interconnection between legacy PSTN equipment and the IP network
Media gateways are essential during the transition from SS7 to SIP — they allow VoIP carriers to reach PSTN-connected subscribers and vice versa.
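Part of the ISUP-to-SIP conversion is translating call failure reasons. The sketch below maps a handful of Q.850 cause values to SIP response codes, following the mapping in RFC 3398. The table is deliberately partial, and the fallback to 500 for unlisted causes is an assumption of this example, not a rule taken from the specification.

```python
# Partial Q.850 cause value -> SIP status code table (after RFC 3398).
Q850_TO_SIP = {
    1: 404,   # unallocated number        -> Not Found
    17: 486,  # user busy                 -> Busy Here
    18: 408,  # no user responding        -> Request Timeout
    19: 480,  # no answer from the user   -> Temporarily Unavailable
    21: 403,  # call rejected             -> Forbidden
    28: 484,  # address incomplete        -> Address Incomplete
    34: 503,  # no circuit available      -> Service Unavailable
}


def sip_status_for_cause(cause: int, default: int = 500) -> int:
    """Return the SIP status for an ISUP release cause.

    The default of 500 is a placeholder chosen for this sketch.
    """
    return Q850_TO_SIP.get(cause, default)
```

For example, a release with cause 17 from the TDM side would be reported to the SIP side as `486 Busy Here`.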
Media Flow: RTP
Voice audio in a VoIP network is carried by RTP (Real-time Transport Protocol). RTP packets contain:
- Codec-encoded audio samples (typically 20 ms of audio per packet)
- Sequence numbers (for detecting lost packets and reordering packets that arrive out of order)
- Timestamps (for proper playout timing)
- Payload type identifier (which codec is in use)
RTCP (RTP Control Protocol) runs alongside RTP and carries quality statistics — packet loss, jitter, round-trip time — enabling both endpoints to monitor call quality in real time.
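The fields listed above live in the 12-byte fixed RTP header defined by RFC 3550. A minimal parser might look like the following; the `RtpHeader` type and function name are illustrative, and the sketch ignores CSRC lists and header extensions.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=byte0 >> 6,          # top two bits; always 2 for RTP
        marker=bool(byte1 & 0x80),   # top bit of the second byte
        payload_type=byte1 & 0x7F,   # e.g. 0 = PCMU, 8 = PCMA
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# A synthetic PCMU packet: 12-byte header plus 160 bytes (20 ms) of audio.
sample = struct.pack("!BBHII", 0x80, 0, 1234, 160, 0xDEADBEEF) + b"\xff" * 160
header = parse_rtp_header(sample)
```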
Codec Selection
Codecs encode and decode audio. The choice of codec affects bandwidth, quality, and CPU usage:
| Codec | Bitrate (audio payload only) | Quality | Notes |
|---|---|---|---|
| G.711 (PCMU/PCMA) | 64 kbps | Toll quality | No compression. Standard for carrier interconnection. |
| G.729 | 8 kbps | Good | Compressed. Popular for WAN links and international trunks. Licensed codec. |
| G.722 | 48-64 kbps | HD voice (wideband) | Samples at 16 kHz vs. G.711’s 8 kHz. Noticeably better quality. |
| Opus | 6-510 kbps | Excellent | Modern, adaptive codec. Standard for WebRTC. |
Codec negotiation happens during SIP call setup via SDP — both endpoints advertise their supported codecs, and they agree on a common one.
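In simplified form, the answering side walks the offered codec list in the offerer's preference order and picks the first one it also supports. The function below is a sketch of that selection only; it works on codec names rather than real SDP payload-type lines, and the function name is made up for this example.

```python
from typing import List, Optional


def negotiate_codec(offered: List[str], supported: List[str]) -> Optional[str]:
    """Return the first offered codec the answerer supports, or None."""
    supported_names = {name.upper() for name in supported}
    for codec in offered:  # offer order expresses the offerer's preference
        if codec.upper() in supported_names:
            return codec
    return None


chosen = negotiate_codec(["opus", "G722", "PCMU"], ["PCMU", "G722"])
```

When nothing matches, the call is typically rejected with `488 Not Acceptable Here` unless an SBC in the path can transcode between the two sides.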
For carrier-to-carrier interconnection, G.711 is the standard. Because the PSTN natively uses G.711/PCM, it avoids transcoding and the quality loss that each transcoding step introduces. Compressed codecs like G.729 are used where bandwidth is constrained.
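When sizing a constrained link, note that codec bitrates describe the audio payload only; every packet also carries IP, UDP, and RTP headers. The sketch below estimates per-call, one-direction bandwidth at the IP layer, assuming IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes) headers with 20 ms packetization. Layer 2 framing and any header compression are left out.

```python
def ip_bandwidth_bps(codec_bps: int, ptime_ms: int = 20, header_bytes: int = 40) -> float:
    """Estimate IP-layer bandwidth for one direction of one call."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_bps / 8 / packets_per_second
    return (payload_bytes + header_bytes) * 8 * packets_per_second


g711 = ip_bandwidth_bps(64_000)  # 160-byte payload + 40-byte headers, 50 packets/s
g729 = ip_bandwidth_bps(8_000)   # 20-byte payload + 40-byte headers, 50 packets/s
```

Under these assumptions G.711 needs about 80 kbps and G.729 about 24 kbps, so the saving from compression is closer to 3x than the 8x the raw bitrates suggest.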
NAT Traversal
NAT is one of the most common sources of VoIP problems. When a SIP endpoint is behind a NAT router:
- The SIP Contact header and SDP connection address contain the endpoint’s private IP address (e.g., 192.168.1.50)
- The far end cannot send RTP packets to a private address — they are not routable on the public internet
Solutions:
- SBC media anchoring: The SBC rewrites SDP addresses and relays media, solving NAT transparently
- STUN/TURN/ICE: Protocols that help endpoints discover their public address and establish media paths through NAT (standard in WebRTC)
- Far-end NAT traversal: The SBC detects NAT and sends media to the observed source address rather than the SDP-advertised address
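The far-end NAT traversal decision can be sketched as follows: read the connection address from the SDP, and if it is an RFC 1918 private address that differs from where the packets actually came from, send media to the observed source instead. The function names are illustrative, only session-level IPv4 `c=` lines are handled, and other non-routable ranges (such as carrier-grade NAT space) are not checked.

```python
import ipaddress
from typing import Optional

RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def sdp_connection_address(sdp: str) -> Optional[str]:
    """Return the address from the first 'c=IN IP4 <addr>' line, if any."""
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            return line.split()[2]
    return None


def media_destination(sdp: str, observed_source_ip: str) -> str:
    """Choose where to send RTP: the SDP address, or the observed source if NATed."""
    advertised = sdp_connection_address(sdp)
    if advertised is None:
        return observed_source_ip
    address = ipaddress.ip_address(advertised)
    behind_nat = any(address in net for net in RFC1918_NETWORKS)
    if behind_nat and advertised != observed_source_ip:
        return observed_source_ip
    return advertised


natted_sdp = "v=0\r\nc=IN IP4 192.168.1.50\r\nm=audio 4000 RTP/AVP 0\r\n"
target = media_destination(natted_sdp, "8.8.4.4")
```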
Network Architecture Patterns
Carrier VoIP Network
A typical carrier VoIP network:
Customer CPE (IP Phone/PBX/ATA)
|
| (SIP trunk / broadband)
|
[Access SBC] --- security, NAT fix, admission control
|
[Class 5 Softswitch] --- subscriber features, local routing
|
[Class 4 Softswitch] --- LCR, transit routing, CDR generation
|
[Peering SBC] --- interconnection with other carriers
|
[Other carriers / PSTN via media gateway]
UCaaS / Cloud PBX
Cloud PBX platforms (RingCentral, Zoom Phone, Microsoft Teams) abstract this architecture into a service. The customer sees a simple web portal; behind it is a multi-tenant softswitch platform with SBCs, SIP trunking to carriers, and all the routing logic built in.
CPaaS / API-Driven Voice
Platforms like Twilio, Bandwidth, and Telnyx expose voice capabilities via APIs. Developers make API calls to place calls, and the platform handles SIP signaling, media, PSTN interconnection, and NPA/NXX routing internally.
Quality of Service
Voice quality in a VoIP network depends on controlling:
- Latency: Total one-way delay should be under 150 ms for good conversation quality (the ITU-T G.114 guideline). Codec processing, packetization, network transit, and the jitter buffer all contribute.
- Jitter: Variation in packet arrival times. Managed by jitter buffers at the receiving endpoint, which add a small delay to smooth out arrival times.
- Packet loss: Even 1% packet loss can noticeably degrade voice quality. 3%+ makes conversation difficult. Network congestion is the primary cause.
- Echo: Caused by acoustic feedback or impedance mismatches at hybrid points (2-wire to 4-wire conversion). Echo cancellers in SBCs and gateways mitigate this.
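Jitter is usually quantified with the interarrival jitter estimator from RFC 3550, which is also the value RTCP reports. The sketch below implements that running estimate; arrival times and RTP timestamps are assumed to be in the same units (RTP clock ticks), and the function name is chosen for this example.

```python
from typing import Optional, Tuple


def update_jitter(
    jitter: float,
    prev_transit: Optional[int],
    arrival_ts: int,
    rtp_ts: int,
) -> Tuple[float, int]:
    """Apply one RFC 3550 jitter update; returns (new_jitter, transit)."""
    transit = arrival_ts - rtp_ts
    if prev_transit is None:
        return jitter, transit  # first packet: nothing to compare against yet
    d = abs(transit - prev_transit)
    # Exponential smoothing with gain 1/16, as specified in RFC 3550.
    return jitter + (d - jitter) / 16.0, transit


jitter, transit = 0.0, None
# (arrival time, RTP timestamp) pairs; the third packet arrives 16 ticks late.
for arrival, ts in [(1000, 0), (1160, 160), (1336, 320)]:
    jitter, transit = update_jitter(jitter, transit, arrival, ts)
```

At an 8 kHz RTP clock, dividing the result by 8 converts it to milliseconds.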
Carrier networks use QoS (Quality of Service) mechanisms — DSCP marking, traffic prioritization, dedicated bandwidth — to ensure voice packets get priority over bulk data traffic.
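On the endpoint side, DSCP marking amounts to setting the IP TOS byte on the media socket. The sketch below marks a UDP socket with Expedited Forwarding (DSCP 46), the class conventionally used for voice. It assumes a platform that exposes `IP_TOS` (such as Linux), and the helper names are made up for this example. Marking only requests priority; routers honor it only where the network is configured to trust it.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper six bits of the TOS byte."""
    return dscp << 2


def open_voice_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))
    return sock
```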
Further Reading
- SIP: Session Initiation Protocol — the signaling protocol powering VoIP
- Trunking and Carrier Interconnection — SIP trunks and carrier peering
- The PSTN Explained — the legacy network that VoIP interconnects with
- How a Phone Call Gets Routed — end-to-end call routing across VoIP and PSTN