SIP: Session Initiation Protocol
SIP is the signaling protocol that underlies virtually all modern voice communications. If you make a VoIP call, use a cloud PBX, place a call through Microsoft Teams or Zoom, or send traffic over a SIP trunk to a carrier, SIP is the protocol negotiating the connection. Understanding SIP is essential for anyone who operates, troubleshoots, or builds on top of voice infrastructure.
What SIP Is (and Is Not)
SIP (Session Initiation Protocol), defined in RFC 3261, is a signaling protocol. It establishes, modifies, and terminates communication sessions. It does not carry the actual audio — that is handled by RTP (Real-time Transport Protocol). SIP sets up the call; RTP carries the voice.
This separation is fundamental. SIP messages might traverse a completely different network path than the audio packets. A SIP proxy can route signaling without ever touching the media stream. This architecture enables the flexibility (and complexity) of modern voice networks.
SIP is text-based (similar in structure to HTTP), uses request/response transactions, and can run over UDP, TCP, or TLS for encryption (the sips: URI scheme indicates that TLS is required).
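Because SIP is text-based, the first line of a message is enough to tell a request from a response. The following is a minimal sketch of that check; the function name `classify_start_line` and the `example.com` address are illustrative assumptions, and the sketch does no validation beyond splitting the line into three fields.

```python
def classify_start_line(line):
    """Classify the first line of a SIP message as a request or a response."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        raise ValueError(f"not a SIP start line: {line!r}")
    if parts[0].startswith("SIP/"):
        # Status line: SIP-Version, Status-Code, Reason-Phrase
        return {"kind": "response", "version": parts[0],
                "status": int(parts[1]), "reason": parts[2]}
    # Request line: Method, Request-URI, SIP-Version
    return {"kind": "request", "method": parts[0],
            "uri": parts[1], "version": parts[2]}


print(classify_start_line("INVITE sip:bob@example.com SIP/2.0"))
print(classify_start_line("SIP/2.0 180 Ringing"))
```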
SIP Architecture
A SIP network consists of several types of entities:
User Agents
A User Agent (UA) is any SIP endpoint — a phone, softphone, ATA, PBX, or carrier SBC. User agents come in two roles:
- UAC (User Agent Client): Initiates requests (e.g., sends an INVITE to start a call)
- UAS (User Agent Server): Receives and responds to requests
In practice, every SIP endpoint acts as both UAC and UAS depending on the direction of the transaction. Your desk phone is a UAC when you place a call and a UAS when you receive one.
Proxy Servers
A SIP Proxy receives SIP requests and forwards them toward the destination, making routing decisions along the way. Proxies can be:
- Stateless: Forward requests without tracking the transaction state. Fast but limited in functionality.
- Stateful: Track the full transaction (e.g., retransmit if no response arrives). Required for features like forking (ringing multiple endpoints simultaneously).
Most carrier SIP infrastructure uses stateful proxies.
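As a rough illustration of what a proxy does to a request before forwarding it, the sketch below decrements Max-Forwards, rejects the request with 483 Too Many Hops when the hop budget is exhausted, and prepends its own Via header with a fresh branch value. The function name, the header representation (a list of name/value pairs), and the proxy address are assumptions made for this example. Real proxies also handle Record-Route, loop detection, and target selection, none of which is shown.

```python
import secrets

MAGIC_COOKIE = "z9hG4bK"  # RFC 3261 prefix for branch parameters


def forward_request(headers, proxy_addr):
    """Return (status, headers). status is None when the request is forwarded."""
    out = []
    for name, value in headers:
        if name.lower() == "max-forwards":
            hops = int(value)
            if hops <= 0:
                return 483, headers  # Too Many Hops: do not forward
            value = str(hops - 1)
        out.append((name, value))
    # The proxy adds its own Via on top so responses route back through it.
    branch = MAGIC_COOKIE + secrets.token_hex(8)
    out.insert(0, ("Via", f"SIP/2.0/UDP {proxy_addr};branch={branch}"))
    return None, out


incoming = [
    ("Via", "SIP/2.0/UDP 198.51.100.10:5060;branch=z9hG4bK776asdhds"),
    ("Max-Forwards", "70"),
]
status, outgoing = forward_request(incoming, "203.0.113.5:5060")
for name, value in outgoing:
    print(f"{name}: {value}")
```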
Registrar Servers
A Registrar handles SIP REGISTER requests. When a phone boots up, it sends a REGISTER to tell the network “I am reachable at this IP address.” The registrar stores this binding in a location service database. When someone calls that user, a proxy queries the location service to find the current IP address.
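The binding a registrar stores can be pictured as a small table keyed by the user's address-of-record. The sketch below is one possible in-memory model, not how any particular registrar is implemented. The class name `LocationService`, its methods, and the extension 200 addresses are assumptions for illustration, and it keeps only one contact per user, whereas real registrars can hold several.

```python
import time


class LocationService:
    """Toy location service: address-of-record -> (contact URI, expiry time)."""

    def __init__(self):
        self._bindings = {}

    def register(self, aor, contact, expires, now=None):
        now = time.time() if now is None else now
        if expires == 0:
            self._bindings.pop(aor, None)  # Expires: 0 removes the binding
        else:
            self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now=None):
        now = time.time() if now is None else now
        entry = self._bindings.get(aor)
        if entry is None or entry[1] <= now:
            return None  # never registered, or the binding has expired
        return entry[0]


loc = LocationService()
loc.register("sip:200@pbx.example.com", "sip:200@192.168.1.50:5060", expires=3600)
print(loc.lookup("sip:200@pbx.example.com"))
```

A proxy handling an incoming call would call `lookup` with the dialed user and forward the INVITE to the returned contact, or reject the call if nothing is registered.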
Redirect Servers
A Redirect Server responds to requests with a 3xx response containing an alternative URI, telling the UAC to try a different destination directly. Less common in modern deployments but useful for load distribution.
Back-to-Back User Agents (B2BUAs)
A B2BUA sits in the call path and acts as a UAS on one side and a UAC on the other. It terminates the incoming SIP dialog and originates a new one toward the destination. This gives the B2BUA full control over the signaling — it can modify headers, topology-hide network details, enforce policy, and manage media.
Session Border Controllers (SBCs) are the most common B2BUAs in carrier networks. They sit at the boundary between networks and handle security, NAT traversal, protocol normalization, and interoperability.
The INVITE Transaction
The core of SIP is the call setup flow. Here is a basic successful call:
    Caller (UAC)             Proxy/SBC             Callee (UAS)
         |                       |                       |
         |--- INVITE ----------->|--- INVITE ----------->|
         |<-- 100 Trying --------|                       |
         |                       |<-- 100 Trying --------|
         |                       |<-- 180 Ringing -------|
         |<-- 180 Ringing -------|                       |
         |                       |<-- 200 OK ------------|
         |<-- 200 OK ------------|                       |
         |--- ACK -------------------------------------->|
         |                       |                       |
         |<================ RTP Media ==================>|
         |                       |                       |
         |--- BYE -------------------------------------->|
         |<------------------------------------- 200 OK -|
         |                       |                       |
Step by step:
- INVITE: The caller sends an INVITE request to initiate a call. The INVITE includes an SDP body describing the caller’s media capabilities (codecs, IP address, port).
- 100 Trying: Each hop sends a 100 Trying provisional response indicating it received the INVITE and is processing it. This stops retransmissions.
- 180 Ringing: The callee’s phone is ringing. The caller’s device plays ringback tone upon receiving this. Some networks also send 183 Session Progress with early media (actual ringback audio in the RTP stream).
- 200 OK: The callee answers. The 200 OK includes an SDP body with the callee’s chosen media parameters (codec, IP, port). This completes the SDP offer/answer exchange.
- ACK: The caller acknowledges the 200 OK. The three-way handshake (INVITE / 200 OK / ACK) is complete and the call is established.
- RTP Media: Audio flows directly between the endpoints (or through media relays if B2BUAs are in the path) using RTP. This is the actual voice conversation.
- BYE: Either party hangs up by sending a BYE request.
- 200 OK (to BYE): The other side acknowledges. The call is terminated and resources are released.
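The steps above can be sketched as a small state machine seen from the caller's side. The state names (`calling`, `ringing`, and so on) and the event labels are assumptions chosen for readability. They are not the transaction states defined in RFC 3261, and failure responses, CANCEL, and timers are left out.

```python
# (current state, event) -> next state, from the caller's point of view
TRANSITIONS = {
    ("idle", "send INVITE"): "calling",
    ("calling", "recv 100"): "proceeding",
    ("calling", "recv 180"): "ringing",
    ("proceeding", "recv 180"): "ringing",
    ("calling", "recv 200"): "answered",
    ("proceeding", "recv 200"): "answered",
    ("ringing", "recv 200"): "answered",
    ("answered", "send ACK"): "established",
    ("established", "send BYE"): "terminating",
    ("established", "recv BYE"): "terminated",
    ("terminating", "recv 200"): "terminated",
}


def run(events, state="idle"):
    """Apply a sequence of events and return the final call state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


flow = ["send INVITE", "recv 100", "recv 180", "recv 200",
        "send ACK", "send BYE", "recv 200"]
print(run(flow))
```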
SIP Message Anatomy
SIP messages are plain text, structured like HTTP. A SIP INVITE looks like this:
INVITE sip:+[email protected] SIP/2.0
Via: SIP/2.0/UDP 198.51.100.10:5060;branch=z9hG4bK776asdhds
Max-Forwards: 70
From: "Chicago Caller" <sip:+[email protected]>;tag=1928301774
To: <sip:+[email protected]>
Call-ID: [email protected]
CSeq: 314159 INVITE
Contact: <sip:+[email protected]:5060>
Content-Type: application/sdp
Content-Length: 142

v=0
o=caller 2890844526 2890844526 IN IP4 198.51.100.10
s=SIP Call
c=IN IP4 198.51.100.10
t=0 0
m=audio 49170 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=sendrecv
Key Headers
| Header | Purpose |
|---|---|
| Request-URI | The destination (sip:[email protected]) |
| Via | Records the path the request takes; used to route responses back. The branch parameter uniquely identifies the transaction. |
| From | The caller’s identity. The tag parameter helps identify the dialog. |
| To | The callee’s identity. Gets a tag added in responses. |
| Call-ID | Globally unique identifier for this call/dialog. |
| CSeq | Sequence number and method. Used to match requests with responses and order transactions within a dialog. |
| Contact | Direct reachability address for the sender. Used for subsequent in-dialog requests (re-INVITE, BYE). |
| Max-Forwards | Hop limit (like TTL). Prevents infinite loops. Decremented by each proxy. |
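To make the roles of these headers concrete, here is a simplified sketch that parses the header block of a message and pulls out the values that identify a dialog (Call-ID plus the From and To tags) and a transaction (the Via branch). The sample message, its `example.com` addresses, and the helper names are assumptions for illustration. The parser ignores header folding, compact header names, and repeated headers such as multiple Via lines.

```python
def parse_headers(message):
    """Parse the header block of a SIP message into a dict keyed by lowercase name."""
    head = message.replace("\r\n", "\n").split("\n\n", 1)[0]
    headers = {}
    for line in head.split("\n")[1:]:  # skip the request/status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def header_param(value, name):
    """Return a header parameter such as tag or branch, or None if absent."""
    tail = value.rsplit(">", 1)[-1]  # ignore parameters inside <...>
    for piece in tail.split(";")[1:]:
        key, _, val = piece.partition("=")
        if key.strip() == name:
            return val.strip()
    return None


SAMPLE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 198.51.100.10:5060;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    'From: "Alice" <sip:alice@example.com>;tag=1928301774\r\n'
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@198.51.100.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

h = parse_headers(SAMPLE)
dialog_id = (h["call-id"], header_param(h["from"], "tag"), header_param(h["to"], "tag"))
print("dialog:", dialog_id)  # the To tag is None until the callee responds
print("transaction branch:", header_param(h["via"], "branch"))
```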
SDP Body
The Session Description Protocol body describes media capabilities using the offer/answer model (RFC 3264):
- v=: Protocol version (always 0)
- o=: Originator/session identifier
- c=: Connection data — the IP address where the sender expects to receive RTP
- m=: Media line — media type (audio), port number, transport (RTP/AVP), and payload type numbers
- a=rtpmap: Maps payload type numbers to codec names (0 = PCMU/G.711 u-law, 8 = PCMA/G.711 A-law)
- a=fmtp: Format-specific parameters (e.g., DTMF event range for telephone-event)
The answerer selects from the offered codecs and responds with their own SDP. Both sides then send and receive RTP using the agreed-upon codec and addressing.
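The selection step can be sketched as follows. This example reads the payload types from the m= line of the offer shown earlier, maps them to codec names through the a=rtpmap lines, and picks the first offered codec that the answerer supports. Honoring the offerer's order is one common policy, assumed here; the function names and the `supported` set are illustrative, and only a single audio stream is handled.

```python
OFFER = "\r\n".join([
    "v=0",
    "o=caller 2890844526 2890844526 IN IP4 198.51.100.10",
    "s=SIP Call",
    "c=IN IP4 198.51.100.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
    "a=fmtp:101 0-16",
    "a=sendrecv",
]) + "\r\n"


def offered_codecs(sdp):
    """Return [(payload_type, codec_name)] in the offerer's preference order."""
    payload_types, names = [], {}
    for line in sdp.splitlines():
        if line.startswith("m=audio"):
            payload_types = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, codec = line[len("a=rtpmap:"):].split(" ", 1)
            names[int(pt)] = codec
    return [(pt, names.get(pt, "")) for pt in payload_types]


def choose_codec(sdp, supported):
    """Pick the first offered codec that the answerer also supports."""
    for pt, codec in offered_codecs(sdp):
        if codec in supported:
            return pt, codec
    return None  # no codec in common; a real answerer would reject the offer


print(choose_codec(OFFER, {"PCMA/8000", "telephone-event/8000"}))
```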
Common Codecs
| Codec | Bandwidth | Quality | Use Case |
|---|---|---|---|
| G.711 (PCMU/PCMA) | 64 kbps | Toll quality | LAN, high-bandwidth trunks |
| G.729 | 8 kbps | Good | WAN, bandwidth-constrained links |
| G.722 | 48-64 kbps | HD voice (wideband) | HD-capable endpoints |
| Opus | 6-510 kbps (variable) | Excellent | WebRTC, modern VoIP platforms |
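The bandwidth figures in the table are codec bitrates only. On the wire, each RTP packet also carries IP, UDP, and RTP headers, so the per-call load is higher. The sketch below estimates that load under stated assumptions: 20 ms of audio per packet and 40 bytes of headers (20 IPv4 + 8 UDP + 12 RTP), with no layer-2 framing counted. The function name and defaults are illustrative.

```python
def ip_bandwidth_kbps(codec_kbps, packet_ms=20, header_bytes=40):
    """Estimate one-way IP bandwidth for a single call, in kbps."""
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


for name, rate in [("G.711", 64), ("G.729", 8)]:
    print(f"{name}: {ip_bandwidth_kbps(rate):.1f} kbps per direction")
```

Under these assumptions, header overhead adds 16 kbps to every stream regardless of codec, which is why low-bitrate codecs save less bandwidth in practice than their nominal rates suggest.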
Registration
Before a SIP endpoint can receive calls, it must register its current location:
REGISTER sip:pbx.example.com SIP/2.0
From: <sip:[email protected]>;tag=a73kszlfl
To: <sip:[email protected]>
Contact: <sip:[email protected]:5060>
Expires: 3600
Authorization: Digest username="ext200", realm="pbx.example.com",
  nonce="...", response="..."
The endpoint tells the registrar: “I am extension 200, and I am currently reachable at 192.168.1.50.” The registrar stores this binding for the requested duration (Expires header, typically 1-24 hours). The endpoint re-registers periodically to keep the binding alive.
Authentication uses HTTP Digest authentication (challenge/response with a shared secret) to prevent unauthorized registrations.
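The `response` value in the Authorization header is a hash that proves knowledge of the password without sending it. The sketch below computes it with the MD5-based scheme from HTTP Digest (RFC 2617), covering both the basic form and the `qop=auth` form. The password, nonce, and function name are made up for illustration, and a real client would take the realm, nonce, and qop from the server's 401 challenge.

```python
import hashlib


def _md5(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the Digest 'response' value for a SIP request."""
    ha1 = _md5(f"{username}:{realm}:{password}")  # identity and shared secret
    ha2 = _md5(f"{method}:{uri}")                 # what is being requested
    if qop is None:
        return _md5(f"{ha1}:{nonce}:{ha2}")
    # With qop, the nonce count and client nonce are mixed in as well.
    return _md5(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Hypothetical credentials and nonce for the REGISTER shown above.
print(digest_response("ext200", "pbx.example.com", "s3cret",
                      "REGISTER", "sip:pbx.example.com", "abc123"))
```

The registrar performs the same computation with its stored copy of the secret and accepts the registration only if the two values match.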
Mid-Call Operations
SIP supports modifying calls in progress:
- Re-INVITE: Changes session parameters (e.g., codec renegotiation, putting a call on hold by setting the SDP to a=sendonly or a=inactive)
- UPDATE: Similar to re-INVITE but can be sent before the call is established
- REFER: Transfers a call to another party. The REFER tells the other endpoint to send an INVITE to the transfer target. Used for both blind and attended transfers.
- INFO: Carries application-level information within a dialog (sometimes used for DTMF)
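As an illustration of the hold case, the sketch below rewrites an SDP body for a re-INVITE by replacing the direction attribute and incrementing the session version in the o= line, which signals that the description has changed. The function name is an assumption, and the code assumes an SDP with a single audio stream; building and sending the re-INVITE itself is not shown.

```python
DIRECTIONS = {"a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"}


def set_direction(sdp, direction):
    """Return a copy of the SDP with a new direction and a bumped o= version."""
    attr = f"a={direction}"
    if attr not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    out = []
    for line in sdp.splitlines():
        if line in DIRECTIONS:
            continue  # drop the old direction attribute
        if line.startswith("o="):
            fields = line.split(" ")
            fields[2] = str(int(fields[2]) + 1)  # session version
            line = " ".join(fields)
        out.append(line)
    out.append(attr)  # media-level attribute for the single m= section
    return "\r\n".join(out) + "\r\n"


ACTIVE = (
    "v=0\r\n"
    "o=caller 2890844526 2890844526 IN IP4 198.51.100.10\r\n"
    "s=SIP Call\r\n"
    "c=IN IP4 198.51.100.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=sendrecv\r\n"
)
print(set_direction(ACTIVE, "sendonly"))
```

Resuming the call would be the same operation with `sendrecv`.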
SIP in Carrier Networks
In the context of telecom routing, SIP is used for:
SIP Trunking
A SIP trunk replaces traditional PRI (T1) connections between a PBX and the carrier network. Instead of 23 physical voice channels on a copper circuit, SIP trunking delivers voice as packets over an IP connection. Multiple simultaneous calls share the same IP link, with capacity limited by bandwidth rather than channel count.
SIP trunks from carriers use E.164 phone numbers in the SIP URIs (sip:[email protected]), and routing decisions are based on NPA/NXX data just as they are in the PSTN.
Session Border Controllers
SBCs are typically deployed at the boundaries between carrier networks. They handle:
- Security: Protect against SIP-based attacks (toll fraud, denial of service)
- NAT traversal: Fix addressing issues when endpoints are behind NAT
- Topology hiding: Mask internal network structure from external peers
- Protocol normalization: Reconcile SIP dialect differences between carriers
- Media anchoring: Force RTP through the SBC for quality monitoring and lawful intercept
Phone Number Mapping
SIP uses URIs for addressing, but the PSTN uses E.164 phone numbers. The mapping is straightforward: the phone number is embedded in the SIP URI as sip:+1NPANXXXXXX@domain. The tel: URI scheme (tel:+13125551234) is also used, particularly in the Diversion and P-Asserted-Identity headers.
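A short sketch of that mapping for North American numbers follows. It normalizes a dialed number to E.164, builds the SIP and tel: URIs, and splits out the NPA and NXX used for routing lookups. The helper names and the `carrier.example.com` domain are assumptions for illustration, and only 10- or 11-digit NANP numbers are handled.

```python
import re


def normalize_nanp(number):
    """Normalize a North American number to E.164, e.g. +13125551234."""
    digits = re.sub(r"\D", "", number)
    if len(digits) == 10:
        digits = "1" + digits
    if len(digits) != 11 or digits[0] != "1":
        raise ValueError(f"not a NANP number: {number!r}")
    return "+" + digits


def sip_uri(e164, domain):
    return f"sip:{e164}@{domain}"


def tel_uri(e164):
    return f"tel:{e164}"


def npa_nxx(e164):
    """Return (NPA, NXX): the area code and exchange of a +1 number."""
    return e164[2:5], e164[5:8]


number = normalize_nanp("(312) 555-1234")
print(sip_uri(number, "carrier.example.com"))
print(tel_uri(number))
print(npa_nxx(number))
```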
ENUM (E.164 Number Mapping, RFC 6116) provides a DNS-based lookup from E.164 numbers to SIP URIs, but adoption has been limited outside of carrier peering networks.
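The ENUM lookup key is derived mechanically from the number: strip everything but the digits, reverse them, separate them with dots, and append the ENUM suffix. The sketch below shows only that conversion. The function name is an assumption, and the DNS NAPTR query that would follow is not performed.

```python
def enum_domain(e164, suffix="e164.arpa"):
    """Build the DNS name that ENUM queries for an E.164 number."""
    digits = [c for c in e164 if c.isdigit()]
    return ".".join(reversed(digits)) + "." + suffix


print(enum_domain("+13125551234"))
```

Private carrier ENUM deployments often use a different suffix than `e164.arpa`, which is why it is left as a parameter here.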
SIP Security
SIP was designed in an era of more trusted networks, and security was not deeply integrated into the original protocol. Modern deployments address this with:
- TLS (Transport Layer Security) encrypts SIP signaling. The SIPS URI scheme (sips:) indicates TLS is required.
- SRTP (Secure RTP) encrypts the media stream. Negotiated via SDP using the RTP/SAVP profile.
- STIR/SHAKEN provides cryptographic caller ID authentication, combating the spoofing that enables robocalls.
In practice, TLS for SIP signaling is standard between carriers. SRTP adoption is growing but not yet universal on carrier-to-carrier trunks.
Further Reading
- SS7: Signaling System 7 — the legacy signaling protocol that SIP is replacing
- SIP vs. SS7 — how the two protocols coexist in modern networks
- STIR/SHAKEN — caller ID authentication built on SIP
- VoIP Network Architecture — how SIP fits into the broader network
- How a Phone Call Gets Routed — SIP in action within the call routing chain