SIP: Session Initiation Protocol

SIP is the signaling protocol that underlies virtually all modern voice communications. If you make a VoIP call, use a cloud PBX, place a call through Microsoft Teams or Zoom, or send traffic over a SIP trunk to a carrier, SIP is the protocol negotiating the connection. Understanding SIP is essential for anyone who operates, troubleshoots, or builds on top of voice infrastructure.

What SIP Is (and Is Not)

SIP (Session Initiation Protocol), defined in RFC 3261, is a signaling protocol. It establishes, modifies, and terminates communication sessions. It does not carry the actual audio — that is handled by RTP (Real-time Transport Protocol). SIP sets up the call; RTP carries the voice.

This separation is fundamental. SIP messages might traverse a completely different network path than the audio packets. A SIP proxy can route signaling without ever touching the media stream. This architecture enables the flexibility (and complexity) of modern voice networks.

SIP is text-based (similar in structure to HTTP), uses request/response transactions, and can run over UDP, TCP, or TLS (encrypted; the sips: URI scheme signals that TLS is required).

SIP Architecture

A SIP network consists of several types of entities:

User Agents

A User Agent (UA) is any SIP endpoint — a phone, softphone, ATA, PBX, or carrier SBC. User agents come in two roles:

  • UAC (User Agent Client): Initiates requests (e.g., sends an INVITE to start a call)
  • UAS (User Agent Server): Receives and responds to requests

In practice, every SIP endpoint acts as both UAC and UAS depending on the direction of the transaction. Your desk phone is a UAC when you place a call and a UAS when you receive one.

Proxy Servers

A SIP Proxy receives SIP requests and forwards them toward the destination, making routing decisions along the way. Proxies can be:

  • Stateless: Forward requests without tracking the transaction state. Fast but limited in functionality.
  • Stateful: Track the full transaction (e.g., retransmit if no response arrives). Required for features like forking (ringing multiple endpoints simultaneously).

Most carrier SIP infrastructure uses stateful proxies.

Registrar Servers

A Registrar handles SIP REGISTER requests. When a phone boots up, it sends a REGISTER to tell the network “I am reachable at this IP address.” The registrar stores this binding in a location service database. When someone calls that user, a proxy queries the location service to find the current IP address.
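
As a rough illustration of the binding store described above, the sketch below keeps one contact per user in memory and drops it once the registered lifetime has passed. The class name LocationService and its methods are hypothetical and not taken from any SIP stack; real registrars keep several contacts per user and usually persist them.

```python
import time


class LocationService:
    """Toy in-memory location service: address-of-record -> (contact, expiry)."""

    def __init__(self, clock=time.monotonic):
        self._bindings = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def register(self, aor, contact, expires=3600):
        # Expires: 0 in a REGISTER removes the binding.
        if expires <= 0:
            self._bindings.pop(aor, None)
            return
        self._bindings[aor] = (contact, self._clock() + expires)

    def lookup(self, aor):
        # What a proxy would call to find where to forward an INVITE.
        entry = self._bindings.get(aor)
        if entry is None:
            return None
        contact, expiry = entry
        if self._clock() >= expiry:
            del self._bindings[aor]  # binding lapsed without a re-REGISTER
            return None
        return contact
```

A proxy built on this would call lookup("sip:200@pbx.example.com") and forward the INVITE to whatever contact comes back, or reject the call if nothing is registered.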

Redirect Servers

A Redirect Server responds to requests with a 3xx response containing an alternative URI, telling the UAC to try a different destination directly. Less common in modern deployments but useful for load distribution.

Back-to-Back User Agents (B2BUAs)

A B2BUA sits in the call path and acts as a UAS on one side and a UAC on the other. It terminates the incoming SIP dialog and originates a new one toward the destination. This gives the B2BUA full control over the signaling — it can modify headers, topology-hide network details, enforce policy, and manage media.

Session Border Controllers (SBCs) are the most common B2BUAs in carrier networks. They sit at the boundary between networks and handle security, NAT traversal, protocol normalization, and interoperability.

The INVITE Transaction

The core of SIP is the call setup flow. Here is a basic successful call:

    Caller (UAC)              Proxy/SBC              Callee (UAS)
         |                        |                        |
         |--- INVITE ------------>|--- INVITE ------------>|
         |<-- 100 Trying ---------|                        |
         |                        |<-- 100 Trying ---------|
         |                        |<-- 180 Ringing --------|
         |<-- 180 Ringing --------|                        |
         |                        |<-- 200 OK -------------|
         |<-- 200 OK -------------|                        |
         |--- ACK ---------------------------------------->|
         |                        |                        |
         |<================== RTP Media ==================>|
         |                        |                        |
         |--- BYE ---------------------------------------->|
         |<-- 200 OK --------------------------------------|
         |                        |                        |

Step by step:

  1. INVITE: The caller sends an INVITE request to initiate a call. The INVITE includes an SDP body describing the caller’s media capabilities (codecs, IP address, port).

  2. 100 Trying: Each hop sends a 100 Trying provisional response to indicate that it received the INVITE and is processing it. This stops the previous hop from retransmitting the INVITE.

  3. 180 Ringing: The callee’s phone is ringing. The caller’s device plays ringback tone upon receiving this. Some networks also send 183 Session Progress with early media (actual ringback audio in the RTP stream).

  4. 200 OK: The callee answers. The 200 OK includes an SDP body with the callee’s chosen media parameters (codec, IP, port). This completes the SDP offer/answer exchange.

  5. ACK: The caller acknowledges the 200 OK. The three-way handshake (INVITE / 200 OK / ACK) is complete and the call is established.

  6. RTP Media: Audio flows directly between the endpoints (or through media relays if B2BUAs are in the path) using RTP. This is the actual voice conversation.

  7. BYE: Either party hangs up by sending a BYE request.

  8. 200 OK (to BYE): The other side acknowledges. The call is terminated and resources are released.
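
The steps above can be read as a small state machine from the caller's point of view. The following is a simplified sketch, not the RFC 3261 transaction state machine: the state names and event labels are invented for illustration, and error responses, CANCEL, and timers are left out.

```python
# Caller-side view of the basic call flow. State and event names are
# illustrative labels only, not terms defined by RFC 3261.
TRANSITIONS = {
    ("idle", "send INVITE"): "calling",
    ("calling", "recv 100"): "proceeding",
    ("calling", "recv 180"): "ringing",
    ("calling", "recv 200"): "answered",
    ("proceeding", "recv 180"): "ringing",
    ("proceeding", "recv 200"): "answered",
    ("ringing", "recv 200"): "answered",
    ("answered", "send ACK"): "established",
    ("established", "send BYE"): "terminating",
    ("terminating", "recv 200"): "terminated",
    ("established", "recv BYE"): "terminated",  # callee hung up; we reply 200 OK
}


def run(events, state="idle"):
    """Feed a sequence of events through the table and return the final state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


BASIC_CALL = ["send INVITE", "recv 100", "recv 180", "recv 200",
              "send ACK", "send BYE", "recv 200"]
final_state = run(BASIC_CALL)
```

A table like this is handy when reading packet captures: an event that has no entry for the current state (for example an ACK before any 200 OK) is a hint that a message was lost or reordered.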

SIP Message Anatomy

SIP messages are plain text, structured like HTTP. A SIP INVITE looks like this:

INVITE sip:+[email protected] SIP/2.0
Via: SIP/2.0/UDP 198.51.100.10:5060;branch=z9hG4bK776asdhds
Max-Forwards: 70
From: "Chicago Caller" <sip:+[email protected]>;tag=1928301774
To: <sip:+[email protected]>
Call-ID: [email protected]
CSeq: 314159 INVITE
Contact: <sip:+[email protected]:5060>
Content-Type: application/sdp
Content-Length: 142

v=0
o=caller 2890844526 2890844526 IN IP4 198.51.100.10
s=SIP Call
c=IN IP4 198.51.100.10
t=0 0
m=audio 49170 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=sendrecv

Key Headers

Header         Purpose
Request-URI    The destination (sip:[email protected])
Via            Records the path the request takes; used to route responses back. The branch parameter uniquely identifies the transaction.
From           The caller’s identity. The tag parameter helps identify the dialog.
To             The callee’s identity. Gets a tag added in responses.
Call-ID        Globally unique identifier for this call/dialog.
CSeq           Sequence number and method. Used to match requests with responses and order transactions within a dialog.
Contact        Direct reachability address for the sender. Used for subsequent in-dialog requests (re-INVITE, BYE).
Max-Forwards   Hop limit (like TTL). Prevents infinite loops. Decremented by each proxy.
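
Because SIP is plain text, pulling these headers out of a message takes only a few lines. The sketch below is a minimal parser written for illustration: the function names are made up, the sample message uses placeholder addresses (alice and bob at example.com), and it ignores folded header lines, compact header names, and comma-separated multi-value headers that a real parser must handle.

```python
def parse_sip_request(raw):
    """Split a SIP request into (method, request_uri, version, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # blank line separates headers from body
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; a header may appear more than once.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return method, request_uri, version, headers, body


def via_branch(via_value):
    """Return the branch parameter of a single Via header value, or None."""
    for param in via_value.split(";")[1:]:
        key, _, val = param.partition("=")
        if key.strip() == "branch":
            return val.strip()
    return None


# Placeholder message for illustration only.
SAMPLE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 198.51.100.10:5060;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

method, request_uri, version, headers, body = parse_sip_request(SAMPLE)
branch = via_branch(headers["via"][0])
```

In RFC 3261 the branch value starts with the fixed prefix z9hG4bK, which is a quick way to spot it in a trace.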

SDP Body

The Session Description Protocol body describes media capabilities using the offer/answer model (RFC 3264):

  • v=: Protocol version (always 0)
  • o=: Originator/session identifier
  • c=: Connection data — the IP address where the sender expects to receive RTP
  • m=: Media line — media type (audio), port number, transport (RTP/AVP), and payload type numbers
  • a=rtpmap: Maps payload type numbers to codec names (0 = PCMU/G.711 u-law, 8 = PCMA/G.711 A-law)
  • a=fmtp: Format-specific parameters (e.g., DTMF event range for telephone-event)

The answerer selects from the offered codecs and responds with their own SDP. Both sides then send and receive RTP using the agreed-upon codec and addressing.
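
To make the offer/answer step concrete, here is a small sketch that reads the payload types from an offer and picks the first codec the answerer also supports. It assumes a single audio m= line and that every offered payload type has an a=rtpmap line; the function names are hypothetical, and real endpoints may apply their own preference order instead of the offerer's.

```python
OFFER = "\r\n".join([
    "v=0",
    "o=caller 2890844526 2890844526 IN IP4 198.51.100.10",
    "s=SIP Call",
    "c=IN IP4 198.51.100.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
    "a=fmtp:101 0-16",
    "a=sendrecv",
]) + "\r\n"


def offered_codecs(sdp):
    """Return [(payload_type, codec_name)] in the order listed on the m= line."""
    payload_types, names = [], {}
    for line in sdp.splitlines():
        if line.startswith("m=audio"):
            # m=audio <port> <transport> <payload type> <payload type> ...
            payload_types = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[int(pt)] = encoding.split("/")[0]
    return [(pt, names.get(pt, "?")) for pt in payload_types]


def choose_codec(sdp, supported):
    """Pick the first offered codec whose name is in the supported set."""
    for pt, name in offered_codecs(sdp):
        if name in supported:
            return pt, name
    return None  # no common codec; a real UAS would reject the offer


choice = choose_codec(OFFER, {"PCMA", "G729"})
```

Here the answerer does not support PCMU, so the selection falls through to payload type 8 (PCMA), which is what its answer SDP would list.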

Common Codecs

Codec               Bandwidth               Quality               Use Case
G.711 (PCMU/PCMA)   64 kbps                 Toll quality          LAN, high-bandwidth trunks
G.729               8 kbps                  Good                  WAN, bandwidth-constrained links
G.722               48-64 kbps              HD voice (wideband)   HD-capable endpoints
Opus                6-510 kbps (variable)   Excellent             WebRTC, modern VoIP platforms
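
The bandwidth figures in the table are codec payload rates only. On the wire, every RTP packet also carries IP, UDP, and RTP headers, so the per-call figure is higher. The sketch below works this out under stated assumptions: 20 ms of audio per packet and 40 bytes of overhead (20 IPv4 + 8 UDP + 12 RTP), with no Ethernet framing, SRTP tag, or silence suppression. The function name is made up for this example.

```python
def call_bandwidth_kbps(codec_kbps, ptime_ms=20, overhead_bytes=40):
    """One-direction IP-layer bandwidth for a single RTP stream, in kbps."""
    packets_per_second = 1000 / ptime_ms
    # Bytes of encoded audio carried in each packet.
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second / 1000


g711 = call_bandwidth_kbps(64)  # 160-byte payload + 40 bytes of headers, 50 packets/s
g729 = call_bandwidth_kbps(8)   # 20-byte payload + 40 bytes of headers, 50 packets/s
```

Under these assumptions G.711 comes to 80 kbps per direction and G.729 to 24 kbps, which shows why header overhead matters far more for low-bitrate codecs.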

Registration

Before a SIP endpoint can receive calls, it must register its current location:

REGISTER sip:pbx.example.com SIP/2.0
From: <sip:[email protected]>;tag=a73kszlfl
To: <sip:[email protected]>
Contact: <sip:[email protected]:5060>
Expires: 3600
Authorization: Digest username="ext200", realm="pbx.example.com",
  nonce="...", response="..."

The endpoint tells the registrar: “I am extension 200, and I am currently reachable at 192.168.1.50.” The registrar stores this binding for the requested duration (Expires header, typically 1-24 hours). The endpoint re-registers periodically to keep the binding alive.

Authentication uses HTTP Digest authentication (challenge/response with a shared secret) to prevent unauthorized registrations.
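
For readers who want to see what goes into the response value, the sketch below computes an MD5 Digest response following the RFC 2617 formulas. It covers the basic form and the qop=auth form; other algorithms and auth-int are left out. The function name and the password "secret" are placeholders for illustration.

```python
import hashlib


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the Digest 'response' value for a request (MD5 algorithm)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # proves knowledge of the secret
    ha2 = _md5_hex(f"{method}:{uri}")                 # binds the response to this request
    if qop:
        # With qop, the nonce count and client nonce are mixed in as well.
        return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical values: the nonce would come from the registrar's 401 challenge.
response = digest_response("ext200", "pbx.example.com", "secret",
                           "REGISTER", "sip:pbx.example.com", "abc123")
```

The password itself never crosses the network; the registrar repeats the same calculation with its stored secret and compares the results.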

Mid-Call Operations

SIP supports modifying calls in progress:

  • Re-INVITE: Changes session parameters (e.g., codec renegotiation, putting a call on hold by setting the SDP to a=sendonly or a=inactive)
  • UPDATE: Similar to re-INVITE but can be sent before the call is established
  • REFER: Transfers a call to another party. The REFER tells the other endpoint to send an INVITE to the transfer target. Used for both blind and attended transfers.
  • INFO: Carries application-level information within a dialog (sometimes used for DTMF)
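
As an example of what a hold re-INVITE changes, the sketch below rewrites an SDP body so that its direction attribute becomes a=sendonly and bumps the session version in the o= line, as the offer/answer rules require when the SDP changes. It assumes a single media section; the function name hold_offer is hypothetical.

```python
DIRECTIONS = ("a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive")


def hold_offer(sdp, direction="sendonly"):
    """Return a new SDP offer with the given direction and an incremented version."""
    attr = f"a={direction}"
    if attr not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    out = []
    for line in sdp.splitlines():
        if line in DIRECTIONS:
            continue  # drop the old direction attribute
        if line.startswith("o="):
            # o=<username> <session id> <session version> <nettype> <addrtype> <address>
            fields = line.split(" ")
            fields[2] = str(int(fields[2]) + 1)
            line = " ".join(fields)
        out.append(line)
    out.append(attr)  # with one media section, this lands in that section
    return "\r\n".join(out) + "\r\n"


ACTIVE = (
    "v=0\r\n"
    "o=caller 2890844526 2890844526 IN IP4 198.51.100.10\r\n"
    "s=SIP Call\r\n"
    "c=IN IP4 198.51.100.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=sendrecv\r\n"
)
held = hold_offer(ACTIVE)
```

Resuming the call is the same operation in reverse: another re-INVITE whose SDP goes back to a=sendrecv.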

SIP in Carrier Networks

In the context of telecom routing, SIP is used for:

SIP Trunking

A SIP trunk replaces traditional PRI (T1) connections between a PBX and the carrier network. Instead of the 23 voice channels of a T1 PRI on a dedicated circuit, SIP trunking delivers voice as packets over an IP connection. Multiple simultaneous calls share the same IP link, with capacity limited by bandwidth rather than channel count.

SIP trunks from carriers use E.164 phone numbers in the SIP URIs (sip:[email protected]), and routing decisions are based on NPA/NXX data just as they are in the PSTN.

Session Border Controllers

SBCs are deployed at every carrier boundary. They handle:

  • Security: Protect against SIP-based attacks (toll fraud, denial of service)
  • NAT traversal: Fix addressing issues when endpoints are behind NAT
  • Topology hiding: Mask internal network structure from external peers
  • Protocol normalization: Reconcile SIP dialect differences between carriers
  • Media anchoring: Force RTP through the SBC for quality monitoring and lawful intercept

Phone Number Mapping

SIP uses URIs for addressing, but the PSTN uses E.164 phone numbers. The mapping is straightforward: the phone number is embedded in the SIP URI as sip:+1NPANXXXXXX@domain. The tel: URI scheme (tel:+13125551234) is also used, particularly in the Diversion and P-Asserted-Identity headers.
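
The mapping can be sketched in a few lines. The helpers below are illustrative only: the function names and the domain carrier.example.com are placeholders, and the validation simply checks the E.164 shape (a plus sign followed by at most 15 digits, not starting with 0).

```python
import re

E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def normalize(number):
    """Strip spaces and punctuation from a number already written with a leading +."""
    cleaned = "+" + re.sub(r"\D", "", number)
    if not number.strip().startswith("+") or not E164_PATTERN.match(cleaned):
        raise ValueError(f"not an E.164 number: {number!r}")
    return cleaned


def sip_uri(number, domain):
    """Embed an E.164 number in a SIP URI for the given domain."""
    return f"sip:{normalize(number)}@{domain}"


def tel_uri(number):
    """Build a tel: URI from an E.164 number."""
    return f"tel:{normalize(number)}"


uri = sip_uri("+1 (312) 555-1234", "carrier.example.com")
tel = tel_uri("+13125551234")
```

Numbers written without the leading plus sign are rejected rather than guessed at, since deciding which country code to prepend is a dial-plan decision and not part of the mapping itself.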

ENUM (E.164 Number Mapping, RFC 6116) provides a DNS-based lookup from E.164 numbers to SIP URIs, but adoption has been limited outside of carrier peering networks.
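
The first step of an ENUM lookup is purely mechanical: reverse the digits, separate them with dots, and append the ENUM suffix. The sketch below shows only that step. The function name is made up, e164.arpa is the public suffix (private carrier ENUM trees typically use a different one), and the NAPTR query that would follow is not shown.

```python
def enum_domain(e164_number, suffix="e164.arpa"):
    """Turn an E.164 number into the DNS name queried for NAPTR records."""
    digits = [ch for ch in e164_number if ch.isdigit()]  # drops '+' and punctuation
    if not digits:
        raise ValueError("number contains no digits")
    return ".".join(reversed(digits)) + "." + suffix


name = enum_domain("+13125551234")
```

For +13125551234 this yields 4.3.2.1.5.5.5.2.1.3.1.e164.arpa, and the NAPTR records at that name would carry the SIP URI to use.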

SIP Security

SIP was designed in an era of more trusted networks, and security was not deeply integrated into the original protocol. Modern deployments address this with:

  • TLS (Transport Layer Security) encrypts SIP signaling. The SIPS URI scheme (sips:) indicates TLS is required.
  • SRTP (Secure RTP) encrypts the media stream. Negotiated via SDP using the RTP/SAVP profile.
  • STIR/SHAKEN provides cryptographic caller ID authentication, combating the spoofing that enables robocalls.

In practice, TLS for SIP signaling is standard between carriers. SRTP adoption is growing but not yet universal on carrier-to-carrier trunks.

Further Reading