<?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" 
    "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 
 <section id="sip_intro" xmlns:xi="http://www.w3.org/2001/XInclude">
     <sectioninfo>
 	<authorgroup>
 	    <author>
 		<firstname>Jan</firstname>
 		<surname>Janak</surname>
 		<email>jan@iptel.org</email>
 	    </author>
 	</authorgroup>
 	<copyright>
 	    <year>2003</year>
 	    <holder>FhG FOKUS</holder>
 	</copyright>
 	<abstract>
 	    <para>
 		A brief overview of SIP describing all important aspects of the Session Initiation
 		Protocol.
 	    </para>
 	</abstract>
     </sectioninfo>
 
     <title>SIP Introduction</title>
     <section id="purpose">
 	<title>Purpose of SIP</title>
 	<simpara>
 	    SIP stands for Session Initiation Protocol. It is an application-layer control
 	    protocol which has been developed and designed within the IETF. The protocol has
 	    been designed with easy implementation, good scalability, and flexibility in mind.
 	</simpara>
 	<simpara>
 	    The specification is available in the form of several <abbrev>RFCs</abbrev>; the
 	    most important one is RFC3261, which contains the core protocol specification. The
 	    protocol is used for creating, modifying, and terminating sessions with one or more
 	    participants. By a session we mean a set of senders and receivers that communicate,
 	    together with the state kept in those senders and receivers during the
 	    communication. Examples of sessions include Internet telephone calls, multimedia
 	    distribution, multimedia conferences, distributed computer games, etc.
 	</simpara>
 	<simpara>
 	    SIP is not the only protocol that communicating devices will need. It is not meant
 	    to be a general-purpose protocol. The purpose of SIP is just to make the
 	    communication possible; the communication itself must be achieved by other means
 	    (and possibly by another protocol). The two protocols most often used along with
 	    SIP are RTP and SDP. RTP is used to carry the real-time multimedia data (including
 	    audio, video, and text); it makes it possible to split the encoded data into
 	    packets and transport such packets over the Internet. The other important protocol
 	    is SDP, which is used to describe and encode the capabilities of session
 	    participants. Such a description is then used to negotiate the characteristics of
 	    the session so that all the devices can participate (this includes, for example,
 	    negotiation of the codecs used to encode media so that all the participants will
 	    be able to decode it, negotiation of the transport protocol used, and so on).
 	</simpara>
 	<simpara>
 	    SIP has been designed in conformance with the Internet model. It is an end-to-end
 	    oriented signaling protocol, which means that all the logic is stored in end
 	    devices (except for the routing of SIP messages). State is also stored in end
 	    devices only, so there is no single point of failure and networks designed this way
 	    scale well. The price that we have to pay for this distributed design and
 	    scalability is a higher message overhead, caused by the messages being sent
 	    end-to-end.
 	</simpara>
 	<simpara>
 	    It is worth mentioning that the end-to-end concept of SIP is a significant
 	    divergence from the regular PSTN (Public Switched Telephone Network), where all
 	    the state and logic is stored in the network and end devices (telephones) are very
 	    primitive. The aim of SIP is to provide the same functionality that traditional
 	    PSTNs have, but the end-to-end design makes SIP networks much more powerful and
 	    open to the implementation of new services that can hardly be implemented in
 	    traditional PSTNs.
 	</simpara>
 	<simpara>
 	    SIP is based on HTTP, which is probably the most successful and widely used
 	    protocol on the Internet. In fact, HTTP can be classified as a signaling protocol
 	    too, because user agents use it to tell an HTTP server which documents they are
 	    interested in. SIP is used to carry the description of session parameters; the
 	    description is encoded into a document using SDP. Both protocols (HTTP and SIP)
 	    have inherited the encoding of message headers from RFC822. This encoding has
 	    proven to be robust and flexible over the years.
 	</simpara>
     </section>
     <section id="sip_uri">
 	<title>SIP URI</title>
 	<simpara>
 	    SIP entities are identified using SIP URIs (Uniform Resource Identifiers). A SIP
 	    URI has the form sip:username@domain, for instance, sip:joe@company.com. As we can
 	    see, a SIP URI consists of a username part and a domain name part delimited by the
 	    @ (at) character. SIP URIs are similar to e-mail addresses; it is, for instance,
 	    possible to use the same URI for both e-mail and SIP communication, and such URIs
 	    are easy to remember.
 	</simpara>
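 	<simpara>
 	    As an illustration only, the following Python sketch splits a URI of this simple
 	    form into its two parts. The function name split_sip_uri is our own, and real SIP
 	    URIs may also carry a port, URI parameters, or no username at all; none of that is
 	    handled here.
 	</simpara>
 	<programlisting>
<![CDATA[
def split_sip_uri(uri):
    """Split a plain 'sip:username@domain' URI into (username, domain).

    Illustration only: ports, URI parameters, passwords and the 'sips:'
    scheme are not handled.
    """
    scheme, colon, rest = uri.partition(":")
    if scheme.lower() != "sip" or not colon:
        raise ValueError("not a sip: URI: %r" % uri)
    # The username and the domain are delimited by the @ (at) character.
    username, at, domain = rest.rpartition("@")
    if not at or not username or not domain:
        raise ValueError("expected sip:username@domain, got %r" % uri)
    return username, domain


print(split_sip_uri("sip:joe@company.com"))  # ('joe', 'company.com')
]]>
 	</programlisting>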
     </section>
     <section id="sip_network_elements">
 	<title>SIP Network Elements</title>
 	<simpara>
 	    Although in the simplest configuration it is possible to use just two user agents
 	    that send SIP messages directly to each other, a typical SIP network will
 	    contain more than one type of SIP element. Basic SIP elements are user agents,
 	    proxies, registrars, and redirect servers. We will briefly describe them in this
 	    section.
 	</simpara>
 	<simpara>
 	    Note that the elements, as presented in this section, are often only logical
 	    entities. It is often advantageous to co-locate them, for instance, to
 	    increase the speed of processing, but that depends on a particular implementation
 	    and configuration.
 	</simpara>
 	<section id="user_agents">
 	    <title>User Agents</title>
 	    <simpara>
 		Internet end points that use SIP to find each other and to negotiate session
 		characteristics are called <emphasis>user agents</emphasis>. User agents
 		usually, but not necessarily, reside on a user's computer in the form of an
 		application. This is currently the most widely used approach, but user agents
 		can also be cellular phones, PSTN gateways, <acronym>PDAs</acronym>, automated
 		<acronym>IVR</acronym> systems and so on.
 	    </simpara>
 	    <simpara>
 		User agents are often referred to as <emphasis>User Agent Server</emphasis>
 		(UAS) and <emphasis>User Agent Client</emphasis> (UAC). UAS and UAC are
 		logical entities only; each user agent contains both a UAC and a UAS. The UAC is
 		the part of the user agent that sends requests and receives responses. The UAS is
 		the part of the user agent that receives requests and sends responses.
 	    </simpara>
 	    <simpara>
 		Because a user agent contains both a UAC and a UAS, we often say that a user
 		agent behaves like a UAC or a UAS. For instance, the caller's user agent behaves
 		like a UAC when it sends an INVITE request and receives responses to the
 		request. The callee's user agent behaves like a UAS when it receives the INVITE
 		and sends responses.
 	    </simpara>
 	    <simpara>
 		But this situation changes when the callee decides to send a BYE and terminate
 		the session. In this case the callee's user agent (sending BYE) behaves like a
 		UAC and the caller's user agent behaves like a UAS.
 	    </simpara>
 	    <figure id="uac_and_uas">
 		<title>UAC and UAS</title>		    
 		<mediaobject>
 		    <imageobject>
 			<imagedata fileref="figures/ua.png" format="PNG"/>
 		    </imageobject>
 		    <textobject>
 			<phrase>Picture showing UAC and UAS</phrase>
 		    </textobject>
 		</mediaobject>
 	    </figure>
 	    <simpara>
 		<xref linkend="uac_and_uas"/> shows three user agents and one stateful forking
 		proxy. Each user agent contains a UAC and a UAS. The part of the proxy that
 		receives the INVITE from the caller in fact acts as a UAS. When forwarding the
 		request statefully, the proxy creates two UACs, each of them responsible for
 		one branch.
 	    </simpara>
 	    <simpara>
 		In our example, callee B picks up, and later, when he wants to tear down the
 		call, he sends a BYE. At this time the user agent that was previously the UAS
 		becomes a UAC and vice versa.
 	    </simpara>
 	</section>
 	<section id="proxy_servers">
 	    <title>Proxy Servers</title>
 	    <simpara>
 		In addition to user agents, SIP allows the creation of an infrastructure of
 		network hosts called <emphasis>proxy servers</emphasis>. User agents can send
 		messages to a proxy server. Proxy servers are very important entities in the SIP
 		infrastructure. They route session invitations according to the invitee's
 		current location and perform authentication, accounting, and many other
 		important functions.
 	    </simpara>
 	    <simpara>
 		The most important task of a proxy server is to route session invitations
 		"closer" to the callee. The session invitation will usually traverse a
 		set of proxies until it finds one which knows the actual location of the
 		callee. Such a proxy will forward the session invitation directly to the callee
 		and the callee will then accept or decline the session invitation.
 	    </simpara>
 	    <simpara>
 		There are two basic types of SIP proxy servers--stateless and stateful.
 	    </simpara>
 
 	    <section id="stateless_servers">
 		<title>Stateless Servers</title>
 		<simpara>
 		    Stateless servers are simple message forwarders. They forward messages
 		    independently of each other. Although messages are usually arranged into
 		    transactions (see <xref linkend="sip_transactions"/>), stateless proxies
 		    do not keep track of transactions.
 		</simpara>
 		<simpara>
 		    Stateless proxies are simple and faster than stateful proxy servers. They
 		    can be used as simple load balancers, message translators, and routers. One
 		    of the drawbacks of stateless proxies is that they are unable to absorb
 		    retransmissions of messages or perform more advanced routing, for instance,
 		    forking or recursive traversal.
 		</simpara>
 	    </section>
 	    <section id="stateful_servers">
 		<title>Stateful Servers</title>
 		<simpara>
 		    Stateful proxies are more complex. Upon reception of a request, stateful
 		    proxies create a state and keep the state until the transaction
 		    finishes. Some transactions, especially those created by INVITE, can last
 		    quite long (until the callee picks up or declines the call). Because stateful
 		    proxies must maintain the state for the duration of the transactions, their
 		    performance is limited.
 		</simpara>
 		<simpara>
 		    The ability to group SIP messages into transactions gives stateful
 		    proxies some interesting features. Stateful proxies can perform forking,
 		    which means that upon reception of one request, two or more requests will
 		    be sent out.
 		</simpara>
 		<simpara>
 		    Stateful proxies can absorb retransmissions because they know, from the
 		    transaction state, if they have already received the same message (stateless
 		    proxies cannot do the check because they keep no state).
 		</simpara>
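 		<simpara>
 		    The following Python sketch is a deliberately simplified model of this check,
 		    not the code of any particular proxy. The class StatefulProxySketch and its
 		    string transaction keys are hypothetical; how a real proxy derives the key from
 		    a message is described in <xref linkend="sip_transactions"/>.
 		</simpara>
 		<programlisting>
<![CDATA[
class StatefulProxySketch:
    """Toy model: transaction state lets a proxy absorb retransmissions.

    Transaction keys are plain strings supplied by the caller; a real
    proxy derives them from the message itself.
    """

    def __init__(self):
        self.transactions = {}  # transaction key -> transaction state
        self.forwarded = []     # requests actually sent downstream

    def on_request(self, key, request):
        if key in self.transactions:
            # Known transaction: the request is a retransmission, so it is
            # absorbed instead of being forwarded again. (A real proxy would
            # also re-send the latest response it sent for it, if any.)
            return "absorbed"
        self.transactions[key] = "proceeding"
        self.forwarded.append(request)
        return "forwarded"


proxy = StatefulProxySketch()
print(proxy.on_request("tx-1", "INVITE sip:bob@b.com SIP/2.0"))  # forwarded
print(proxy.on_request("tx-1", "INVITE sip:bob@b.com SIP/2.0"))  # absorbed
print(len(proxy.forwarded))                                      # 1
]]>
 		</programlisting>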
 		<simpara>
 		    Stateful proxies can perform more complicated methods of finding a user. It
 		    is, for instance, possible to try to reach a user's office phone and, if he
 		    doesn't pick up, redirect the call to his cell phone. Stateless
 		    proxies can't do this because they have no way of knowing how the
 		    transaction targeted to the office phone finished.
 		</simpara>
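 		<simpara>
 		    A sketch of such a sequential search in Python, assuming a hypothetical send
 		    function that forwards the request to one target and reports the final response
 		    code of that branch. The target URIs and outcomes are made up. The point is that
 		    the loop needs the outcome of each branch before it can decide what to do next,
 		    which is exactly the transaction state a stateless proxy lacks.
 		</simpara>
 		<programlisting>
<![CDATA[
def sequential_search(targets, send):
    """Try each target in turn until one of them accepts the call.

    'send' stands for a hypothetical function that forwards the request
    to one target and returns the final response code of that branch.
    Returns (accepting target, code), or (None, last code) on failure.
    """
    last_code = None
    for target in targets:
        code = send(target)          # outcome of this branch
        if 200 <= code < 300:        # positive final response: stop here
            return target, code
        last_code = code             # negative: move on to the next target
    return None, last_code


# Made-up outcomes: the office phone is not answered (408 Request Timeout),
# the cell phone accepts the call.
outcomes = {"sip:bob@office.example.com": 408, "sip:bob@cell.example.com": 200}
print(sequential_search(["sip:bob@office.example.com", "sip:bob@cell.example.com"],
                        outcomes.get))
# ('sip:bob@cell.example.com', 200)
]]>
 		</programlisting>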
 		<simpara>
 		    Most SIP proxies today are stateful because their configuration is usually
 		    very complex. They often perform accounting, forking, some sort of NAT
 		    traversal aid and all those features require a stateful proxy.
 		</simpara>
 	    </section>
 	    <section id="proxy_server_usage">
 		<title>Proxy Server Usage</title>
 		<simpara>
 		    A typical configuration is that each centrally administered entity (a
 		    company, for instance) has its own SIP proxy server, which is used by all
 		    user agents in the entity. Let's suppose that there are two companies, A and
 		    B, and each of them has its own proxy server. <xref linkend="companies"/>
 		    shows how a session invitation from employee Joe in company A will reach
 		    employee Bob in company B.
 		</simpara>
 		<figure id="companies">
 		    <title>Session Invitation</title>
 		    <mediaobject>
 			<imageobject>
 			    <imagedata fileref="figures/companies.png" format="PNG"/>
 			</imageobject>
 			<textobject>
 			    <phrase>Picture showing a session invitation message flow</phrase>
 			</textobject>
 		    </mediaobject> 
 		</figure>
 		<simpara>
 		    User Joe uses the address sip:bob@b.com to call Bob. Joe's user agent doesn't
 		    know how to route the invitation itself, but it is configured to send all
 		    outbound traffic to the company SIP proxy server proxy.a.com. The proxy
 		    server figures out that user sip:bob@b.com is in a different company, so it
 		    will look up B's SIP proxy server and send the invitation there. B's proxy
 		    server can be either pre-configured at proxy.a.com, or the proxy will use
 		    <acronym>DNS SRV</acronym> records to find B's proxy server. The invitation
 		    reaches proxy.b.com. The proxy knows that Bob is currently sitting in his
 		    office and is reachable through the phone on his desk, which has IP address
 		    1.2.3.4, so the proxy will send the invitation there.
 		</simpara>
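 		<simpara>
 		    The routing decision made by proxy.a.com can be sketched in Python as follows.
 		    The tables are hypothetical stand-ins: FOREIGN_PROXIES replaces the
 		    pre-configured entry or the DNS SRV lookup, LOCATIONS replaces the location
 		    database maintained by a registrar, and the address sip:joe@5.6.7.8:5060 is
 		    made up.
 		</simpara>
 		<programlisting>
<![CDATA[
# Hypothetical configuration of proxy.a.com (all values are made up).
LOCAL_DOMAINS = {"a.com"}
# Stand-in for a pre-configured table or a DNS SRV lookup.
FOREIGN_PROXIES = {"b.com": "proxy.b.com"}
# Stand-in for the location database filled by a registrar.
LOCATIONS = {"sip:joe@a.com": "sip:joe@5.6.7.8:5060"}


def next_hop(request_uri):
    """Decide where proxy.a.com sends a request for request_uri."""
    domain = request_uri.rpartition("@")[2]
    if domain in LOCAL_DOMAINS:
        # One of our own users: use the registered contact (None if unknown).
        return LOCATIONS.get(request_uri)
    # A user in another domain: hand the request to that domain's proxy.
    return FOREIGN_PROXIES.get(domain)


print(next_hop("sip:bob@b.com"))  # proxy.b.com
print(next_hop("sip:joe@a.com"))  # sip:joe@5.6.7.8:5060
]]>
 		</programlisting>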
 	    </section>
 	</section>
 	<section id="sip_intro.registrar">
 	    <title>Registrar</title>
 	    <simpara>
 		We mentioned that the SIP proxy at proxy.b.com knows Bob's current location
 		but haven't yet mentioned how a proxy can learn the current location of a
 		user. Bob's user agent (SIP phone) must register with a
 		<emphasis>registrar</emphasis>. The registrar is a special SIP entity that
 		receives registrations from users, extracts information about their current
 		location (IP address, port and username in this case) and stores the
 		information in a location database. The purpose of the location database is to
 		map sip:bob@b.com to something like sip:bob@1.2.3.4:5060. The location database
 		is then used by B's proxy server. When the proxy receives an invitation for
 		sip:bob@b.com, it searches the location database, finds
 		sip:bob@1.2.3.4:5060, and sends the invitation there. A registrar is very
 		often a logical entity only. Because of their tight coupling with proxies,
 		registrars are usually co-located with proxy servers.
 	    </simpara>
 	    <simpara>
 		<xref linkend="registrar_fig"/> shows a typical SIP registration. A REGISTER
 		message containing the Address of Record sip:jan@iptel.org and the contact
 		address sip:jan@1.2.3.4:5060, where 1.2.3.4 is the IP address of the phone, is
 		sent to the registrar. The registrar extracts this information and stores it in
 		the location database. If everything goes well, the registrar sends a 200 OK
 		response to the phone and the registration process is finished.
 	    </simpara>
 	    <figure id="registrar_fig">
 		<title>Registrar Overview</title>
 		<mediaobject>
 		    <imageobject>
 			<imagedata fileref="figures/registrar.png" format="PNG"/>
 		    </imageobject>
 		    <textobject>
 			<phrase>Picture showing a typical registrar</phrase>
 		    </textobject>
 		</mediaobject> 
 	    </figure>
 	    <simpara>
 		Each registration has a limited lifespan. The Expires header field or the
 		expires parameter of the Contact header field determines how long the
 		registration is valid. The user agent must refresh the registration within
 		its lifespan; otherwise it will expire and the user will become unavailable.
 	    </simpara>
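 	    <simpara>
 		A location database with registration lifetimes can be sketched in Python as
 		below. The class and method names are hypothetical, and the clock can be
 		replaced only so that expiry is easy to demonstrate; a real registrar also has to
 		deal with authentication, persistence, and more. Removing a binding when the
 		requested lifetime is 0 follows RFC3261.
 	    </simpara>
 	    <programlisting>
<![CDATA[
import time


class LocationDatabase:
    """In-memory location database with registration lifetimes (sketch)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.bindings = {}  # address of record -> {contact: absolute expiry}

    def register(self, aor, contact, expires):
        """Store or refresh a binding valid for 'expires' seconds."""
        contacts = self.bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)  # a lifetime of 0 removes the binding
        else:
            contacts[contact] = self.clock() + expires

    def lookup(self, aor):
        """Return the contacts of aor whose registration has not expired."""
        now = self.clock()
        return [c for c, exp in self.bindings.get(aor, {}).items() if exp > now]


now = [1000.0]                          # fake clock, in seconds
db = LocationDatabase(clock=lambda: now[0])
db.register("sip:bob@b.com", "sip:bob@1.2.3.4:5060", expires=3600)
print(db.lookup("sip:bob@b.com"))       # ['sip:bob@1.2.3.4:5060']
now[0] += 3601                          # no refresh within the lifespan
print(db.lookup("sip:bob@b.com"))       # []
]]>
 	    </programlisting>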
 	</section>
 	<section id="redirect_server">
 	    <title>Redirect Server</title>
 	    <simpara>
 		The entity that receives a request and sends back a reply containing a list of the
 		current locations of a particular user is called a <emphasis>redirect server</emphasis>.
 		A redirect server receives requests and looks up the intended recipient of the request
 		in the location database created by a registrar. It then creates a list of the current
 		locations of the user and sends it to the request originator in a response of the 3xx
 		class.
 	    </simpara>
 	    <simpara>
 		The originator of the request then extracts the list of destinations and sends
 		another request directly to them. <xref linkend="redirect"/> shows a typical
 		    redirection.
 	    </simpara>
 	    <figure id="redirect">
 		<title>SIP Redirection</title>
 		<mediaobject>
 		    <imageobject>
 			<imagedata fileref="figures/redirect.png" format="PNG"/>
 		    </imageobject>
 		    <textobject>
 			<phrase>Picture showing a redirection</phrase>
 		    </textobject>
 		</mediaobject> 
 	    </figure>
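 	    <simpara>
 		A redirect server's reply could be assembled roughly as in this Python sketch,
 		which puts each current location into its own Contact header field of a 302
 		Moved Temporarily response. The function name build_redirect and the input
 		dictionary are assumptions made for the illustration, and several details of a
 		real response (multiple Via header fields, the To tag) are simplified away.
 	    </simpara>
 	    <programlisting>
<![CDATA[
def build_redirect(request, contacts):
    """Return the text of a 302 response listing the user's current contacts.

    'request' is a hypothetical dict holding the Via, From, To, Call-ID and
    CSeq values of the received request (a single Via for simplicity).
    A real redirect server would also add a tag to the To header field.
    """
    lines = ["SIP/2.0 302 Moved Temporarily"]
    for name in ("Via", "From", "To", "Call-ID", "CSeq"):
        lines.append("%s: %s" % (name, request[name]))  # copied from request
    for contact in contacts:
        lines.append("Contact: <%s>" % contact)         # locations to try
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_redirect({"Via": "SIP/2.0/UDP 5.6.7.8:5060;branch=z9hG4bK1",
                      "From": "<sip:joe@a.com>;tag=1928",
                      "To": "<sip:bob@b.com>",
                      "Call-ID": "a84b4c76e66710",
                      "CSeq": "1 INVITE"},
                     ["sip:bob@1.2.3.4:5060"]))
]]>
 	    </programlisting>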
 	</section>
     </section>
     <section id="sip_messages">
 	<title>SIP Messages</title>
 	<simpara>
 	    Communication using SIP (often called signaling) consists of a series of
 	    <emphasis>messages</emphasis>. Messages can be transported independently by the
 	    network. Usually each of them is transported in a separate UDP datagram. Each
 	    message consists of a "first line", a message header, and a message body. The
 	    first line identifies the type of the message. There are two types of messages:
 	    <emphasis>requests</emphasis> and <emphasis>responses</emphasis>. Requests are
 	    usually used to initiate some action or to inform the recipient of the request
 	    of something. Responses (replies) are used to confirm that a request was received
 	    and processed, and they contain the status of the processing.
 	</simpara>
 	<simpara>
 	    A typical SIP request looks like this:
 	</simpara>
 	<programlisting>
 <![CDATA[
 INVITE sip:7170@iptel.org SIP/2.0
 Via: SIP/2.0/UDP 195.37.77.100:5040;rport
 Max-Forwards: 10
 From: "jiri" <sip:jiri@iptel.org>;tag=76ff7a07-c091-4192-84a0-d56e91fe104f
 To: <sip:jiri@bat.iptel.org>
 Call-ID: d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35
 CSeq: 2 INVITE
 Contact: <sip:213.20.128.35:9315>
 User-Agent: Windows RTC/1.0
 Proxy-Authorization: Digest username="jiri", realm="iptel.org", 
  algorithm="MD5", uri="sip:jiri@bat.iptel.org", 
  nonce="3cef753900000001771328f5ae1b8b7f0d742da1feb5753c", 
  response="53fe98db10e1074b03b3e06438bda70f"
 Content-Type: application/sdp
 Content-Length: 451
 
 v=0
 o=jku2 0 0 IN IP4 213.20.128.35
 s=session
 c=IN IP4 213.20.128.35
 b=CT:1000
 t=0 0
 m=audio 54742 RTP/AVP 97 111 112 6 0 8 4 5 3 101
 a=rtpmap:97 red/8000
 a=rtpmap:111 SIREN/16000
 a=fmtp:111 bitrate=16000
 a=rtpmap:112 G7221/16000
 a=fmtp:112 bitrate=24000
 a=rtpmap:6 DVI4/16000
 a=rtpmap:0 PCMU/8000
 a=rtpmap:4 G723/8000
 a=rtpmap:3 GSM/8000
 a=rtpmap:101 telephone-event/8000
 a=fmtp:101 0-16
 ]]>
 	</programlisting>
 	<simpara>
 	    The first line tells us that this is an INVITE message, which is used to establish
 	    a session. The URI on the first line, sip:7170@iptel.org, is called the
 	    <emphasis>Request URI</emphasis> and determines where the message is sent next.
 	    In this case it will be host iptel.org.
 	</simpara>
 	<simpara>
 	    A SIP request can contain one or more Via header fields, which are used to record
 	    the path of the request. They are later used to route SIP responses back along
 	    exactly the same path. The INVITE message contains just one Via header field,
 	    which was created by the user agent that sent the request. From the Via field we
 	    can tell that the user agent is running on host 195.37.77.100 and port 5040.
 	</simpara>
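 	<simpara>
 	    The transport, host, and port can be read from a Via value with a few string
 	    operations, as in this Python sketch. The function via_sent_by is hypothetical and
 	    handles only the simple form used in the example.
 	</simpara>
 	<programlisting>
<![CDATA[
def via_sent_by(via_value):
    """Return (transport, host, port) from one Via header field value.

    Only the simple 'SIP/2.0/UDP host[:port];params' form is handled:
    no IPv6 references and no comma-separated lists of Via values.
    """
    protocol, _, rest = via_value.strip().partition(" ")
    transport = protocol.split("/")[2]        # e.g. UDP
    sent_by = rest.split(";", 1)[0].strip()   # drop parameters such as rport
    host, _, port = sent_by.partition(":")
    # Without an explicit port, 5060 applies (for UDP and TCP; TLS uses 5061).
    return transport, host, int(port) if port else 5060


print(via_sent_by("SIP/2.0/UDP 195.37.77.100:5040;rport"))
# ('UDP', '195.37.77.100', 5040)
]]>
 	</programlisting>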
 	<simpara>
 	    The From and To header fields identify the initiator (caller) and the recipient
 	    (callee) of the invitation (just like in SMTP, where they identify the sender and
 	    recipient of a message). The From header field contains a tag parameter, which
 	    serves as part of the dialog identifier and will be described in <xref linkend="sip_dialogs"/>.
 	</simpara>
 	<simpara>
 	    The Call-ID header field is part of the dialog identifier, and its purpose is to
 	    identify messages belonging to the same call. Such messages have the same Call-ID.
 	    CSeq is used to maintain the order of requests. Because requests can be sent over
 	    an unreliable transport that can re-order messages, a sequence number must be
 	    present in the messages so that the recipient can identify retransmissions and
 	    out-of-order requests.
 	</simpara>
 	<simpara>
 	    The Contact header field contains the IP address and port on which the sender is
 	    awaiting further requests sent by the callee. The other header fields are not
 	    important here and will not be described.
 	</simpara>
 	<simpara>
 	    The message header is separated from the message body by an empty line. The
 	    message body of the INVITE request contains a description of the media types
 	    accepted by the sender, encoded in SDP.
 	</simpara>
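 	<simpara>
 	    The structure described above (first line, header fields, empty line, body) can be
 	    taken apart with a short Python sketch. It is an illustration under simplifying
 	    assumptions, not a conforming parser: the name parse_sip_message is ours, and
 	    compact header names, comma-separated header values, and Content-Length checks
 	    are ignored. It does unfold RFC822-style continuation lines, such as those of the
 	    Proxy-Authorization header field in the INVITE example.
 	</simpara>
 	<programlisting>
<![CDATA[
def parse_sip_message(raw):
    """Split a SIP message into (kind, first line, headers, body).

    Headers are returned in order as (name, value) pairs. For simplicity,
    CRLF line endings are converted to LF everywhere, body included.
    """
    # The message header ends at the first empty line; the rest is the body.
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    lines = head.split("\n")
    first_line, headers = lines[0], []
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and headers:
            # RFC822-style folding: a line starting with whitespace
            # continues the value of the previous header field.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    # Responses start with the protocol version, requests with a method.
    kind = "response" if first_line.startswith("SIP/") else "request"
    return kind, first_line, headers, body


msg = ("SIP/2.0 200 OK\r\n"
       "CSeq: 63629 REGISTER\r\n"
       "Content-Length: 0\r\n"
       "\r\n")
kind, first, headers, body = parse_sip_message(msg)
print(kind, first, headers)
# response SIP/2.0 200 OK [('CSeq', '63629 REGISTER'), ('Content-Length', '0')]
]]>
 	</programlisting>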
 	<section id="sip_requests">
 	    <title>SIP Requests</title>
 	    <simpara>
 		We have described what an INVITE request looks like and said that the request
 		is used to invite a callee to a session. Other important requests are:
 	    </simpara>
 	    <itemizedlist>
 		<listitem>
 		    <simpara>
 			<emphasis>ACK</emphasis>--This message acknowledges receipt of a final
 			response to INVITE. Establishing a session uses a 3-way handshake
 			due to the asymmetric nature of the invitation. It may take a
 			while before the callee accepts or declines the call, so the callee's
 			user agent periodically retransmits a positive final response until it
 			receives an ACK (which indicates that the caller is still there and
 			ready to communicate).
 		    </simpara>
 		</listitem>
 		<listitem>
 		    <simpara>
 			<emphasis>BYE</emphasis>--BYE messages are used to tear down multimedia
 			sessions. A party wishing to tear down a session sends a BYE to the
 			other party.
 		    </simpara>
 		</listitem>
 		<listitem>
 		    <simpara>
 			<emphasis>CANCEL</emphasis>--CANCEL is used to cancel a session that is not
 			yet fully established. It is used when the callee hasn't replied with a
 			final response yet but the caller wants to abort the call (typically
 			when a callee doesn't respond for some time).
 		    </simpara>
 		</listitem>
 		<listitem>
 		    <simpara>
 			<emphasis>REGISTER</emphasis>--The purpose of a REGISTER request is to
 			let a registrar know the user's current location. Information about the
 			current IP address and port on which a user can be reached is carried in
 			REGISTER messages. The registrar extracts this information and puts it
 			into a location database. The database can later be used by SIP proxy
 			servers to route calls to the user. Registrations are time-limited and
 			need to be periodically refreshed.
 		    </simpara>
 		</listitem>
 	    </itemizedlist>
 	    <simpara>
 		The listed requests usually have no message body because it is not needed in
 		most situations (but they can have one). In addition to these, many other
 		request types have been defined, but their description is out of the scope of
 		this document.
 	    </simpara>
 	</section>
 	<section id="sip_responses">
 	    <title>SIP Responses</title>
 	    <simpara>
 		When a user agent or proxy server receives a request, it sends a reply. Every
 		request must be answered, except ACK requests, which trigger no replies.
 	    </simpara>
 	    <simpara>
 		A typical reply looks like this:
 	    </simpara>
 	    <programlisting>
 <![CDATA[
 SIP/2.0 200 OK
 Via: SIP/2.0/UDP 192.168.1.30:5060;received=66.87.48.68
 From: sip:sip2@iptel.org
 To: sip:sip2@iptel.org;tag=794fe65c16edfdf45da4fc39a5d2867c.b713
 Call-ID: 2443936363@192.168.1.30
 CSeq: 63629 REGISTER
 Contact: <sip:sip2@66.87.48.68:5060;transport=udp>;q=0.00;expires=120
 Server: Sip EXpress router (0.8.11pre21xrc (i386/linux))
 Content-Length: 0
 Warning: 392 195.37.77.101:5060 "Noisy feedback tells:  
   pid=5110 req_src_ip=66.87.48.68 req_src_port=5060 in_uri=sip:iptel.org 
   out_uri=sip:iptel.org via_cnt==1"
 ]]>
 	    </programlisting>
 	    <simpara>
 		As we can see, responses are very similar to the requests, except for the first
 		line. The first line of response contains protocol version (SIP/2.0), reply
 		code, and reason phrase.
 	    </simpara>
 	    <simpara>
 		The <emphasis>reply code</emphasis> is an integer from 100 to 699 and
 		indicates the type of the response. There are 6 classes of responses:
 	    </simpara>
 	    <itemizedlist>
 		<listitem>
 		    <simpara>
 			<emphasis>1xx</emphasis> are <emphasis>provisional</emphasis>
 			responses. A provisional response tells its recipient that the
 			associated request was received but the result of the processing is
 			not known yet. Provisional responses are sent only when the processing
 			doesn't finish immediately. For INVITE, the sender stops retransmitting
 			the request upon reception of a provisional response.
 		    </simpara>
 		    <simpara>
 			Typically proxy servers send responses with code 100 when they start
 			processing an INVITE and user agents send responses with code 180
 			(Ringing) which means that the callee's phone is ringing.
 		    </simpara>
 		</listitem>
 		<listitem>
 		    <simpara>
 			<emphasis>2xx</emphasis> responses are <emphasis>positive
 			    final</emphasis> responses. A final response is the ultimate response
 			that the originator of the request will ever receive. Therefore final
 			responses express the result of the processing of the associated
 			request. Final responses also terminate transactions. Responses with
 			codes from 200 to 299 are positive responses, which means that the
 			request was processed successfully and accepted. For instance, a 200 OK
 			response is sent when a user accepts an invitation to a session (INVITE request).
 		    </simpara>
 		    <simpara>
 			A UAC may receive several 200 OK responses to a single INVITE
 			request. This is because a forking proxy (see <xref linkend="stateful_servers"/>)
 			can fork the request so that it reaches several UASs, and each of them may
 			accept the invitation. In this case each response is distinguished by the
 			tag parameter in the To header field. Each response represents a distinct
 			dialog with an unambiguous dialog identifier.
 		    </simpara>
 		</listitem>
 		<listitem>
 		    <simpara>
 			<emphasis>3xx</emphasis> responses are used to redirect a caller. A
 			redirection response gives information about the user's new location or
 			an alternative service that the caller might use to satisfy the
 			call. Redirection responses are usually sent by redirect servers or
 			proxy servers. When such a server receives a request and doesn't want
 			to or can't process it for any reason, it will send a redirection
 			response to the caller and put another location into the response,
 			which the caller might want to try. It can be the location of another
 			proxy or the current location of the callee (from the location database
 			created by a registrar). The caller is then supposed to re-send the
 			request to the new location. 3xx responses are final.
 		    </simpara>
 		</listitem>
 		<listitem>
 		    <simpara>
 			<emphasis>4xx</emphasis> are <emphasis>negative final</emphasis>
 			responses. A 4xx response means that the problem is on the sender's
 			side: the request couldn't be processed because it contains bad syntax
 			or cannot be fulfilled at that server.
 		    </simpara>
 		</listitem>
 		<listitem>
 		    <simpara>
 			<emphasis>5xx</emphasis> means that the problem is on the server's side. The
 			request is apparently valid, but the server failed to fulfill it. Clients
 			should usually retry the request later.
 		    </simpara>
 		</listitem>
 		<listitem>
 		    <simpara>
 			<emphasis>6xx</emphasis> reply code means that the request cannot be
 			fulfilled at any server. This response is usually sent by a server that
 			has definitive information about a particular user. User agents usually
 			send a 603 Decline response when the user doesn't want to participate in
 			the session.
 		    </simpara>
 		</listitem>
 	    </itemizedlist>
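 	    <simpara>
 		Because the class is given by the first digit, classifying a reply code is a
 		one-line computation. This Python sketch uses the class names from RFC3261; the
 		function name classify is hypothetical.
 	    </simpara>
 	    <programlisting>
<![CDATA[
# Class names as used in RFC 3261, keyed by the first digit of the code.
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify(code):
    """Return (class name, is_final) for a SIP reply code."""
    if not 100 <= code <= 699:
        raise ValueError("reply code out of range: %d" % code)
    # Everything except 1xx is a final response.
    return RESPONSE_CLASSES[code // 100], code >= 200


print(classify(180))  # ('Provisional', False)
print(classify(603))  # ('Global Failure', True)
]]>
 	    </programlisting>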
 	    <simpara>
 		In addition to the reply code, the first line also contains a <emphasis>reason
 		    phrase</emphasis>. The code number is intended to be processed by
 		machines. It is not very human-friendly, but it is very easy for machines to
 		parse and understand. The reason phrase usually contains a human-readable
 		message describing the result of the processing. A user agent should render
 		the reason phrase to the user.
 	    </simpara>
 	    <simpara>
 		The request to which a particular response belongs is identified using the CSeq
 		header field. In addition to the sequence number, this header field also contains
 		the method of the corresponding request. In our example it was a REGISTER request.
 	    </simpara>
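 	    <simpara>
 		A minimal Python sketch of this comparison follows; the function names are
 		hypothetical. It checks only CSeq, as this paragraph describes; full matching of
 		responses to requests also uses the Via header field, see
 		<xref linkend="sip_transactions"/>.
 	    </simpara>
 	    <programlisting>
<![CDATA[
def parse_cseq(value):
    """Split a CSeq value such as '63629 REGISTER' into (number, method)."""
    number, method = value.split()
    return int(number), method


def response_belongs_to(request_cseq, response_cseq):
    # Both the sequence number and the method have to match.
    return parse_cseq(request_cseq) == parse_cseq(response_cseq)


print(parse_cseq("63629 REGISTER"))                 # (63629, 'REGISTER')
print(response_belongs_to("2 INVITE", "2 INVITE"))  # True
print(response_belongs_to("2 INVITE", "2 CANCEL"))  # False
]]>
 	    </programlisting>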
 	</section>
     </section> 
     <section id="sip_transactions">
 	<title>SIP Transactions</title>
 	<simpara>
 	    Although we said that SIP messages are sent independently over the network, they
 	    are usually arranged into <emphasis>transactions</emphasis> by user agents and
 	    certain types of proxy servers. Therefore SIP is said to be a
 	    <emphasis>transactional protocol</emphasis>.
 	</simpara>
 	<simpara>
 	    A transaction is a sequence of SIP messages exchanged between SIP network
 	    elements. A transaction consists of one request and all responses to that
 	    request. That includes zero or more provisional responses and one or more final
 	    responses (remember that an INVITE might be answered by more than one final response
 	    when a proxy server forks the request).
 	</simpara>
 	<simpara>
 	    If a transaction was initiated by an INVITE request then the same transaction also
 	    includes ACK, but only if the final response was not a 2xx response. If the final
 	    response was a 2xx response then the ACK is not considered part of the transaction.
 	</simpara>
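 	<simpara>
 	    The rule can be written down as a small Python function (the name is
 	    hypothetical), given the status code of the final response to an INVITE.
 	</simpara>
 	<programlisting>
<![CDATA[
def ack_is_part_of_invite_transaction(final_code):
    """Apply the ACK membership rule to the final response of an INVITE."""
    if not 200 <= final_code <= 699:
        raise ValueError("not a final response code: %d" % final_code)
    # 2xx: the ACK is a separate request, outside the INVITE transaction.
    # 3xx-6xx: the ACK belongs to the INVITE transaction.
    return not 200 <= final_code < 300


print(ack_is_part_of_invite_transaction(200))  # False
print(ack_is_part_of_invite_transaction(486))  # True
]]>
 	</programlisting>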
 	<simpara>
 	    As we can see, this is quite asymmetric behavior: ACK is part of transactions with
 	    a negative final response but is not part of transactions with positive final
 	    responses. The reason for this separation is the importance of delivering all 200
 	    OK messages. Not only do they establish a session, but 200 OK can also be
 	    generated by multiple entities when a proxy server forks the request, and all of
 	    them must be delivered to the calling user agent. Therefore user agents take
 	    responsibility in this case and retransmit 200 OK responses until they receive an
 	    ACK. Also note that only responses to INVITE are retransmitted.
 	</simpara>
 	<simpara>
 	    SIP entities that have a notion of transactions are called
 	    <emphasis>stateful</emphasis>. Such entities usually create a state associated with
 	    a transaction that is kept in memory for the duration of the transaction. When a
 	    request or response arrives, a stateful entity tries to associate the request (or
 	    response) with an existing transaction. To do so, it must extract a unique
 	    transaction identifier from the message and compare it to the identifiers of all
 	    existing transactions. If such a transaction exists, its state gets updated
 	    from the message.
 	</simpara>
 	<simpara>
 	    In the previous SIP specification, RFC2543, the transaction identifier was
 	    calculated as a hash of all important message header fields (that included To,
 	    From, Request-URI and CSeq). This proved to be very slow and complex; during
 	    interoperability tests such transaction identifiers used to be a common source of
 	    problems.
 	</simpara>
 	<simpara>
 	    In the new RFC3261 the way of calculating transaction identifiers was completely
 	    changed. Instead of complicated hashing of important header fields, a SIP message
 	    now includes the identifier directly: the branch parameter of the topmost Via
 	    header field contains the transaction identifier. This is a significant
 	    simplification, but there are still old implementations that don't support the new
 	    way of calculating the transaction identifier, so even new implementations have to
 	    support the old way. They must be backwards compatible.
 	</simpara>
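	<simpara>
	    The following minimal Python sketch shows how a stateful server might match
	    incoming requests to transactions. It follows the RFC3261 server-transaction
	    matching rule (branch and sent-by of the topmost Via plus the method, with ACK
	    matching the INVITE transaction) and falls back to a simplified field-based key
	    when the branch lacks the z9hG4bK magic cookie. The dictionary message
	    representation and all names are hypothetical.
	</simpara>
	<programlisting><![CDATA[
```python
MAGIC_COOKIE = "z9hG4bK"  # prefix of RFC3261-compliant branch values


def transaction_key(msg):
    """Build a server-transaction key from a parsed request.

    msg is a hypothetical dict such as
    {"method": "INVITE", "via": [{"sent_by": "a.com:5060", "branch": "z9hG4bK74bf9"}]}
    with the topmost Via first.
    """
    top_via = msg["via"][0]
    branch = top_via.get("branch", "")
    # An ACK for a non-2xx response matches the INVITE transaction it acknowledges.
    method = "INVITE" if msg["method"] == "ACK" else msg["method"]
    if branch.startswith(MAGIC_COOKIE):
        # RFC3261: branch + sent-by of the topmost Via + method identify the transaction.
        return ("rfc3261", branch, top_via["sent_by"], method)
    # Simplified fallback for RFC2543 peers; RFC3261 section 17.2.3 compares
    # more fields (and treats ACK specially) in its backwards-compatible rule.
    return ("rfc2543", msg["request_uri"], msg["from_tag"], msg["call_id"],
            msg["cseq_num"], top_via["sent_by"], method)


class TransactionTable:
    """In-memory state of a stateful entity, keyed by transaction identifier."""

    def __init__(self):
        self._transactions = {}

    def on_request(self, msg):
        key = transaction_key(msg)
        state = self._transactions.get(key)
        if state is not None:
            state["received"] += 1  # retransmission or ACK updates existing state
            return "matched"
        if msg["method"] == "ACK":
            # An ACK for a 2xx carries a new branch, so it matches nothing here:
            # it is not part of the INVITE transaction and goes to the UA core.
            return "no transaction"
        self._transactions[key] = {"method": msg["method"], "received": 1}
        return "new"


via = {"sent_by": "a.com:5060", "branch": "z9hG4bK74bf9"}
table = TransactionTable()
print(table.on_request({"method": "INVITE", "via": [via]}))  # new
print(table.on_request({"method": "INVITE", "via": [via]}))  # matched (retransmission)
print(table.on_request({"method": "ACK", "via": [via]}))     # matched (ACK for non-2xx)
ack_2xx_via = {"sent_by": "a.com:5060", "branch": "z9hG4bK5c2a1"}
print(table.on_request({"method": "ACK", "via": [ack_2xx_via]}))  # no transaction
```
]]></programlisting>
	<simpara>
	    The last line reflects the asymmetry discussed earlier: an ACK for a 2xx response
	    starts with a fresh branch and is therefore not part of the INVITE transaction.
	</simpara>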
 	<simpara>
	    <xref linkend="transactions"/> shows which messages belong to which transactions
		during a conversation between two user agents.
 	</simpara>
 	<figure id="transactions">
 	    <title>SIP Transactions</title>
 	    <mediaobject>
 		<imageobject>
 		    <imagedata fileref="figures/transaction.png" format="PNG"/>
 		</imageobject>
 		<textobject>
 		    <phrase>Message flow showing messages belonging to the same transaction.</phrase>
 		</textobject>
 	    </mediaobject>
 	</figure>
     </section>
     <section id="sip_dialogs">
 	<title>SIP Dialogs</title>
 	<simpara>
	    We have shown what transactions are: one transaction includes an INVITE and its
	    responses, and another transaction includes a BYE and its responses when the
	    session is torn down. Intuitively, these two transactions should be related, and
	    indeed both of them belong to the same <emphasis>dialog</emphasis>. A dialog
	    represents a peer-to-peer SIP relationship between two user agents. A dialog
	    persists for some time and is a very important concept for user agents. Dialogs
	    facilitate proper sequencing and routing of messages between SIP endpoints.
 	</simpara>
 	<simpara>
	    Dialogs are identified using the Call-ID, the From tag, and the To
	    tag. Messages that have the same values of these three identifiers belong
	    to the same dialog. We have shown that the CSeq header field is used to
	    order messages; in fact, it is used to order requests within a dialog. The
	    number must be increased for each new request sent within a dialog,
	    otherwise the peer will treat the request as out of order or as a
	    retransmission. Because a transaction consists of a request and its
	    associated responses, the CSeq number also identifies a transaction within
	    a dialog. RFC3261 additionally forbids a user agent to start a new INVITE
	    transaction within a dialog while another INVITE transaction is in progress
	    in either direction. One could also say that a <emphasis>dialog is a sequence of
	    transactions</emphasis>. <xref linkend="dialog"/> extends <xref
	    linkend="transactions"/> to show which messages belong to the
	    same dialog.
 	</simpara>
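	<simpara>
	    A minimal Python sketch of the dialog state described above. It assumes the
	    RFC3261 convention that each side stores the dialog identifier as Call-ID, local
	    tag, and remote tag, so the caller's local tag is the From tag and the callee's
	    local tag is the To tag. The class and method names are hypothetical.
	</simpara>
	<programlisting><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dialog:
    call_id: str
    local_tag: str
    remote_tag: str
    local_cseq: int                    # last CSeq number we sent
    remote_cseq: Optional[int] = None  # last CSeq number received from the peer

    @property
    def dialog_id(self):
        return (self.call_id, self.local_tag, self.remote_tag)

    def next_local_cseq(self):
        """CSeq number for the next request we send: one more than the last."""
        self.local_cseq += 1
        return self.local_cseq

    def accept_remote_cseq(self, cseq):
        """Return True if a new request from the peer is in order.

        A lower number is out of order (RFC3261 rejects it with 500); an equal
        number would be a retransmission, which the transaction layer should
        already have absorbed."""
        if self.remote_cseq is not None and cseq <= self.remote_cseq:
            return False
        self.remote_cseq = cseq
        return True


# The same dialog as seen by each side: local and remote tags are swapped.
caller = Dialog("a84b4c76e66710", local_tag="1928301774", remote_tag="a6c85cf", local_cseq=1)
callee = Dialog("a84b4c76e66710", local_tag="a6c85cf", remote_tag="1928301774",
                local_cseq=100, remote_cseq=1)
bye_cseq = caller.next_local_cseq()          # 2, the BYE follows the INVITE (CSeq 1)
print(callee.accept_remote_cseq(bye_cseq))   # True
print(callee.accept_remote_cseq(1))          # False, out of order
```
]]></programlisting>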
 	<figure id="dialog">
 	    <title>SIP Dialog</title>
 	    <mediaobject>
 		<imageobject>
 		    <imagedata fileref="figures/dialog.png" format="PNG"/>
 		</imageobject>
 		<textobject>
 		    <phrase>Message flow showing transactions belonging to the same dialog.</phrase>
 		</textobject>
 	    </mediaobject>
 	</figure>
 	<simpara>
	    Some messages establish a dialog and some do not. This makes it possible to
	    express the relationship between messages explicitly, and also to send messages
	    that are unrelated to other messages outside of any dialog. Such messages are
	    easier to implement because user agents do not have to keep dialog state for them.
 	</simpara>
 	<simpara>
	    For instance, an INVITE request establishes a dialog, because it will later be
	    followed by a BYE request that tears down the session established by the INVITE.
	    This BYE is sent within the dialog established by the INVITE.
 	</simpara>
 	<simpara>
	    In contrast, a MESSAGE request sent by a user agent does not establish any
	    dialog. Any subsequent messages (even other MESSAGE requests) are sent
	    independently of the previous one.
 	</simpara>
 	<section id="dialogs_facilitate_routing">
 	    <title>Dialogs Facilitate Routing</title>
 	    <simpara>
		We have said that dialogs are also used to route messages between user
		agents; let us describe this in more detail.
 	    </simpara>
 	    <simpara>
		Suppose that user sip:bob@a.com wants to talk to user sip:pete@b.com. He knows
		the SIP address of the callee (sip:pete@b.com), but this address says nothing
		about the current location of the user; that is, the caller does not know to
		which host to send the request. Therefore the INVITE request is sent to a proxy
		server.
 	    </simpara>
 	    <simpara>
		The request is forwarded from proxy to proxy until it reaches one that knows the
		current location of the callee. This process is called routing. Once the request
		reaches the callee, the callee's user agent creates a response that is sent back
		to the caller. The callee's user agent also puts a Contact header field into the
		response, which contains the current location of the user. The original request
		also contained a Contact header field, which means that both user agents now know
		the current location of the peer.
 	    </simpara>
 	    <simpara>
		Because the user agents know each other's location, it is not necessary to send
		further requests to any proxy; they can be sent directly from user agent to user
		agent. That is exactly how dialogs facilitate routing.
 	    </simpara>
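	    <simpara>
		The following minimal Python sketch shows how a user agent might extract the
		peer's location from a Contact header field value and remember it as the
		destination (the remote target) of further requests in the dialog. The regular
		expression handles only simple Contact forms such as those in the example; it is
		an assumption for illustration, not a complete RFC3261 Contact parser, and the
		function name is hypothetical.
	    </simpara>
	    <programlisting><![CDATA[
```python
import re

# Matches "<sip:...>" (optionally preceded by a display name) or a bare
# "sip:..." URI. Deliberately simplified: no quoted '<', no comma lists.
_CONTACT_RE = re.compile(r'^\s*(?:[^<]*<(?P<bracketed>[^>]+)>|(?P<bare>[^;,\s]+))')


def remote_target(contact_value):
    """Extract the URI from a Contact header field value."""
    m = _CONTACT_RE.match(contact_value)
    if m is None:
        raise ValueError("unparsable Contact: %r" % contact_value)
    return m.group("bracketed") or m.group("bare")


# The caller learns the callee's location from the Contact in the 200 OK,
# and the callee learns the caller's location from the Contact in the INVITE.
print(remote_target('"Pete" <sip:pete@192.0.2.4:5060>'))  # sip:pete@192.0.2.4:5060
print(remote_target("sip:bob@198.51.100.7"))               # sip:bob@198.51.100.7
```
]]></programlisting>
	    <simpara>
		Later requests in the dialog, such as BYE, use this URI as their Request-URI and,
		unless a proxy asked to stay on the path, are sent straight to that host.
	    </simpara>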
 	    <simpara>
		Further messages within a dialog are sent directly from user agent to user
		agent. This is a significant performance improvement because proxies do not see
		all the messages within a dialog; they are used to route just the first request,
		which establishes the dialog. The direct messages are also delivered with much
		lower latency, because a typical proxy implements complex routing
		logic. <xref linkend="trapezoid"/> contains an example of a message
		within a dialog (BYE) that bypasses the proxies.
 	    </simpara>
 	    <figure id="trapezoid">
 		<title>SIP Trapezoid</title>
 		<mediaobject>
 		    <imageobject>
 			<imagedata fileref="figures/trapezoid.png" format="PNG"/>
 		    </imageobject>
 		    <textobject>
 			<phrase>Message flow showing SIP trapezoid.</phrase>
 		    </textobject>
 		</mediaobject>
 	    </figure>
 	</section>
 	<section id="dialogs_identifiers">
 	    <title>Dialog Identifiers</title>
 	    <simpara>
		We have already shown that a dialog identifier consists of three parts: the
		Call-ID, the From tag, and the To tag. It is less clear why dialog identifiers
		are created exactly this way and who contributes which part.
 	    </simpara>
 	    <simpara>
		The Call-ID is the so-called <emphasis>call identifier</emphasis>. It must be a
		unique string that identifies a call. A call consists of one or more dialogs:
		multiple user agents may respond to a request when a proxy along the path forks
		the request, and each user agent that sends a 2xx response establishes a separate
		dialog with the caller. All such dialogs are part of the same call and have the
		same Call-ID.
 	    </simpara>
 	    <simpara>
		The From tag is generated by the caller and uniquely identifies the dialog in
		the caller's user agent.
 	    </simpara>
 	    <simpara>
		The To tag is generated by the callee and, just like the From tag, uniquely
		identifies the dialog in the callee's user agent.
 	    </simpara>
 	    <simpara>
		This hierarchical dialog identifier is necessary because a single call
		invitation can create several dialogs, and the caller must be able to
		distinguish them.
 	    </simpara>
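	    <simpara>
		A minimal Python sketch of how a caller might keep the dialogs created by a
		forked INVITE apart. Every 2xx response carries the same Call-ID and From tag but
		a different To tag, so keying dialogs on all three values separates them, while
		grouping by Call-ID still yields the call. The response tuples, tag values, and
		function name are hypothetical, and the sketch ignores early dialogs created by
		provisional responses that carry a To tag.
	    </simpara>
	    <programlisting><![CDATA[
```python
from collections import defaultdict


def dialogs_from_responses(responses):
    """Group responses to one INVITE into calls and dialogs.

    responses: iterable of (status, call_id, from_tag, to_tag) tuples.
    Returns a dict mapping each Call-ID to the set of dialog ids in that call."""
    calls = defaultdict(set)
    for status, call_id, from_tag, to_tag in responses:
        if 200 <= status < 300:  # in this sketch only a 2xx establishes a dialog
            calls[call_id].add((call_id, from_tag, to_tag))
    return dict(calls)


# A proxy forked the INVITE to two phones and both answered.
responses = [
    (180, "a84b4c76e66710", "1928301774", "deskphone"),
    (200, "a84b4c76e66710", "1928301774", "deskphone"),
    (200, "a84b4c76e66710", "1928301774", "softphone"),
]
calls = dialogs_from_responses(responses)
print(len(calls))                    # 1 call
print(len(calls["a84b4c76e66710"]))  # 2 dialogs
```
]]></programlisting>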
 	</section>
     </section>
     <section id="typical_sip_scenarios">
 	<title>Typical SIP Scenarios</title>
 	<simpara>
	    This section gives a brief overview of typical SIP scenarios that make up most
	    SIP traffic.
 	</simpara>
 	<section id="registration">
 	    <title>Registration</title>
 	    <simpara>
		Users must register with a registrar to be reachable by other users. A
		registration comprises a REGISTER request followed by a 200 OK sent by the
		registrar if the registration was successful. Registrations are usually
		authenticated, so a 401 Unauthorized or 407 Proxy Authentication Required reply
		can appear if the user did not provide valid credentials. <xref
		linkend="register_fig"/> shows an example of registration.
 	    </simpara>
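	    <simpara>
		A minimal Python sketch of the binding table a registrar keeps. It assumes that
		each REGISTER carries one Contact and an Expires value in seconds, and that an
		Expires value of 0 removes the binding. Authentication is omitted, and the class
		and method names are hypothetical. A binding that is not refreshed in time
		simply expires.
	    </simpara>
	    <programlisting><![CDATA[
```python
class Registrar:
    """In-memory location service: address-of-record -> {contact: expiry time}."""

    def __init__(self):
        self._bindings = {}

    def register(self, aor, contact, expires, now):
        """Handle a REGISTER; expires == 0 removes the binding."""
        contacts = self._bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = now + expires
        return 200  # status code the registrar would answer with

    def lookup(self, aor, now):
        """Return the contacts still valid for aor at time now."""
        contacts = self._bindings.get(aor, {})
        return sorted(c for c, until in contacts.items() if until > now)


reg = Registrar()
reg.register("sip:pete@b.com", "sip:pete@192.0.2.4", expires=3600, now=0)
print(reg.lookup("sip:pete@b.com", now=1800))  # ['sip:pete@192.0.2.4']
print(reg.lookup("sip:pete@b.com", now=3601))  # [] (registration not refreshed)
```
]]></programlisting>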
 	    <figure id="register_fig">
 		<title>REGISTER Message Flow</title>
 		<mediaobject>
 		    <imageobject>
 			<imagedata fileref="figures/register.png" format="PNG"/>
 		    </imageobject>
 		    <textobject>
 			<phrase>Message flow of a registration.</phrase>
 		    </textobject>
 		</mediaobject>
 	    </figure>
 	</section>
 	<section id="session_invitation">
 	    <title>Session Invitation</title>
 	    <simpara>
		A session invitation consists of one INVITE request, which is usually sent to a
		proxy. The proxy immediately sends a 100 Trying reply to stop retransmissions of
		the INVITE and forwards the request further.
 	    </simpara>
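	    <simpara>
		To make the message concrete, here is a minimal Python sketch that composes an
		INVITE with the header fields RFC3261 requires in every request (Via,
		Max-Forwards, To, From, Call-ID, CSeq) plus Contact, which a request that can
		establish a dialog must carry. The host address, port, and the function name
		build_invite are hypothetical, and the request carries no SDP body.
	    </simpara>
	    <programlisting><![CDATA[
```python
import secrets


def build_invite(caller, callee, local_host, local_port=5060):
    """Compose a minimal INVITE from caller (e.g. sip:bob@a.com) to callee."""
    branch = "z9hG4bK" + secrets.token_hex(8)  # RFC3261 magic cookie + unique part
    from_tag = secrets.token_hex(4)            # caller's part of the dialog id
    call_id = secrets.token_hex(8) + "@" + local_host
    user = caller.split(":", 1)[1].split("@", 1)[0]
    lines = [
        "INVITE %s SIP/2.0" % callee,          # Request-URI: the callee's address
        "Via: SIP/2.0/UDP %s:%d;branch=%s" % (local_host, local_port, branch),
        "Max-Forwards: 70",
        "To: <%s>" % callee,                   # no To tag until the callee answers
        "From: <%s>;tag=%s" % (caller, from_tag),
        "Call-ID: %s" % call_id,
        "CSeq: 1 INVITE",
        "Contact: <sip:%s@%s:%d>" % (user, local_host, local_port),
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"  # an empty line ends the header section


print(build_invite("sip:bob@a.com", "sip:pete@b.com", "198.51.100.7"))
```
]]></programlisting>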
 	    <simpara>
		All provisional responses generated by the callee are sent back to the caller;
		see the 180 Ringing response in <xref linkend="invite1"/>. This response is
		generated when the callee's phone starts ringing.
 	    </simpara>
 	    <figure id="invite1">
 		<title>INVITE Message Flow</title>
 		<mediaobject>
 		    <imageobject>
 			<imagedata fileref="figures/invite1.png" format="PNG"/>
 		    </imageobject>
 		    <textobject>
 			<phrase>Picture showing a session invitation.</phrase>
 		    </textobject>
 		</mediaobject>
 	    </figure>
 	    <simpara>
		A 200 OK is generated once the callee picks up the phone, and the callee's user
		agent retransmits it until it receives an ACK from the caller. The session is
		established at this point.
 	    </simpara>
 	</section>
 	<section id="session_termination">
 	    <title>Session Termination</title>
 	    <simpara>
		Session termination is accomplished by sending a BYE request within the dialog
		established by the INVITE. BYE requests are sent directly from one user agent to
		the other unless a proxy on the path of the INVITE request indicated that it
		wishes to stay on the path by using record routing (see <xref
		    linkend="record_routing"/>).
 	    </simpara>
 	    <simpara>
		The party wishing to tear down a session sends a BYE request to the other party
		involved in the session. The other party sends a 200 OK response to confirm the
		BYE, and the session is terminated. See the left message flow in <xref
		    linkend="bye"/>.
 	    </simpara>
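	    <simpara>
		A minimal Python sketch of such a BYE. It assumes that the caller stored its
		dialog state (Call-ID, tags, the peer's Contact URI as the remote target, and its
		last CSeq number) when the session was set up. The dict-based dialog state, the
		branch value, and the function name build_bye are hypothetical; no Route header
		field is added because the sketch assumes that no proxy used record routing.
	    </simpara>
	    <programlisting><![CDATA[
```python
def build_bye(dialog, via_host, branch):
    """Compose a BYE within an existing dialog (no record routing assumed).

    dialog is a hypothetical dict holding the caller-side dialog state."""
    dialog["local_cseq"] += 1  # a new request in the dialog gets a higher CSeq
    lines = [
        "BYE %s SIP/2.0" % dialog["remote_target"],            # the peer's Contact
        "Via: SIP/2.0/UDP %s;branch=%s" % (via_host, branch),  # new transaction
        "Max-Forwards: 70",
        "To: <%s>;tag=%s" % (dialog["remote_uri"], dialog["remote_tag"]),
        "From: <%s>;tag=%s" % (dialog["local_uri"], dialog["local_tag"]),
        "Call-ID: %s" % dialog["call_id"],
        "CSeq: %d BYE" % dialog["local_cseq"],
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


dialog = {
    "call_id": "a84b4c76e66710@198.51.100.7",
    "local_uri": "sip:bob@a.com", "local_tag": "1928301774",
    "remote_uri": "sip:pete@b.com", "remote_tag": "a6c85cf",
    "remote_target": "sip:pete@192.0.2.4:5060",
    "local_cseq": 1,  # the INVITE used CSeq 1
}
print(build_bye(dialog, "198.51.100.7:5060", "z9hG4bKnashds7"))
```
]]></programlisting>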
 	</section>
 	<section id="record_routing">
 	    <title>Record Routing</title>
 	    <simpara>
		By default, all requests within a dialog are sent directly from one user agent
		to the other. Only requests outside a dialog, such as the initial INVITE,
		traverse SIP proxies. This approach makes SIP networks more scalable because only
		a small fraction of SIP messages reach the proxies.
 	    </simpara>
 	    <simpara>
		There are certain situations in which a SIP proxy needs to stay on the path of
		all further messages. For instance, proxies controlling a NAT box or proxies
		doing accounting need to stay on the path of BYE requests.
 	    </simpara>
 	    <simpara>
		The mechanism by which a proxy can inform user agents that it wishes to stay on
		the path of all further messages is called <emphasis>record routing</emphasis>.
		Such a proxy inserts into the SIP message a Record-Route header field containing
		its own address. Messages sent within the dialog will then traverse all SIP
		proxies that put a Record-Route header field into the message.
 	    </simpara>
 	    <simpara>
		The recipient of the request receives a set of Record-Route header fields in the
		message. It must copy all the Record-Route header fields into its responses,
		preserving their order, because the originator of the request also needs to know
		the set of proxies.
 	    </simpara>
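	    <simpara>
		A minimal Python sketch of how both user agents might turn these Record-Route
		header fields into the list of proxies (the route set) used for requests within
		the dialog. Following RFC3261, the callee takes the Record-Route values in the
		order they appear, while the caller reverses them: every proxy inserts its
		Record-Route above the existing ones, so the proxy nearest the caller ends up at
		the bottom. The function name and proxy addresses are hypothetical.
	    </simpara>
	    <programlisting><![CDATA[
```python
def route_set(record_route, is_caller):
    """Build the dialog route set from Record-Route URIs (topmost first)."""
    return list(reversed(record_route)) if is_caller else list(record_route)


# INVITE went bob -> p1.a.com -> p2.b.com -> pete, and both proxies record-routed.
# p2 inserted its Record-Route above p1's, so the header list (top first) is:
record_route = ["sip:p2.b.com;lr", "sip:p1.a.com;lr"]
print(route_set(record_route, is_caller=True))   # ['sip:p1.a.com;lr', 'sip:p2.b.com;lr']
print(route_set(record_route, is_caller=False))  # ['sip:p2.b.com;lr', 'sip:p1.a.com;lr']
```
]]></programlisting>
	<simpara>
	    In both cases the first entry is the proxy nearest to the sender, which is where
	    an in-dialog request such as BYE goes first.
	</simpara>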
 	    <figure id="bye">
 		<title>BYE Message Flow (With and without Record Routing)</title>
 		<mediaobject>
 		    <imageobject>
 			<imagedata fileref="figures/bye.png" format="PNG"/>
 		    </imageobject>
 		    <textobject>
 			<phrase>Picture showing BYE message flow with and without record routing.</phrase>
 		    </textobject>
 		</mediaobject>
 	    </figure>
 	    <simpara>
		The left message flow of <xref linkend="bye"/> shows how a BYE (a request
		    within the dialog established by the INVITE) is sent directly to the other
		    user agent when there is no Record-Route header field in the message. The
		    right message flow shows how the situation changes when the proxy puts a
		    Record-Route header field into the message.
 	    </simpara>
 	    <section id="strict_vs_loose">
 		<title>Strict versus Loose Routing</title>
 		<simpara>
		    The way record routing works has evolved. Record routing according to
		    RFC2543 rewrote the Request-URI. That means the Request-URI always
		    contained the URI of the next hop, which can be either the next proxy server
		    that inserted a Record-Route header field or the destination user agent.
		    Because of that, it was necessary to save the original Request-URI as the
		    last Route header field. This approach is called <emphasis>strict routing</emphasis>.
 		</simpara>
 		<simpara>
		    <emphasis>Loose routing</emphasis>, as specified in RFC3261, works
		    differently. The Request-URI is no longer overwritten; it always contains
		    the URI of the destination user agent. If there are any Route header fields
		    in a message, the message is sent to the URI from the topmost Route header
		    field. This is a significant change: the Request-URI does not necessarily
		    contain the URI to which the request will be sent. In fact, loose routing is
		    very similar to IP source routing. A proxy that supports loose routing marks
		    the URI it puts into Record-Route with the lr parameter.
 		</simpara>
 		<simpara>
		    Because the transition from strict routing to loose routing would otherwise
		    break backwards compatibility and older user agents would stop working, loose
		    routing had to be made backwards compatible. Unfortunately, this backwards
		    compatibility adds a lot of overhead and is often the source of major problems.
 		</simpara>
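		<simpara>
		    The difference between the two approaches, including the backwards-compatible
		    handling of a strict router, can be sketched in Python following the rules of
		    RFC3261 section 12.2.1.1: a route set whose first URI carries the lr parameter
		    is handled by loose routing, otherwise the request is built for a strict
		    router. The simple string test for lr and the function names are assumptions
		    for illustration.
		</simpara>
		<programlisting><![CDATA[
```python
def is_loose(uri):
    """Simplified check for the ";lr" URI parameter of a bare SIP URI."""
    params = uri.split(";")[1:]
    return any(p == "lr" or p.startswith("lr=") for p in params)


def build_request_target(remote_target, route_set):
    """Return (request_uri, route_headers, next_hop) for a request in a dialog."""
    if not route_set:
        # No record routing: send straight to the peer.
        return remote_target, [], remote_target
    if is_loose(route_set[0]):
        # Loose routing: the Request-URI keeps the final destination and the
        # request goes to the topmost Route.
        return remote_target, list(route_set), route_set[0]
    # Strict routing (RFC2543 peer): the next hop goes into the Request-URI and
    # the real destination is saved as the last Route header field.
    # RFC3261 also strips parameters not allowed in a Request-URI; omitted here.
    request_uri = route_set[0]
    return request_uri, list(route_set[1:]) + [remote_target], request_uri


target = "sip:pete@192.0.2.4:5060"
print(build_request_target(target, ["sip:p1.a.com;lr", "sip:p2.b.com;lr"]))
# ('sip:pete@192.0.2.4:5060', ['sip:p1.a.com;lr', 'sip:p2.b.com;lr'], 'sip:p1.a.com;lr')
print(build_request_target(target, ["sip:p1.a.com", "sip:p2.b.com;lr"]))
# ('sip:p1.a.com', ['sip:p2.b.com;lr', 'sip:pete@192.0.2.4:5060'], 'sip:p1.a.com')
```
]]></programlisting>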
 	    </section>
 	</section>
 	<section id="sub_not">
 	    <title>Event Subscription And Notification</title>
 	    <simpara>
		The SIP specification has been extended (RFC3265) to support a general mechanism
		allowing subscription to asynchronous events. Such events can include SIP proxy
		statistics changes, presence information, session changes, and so on.
 	    </simpara>
 	    <simpara>
		The mechanism is used mainly to convey information about the presence
		(willingness to communicate) of users. <xref linkend="event"/> shows the basic
		    message flow.
 	    </simpara>
 	    <figure id="event">
 		<title>Event Subscription And Notification</title>
 		<mediaobject>
 		    <imageobject>
 			<imagedata fileref="figures/event.png" format="PNG"/>
 		    </imageobject>
 		    <textobject>
 			<phrase>Picture showing subscription and notification.</phrase>
 		    </textobject>
 		</mediaobject>
 	    </figure>
 	    <simpara>
		A user agent interested in event notification sends a SUBSCRIBE request to a
		SIP server. The SUBSCRIBE request establishes a dialog and is immediately
		answered by the server with a 200 OK response. At this point the dialog is
		established. The server sends a NOTIFY request to the user every time the state
		to which the user subscribed changes. NOTIFY requests are sent within the dialog
		established by the SUBSCRIBE.
 	    </simpara>
 	    <simpara>
		Note that the first NOTIFY message in <xref linkend="event"/> is sent
		    immediately after the subscription is accepted, regardless of whether any
		    event has occurred; it carries the current state.
 	    </simpara>
 	    <simpara>
		Subscriptions, like registrations, have a limited lifespan and therefore must be
		periodically refreshed.
 	    </simpara>
 	</section>
 	<section id="im">
 	    <title>Instant Messages</title>
 	    <simpara>
		Instant messages are sent using MESSAGE requests. MESSAGE requests do not
		establish a dialog, and therefore each of them is routed through the SIP proxies
		like any other request sent outside a dialog. This is the simplest form of
		sending instant messages. The text of the instant message is transported in the
		body of the SIP request.
 	    </simpara>
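	    <simpara>
		A minimal Python sketch of composing a MESSAGE request that carries the text as a
		text/plain body. Content-Length counts octets of the body, not characters, which
		matters for non-ASCII text. The addresses, tag, Call-ID, branch values, and the
		function name build_message are hypothetical.
	    </simpara>
	    <programlisting><![CDATA[
```python
def build_message(sender, recipient, text, call_id, from_tag, branch, via_host):
    """Compose a MESSAGE request carrying text as a text/plain body (as bytes)."""
    body = text.encode("utf-8")
    headers = [
        "MESSAGE %s SIP/2.0" % recipient,
        "Via: SIP/2.0/UDP %s;branch=%s" % (via_host, branch),
        "Max-Forwards: 70",
        "To: <%s>" % recipient,
        "From: <%s>;tag=%s" % (sender, from_tag),
        "Call-ID: %s" % call_id,
        "CSeq: 1 MESSAGE",
        "Content-Type: text/plain; charset=UTF-8",
        "Content-Length: %d" % len(body),  # octets, not characters
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + body


msg = build_message("sip:bob@a.com", "sip:pete@b.com", "Grüße, Pete!",
                    call_id="asd88asd77a@198.51.100.7", from_tag="49583",
                    branch="z9hG4bK776sgdkse", via_host="198.51.100.7:5060")
print(msg.decode("utf-8"))  # 12 characters of text, but Content-Length: 14
```
]]></programlisting>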
 	    <figure id="message">
 		<title>Instant Messages</title>
 		<mediaobject>
 		    <imageobject>
 			<imagedata fileref="figures/message.png" format="PNG"/>
 		    </imageobject>
 		    <textobject>
 			<phrase>Picture showing a MESSAGE.</phrase>
 		    </textobject>
 		</mediaobject>
 	    </figure>
 	</section>
     </section>
 </section>