How to make a SIP-based VoIP call?

You can use the VideoIO component to build the front end of a VoIP phone that talks to a backend server capable of translating between RTMP and SIP. In particular, it works with the API defined by the open source SIP-RTMP gateway (siprtmp). The siprtmp module is part of the open source Flash RTMP server in Python (rtmplite) described in How to work with media server?. The siprtmp API is documented in the siprtmp.py source code. If another backend server can perform a similar translation between RTMP and SIP, you can use VideoIO with that server as well. Additionally, you need a SIP user agent and a SIP proxy and registration server. This article uses the same example setup as described on the siprtmp web page.

This article requires version 1.2 or later of VideoIO.swf that includes support for methods and events.

Download and install additional software

The first step is to download the latest source code of p2p-sip and rtmplite, if you have not already done so. The p2p-sip software contains the SIP server used in this demonstration, which you can replace with any other SIP server of your choice. The rtmplite software contains the siprtmp.py module for the SIP-RTMP gateway.

On the first terminal, download p2p-sip and run its sipd.py module as follows. This runs sipd on port 5060 by default.

$ git clone https://github.com/theintencity/p2p-sip.git
$ cd p2p-sip/src
$ PYTHONPATH=app:external:.
$ export PYTHONPATH
$ python app/sipd.py -d

On the second terminal, download rtmplite and run its siprtmp module as follows. This runs siprtmp on port 1935 by default.

$ git clone https://github.com/theintencity/rtmplite.git
$ cd rtmplite
$ PYTHONPATH=../p2p-sip/src:.
$ export PYTHONPATH
$ python siprtmp.py -d

Install a SIP user agent such as X-lite and register a user account with the sipd server. In my example, I run X-lite on the local host for testing with the following configuration: user name "kunxlite", password "mypass", authorization user name "kunxlite", and domain "localhost". Check the option to register with the domain and receive incoming calls, set the proxy to "127.0.0.1:5060", and uncheck voice mail. Set the IP address option to use the local IP address, set the STUN server option to use a specified server but leave the server field empty, and set the presence mode to peer-to-peer. These settings are needed to test X-lite in local host mode and to register with the local server.

Creating and embedding web phone

The last piece of software in this illustration is the web phone embedded in this page. It is just a VideoIO application that connects to the siprtmp gateway to register the user and to make and receive SIP calls. You can view the source of this page to see how Javascript and HTML are used to embed VideoIO for the web phone and how the various properties, methods and indications are handled.

The VideoIO application itself has a white background and hence does not show any user interface of its own. You can right-click on the empty area below to see the VideoIO application properties. The controls property is not set either, so there is no VideoIO control panel. All of the user interface is built with HTML and Javascript.

The following example shows how to embed the VideoIO application with a white background, suitable for a web phone use case. The object identifier and name are both phone1. Since the phone does not need video, we set the dimensions to 215x138, which is the minimum size needed by Flash Player to display the device access security prompt.

<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
    id="phone1" width="215" height="138"
    codebase="http://fpdownload.macromedia.com/get/flashplayer/current/swflash.cab">
    <param name="movie" value="VideoIO.swf" />
    <param name="quality" value="high" />
    <param name="bgcolor" value="#ffffff" />
    <param name="wmode" value="transparent" />
    <param name="allowScriptAccess" value="always" />
    <embed src="VideoIO.swf" quality="high" bgcolor="#ffffff"
        width="215" height="138" name="phone1" align="middle"
        play="true" loop="false" wmode="transparent"
        allowScriptAccess="always"
        type="application/x-shockwave-flash"
        pluginspage="http://www.adobe.com/go/getflashplayer">
    </embed>
</object>

The following Javascript code defines the getFlashMovie function and two callback functions, onCreationComplete and onPropertyChange, with empty bodies. VideoIO invokes these callbacks when the application is created and when a property changes, respectively.

<script>
function getFlashMovie(movieName) {
    var isIE = navigator.appName.indexOf("Microsoft") != -1;
    return (isIE) ? window[movieName] : document[movieName];  
}
function onCreationComplete(event) {  } // nothing
function onPropertyChange(event) {  } // nothing
</script>
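
If your page sets properties as soon as it loads, the VideoIO application may not have been created yet. The following is a minimal sketch, not part of VideoIO, that defers such work until onCreationComplete is invoked. The createReadyQueue helper and its whenReady and markReady methods are hypothetical names introduced only for this illustration.

```javascript
// Sketch: defer property changes until VideoIO reports that it has been created.
// createReadyQueue, whenReady and markReady are hypothetical helper names.
function createReadyQueue(lookupMovie) {
    var ready = false;
    var pending = [];
    return {
        // Run fn(movie) now if the movie is ready, otherwise remember it.
        whenReady: function (fn) {
            if (ready) {
                fn(lookupMovie());
            } else {
                pending.push(fn);
            }
        },
        // Call this from onCreationComplete to flush the remembered work in order.
        markReady: function () {
            ready = true;
            var movie = lookupMovie();
            while (pending.length > 0) {
                pending.shift()(movie);
            }
        }
    };
}

// Usage in the page (browser only), assuming the object is named phone1:
//   var phoneQueue = createReadyQueue(function () { return getFlashMovie('phone1'); });
//   function onCreationComplete(event) { phoneQueue.markReady(); }
//   phoneQueue.whenReady(function (movie) { movie.setProperty('controls', false); });
```

The lookup function is passed in rather than hard-coded so that the same queue can be exercised with a stand-in object outside the browser.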

User registration or login

The first step for a user is to register with, or log in to, the SIP server using the web phone. This is done by setting the src property, which happens when you click the set button. To connect to siprtmp and register, the value of the src property should be as follows.

rtmp://siprtmp-server/sip/username@domain
  ?rate=8&bidirection=true&arg=authuser&arg=authpass&arg=Kundan Singh&arg=narrowband

Here siprtmp-server is the host name or IP address of the machine that runs the siprtmp gateway and provides the sip application. The username@domain is the SIP address-of-record with which you want to register. Generally, the domain part is your SIP server's host name or IP address. The rate and bidirection parameters are supplied to the VideoIO application itself: rate=8 enables interoperability with 8 kHz Speex end-points, and bidirection=true is needed so that VideoIO can enable both play and publish. Additionally, the siprtmp gateway's API requires authentication and media parameters to be passed to the underlying NetConnection's connect method. These are the four arg values, in order: authentication username, authentication password, display name, and a rate name of either narrowband or wideband. With rate=8, the last argument must be narrowband.

Once you set the src property to this value, VideoIO attempts to connect to the siprtmp gateway, which in turn performs SIP registration with the SIP server on your behalf. Unregistration is done by resetting the property to null, which happens when you click the reset button. This also disconnects the VideoIO application from the siprtmp gateway.

The following Javascript code shows how to build the src value from its individual parts and set the property:

<script>
var server = "localhost";
var user = "kunweb@localhost";
var authname = "kunweb";
var authpass = "mypass";
var displayname = "Kundan Singh";
var rate = 8;
var rate_name = "narrowband";
var src_value = "rtmp://" +server+ "/sip/" +user+ "?rate=" +rate+ "&bidirection=true"
      + "&arg=" +authname+ "&arg=" +authpass+ "&arg=" +displayname+ "&arg=" +rate_name;
getFlashMovie('phone1').setProperty('src', src_value);
</script>
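
The same construction can be wrapped in a small helper that also checks the pairing between rate and the rate name. This is only a sketch: buildSipSrc and its option names are hypothetical, and the mapping of rate 16 to wideband is an assumption, since this article only confirms that rate=8 goes with narrowband. The sketch does not escape the values, because whether VideoIO decodes percent-encoded arg values is not stated here. Avoid characters such as & in the values.

```javascript
// Sketch: build the src value used to register through the siprtmp gateway.
// buildSipSrc and its option names are hypothetical; the URL layout follows the article.
function buildSipSrc(options) {
    // Assumption: rate 16 pairs with "wideband"; only 8 -> "narrowband" is confirmed.
    var rateNames = { 8: "narrowband", 16: "wideband" };
    var rateName = rateNames[options.rate];
    if (!rateName) {
        throw new Error("unsupported rate: " + options.rate);
    }
    // The address-of-record must have the username@domain form.
    if (!/^[^@\s]+@[^@\s]+$/.test(options.user)) {
        throw new Error("user must look like username@domain: " + options.user);
    }
    // Order matters: auth user, auth password, display name, rate name.
    var args = [options.authname, options.authpass, options.displayname, rateName];
    return "rtmp://" + options.server + "/sip/" + options.user
        + "?rate=" + options.rate + "&bidirection=true"
        + "&arg=" + args.join("&arg=");
}

// Usage in the page (browser only):
//   getFlashMovie('phone1').setProperty('src', buildSipSrc({
//       server: "localhost", user: "kunweb@localhost", authname: "kunweb",
//       authpass: "mypass", displayname: "Kundan Singh", rate: 8 }));
//   getFlashMovie('phone1').setProperty('src', null); // unregister and disconnect
```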

Call setup and termination

Once the VideoIO application is connected to the siprtmp gateway, you can invoke API calls on the gateway server to initiate, accept, reject or terminate a call. Each connection represents a single SIP user agent on the gateway. The application provides methods for actions and indications for events. For example, methods are available to initiate (invite) and terminate (bye) a call, to accept or reject an incoming call, and to send touch-tone digits (sendDTMF) in a call. Indications are available for an incoming call invitation (invited), its cancellation (cancelled), call termination (byed), and an outgoing call being accepted (accepted) or rejected (rejected). These methods and indications allow you to build your call control state in the application itself using Javascript.

To initiate an outbound call, use the call method of VideoIO as follows.

var dest_value = "sip:kunxlite@localhost";
getFlashMovie('phone1').callProperty('call', 'invite', dest_value);

This requests the siprtmp gateway to send a SIP call invitation to the given destination address. If your X-lite client is registered as sip:kunxlite@localhost, it receives the call, which you can then accept or reject. The result is dispatched to your Javascript application as an indication via the onCallback function, as shown below.

function onCallback(event) {
    if (event.method == "accepted") {
        // call is accepted by destination
    } 
    if (event.method == "rejected") {
        // call is rejected by destination.
        // event.args[0] contains the reason for rejection.
    }
    ...
}

When the call is accepted, you should set the publish and play properties to start sending and receiving media. The siprtmp gateway's API requires the stream names to be local and remote for publish and play, respectively. Since we already set bidirection to true, we can set both publish and play stream names.

    ...
    if (event.method == "accepted") {
        getFlashMovie('phone1').setProperty('publish', 'local');
        getFlashMovie('phone1').setProperty('play', 'remote');
    } 
    ...

To terminate the call from the web side, use the call method again as follows. When terminating the call, you should also reset the publish and play properties.

getFlashMovie('phone1').callProperty('call', 'bye');
getFlashMovie('phone1').setProperty('publish', null);
getFlashMovie('phone1').setProperty('play', null);

Inbound call setup works as follows. From your SIP user agent, call the user kunweb via the SIP server, so that the call is received by your web phone, which registered with this user name. You receive the invited indication in the onCallback Javascript function, which you can use to alert the user to the incoming call. Similarly, if the caller cancels the invitation before the local web user accepts the call, you receive the cancelled indication.

    ...
    if (event.method == "invited") {
        // incoming call invitation received.
        // event.args[0] is caller address, e.g., sip:kunxlite@localhost
        // event.args[1] is callee address, e.g., sip:kunweb@localhost
    } 
    if (event.method == "cancelled") {
        // incoming call invitation is cancelled by caller.
        // event.args[0] is caller address, same as corresponding "invited".
        // event.args[1] is callee address, same as corresponding "invited".
    }
    ...
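
To display an incoming call to the user, you may want a shorter label than the raw address in event.args[0]. The helper below is a minimal sketch with a hypothetical name, formatCaller. It assumes the address is either a bare SIP URI as in the example above, or a display name followed by the URI in angle brackets. The second form is an assumption and is not confirmed by this article.

```javascript
// Sketch: turn a caller address into a short label for the user interface.
// formatCaller is a hypothetical helper, not part of VideoIO or siprtmp.
function formatCaller(address) {
    var text = String(address || "");
    // Use the part inside <...> if present (assumed name-addr form), else the whole string.
    var match = /<([^>]+)>/.exec(text);
    var uri = match ? match[1] : text;
    // Drop the "sip:" or "sips:" scheme and any ;parameters after the host.
    return uri.replace(/^sips?:/i, "").split(";")[0];
}

// Usage inside onCallback (browser only); the element id is hypothetical:
//   if (event.method == "invited") {
//       document.getElementById("status").innerHTML =
//           "Incoming call from " + formatCaller(event.args[0]);
//   }
```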

To accept or reject an incoming call invitation signalled by the invited indication, use the call method of VideoIO as follows.

getFlashMovie('phone1').callProperty('call', 'accept'); // To accept
getFlashMovie('phone1').callProperty('call', 'reject', '486 Busy Here'); // to reject

When rejecting a call, you can specify the SIP reason code as shown above. After accepting the call, the accepted indication is received by your onCallback Javascript function, similar to the outbound call scenario. As mentioned before, on accepted indication, you should set publish to local and play to remote so that your VideoIO's audio is connected with the call.
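
If your application rejects calls for more than one reason, a small lookup keeps the reason strings consistent. This is a sketch only: rejectReason is a hypothetical helper, and the phrases are the standard SIP response phrases. The example above confirms only that '486 Busy Here' is accepted by the gateway. Treat the other codes as assumptions to verify against siprtmp.

```javascript
// Sketch: format the reason argument for call('reject', reason).
// rejectReason is a hypothetical helper; phrases are the standard SIP ones.
var SIP_REJECT_PHRASES = {
    480: "Temporarily Unavailable",
    486: "Busy Here",
    600: "Busy Everywhere",
    603: "Decline"
};

function rejectReason(code) {
    var phrase = SIP_REJECT_PHRASES[code];
    if (!phrase) {
        throw new Error("no reason phrase known for SIP code " + code);
    }
    return code + " " + phrase;
}

// Usage in the page (browser only):
//   getFlashMovie('phone1').callProperty('call', 'reject', rejectReason(486));
```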

When the remote side terminates a call, you receive a byed indication. When the call is closed, you should reset the publish and play properties.

    if (event.method == "byed") {
        getFlashMovie('phone1').setProperty('publish', null);
        getFlashMovie('phone1').setProperty('play', null);
    }
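
The fragments in this section can be combined into one call control object that keeps the call state in Javascript. The following is a minimal sketch under stated assumptions: createCallController and the state names (idle, inviting, ringing, active) are hypothetical, while the method and indication names are the ones described above. It ignores an invited indication that arrives during another call and does not cover cancelling a pending outbound invitation.

```javascript
// Sketch of call control state kept in Javascript.
// createCallController and the state names are hypothetical.
function createCallController(movie) {
    var state = "idle"; // idle -> inviting | ringing -> active -> idle

    function startMedia() {
        movie.setProperty("publish", "local");
        movie.setProperty("play", "remote");
    }
    function stopMedia() {
        movie.setProperty("publish", null);
        movie.setProperty("play", null);
    }

    return {
        getState: function () { return state; },
        invite: function (dest) {
            if (state !== "idle") { return false; }
            movie.callProperty("call", "invite", dest);
            state = "inviting";
            return true;
        },
        accept: function () {
            if (state !== "ringing") { return false; }
            movie.callProperty("call", "accept");
            return true; // media starts on the "accepted" indication
        },
        reject: function (reason) {
            if (state !== "ringing") { return false; }
            movie.callProperty("call", "reject", reason);
            state = "idle";
            return true;
        },
        bye: function () {
            if (state !== "active") { return false; }
            movie.callProperty("call", "bye");
            stopMedia();
            state = "idle";
            return true;
        },
        // Forward every onCallback(event) from VideoIO to this function.
        onCallback: function (event) {
            if (event.method === "invited" && state === "idle") {
                state = "ringing";
            } else if (event.method === "accepted" &&
                       (state === "inviting" || state === "ringing")) {
                startMedia();
                state = "active";
            } else if (event.method === "rejected" && state === "inviting") {
                state = "idle";
            } else if (event.method === "cancelled" && state === "ringing") {
                state = "idle";
            } else if (event.method === "byed" && state === "active") {
                stopMedia();
                state = "idle";
            }
        }
    };
}

// Usage in the page (browser only):
//   var phone = createCallController(getFlashMovie('phone1'));
//   function onCallback(event) { phone.onCallback(event); }
//   phone.invite("sip:kunxlite@localhost");
```

Each user action returns false when it is not valid in the current state, so the HTML buttons can be enabled or disabled from getState() without duplicating the rules.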

Summary

Ideally, an instance of VideoIO should either play or publish a stream at a time, but not both. For example, in a multi-party video conference, your application has multiple instances of VideoIO, one that publishes the local video and others that play the remote participants' videos. Each instance of VideoIO has its own connection to the server.

The existing SIP-RTMP gateway, siprtmp, defines an API where a single connection to the server represents a single SIP user registration and can be in at most one SIP call. The existing API does not allow co-ordination among multiple connections for the same call, where one connection represents the publish stream and another represents the play stream. For this reason, I modified VideoIO so that it can have both play and publish streams in the same connection under this special scenario.

The new property setting, bidirection=true, is needed to enable the modified behavior. However, only one of publish or play can be attached to the locally displayed video object in VideoIO. Since the SIP-RTMP gateway will be used for interoperability with audio-only SIP devices or telephony gateways, I think this limitation is not a problem.

Additionally, I modified VideoIO to allow method invocation and event indication, so that applications can be built on top of the siprtmp API. The goal is to keep the SIP-specific changes in VideoIO to a minimum, and to implement all SIP-specific call control behavior in your Javascript application and the siprtmp gateway.

We suggest you use the existing VideoPhone Flash application that comes with siprtmp to work with the gateway if possible. However, if you need to build your own front end, e.g., a white-labelled web-based phone, you should use the VideoIO application as described in this article.

In the future, I will try to refactor the siprtmp implementation so that it separates signaling from media and allows co-ordination among multiple RTMP connections for the same SIP call.