Motivation: Server Architecture

This page presents the motivation for a server architecture for scalable, robust and interoperable multimedia communication on the web. Instead of inventing new protocols, the architecture uses the existing protocols and technologies that have worked best and have been used most extensively for a given application scenario on the current Internet. This makes the architecture easy to adopt.

Multimedia communication is a topic that covers many dimensions, uses many protocols, and has been approached from several different application scenarios. Earlier protocols such as ITU-T’s H.323 or IETF’s SIP were invented in the mid-1990s and, in practice, catered to voice-over-IP (VoIP) applications trying to replace or work with the traditional public switched telephone network (PSTN). Traditional room-based video conferencing systems use ITU-T’s H.32x series of protocols, whereas SIP has dominated the Internet telephony and VoIP inter-carrier communication space. Internet telephony is closely related to other synchronous communication mechanisms such as presence and instant messaging. The Jabber community’s XMPP is used by the vast majority of presence and instant messaging applications on the Internet. Because of the simplicity and availability of XMPP on web platforms, many recent web-based communication systems also use this protocol. Rich Internet Applications (RIA) based on Adobe’s Flash Player browser plugin dominate the web-video application space. Finally, peer-to-peer Skype dominates PC-to-PC audio communication on the Internet.

There have been several attempts within individual protocol families to import additional features from other protocols to make a family “complete”. Such attempts have largely been unsuccessful so far. For example, the SIP community proposed SIMPLE for instant messaging and presence, but it has not seen wide acceptance. Skype works well for audio communication but still needs to use SIP for PC-to-phone scenarios. The complexity of H.323 has made it difficult for the vast majority of developers outside of giant companies to experiment with new ideas. On the other hand, most recent startups and small companies in the web 2.0 arena want to get applications running quickly, even if that means putting together a quick hack that works, so that they avoid the maintenance and development cost of existing protocols. For example, recent web-based video communication applications use Adobe’s Flash Player and RTMP/RTMFP. These cannot easily interwork with existing VoIP systems without using a SIP gateway, or with existing presence and instant messaging infrastructure without using XMPP. The following table enumerates our views on the strengths and weaknesses of several protocols and technologies.

Protocol | Popular examples | Strength | Weakness | From browser?
HTTP | Apache, Tomcat, Browser | Scalable, robust, RESTful | Server configuration | Yes
RTMP | FMS, Red5, Flash Player | Less development effort | Only TCP available, client-server only | Yes
RTMFP | FMS, Flash Player | P2P possible | Dependency on FMS, P2P not always possible | Yes
XMPP | Ejabberd, Adium, Gtalk | Less development effort | Multimedia, client-server | Yes
SIP | SER, pjsip | VoIP, scalable | NAT and firewall traversal | No
SIMPLE | Ag-projects | Integrates with SIP | Adoption, complexity | No
Jingle | Gtalk | Integrates with XMPP | NAT and firewall traversal | No
H.323 | Netmeeting | Comprehensive | Complexity | No
Skype | Skype | Just works, scalable | Proprietary protocol | No

The main contribution of this project is to simplify the architecture, maintenance and configuration of servers so that it is very easy to set up a web multimedia communication infrastructure. A system that needs coordination and communication among several different servers demands a large maintenance and development effort. This proposal presents a distributed server architecture in a box that supports the various protocols and technologies needed to quickly start a web multimedia communication service.

The servers create a self-organizing farm of nodes that facilitates scalability and robustness of the system. Each server in the farm implements a variety of protocols. Thus we move away from the traditional notion of one server per protocol. Each server communicates with a number of clients as well as with other servers in the farm. A web-based client uses HTTP, XMPP and/or RTMP to communicate with any server. A VoIP client typically uses SIP, and an instant messenger uses XMPP to communicate with the server. The servers use SIP to communicate with each other, since SIP supports a peer-to-peer mode, unlike RTMP or XMPP, which impose a strict client-server architecture.
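The protocol assignment described above can be written down as a small model. This is only an illustrative sketch, not the project's implementation: the names `FarmNode`, `CLIENT_PROTOCOLS`, `accepts` and `join` are hypothetical, and the self-organizing behaviour of the farm is reduced to two nodes recording each other as peers.

```python
from dataclasses import dataclass, field

# Protocols each kind of client uses to reach a server, as listed in the text.
# The client-kind labels ("web", "voip", "im") are hypothetical names.
CLIENT_PROTOCOLS = {
    "web": {"HTTP", "XMPP", "RTMP"},
    "voip": {"SIP"},
    "im": {"XMPP"},
}

# Servers in the farm talk to each other over SIP.
INTER_SERVER_PROTOCOL = "SIP"


@dataclass
class FarmNode:
    """One server in the farm; every node speaks every protocol."""
    name: str
    peers: set = field(default_factory=set)  # names of other nodes in the farm

    def supported_protocols(self):
        # Union of all client-facing protocols plus the inter-server one.
        protocols = {INTER_SERVER_PROTOCOL}
        for used in CLIENT_PROTOCOLS.values():
            protocols |= used
        return protocols

    def accepts(self, client_kind, protocol):
        # True if this kind of client is expected to use this protocol.
        return protocol in CLIENT_PROTOCOLS.get(client_kind, set())

    def join(self, other):
        # Record a symmetric peer link and report the protocol used for it.
        self.peers.add(other.name)
        other.peers.add(self.name)
        return INTER_SERVER_PROTOCOL
```

For example, `FarmNode("a").accepts("web", "RTMP")` is true while `FarmNode("a").accepts("voip", "XMPP")` is false, and joining two nodes yields `"SIP"` as the link protocol. The point of the model is that any single node can serve any client, which is what distinguishes this design from one server per protocol.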

What problem does it solve?

Existing web multimedia communication systems have the limitations discussed above. However, the most important limitation is the complexity of the overall system. Typically, a real implementation has a JavaScript or Flash based client that shows the user interface for the contact list, instant messaging and audio/video conferences. The client connects to multiple backend servers, e.g., a Jabber or XMPP server to handle presence and instant messaging, a web server to augment the system with APIs and to provide additional user information such as profile pictures, and a media server that supports RTMP, such as Flash Media Server (FMS) or Red5, to host the real-time video conference. Finally, there are backend database servers that need to be scaled, and a file system that stores call recordings or video messages and needs to be backed up. The system complexity manifests itself as costs in system maintenance, development and testing. The system administrator has to maintain multiple servers and their redundancy. Many system administrators prefer tested commercial servers, but these carry expensive annual licensing fees as well. The system developers have to incorporate communication among these different servers: either the client talks to each individual server, or one server talks to another. Finally, due to the lack of standards for communication among these different servers or protocols, developers build non-interoperable and proprietary applications.

Now imagine a single server piece that handles all the web multimedia communication requirements, and automatically configures itself into a server farm for scalability and robustness. Also imagine ready-made Flash and JavaScript components for common tasks such as the contact list, instant messaging and the video conference room, which automatically discover and connect to one or more server pieces. The proposed system will facilitate such applications with easy configuration and application development, and the best part is that it will use open and popular standards for each task.