Design Guidelines

This page documents high-level guidelines for the project. Instead of enforcing a single system architecture, students are encouraged to evaluate several alternatives and come up with what works best. Please remember that unlike a commercial product, this is a software research project with more emphasis on innovation, experimentation and learning.

Overall Architecture

There are several alternatives that come to mind [see platform options]. For example, in a client-server architecture, the web browser acts as a client and runs the Flash application that presents the communication GUI to the user. The client talks to a backend server using an available protocol, e.g., HTTP for control and RTMP for media data. The backend server in turn provides the communication API that allows web applications to incorporate multimedia communication within the browser. For scalability and robustness, the servers self-organize into a server farm and interact with each other as well as with the outside world, e.g., SIP trunking to the PSTN for PC-to-phone calls. For further flexibility, a user can run the server application on their own computer, where the local server application connects with other peers' server applications. This allows the system to work in a peer-to-peer infrastructure without managed servers. In the end, the architecture could become a hybrid mix of managed client-server and ad hoc peer-to-peer systems.

  1. Client-server vs peer-to-peer signaling: What is the tradeoff between traditional client-server (e.g., HTTP, SIP, XMPP) and peer-to-peer (e.g., Skype) signaling for VoIP?
  2. Relayed vs end-to-end media: End-to-end media is the basic assumption in real-time interactive multimedia applications. Unfortunately, it is not always possible, because Flash Player lacks a UDP socket and because the network may contain UDP-blocking firewalls or symmetric NATs. What is the tradeoff? When is end-to-end media possible?
  3. NAT and firewall traversal: There are several alternatives in theory, e.g., ICE and HIP. Which works best in practice? Which mechanism is the least intrusive to applications? Which can work with the fewest modifications to other elements? What is the ideal solution?
  4. Platform choice: What are the available platform options? Should we aim for a particular browser or make it browser independent? How does multi-device communication happen?
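
To make the relayed vs end-to-end question concrete, the following is a minimal sketch of a decision rule, not a recommended design. The `Nat` categories, the function name `choose_media_path` and the rule itself are assumptions made for illustration; the rule is deliberately conservative and ignores many real-world cases that a mechanism such as ICE would probe for.

```python
from enum import Enum


class Nat(Enum):
    """Simplified, hypothetical NAT categories; real networks are messier."""
    NONE = "none"            # public address, no NAT
    CONE = "cone"            # same mapping reused for every destination
    SYMMETRIC = "symmetric"  # a new mapping for each destination


def choose_media_path(udp_a: bool, nat_a: Nat, udp_b: bool, nat_b: Nat) -> str:
    """Return "end-to-end" or "relayed" for a call between peers A and B.

    udp_a / udp_b say whether each peer can send and receive UDP at all
    (false for a client without a UDP socket or behind a UDP-blocking firewall).
    """
    # Without UDP on both sides, media has to go through a relay.
    if not (udp_a and udp_b):
        return "relayed"
    # Conservative assumption: a symmetric NAT only works end-to-end
    # when the other peer is directly reachable on a public address.
    for mine, theirs in ((nat_a, nat_b), (nat_b, nat_a)):
        if mine is Nat.SYMMETRIC and theirs is not Nat.NONE:
            return "relayed"
    return "end-to-end"


if __name__ == "__main__":
    print(choose_media_path(True, Nat.CONE, True, Nat.CONE))        # end-to-end
    print(choose_media_path(True, Nat.SYMMETRIC, True, Nat.CONE))   # relayed
    print(choose_media_path(False, Nat.NONE, True, Nat.NONE))       # relayed
```

A rule like this only answers "when is end-to-end media possible" on paper. The tradeoff question (relay cost and added latency versus connectivity) still needs measurement.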

Protocol Choice

Many protocols have been invented in the multimedia communications space, some of which compete with each other [see article]. Many issues have been researched and solved in these protocols. In our web communications project, we plan to use HTTP as the default protocol as much as possible because it just works on the existing Internet without much hassle. Unfortunately, HTTP lacks some key elements crucial for interactive web communication applications. There are extensions and workarounds available to facilitate server-triggered events, e.g., COMET and BOSH. Another drawback of HTTP is that it runs over TCP and hence is not suitable for real-time interactive multimedia communication. The same applies to RTMP. RTP is currently the standard media transport protocol. It can be secured using SRTP, but that requires out-of-band session key delivery.
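
As an illustration of how server-triggered events can be layered on a request-response protocol, here is a minimal in-memory sketch of the long-polling idea behind COMET-style workarounds. It contains no HTTP code: the class name `EventMailbox` and its methods are hypothetical, and `poll` merely stands in for a handler that holds a client request open until an event arrives or a timeout expires.

```python
import threading
from collections import deque


class EventMailbox:
    """Per-client queue of pending events (hypothetical helper)."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()

    def publish(self, event):
        """Called by the server when something happens, e.g. an incoming call."""
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()

    def poll(self, timeout):
        """Block like a held-open request; return pending events, or [] on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > 0, timeout=timeout)
            events = list(self._events)
            self._events.clear()
            return events


if __name__ == "__main__":
    box = EventMailbox()
    # Simulate an event that arrives while the client is waiting.
    threading.Timer(0.1, box.publish, args=[{"type": "invite", "from": "alice"}]).start()
    print(box.poll(timeout=5.0))
```

The client would immediately issue the next poll after each response, so the server almost always has a pending request through which it can deliver an event.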

  1. What works: In general, pick a protocol that has worked well for the given scenario in the past. Never pick a protocol that cannot be easily extended if you need to add something to it. The focus should not be the choice of protocol, but the overall project. The project may incorporate multiple protocols to do the same task, to accommodate different types of clients.
  2. Avoid overuse: Avoid abusing a protocol for something that it was not meant to do. For example, HTTP is not meant to transport real-time interactive multimedia.
  3. Media transport: Is it possible to make the media transport protocol adaptable? Is there a standard to facilitate efficient transport of multiple media types such as audio, video, screen sharing and file transfer between two parties?
  4. Data format: There are many available options for data format, e.g., XML, JSON, ASN.1, custom type-length-value. For web applications XML and JSON are more common.
  5. Resource (REST) vs Procedure (RPC): A RESTful API has certain advantages over traditional RPC-style applications. However, REST is not suitable for all applications, e.g., event-based ones.
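
To compare the data format options mentioned above, the sketch below encodes the same small message as JSON and as a custom type-length-value (TLV) byte string. The field type codes, the 1-byte type and 2-byte length layout, and the function names are assumptions chosen for illustration, not part of any existing protocol.

```python
import json
import struct

# Hypothetical type codes for the TLV encoding.
TYPE_FROM = 1
TYPE_TO = 2
TYPE_BODY = 3

HEADER = "!BH"  # 1-byte type, 2-byte length, network byte order
HEADER_SIZE = struct.calcsize(HEADER)


def encode_tlv(fields):
    """Encode a list of (type, value_bytes) pairs into one byte string."""
    out = bytearray()
    for ftype, value in fields:
        out += struct.pack(HEADER, ftype, len(value))
        out += value
    return bytes(out)


def decode_tlv(data):
    """Decode a byte string produced by encode_tlv back into (type, value) pairs."""
    fields = []
    offset = 0
    while offset < len(data):
        ftype, length = struct.unpack_from(HEADER, data, offset)
        offset += HEADER_SIZE
        fields.append((ftype, data[offset:offset + length]))
        offset += length
    return fields


if __name__ == "__main__":
    message = {"from": "alice", "to": "bob", "body": "hello"}
    as_json = json.dumps(message).encode("utf-8")
    as_tlv = encode_tlv([
        (TYPE_FROM, message["from"].encode("utf-8")),
        (TYPE_TO, message["to"].encode("utf-8")),
        (TYPE_BODY, message["body"].encode("utf-8")),
    ])
    print(len(as_json), len(as_tlv))
    print(decode_tlv(as_tlv))
```

The TLV form is more compact, while the JSON form is self-describing and easy to produce and consume in a browser, which is one reason XML and JSON are more common for web applications.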

I think the best advice to the students is to avoid the hammer and nail problem: if you hold a hammer, you tend to see every problem as a nail and try to fix it with your hammer -- if you understand a protocol or data format, you tend to apply it to all applications.

Usability

Usability is about presentation and giving choices to the end-user, carefully! End-users often do not read the user manual, do not know what to expect from the product, and get confused if presented with too many choices, but still like to be in control of what they do. Making a user interface that pleases everyone is almost impossible. The same applies to APIs.

  1. Pleasant user interface: the user interface should appeal to the user. It should not be too congested. It should be very intuitive.
  2. Alternative mechanisms: if the system can be used through multiple mechanisms, e.g., embedding on a web page, installing the application or adding a browser extension, aim to implement more than one of them. Let user feedback drive the future of the product.
  3. Expertise level: if needed, provide different levels of user interface -- basic vs expert -- to accommodate complex user interface controls.