Voice over Internet Protocol (VOIP) is a general term for a family of transmission technologies for delivery of voice communications over IP networks such as the Internet or other packet switched networks. Other terms frequently encountered and synonymous with VOIP are IP telephony, Internet telephony, voice over broadband (VoBB), broadband telephony, and broadband phone.
Internet telephony refers to communications services — voice, facsimile, and/or voice-messaging applications — that are transported via the Internet, rather than the public switched telephone network (PSTN). The basic steps involved in originating an Internet telephone call are conversion of the analog voice signal to digital format and compression/translation of the signal into Internet protocol (IP) packets for transmission over the Internet; the process is reversed at the receiving end.
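The originating and receiving steps described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product works: the 8 kHz sampling rate, the 20 ms frame length, 16-bit linear samples and all function names are assumptions made for the example, and compression and the actual IP, UDP and RTP headers are left out.

```python
import math
import struct

SAMPLE_RATE_HZ = 8000   # assumed narrowband telephony sampling rate
FRAME_MS = 20           # assumed amount of audio carried per packet
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples


def digitize(analog):
    """Quantize analog samples in [-1.0, 1.0] to signed 16-bit integers."""
    return [max(-32768, min(32767, round(x * 32767))) for x in analog]


def packetize(pcm):
    """Split PCM samples into (sequence number, payload bytes) packets."""
    packets = []
    for seq, start in enumerate(range(0, len(pcm), SAMPLES_PER_FRAME)):
        frame = pcm[start:start + SAMPLES_PER_FRAME]
        packets.append((seq, struct.pack(f"<{len(frame)}h", *frame)))
    return packets


def depacketize(packets):
    """Reverse the process: reorder by sequence number and unpack samples."""
    pcm = []
    for _, payload in sorted(packets, key=lambda p: p[0]):
        pcm.extend(struct.unpack(f"<{len(payload) // 2}h", payload))
    return pcm


# 50 ms of a 440 Hz tone stands in for the analog voice signal.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ) for n in range(400)]
pcm = digitize(tone)
packets = packetize(pcm)
print(len(packets), len(packets[0][1]))  # 3 320

# Packets may arrive out of order; the receiver restores the original order.
assert depacketize(reversed(packets)) == pcm
```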
VOIP systems employ session control protocols to control the set-up and tear-down of calls, as well as audio codecs, which encode speech so that it can be transmitted over an IP network as a digital audio stream. Codec use varies between implementations of VOIP (and often a range of codecs is used); some implementations rely on narrowband, compressed speech, while others support high-fidelity stereo codecs.
VOIP can be a benefit for reducing communication and infrastructure costs. Examples include:
VOIP can facilitate tasks and provide services that may be more difficult to implement using the PSTN. Examples include:
By default, IP routers handle traffic on a first-come, first-served basis. When a packet is routed to a link where another packet is already being sent, the router holds it in a queue. Should additional traffic arrive faster than the queued traffic can be sent, the queue will grow. If VOIP packets have to wait their turn in a long queue, intolerable latency may result.
One way to avoid this problem is to simply ensure that the links are fast enough so that queues never build even in the worst case. This usually requires additional mechanisms to limit the amount of traffic entering the network, and for voice traffic this is usually done by limiting the number of simultaneous calls.
Another approach is to use quality-of-service (QoS) mechanisms such as Diffserv to give priority to VOIP packets and other latency-sensitive traffic so they can "jump the line" and be transmitted ahead of any bulk data packets already in the queue. This can work quite well when voice constitutes a relatively small fraction of the total network load, as it usually does in today's Internet.
Generally a VOIP packet still has to wait for the current packet to finish transmission; although it is possible to pre-empt (abort) a less important packet in mid-transmission, this is not commonly done, especially on high-speed links where transmission times are small even for maximum-sized packets. An alternative to pre-emption on slower links, such as dialup and DSL, is to reduce the maximum transmission time by reducing the maximum transmission unit (MTU). But every packet must contain protocol headers, so this increases relative header overhead on every link along that user's Internet paths, not just the bottleneck link (which is usually the user's Internet access link).
ADSL modems invariably provide Ethernet (or Ethernet over USB) connections to local equipment, but inside they are actually ATM modems. They use AAL5 to segment each Ethernet packet into a series of 53-byte ATM cells, each carrying 48 bytes of payload, for transmission and reassemble them back into Ethernet packets at the receiver. A virtual circuit identifier (VCI) is part of the 5-byte header on every ATM cell, so the transmitter can multiplex the active virtual circuits (VCs) in any arbitrary order. (Cells from the same VC are always sent sequentially.)
However, the great majority of DSL providers use only one VC for each customer, even those with bundled VOIP service. Every Ethernet packet must be completely transmitted before another can begin. If a second permanent virtual circuit (PVC) were established, given high priority and reserved for VOIP, then a low-priority data packet could be suspended in mid-transmission and a VOIP packet sent right away on the high-priority VC. Then the link would pick up the low-priority VC where it left off. Because ATM links are multiplexed on a cell-by-cell basis, a high-priority packet would have to wait at most 53 byte times to begin transmission. There would be no need to reduce the interface MTU and accept the resulting increase in higher-layer protocol overhead, and no need to abort a low-priority packet and resend it later.
This does not come for free. ATM has substantial header overhead: 5/53 = 9.4%, roughly twice the total header overhead of a 1500-byte TCP/IP/Ethernet packet (with TCP timestamps). This "ATM tax" is incurred by every DSL user whether or not they take advantage of multiple virtual circuits, and few can.
ATM's potential for latency reduction is greatest on slow links, because worst-case latency decreases with increasing link speed. A full-size (1500-byte) Ethernet frame takes 94 ms to transmit at 128 kb/s but only 8 ms at 1.5 Mb/s. If the 1.5 Mb/s link is the bottleneck, the 8 ms latency is probably small enough to ensure good VOIP performance without MTU reductions or multiple ATM PVCs.
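The transmission-time and ATM overhead figures above can be reproduced with a little arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the cell count assumes a bare 1500-byte packet plus the 8-byte AAL5 trailer, ignoring any encapsulation headers (for example LLC/SNAP or PPPoE) that a real DSL line would add.

```python
import math

ATM_CELL_BYTES = 53      # 5-byte header + 48-byte payload
ATM_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8   # AAL5 trailer appended before segmentation


def transmission_time_ms(size_bytes, link_bps):
    """Time to clock size_bytes onto a link running at link_bps bits/s."""
    return size_bytes * 8 * 1000 / link_bps


def aal5_cell_count(packet_bytes):
    """Cells needed for one packet (trailer added, padded to whole cells)."""
    return math.ceil((packet_bytes + AAL5_TRAILER_BYTES) / ATM_PAYLOAD_BYTES)


def aal5_wire_bytes(packet_bytes):
    """Bytes actually sent on the ATM link for one packet."""
    return aal5_cell_count(packet_bytes) * ATM_CELL_BYTES


# Worst-case wait behind one full-size frame on a single VC.
print(transmission_time_ms(1500, 128_000))      # 93.75
print(transmission_time_ms(1500, 1_500_000))    # 8.0

# Worst-case wait behind one cell when VOIP has its own high-priority VC.
print(transmission_time_ms(ATM_CELL_BYTES, 128_000))  # 3.3125

# The "ATM tax": per-cell header overhead and the cost of a 1500-byte packet.
print(f"{5 / ATM_CELL_BYTES:.1%}")                    # 9.4%
print(aal5_cell_count(1500), aal5_wire_bytes(1500))   # 32 1696
```

At 128 kb/s the wait drops from roughly 94 ms to about 3 ms with cell-level multiplexing, which is why the benefit is concentrated on slow links.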
The latest generations of DSL, VDSL and VDSL2, carry Ethernet without intermediate ATM/AAL5 layers, and they generally support IEEE 802.1p priority tagging so that VOIP can be queued ahead of less time-critical traffic.
Voice, like all other data, travels in packets over IP networks with fixed maximum capacity. This system is more prone to congestion and denial-of-service (DoS) attacks than traditional circuit-switched systems; a circuit-switched system of insufficient capacity will refuse new connections while carrying the remainder without impairment, while the quality of real-time data such as telephone conversations on packet-switched networks degrades dramatically.
Fixed delays cannot be controlled as they are caused by the physical distance the packets travel. They are especially problematic when satellite circuits are involved because of the long distance to a geostationary satellite and back; delays of 400-600 ms are typical.
When the load on a link grows so quickly that its queue overflows, congestion results and data packets are lost. This signals a transport protocol like TCP to reduce its transmission rate to alleviate the congestion. But VOIP usually does not use TCP because recovering from congestion through retransmission usually entails too much latency. QoS mechanisms can therefore avoid the undesirable loss of VOIP packets by immediately transmitting them ahead of any queued bulk traffic on the same link, even when that bulk traffic queue is overflowing.
The receiver must resequence IP packets that arrive out of order and recover gracefully when packets arrive too late or not at all. Jitter results from the rapid and random (i.e., unpredictable) changes in queue lengths along a given Internet path due to competition from other users for the same transmission links.
VOIP receivers counter jitter by storing incoming packets briefly in a "de-jitter" or "playout" buffer, deliberately increasing latency to increase the chance that each packet will be on hand when it's time for the voice engine to play it. The added delay is thus a compromise between excessive latency and excessive dropout, i.e., momentary audio interruptions.
Although jitter is a random variable, it is the sum of several other random variables that are at least somewhat independent: the individual queuing delays of the routers along the Internet path in question. Thus, according to the central limit theorem, jitter can be modeled as a Gaussian random variable. This suggests continually estimating the mean delay and its standard deviation and setting the playout delay so that only packets delayed more than several standard deviations above the mean will arrive too late to be useful. In practice, however, the variance in latency of many Internet paths is dominated by a small number (often one) of relatively slow and congested "bottleneck" links. Most Internet backbone links are now so fast (e.g., 10 Gb/s) that their delays are dominated by the transmission medium (i.e., optical fiber), and the routers driving them do not have enough buffering for queuing delays to be significant.
It has been suggested to rely on the packetized nature of media in VOIP communications and transmit the stream of packets from the source phone to the destination phone simultaneously across different routes (multi-path routing). In this way, temporary failures have less impact on the communication quality. In capillary routing, it has been suggested to use fountain codes, in particular Raptor codes, at the packet level to transmit extra redundant packets, making the communication more reliable.
A number of protocols have been defined to support the reporting of QoS/QoE for VOIP calls. These include RTCP Extended Report (RFC 3611), SIP RTCP Summary Reports, H.460.9 Annex B (for H.323), H.248.30 and MGCP extensions.
The RFC 3611 VOIP Metrics block is generated by an IP phone or gateway during a live call and contains information on packet loss rate, packet discard rate (because of jitter), packet loss/discard burst metrics (burst length/density, gap length/density), network delay, end system delay, signal/noise/echo level, Mean Opinion Scores (MOS) and R factors, and configuration information related to the jitter buffer. RFC 3611 VOIP metrics reports are exchanged between IP endpoints on an occasional basis during a call, and an end-of-call message is sent via SIP RTCP Summary Report or one of the other signaling protocol extensions. RFC 3611 VOIP metrics reports are intended to support real-time feedback related to QoS problems, the exchange of information between the endpoints for improved call quality calculation, and a variety of other applications.
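Returning to the de-jitter buffer discussed earlier, the idea of continually estimating the mean delay and its standard deviation can be sketched as follows. This is a minimal illustration under the Gaussian model, which the text notes often does not hold on paths dominated by one bottleneck link. The class name, the smoothing factor `alpha` and the multiplier `k` are hypothetical choices for the example, not values taken from any standard or product.

```python
import math


class PlayoutDelayEstimator:
    """Exponentially weighted running mean and variance of packet delay."""

    def __init__(self, alpha=0.05, k=3.0):
        self.alpha = alpha   # assumed smoothing factor (0 < alpha < 1)
        self.k = k           # assumed "several" standard deviations
        self.mean = None     # running mean delay in ms
        self.var = 0.0       # running variance in ms^2

    def update(self, delay_ms):
        """Fold one measured packet delay into the running estimates."""
        if self.mean is None:
            self.mean = float(delay_ms)
            return
        diff = delay_ms - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1.0 - self.alpha) * (self.var + diff * incr)

    def playout_delay_ms(self):
        """Delay budget: mean plus k standard deviations."""
        if self.mean is None:
            raise ValueError("no delay samples observed yet")
        return self.mean + self.k * math.sqrt(self.var)


est = PlayoutDelayEstimator()
for delay in [40.0, 42.0, 55.0, 41.0, 48.0, 43.0]:
    est.update(delay)

# A packet delayed beyond the budget would arrive too late to be played.
budget = est.playout_delay_ms()
print(f"mean={est.mean:.1f} ms, playout delay={budget:.1f} ms")
```

A larger `k` trades fewer late packets for more added latency, which is the compromise between dropout and delay described above.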
A number of protocols that deal with the data link layer and physical layer include quality-of-service mechanisms that can be used to ensure that applications like VOIP work well even in congested scenarios. Some examples include: