One question we frequently get from our clients concerns voice quality: specifically, what is the difference between audio codecs? Some also ask about high-definition voice, or HD voice. This is not surprising, since voice quality is a key element in providing a good user experience for any VoIP service. IPsmarx Softswitch technology supports codecs like G.711 as well as more sophisticated compressing codecs like G.729 and G.723.

Telephony is a system of telecommunications in which telephonic equipment is employed in the transmission of speech or other sound between points. To travel from origin to destination, sound has to be transformed into a signal and then back into sound. How sound is transformed is key to understanding audio codecs and voice quality. The term “codec” is a contraction of “coder-decoder,” since the conversion is bi-directional and involves both coding and decoding.

Traditional telephony, and even digital telephony, is constrained by dated standards. Hundreds of codecs are currently in use, but a few are particularly widespread. In ITU terminology, the codec known as G.711 is the most ubiquitous in North America: it is the normative codec for traditional circuit-switched telephony and is also important for VoIP. Various versions of the “high-compression” codecs G.729 and G.723, among others, are also widely used for VoIP.

A major issue is that older standards limit the range of audio frequencies to roughly 300 Hz to 3400 Hz, resulting in audio clarity issues such as:

• Difficulty recognizing sounds like “s” and “f”
• Problems distinguishing “m” from “n” and “p” from “t”
• Inability to hear the fundamental resonances in spoken vowels

Codecs vary on many factors, such as the required bandwidth, the processing power and memory required of the systems handling the call, and the types of audio supported at each end. One of the most important dimensions on which codecs differ is “compression”: how much the digital data representing the voice is reduced in size to lessen the bandwidth required for transmission. There is often a trade-off between lower bandwidth usage (higher compression) and optimal voice quality. G.711, the longstanding standard for landline telephony, uses minimal compression. G.729, on the other hand, dramatically reduces the number of bits needed for each interval of speech, so more channels can be carried within a given bandwidth constraint. G.711 may use three to five times as much bandwidth to send the same conversation as such high-compression codecs. G.729 is considered relatively high quality for a high-compression codec, and it is popular for teleconferencing and visual telephony as well as VoIP and wireless applications.
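The "three to five times" figure can be checked with a little arithmetic. The sketch below is illustrative only and rests on assumptions that are not stated in this article: each packet carries 40 bytes of IPv4, UDP and RTP headers, G.711 and G.729 send one packet every 20 ms, "G.723" means G.723.1 in its 6.3 kbit/s mode with 30 ms frames, and layer-2 overhead and silence suppression are ignored. The names `Codec` and `ip_bandwidth_bps` are hypothetical.

```python
from dataclasses import dataclass

# Assumed per-packet overhead: IPv4 (20) + UDP (8) + RTP (12) bytes.
# Layer-2 framing (Ethernet, etc.) is deliberately left out.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


@dataclass(frozen=True)
class Codec:
    name: str
    payload_bytes: int  # voice payload carried in each packet
    packet_ms: int      # assumed interval between packets, in milliseconds


def ip_bandwidth_bps(codec: Codec) -> float:
    """One-way IP bandwidth of a single call, in bits per second."""
    bits_per_packet = (codec.payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8
    return bits_per_packet * 1000 / codec.packet_ms


# Payload sizes follow from the codec bit rates:
# G.711 at 64 kbit/s and G.729 at 8 kbit/s over 20 ms, G.723.1 24-byte frames.
G711 = Codec("G.711", payload_bytes=160, packet_ms=20)
G729 = Codec("G.729", payload_bytes=20, packet_ms=20)
G7231 = Codec("G.723.1 (6.3 kbit/s)", payload_bytes=24, packet_ms=30)

if __name__ == "__main__":
    for codec in (G711, G729, G7231):
        kbps = ip_bandwidth_bps(codec) / 1000
        ratio = ip_bandwidth_bps(G711) / ip_bandwidth_bps(codec)
        print(f"{codec.name}: {kbps:.1f} kbit/s, G.711 uses {ratio:.1f}x as much")
```

Under these assumptions G.711 needs 80 kbit/s per direction, roughly 3.3 times the bandwidth of G.729 and 4.7 times that of G.723.1. The fixed header overhead is why the ratio is smaller than the raw 64-to-8 kbit/s payload ratio would suggest.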

Voice quality is a significant differentiator as competition becomes more intense among service providers, enterprises and mobile operators. Besides compression, delay (especially when it exceeds 100 milliseconds), jitter (variation in delay between packets) and lost or damaged packets are important factors affecting the experience during a call.
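To make "delay variation between packets" concrete, here is a minimal sketch of one common way to quantify jitter: the smoothed interarrival jitter estimate described in RFC 3550 (RTP). It is an illustration, not how any particular softswitch measures jitter. The function name and the use of millisecond timestamps are assumptions; RTP itself works in timestamp units.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550.

    For each consecutive pair of packets, D is the difference between the
    spacing at the receiver and the spacing at the sender. The estimate
    moves 1/16 of the way toward |D| with every packet.
    """
    if len(send_times_ms) != len(recv_times_ms):
        raise ValueError("need one receive time per send time")

    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        recv_gap = recv_times_ms[i] - recv_times_ms[i - 1]
        send_gap = send_times_ms[i] - send_times_ms[i - 1]
        d = recv_gap - send_gap
        jitter += (abs(d) - jitter) / 16
    return jitter


if __name__ == "__main__":
    sent = [0, 20, 40, 60]
    # Constant 50 ms delay: packets arrive evenly spaced, so jitter stays 0.
    print(interarrival_jitter(sent, [50, 70, 90, 110]))
    # Same average delay, but the second packet arrives 5 ms late.
    print(interarrival_jitter(sent, [50, 75, 90, 110]))
```

Note that a constant delay, however large, produces zero jitter. Delay and jitter are separate problems, which is why they are listed separately above.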

The process of transcoding (decoding a stream from one codec and re-encoding it in another) is very demanding on computer memory and processing power. There is a significant difference between software-based and hardware-based transcoding: software transcoding runs on the server's own CPU, while hardware transcoding offloads that work to dedicated equipment. As a VoIP service grows and more concurrent calls are required, it is important to consider increasing the power of the softswitch server, or alternatives such as an SBC (which will also help with security) and hardware-based transcoding such as transcoding cards or appliances.