VoLTE Speech Codec Scheme and Speech Traffic Model

VoLTE uses the adaptive multi-rate (AMR) or enhanced voice services (EVS) speech codec scheme.

VoLTE Speech Codec Scheme

AMR

AMR is an audio data compression scheme optimized for speech coding. It is widely used in GERAN and UTRAN. There are AMR wideband (AMR-WB) and AMR narrowband (AMR-NB) schemes. For details about AMR-WB and AMR-NB, see 3GPP TS 26.201 and 3GPP TS 26.101, respectively. The following are the voice coding rates supported by these schemes:

Voice coding rates (kbit/s) supported by AMR-WB: 6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, and 23.85.

Voice coding rates (kbit/s) supported by AMR-NB: 4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, and 12.2.

In 3GPP specifications, "AMR" without a qualifier normally refers to AMR-NB.

EVS

EVS, introduced in 3GPP TS 26.445 in September 2014, is a next-generation voice codec for HD VoLTE. It offers better voice quality and compression efficiency than its predecessor, AMR-WB, and supports several bandwidth schemes:

EVS Bandwidth Schemes and Supported Bitrates (kbit/s):

  • EVS-NB (Narrowband):
    5.9, 7.2, 8.0, 9.6, 13.2, 16.4, 24.4
  • EVS-WB (Wideband):
    5.9, 7.2, 8.0, 9.6, 13.2, 16.4, 24.4, 32, 48, 64, 96, 128
  • EVS-SWB (Super Wideband):
    9.6, 13.2, 16.4, 24.4, 32, 48, 64, 96, 128
  • EVS-FB (Fullband):
    16.4, 24.4, 32, 48, 64, 96, 128
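The bandwidth-to-bitrate mapping above can be held in a small lookup table. The following is a minimal sketch in Python; the names `EVS_RATES_KBPS` and `bandwidths_for_rate` are hypothetical and chosen only for illustration. The rate values are copied from the list above.

```python
# EVS bitrates (kbit/s) per bandwidth scheme, copied from the list above.
# The dictionary and function names are illustrative, not part of any standard API.
EVS_RATES_KBPS = {
    "NB": (5.9, 7.2, 8.0, 9.6, 13.2, 16.4, 24.4),
    "WB": (5.9, 7.2, 8.0, 9.6, 13.2, 16.4, 24.4, 32, 48, 64, 96, 128),
    "SWB": (9.6, 13.2, 16.4, 24.4, 32, 48, 64, 96, 128),
    "FB": (16.4, 24.4, 32, 48, 64, 96, 128),
}


def bandwidths_for_rate(rate_kbps):
    """Return the EVS bandwidth schemes that list the given bitrate."""
    return [bw for bw, rates in EVS_RATES_KBPS.items() if rate_kbps in rates]


if __name__ == "__main__":
    for rate in (5.9, 16.4, 32):
        print(rate, "kbit/s ->", bandwidths_for_rate(rate))
```

A lookup like this makes it easy to see, for example, that the lowest rates are available only in the narrowband and wideband schemes, while 16.4 kbit/s and 24.4 kbit/s are listed for all four.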

VoLTE Speech Traffic Model

The figure below illustrates the speech traffic model when either AMR or EVS is used.

Figure: VoLTE Speech Traffic Model

The VoLTE speech traffic model defines how voice data is transmitted during a call when using either the Adaptive Multi-Rate (AMR) codec or the Enhanced Voice Services (EVS) codec. The choice between AMR and EVS is made through negotiation between the user equipment (UE) and the IP Multimedia Subsystem (IMS) at the start of the call, depending on network capabilities and device support.
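The outcome of codec negotiation can be pictured as picking the most preferred codec that both sides support. The sketch below is a deliberate simplification: real negotiation is carried in SIP/SDP signalling, and the function name `select_codec` and the preference order (EVS first, then AMR-WB, then AMR-NB) are assumptions made here for illustration, not behaviour taken from the text.

```python
# Simplified illustration of codec selection. The preference order is an
# assumption for this sketch; actual negotiation is done via SDP offer/answer.
DEFAULT_PREFERENCE = ("EVS", "AMR-WB", "AMR-NB")


def select_codec(ue_codecs, network_codecs, preference=DEFAULT_PREFERENCE):
    """Return the first codec in `preference` supported by both sides, or None."""
    common = set(ue_codecs) & set(network_codecs)
    for codec in preference:
        if codec in common:
            return codec
    return None


if __name__ == "__main__":
    ue = {"EVS", "AMR-WB", "AMR-NB"}
    network = {"AMR-WB", "AMR-NB"}
    print(select_codec(ue, network))
```

In this example the UE supports EVS but the network side does not, so the call falls back to AMR-WB.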

During active speech, known as “talk spurts,” the UE sends or receives voice frames at regular intervals of 20 milliseconds. The size of each voice frame is determined by the codec in use and the selected voice coding rate. Higher coding rates typically offer better audio quality but require more bandwidth.
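The relationship between coding rate and frame size follows directly from the 20 ms interval: a rate in kbit/s multiplied by 20 ms gives the number of speech bits per frame. The sketch below shows that arithmetic; the function name `frame_payload_bits` is hypothetical, and the result covers the encoded speech bits only, not any RTP, UDP, or IP overhead.

```python
FRAME_INTERVAL_MS = 20  # talk-spurt frame interval stated in the text


def frame_payload_bits(rate_kbps):
    """Speech bits carried by one 20 ms frame at the given coding rate."""
    # kbit/s * ms = bits; round() absorbs floating-point representation error.
    return round(rate_kbps * FRAME_INTERVAL_MS)


if __name__ == "__main__":
    for rate in (12.2, 12.65, 23.85, 13.2):
        print(f"{rate} kbit/s -> {frame_payload_bits(rate)} bits per frame")
```

For example, AMR-NB at 12.2 kbit/s yields 244 bits per frame, while AMR-WB at 23.85 kbit/s yields 477 bits, which is why higher rates need more bandwidth.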

When no speech is detected, the call enters a “silent period.” During this time, the UE sends or receives Silence Insertion Descriptor (SID) frames instead of full voice frames. SID frames are much smaller than voice frames, which reduces network load, and they carry enough information to reproduce consistent background noise so that the call still sounds continuous. For AMR-coded calls, the SID frame size is fixed at 56 bits. For EVS-coded calls, the SID frame size is slightly larger, at 64 bits.
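To get a feel for how much the silent periods save, the average codec bit rate can be estimated from the share of time spent in talk spurts. The sketch below is a rough model under stated assumptions: the SID sizes come from the text, but the 160 ms SID interval is an assumption not given in this article, and `voice_activity` and `average_rate_kbps` are hypothetical names. Protocol overhead is ignored.

```python
SID_BITS = {"AMR": 56, "EVS": 64}  # SID frame sizes stated in the text
SID_INTERVAL_MS = 160  # assumption: one SID frame every 160 ms of silence


def average_rate_kbps(codec, rate_kbps, voice_activity):
    """Estimate the mean codec bit rate over talk spurts and silent periods.

    voice_activity is the fraction of time (0.0 to 1.0) spent in talk spurts.
    """
    if not 0.0 <= voice_activity <= 1.0:
        raise ValueError("voice_activity must be between 0 and 1")
    sid_rate_kbps = SID_BITS[codec] / SID_INTERVAL_MS  # bits per ms equals kbit/s
    return voice_activity * rate_kbps + (1.0 - voice_activity) * sid_rate_kbps


if __name__ == "__main__":
    print(average_rate_kbps("AMR", 12.2, 0.5))
    print(average_rate_kbps("EVS", 13.2, 0.5))
```

Under these assumptions, a call that is silent half the time needs only a little more than half the talk-spurt rate on average, because the SID stream contributes well under 1 kbit/s.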
