1 #LyX 1.5.0 created this file. For more info see http://www.lyx.org/
10 \font_typewriter courier
11 \font_default_family default
19 \papersize letterpaper
25 \paperorientation portrait
32 \paragraph_separation indent
34 \quotes_language english
37 \paperpagestyle headings
38 \listings_params "basicstyle={\ttfamily},breaklines=true,language=C,xleftmargin=0mm"
39 \tracking_changes false
48 The Speex Codec Manual
57 \begin_layout Standard
63 \begin_layout Standard
68 \begin_layout Standard
77 2002-2007 Jean-Marc Valin/Xiph.org Foundation
80 \begin_layout Standard
81 Permission is granted to copy, distribute and/or modify this document under
82 the terms of the GNU Free Documentation License, Version 1.1 or any later
83 version published by the Free Software Foundation; with no Invariant Section,
84 with no Front-Cover Texts, and with no Back-Cover.
85 A copy of the license is included in the section entitled "GNU Free Documentati
90 \begin_layout Standard
94 \begin_inset LatexCommand tableofcontents
103 \begin_layout Standard
104 \begin_inset FloatList table
113 \begin_layout Chapter
114 Introduction to Speex
117 \begin_layout Standard
120 http://www.speex.org/
122 ) exists because there is a need for a speech codec that is open-source
123 and free from software patent royalties.
124 These are essential conditions for being usable in any open-source software.
125 In essence, Speex is to speech what Vorbis is to audio/music.
126 Unlike many other speech codecs, Speex is not designed for mobile phones
127 but rather for packet networks and voice over IP (VoIP) applications.
128 File-based compression is of course also supported.
132 \begin_layout Standard
133 The Speex codec is designed to be very flexible and support a wide range
134 of speech quality and bit-rate.
135 Support for very good quality speech also means that Speex can encode wideband
136 speech (16 kHz sampling rate) in addition to narrowband speech (telephone
137 quality, 8 kHz sampling rate).
140 \begin_layout Standard
141 Designing for VoIP instead of mobile phones means that Speex is robust to
142 lost packets, but not to corrupted ones.
143 This is based on the assumption that in VoIP, packets either arrive unaltered
144 or don't arrive at all.
145 Because Speex is targeted at a wide range of devices, it has modest (adjustable
146 ) complexity and a small memory footprint.
149 \begin_layout Standard
150 All the design goals led to the choice of CELP
151 \begin_inset LatexCommand index
156 as the encoding technique.
157 One of the main reasons is that CELP has long proved that it could work
158 reliably and scale well to both low bit-rates (e.g.
159 DoD CELP @ 4.8 kbps) and high bit-rates (e.g.
164 \begin_layout Section
166 \begin_inset LatexCommand label
167 name "sec:Getting-help"
174 \begin_layout Standard
175 As with many open-source projects, there are many ways to get help with Speex.
179 \begin_layout Itemize
183 \begin_layout Itemize
184 Other documentation on the Speex website (http://www.speex.org/)
187 \begin_layout Itemize
188 Mailing list: Discuss any Speex-related topic on speex-dev@xiph.org (not
192 \begin_layout Itemize
193 IRC: The main channel is #speex on irc.freenode.net.
194 Note that due to time-zone differences, it may take a while to get a reply,
195 so please be patient.
198 \begin_layout Itemize
199 Email the author privately at jean-marc.valin@usherbrooke.ca
203 for private/delicate topics you do not wish to discuss publicly.
206 \begin_layout Standard
207 Before asking for help (mailing list or IRC),
209 it is important to first read this manual
211 (OK, so if you made it here it's already a good sign).
212 It is generally considered rude to ask on a mailing list about topics that
213 are clearly detailed in the documentation.
214 On the other hand, it's perfectly OK (and encouraged) to ask for clarifications
215 about something covered in the manual.
216 This manual does not (yet) cover everything about Speex, so everyone is
217 encouraged to ask questions, send comments, feature requests, or just let
218 us know how Speex is being used.
222 \begin_layout Standard
223 Here are some additional guidelines related to the mailing list.
224 Before reporting bugs in Speex to the list, it is strongly recommended
225 (if possible) to first test whether these bugs can be reproduced using
226 the speexenc and speexdec (see Section
227 \begin_inset LatexCommand ref
228 reference "sec:Command-line-encoder/decoder"
232 ) command-line utilities.
233 Bugs reported based on 3rd party code are both harder to find and far too
234 often caused by errors that have nothing to do with Speex.
238 \begin_layout Section
242 \begin_layout Standard
243 This document is divided in the following way.
245 \begin_inset LatexCommand ref
246 reference "sec:Feature-description"
250 describes the different Speex features and defines many basic terms that
251 are used throughout this manual.
253 \begin_inset LatexCommand ref
254 reference "sec:Command-line-encoder/decoder"
258 documents the standard command-line tools provided in the Speex distribution.
260 \begin_inset LatexCommand ref
261 reference "sec:Programming-with-Speex"
265 includes detailed instructions about programming using the libspeex
266 \begin_inset LatexCommand index
273 \begin_inset LatexCommand ref
274 reference "sec:Formats-and-standards"
278 has some information related to Speex and standards.
282 \begin_layout Standard
283 The last three sections describe the algorithms used in Speex.
284 These sections require signal processing knowledge, but are not required
285 for merely using Speex.
286 They are intended for people who want to understand how Speex really works
287 and/or want to do research based on Speex.
289 \begin_inset LatexCommand ref
290 reference "sec:Introduction-to-CELP"
294 explains the general idea behind CELP, while sections
295 \begin_inset LatexCommand ref
296 reference "sec:Speex-narrowband-mode"
301 \begin_inset LatexCommand ref
302 reference "sec:Speex-wideband-mode"
306 are specific to Speex.
309 \begin_layout Standard
315 \begin_layout Chapter
317 \begin_inset LatexCommand label
318 name "sec:Feature-description"
325 \begin_layout Standard
326 This section describes Speex and its features in more detail.
329 \begin_layout Section
333 \begin_layout Standard
334 Before introducing all the Speex features, here are some concepts in speech
335 coding that make the rest of the manual easier to understand.
336 Although some are general concepts in speech/audio processing, others are
340 \begin_layout Subsection*
342 \begin_inset LatexCommand index
350 \begin_layout Standard
351 The sampling rate expressed in Hertz (Hz) is the number of samples taken
352 from a signal per second.
353 For a sampling rate of
354 \begin_inset Formula $F_{s}$
357 kHz, the highest frequency that can be represented is equal to
358 \begin_inset Formula $F_{s}/2$
362 \begin_inset Formula $F_{s}/2$
365 is known as the Nyquist frequency).
366 This is a fundamental property in signal processing and is described by
367 the sampling theorem.
368 Speex is mainly designed for three different sampling rates: 8 kHz, 16
370 These are respectively referred to as narrowband
371 \begin_inset LatexCommand index
377 \begin_inset LatexCommand index
383 \begin_inset LatexCommand index
384 name "ultra-wideband"
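The relation between sampling rate and representable bandwidth can be sketched in a few lines of C. This is only an illustration: the function names nyquist_hz and speex_band_name are hypothetical and are not part of libspeex, and rates other than 8, 16 and 32 kHz are simply labelled "other".

```c
#include <assert.h>

/* Highest frequency (the Nyquist frequency) that a signal sampled at
 * sampling_rate_hz can represent. */
int nyquist_hz(int sampling_rate_hz)
{
    assert(sampling_rate_hz > 0);
    return sampling_rate_hz / 2;
}

/* Name used in this manual for each of the three main sampling rates.
 * Hypothetical helper: any other rate is reported as "other". */
const char *speex_band_name(int sampling_rate_hz)
{
    switch (sampling_rate_hz) {
    case 8000:  return "narrowband";
    case 16000: return "wideband";
    case 32000: return "ultra-wideband";
    default:    return "other";
    }
}
```

For example, nyquist_hz(8000) gives 4000, so narrowband speech carries frequencies up to 4 kHz.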
392 \begin_layout Subsection*
396 \begin_layout Standard
397 When encoding a speech signal, the bit-rate is defined as the number of
398 bits per unit of time required to encode the speech.
408 It is important to make the distinction between
441 \begin_layout Subsection*
443 \begin_inset LatexCommand index
451 \begin_layout Standard
452 Speex is a lossy codec, which means that it achieves compression at the expense
453 of fidelity of the input speech signal.
454 Unlike some other speech codecs, it is possible to control the tradeoff
455 made between quality and bit-rate.
456 The Speex encoding process is controlled most of the time by a quality
457 parameter that ranges from 0 to 10.
459 \begin_inset LatexCommand index
460 name "constant bit-rate"
464 (CBR) operation, the quality parameter is an integer, while for variable
465 bit-rate (VBR), the parameter is a float.
469 \begin_layout Subsection*
471 \begin_inset LatexCommand index
479 \begin_layout Standard
480 With Speex, it is possible to vary the complexity allowed for the encoder.
481 This is done by controlling how the search is performed with an integer
482 ranging from 1 to 10 in a way that's similar to the -1 to -9 options to
491 compression utilities.
492 For normal use, the noise level at complexity 1 is between 1 and 2 dB higher
493 than at complexity 10, but the CPU requirements for complexity 10 are about
494 5 times higher than for complexity 1.
495 In practice, the best trade-off is between complexity 2 and 4, though higher
496 settings are often useful when encoding non-speech sounds like DTMF
497 \begin_inset LatexCommand index
505 \begin_layout Subsection*
507 \begin_inset LatexCommand index
508 name "variable bit-rate"
515 \begin_layout Standard
516 Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically
518 \begin_inset Quotes eld
522 \begin_inset Quotes erd
525 of the audio being encoded.
526 In the example of Speex, sounds like vowels and high-energy transients
527 require a higher bit-rate to achieve good quality, while fricatives (e.g.
528 s,f sounds) can be coded adequately with fewer bits.
529 For this reason, VBR can achieve a lower bit-rate for the same quality, or
530 a better quality for a certain bit-rate.
531 Despite its advantages, VBR has two main drawbacks: first, by only specifying
532 quality, there's no guarantee about the final average bit-rate.
533 Second, for some real-time applications like voice over IP (VoIP), what
534 counts is the maximum bit-rate, which must be low enough for the communication
538 \begin_layout Subsection*
540 \begin_inset LatexCommand index
541 name "average bit-rate"
548 \begin_layout Standard
549 Average bit-rate solves one of the problems of VBR, as it dynamically adjusts
550 VBR quality in order to meet a specific target bit-rate.
551 Because the quality/bit-rate is adjusted in real-time (open-loop), the
552 global quality will be slightly lower than that obtained by encoding in
553 VBR with exactly the right quality setting to meet the target average bit-rate.
556 \begin_layout Subsection*
557 Voice Activity Detection
558 \begin_inset LatexCommand index
559 name "voice activity detection"
566 \begin_layout Standard
567 When enabled, voice activity detection detects whether the audio being encoded
568 is speech or silence/background noise.
569 VAD is always implicitly activated when encoding in VBR, so the option
570 is only useful in non-VBR operation.
571 In this case, Speex detects non-speech periods and encodes them with just
572 enough bits to reproduce the background noise.
574 \begin_inset Quotes eld
577 comfort noise generation
578 \begin_inset Quotes erd
584 \begin_layout Subsection*
585 Discontinuous Transmission
586 \begin_inset LatexCommand index
587 name "discontinuous transmission"
594 \begin_layout Standard
595 Discontinuous transmission is an addition to VAD/VBR operation that makes it
596 possible to stop transmitting completely when the background noise is stationary.
597 In file-based operation, since we cannot just stop writing to the file,
598 only 5 bits are used for such frames (corresponding to 250 bps).
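The 250 bps figure follows from the frame duration: Speex frames last 20 ms (see the programming chapter), so 5 bits per frame amounts to 5 x 50 = 250 bits per second. A minimal sketch of that arithmetic, where bitrate_bps is a hypothetical helper and not a libspeex function:

```c
#include <assert.h>

/* Bit-rate, in bits per second, of a stream that spends bits_per_frame
 * bits on every frame of frame_ms milliseconds. */
int bitrate_bps(int bits_per_frame, int frame_ms)
{
    assert(bits_per_frame >= 0 && frame_ms > 0);
    return bits_per_frame * 1000 / frame_ms;
}
```

The same relation works in reverse: the lowest bit-rate listed in the feature summary, 2.15 kbps, corresponds to 43 bits per 20 ms frame.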
601 \begin_layout Subsection*
602 Perceptual enhancement
603 \begin_inset LatexCommand index
604 name "perceptual enhancement"
611 \begin_layout Standard
612 Perceptual enhancement is a part of the decoder which, when turned on, attempts
613 to reduce the perception of the noise/distortion produced by the encoding/decod
615 In most cases, perceptual enhancement brings the sound further from the
621 considering only SNR), but in the end it still
625 better (subjective improvement).
628 \begin_layout Subsection*
629 Latency and algorithmic delay
630 \begin_inset LatexCommand index
631 name "algorithmic delay"
638 \begin_layout Standard
639 Every speech codec introduces a delay in the transmission.
640 For Speex, this delay is equal to the frame size, plus some amount of
641 \begin_inset Quotes eld
645 \begin_inset Quotes erd
648 required to process each frame.
649 In narrowband operation (8 kHz), the delay is 30 ms, while for wideband
650 (16 kHz), the delay is 34 ms.
651 These values don't account for the CPU time it takes to encode or decode
655 \begin_layout Section
659 \begin_layout Standard
660 The main characteristics of Speex can be summarized as follows:
663 \begin_layout Itemize
664 Free software/open-source
665 \begin_inset LatexCommand index
671 \begin_inset LatexCommand index
679 \begin_layout Itemize
680 Integration of narrowband
681 \begin_inset LatexCommand index
687 \begin_inset LatexCommand index
692 using an embedded bit-stream
695 \begin_layout Itemize
696 Wide range of bit-rates available (from 2.15 kbps to 44 kbps)
699 \begin_layout Itemize
700 Dynamic bit-rate switching (AMR) and Variable Bit-Rate
701 \begin_inset LatexCommand index
702 name "variable bit-rate"
709 \begin_layout Itemize
710 Voice Activity Detection
711 \begin_inset LatexCommand index
712 name "voice activity detection"
716 (VAD, integrated with VBR) and discontinuous transmission (DTX)
719 \begin_layout Itemize
721 \begin_inset LatexCommand index
729 \begin_layout Itemize
730 Embedded wideband structure (scalable sampling rate)
733 \begin_layout Itemize
734 Ultra-wideband sampling rate at 32 kHz
737 \begin_layout Itemize
738 Intensity stereo encoding option
741 \begin_layout Itemize
742 Fixed-point implementation
745 \begin_layout Section
749 \begin_layout Standard
750 This part refers to the preprocessor module introduced in the 1.1.x branch.
751 The preprocessor is designed to be used on the audio
756 The preprocessor provides three main functionalities:
759 \begin_layout Itemize
763 \begin_layout Itemize
764 automatic gain control (AGC)
767 \begin_layout Itemize
768 voice activity detection (VAD)
771 \begin_layout Standard
772 The denoiser can be used to reduce the amount of background noise present
774 This provides higher quality speech whether or not the denoised signal
775 is encoded with Speex (or at all).
776 However, when using the denoised signal with the codec, there is an additional
778 Speech codecs in general (Speex included) tend to perform poorly on noisy
779 input, which tends to amplify the noise.
780 The denoiser greatly reduces this effect.
783 \begin_layout Standard
784 Automatic gain control (AGC) is a feature that deals with the fact that
785 the recording volume may vary by a large amount between different setups.
786 The AGC provides a way to adjust a signal to a reference volume.
787 This is useful for voice over IP because it removes the need for manual
788 adjustment of the microphone gain.
789 A secondary advantage is that by setting the microphone gain to a conservative
790 (low) level, it is easier to avoid clipping.
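To make the idea of adjusting a signal to a reference volume concrete, here is a deliberately naive sketch. It is not the algorithm used by the Speex preprocessor: it assumes that the reference volume is a target peak amplitude, works on one frame in isolation, and uses hypothetical names (toy_agc_frame, target_peak, max_gain). A practical AGC would also smooth the gain over time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive peak-based gain control for one frame of 16-bit samples.
 * Scales the frame so that its largest sample reaches target_peak, but
 * never applies more than max_gain. Returns the gain that was applied. */
float toy_agc_frame(int16_t *frame, size_t n, int target_peak, float max_gain)
{
    assert(frame != NULL || n == 0);
    assert(target_peak > 0 && target_peak <= 32767);

    /* Find the largest absolute sample value in the frame. */
    int peak = 0;
    for (size_t i = 0; i < n; i++) {
        int a = frame[i] < 0 ? -(int)frame[i] : (int)frame[i];
        if (a > peak)
            peak = a;
    }
    if (peak == 0)
        return 1.0f;                 /* pure silence: nothing to scale */

    float gain = (float)target_peak / (float)peak;
    if (gain > max_gain)
        gain = max_gain;             /* do not blow up very quiet frames */

    for (size_t i = 0; i < n; i++) {
        float v = (float)frame[i] * gain;
        if (v > 32767.0f)  v = 32767.0f;    /* guard against clipping */
        if (v < -32768.0f) v = -32768.0f;
        frame[i] = (int16_t)v;              /* truncates toward zero */
    }
    return gain;
}
```

The max_gain limit reflects the trade-off described above: a conservative microphone gain leaves headroom against clipping, and the software gain makes up the difference only within a bounded range.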
793 \begin_layout Standard
794 The voice activity detector (VAD) provided by the preprocessor is more advanced
795 than the one directly provided in the codec.
799 \begin_layout Section
800 Adaptive Jitter Buffer
803 \begin_layout Standard
804 When transmitting voice (or any content for that matter) over UDP or RTP,
805 packets may be lost, arrive with different delays, or even arrive out of order.
806 The purpose of a jitter buffer is to reorder packets and buffer them long
807 enough (but no longer than necessary) so they can be sent to be decoded.
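The reordering part of this job can be illustrated with a small fixed-size buffer indexed by packet sequence number. This is a toy sketch under stated assumptions, not the Speex jitter buffer or its API: all names (ToyJitterBuffer, jb_put, jb_get) are hypothetical, packets are assumed to carry a sequence number, and the adaptive part (deciding how long to wait) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define JB_SLOTS     8    /* how many packets can wait in the buffer */
#define JB_MAX_BYTES 64   /* largest payload this toy accepts */

typedef struct {
    int           used[JB_SLOTS];
    unsigned      seq[JB_SLOTS];
    size_t        len[JB_SLOTS];
    unsigned char data[JB_SLOTS][JB_MAX_BYTES];
    unsigned      next_seq;   /* sequence number the decoder needs next */
} ToyJitterBuffer;

void jb_init(ToyJitterBuffer *jb, unsigned first_seq)
{
    assert(jb != NULL);
    memset(jb, 0, sizeof(*jb));
    jb->next_seq = first_seq;
}

/* Store a packet. Returns 0 on success, or -1 if the packet is too big,
 * arrived too late, or is too far ahead of the playback position. */
int jb_put(ToyJitterBuffer *jb, unsigned seq, const void *payload, size_t len)
{
    if (len > JB_MAX_BYTES)
        return -1;
    /* Unsigned subtraction: a late packet wraps around to a huge value. */
    if (seq - jb->next_seq >= JB_SLOTS)
        return -1;
    unsigned slot = seq % JB_SLOTS;
    jb->used[slot] = 1;
    jb->seq[slot]  = seq;
    jb->len[slot]  = len;
    memcpy(jb->data[slot], payload, len);
    return 0;
}

/* Fetch the next packet in sequence order into out. Returns its length,
 * or -1 if that packet has not arrived (the caller would then tell the
 * decoder that the frame is missing). Always advances by one packet. */
int jb_get(ToyJitterBuffer *jb, void *out)
{
    unsigned slot = jb->next_seq % JB_SLOTS;
    int result = -1;
    if (jb->used[slot] && jb->seq[slot] == jb->next_seq) {
        memcpy(out, jb->data[slot], jb->len[slot]);
        result = (int)jb->len[slot];
    }
    jb->used[slot] = 0;
    jb->next_seq++;
    return result;
}
```

Packets put in the order 11, 10, 13 come out as 10, 11, a gap for 12, then 13. A gap is what the decoder's packet-loss handling is meant to cover.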
811 \begin_layout Section
812 Acoustic Echo Canceller
815 \begin_layout Standard
816 In any hands-free communication system (Fig.
818 \begin_inset LatexCommand ref
819 reference "fig:Acoustic-echo-model"
823 ), speech from the remote end is played in the local loudspeaker, propagates
824 in the room and is captured by the microphone.
825 If the audio captured from the microphone is sent directly to the remote
826 end, then the remote user hears an echo of their own voice.
827 An acoustic echo canceller is designed to remove the acoustic echo before
828 it is sent to the remote end.
829 It is important to understand that the echo canceller is meant to improve
837 \begin_layout Standard
838 \begin_inset Float figure
843 \begin_layout Standard
847 \begin_layout Standard
857 \begin_inset Graphics
858 filename echo_path.eps
867 \begin_layout Standard
879 \begin_layout Standard
882 \begin_layout Standard
884 \begin_inset LatexCommand label
885 name "fig:Acoustic-echo-model"
902 \begin_layout Section
906 \begin_layout Standard
907 In some cases, it may be useful to convert audio from one sampling rate
909 There are many reasons for that.
910 It can be for mixing streams that have different sampling rates, for supporting
911 sampling rates that the soundcard doesn't support, for transcoding, etc.
912 That's why there is now a resampler that is part of the Speex project.
913 This resampler can be used to convert between any two arbitrary rates (the
914 ratio must only be a rational number) and there is control over the quality/com
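Since the two rates only need to form a rational ratio, a conversion such as 44100 Hz to 48000 Hz can be described by a pair of small integers. The sketch below reduces the ratio to lowest terms with Euclid's algorithm; reduce_ratio and gcd_u are hypothetical helpers for illustration and are not functions of the Speex resampler.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce the conversion ratio in_rate/out_rate to lowest terms. */
void reduce_ratio(unsigned in_rate, unsigned out_rate,
                  unsigned *num, unsigned *den)
{
    assert(in_rate > 0 && out_rate > 0);
    assert(num != 0 && den != 0);
    unsigned g = gcd_u(in_rate, out_rate);
    *num = in_rate / g;
    *den = out_rate / g;
}
```

For 44100 Hz to 48000 Hz the reduced ratio is 147/160, and for 8000 Hz to 16000 Hz it is 1/2.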
918 \begin_layout Standard
924 \begin_layout Chapter
928 \begin_layout Standard
929 Compiling Speex under UNIX/Linux or any other platform supported by autoconf
931 Win32/cygwin) is as easy as typing:
934 \begin_layout LyX-Code
935 % ./configure [options]
938 \begin_layout LyX-Code
942 \begin_layout LyX-Code
946 \begin_layout Standard
947 The options supported by the Speex configure script are:
950 \begin_layout Description
951 --prefix=<path> Specifies the base path for installing Speex (e.g.
955 \begin_layout Description
956 --enable-shared/--disable-shared Whether to compile shared libraries
959 \begin_layout Description
960 --enable-static/--disable-static Whether to compile static libraries
963 \begin_layout Description
964 --disable-wideband Disable the wideband part of Speex (typically to save
968 \begin_layout Description
969 --enable-valgrind Enable extra hints for valgrind for debugging purposes
970 (do not use by default)
973 \begin_layout Description
974 --enable-sse Enable use of SSE instructions (x86/float only)
977 \begin_layout Description
979 \begin_inset LatexCommand index
984 Compile Speex for a processor that does not have a floating point unit
988 \begin_layout Description
989 --enable-arm4-asm Enable assembly specific to the ARMv4 architecture (gcc
993 \begin_layout Description
994 --enable-arm5e-asm Enable assembly specific to the ARMv5E architecture (gcc
998 \begin_layout Description
999 --enable-fixed-point-debug Use only for debugging the fixed-point
1000 \begin_inset LatexCommand index
1008 \begin_layout Description
1009 --enable-epic-48k Enable a special (and non-compatible) 4.8 kbps narrowband
1010 mode (broken in 1.1.x and 1.2beta)
1013 \begin_layout Description
1014 --enable-ti-c55x Enable support for the TI C55x family
1017 \begin_layout Description
1018 --enable-blackfin-asm Enable assembly specific to the Blackfin DSP architecture
1022 \begin_layout Description
1023 --enable-vorbis-psycho Make the encoder use the Vorbis psycho-acoustic model.
1024 This is very experimental and may be removed in the future.
1027 \begin_layout Section
1031 \begin_layout Standard
1032 Speex is known to compile and work on a large number of architectures, both
1033 floating-point and fixed-point.
1034 In general, any architecture that can natively compute the multiplication
1035 of two signed 16-bit numbers (32-bit result) and runs at a sufficient clock
1036 rate (architecture-dependent) is capable of running Speex.
1037 Architectures that are
1041 to be supported (it probably works on many others) are:
1044 \begin_layout Itemize
1048 \begin_layout Itemize
1052 \begin_layout Itemize
1056 \begin_layout Itemize
1060 \begin_layout Itemize
1064 \begin_layout Itemize
1068 \begin_layout Itemize
1072 \begin_layout Itemize
1073 TriMedia (experimental)
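The arithmetic requirement stated before the list of architectures (a signed 16-bit by 16-bit multiplication with a 32-bit result) looks like this in portable C. The names mult16_16 and dot16 are hypothetical and chosen for this example; they are not taken from the Speex sources.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two signed 16-bit values, keeping the full 32-bit result.
 * Widening before the multiplication is what lets a compiler map this
 * onto a native 16x16 -> 32 multiply instruction where one exists. */
int32_t mult16_16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Dot product of two 16-bit vectors, the kind of multiply-accumulate loop
 * that dominates fixed-point signal processing. The caller must make sure
 * that the accumulated sum fits in 32 bits. */
int32_t dot16(const int16_t *x, const int16_t *y, int n)
{
    assert(n >= 0);
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += mult16_16(x[i], y[i]);
    return acc;
}
```

Even the extreme case fits: -32768 times -32768 is 1073741824, which is still below the largest signed 32-bit value.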
1076 \begin_layout Standard
1077 Operating systems on top of which Speex is known to work include (it probably
1078 works on many others):
1081 \begin_layout Itemize
1085 \begin_layout Itemize
1086 \begin_inset Formula $\mu$
1092 \begin_layout Itemize
1096 \begin_layout Itemize
1100 \begin_layout Itemize
1101 Other UNIX/POSIX variants
1104 \begin_layout Itemize
1108 \begin_layout Standard
1109 The source code directory includes additional information for compiling on
1110 certain architectures or operating systems in README.xxx files.
1113 \begin_layout Standard
1119 \begin_layout Chapter
1120 Command-line encoder/decoder
1121 \begin_inset LatexCommand label
1122 name "sec:Command-line-encoder/decoder"
1129 \begin_layout Standard
1130 The base Speex distribution includes a command-line encoder (
1139 Those tools produce and read Speex files encapsulated in the Ogg container.
1140 Although it is possible to encapsulate Speex in any container, Ogg is the
1141 recommended container for files.
1142 This section describes how to use the command line tools for Speex files
1146 \begin_layout Section
1150 \begin_inset LatexCommand index
1158 \begin_layout Standard
1163 utility is used to create Speex files from raw PCM or wave files.
1164 It can be used by calling:
1167 \begin_layout LyX-Code
1168 speexenc [options] input_file output_file
1171 \begin_layout Standard
1172 The value '-' for input_file or output_file corresponds respectively to
1174 The valid options are:
1177 \begin_layout Description
1178 --narrowband\InsetSpace ~
1179 (-n) Tell Speex to treat the input as narrowband (8 kHz).
1183 \begin_layout Description
1184 --wideband\InsetSpace ~
1185 (-w) Tell Speex to treat the input as wideband (16 kHz)
1188 \begin_layout Description
1189 --ultra-wideband\InsetSpace ~
1190 (-u) Tell Speex to treat the input as
1191 \begin_inset Quotes eld
1195 \begin_inset Quotes erd
1201 \begin_layout Description
1202 --quality\InsetSpace ~
1203 n Set the encoding quality (0-10), default is 8
1206 \begin_layout Description
1207 --bitrate\InsetSpace ~
1208 n Encoding bit-rate (use bit-rate n or lower)
1211 \begin_layout Description
1212 --vbr Enable VBR (Variable Bit-Rate), disabled by default
1215 \begin_layout Description
1217 n Enable ABR (Average Bit-Rate) at n kbps, disabled by default
1220 \begin_layout Description
1221 --vad Enable VAD (Voice Activity Detection), disabled by default
1224 \begin_layout Description
1225 --dtx Enable DTX (Discontinuous Transmission), disabled by default
1228 \begin_layout Description
1229 --nframes\InsetSpace ~
1230 n Pack n frames in each Ogg packet (this saves space at low bit-rates)
1233 \begin_layout Description
1235 n Set encoding speed/quality tradeoff.
1236 The higher the value of n, the slower the encoding (default is 3)
1239 \begin_layout Description
1240 -V Verbose operation, print bit-rate currently in use
1243 \begin_layout Description
1248 \begin_layout Description
1249 --version\InsetSpace ~
1250 (-v) Print version information
1253 \begin_layout Subsection*
1257 \begin_layout Description
1258 --comment Add the given string as an extra comment.
1259 This may be used multiple times.
1263 \begin_layout Description
1264 --author Author of this track.
1268 \begin_layout Description
1269 --title Title for this track.
1273 \begin_layout Subsection*
1277 \begin_layout Description
1279 n Sampling rate for raw input
1282 \begin_layout Description
1283 --stereo Consider raw input as stereo
1286 \begin_layout Description
1287 --le Raw input is little-endian
1290 \begin_layout Description
1291 --be Raw input is big-endian
1294 \begin_layout Description
1295 --8bit Raw input is 8-bit unsigned
1298 \begin_layout Description
1299 --16bit Raw input is 16-bit signed
1302 \begin_layout Section
1306 \begin_inset LatexCommand index
1314 \begin_layout Standard
1319 utility is used to decode Speex files and can be used by calling:
1322 \begin_layout LyX-Code
1323 speexdec [options] speex_file [output_file]
1326 \begin_layout Standard
1327 The value '-' for speex_file or output_file corresponds respectively to
1329 Also, when no output_file is specified, the file is played to the soundcard.
1330 The valid options are:
1333 \begin_layout Description
1334 --enh Enable post-filter (default)
1337 \begin_layout Description
1338 --no-enh Disable post-filter
1341 \begin_layout Description
1342 --force-nb Force decoding in narrowband
1345 \begin_layout Description
1346 --force-wb Force decoding in wideband
1349 \begin_layout Description
1350 --force-uwb Force decoding in ultra-wideband
1353 \begin_layout Description
1354 --mono Force decoding in mono
1357 \begin_layout Description
1358 --stereo Force decoding in stereo
1361 \begin_layout Description
1363 n Force decoding at n Hz sampling rate
1366 \begin_layout Description
1367 --packet-loss\InsetSpace ~
1368 n Simulate n % random packet loss
1371 \begin_layout Description
1372 -V Verbose operation, print bit-rate currently in use
1375 \begin_layout Description
1380 \begin_layout Description
1381 --version\InsetSpace ~
1382 (-v) Print version information
1385 \begin_layout Standard
1391 \begin_layout Chapter
1392 Programming with Speex
1393 \begin_inset LatexCommand label
1394 name "sec:Programming-with-Speex"
1401 \begin_layout Standard
1402 This section explains how to use the Speex API.
1403 Examples of code can also be found in Appendix
1404 \begin_inset LatexCommand ref
1405 reference "sec:Sample-code"
1409 and the complete API documentation is included in the Documentation section
1410 of the Speex website (http://www.speex.org/).
1413 \begin_layout Section
1419 \begin_inset LatexCommand index
1427 \begin_layout Standard
1432 library contains all the functions for encoding and decoding speech with
1434 When linking on a UNIX system, one must add
1438 to the compiler command line.
1439 One important thing to know is that
1441 libspeex calls are reentrant, but not thread-safe
1444 That means that it is fine to use calls from many threads, but
1446 calls using the same state from multiple threads must be protected by mutexes
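One way to follow this rule is to keep each shared state next to the mutex that guards it and to route every call through a small wrapper. The sketch below uses POSIX threads. GuardedState, guarded_call and add_to_counter are hypothetical names: add_to_counter merely stands in for a real codec call, and the void pointer stands in for whatever state pointer the codec returned at initialization.

```c
#include <assert.h>
#include <pthread.h>

/* A state shared between threads, paired with the mutex that guards it. */
typedef struct {
    void            *state;
    pthread_mutex_t  lock;
} GuardedState;

/* Returns 0 on success, like pthread_mutex_init(). */
int guarded_init(GuardedState *g, void *state)
{
    assert(g != NULL);
    g->state = state;
    return pthread_mutex_init(&g->lock, NULL);
}

/* Run fn(state, arg) while holding the lock, so that no two threads ever
 * use the same state at the same time. */
int guarded_call(GuardedState *g, int (*fn)(void *state, void *arg), void *arg)
{
    pthread_mutex_lock(&g->lock);
    int ret = fn(g->state, arg);
    pthread_mutex_unlock(&g->lock);
    return ret;
}

void guarded_destroy(GuardedState *g)
{
    pthread_mutex_destroy(&g->lock);
}

/* Stand-in for a codec call: adds *arg to the integer behind state. */
int add_to_counter(void *state, void *arg)
{
    *(int *)state += *(const int *)arg;
    return *(int *)state;
}
```

Threads that each own a separate state do not need this wrapper. Only a state that is shared across threads has to be serialized.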
1451 \begin_layout Subsection
1453 \begin_inset LatexCommand label
1461 \begin_layout Standard
1462 In order to encode speech using Speex, one first needs to:
1465 \begin_layout Standard
1466 \begin_inset listings
1470 \begin_layout Standard
1472 #include <speex/speex.h>
1477 Then in the code, a Speex bit-packing struct must be declared, along with
1478 a Speex encoder state:
1479 \begin_inset listings
1483 \begin_layout Standard
1488 \begin_layout Standard
1495 The two are initialized by:
1496 \begin_inset listings
1500 \begin_layout Standard
1502 speex_bits_init(&bits);
1505 \begin_layout Standard
1507 enc_state = speex_encoder_init(&speex_nb_mode);
1515 \begin_layout Standard
1516 For wideband coding,
1525 In most cases, you will need to know the frame size used at the sampling
1527 You can get that value in the
1531 variable (expressed in
1538 \begin_layout Standard
1539 \begin_inset listings
1543 \begin_layout Standard
1545 speex_encoder_ctl(enc_state,SPEEX_GET_FRAME_SIZE,&frame_size);
1553 \begin_layout Standard
1558 will correspond to 20 ms when using 8, 16, or 32 kHz sampling rate.
1559 There are many parameters that can be set for the Speex encoder, but the
1560 most useful one is the quality parameter that controls the quality vs bit-rate
1565 \begin_layout Standard
1566 \begin_inset listings
1570 \begin_layout Standard
1572 speex_encoder_ctl(enc_state,SPEEX_SET_QUALITY,&quality);
1581 is an integer value ranging from 0 to 10 (inclusive).
1582 The mapping between quality and bit-rate is described in Fig.
1584 \begin_inset LatexCommand ref
1585 reference "cap:quality_vs_bps"
1592 \begin_layout Standard
1593 Once the initialization is done, for every input frame:
1596 \begin_layout Standard
1597 \begin_inset listings
1601 \begin_layout Standard
1603 speex_bits_reset(&bits);
1606 \begin_layout Standard
1608 speex_encode_int(enc_state, input_frame, &bits);
1611 \begin_layout Standard
1613 nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);
1621 \begin_layout Standard
1634 pointing to the beginning of a speech frame,
1642 where the encoded frame will be written,
1646 is the maximum number of bytes that can be written to
1650 without causing an overflow and
1654 is the number of bytes actually written to
1658 (the encoded size in bytes).
1659 Before calling speex_bits_write, it is possible to find the number of bytes
1660 that need to be written by calling
1662 speex_bits_nbytes(&bits)
1664 , which returns a number of bytes.
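Encoded frames are counted in bits but written out as whole bytes, so the byte count is the bit count rounded up to the next multiple of eight. The following sketch shows that rounding and the corresponding buffer check; bytes_for_bits and frame_fits are hypothetical helpers and not part of the Speex API.

```c
#include <assert.h>

/* Number of whole bytes needed to hold nbits bits (rounds up). */
int bytes_for_bits(int nbits)
{
    assert(nbits >= 0);
    return (nbits + 7) / 8;
}

/* Non-zero if a frame of nbits bits fits in a buffer of max_bytes bytes. */
int frame_fits(int nbits, int max_bytes)
{
    return bytes_for_bits(nbits) <= max_bytes;
}
```

For instance, the 5-bit frames used for discontinuous transmission in file-based operation still occupy one full byte.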
1667 \begin_layout Standard
1668 It is still possible to use the
1672 function, which takes a
1677 However, this would make a later port to an FPU-less platform (like
1678 ARM) more complicated.
1687 are processed in the same way.
1688 Whether the encoder uses the fixed-point version is only decided by the
1689 compile-time flags, not at the API level.
1692 \begin_layout Standard
1693 After you're done with the encoding, free all resources with:
1696 \begin_layout Standard
1697 \begin_inset listings
1701 \begin_layout Standard
1703 speex_bits_destroy(&bits);
1706 \begin_layout Standard
1708 speex_encoder_destroy(enc_state);
1716 \begin_layout Standard
1717 That's about it for the encoder.
1721 \begin_layout Subsection
1723 \begin_inset LatexCommand label
1731 \begin_layout Standard
1732 In order to decode speech using Speex, you first need to:
1733 \begin_inset listings
1737 \begin_layout Standard
1739 #include <speex/speex.h>
1744 You also need to declare a Speex bit-packing struct
1745 \begin_inset listings
1749 \begin_layout Standard
1756 and a Speex decoder state
1757 \begin_inset listings
1761 \begin_layout Standard
1768 The two are initialized by:
1769 \begin_inset listings
1773 \begin_layout Standard
1775 speex_bits_init(&bits);
1778 \begin_layout Standard
1780 dec_state = speex_decoder_init(&speex_nb_mode);
1788 \begin_layout Standard
1789 For wideband decoding,
1798 If you need to obtain the size of the frames that will be used by the decoder,
1799 you can get that value in the
1803 variable (expressed in
1810 \begin_layout Standard
1811 \begin_inset listings
1815 \begin_layout Standard
1817 speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);
1825 \begin_layout Standard
1826 There is also a parameter that can be set for the decoder: whether or not
1827 to use a perceptual enhancer.
1831 \begin_layout Standard
1832 \begin_inset listings
1836 \begin_layout Standard
1838 speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);
1846 \begin_layout Standard
1851 is an int with value 0 to have the enhancer disabled and 1 to have it enabled.
1852 As of 1.2-beta1, the default is now to enable the enhancer.
1855 \begin_layout Standard
1856 Again, once the decoder initialization is done, for every input frame:
1859 \begin_layout Standard
1860 \begin_inset listings
1864 \begin_layout Standard
1866 speex_bits_read_from(&bits, input_bytes, nbBytes);
1869 \begin_layout Standard
1871 speex_decode_int(dec_state, &bits, output_frame);
1876 where input_bytes is a
1880 containing the bit-stream data received for a frame,
1884 is the size (in bytes) of that bit-stream, and
1892 and points to the area where the decoded speech frame will be written.
1893 A NULL value as the second argument indicates that we don't have the bits
1894 for the current frame.
1895 When a frame is lost, the Speex decoder will do its best to "guess" the
1899 \begin_layout Standard
1900 As for the encoder, the
1904 function can still be used, with a
1908 as the output for the audio.
1909 After you're done with the decoding, free all resources with:
1912 \begin_layout Standard
1913 \begin_inset listings
1917 \begin_layout Standard
1919 speex_bits_destroy(&bits);
1922 \begin_layout Standard
1924 speex_decoder_destroy(dec_state);
1932 \begin_layout Subsection
1933 Codec Options (speex_*_ctl)
1934 \begin_inset LatexCommand label
1935 name "sub:Codec-Options"
1946 Entities should not be multiplied beyond necessity -- William of Ockham.
1953 Just because there's an option for it doesn't mean you have to turn it on
1957 \begin_layout Standard
1958 The Speex encoder and decoder support many options and requests that can
1959 be accessed through the
1968 Despite that, the defaults are good for many applications and
1970 optional settings should only be used when one understands them and knows
1971 that they are needed
1974 A common error is to attempt to set many unnecessary settings.
1975 These functions are similar to the
1979 system call and their prototypes are:
1982 \begin_layout Standard
1983 \begin_inset listings
1987 \begin_layout Standard
1989 void speex_encoder_ctl(void *encoder, int request, void *ptr);
1992 \begin_layout Standard
1994 void speex_decoder_ctl(void *decoder, int request, void *ptr);
2002 \begin_layout Standard
2003 The different values of request allowed are (note that some only apply to
2004 the encoder or the decoder):
2007 \begin_layout Description
2008 SPEEX_SET_ENH** Set perceptual enhancer
2009 \begin_inset LatexCommand index
2010 name "perceptual enhancement"
2014 to on (1) or off (0) (spx_int32_t)
2017 \begin_layout Description
2018 SPEEX_GET_ENH** Get perceptual enhancer status (spx_int32_t)
2021 \begin_layout Description
2022 SPEEX_GET_FRAME_SIZE Get the number of samples per frame for the current
2026 \begin_layout Description
2027 SPEEX_SET_QUALITY* Set the encoder speech quality (spx_int32_t 0 to 10)
2030 \begin_layout Description
2031 SPEEX_GET_QUALITY* Get the current encoder speech quality (spx_int32_t 0
2035 \begin_layout Description
2036 SPEEX_SET_MODE* Set the mode number, as specified in the RTP spec (spx_int32_t)
2039 \begin_layout Description
2040 SPEEX_GET_MODE* Get the current mode number, as specified in the RTP spec
2044 \begin_layout Description
2046 \begin_inset Formula $\dagger$
2049 Use the source, Luke!
2052 \begin_layout Description
2054 \begin_inset Formula $\dagger$
2057 Use the source, Luke!
2060 \begin_layout Description
2061 SPEEX_SET_HIGH_MODE*
2062 \begin_inset Formula $\dagger$
2065 Use the source, Luke!
2068 \begin_layout Description
2069 SPEEX_GET_HIGH_MODE*
2070 \begin_inset Formula $\dagger$
2073 Use the source, Luke!
2076 \begin_layout Description
2077 SPEEX_SET_VBR* Set variable bit-rate (VBR) to on (1) or off (0) (spx_int32_t)
2080 \begin_layout Description
2081 SPEEX_GET_VBR* Get variable bit-rate
2082 \begin_inset LatexCommand index
2083 name "variable bit-rate"
2087 (VBR) status (spx_int32_t)
2090 \begin_layout Description
2091 SPEEX_SET_VBR_QUALITY* Set the encoder VBR speech quality (float 0 to 10)
2094 \begin_layout Description
2095 SPEEX_GET_VBR_QUALITY* Get the current encoder VBR speech quality (float
2099 \begin_layout Description
2100 SPEEX_SET_COMPLEXITY* Set the CPU resources allowed for the encoder (spx_int32_t
2104 \begin_layout Description
2105 SPEEX_GET_COMPLEXITY* Get the CPU resources allowed for the encoder (spx_int32_t
2109 \begin_layout Description
2110 SPEEX_SET_BITRATE* Set the bit-rate to use to the closest value not exceeding
2111 the parameter (spx_int32_t in bps)
2114 \begin_layout Description
2115 SPEEX_GET_BITRATE Get the current bit-rate in use (spx_int32_t in bps)
2118 \begin_layout Description
2119 SPEEX_SET_SAMPLING_RATE Set real sampling rate (spx_int32_t in Hz)
2122 \begin_layout Description
2123 SPEEX_GET_SAMPLING_RATE Get real sampling rate (spx_int32_t in Hz)
2126 \begin_layout Description
2127 SPEEX_RESET_STATE Reset the encoder/decoder state to its original state,
2128 clearing all memories (no argument)
2131 \begin_layout Description
2132 SPEEX_SET_VAD* Set voice activity detection
2133 \begin_inset LatexCommand index
2134 name "voice activity detection"
2138 (VAD) to on (1) or off (0) (spx_int32_t)
2141 \begin_layout Description
2142 SPEEX_GET_VAD* Get voice activity detection (VAD) status (spx_int32_t)
2145 \begin_layout Description
2146 SPEEX_SET_DTX* Set discontinuous transmission
2147 \begin_inset LatexCommand index
2148 name "discontinuous transmission"
2152 (DTX) to on (1) or off (0) (spx_int32_t)
2155 \begin_layout Description
2156 SPEEX_GET_DTX* Get discontinuous transmission (DTX) status (spx_int32_t)
2159 \begin_layout Description
2160 SPEEX_SET_ABR* Set average bit-rate
2161 \begin_inset LatexCommand index
2162 name "average bit-rate"
2166 (ABR) to a value n in bits per second (spx_int32_t in bps)
2169 \begin_layout Description
2170 SPEEX_GET_ABR* Get average bit-rate (ABR) setting (spx_int32_t in bps)
2173 \begin_layout Description
2174 SPEEX_SET_PLC_TUNING* Tell the encoder to optimize encoding for a certain
2175 percentage of packet loss (spx_int32_t in percent)
2178 \begin_layout Description
2179 SPEEX_GET_PLC_TUNING* Get the current tuning of the encoder for PLC (spx_int32_t
2183 \begin_layout Description
2184 SPEEX_SET_VBR_MAX_BITRATE* Set the maximum bit-rate allowed in VBR operation
2185 (spx_int32_t in bps)
2188 \begin_layout Description
2189 SPEEX_GET_VBR_MAX_BITRATE* Get the current maximum bit-rate allowed in VBR
2190 operation (spx_int32_t in bps)
2193 \begin_layout Description
2194 SPEEX_SET_HIGHPASS Set the high-pass filter on (1) or off (0) (spx_int32_t)
2197 \begin_layout Description
2198 SPEEX_GET_HIGHPASS Get the current high-pass filter status (spx_int32_t)
2201 \begin_layout Description
2202 * applies only to the encoder
2205 \begin_layout Description
2206 ** applies only to the decoder
2209 \begin_layout Description
2210 \begin_inset Formula $\dagger$
2213 If you can't understand from the source code what this does, you should
2214 not be using it in the first place
2217 \begin_layout Subsection
2219 \begin_inset LatexCommand label
2220 name "sub:Mode-queries"
2227 \begin_layout Standard
2228 Speex modes have a query system similar to the speex_encoder_ctl and speex_decod
2230 Since modes are read-only, it is only possible to get information about
2232 The function used to do that is:
2233 \begin_inset listings
2237 \begin_layout Standard
2239 void speex_mode_query(SpeexMode *mode, int request, void *ptr);
2244 The admissible values for request are (unless otherwise noted, the values
2245 are returned through
2252 \begin_layout Description
2253 SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode
2256 \begin_layout Description
2257 SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through
2266 \begin_layout Subsection
2267 Packing and in-band signalling
2268 \begin_inset LatexCommand index
2269 name "in-band signalling"
2276 \begin_layout Standard
2277 Sometimes it is desirable to pack more than one frame per packet (or other
2278 basic unit of storage).
2279 The proper way to do it is to call speex_encode
2280 \begin_inset Formula $N$
2283 times before writing the stream with speex_bits_write.
2284 In cases where the number of frames is not determined by an out-of-band
2285 mechanism, it is possible to include a terminator code.
2286 That terminator consists of the code 15 (decimal) encoded with 5 bits,
2288 \begin_inset LatexCommand ref
2289 reference "cap:quality_vs_bps"
2294 Note that as of version 1.0.2, calling speex_bits_write automatically inserts
2295 the terminator so as to fill the last byte.
2296 This doesn't involve any overhead and makes sure Speex can always detect
2297 when there are no more frames in a packet.
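As a rough illustration of the terminator described above, the following sketch appends the code 15 on 5 bits to a byte buffer. The BitWriter type and the bw_* helpers are hypothetical and are not part of the Speex API, and the most-significant-bit-first ordering is an assumption made for the example. In real code, speex_bits_write takes care of the terminator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit writer used only for illustration (not a Speex type). */
typedef struct {
    uint8_t bytes[64];
    int nb_bits;            /* number of bits written so far */
} BitWriter;

static void bw_init(BitWriter *bw)
{
    memset(bw, 0, sizeof(*bw));
}

/* Append the low `nbits` bits of `value`, most significant bit first
   (assumed ordering). Returns 0 on success, -1 if the buffer is full. */
static int bw_pack(BitWriter *bw, unsigned value, int nbits)
{
    assert(nbits >= 0 && nbits <= 32);
    if (bw->nb_bits + nbits > (int)sizeof(bw->bytes) * 8)
        return -1;
    for (int i = nbits - 1; i >= 0; i--) {
        if ((value >> i) & 1u)
            bw->bytes[bw->nb_bits >> 3] |= (uint8_t)(0x80u >> (bw->nb_bits & 7));
        bw->nb_bits++;
    }
    return 0;
}

/* Terminator as described in the text: code 15 (decimal) on 5 bits. */
static int bw_put_terminator(BitWriter *bw)
{
    return bw_pack(bw, 15, 5);
}
```

Calling bw_put_terminator() after the last packed frame marks the end of the packet for a reader that does not know the frame count in advance.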
2300 \begin_layout Standard
2301 It is also possible to send in-band
2302 \begin_inset Quotes eld
2306 \begin_inset Quotes erd
2310 All these messages are encoded as
2311 \begin_inset Quotes eld
2315 \begin_inset Quotes erd
2318 of mode 14 which contain a 4-bit message type code, followed by the message.
2320 \begin_inset LatexCommand ref
2321 reference "cap:In-band-signalling-codes"
2325 lists the available codes, their meaning and the size of the message that
2327 Most of these messages are requests that are sent to the encoder or decoder
2328 on the other end, which is free to comply or ignore them.
2329 By default, all in-band messages are ignored.
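The layout of an in-band pseudo-frame header can be sketched as follows. The helper names are hypothetical, and the 5-bit width used for the mode number is an assumption made by analogy with the 5-bit terminator code; only the mode number (14) and the 4-bit message type code come from the text above. The payload size depends on the message type and has to be taken from the table of in-band signalling codes.

```c
#include <assert.h>

enum {
    INBAND_MODE      = 14, /* pseudo-frame mode used for in-band messages */
    MODE_BITS        = 5,  /* assumed width of the mode field */
    INBAND_CODE_BITS = 4   /* width of the message type code */
};

/* Build the header value: mode 14 followed by the 4-bit message code. */
static unsigned inband_header(unsigned code)
{
    assert(code < 16u);
    return ((unsigned)INBAND_MODE << INBAND_CODE_BITS) | code;
}

/* Total size in bits of a pseudo-frame whose payload is payload_bits long. */
static unsigned inband_total_bits(unsigned payload_bits)
{
    return MODE_BITS + INBAND_CODE_BITS + payload_bits;
}
```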
2332 \begin_layout Standard
2333 \begin_inset Float table
2339 \begin_layout Standard
2343 \begin_layout Standard
2353 \begin_inset Tabular
2354 <lyxtabular version="3" rows="17" columns="3">
2356 <column alignment="center" valignment="top" leftline="true" width="0pt">
2357 <column alignment="center" valignment="top" leftline="true" width="0pt">
2358 <column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
2359 <row topline="true" bottomline="true">
2360 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2363 \begin_layout Standard
2369 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2372 \begin_layout Standard
2378 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2381 \begin_layout Standard
2388 <row topline="true">
2389 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2392 \begin_layout Standard
2398 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2401 \begin_layout Standard
2407 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2410 \begin_layout Standard
2411 Asks decoder to set perceptual enhancement off (0) or on (1)
2417 <row topline="true">
2418 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2421 \begin_layout Standard
2427 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2430 \begin_layout Standard
2436 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2439 \begin_layout Standard
2440 Asks (if 1) the encoder to be less
2441 \begin_inset Quotes eld
2445 \begin_inset Quotes erd
2448 due to high packet loss
2454 <row topline="true">
2455 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2458 \begin_layout Standard
2464 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2467 \begin_layout Standard
2473 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2476 \begin_layout Standard
2477 Asks encoder to switch to mode N
2483 <row topline="true">
2484 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2487 \begin_layout Standard
2493 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2496 \begin_layout Standard
2502 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2505 \begin_layout Standard
2506 Asks encoder to switch to mode N for low-band
2512 <row topline="true">
2513 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2516 \begin_layout Standard
2522 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2525 \begin_layout Standard
2531 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2534 \begin_layout Standard
2535 Asks encoder to switch to mode N for high-band
2541 <row topline="true">
2542 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2545 \begin_layout Standard
2551 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2554 \begin_layout Standard
2560 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2563 \begin_layout Standard
2564 Asks encoder to switch to quality N for VBR
2570 <row topline="true">
2571 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2574 \begin_layout Standard
2580 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2583 \begin_layout Standard
2589 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2592 \begin_layout Standard
2593 Request acknowledge (0=no, 1=all, 2=only for in-band data)
2599 <row topline="true">
2600 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2603 \begin_layout Standard
2609 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2612 \begin_layout Standard
2618 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2621 \begin_layout Standard
2622 Asks encoder to set CBR (0), VAD (1), DTX (3), VBR (5), VBR+DTX (7)
2628 <row topline="true">
2629 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2632 \begin_layout Standard
2638 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2641 \begin_layout Standard
2647 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2650 \begin_layout Standard
2651 Transmit (8-bit) character to the other end
2657 <row topline="true">
2658 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2661 \begin_layout Standard
2667 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2670 \begin_layout Standard
2676 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2679 \begin_layout Standard
2680 Intensity stereo information
2686 <row topline="true">
2687 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2690 \begin_layout Standard
2696 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2699 \begin_layout Standard
2705 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2708 \begin_layout Standard
2709 Announce maximum bit-rate acceptable (N in bytes/second)
2715 <row topline="true">
2716 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2719 \begin_layout Standard
2725 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2728 \begin_layout Standard
2734 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2737 \begin_layout Standard
2744 <row topline="true">
2745 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2748 \begin_layout Standard
2754 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2757 \begin_layout Standard
2763 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2766 \begin_layout Standard
2767 Acknowledge receiving packet N
2773 <row topline="true">
2774 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2777 \begin_layout Standard
2783 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2786 \begin_layout Standard
2792 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2795 \begin_layout Standard
2802 <row topline="true">
2803 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2806 \begin_layout Standard
2812 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2815 \begin_layout Standard
2821 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2824 \begin_layout Standard
2831 <row topline="true" bottomline="true">
2832 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2835 \begin_layout Standard
2841 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2844 \begin_layout Standard
2850 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2853 \begin_layout Standard
2868 \begin_layout Standard
2880 \begin_layout Standard
2881 \begin_inset Caption
2883 \begin_layout Standard
2884 In-band signalling codes
2885 \begin_inset LatexCommand label
2886 name "cap:In-band-signalling-codes"
2903 \begin_layout Standard
2904 Finally, applications may define custom in-band messages using mode 13.
2905 The size of the message in bytes is encoded with 5 bits, so that the decoder
2906 can skip it if it doesn't know how to interpret it.
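The skipping behaviour described above can be sketched with a small bit reader. BitReader, br_read and skip_user_message are hypothetical names, not Speex functions; the 5-bit width of the mode field and the most-significant-bit-first ordering are assumptions. Only the mode number (13) and the 5-bit size field (in bytes) come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader used only for illustration (not a Speex type). */
typedef struct {
    const uint8_t *data;
    size_t nb_bits;   /* number of valid bits in data */
    size_t pos;       /* current read position, in bits */
} BitReader;

/* Read nbits bits, most significant bit first (assumed ordering).
   Returns 0 on success, -1 if not enough bits remain (pos unchanged). */
static int br_read(BitReader *br, int nbits, unsigned *out)
{
    unsigned v = 0;
    assert(nbits >= 0 && nbits <= 16);
    if (br->pos + (size_t)nbits > br->nb_bits)
        return -1;
    for (int i = 0; i < nbits; i++) {
        unsigned bit = ((unsigned)br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        v = (v << 1) | bit;
        br->pos++;
    }
    *out = v;
    return 0;
}

/* Skip a mode-13 user message if one starts at the current position.
   Returns 1 if a message was skipped, 0 if the next item is not a user
   message, -1 if the data is truncated. The position is restored unless
   a message was skipped. */
static int skip_user_message(BitReader *br)
{
    size_t start = br->pos;
    unsigned mode, size;

    if (br_read(br, 5, &mode) < 0) { br->pos = start; return -1; }
    if (mode != 13)                { br->pos = start; return 0; }
    if (br_read(br, 5, &size) < 0 || br->pos + 8u * size > br->nb_bits) {
        br->pos = start;
        return -1;
    }
    br->pos += 8u * size;   /* size is expressed in bytes */
    return 1;
}
```

Because the size is carried in the message itself, a decoder that knows nothing about the payload can still resume parsing at the next frame.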
2909 \begin_layout Section
2910 Speech Processing API (
2917 \begin_layout Standard
2918 As of version 1.2beta3, the non-codec parts of the Speex package are now
2919 in a separate library called
2924 This library includes the preprocessor, the acoustic echo canceller, the
2925 jitter buffer, and the resampler.
2926 In a UNIX environment, it can be linked into a program by adding
2930 to the compiler command line.
2931 Just like for libspeex,
2933 libspeexdsp calls are reentrant, but not thread-safe
2936 That means that it is fine to use calls from many threads, but
2938 calls using the same state from multiple threads must be protected by mutexes
2943 \begin_layout Subsection
2945 \begin_inset LatexCommand label
2946 name "sub:Preprocessor"
2953 \begin_layout Standard
2954 In order to use the Speex preprocessor
2955 \begin_inset LatexCommand index
2960 , you first need to:
2963 \begin_layout Standard
2964 \begin_inset listings
2968 \begin_layout Standard
2970 #include <speex/speex_preprocess.h>
2978 \begin_layout Standard
2979 Then, a preprocessor state can be created as:
2982 \begin_layout Standard
2983 \begin_inset listings
2987 \begin_layout Standard
2989 SpeexPreprocessState *preprocess_state = speex_preprocess_state_init(frame_size,
2998 \begin_layout Standard
2999 It is recommended to use the same value for
3003 as is used by the encoder (20
3010 \begin_layout Standard
3011 For each input frame, you need to call:
3014 \begin_layout Standard
3015 \begin_inset listings
3019 \begin_layout Standard
3021 speex_preprocess_run(preprocess_state, audio_frame);
3029 \begin_layout Standard
3034 is used both as input and output.
3037 \begin_layout Standard
3038 In cases where the output audio is not useful for a certain frame, it is
3039 possible to use instead:
3042 \begin_layout Standard
3043 \begin_inset listings
3047 \begin_layout Standard
3049 speex_preprocess_estimate_update(preprocess_state, audio_frame);
3057 \begin_layout Standard
3058 This call will update all the preprocessor internal state variables without
3059 computing the output audio, thus saving some CPU cycles.
3062 \begin_layout Standard
3063 The behaviour of the preprocessor can be changed using:
3066 \begin_layout Standard
3067 \begin_inset listings
3071 \begin_layout Standard
3073 speex_preprocess_ctl(preprocess_state, request, ptr);
3081 \begin_layout Standard
3082 which is used in the same way as the encoder and decoder equivalent.
3083 Options are listed in Section .
3086 \begin_layout Standard
3087 The preprocessor state can be destroyed using:
3090 \begin_layout Standard
3091 \begin_inset listings
3095 \begin_layout Standard
3097 speex_preprocess_state_destroy(preprocess_state);
3105 \begin_layout Subsubsection
3106 Preprocessor options
3107 \begin_inset LatexCommand label
3108 name "sub:Preprocessor-options"
3115 \begin_layout Description
3116 SPEEX_PREPROCESS_SET_DENOISE Turns denoising on (1) or off (0) (integer)
3119 \begin_layout Description
3120 SPEEX_PREPROCESS_GET_DENOISE Get denoising status (integer)
3123 \begin_layout Description
3124 SPEEX_PREPROCESS_SET_AGC Turns automatic gain control (AGC) on (1) or off (0)
3128 \begin_layout Description
3129 SPEEX_PREPROCESS_GET_AGC Get AGC status (integer)
3132 \begin_layout Description
3133 SPEEX_PREPROCESS_SET_VAD Turns voice activity detector (VAD) on (1) or off (0)
3137 \begin_layout Description
3138 SPEEX_PREPROCESS_GET_VAD Get VAD status (integer)
3141 \begin_layout Description
3142 SPEEX_PREPROCESS_SET_AGC_LEVEL
3145 \begin_layout Description
3146 SPEEX_PREPROCESS_GET_AGC_LEVEL
3149 \begin_layout Description
3150 SPEEX_PREPROCESS_SET_DEREVERB Turns reverberation removal on (1) or off (0)
3154 \begin_layout Description
3155 SPEEX_PREPROCESS_GET_DEREVERB Get reverberation removal status (integer)
3158 \begin_layout Description
3159 SPEEX_PREPROCESS_SET_DEREVERB_LEVEL
3162 \begin_layout Description
3163 SPEEX_PREPROCESS_GET_DEREVERB_LEVEL
3166 \begin_layout Description
3167 SPEEX_PREPROCESS_SET_DEREVERB_DECAY
3170 \begin_layout Description
3171 SPEEX_PREPROCESS_GET_DEREVERB_DECAY
3174 \begin_layout Description
3175 SPEEX_PREPROCESS_SET_PROB_START
3178 \begin_layout Description
3179 SPEEX_PREPROCESS_GET_PROB_START
3182 \begin_layout Description
3183 SPEEX_PREPROCESS_SET_PROB_CONTINUE
3186 \begin_layout Description
3187 SPEEX_PREPROCESS_GET_PROB_CONTINUE
3190 \begin_layout Description
3191 SPEEX_PREPROCESS_SET_NOISE_SUPPRESS Set maximum attenuation of the noise
3192 in dB (negative number)
3195 \begin_layout Description
3196 SPEEX_PREPROCESS_GET_NOISE_SUPPRESS Get maximum attenuation of the noise
3197 in dB (negative number)
3200 \begin_layout Description
3201 SPEEX_PREPROCESS_SET_ECHO_SUPPRESS Set maximum attenuation of the residual
3202 echo in dB (negative number)
3205 \begin_layout Description
3206 SPEEX_PREPROCESS_GET_ECHO_SUPPRESS Get maximum attenuation of the residual
3207 echo in dB (negative number)
3210 \begin_layout Description
3211 SPEEX_PREPROCESS_SET_ECHO_SUPPRESS_ACTIVE Set maximum attenuation of the
3212 echo in dB when near end is active (negative number)
3215 \begin_layout Description
3216 SPEEX_PREPROCESS_GET_ECHO_SUPPRESS_ACTIVE Get maximum attenuation of the
3217 echo in dB when near end is active (negative number)
3220 \begin_layout Description
3221 SPEEX_PREPROCESS_SET_ECHO_STATE Set the associated echo canceller for residual
3222 echo suppression (NULL for no residual echo suppression)
3225 \begin_layout Description
3226 SPEEX_PREPROCESS_GET_ECHO_STATE Get the associated echo canceller
3229 \begin_layout Subsection
3231 \begin_inset LatexCommand label
3232 name "sub:Echo-Cancellation"
3239 \begin_layout Standard
3240 The Speex library now includes an echo cancellation
3241 \begin_inset LatexCommand index
3242 name "echo cancellation"
3246 algorithm suitable for Acoustic Echo Cancellation
3247 \begin_inset LatexCommand index
3248 name "acoustic echo cancellation"
3253 In order to use the echo canceller, you first need to
3256 \begin_layout Standard
3257 \begin_inset listings
3261 \begin_layout Standard
3263 #include <speex/speex_echo.h>
3271 \begin_layout Standard
3272 Then, an echo canceller state can be created by:
3275 \begin_layout Standard
3276 \begin_inset listings
3280 \begin_layout Standard
3282 SpeexEchoState *echo_state = speex_echo_state_init(frame_size, filter_length);
3290 \begin_layout Standard
3295 is the amount of data (in samples) you want to process at once and
3299 is the length (in samples) of the echo cancelling filter you want to use
3305 \begin_inset LatexCommand index
3311 It is recommended to use a frame size in the order of 20 ms (or equal to
3312 the codec frame size) and make sure it is easy to perform an FFT of that
3313 size (powers of two are better than prime sizes).
3314 The recommended tail length is approximately the third of the room reverberatio
3316 For example, in a small room, reverberation time is in the order of 300
3317 ms, so a tail length of 100 ms is a good choice (800 samples at 8000 Hz
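The sizing rules above amount to simple arithmetic, sketched here. The helper names are hypothetical and are not part of libspeexdsp; the one-third rule is the guideline given in the text, not an exact requirement.

```c
#include <assert.h>

/* Convert a duration in milliseconds to a number of samples. */
static int ms_to_samples(int sampling_rate_hz, int duration_ms)
{
    assert(sampling_rate_hz > 0 && duration_ms >= 0);
    return (int)((long)sampling_rate_hz * duration_ms / 1000);
}

/* Suggested tail length: about one third of the room reverberation time. */
static int suggested_tail_length(int sampling_rate_hz, int reverb_time_ms)
{
    return ms_to_samples(sampling_rate_hz, reverb_time_ms / 3);
}
```

For a small room (300 ms reverberation time) at 8000 Hz, this gives a frame size of 160 samples for 20 ms and a tail length of 800 samples, which are the values that would be passed as frame_size and filter_length to speex_echo_state_init().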
3321 \begin_layout Standard
3322 Once the echo canceller state is created, audio can be processed by:
3325 \begin_layout Standard
3326 \begin_inset listings
3330 \begin_layout Standard
3332 speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame);
3340 \begin_layout Standard
3345 is the audio as captured by the microphone,
3349 is the signal that was played in the speaker (and needs to be removed)
3354 is the signal with echo removed.
3358 \begin_layout Standard
3359 One important thing to keep in mind is the relationship between
3368 It is important that, at any time, any echo that is present in the input
3369 has already been sent to the echo canceller as
3374 In other words, the echo canceller cannot remove a signal that it hasn't
3376 On the other hand, the delay between the input signal and the echo signal
3377 must be small enough because otherwise part of the echo cancellation filter
3379 In the ideal case, your code would look like:
3380 \begin_inset listings
3381 lstparams "breaklines=true"
3385 \begin_layout Standard
3387 write_to_soundcard(echo_frame, frame_size);
3390 \begin_layout Standard
3392 read_from_soundcard(input_frame, frame_size);
3395 \begin_layout Standard
3397 speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame);
3405 \begin_layout Standard
3406 If you wish to further reduce the echo present in the signal, you can do
3407 so by associating the echo canceller to the preprocessor (see Section
3408 \begin_inset LatexCommand ref
3409 reference "sub:Preprocessor"
3414 This is done by calling:
3415 \begin_inset listings
3416 lstparams "breaklines=true"
3420 \begin_layout Standard
3422 speex_preprocess_ctl(preprocess_state, SPEEX_PREPROCESS_SET_ECHO_STATE,echo_stat
3428 in the initialisation.
3431 \begin_layout Standard
3432 As of version 1.2-beta2, there is an alternative, simpler API that can be
3435 speex_echo_cancellation()
3438 When audio capture and playback are handled asynchronously (e.g.
3439 in different threads or using the
3447 system call), it can be difficult to keep track of what input_frame comes
3448 with what echo_frame.
3449 Instead, the playback context/thread can simply call:
3452 \begin_layout Standard
3453 \begin_inset listings
3457 \begin_layout Standard
3459 speex_echo_playback(echo_state, echo_frame);
3467 \begin_layout Standard
3468 every time an audio frame is played.
3469 Then, the capture context/thread calls:
3472 \begin_layout Standard
3473 \begin_inset listings
3477 \begin_layout Standard
3479 speex_echo_capture(echo_state, input_frame, output_frame);
3487 \begin_layout Standard
3488 for every frame captured.
3491 speex_echo_playback()
3493 simply buffers the playback frame so it can be used by
3495 speex_echo_capture()
3502 A side effect of using this alternate API is that the playback audio is
3503 delayed by two frames, which is the normal delay caused by the soundcard.
3504 When capture and playback are already synchronised,
3506 speex_echo_cancellation()
3508 is preferable since it gives better control over the exact input/echo timing.
3511 \begin_layout Standard
3512 The echo cancellation state can be destroyed with:
3515 \begin_layout Standard
3516 \begin_inset listings
3520 \begin_layout Standard
3522 speex_echo_state_destroy(echo_state);
3530 \begin_layout Standard
3531 It is also possible to reset the state of the echo canceller so it can be
3532 reused without the need to create another state with:
3535 \begin_layout Standard
3536 \begin_inset listings
3540 \begin_layout Standard
3542 speex_echo_state_reset(echo_state);
3550 \begin_layout Subsubsection
3554 \begin_layout Standard
3555 There are several things that may prevent the echo canceller from working
3557 One of them is a bug (or something suboptimal) in the code, but there are
3558 many others you should consider first
3561 \begin_layout Itemize
3562 Using a different soundcard to do the capture and playback will *not* work,
3563 regardless of what you may think.
3564 The only exception to that is if the two cards can be made to have their
3566 \begin_inset Quotes eld
3570 \begin_inset Quotes erd
3573 on the same clock source.
3576 \begin_layout Itemize
3577 The delay between the record and playback signals must be minimal.
3578 Any signal played has to
3579 \begin_inset Quotes eld
3583 \begin_inset Quotes erd
3586 on the playback (far end) signal slightly before the echo canceller
3587 \begin_inset Quotes eld
3591 \begin_inset Quotes erd
3594 it in the near end signal, but excessive delay means that part of the filter
3596 In the worst situations, the delay is such that it is longer than the filter
3597 length, in which case, no echo can be cancelled.
3600 \begin_layout Itemize
3601 When it comes to echo tail length (filter length), longer is *not* better.
3602 Actually, the longer the tail length, the longer it takes for the filter
3604 Of course, a tail length that is too short will not cancel enough echo,
3605 but the most common problem seen is that people set a very long tail length
3606 and then wonder why no echo is being cancelled.
3609 \begin_layout Itemize
3610 Non-linear distortion cannot (by definition) be modeled by the linear adaptive
3611 filter used in the echo canceller and thus cannot be cancelled.
3612 Use good audio gear and avoid saturation/clipping.
3615 \begin_layout Standard
3616 Also useful is reading
3618 Echo Cancellation Demystified
3624 \begin_layout Standard
3625 http://www.embeddedstar.com/articles/2003/7/article20030720-1.html
3630 , which explains the fundamental principles of echo cancellation.
3631 The details of the algorithm described in the article are different, but
3632 the general ideas of echo cancellation through adaptive filters are the
3636 \begin_layout Standard
3637 As of version 1.2beta2, a new
3641 tool is included in the source distribution.
3642 The first step is to define DUMP_ECHO_CANCEL_DATA during the build.
3643 This causes the echo canceller to automatically save the near-end, far-end
3644 and output signals to files (aec_rec.sw, aec_play.sw and aec_out.sw).
3645 These are exactly what the AEC receives and outputs.
3646 From there, it is necessary to start Octave and type:
3649 \begin_layout Standard
3650 \begin_inset listings
3651 lstparams "language=Matlab"
3655 \begin_layout Standard
3657 echo_diagnostic('aec_rec.sw', 'aec_play.sw', 'aec_diagnostic.sw', 1024);
3665 \begin_layout Standard
3666 The value of 1024 is the filter length and can be changed.
3667 There will be some (hopefully) useful messages printed and echo cancelled
3668 audio will be saved to aec_diagnostic.sw.
3669 If even that output is bad (almost no cancellation) then there is probably
3670 a problem with the playback or recording process.
3673 \begin_layout Subsection
3677 \begin_layout Standard
3678 The jitter buffer can be enabled by including:
3679 \begin_inset listings
3680 lstparams "breaklines=true"
3684 \begin_layout Standard
3686 #include <speex/speex_jitter.h>
3691 and a new jitter buffer state can be initialised by:
3694 \begin_layout Standard
3695 \begin_inset listings
3696 lstparams "breaklines=true"
3700 \begin_layout Standard
3702 JitterBuffer *state = jitter_buffer_init(tick);
3710 \begin_layout Standard
3712 \begin_inset listings
3716 \begin_layout Standard
3723 argument is the time resolution (in timestamp units) used for the jitter
3724 buffer, and is generally the period at which the data is played out of
3729 \begin_layout Standard
3730 The jitter buffer API is based on the
3731 \begin_inset listings
3735 \begin_layout Standard
3742 type, which is defined as:
3743 \begin_inset listings
3747 \begin_layout Standard
3752 \begin_layout Standard
3754 char *data; /* Data bytes contained in the packet */
3757 \begin_layout Standard
3759 spx_uint32_t len; /* Length of the packet in bytes */
3762 \begin_layout Standard
3764 spx_uint32_t timestamp; /* Timestamp for the packet */
3767 \begin_layout Standard
3769 spx_uint32_t span; /* Time covered by the packet (timestamp units)
3773 \begin_layout Standard
3775 } JitterBufferPacket;
3783 \begin_layout Standard
3784 When a packet arrives, it needs to be inserted into the jitter buffer by:
3785 \begin_inset listings
3789 \begin_layout Standard
3791 JitterBufferPacket packet;
3794 \begin_layout Standard
3796 /* Fill in the packet fields */
3799 \begin_layout Standard
3801 jitter_buffer_put(state, &packet);
3809 \begin_layout Standard
3810 When the decoder is ready to decode a packet, the packet to be decoded can
3812 \begin_inset listings
3816 \begin_layout Standard
3821 \begin_layout Standard
3823 err = jitter_buffer_get(state, &packet, &start_offset);
3831 \begin_layout Standard
3833 \begin_inset listings
3837 \begin_layout Standard
3845 \begin_inset listings
3849 \begin_layout Standard
3856 are called from different threads, then
3858 you need to protect the jitter buffer state with a mutex
3864 \begin_layout Standard
3865 Because the jitter buffer is designed not to use an explicit timer, it needs
3866 to be told about the time explicitly.
3867 This is done by calling:
3868 \begin_inset listings
3872 \begin_layout Standard
3874 jitter_buffer_tick(state);
3882 \begin_layout Standard
3883 This needs to be done every time
3884 \begin_inset listings
3888 \begin_layout Standard
3899 \begin_layout Subsection
3903 \begin_layout Standard
3904 Speex includes a resampling module.
3905 To make use of the resampler, it is necessary to include its header file:
3908 \begin_layout Standard
3909 \begin_inset listings
3913 \begin_layout Standard
3915 #include <speex/speex_resampler.h>
3923 \begin_layout Standard
3924 For each stream that is to be resampled, it is necessary to create a resampler
3928 \begin_layout Standard
3929 \begin_inset listings
3933 \begin_layout Standard
3935 SpeexResamplerState *resampler;
3938 \begin_layout Standard
3940 resampler = speex_resampler_init(nb_channels, input_rate, output_rate, quality,
3949 \begin_layout Standard
3950 where nb_channels is the number of channels that will be used (either interleave
3951 d or non-interleaved), input_rate is the sampling rate of the input stream,
3952 output_rate is the sampling rate of the output stream and quality is the
3953 requested quality setting (0 to 10).
3954 The quality parameter is useful for controlling the quality/complexity/latency
3956 Using a higher quality setting means less noise/aliasing, a higher complexity
3957 and a higher latency.
3958 Usually, a quality of 3 is acceptable for most desktop uses and quality
3959 10 is mostly recommended for pro audio work.
3960 Quality 0 usually has a decent sound (certainly better than using linear
3961 interpolation resampling), but artifacts may be heard.
3964 \begin_layout Standard
3965 The actual resampling is performed using
3968 \begin_layout Standard
3969 \begin_inset listings
3973 \begin_layout Standard
3975 err = speex_resampler_process_int(resampler, channelID, in, &in_length,
3981 where channelID is the ID of the channel to be processed.
3982 For a mono stream, use 0.
3987 pointer points to the first sample of the input buffer for the selected
3992 points to the first sample of the output.
3993 The sizes of the input and output buffers are specified by
4002 Upon completion, these values are replaced by the number of samples read
4003 and written by the resampler.
4004 Unless an error occurs, either all input samples will be read or all output
4005 samples will be written to (or both).
4006 For floating-point samples, the function speex_resampler_process_float()
4010 \begin_layout Standard
4011 It is also possible to process multiple channels at once.
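When choosing the size of the output buffer handed to the resampler, the ratio of the two sampling rates gives a reasonable starting point. The following is a minimal sketch, not part of the Speex API: the helper name resampled_capacity() is hypothetical, and the figure it returns should be treated as an estimate only. The count actually written must still be read back from the output length after each call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not provided by Speex): estimate how many output
 * samples correspond to in_len input samples when converting from in_rate
 * to out_rate. The result is in_len * out_rate / in_rate, rounded up.
 */
static uint32_t resampled_capacity(uint32_t in_len, uint32_t in_rate,
                                   uint32_t out_rate)
{
    uint64_t num;

    assert(in_rate != 0);

    /* Use 64-bit arithmetic so the product cannot overflow. */
    num = (uint64_t)in_len * out_rate;
    return (uint32_t)((num + in_rate - 1) / in_rate);
}
```

For instance, one 20 ms narrowband frame (160 samples at 8 kHz) converted to 16 kHz gives an estimate of 320 samples. If the output buffer turns out to be too small, not all of the input is consumed; the remainder can be passed in again on the next call.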
4015 \begin_layout Standard
4021 \begin_layout Chapter
4022 Formats and standards
4023 \begin_inset LatexCommand index
4029 \begin_inset LatexCommand label
4030 name "sec:Formats-and-standards"
4037 \begin_layout Standard
4038 Speex can encode speech in both narrowband and wideband and provides different
4040 However, not all features need to be supported by a certain implementation
4042 In order to be called
4043 \begin_inset Quotes eld
4047 \begin_inset Quotes erd
4050 (whatever that means), an implementation must implement at least a basic
4054 \begin_layout Standard
4055 At the minimum, all narrowband modes of operation MUST be supported at the
4057 This includes the decoding of a wideband bit-stream by the narrowband decoder
4061 \begin_layout Standard
4062 The wideband bit-stream contains an embedded narrowband bit-stream which
4063 can be decoded alone
4069 If present, a wideband decoder MUST be able to decode a narrowband stream,
4070 and MAY either be able to decode all wideband modes or be able to decode
4071 the embedded narrowband part of all modes (which includes ignoring the
4075 \begin_layout Standard
4076 For encoders, at least one narrowband or wideband mode MUST be supported.
4077 The main reason why all encoding modes do not have to be supported is that
4078 some platforms may not be able to handle the complexity of encoding in
4082 \begin_layout Section
4084 \begin_inset LatexCommand index
4092 \begin_layout Standard
4093 The RTP payload draft is included in appendix
4094 \begin_inset LatexCommand ref
4095 reference "sec:IETF-draft"
4099 and the latest version is available at
4100 \begin_inset LatexCommand url
4101 target "http://www.speex.org/drafts/latest"
4106 This draft has been sent (2003/02/26) to the Internet Engineering Task
4107 Force (IETF) and will be discussed at the March 18th meeting in San Francisco.
4111 \begin_layout Section
4115 \begin_layout Standard
4116 For now, you should use the MIME type audio/x-speex for Speex-in-Ogg.
4117 We will apply for type
4124 \begin_layout Section
4126 \begin_inset LatexCommand index
4134 \begin_layout Standard
4135 Speex bit-streams can be stored in Ogg files.
4136 In this case, the first packet of the Ogg file contains the Speex header
4138 \begin_inset LatexCommand ref
4139 reference "cap:ogg_speex_header"
4144 All integer fields in the headers are stored as little-endian.
4149 field must contain the
4150 \begin_inset Quotes eld
4161 \begin_inset Quotes erd
4164 (with 3 trailing spaces), which identifies the bit-stream.
4169 contains the version of Speex that encoded the file.
4170 For now, refer to speex_header.[ch] for more info.
4179 ) flag is set to 1 for the header.
4180 The header packet has
4191 \begin_layout Standard
4192 The second packet contains the Speex comment header.
4193 The format used is the Vorbis comment format described here: http://www.xiph.org/
4194 ogg/vorbis/doc/v-comment.html .
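As an illustration of the comment layout, here is a minimal sketch of a reader for such a packet. It assumes the packet holds a vendor string followed by a list of comments, each preceded by a 32-bit little-endian length, as in the Vorbis comment format. The function names read_le32() and parse_comment_packet() are hypothetical and are not part of libspeex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read a 32-bit little-endian unsigned integer from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper (not part of libspeex): walk a comment packet laid
 * out as [vendor_len][vendor][count], then count x ([len][text]).
 * Prints each entry and returns the number of comments,
 * or -1 if the packet is truncated.
 */
static int parse_comment_packet(const unsigned char *buf, size_t size)
{
    size_t pos = 0;
    uint32_t len, count, i;

    assert(buf != NULL);

    /* Vendor string. */
    if (size < 4) return -1;
    len = read_le32(buf);
    pos = 4;
    if (len > size - pos) return -1;
    printf("vendor: %.*s\n", (int)len, (const char *)(buf + pos));
    pos += len;

    /* Number of user comments. */
    if (size - pos < 4) return -1;
    count = read_le32(buf + pos);
    pos += 4;

    /* Each comment is a length followed by "TAG=value" text. */
    for (i = 0; i < count; i++) {
        if (size - pos < 4) return -1;
        len = read_le32(buf + pos);
        pos += 4;
        if (len > size - pos) return -1;
        printf("comment: %.*s\n", (int)len, (const char *)(buf + pos));
        pos += len;
    }
    return (int)count;
}
```

Every length is checked against the bytes remaining before it is used, so a truncated or corrupt packet is rejected rather than read past its end.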
4206 \begin_layout Standard
4207 The third and subsequent packets each contain one or more (number found
4208 in header) Speex frames.
4209 These are identified with
4213 starting from 2 and the
4217 is the number of the last sample encoded in that packet.
4218 The last of these packets has the
4229 \begin_layout Standard
4230 \begin_inset Float table
4236 \begin_layout Standard
4240 \begin_layout Standard
4250 \begin_inset Tabular
4251 <lyxtabular version="3" rows="16" columns="3">
4253 <column alignment="center" valignment="top" leftline="true" width="0pt">
4254 <column alignment="center" valignment="top" leftline="true" width="0pt">
4255 <column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
4256 <row topline="true" bottomline="true">
4257 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4260 \begin_layout Standard
4266 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4269 \begin_layout Standard
4275 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4278 \begin_layout Standard
4285 <row topline="true">
4286 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4289 \begin_layout Standard
4295 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4298 \begin_layout Standard
4304 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4307 \begin_layout Standard
4314 <row topline="true">
4315 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4318 \begin_layout Standard
4324 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4327 \begin_layout Standard
4333 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4336 \begin_layout Standard
4343 <row topline="true">
4344 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4347 \begin_layout Standard
4353 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4356 \begin_layout Standard
4362 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4365 \begin_layout Standard
4372 <row topline="true">
4373 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4376 \begin_layout Standard
4382 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4385 \begin_layout Standard
4391 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4394 \begin_layout Standard
4401 <row topline="true">
4402 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4405 \begin_layout Standard
4411 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4414 \begin_layout Standard
4420 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4423 \begin_layout Standard
4430 <row topline="true">
4431 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4434 \begin_layout Standard
4440 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4443 \begin_layout Standard
4449 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4452 \begin_layout Standard
4459 <row topline="true">
4460 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4463 \begin_layout Standard
4464 mode_bitstream_version
4469 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4472 \begin_layout Standard
4478 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4481 \begin_layout Standard
4488 <row topline="true">
4489 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4492 \begin_layout Standard
4498 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">