From 4863bdb25a7e0fb3446c6b0292970273cccaff31 Mon Sep 17 00:00:00 2001
From: Jean-Marc Valin
Date: Thu, 8 Jul 2010 15:28:08 -0400
Subject: [PATCH] Updated draft for 0.8.1

---
 configure.ac                        |   2 +-
 doc/ietf/draft-valin-celt-codec.xml | 114 ++++++++++
 2 files changed, 33 insertions(+), 83 deletions(-)
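
Note on the rate-allocation hunk (the one starting "using the projected allocation"): the interpolation search it describes can be sketched in C as below. This is an illustration only, not code from rate.c. The names `interp_coefficient`, `sum_lo`, `sum_hi`, `steps` and `budget` are hypothetical. The step count is left as a parameter because the patched text mentions both 8 and 16, and `budget` is assumed to be the available space computed earlier in that paragraph, expressed in the same scaled units as the weighted sum.

```c
#include <assert.h>

/* Hypothetical sketch (not from the CELT sources).
 * Returns the highest integer coefficient c in [0, steps] such that
 *     sum_hi * c + sum_lo * (steps - c) <= budget,
 * or -1 if even c = 0 does not fit.
 * sum_lo and sum_hi are the sums of the lower and higher projected
 * prototype allocations. */
static int interp_coefficient(long sum_lo, long sum_hi, int steps, long budget)
{
    int best = -1;
    int c;

    assert(steps > 0);
    assert(sum_lo <= sum_hi);

    /* The weighted sum never decreases as c grows (sum_lo <= sum_hi),
     * so the last coefficient that fits is the highest one. */
    for (c = 0; c <= steps; c++) {
        long total = sum_hi * c + sum_lo * (long)(steps - c);
        if (total <= budget)
            best = c;
    }
    return best;
}
```

A linear scan is used for clarity; with so few steps a bisection would gain little.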
diff --git a/configure.ac b/configure.ac
index 789b64da..5031a1dd 100644
--- a/configure.ac
+++ b/configure.ac
@@ -6,7 +6,7 @@ AM_CONFIG_HEADER([config.h])
 CELT_MAJOR_VERSION=0
 CELT_MINOR_VERSION=8
-CELT_MICRO_VERSION=0
+CELT_MICRO_VERSION=1
 CELT_EXTRA_VERSION=
 CELT_VERSION=$CELT_MAJOR_VERSION.$CELT_MINOR_VERSION.$CELT_MICRO_VERSION$CELT_EXTRA_VERSION
 LIBCELT_SUFFIX=0
diff --git a/doc/ietf/draft-valin-celt-codec.xml b/doc/ietf/draft-valin-celt-codec.xml
index 1aa43e6f..91c91bd6 100644
--- a/doc/ietf/draft-valin-celt-codec.xml
+++ b/doc/ietf/draft-valin-celt-codec.xml
@@ -65,7 +65,7 @@

+
General
@@ -321,29 +321,29 @@ and normalized MDCT bins (), respectively.
-[Figure, previous version: encoder block diagram. Main path: Window > MDCT > / > Q3 > Mix > * > IMDCT. Side blocks: Energy computation > Q1; pitch gains > Q2 > *; Pitch period estimation > Delay, MDCT, Normalize.]
+[Figure, updated version: encoder block diagram. Main path: Window > MDCT > / > Q3 > Mix > * > IMDCT > +. Side blocks: Energy computation > Q1; pitch gain > Q2 > *; Pitch period estimation > Delay, MDCT, Normalize.]
]]>
Block diagram of the CELT encoder
@@ -544,7 +544,7 @@ CELT uses prediction to encode the energy in each frequency band. In order to ma
-CELT can use a pitch predictor (also known as long-term predictor) to improve the voice quality at lower bitrates. While the pitch period can be estimated in any way, it is RECOMMENDED for performance reasons to estimate it using a frequency-domain correlation between the current frame and the history buffer, as implemented in find_spectral_pitch() (pitch.c). When the P bit is set, the pitch period is encoded after the flag bits. The value encoded is an integer in the range [0, 1024-N-overlap-1].
+CELT can use a pitch predictor (also known as long-term predictor) to improve the voice quality at lower bitrates. When the P bit is set, the pitch period is encoded after the flag bits. The value encoded is an integer in the range [0, 1024-N-overlap-1].
@@ -689,11 +689,10 @@ using the projected allocation. In the reference implementation this is
 performed by compute_allocation() (rate.c).
 The target computation begins by calculating the available space as the
 number of whole bits which can be fit in the frame after Q1 is stored according
-to the range coder (ec_[enc/dec]_tell()), and iff the frame has pitch prediction,
-subtracting the number of pitch bands and then multiplying by 16.
-Then the two projected prototype allocations whose sums multiplied by 16 are nearest
+to the range coder (ec_[enc/dec]_tell()) and then multiplying by 8.
+Then the two projected prototype allocations whose sums multiplied by 8 are nearest
 to that value are determined. These two projected prototype allocations are then interpolated
-by finding the highest integer interpolation coefficient in the range 0-16
+by finding the highest integer interpolation coefficient in the range 0-8
 such that the sum of the higher prototype times the coefficient, plus the
 sum of the lower prototype multiplied by
 the difference of 16 and the coefficient, is less than or equal to the
@@ -737,38 +736,9 @@ PVQ.
-The pitch period T is computed in the frequency domain using a generalized
-cross-correlation, as implemented in find_spectral_pitch()
-(pitch.c). An MDCT is then computed on the
-synthesis signal memory using the offset T.
-If there is sufficient energy in this
-part of the signal, the pitch gain for each pitch band
-is computed as g_a = X^T*p, where X is the normalized (non-quantized) signal and
-p is the normalized pitch MDCT.
-The gain is computed by compute_pitch_gain() (bands.c),
-and if a sufficient number of bands have a high enough gain, then the pitch bit is set.
-Otherwise, no use of pitch is made.
+This section needs to be updated.

-For frequencies above the highest pitch band (~6374 Hz), the pitch prediction is replaced by
-spectral folding if and only if the folding bit is set. Spectral folding is implemented in
-intra_fold() (vq.c). If the folding bit is not set, then
-the prediction is simply set to zero.
-The folding prediction uses the quantized spectrum at lower frequencies with a gain that depends
-both on the width of the band, N, and the number of pulses allocated, K:



-g_a = N / (N + 2*K*(K+1)),



-When the short block bit is not set, the spectral copy is performed starting with bin 0 (DC) and going up. When the short block bit is set, then the starting point is chosen between 0 and B-1 in such a way that the source and destination bins belong to the same MDCT (i.e., to prevent the folding from causing pre-echo). Before the folding operation, each band of the source spectrum is multiplied by sqrt(N) so that the expected value of the squared value for each bin is equal to 1. The copied spectrum is then renormalized to have norm (||p|| = g_a).


For stereo streams, the folding is performed independently for each channel.

@@ -785,17 +755,7 @@ In bands where neither pitch nor folding is used, the PVQ is used to encode
 the unit vector that results from the normalization in
 directly. Given a PVQ codevector y,
 the unit vector X is obtained as X = y/||y||, where ||.|| denotes the
-L2 norm. In the case where a pitch
-prediction or a folding vector p is used, the quantized unit vector X' becomes:

-X' = p' + g_f * y,
-where g_f = ( sqrt( (y^T*p')^2 + ||y||^2*(1-||p'||^2) ) - y^T*p' ) / ||y||^2,

-and p' = g_a * p.

-The combination of the pitch with the PVQ codeword is described in
-mix_pitch_and_residual() (vq.c) and is used in
-both the encoder and the decoder.
+L2 norm.
@@ -841,14 +801,6 @@ J = -R^T*y / ||y||
-The last pulse is the only one considering the pitch and minimizes the cost function :



-J = -g_f * R^T*y + (g_f)^2 * ||y||^2



 The search described above is considered to be a good tradeoff between quality
 and computational cost. However, there are other possible ways to search the PVQ
 codebook and the implementors MAY use any other search methods.
@@ -1147,9 +1099,7 @@ a pulse vector by decode_pulses() (cwrs.c).
 The decoded normalized vector for each band is equal to
-X' = p' + g_f * y,
-where g_f = ( sqrt( (y^T*p')^2 + ||y||^2*(1-||p'||^2) ) - y^T*p' ) / ||y||^2,
-and p' = g_a * p.
+X' = y/||y||,
 This operation is implemented in mix_pitch_and_residual() (vq.c),
@@ -1347,7 +1297,7 @@ The authors would also like to thank the CELT users who contributed patches, bug
 This appendix contains the complete source code for a floating-point
 reference implementation of the CELT codec written in C. This
-implementation is derived from version 0.8.0 of the implementation available on the
+implementation is derived from version 0.8.1 of the implementation available on the
 , which can be compiled for
 either floating-point or fixed-point architectures.

-- 
2.11.0