From ad4ae86b591402a9b1fc41a88be741d2138d61f4 Mon Sep 17 00:00:00 2001
From: Kat Walsh
Date: Tue, 23 Aug 2011 03:48:43 -0400
Subject: [PATCH] Copyedited draft (up to line 4015).

Edited for correctness, clarity, and consistent usage.
No meaning should have been changed by this edit.
---
doc/draft-ietf-codec-opus.xml | 264 +++++++++++++++++++++---------------------
1 file changed, 133 insertions(+), 131 deletions(-)
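
Reviewer note (not part of the commit; text between the "---" line and the
first "diff --git" line is ignored when the patch is applied). The hunk at
-559 rewords the code 3 padding rule without changing its meaning. As a
check on that reading, here is a minimal sketch of the rule as the reworded
text states it: each length byte from 0 to 254 ends the chain and adds that
many bytes, and each byte equal to 255 adds 254 bytes and continues with the
next byte. The function name, arguments, and error handling are hypothetical
and are not taken from the reference implementation.

```python
def decode_padding_length(data, pos):
    """Decode a chained padding length starting at data[pos].

    Hypothetical helper for illustration only. Returns a tuple
    (padding_bytes, next_pos), where padding_bytes counts only the
    trailing padding and excludes the length bytes themselves.
    """
    total = 0
    while True:
        if pos >= len(data):
            # The chain of 255 bytes ran past the end of the packet.
            raise ValueError("truncated padding length")
        value = data[pos]
        pos += 1
        if value == 255:
            # 255 means 254 bytes of padding plus whatever the next byte adds.
            total += 254
        else:
            # Any value from 0 to 254 terminates the chain.
            total += value
            return total, pos
```

For example, the length bytes 255, 255, 10 would describe 254 + 254 + 10
bytes of trailing padding under this reading.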
diff --git a/doc/draft-ietf-codec-opus.xml b/doc/draft-ietf-codec-opus.xml
index 54973514..a1415cc8 100644
--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -292,7 +292,7 @@ At the decoder, the two decoder outputs are simply added together.
To compensate for the different look-aheads required by each layer, the CELT
encoder input is delayed by an additional 2.7 ms.
This ensures that low frequencies and high frequencies arrive at the same time.
-This extra delay MAY be reduced by an encoder by using less lookahead for noise
+This extra delay MAY be reduced by an encoder by using less look-ahead for noise
shaping or using a simpler resampler in the LP layer, but this will reduce
quality.
However, the base 2.5 ms look-ahead in the CELT layer cannot be reduced in
@@ -322,9 +322,9 @@ As described, the two layers can be combined in three possible operating modes:
-A single packet may contain multiple audio frames, however they must share a
- common set of parameters, including the operating mode, audio bandwidth, frame
- size, and channel count.
+A single packet may contain multiple audio frames.
+However, they must share a common set of parameters, including the operating
+ mode, audio bandwidth, frame size, and channel count.
This section describes the possible combinations of these parameters and the
internal framing used to pack multiple frames into a single packet.
This framing is not self-delimiting.
@@ -438,7 +438,7 @@ It is also roughly the maximum useful rate of the MDCT layer, as shortly
-No length is transmitted for the last frame in a VBR packet, or any of the
+No length is transmitted for the last frame in a VBR packet, or for any of the
frames in a CBR packet, as it can be inferred from the total size of the
packet and the size of all other data in the packet.
However, the length of any individual frame MUST NOT exceed 1275 bytes, to
@@ -528,7 +528,7 @@ The length of the first frame, N1, MUST be no larger than the size of the
-Code 3 packets may encode an arbitrary number of packets, as well as additional
+Code 3 packets may encode an arbitrary number of frames, as well as additional
padding, called "Opus padding" to indicate that this padding is added at the
Opus layer, rather than at the transport layer.
For code 3 packets, the TOC byte is followed by a byte encoding the number of
@@ -559,8 +559,8 @@ Values from 0...254 indicate that 0...254 bytes of padding are included,
If the value is 255, then the size of the additional padding is 254 bytes,
plus the padding value encoded in the next byte.
The additional padding bytes appear at the end of the packet, and SHOULD be set
- to zero by the encoder, however the decoder MUST accept any value for the
- padding bytes.
+ to zero by the encoder.
+The decoder MUST accept any value for the padding bytes, however.
By using code 255 multiple times, it is possible to create a packet of any
specific, desired size.
Let P be the total amount of padding, including both the trailing padding bytes
@@ -871,8 +871,8 @@ rng = rng - (rng/ft)*(ft-fh[k]).
-Using a special case for the first symbol, rather than the last symbol, as is
- commonly done in other arithmetic coders, ensures that all the truncation
+Using a special case for the first symbol (rather than the last symbol, as is
+ commonly done in other arithmetic coders) ensures that all the truncation
error from the finite precision arithmetic accumulates in symbol 0.
This makes the cost of coding a 0 slightly smaller, on average, than its
estimated probability indicates and makes the cost of coding any other symbol
@@ -1050,7 +1050,7 @@ Because ec_decode() is limited to a total frequency of 2**16-1, this is split
value, and, if necessary, raw bits representing the remaining bits.
The limit of 8 bits in the range coded symbol is a trade-off between
implementation complexity, modeling error (since the symbols no longer truly
- have equal coding cost) and rounding error introduced by the range coder
+ have equal coding cost), and rounding error introduced by the range coder
itself (which gets larger as more bits are included).
Using raw bits reduces the maximum number of divisions required in the worst
case, but means that it may be possible to decode a value outside the range
@@ -1108,22 +1108,22 @@ In practice, although the number of bits used so far is an upper bound,
However, this error is bounded, and periodic calls to ec_tell() or
ec_tell_frac() at precisely defined points in the decoding process prevent it
from accumulating.
-For a symbol that requires a whole number of bits (i.e., ft/(fh[k]-fl[k]) is a
- power of two, including values of ft larger than 2**8 with ec_dec_uint()), and
- there are at least p 1/8th bits available, decoding the symbol will never
- advance the decoder past the end of the frame, i.e., will never "bust" the
- budget.
-Frames contain a whole number of bits, and the return value of ec_tell_frac()
- will only advance by more than p 1/8th bits in this case if there was a
- fractional number of bits remaining, and by no more than the fractional part.
+For a range coder symbol that requires a whole number of bits (i.e.,
+ ft/(fh[k]-fl[k]) is a power of two), where there are at least p 1/8th bits
+ available, decoding the symbol will never advance the decoder past the end of
+ the frame ("bust the budget").
+In this case the return value of ec_tell_frac() will only advance by more than
+ p 1/8th bits if there was an additional, fractional number of bits remaining,
+ and it will never advance beyond the next whole-bit boundary, which is safe,
+ since frames always contain a whole number of bits.
However, when p is not a whole number of bits, an extra 1/8th bit is required
- to ensure decoding the symbol will not bust.
+ to ensure that decoding the symbol will not bust the budget.
The reference implementation keeps track of the total number of whole bits that
- have been processed by the decoder so far in a variable nbits_total, including
- the (possibly fractional number of bits) that are currently buffered (but not
- consumed) inside the range coder.
+ have been processed by the decoder so far in the variable nbits_total,
+ including the (possibly fractional) number of bits that are currently
+ buffered, but not consumed, inside the range coder.
nbits_total is initialized to 33 just after the initial range renormalization
process completes (or equivalently, it can be initialized to 9 before the
first renormalization).
@@ -1223,7 +1223,7 @@ It would be required to do so anyway for hybrid Opus frames, or to support
Frame Type Gain index
-Order of the symbols in the SILK section of the bit-stream.
+Order of the symbols in the SILK section of the bitstream.
@@ -1259,7 +1259,7 @@ An overview of the decoder is given in .
- The range decoder decodes the encoded parameters from the received bitstream. Output from this function includes the pulses and gains for the excitation signal generation, as well as LTP and LSF codebook indices, which are needed for decoding LTP and LPC coefficients needed for LTP and LPC synthesis filtering the excitation signal, respectively.
+ The range decoder decodes the encoded parameters from the received bitstream. Output from this function includes the pulses and gains for generating the excitation signal, as well as LTP and LSF codebook indices, which are needed for decoding LTP and LPC coefficients needed for LTP and LPC synthesis filtering the excitation signal, respectively.
@@ -1270,7 +1270,7 @@ An overview of the decoder is given in .
When a voiced frame is decoded and LTP codebook selection and indices are received, LTP coefficients are decoded using the selected codebook by choosing the vector that corresponds to the given codebook index in that codebook. This is done for each of the four subframes.
- The LPC coefficients are decoded from the LSF codebook by first adding the chosen LSF vector and the decoded LSF residual signal. The resulting LSF vector is stabilized using the same method that was used in the encoder, see
+ The LPC coefficients are decoded from the LSF codebook by first adding the chosen LSF vector and the decoded LSF residual signal. The resulting LSF vector is stabilized using the same method that was used in the encoder; see
. The LSF coefficients are then converted to LPC coefficients, and passed on to the LPC synthesis filter.
@@ -1283,7 +1283,7 @@ An overview of the decoder is given in .
- For voiced speech, the excitation signal e(n) is input to an LTP synthesis filter that will recreate the long term correlation that was removed in the LTP analysis filter and generate an LPC excitation signal e_LPC(n), according to
+ For voiced speech, the excitation signal e(n) is input to an LTP synthesis filter that recreates the long-term correlation removed in the LTP analysis filter and generates an LPC excitation signal e_LPC(n), according to
@@ -1378,11 +1378,12 @@ The quantized excitation signal follows these at the end of the frame.
Each SILK frame begins with a single "frame type" symbol that jointly codes the
signal type and quantization offset type of the corresponding frame.
If the current frame is a regular SILK frame whose VAD bit was not set (an
- "inactive" frame), then the frame type symbol takes on the value either 0 or 1
- and is decoded using the first PDF in .
+ "inactive" frame), then the frame type symbol takes on a value of either 0 or
+ 1 and is decoded using the first PDF in .
If the frame is an LBRR frame or a regular SILK frame whose VAD flag was set
- (an "active" frame), then the symbol ranges from 2 to 5, inclusive, and is
- decoded using the second PDF in .
+ (an "active" frame), then the value of the symbol may range from 2 to 5,
+ inclusive, and is decoded using the second PDF in
+ .
translates between the value of the
frame type symbol and the corresponding signal type and quantization offset
type.
@@ -1759,7 +1760,7 @@ Decoding the second stage residual proceeds as follows.
For each coefficient, the decoder reads a symbol using the PDF corresponding to
I1 from either or
, and subtracts 4 from the result
- to given an index in the range -4 to 4, inclusive.
+ to give an index in the range -4 to 4, inclusive.
If the index is either -4 or 4, it reads a second symbol using the PDF in
, and adds the value of this second symbol
to the index, using the same sign.
@@ -2169,9 +2170,10 @@ NLSF_Q15[k] = (cb1_Q8[k]<<7) + (res_Q10[k]<<14)/w_Q9[k] ,
]]>
where the division is exact integer division.
-However, nothing thus far in the reconstruction process, nor in the
- quantization process in the encoder, guarantees that the coefficients are
- monotonically increasing and separated well enough to ensure a stable filter.
+However, nothing in either the reconstruction process or the
+ quantization process in the encoder thus far guarantees that the coefficients
+ are monotonically increasing and separated well enough to ensure a stable
+ filter.
When using the reference encoder, roughly 2% of frames violate this constraint.
The next section describes a stabilization procedure used to make these
guarantees.
@@ -2255,12 +2257,12 @@ center_freq_Q15 = clamp(min_center_Q15[i],
NLSF_Q15[i] = NLSF_Q15[i-1] + NDeltaMin_Q15[i] .
]]>
-Then the procedure repeats again, until it has executed 20 times, or until
- it stops because the coefficients satisfy all the constraints.
+Then the procedure repeats again, until it has either executed 20 times or
+ has stopped because the coefficients satisfy all the constraints.
-After the 20th repetition of the above, the following fallback procedure
- executes once.
+After the 20th repetition of the above procedure, the following fallback
+ procedure executes once.
First, the values of NLSF_Q15[k] for 0 <= k < d_LPC
are sorted in ascending order.
Then for each value of k from 0 to d_LPC-1, NLSF_Q15[k] is set to
@@ -2282,7 +2284,7 @@ min(NLSF_Q15[k], NLSF_Q15[k+1] - NDeltaMin_Q15[k+1]) .
For 20 ms SILK frames, the first half of the frame (i.e., the first two
- sub-frames) may use normalized LSF coefficients that are interpolated between
+ subframes) may use normalized LSF coefficients that are interpolated between
the decoded LSFs for the previous frame and the current frame.
A Q2 interpolation factor follows the LSF coefficient indices in the bitstream,
which is decoded using the PDF in .
@@ -2369,7 +2371,7 @@ The function silk_NLSF2A() (silk_NLSF2A.c) implements this procedure.
To start, it approximates cos(pi*n[k]) using a table lookup with linear
interpolation.
The encoder SHOULD use the inverse of this piecewise linear approximation,
- rather than true the inverse of the cosine function, when deriving the
+ rather than the true inverse of the cosine function, when deriving the
normalized LSF coefficients.
@@ -2528,7 +2530,7 @@ Even floating-point decoders SHOULD perform these steps, to avoid mismatch.
For each round, the process first finds the index k such that abs(a32_Q17[k])
- is the largest, breaking ties by using the lower value of k.
+ is largest, breaking ties by choosing the lowest value of k.
Then, it computes the corresponding Q12 precision value, maxabs_Q12, subject to
an upper bound to avoid overflow in subsequent computations: