From ae21d4a4c9b907ae1821cf4cc90781a3492f373a Mon Sep 17 00:00:00 2001
From: "Timothy B. Terriberry"
Date: Fri, 14 Apr 2017 16:57:04 -0700
Subject: [PATCH] Update the coding tools draft.
Describes a way to use a smaller multiply in the dyadic partition
function, and removes the description of the non-dyadic partition
functions, since we don't currently plan to use those, long-term.
---
 doc/ietf/draft-terriberry-netvc-codingtools.xml | 177 +++++++++++------------
 1 file changed, 84 insertions(+), 93 deletions(-)

diff --git a/doc/ietf/draft-terriberry-netvc-codingtools.xml b/doc/ietf/draft-terriberry-netvc-codingtools.xml
index 8e2be5d..0762e5d 100644
--- a/doc/ietf/draft-terriberry-netvc-codingtools.xml
+++ b/doc/ietf/draft-terriberry-netvc-codingtools.xml
@@ -2,7 +2,7 @@

+Coding Tools for a Next Generation Video Codec
@@ -35,7 +35,7 @@

+
ART
netvc
@@ -62,8 +62,8 @@ This draft documents some of the tools we have been working on for inclusion in
such a codec.
This is early work, and the performance of some of these tools (especially in
relation to other approaches) is not yet fully known.
-Nevertheless, it still serves to outline some possibilities an eventual working
- group, if formed, could consider.
+Nevertheless, it still serves to outline some possibilities that NETVC could
+ consider.
@@ -412,106 +412,108 @@ Looking at the graph of the CDF we see that the likelihood for symbol 3
Rather than changing the context modeling, the other approach is to change the
 function used to partition the set of valid code points so that it does not
 need a division, even when ft is not a power of two.
Let the range of valid code points in the current arithmetic coder state be
[L, L + R), where L is the lower bound of the range and R is
the number of valid code points.
Assume that ft <= R < 2*ft (this is easy to enforce
 with the normal rescaling operations used with frequency counts).
Then one possible partition function is
+The goal of the arithmetic coder is to partition this interval proportional to
+ the probability of each symbol.
+When using dyadic probabilities, the partition point in the range corresponding
+ to a given CDF value can be determined via
+
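The artwork carrying this formula is not legible in this patch text. As a hedged illustration (function and parameter names are hypothetical, not from the draft), the dyadic partition point amounts to scaling the CDF value fl[k] by the range size and shifting by T, since ft = 2^T makes the division an exact shift:

```c
#include <stdint.h>

/* Partition point for a dyadic CDF value fl in [0, ft], with ft = 1 << T,
   given the current range size R. Because ft is a power of two, dividing
   the product by ft reduces to a right shift by T bits. */
static uint32_t partition_point(uint32_t R, uint32_t fl, unsigned T) {
  return (R * fl) >> T; /* == (R * fl) / ft with ft = 1 << T */
}
```

With a 16-bit R and T <= 15 the product fits comfortably in 32 bits, so no wider intermediate is needed.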
 so that the new range after coding symbol k is
 [L + r[k], L + r[k+1]).

This is a variation of the partition function proposed
 by .
The size of the new partition (r[k+1] - r[k]) is no longer truly
 proportional to R*p[k].
This partition function counts values of fl[k] smaller than R - ft
 double compared to values larger than R - ft.
This overestimates the probability of symbols at the start of the alphabet
 and underestimates the probability of symbols at the end of the alphabet.
The amount of the range allocated to a symbol can be off by up to a factor of
 2 compared to its fair share, implying a peak error as large as one bit per
 symbol.
However, if the probabilities are accurate and the symbols being coded are
 independent, the average inefficiency introduced will be as low as
 log2(log2(e)*2/e) ~= 0.0861 bits per symbol.
This error can, of course, be reduced by coding fewer symbols with larger
 alphabets.
In practice the overhead is roughly equal to the overhead introduced by other
 approximate arithmetic coders like H.264's CABAC.
However, probabilities near one-half tend to have the most overhead.
In fact, probabilities in the range of 40% to 60% for a binary symbol may not
 be worth modeling, since the compression gains may be entirely countered
 by the added overhead, making it cheaper and faster to code such values as
 raw bits.
This problem is partially alleviated by using larger alphabets.



A slightly more complicated partition function can reduce the overhead while
 still avoiding the division.
This is done by splitting things into two cases:


Case 1: <![CDATA[R - ft > ft - (R - ft)]]>
That is, we have more values that are double-counted than single-counted.
In this case, we still double-count the first 2*R - 3*ft values,
 but after that we alternate between single-counting and double-counting
 for the rest.
+Since ft is a power of two, this may be implemented using a right shift by T
+ bits in place of the division:
+
Case 2: <![CDATA[R - ft <= ft - (R - ft)]]>
That is, we have more values that are single-counted than double-counted.
In this case, we alternate between single-counting and double-counting for
 the first 2*(R - ft) values, and single-count the rest.


For two equiprobable symbols in different places in the alphabet, this
 reduces the maximum ratio of overestimation to underestimation from 2:1
 for the previous partition function to either 4:3 or 3:2 (for each of the
 two cases above, respectively), assuming symbol probabilities significantly
 greater than the minimum possible.
That reduces the worst-case per-symbol overhead from 1 bit to 0.58 bits.
+The latency of the multiply still dominates the hardware timing.
+However, we can reduce this latency by using a smaller multiply, at the cost of
+ some accuracy in the partition.
+We cannot, in general, reduce the size of fl[k], since this might send a
+ probability to zero (i.e., cause u[k] to have the same value as u[k+1]).
+On the other hand, we know that the top bit of R is always 1, since it gets
+ renormalized with every symbol that is encoded.
+Suppose R contains 16 bits and that T is at least 8.
+Then we can greatly reduce the size of the multiply by using the formula
+
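The draft's exact artwork is not reproduced in this patch text; the following is one consistent instantiation of the idea (hypothetical names), sketched under the stated assumptions that R is a 16-bit value with its top bit set and that T is at least 8. Truncating both operands to 8 bits turns the 16xT-bit multiply into an 8x8-bit one:

```c
#include <stdint.h>

/* Reduced-precision partition point: keep only the top 8 bits of R
   (whose leading bit is 1 after renormalization) and the top 8 of the
   T-bit CDF value fl. The k == M special case (fl == ft) pins u to R
   exactly, since the truncated product generally falls short of R. */
static uint32_t small_partition(uint32_t R, uint32_t fl, uint32_t ft,
                                unsigned T) {
  if (fl == ft) return R; /* special case for the last symbol */
  return (R >> 8) * (fl >> (T - 8));
}
```

Note that the truncated product never exceeds ((R >> 8) << 8) <= R, which is exactly why the endpoint needs the special case.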
The resulting reduced-overhead partition function is
+The special case for k == M is required because, with the general
+ formula, u[M] no longer exactly equals R.
+Without the special case we would waste some amount of code space and require
+ the decoder to check for invalid streams.
+This special case slightly inflates the probability of the last symbol.
+Unfortunately, in codecs the usual convention is that the last symbol is the
+ least probable, while the first symbol (e.g., 0) is the most probable.
+That maximizes the coding overhead introduced by this approximation error.
+To minimize it, we instead add all of the accumulated error to the first symbol
+ by using a variation of the above update formula:
Here, e is a value that is greater than 0 in case 1, and 0 in case 2.
This function is about three times as expensive to evaluate as the
 high-overhead version, but still an order of magnitude cheaper than a
 division, since it is composed of very simple operations.
+This also aids the software decoder search, since it can prime the search loop
+ with the special case, instead of needing to check for it on every iteration
+ of the loop.
+It is easier to incorporate into a SIMD search as well.
+It does, however, add two subtractions.
+Since the encoder always operates on the difference between two partition
+ points, the first subtraction (involving R) can be eliminated.
+Similar optimizations can eliminate this subtraction in the decoder by flipping
+ its internal state (measuring the distance of the encoder output from the top
+ of the range instead of the bottom).
+To avoid the other subtraction, we can simply use "inverse CDFs" that natively
+ store ifl[k] = (ft - fl[k]) instead of fl[k].
+This produces the following partition function:
+
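As a hedged sketch consistent with the description above (names hypothetical; the draft's own artwork is not legible in this patch text): storing ifl[k] = ft - fl[k] measures the partition from the top of the range, so the last point (ifl = 0) equals R with no check, and the special case moves to k == 0, where the accumulated error inflates the first symbol's share:

```c
#include <stdint.h>

/* Inverse-CDF partition point, assuming 16-bit R with its top bit set
   and T >= 8. ifl == 0 (the final endpoint) yields u == R without a
   special case; the k == 0 endpoint is pinned to 0, which is the case
   used to prime the decoder's search loop. */
static uint32_t inv_partition(uint32_t R, uint32_t ifl, unsigned T, int k) {
  if (k == 0) return 0; /* accumulated error goes to the first symbol */
  return R - (R >> 8) * (ifl >> (T - 8));
}
```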
In practice it reduces the overhead by about 0.3% of the total bitrate.
It also tends to produce R values with a more uniform distribution compared
 to the high-overhead version, which tends to have peaks in the distribution
 of R at specific values (see for a discussion of this
 effect).
Overall, it makes it more likely that the compression gains from
 probabilities near onehalf are not eliminated by the approximation
 overhead, increasing the number of symbols that can be usefully modeled.
It is an open question whether or not these benefits are worth the increase
 in computational complexity.
+The reduction in hardware latency can be as much as 20%, and the impact on area
+ is even larger.
+The overall software complexity overhead is minimal, and the coding efficiency
+ overhead due to the approximation is about 0.02%.
+We could have achieved the same efficiency by leaving the special case on the
+ last symbol and reversing the alphabet instead of inverting the probabilities.
+However, reversing the alphabet at runtime would have required an extra
+ subtraction (or a table lookup for more general reorderings).
+That may be avoidable in some cases, but only by propagating the reordered
+ alphabet outside of the entropy coding machinery, requiring changes to every
+ coding tool and potentially leading to confusion.
+CDFs, on the other hand, are already a somewhat abstract representation of the
+ underlying probabilities used for computational efficiency reasons.
+Generalizing these to "inverse CDFs" is a straightforward change that only
+ affects probability initialization and adaptation, without impacting the
+ design of other coding tools.


@@ -1276,17 +1278,6 @@ Thanks to Nathan Egge, Gregory Maxwell, and Jean-Marc Valin for their
value="pp. 371-380"/>


Piecewise Integer Mapping for Arithmetic Coding







Low-complexity Hierarchical Lapped Transform for Lossy-to-Lossless

-- 
2.11.0