Optimize fixed-point celt_inner_prod() and dual_inner_prod() for ARM NEON