Improved SSE version of xcorr_kernel()
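
For context, here is a minimal sketch of the kind of computation an SSE xcorr_kernel() typically performs. It assumes the kernel accumulates four adjacent cross-correlation lags, sum[k] += x[j] * y[j + k] for k = 0..3. That semantics is an assumption and is not stated in this message. The names xcorr_kernel_ref and xcorr_kernel_sse_sketch are hypothetical, and the code is not the implementation this change introduces.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Scalar reference (assumed semantics): sum[k] += x[j] * y[j + k], k = 0..3.
   y must hold at least len + 3 readable floats. */
static void xcorr_kernel_ref(const float *x, const float *y,
                             float sum[4], int len)
{
    assert(len > 0);
    for (int j = 0; j < len; j++)
        for (int k = 0; k < 4; k++)
            sum[k] += x[j] * y[j + k];
}

/* Hypothetical SSE sketch: one 4-wide multiply-add per input sample.
   Falls back to the scalar reference when SSE is not available. */
static void xcorr_kernel_sse_sketch(const float *x, const float *y,
                                    float sum[4], int len)
{
#if defined(__SSE__)
    assert(len > 0);
    __m128 acc = _mm_loadu_ps(sum);          /* running sums for lags 0..3 */
    for (int j = 0; j < len; j++) {
        __m128 xj = _mm_set1_ps(x[j]);       /* broadcast x[j] to all lanes */
        __m128 yj = _mm_loadu_ps(y + j);     /* unaligned load of y[j..j+3] */
        acc = _mm_add_ps(acc, _mm_mul_ps(xj, yj));
    }
    _mm_storeu_ps(sum, acc);
#else
    xcorr_kernel_ref(x, y, sum, len);
#endif
}
```

The sketch favors clarity over speed: it issues one unaligned load of y per sample. A tuned kernel would more likely unroll the loop and reuse loaded y vectors across iterations.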