More thorough en-/decoding tests show that the functions that use
intrinsics are sometimes slower (or not really faster) than the old
plain C functions.
After this patch the encoder doesn't use these new functions
when their usefulness is questionable.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
GCC generates slow ia32 code for FLAC__lpc_restore_signal_wide() and
FLAC__lpc_compute_residual_from_qlp_coefficients_wide(), so 24-bit
encoding/decoding is slower in GCC builds than in MSVS or ICC builds.
This patch adds ia32 asm versions of these functions.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Most non-static functions have the FLAC__ prefix, but it was missing
from the precompute_partition_info_sums_* functions.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Restrict works very poorly in Visual Studio (much slower than without),
so define flac_restrict in share/compat.h and use that in:
lpc_compute_residual...()
lpc_restore_signal...()
As a result, FLAC__lpc_compute_residual_from_qlp_coefficients_wide_intrin_sse41()
offers no advantage for 64-bit compiles and was removed from the x86-64
part of stream_encoder.c
Patch-from: lvqcl <lvqcl.mail@gmail.com>
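The flac_restrict approach described above can be sketched roughly as
follows. This is a minimal illustration, not the actual contents of
share/compat.h: the preprocessor conditions and the simplified
compute_residual() helper are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed mapping: expand to nothing where restrict hurts performance,
 * and to the C99 keyword elsewhere. The real macro may differ. */
#if defined(_MSC_VER)
#  define flac_restrict
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define flac_restrict restrict
#else
#  define flac_restrict
#endif

/* Hypothetical, simplified residual computation:
 * residual[i - order] = signal[i] - (prediction >> shift),
 * where prediction = sum_j qlp[j] * signal[i - j - 1].
 * The right shift of a negative sum is implementation-defined in C;
 * an arithmetic shift is assumed here. */
static void compute_residual(const int32_t * flac_restrict signal, size_t len,
                             const int32_t * flac_restrict qlp, unsigned order,
                             int shift, int32_t * flac_restrict residual)
{
    size_t i;
    unsigned j;
    for (i = order; i < len; i++) {
        int64_t sum = 0;
        for (j = 0; j < order; j++)
            sum += (int64_t)qlp[j] * signal[i - j - 1];
        residual[i - order] = signal[i] - (int32_t)(sum >> shift);
    }
}
```

Routing the qualifier through a macro keeps the function signatures
identical for every compiler; only the macro definition changes.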
* Split lpc_x86intrin.c into lpc_intrin_sse.c and lpc_intrin_sse2.c
* Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
function to lpc_intrin_sse2.c
* Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
(useful for 24-bit en-/decoding)
* Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
disable precompute_partition_info_sums_32bit_asm_ia32_().
The SSE2 version uses 4 SSE2 instructions instead of 1 SSSE3 instruction
(PABSD), so it is slightly slower.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
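To illustrate why an absolute value costs more without PABSD: a common
way to emulate it is a sign mask followed by XOR and subtract. The
scalar sketch below shows what such a sequence computes for one 32-bit
lane. The exact instructions used in the SSE2 function are not shown in
this message, so this is an assumption about the general technique, and
abs_via_mask() is a hypothetical name.

```c
#include <stdint.h>

/* Per-lane equivalent of a sign-mask absolute value:
 * mask is all ones for negative x (what an arithmetic shift right
 * by 31 would yield), all zeros otherwise; (x ^ mask) - mask then
 * negates negative values and leaves the others unchanged.
 * Unsigned arithmetic avoids undefined behaviour for INT32_MIN. */
static uint32_t abs_via_mask(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t mask = (x < 0) ? 0xFFFFFFFFu : 0u;
    return (ux ^ mask) - mask;
}
```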
Before this patch it was possible to set or get the data.ia32.sse3 value
from x86-64 code, etc., which is a potential source of errors.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
For the 32-bit x86 ASM functions there were already versions of this
function for lags (N = 4, 8, 12). They require lpc_order less than N.
The best compression preset (flac -8) uses lpc_order up to 12, which
means that during encoding FLAC also uses the unaccelerated C function.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
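The lag constraint above can be sketched as follows: a plain C
autocorrelation, plus a selector that picks the smallest fixed-lag
variant with lpc_order < N. Both function names are hypothetical and
the selector only mirrors the rule stated in the message; it is not
the dispatch code from libFLAC.

```c
#include <stddef.h>

/* Plain C reference:
 * autoc[l] = sum over i = l .. len-1 of data[i] * data[i - l], for l < lag. */
static void autocorrelation_c(const float *data, size_t len,
                              unsigned lag, float *autoc)
{
    unsigned l;
    size_t i;
    for (l = 0; l < lag; l++) {
        float sum = 0.0f;
        for (i = l; i < len; i++)
            sum += data[i] * data[i - l];
        autoc[l] = sum;
    }
}

/* Hypothetical selector: returns the smallest N in {4, 8, 12} with
 * lpc_order < N, or 0 when no fixed-lag variant fits and the plain
 * C function has to be used. */
static unsigned pick_lag_variant(unsigned lpc_order)
{
    static const unsigned variants[] = { 4, 8, 12 };
    size_t k;
    for (k = 0; k < sizeof variants / sizeof variants[0]; k++)
        if (lpc_order < variants[k])
            return variants[k];
    return 0;
}
```

With lpc_order = 12 the selector returns 0, which is the case the
message describes for flac -8.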
The previous fix (patch 6f7ec60c) had the undesirable effect of slowing
down encoding on 16-bit files, where the arithmetic overflow was
less likely to happen.
This fix forces the use of a FLAC__uint64 accumulator for 24-bit files
and restores the use of a FLAC__uint32 accumulator for files of 16 bits
or fewer.
Unfortunately, I have not been able to prove to myself that this overflow
*cannot* happen with 16 bit files.
For a specific 24 bit WAV file provided by Leigh Dyer
http://lists.xiph.org/pipermail/flac-dev/2013-July/004284.html
encoding with compression level 7 was generating a file a couple of
orders of magnitude larger than the original.
Debugging showed that variable abs_residual_partition_sum (a FLAC__uint32)
in function precompute_partition_info_sums_() was suffering from
arithmetic overflow on some 24-bit input files, although this overflow
did not always cause larger output files.
Since the value abs_residual_partition_sum is eventually stored in an
array of FLAC__uint64, it makes sense to make abs_residual_partition_sum
a FLAC__uint64 anyway.
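The overflow can be reproduced in isolation. The sketch below sums
absolute residuals with a 32-bit and a 64-bit accumulator; the
function names and the test values are illustrative assumptions, not
code from stream_encoder.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Absolute value in unsigned arithmetic, safe for INT32_MIN. */
static uint32_t abs_u32(int32_t r)
{
    return (r < 0) ? 0u - (uint32_t)r : (uint32_t)r;
}

/* 32-bit accumulator: silently wraps modulo 2^32. */
static uint32_t sum_abs_u32(const int32_t *residual, size_t n)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += abs_u32(residual[i]);
    return sum;
}

/* 64-bit accumulator: has headroom for any realistic partition. */
static uint64_t sum_abs_u64(const int32_t *residual, size_t n)
{
    uint64_t sum = 0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += abs_u32(residual[i]);
    return sum;
}
```

For example, 512 residuals of magnitude 2^23 sum to 2^32, which the
32-bit accumulator reports as 0 while the 64-bit one returns
4294967296.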
Debugging this problem was made easier by use of the Clang compiler's
-fsanitize=integer option.
The files src/flac/encode.c and src/libFLAC/stream_encoder.c use
functions in libFLAC that are marked as 'unpublished debug routines'.
This patch moves these functions to a new file include/share/private.h
and marks them as 'unpublished debug routines'.
The problem was that the function safe_malloc_mul_2op_() was originally
defined as static inline in include/share/alloc.h but had to be moved
because GCC was refusing to inline it. Once moved, however, static linking
would fail when building the flac executable because the function ended
up being linked twice.
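For context, a header-only helper of this kind might look like the
sketch below. It assumes, from the name only, that the function is an
overflow-checked malloc of size1 * size2; checked_malloc_mul2() is a
hypothetical stand-in, not the libFLAC definition. Declaring it static
inline gives every translation unit its own private copy, so no
duplicate external symbol can appear at link time.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical overflow-checked allocation of size1 * size2 bytes.
 * Returns NULL if the product would not fit in size_t. */
static inline void *checked_malloc_mul2(size_t size1, size_t size2)
{
    if (size1 == 0 || size2 == 0)
        return malloc(1);            /* avoid a zero-size malloc */
    if (size1 > SIZE_MAX / size2)
        return NULL;                 /* product would overflow */
    return malloc(size1 * size2);
}
```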
- INCLUDES is deprecated, and CPPFLAGS is a user-defined
variable; use the proper AM_CPPFLAGS instead
- Remove FLAC__INLINE definition, providing proper
replacement for MSVC compilers.
- Detect if we have C99's lround and provide a replacement
for Windows...
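A replacement for lround can be as small as the sketch below. The name
fallback_lround() is hypothetical and this is not necessarily the
replacement added by the patch; it rounds halfway cases away from zero
as C99 lround does, but makes no attempt to handle values outside the
range of long or inputs just below 0.5 that round up in floating-point
addition.

```c
/* Hypothetical fallback for platforms without C99 lround():
 * add or subtract 0.5, then let the cast truncate toward zero. */
static long fallback_lround(double x)
{
    return (long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}
```

A build system would typically define a feature macro when lround is
available and select this fallback only when it is not.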