These functions offer no speed-up over the C versions and were
commented out after the release of 1.3.0. We will now drop them completely.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
In the precompute_partition_info_sums_ function, instead of selecting
64-bit accumulator when the signal bps is larger than 16, revert to the
original approach based on partition size, but make room for a few extra
bits to not overflow with unusual signals where the average residual
magnitude may be larger than bps.
It slightly improves the performance with standard encoding levels and
16-bit files as the 17-bit side channel can still be processed with the
32-bit accumulator and correctly selects the 64-bit accumulator with
very large 16-bit partitions.
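
The selection rule described above can be sketched roughly as follows.
This is an illustration only, not the libFLAC code: the helper names and
the headroom of 4 bits (EXTRA_RESIDUAL_BITS) are assumptions made for the
example.

```c
#include <stdint.h>

/* Assumed headroom, in bits, for residuals larger than the nominal bps. */
enum { EXTRA_RESIDUAL_BITS = 4 };

/* floor(log2(v)) for v > 0; returns 0 for v == 0. */
static unsigned ilog2_u32(uint32_t v)
{
    unsigned l = 0;
    while (v >>= 1)
        l++;
    return l;
}

/* Nonzero when the sum of |residual| over one partition is guaranteed to
 * fit in 32 bits, given the partition size and the sample width. */
static int fits_32bit_accumulator(uint32_t partition_samples, unsigned bps)
{
    return ilog2_u32(partition_samples) + bps + EXTRA_RESIDUAL_BITS < 32;
}

static uint64_t abs_i32(int32_t r)
{
    return (uint64_t)(r < 0 ? -(int64_t)r : (int64_t)r);
}

/* Sum of absolute residuals, using the narrow accumulator when it is safe. */
static uint64_t sum_abs_residual(const int32_t *residual, uint32_t n, unsigned bps)
{
    uint32_t i;
    if (fits_32bit_accumulator(n, bps)) {
        uint32_t acc = 0;
        for (i = 0; i < n; i++)
            acc += (uint32_t)abs_i32(residual[i]);
        return acc;
    } else {
        uint64_t acc = 0;
        for (i = 0; i < n; i++)
            acc += abs_i32(residual[i]);
        return acc;
    }
}
```

With these assumed numbers, a 128-sample partition of a 17-bit side channel
still takes the 32-bit path, while a 4096-sample partition of 16-bit data
falls through to the 64-bit path.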
This is related to commits 6f7ec60c and 187e596e.
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
MSVC6 was not able to cast from a uint64_t to a double and this
commit removes some #ifdef hackery designed to work around this
problem.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Rather than passing the buffer into format_input_() as a FLAC__byte pointer,
pass it as a pointer to a union of three pointers, one each for FLAC__byte,
FLAC__int16 and FLAC__int32.
This should have zero measurable performance impact.
This refactoring is in preparation for fixing the cast-align warning when
compiling on ARM (and possibly others). Testing on stereo 16-bit files
suggests that the difference between the performance of this code and the
old code is negligible (tested only on amd64/linux).
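
A minimal sketch of the idea, assuming a hypothetical union type
sample_buffer_t and helper widen_samples() (neither name is from the FLAC
sources). The caller assigns the correctly typed member, so no
FLAC__byte-to-wider-pointer cast is needed inside the function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the union described above: one pointer per
 * sample width, so the caller picks the correctly typed member. */
typedef union {
    const uint8_t *u8;
    const int16_t *s16;
    const int32_t *s32;
} sample_buffer_t;

/* Widen n samples to 32 bits without any other format conversion.
 * Returns 0 on success, -1 for an unsupported sample width. */
static int widen_samples(const sample_buffer_t *in, unsigned bytes_per_sample,
                         size_t n, int32_t *out)
{
    size_t i;
    switch (bytes_per_sample) {
    case 1:
        for (i = 0; i < n; i++)
            out[i] = in->u8[i];
        return 0;
    case 2:
        for (i = 0; i < n; i++)
            out[i] = in->s16[i];
        return 0;
    case 4:
        for (i = 0; i < n; i++)
            out[i] = in->s32[i];
        return 0;
    default:
        return -1;
    }
}
```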
Restore a FLAC__ASSERT() to the bitmath functions FLAC__bitmath_ilog2 and
FLAC__bitmath_ilog2_wide. This prevents the return of an
"undefined" value.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
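
For illustration, an integer log2 guarded by an assertion might look like
the sketch below. It uses the standard assert() and a portable loop rather
than FLAC__ASSERT() or a compiler builtin, and the function name
ilog2_checked is hypothetical. The point is only that log2(0) has no
meaningful result, so the zero case is rejected up front.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)); the assertion documents and enforces v > 0. */
static unsigned ilog2_checked(uint32_t v)
{
    unsigned l = 0;
    assert(v > 0);
    while (v >>= 1)
        l++;
    return l;
}
```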
Previously, the files lpc_intrin_sse2.c and lpc_intrin_sse41.c both defined
macros RESIDUAL_RESULT and DATA_RESULT. This situation made it impossible
to merge these files which we may do at some stage.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
According to docs, it's incorrect to just call CPUID with EAX=1.
One must ensure that this value is supported.
CPUs that don't support CPUID level 1 are very old, but...
if FLAC tests CPUID presence it should also test CPUID level support.
Also, the function FLAC__cpu_have_cpuid_asm_ia32 was simplified
according to the docs on Intel's website and on Wikipedia.
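
The check can be sketched in C as shown below. This is not the FLAC asm
code: it assumes a GCC-compatible compiler that ships <cpuid.h>, and the
function name cpu_has_sse2 is made up for the example. On other compilers
or architectures the sketch simply reports that the feature is absent.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID_H 1
#endif

/* Returns 1 if SSE2 is reported by CPUID leaf 1, otherwise 0. */
static int cpu_has_sse2(void)
{
#ifdef SKETCH_HAVE_CPUID_H
    unsigned eax, ebx, ecx, edx;

    /* Highest supported basic leaf; 0 if CPUID itself is unavailable. */
    if (__get_cpuid_max(0, 0) < 1)
        return 0;

    /* Leaf 1 is known to be supported, so querying it is now valid. */
    __cpuid(1, eax, ebx, ecx, edx);
    (void)eax; (void)ebx; (void)ecx;
    return (int)((edx >> 26) & 1u); /* EDX bit 26 is the SSE2 flag */
#else
    return 0;
#endif
}
```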
Patch-from: lvqcl <lvqcl.mail@gmail.com>
These mostly arose from -Wenum-conversion, where an enum of
one type was being assigned to a variable of another.
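
As a hypothetical illustration (these enum names are invented and are not
FLAC types), the usual fix is to map one enum to the other explicitly
instead of assigning across types:

```c
/* Two unrelated status enums whose numeric values do not line up. */
typedef enum { DECODER_OK, DECODER_ERROR } decoder_status_t;
typedef enum { ENCODER_OK, ENCODER_IO_ERROR, ENCODER_DECODER_ERROR } encoder_status_t;

/* Explicit mapping. A plain assignment would compile, but it would turn
 * DECODER_ERROR (1) into ENCODER_IO_ERROR (1) and draw the warning. */
static encoder_status_t encoder_status_from_decoder(decoder_status_t s)
{
    switch (s) {
    case DECODER_OK:
        return ENCODER_OK;
    case DECODER_ERROR:
        return ENCODER_DECODER_ERROR;
    }
    return ENCODER_DECODER_ERROR;
}
```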
Originally reported by Lenny Maiorani <lenny@colorado.edu> on the
flac-dev mailing list.
Includes:
* Replace 'CALLBACK' with 'WINAPI' because the signature of an unhandled
exception filter uses 'WINAPI'.
* Improvements to OS SSE testing code.
* Improvements to GCC asm code.
* Comment fixes.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Previously, if a zero-length string was passed in, the pointer would be
stored regardless of the copy parameter. If the original source pointer
was reassigned to something else, bad things could happen.
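
The intended behaviour can be sketched like this, assuming a hypothetical
entry_t struct and entry_set() helper rather than the real metadata API.
When copy is requested, the stored pointer is always a fresh allocation,
even for a zero-length string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-prefixed string entry. */
typedef struct {
    size_t length;
    unsigned char *data;
} entry_t;

/* Store src in dst. With copy != 0 the entry owns a private, NUL-terminated
 * copy; with copy == 0 it takes ownership of src itself.
 * Returns 1 on success, 0 on allocation failure. */
static int entry_set(entry_t *dst, unsigned char *src, size_t length, int copy)
{
    unsigned char *buf;

    if (copy) {
        buf = (unsigned char *)malloc(length + 1);
        if (buf == NULL)
            return 0;
        if (length > 0)
            memcpy(buf, src, length);
        buf[length] = '\0'; /* the zero-length case still gets its own buffer */
    } else {
        buf = src;
    }

    free(dst->data);
    dst->data = buf;
    dst->length = length;
    return 1;
}
```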
Closes: https://sourceforge.net/p/flac/bugs/377/
This reverts commit 70b078cfd5.
The code in the patch we're reverting probably only works for one
compiler and could easily stop working with the next release of
that compiler.
The x86 FPU holds intermediate results in larger registers than what
the SSE unit uses, resulting in slightly different encodings of audio
data. Attempt to fix this by modifying libFLAC/lpc.c to store each calculation
result in a FLAC__real before adding it to a sum.
At the moment this works, but I could easily imagine a new version of
the compiler optimising this store to the FLAC__real away leaving us
in the same situation we have now.
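
The kind of change described might look like the sketch below, which uses
float in place of FLAC__real and a hypothetical function name. As noted
above, nothing in the language forces the compiler to keep the store, so
this illustrates the approach and does not guarantee identical results
across FPU and SSE builds.

```c
/* Dot product that rounds each product to float before accumulating. */
static float dot_rounded(const float *a, const float *b, unsigned n)
{
    float sum = 0.0f;
    unsigned i;
    for (i = 0; i < n; i++) {
        /* Assigning to a float-typed temporary is meant to discard any
         * extra precision held in a wider x87 register. An optimising
         * compiler may still elide this store. */
        float product = a[i] * b[i];
        sum += product;
    }
    return sum;
}
```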
Patch-from: Oliver Stöneberg on sourceforge.net
Closes: https://sourceforge.net/p/flac/bugs/409/
CPU detection used to depend on ASM code. Now CPU features are
also detected when only FLAC__HAS_X86INTRIN is defined.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
More thorough en-/decoding tests show that sometimes the functions
that use intrinsics are slower (or not really faster) than the old
plain C functions.
After this patch the encoder doesn't use these new functions
when their usefulness is questionable.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
The new functions are analogous to FLAC__lpc_restore_signal_asm_ia32_mmx.
FLAC uses them for x86-64 arch and also for ia32 if NASM is not available.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
According to Agner Fog in optimizing_assembly.pdf:
"... write to a partial register may result in false dependencies
between instructions, so it is better to avoid it."
Patch-from: lvqcl <lvqcl.mail@gmail.com>
GCC generates slow ia32 code for FLAC__lpc_restore_signal_wide() and
FLAC__lpc_compute_residual_from_qlp_coefficients_wide() so 24-bit
encoding/decoding is slower in GCC builds than in MSVS or ICC
builds. This patch adds ia32 asm versions of these functions.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
According to Agner Fog, "...you must make sure that all calls
are matched with returns. Never jump out of a subroutine without
a return and never use a return as an indirect jump."
(see paragraph 3.15 in microarchitecture.pdf and
examples 3.5a and 3.5b in optimizing_assembly.pdf)
Patch-from: lvqcl <lvqcl.mail@gmail.com>