According to Agner Fog in optimizing_assembly.pdf:
"... write to a partial register may result in false dependencies
between instructions, so it is better to avoid it."
Patch-from: lvqcl <lvqcl.mail@gmail.com>
GCC generates slow ia32 code for FLAC__lpc_restore_signal_wide() and
FLAC__lpc_compute_residual_from_qlp_coefficients_wide(), so 24-bit
encoding/decoding is slower in GCC builds than in MSVS or ICC
builds. This patch adds ia32 asm versions of these functions.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
According to Agner Fog, "...you must make sure that all calls
are matched with returns. Never jump out of a subroutine without
a return and never use a return as an indirect jump."
(see paragraph 3.15 in microarchitecture.pdf and
examples 3.5a and 3.5b in optimizing_assembly.pdf)
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Most non-static functions have FLAC__ prefix, but they were missing
from the precompute_partition_info_sums_* functions.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Besides SPE (FSL e500v? cores) there are other powerpc processors
that don't support altivec instructions, so enable them only when it is
certain that the target supports them.
Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
Restrict works very poorly in Visual Studio (much slower than without),
so define flac_restrict in share/compat.h and use that in:
lpc_compute_residual...()
lpc_restore_signal...()
As a result, FLAC__lpc_compute_residual_from_qlp_coefficients_wide_intrin_sse41()
offers no advantage for 64-bit compiles and was removed from x86-64 part
of stream_encoder.c
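
A minimal sketch of how such a macro might look. The compiler checks
below are assumptions (the exact conditions in share/compat.h are not
given in this message), and add_offset() is a hypothetical function used
only to show where the qualifier goes:

```c
/* Assumed compiler checks; the real conditions in share/compat.h may differ. */
#if defined(_MSC_VER)
#define flac_restrict              /* expands to nothing, avoiding the slow code */
#elif defined(__GNUC__)
#define flac_restrict __restrict__
#else
#define flac_restrict
#endif

/* Hypothetical helper: the caller promises that dst and src do not alias. */
static void add_offset(int * flac_restrict dst, const int * flac_restrict src,
                       int n, int offset)
{
	int i;
	for (i = 0; i < n; i++)
		dst[i] = src[i] + offset;
}
```

Because the qualifier is hidden behind a macro, the function signatures
stay the same on every compiler and only the macro definition changes.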
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Replaces
OutputDirectory="..\..\..\..\objs\debug\bin"
with
OutputDirectory="$(SolutionDir)objs\$(ConfigurationName)\bin"
and so on.
Removes
OutputFile="..\..\objs\debug\lib\$(ProjectName).lib"
when possible.
Also, in the current version the "Whole program optimization" compiler
option is set, but the corresponding linker option isn't. From MSDN:
"If you do not explicitly specify /LTCG when you pass /GL or MSIL modules
to the linker, the linker eventually detects this and restarts the link
by using /LTCG. Explicitly specify /LTCG when you pass /GL and MSIL modules
to the linker for the fastest possible build performance."
So the /LTCG option was added too.
Debug build now uses libogg_static.lib from .\objs\debug\lib folder.
(the dependency for both release and debug is
objs\$(ConfigurationName)\lib\libogg_static.lib)
Patch-from: lvqcl <lvqcl.mail@gmail.com>
* Split lpc_x86intrin.c into lpc_intrin_sse.c and lpc_intrin_sse2.c
* Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
function to lpc_intrin_sse2.c
* Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
(useful for 24-bit en-/decoding)
* Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
disable precompute_partition_info_sums_32bit_asm_ia32_().
The SSE2 version uses 4 SSE2 instructions instead of the single SSSE3
instruction PABSD, so it is slightly slower.
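
This message does not list the four SSE2 instructions. One common way to
compute an absolute value without PABSD is the sign-mask idiom, shown
here in scalar C as an illustration only (it is not the code from the
patch, and both function names are hypothetical):

```c
#include <stdint.h>

/* Branch-free absolute value: (x ^ mask) - mask, where mask is all ones
 * for negative x and zero otherwise. Unsigned arithmetic keeps the
 * result well defined even for INT32_MIN. */
static uint32_t abs_via_sign_mask(int32_t x)
{
	uint32_t u = (uint32_t)x;
	uint32_t mask = 0u - (u >> 31);   /* 0xFFFFFFFF if x < 0, else 0 */
	return (u ^ mask) - mask;
}

/* Hypothetical helper: sum of |residual[i]| over one partition. */
static uint64_t sum_abs(const int32_t *residual, unsigned n)
{
	uint64_t sum = 0;
	unsigned i;
	for (i = 0; i < n; i++)
		sum += abs_via_sign_mask(residual[i]);
	return sum;
}
```

A vectorized version would apply the same steps to four 32-bit lanes at
once, which is where the extra instructions compared to a single PABSD
come from.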
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Before this patch it was possible to set or get the data.ia32.sse3 value
from x86-64 code, etc., which is a potential source of errors.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
A preprocessor macro FLAC__ALIGN_MALLOC_DATA is defined in the Makefiles
but absent in *.vcproj files. This patch adds it to libFLAC_static.vcproj
and libFLAC_dynamic.vcproj.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
For the 32 bit x86 ASM functions there were already versions of this
function for lags (N = 4, 8, 12). They require lpc_order less than N.
The best compression preset (flac -8) uses lpc_order up to 12, which
means that during encoding FLAC also uses the unaccelerated C function.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
The previous fix (patch 6f7ec60c) had the undesirable effect of slowing
down encoding speed on 16 bit files where the arithmetic overflow was
less likely to happen.
This fix forces the use of a FLAC__uint64 accumulator for 24 bit files
and restores the use of a FLAC__uint32 accumulator for files of 16 bits
or less.
Unfortunately, I have not been able to prove to myself that this overflow
*cannot* happen with 16 bit files.
For a specific 24 bit WAV file provided by Leigh Dyer
http://lists.xiph.org/pipermail/flac-dev/2013-July/004284.html
encoding with compression level 7 was generating a file a couple of
orders of magnitude larger than the original.
Debugging showed that the variable abs_residual_partition_sum (a
FLAC__uint32) in function precompute_partition_info_sums_() was suffering
from an arithmetic overflow on some 24 bit input files, although this
overflow did not always cause larger output files.
Since the value abs_residual_partition_sum is eventually stored in an
array of FLAC__uint64, it makes sense to make abs_residual_partition_sum
a FLAC__uint64 anyway.
Debugging this problem was made easier by use of the Clang compiler's
-fsanitize=integer option.
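
A minimal sketch of the failure mode, using hypothetical helper functions
rather than the libFLAC code. With residuals of magnitude 2^23, 512
samples already add up to 2^32, so a 32-bit accumulator wraps around
while a 64-bit one does not:

```c
#include <stdint.h>

/* Hypothetical helper: magnitude of one residual, computed in 64 bits. */
static uint64_t residual_magnitude(int32_t r)
{
	return (uint64_t)(r < 0 ? -(int64_t)r : (int64_t)r);
}

/* 32-bit accumulator: silently wraps modulo 2^32. */
static uint32_t partition_sum_u32(const int32_t *residual, unsigned n)
{
	uint32_t sum = 0;
	unsigned i;
	for (i = 0; i < n; i++)
		sum += (uint32_t)residual_magnitude(residual[i]);
	return sum;
}

/* 64-bit accumulator: large enough for any realistic partition. */
static uint64_t partition_sum_u64(const int32_t *residual, unsigned n)
{
	uint64_t sum = 0;
	unsigned i;
	for (i = 0; i < n; i++)
		sum += residual_magnitude(residual[i]);
	return sum;
}
```

Unsigned wraparound is well defined in C, so the compiler gives no
warning by default; this is why a runtime check such as Clang's
-fsanitize=integer helps to locate it.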
Don't use the assembly function since it seems to be slower than
the current version of FLAC__bitreader_read_rice_signed_block.
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
Use Benjamin Stiglitz' MIN macros from gcc 4.3 (according to the
changelog, __COUNTER__ was introduced in this version). Previously,
the macros weren't used on any existing gcc version; the first one
would have been 5.5.
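
A sketch of a version test that accepts gcc 4.3 and everything newer,
together with a MIN macro of the kind described. All EXAMPLE_ names are
hypothetical and the macro body is an assumption about the general
technique, not a copy of the macros in the tree:

```c
/* Accept any major version above 4, or 4.x with x >= 3. Comparing the
 * major and minor numbers independently would reject newer releases
 * that have a small minor number. */
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#define EXAMPLE_HAVE_COUNTER 1
#else
#define EXAMPLE_HAVE_COUNTER 0
#endif

#if EXAMPLE_HAVE_COUNTER
#define EXAMPLE_CONCAT_(x, y) x##y
#define EXAMPLE_CONCAT(x, y) EXAMPLE_CONCAT_(x, y)
/* __COUNTER__ gives each expansion unique temporary names, so nested
 * uses do not shadow each other; each argument is evaluated once. */
#define EXAMPLE_MIN_(a, b, id) __extension__ ({ \
	__typeof__(a) EXAMPLE_CONCAT(min_a_, id) = (a); \
	__typeof__(b) EXAMPLE_CONCAT(min_b_, id) = (b); \
	EXAMPLE_CONCAT(min_a_, id) < EXAMPLE_CONCAT(min_b_, id) \
		? EXAMPLE_CONCAT(min_a_, id) : EXAMPLE_CONCAT(min_b_, id); })
#define EXAMPLE_MIN(a, b) EXAMPLE_MIN_(a, b, __COUNTER__)
#else
/* Portable fallback: may evaluate its arguments twice. */
#define EXAMPLE_MIN(a, b) ((a) < (b) ? (a) : (b))
#endif
```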
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
Commits a7e3705d05 and a4c321e492, while trying to simplify how
the FLAC_API_SUPPORTS_OGG_FLAC global variable was initialized,
inadvertently caused it to be always set to false, whether Ogg support
was compiled in or not.
This commit reverts the relevant part to how it looked in the 1.2.1
release, which is verbose but correct.
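
A sketch of what a "verbose but correct" initialization can look like.
The variable name is a hypothetical stand-in, and it is assumed here that
the build system defines FLAC__HAS_OGG as 0 or 1:

```c
/* Assumption: the build system sets FLAC__HAS_OGG to 0 or 1. */
#ifndef FLAC__HAS_OGG
#define FLAC__HAS_OGG 0
#endif

/* Hypothetical stand-in for FLAC_API_SUPPORTS_OGG_FLAC: the value is
 * chosen by the preprocessor, so it cannot depend on how the macro
 * would expand inside an expression. */
const int example_supports_ogg_flac =
#if FLAC__HAS_OGG
	1
#else
	0
#endif
;
```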
The problem was found by Robert Kausch <robert.kausch@freac.org>.
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
The smaller patch makes the utf-8 library use the ANSI codepage by
default. When frontends call the "get_utf8_argv" function, it
changes the Unicode conversion codepage from ANSI to UTF-8.
Patch from Janne Hyvärinen <cse@sci.fi>.
Libraries that are used internally by libFLAC(++) but are not part of
their API should be listed in pkg-config "private" clauses. Otherwise
executables that are linked dynamically against libFLAC(++) will have
unneeded direct dependencies (overlinking).
Based on a patch by Brad Smith from
https://sourceforge.net/p/flac/bugs/397/
that I updated to only include ogg if libFLAC is actually built with
ogg support.
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>