Commit Graph

982 Commits

Author SHA1 Message Date
Erik de Castro Lopo
ace63cc828 stream_encoder.c : ifdef cleanup.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-02-25 18:38:20 +11:00
Erik de Castro Lopo
b334fb2a5c Fix typos in comments.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-02-24 21:47:20 +11:00
Erik de Castro Lopo
cf0e42ae6e Don't use intrinsics when they are slower.
More thorough en-/decoding tests show that the functions that use
intrinsics are sometimes slower (or not really faster) than the old
plain C functions.

After this patch the encoder doesn't use these new functions
when their usefulness is questionable.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-02-24 21:46:05 +11:00
Erik de Castro Lopo
71c9555366 bitmath.h : Fixes for FLAC__bitmath_ilog2_wide().
Existing version had a number of problems:
1) it didn't compile with MSVS
2) it returned correct results only when compiled with GNUC
3) it mentioned LGPL which isn't good for a BSD-licensed library

LGPL -> BSD issue documented here:
http://lists.xiph.org/pipermail/flac-dev/2013-September/004356.html

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-02-02 10:42:20 +11:00
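To illustrate the kind of function this commit touches, a 64-bit integer log2 might be sketched as below. This is not the libFLAC code: the name ilog2_wide_sketch is hypothetical, and the builtin branch assumes GCC or Clang with a 64-bit unsigned long long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not the libFLAC implementation:
 * returns floor(log2(v)) for v > 0. */
static unsigned ilog2_wide_sketch(uint64_t v)
{
    assert(v > 0); /* log2(0) is undefined */
#if defined(__GNUC__) || defined(__clang__)
    /* __builtin_clzll counts the leading zero bits of an
     * unsigned long long (assumed to be 64 bits wide here). */
    return 63u - (unsigned)__builtin_clzll(v);
#else
    /* Portable fallback: shift right until only the top bit is gone. */
    unsigned l = 0;
    while (v >>= 1)
        l++;
    return l;
#endif
}
```

Keeping a plain C fallback next to the compiler-specific branch is one way to avoid the "compiles only with one toolchain" problem listed in the message.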
Erik de Castro Lopo
26b9546149 Add sse2 intrinsics code for lpc_restore_signal_...()
The new functions are analogous to FLAC__lpc_restore_signal_asm_ia32_mmx.
FLAC uses them for x86-64 arch and also for ia32 if NASM is not available.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-02-02 08:55:56 +11:00
Erik de Castro Lopo
d163ef4567 libFLAC/stream_encoder.c : Fall back to intrinsics if NASM is not available.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-02-01 20:34:55 +11:00
Erik de Castro Lopo
59cfca0030 stream_encoder : Remove un-needed conversion from __m128i to FLAC__uint64.
Encoding speed slightly increased (1...2% for FLAC -8).

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-31 20:54:59 +11:00
Erik de Castro Lopo
4618512de2 Add a fast shift for int64 values.
This patch changes the code from:
	(FLAC__int32)(xmm.m128i_i64[0] >> lp_quantization)
into:
	_mm_cvtsi128_si32(_mm_srli_epi64(xmm, lp_quantization));

Encoding of 24-bit .wav files with 32-bit FLAC became noticeably faster.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-31 20:36:23 +11:00
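The replacement described above can be tried in isolation with a small sketch. It assumes an x86 target with SSE2 available; the helper name shift_low32_sse2 is hypothetical and not part of libFLAC. The logical shift gives the same low 32 bits as an arithmetic shift as long as the shift count does not exceed 32, which the assertion enforces.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: low 32 bits of (value >> shift), computed
 * entirely in an SSE2 register. */
static int32_t shift_low32_sse2(int64_t value, int shift)
{
    assert(shift >= 0 && shift <= 32);
    /* Load the 64-bit value into the low half of an XMM register. */
    __m128i xmm = _mm_loadl_epi64((const __m128i *)&value);
    /* Logical right shift of each 64-bit lane, then extract the
     * lowest 32 bits as a signed int. */
    return _mm_cvtsi128_si32(_mm_srli_epi64(xmm, shift));
}
```

For example, shift_low32_sse2(-1024, 3) yields -128, the same value as (int32_t)(-1024 >> 3) on compilers that implement >> on signed values as an arithmetic shift.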
Erik de Castro Lopo
a03999f570 lpc_intrin_sse2.c : Add RESIDUAL16_RESULT macro.
RESIDUAL16_RESULT is analogous to the existing RESIDUAL_RESULT macro
and simplifies the code a little.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-30 22:17:08 +11:00
Erik de Castro Lopo
1d920993f1 Remove redundant inline macro def.
The inline macro already exists in include/share/compat.h.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-30 21:57:21 +11:00
Erik de Castro Lopo
57297eea26 Add __INTEL_COMPILER to _MSC_VER #ifdefs.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-30 21:53:41 +11:00
Erik de Castro Lopo
d40e986a1e Add FLAC__SSE_SUPPORTED and FLAC__SSE2_SUPPORTED flags.
* Allow compiling using GCC w/o SSE support.
* Allow SSE4.1 intrinsic functions to be enabled.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-30 21:49:55 +11:00
Erik de Castro Lopo
c2747bec1c lpc_asm.nasm : More 'mov cl' -> 'mov ecx' fixes.
According to Agner Fog in optimizing_assembly.pdf:

  "... write to a partial register may result in false dependencies
   between instructions, so it is better to avoid it."

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-18 07:55:19 +11:00
Erik de Castro Lopo
7e9278934e libFLAC : Add asm versions for two _wide() functions.
GCC generates slow ia32 code for FLAC__lpc_restore_signal_wide() and
FLAC__lpc_compute_residual_from_qlp_coefficients_wide(), so 24-bit
encoding/decoding is slower in GCC builds than in MSVS or ICC
builds. This patch adds ia32 asm versions of these functions.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-07 21:35:08 +11:00
Erik de Castro Lopo
8e4a45ac86 libFLAC/ia32/lpc_asm.nasm : Match calls and returns.
According to Agner Fog, "...you must make sure that all calls
are matched with returns. Never jump out of a subroutine without
a return and never use a return as an indirect jump."

(see paragraph 3.15 in microarchitecture.pdf and
examples 3.5a and 3.5b in optimizing_assembly.pdf)

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-07 21:27:09 +11:00
Erik de Castro Lopo
6cd8b42438 Add FLAC__ prefix to precompute_partition_info_sums....
Most non-static functions have FLAC__ prefix, but they were missing
from the precompute_partition_info_sums_* functions.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-07 21:27:00 +11:00
Gustavo Zacarias
d65ede3e87 Fix Makefile.am altivec logic
Besides SPE (FSL e500v? cores) there are other powerpc processors
that don't support altivec instructions, so only enable them when it is
certain that the target supports them.

Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
2013-12-20 05:57:33 +11:00
Erik de Castro Lopo
64f34e6e99 libFLAC/stream_encoder.c : Fix MSVS profiler hot spot.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-10-10 21:32:07 +11:00
Erik de Castro Lopo
cf28c0144b Adds use of restrict keyword to improve encoding speed.
Restrict works very poorly in Visual Studio (much slower than without),
so flac_restrict is defined in share/compat.h and used in:

    lpc_compute_residual...()
    lpc_restore_signal...()

As a result, FLAC__lpc_compute_residual_from_qlp_coefficients_wide_intrin_sse41()
offers no advantage for 64-bit compiles and was removed from x86-64 part
of stream_encoder.c

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-10-10 18:24:19 +11:00
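A compiler-dependent restrict macro of the kind described in this message might look like the sketch below. The macro name my_restrict and the function first_order_residual are hypothetical stand-ins, not the definitions in share/compat.h or lpc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro modeled on the flac_restrict idea: expands to a
 * restrict qualifier, except on MSVC where the commit reports that
 * restrict makes the code slower. */
#if defined(_MSC_VER)
#define my_restrict
#elif defined(__GNUC__)
#define my_restrict __restrict__
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define my_restrict restrict
#else
#define my_restrict
#endif

/* Illustrative first-order residual: residual[i-1] = data[i] - data[i-1].
 * The qualifier promises the compiler that the two buffers do not
 * overlap, which can allow better scheduling and vectorization. */
static void first_order_residual(const int32_t *my_restrict data, size_t n,
                                 int32_t *my_restrict residual)
{
    size_t i;
    assert(data != NULL && residual != NULL);
    for (i = 1; i < n; i++)
        residual[i - 1] = data[i] - data[i - 1];
}
```

Callers must honor the promise: passing overlapping buffers to a function declared this way is undefined behavior.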
Erik de Castro Lopo
a1abfa3df2 Vcproj file updates.
Replaces
     OutputDirectory="..\..\..\..\objs\debug\bin"
with
     OutputDirectory="$(SolutionDir)objs\$(ConfigurationName)\bin"
and so on.

Removes
     OutputFile="..\..\objs\debug\lib\$(ProjectName).lib"
when possible.

Also, in the current version "Whole program optimization" compiler option
is set, but the corresponding linker option isn't. From MSDN:
   "If you do not explicitly specify /LTCG when you pass /GL or MSIL modules
   to the linker, the linker eventually detects this and restarts the link
   by using /LTCG. Explicitly specify /LTCG when you pass /GL and MSIL modules
   to the linker for the fastest possible build performance."
So /LTCG option was added too.

Debug build now uses libogg_static.lib from .\objs\debug\lib folder.
(the dependency for both release and debug is
    objs\$(ConfigurationName)\lib\libogg_static.lib)

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-10-04 13:50:01 +10:00
Erik de Castro Lopo
ecd0acba75 Improve x86 intrinsic implementation.
* Split lpc_x86intrin.c into lpc_intrin_sse.c and lpc_intrin_sse2.c
* Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
  function to lpc_intrin_sse2.c
* Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
  (useful for 24-bit en-/decoding)
* Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
  disable precompute_partition_info_sums_32bit_asm_ia32_().
  SSE2 version uses 4 SSE2 instructions instead of 1 SSSE3 instruction
  PABSD so it is slightly slower.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-10-04 01:41:48 +10:00
Erik de Castro Lopo
bd6a920e40 Add FLAC__HAS_X86INTRIN to vcproj files.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-09-27 03:10:37 +10:00
Erik de Castro Lopo
31a79d7e9a Move M_PI definition to include/share/compat.h.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-09-27 03:05:06 +10:00
Erik de Castro Lopo
4a78cd4e4c Remove union data from FLAC__CPUInfo.
Before this patch it was possible to set or get the data.ia32.sse3 value
from x86-64 code, etc., which is a potential source of errors.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-09-25 23:07:46 +10:00
Erik de Castro Lopo
8fe2c23e31 Add SSE4.1/SSE4.2 detection.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-09-25 23:05:17 +10:00
Erik de Castro Lopo
ae4d720417 Fix/re-enable SSE/SSE2 lpc optimisations.
2013-09-17 06:14:50 +10:00
Erik de Castro Lopo
bd9770ffd1 Only allow SSE2 intrinsics for x86_64.
2013-09-15 19:37:53 +10:00
Erik de Castro Lopo
0752740d8d src/libFLAC/lpc.c : Fix compiler warning.
2013-09-15 10:29:19 +10:00
Erik de Castro Lopo
e07bd181b1 lpc_x86intrin.c : Tweaks.
Include <config.h> before trying to use values defined in that file.
Fix compiler warnings.
2013-09-15 10:29:19 +10:00
Erik de Castro Lopo
5e5ee2720c Adds SSE-accelerated lpc functions.
New functions are:
    FLAC__lpc_compute_autocorrelation_intrin_sse_lag_4()
    FLAC__lpc_compute_autocorrelation_intrin_sse_lag_8()
    FLAC__lpc_compute_autocorrelation_intrin_sse_lag_12()
    FLAC__lpc_compute_autocorrelation_intrin_sse_lag_16()
    FLAC__lpc_compute_residual_from_qlp_coefficients_16_intrin_sse2()

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-09-15 10:29:19 +10:00
Erik de Castro Lopo
84c3e3d52c Add CPU features (sse3, ssse3) detection code for x86-64.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-09-15 09:46:20 +10:00
Erik de Castro Lopo
d11c66ffce bitmath.h : Minor improvements.
This is part of a larger patch from lvqcl.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-09-08 12:15:57 +10:00
Erik de Castro Lopo
ce6832bb62 Move definition of M_LN2 to include/share/compat.h.
2013-09-07 22:00:23 +10:00
Erik de Castro Lopo
c532d34c11 MSVS : Define _USE_MATH_DEFINES.
MSVS does define the M_LN2 constant in <math.h> but only makes it
visible if _USE_MATH_DEFINES is defined.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-09-07 22:00:23 +10:00
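The ordering requirement behind this change can be shown with a short sketch: the macro only has an effect if it is defined before <math.h> is first included. The fallback definition and the helper nats_to_bits are assumptions added for illustration, not taken from the FLAC sources.

```c
/* Must precede the first inclusion of <math.h> to have any effect
 * on MSVC. */
#define _USE_MATH_DEFINES
#include <assert.h>
#include <math.h>

/* Fallback for C libraries that do not expose M_LN2 at all
 * (it is an extension, not part of ISO C). */
#ifndef M_LN2
#define M_LN2 0.69314718055994530942
#endif

/* Hypothetical helper: convert a quantity measured in nats
 * (natural-log units) to bits by dividing by ln(2). */
static double nats_to_bits(double nats)
{
    assert(nats >= 0.0);
    return nats / M_LN2;
}
```

Defining the macro in the project settings rather than in each source file avoids depending on include order, which is presumably why it was added to the MSVS build configuration.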
Erik de Castro Lopo
9e392706c9 MSVS : Add FLAC__ALIGN_MALLOC_DATA definition for MSVS projects.
A preprocessor macro FLAC__ALIGN_MALLOC_DATA is defined in the Makefiles
but absent in *.vcproj files. This patch adds it to libFLAC_static.vcproj
and libFLAC_dynamic.vcproj.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-09-07 21:58:11 +10:00
Erik de Castro Lopo
deb209906c Add ASM function FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16.
For the 32 bit x86 ASM functions there were already versions of this
function for lags (N = 4, 8, 12). They require lpc_order less than N.
The best compression preset (flac -8) uses lpc_order up to 12, which
means that during encoding FLAC also fell back to the unaccelerated
C function.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
2013-08-31 13:53:37 +10:00
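The selection rule stated in this message (a lag-N variant requires lpc_order less than N) can be sketched as follows, together with a plain C autocorrelation of the kind the accelerated variants stand in for. Both function names are hypothetical and the code is not copied from libFLAC.

```c
#include <assert.h>
#include <stddef.h>

/* Plain C autocorrelation: autoc[l] = sum over i of data[i] * data[i - l]
 * for l = 0 .. lag-1. */
static void autocorrelation_c(const float *data, size_t n, unsigned lag,
                              double *autoc)
{
    assert(lag > 0 && lag <= n);
    for (unsigned l = 0; l < lag; l++) {
        double sum = 0.0;
        for (size_t i = l; i < n; i++)
            sum += (double)data[i] * (double)data[i - l];
        autoc[l] = sum;
    }
}

/* Hypothetical dispatcher: returns the lag size N of the smallest
 * accelerated variant that satisfies max_lpc_order < N, or 0 when only
 * the generic C function applies. */
static unsigned pick_lag_variant(unsigned max_lpc_order)
{
    if (max_lpc_order < 4)  return 4;
    if (max_lpc_order < 8)  return 8;
    if (max_lpc_order < 12) return 12;
    if (max_lpc_order < 16) return 16;
    return 0;
}
```

With an order of 12, the lag-12 variant is ruled out by the strict inequality, so without a lag-16 variant the dispatcher would have to return 0 and use the generic code.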
Erik de Castro Lopo
740eb68f53 src/libFLAC/cpu.c : Remove MSVC6 only code.
Patch from: lvqcl <lvqcl.mail@gmail.com>
2013-08-26 21:45:19 +10:00
Erik de Castro Lopo
3cea079a2f Fix a couple of NASM warnings.
Suggested by Ozkan Sezer <sezeroz@gmail.com>.
2013-08-13 19:30:24 +10:00
Erik de Castro Lopo
7050033b8f src/libFLAC/ia32/nasm.h : Fix nasm warning on windows.
Patch from Ozkan Sezer <sezeroz@gmail.com>.
2013-08-13 19:30:21 +10:00
Erik de Castro Lopo
187e596e4c stream_encoder.c : Improve fix for arithmetic overflow.
Only use the 32 bit accumulator if the input data is 16 bits or less.
2013-08-02 06:21:02 +10:00
Erik de Castro Lopo
f34f31dac0 stream_encoder.c : Improve fix for arithmetic overflow.
The previous fix (patch 6f7ec60c) had the undesirable effect of slowing
down encoding speed on 16 bit files where the arithmetic overflow was
less likely to happen.

This fix forces the use of a FLAC__uint64 accumulator for 24 bit files
and restores the use of a FLAC__uint32 accumulator for 16 (and less) bit
files.

Unfortunately, I have not been able to prove to myself that this overflow
*cannot* happen with 16 bit files.
2013-07-21 21:05:31 +10:00
Cristian Rodríguez
355f4aae47 Link with -no-undefined regardless of the OS
libFLAC* must never have undefined symbols no matter
what the target platform is.

Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
2013-07-21 19:51:08 +10:00
Erik de Castro Lopo
33c23b3dfa src/libFLAC/bitwriter.c : Remove dead code.
2013-07-17 20:18:46 +10:00
Erik de Castro Lopo
6f7ec60c7e stream_encoder.c : Fix an arithmetic overflow in the RICE2 partitioner.
For a specific 24 bit WAV file provided by Leigh Dyer

    http://lists.xiph.org/pipermail/flac-dev/2013-July/004284.html

encoding with compression level 7 was generating a file a couple of
orders of magnitude larger than the original.

Debugging showed that the variable abs_residual_partition_sum (a FLAC__uint32)
in function precompute_partition_info_sums_() was suffering from an
arithmetic overflow on some 24 bit input files, although the overflow
did not always cause larger output files.

Since the value abs_residual_partition_sum is eventually stored in an
array of FLAC__uint64, it makes sense to make abs_residual_partition_sum
a FLAC__uint64 anyway.

Debugging this problem was made easier by use of the Clang compiler's
-fsanitize=integer option.
2013-07-17 19:42:12 +10:00
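The overflow described in this message is easy to reproduce in miniature. The sketch below sums absolute residuals with a 32-bit and a 64-bit accumulator; the function names are hypothetical and the loops are simplified compared with precompute_partition_info_sums_().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of |residual[i]| with a 64-bit accumulator: cannot wrap for any
 * realistic partition size. */
static uint64_t abs_sum_u64(const int32_t *residual, size_t n)
{
    uint64_t sum = 0;
    assert(residual != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int64_t r = residual[i]; /* widen first so negation cannot overflow */
        sum += (uint64_t)(r < 0 ? -r : r);
    }
    return sum;
}

/* Same loop with a 32-bit accumulator: unsigned arithmetic silently
 * wraps modulo 2^32 once the total gets large enough. */
static uint32_t abs_sum_u32(const int32_t *residual, size_t n)
{
    uint32_t sum = 0;
    assert(residual != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int64_t r = residual[i];
        sum += (uint32_t)(r < 0 ? -r : r);
    }
    return sum;
}
```

With 1024 residuals of magnitude 2^23 (the largest magnitude of a 24-bit sample), the true sum is 2^33, which the 64-bit version returns, while the 32-bit version wraps to 0. A sum that is far too small makes a partition look much cheaper to encode than it is, which is consistent with the oversized output reported above.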
Miroslav Lichvar
4eab6313cd Disable FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap.
Don't use the assembly function since it seems to be slower than
the current version of FLAC__bitreader_read_rice_signed_block.

Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
2013-06-06 19:57:21 +10:00
Dagobert Michelsen
349c6adcf7 Sun Studio cannot include static function from extern inline
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
2013-05-27 18:06:51 +10:00
Erik de Castro Lopo
b1982fbc5f Set version to 1.3.0 and update copyrights throughout.
2013-05-26 19:17:53 +10:00
Ulrich Klauer
d672efaa05 Fix gcc version check for private macros
Use Benjamin Stiglitz' MIN macros from gcc 4.3 (according to the
changelog, __COUNTER__ was introduced in this version). Previously,
the macros weren't used on any existing gcc version; the first one
would have been 5.5.

Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
2013-05-26 08:26:45 +10:00
Robert Kausch
411ba53c7b bitwriter.c : Add missing "extern" declaration
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
2013-05-26 07:42:22 +10:00
Robert Kausch
bb79a59a9f Fix mistyped variable name
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
2013-05-26 07:42:19 +10:00