15 Commits

Author SHA1 Message Date
Erik de Castro Lopo
94a61241b0 libFLAC: Add a workaround for a bug in MSVC2015 update2
MSVC2015 update2 compiles the C code:

    abs_residual_partition_sums[partition] =
                  (FLAC__uint32)_mm_cvtsi128_si32(mm_sum);

into this:

    movq    QWORD PTR [rsi], xmm2

while it should be:

    movd    eax, xmm2
    mov     QWORD PTR [rsi], rax

With this patch, MSVC emits:

    movq    QWORD PTR [rsi], xmm2
    mov     DWORD PTR [rsi+4], r9d

so the price of this workaround is 1 extra write instruction per
partition.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2016-05-05 17:23:52 +10:00
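
The effect described in 94a61241b0 can be pictured with a small portable C model. This is only a sketch under stated assumptions, not the actual patch: the function names `store_low32_miscompiled` and `store_low32_with_fixup` are hypothetical, the miscompiled `movq` is simulated by copying all 64 bits of the low XMM lane, and masking the destination afterwards is just one plausible way to obtain the extra write the commit message mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the miscompiled store: MOVQ writes the whole low 64-bit lane,
 * so any non-zero bits 32..63 of the lane leak into the destination. */
void store_low32_miscompiled(uint64_t *dst, uint64_t xmm_low_lane)
{
    assert(dst != 0);
    *dst = xmm_low_lane;
}

/* Model of a workaround (assumed form): keep the 64-bit store, then
 * clear the upper half with one extra write so that only the low
 * 32 bits survive, zero-extended. */
void store_low32_with_fixup(uint64_t *dst, uint64_t xmm_low_lane)
{
    assert(dst != 0);
    *dst = xmm_low_lane;            /* the store the compiler emits anyway */
    *dst &= UINT64_C(0xFFFFFFFF);   /* extra write zeroing bits 32..63 */
}
```

With a lane holding `0x0000000500000007`, the first function leaves the stray `5` in the upper half of the destination, while the second leaves exactly `7`.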
Erik de Castro Lopo
2319a688ec libFLAC/stream_encoder_intrin_*.c: More refactoring
Combine two intrinsic instructions into one line of code.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2015-11-18 19:24:48 +11:00
Erik de Castro Lopo
86b36d92d5 libFLAC: Refactoring
No functional changes.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2015-11-03 18:08:56 +11:00
Erik de Castro Lopo
1437391577 Update copyright years to include 2014. 2014-11-25 13:04:30 +11:00
Erik de Castro Lopo
6abc480387 stream_encoder_intrin_sse[23].c : Optimize int32 -> uint64 conversion.
Optimizes int32 -> uint64 conversion by doing zero extension (int32 ->
uint32 -> uint64) instead of sign extension (int32 -> int64 -> uint64).

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-09-21 08:48:20 +10:00
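
The two conversion paths named in 6abc480387 can be compared in plain C. This is a minimal sketch with hypothetical function names. It assumes the values being widened are non-negative (as sums of absolute residuals would be), which is the case where both paths agree and the zero-extending one can safely replace the other.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extension path: int32 -> int64 -> uint64. */
uint64_t widen_sign_extend(int32_t v)
{
    return (uint64_t)(int64_t)v;
}

/* Zero extension path: int32 -> uint32 -> uint64. */
uint64_t widen_zero_extend(int32_t v)
{
    return (uint64_t)(uint32_t)v;
}

/* The paths agree for non-negative input and differ only for negatives. */
void check_widening(void)
{
    assert(widen_sign_extend(5) == widen_zero_extend(5));
    assert(widen_sign_extend(-1) != widen_zero_extend(-1));
}
```

For negative input the results diverge: sign extension fills the upper 32 bits with ones, zero extension leaves them clear.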
Miroslav Lichvar
f081524c19 stream_encoder : Improve selection of residual accumulator width
In the precompute_partition_info_sums_ function, instead of selecting
64-bit accumulator when the signal bps is larger than 16, revert to the
original approach based on partition size, but make room for a few extra
bits to not overflow with unusual signals where the average residual
magnitude may be larger than bps.

It slightly improves performance with standard encoding levels and
16-bit files, as the 17-bit side channel can still be processed with the
32-bit accumulator, and it correctly selects the 64-bit accumulator for
very large 16-bit partitions.

This is related to commits 6f7ec60c and 187e596e.

Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
2014-07-04 21:22:44 +10:00
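
The selection rule described in f081524c19 can be sketched as a bit-budget check: the bits needed to count the samples in a partition, plus the signal bps, plus some headroom, must fit in 32 bits. The sketch below is an illustration, not the code from the commit: the names `ilog2_u32`, `use_32bit_accumulator` and `EXTRA_RESIDUAL_BITS` are hypothetical, and the headroom of 4 bits is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed headroom (in bits) for signals whose average residual
 * magnitude exceeds bps. Illustrative value only. */
#define EXTRA_RESIDUAL_BITS 4u

/* floor(log2(v)) for v > 0. */
unsigned ilog2_u32(uint32_t v)
{
    unsigned n = 0;
    assert(v > 0);
    while ((v >>= 1) != 0)
        n++;
    return n;
}

/* Returns 1 if a 32-bit accumulator is assumed wide enough for the sum
 * of absolute residuals of one partition, 0 if 64 bits should be used. */
int use_32bit_accumulator(uint32_t partition_samples, unsigned bps)
{
    return ilog2_u32(partition_samples) + bps + EXTRA_RESIDUAL_BITS < 32u;
}
```

Under these assumptions a 17-bit side channel with 256-sample partitions needs 8 + 17 + 4 = 29 bits and stays on the 32-bit path, while a 16-bit signal with 4096-sample partitions needs 12 + 16 + 4 = 32 bits and falls back to the 64-bit accumulator.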
Erik de Castro Lopo
b8d58e327c Revert "Replace FLAC__CPU_X86_64 with FLaC__CPU_X86_64."
This reverts commit 151739921b.

This patch only went part way to replacing all FLAC_* with FLaC_*,
and it's really not worth going all the way.
2014-06-15 20:29:34 +10:00
Erik de Castro Lopo
151739921b Replace FLAC__CPU_X86_64 with FLaC__CPU_X86_64.
Previous autoconf versions had problems with variables beginning with
'FLAC_' (autoconf uses 'AC_').

Reported-by: lvqcl <lvqcl.mail@gmail.com>
2014-06-01 17:33:54 +10:00
Erik de Castro Lopo
006b8356d5 Fix all instances of '#if HAVE_CONFIG_H'.
Should be '#ifdef HAVE_CONFIG_H'.

Closes: https://sourceforge.net/p/flac/bugs/410/
2014-03-24 12:06:49 +11:00
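
The difference fixed in 006b8356d5 shows up when the macro is defined without a value. The sketch below simulates that case; `CONFIG_H_INCLUDED` and `config_h_included()` are hypothetical stand-ins for the real `#include <config.h>`, used only so the result can be observed.

```c
#include <assert.h>

/* Simulate a build that defines the macro with no value. */
#define HAVE_CONFIG_H

/* "#if HAVE_CONFIG_H" would fail to preprocess here, because the macro
 * expands to nothing and "#if" is left without an expression. It also
 * triggers a -Wundef warning in GCC when the macro is not defined at all.
 * "#ifdef" only asks whether the macro exists, so it handles both cases. */
#ifdef HAVE_CONFIG_H
#define CONFIG_H_INCLUDED 1   /* stand-in for: #include <config.h> */
#else
#define CONFIG_H_INCLUDED 0
#endif

int config_h_included(void)
{
    return CONFIG_H_INCLUDED;
}

void check_config(void)
{
    assert(config_h_included() == 1);
}
```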
Erik de Castro Lopo
d36ef6298b Whitespace.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-03-14 15:33:11 +11:00
Erik de Castro Lopo
59cfca0030 stream_encoder : Remove un-needed conversion from __m128i to FLAC__uint64.
Encoding speed slightly increased (1-2% for FLAC -8).

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-31 20:54:59 +11:00
Erik de Castro Lopo
57297eea26 Add __INTEL_COMPILER to _MSC_VER #ifdefs.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-30 21:53:41 +11:00
Erik de Castro Lopo
d40e986a1e Add FLAC__SSE_SUPPORTED and FLAC__SSE2_SUPPORTED flags.
* Allow compiling with GCC without SSE support.
* Allow SSE4.1 intrinsic functions to be enabled.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-30 21:49:55 +11:00
Erik de Castro Lopo
6cd8b42438 Add FLAC__ prefix to precompute_partition_info_sums....
Most non-static functions have the FLAC__ prefix, but it was missing
from the precompute_partition_info_sums_* functions.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2014-01-07 21:27:00 +11:00
Erik de Castro Lopo
ecd0acba75 Improve x86 intrinsic implementation.
* Split lpc_x86intrin.c into lpc_intrin_sse.c and lpc_intrin_sse2.c
* Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
  function to lpc_intrin_sse2.c
* Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
  (useful for 24-bit en-/decoding)
* Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
  disable precompute_partition_info_sums_32bit_asm_ia32_().
  The SSE2 version uses 4 SSE2 instructions instead of 1 SSSE3
  instruction (PABSD), so it is slightly slower.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
2013-10-04 01:41:48 +10:00
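
The SSE2 replacement for PABSD mentioned in ecd0acba75 is commonly built from a sign mask. The scalar sketch below models one 32-bit lane of that idiom; it is an illustration with a hypothetical function name, not the code from the commit. The mapping to PSRAD, PXOR and PSUBD in the comments is the usual one, and the fourth instruction counted in the commit message is presumably a register copy (an assumption).

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free absolute value of one 32-bit lane, computed in unsigned
 * arithmetic so that every step is well defined in C. */
uint32_t abs32_sign_mask(int32_t v)
{
    uint32_t u = (uint32_t)v;
    /* All ones for negative input, zero otherwise: the per-lane result
     * of an arithmetic right shift by 31 (PSRAD). */
    uint32_t mask = (v < 0) ? UINT32_MAX : 0u;
    /* XOR flips the bits of negative values (PXOR); subtracting the
     * mask then adds 1 (PSUBD), completing the two's complement negation. */
    return (u ^ mask) - mask;
}

void check_abs32(void)
{
    assert(abs32_sign_mask(-5) == 5u);
    assert(abs32_sign_mask(5) == 5u);
}
```

Like PABSD, this maps `INT32_MIN` to the bit pattern `0x80000000`, which is the correct magnitude when the result is read as unsigned.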