[PR #856] [CLOSED] Python: use a new output buffer code #1455

Closed
opened 2026-01-29 20:54:12 +00:00 by claunia · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/google/brotli/pull/856
Author: @ghost
Created: 10/17/2020
Status: Closed

Base: master ← Head: block_output_buffer


📝 Commits (10+)

📊 Changes

2 files changed (+284 additions, -102 deletions)


📝 python/_brotli.c (+260 -70)
📝 setup.py (+24 -32)

📄 Description

Currently, the output buffer is a std::vector<uint8_t>.
When the buffer grows, each resize can reallocate the storage and memcpy() the existing contents, which is unnecessary work.

This PR represents the output buffer as a list of bytes objects instead, which avoids the extra overhead of resizing.
In addition, the C++ code can be removed, making the module a pure C extension.
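
The idea can be sketched in Python. This is only an illustration of the technique, not the C code from the PR: the class name `BlocksOutputBuffer`, the method names, the 32 KB first block, and the doubling growth policy are all assumptions made for this example.

```python
class BlocksOutputBuffer:
    """Collect output in a list of blocks instead of one growing array.

    Hypothetical sketch: names, sizes, and growth policy are illustrative.
    """

    def __init__(self, first_block_size=32 * 1024):
        self._blocks = [bytearray(first_block_size)]
        self._used_in_last = 0  # bytes already written into the last block

    def write(self, data):
        view = memoryview(data)
        while len(view) > 0:
            last = self._blocks[-1]
            free = len(last) - self._used_in_last
            if free == 0:
                # Grow by appending a new, larger block. Earlier blocks
                # are never moved or copied at this point.
                last = bytearray(len(last) * 2)
                self._blocks.append(last)
                self._used_in_last = 0
                free = len(last)
            n = min(free, len(view))
            last[self._used_in_last:self._used_in_last + n] = view[:n]
            self._used_in_last += n
            view = view[n:]

    def finish(self):
        # A single final copy: all full blocks plus the used part of the last.
        *full, last = self._blocks
        return b"".join(full + [last[:self._used_in_last]])


buf = BlocksOutputBuffer(first_block_size=4)
buf.write(b"hello")
buf.write(b" world")
result = buf.finish()
print(result)
```

With a contiguous buffer, every growth step may copy everything written so far; here the data is copied once when the blocks are joined at the end.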

Please review the 11 commits one by one.

Benchmarks:

The first column is the output data size, in MB.
The second and third columns are the time consumed, in seconds.

size before   after
0    0.00006  0.00001
10   0.02449  0.02165
20   0.04640  0.03691
30   0.06695  0.05128
40   0.08199  0.06622
50   0.10581  0.08062
60   0.12336  0.09513
70   0.15077  0.11034
80   0.16498  0.12552
90   0.17981  0.13893
100  0.21600  0.15383
110  0.22965  0.17016
120  0.24485  0.18513
130  0.26064  0.19902
140  0.30751  0.21165
150  0.32188  0.22795
160  0.34317  0.24160
170  0.35872  0.25647
180  0.36515  0.27271
190  0.38189  0.28756
200  0.39475  0.30176
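
To put the MB-range numbers in relative terms, the short sketch below computes the share of time saved for three rows copied from the table above. The helper name `time_saved_percent` is introduced here only for illustration.

```python
# (size in MB, seconds before, seconds after), copied from the table above
rows = [
    (10, 0.02449, 0.02165),
    (100, 0.21600, 0.15383),
    (200, 0.39475, 0.30176),
]


def time_saved_percent(before, after):
    """Percentage of the 'before' time that the 'after' build saves."""
    return (before - after) / before * 100.0


for size_mb, before, after in rows:
    saved = time_saved_percent(before, after)
    print(f"{size_mb:>4} MB: {saved:.1f}% less time")
```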

The first column is the output data size, in KB.

size before   after
0    0.00672  0.00669
20   0.00007  0.00011
40   0.00010  0.00013
60   0.00016  0.00017
80   0.00032  0.00024
100  0.00034  0.00028
120  0.00041  0.00032
140  0.00053  0.00037
160  0.00048  0.00043
180  0.00054  0.00057
200  0.00052  0.00053
220  0.00056  0.00071
240  0.00062  0.00062
260  0.00081  0.00071
280  0.00085  0.00090
300  0.00093  0.00090
320  0.00093  0.00093
340  0.00098  0.00091
360  0.00104  0.00106
380  0.00103  0.00142
400  0.00105  0.00114
420  0.00121  0.00116
440  0.00118  0.00128
460  0.00123  0.00137
480  0.00122  0.00178
500  0.00148  0.00142

Benchmark code:

from time import perf_counter
from brotli import compress, decompress

MB = 1024*1024
KB = 1024

# Large outputs: 0 to 200 MB in 10 MB steps.
for i in range(0, 200*MB+1, 10*MB):
    dat1 = i * b'a'
    dat2 = compress(dat1)

    # Time only the decompression call.
    t1 = perf_counter()
    dat3 = decompress(dat2)
    t2 = perf_counter()
    print(i//MB, '%.5f' % (t2-t1))

    assert dat1 == dat3

# Small outputs: 0 to 500 KB in 20 KB steps.
for i in range(0, 500*KB+1, 20*KB):
    dat1 = i * b'a'
    dat2 = compress(dat1)

    # Time only the decompression call.
    t1 = perf_counter()
    dat3 = decompress(dat2)
    t2 = perf_counter()
    print(i//KB, '%.5f' % (t2-t1))

    assert dat1 == dat3
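
The KB-range timings above are each taken from a single run and vary noticeably from row to row, which may be measurement noise. One possible way to reduce that, sketched here as an assumption rather than as part of the PR, is to repeat each measurement and keep the fastest run. The helper `best_of` is hypothetical, and `zlib` is used only as a stand-in so the sketch runs without brotli installed.

```python
from time import perf_counter
import zlib  # stand-in codec; swap in brotli's compress/decompress if installed


def best_of(func, arg, repeats=5):
    """Return the minimum wall-clock time of func(arg) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        t1 = perf_counter()
        func(arg)
        t2 = perf_counter()
        best = min(best, t2 - t1)
    return best


KB = 1024
original = 100 * KB * b'a'
payload = zlib.compress(original)
elapsed = best_of(zlib.decompress, payload)
print('%.5f' % elapsed)
```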

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

### 📝 Commits (10+)

- [`ca2664d`](https://github.com/google/brotli/commit/ca2664dad43afd37102f142b17f60ba8c29d5df9) 1. blocks output buffer
- [`6702fef`](https://github.com/google/brotli/commit/6702fefc110005c5745cdb8bb1c01a01e27f7562) 2. brotli_decompress
- [`99f0869`](https://github.com/google/brotli/commit/99f08694a6a1e52db6b1192874d6e13dad2d4725) 3. compress_stream
- [`3e3c0d1`](https://github.com/google/brotli/commit/3e3c0d1f9c6e8db46897df7a2351ec37c454e96c) 4. brotli_Compressor_process
- [`92252a8`](https://github.com/google/brotli/commit/92252a863203b97075f201610af8612b5cfba457) 5. brotli_Compressor_flush
- [`b8ec122`](https://github.com/google/brotli/commit/b8ec12258ad101d3467747e001ecba99a46910ee) 6. brotli_Compressor_finish
- [`6abf03c`](https://github.com/google/brotli/commit/6abf03c38e441ba9c724c66e9f3eed68f8131098) 7. decompress_stream
- [`b4475b2`](https://github.com/google/brotli/commit/b4475b286992942634de31a2bae390ed5df5684e) 8. brotli_Decompressor_process
- [`7c771a1`](https://github.com/google/brotli/commit/7c771a153ba0e7e0a17f613dc27b5c0f06b649b4) 9. brotli_Decompressor_is_finished
- [`bb3ae22`](https://github.com/google/brotli/commit/bb3ae22194d2d3f0a702cd7598b64a379d6f8d3f) 10. no c++ code
claunia added the pull-request label 2026-01-29 20:54:12 +00:00