mirror of
https://github.com/google/brotli.git
synced 2026-04-24 23:22:00 +00:00
This needs to be reviewed immediately #369
Originally created by @FesterCluck on GitHub (Jan 21, 2021).
2a51a85aa8/js/decode.js (L2151)
According to the stated usage of this dictionary, it's a lookup table used only during slow compression. Why would a lookup table need multiple entries of the same thing? To argue that the preceding characters factor in is to argue that "cartoonregistrCommonsMuslimsWhat" is either an uncompressible string or common enough, and therefore a necessary string to include in this constant. One finds multiple entries of </html> and </script> in this constant, with and without surrounding whitespace.
Aside from code repos, most of the uses of this package seem to be detected malware.
I understand this isn't a smoking gun of exploit, and this is a very old piece of code. But could someone familiar with it review the use of this constant? It's included as a dependency in many npm packages.
@echlebek commented on GitHub (Apr 8, 2021):
We just had a user find these strings in our product's compiled binary, which they found pretty concerning. Can someone from the project please comment?
@eustas commented on GitHub (Apr 8, 2021):
The dictionary is a concatenation of chunks 4 to 24 bytes long. Some chunks may contain other chunks, or differ from one another only by a suffix, a prefix, or letter case.
We found it inappropriate to compress the dictionary inside the library, because JS has no built-in compression codec. The assumption is that the JS itself is served in compressed form (with gzip/brotli).
@eustas commented on GitHub (Apr 8, 2021):
The JS/CSS/HTML chunks in the dictionary were generated from a large corpus of files taken from the Internet in 2012-2014. If you like, I can provide a utility that splits the dictionary into natural chunks for easier inspection.
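For readers who want to inspect the chunks themselves, here is a minimal sketch of what such a splitting utility might look like. It is not the project's tool. It assumes the layout given in RFC 7932 (words grouped by length, with 2^NDBITS[len] words of each length from 4 to 24 bytes) and assumes the caller has already obtained the raw dictionary bytes as a Uint8Array. The function names dictionaryOffsets and splitDictionary are hypothetical.

```javascript
// Bits of word index per word length, as listed in RFC 7932 (NDBITS).
// Array index = word length in bytes; lengths 0..3 have no words.
const NDBITS = [0, 0, 0, 0, 10, 10, 11, 11, 10, 10, 10, 10, 10,
                9, 9, 8, 7, 7, 8, 7, 7, 6, 6, 5, 5];

// Byte offset at which the words of each length begin.
// The final entry (index 25) is the total dictionary size.
function dictionaryOffsets() {
  const offsets = new Array(NDBITS.length + 1).fill(0);
  for (let len = 4; len < NDBITS.length; len++) {
    offsets[len + 1] = offsets[len] + len * (1 << NDBITS[len]);
  }
  return offsets;
}

// Hypothetical helper: split the raw dictionary bytes into words,
// returned as a Map from word length to an array of decoded strings.
function splitDictionary(dict) {
  const offsets = dictionaryOffsets();
  const total = offsets[offsets.length - 1];
  if (dict.length !== total) {
    throw new Error(`expected ${total} bytes, got ${dict.length}`);
  }
  const decoder = new TextDecoder("utf-8");
  const groups = new Map();
  for (let len = 4; len <= 24; len++) {
    const count = 1 << NDBITS[len];
    const words = [];
    for (let i = 0; i < count; i++) {
      const start = offsets[len] + i * len;
      words.push(decoder.decode(dict.subarray(start, start + len)));
    }
    groups.set(len, words);
  }
  return groups;
}

console.log(dictionaryOffsets()[25]); // 122784
```

Because the words are stored back to back with no separators, reading the constant as one string runs neighbouring words together, which is how unrelated words end up looking like a single phrase.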
@Xe commented on GitHub (Apr 8, 2021):
This is part of the Brotli RFC for what it's worth. It's a huge common dictionary that allows Brotli to be so space-efficient.
@FesterCluck commented on GitHub (Apr 22, 2021):
I understand what it is. Perhaps it's time to rerun the corpus? Quite a bit has changed in that time, and if you do a Google search to find products that use it, not only is the list short, but its members are largely malware.
I understand users can run their own corpus, but please consider the request. I'm betting Archive.org would be happy to help with the sample.
@eustas commented on GitHub (Apr 22, 2021):
That is impossible. The Brotli static dictionary is specified in the RFC as an integral part of the compression scheme.
There are more than 2 billion Chrome installations, and who knows how many in total once other software is included. Each installation contains a copy of the dictionary and will not always work correctly if the dictionary changes.
On the bright side, there is the "shared brotli" project, which allows using a custom word dictionary.
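To illustrate why every installed decoder must hold the same dictionary, here is a toy model. It is not the real Brotli bitstream: the token format, the function toyDecode, and both word lists are invented for this sketch. The only point it makes is that a compressed stream stores references into the dictionary, so a decoder with a different dictionary silently produces different output.

```javascript
// Toy model only: a "compressed" message is a list of literal strings and
// {len, index} references into a word list shared by encoder and decoder.
function toyDecode(tokens, wordsByLength) {
  return tokens
    .map((t) => (typeof t === "string" ? t : wordsByLength[t.len][t.index]))
    .join("");
}

// Invented word lists, standing in for "the dictionary every decoder ships"
// and "a dictionary regenerated from a newer corpus".
const shippedDictionary = { 4: ["time", "down", "life", "left"] };
const regeneratedDictionary = { 4: ["left", "time", "life", "down"] };

// The same stored message, decoded against each dictionary.
const message = [{ len: 4, index: 0 }, " is ", { len: 4, index: 3 }];

console.log(toyDecode(message, shippedDictionary));     // time is left
console.log(toyDecode(message, regeneratedDictionary)); // left is down
```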
@laserjobs commented on GitHub (Apr 14, 2022):
Here is the draft dictionary to compare, it is much smaller but does not compress as well
494c85cebb/brotli/brotlispec.txt
@laserjobs commented on GitHub (Apr 14, 2022):
Googlebot scanned this after scanning the Brotli dictionary
http://....../images/identified%20by%20thenatural%20resourcesclassification%20ofcan%20be%20consideredquantum%20mechanicsNevertheless,%20themillion%20years%20ago%3C/body%3E%3C/html%3E%CE%95%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%ACtake%20advantage%20ofand,%20according%20toattributed%20to%20theMicrosoft%20Windowsthe%20first%20centuryunder%20the%20controldiv%20class=
@FesterCluck commented on GitHub (May 5, 2022):
Please explain @laserjobs
@laserjobs commented on GitHub (May 5, 2022):
Just use your browser inspection panel and go to the dictionary, you should see the ping backs.
https://www.gstatic.com/b/d
@eustas commented on GitHub (Jan 4, 2023):
There is no extra formatting in dictionary to make it dense. The original dictionary looks like:
No need to worry about the "scary" character combinations.