This needs to be reviewed immediately #369

Closed
opened 2026-01-29 20:42:47 +00:00 by claunia · 11 comments

Originally created by @FesterCluck on GitHub (Jan 21, 2021).

https://github.com/google/brotli/blob/2a51a85aa86abb4c294c65fab57f3d9c69f10080/js/decode.js#L2151

According to the stated usage of this dictionary, it's a lookup table used only during slow compression. Why would a lookup table need multiple entries of the same thing? To argue that the preceding characters factor in is to argue that `cartoonregistrCommonsMuslimsWhat` is either an incompressible string or common enough, and therefore a necessary string to include in this constant. One finds multiple entries of `</html>` and `</script>` in this constant, with and without surrounding whitespace.

Aside from code repos, most of the uses of this package seem to be detected malware.

I understand this isn't a smoking gun of an exploit, and this is a very old piece of code. But could someone familiar with it review the use of this constant? It's included as a dependency in many npm packages.

@echlebek commented on GitHub (Apr 8, 2021):

We just had a user find these strings in our product's compiled binary, which they found pretty concerning. Can someone from the project please comment?


@eustas commented on GitHub (Apr 8, 2021):

The dictionary is a concatenation of 4- to 24-byte chunks. Some chunks may contain other chunks, or differ only by suffix, prefix, or letter case.

We decided against compressing the dictionary inside the library, because JS has no built-in compression codec. The JS itself is expected to be served in compressed form (with gzip/brotli).


@eustas commented on GitHub (Apr 8, 2021):

The JS/CSS/HTML chunks in the dictionary were generated from a large corpus of files taken from the Internet in 2012-2014. If you like, I can provide a utility that splits the dictionary into natural chunks for easier inspection.
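
For anyone who wants to inspect the chunks themselves, here is a minimal sketch of such a splitter. It is not the utility offered above. It assumes the layout described in RFC 7932: words are grouped by length (4 to 24 bytes), and the group for length L holds 2**NDBITS[L] words. The `NDBITS` values are copied from memory of the RFC and should be checked against it; the names `section_layout` and `split_words` are made up for this example.

```python
# Index bits per word length, as listed in RFC 7932 (assumption: verify
# against the RFC). Lengths 0..3 are unused; length L has 2**NDBITS[L] words.
NDBITS = [0, 0, 0, 0, 10, 10, 11, 11, 10, 10, 10, 10, 10,
          9, 9, 8, 7, 7, 8, 7, 7, 6, 6, 5, 5]


def section_layout():
    """Return ({length: (byte_offset, word_count)}, total_size_in_bytes)."""
    layout = {}
    offset = 0
    for length in range(4, 25):
        count = 1 << NDBITS[length]
        layout[length] = (offset, count)
        offset += length * count
    return layout, offset


def split_words(dictionary: bytes):
    """Yield every fixed-length word of the dictionary, shortest group first."""
    layout, total = section_layout()
    if len(dictionary) != total:
        raise ValueError(f"expected {total} bytes, got {len(dictionary)}")
    for length, (offset, count) in layout.items():
        for i in range(count):
            start = offset + i * length
            yield dictionary[start:start + length]


if __name__ == "__main__":
    layout, total = section_layout()
    print("total bytes:", total)
    print("7-byte words start at offset", layout[7][0])
```

To use it, pass the raw dictionary bytes (for example, extracted from the constant in `decode.js`) to `split_words` and print each word on its own line.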


@Xe commented on GitHub (Apr 8, 2021):

This is part of the [Brotli RFC](https://tools.ietf.org/html/rfc7932#appendix-A) for what it's worth. It's a huge common dictionary that allows Brotli to be so space-efficient.

@FesterCluck commented on GitHub (Apr 22, 2021):

> This is part of the [Brotli RFC](https://tools.ietf.org/html/rfc7932#appendix-A) for what it's worth. It's a huge common dictionary that allows Brotli to be so space-efficient.

I understand what it is. Perhaps it's time to rerun the corpus? Quite a bit has changed in that time, and if you do a Google search to find products that use it, not only is the list short, but its members are largely malware.

I understand users can run their own corpus, but please consider the request. I'm betting Archive.org would be happy to help with the sample.

@eustas commented on GitHub (Apr 22, 2021):

That is impossible. The Brotli static dictionary is specified in the RFC as an integral part of the compression scheme.
There are more than 2 billion Chrome installations, and who knows how many installations in total, including other software. Each installation contains a copy of the dictionary and will not always work correctly if the dictionary changes.

On the bright side, there is the "shared brotli" project, which allows using a custom word dictionary.
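
To illustrate why a dictionary baked into every decoder cannot be swapped out, here is a small sketch. It uses zlib's preset dictionary (`zdict`) from the Python standard library as a stand-in, not Brotli and not the shared brotli project. The two dictionaries and the payload are made-up values for the demonstration.

```python
import zlib

# Hypothetical dictionaries: the one deployed decoders ship with, and a "rerun".
OLD_DICT = b"</html></script><div class="
NEW_DICT = b"<main></main><template>"


def compress_with(dictionary: bytes, payload: bytes) -> bytes:
    """Compress payload with a zlib preset dictionary."""
    compressor = zlib.compressobj(zdict=dictionary)
    return compressor.compress(payload) + compressor.flush()


def decompress_with(dictionary: bytes, data: bytes) -> bytes:
    """Decompress data, supplying the dictionary the decoder believes is right."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(data) + decompressor.flush()


payload = b'<div class="a"></div></script></html>'
packed = compress_with(OLD_DICT, payload)

# A decoder holding the same dictionary recovers the payload.
assert decompress_with(OLD_DICT, packed) == payload

# A decoder holding a different dictionary cannot decode the stream.
try:
    decompress_with(NEW_DICT, packed)
    mismatch_detected = False
except zlib.error:
    mismatch_detected = True
print("mismatch detected:", mismatch_detected)
```

zlib notices the mismatch because its stream header carries a checksum of the dictionary. As far as I know, a Brotli stream carries no identifier for the built-in dictionary, so a decoder with a changed dictionary would have no way to tell and would likely produce wrong output instead.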


@laserjobs commented on GitHub (Apr 14, 2022):

Here is the draft dictionary to compare. It is much smaller but does not compress as well:
https://chromium.googlesource.com/external/font-compression-reference/+/494c85cebbaaa0db345df69ffa1b639aa4652022/brotli/brotlispec.txt

@laserjobs commented on GitHub (Apr 14, 2022):

Googlebot scanned this after scanning the Brotli dictionary http://....../images/identified%20by%20thenatural%20resourcesclassification%20ofcan%20be%20consideredquantum%20mechanicsNevertheless,%20themillion%20years%20ago%3C/body%3E%3C/html%3E%CE%95%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%ACtake%20advantage%20ofand,%20according%20toattributed%20to%20theMicrosoft%20Windowsthe%20first%20centuryunder%20the%20controldiv%20class=

@FesterCluck commented on GitHub (May 5, 2022):

> Googlebot scanned this after scanning the Brotli dictionary http://....../images/identified%20by%20thenatural%20resourcesclassification%20ofcan%20be%20consideredquantum%20mechanicsNevertheless,%20themillion%20years%20ago%3C/body%3E%3C/html%3E%CE%95%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%ACtake%20advantage%20ofand,%20according%20toattributed%20to%20theMicrosoft%20Windowsthe%20first%20centuryunder%20the%20controldiv%20class=

Please explain @laserjobs

@laserjobs commented on GitHub (May 5, 2022):

> > Googlebot scanned this after scanning the Brotli dictionary http://....../images/identified%20by%20thenatural%20resourcesclassification%20ofcan%20be%20consideredquantum%20mechanicsNevertheless,%20themillion%20years%20ago%3C/body%3E%3C/html%3E%CE%95%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%ACtake%20advantage%20ofand,%20according%20toattributed%20to%20theMicrosoft%20Windowsthe%20first%20centuryunder%20the%20controldiv%20class=
>
> Please explain @laserjobs

Just use your browser inspection panel and go to the dictionary, you should see the ping backs. https://www.gstatic.com/b/d

@eustas commented on GitHub (Jan 4, 2023):

The dictionary has no extra formatting, to keep it dense. The original dictionary looks like:

```
  "diamond", "use the", "airline", "end -->", ").attr(", "readers", "hosting",
  "#ffffff", "realize", "Vincent", "signals", " src=\"/", "Product", "despite",
  "diverse", "telling", "Public ", "held in", "Joseph ", "theatre", "affects",
  "<style>", "a large", "doesn't", "later, ", "Element", "favicon", "creator",
  "Hungary", "Airport", "see the", "so that", "Michael", "Systems", "Program",
  "s, and ", " width=", "e&quot;", "trading", "left\">\n", "persons", "Golden ",
  "Affairs", "grammar", "forming", "destroy", "idea of", "case of", "oldest ",
  "this is", ".src = ", "cartoon", "registr", "Commons", "Muslims", "What is",
  "in many", "marking", "reveals", "Indeed,", "equally", "/show_a", "outdoor",
  "escape(", "Austria", "genetic", "system,", "In the ", "sitting", "He also",
  "Islands", "Academy", "\n\t\t<!--", "Daniel ", "binding", "block\">",
  "imposed", "utilize", "Abraham", "(except", "{width:", "putting", ").html(",
  "|| [];\n", "DATA[ *", "kitchen", "mounted", "actual ", "dialect", "mainly ",
  "_blank'", "install", "experts", "if(type", "It also", "&copy; ", "\">Terms",
  "born in", "Options", "eastern", "talking", "concern", "gained ", "ongoing",
  "justify", "critics", "factory", "its own", "assault", "invited", "lasting",
  "his own", "href=\"/", "\" rel=\"", "develop", "concert", "diagram",
  "dollars", "cluster", "php?id=", "alcohol", ");})();", "using a", "><span>",
```

No need to worry about the "scary" character combinations.
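
As a quick check of that explanation, the sketch below joins five consecutive 7-byte entries from the excerpt above with no separators and reproduces the string quoted in the original report. The helper names `concatenate` and `split_fixed` are made up for this example.

```python
# Five consecutive 7-byte entries, copied from the excerpt above.
WORDS = ["cartoon", "registr", "Commons", "Muslims", "What is"]


def concatenate(words):
    """Join words back to back, with no separators between them."""
    return "".join(words)


def split_fixed(blob, length):
    """Cut a blob into consecutive chunks of the given length."""
    return [blob[i:i + length] for i in range(0, len(blob), length)]


blob = concatenate(WORDS)
print(blob)                  # cartoonregistrCommonsMuslimsWhat is
print(split_fixed(blob, 7))  # the five original entries again
```

Cutting the blob every 7 characters recovers the original entries, so the odd-looking run is just neighbouring words of equal length sitting next to each other.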

Reference: starred/brotli#369