[PR #305] [MERGED] Emoji and abbreviations parser #933


claunia · 2026-01-29T14:47:28Z


📋 Pull Request Information

Original PR: https://github.com/xoofx/markdig/pull/305
Author: @MihaZupan
Created: 2/6/2019
Status: ✅ Merged
Merged: 2/8/2019
Merged by: @xoofx

Base: master ← Head: emoji-and-abbreviations-parser

📝 Commits (8)

b15b050 Allow single-char abbreviations
ca38da5 Cross target NetCore 2.1
d854b0b Port CompactPrefixTree to Markdig
325495a Improve EmojiParser memory performance
ef452c2 Fix Abbreviations parser's one-char handling
18e9486 Remove TextMatchHelper
a11676e Add test case for Extension - Header sections (#296)
b5293b9 Comment-out TextMatcher test in Benchmarks

📊 Changes

9 files changed (+1383 additions, -257 deletions)


📝 src/Markdig.Benchmarks/TestMatchPerf.cs (+3 -2)
📝 src/Markdig.Tests/Specs/AbbreviationSpecs.cs (+40 -1)
📝 src/Markdig.Tests/Specs/AbbreviationSpecs.md (+21 -0)
📝 src/Markdig/Extensions/Abbreviations/AbbreviationParser.cs (+38 -54)
📝 src/Markdig/Extensions/Emoji/EmojiParser.cs (+75 -72)
➕ src/Markdig/Helpers/CompactPrefixTree.cs (+1110 -0)
➖ src/Markdig/Helpers/TextMatcher.cs (+0 -127)
➕ src/Markdig/Helpers/ThrowHelper.cs (+90 -0)
📝 src/Markdig/Markdig.csproj (+6 -1)

📄 Description

Fixes #296
Fewer memory allocations:

Building a new pipeline:

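| Method | Mean | Gen 0/1k Op | Gen 1/1k Op | Allocated Memory/Op |
|-------------------------------- |------------:|------------:|------------:|--------------------:|
| Markdig | 12.05 us | 19.1498 | - | 14.72 KB |
| Markdig_Advanced | 42.11 us | 45.1660 | - | 34.75 KB |
| Markdig_Advanced_Emoji | 1,490.56 us | 285.1563 | 130.8594 | 1502.8 KB |
| Markdig_Advanced_Emoji_Modified | 1,499.65 us | 298.8281 | 126.9531 | 1502.77 KB |

| (new) Method | Mean | Gen 0/1k Op | Gen 1/1k Op | Allocated Memory/Op |
|-------------------------------- |------------:|------------:|------------:|--------------------:|
| Markdig | 12.04 us | 19.1498 | - | 14.72 KB |
| Markdig_Advanced | 41.97 us | 45.1660 | - | 34.75 KB |
| Markdig_Advanced_Emoji | 194.58 us | 76.6602 | 15.3809 | 134.75 KB |
| Markdig_Advanced_Emoji_Modified | 252.09 us | 85.4492 | 18.5547 | 162.05 KB |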
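
To put the two tables side by side, the sketch below computes the old-over-new ratios for the two emoji rows. The figures are copied from the tables; the names `before`, `after` and `improvement` are made up for this illustration, and it is written in Python purely for the arithmetic.

```python
# Figures copied from the two benchmark tables: (mean in us, allocated KB).
before = {
    "Markdig_Advanced_Emoji": (1490.56, 1502.8),
    "Markdig_Advanced_Emoji_Modified": (1499.65, 1502.77),
}
after = {
    "Markdig_Advanced_Emoji": (194.58, 134.75),
    "Markdig_Advanced_Emoji_Modified": (252.09, 162.05),
}


def improvement(method):
    """Return (time ratio, allocation ratio) of old over new for one method."""
    old_time, old_alloc = before[method]
    new_time, new_alloc = after[method]
    return old_time / new_time, old_alloc / new_alloc


for method in before:
    time_ratio, alloc_ratio = improvement(method)
    print(f"{method}: {time_ratio:.1f}x faster to build, "
          f"{alloc_ratio:.1f}x less memory allocated")
```

For `Markdig_Advanced_Emoji` this works out to roughly 7.7x less time and about 11x less allocated memory per pipeline build; the `Markdig` and `Markdig_Advanced` rows are essentially unchanged.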

Where Modified forces the lazy-init of dictionary properties.
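
As a rough illustration of what "forcing the lazy-init" means, here is a minimal sketch of a lazily built dictionary property. It is written in Python rather than C#, and `EmojiTables`, `by_shortcode` and the sample pairs are hypothetical names for this example, not Markdig types or data.

```python
class EmojiTables:
    """Hypothetical holder for lookup data; not a Markdig type."""

    def __init__(self, pairs):
        self._pairs = list(pairs)   # cheap: only keeps the raw data around
        self._by_shortcode = None   # dictionary is not built yet
        self.builds = 0             # counts how many times the dictionary is built

    @property
    def by_shortcode(self):
        # Build the dictionary on first access only, then reuse it.
        if self._by_shortcode is None:
            self._by_shortcode = dict(self._pairs)
            self.builds += 1
        return self._by_shortcode


tables = EmojiTables([(":a:", "A"), (":b:", "B")])  # made-up sample pairs
print(tables.builds)               # nothing built yet
print(tables.by_shortcode[":a:"])  # first access forces the build
print(tables.builds)
```

Under this reading, a benchmark that touches such a property pays for building the dictionary, which would be consistent with the `Modified` row allocating more than the plain `Markdig_Advanced_Emoji` row in the new numbers.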

Parsing speed for emojis and abbreviations is about the same as before (~10% faster for emojis). A larger gain is not expected here, because the dataset could be considered the worst case for a prefix tree of this type: every input starts with the same character.
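
The worst-case remark can be seen in a minimal prefix tree with longest-match lookup. This is a plain dictionary-based trie in Python, sketched for illustration only; it does not reproduce `CompactPrefixTree`, and `PrefixTree`, `longest_match`, `root_fanout` and the sample keys are assumptions of this example.

```python
_VALUE = None  # sentinel dict key; cannot collide with a one-character string


class PrefixTree:
    """Minimal character trie with longest-prefix lookup (illustrative only)."""

    def __init__(self):
        self._root = {}  # child nodes keyed by character

    def add(self, key, value):
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[_VALUE] = value  # mark the end of a stored key

    def longest_match(self, text, start=0):
        """Return (key, value) for the longest stored key at text[start:], or None."""
        node = self._root
        best = None
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break  # no stored key continues with this character
            if _VALUE in node:
                best = (text[start:i + 1], node[_VALUE])
        return best

    def root_fanout(self):
        """Number of distinct first characters among the stored keys."""
        return sum(1 for k in self._root if k is not _VALUE)


tree = PrefixTree()
tree.add(":-)", 1)    # made-up keys that all start with ':'
tree.add(":-))", 2)
tree.add(":b:", 3)
print(tree.longest_match(":-)) ok"))
print(tree.root_fanout())
```

When every key begins with the same character, the root has a single child, so the first step of a lookup rules out none of the stored keys and the tree only starts to discriminate from the second character on.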

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


claunia added the pull-request label 2026-01-29 14:47:28 +00:00


Reference: starred/markdig#933