Compare commits

...

99 Commits

Author SHA1 Message Date
Martijn Laarman
d47fbc757f Optimize PipeTable parsing: O(n²) → O(n) for 3.7x–85x speedup, enables 10K+ row tables (#922)
* Optimize PipeTable parsing: O(n²) → O(n) for large tables

Pipe tables were creating deeply nested tree structures where each pipe
delimiter contained all subsequent content as children, causing O(n²)
traversal complexity for n cells. This change restructures the parser to
use a flat sibling-based structure, treating tables as matrices rather
than nested trees.

Key changes:
- Set IsClosed=true on PipeTableDelimiterInline to prevent nesting
- Add PromoteNestedPipesToRootLevel() to flatten pipes nested in emphasis
- Update cell boundary detection to use sibling traversal
- Move EmphasisInlineParser before PipeTableParser in processing order
- Fix EmphasisInlineParser to continue past IsClosed delimiters
- Add ContainsParentOrSiblingOfType<T>() helper for flat structure detection

Performance improvements (measured with PipeTableBenchmark on generated 5-column tables):

| Rows  | Before     | After     | Speedup |
|-------|------------|-----------|---------|
| 100   | 542 μs     | 150 μs    | 3.6x    |
| 500   | 23,018 μs  | 763 μs    | 30x     |
| 1000  | 89,418 μs  | 1,596 μs  | 56x     |
| 1500  | 201,593 μs | 2,740 μs  | 74x     |
| 5000  | CRASH      | 10,588 μs | n/a     |
| 10000 | CRASH      | 18,551 μs | n/a     |

Tables with 5000+ rows previously crashed due to stack overflow from
recursive depth. They now parse successfully with linear time complexity.
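A back-of-the-envelope cost model makes the O(n²) claim concrete. It assumes (this is not Markdig's parser code, and the function names are hypothetical) that in the nested layout, finding each cell boundary walks every node still nested under that delimiter, while in the flat layout it is a single step to the next sibling. The nested sum n + (n - 1) + ... + 1 equals n(n + 1)/2, and the nesting depth also grows with n, which is what led to the crashes on very large tables.

```csharp
using System;

// Hypothetical cost model, not Markdig code. 'cells' = number of cells in the table.

// Nested layout: delimiter i owns every later node as a descendant, so walking
// delimiter i's subtree touches the (cells - i) nodes that remain.
static long NestedTraversalSteps(int cells)
{
    long steps = 0;
    for (int i = 0; i < cells; i++)
    {
        steps += cells - i;
    }
    return steps; // n + (n - 1) + ... + 1 = n(n + 1) / 2
}

// Flat layout: delimiters are siblings, so each cell boundary is one step
// away from the previous one.
static long FlatTraversalSteps(int cells)
{
    long steps = 0;
    for (int i = 0; i < cells; i++)
    {
        steps += 1;
    }
    return steps; // n
}

// A 5000-row, 5-column table has 25,000 data cells.
int n = 5000 * 5;
Console.WriteLine($"nested: {NestedTraversalSteps(n)} steps, flat: {FlatTraversalSteps(n)} steps");
```

For the 25,000 data cells of a 5000 x 5 table, the model gives about 312.5 million steps for the nested layout versus 25,000 for the flat one.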

* remove baseline results file

* Do not use System.Index and fix nullability checks for older platforms
2026-01-30 22:05:18 +01:00
prozolic
3602433b84 Replace null checks with IsEmpty property for ReadOnlySpan<char> (#916)
This change suppresses CA2265 warnings.
2026-01-30 22:01:50 +01:00
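Why `IsEmpty` rather than a null comparison: the `==` operator on `ReadOnlySpan<T>` compares the underlying reference and the length, so comparing a span to `null` or `default` only detects the default span, not an empty span that points at real memory. That is the pitfall analyzer rule CA2265 ("Do not compare Span<T> to null or default") flags. A minimal illustration, where `HasContent` is a hypothetical helper:

```csharp
using System;

// Hypothetical helper: the reliable way to ask "does this span have any characters?"
static bool HasContent(ReadOnlySpan<char> span) => !span.IsEmpty;

ReadOnlySpan<char> fromEmptyString = "".AsSpan(); // length 0, but backed by a real string
ReadOnlySpan<char> defaultSpan = default;         // length 0, no backing memory

// '==' compares reference and length, so these two empty spans are not "equal".
Console.WriteLine(fromEmptyString == defaultSpan); // False
Console.WriteLine(fromEmptyString.IsEmpty);        // True
Console.WriteLine(HasContent("abc".AsSpan()));     // True
```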
prozolic
1bac4afc9b Use Dictionary.TryAdd instead of ContainsKey and indexer to reduce lookups. (#917)
* Use Dictionary.TryAdd instead of ContainsKey and indexer to reduce lookups.

* Update src/Markdig/Parsers/ParserList.cs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-30 22:01:27 +01:00
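The pattern replaced in #917, sketched on a hypothetical char-keyed map (loosely shaped like a parser lookup table, not the actual `ParserList` code). `Dictionary<TKey,TValue>.TryAdd` is available from .NET Core 2.0 and .NET Standard 2.1; on older targets such as netstandard2.0 it does not exist, so multi-targeted code may need a fallback.

```csharp
using System;
using System.Collections.Generic;

// Before: when the key is missing, this hashes it twice (ContainsKey, then the indexer setter).
static void AddIfMissingTwoLookups(Dictionary<char, string> map, char key, string value)
{
    if (!map.ContainsKey(key))
    {
        map[key] = value;
    }
}

// After: TryAdd does a single lookup and leaves an existing entry untouched.
static bool AddIfMissing(Dictionary<char, string> map, char key, string value)
    => map.TryAdd(key, value);

var openingChars = new Dictionary<char, string>();
AddIfMissingTwoLookups(openingChars, '*', "emphasis");
Console.WriteLine(AddIfMissing(openingChars, '*', "other")); // False: first registration wins
Console.WriteLine(openingChars['*']);                         // emphasis
```

Both versions keep the first value registered for a key; only the number of hash lookups differs.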
Tatsunori Uchino
a89056d961 Recognize supplementary characters (#913)
* Recognize supplementary characters

* Internalize Rune

* Fix failing tests

* Fix extra comment error

* Remove extra local variable c

* Reorganize classes around Rune

* Prepare both Rune and char variants / make Rune variant public for .NET

* Make APIs in StringSlice.cs public only in modern .NET

* Throw exception if cannot obtain first Rune

* Add comments

* Add comment on PeekRuneExtra

* Use `Rune.TryCreate`

* Remove backtrack

* Fix parameter name in XML comment

* Don't throw when error in `Rune.DecodeFromUtf16`

* Fix RuneAt

* Add tests of Rune-related methods of `StringSlice`

* Make comment more tolerant of changes

* Tweak comment

* Fix comment

* Add `readonly`

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

* Move namespace of polyfilled Rune out of System.Text

* Apply suggestions from code review

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

* Fix regression by review suggestion

* Prepare constant for .NET Standard test

* Don't call `IsPunctuationException` if unnecessary

* PR feedback

---------

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
2026-01-12 11:08:03 +01:00
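Background for the Rune change: characters outside the Basic Multilingual Plane are stored in UTF-16 as a surrogate pair, so a per-`char` check only ever sees a surrogate. Decoding a `System.Text.Rune` first yields the real code point and its Unicode category, which matters now that CommonMark counts symbols as punctuation. The sketch below uses the .NET `Rune` API directly; it is not Markdig's `StringSlice` API, and `CategoryAt` is a hypothetical helper.

```csharp
using System;
using System.Buffers;
using System.Globalization;
using System.Text;

// Hypothetical helper: Unicode category of the code point starting at 'index'
// (requires index < text.Length). A surrogate pair is decoded into one Rune.
static UnicodeCategory CategoryAt(string text, int index)
{
    OperationStatus status = Rune.DecodeFromUtf16(text.AsSpan(index), out Rune rune, out _);
    return status == OperationStatus.Done
        ? Rune.GetUnicodeCategory(rune)
        : char.GetUnicodeCategory(text[index]); // e.g. a lone surrogate: fall back to the raw char
}

string grinning = "\U0001F600"; // U+1F600, stored as two UTF-16 code units
Console.WriteLine(grinning.Length);                      // 2
Console.WriteLine(char.GetUnicodeCategory(grinning[0])); // Surrogate
Console.WriteLine(CategoryAt(grinning, 0));              // OtherSymbol
```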
Miha Zupan
cd7b9ca0ef Test netstandard (#915)
* Add GH Action to test netstandard 2.0 and 2.1

* Account for TFM changes in tests project
2025-11-17 18:46:26 +01:00
Alexandre Mutel
fb698598e4 Use central package management 2025-11-17 08:19:42 +01:00
mos379
12590e5fbe feat(link-helper): improve ASCII normalization handling (#911)
* feat(link-helper): improve ASCII normalization handling

Enhanced the `Urilize` method to better handle ASCII normalization and special characters. Added support for decomposing characters when `allowOnlyAscii` is true and skipping diacritical marks. Introduced handling for special German, Scandinavian, and Icelandic characters via new helper methods: `IsSpecialScandinavianOrGermanChar` and `NormalizeScandinavianOrGermanChar`.

Reorganized `using` directives for better clarity. Updated the processing loop in `Urilize` to handle normalized spans and ASCII equivalents more effectively. These changes improve link generation compatibility across various languages.

* Add tests for Scandinavian and German character normalization

Added tests for NormalizeScandinavianOrGermanChar method to validate character normalization for various special characters in both ASCII and non-ASCII contexts.

* test(link-helper): update ASCII transliteration tests

Updated test cases in `TestUrilizeOnlyAscii_Simple` to reflect
changes in `LinkHelper.Urilize` behavior. Non-ASCII characters
like `æ` and `ø` are now transliterated to their ASCII
equivalents (`ae` and `oe`) instead of being removed.
2025-11-10 22:01:35 +01:00
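The general technique behind the `Urilize` change can be sketched as: decompose to Unicode normalization form D, drop the combining marks, and map the letters that have no decomposition (such as `æ` and `ø`) explicitly. This is a hypothetical helper, not `LinkHelper.Urilize`; the `æ`/`ø` mappings follow the commit message, while the `ß` mapping is an assumption added for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not LinkHelper.Urilize: approximate a string with ASCII characters.
static string ToAsciiApproximation(string input)
{
    var sb = new StringBuilder(input.Length);
    // FormD splits precomposed letters, e.g. 'å' becomes 'a' + U+030A (combining ring above).
    foreach (char c in input.Normalize(NormalizationForm.FormD))
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
        {
            continue; // drop the diacritic, keep the base letter
        }

        switch (c)
        {
            // These letters have no Unicode decomposition, so they need explicit mappings.
            case 'æ': sb.Append("ae"); break;
            case 'ø': sb.Append("oe"); break;
            case 'ß': sb.Append("ss"); break; // assumption: mapping not stated in the commit
            default:
                if (c < 128)
                {
                    sb.Append(c); // keep ASCII; any other remaining non-ASCII char is dropped
                }
                break;
        }
    }
    return sb.ToString();
}

Console.WriteLine(ToAsciiApproximation("blåbærsyltetøy")); // blabaersyltetoey
```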
Miha Zupan
8c01cf0549 Add another test for pipe tables (#907) 2025-10-21 08:37:43 +02:00
Miha Zupan
bcbd8e47ac Lazily allocate ProcessInlinesBegin/End delegates on Blocks (#906) 2025-10-21 08:37:02 +02:00
Miha Zupan
d6e88f16f7 Fix pipe table parsing with a leading paragraph (#905)
* Fix pipe table parsing with a leading paragraph

* Use the alternative approach
2025-10-20 21:43:25 +02:00
Miha Zupan
03bdf60086 Add a basic fuzzing project (#903)
* Add basic fuzzing project

* Mark the project as non-packable
2025-10-17 08:09:28 +02:00
Miha Zupan
5c78932f55 Fix edge cases in EmphasisInlineParser (#902) 2025-10-17 08:07:15 +02:00
Miha Zupan
191e33ab32 Fix build warnings (#899) 2025-10-16 17:25:47 +02:00
Miha Zupan
800235ba7a Fix IndexOutOfRangeException in CodeInlineParser (#900) 2025-10-16 17:25:30 +02:00
Miha Zupan
d5f8a809a0 Move sln to slnx (#901) 2025-10-16 17:24:33 +02:00
Asttear
781d9b5365 Remove leading newline in block attributes (#896)
* Remove leading newline in block attributes

fix #895

* Add handling logic for `\r\n`
2025-10-05 11:21:12 +02:00
Phillip Haydon
543570224e Fix issue where an inline code block that spans multiple lines doesn't parse correctly (#893)
* fixes issue where an inline code block that spans multiple lines doesn't get treated as code

* Update src/Markdig.Tests/TestPipeTable.cs

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

* Apply suggestion from @MihaZupan

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

* Update src/Markdig.Tests/TestPipeTable.cs

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

* fix broken test

* removed unreachable code and added more tests

* Update src/Markdig.Tests/TestPipeTable.cs

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

* Update src/Markdig.Tests/TestPipeTable.cs

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

* removed unnecessary inline code check

* Update src/Markdig/Parsers/Inlines/CodeInlineParser.cs

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

---------

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
Co-authored-by: Alexandre Mutel <alexandre_mutel@live.com>
2025-10-03 09:34:24 +02:00
Daniel Klecha
4dc0be88b4 add options for link inline (#894)
* add options for link inline

* create LinkOptions and associate it with all four parsers

* set EnableHtmlParsing to true by default
2025-10-03 09:22:51 +02:00
Phillip Haydon
0e9e80e1cd Fix for table depth error when cell contains backticks (#891)
* failing test

* fixed bug with table containing backtick which caused a depth error
2025-09-21 16:26:02 +02:00
Alexandre Mutel
1b04599c44 Merge pull request #888 from prozolic/pullreq
Fixes issue #845
2025-09-11 07:55:51 +02:00
prozolic
5e6fb2d1c5 Add test for issue #845 list item blank line 2025-09-08 22:36:09 +09:00
prozolic
14406bc60d Fixes issue #845 2025-09-06 21:10:51 +09:00
Alexandre Mutel
2aa6780a30 Merge pull request #883 from messani/master
Add source position tracking for grid tables
2025-08-28 09:04:44 +02:00
Alexandre Mutel
c43646586c Merge pull request #885 from dannyp32/supportTableWithoutExtraLine
Add support for a table without an extra new line before it
2025-08-28 09:02:29 +02:00
Daniel Pino
d548b82bcd Add support for a table without an extra new line before it 2025-08-09 08:50:49 +00:00
Tibor Peluch
aab5543cb5 Code cleanup 2025-07-14 20:17:50 +02:00
Tibor Peluch
2e1d741aaf Cleaned up code, added tests for source position 2025-07-14 10:23:15 +02:00
Tibor Peluch
80c50e31e2 Attempt to fix tracking of tree node positions (line, column) inside GridTable 2025-07-11 13:25:03 +02:00
Alexandre Mutel
7ff8db9016 Merge pull request #877 from Mertsch/Mertsch-patch-1
Update readme.md
2025-06-19 08:41:54 +02:00
Alexandre Mutel
c69fb9ae73 Merge pull request #879 from stylefish/issue878
Fixes #878: RoundtripRenderer: render indent and 0 blocks for ordered lists
2025-06-19 08:41:10 +02:00
stylefish
5a3c206076 Fixes #878: render indent and 0 blocks 2025-06-16 11:26:23 +02:00
Mertsch
b92890094c Update readme.md 2025-06-12 14:26:00 +02:00
Alexandre Mutel
682c727288 Merge pull request #876 from Akarinnnnn/fix-872
Fix #872 by preserving null title string.
2025-06-05 07:57:29 +02:00
Fa鸽
ec2eef25b2 Remove HtmlHelper.UnescapeNullable 2025-06-04 19:23:18 +08:00
Fa鸽
6261660d37 Explain why not to normalize link title into empty strings 2025-05-31 22:26:33 +08:00
Fa鸽
6d1fa96389 Changed link parsing tests for #872 2025-05-31 16:33:29 +08:00
Fa鸽
47c4e9b1e2 Fix #872 by preserving null title string. 2025-05-31 16:01:42 +08:00
Alexandre Mutel
3535701d70 Merge pull request #869 from prozolic/pullreq
Fix bug in `Markdown.ToPlainText` with code blocks
2025-04-27 18:52:57 +02:00
prozolic
c41b389053 Fix CodeBlockRenderer.Write 2025-04-27 16:49:05 +09:00
Alexandre Mutel
09a4b81a6e Update tests 2025-04-15 11:35:54 +02:00
Alexandre Mutel
7b14e2e091 Merge pull request #867 from MihaZupan/commonmark-0.31.2
Update to CommonMark 0.31.2
2025-04-15 10:59:22 +02:00
Alexandre Mutel
1e17dcdd08 Merge pull request #866 from MihaZupan/alert-perf
Improve Alert parsing perf
2025-04-15 10:58:40 +02:00
Alexandre Mutel
40e5ab1514 Merge pull request #863 from Amberg/master
Infer pipe table column widths from separator row
2025-04-15 10:57:47 +02:00
Alexandre Mutel
2953b026fc Merge pull request #865 from RamType0/patch-1
Fix `MathInline` being called "math block"
2025-04-15 10:56:27 +02:00
Miha Zupan
42ab98968d Update readme 2025-04-15 04:32:52 +02:00
Miha Zupan
b15cf582a5 Add 'search' HTML tag support 2025-04-15 04:31:13 +02:00
Miha Zupan
61e9be290b Allow empty HTML comments, double hyphens in text 2025-04-15 04:02:22 +02:00
Miha Zupan
a9ce0eb438 Update definition of punctuation to include symbols 2025-04-15 03:09:59 +02:00
Miha Zupan
023d93c091 Update CommonMark spec to 0.31.2 2025-04-14 23:32:22 +02:00
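Commit a9ce0eb438 above follows CommonMark 0.31.2, where a Unicode punctuation character is any character in the Unicode general categories P (punctuation) or S (symbol). A char-based sketch of that definition; the helper name is hypothetical, and characters outside the BMP would need a Rune-based variant.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: CommonMark 0.31.2 "Unicode punctuation character" (categories P* and S*).
static bool IsUnicodePunctuation(char c)
{
    switch (char.GetUnicodeCategory(c))
    {
        // P: punctuation
        case UnicodeCategory.ConnectorPunctuation:
        case UnicodeCategory.DashPunctuation:
        case UnicodeCategory.OpenPunctuation:
        case UnicodeCategory.ClosePunctuation:
        case UnicodeCategory.InitialQuotePunctuation:
        case UnicodeCategory.FinalQuotePunctuation:
        case UnicodeCategory.OtherPunctuation:
        // S: symbols, newly included by the 0.31.2 definition
        case UnicodeCategory.MathSymbol:
        case UnicodeCategory.CurrencySymbol:
        case UnicodeCategory.ModifierSymbol:
        case UnicodeCategory.OtherSymbol:
            return true;
        default:
            return false;
    }
}

Console.WriteLine(IsUnicodePunctuation('€')); // True (currency symbol, category Sc)
Console.WriteLine(IsUnicodePunctuation('a')); // False
```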
Miha Zupan
bbefce3b1f Sealed + ref struct 2025-04-14 22:11:53 +02:00
Miha Zupan
0d6343b421 Make AlertBlock parsing a bit cheaper 2025-04-14 22:02:21 +02:00
Ram.Type-0
f4effc25c0 Fix MathInline being called "math block" 2025-04-15 00:57:16 +09:00
Alexandre Mutel
7a83a1fd3d Merge pull request #864 from MihaZupan/net9-perf4
A couple perf improvements
2025-04-14 11:10:48 +02:00
Miha Zupan
8269ff1af5 Improve AutoLinkParser overhead for false-positive opening chars 2025-04-13 17:45:52 +02:00
Miha Zupan
0e6d0f4cb2 Fix style 2025-04-13 17:23:40 +02:00
Miha Zupan
8484420b72 Remove some branches from IsWhiteSpace and IsWhiteSpaceOrZero 2025-04-13 17:23:27 +02:00
Miha Zupan
c82a36884d Use the field keyword in a few places 2025-04-13 17:22:51 +02:00
Miha Zupan
da3d7f4f3a Improve some descriptions 2025-04-13 17:22:24 +02:00
Miha Zupan
eceb70c16a Avoid delegate allocations in AutoIdentifierExtension 2025-04-13 17:22:04 +02:00
Miha Zupan
7a9c192d7d Speed up FencedCodeBlock rendering 2025-04-13 17:21:43 +02:00
Miha Zupan
8cfa0cf0ae Improve more character tests with SearchValues 2025-04-13 16:59:55 +02:00
Miha Zupan
a82c3bd705 Improve some character tests 2025-04-13 16:59:29 +02:00
Miha Zupan
ecfda373b9 Avoid warnings in Markdig.WebApp 2025-04-13 16:11:30 +02:00
Miha Zupan
d8f69218db Commit FrozenDictionary polyfill 2025-04-13 16:11:02 +02:00
Miha Zupan
adfcf42529 Use FrozenDictionary in a couple places 2025-04-13 16:09:37 +02:00
Miha Zupan
dab1ca5483 Avoid unnecessary null check when reading trivia info 2025-04-13 16:09:24 +02:00
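Several commits in #864 replace hand-written character tests with `SearchValues<T>` (.NET 8 and later). A minimal sketch of that pattern, using CommonMark's ASCII punctuation set as the example; which sets Markdig actually precomputes is not shown in this log, and `IsAsciiPunctuation` is a hypothetical name.

```csharp
using System;
using System.Buffers;

// CommonMark's ASCII punctuation set, precomputed once for fast membership tests.
SearchValues<char> asciiPunctuation = SearchValues.Create("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~");

// Hypothetical helper: single-character membership test.
bool IsAsciiPunctuation(char c) => asciiPunctuation.Contains(c);

// IndexOfAny with SearchValues can use vectorized search paths where the runtime supports them.
int firstPunctuation = "hello, world".AsSpan().IndexOfAny(asciiPunctuation);
Console.WriteLine(firstPunctuation);          // 5
Console.WriteLine(IsAsciiPunctuation('~'));   // True
```

The set is built once and reused, which is the point of the type: the lookup strategy is chosen at creation time rather than on every call.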
Manuel Amstutz
55f770cc07 feat: infer pipe table column widths from separator row
Adds support for calculating column widths in pipe tables based on the number of dashes in the header separator row.
Enabled via the InferColumnWidthsFromSeparator option in PipeTableOptions.
2025-04-09 20:55:54 +02:00
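One way to derive widths from the separator row is to make each column's share proportional to its dash count. The sketch below does that; `InferColumnWidths` is a hypothetical helper, and the exact formula and rounding Markdig uses may differ.

```csharp
using System;
using System.Linq;

// Hypothetical helper: relative column widths (percent) from a pipe-table separator row,
// proportional to the number of '-' characters in each cell. Alignment colons are ignored.
static double[] InferColumnWidths(string separatorRow)
{
    string[] cells = separatorRow.Trim().Trim('|').Split('|');
    int[] dashCounts = cells.Select(cell => cell.Count(ch => ch == '-')).ToArray();
    int total = dashCounts.Sum();
    return dashCounts.Select(d => total == 0 ? 0.0 : 100.0 * d / total).ToArray();
}

Console.WriteLine(string.Join(", ", InferColumnWidths("|--|:------|--:|"))); // 20, 60, 20
```

In Markdig itself, per the commit message, this behavior is opt-in through `PipeTableOptions.InferColumnWidthsFromSeparator`.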
Alexandre Mutel
8b84542527 Merge pull request #861 from Meir017/patch-1
chore: update repository's github path
2025-03-20 16:38:19 +01:00
Meir Blachman
086440bcd3 update repository's github path 2025-03-20 16:59:50 +02:00
Alexandre Mutel
97470bd61f Merge pull request #859 from JamesNK/jamesnk/autolinks-domain-no-period
Add AutoLinkOptions.AllowDomainWithoutPeriod
2025-03-18 10:00:13 +01:00
James Newton-King
90c73b7754 Update src/Markdig/Helpers/LinkHelper.cs
Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
2025-03-18 14:54:20 +08:00
James Newton-King
ee403ce28f Port tests 2025-03-17 08:26:51 +08:00
James Newton-King
8b403918b9 Update XML doc 2025-03-17 07:47:40 +08:00
James Newton-King
39b07d6bc5 Add AutoLinkOptions.AllowDomainWithoutPeriod 2025-03-17 07:46:23 +08:00
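The option name describes the rule being relaxed: for GFM-style `www.` autolinks a valid domain must contain at least one period, and `AllowDomainWithoutPeriod` lifts that requirement so single-label hosts can be linked. A simplified sketch of the idea; `IsAcceptableDomain` and its validation details are hypothetical and much looser than `LinkHelper`'s real checks (for example, GFM also restricts underscores in the last two segments).

```csharp
using System;
using System.Linq;

// Hypothetical, simplified domain check; not Markdig's LinkHelper logic.
static bool IsAcceptableDomain(string domain, bool allowDomainWithoutPeriod)
{
    if (domain.Length == 0)
    {
        return false;
    }

    string[] labels = domain.Split('.');
    // Every label must be non-empty and contain only letters, digits, '-' or '_'.
    bool labelsValid = labels.All(label =>
        label.Length > 0 && label.All(c => char.IsLetterOrDigit(c) || c == '-' || c == '_'));

    // By default at least one period is required ("example.com");
    // the option also admits single-label hosts such as "localhost".
    return labelsValid && (allowDomainWithoutPeriod || labels.Length > 1);
}

Console.WriteLine(IsAcceptableDomain("localhost", allowDomainWithoutPeriod: false)); // False
Console.WriteLine(IsAcceptableDomain("localhost", allowDomainWithoutPeriod: true));  // True
```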
Alexandre Mutel
fb3fe8b261 Merge pull request #838 from Melodi17/master
Implemented better indent control in TextRendererBase
2025-02-28 09:23:27 +01:00
Alexandre Mutel
abb19ecf37 Merge pull request #851 from Akarinnnnn/encoding-ployfill
Replace encoding polyfill with NET5+ one.
2025-02-28 09:22:48 +01:00
Fa鸽
9dac60df73 Replace encoding polyfill with NET5+ one.
netstandard2.1 is a special TFM: .NET 5+ targets don't mark themselves as compatible with it, even though they mostly are.
2025-02-24 10:49:59 +08:00
Melodi
148278417f Added error throwing when stack is empty and PopIndent() is called 2025-01-14 14:25:20 +10:00
Alexandre Mutel
5b32391348 Update dependencies NuGet 2025-01-10 08:56:38 +01:00
Alexandre Mutel
5528023158 Merge pull request #844 from snnz/fix-gridtables
Prevent GridTableParser from looking beyond the end of a line.
2025-01-09 18:15:11 +01:00
Alexandre Mutel
f93b9d79d9 Merge branch 'master' into fix-gridtables 2025-01-06 08:43:45 +01:00
Alexandre Mutel
d53fd0e870 Merge pull request #843 from snnz/fix-deflists
Fixes exception in DefinitionListParser.GetCurrentDefinitionList()
2025-01-06 08:42:36 +01:00
Alexandre Mutel
c488aca96c Merge branch 'master' into fix-deflists 2025-01-05 21:12:33 +01:00
Alexandre Mutel
9b3f442765 Merge pull request #842 from snnz/fix-alerts
Check that the alert candidate is not already in an alert block or nested within other elements.
2025-01-05 21:11:11 +01:00
Sergey Nozhenko
7b6d659bbd A test has been added. 2025-01-03 07:03:28 +03:00
Sergey Nozhenko
bc8ba4fecb A test has been added. 2025-01-03 07:02:38 +03:00
Sergey Nozhenko
d87bb7292d A test has been added. 2025-01-03 07:01:29 +03:00
Sergey Nozhenko
118d28f886 Prevent GridTableParser from looking beyond the end of a line. 2025-01-03 04:29:24 +03:00
Sergey Nozhenko
3e0c72f043 Fixes exception in DefinitionListParser.GetCurrentDefinitionList() 2025-01-03 03:30:49 +03:00
Sergey Nozhenko
f2590e7b80 Check that the alert candidate is not already in an alert block or nested within other elements. 2025-01-03 01:27:11 +03:00
Melodi
88c5b5cb41 Added method for clearing indents in TextRendererBase as well as added case handling to PopIndent() 2024-12-31 22:57:02 +10:00
Alexandre Mutel
d1233ffe66 Merge pull request #837 from snnz/fix-links
Fix errors in LinkHelper and LinkInlineParser.
2024-12-27 09:49:04 +01:00
Sergey Nozhenko
ab8e85b06e Remove additional condition, since a carriage return constitutes a line ending regardless of whether it is followed by a line feed or not. 2024-12-21 06:56:23 +03:00
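The rule cited in that commit comes from CommonMark: a line ending is a line feed, a carriage return not followed by a line feed, or a carriage return followed by a line feed. A small splitter that follows it; `SplitLines` is a hypothetical helper, not Markdig code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split on "\n", "\r\n", or a bare "\r" (all are line endings in CommonMark).
static List<string> SplitLines(string text)
{
    var lines = new List<string>();
    int start = 0;
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] == '\n' || text[i] == '\r')
        {
            lines.Add(text.Substring(start, i - start));
            if (text[i] == '\r' && i + 1 < text.Length && text[i + 1] == '\n')
            {
                i++; // treat CR+LF as a single line ending
            }
            start = i + 1;
        }
    }
    lines.Add(text.Substring(start)); // remainder after the last line ending (may be empty)
    return lines;
}

Console.WriteLine(string.Join("|", SplitLines("a\rb\r\nc\nd"))); // a|b|c|d
```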
snnz
90bc15c016 Update src/Markdig.Tests/TestPlayParser.cs
Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
2024-12-21 06:14:16 +03:00
snnz
7f604bef30 Update src/Markdig/Parsers/Inlines/LinkInlineParser.cs
Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
2024-12-21 06:14:07 +03:00
snnz
54783b8f65 Update src/Markdig/Parsers/Inlines/LinkInlineParser.cs
Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
2024-12-21 06:13:56 +03:00
snnz
ad0770a594 Update src/Markdig/Parsers/Inlines/LinkInlineParser.cs
Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
2024-12-21 06:13:22 +03:00
snnz
90365bfeee Update src/Markdig/Parsers/Inlines/LinkInlineParser.cs
Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
2024-12-21 06:13:09 +03:00
Sergey Nozhenko
c35f7fff17 Fixed errors in LinkHelper and LinkInlineParser. 2024-12-21 03:29:31 +03:00
121 changed files with 6600 additions and 2273 deletions


@@ -12,8 +12,9 @@ insert_final_newline = false
trim_trailing_whitespace = true
# Solution Files
[*.sln]
indent_style = tab
[*.slnx]
indent_size = 2
insert_final_newline = true
# XML Project Files
[*.{csproj,vbproj,vcxproj,vcxproj.filters,proj,projitems,shproj}]
@@ -35,3 +36,8 @@ insert_final_newline = true
# Bash Files
[*.sh]
end_of_line = lf
# C# files
[*.cs]
# License header
file_header_template = Copyright (c) Alexandre Mutel. All rights reserved.\nThis file is licensed under the BSD-Clause 2 license.\nSee the license.txt file in the project root for more information.

.gitattributes

@@ -1,3 +1,3 @@
* text=auto
*.cs text=auto diff=csharp
*.sln text=auto eol=crlf
*.slnx text=auto eol=crlf

.github/workflows/test-netstandard.yml

@@ -0,0 +1,44 @@
name: Test netstandard

on: pull_request

jobs:
  test-netstandard:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        netstandard-version: ['netstandard2.0', 'netstandard2.1']

    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: |
            8.0.x
            9.0.x

      - name: Patch build to test ${{ matrix.netstandard-version }}
        run: |
          cd src
          sed -i 's/<TargetFrameworks>.*<\/TargetFrameworks>/<TargetFrameworks>${{ matrix.netstandard-version }}<\/TargetFrameworks>/' Markdig/Markdig.targets
          sed -i 's/<TargetFrameworks>.*<\/TargetFrameworks>/<TargetFrameworks>net8.0;net9.0<\/TargetFrameworks>/' Markdig.Tests/Markdig.Tests.csproj
          echo "Markdig.targets TFMs:"
          grep "TargetFrameworks" Markdig/Markdig.targets
          echo "Markdig.Tests.csproj TFMs:"
          grep "TargetFrameworks" Markdig.Tests/Markdig.Tests.csproj

      - name: Restore dependencies
        run: dotnet restore src/Markdig.Tests/Markdig.Tests.csproj

      - name: Test Debug
        run: |
          dotnet build src/Markdig.Tests/Markdig.Tests.csproj -c Debug --no-restore
          dotnet test src/Markdig.Tests/Markdig.Tests.csproj -c Debug --no-build

      - name: Test Release
        run: |
          dotnet build src/Markdig.Tests/Markdig.Tests.csproj -c Release --no-restore
          dotnet test src/Markdig.Tests/Markdig.Tests.csproj -c Release --no-build

.gitignore

@@ -8,6 +8,8 @@
*.sln.docstates
*.nuget.props
*.nuget.targets
src/.idea
BenchmarkDotNet.Artifacts
# User-specific files (MonoDevelop/Xamarin Studio)
*.userprefs


@@ -2,7 +2,7 @@
<img align="right" width="160px" height="160px" src="img/markdig.png">
Markdig is a fast, powerful, [CommonMark](http://commonmark.org/) compliant, extensible Markdown processor for .NET.
Markdig is a fast, powerful, [CommonMark](https://commonmark.org/) compliant, extensible Markdown processor for .NET.
> NOTE: The repository is under construction. There will be a dedicated website and proper documentation at some point!
@@ -14,7 +14,7 @@ You can **try Markdig online** and compare it to other implementations on [babel
- **Abstract Syntax Tree** with precise source code location for syntax tree, useful when building a Markdown editor.
- Checkout [Markdown Editor v2 for Visual Studio 2022](https://marketplace.visualstudio.com/items?itemName=MadsKristensen.MarkdownEditor2) powered by Markdig!
- Converter to **HTML**
- Passing more than **600+ tests** from the latest [CommonMark specs (0.30)](http://spec.commonmark.org/)
- Passing more than **600+ tests** from the latest [CommonMark specs (0.31.2)](https://spec.commonmark.org/)
- Includes all the core elements of CommonMark:
- including **GFM fenced code blocks**.
- **Extensible** architecture
@@ -22,9 +22,9 @@ You can **try Markdig online** and compare it to other implementations on [babel
- [**Roundtrip support**](./src/Markdig/Roundtrip.md): Parses trivia (whitespace, newlines and other characters) to support lossless parse ⭢ render roundtrip. This enables changing markdown documents without introducing undesired trivia changes.
- Built-in with **20+ extensions**, including:
- 2 kind of tables:
- [**Pipe tables**](src/Markdig.Tests/Specs/PipeTableSpecs.md) (inspired from GitHub tables and [PanDoc - Pipe Tables](http://pandoc.org/README.html#extension-pipe_tables))
- [**Grid tables**](src/Markdig.Tests/Specs/GridTableSpecs.md) (inspired from [Pandoc - Grid Tables](http://pandoc.org/README.html#extension-grid_tables))
- [**Extra emphasis**](src/Markdig.Tests/Specs/EmphasisExtraSpecs.md) (inspired from [Pandoc - Emphasis](http://pandoc.org/README.html#strikeout) and [Markdown-it](https://markdown-it.github.io/))
- [**Pipe tables**](src/Markdig.Tests/Specs/PipeTableSpecs.md) (inspired from GitHub tables and [PanDoc - Pipe Tables](https://pandoc.org/MANUAL.html#extension-pipe_tables))
- [**Grid tables**](src/Markdig.Tests/Specs/GridTableSpecs.md) (inspired from [Pandoc - Grid Tables](https://pandoc.org/MANUAL.html#extension-grid_tables))
- [**Extra emphasis**](src/Markdig.Tests/Specs/EmphasisExtraSpecs.md) (inspired from [Pandoc - Emphasis](https://pandoc.org/MANUAL.html#strikeout) and [Markdown-it](https://markdown-it.github.io/))
- strike through `~~`,
- Subscript `~`
- Superscript `^`
@@ -33,7 +33,7 @@ You can **try Markdig online** and compare it to other implementations on [babel
- [**Special attributes**](src/Markdig.Tests/Specs/GenericAttributesSpecs.md) or attached HTML attributes (inspired from [PHP Markdown Extra - Special Attributes](https://michelf.ca/projects/php-markdown/extra/#spe-attr))
- [**Definition lists**](src/Markdig.Tests/Specs/DefinitionListSpecs.md) (inspired from [PHP Markdown Extra - Definitions Lists](https://michelf.ca/projects/php-markdown/extra/#def-list))
- [**Footnotes**](src/Markdig.Tests/Specs/FootnotesSpecs.md) (inspired from [PHP Markdown Extra - Footnotes](https://michelf.ca/projects/php-markdown/extra/#footnotes))
- [**Auto-identifiers**](src/Markdig.Tests/Specs/AutoIdentifierSpecs.md) for headings (similar to [Pandoc - Auto Identifiers](http://pandoc.org/README.html#extension-auto_identifiers))
- [**Auto-identifiers**](src/Markdig.Tests/Specs/AutoIdentifierSpecs.md) for headings (similar to [Pandoc - Auto Identifiers](https://pandoc.org/MANUAL.html#extension-auto_identifiers))
- [**Auto-links**](src/Markdig.Tests/Specs/AutoLinks.md) generates links if a text starts with `http://` or `https://` or `ftp://` or `mailto:` or `www.xxx.yyy`
- [**Task Lists**](src/Markdig.Tests/Specs/TaskListSpecs.md) inspired from [Github Task lists](https://github.com/blog/1375-task-lists-in-gfm-issues-pulls-comments).
- [**Extra bullet lists**](src/Markdig.Tests/Specs/ListExtraSpecs.md), supporting alpha bullet `a.` `b.` and roman bullet (`i`, `ii`...etc.)
@@ -70,7 +70,7 @@ If you are looking for support for an old .NET Framework 3.5 or 4.0, you can dow
While there is not yet a dedicated documentation, you can find from the [specs documentation](src/Markdig.Tests/Specs/readme.md) how to use these extensions.
In the meantime, you can have a "behind the scene" article about Markdig in my blog post ["Implementing a Markdown Engine for .NET"](http://xoofx.github.io/blog/2016/06/13/implementing-a-markdown-processor-for-dotnet/)
In the meantime, you can have a "behind the scene" article about Markdig in my blog post ["Implementing a Markdown Engine for .NET"](https://xoofx.github.io/blog/2016/06/13/implementing-a-markdown-processor-for-dotnet/)
## Download
@@ -153,7 +153,7 @@ image editing, optimization, and delivery server](https://github.com/imazen/imag
## Credits
Thanks to the fantastic work done by [John Mac Farlane](http://johnmacfarlane.net/) for the CommonMark specs and all the people involved in making Markdown a better standard!
Thanks to the fantastic work done by [John Mac Farlane](https://johnmacfarlane.net/) for the CommonMark specs and all the people involved in making Markdown a better standard!
This project would not have been possible without this huge foundation.
@@ -161,7 +161,7 @@ Thanks also to the project [BenchmarkDotNet](https://github.com/PerfDotNet/Bench
Some decoding part (e.g HTML [EntityHelper.cs](https://github.com/lunet-io/markdig/blob/master/src/Markdig/Helpers/EntityHelper.cs)) have been re-used from [CommonMark.NET](https://github.com/Knagis/CommonMark.NET)
Thanks to the work done by @clarkd on the JIRA Link extension (https://github.com/clarkd/MarkdigJiraLinker), now included with this project!
Thanks to the work done by @clarkd on the [JIRA Link extension](https://github.com/clarkd/MarkdigJiraLinker), now included with this project!
## Author
Alexandre MUTEL aka [xoofx](http://xoofx.github.io)
Alexandre MUTEL aka [xoofx](https://xoofx.github.io/)


@@ -0,0 +1,23 @@
<Project>
  <PropertyGroup>
    <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
    <CentralPackageTransitivePinningEnabled>false</CentralPackageTransitivePinningEnabled>
  </PropertyGroup>
  <ItemGroup>
    <PackageVersion Include="BenchmarkDotNet" Version="0.14.0" />
    <PackageVersion Include="BenchmarkDotNet.Diagnostics.Windows" Version="0.14.0" />
    <PackageVersion Include="CommonMark.NET" Version="0.15.1" />
    <PackageVersion Include="Markdown" Version="2.2.1" />
    <PackageVersion Include="MarkdownSharp" Version="2.0.5" />
    <PackageVersion Include="Microsoft.ApplicationInsights.AspNetCore" Version="2.23.0" />
    <PackageVersion Include="Microsoft.Diagnostics.Runtime" Version="3.1.512801" />
    <PackageVersion Include="Microsoft.NET.Test.Sdk" Version="18.0.1" />
    <PackageVersion Include="MinVer" Version="6.0.0" />
    <PackageVersion Include="NUnit" Version="4.4.0" />
    <PackageVersion Include="NUnit3TestAdapter" Version="5.2.0" />
    <PackageVersion Include="SharpFuzz" Version="2.2.0" />
  </ItemGroup>
  <ItemGroup Condition=" '$(TargetFramework)' == 'net462' OR '$(TargetFramework)' == 'netstandard2.0'">
    <PackageVersion Include="System.Memory" Version="4.6.3" />
  </ItemGroup>
</Project>


@@ -19,12 +19,12 @@
    </Content>
  </ItemGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.13.12" />
    <PackageReference Include="BenchmarkDotNet.Diagnostics.Windows" Version="0.13.12" />
    <PackageReference Include="CommonMark.NET" Version="0.15.1" />
    <PackageReference Include="Markdown" Version="2.2.1" />
    <PackageReference Include="MarkdownSharp" Version="2.0.5" />
    <PackageReference Include="Microsoft.Diagnostics.Runtime" Version="3.1.506101" />
    <PackageReference Include="BenchmarkDotNet" />
    <PackageReference Include="BenchmarkDotNet.Diagnostics.Windows" />
    <PackageReference Include="CommonMark.NET" />
    <PackageReference Include="Markdown" />
    <PackageReference Include="MarkdownSharp" />
    <PackageReference Include="Microsoft.Diagnostics.Runtime" />
  </ItemGroup>
  <ItemGroup>
    <ProjectReference Include="..\Markdig\Markdig.csproj" />


@@ -0,0 +1,81 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;
using Markdig;

namespace Testamina.Markdig.Benchmarks.PipeTable;

/// <summary>
/// Benchmark for pipe table parsing performance, especially for large tables.
/// Tests the performance of PipeTableParser with varying table sizes.
/// </summary>
[MemoryDiagnoser]
[GcServer(true)] // Use server GC to get more comprehensive GC stats
public class PipeTableBenchmark
{
    private string _100Rows = null!;
    private string _500Rows = null!;
    private string _1000Rows = null!;
    private string _1500Rows = null!;
    private string _5000Rows = null!;
    private string _10000Rows = null!;
    private MarkdownPipeline _pipeline = null!;

    [GlobalSetup]
    public void Setup()
    {
        // Pipeline with pipe tables enabled (part of advanced extensions)
        _pipeline = new MarkdownPipelineBuilder()
            .UseAdvancedExtensions()
            .Build();

        // Generate tables of various sizes
        // Note: Before optimization, 5000+ rows hit depth limit due to nested tree structure.
        // After optimization, these should work.
        _100Rows = PipeTableGenerator.Generate(rows: 100, columns: 5);
        _500Rows = PipeTableGenerator.Generate(rows: 500, columns: 5);
        _1000Rows = PipeTableGenerator.Generate(rows: 1000, columns: 5);
        _1500Rows = PipeTableGenerator.Generate(rows: 1500, columns: 5);
        _5000Rows = PipeTableGenerator.Generate(rows: 5000, columns: 5);
        _10000Rows = PipeTableGenerator.Generate(rows: 10000, columns: 5);
    }

    [Benchmark(Description = "PipeTable 100 rows x 5 cols")]
    public string Parse100Rows()
    {
        return Markdown.ToHtml(_100Rows, _pipeline);
    }

    [Benchmark(Description = "PipeTable 500 rows x 5 cols")]
    public string Parse500Rows()
    {
        return Markdown.ToHtml(_500Rows, _pipeline);
    }

    [Benchmark(Description = "PipeTable 1000 rows x 5 cols")]
    public string Parse1000Rows()
    {
        return Markdown.ToHtml(_1000Rows, _pipeline);
    }

    [Benchmark(Description = "PipeTable 1500 rows x 5 cols")]
    public string Parse1500Rows()
    {
        return Markdown.ToHtml(_1500Rows, _pipeline);
    }

    [Benchmark(Description = "PipeTable 5000 rows x 5 cols")]
    public string Parse5000Rows()
    {
        return Markdown.ToHtml(_5000Rows, _pipeline);
    }

    [Benchmark(Description = "PipeTable 10000 rows x 5 cols")]
    public string Parse10000Rows()
    {
        return Markdown.ToHtml(_10000Rows, _pipeline);
    }
}


@@ -0,0 +1,61 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.

using System.Text;

namespace Testamina.Markdig.Benchmarks.PipeTable;

/// <summary>
/// Generates pipe table markdown content for benchmarking purposes.
/// </summary>
public static class PipeTableGenerator
{
    private const int DefaultCellWidth = 10;

    /// <summary>
    /// Generates a pipe table in markdown format.
    /// </summary>
    /// <param name="rows">Number of data rows (excluding header)</param>
    /// <param name="columns">Number of columns</param>
    /// <param name="cellWidth">Width of each cell content (default: 10)</param>
    /// <returns>Pipe table markdown string</returns>
    public static string Generate(int rows, int columns, int cellWidth = DefaultCellWidth)
    {
        var sb = new StringBuilder();

        // Header row
        sb.Append('|');
        for (int col = 0; col < columns; col++)
        {
            sb.Append(' ');
            sb.Append($"Header {col + 1}".PadRight(cellWidth));
            sb.Append(" |");
        }
        sb.AppendLine();

        // Separator row (with dashes)
        sb.Append('|');
        for (int col = 0; col < columns; col++)
        {
            sb.Append(new string('-', cellWidth + 2));
            sb.Append('|');
        }
        sb.AppendLine();

        // Data rows
        for (int row = 0; row < rows; row++)
        {
            sb.Append('|');
            for (int col = 0; col < columns; col++)
            {
                sb.Append(' ');
                sb.Append($"R{row + 1}C{col + 1}".PadRight(cellWidth));
                sb.Append(" |");
            }
            sb.AppendLine();
        }

        return sb.ToString();
    }
}


@@ -7,6 +7,7 @@ using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using Markdig;
using Testamina.Markdig.Benchmarks.PipeTable;
namespace Testamina.Markdig.Benchmarks;
@@ -68,7 +69,16 @@ public class Program
        //config.Add(gcDiagnoser);
        //var config = DefaultConfig.Instance;
        BenchmarkRunner.Run<Program>(config);
        // Run specific benchmarks based on command line arguments
        if (args.Length > 0 && args[0] == "--pipetable")
        {
            BenchmarkRunner.Run<PipeTableBenchmark>(config);
        }
        else
        {
            BenchmarkRunner.Run<Program>(config);
        }
        //BenchmarkRunner.Run<TestDictionary>(config);
        //BenchmarkRunner.Run<TestMatchPerf>();
        //BenchmarkRunner.Run<TestStringPerf>();

src/Markdig.Fuzzing/.gitignore

@@ -0,0 +1,4 @@
corpus
libfuzzer-dotnet-windows.exe
crash-*
timeout-*

@@ -0,0 +1,19 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net9.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<IsPackable>false</IsPackable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="SharpFuzz" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\Markdig\Markdig.csproj" />
</ItemGroup>
</Project>

@@ -0,0 +1,71 @@
using Markdig;
using Markdig.Renderers.Roundtrip;
using Markdig.Syntax;
using SharpFuzz;
using System.Diagnostics;
using System.Text;
ReadOnlySpanAction fuzzTarget = ParseRenderFuzzer.FuzzTarget;
if (args.Length > 0)
{
// Run the target on existing inputs
string[] files = Directory.Exists(args[0])
? Directory.GetFiles(args[0])
: [args[0]];
Debugger.Launch();
foreach (string inputFile in files)
{
fuzzTarget(File.ReadAllBytes(inputFile));
}
}
else
{
Fuzzer.LibFuzzer.Run(fuzzTarget);
}
sealed class ParseRenderFuzzer
{
private static readonly MarkdownPipeline s_advancedPipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.Build();
private static readonly ResettableRoundtripRenderer _roundtripRenderer = new();
public static void FuzzTarget(ReadOnlySpan<byte> bytes)
{
string text = Encoding.UTF8.GetString(bytes);
try
{
MarkdownDocument document = Markdown.Parse(text);
_ = document.ToHtml();
document = Markdown.Parse(text, s_advancedPipeline);
_ = document.ToHtml(s_advancedPipeline);
document = Markdown.Parse(text, trackTrivia: true);
_ = document.ToHtml();
_roundtripRenderer.Reset();
_roundtripRenderer.Render(document);
_ = Markdown.Normalize(text);
_ = Markdown.ToPlainText(text);
}
catch (Exception ex) when (IsIgnorableException(ex)) { }
}
private static bool IsIgnorableException(Exception exception)
{
return exception.Message.Contains("Markdown elements in the input are too deeply nested", StringComparison.Ordinal);
}
private sealed class ResettableRoundtripRenderer : RoundtripRenderer
{
public ResettableRoundtripRenderer() : base(new StringWriter(new StringBuilder(1024 * 1024))) { }
public new void Reset() => base.Reset();
}
}

@@ -0,0 +1,86 @@
param (
[string]$configuration = $null
)
Set-StrictMode -Version Latest
$libFuzzer = "libfuzzer-dotnet-windows.exe"
$outputDir = "bin"
function Get-LibFuzzer {
param (
[string]$Path
)
$libFuzzerUrl = "https://github.com/Metalnem/libfuzzer-dotnet/releases/download/v2025.05.02.0904/libfuzzer-dotnet-windows.exe"
$expectedHash = "17af5b3f6ff4d2c57b44b9a35c13051b570eb66f0557d00015df3832709050bf"
Write-Output "Downloading libFuzzer from $libFuzzerUrl..."
try {
$tempFile = "$Path.tmp"
Invoke-WebRequest -Uri $libFuzzerUrl -OutFile $tempFile -UseBasicParsing
$downloadedHash = (Get-FileHash -Path $tempFile -Algorithm SHA256).Hash
if ($downloadedHash -eq $expectedHash) {
Move-Item -Path $tempFile -Destination $Path -Force
Write-Output "libFuzzer downloaded successfully to $Path"
}
else {
Write-Error "Hash validation failed."
Remove-Item -Path $tempFile -Force -ErrorAction SilentlyContinue
exit 1
}
}
catch {
Write-Error "Failed to download libFuzzer: $($_.Exception.Message)"
Remove-Item -Path $tempFile -Force -ErrorAction SilentlyContinue
exit 1
}
}
# Check if libFuzzer exists, download if not
if (-not (Test-Path $libFuzzer)) {
Get-LibFuzzer -Path $libFuzzer
}
$toolListOutput = dotnet tool list --global sharpFuzz.CommandLine 2>$null
if (-not ($toolListOutput -match "sharpfuzz")) {
Write-Output "Installing sharpfuzz CLI"
dotnet tool install --global sharpFuzz.CommandLine
}
if (Test-Path $outputDir) {
Remove-Item -Recurse -Force $outputDir
}
# A [string] parameter turns $null into "", so test for an empty value as well
if ([string]::IsNullOrEmpty($configuration)) {
$configuration = "Debug"
}
dotnet publish -c $configuration -o $outputDir
$project = Join-Path $outputDir "Markdig.Fuzzing.dll"
$fuzzingTarget = Join-Path $outputDir "Markdig.dll"
Write-Output "Instrumenting $fuzzingTarget"
& sharpfuzz $fuzzingTarget
if ($LastExitCode -ne 0) {
Write-Error "An error occurred while instrumenting $fuzzingTarget"
exit 1
}
New-Item -ItemType Directory -Force -Path corpus | Out-Null
$libFuzzerArgs = @("--target_path=dotnet", "--target_arg=$project", "-timeout=10", "corpus")
# Add any additional arguments passed to the script
if ($args) {
$libFuzzerArgs += $args
}
Write-Output "Starting libFuzzer with arguments: $libFuzzerArgs"
& ./$libFuzzer @libFuzzerArgs

@@ -1,7 +1,7 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFrameworks>net6.0;net8.0;net9.0</TargetFrameworks>
<TargetFrameworks>net8.0;net9.0</TargetFrameworks>
<OutputType>Exe</OutputType>
<IsPackable>false</IsPackable>
<ImplicitUsings>enable</ImplicitUsings>
@@ -9,12 +9,13 @@
<StartupObject>Markdig.Tests.Program</StartupObject>
<SpecExecutable>$(MSBuildProjectDirectory)\..\SpecFileGen\bin\$(Configuration)\$(TargetFramework)\SpecFileGen.dll</SpecExecutable>
<SpecTimestamp>$(MSBuildProjectDirectory)\..\SpecFileGen\bin\$(Configuration)\$(TargetFramework)\SpecFileGen.timestamp</SpecTimestamp>
<NoWarn>$(NoWarn);NETSDK1138</NoWarn>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.9.0" />
<PackageReference Include="NUnit" Version="4.1.0" />
<PackageReference Include="NUnit3TestAdapter" Version="4.5.0" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
<PackageReference Include="NUnit" />
<PackageReference Include="NUnit3TestAdapter" />
</ItemGroup>
<ItemGroup>
@@ -35,10 +36,10 @@
<InputSpecFiles Remove="Specs\readme.md" />
<!-- Allow Visual Studio up-to-date check to verify that nothing has changed - https://github.com/dotnet/project-system/blob/main/docs/up-to-date-check.md -->
<UpToDateCheckInput Include="@(InputSpecFiles)" />
<OutputSpecFiles Include="@(InputSpecFiles->'%(RelativeDir)%(Filename).generated.cs')" />
<OutputSpecFiles Include="@(InputSpecFiles-&gt;'%(RelativeDir)%(Filename).generated.cs')" />
</ItemGroup>
<Target Name="GeneratedSpecsFile" BeforeTargets="BeforeCompile;CoreCompile" Inputs="@(ItemSpecExecutable);@(InputSpecFiles)" Outputs="@(ItemSpecExecutable->'%(RelativeDir)%(Filename).timestamp');@(InputSpecFiles->'%(RelativeDir)%(Filename).generated.cs')">
<Target Name="GeneratedSpecsFile" BeforeTargets="BeforeCompile;CoreCompile" Inputs="@(ItemSpecExecutable);@(InputSpecFiles)" Outputs="@(ItemSpecExecutable-&gt;'%(RelativeDir)%(Filename).timestamp');@(InputSpecFiles-&gt;'%(RelativeDir)%(Filename).generated.cs')">
<Message Importance="high" Text="Regenerating Specs Files" />
<Exec Command="dotnet $(SpecExecutable)" />
<WriteLinesToFile File="$(SpecTimestamp)" Lines="$([System.DateTime]::Now)" />

@@ -317,4 +317,96 @@ $$
Assert.That(paragraph.Inline.Span.Start == paragraph.Inline.FirstChild.Span.Start);
Assert.That(paragraph.Inline.Span.End == paragraph.Inline.LastChild.Span.End);
}
[Test]
public void TestGridTableShortLine()
{
var input = @"
+--+
| |
+-";
var expected = @"<table>
<col style=""width:100%"" />
<tbody>
<tr>
<td></td>
</tr>
</tbody>
</table>
";
TestParser.TestSpec(input, expected, new MarkdownPipelineBuilder().UseGridTables().Build());
}
[Test]
public void TestDefinitionListInListItemWithBlankLine()
{
var input = @"
-
term
: definition
";
var expected = @"<ul>
<li>
<dl>
<dt>term</dt>
<dd>definition</dd>
</dl>
</li>
</ul>
";
TestParser.TestSpec(input, expected, new MarkdownPipelineBuilder().UseDefinitionLists().Build());
}
[Test]
public void TestAlertWithinAlertOrNestedBlock()
{
var input = @"
>[!NOTE]
[!NOTE]
The second one is not a note.
>>[!NOTE]
Also not a note.
";
var expected = @"<div class=""markdown-alert markdown-alert-note"">
<p class=""markdown-alert-title""><svg viewBox=""0 0 16 16"" version=""1.1"" width=""16"" height=""16"" aria-hidden=""true""><path d=""M0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8Zm8-6.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13ZM6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75ZM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2Z""></path></svg>Note</p>
<p>[!NOTE]
The second one is not a note.</p>
</div>
<blockquote>
<blockquote>
<p>[!NOTE]
Also not a note.</p>
</blockquote>
</blockquote>
";
TestParser.TestSpec(input, expected, new MarkdownPipelineBuilder().UseAlertBlocks().Build());
}
[Test]
public void TestIssue845ListItemBlankLine()
{
TestParser.TestSpec("-\n\n foo", @"
<ul>
<li></li>
</ul>
<p>foo</p>");
TestParser.TestSpec("-\n-\n\n foo", @"
<ul>
<li></li>
<li></li>
</ul>
<p>foo</p>");
TestParser.TestSpec("-\n\n-\n\n foo", @"
<ul>
<li></li>
<li></li>
</ul>
<p>foo</p>");
}
}

@@ -25,6 +25,7 @@ public class TestUnorderedList
[TestCase("-\ti1")]
[TestCase("-\ti1\n-\ti2")]
[TestCase("-\ti1\n- i2\n-\ti3")]
[TestCase("- 1.\n- 2.")]
public void Test(string value)
{
RoundTrip(value);

@@ -533,5 +533,28 @@ namespace Markdig.Tests.Specs.AutoLinks
TestParser.TestSpec("<http://foö.bar.`baz>`", "<p><a href=\"http://xn--fo-gka.bar.%60baz\">http://foö.bar.`baz</a>`</p>", "autolinks|advanced", context: "Example 25\nSection Extensions / AutoLinks / Unicode support\n");
}
// Unicode punctuation characters are not allowed, but symbols are.
// Note that this does _not_ exactly match CommonMark's "Unicode punctuation character" definition.
[Test]
public void ExtensionsAutoLinksUnicodeSupport_Example026()
{
// Example 26
// Section: Extensions / AutoLinks / Unicode support
//
// The following Markdown:
// http://☃.net?☃ // OtherSymbol
//
// http://🍉.net?🍉 // A UTF-16 surrogate pair, but code point is OtherSymbol
//
// http://‰.net?‰ // OtherPunctuation
//
// Should be rendered as:
// <p><a href="http://xn--n3h.net?%E2%98%83">http://☃.net?☃</a> // OtherSymbol</p>
// <p><a href="http://xn--ji8h.net?%F0%9F%8D%89">http://🍉.net?🍉</a> // A UTF-16 surrogate pair, but code point is OtherSymbol</p>
// <p>http://‰.net?‰ // OtherPunctuation</p>
TestParser.TestSpec("http://☃.net?☃ // OtherSymbol\n\nhttp://🍉.net?🍉 // A UTF-16 surrogate pair, but code point is OtherSymbol\n\nhttp://‰.net?‰ // OtherPunctuation", "<p><a href=\"http://xn--n3h.net?%E2%98%83\">http://☃.net?☃</a> // OtherSymbol</p>\n<p><a href=\"http://xn--ji8h.net?%F0%9F%8D%89\">http://🍉.net?🍉</a> // A UTF-16 surrogate pair, but code point is OtherSymbol</p>\n<p>http://‰.net?‰ // OtherPunctuation</p>", "autolinks|advanced", context: "Example 26\nSection Extensions / AutoLinks / Unicode support\n");
}
}
}

@@ -303,4 +303,19 @@ This will therefore be seen as an autolink and not as code inline.
<http://foö.bar.`baz>`
.
<p><a href="http://xn--fo-gka.bar.%60baz">http://foö.bar.`baz</a>`</p>
````````````````````````````````
Unicode punctuation characters are not allowed, but symbols are.
Note that this does _not_ exactly match CommonMark's "Unicode punctuation character" definition.
```````````````````````````````` example
http://☃.net?☃ // OtherSymbol
http://🍉.net?🍉 // A UTF-16 surrogate pair, but code point is OtherSymbol
http://‰.net?‰ // OtherPunctuation
.
<p><a href="http://xn--n3h.net?%E2%98%83">http://☃.net?☃</a> // OtherSymbol</p>
<p><a href="http://xn--ji8h.net?%F0%9F%8D%89">http://🍉.net?🍉</a> // A UTF-16 surrogate pair, but code point is OtherSymbol</p>
<p>http://‰.net?‰ // OtherPunctuation</p>
````````````````````````````````
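The distinction drawn above (symbols allowed, punctuation not) is made per code point, which is what makes the surrogate-pair case work. A minimal C# sketch, assuming .NET Core 3.0 or later for `System.Text.Rune` and `string.EnumerateRunes()`; `IsPunctuationCategory` is a hypothetical helper, and the sketch only classifies characters rather than reproducing Markdig's autolink scanner:

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper (not a Markdig API): true for the P* general categories.
static bool IsPunctuationCategory(UnicodeCategory category) => category is
    UnicodeCategory.ConnectorPunctuation or UnicodeCategory.DashPunctuation or
    UnicodeCategory.OpenPunctuation or UnicodeCategory.ClosePunctuation or
    UnicodeCategory.InitialQuotePunctuation or UnicodeCategory.FinalQuotePunctuation or
    UnicodeCategory.OtherPunctuation;

// Enumerating Runes keeps a UTF-16 surrogate pair together as one code point.
foreach (string sample in new[] { "☃", "🍉", "‰" })
{
    foreach (Rune rune in sample.EnumerateRunes())
    {
        UnicodeCategory category = Rune.GetUnicodeCategory(rune);
        Console.WriteLine($"U+{rune.Value:X4} {category} punctuation={IsPunctuationCategory(category)}");
    }
}

// Looking at a single UTF-16 code unit only sees half of the pair.
Console.WriteLine(char.GetUnicodeCategory("🍉"[0])); // Surrogate
```

Classifying by `Rune` rather than by `char` is what lets U+1F349 be seen as `OtherSymbol` instead of a bare surrogate.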

File diff suppressed because it is too large

@@ -1,9 +1,9 @@
---
title: CommonMark Spec
author: John MacFarlane
version: '0.30'
date: '2021-06-19'
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
version: '0.31.2'
date: '2024-01-28'
license: '[CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)'
...
# Introduction
@@ -14,7 +14,7 @@ Markdown is a plain text format for writing structured documents,
based on conventions for indicating formatting in email
and usenet posts. It was developed by John Gruber (with
help from Aaron Swartz) and released in 2004 in the form of a
[syntax description](http://daringfireball.net/projects/markdown/syntax)
[syntax description](https://daringfireball.net/projects/markdown/syntax)
and a Perl script (`Markdown.pl`) for converting Markdown to
HTML. In the next decade, dozens of implementations were
developed in many languages. Some extended the original
@@ -34,10 +34,10 @@ As Gruber writes:
> Markdown-formatted document should be publishable as-is, as
> plain text, without looking like it's been marked up with tags
> or formatting instructions.
> (<http://daringfireball.net/projects/markdown/>)
> (<https://daringfireball.net/projects/markdown/>)
The point can be illustrated by comparing a sample of
[AsciiDoc](http://www.methods.co.nz/asciidoc/) with
[AsciiDoc](https://asciidoc.org/) with
an equivalent sample of Markdown. Here is a sample of
AsciiDoc from the AsciiDoc manual:
@@ -103,7 +103,7 @@ source, not just in the processed document.
## Why is a spec needed?
John Gruber's [canonical description of Markdown's
syntax](http://daringfireball.net/projects/markdown/syntax)
syntax](https://daringfireball.net/projects/markdown/syntax)
does not specify the syntax unambiguously. Here are some examples of
questions it does not answer:
@@ -316,9 +316,9 @@ A line containing no characters, or a line containing only spaces
The following definitions of character classes will be used in this spec:
A [Unicode whitespace character](@) is
any code point in the Unicode `Zs` general category, or a tab (`U+0009`),
line feed (`U+000A`), form feed (`U+000C`), or carriage return (`U+000D`).
A [Unicode whitespace character](@) is a character in the Unicode `Zs` general
category, or a tab (`U+0009`), line feed (`U+000A`), form feed (`U+000C`), or
carriage return (`U+000D`).
[Unicode whitespace](@) is a sequence of one or more
[Unicode whitespace characters].
@@ -337,9 +337,8 @@ is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
`[`, `\`, `]`, `^`, `_`, `` ` `` (U+005B0060),
`{`, `|`, `}`, or `~` (U+007B007E).
A [Unicode punctuation character](@) is an [ASCII
punctuation character] or anything in
the general Unicode categories `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
A [Unicode punctuation character](@) is a character in the Unicode `P`
(punctuation) or `S` (symbol) general categories.
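A minimal C# sketch of the two character classes as the 0.31.2 definitions above state them. The helper names are hypothetical (not Markdig APIs), and the sketch assumes .NET 5 or later (C# 9 patterns and `CharUnicodeInfo.GetUnicodeCategory(int)`):

```csharp
using System;
using System.Globalization;

// Hypothetical helpers following the 0.31.2 wording. They take a code point rather
// than a char so that characters outside the BMP are classified correctly.

// Unicode whitespace: general category Zs, or tab, line feed, form feed, carriage return.
static bool IsUnicodeWhitespace(int codePoint) =>
    codePoint is 0x09 or 0x0A or 0x0C or 0x0D
    || CharUnicodeInfo.GetUnicodeCategory(codePoint) == UnicodeCategory.SpaceSeparator;

// Unicode punctuation: any character in a P* (punctuation) or S* (symbol) category.
// Every ASCII punctuation character is in one of these categories, so the old
// "ASCII punctuation" clause is subsumed.
static bool IsUnicodePunctuation(int codePoint)
{
    switch (CharUnicodeInfo.GetUnicodeCategory(codePoint))
    {
        case UnicodeCategory.ConnectorPunctuation:    // Pc
        case UnicodeCategory.DashPunctuation:         // Pd
        case UnicodeCategory.OpenPunctuation:         // Ps
        case UnicodeCategory.ClosePunctuation:        // Pe
        case UnicodeCategory.InitialQuotePunctuation: // Pi
        case UnicodeCategory.FinalQuotePunctuation:   // Pf
        case UnicodeCategory.OtherPunctuation:        // Po
        case UnicodeCategory.MathSymbol:              // Sm
        case UnicodeCategory.CurrencySymbol:          // Sc
        case UnicodeCategory.ModifierSymbol:          // Sk
        case UnicodeCategory.OtherSymbol:             // So
            return true;
        default:
            return false;
    }
}

// '€' is a currency symbol (Sc): punctuation under 0.31.2, but not under 0.30.
Console.WriteLine(IsUnicodePunctuation('€'));
// Code points outside the BMP are converted from their surrogate pair first.
Console.WriteLine(IsUnicodePunctuation(char.ConvertToUtf32("🍉", 0)));
```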
## Tabs
@@ -579,9 +578,9 @@ raw HTML:
```````````````````````````````` example
<http://example.com?find=\*>
<https://example.com?find=\*>
.
<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p>
<p><a href="https://example.com?find=%5C*">https://example.com?find=\*</a></p>
````````````````````````````````
@@ -1964,7 +1963,7 @@ has been found, the code block contains all of the lines after the
opening code fence until the end of the containing block (or
document). (An alternative spec would require backtracking in the
event that a closing code fence is not found. But this makes parsing
much less efficient, and there seems to be no real down side to the
much less efficient, and there seems to be no real downside to the
behavior described here.)
A fenced code block may interrupt a paragraph, and does not require
@@ -2403,7 +2402,7 @@ followed by one of the strings (case-insensitive) `address`,
`h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `head`, `header`, `hr`,
`html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`,
`nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`,
`section`, `source`, `summary`, `table`, `tbody`, `td`,
`search`, `section`, `summary`, `table`, `tbody`, `td`,
`tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed
by a space, a tab, the end of the line, the string `>`, or
the string `/>`.\
@@ -4115,7 +4114,7 @@ The following rules define [list items]:
blocks *Bs* starting with a character other than a space or tab, and *M* is
a list marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces of indentation,
then the result of prepending *M* and the following spaces to the first line
of Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
of *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
list item with *Bs* as its contents. The type of the list item
(bullet or ordered) is determined by the type of its list marker.
If the list item is ordered, then it is also assigned a start
@@ -5350,11 +5349,11 @@ by itself should be a paragraph followed by a nested sublist.
Since it is well established Markdown practice to allow lists to
interrupt paragraphs inside list items, the [principle of
uniformity] requires us to allow this outside list items as
well. ([reStructuredText](http://docutils.sourceforge.net/rst.html)
well. ([reStructuredText](https://docutils.sourceforge.net/rst.html)
takes a different approach, requiring blank lines before lists
even inside other list items.)
In order to solve of unwanted lists in paragraphs with
In order to solve the problem of unwanted lists in paragraphs with
hard-wrapped numerals, we allow only lists starting with `1` to
interrupt paragraphs. Thus,
@@ -6055,18 +6054,18 @@ But this is an HTML tag:
And this is code:
```````````````````````````````` example
`<http://foo.bar.`baz>`
`<https://foo.bar.`baz>`
.
<p><code>&lt;http://foo.bar.</code>baz&gt;`</p>
<p><code>&lt;https://foo.bar.</code>baz&gt;`</p>
````````````````````````````````
But this is an autolink:
```````````````````````````````` example
<http://foo.bar.`baz>`
<https://foo.bar.`baz>`
.
<p><a href="http://foo.bar.%60baz">http://foo.bar.`baz</a>`</p>
<p><a href="https://foo.bar.%60baz">https://foo.bar.`baz</a>`</p>
````````````````````````````````
@@ -6099,7 +6098,7 @@ closing backtick strings to be equal in length:
## Emphasis and strong emphasis
John Gruber's original [Markdown syntax
description](http://daringfireball.net/projects/markdown/syntax#em) says:
description](https://daringfireball.net/projects/markdown/syntax#em) says:
> Markdown treats asterisks (`*`) and underscores (`_`) as indicators of
> emphasis. Text wrapped with one `*` or `_` will be wrapped with an HTML
@@ -6201,7 +6200,7 @@ Here are some examples of delimiter runs.
(The idea of distinguishing left-flanking and right-flanking
delimiter runs based on the character before and the character
after comes from Roopesh Chander's
[vfmd](http://www.vfmd.org/vfmd-spec/specification/#procedure-for-identifying-emphasis-tags).
[vfmd](https://web.archive.org/web/20220608143320/http://www.vfmd.org/vfmd-spec/specification/#procedure-for-identifying-emphasis-tags).
vfmd uses the terminology "emphasis indicator string" instead of "delimiter
run," and its rules for distinguishing left- and right-flanking runs
are a bit more complex than the ones given here.)
@@ -6343,6 +6342,21 @@ Unicode nonbreaking spaces count as whitespace, too:
````````````````````````````````
Unicode symbols count as punctuation, too:
```````````````````````````````` example
*$*alpha.
*£*bravo.
*€*charlie.
.
<p>*$*alpha.</p>
<p>*£*bravo.</p>
<p>*€*charlie.</p>
````````````````````````````````
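Why `*£*bravo.` stays literal can be traced through the left- and right-flanking tests. The C# sketch below is illustrative only: the helper names are hypothetical, it treats each `char` as one code point (BMP only), and the `symbolsArePunctuation` flag switches between the 0.30 punctuation class (ASCII punctuation plus `P*`) and the 0.31.2 class (`P*` plus `S*`):

```csharp
using System;
using System.Globalization;

// Hypothetical helpers, not Markdig APIs. BMP-only simplification.
static bool IsWhitespace(char c) =>
    c is '\t' or '\n' or '\f' or '\r'
    || CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

// symbolsArePunctuation = false: 0.30 rule (ASCII punctuation or P*).
// symbolsArePunctuation = true: 0.31.2 rule (P* or S*).
static bool IsPunctuation(char c, bool symbolsArePunctuation)
{
    const string AsciiPunctuation = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
    if (AsciiPunctuation.IndexOf(c) >= 0)
        return true;
    switch (CharUnicodeInfo.GetUnicodeCategory(c))
    {
        case UnicodeCategory.ConnectorPunctuation:
        case UnicodeCategory.DashPunctuation:
        case UnicodeCategory.OpenPunctuation:
        case UnicodeCategory.ClosePunctuation:
        case UnicodeCategory.InitialQuotePunctuation:
        case UnicodeCategory.FinalQuotePunctuation:
        case UnicodeCategory.OtherPunctuation:
            return true;
        case UnicodeCategory.MathSymbol:
        case UnicodeCategory.CurrencySymbol:
        case UnicodeCategory.ModifierSymbol:
        case UnicodeCategory.OtherSymbol:
            return symbolsArePunctuation;
        default:
            return false;
    }
}

// Flanking test for the delimiter run s[start..start + length).
// The start and end of the line count as whitespace.
static (bool Left, bool Right) Flanking(string s, int start, int length, bool symbolsArePunctuation)
{
    char before = start > 0 ? s[start - 1] : '\n';
    char after = start + length < s.Length ? s[start + length] : '\n';
    bool wsBefore = IsWhitespace(before), wsAfter = IsWhitespace(after);
    bool punctBefore = IsPunctuation(before, symbolsArePunctuation);
    bool punctAfter = IsPunctuation(after, symbolsArePunctuation);
    bool left = !wsAfter && (!punctAfter || wsBefore || punctBefore);
    bool right = !wsBefore && (!punctBefore || wsAfter || punctAfter);
    return (left, right);
}

// A single '*' can close emphasis only if it is right-flanking.
foreach (bool newRules in new[] { false, true })
{
    var (_, canClose) = Flanking("*£*bravo.", 2, 1, newRules);
    Console.WriteLine($"symbols as punctuation: {newRules}, second '*' can close: {canClose}");
}
```

With the flag off, the second `*` is right-flanking and could close emphasis; with it on, `£` counts as punctuation, so that `*` is no longer right-flanking and nothing closes the opener.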
Intraword emphasis with `*` is permitted:
```````````````````````````````` example
@@ -7428,16 +7442,16 @@ _a `_`_
```````````````````````````````` example
**a<http://foo.bar/?q=**>
**a<https://foo.bar/?q=**>
.
<p>**a<a href="http://foo.bar/?q=**">http://foo.bar/?q=**</a></p>
<p>**a<a href="https://foo.bar/?q=**">https://foo.bar/?q=**</a></p>
````````````````````````````````
```````````````````````````````` example
__a<http://foo.bar/?q=__>
__a<https://foo.bar/?q=__>
.
<p>__a<a href="http://foo.bar/?q=__">http://foo.bar/?q=__</a></p>
<p>__a<a href="https://foo.bar/?q=__">https://foo.bar/?q=__</a></p>
````````````````````````````````
@@ -7685,13 +7699,13 @@ A link can contain fragment identifiers and queries:
```````````````````````````````` example
[link](#fragment)
[link](http://example.com#fragment)
[link](https://example.com#fragment)
[link](http://example.com?foo=3#frag)
[link](https://example.com?foo=3#frag)
.
<p><a href="#fragment">link</a></p>
<p><a href="http://example.com#fragment">link</a></p>
<p><a href="http://example.com?foo=3#frag">link</a></p>
<p><a href="https://example.com#fragment">link</a></p>
<p><a href="https://example.com?foo=3#frag">link</a></p>
````````````````````````````````
@@ -7935,9 +7949,9 @@ and autolinks over link grouping:
```````````````````````````````` example
[foo<http://example.com/?search=](uri)>
[foo<https://example.com/?search=](uri)>
.
<p>[foo<a href="http://example.com/?search=%5D(uri)">http://example.com/?search=](uri)</a></p>
<p>[foo<a href="https://example.com/?search=%5D(uri)">https://example.com/?search=](uri)</a></p>
````````````````````````````````
@@ -8091,11 +8105,11 @@ and autolinks over link grouping:
```````````````````````````````` example
[foo<http://example.com/?search=][ref]>
[foo<https://example.com/?search=][ref]>
[ref]: /uri
.
<p>[foo<a href="http://example.com/?search=%5D%5Bref%5D">http://example.com/?search=][ref]</a></p>
<p>[foo<a href="https://example.com/?search=%5D%5Bref%5D">https://example.com/?search=][ref]</a></p>
````````````````````````````````
@@ -8295,7 +8309,7 @@ A [collapsed reference link](@)
consists of a [link label] that [matches] a
[link reference definition] elsewhere in the
document, followed by the string `[]`.
The contents of the first link label are parsed as inlines,
The contents of the link label are parsed as inlines,
which are used as the link's text. The link's URI and title are
provided by the matching reference link definition. Thus,
`[foo][]` is equivalent to `[foo][foo]`.
@@ -8348,7 +8362,7 @@ A [shortcut reference link](@)
consists of a [link label] that [matches] a
[link reference definition] elsewhere in the
document and is not followed by `[]` or a link label.
The contents of the first link label are parsed as inlines,
The contents of the link label are parsed as inlines,
which are used as the link's text. The link's URI and title
are provided by the matching link reference definition.
Thus, `[foo]` is equivalent to `[foo][]`.
@@ -8435,7 +8449,7 @@ following closing bracket:
````````````````````````````````
Full and compact references take precedence over shortcut
Full and collapsed references take precedence over shortcut
references:
```````````````````````````````` example
@@ -8771,9 +8785,9 @@ Here are some valid autolinks:
```````````````````````````````` example
<http://foo.bar.baz/test?q=hello&id=22&boolean>
<https://foo.bar.baz/test?q=hello&id=22&boolean>
.
<p><a href="http://foo.bar.baz/test?q=hello&amp;id=22&amp;boolean">http://foo.bar.baz/test?q=hello&amp;id=22&amp;boolean</a></p>
<p><a href="https://foo.bar.baz/test?q=hello&amp;id=22&amp;boolean">https://foo.bar.baz/test?q=hello&amp;id=22&amp;boolean</a></p>
````````````````````````````````
@@ -8813,9 +8827,9 @@ with their syntax:
```````````````````````````````` example
<http://../>
<https://../>
.
<p><a href="http://../">http://../</a></p>
<p><a href="https://../">https://../</a></p>
````````````````````````````````
@@ -8829,18 +8843,18 @@ with their syntax:
Spaces are not allowed in autolinks:
```````````````````````````````` example
<http://foo.bar/baz bim>
<https://foo.bar/baz bim>
.
<p>&lt;http://foo.bar/baz bim&gt;</p>
<p>&lt;https://foo.bar/baz bim&gt;</p>
````````````````````````````````
Backslash-escapes do not work inside autolinks:
```````````````````````````````` example
<http://example.com/\[\>
<https://example.com/\[\>
.
<p><a href="http://example.com/%5C%5B%5C">http://example.com/\[\</a></p>
<p><a href="https://example.com/%5C%5B%5C">https://example.com/\[\</a></p>
````````````````````````````````
@@ -8892,9 +8906,9 @@ These are not autolinks:
```````````````````````````````` example
< http://foo.bar >
< https://foo.bar >
.
<p>&lt; http://foo.bar &gt;</p>
<p>&lt; https://foo.bar &gt;</p>
````````````````````````````````
@@ -8913,9 +8927,9 @@ These are not autolinks:
```````````````````````````````` example
http://example.com
https://example.com
.
<p>http://example.com</p>
<p>https://example.com</p>
````````````````````````````````
@@ -8977,10 +8991,9 @@ A [closing tag](@) consists of the string `</`, a
[tag name], optional spaces, tabs, and up to one line ending, and the character
`>`.
An [HTML comment](@) consists of `<!--` + *text* + `-->`,
where *text* does not start with `>` or `->`, does not end with `-`,
and does not contain `--`. (See the
[HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).)
An [HTML comment](@) consists of `<!-->`, `<!--->`, or `<!--`, a string of
characters not including the string `-->`, and `-->` (see the
[HTML spec](https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state)).
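The revised comment rule can be sketched as a small C# matcher. `MatchHtmlComment` is a hypothetical helper, not Markdig's implementation; it only measures a comment that starts at a given index and ignores the surrounding inline or block context:

```csharp
using System;

// Hypothetical helper: returns the length of the HTML comment starting at `start`
// under the 0.31.2 rule, or -1 if no comment starts there.
static int MatchHtmlComment(string s, int start)
{
    ReadOnlySpan<char> rest = s.AsSpan(start);
    // The two short forms are complete comments by themselves.
    if (rest.StartsWith("<!-->", StringComparison.Ordinal))
        return 5;
    if (rest.StartsWith("<!--->", StringComparison.Ordinal))
        return 6;
    if (!rest.StartsWith("<!--", StringComparison.Ordinal))
        return -1;
    // Otherwise the comment ends at the first "-->" after the opening "<!--".
    int close = rest.Slice(4).IndexOf("-->", StringComparison.Ordinal);
    return close < 0 ? -1 : 4 + close + 3;
}

Console.WriteLine(MatchHtmlComment("<!---> foo -->", 0));
```

Checking the two short forms before the general `<!--` ... `-->` form is what makes `<!---> foo -->` end right after `<!--->`.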
A [processing instruction](@)
consists of the string `<?`, a string
@@ -9119,30 +9132,20 @@ Illegal attributes in closing tag:
Comments:
```````````````````````````````` example
foo <!-- this is a
comment - with hyphen -->
foo <!-- this is a --
comment - with hyphens -->
.
<p>foo <!-- this is a
comment - with hyphen --></p>
<p>foo <!-- this is a --
comment - with hyphens --></p>
````````````````````````````````
```````````````````````````````` example
foo <!-- not a comment -- two hyphens -->
.
<p>foo &lt;!-- not a comment -- two hyphens --&gt;</p>
````````````````````````````````
Not comments:
```````````````````````````````` example
foo <!--> foo -->
foo <!-- foo--->
foo <!---> foo -->
.
<p>foo &lt;!--&gt; foo --&gt;</p>
<p>foo &lt;!-- foo---&gt;</p>
<p>foo <!--> foo --&gt;</p>
<p>foo <!---> foo --&gt;</p>
````````````````````````````````
@@ -9671,7 +9674,7 @@ through the stack for an opening `[` or `![` delimiter.
delimiter from the stack, and return a literal text node `]`.
- If we find one and it's active, then we parse ahead to see if
we have an inline link/image, reference link/image, compact reference
we have an inline link/image, reference link/image, collapsed reference
link/image, or shortcut reference link/image.
+ If we don't, then we remove the opening delimiter from the

@@ -123,6 +123,8 @@ namespace Markdig.Tests.Specs.EmphasisExtra
public class TestExtensionsEmphasisOnHtmlEntities
{
// ## Emphasis on Html Entities
//
// Note that Unicode symbols are treated as punctuation, so a delimiter immediately followed by one (such as `^&reg;`) cannot open emphasis unless the delimiter itself is preceded by a space.
[Test]
public void ExtensionsEmphasisOnHtmlEntities_Example006()
{
@@ -132,14 +134,14 @@ namespace Markdig.Tests.Specs.EmphasisExtra
// The following Markdown:
// This is text MyBrand ^&reg;^ and MyTrademark ^&trade;^
// This is text MyBrand^&reg;^ and MyTrademark^&trade;^
// This is text MyBrand~&reg;~ and MyCopyright^&copy;^
// This is text MyBrand ~&reg;~ and MyCopyright ^&copy;^
//
// Should be rendered as:
// <p>This is text MyBrand <sup>®</sup> and MyTrademark <sup>TM</sup>
// This is text MyBrand<sup>®</sup> and MyTrademark<sup>TM</sup>
// This is text MyBrand<sub>®</sub> and MyCopyright<sup>©</sup></p>
// This is text MyBrand^®^ and MyTrademark^TM^
// This is text MyBrand <sub>®</sub> and MyCopyright <sup>©</sup></p>
TestParser.TestSpec("This is text MyBrand ^&reg;^ and MyTrademark ^&trade;^\nThis is text MyBrand^&reg;^ and MyTrademark^&trade;^\nThis is text MyBrand~&reg;~ and MyCopyright^&copy;^", "<p>This is text MyBrand <sup>®</sup> and MyTrademark <sup>TM</sup>\nThis is text MyBrand<sup>®</sup> and MyTrademark<sup>TM</sup>\nThis is text MyBrand<sub>®</sub> and MyCopyright<sup>©</sup></p>", "emphasisextras|advanced", context: "Example 6\nSection Extensions / Emphasis on Html Entities\n");
TestParser.TestSpec("This is text MyBrand ^&reg;^ and MyTrademark ^&trade;^\nThis is text MyBrand^&reg;^ and MyTrademark^&trade;^\nThis is text MyBrand ~&reg;~ and MyCopyright ^&copy;^", "<p>This is text MyBrand <sup>®</sup> and MyTrademark <sup>TM</sup>\nThis is text MyBrand^®^ and MyTrademark^TM^\nThis is text MyBrand <sub>®</sub> and MyCopyright <sup>©</sup></p>", "emphasisextras|advanced", context: "Example 6\nSection Extensions / Emphasis on Html Entities\n");
}
}
}

@@ -52,16 +52,17 @@ Marked text can be used to specify that a text has been marked in a document. T
.
<p><mark>Marked text</mark></p>
````````````````````````````````
## Emphasis on Html Entities
Note that Unicode symbols are treated as punctuation, so a delimiter immediately followed by one (such as `^&reg;`) cannot open emphasis unless the delimiter itself is preceded by a space.
```````````````````````````````` example
This is text MyBrand ^&reg;^ and MyTrademark ^&trade;^
This is text MyBrand^&reg;^ and MyTrademark^&trade;^
This is text MyBrand~&reg;~ and MyCopyright^&copy;^
This is text MyBrand ~&reg;~ and MyCopyright ^&copy;^
.
<p>This is text MyBrand <sup>®</sup> and MyTrademark <sup>TM</sup>
This is text MyBrand<sup>®</sup> and MyTrademark<sup>TM</sup>
This is text MyBrand<sub>®</sub> and MyCopyright<sup>©</sup></p>
This is text MyBrand^®^ and MyTrademark^TM^
This is text MyBrand <sub>®</sub> and MyCopyright <sup>©</sup></p>
````````````````````````````````

@@ -98,5 +98,22 @@ namespace Markdig.Tests.Specs.GenericAttributes
TestParser.TestSpec("[Foo](url){data-x=1}\n\n[Foo](url){data-x='1'}\n\n[Foo](url){data-x=11}", "<p><a href=\"url\" data-x=\"1\">Foo</a></p>\n<p><a href=\"url\" data-x=\"1\">Foo</a></p>\n<p><a href=\"url\" data-x=\"11\">Foo</a></p>", "attributes|advanced", context: "Example 3\nSection Extensions / Generic Attributes\n");
}
// Attributes that occur immediately before a block element, on a line by themselves, affect that element
[Test]
public void ExtensionsGenericAttributes_Example004()
{
// Example 4
// Section: Extensions / Generic Attributes
//
// The following Markdown:
// {.center}
// A paragraph
//
// Should be rendered as:
// <p class="center">A paragraph</p>
TestParser.TestSpec("{.center}\nA paragraph", "<p class=\"center\">A paragraph</p>", "attributes|advanced", context: "Example 4\nSection Extensions / Generic Attributes\n");
}
}
}

@@ -61,3 +61,12 @@ Attribute values can be one character long
<p><a href="url" data-x="1">Foo</a></p>
<p><a href="url" data-x="11">Foo</a></p>
````````````````````````````````
Attributes that occur immediately before a block element, on a line by themselves, affect that element
```````````````````````````````` example
{.center}
A paragraph
.
<p class="center">A paragraph</p>
````````````````````````````````

@@ -386,5 +386,34 @@ namespace Markdig.Tests.Specs.GridTables
TestParser.TestSpec("+", "<ul>\n<li></li>\n</ul>", "gridtables|advanced", context: "Example 11\nSection Extensions / Grid Table\n");
}
// A table may begin right after a paragraph without an empty line in between:
[Test]
public void ExtensionsGridTable_Example012()
{
// Example 12
// Section: Extensions / Grid Table
//
// The following Markdown:
// Some
// **text**.
// +---+
// | A |
// +---+
//
// Should be rendered as:
// <p>Some
// <strong>text</strong>.</p>
// <table>
// <col style="width:100%" />
// <tbody>
// <tr>
// <td>A</td>
// </tr>
// </tbody>
// </table>
TestParser.TestSpec("Some\n**text**.\n+---+\n| A |\n+---+", "<p>Some\n<strong>text</strong>.</p>\n<table>\n<col style=\"width:100%\" />\n<tbody>\n<tr>\n<td>A</td>\n</tr>\n</tbody>\n</table>", "gridtables|advanced", context: "Example 12\nSection Extensions / Grid Table\n");
}
}
}

@@ -285,3 +285,24 @@ An empty `+` on a line should result in a simple empty list output:
<li></li>
</ul>
````````````````````````````````
A table may begin right after a paragraph without an empty line in between:
```````````````````````````````` example
Some
**text**.
+---+
| A |
+---+
.
<p>Some
<strong>text</strong>.</p>
<table>
<col style="width:100%" />
<tbody>
<tr>
<td>A</td>
</tr>
</tbody>
</table>
````````````````````````````````

@@ -17,7 +17,7 @@ namespace Markdig.Tests.Specs.Math
//
// ## Math Inline
//
// Allows to define a mathematic block embraced by `$...$`
// Allows defining inline math enclosed in `$...$`
[Test]
public void ExtensionsMathInline_Example001()
{
@@ -25,12 +25,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is a $math block$
// This is a $math inline$
//
// Should be rendered as:
// <p>This is a <span class="math">\(math block\)</span></p>
// <p>This is a <span class="math">\(math inline\)</span></p>
TestParser.TestSpec("This is a $math block$", "<p>This is a <span class=\"math\">\\(math block\\)</span></p>", "mathematics|advanced", context: "Example 1\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is a $math inline$", "<p>This is a <span class=\"math\">\\(math inline\\)</span></p>", "mathematics|advanced", context: "Example 1\nSection Extensions / Math Inline\n");
}
// Or by `$$...$$` embracing it by:
@@ -41,12 +41,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is a $$math block$$
// This is a $$math inline$$
//
// Should be rendered as:
// <p>This is a <span class="math">\(math block\)</span></p>
// <p>This is a <span class="math">\(math inline\)</span></p>
TestParser.TestSpec("This is a $$math block$$", "<p>This is a <span class=\"math\">\\(math block\\)</span></p>", "mathematics|advanced", context: "Example 2\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is a $$math inline$$", "<p>This is a <span class=\"math\">\\(math inline\\)</span></p>", "mathematics|advanced", context: "Example 2\nSection Extensions / Math Inline\n");
}
// Newlines inside an inline math are not allowed:
@@ -58,13 +58,13 @@ namespace Markdig.Tests.Specs.Math
//
// The following Markdown:
// This is not a $$math
// block$$ and? this is a $$math block$$
// inline$$ and? this is a $$math inline$$
//
// Should be rendered as:
// <p>This is not a $$math
// block$$ and? this is a <span class="math">\(math block\)</span></p>
// inline$$ and? this is a <span class="math">\(math inline\)</span></p>
TestParser.TestSpec("This is not a $$math \nblock$$ and? this is a $$math block$$", "<p>This is not a $$math\nblock$$ and? this is a <span class=\"math\">\\(math block\\)</span></p>", "mathematics|advanced", context: "Example 3\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is not a $$math \ninline$$ and? this is a $$math inline$$", "<p>This is not a $$math\ninline$$ and? this is a <span class=\"math\">\\(math inline\\)</span></p>", "mathematics|advanced", context: "Example 3\nSection Extensions / Math Inline\n");
}
[Test]
@@ -75,13 +75,13 @@ namespace Markdig.Tests.Specs.Math
//
// The following Markdown:
// This is not a $math
// block$ and? this is a $math block$
// inline$ and? this is a $math inline$
//
// Should be rendered as:
// <p>This is not a $math
// block$ and? this is a <span class="math">\(math block\)</span></p>
// inline$ and? this is a <span class="math">\(math inline\)</span></p>
TestParser.TestSpec("This is not a $math \nblock$ and? this is a $math block$", "<p>This is not a $math\nblock$ and? this is a <span class=\"math\">\\(math block\\)</span></p>", "mathematics|advanced", context: "Example 4\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is not a $math \ninline$ and? this is a $math inline$", "<p>This is not a $math\ninline$ and? this is a <span class=\"math\">\\(math inline\\)</span></p>", "mathematics|advanced", context: "Example 4\nSection Extensions / Math Inline\n");
}
// An opening `$` can be followed by a space if the closing is also preceded by a space `$`:
@@ -92,12 +92,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is a $ math block $
// This is a $ math inline $
//
// Should be rendered as:
// <p>This is a <span class="math">\(math block\)</span></p>
// <p>This is a <span class="math">\(math inline\)</span></p>
TestParser.TestSpec("This is a $ math block $", "<p>This is a <span class=\"math\">\\(math block\\)</span></p>", "mathematics|advanced", context: "Example 5\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is a $ math inline $", "<p>This is a <span class=\"math\">\\(math inline\\)</span></p>", "mathematics|advanced", context: "Example 5\nSection Extensions / Math Inline\n");
}
[Test]
@@ -107,12 +107,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is a $ math block $ after
// This is a $ math inline $ after
//
// Should be rendered as:
// <p>This is a <span class="math">\(math block\)</span> after</p>
// <p>This is a <span class="math">\(math inline\)</span> after</p>
TestParser.TestSpec("This is a $ math block $ after", "<p>This is a <span class=\"math\">\\(math block\\)</span> after</p>", "mathematics|advanced", context: "Example 6\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is a $ math inline $ after", "<p>This is a <span class=\"math\">\\(math inline\\)</span> after</p>", "mathematics|advanced", context: "Example 6\nSection Extensions / Math Inline\n");
}
[Test]
@@ -122,12 +122,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is a $$ math block $$ after
// This is a $$ math inline $$ after
//
// Should be rendered as:
// <p>This is a <span class="math">\(math block\)</span> after</p>
// <p>This is a <span class="math">\(math inline\)</span> after</p>
TestParser.TestSpec("This is a $$ math block $$ after", "<p>This is a <span class=\"math\">\\(math block\\)</span> after</p>", "mathematics|advanced", context: "Example 7\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is a $$ math inline $$ after", "<p>This is a <span class=\"math\">\\(math inline\\)</span> after</p>", "mathematics|advanced", context: "Example 7\nSection Extensions / Math Inline\n");
}
[Test]
@@ -137,12 +137,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is a not $ math block$ because there is not a whitespace before the closing
// This is a not $ math inline$ because there is not a whitespace before the closing
//
// Should be rendered as:
// <p>This is a not $ math block$ because there is not a whitespace before the closing</p>
// <p>This is a not $ math inline$ because there is not a whitespace before the closing</p>
TestParser.TestSpec("This is a not $ math block$ because there is not a whitespace before the closing", "<p>This is a not $ math block$ because there is not a whitespace before the closing</p>", "mathematics|advanced", context: "Example 8\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is a not $ math inline$ because there is not a whitespace before the closing", "<p>This is a not $ math inline$ because there is not a whitespace before the closing</p>", "mathematics|advanced", context: "Example 8\nSection Extensions / Math Inline\n");
}
// For the opening `$` it requires a space or a punctuation before (but cannot be used within a word):
@@ -153,12 +153,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is not a m$ath block$
// This is not a m$ath inline$
//
// Should be rendered as:
// <p>This is not a m$ath block$</p>
// <p>This is not a m$ath inline$</p>
TestParser.TestSpec("This is not a m$ath block$", "<p>This is not a m$ath block$</p>", "mathematics|advanced", context: "Example 9\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is not a m$ath inline$", "<p>This is not a m$ath inline$</p>", "mathematics|advanced", context: "Example 9\nSection Extensions / Math Inline\n");
}
// For the closing `$` it requires a space after or a punctuation (but cannot be preceded by a space and cannot be used within a word):
@@ -169,12 +169,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is not a $math bloc$k
// This is not a $math inlin$e
//
// Should be rendered as:
// <p>This is not a $math bloc$k</p>
// <p>This is not a $math inlin$e</p>
TestParser.TestSpec("This is not a $math bloc$k", "<p>This is not a $math bloc$k</p>", "mathematics|advanced", context: "Example 10\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is not a $math inlin$e", "<p>This is not a $math inlin$e</p>", "mathematics|advanced", context: "Example 10\nSection Extensions / Math Inline\n");
}
// For the closing `$` it requires a space after or a punctuation (but cannot be preceded by a space and cannot be used within a word):
@@ -201,12 +201,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is a $math \$ block$
// This is a $math \$ inline$
//
// Should be rendered as:
// <p>This is a <span class="math">\(math \$ block\)</span></p>
// <p>This is a <span class="math">\(math \$ inline\)</span></p>
TestParser.TestSpec("This is a $math \\$ block$", "<p>This is a <span class=\"math\">\\(math \\$ block\\)</span></p>", "mathematics|advanced", context: "Example 12\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is a $math \\$ inline$", "<p>This is a <span class=\"math\">\\(math \\$ inline\\)</span></p>", "mathematics|advanced", context: "Example 12\nSection Extensions / Math Inline\n");
}
// At most, two `$` will be matched for the opening and closing:
@@ -217,12 +217,12 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is a $$$math block$$$
// This is a $$$math inline$$$
//
// Should be rendered as:
// <p>This is a <span class="math">\($math block$\)</span></p>
// <p>This is a <span class="math">\($math inline$\)</span></p>
TestParser.TestSpec("This is a $$$math block$$$", "<p>This is a <span class=\"math\">\\($math block$\\)</span></p>", "mathematics|advanced", context: "Example 13\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is a $$$math inline$$$", "<p>This is a <span class=\"math\">\\($math inline$\\)</span></p>", "mathematics|advanced", context: "Example 13\nSection Extensions / Math Inline\n");
}
// Regular text can come both before and after the math inline
@@ -233,15 +233,15 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is a $math block$ with text on both sides.
// This is a $math inline$ with text on both sides.
//
// Should be rendered as:
// <p>This is a <span class="math">\(math block\)</span> with text on both sides.</p>
// <p>This is a <span class="math">\(math inline\)</span> with text on both sides.</p>
TestParser.TestSpec("This is a $math block$ with text on both sides.", "<p>This is a <span class=\"math\">\\(math block\\)</span> with text on both sides.</p>", "mathematics|advanced", context: "Example 14\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is a $math inline$ with text on both sides.", "<p>This is a <span class=\"math\">\\(math inline\\)</span> with text on both sides.</p>", "mathematics|advanced", context: "Example 14\nSection Extensions / Math Inline\n");
}
// A mathematic block takes precedence over standard emphasis `*` `_`:
// A mathematic inline block takes precedence over standard emphasis `*` `_`:
[Test]
public void ExtensionsMathInline_Example015()
{
@@ -249,15 +249,15 @@ namespace Markdig.Tests.Specs.Math
// Section: Extensions / Math Inline
//
// The following Markdown:
// This is *a $math* block$
// This is *a $math* inline$
//
// Should be rendered as:
// <p>This is *a <span class="math">\(math* block\)</span></p>
// <p>This is *a <span class="math">\(math* inline\)</span></p>
TestParser.TestSpec("This is *a $math* block$", "<p>This is *a <span class=\"math\">\\(math* block\\)</span></p>", "mathematics|advanced", context: "Example 15\nSection Extensions / Math Inline\n");
TestParser.TestSpec("This is *a $math* inline$", "<p>This is *a <span class=\"math\">\\(math* inline\\)</span></p>", "mathematics|advanced", context: "Example 15\nSection Extensions / Math Inline\n");
}
// An opening $$ at the beginning of a line should not be interpreted as a Math block:
// An opening $$ at the beginning of a line should not be interpreted as a Math inline:
[Test]
public void ExtensionsMathInline_Example016()
{

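The Example 13 tests above state that at most two `$` are matched for the opening and closing delimiter, so `$$$math inline$$$` keeps one `$` on each side of the content. The C# sketch below models only that rule. `ExtractInlineMath` is a hypothetical helper, not part of Markdig, and it assumes the caller has already isolated a span that starts with the opening `$` run and ends with the closing run. It deliberately ignores the whitespace, flanking, escaping and newline rules covered by the other examples.

```csharp
using System;

// Hypothetical helper (not Markdig's API). Assumes `span` starts with the opening
// `$` run and ends with the closing `$` run. Returns the math content, using at
// most two `$` as the delimiter on each side, or null when there is no content.
static string? ExtractInlineMath(string span)
{
    // Length of the leading `$` run.
    int opening = 0;
    while (opening < span.Length && span[opening] == '$') opening++;
    if (opening == 0) return null;

    // Only up to two `$` act as the delimiter; any extra `$` belong to the content.
    int delimiter = Math.Min(opening, 2);

    // Length of the trailing `$` run; it must be long enough to close.
    int closing = 0;
    while (closing < span.Length && span[span.Length - 1 - closing] == '$') closing++;
    if (closing < delimiter) return null;

    // Strip exactly `delimiter` characters from each end.
    int contentLength = span.Length - 2 * delimiter;
    return contentLength > 0 ? span.Substring(delimiter, contentLength) : null;
}
```

For example, `ExtractInlineMath("$$$math inline$$$")` returns `$math inline$`, which matches the `\($math inline$\)` output expected by Example 13.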
@@ -4,79 +4,79 @@ Adds support for mathematics spans:
## Math Inline
Allows to define a mathematic block embraced by `$...$`
Allows to define a mathematic inline block embraced by `$...$`
```````````````````````````````` example
This is a $math block$
This is a $math inline$
.
<p>This is a <span class="math">\(math block\)</span></p>
<p>This is a <span class="math">\(math inline\)</span></p>
````````````````````````````````
Or by `$$...$$` embracing it by:
```````````````````````````````` example
This is a $$math block$$
This is a $$math inline$$
.
<p>This is a <span class="math">\(math block\)</span></p>
<p>This is a <span class="math">\(math inline\)</span></p>
````````````````````````````````
Newlines inside an inline math are not allowed:
```````````````````````````````` example
This is not a $$math
block$$ and? this is a $$math block$$
inline$$ and? this is a $$math inline$$
.
<p>This is not a $$math
block$$ and? this is a <span class="math">\(math block\)</span></p>
inline$$ and? this is a <span class="math">\(math inline\)</span></p>
````````````````````````````````
```````````````````````````````` example
This is not a $math
block$ and? this is a $math block$
inline$ and? this is a $math inline$
.
<p>This is not a $math
block$ and? this is a <span class="math">\(math block\)</span></p>
inline$ and? this is a <span class="math">\(math inline\)</span></p>
````````````````````````````````
An opening `$` can be followed by a space if the closing is also preceded by a space `$`:
```````````````````````````````` example
This is a $ math block $
This is a $ math inline $
.
<p>This is a <span class="math">\(math block\)</span></p>
<p>This is a <span class="math">\(math inline\)</span></p>
````````````````````````````````
```````````````````````````````` example
This is a $ math block $ after
This is a $ math inline $ after
.
<p>This is a <span class="math">\(math block\)</span> after</p>
<p>This is a <span class="math">\(math inline\)</span> after</p>
````````````````````````````````
```````````````````````````````` example
This is a $$ math block $$ after
This is a $$ math inline $$ after
.
<p>This is a <span class="math">\(math block\)</span> after</p>
<p>This is a <span class="math">\(math inline\)</span> after</p>
````````````````````````````````
```````````````````````````````` example
This is a not $ math block$ because there is not a whitespace before the closing
This is a not $ math inline$ because there is not a whitespace before the closing
.
<p>This is a not $ math block$ because there is not a whitespace before the closing</p>
<p>This is a not $ math inline$ because there is not a whitespace before the closing</p>
````````````````````````````````
For the opening `$` it requires a space or a punctuation before (but cannot be used within a word):
```````````````````````````````` example
This is not a m$ath block$
This is not a m$ath inline$
.
<p>This is not a m$ath block$</p>
<p>This is not a m$ath inline$</p>
````````````````````````````````
For the closing `$` it requires a space after or a punctuation (but cannot be preceded by a space and cannot be used within a word):
```````````````````````````````` example
This is not a $math bloc$k
This is not a $math inlin$e
.
<p>This is not a $math bloc$k</p>
<p>This is not a $math inlin$e</p>
````````````````````````````````
For the closing `$` it requires a space after or a punctuation (but cannot be preceded by a space and cannot be used within a word):
@@ -90,34 +90,34 @@ This is should not match a 16$ or a $15
A `$` can be escaped between a math inline block by using the escape `\\`
```````````````````````````````` example
This is a $math \$ block$
This is a $math \$ inline$
.
<p>This is a <span class="math">\(math \$ block\)</span></p>
<p>This is a <span class="math">\(math \$ inline\)</span></p>
````````````````````````````````
At most, two `$` will be matched for the opening and closing:
```````````````````````````````` example
This is a $$$math block$$$
This is a $$$math inline$$$
.
<p>This is a <span class="math">\($math block$\)</span></p>
<p>This is a <span class="math">\($math inline$\)</span></p>
````````````````````````````````
Regular text can come both before and after the math inline
```````````````````````````````` example
This is a $math block$ with text on both sides.
This is a $math inline$ with text on both sides.
.
<p>This is a <span class="math">\(math block\)</span> with text on both sides.</p>
<p>This is a <span class="math">\(math inline\)</span> with text on both sides.</p>
````````````````````````````````
A mathematic block takes precedence over standard emphasis `*` `_`:
A mathematic inline block takes precedence over standard emphasis `*` `_`:
```````````````````````````````` example
This is *a $math* block$
This is *a $math* inline$
.
<p>This is *a <span class="math">\(math* block\)</span></p>
<p>This is *a <span class="math">\(math* inline\)</span></p>
````````````````````````````````
An opening $$ at the beginning of a line should not be interpreted as a Math block:
An opening $$ at the beginning of a line should not be interpreted as a Math inline:
```````````````````````````````` example
$$ math $$ starting at a line

@@ -825,5 +825,190 @@ namespace Markdig.Tests.Specs.PipeTables
TestParser.TestSpec("a | b\n-- | - \n0 | 1 | 2", "<table>\n<thead>\n<tr>\n<th>a</th>\n<th>b</th>\n<th></th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>0</td>\n<td>1</td>\n<td>2</td>\n</tr>\n</tbody>\n</table>", "pipetables|advanced", context: "Example 25\nSection Extensions / Pipe Table\n");
}
// A table may begin right after a paragraph without an empty line in between:
[Test]
public void ExtensionsPipeTable_Example026()
{
// Example 26
// Section: Extensions / Pipe Table
//
// The following Markdown:
// Some
// **text**.
// | A |
// |---|
// | B |
//
// Should be rendered as:
// <p>Some
// <strong>text</strong>.</p>
// <table>
// <thead>
// <tr>
// <th>A</th>
// </tr>
// </thead>
// <tbody>
// <tr>
// <td>B</td>
// </tr>
// </tbody>
// </table>
TestParser.TestSpec("Some\n**text**.\n| A |\n|---|\n| B |", "<p>Some\n<strong>text</strong>.</p>\n<table>\n<thead>\n<tr>\n<th>A</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>B</td>\n</tr>\n</tbody>\n</table>", "pipetables|advanced", context: "Example 26\nSection Extensions / Pipe Table\n");
}
// Tables can be nested inside other blocks, like lists:
[Test]
public void ExtensionsPipeTable_Example027()
{
// Example 27
// Section: Extensions / Pipe Table
//
// The following Markdown:
// Bullet list
// * Table 1
//
// | Header 1 | Header 2 |
// |----------------|----------------|
// | Row 1 Column 1 | Row 1 Column 2 |
//
// * Table 2
// | Header 1 | Header 2 |
// |----------------|----------------|
// | Row 1 Column 1 | Row 1 Column 2 |
//
// * Table 3
// Lorem ipsum ...
// Lorem ipsum ...
// | Header 1 | Header 2 |
// |----------------|----------------|
// | Row 1 Column 1 | Row 1 Column 2 |
//
//
// Ordered list
// 1. Table 1
//
// | Header 1 | Header 2 |
// |----------------|----------------|
// | Row 1 Column 1 | Row 1 Column 2 |
//
// 2. Table 2
// | Header 1 | Header 2 |
// |----------------|----------------|
// | Row 1 Column 1 | Row 1 Column 2 |
//
// 3. Table 3
// Lorem ipsum ...
// Lorem ipsum ...
// | Header 1 | Header 2 |
// |----------------|----------------|
// | Row 1 Column 1 | Row 1 Column 2 |
//
// Should be rendered as:
// <p>Bullet list</p>
// <ul>
// <li><p>Table 1</p>
// <table>
// <thead>
// <tr>
// <th>Header 1</th>
// <th>Header 2</th>
// </tr>
// </thead>
// <tbody>
// <tr>
// <td>Row 1 Column 1</td>
// <td>Row 1 Column 2</td>
// </tr>
// </tbody>
// </table></li>
// <li><p>Table 2</p>
// <table>
// <thead>
// <tr>
// <th>Header 1</th>
// <th>Header 2</th>
// </tr>
// </thead>
// <tbody>
// <tr>
// <td>Row 1 Column 1</td>
// <td>Row 1 Column 2</td>
// </tr>
// </tbody>
// </table></li>
// <li><p>Table 3
// Lorem ipsum ...
// Lorem ipsum ...</p>
// <table>
// <thead>
// <tr>
// <th>Header 1</th>
// <th>Header 2</th>
// </tr>
// </thead>
// <tbody>
// <tr>
// <td>Row 1 Column 1</td>
// <td>Row 1 Column 2</td>
// </tr>
// </tbody>
// </table></li>
// </ul>
// <p>Ordered list</p>
// <ol>
// <li><p>Table 1</p>
// <table>
// <thead>
// <tr>
// <th>Header 1</th>
// <th>Header 2</th>
// </tr>
// </thead>
// <tbody>
// <tr>
// <td>Row 1 Column 1</td>
// <td>Row 1 Column 2</td>
// </tr>
// </tbody>
// </table></li>
// <li><p>Table 2</p>
// <table>
// <thead>
// <tr>
// <th>Header 1</th>
// <th>Header 2</th>
// </tr>
// </thead>
// <tbody>
// <tr>
// <td>Row 1 Column 1</td>
// <td>Row 1 Column 2</td>
// </tr>
// </tbody>
// </table></li>
// <li><p>Table 3
// Lorem ipsum ...
// Lorem ipsum ...</p>
// <table>
// <thead>
// <tr>
// <th>Header 1</th>
// <th>Header 2</th>
// </tr>
// </thead>
// <tbody>
// <tr>
// <td>Row 1 Column 1</td>
// <td>Row 1 Column 2</td>
// </tr>
// </tbody>
// </table></li>
// </ol>
TestParser.TestSpec("Bullet list\n* Table 1\n\n | Header 1 | Header 2 |\n |----------------|----------------|\n | Row 1 Column 1 | Row 1 Column 2 |\n\n* Table 2\n | Header 1 | Header 2 |\n |----------------|----------------|\n | Row 1 Column 1 | Row 1 Column 2 |\n\n* Table 3\n Lorem ipsum ...\n Lorem ipsum ...\n | Header 1 | Header 2 |\n |----------------|----------------|\n | Row 1 Column 1 | Row 1 Column 2 |\n\n\nOrdered list\n1. Table 1\n\n | Header 1 | Header 2 |\n |----------------|----------------|\n | Row 1 Column 1 | Row 1 Column 2 |\n\n2. Table 2\n | Header 1 | Header 2 |\n |----------------|----------------|\n | Row 1 Column 1 | Row 1 Column 2 |\n\n3. Table 3\n Lorem ipsum ...\n Lorem ipsum ...\n | Header 1 | Header 2 |\n |----------------|----------------|\n | Row 1 Column 1 | Row 1 Column 2 |", "<p>Bullet list</p>\n<ul>\n<li><p>Table 1</p>\n<table>\n<thead>\n<tr>\n<th>Header 1</th>\n<th>Header 2</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Row 1 Column 1</td>\n<td>Row 1 Column 2</td>\n</tr>\n</tbody>\n</table></li>\n<li><p>Table 2</p>\n<table>\n<thead>\n<tr>\n<th>Header 1</th>\n<th>Header 2</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Row 1 Column 1</td>\n<td>Row 1 Column 2</td>\n</tr>\n</tbody>\n</table></li>\n<li><p>Table 3\nLorem ipsum ...\nLorem ipsum ...</p>\n<table>\n<thead>\n<tr>\n<th>Header 1</th>\n<th>Header 2</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Row 1 Column 1</td>\n<td>Row 1 Column 2</td>\n</tr>\n</tbody>\n</table></li>\n</ul>\n<p>Ordered list</p>\n<ol>\n<li><p>Table 1</p>\n<table>\n<thead>\n<tr>\n<th>Header 1</th>\n<th>Header 2</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Row 1 Column 1</td>\n<td>Row 1 Column 2</td>\n</tr>\n</tbody>\n</table></li>\n<li><p>Table 2</p>\n<table>\n<thead>\n<tr>\n<th>Header 1</th>\n<th>Header 2</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Row 1 Column 1</td>\n<td>Row 1 Column 2</td>\n</tr>\n</tbody>\n</table></li>\n<li><p>Table 3\nLorem ipsum ...\nLorem ipsum ...</p>\n<table>\n<thead>\n<tr>\n<th>Header 1</th>\n<th>Header 2</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Row 1 Column 1</td>\n<td>Row 1 Column 2</td>\n</tr>\n</tbody>\n</table></li>\n</ol>", "pipetables|advanced", context: "Example 27\nSection Extensions / Pipe Table\n");
}
}
}

@@ -612,4 +612,173 @@ a | b
</tr>
</tbody>
</table>
````````````````````````````````
A table may begin right after a paragraph without an empty line in between:
```````````````````````````````` example
Some
**text**.
| A |
|---|
| B |
.
<p>Some
<strong>text</strong>.</p>
<table>
<thead>
<tr>
<th>A</th>
</tr>
</thead>
<tbody>
<tr>
<td>B</td>
</tr>
</tbody>
</table>
````````````````````````````````
Tables can be nested inside other blocks, like lists:
```````````````````````````````` example
Bullet list
* Table 1
| Header 1 | Header 2 |
|----------------|----------------|
| Row 1 Column 1 | Row 1 Column 2 |
* Table 2
| Header 1 | Header 2 |
|----------------|----------------|
| Row 1 Column 1 | Row 1 Column 2 |
* Table 3
Lorem ipsum ...
Lorem ipsum ...
| Header 1 | Header 2 |
|----------------|----------------|
| Row 1 Column 1 | Row 1 Column 2 |
Ordered list
1. Table 1
| Header 1 | Header 2 |
|----------------|----------------|
| Row 1 Column 1 | Row 1 Column 2 |
2. Table 2
| Header 1 | Header 2 |
|----------------|----------------|
| Row 1 Column 1 | Row 1 Column 2 |
3. Table 3
Lorem ipsum ...
Lorem ipsum ...
| Header 1 | Header 2 |
|----------------|----------------|
| Row 1 Column 1 | Row 1 Column 2 |
.
<p>Bullet list</p>
<ul>
<li><p>Table 1</p>
<table>
<thead>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Row 1 Column 1</td>
<td>Row 1 Column 2</td>
</tr>
</tbody>
</table></li>
<li><p>Table 2</p>
<table>
<thead>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Row 1 Column 1</td>
<td>Row 1 Column 2</td>
</tr>
</tbody>
</table></li>
<li><p>Table 3
Lorem ipsum ...
Lorem ipsum ...</p>
<table>
<thead>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Row 1 Column 1</td>
<td>Row 1 Column 2</td>
</tr>
</tbody>
</table></li>
</ul>
<p>Ordered list</p>
<ol>
<li><p>Table 1</p>
<table>
<thead>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Row 1 Column 1</td>
<td>Row 1 Column 2</td>
</tr>
</tbody>
</table></li>
<li><p>Table 2</p>
<table>
<thead>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Row 1 Column 1</td>
<td>Row 1 Column 2</td>
</tr>
</tbody>
</table></li>
<li><p>Table 3
Lorem ipsum ...
Lorem ipsum ...</p>
<table>
<thead>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Row 1 Column 1</td>
<td>Row 1 Column 2</td>
</tr>
</tbody>
</table></li>
</ol>
````````````````````````````````

@@ -0,0 +1,24 @@
using Markdig.Extensions.AutoLinks;
namespace Markdig.Tests;
[TestFixture]
public class TestAutoLinks
{
[Test]
[TestCase("https://localhost", "<p><a href=\"https://localhost\">https://localhost</a></p>")]
[TestCase("http://localhost", "<p><a href=\"http://localhost\">http://localhost</a></p>")]
[TestCase("https://l", "<p><a href=\"https://l\">https://l</a></p>")]
[TestCase("www.l", "<p><a href=\"http://www.l\">www.l</a></p>")]
[TestCase("https://localhost:5000", "<p><a href=\"https://localhost:5000\">https://localhost:5000</a></p>")]
[TestCase("www.l:5000", "<p><a href=\"http://www.l:5000\">www.l:5000</a></p>")]
public void TestLinksWithAllowDomainWithoutPeriod(string markdown, string expected)
{
var pipeline = new MarkdownPipelineBuilder()
.UseAutoLinks(new AutoLinkOptions { AllowDomainWithoutPeriod = true })
.Build();
var html = Markdown.ToHtml(markdown, pipeline);
Assert.That(html, Is.EqualTo(expected).IgnoreWhiteSpace);
}
}

@@ -19,18 +19,32 @@ public class TestCharHelper
'{', '|', '}', '~'
};
// A Unicode punctuation character is an ASCII punctuation character or anything in the general Unicode categories
// Pc, Pd, Pe, Pf, Pi, Po, or Ps.
private static readonly HashSet<UnicodeCategory> s_punctuationCategories = new()
{
// A Unicode punctuation character is a character in the Unicode P (punctuation) or S (symbol) general categories.
private static readonly HashSet<UnicodeCategory> s_punctuationCategories =
[
UnicodeCategory.ConnectorPunctuation,
UnicodeCategory.DashPunctuation,
UnicodeCategory.OpenPunctuation,
UnicodeCategory.ClosePunctuation,
UnicodeCategory.FinalQuotePunctuation,
UnicodeCategory.InitialQuotePunctuation,
UnicodeCategory.FinalQuotePunctuation,
UnicodeCategory.OtherPunctuation,
UnicodeCategory.OpenPunctuation
};
UnicodeCategory.MathSymbol,
UnicodeCategory.CurrencySymbol,
UnicodeCategory.ModifierSymbol,
UnicodeCategory.OtherSymbol,
];
private static readonly HashSet<UnicodeCategory> s_punctuationWithoutSymbolsCategories =
[
UnicodeCategory.ConnectorPunctuation,
UnicodeCategory.DashPunctuation,
UnicodeCategory.OpenPunctuation,
UnicodeCategory.ClosePunctuation,
UnicodeCategory.InitialQuotePunctuation,
UnicodeCategory.FinalQuotePunctuation,
UnicodeCategory.OtherPunctuation,
];
private static bool ExpectedIsPunctuation(char c)
{
@@ -39,23 +53,98 @@ public class TestCharHelper
: s_punctuationCategories.Contains(CharUnicodeInfo.GetUnicodeCategory(c));
}
private static bool ExpectedIsPunctuationWithoutSymbols(char c)
{
return c <= 127
? s_asciiPunctuation.Contains(c)
: s_punctuationWithoutSymbolsCategories.Contains(CharUnicodeInfo.GetUnicodeCategory(c));
}
private static bool ExpectedIsWhitespace(char c)
{
// A Unicode whitespace character is any code point in the Unicode Zs general category,
// or a tab (U+0009), line feed (U+000A), form feed (U+000C), or carriage return (U+000D).
return c == '\t' || c == '\n' || c == '\u000C' || c == '\r' ||
return c == '\t' || c == '\n' || c == '\f' || c == '\r' ||
CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;
}
[Test]
public void IsAcrossTab()
{
Assert.False(CharHelper.IsAcrossTab(0));
Assert.True(CharHelper.IsAcrossTab(1));
Assert.True(CharHelper.IsAcrossTab(2));
Assert.True(CharHelper.IsAcrossTab(3));
Assert.False(CharHelper.IsAcrossTab(4));
}
[Test]
public void AddTab()
{
Assert.AreEqual(4, CharHelper.AddTab(0));
Assert.AreEqual(4, CharHelper.AddTab(1));
Assert.AreEqual(4, CharHelper.AddTab(2));
Assert.AreEqual(4, CharHelper.AddTab(3));
Assert.AreEqual(8, CharHelper.AddTab(4));
Assert.AreEqual(8, CharHelper.AddTab(5));
}
[Test]
public void IsWhitespace()
{
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
char c = (char)i;
Test(
ExpectedIsWhitespace,
CharHelper.IsWhitespace);
Assert.AreEqual(ExpectedIsWhitespace(c), CharHelper.IsWhitespace(c));
}
Test(
ExpectedIsWhitespace,
CharHelper.WhitespaceChars.Contains);
}
[Test]
public void IsWhiteSpaceOrZero()
{
Test(
c => ExpectedIsWhitespace(c) || c == 0,
CharHelper.IsWhiteSpaceOrZero);
}
[Test]
public void IsAsciiPunctuation()
{
Test(
c => char.IsAscii(c) && ExpectedIsPunctuation(c),
CharHelper.IsAsciiPunctuation);
}
[Test]
public void IsAsciiPunctuationOrZero()
{
Test(
c => char.IsAscii(c) && (ExpectedIsPunctuation(c) || c == 0),
CharHelper.IsAsciiPunctuationOrZero);
}
[Test]
public void IsSpaceOrPunctuationForGFMAutoLink()
{
Test(
c => c == 0 || ExpectedIsWhitespace(c) || ExpectedIsPunctuationWithoutSymbols(c),
CharHelper.IsSpaceOrPunctuationForGFMAutoLink);
}
[Test]
public void InvalidAutoLinkCharacters()
{
// 6.5 Autolinks - https://spec.commonmark.org/0.31.2/#autolinks
// An absolute URI, for these purposes, consists of a scheme followed by a colon (:) followed by
// zero or more characters other than ASCII control characters, space, <, and >.
//
// 2.1 Characters and lines
// An ASCII control character is a character between U+0000–1F (both including) or U+007F.
Test(
c => c != 0 && c is < (char)0x20 or ' ' or '<' or '>' or '\u007F',
CharHelper.InvalidAutoLinkCharacters.Contains);
}
[Test]
@@ -76,15 +165,98 @@ public class TestCharHelper
}
[Test]
public void IsSpaceOrPunctuation()
public void IsControl()
{
Test(
char.IsControl,
CharHelper.IsControl);
}
[Test]
public void IsAlpha()
{
Test(
c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'),
CharHelper.IsAlpha);
}
[Test]
public void IsAlphaUpper()
{
Test(
c => c >= 'A' && c <= 'Z',
CharHelper.IsAlphaUpper);
}
[Test]
public void IsAlphaNumeric()
{
Test(
c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'),
CharHelper.IsAlphaNumeric);
}
[Test]
public void IsDigit()
{
Test(
c => c >= '0' && c <= '9',
CharHelper.IsDigit);
}
[Test]
public void IsNewLineOrLineFeed()
{
Test(
c => c is '\r' or '\n',
CharHelper.IsNewLineOrLineFeed);
}
[Test]
public void IsSpaceOrTab()
{
Test(
c => c is ' ' or '\t',
CharHelper.IsSpaceOrTab);
}
[Test]
public void IsEscapableSymbol()
{
Test(
"!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~•".Contains,
CharHelper.IsEscapableSymbol);
}
[Test]
public void IsEmailUsernameSpecialChar()
{
Test(
".!#$%&'*+/=?^_`{|}~-+.~".Contains,
CharHelper.IsEmailUsernameSpecialChar);
}
[Test]
public void IsEmailUsernameSpecialCharOrDigit()
{
Test(
c => CharHelper.IsDigit(c) || ".!#$%&'*+/=?^_`{|}~-+.~".Contains(c),
CharHelper.IsEmailUsernameSpecialCharOrDigit);
}
private static void Test(Func<char, bool> expected, Func<char, bool> actual)
{
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
char c = (char)i;
bool expected = c == 0 || ExpectedIsWhitespace(c) || ExpectedIsPunctuation(c);
bool expectedResult = expected(c);
bool actualResult = actual(c);
Assert.AreEqual(expected, CharHelper.IsSpaceOrPunctuation(c));
if (expectedResult != actualResult)
{
Assert.AreEqual(expectedResult, actualResult, $"Char: '{c}' ({i})");
}
}
}
}
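
The comment above paraphrases CommonMark's autolink definitions. A minimal standalone sketch of the full check could look like the following. `AbsoluteUriSketch` is a hypothetical name, not Markdig's API. The scheme rule (2 to 32 characters: an ASCII letter followed by letters, digits, `+`, `.` or `-`) comes from the CommonMark spec rather than from this diff. The sketch follows the spec literally, so unlike the test's expected predicate it also rejects U+0000.

```csharp
using System;

// Hypothetical standalone predicate for the CommonMark "absolute URI" rule (not Markdig's API).
public static class AbsoluteUriSketch
{
    // ASCII control characters: U+0000..U+001F and U+007F.
    private static bool IsAsciiControl(char c) => c <= '\u001F' || c == '\u007F';

    private static bool IsAsciiLetter(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    public static bool IsAbsoluteUri(string text)
    {
        // A scheme cannot contain ':', so the first colon terminates it.
        int colon = text.IndexOf(':');

        // Scheme: 2-32 chars, starting with an ASCII letter.
        if (colon < 2 || colon > 32 || !IsAsciiLetter(text[0]))
            return false;

        // Remaining scheme chars: letters, digits, '+', '.', or '-'.
        for (int i = 1; i < colon; i++)
        {
            char c = text[i];
            if (!(IsAsciiLetter(c) || (c >= '0' && c <= '9') || c == '+' || c == '.' || c == '-'))
                return false;
        }

        // After the colon: zero or more chars other than ASCII controls, space, '<' and '>'.
        for (int i = colon + 1; i < text.Length; i++)
        {
            char c = text[i];
            if (IsAsciiControl(c) || c == ' ' || c == '<' || c == '>')
                return false;
        }
        return true;
    }
}
```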

@@ -0,0 +1,10 @@
namespace Markdig.Tests;
public class TestCodeInline
{
[Test]
public void UnpairedCodeInlineWithTrailingChars()
{
TestParser.TestSpec("*`\n\f", "<p>*`</p>");
}
}

@@ -148,6 +148,9 @@ public class TestEmphasisExtended
[TestCase("1Foo1", "<one-only>Foo</one-only>")]
[TestCase("1121", "1<one-only>2</one-only>")]
[TestCase("22322", "<two-only>3</two-only>")]
[TestCase("2223222", "2<two-only>32</two-only>")]
[TestCase("22223222", "22<two-only>32</two-only>")]
[TestCase("22223223222", "22223<two-only>3</two-only>2")]
[TestCase("2232", "2232")]
[TestCase("333", "333")]
[TestCase("3334333", "<three-only>4</three-only>")]

@@ -22,6 +22,18 @@ public partial class TestEmphasisPlus
TestParser.TestSpec("normal ***Strong emphasis*** normal", "<p>normal <em><strong>Strong emphasis</strong></em> normal</p>", "");
}
[Test]
public void SupplementaryPunctuation()
{
TestParser.TestSpec("a*a∇*a\n\na*∇a*a\n\na*a𝜵*a\n\na*𝜵a*a\n\na*𐬼a*a\n\na*a𐬼*a", "<p>a*a∇*a</p>\n<p>a*∇a*a</p>\n<p>a*a𝜵*a</p>\n<p>a*𝜵a*a</p>\n<p>a*𐬼a*a</p>\n<p>a*a𐬼*a</p>", "");
}
[Test]
public void RecognizeSupplementaryChars()
{
TestParser.TestSpec("🌶️**𰻞**🍜**𰻞**🌶️**麺**🍜", "<p>🌶️<strong>𰻞</strong>🍜<strong>𰻞</strong>🌶️<strong>麺</strong>🍜</p>", "");
}
[Test]
public void OpenEmphasisHasConvenientContentStringSlice()
{

@@ -0,0 +1,35 @@
using Markdig.Syntax;
namespace Markdig.Tests;
public class TestHtmlCodeBlocks
{
// Start condition: line begins with the string < or </ followed by one of the strings (case-insensitive)
// {list of all tags}, followed by a space, a tab, the end of the line, the string >, or the string />.
public static string[] KnownSimpleHtmlTags =>
[
"address", "article", "aside", "base", "basefont", "blockquote", "body", "caption", "center", "col", "colgroup", "dd", "details",
"dialog", "dir", "div", "dl", "dt", "fieldset", "figcaption", "figure", "footer", "form", "frame", "frameset",
"h1", "h2", "h3", "h4", "h5", "h6", "head", "header", "hr", "html", "iframe", "legend", "li", "link",
"main", "menu", "menuitem", "nav", "noframes", "ol", "optgroup", "option", "p", "param",
"search", "section", "summary", "table", "tbody", "td", "tfoot", "th", "thead", "title", "tr", "track", "ul",
];
[Theory]
[TestCaseSource(nameof(KnownSimpleHtmlTags))]
public void TestKnownTags(string tag)
{
MarkdownDocument document = Markdown.Parse(
$"""
Hello
<{tag} />
World
""".ReplaceLineEndings("\n"));
HtmlBlock[] htmlBlocks = document.Descendants<HtmlBlock>().ToArray();
Assert.AreEqual(1, htmlBlocks.Length);
Assert.AreEqual(7, htmlBlocks[0].Span.Start);
Assert.AreEqual(10 + tag.Length, htmlBlocks[0].Span.Length);
}
}
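
Start condition 6, quoted in the comment above, can be sketched as a standalone predicate. `HtmlBlockStartSketch` is hypothetical: it keeps only a few of the listed tags for brevity and is not how Markdig's HTML block parser is structured. The allowance of up to three spaces of indentation before `<` comes from the CommonMark spec.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical standalone check for CommonMark HTML block start condition 6.
public static class HtmlBlockStartSketch
{
    // Illustrative subset only; KnownSimpleHtmlTags above enumerates the full list.
    private static readonly HashSet<string> BlockTags =
        new(StringComparer.OrdinalIgnoreCase) { "div", "p", "table", "h1", "search" };

    private static bool IsAsciiLetterOrDigit(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

    public static bool StartsCondition6(string line)
    {
        int i = 0;
        // Up to three spaces of indentation are allowed before '<'.
        while (i < 3 && i < line.Length && line[i] == ' ') i++;
        if (i >= line.Length || line[i] != '<') return false;
        i++;
        if (i < line.Length && line[i] == '/') i++; // closing-tag form "</"

        // Read the tag name and compare it case-insensitively.
        int nameStart = i;
        while (i < line.Length && IsAsciiLetterOrDigit(line[i])) i++;
        string name = line.Substring(nameStart, i - nameStart);
        if (!BlockTags.Contains(name)) return false;

        // The name must be followed by end of line, space, tab, ">" or "/>".
        if (i == line.Length) return true;
        char next = line[i];
        return next == ' ' || next == '\t' || next == '>' ||
               (next == '/' && i + 1 < line.Length && line[i + 1] == '>');
    }
}
```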

@@ -89,6 +89,14 @@ public class TestLinkHelper
Assert.AreEqual("this\ris\r\na\ntitle", title);
}
[Test]
public void TestTitleMultilineWithSpaceAndBackslash()
{
var text = new StringSlice("'a\n\\ \\\ntitle'");
Assert.True(LinkHelper.TryParseTitle(ref text, out string title, out _));
Assert.AreEqual("a\n\\ \\\ntitle", title);
}
[Test]
public void TestUrlAndTitle()
{
@@ -104,26 +112,26 @@ public class TestLinkHelper
}
[Test]
public void TestUrlAndTitleEmpty()
public void TestUrlEmptyAndTitleNull()
{
// 01234
var text = new StringSlice(@"(<>)A");
Assert.True(LinkHelper.TryParseInlineLink(ref text, out string link, out string title, out SourceSpan linkSpan, out SourceSpan titleSpan));
Assert.AreEqual(string.Empty, link);
Assert.AreEqual(string.Empty, title);
Assert.AreEqual(null, title);
Assert.AreEqual(new SourceSpan(1, 2), linkSpan);
Assert.AreEqual(SourceSpan.Empty, titleSpan);
Assert.AreEqual('A', text.CurrentChar);
}
[Test]
public void TestUrlAndTitleEmpty2()
public void TestUrlEmptyAndTitleNull2()
{
// 012345
var text = new StringSlice(@"( <> )A");
Assert.True(LinkHelper.TryParseInlineLink(ref text, out string link, out string title, out SourceSpan linkSpan, out SourceSpan titleSpan));
Assert.AreEqual(string.Empty, link);
Assert.AreEqual(string.Empty, title);
Assert.AreEqual(null, title);
Assert.AreEqual(new SourceSpan(2, 3), linkSpan);
Assert.AreEqual(SourceSpan.Empty, titleSpan);
Assert.AreEqual('A', text.CurrentChar);
@@ -150,7 +158,7 @@ public class TestLinkHelper
var text = new StringSlice(@"()A");
Assert.True(LinkHelper.TryParseInlineLink(ref text, out string link, out string title, out SourceSpan linkSpan, out SourceSpan titleSpan));
Assert.AreEqual(string.Empty, link);
Assert.AreEqual(string.Empty, title);
Assert.AreEqual(null, title);
Assert.AreEqual(SourceSpan.Empty, linkSpan);
Assert.AreEqual(SourceSpan.Empty, titleSpan);
Assert.AreEqual('A', text.CurrentChar);
@@ -238,6 +246,13 @@ public class TestLinkHelper
}
[Test]
public void TestlLinkReferenceDefinitionInvalid()
{
var text = new StringSlice("[foo]: /url (title) x\n");
Assert.False(LinkHelper.TryParseLinkReferenceDefinition(ref text, out _, out _, out _, out _, out _, out _));
}
[Test]
public void TestAutoLinkUrlSimple()
{
@@ -312,8 +327,8 @@ public class TestLinkHelper
Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, true));
}
[TestCase("bær", "br")]
[TestCase("bør", "br")]
[TestCase("bær", "baer")]
[TestCase("bør", "boer")]
[TestCase("bΘr", "br")]
[TestCase("四五", "")]
public void TestUrilizeOnlyAscii_NonAscii(string input, string expectedResult)
@@ -328,6 +343,75 @@ public class TestLinkHelper
Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, true));
}
// Tests for NormalizeScandinavianOrGermanChar method mappings
// These special characters are always normalized (both allowOnlyAscii=true and false)
//
// Note: When allowOnlyAscii=true, NFD (Canonical Decomposition) is applied first:
// - German umlauts ä,ö,ü decompose to base letter + combining mark (ü -> u + ¨)
// The combining mark is then stripped, leaving just the base letter (ü -> u)
// - å decomposes similarly (å -> a + ˚ -> a)
// - But ø, æ, ß, þ, ð do NOT decompose, so they use NormalizeScandinavianOrGermanChar
//
// When allowOnlyAscii=false, NormalizeScandinavianOrGermanChar is used for ALL special chars
// German ß (Eszett/sharp s) - does NOT decompose with NFD
[TestCase("Straße", "strasse")] // ß -> ss (both allowOnlyAscii=true and false)
// Scandinavian æ, ø - do NOT decompose with NFD
[TestCase("æble", "aeble")] // æ -> ae (both modes)
[TestCase("Ærø", "aeroe")] // Æ -> Ae, ø -> oe (both modes, then lowercase)
[TestCase("København", "koebenhavn")] // ø -> oe (both modes)
[TestCase("Øresund", "oeresund")] // Ø -> Oe (both modes, then lowercase)
// Icelandic þ, ð - do NOT decompose with NFD
[TestCase("þing", "thing")] // þ (thorn) -> th (both modes)
[TestCase("bað", "bad")] // ð (eth) -> d (both modes)
// Mixed special characters (only chars that behave the same in both modes)
[TestCase("øst-æble", "oest-aeble")] // ø->oe, æ->ae (both modes)
public void TestUrilizeScandinavianGermanChars(string input, string expectedResult)
{
// These transformations apply regardless of allowOnlyAscii flag
Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, true));
Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, false));
}
// Tests specific to allowOnlyAscii=true behavior
// German umlauts (ä, ö, ü) and å decompose with NFD, so they become base letter only
[TestCase("schön", "schon")] // ö decomposes to o (NFD strips combining mark)
[TestCase("Mädchen", "madchen")] // ä decomposes to a
[TestCase("Übung", "ubung")] // Ü decomposes to U (then lowercase to u)
[TestCase("Düsseldorf", "dusseldorf")] // ü decomposes to u
[TestCase("Käse", "kase")] // ä decomposes to a
[TestCase("gå", "ga")] // å decomposes to a
[TestCase("Ålesund", "alesund")] // Å decomposes to A (then lowercase)
[TestCase("grüßen", "grussen")] // ü decomposes to u, ß -> ss
[TestCase("Þór", "thor")] // Þ -> Th, ó decomposes to o (then lowercase)
[TestCase("Íslandsbanki", "islandsbanki")] // Í decomposes to I (then lowercase)
public void TestUrilizeOnlyAscii_GermanUmlautsDecompose(string input, string expectedResult)
{
// With allowOnlyAscii=true, these characters decompose via NFD and lose their diacritics
Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, true));
}
// Tests specific to allowOnlyAscii=false behavior
// All special chars use NormalizeScandinavianOrGermanChar (including ä, ö, ü, å)
[TestCase("schön", "schoen")] // ö -> oe (NormalizeScandinavianOrGermanChar)
[TestCase("Mädchen", "maedchen")] // ä -> ae
[TestCase("Übung", "uebung")] // Ü -> Ue (then lowercase)
[TestCase("Düsseldorf", "duesseldorf")] // ü -> ue
[TestCase("Käse", "kaese")] // ä -> ae
[TestCase("gå", "gaa")] // å -> aa
[TestCase("Ålesund", "aalesund")] // Å -> Aa (then lowercase)
[TestCase("grüßen", "gruessen")] // ü -> ue, ß -> ss
[TestCase("Þór", "thór")] // Þ -> Th (then lowercase 'th'), ó is kept as-is
[TestCase("Íslandsbanki", "íslandsbanki")] // í is kept as-is when allowOnlyAscii=false
public void TestUrilizeNonAscii_GermanUmlautsExpanded(string input, string expectedResult)
{
// With allowOnlyAscii=false, these characters use NormalizeScandinavianOrGermanChar
Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, false));
}
[TestCase("123", "")]
[TestCase("1,-b", "b")]
[TestCase("b1,-", "b1")] // Not Pandoc equivalent: b1-
@@ -345,11 +429,11 @@ public class TestLinkHelper
Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, false));
}
[TestCase("bær", "bær")]
[TestCase("æ5el", "æ5el")]
[TestCase("-æ5el", "æ5el")]
[TestCase("-frø-", "frø")]
[TestCase("-fr-ø", "fr-ø")]
[TestCase("bær", "baer")]
[TestCase("æ5el", "ae5el")]
[TestCase("-æ5el", "ae5el")]
[TestCase("-frø-", "froe")]
[TestCase("-fr-ø", "fr-oe")]
public void TestUrilizeNonAscii_Simple(string input, string expectedResult)
{
Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, false));
@@ -378,4 +462,4 @@ public class TestLinkHelper
{
TestParser.TestSpec("[Foo]\n\n[Foo]: http://ünicode.com", "<p><a href=\"http://xn--nicode-2ya.com\">Foo</a></p>");
}
}
}
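
The NFD behavior described in the `Urilize` test comments can be observed with a small standalone sketch. `NfdSketch.StripCombiningMarks` is a hypothetical helper, not `LinkHelper.Urilize`. It only decomposes and drops non-spacing marks. It does not lowercase, and it does not apply the special mappings for ß, æ, ø, þ and ð.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper illustrating NFD decomposition; NOT LinkHelper.Urilize.
public static class NfdSketch
{
    public static string StripCombiningMarks(string input)
    {
        // Canonical decomposition: "ü" (U+00FC) becomes "u" + U+0308 COMBINING DIAERESIS.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Drop combining marks. Base letters and characters without a canonical
            // decomposition (ø, æ, ß, þ, ð) pass through unchanged.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }
}
```

This is why, with `allowOnlyAscii=true`, ä/ö/ü/å lose only their marks, while ø, æ, ß, þ and ð still need an explicit mapping.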

@@ -1,5 +1,7 @@
using Markdig;
using Markdig.Extensions.Tables;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;
namespace Markdig.Tests;
@@ -10,12 +12,195 @@ public sealed class TestPipeTable
[TestCase("| S | T |\r\n|---|---|\t\r\n| G | H |")]
[TestCase("| S | T |\r\n|---|---|\f\r\n| G | H |")]
[TestCase("| S | \r\n|---|\r\n| G |\r\n\r\n| D | D |\r\n| ---| ---| \r\n| V | V |", 2)]
[TestCase("a\r| S | T |\r|---|---|")]
[TestCase("a\n| S | T |\r|---|---|")]
[TestCase("a\r\n| S | T |\r|---|---|")]
public void TestTableBug(string markdown, int tableCount = 1)
{
MarkdownDocument document = Markdown.Parse(markdown, new MarkdownPipelineBuilder().UseAdvancedExtensions().Build());
MarkdownDocument document =
Markdown.Parse(markdown, new MarkdownPipelineBuilder().UseAdvancedExtensions().Build());
Table[] tables = document.Descendants().OfType<Table>().ToArray();
Assert.AreEqual(tableCount, tables.Length);
}
[TestCase("A | B\r\n---|---", new[] {50.0f, 50.0f})]
[TestCase("A | B\r\n-|---", new[] {25.0f, 75.0f})]
[TestCase("A | B\r\n-|---\r\nA | B\r\n---|---", new[] {25.0f, 75.0f})]
[TestCase("A | B\r\n---|---|---", new[] {33.33f, 33.33f, 33.33f})]
[TestCase("A | B\r\n---|---|---|", new[] {33.33f, 33.33f, 33.33f})]
public void TestColumnWidthByHeaderLines(string markdown, float[] expectedWidth)
{
var pipeline = new MarkdownPipelineBuilder()
.UsePipeTables(new PipeTableOptions() {InferColumnWidthsFromSeparator = true})
.Build();
var document = Markdown.Parse(markdown, pipeline);
var table = document.Descendants().OfType<Table>().FirstOrDefault();
Assert.IsNotNull(table);
var actualWidths = table.ColumnDefinitions.Select(x => x.Width).ToList();
Assert.AreEqual(actualWidths.Count, expectedWidth.Length);
for (int i = 0; i < expectedWidth.Length; i++)
{
Assert.AreEqual(actualWidths[i], expectedWidth[i], 0.01);
}
}
[Test]
public void TestColumnWidthIsNotSetWithoutConfigurationFlag()
{
var pipeline = new MarkdownPipelineBuilder()
.UsePipeTables(new PipeTableOptions() {InferColumnWidthsFromSeparator = false})
.Build();
var document = Markdown.Parse("| A | B | C |\r\n|---|---|---|", pipeline);
var table = document.Descendants().OfType<Table>().FirstOrDefault();
Assert.IsNotNull(table);
foreach (var column in table.ColumnDefinitions)
{
Assert.AreEqual(0, column.Width);
}
}
[Test]
public void TableWithUnbalancedCodeSpanParsesWithoutDepthLimitError()
{
const string markdown = """
| Count | A | B | C | D | E |
|-------|---|---|---|---|---|
| 0 | B | C | D | E | F |
| 1 | B | `C | D | E | F |
| 2 | B | `C | D | E | F |
| 3 | B | C | D | E | F |
| 4 | B | C | D | E | F |
| 5 | B | C | D | E | F |
| 6 | B | C | D | E | F |
| 7 | B | C | D | E | F |
| 8 | B | C | D | E | F |
| 9 | B | C | D | E | F |
| 10 | B | C | D | E | F |
| 11 | B | C | D | E | F |
| 12 | B | C | D | E | F |
| 13 | B | C | D | E | F |
| 14 | B | C | D | E | F |
| 15 | B | C | D | E | F |
| 16 | B | C | D | E | F |
| 17 | B | C | D | E | F |
| 18 | B | C | D | E | F |
| 19 | B | C | D | E | F |
""";
var pipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.Build();
MarkdownDocument document = null!;
Assert.DoesNotThrow(() => document = Markdown.Parse(markdown, pipeline));
var tables = document.Descendants().OfType<Table>().ToArray();
Assert.That(tables, Has.Length.EqualTo(1));
string html = string.Empty;
Assert.DoesNotThrow(() => html = Markdown.ToHtml(markdown, pipeline));
Assert.That(html, Does.Contain("<table"));
Assert.That(html, Does.Contain("<td>`C</td>"));
}
[Test]
public void CodeInlineWithPipeDelimitersRemainsCodeInline()
{
const string markdown = "`|| hidden text ||`";
var pipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.Build();
var document = Markdown.Parse(markdown, pipeline);
var codeInline = document.Descendants().OfType<CodeInline>().SingleOrDefault();
Assert.IsNotNull(codeInline);
Assert.That(codeInline!.Content, Is.EqualTo("|| hidden text ||"));
Assert.That(document.ToHtml(), Is.EqualTo("<p><code>|| hidden text ||</code></p>\n"));
}
[Test]
public void MultiLineCodeInlineWithPipeDelimitersRendersAsCode()
{
string markdown =
"""
`
|| hidden text ||
`
""".ReplaceLineEndings("\n");
var pipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.Build();
var html = Markdown.ToHtml(markdown, pipeline);
Assert.That(html, Is.EqualTo("<p><code>|| hidden text ||</code></p>\n"));
}
[Test]
public void TableCellWithCodeInlineRendersCorrectly()
{
const string markdown =
"""
| Count | A | B | C | D | E |
|-------|---|---|---|---|---|
| 0 | B | C | D | E | F |
| 1 | B | `Code block` | D | E | F |
| 2 | B | C | D | E | F |
""";
var pipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.Build();
var html = Markdown.ToHtml(markdown, pipeline);
Assert.That(html, Does.Contain("<td><code>Code block</code></td>"));
}
[Test]
public void CodeInlineWithIndentedContentPreservesWhitespace()
{
const string markdown = "`\n foo\n`";
var pipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.Build();
var document = Markdown.Parse(markdown, pipeline);
var codeInline = document.Descendants().OfType<CodeInline>().Single();
Assert.That(codeInline.Content, Is.EqualTo("foo"));
Assert.That(Markdown.ToHtml(markdown, pipeline), Is.EqualTo("<p><code>foo</code></p>\n"));
}
[Test]
public void TableWithIndentedPipeAfterCodeInlineParsesCorrectly()
{
var markdown =
"""
`
|| hidden text ||
`
| Count | Value |
|-------|-------|
| 0 | B |
""".ReplaceLineEndings("\n");
var pipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.Build();
var html = Markdown.ToHtml(markdown, pipeline);
Assert.That(html, Does.Contain("<p><code>|| hidden text ||</code></p>"));
Assert.That(html, Does.Contain("<table"));
Assert.That(html, Does.Contain("<td>B</td>"));
}
}
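
The expected widths in `TestColumnWidthByHeaderLines` are consistent with each column receiving its share of the separator row's total dash count: for example, `-|---` gives 25/75. The sketch below reconstructs that rule under this assumption. `ColumnWidthSketch` is hypothetical. This diff does not show whether Markdig counts alignment colons, so the sketch simply ignores them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reconstruction consistent with the test cases above:
// width of column i = dashes in column i / total dashes * 100.
public static class ColumnWidthSketch
{
    public static float[] InferWidths(string separatorLine)
    {
        var dashCounts = new List<int>();
        foreach (string segment in separatorLine.Split('|'))
        {
            string cell = segment.Trim();
            if (cell.Length == 0) continue; // leading/trailing pipes yield empty segments

            int dashes = 0;
            foreach (char c in cell)
            {
                if (c == '-') dashes++; // assumption: alignment colons are not counted
            }
            dashCounts.Add(dashes);
        }

        int total = 0;
        foreach (int d in dashCounts) total += d;

        var widths = new float[dashCounts.Count];
        for (int i = 0; i < widths.Length; i++)
        {
            widths[i] = total == 0 ? 0f : dashCounts[i] * 100f / total;
        }
        return widths;
    }
}
```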

@@ -26,6 +26,14 @@ public class TestPlainText
Assert.AreEqual(expected, actual);
}
[Test]
[TestCase(/* markdownText: */ "```\nConsole.WriteLine(\"Hello, World!\");\n```", /* expected: */ "Console.WriteLine(\"Hello, World!\");\n")]
public void TestPlainCodeBlock(string markdownText, string expected)
{
var actual = Markdown.ToPlainText(markdownText);
Assert.AreEqual(expected, actual);
}
[Test]
[TestCase(/* markdownText: */ ":::\nfoo\n:::", /* expected: */ "foo\n", /*extensions*/ "customcontainers|advanced")]
[TestCase(/* markdownText: */ ":::bar\nfoo\n:::", /* expected: */ "foo\n", /*extensions*/ "customcontainers+attributes|advanced")]

@@ -46,6 +46,14 @@ public class TestPlayParser
Assert.AreEqual("/yoyo", link?.Url);
}
[Test]
public void TestLinkWithMultipleBackslashesInTitle()
{
var doc = Markdown.Parse(@"[link](/uri '\\\\127.0.0.1')");
var link = doc.Descendants<LinkInline>().FirstOrDefault();
Assert.AreEqual(@"\\127.0.0.1", link?.Title);
}
[Test]
public void TestListBug2()
{

@@ -31,4 +31,14 @@ public class TestSmartyPants
TestParser.TestSpec("<<test>>", "<p>&laquo;test&raquo;</p>", pipeline);
}
[Test]
public void RecognizesSupplementaryCharacters()
{
var pipeline = new MarkdownPipelineBuilder()
.UseSmartyPants()
.Build();
TestParser.TestSpec("\"𝜵\"𠮷\"𝜵\"𩸽\"", "<p>&quot;𝜵&ldquo;𠮷&rdquo;𝜵&ldquo;𩸽&rdquo;</p>", pipeline);
}
}

@@ -834,6 +834,29 @@ literal ( 2, 2) 11-11
", "pipetables");
}
[Test]
public void TestGridTable()
{
Check("0\n\n+-+-+\n|A|B|\n+=+=+\n|C|D|\n+-+-+", @"
paragraph ( 0, 0) 0-0
literal ( 0, 0) 0-0
table ( 2, 0) 3-31
tablerow ( 3, 0) 9-13
tablecell ( 3, 0) 9-11
paragraph ( 3, 1) 10-10
literal ( 3, 1) 10-10
tablecell ( 3, 2) 11-13
paragraph ( 3, 3) 12-12
literal ( 3, 3) 12-12
tablerow ( 5, 0) 21-25
tablecell ( 5, 0) 21-23
paragraph ( 5, 1) 22-22
literal ( 5, 1) 22-22
tablecell ( 5, 2) 23-25
paragraph ( 5, 3) 24-24
literal ( 5, 3) 24-24", "gridtables");
}
[Test]
public void TestIndentedCode()
{

@@ -0,0 +1,165 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
using Markdig.Helpers;
namespace Markdig.Tests;
[TestFixture]
public class TestStringSlice
{
#if NET
[Test]
public void TestRuneBmp()
{
var slice = new StringSlice("01234");
Assert.AreEqual('0', slice.CurrentRune.Value);
Assert.AreEqual(0, slice.Start);
Assert.AreEqual('1', slice.NextRune().Value);
Assert.AreEqual(1, slice.Start);
Assert.AreEqual('2', slice.NextRune().Value);
Assert.AreEqual(2, slice.Start);
Assert.AreEqual('2', slice.CurrentRune.Value);
Assert.AreEqual("234", slice.ToString());
Assert.AreEqual('3', slice.PeekRuneExtra(1).Value);
Assert.AreEqual('4', slice.PeekRuneExtra(2).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(3).Value);
Assert.AreEqual('1', slice.PeekRuneExtra(-1).Value);
Assert.AreEqual('0', slice.PeekRuneExtra(-2).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(-3).Value);
Assert.AreEqual('0', slice.RuneAt(0).Value);
Assert.AreEqual('1', slice.RuneAt(1).Value);
Assert.AreEqual('2', slice.RuneAt(2).Value);
Assert.AreEqual('3', slice.RuneAt(3).Value);
Assert.AreEqual('4', slice.RuneAt(4).Value);
Assert.AreEqual(2, slice.Start);
}
[Test]
public void TestRuneSupplementaryOnly()
{
var slice = new StringSlice("𝟎𝟏𝟐𝟑𝟒");
Assert.AreEqual(10, slice.Length);
// 𝟎 = U+1D7CE, 𝟐 = U+1D7D0
Assert.AreEqual(0x1D7CE, slice.CurrentRune.Value); // 𝟎
Assert.AreEqual(0, slice.Start);
Assert.AreEqual(0x1D7CF, slice.NextRune().Value); // 𝟏
Assert.AreEqual(2, slice.Start);
Assert.AreEqual(0x1D7D0, slice.NextRune().Value); // 𝟐
Assert.AreEqual(4, slice.Start);
Assert.AreEqual(0x1D7D0, slice.CurrentRune.Value); // 𝟐
Assert.AreEqual("𝟐𝟑𝟒", slice.ToString());
// CurrentRune occupies 2 `char`s, so next Rune starts at index 2
Assert.AreEqual(0x1D7D1, slice.PeekRuneExtra(2).Value); // 𝟑
Assert.AreEqual(0x1D7D2, slice.PeekRuneExtra(4).Value); // 𝟒
Assert.AreEqual(0, slice.PeekRuneExtra(6).Value);
Assert.AreEqual(0x1D7CF, slice.PeekRuneExtra(-1).Value); // 𝟏
Assert.AreEqual(0x1D7CE, slice.PeekRuneExtra(-3).Value); // 𝟎
Assert.AreEqual(0, slice.PeekRuneExtra(-5).Value);
Assert.AreEqual(0x1D7CE, slice.RuneAt(0).Value); // 𝟎
Assert.AreEqual(0x1D7CF, slice.RuneAt(2).Value); // 𝟏
Assert.AreEqual(0x1D7D0, slice.RuneAt(4).Value); // 𝟐
Assert.AreEqual(0x1D7D1, slice.RuneAt(6).Value); // 𝟑
Assert.AreEqual(0x1D7D2, slice.RuneAt(8).Value); // 𝟒
// The following offsets are not intended usage and return 0: callers must account for the number of `char`s occupied by the Rune just read.
Assert.AreEqual(0, slice.PeekRuneExtra(-4).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(-2).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(1).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(3).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(5).Value);
Assert.AreEqual(0, slice.RuneAt(1).Value);
Assert.AreEqual(0, slice.RuneAt(3).Value);
Assert.AreEqual(0, slice.RuneAt(5).Value);
Assert.AreEqual(0, slice.RuneAt(7).Value);
Assert.AreEqual(0, slice.RuneAt(9).Value);
Assert.AreEqual(4, slice.Start);
}
[Test]
public void TestRuneIsolatedHighSurrogate()
{
var slice = new StringSlice("\ud800\ud801\ud802\ud803\ud804");
Assert.AreEqual(0, slice.CurrentRune.Value);
Assert.AreEqual(0, slice.Start);
Assert.AreEqual(0, slice.NextRune().Value);
Assert.AreEqual(0, slice.CurrentRune.Value);
Assert.AreEqual('\ud801', slice.CurrentChar);
Assert.AreEqual(1, slice.Start);
Assert.AreEqual(0, slice.NextRune().Value);
Assert.AreEqual(2, slice.Start);
Assert.AreEqual('\ud802', slice.CurrentChar);
Assert.AreEqual(0, slice.CurrentRune.Value);
Assert.AreEqual(0, slice.PeekRuneExtra(-3).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(-2).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(-1).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(1).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(2).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(3).Value);
Assert.AreEqual(0, slice.RuneAt(0).Value);
Assert.AreEqual(0, slice.RuneAt(1).Value);
Assert.AreEqual(0, slice.RuneAt(2).Value);
Assert.AreEqual(0, slice.RuneAt(3).Value);
Assert.AreEqual(0, slice.RuneAt(4).Value);
Assert.AreEqual(2, slice.Start);
}
[Test]
public void TestRuneIsolatedLowSurrogate()
{
var slice = new StringSlice("\udc00\udc01\udc02\udc03\udc04");
Assert.AreEqual(0, slice.CurrentRune.Value);
Assert.AreEqual(0, slice.NextRune().Value);
Assert.AreEqual('\udc01', slice.CurrentChar);
Assert.AreEqual(0, slice.NextRune().Value);
Assert.AreEqual('\udc02', slice.CurrentChar);
Assert.AreEqual(0, slice.CurrentRune.Value);
Assert.AreEqual(0, slice.PeekRuneExtra(-3).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(-2).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(-1).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(1).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(2).Value);
Assert.AreEqual(0, slice.PeekRuneExtra(3).Value);
Assert.AreEqual(0, slice.RuneAt(0).Value);
Assert.AreEqual(0, slice.RuneAt(1).Value);
Assert.AreEqual(0, slice.RuneAt(2).Value);
Assert.AreEqual(0, slice.RuneAt(3).Value);
Assert.AreEqual(0, slice.RuneAt(4).Value);
}
[Test]
public void TestMixedInput()
{
var slice = new StringSlice("a\udc00bc𝟑d𝟒\udc00");
Assert.AreEqual(10, slice.Length);
Assert.AreEqual('a', slice.CurrentRune.Value);
Assert.AreEqual(0, slice.Start);
Assert.AreEqual(0, slice.NextRune().Value);
Assert.AreEqual(1, slice.Start);
Assert.AreEqual('b', slice.NextRune().Value);
Assert.AreEqual(2, slice.Start);
Assert.AreEqual('c', slice.NextRune().Value);
Assert.AreEqual(3, slice.Start);
Assert.AreEqual(0x1D7D1, slice.NextRune().Value);
Assert.AreEqual(4, slice.Start);
Assert.AreEqual('d', slice.NextRune().Value);
Assert.AreEqual(6, slice.Start);
Assert.AreEqual(0x1D7D2, slice.NextRune().Value);
Assert.AreEqual(7, slice.Start);
Assert.AreEqual(0, slice.NextRune().Value);
Assert.AreEqual(9, slice.Start);
Assert.False(slice.IsEmpty);
Assert.AreEqual(0, slice.NextRune().Value);
Assert.AreEqual(10, slice.Start);
Assert.True(slice.IsEmpty);
slice = new StringSlice(slice.Text + 'a', 7, 10);
Assert.AreEqual(0x1D7D2, slice.CurrentRune.Value);
Assert.AreEqual(0, slice.NextRune().Value);
Assert.AreEqual(9, slice.Start);
Assert.AreEqual('a', slice.NextRune().Value);
}
#endif
}
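
The comments in `TestRuneSupplementaryOnly` stress that a supplementary Rune occupies two `char`s. A standalone sketch of the same forward-lookup semantics can be built on `System.Text.Rune` (.NET Core 3.0+). `RuneSketch.RuneValueAt` is hypothetical and is not `StringSlice`'s implementation. It only mirrors the forward `RuneAt` behavior asserted above, returning 0 for out-of-range indices, low surrogates, and unpaired high surrogates.

```csharp
using System;
using System.Text;

// Hypothetical helper mirroring the forward RuneAt semantics tested above.
public static class RuneSketch
{
    public static int RuneValueAt(string text, int index)
    {
        // Out of range (including negative indices) yields 0.
        if ((uint)index >= (uint)text.Length) return 0;

        char c = text[index];

        // BMP scalar: Rune.TryCreate(char) fails for any surrogate code unit.
        if (Rune.TryCreate(c, out Rune bmp)) return bmp.Value;

        // Supplementary scalar: needs a high surrogate followed by a low surrogate.
        if (char.IsHighSurrogate(c) && index + 1 < text.Length &&
            Rune.TryCreate(c, text[index + 1], out Rune supplementary))
        {
            return supplementary.Value;
        }

        // Low surrogate, or unpaired high surrogate: not the start of a valid Rune.
        return 0;
    }
}
```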

@@ -8,7 +8,6 @@ namespace Markdig.Tests;
[TestFixture]
public class TestStringSliceList
{
// TODO: Add tests for StringSlice
// TODO: Add more tests for StringLineGroup
[Test]

@@ -8,7 +8,7 @@ public class ApiController : Controller
{
[HttpGet()]
[Route("")]
public string Empty()
public new string Empty()
{
return string.Empty;
}

@@ -14,7 +14,7 @@
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.ApplicationInsights.AspNetCore" Version="2.22.0" />
<PackageReference Include="Microsoft.ApplicationInsights.AspNetCore" />
</ItemGroup>
<ItemGroup>

@@ -12,7 +12,7 @@ public class Startup
if (env.IsEnvironment("Development"))
{
// This will push telemetry data through the Application Insights pipeline faster, allowing you to view results immediately.
builder.AddApplicationInsightsSettings(developerMode: true);
builder.AddApplicationInsightsSettings(connectionString: null, developerMode: true);
}
builder.AddEnvironmentVariables();

@@ -73,8 +73,6 @@ public class AbbreviationParser : BlockParser
private void DocumentOnProcessInlinesBegin(InlineProcessor inlineProcessor, Inline? inline)
{
inlineProcessor.Document.ProcessInlinesBegin -= DocumentOnProcessInlinesBegin;
var abbreviations = inlineProcessor.Document.GetAbbreviations();
// Should not happen, but another extension could decide to remove them, so...
if (abbreviations is null)
@@ -203,20 +201,17 @@ public class AbbreviationParser : BlockParser
while (index <= contentNew.End)
{
var c = contentNew.PeekCharAbsolute(index);
if (!(c == '\0' || c.IsWhitespace() || c.IsAsciiPunctuation()))
{
return false;
}
if (c.IsAlphaNumeric())
{
return false;
}
if (c.IsWhitespace())
{
break;
}
if (!c.IsAsciiPunctuationOrZero())
{
return false;
}
index++;
}
return true;

@@ -22,7 +22,7 @@ public class AlertBlock : QuoteBlock
}
/// <summary>
/// Gets or sets the kind of the alert block (e.g `NOTE`, `TIP`, `IMPORTANT`, `WARNING`, `CAUTION`)
/// Gets or sets the kind of the alert block (e.g `NOTE`, `TIP`, `IMPORTANT`, `WARNING`, `CAUTION`).
/// </summary>
public StringSlice Kind { get; set; }

@@ -5,7 +5,6 @@
using Markdig.Helpers;
using Markdig.Renderers;
using Markdig.Renderers.Html;
using Markdig.Syntax;
namespace Markdig.Extensions.Alerts;

@@ -6,7 +6,6 @@ using Markdig.Helpers;
using Markdig.Parsers;
using Markdig.Renderers.Html;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;
namespace Markdig.Extensions.Alerts;
@@ -16,6 +15,9 @@ namespace Markdig.Extensions.Alerts;
/// <seealso cref="InlineParser" />
public class AlertInlineParser : InlineParser
{
private static readonly TransformedStringCache s_alertTypeClassCache = new(
type => $"markdown-alert-{type.ToLowerInvariant()}");
/// <summary>
/// Initializes a new instance of the <see cref="AlertInlineParser"/> class.
/// </summary>
@@ -26,26 +28,30 @@ public class AlertInlineParser : InlineParser
public override bool Match(InlineProcessor processor, ref StringSlice slice)
{
if (slice.PeekChar() != '!')
{
return false;
}
// We expect the alert to be the first child of a quote block. Example:
// > [!NOTE]
// > This is a note
if (processor.Block is not ParagraphBlock paragraphBlock || paragraphBlock.Parent is not QuoteBlock quoteBlock || paragraphBlock.Inline?.FirstChild != null)
if (processor.Block is not ParagraphBlock paragraphBlock ||
paragraphBlock.Parent is not QuoteBlock quoteBlock ||
paragraphBlock.Inline?.FirstChild != null ||
quoteBlock is AlertBlock ||
quoteBlock.Parent is not MarkdownDocument)
{
return false;
}
var saved = slice;
var c = slice.NextChar();
if (c != '!')
{
slice = saved;
return false;
}
StringSlice saved = slice;
c = slice.NextChar(); // Skip !
slice.SkipChar(); // Skip [
char c = slice.NextChar(); // Skip !
var start = slice.Start;
var end = start;
int start = slice.Start;
int end = start;
while (c.IsAlpha())
{
end = slice.Start;
@@ -76,13 +82,13 @@ public class AlertInlineParser : InlineParser
end = slice.Start;
if (c == '\n')
{
slice.NextChar(); // Skip \n
slice.SkipChar(); // Skip \n
}
}
}
else if (c == '\n')
{
slice.NextChar(); // Skip \n
slice.SkipChar(); // Skip \n
}
break;
}
@@ -103,8 +109,9 @@ public class AlertInlineParser : InlineParser
Column = quoteBlock.Column,
};
alertBlock.GetAttributes().AddClass("markdown-alert");
alertBlock.GetAttributes().AddClass($"markdown-alert-{alertType.ToString().ToLowerInvariant()}");
HtmlAttributes attributes = alertBlock.GetAttributes();
attributes.AddClass("markdown-alert");
attributes.AddClass(s_alertTypeClassCache.Get(alertType.AsSpan()));
// Replace the quote block with the alert block
var parentQuoteBlock = quoteBlock.Parent!;
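
The comment in `AlertInlineParser.Match` expects a `[!KIND]` marker as the first content of a quote block. The marker recognition and class naming can be sketched standalone. `AlertMarkerSketch` is hypothetical. The rule that only spaces or tabs may follow `]` is a simplifying assumption. The nesting checks added in this diff (top-level quote only, not already an `AlertBlock`) are outside the sketch's scope. `ConcurrentDictionary.GetOrAdd` stands in for Markdig's `TransformedStringCache`.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch: recognise "[!KIND]" and map it to a CSS class. Not Markdig's parser.
public static class AlertMarkerSketch
{
    // Memoizes e.g. "NOTE" -> "markdown-alert-note", in the spirit of s_alertTypeClassCache.
    private static readonly ConcurrentDictionary<string, string> s_classCache = new();

    public static bool TryGetAlertClass(string firstLine, out string cssClass)
    {
        cssClass = string.Empty;
        if (!firstLine.StartsWith("[!", StringComparison.Ordinal)) return false;

        // The kind is one or more ASCII letters, terminated by ']'.
        int i = 2;
        while (i < firstLine.Length && IsAsciiLetter(firstLine[i])) i++;
        if (i == 2 || i >= firstLine.Length || firstLine[i] != ']') return false;

        string kind = firstLine.Substring(2, i - 2);

        // Simplifying assumption: only spaces or tabs may follow the closing bracket.
        for (int j = i + 1; j < firstLine.Length; j++)
        {
            if (firstLine[j] != ' ' && firstLine[j] != '\t') return false;
        }

        cssClass = s_classCache.GetOrAdd(kind, k => $"markdown-alert-{k.ToLowerInvariant()}");
        return true;
    }

    private static bool IsAsciiLetter(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
```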

@@ -22,6 +22,8 @@ public class AutoIdentifierExtension : IMarkdownExtension
private static readonly StripRendererCache _rendererCache = new();
private readonly AutoIdentifierOptions _options;
private readonly ProcessInlineDelegate _processInlinesBegin;
private readonly ProcessInlineDelegate _processInlinesEnd;
/// <summary>
/// Initializes a new instance of the <see cref="AutoIdentifierExtension"/> class.
@@ -30,6 +32,8 @@ public class AutoIdentifierExtension : IMarkdownExtension
public AutoIdentifierExtension(AutoIdentifierOptions options)
{
_options = options;
_processInlinesBegin = DocumentOnProcessInlinesBegin;
_processInlinesEnd = HeadingBlock_ProcessInlinesEnd;
}
public void Setup(MarkdownPipelineBuilder pipeline)
@@ -85,19 +89,18 @@ public class AutoIdentifierExtension : IMarkdownExtension
{
dictionary = new Dictionary<string, HeadingLinkReferenceDefinition>();
doc.SetData(this, dictionary);
doc.ProcessInlinesBegin += DocumentOnProcessInlinesBegin;
doc.ProcessInlinesBegin += _processInlinesBegin;
}
dictionary[text] = linkRef;
}
// Then we register after inline have been processed to actually generate the proper #id
headingBlock.ProcessInlinesEnd += HeadingBlock_ProcessInlinesEnd;
headingBlock.ProcessInlinesEnd += _processInlinesEnd;
}
private void DocumentOnProcessInlinesBegin(InlineProcessor processor, Inline? inline)
{
var doc = processor.Document;
doc.ProcessInlinesBegin -= DocumentOnProcessInlinesBegin;
var dictionary = (Dictionary<string, HeadingLinkReferenceDefinition>)doc.GetData(this)!;
foreach (var keyPair in dictionary)
{
@@ -117,7 +120,7 @@ public class AutoIdentifierExtension : IMarkdownExtension
/// Callback invoked when a reference to a heading is found.
/// Note that references only work if they are declared after.
/// </summary>
private Inline CreateLinkInlineForHeading(InlineProcessor inlineState, LinkReferenceDefinition linkRef, Inline? child)
private static Inline CreateLinkInlineForHeading(InlineProcessor inlineState, LinkReferenceDefinition linkRef, Inline? child)
{
var headingRef = (HeadingLinkReferenceDefinition) linkRef;
return new LinkInline()

@@ -2,9 +2,11 @@
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
using Markdig.Parsers;
namespace Markdig.Extensions.AutoLinks;
public class AutoLinkOptions
public class AutoLinkOptions : LinkOptions
{
public AutoLinkOptions()
{
@@ -13,13 +15,13 @@ public class AutoLinkOptions
public string ValidPreviousCharacters { get; set; }
/// <summary>
/// Should the link open in a new window when clicked (false by default)
/// </summary>
public bool OpenInNewWindow { get; set; }
/// <summary>
/// Should a www link be prefixed with https:// instead of http:// (false by default)
/// </summary>
public bool UseHttpsForWWWLinks { get; set; }
/// <summary>
/// Should auto-linking allow a domain with no period, e.g. https://localhost (false by default)
/// </summary>
public bool AllowDomainWithoutPeriod { get; set; }
}

@@ -6,6 +6,8 @@ using Markdig.Helpers;
using Markdig.Parsers;
using Markdig.Renderers.Html;
using Markdig.Syntax.Inlines;
using System.Buffers;
using System.Diagnostics;
namespace Markdig.Extensions.AutoLinks;
@@ -31,186 +33,171 @@ public class AutoLinkParser : InlineParser
'w', // for www.
];
_listOfCharCache = new ListOfCharCache();
_validPreviousCharacters = SearchValues.Create(options.ValidPreviousCharacters);
}
public readonly AutoLinkOptions Options;
private readonly ListOfCharCache _listOfCharCache;
private readonly SearchValues<char> _validPreviousCharacters;
// This is a particularly expensive parser as it gets called for many common letters.
public override bool Match(InlineProcessor processor, ref StringSlice slice)
{
// Previous char must be a whitespace or a punctuation
var previousChar = slice.PeekCharExtra(-1);
if (!previousChar.IsWhiteSpaceOrZero() && Options.ValidPreviousCharacters.IndexOf(previousChar) == -1)
if (!previousChar.IsWhiteSpaceOrZero() && !_validPreviousCharacters.Contains(previousChar))
{
return false;
}
ReadOnlySpan<char> span = slice.AsSpan();
Debug.Assert(span[0] is 'h' or 'f' or 'm' or 't' or 'w');
// Precheck URL
bool mayBeValid = span.Length >= 4 && span[0] switch
{
'h' => span.StartsWith("https://", StringComparison.Ordinal) || span.StartsWith("http://", StringComparison.Ordinal),
'w' => span.StartsWith("www.", StringComparison.Ordinal), // We won't match http:/www. or /www.xxx
'f' => span.StartsWith("ftp://", StringComparison.Ordinal),
'm' => span.StartsWith("mailto:", StringComparison.Ordinal),
_ => span.StartsWith("tel:", StringComparison.Ordinal),
};
return mayBeValid && MatchCore(processor, ref slice);
}
private bool MatchCore(InlineProcessor processor, ref StringSlice slice)
{
char c = slice.CurrentChar;
var startPosition = slice.Start;
// We don't bother disposing the builder as it'll realistically never grow beyond the initial stack size.
var pendingEmphasis = new ValueStringBuilder(stackalloc char[32]);
// Check that an autolink is possible in the current context
if (!IsAutoLinkValidInCurrentContext(processor, ref pendingEmphasis))
{
return false;
}
// Parse URL
if (!LinkHelper.TryParseUrl(ref slice, out string? link, out _, true))
{
return false;
}
// If we have any pending emphasis, remove any pending emphasis characters from the end of the link
if (pendingEmphasis.Length > 0)
{
for (int i = link.Length - 1; i >= 0; i--)
{
if (pendingEmphasis.AsSpan().Contains(link[i]))
{
slice.Start--;
}
else
{
if (i < link.Length - 1)
{
link = link.Substring(0, i + 1);
}
break;
}
}
}
int domainOffset = 0;
var c = slice.CurrentChar;
// Precheck URL
// Post-check URL
switch (c)
{
case 'h':
if (slice.MatchLowercase("ttp://", 1))
if (string.Equals(link, "http://", StringComparison.Ordinal) ||
string.Equals(link, "https://", StringComparison.Ordinal))
{
domainOffset = 7; // http://
return false;
}
else if (slice.MatchLowercase("ttps://", 1))
{
domainOffset = 8; // https://
}
else return false;
domainOffset = link[4] == 's' ? 8 : 7; // https:// or http://
break;
case 'w':
domainOffset = 4; // www.
break;
case 'f':
if (!slice.MatchLowercase("tp://", 1))
if (string.Equals(link, "ftp://", StringComparison.Ordinal))
{
return false;
}
domainOffset = 6; // ftp://
break;
case 'm':
if (!slice.MatchLowercase("ailto:", 1))
{
return false;
}
break;
case 't':
if (!slice.MatchLowercase("el:", 1))
if (string.Equals(link, "tel", StringComparison.Ordinal))
{
return false;
}
domainOffset = 4;
break;
case 'w':
if (!slice.MatchLowercase("ww.", 1)) // We won't match http:/www. or /www.xxx
case 'm':
int atIndex = link.IndexOf('@');
if (atIndex == -1 ||
atIndex == 7) // mailto:@ - no email part
{
return false;
}
domainOffset = 4; // www.
domainOffset = atIndex + 1;
break;
}
List<char> pendingEmphasis = _listOfCharCache.Get();
try
// Do not need to check if a telephone number is a valid domain
if (c != 't' && !LinkHelper.IsValidDomain(link, domainOffset, Options.AllowDomainWithoutPeriod))
{
// Check that an autolink is possible in the current context
if (!IsAutoLinkValidInCurrentContext(processor, pendingEmphasis))
{
return false;
}
// Parse URL
if (!LinkHelper.TryParseUrl(ref slice, out string? link, out _, true))
{
return false;
}
// If we have any pending emphasis, remove any pending emphasis characters from the end of the link
if (pendingEmphasis.Count > 0)
{
for (int i = link.Length - 1; i >= 0; i--)
{
if (pendingEmphasis.Contains(link[i]))
{
slice.Start--;
}
else
{
if (i < link.Length - 1)
{
link = link.Substring(0, i + 1);
}
break;
}
}
}
// Post-check URL
switch (c)
{
case 'h':
if (string.Equals(link, "http://", StringComparison.OrdinalIgnoreCase) ||
string.Equals(link, "https://", StringComparison.OrdinalIgnoreCase))
{
return false;
}
break;
case 'f':
if (string.Equals(link, "ftp://", StringComparison.OrdinalIgnoreCase))
{
return false;
}
break;
case 't':
if (string.Equals(link, "tel", StringComparison.OrdinalIgnoreCase))
{
return false;
}
break;
case 'm':
int atIndex = link.IndexOf('@');
if (atIndex == -1 ||
atIndex == 7) // mailto:@ - no email part
{
return false;
}
domainOffset = atIndex + 1;
break;
}
// Do not need to check if a telephone number is a valid domain
if (c != 't' && !LinkHelper.IsValidDomain(link, domainOffset))
{
return false;
}
var inline = new LinkInline()
{
Span =
{
Start = processor.GetSourcePosition(startPosition, out int line, out int column),
},
Line = line,
Column = column,
Url = c == 'w' ? ((Options.UseHttpsForWWWLinks ? "https://" : "http://") + link) : link,
IsClosed = true,
IsAutoLink = true,
};
var skipFromBeginning = c == 'm' ? 7 : 0; // For mailto: skip "mailto:" for content
skipFromBeginning = c == 't' ? 4 : skipFromBeginning; // See above but for tel:
inline.Span.End = inline.Span.Start + link.Length - 1;
inline.UrlSpan = inline.Span;
inline.AppendChild(new LiteralInline()
{
Span = inline.Span,
Line = line,
Column = column,
Content = new StringSlice(slice.Text, startPosition + skipFromBeginning, startPosition + link.Length - 1),
IsClosed = true
});
processor.Inline = inline;
if (Options.OpenInNewWindow)
{
inline.GetAttributes().AddPropertyIfNotExist("target", "_blank");
}
return true;
return false;
}
finally
var inline = new LinkInline()
{
_listOfCharCache.Release(pendingEmphasis);
Span =
{
Start = processor.GetSourcePosition(startPosition, out int line, out int column),
},
Line = line,
Column = column,
Url = c == 'w' ? ((Options.UseHttpsForWWWLinks ? "https://" : "http://") + link) : link,
IsClosed = true,
IsAutoLink = true,
};
int skipFromBeginning = c switch
{
'm' => 7, // For mailto: skip "mailto:" for content
't' => 4, // Same but for tel:
_ => 0
};
inline.Span.End = inline.Span.Start + link.Length - 1;
inline.UrlSpan = inline.Span;
inline.AppendChild(new LiteralInline()
{
Span = inline.Span,
Line = line,
Column = column,
Content = new StringSlice(slice.Text, startPosition + skipFromBeginning, startPosition + link.Length - 1),
IsClosed = true
});
processor.Inline = inline;
if (Options.OpenInNewWindow)
{
inline.GetAttributes().AddPropertyIfNotExist("target", "_blank");
}
return true;
}
private bool IsAutoLinkValidInCurrentContext(InlineProcessor processor, List<char> pendingEmphasis)
private static bool IsAutoLinkValidInCurrentContext(InlineProcessor processor, ref ValueStringBuilder pendingEmphasis)
{
// Case where there is a pending HtmlInline <a>
var currentInline = processor.Inline;
@@ -257,9 +244,9 @@ public class AutoLinkParser : InlineParser
// Record all pending characters for emphasis
if (currentInline is EmphasisDelimiterInline emphasisDelimiter)
{
if (!pendingEmphasis.Contains(emphasisDelimiter.DelimiterChar))
if (!pendingEmphasis.AsSpan().Contains(emphasisDelimiter.DelimiterChar))
{
pendingEmphasis.Add(emphasisDelimiter.DelimiterChar);
pendingEmphasis.Append(emphasisDelimiter.DelimiterChar);
}
}
}
@@ -268,12 +255,4 @@ public class AutoLinkParser : InlineParser
return countBrackets <= 0;
}
private sealed class ListOfCharCache : DefaultObjectCache<List<char>>
{
protected override void Reset(List<char> instance)
{
instance.Clear();
}
}
}
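The `AutoLinkParser` hunk above drops the `string.IndexOf` scan of `Options.ValidPreviousCharacters`. In its place, the constructor builds a `SearchValues<char>` once, and each `Match` call then does a precomputed set lookup. The sketch below shows that pattern and assumes .NET 8 or later for `System.Buffers.SearchValues`. The character set and the `CanPrecedeAutoLink` helper are made up for illustration. `char.IsWhiteSpace` may also differ slightly from Markdig's own `IsWhiteSpaceOrZero`.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-in for AutoLinkOptions.ValidPreviousCharacters;
// the real default value is not shown in this diff.
const string ValidPreviousCharacters = "*_~(";

// Built once (the parser does this in its constructor) and reused for every lookup.
SearchValues<char> validPrevious = SearchValues.Create(ValidPreviousCharacters);

// Approximates the new precheck: '\0', whitespace, or a configured character
// may appear immediately before an autolink.
bool CanPrecedeAutoLink(char previous) =>
    previous == '\0' || char.IsWhiteSpace(previous) || validPrevious.Contains(previous);

Console.WriteLine(CanPrecedeAutoLink('(')); // True
Console.WriteLine(CanPrecedeAutoLink('a')); // False
```

The lookup cost no longer depends on how the set was written as a string, which matters here because the parser is triggered by common letters (`h`, `f`, `m`, `t`, `w`).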


@@ -105,13 +105,20 @@ public class DefinitionListParser : BlockParser
{
var index = previousParent.IndexOf(paragraphBlock) - 1;
if (index < 0) return null;
var lastBlock = previousParent[index];
if (lastBlock is BlankLineBlock)
switch (previousParent[index])
{
lastBlock = previousParent[index - 1];
previousParent.RemoveAt(index);
case DefinitionList definitionList:
return definitionList;
case BlankLineBlock:
if (index > 0 && previousParent[index - 1] is DefinitionList definitionList2)
{
previousParent.RemoveAt(index);
return definitionList2;
}
break;
}
return lastBlock as DefinitionList;
return null;
}
public override BlockState TryContinue(BlockProcessor processor, Block block)
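The rewritten lookup in `DefinitionListParser` avoids two problems in the old version. First, when the blank line was the first child (`index == 0`), the old code evaluated `previousParent[index - 1]`. Second, it removed the blank line even when no definition list preceded it. The sketch below models the new decision on a list of block-kind strings. The string labels and the `FindDefinitionListToJoin` name are illustrative, not Markdig types.

```csharp
using System;
using System.Collections.Generic;

// Returns the index of the definition list that a new term at paragraphIndex can join, or -1.
// Blocks are plain strings ("DefinitionList", "BlankLine", "Paragraph") for illustration only.
int FindDefinitionListToJoin(List<string> blocks, int paragraphIndex)
{
    int index = paragraphIndex - 1;
    if (index < 0) return -1;

    switch (blocks[index])
    {
        case "DefinitionList":
            return index;
        case "BlankLine":
            // Only drop the blank line when a definition list really sits before it,
            // and never look at index -1.
            if (index > 0 && blocks[index - 1] == "DefinitionList")
            {
                blocks.RemoveAt(index);
                return index - 1;
            }
            break;
    }
    return -1;
}

var blocks = new List<string> { "DefinitionList", "BlankLine", "Paragraph" };
Console.WriteLine(FindDefinitionListToJoin(blocks, 2)); // 0 (blank line removed)
```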


@@ -134,9 +134,6 @@ public class FootnoteParser : BlockParser
/// <param name="inline">The inline.</param>
private void Document_ProcessInlinesEnd(InlineProcessor state, Inline? inline)
{
// Unregister
state.Document.ProcessInlinesEnd -= Document_ProcessInlinesEnd;
var footnotes = (FootnoteGroup)state.Document.GetData(DocumentKey)!;
// Remove the footnotes from the document and re-add them at the end
state.Document.Remove(footnotes);


@@ -109,6 +109,15 @@ public class GenericAttributesParser : InlineParser
{
isValid = true;
line.SkipChar(); // skip }
// skip line breaks
if (line.CurrentChar == '\n')
{
line.SkipChar();
}
else if (line.CurrentChar == '\r' && line.PeekChar() == '\n')
{
line.Start += 2;
}
break;
}
@@ -124,7 +133,7 @@ public class GenericAttributesParser : InlineParser
var start = line.Start;
// Get all non-whitespace characters following a #
// But stop if we found a } or \0
while (c != '}' && c != '\0' && !c.IsWhitespace())
while (c != '}' && !c.IsWhiteSpaceOrZero())
{
c = line.NextChar();
}
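After the closing `}` of a generic attribute block, the parser now also consumes one immediately following line break, either `\n` or `\r\n`. A lone `\r` is left in place. The helper below models that rule on a plain string, with integer positions standing in for `StringSlice.Start`. `SkipClosingBrace` is a hypothetical name.

```csharp
using System;

// Returns the position just after the '}' at bracePos, also skipping one "\n" or "\r\n".
int SkipClosingBrace(string text, int bracePos)
{
    int pos = bracePos + 1; // skip '}'
    if (pos < text.Length && text[pos] == '\n')
    {
        pos++;
    }
    else if (pos + 1 < text.Length && text[pos] == '\r' && text[pos + 1] == '\n')
    {
        pos += 2;
    }
    return pos;
}

Console.WriteLine(SkipClosingBrace("{.note}\r\nText", 6)); // 9
```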


@@ -105,7 +105,7 @@ public class GlobalizationExtension : IMarkdownExtension
}
int rune = c;
if (CharHelper.IsHighSurrogate(c) && i < slice.End && CharHelper.IsLowSurrogate(slice[i + 1]))
if (char.IsHighSurrogate(c) && i < slice.End && char.IsLowSurrogate(slice[i + 1]))
{
Debug.Assert(char.IsSurrogatePair(c, slice[i + 1]));
rune = char.ConvertToUtf32(c, slice[i + 1]);
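The `GlobalizationExtension` hunk only swaps Markdig's `CharHelper` surrogate checks for the BCL's `char.IsHighSurrogate`/`char.IsLowSurrogate`; the decoding logic is unchanged. For reference, the pattern reads one code point and combines a valid surrogate pair with `char.ConvertToUtf32`. In this sketch, `end` is an inclusive last index, mirroring `StringSlice.End`. The `ReadCodePoint` helper is hypothetical.

```csharp
using System;

// Reads the code point at index i; end is the inclusive last index (like StringSlice.End).
// Returns the code point and how many UTF-16 chars it occupies.
(int CodePoint, int Length) ReadCodePoint(string s, int i, int end)
{
    char c = s[i];
    if (char.IsHighSurrogate(c) && i < end && char.IsLowSurrogate(s[i + 1]))
    {
        return (char.ConvertToUtf32(c, s[i + 1]), 2);
    }
    // BMP characters and unpaired surrogates are returned as-is.
    return (c, 1);
}

var (codePoint, length) = ReadCodePoint("a\uD83D\uDE00", 1, 2);
Console.WriteLine($"U+{codePoint:X} ({length} chars)"); // U+1F600 (2 chars)
```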


@@ -3,13 +3,14 @@
// See the license.txt file in the project root for more information.
using Markdig.Helpers;
using Markdig.Parsers;
namespace Markdig.Extensions.JiraLinks;
/// <summary>
/// Available options for replacing JIRA links
/// </summary>
public class JiraLinkOptions
public class JiraLinkOptions : LinkOptions
{
/// <summary>
/// The base Url (e.g. `https://mycompany.atlassian.net`)
@@ -21,11 +22,6 @@ public class JiraLinkOptions
/// </summary>
public string BasePath { get; set; }
/// <summary>
/// Should the link open in a new window when clicked
/// </summary>
public bool OpenInNewWindow { get; set; }
public JiraLinkOptions(string baseUrl)
{
OpenInNewWindow = true; //default


@@ -55,16 +55,15 @@ public class HostProviderBuilder
return new DelegateProvider(hostPrefix, handler, allowFullScreen, iframeClass);
}
internal static Dictionary<string, IHostProvider> KnownHosts { get; }
= new Dictionary<string, IHostProvider>(StringComparer.OrdinalIgnoreCase)
{
["YouTubeShort"] = Create("www.youtube.com", YouTubeShort, iframeClass: "youtubeshort"),
["YouTube"] = Create("www.youtube.com", YouTube, iframeClass: "youtube"),
["YouTubeShortened"] = Create("youtu.be", YouTubeShortened, iframeClass: "youtube"),
["Vimeo"] = Create("vimeo.com", Vimeo, iframeClass: "vimeo"),
["Yandex"] = Create("music.yandex.ru", Yandex, allowFullScreen: false, iframeClass: "yandex"),
["Odnoklassniki"] = Create("ok.ru", Odnoklassniki, iframeClass: "odnoklassniki"),
};
internal static readonly IHostProvider[] KnownHosts =
[
Create("www.youtube.com", YouTubeShort, iframeClass: "youtubeshort"),
Create("www.youtube.com", YouTube, iframeClass: "youtube"),
Create("youtu.be", YouTubeShortened, iframeClass: "youtube"),
Create("vimeo.com", Vimeo, iframeClass: "vimeo"),
Create("music.yandex.ru", Yandex, allowFullScreen: false, iframeClass: "yandex"),
Create("ok.ru", Odnoklassniki, iframeClass: "odnoklassniki"),
];
#region Known providers


@@ -81,7 +81,7 @@ public class MediaOptions
{".au", "audio/basic"},
{".wav", "audio/x-wav"},
};
Hosts = new List<IHostProvider>(HostProviderBuilder.KnownHosts.Values);
Hosts = new List<IHostProvider>(HostProviderBuilder.KnownHosts);
}
public string Width { get; set; }


@@ -36,16 +36,15 @@ public class SmartyPantsInlineParser : InlineParser, IPostInlineProcessor
// -- &ndash; 'ndash'
// --- — &mdash; 'mdash'
var pc = slice.PeekCharExtra(-1);
var c = slice.CurrentChar;
var openingChar = c;
var pc = slice.PeekRuneExtra(-1);
var openingChar = slice.CurrentChar;
var startingPosition = slice.Start;
// undefined first
var type = (SmartyPantType) 0;
switch (c)
switch (openingChar)
{
case '\'':
type = SmartyPantType.Quote; // We will resolve them at the end of parsing all inlines
@@ -93,9 +92,9 @@ public class SmartyPantsInlineParser : InlineParser, IPostInlineProcessor
}
// Skip char
c = slice.NextChar();
var next = slice.NextRune();
CharHelper.CheckOpenCloseDelimiter(pc, c, false, out bool canOpen, out bool canClose);
CharHelper.CheckOpenCloseDelimiter(pc, next, false, out bool canOpen, out bool canClose);
bool postProcess = false;
@@ -204,8 +203,6 @@ public class SmartyPantsInlineParser : InlineParser, IPostInlineProcessor
private void BlockOnProcessInlinesEnd(InlineProcessor processor, Inline? inline)
{
processor.Block!.ProcessInlinesEnd -= BlockOnProcessInlinesEnd;
var pants = (ListSmartyPants) processor.ParserStates[Index];
var openers = new Stack<Opener>(4);


@@ -1,10 +1,11 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
using Markdig.Helpers;
using Markdig.Parsers;
using Markdig.Syntax;
using System.Linq;
namespace Markdig.Extensions.Tables;
@@ -43,7 +44,7 @@ public class GridTableParser : BlockParser
}
// Parse a column alignment
if (!TableHelper.ParseColumnHeader(ref line, '-', out TableColumnAlign? columnAlign))
if (!TableHelper.ParseColumnHeader(ref line, '-', out TableColumnAlign? columnAlign, out _))
{
return BlockState.None;
}
@@ -60,7 +61,12 @@ public class GridTableParser : BlockParser
}
// Store the line (if we need later to build a ParagraphBlock because the GridTable was in fact invalid)
tableState.AddLine(ref processor.Line);
var table = new Table(this);
var table = new Table(this)
{
Line = processor.LineIndex,
Column = processor.Column,
Span = { Start = lineStart }
};
table.SetData(typeof(GridTableState), tableState);
// Calculate the total width of all columns
@@ -94,10 +100,12 @@ public class GridTableParser : BlockParser
tableState.AddLine(ref processor.Line);
if (processor.CurrentChar == '+')
{
gridTable.UpdateSpanEnd(processor.Line.End);
return HandleNewRow(processor, tableState, gridTable);
}
if (processor.CurrentChar == '|')
{
gridTable.UpdateSpanEnd(processor.Line.End);
return HandleContents(processor, tableState, gridTable);
}
TerminateCurrentRow(processor, tableState, gridTable, true);
@@ -135,6 +143,7 @@ public class GridTableParser : BlockParser
private static void SetRowSpanState(List<GridTableState.ColumnSlice> columns, StringSlice line, out bool isHeaderRow, out bool hasRowSpan)
{
var lineStart = line.Start;
var lineEnd = line.End;
isHeaderRow = line.PeekChar() == '=' || line.PeekChar(2) == '=';
hasRowSpan = false;
foreach (var columnSlice in columns)
@@ -142,7 +151,7 @@ public class GridTableParser : BlockParser
if (columnSlice.CurrentCell != null)
{
line.Start = lineStart + columnSlice.Start + 1;
line.End = lineStart + columnSlice.End - 1;
line.End = Math.Min(lineStart + columnSlice.End - 1, lineEnd);
line.Trim();
if (line.IsEmptyOrWhitespace() || !IsRowSeparator(line))
{
@@ -181,8 +190,18 @@ public class GridTableParser : BlockParser
var columnSlice = columns[i];
if (columnSlice.CurrentCell != null)
{
currentRow ??= new TableRow();
if (currentRow == null)
{
TableCell firstCell = columns.First(c => c.CurrentCell != null).CurrentCell!;
TableCell lastCell = columns.Last(c => c.CurrentCell != null).CurrentCell!;
currentRow ??= new TableRow()
{
Span = new SourceSpan(firstCell.Span.Start, lastCell.Span.End),
Line = firstCell.Line
};
}
// If this cell does not already belong to a row
if (columnSlice.CurrentCell.Parent is null)
{
@@ -270,7 +289,10 @@ public class GridTableParser : BlockParser
columnSlice.CurrentCell = new TableCell(this)
{
ColumnSpan = columnSlice.CurrentColumnSpan,
ColumnIndex = i
ColumnIndex = i,
Column = columnSlice.Start,
Line = processor.LineIndex,
Span = new SourceSpan(line.Start + columnSlice.Start, line.Start + columnSlice.End)
};
columnSlice.BlockProcessor ??= processor.CreateChild();
@@ -280,7 +302,8 @@ public class GridTableParser : BlockParser
}
// Process the content of the cell
columnSlice.BlockProcessor!.LineIndex = processor.LineIndex;
columnSlice.BlockProcessor.ProcessLine(sliceForCell);
columnSlice.BlockProcessor.ProcessLinePart(sliceForCell, sliceForCell.Start - line.Start);
}
// Go to next column
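The `Math.Min(..., lineEnd)` change in `SetRowSpanState` stops a column slice from running past the end of a row line that is shorter than the column layout, for example a row missing its trailing border. The sketch below is a simplified, string-based model of that clamping. `ColumnsFromSeparator` and `ReadCells` are hypothetical helpers, not Markdig's `GridTableState` API.

```csharp
using System;
using System.Collections.Generic;

// Column boundaries are the positions of consecutive '+' signs in a separator line.
List<(int Start, int End)> ColumnsFromSeparator(string separator)
{
    var columns = new List<(int Start, int End)>();
    int previous = separator.IndexOf('+');
    for (int i = previous + 1; i < separator.Length; i++)
    {
        if (separator[i] == '+')
        {
            columns.Add((previous, i));
            previous = i;
        }
    }
    return columns;
}

// Reads the trimmed text between each pair of boundaries, clamping the column end to
// the row's last index so a short row cannot push the slice out of range.
List<string> ReadCells(string row, List<(int Start, int End)> columns)
{
    var cells = new List<string>();
    int lineEnd = row.Length - 1;
    foreach (var (start, end) in columns)
    {
        int startIndex = start + 1;
        int endIndex = Math.Min(end - 1, lineEnd);
        cells.Add(startIndex <= endIndex
            ? row.Substring(startIndex, endIndex - startIndex + 1).Trim()
            : "");
    }
    return cells;
}

var layout = ColumnsFromSeparator("+-----+-----+");
Console.WriteLine(string.Join(", ", ReadCells("| ab  | cd", layout))); // ab, cd
```

Without the `Math.Min`, the second column of `"| ab  | cd"` would request characters past the end of the row.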


@@ -38,7 +38,7 @@ public class PipeTableExtension : IMarkdownExtension
var lineBreakParser = pipeline.InlineParsers.FindExact<LineBreakInlineParser>();
if (!pipeline.InlineParsers.Contains<PipeTableParser>())
{
pipeline.InlineParsers.InsertBefore<EmphasisInlineParser>(new PipeTableParser(lineBreakParser!, Options));
pipeline.InlineParsers.InsertAfter<EmphasisInlineParser>(new PipeTableParser(lineBreakParser!, Options));
}
}


@@ -33,4 +33,11 @@ public class PipeTableOptions
/// in all other rows (default behavior).
/// </summary>
public bool UseHeaderForColumnCount { get; set; }
/// <summary>
/// Gets or sets a value indicating whether column widths should be inferred based on the number of dashes
/// in the header separator row. Each column's width will be proportional to the dash count in its respective column.
/// </summary>
public bool InferColumnWidthsFromSeparator { get; set; }
}
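`InferColumnWidthsFromSeparator` derives each column's width from the number of dashes in the header separator row. In the parser changes, `FindHeaderRow` stores the per-column count in `TableColumnDefinition.Width` and accumulates `totalDelimiterCount`. The diff does not show how those counts become final widths, so the percentage normalization below is an assumption, as are the helper names.

```csharp
using System;
using System.Linq;

// Counts the '-' characters in each cell of a separator row such as "| :--- | ------: |".
int[] DashCounts(string separatorRow) =>
    separatorRow.Trim().Trim('|').Split('|')
        .Select(cell => cell.Count(ch => ch == '-'))
        .ToArray();

// Assumed normalization: each column gets its share of the total dash count, in percent.
double[] ProportionalWidths(int[] dashCounts)
{
    int total = dashCounts.Sum();
    return dashCounts.Select(n => total == 0 ? 0.0 : 100.0 * n / total).ToArray();
}

var widths = ProportionalWidths(DashCounts("|:---|------:|---|"));
Console.WriteLine(string.Join(" / ", widths)); // 25 / 50 / 25
```

Alignment colons are ignored by the count, so `:---` and `---` contribute the same width.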


@@ -3,7 +3,6 @@
// See the license.txt file in the project root for more information.
using System.Diagnostics;
using Markdig.Helpers;
using Markdig.Parsers;
using Markdig.Parsers.Inlines;
@@ -20,7 +19,7 @@ namespace Markdig.Extensions.Tables;
/// <seealso cref="IPostInlineProcessor" />
public class PipeTableParser : InlineParser, IPostInlineProcessor
{
private readonly LineBreakInlineParser lineBreakParser;
private readonly LineBreakInlineParser _lineBreakParser;
/// <summary>
/// Initializes a new instance of the <see cref="PipeTableParser" /> class.
@@ -29,7 +28,7 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
/// <param name="options">The options.</param>
public PipeTableParser(LineBreakInlineParser lineBreakParser, PipeTableOptions? options = null)
{
this.lineBreakParser = lineBreakParser ?? throw new ArgumentNullException(nameof(lineBreakParser));
_lineBreakParser = lineBreakParser ?? throw new ArgumentNullException(nameof(lineBreakParser));
OpeningCharacters = ['|', '\n', '\r'];
Options = options ?? new PipeTableOptions();
}
@@ -60,13 +59,12 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
if (tableState is null)
{
// A table could be preceded by an empty line or a line containing an inline
// that has not been added to the stack, so we consider this as a valid
// start for a table. Typically, with this, we can have an attributes {...}
// starting on the first line of a pipe table, even if the first line
// doesn't have a pipe
if (processor.Inline != null && (localLineIndex > 0 || c == '\n' || c == '\r'))
if (processor.Inline != null && (c == '\n' || c == '\r'))
{
return false;
}
@@ -75,6 +73,7 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
{
isFirstLineEmpty = true;
}
// Else setup a table processor
tableState = new TableState();
processor.ParserStates[Index] = tableState;
@@ -87,8 +86,7 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
tableState.IsInvalidTable = true;
}
tableState.LineHasPipe = false;
lineBreakParser.Match(processor, ref slice);
tableState.LineIndex++;
_lineBreakParser.Match(processor, ref slice);
if (!isFirstLineEmpty)
{
tableState.ColumnAndLineDelimiters.Add(processor.Inline!);
@@ -102,15 +100,11 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
Span = new SourceSpan(position, position),
Line = globalLineIndex,
Column = column,
LocalLineIndex = localLineIndex
LocalLineIndex = localLineIndex,
IsClosed = true // Creates flat sibling structure for O(n) traversal
};
var deltaLine = localLineIndex - tableState.LineIndex;
if (deltaLine > 0)
{
tableState.IsInvalidTable = true;
}
tableState.LineHasPipe = true;
tableState.LineIndex = localLineIndex;
slice.SkipChar(); // Skip the `|` character
tableState.ColumnAndLineDelimiters.Add(processor.Inline);
@@ -132,6 +126,8 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
return true;
}
// With flat structure, pipes are siblings at root level
// Walk backwards from the last child to find pipe delimiters
var child = container.LastChild;
List<PipeTableDelimiterInline>? delimitersToRemove = null;
@@ -149,8 +145,8 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
break;
}
var subContainer = child as ContainerInline;
child = subContainer?.LastChild;
// Walk siblings instead of descending into containers
child = child.PreviousSibling;
}
// If we have found any delimiters, transform them to literals
@@ -193,19 +189,36 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
// Remove previous state
state.ParserStates[Index] = null!;
// Continue
if (tableState is null || container is null || tableState.IsInvalidTable || !tableState.LineHasPipe ) //|| tableState.LineIndex != state.LocalLineIndex)
// Abort if not a valid table
if (tableState is null || container is null || tableState.IsInvalidTable || !tableState.LineHasPipe)
{
if (tableState is not null)
{
foreach (var inline in tableState.ColumnAndLineDelimiters)
{
if (inline is PipeTableDelimiterInline pipeDelimiter)
{
pipeDelimiter.ReplaceByLiteral();
}
}
}
return true;
}
// Detect the header row
var delimiters = tableState.ColumnAndLineDelimiters;
// TODO: we could optimize this by merging FindHeaderRow and the cell loop
var aligns = FindHeaderRow(delimiters);
if (Options.RequireHeaderSeparator && aligns is null)
{
// No valid header separator found - convert all pipe delimiters to literals
foreach (var inline in delimiters)
{
if (inline is PipeTableDelimiterInline pipeDelimiter)
{
pipeDelimiter.ReplaceByLiteral();
}
}
return true;
}
@@ -218,72 +231,43 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
attributes.CopyTo(table.GetAttributes());
}
state.BlockNew = table;
var cells = tableState.Cells;
cells.Clear();
//delimiters[0].DumpTo(state.DebugLog);
// Pipes may end up nested inside unmatched emphasis delimiters, e.g.:
// *a | b*|
// Promote them to root level so we have a flat sibling structure.
PromoteNestedPipesToRootLevel(delimiters, container);
// delimiters contain a list of `|` and `\n` delimiters
// The `|` delimiters are created as child containers.
// So the following:
// | a | b \n
// | d | e \n
// The inline tree is now flat: all pipes and line breaks are siblings at root level.
// For example, `| a | b \n| c | d \n` produces:
// [|] [a] [|] [b] [\n] [|] [c] [|] [d] [\n]
//
// Will generate a tree of the following node:
// |
// a
// |
// b
// \n
// |
// d
// |
// e
// \n
// When parsing delimiters, we need to recover whether a row is of the following form:
// 0) | a | b | \n
// 1) | a | b \n
// 2) a | b \n
// 3) a | b | \n
// Tables support four row formats:
// | a | b | (leading and trailing pipes)
// | a | b (leading pipe only)
// a | b (no leading or trailing pipes)
// a | b | (trailing pipe only)
// If the last element is not a line break, add a line break to homogenize parsing in the next loop
// Ensure the table ends with a line break to simplify row detection
var lastElement = delimiters[delimiters.Count - 1];
if (!(lastElement is LineBreakInline))
{
while (true)
// Find the actual last sibling (there may be content after the last delimiter)
while (lastElement.NextSibling != null)
{
if (lastElement is ContainerInline lastElementContainer)
{
var nextElement = lastElementContainer.LastChild;
if (nextElement != null)
{
lastElement = nextElement;
continue;
}
}
break;
lastElement = lastElement.NextSibling;
}
var endOfTable = new LineBreakInline();
// If the last element is a container, we have to add the EOL to its child
// otherwise only next sibling
if (lastElement is ContainerInline)
{
((ContainerInline)lastElement).AppendChild(endOfTable);
}
else
{
lastElement.InsertAfter(endOfTable);
}
lastElement.InsertAfter(endOfTable);
delimiters.Add(endOfTable);
tableState.EndOfLines.Add(endOfTable);
}
int lastPipePos = 0;
// Cell loop
// Reconstruct the table from the delimiters
// Build table rows and cells by iterating through delimiters
TableRow? row = null;
TableRow? firstRow = null;
for (int i = 0; i < delimiters.Count; i++)
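With the flattened inline tree, rows and cells can be recovered in one left-to-right pass over the siblings, which matches the O(n) traversal the PR describes. The sketch below models the four row formats listed above on a flat token list:

- `"|"` stands for a `PipeTableDelimiterInline`.
- `"\n"` stands for a line break.
- Any other string is cell content.

It illustrates the idea only. The parser's actual loop works on `Inline` nodes and also tracks source spans. `BuildRows` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Splits a flat token stream into rows of trimmed cell strings in a single pass.
// A pipe that starts a row is a leading pipe and is ignored; a row ending in a pipe
// produces no extra empty cell.
List<List<string>> BuildRows(IReadOnlyList<string> siblings)
{
    var tokens = new List<string>(siblings);
    // Like the parser, make sure the table ends with a line break.
    if (tokens.Count == 0 || tokens[tokens.Count - 1] != "\n")
    {
        tokens.Add("\n");
    }

    var rows = new List<List<string>>();
    var row = new List<string>();
    var cell = new List<string>();
    bool atRowStart = true; // previous sibling is null or a line break

    foreach (var token in tokens)
    {
        if (token == "|")
        {
            if (!atRowStart)
            {
                row.Add(string.Join("", cell).Trim()); // a non-leading pipe always closes a cell
            }
            cell.Clear();
            atRowStart = false;
        }
        else if (token == "\n")
        {
            var trailing = string.Join("", cell).Trim();
            if (trailing.Length > 0)
            {
                row.Add(trailing); // content after the last pipe, e.g. "a | b"
            }
            if (row.Count > 0)
            {
                rows.Add(row);
            }
            row = new List<string>();
            cell.Clear();
            atRowStart = true;
        }
        else
        {
            cell.Add(token);
            atRowStart = false;
        }
    }
    return rows;
}

foreach (var r in BuildRows(new[] { "|", " a ", "|", " b ", "|", "\n", "c ", "|", " d" }))
{
    Console.WriteLine(string.Join(" | ", r));
}
```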
@@ -298,9 +282,7 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
firstRow ??= row;
// If the first delimiter is a pipe and doesn't have any parent or previous sibling, for cases like:
// 0) | a | b | \n
// 1) | a | b \n
// Skip leading pipe at start of row (e.g., `| a | b` or `| a | b |`)
if (pipeSeparator != null && (delimiter.PreviousSibling is null || delimiter.PreviousSibling is LineBreakInline))
{
delimiter.Remove();
@@ -314,57 +296,37 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
}
}
// We need to find the beginning/ending of a cell from a right delimiter. From the delimiter 'x', we need to find a (without the delimiter start `|`)
// So we iterate back to the first pipe or line break
// x
// 1) | a | b \n
// 2) a | b \n
// Find cell content by walking backwards from this delimiter to the previous pipe or line break.
// For `| a | b \n` at delimiter 'x':
// [|] [a] [x] [b] [\n]
// ^--- current delimiter
// Walk back: [a] is the cell content (stop at [|])
Inline? endOfCell = null;
Inline? beginOfCell = null;
var cellContentIt = delimiter;
while (true)
var cellContentIt = delimiter.PreviousSibling;
while (cellContentIt != null)
{
cellContentIt = cellContentIt.PreviousSibling ?? cellContentIt.Parent;
if (cellContentIt is null || cellContentIt is LineBreakInline)
{
if (cellContentIt is LineBreakInline || cellContentIt is PipeTableDelimiterInline)
break;
}
// The cell begins at the first effective child after a | or the top ContainerInline (which is not necessary to bring into the tree + it contains an invalid span calculation)
if (cellContentIt is PipeTableDelimiterInline || (cellContentIt.GetType() == typeof(ContainerInline) && cellContentIt.Parent is null ))
{
beginOfCell = ((ContainerInline)cellContentIt).FirstChild;
if (endOfCell is null)
{
endOfCell = beginOfCell;
}
// Stop at the root ContainerInline (which is not necessary to bring into the tree + it contains an invalid span calculation)
if (cellContentIt.GetType() == typeof(ContainerInline) && cellContentIt.Parent is null)
break;
}
beginOfCell = cellContentIt;
if (endOfCell is null)
{
endOfCell = beginOfCell;
}
endOfCell ??= beginOfCell;
cellContentIt = cellContentIt.PreviousSibling;
}
// If the current deilimiter is a pipe `|` OR
// If the current delimiter is a pipe `|` OR
// the beginOfCell/endOfCell are not null and
// either they are :
// either they are:
// - different
// - they contain a single element, but it is not a line break (\n) or an empty/whitespace Literal.
// Then we can add a cell to the current row
if (!isLine || (beginOfCell != null && endOfCell != null && ( beginOfCell != endOfCell || !(beginOfCell is LineBreakInline || (beginOfCell is LiteralInline beingOfCellLiteral && beingOfCellLiteral.Content.IsEmptyOrWhitespace())))))
{
if (!isLine)
{
// If the delimiter is a pipe, we need to remove it from the tree
// so that previous loop looking for a parent will not go further on subsequent cells
delimiter.Remove();
lastPipePos = delimiter.Span.End;
}
// We trim whitespace at the beginning and ending of the cell
TrimStart(beginOfCell);
TrimEnd(endOfCell);
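Each cell is then located by walking `PreviousSibling` from its right-hand delimiter until the previous pipe or line break. The walk never descends into child containers, so each content sibling is visited once and the total work for the table stays linear. The sketch below reproduces that walk, with `LinkedList<string>` nodes standing in for `Inline` siblings. The `FindCell` name is hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Walks left from a delimiter node to the previous "|" or "\n" and returns the first
// and last content nodes of the cell; both are null when the cell is empty.
(LinkedListNode<string>? Begin, LinkedListNode<string>? End) FindCell(LinkedListNode<string> delimiter)
{
    LinkedListNode<string>? begin = null;
    LinkedListNode<string>? end = null;
    for (var it = delimiter.Previous; it != null; it = it.Previous)
    {
        if (it.Value is "|" or "\n")
        {
            break;
        }
        begin = it;  // keeps moving left
        end ??= it;  // fixed at the node right before the delimiter
    }
    return (begin, end);
}

var siblings = new LinkedList<string>(new[] { "|", "a", "*b*", "|", "c", "\n" });
var (first, last) = FindCell(siblings.FindLast("|")!);
Console.WriteLine($"{first?.Value} .. {last?.Value}"); // a .. *b*
```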
@@ -372,10 +334,20 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
var cellContainer = new ContainerInline();
// Copy elements from beginOfCell on the first level
// The pipe delimiter serves as a boundary - stop when we hit it
var cellIt = beginOfCell;
while (cellIt != null && !IsLine(cellIt) && !(cellIt is PipeTableDelimiterInline))
{
var nextSibling = cellIt.NextSibling;
// Skip empty literals (can result from trimming)
if (cellIt is LiteralInline { Content.IsEmpty: true })
{
cellIt.Remove();
cellIt = nextSibling;
continue;
}
cellIt.Remove();
if (cellContainer.Span.IsEmpty)
{
@@ -388,8 +360,16 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
cellIt = nextSibling;
}
if (!isLine)
{
// Remove the pipe delimiter AFTER copying cell content
// This preserves the sibling chain during the copy loop
delimiter.Remove();
lastPipePos = delimiter.Span.End;
}
// Create the cell and add it to the pending row
var tableParagraph = new ParagraphBlock()
var tableParagraph = new ParagraphBlock
{
Span = cellContainer.Span,
Line = cellContainer.Line,
@@ -441,8 +421,7 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
endOfLine.Remove();
}
// If we have a header row, we can remove it
// TODO: we could optimize this by merging FindHeaderRow and the previous loop
// Mark first row as header and remove the separator row if present
var tableRow = (TableRow)table[0];
tableRow.IsHeader = Options.RequireHeaderSeparator;
if (aligns != null)
@@ -452,11 +431,13 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
table.ColumnDefinitions.AddRange(aligns);
}
// Perform delimiter processor that are coming after this processor
// Perform all post-processors on cell content
// With InsertAfter, emphasis runs before pipe table, so we need to re-run from index 0
// to ensure emphasis delimiters in cells are properly matched
foreach (var cell in cells)
{
var paragraph = (ParagraphBlock) cell[0];
state.PostProcessInlines(postInlineProcessorIndex + 1, paragraph.Inline, null, true);
state.PostProcessInlines(0, paragraph.Inline, null, true);
if (paragraph.Inline?.LastChild is not null)
{
paragraph.Inline.Span.End = paragraph.Inline.LastChild.Span.End;
@@ -477,13 +458,43 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
table.NormalizeUsingMaxWidth();
}
if (state.Block is ParagraphBlock { Inline.FirstChild: not null } leadingParagraph)
{
// The table was preceded by a non-empty paragraph, e.g.
// ```md
// Some text
// | Header |
// ```
//
// Keep the paragraph as-is and insert the table after it.
// Since we've already processed all the inlines in this table block,
// we can't insert it while the parent is still being processed.
// Hook up a callback that inserts the table after we're done with ProcessInlines for the parent block.
// We've processed inlines in the table, but not the leading paragraph itself yet.
state.PostProcessInlines(0, leadingParagraph.Inline, null, isFinalProcessing: true);
ContainerBlock parent = leadingParagraph.Parent!;
parent.ProcessInlinesEnd += (_, _) =>
{
parent.Insert(parent.IndexOf(leadingParagraph) + 1, table);
};
}
else
{
// Nothing interesting in the existing block, just replace it.
state.BlockNew = table;
}
// We don't want to continue processing delimiters, as we are already processing them here
return false;
}
private static bool ParseHeaderString(Inline? inline, out TableColumnAlign? align)
private static bool ParseHeaderString(Inline? inline, out TableColumnAlign? align, out int delimiterCount)
{
align = 0;
delimiterCount = 0;
var literal = inline as LiteralInline;
if (literal is null)
{
@@ -492,7 +503,7 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
// Work on a copy of the slice
var line = literal.Content;
if (TableHelper.ParseColumnHeader(ref line, '-', out align))
if (TableHelper.ParseColumnHeader(ref line, '-', out align, out delimiterCount))
{
if (line.CurrentChar != '\0')
{
@@ -507,7 +518,8 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
private List<TableColumnDefinition>? FindHeaderRow(List<Inline> delimiters)
{
bool isValidRow = false;
List<TableColumnDefinition>? aligns = null;
int totalDelimiterCount = 0;
List<TableColumnDefinition>? columnDefinitions = null;
for (int i = 0; i < delimiters.Count; i++)
{
if (!IsLine(delimiters[i]))
@@ -515,7 +527,7 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
continue;
}
// The last delimiter is always null,
// Parse the separator row (second row) to extract column alignments
for (int j = i + 1; j < delimiters.Count; j++)
{
var delimiter = delimiters[j];
@@ -527,42 +539,44 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
continue;
}
// Check the left side of a `|` delimiter
// Parse the content before this delimiter as a column definition (e.g., `:---`, `---:`, `:---:`)
// Skip if previous sibling is a pipe (empty cell) or whitespace
TableColumnAlign? align = null;
int delimiterCount = 0;
if (delimiter.PreviousSibling != null &&
!(delimiter.PreviousSibling is LiteralInline li && li.Content.IsEmptyOrWhitespace()) && // ignore parsed whitespace
!ParseHeaderString(delimiter.PreviousSibling, out align))
!(delimiter.PreviousSibling is PipeTableDelimiterInline) &&
!(delimiter.PreviousSibling is LiteralInline li && li.Content.IsEmptyOrWhitespace()) &&
!ParseHeaderString(delimiter.PreviousSibling, out align, out delimiterCount))
{
break;
}
// Create aligns until we may have a header row
aligns ??= new List<TableColumnDefinition>();
columnDefinitions ??= new List<TableColumnDefinition>();
totalDelimiterCount += delimiterCount;
columnDefinitions.Add(new TableColumnDefinition() { Alignment = align, Width = delimiterCount});
aligns.Add(new TableColumnDefinition() { Alignment = align });
// If this is the last delimiter, we need to check the right side of the `|` delimiter
// If this is the last pipe, check for a trailing column definition (row without trailing pipe)
// e.g., `| :--- | ---:` has content after the last pipe
if (nextDelimiter is null)
{
var nextSibling = columnDelimiter != null
? columnDelimiter.FirstChild
: delimiter.NextSibling;
var nextSibling = delimiter.NextSibling;
// If there is no content after
// No trailing content means row ends with pipe: `| :--- |`
if (IsNullOrSpace(nextSibling))
{
isValidRow = true;
break;
}
if (!ParseHeaderString(nextSibling, out align))
if (!ParseHeaderString(nextSibling, out align, out delimiterCount))
{
break;
}
totalDelimiterCount += delimiterCount;
isValidRow = true;
aligns.Add(new TableColumnDefinition() { Alignment = align });
columnDefinitions.Add(new TableColumnDefinition() { Alignment = align, Width = delimiterCount});
break;
}
@@ -576,7 +590,27 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
break;
}
return isValidRow ? aligns : null;
// calculate the width of the columns in percent based on the delimiter count
if (!isValidRow || columnDefinitions == null)
{
return null;
}
if (Options.InferColumnWidthsFromSeparator)
{
foreach (var columnDefinition in columnDefinitions)
{
columnDefinition.Width = (columnDefinition.Width * 100) / totalDelimiterCount;
}
}
else
{
foreach (var columnDefinition in columnDefinitions)
{
columnDefinition.Width = 0;
}
}
return columnDefinitions;
}
private static bool IsLine(Inline inline)
@@ -610,9 +644,9 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
private static void TrimStart(Inline? inline)
{
while (inline is ContainerInline && !(inline is DelimiterInline))
while (inline is ContainerInline containerInline && !(containerInline is DelimiterInline))
{
inline = ((ContainerInline)inline).FirstChild;
inline = containerInline.FirstChild;
}
if (inline is LiteralInline literal)
@@ -623,6 +657,13 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
private static void TrimEnd(Inline? inline)
{
// Walk into containers to find the last leaf to trim
// Skip PipeTableDelimiterInline but walk into other containers (including emphasis)
while (inline is ContainerInline container && !(inline is PipeTableDelimiterInline))
{
inline = container.LastChild;
}
if (inline is LiteralInline literal)
{
literal.Content.TrimEnd();
@@ -643,14 +684,112 @@ public class PipeTableParser : InlineParser, IPostInlineProcessor
return false;
}
/// <summary>
/// Promotes nested pipe delimiters and line breaks to root level.
/// </summary>
/// <remarks>
/// Handles cases like `*a | b*|` where the pipe ends up inside an unmatched emphasis container.
/// After promotion, all delimiters become siblings at root level for consistent cell boundary detection.
/// </remarks>
private static void PromoteNestedPipesToRootLevel(List<Inline> delimiters, ContainerInline root)
{
for (int i = 0; i < delimiters.Count; i++)
{
var delimiter = delimiters[i];
// Handle both pipe delimiters and line breaks
bool isPipe = delimiter is PipeTableDelimiterInline;
bool isLineBreak = delimiter is LineBreakInline;
if (!isPipe && !isLineBreak)
continue;
// Skip if already at root level
if (delimiter.Parent == root)
continue;
// Find the top-level ancestor (direct child of root)
var ancestor = delimiter.Parent;
while (ancestor?.Parent != null && ancestor.Parent != root)
{
ancestor = ancestor.Parent;
}
if (ancestor is null || ancestor.Parent != root)
continue;
// Split: promote delimiter to be sibling of ancestor
SplitContainerAtDelimiter(delimiter, ancestor);
}
}
/// <summary>
/// Splits a container at the delimiter, promoting the delimiter to root level.
/// </summary>
/// <remarks>
/// For input `*a | b*`, the pipe is inside the emphasis container:
/// EmphasisDelimiter { "a", Pipe, "b" }
/// After splitting:
/// EmphasisDelimiter { "a" }, Pipe, Container { "b" }
/// </remarks>
private static void SplitContainerAtDelimiter(Inline delimiter, Inline ancestor)
{
if (delimiter.Parent is not { } parent) return;
// Collect content after the delimiter
var contentAfter = new List<Inline>();
var current = delimiter.NextSibling;
while (current != null)
{
contentAfter.Add(current);
current = current.NextSibling;
}
// Remove content after delimiter from parent
foreach (var inline in contentAfter)
{
inline.Remove();
}
// Remove delimiter from parent
delimiter.Remove();
// Insert delimiter after the ancestor (at root level)
ancestor.InsertAfter(delimiter);
// If there's content after, wrap in new container and insert after delimiter
if (contentAfter.Count > 0)
{
// Create new container matching the original parent type
var newContainer = CreateMatchingContainer(parent);
foreach (var inline in contentAfter)
{
newContainer.AppendChild(inline);
}
delimiter.InsertAfter(newContainer);
}
}
/// <summary>
/// Creates a container to wrap content split from the source container.
/// </summary>
private static ContainerInline CreateMatchingContainer(ContainerInline source)
{
// Emphasis processing runs before pipe table processing, so emphasis delimiters
// are already resolved. A plain ContainerInline suffices.
return new ContainerInline
{
Span = source.Span,
Line = source.Line,
Column = source.Column
};
}
private sealed class TableState
{
public bool IsInvalidTable { get; set; }
public bool LineHasPipe { get; set; }
public int LineIndex { get; set; }
public List<Inline> ColumnAndLineDelimiters { get; } = [];
public List<TableCell> Cells { get; } = [];
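When `Options.InferColumnWidthsFromSeparator` is enabled, `FindHeaderRow` turns each column's dash count into an integer percentage of the row total. The standalone sketch below reproduces only that arithmetic. `InferWidths` is a hypothetical helper, not a Markdig API. The zero-total guard is also an addition of this sketch, because the diff divides by `totalDelimiterCount` directly.

```csharp
using System;
using System.Linq;

// Hypothetical helper mirroring the percentage step in FindHeaderRow:
// width = (dash count of the column * 100) / (total dash count of the row), truncated.
static int[] InferWidths(int[] delimiterCounts, bool inferFromSeparator)
{
    int total = delimiterCounts.Sum();
    if (!inferFromSeparator || total == 0)
    {
        // Like the diff's "else" branch: every width is reset to 0 (no explicit width).
        return new int[delimiterCounts.Length];
    }
    return delimiterCounts.Select(count => count * 100 / total).ToArray();
}

// `|:---|------:|--|` has 3, 6 and 2 dashes (colons are not counted).
int[] widths = InferWidths(new[] { 3, 6, 2 }, inferFromSeparator: true);
Console.WriteLine(string.Join(", ", widths)); // 27, 54, 18
```

Because the division truncates, the percentages can add up to less than 100. In this example they sum to 99.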


@@ -17,12 +17,13 @@ public static class TableHelper
/// <param name="slice">The text slice.</param>
/// <param name="delimiterChar">The delimiter character (either `-` or `=`).</param>
/// <param name="align">The alignment of the column.</param>
/// <param name="delimiterCount">The number of delimiters.</param>
/// <returns>
/// <c>true</c> if parsing was successful
/// </returns>
public static bool ParseColumnHeader(ref StringSlice slice, char delimiterChar, out TableColumnAlign? align)
public static bool ParseColumnHeader(ref StringSlice slice, char delimiterChar, out TableColumnAlign? align, out int delimiterCount)
{
return ParseColumnHeaderDetect(ref slice, ref delimiterChar, out align);
return ParseColumnHeaderDetect(ref slice, ref delimiterChar, out align, out delimiterCount);
}
/// <summary>
@@ -37,7 +38,7 @@ public static class TableHelper
public static bool ParseColumnHeaderAuto(ref StringSlice slice, out char delimiterChar, out TableColumnAlign? align)
{
delimiterChar = '\0';
return ParseColumnHeaderDetect(ref slice, ref delimiterChar, out align);
return ParseColumnHeaderDetect(ref slice, ref delimiterChar, out align, out _);
}
/// <summary>
@@ -46,13 +47,14 @@ public static class TableHelper
/// <param name="slice">The text slice.</param>
/// <param name="delimiterChar">The delimiter character (either `-` or `=`). If `\0`, it will detect the character (either `-` or `=`)</param>
/// <param name="align">The alignment of the column.</param>
/// <param name="delimiterCount">The number of times <paramref name="delimiterChar"/> appeared in the column header.</param>
/// <returns>
/// <c>true</c> if parsing was successful
/// </returns>
public static bool ParseColumnHeaderDetect(ref StringSlice slice, ref char delimiterChar, out TableColumnAlign? align)
public static bool ParseColumnHeaderDetect(ref StringSlice slice, ref char delimiterChar, out TableColumnAlign? align, out int delimiterCount)
{
align = null;
delimiterCount = 0;
slice.TrimStart();
var c = slice.CurrentChar;
bool hasLeft = false;
@@ -80,7 +82,8 @@ public static class TableHelper
}
// We expect at least one `-` delimiter char
if (slice.CountAndSkipChar(delimiterChar) == 0)
delimiterCount = slice.CountAndSkipChar(delimiterChar);
if (delimiterCount == 0)
{
return false;
}
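`ParseColumnHeaderDetect` now also reports how many delimiter characters it skipped. That count is the value `FindHeaderRow` feeds into the width calculation. The sketch below is a simplified version for the `-` case only. It assumes a separator cell consists of an optional `:`, a run of at least one `-`, and an optional trailing `:`. `TryParseSeparatorCell` and its string alignment values are hypothetical stand-ins for `TableHelper` and `TableColumnAlign`.

```csharp
using System;

// Hypothetical, simplified take on ParseColumnHeaderDetect for '-' separators:
// reports the alignment implied by the colons and the number of '-' characters.
static bool TryParseSeparatorCell(string cell, out string? align, out int delimiterCount)
{
    align = null;
    delimiterCount = 0;
    string s = cell.Trim();
    int i = 0;
    bool hasLeft = i < s.Length && s[i] == ':';
    if (hasLeft) i++;
    // Count the run of '-' characters; at least one is required.
    while (i < s.Length && s[i] == '-') { delimiterCount++; i++; }
    if (delimiterCount == 0) return false;
    bool hasRight = i < s.Length && s[i] == ':';
    if (hasRight) i++;
    // Anything else left in the cell means it is not a separator.
    if (i != s.Length) { delimiterCount = 0; return false; }
    align = hasLeft && hasRight ? "Center" : hasLeft ? "Left" : hasRight ? "Right" : null;
    return true;
}

if (TryParseSeparatorCell(" :---: ", out var cellAlign, out var dashes))
    Console.WriteLine($"{cellAlign} {dashes}"); // Center 3
```

Colons set the alignment but are not included in the count, so `:---:` and `---` both contribute 3 to the row total.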


@@ -71,7 +71,7 @@ public class YamlFrontMatterParser : BlockParser
// If three dashes (optionally followed by whitespace)
// this is a YAML front matter block
if (count == 3 && (c == '\0' || c.IsWhitespace()) && line.TrimEnd())
if (count == 3 && c.IsWhiteSpaceOrZero() && line.TrimEnd())
{
bool hasFullYamlFrontMatter = false;
// We make sure that there is a closing frontmatter somewhere in the document
@@ -146,7 +146,7 @@ public class YamlFrontMatterParser : BlockParser
// If we have a closing fence, close it and discard the current line
// The line must contain only fence characters and optional following whitespace.
if (count == 3 && !processor.IsCodeIndent && (c == '\0' || c.IsWhitespace()) && line.TrimEnd())
if (count == 3 && !processor.IsCodeIndent && c.IsWhiteSpaceOrZero() && line.TrimEnd())
{
block.UpdateSpanEnd(line.Start - 1);


@@ -1,2 +1,3 @@
global using System;
global using System.Collections.Frozen;
global using System.Collections.Generic;


@@ -1,10 +1,12 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
using System.Buffers;
using System.Diagnostics;
using System.Globalization;
using System.Runtime.CompilerServices;
using System.Text;
namespace Markdig.Helpers;
@@ -19,29 +21,103 @@ public static class CharHelper
public const string ReplacementCharString = "\uFFFD";
private const char HighSurrogateStart = '\ud800';
private const char HighSurrogateEnd = '\udbff';
private const char LowSurrogateStart = '\udc00';
private const char LowSurrogateEnd = '\udfff';
private const string EmailUsernameSpecialChars = ".!#$%&'*+/=?^_`{|}~-+.~";
// We don't support LCDM
private static readonly Dictionary<char, int> romanMap = new Dictionary<char, int>(6) {
{ 'i', 1 }, { 'v', 5 }, { 'x', 10 },
{ 'I', 1 }, { 'V', 5 }, { 'X', 10 }
};
// 2.1 Characters and lines
// A Unicode whitespace character is any code point in the Unicode Zs general category,
// or a tab (U+0009), line feed (U+000A), form feed (U+000C), or carriage return (U+000D).
// CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;
private const string AsciiWhitespaceChars = "\t\n\f\r ";
internal const string WhitespaceChars = AsciiWhitespaceChars + "\u00A0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000";
// 2.1 Characters and lines
// An ASCII punctuation character is
// !, ", #, $, %, &, ', (, ), *, +, ,, -, ., / (U+0021–2F),
// :, ;, <, =, >, ?, @ (U+003A–0040),
// [, \, ], ^, _, ` (U+005B–0060),
// {, |, }, or ~ (U+007B–007E).
private const string AsciiPunctuationChars = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
// Unicode P (punctuation) categories.
private const int UnicodePunctuationCategoryMask =
1 << (int)UnicodeCategory.ConnectorPunctuation |
1 << (int)UnicodeCategory.DashPunctuation |
1 << (int)UnicodeCategory.OpenPunctuation |
1 << (int)UnicodeCategory.ClosePunctuation |
1 << (int)UnicodeCategory.InitialQuotePunctuation |
1 << (int)UnicodeCategory.FinalQuotePunctuation |
1 << (int)UnicodeCategory.OtherPunctuation;
private const int UnicodePunctuationOrSpaceCategoryMask =
UnicodePunctuationCategoryMask |
1 << (int)UnicodeCategory.SpaceSeparator;
// 2.1 Characters and lines
// A Unicode punctuation character is a character in the Unicode P (punctuation) or S (symbol) general categories.
private const int CommonMarkPunctuationCategoryMask =
UnicodePunctuationCategoryMask |
1 << (int)UnicodeCategory.MathSymbol |
1 << (int)UnicodeCategory.CurrencySymbol |
1 << (int)UnicodeCategory.ModifierSymbol |
1 << (int)UnicodeCategory.OtherSymbol;
// We're not currently using these SearchValues instances for vectorized IndexOfAny-like searches, but for their efficient single Contains(char) checks.
private static readonly SearchValues<char> s_emailUsernameSpecialChar = SearchValues.Create(EmailUsernameSpecialChars);
private static readonly SearchValues<char> s_emailUsernameSpecialCharOrDigit = SearchValues.Create(EmailUsernameSpecialChars + "0123456789");
private static readonly SearchValues<char> s_asciiPunctuationChars = SearchValues.Create(AsciiPunctuationChars);
private static readonly SearchValues<char> s_asciiPunctuationCharsOrZero = SearchValues.Create(AsciiPunctuationChars + '\0');
private static readonly SearchValues<char> s_asciiPunctuationOrWhitespaceCharsOrZero = SearchValues.Create(AsciiPunctuationChars + AsciiWhitespaceChars + '\0');
private static readonly SearchValues<char> s_escapableSymbolChars = SearchValues.Create("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~•");
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool IsPunctuationException(char c) =>
c is '' or '-' or '†' or '‡';
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool IsPunctuationException(Rune c) =>
c.IsBmp && IsPunctuationException((char)c.Value);
public static void CheckOpenCloseDelimiter(char pc, char c, bool enableWithinWord, out bool canOpen, out bool canClose)
{
pc.CheckUnicodeCategory(out bool prevIsWhiteSpace, out bool prevIsPunctuation);
c.CheckUnicodeCategory(out bool nextIsWhiteSpace, out bool nextIsPunctuation);
CheckOpenCloseDelimiter(
prevIsWhiteSpace,
prevIsPunctuation,
prevIsPunctuation && IsPunctuationException(pc),
nextIsWhiteSpace,
nextIsPunctuation,
nextIsPunctuation && IsPunctuationException(c),
enableWithinWord,
out canOpen,
out canClose);
}
var prevIsExcepted = prevIsPunctuation && IsPunctuationException(pc);
var nextIsExcepted = nextIsPunctuation && IsPunctuationException(c);
#if NET
public
#else
internal
#endif
static void CheckOpenCloseDelimiter(Rune pc, Rune c, bool enableWithinWord, out bool canOpen, out bool canClose)
{
pc.CheckUnicodeCategory(out bool prevIsWhiteSpace, out bool prevIsPunctuation);
c.CheckUnicodeCategory(out bool nextIsWhiteSpace, out bool nextIsPunctuation);
CheckOpenCloseDelimiter(
prevIsWhiteSpace,
prevIsPunctuation,
prevIsPunctuation && IsPunctuationException(pc),
nextIsWhiteSpace,
nextIsPunctuation,
nextIsPunctuation && IsPunctuationException(c),
enableWithinWord,
out canOpen,
out canClose);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static void CheckOpenCloseDelimiter(bool prevIsWhiteSpace, bool prevIsPunctuation, bool prevIsExcepted, bool nextIsWhiteSpace, bool nextIsPunctuation, bool nextIsExcepted, bool enableWithinWord, out bool canOpen, out bool canClose)
{
// A left-flanking delimiter run is a delimiter run that is
// (1) not followed by Unicode whitespace, and either
// (2a) not followed by a punctuation character or
@@ -62,13 +138,13 @@ public static class CharHelper
if (!enableWithinWord)
{
var temp = canOpen;
// A single _ character can open emphasis iff it is part of a left-flanking delimiter run and either
// (a) not part of a right-flanking delimiter run or
// A single _ character can open emphasis iff it is part of a left-flanking delimiter run and either
// (a) not part of a right-flanking delimiter run or
// (b) part of a right-flanking delimiter run preceded by punctuation.
canOpen = canOpen && (!canClose || prevIsPunctuation);
// A single _ character can close emphasis iff it is part of a right-flanking delimiter run and either
// (a) not part of a left-flanking delimiter run or
// (a) not part of a left-flanking delimiter run or
// (b) part of a left-flanking delimiter run followed by punctuation.
canClose = canClose && (!temp || nextIsPunctuation);
}
@@ -101,8 +177,8 @@ public static class CharHelper
int result = 0;
for (int i = 0; i < text.Length; i++)
{
var candidate = romanMap[text[i]];
if ((uint)(i + 1) < text.Length && candidate < romanMap[text[i + 1]])
int candidate = RomanToArabic(text[i]);
if ((uint)(i + 1) < text.Length && candidate < RomanToArabic(text[i + 1]))
{
result -= candidate;
}
@@ -112,6 +188,20 @@ public static class CharHelper
}
}
return result;
// We don't support LCDM
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static int RomanToArabic(char c)
{
Debug.Assert(IsRomanLetterPartial(c));
return (c | 0x20) switch
{
'i' => 1,
'v' => 5,
_ => 10
};
}
}
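The `romanMap` dictionary is replaced by a local `RomanToArabic` switch that folds case with `c | 0x20`. The self-contained sketch below applies the same subtractive rule. It assumes the input contains only `i`, `v` and `x` in either case, a precondition the diff checks with `Debug.Assert(IsRomanLetterPartial(c))`. `ParseRoman` is a hypothetical name.

```csharp
using System;

// Same mapping as the diff: OR-ing 0x20 turns 'I'/'V'/'X' into 'i'/'v'/'x',
// so a single switch handles both cases. Any other input falls into the default arm.
static int RomanToArabic(char c) => (c | 0x20) switch
{
    'i' => 1,
    'v' => 5,
    _ => 10, // 'x'
};

// Hypothetical driver: a letter smaller than its successor is subtracted (IV = 4, IX = 9).
static int ParseRoman(string text)
{
    int result = 0;
    for (int i = 0; i < text.Length; i++)
    {
        int candidate = RomanToArabic(text[i]);
        if (i + 1 < text.Length && candidate < RomanToArabic(text[i + 1]))
            result -= candidate;
        else
            result += candidate;
    }
    return result;
}

Console.WriteLine(ParseRoman("XIV"));  // 14
Console.WriteLine(ParseRoman("xxix")); // 29
```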
[MethodImpl(MethodImplOptions.AggressiveInlining)]
@@ -128,39 +218,73 @@ public static class CharHelper
return (column & (TabSize - 1)) != 0;
}
/// <summary>
/// <see langword="true"/> if the character is a <see href="https://spec.commonmark.org/0.31.2/#unicode-whitespace-character">Unicode whitespace character</see>.
/// </summary>
/// <param name="c">The character to evaluate.</param>
/// <returns><see langword="true"/> if the character is a Unicode whitespace character</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsWhitespace(this char c)
{
// 2.1 Characters and lines
// A Unicode whitespace character is any code point in the Unicode Zs general category,
// or a tab (U+0009), line feed (U+000A), form feed (U+000C), or carriage return (U+000D).
if (c <= ' ')
if (c < '\u00A0')
{
const long Mask =
(1L << ' ') |
(1L << '\t') |
(1L << '\n') |
(1L << '\f') |
(1L << '\r');
return (Mask & (1L << c)) != 0;
// Matches any of "\t\n\f\r ". See comments in HexConverter.IsHexChar for how these checks work:
// https://github.com/dotnet/runtime/blob/a2e1d21bb4faf914363968b812c990329ba92d8e/src/libraries/Common/src/System/HexConverter.cs#L392-L415
// https://gist.github.com/MihaZupan/b93ba180c2b5fbaaed993db2ade76b49
ulong shift = 30399299632234496UL << c;
ulong mask = (ulong)c - 64;
return (long)(shift & mask) < 0;
}
return c >= '\u00A0' && IsWhitespaceRare(c);
return IsWhitespaceRare(c);
}
static bool IsWhitespaceRare(char c)
/// <summary>
/// <see langword="true"/> if the character is a <see href="https://spec.commonmark.org/0.31.2/#unicode-whitespace-character">Unicode whitespace character</see>.
/// </summary>
/// <param name="r">The character to evaluate. A supplementary character is also accepted.</param>
/// <returns><see langword="true"/> if the character is a Unicode whitespace character</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
#if NET
public
#else
internal
#endif
static bool IsWhitespace(this Rune r) => r.IsBmp && IsWhitespace((char)r.Value);
// Note: there is no supplementary character whose Unicode category is Zs (at least as of Unicode 17).
// https://www.compart.com/en/unicode/category/Zs
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsWhiteSpaceOrZero(this char c)
{
if (c < '\u00A0')
{
// return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;
// Matches any of "\0\t\n\f\r ".
ulong shift = 9253771336487010304UL << c;
ulong mask = (ulong)c - 64;
return (long)(shift & mask) < 0;
}
if (c < 5760)
{
return c == '\u00A0';
}
else
{
return c <= 12288 &&
(c == 5760 || IsInInclusiveRange(c, 8192, 8202) || c == 8239 || c == 8287 || c == 12288);
}
return IsWhitespaceRare(c);
}
private static bool IsWhitespaceRare(char c)
{
Debug.Assert(c >= '\u00A0');
// return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;
if (c < 5760)
{
return c == '\u00A0';
}
else
{
return c <= 12288 &&
(c == 5760 || IsInInclusiveRange(c, 8192, 8202) || c == 8239 || c == 8287 || c == 12288);
}
}
@@ -174,16 +298,15 @@ public static class CharHelper
public static bool IsEscapableSymbol(this char c)
{
// char.IsSymbol also works with Unicode symbols that cannot be escaped based on the specification.
return (c > ' ' && c < '0') || (c > '9' && c < 'A') || (c > 'Z' && c < 'a') || (c > 'z' && c < 127) || c == '•';
return s_escapableSymbolChars.Contains(c);
}
//[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsWhiteSpaceOrZero(this char c)
{
return IsZero(c) || IsWhitespace(c);
}
// Check if a char is a space or a punctuation
/// <summary>
/// Checks the Unicode category of the given character and determines whether it is a whitespace or punctuation character.
/// </summary>
/// <param name="c">The character to check.</param>
/// <param name="space">Output parameter indicating whether the character is a whitespace character.</param>
/// <param name="punctuation">Output parameter indicating whether the character is a punctuation character.</param>
public static void CheckUnicodeCategory(this char c, out bool space, out bool punctuation)
{
if (IsWhitespace(c))
@@ -194,52 +317,76 @@ public static class CharHelper
else if (c <= 127)
{
space = c == '\0';
punctuation = c == '\0' || IsAsciiPunctuation(c);
punctuation = IsAsciiPunctuationOrZero(c);
}
else
{
// A Unicode punctuation character is an ASCII punctuation character
// or anything in the general Unicode categories Pc, Pd, Pe, Pf, Pi, Po, or Ps.
const int PunctuationCategoryMask =
1 << (int)UnicodeCategory.ConnectorPunctuation |
1 << (int)UnicodeCategory.DashPunctuation |
1 << (int)UnicodeCategory.OpenPunctuation |
1 << (int)UnicodeCategory.ClosePunctuation |
1 << (int)UnicodeCategory.InitialQuotePunctuation |
1 << (int)UnicodeCategory.FinalQuotePunctuation |
1 << (int)UnicodeCategory.OtherPunctuation;
space = false;
punctuation = (PunctuationCategoryMask & (1 << (int)CharUnicodeInfo.GetUnicodeCategory(c))) != 0;
punctuation = (CommonMarkPunctuationCategoryMask & (1 << (int)CharUnicodeInfo.GetUnicodeCategory(c))) != 0;
}
}
// Same as CheckUnicodeCategory
internal static bool IsSpaceOrPunctuation(this char c)
/// <summary>
/// Check if a character is a <see href="https://spec.commonmark.org/0.31.2/#unicode-whitespace-character">Unicode whitespace</see> or <see href="https://spec.commonmark.org/0.31.2/#unicode-punctuation-character">punctuation character</see>.
/// </summary>
/// <param name="r">The character to evaluate. A supplementary character is also accepted.</param>
/// <param name="space"><see langword="true"/> if the character is a <see href="https://spec.commonmark.org/0.31.2/#unicode-whitespace-character">Unicode whitespace character</see></param>
/// <param name="punctuation"><see langword="true"/> if the character is a <see href="https://spec.commonmark.org/0.31.2/#unicode-punctuation-character">Unicode punctuation character</see></param>
#if NET
public
#else
internal
#endif
static void CheckUnicodeCategory(this Rune r, out bool space, out bool punctuation)
{
if (IsWhitespace(c))
if (IsWhitespace(r))
{
return true;
space = true;
punctuation = false;
}
else if (c <= 127)
else if (r.Value <= 127)
{
return c == '\0' || IsAsciiPunctuation(c);
space = r.Value == 0;
punctuation = r.IsBmp && IsAsciiPunctuationOrZero((char)r.Value);
}
else
{
const int PunctuationCategoryMask =
1 << (int)UnicodeCategory.ConnectorPunctuation |
1 << (int)UnicodeCategory.DashPunctuation |
1 << (int)UnicodeCategory.OpenPunctuation |
1 << (int)UnicodeCategory.ClosePunctuation |
1 << (int)UnicodeCategory.InitialQuotePunctuation |
1 << (int)UnicodeCategory.FinalQuotePunctuation |
1 << (int)UnicodeCategory.OtherPunctuation;
return (PunctuationCategoryMask & (1 << (int)CharUnicodeInfo.GetUnicodeCategory(c))) != 0;
space = false;
punctuation = (CommonMarkPunctuationCategoryMask & (1 << (int)Rune.GetUnicodeCategory(r))) != 0;
}
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static bool IsSpaceOrPunctuationForGFMAutoLink(char c)
{
// GitHub Flavored Markdown's allowed set of domain characters differs from CommonMark's "punctuation" definition.
// CommonMark also counts symbols as punctuation, but GitHub will render e.g. http://☃.net as an autolink, despite
// the snowman emoji falling under the OtherSymbol (So) category.
if (c <= 127)
{
return s_asciiPunctuationOrWhitespaceCharsOrZero.Contains(c);
}
else
{
return NonAscii(c);
static bool NonAscii(char c) =>
(UnicodePunctuationOrSpaceCategoryMask & (1 << (int)CharUnicodeInfo.GetUnicodeCategory(c))) != 0;
}
}
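`CheckUnicodeCategory` now tests against `CommonMarkPunctuationCategoryMask`, which covers the P (punctuation) and S (symbol) categories defined by CommonMark 0.31.2. GFM autolink detection keeps a punctuation-plus-`SpaceSeparator` mask. The following sketch shows the one-bit-per-`UnicodeCategory` test these masks rely on. The constant names are local to the sketch, and it skips the ASCII fast paths that `CharHelper` uses.

```csharp
using System;
using System.Globalization;

// One bit per UnicodeCategory value (all values are below 32, so an int suffices).
const int PunctuationMask =
    1 << (int)UnicodeCategory.ConnectorPunctuation |
    1 << (int)UnicodeCategory.DashPunctuation |
    1 << (int)UnicodeCategory.OpenPunctuation |
    1 << (int)UnicodeCategory.ClosePunctuation |
    1 << (int)UnicodeCategory.InitialQuotePunctuation |
    1 << (int)UnicodeCategory.FinalQuotePunctuation |
    1 << (int)UnicodeCategory.OtherPunctuation;
// CommonMark also counts symbols (Sm, Sc, Sk, So) as punctuation.
const int CommonMarkMask = PunctuationMask |
    1 << (int)UnicodeCategory.MathSymbol |
    1 << (int)UnicodeCategory.CurrencySymbol |
    1 << (int)UnicodeCategory.ModifierSymbol |
    1 << (int)UnicodeCategory.OtherSymbol;
// GFM autolinks: punctuation or space separators, but not symbols.
const int GfmAutoLinkMask = PunctuationMask | 1 << (int)UnicodeCategory.SpaceSeparator;

static bool InMask(char c, int mask) =>
    (mask & (1 << (int)CharUnicodeInfo.GetUnicodeCategory(c))) != 0;

Console.WriteLine(InMask('☃', CommonMarkMask));  // True: U+2603 is OtherSymbol (So)
Console.WriteLine(InMask('☃', GfmAutoLinkMask)); // False: so http://☃.net can autolink
```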
// 6.5 Autolinks - https://spec.commonmark.org/0.31.2/#autolinks
// An absolute URI, for these purposes, consists of a scheme followed by a colon (:) followed by
// zero or more characters other than ASCII control characters, space, <, and >.
//
// 2.1 Characters and lines
// An ASCII control character is a character between U+0000–1F (both including) or U+007F.
internal static readonly SearchValues<char> InvalidAutoLinkCharacters = SearchValues.Create(
// 0 is excluded because it can be slightly more expensive for SearchValues to handle, and we've already removed it from the input text.
"\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008\u0009\u000A\u000B\u000C\u000D\u000E\u000F" +
"\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F" +
" <>\u007F");
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsNewLineOrLineFeed(this char c)
{
@@ -252,22 +399,37 @@ public static class CharHelper
return c == '\0';
}
/// <summary>
/// Returns <see langword="true"/> if the character is a <see href="https://spec.commonmark.org/0.31.2/#space">space</see> (U+0020).
/// </summary>
/// <param name="c">The character to evaluate</param>
/// <returns><see langword="true"/> if the character is a space</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsSpace(this char c)
{
// 2.1 Characters and lines
// 2.1 Characters and lines
// A space is U+0020.
return c == ' ';
}
/// <summary>
/// Returns <see langword="true"/> if the character is a <see href="https://spec.commonmark.org/0.31.2/#tab">tab</see> (U+0009).
/// </summary>
/// <param name="c">The character to evaluate</param>
/// <returns><see langword="true"/> if the character is a tab</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsTab(this char c)
{
// 2.1 Characters and lines
// 2.1 Characters and lines
// A tab is U+0009.
return c == '\t';
}
/// <summary>
/// Returns <see langword="true"/> if the character is a <see href="https://spec.commonmark.org/0.31.2/#space">space</see> (U+0020) or <see href="https://spec.commonmark.org/0.31.2/#tab">tab</see> (U+0009).
/// </summary>
/// <param name="c">The character to evaluate.</param>
/// <returns><see langword="true"/> if the character is a space or tab</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsSpaceOrTab(this char c)
{
@@ -279,7 +441,7 @@ public static class CharHelper
{
// 2.3 Insecure characters
// For security reasons, the Unicode character U+0000 must be replaced with the REPLACEMENT CHARACTER (U+FFFD).
return c == '\0' ? '\ufffd' : c;
return c == '\0' ? ReplacementChar : c;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
@@ -306,46 +468,33 @@ public static class CharHelper
return (uint)(c - '0') <= ('9' - '0');
}
public static bool IsAsciiPunctuation(this char c)
{
// 2.1 Characters and lines
// An ASCII punctuation character is
// !, ", #, $, %, &, ', (, ), *, +, ,, -, ., / (U+0021–2F),
// :, ;, <, =, >, ?, @ (U+003A–0040),
// [, \, ], ^, _, ` (U+005B–0060),
// {, |, }, or ~ (U+007B–007E).
return c <= 127 && (
IsInInclusiveRange(c, 33, 47) ||
IsInInclusiveRange(c, 58, 64) ||
IsInInclusiveRange(c, 91, 96) ||
IsInInclusiveRange(c, 123, 126));
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static bool IsAsciiPunctuationOrZero(this char c) =>
s_asciiPunctuationCharsOrZero.Contains(c);
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsEmailUsernameSpecialChar(char c)
{
return ".!#$%&'*+/=?^_`{|}~-+.~".IndexOf(c) >= 0;
}
public static bool IsAsciiPunctuation(this char c) =>
s_asciiPunctuationChars.Contains(c);
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsHighSurrogate(char c)
{
return IsInInclusiveRange(c, HighSurrogateStart, HighSurrogateEnd);
}
public static bool IsEmailUsernameSpecialChar(char c) =>
s_emailUsernameSpecialChar.Contains(c);
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsLowSurrogate(char c)
{
return IsInInclusiveRange(c, LowSurrogateStart, LowSurrogateEnd);
}
internal static bool IsEmailUsernameSpecialCharOrDigit(char c) =>
s_emailUsernameSpecialCharOrDigit.Contains(c);
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool IsInInclusiveRange(char c, char min, char max)
=> (uint)(c - min) <= (uint)(max - min);
public static bool IsHighSurrogate(char c) =>
char.IsHighSurrogate(c);
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static bool IsInInclusiveRange(int value, uint min, uint max)
=> ((uint)value - min) <= (max - min);
public static bool IsLowSurrogate(char c) =>
char.IsLowSurrogate(c);
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static bool IsInInclusiveRange(int value, uint min, uint max) =>
((uint)value - min) <= (max - min);
public static bool IsRightToLeft(int c)
{
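The new `IsWhitespace` and `IsWhiteSpaceOrZero` fast paths replace explicit comparisons with a 64-bit lookup constant, following the `HexConverter.IsHexChar` technique linked in the comments. This sketch derives the constants from the character sets instead of hard-coding them, which makes the magic numbers checkable. `BuildBitmap` and `IsInBitmap` are hypothetical names. The `unchecked` wrappers are added so the sketch also behaves correctly when a project enables checked arithmetic.

```csharp
using System;

// For each char c (< 64) in the set, set bit (63 - c). Shifting the constant left
// by c then moves that char's bit into the sign position (bit 63).
static ulong BuildBitmap(string chars)
{
    ulong bitmap = 0;
    foreach (char c in chars)
    {
        if (c >= 64) throw new ArgumentException("Only chars below 64 fit in the bitmap.");
        bitmap |= 1UL << (63 - c);
    }
    return bitmap;
}

static bool IsInBitmap(char c, ulong bitmap)
{
    // C# masks a ulong shift count to 6 bits, so for c >= 64 the shift wraps around...
    ulong shift = bitmap << c;
    // ...but (ulong)c - 64 has its top bit set only when c < 64 (unsigned underflow),
    // which discards the wrapped result for larger chars.
    ulong mask = unchecked((ulong)c - 64);
    return unchecked((long)(shift & mask)) < 0;
}

ulong whitespace = BuildBitmap("\t\n\f\r ");
ulong whitespaceOrZero = BuildBitmap("\0\t\n\f\r ");
Console.WriteLine(whitespace == 30399299632234496UL);         // True
Console.WriteLine(whitespaceOrZero == 9253771336487010304UL); // True
Console.WriteLine(IsInBitmap('\f', whitespace));              // True
Console.WriteLine(IsInBitmap('A', whitespace));               // False
```

The whole check is one shift, one subtraction and one AND, with no branches. That is why it only covers `c < '\u00A0'`, and rarer code points fall through to `IsWhitespaceRare`.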


@@ -20,7 +20,7 @@ public static class CharNormalizer
}
// This table was generated by the app UnicodeNormDApp
private static readonly Dictionary<char, string> CodeToAscii = new(1269)
private static readonly FrozenDictionary<char, string> CodeToAscii = new Dictionary<char, string>(1269)
{
{'Ḋ', "D"},
{'Ḍ', "D"},
@@ -1291,5 +1291,5 @@ public static class CharNormalizer
{'', "|"},
{'', "}"},
{'', "~"},
};
}.ToFrozenDictionary();
}


@@ -4,7 +4,6 @@
using System.Buffers;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
namespace Markdig.Helpers;
@@ -17,7 +16,7 @@ public sealed class CharacterMap<T> where T : class
{
private readonly SearchValues<char> _values;
private readonly T[] _asciiMap;
private readonly Dictionary<uint, T>? _nonAsciiMap;
private readonly FrozenDictionary<uint, T>? _nonAsciiMap;
/// <summary>
/// Initializes a new instance of the <see cref="CharacterMap{T}"/> class.
@@ -39,6 +38,7 @@ public sealed class CharacterMap<T> where T : class
Array.Sort(OpeningCharacters);
_asciiMap = new T[128];
Dictionary<uint, T>? nonAsciiMap = null;
foreach (var state in maps)
{
@@ -49,16 +49,18 @@ public sealed class CharacterMap<T> where T : class
}
else
{
_nonAsciiMap ??= new Dictionary<uint, T>();
nonAsciiMap ??= [];
if (!_nonAsciiMap.ContainsKey(openingChar))
{
_nonAsciiMap[openingChar] = state.Value;
}
nonAsciiMap.TryAdd(openingChar, state.Value);
}
}
_values = SearchValues.Create(OpeningCharacters);
if (nonAsciiMap is not null)
{
_nonAsciiMap = nonAsciiMap.ToFrozenDictionary();
}
}
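`CharacterMap<T>` now fills a local `Dictionary<uint, T>` using `TryAdd`. As with the previous `ContainsKey` check, the first entry for a character wins. The dictionary is then frozen once with `ToFrozenDictionary()`, and `CharNormalizer.CodeToAscii` uses the same freeze step. This is a minimal sketch of that build-then-freeze pattern. It assumes .NET 8 or later, where `System.Collections.Frozen` is available. `BuildMap` is a hypothetical name.

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;

// Build with a mutable Dictionary, then convert once to an immutable,
// lookup-optimized FrozenDictionary.
static FrozenDictionary<uint, string> BuildMap(IEnumerable<(char Key, string Value)> entries)
{
    var builder = new Dictionary<uint, string>();
    foreach (var (key, value) in entries)
    {
        // TryAdd returns false and keeps the existing value when the key is already present.
        builder.TryAdd(key, value);
    }
    return builder.ToFrozenDictionary();
}

var map = BuildMap(new[] { ('é', "first"), ('ü', "umlaut"), ('é', "second") });
Console.WriteLine(map['é']);  // first
Console.WriteLine(map.Count); // 2
```

Freezing costs extra work at construction time. That trade-off suits maps like these, which are built once and then queried for every parsed character.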
/// <summary>


@@ -1,7 +1,8 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Runtime.CompilerServices;
@@ -393,40 +394,52 @@ public static class HtmlHelper
private static bool TryParseHtmlTagHtmlComment(ref StringSlice text, ref ValueStringBuilder builder)
{
// https://spec.commonmark.org/0.31.2/#raw-html
// An HTML comment consists of <!-->, <!--->, or
// <!--, a string of characters not including the string -->, and -->.
// The caller already checked <!-
Debug.Assert(text.CurrentChar == '-' && text.PeekCharExtra(-1) == '!' && text.PeekCharExtra(-2) == '<');
var c = text.NextChar();
if (c != '-')
{
return false;
}
builder.Append('-');
builder.Append('-');
if (text.PeekChar() == '>')
c = text.NextChar();
if (c == '>')
{
// <!--> is considered valid.
builder.Append("-->");
text.SkipChar();
return true;
}
if (c == '-' && text.PeekChar() == '>')
{
// <!---> is also considered valid.
builder.Append("--->");
text.SkipChar();
text.SkipChar();
return true;
}
ReadOnlySpan<char> slice = text.AsSpan();
const string EndOfComment = "-->";
int endOfComment = slice.IndexOf(EndOfComment.AsSpan(), StringComparison.Ordinal);
if (endOfComment < 0)
{
return false;
}
var countHyphen = 0;
while (true)
{
c = text.NextChar();
if (c == '\0')
{
return false;
}
if (countHyphen == 2)
{
if (c == '>')
{
builder.Append('>');
text.SkipChar();
return true;
}
return false;
}
countHyphen = c == '-' ? countHyphen + 1 : 0;
builder.Append(c);
}
builder.Append("--");
builder.Append(slice.Slice(0, endOfComment + EndOfComment.Length));
text.Start += endOfComment + EndOfComment.Length;
return true;
}
private static bool TryParseHtmlTagProcessingInstruction(ref StringSlice text, ref ValueStringBuilder builder)
@@ -461,7 +474,7 @@ public static class HtmlHelper
public static string Unescape(string? text, bool removeBackSlash = true)
{
// Credits: code from CommonMark.NET
// Copyright (c) 2014, Kārlis Gaņģis All rights reserved.
// Copyright (c) 2014, Kārlis Gaņģis All rights reserved.
// See license for details: https://github.com/Knagis/CommonMark.NET/blob/master/LICENSE.md
if (string.IsNullOrEmpty(text))
{
@@ -540,7 +553,7 @@ public static class HtmlHelper
public static int ScanEntity<T>(T slice, out int numericEntity, out int namedEntityStart, out int namedEntityLength) where T : ICharIterator
{
// Credits: code from CommonMark.NET
// Copyright (c) 2014, Kārlis Gaņģis All rights reserved.
// Copyright (c) 2014, Kārlis Gaņģis All rights reserved.
// See license for details: https://github.com/Knagis/CommonMark.NET/blob/master/LICENSE.md
numericEntity = 0;
@@ -555,7 +568,7 @@ public static class HtmlHelper
var start = slice.Start;
char c = slice.NextChar();
int counter = 0;
if (c == '#')
{
c = slice.PeekChar();
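
The new `TryParseHtmlTagHtmlComment` follows the CommonMark 0.31.2 rule quoted in its comment: `<!-->`, `<!--->`, or `<!--` followed by anything up to the first `-->`. A string-based sketch of the same rule (the `MatchHtmlComment` name is hypothetical) makes it easy to see that `--` inside the body is now allowed:

```csharp
using System;

// Returns the length of the HTML comment at the start of `s`, or -1 if there is none.
int MatchHtmlComment(string s)
{
    if (!s.StartsWith("<!--", StringComparison.Ordinal)) return -1;
    if (s.StartsWith("<!-->", StringComparison.Ordinal)) return 5;   // degenerate but valid
    if (s.StartsWith("<!--->", StringComparison.Ordinal)) return 6;  // also valid
    // Otherwise the comment runs to the first "-->" after the opening "<!--".
    int end = s.IndexOf("-->", 4, StringComparison.Ordinal);
    return end < 0 ? -1 : end + 3;
}
```

Like the diff, this does a single ordinal search for `-->` instead of counting hyphens character by character.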


@@ -1,10 +1,14 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
using System.Diagnostics.CodeAnalysis;
using System.Runtime.CompilerServices;
using Markdig.Syntax;
using System.Buffers;
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Globalization;
using System.Runtime.CompilerServices;
using System.Text;
namespace Markdig.Helpers;
@@ -28,13 +32,40 @@ public static class LinkHelper
var headingBuffer = new ValueStringBuilder(stackalloc char[ValueStringBuilder.StackallocThreshold]);
bool hasLetter = keepOpeningDigits && headingText.Length > 0 && char.IsLetterOrDigit(headingText[0]);
bool previousIsSpace = false;
for (int i = 0; i < headingText.Length; i++)
// First normalize the string to decompose characters if allowOnlyAscii is true
string normalizedString = string.Empty;
if (allowOnlyAscii)
{
var c = headingText[i];
var normalized = allowOnlyAscii ? CharNormalizer.ConvertToAscii(c) : null;
for (int j = 0; j < (normalized?.Length ?? 1); j++)
normalizedString = headingText.ToString().Normalize(NormalizationForm.FormD);
}
var textToProcess = string.IsNullOrEmpty(normalizedString) ? headingText : normalizedString.AsSpan();
for (int i = 0; i < textToProcess.Length; i++)
{
var c = textToProcess[i];
// Skip combining diacritical marks when normalized
if (allowOnlyAscii && CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
{
if (normalized != null)
continue;
}
// Handle German umlauts and Nordic characters explicitly (NFD either leaves them unchanged, e.g. ø, æ, ß, or reduces them to the base letter, e.g. ä -> a)
ReadOnlySpan<char> normalized;
if (IsSpecialScandinavianOrGermanChar(c))
{
normalized = NormalizeScandinavianOrGermanChar(c);
}
else
{
normalized = allowOnlyAscii ? CharNormalizer.ConvertToAscii(c) : ReadOnlySpan<char>.Empty;
}
for (int j = 0; j < (normalized.Length < 1 ? 1 : normalized.Length); j++)
{
if (!normalized.IsEmpty)
{
c = normalized[j];
}
@@ -99,6 +130,50 @@ public static class LinkHelper
return headingBuffer.ToString();
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool IsSpecialScandinavianOrGermanChar(char c)
{
// German umlauts and ß
// Norwegian/Danish/Swedish æ, ø, å
// Icelandic þ (thorn), ð (eth)
return c == 'ä' || c == 'ö' || c == 'ü' ||
c == 'Ä' || c == 'Ö' || c == 'Ü' ||
c == 'ß' ||
c == 'æ' || c == 'ø' || c == 'å' ||
c == 'Æ' || c == 'Ø' || c == 'Å' ||
c == 'þ' || c == 'ð' ||
c == 'Þ' || c == 'Ð';
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static ReadOnlySpan<char> NormalizeScandinavianOrGermanChar(char c)
{
return c switch
{
// German
'ä' => "ae",
'ö' => "oe",
'ü' => "ue",
'Ä' => "Ae",
'Ö' => "Oe",
'Ü' => "Ue",
'ß' => "ss",
// Norwegian/Danish/Swedish
'æ' => "ae",
'ø' => "oe",
'å' => "aa",
'Æ' => "Ae",
'Ø' => "Oe",
'Å' => "Aa",
// Icelandic
'þ' => "th",
'Þ' => "Th",
'ð' => "d",
'Ð' => "D",
_ => ReadOnlySpan<char>.Empty
};
}
public static string UrilizeAsGfm(string headingText)
{
return UrilizeAsGfm(headingText.AsSpan());
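
A simplified sketch of the transliteration idea behind `IsSpecialScandinavianOrGermanChar` and `NormalizeScandinavianOrGermanChar`, using a hypothetical `ToAsciiSlugText` helper. It checks the explicit mappings per character *before* decomposing. If the hunk is read as shown, the diff applies `FormD` to the whole heading first, which already turns `ä` into `a` + U+0308 (and `å` into `a` + U+030A) before the special-case check runs, so only the non-decomposing letters (ø, æ, ß, þ, ð) would reach the explicit mapping when `allowOnlyAscii` is true.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: explicit transliterations first, then NFD with combining marks dropped.
string ToAsciiSlugText(string input)
{
    var sb = new StringBuilder();
    foreach (char c in input)
    {
        string? special = c switch
        {
            'ä' => "ae", 'ö' => "oe", 'ü' => "ue", 'ß' => "ss",
            'Ä' => "Ae", 'Ö' => "Oe", 'Ü' => "Ue",
            'æ' => "ae", 'ø' => "oe", 'å' => "aa",
            'Æ' => "Ae", 'Ø' => "Oe", 'Å' => "Aa",
            'þ' => "th", 'Þ' => "Th", 'ð' => "d", 'Ð' => "D",
            _ => null,
        };
        if (special != null)
        {
            sb.Append(special);
            continue;
        }
        // Decompose (e.g. 'é' -> 'e' + U+0301) and keep everything except combining marks.
        foreach (char d in c.ToString().Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(d) != UnicodeCategory.NonSpacingMark)
                sb.Append(d);
        }
    }
    return sb.ToString();
}
```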
@@ -140,13 +215,13 @@ public static class LinkHelper
return false;
}
// An absolute URI, for these purposes, consists of a scheme followed by a colon (:)
// followed by zero or more characters other than ASCII whitespace and control characters, <, and >.
// An absolute URI, for these purposes, consists of a scheme followed by a colon (:)
// followed by zero or more characters other than ASCII whitespace and control characters, <, and >.
// If the URI includes these characters, they must be percent-encoded (e.g. %20 for a space).
// A URI that would end with a full stop (.) is treated instead as ending immediately before the full stop.
// a scheme is any sequence of 2–32 characters
// beginning with an ASCII letter
// a scheme is any sequence of 2–32 characters
// beginning with an ASCII letter
// and followed by any combination of ASCII letters, digits, or the symbols plus (”+”), period (”.”), or hyphen (”-”).
// An email address, for these purposes, is anything that matches the non-normative regex from the HTML5 spec:
@@ -166,7 +241,7 @@ public static class LinkHelper
if (!c.IsAlpha())
{
// We may have an email char?
if (c.IsDigit() || CharHelper.IsEmailUsernameSpecialChar(c))
if (CharHelper.IsEmailUsernameSpecialCharOrDigit(c))
{
state = -1;
}
@@ -201,7 +276,7 @@ public static class LinkHelper
if (isValidChar)
{
// a scheme is any sequence of 2–32 characters
// a scheme is any sequence of 2–32 characters
if (state > 0 && builder.Length >= 32)
{
goto ReturnFalse;
@@ -216,7 +291,8 @@ public static class LinkHelper
}
state = 1;
break;
} else if (c == '@')
}
else if (c == '@')
{
if (state > 0)
{
@@ -231,8 +307,8 @@ public static class LinkHelper
}
}
// append ':' or '@'
builder.Append(c);
// append ':' or '@'
builder.Append(c);
if (state < 0)
{
@@ -286,40 +362,34 @@ public static class LinkHelper
}
else
{
// scan an uri
// An absolute URI, for these purposes, consists of a scheme followed by a colon (:)
// followed by zero or more characters other than ASCII whitespace and control characters, <, and >.
// 6.5 Autolinks - https://spec.commonmark.org/0.31.2/#autolinks
// An absolute URI, for these purposes, consists of a scheme followed by a colon (:) followed by
// zero or more characters other than ASCII control characters, space, <, and >.
// If the URI includes these characters, they must be percent-encoded (e.g. %20 for a space).
//
// 2.1 Characters and lines
// An ASCII control character is a character between U+0000–1F (both including) or U+007F.
while (true)
text.SkipChar();
ReadOnlySpan<char> slice = text.AsSpan();
Debug.Assert(!slice.Contains('\0'));
// This set of invalid characters includes '>'.
int end = slice.IndexOfAny(CharHelper.InvalidAutoLinkCharacters);
if ((uint)end < (uint)slice.Length && slice[end] == '>')
{
c = text.NextChar();
if (c == '\0')
{
break;
}
if (c == '>')
{
text.SkipChar();
link = builder.ToString();
return true;
}
// Chars valid for both scheme and email
if (c <= 127)
{
if (c > ' ' && c != '>')
{
builder.Append(c);
}
else break;
}
else if (!c.IsSpaceOrPunctuation())
{
builder.Append(c);
}
else break;
// We've found '>' and all characters before it are valid.
#if NET
link = string.Concat(builder.AsSpan(), slice.Slice(0, end));
builder.Dispose();
#else
builder.Append(slice.Slice(0, end));
link = builder.ToString();
#endif
text.Start += end + 1; // +1 to skip '>'
return true;
}
}
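
The rewritten URI branch of the autolink scanner replaces the per-character loop with one `IndexOfAny` over a `SearchValues<char>` set and accepts the link only if the first disallowed character is `>`. The sketch below assumes that `CharHelper.InvalidAutoLinkCharacters` holds the CommonMark 0.31.2 exclusions (ASCII controls U+0000 to U+001F and U+007F, space, `<`, `>`); the actual contents of that set are not shown in this diff, and `ScanUriBody` is a hypothetical name.

```csharp
using System;
using System.Buffers;
using System.Linq;

// Assumed contents of the invalid set: ASCII controls, DEL, space, '<' and '>'.
char[] invalidChars = Enumerable.Range(0, 0x20).Select(i => (char)i)
    .Concat(new[] { '\u007F', ' ', '<', '>' })
    .ToArray();
SearchValues<char> invalidAutoLinkChars = SearchValues.Create(invalidChars);

// `rest` is the text after "scheme:". Returns the URI body if it ends at '>' with
// no disallowed characters before it, otherwise null.
string? ScanUriBody(ReadOnlySpan<char> rest)
{
    int end = rest.IndexOfAny(invalidAutoLinkChars);
    // The uint casts fold "end >= 0" and "end < Length" into one comparison.
    return (uint)end < (uint)rest.Length && rest[end] == '>'
        ? rest.Slice(0, end).ToString()
        : null;
}
```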
@@ -345,10 +415,10 @@ public static class LinkHelper
public static bool TryParseInlineLink(ref StringSlice text, out string? link, out string? title, out SourceSpan linkSpan, out SourceSpan titleSpan)
{
// 1. An inline link consists of a link text followed immediately by a left parenthesis (,
// 1. An inline link consists of a link text followed immediately by a left parenthesis (,
// 2. optional whitespace, TODO: specs: is it whitespace or multiple whitespaces?
// 3. an optional link destination,
// 4. an optional link title separated from the link destination by whitespace,
// 3. an optional link destination,
// 4. an optional link title separated from the link destination by whitespace,
// 5. optional whitespace, TODO: specs: is it whitespace or multiple whitespaces?
// 6. and a right parenthesis )
bool isValid = false;
@@ -359,7 +429,7 @@ public static class LinkHelper
linkSpan = SourceSpan.Empty;
titleSpan = SourceSpan.Empty;
// 1. An inline link consists of a link text followed immediately by a left parenthesis (,
// 1. An inline link consists of a link text followed immediately by a left parenthesis (,
if (c == '(')
{
text.SkipChar();
@@ -415,7 +485,7 @@ public static class LinkHelper
{
// Skip ')'
text.SkipChar();
title ??= string.Empty;
// Do not normalize a null title to an empty string: LinkInline.Title is nullable.
}
return isValid;
@@ -435,10 +505,10 @@ public static class LinkHelper
out SourceSpan triviaAfterTitle,
out bool urlHasPointyBrackets)
{
// 1. An inline link consists of a link text followed immediately by a left parenthesis (,
// 1. An inline link consists of a link text followed immediately by a left parenthesis (,
// 2. optional whitespace, TODO: specs: is it whitespace or multiple whitespaces?
// 3. an optional link destination,
// 4. an optional link title separated from the link destination by whitespace,
// 3. an optional link destination,
// 4. an optional link title separated from the link destination by whitespace,
// 5. optional whitespace, TODO: specs: is it whitespace or multiple whitespaces?
// 6. and a right parenthesis )
bool isValid = false;
@@ -456,7 +526,7 @@ public static class LinkHelper
urlHasPointyBrackets = false;
titleEnclosingCharacter = '\0';
// 1. An inline link consists of a link text followed immediately by a left parenthesis (,
// 1. An inline link consists of a link text followed immediately by a left parenthesis (,
if (c == '(')
{
text.SkipChar();
@@ -545,88 +615,70 @@ public static class LinkHelper
enclosingCharacter = c;
var closingQuote = c == '(' ? ')' : c;
bool hasEscape = false;
// -1: undefined
// 0: has only spaces
// 1: has other characters
int hasOnlyWhiteSpacesSinceLastLine = -1;
while (true)
bool isLineBlank = false; // the first line is never blank
while ((c = text.NextChar()) != '\0')
{
c = text.NextChar();
if (c == '\r' || c == '\n')
{
if (hasOnlyWhiteSpacesSinceLastLine >= 0)
if (isLineBlank)
{
if (hasOnlyWhiteSpacesSinceLastLine == 1)
{
break;
}
hasOnlyWhiteSpacesSinceLastLine = -1;
break;
}
if (hasEscape)
{
hasEscape = false;
buffer.Append('\\');
}
buffer.Append(c);
if (c == '\r' && text.PeekChar() == '\n')
{
buffer.Append('\n');
text.SkipChar();
}
continue;
}
if (c == '\0')
{
break;
isLineBlank = true;
}
if (c == closingQuote)
else if (hasEscape)
{
if (hasEscape)
hasEscape = false;
if (!c.IsAsciiPunctuation())
{
buffer.Append(closingQuote);
hasEscape = false;
continue;
buffer.Append('\\');
}
buffer.Append(c);
}
else if (c == closingQuote)
{
// Skip last quote
text.SkipChar();
goto ReturnValid;
title = buffer.ToString();
return true;
}
if (hasEscape && !c.IsAsciiPunctuation())
{
buffer.Append('\\');
}
if (c == '\\')
else if (c == '\\')
{
hasEscape = true;
continue;
isLineBlank = false;
}
hasEscape = false;
if (c.IsSpaceOrTab())
else
{
if (hasOnlyWhiteSpacesSinceLastLine < 0)
if (isLineBlank && !c.IsSpaceOrTab())
{
hasOnlyWhiteSpacesSinceLastLine = 1;
isLineBlank = false;
}
}
else if (c != '\n' && c != '\r' && text.PeekChar() != '\n')
{
hasOnlyWhiteSpacesSinceLastLine = 0;
}
buffer.Append(c);
buffer.Append(c);
}
}
}
buffer.Dispose();
title = null;
return false;
ReturnValid:
title = buffer.ToString();
return true;
}
public static bool TryParseTitleTrivia<T>(ref T text, out string? title, out char enclosingCharacter) where T : ICharIterator
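
Both title parsers now replace the three-state `hasOnlyWhiteSpacesSinceLastLine` counter with a single `isLineBlank` flag: after a line ending the line is assumed blank until a non-space, non-tab character appears, and a second line ending while it is still blank aborts the title. The sketch below mirrors that control flow on a plain string. `ParseTitle` and `IsAsciiPunctuation` are hypothetical stand-ins, and it deliberately ignores other title rules (such as unescaped `(` inside a parenthesized title).

```csharp
using System;
using System.Text;

// For ASCII, Punctuation + Symbol covers exactly the 32 CommonMark ASCII punctuation characters.
static bool IsAsciiPunctuation(char c) => c < 128 && (char.IsPunctuation(c) || char.IsSymbol(c));

// `s[0]` is the opening delimiter ('"', '\'' or '('). Returns null on failure.
string? ParseTitle(string s)
{
    char close = s[0] == '(' ? ')' : s[0];
    var buffer = new StringBuilder();
    bool hasEscape = false;
    bool isLineBlank = false; // the first line is never blank
    for (int i = 1; i < s.Length; i++)
    {
        char c = s[i];
        if (c == '\r' || c == '\n')
        {
            if (isLineBlank) return null;                 // a title cannot contain a blank line
            if (hasEscape) { hasEscape = false; buffer.Append('\\'); }
            buffer.Append(c);
            if (c == '\r' && i + 1 < s.Length && s[i + 1] == '\n') buffer.Append(s[++i]);
            isLineBlank = true;                           // blank until proven otherwise
        }
        else if (hasEscape)
        {
            hasEscape = false;
            if (!IsAsciiPunctuation(c)) buffer.Append('\\'); // only punctuation is escapable
            buffer.Append(c);
        }
        else if (c == close) return buffer.ToString();
        else if (c == '\\') { hasEscape = true; isLineBlank = false; }
        else
        {
            if (isLineBlank && c != ' ' && c != '\t') isLineBlank = false;
            buffer.Append(c);
        }
    }
    return null; // input ended before the closing delimiter
}
```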
@@ -642,88 +694,70 @@ public static class LinkHelper
enclosingCharacter = c;
var closingQuote = c == '(' ? ')' : c;
bool hasEscape = false;
// -1: undefined
// 0: has only spaces
// 1: has other characters
int hasOnlyWhiteSpacesSinceLastLine = -1;
while (true)
bool isLineBlank = false; // the first line is never blank
while ((c = text.NextChar()) != '\0')
{
c = text.NextChar();
if (c == '\r' || c == '\n')
{
if (hasOnlyWhiteSpacesSinceLastLine >= 0)
if (isLineBlank)
{
if (hasOnlyWhiteSpacesSinceLastLine == 1)
{
break;
}
hasOnlyWhiteSpacesSinceLastLine = -1;
break;
}
if (hasEscape)
{
hasEscape = false;
buffer.Append('\\');
}
buffer.Append(c);
if (c == '\r' && text.PeekChar() == '\n')
{
buffer.Append('\n');
text.SkipChar();
}
continue;
}
if (c == '\0')
{
break;
isLineBlank = true;
}
if (c == closingQuote)
else if (hasEscape)
{
if (hasEscape)
hasEscape = false;
if (!c.IsAsciiPunctuation())
{
buffer.Append(closingQuote);
hasEscape = false;
continue;
buffer.Append('\\');
}
buffer.Append(c);
}
else if (c == closingQuote)
{
// Skip last quote
text.SkipChar();
goto ReturnValid;
title = buffer.ToString();
return true;
}
if (hasEscape && !c.IsAsciiPunctuation())
{
buffer.Append('\\');
}
if (c == '\\')
else if (c == '\\')
{
hasEscape = true;
continue;
isLineBlank = false;
}
hasEscape = false;
if (c.IsSpaceOrTab())
else
{
if (hasOnlyWhiteSpacesSinceLastLine < 0)
if (isLineBlank && !c.IsSpaceOrTab())
{
hasOnlyWhiteSpacesSinceLastLine = 1;
isLineBlank = false;
}
}
else if (c != '\n' && c != '\r' && text.PeekChar() != '\n')
{
hasOnlyWhiteSpacesSinceLastLine = 0;
}
buffer.Append(c);
buffer.Append(c);
}
}
}
buffer.Dispose();
title = null;
return false;
ReturnValid:
title = buffer.ToString();
return true;
}
public static bool TryParseUrl<T>(T text, [NotNullWhen(true)] out string? link) where T : ICharIterator
@@ -739,7 +773,7 @@ public static class LinkHelper
var c = text.CurrentChar;
// a sequence of zero or more characters between an opening < and a closing >
// a sequence of zero or more characters between an opening < and a closing >
// that contains no line breaks, or unescaped < or > characters, or
if (c == '<')
{
@@ -760,12 +794,15 @@ public static class LinkHelper
break;
}
if (hasEscape && !c.IsAsciiPunctuation())
if (hasEscape)
{
buffer.Append('\\');
hasEscape = false;
if (!c.IsAsciiPunctuation())
{
buffer.Append('\\');
}
}
if (c == '\\')
else if (c == '\\')
{
hasEscape = true;
continue;
@@ -776,8 +813,6 @@ public static class LinkHelper
break;
}
hasEscape = false;
buffer.Append(c);
} while (c != '\0');
@@ -785,9 +820,9 @@ public static class LinkHelper
else
{
// a nonempty sequence of characters that does not start with <, does not include ASCII space or control characters,
// and includes parentheses only if (a) they are backslash-escaped or (b) they are part of a
// balanced pair of unescaped parentheses that is not itself inside a balanced pair of unescaped
// parentheses.
// and includes parentheses only if (a) they are backslash-escaped or (b) they are part of a
// balanced pair of unescaped parentheses that is not itself inside a balanced pair of unescaped
// parentheses.
bool hasEscape = false;
int openedParent = 0;
while (true)
@@ -816,20 +851,21 @@ public static class LinkHelper
if (!isAutoLink)
{
if (hasEscape && !c.IsAsciiPunctuation())
if (hasEscape)
{
buffer.Append('\\');
hasEscape = false;
if (!c.IsAsciiPunctuation())
{
buffer.Append('\\');
}
}
// If we have an escape
if (c == '\\')
else if (c == '\\')
{
hasEscape = true;
c = text.NextChar();
continue;
}
hasEscape = false;
}
if (IsEndOfUri(c, isAutoLink))
@@ -886,7 +922,7 @@ public static class LinkHelper
var c = text.CurrentChar;
// a sequence of zero or more characters between an opening < and a closing >
// a sequence of zero or more characters between an opening < and a closing >
// that contains no line breaks, or unescaped < or > characters, or
if (c == '<')
{
@@ -907,12 +943,15 @@ public static class LinkHelper
break;
}
if (hasEscape && !c.IsAsciiPunctuation())
if (hasEscape)
{
buffer.Append('\\');
hasEscape = false;
if (!c.IsAsciiPunctuation())
{
buffer.Append('\\');
}
}
if (c == '\\')
else if (c == '\\')
{
hasEscape = true;
continue;
@@ -923,8 +962,6 @@ public static class LinkHelper
break;
}
hasEscape = false;
buffer.Append(c);
} while (c != '\0');
@@ -932,9 +969,9 @@ public static class LinkHelper
else
{
// a nonempty sequence of characters that does not start with <, does not include ASCII space or control characters,
// and includes parentheses only if (a) they are backslash-escaped or (b) they are part of a
// balanced pair of unescaped parentheses that is not itself inside a balanced pair of unescaped
// parentheses.
// and includes parentheses only if (a) they are backslash-escaped or (b) they are part of a
// balanced pair of unescaped parentheses that is not itself inside a balanced pair of unescaped
// parentheses.
bool hasEscape = false;
int openedParent = 0;
while (true)
@@ -963,20 +1000,21 @@ public static class LinkHelper
if (!isAutoLink)
{
if (hasEscape && !c.IsAsciiPunctuation())
if (hasEscape)
{
buffer.Append('\\');
hasEscape = false;
if (!c.IsAsciiPunctuation())
{
buffer.Append('\\');
}
}
// If we have an escape
if (c == '\\')
else if (c == '\\')
{
hasEscape = true;
c = text.NextChar();
continue;
}
hasEscape = false;
}
if (IsEndOfUri(c, isAutoLink))
@@ -1038,7 +1076,7 @@ public static class LinkHelper
return c == '\0' || c.IsSpaceOrTab() || c.IsControl() || (isAutoLink && c == '<'); // TODO: specs unclear. space is strict or relaxed? (includes tabs?)
}
public static bool IsValidDomain(string link, int prefixLength)
public static bool IsValidDomain(string link, int prefixLength, bool allowDomainWithoutPeriod = false)
{
// https://github.github.com/gfm/#extended-www-autolink
// A valid domain consists of alphanumeric characters, underscores (_), hyphens (-) and periods (.).
@@ -1051,22 +1089,22 @@ public static class LinkHelper
bool segmentHasCharacters = false;
int lastUnderscoreSegment = -1;
for (int i = prefixLength; i < link.Length; i++)
for (int i = prefixLength; (uint)i < (uint)link.Length; i++)
{
char c = link[i];
if (c == '.') // New segment
{
if (!segmentHasCharacters)
return false;
segmentCount++;
segmentHasCharacters = false;
continue;
}
if (!c.IsAlphaNumeric())
{
if (c == '.') // New segment
{
if (!segmentHasCharacters)
return false;
segmentCount++;
segmentHasCharacters = false;
continue;
}
if (c == '/' || c == '?' || c == '#' || c == ':') // End of domain name
break;
@@ -1074,7 +1112,7 @@ public static class LinkHelper
{
lastUnderscoreSegment = segmentCount;
}
else if (c != '-' && c.IsSpaceOrPunctuation())
else if (c != '-' && CharHelper.IsSpaceOrPunctuationForGFMAutoLink(c))
{
// An invalid character has been found
return false;
@@ -1084,7 +1122,7 @@ public static class LinkHelper
segmentHasCharacters = true;
}
return segmentCount != 1 && // At least one dot was present
return (segmentCount != 1 || allowDomainWithoutPeriod) && // At least one dot was present
segmentHasCharacters && // Last segment has valid characters
segmentCount - lastUnderscoreSegment >= 2; // No underscores are present in the last two segments of the domain
}
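
The only behavioral change to `IsValidDomain` is the `allowDomainWithoutPeriod` flag, which relaxes the "at least one dot" requirement. The simplified sketch below reproduces the segment bookkeeping from the hunk, assuming `segmentCount` starts at 1 (its initialization is outside the shown lines). It uses `char.IsLetterOrDigit` in place of Markdig's `IsAlphaNumeric`/`IsSpaceOrPunctuationForGFMAutoLink` checks, so its character rules are an approximation.

```csharp
using System;

// Simplified GFM extended-www domain check (illustrative, not Markdig's exact character rules).
bool IsValidDomain(string link, int prefixLength, bool allowDomainWithoutPeriod = false)
{
    int segmentCount = 1;           // assumed starting value
    bool segmentHasCharacters = false;
    int lastUnderscoreSegment = -1;
    for (int i = prefixLength; i < link.Length; i++)
    {
        char c = link[i];
        if (c == '.')
        {
            if (!segmentHasCharacters) return false; // empty segment such as "a..b"
            segmentCount++;
            segmentHasCharacters = false;
            continue;
        }
        if (c == '/' || c == '?' || c == '#' || c == ':') break; // end of the domain part
        if (c == '_') lastUnderscoreSegment = segmentCount;
        else if (c != '-' && !char.IsLetterOrDigit(c)) return false;
        segmentHasCharacters = true;
    }
    return (segmentCount != 1 || allowDomainWithoutPeriod) // at least one dot, unless relaxed
        && segmentHasCharacters                            // last segment is not empty
        && segmentCount - lastUnderscoreSegment >= 2;      // no '_' in the last two segments
}
```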
@@ -1161,9 +1199,9 @@ public static class LinkHelper
c = text.NextChar();
}
if (c != '\0' && c != '\n' && c != '\r' && text.PeekChar() != '\n')
if (c != '\0' && c != '\n' && c != '\r')
{
// If we were able to parse the url but the title doesn't end with space,
// If we were able to parse the url but the title doesn't end with space,
// we are still returning a valid definition
if (newLineCount > 0 && title != null)
{
@@ -1301,9 +1339,9 @@ public static class LinkHelper
c = text.NextChar();
}
if (c != '\0' && c != '\n' && c != '\r' && text.PeekChar() != '\n')
if (c != '\0' && c != '\n' && c != '\r')
{
// If we were able to parse the url but the title doesn't end with space,
// If we were able to parse the url but the title doesn't end with space,
// we are still returning a valid definition
if (newLineCount > 0 && title != null)
{
@@ -1601,4 +1639,4 @@ public static class LinkHelper
label = buffer.ToString();
return true;
}
}
}


@@ -1,11 +1,12 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
#nullable disable
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Text;
namespace Markdig.Helpers;
@@ -125,6 +126,34 @@ public struct StringSlice : ICharIterator
}
}
/// <summary>
/// Gets the current <see cref="Rune"/>. Recognizes supplementary code points that cannot be covered by a single <see cref="char"/>.
/// </summary>
/// <returns>The current rune or <see langword="default"/> if the current position contains an incomplete surrogate pair or <see cref="IsEmpty"/>.</returns>
#if NET
public
#else
internal
#endif
readonly Rune CurrentRune
{
get
{
int start = Start;
if (start > End) return default;
char first = Text[start];
// '\0' is stored in `rune` if `TryCreate` returns false
if (!Rune.TryCreate(first, out Rune rune) && start + 1 <= End)
{
// The first character is a surrogate, check if we have a valid pair
Rune.TryCreate(first, Text[start + 1], out rune);
}
return rune;
}
}
/// <summary>
/// Gets a value indicating whether this instance is empty.
/// </summary>
@@ -145,6 +174,32 @@ public struct StringSlice : ICharIterator
get => Text[index];
}
/// <summary>
/// Gets the <see cref="Rune"/> at the specified index.
/// Recognizes supplementary code points that cannot be covered by a single <see cref="char"/>.
/// </summary>
/// <param name="index">The index into <see cref="Text"/>.</param>
/// <returns>The rune at the specified index or <see langword="default"/> if the location contains an incomplete surrogate pair.</returns>
/// <exception cref="IndexOutOfRangeException">Thrown when the given <paramref name="index"/> is out of range</exception>
#if NET
public
#else
internal
#endif
readonly Rune RuneAt(int index)
{
string text = Text;
char first = text[index];
if (!Rune.TryCreate(first, out Rune rune) && (uint)(index + 1) < (uint)text.Length)
{
// The first character is a surrogate, check if we have a valid pair
Rune.TryCreate(first, text[index + 1], out rune);
}
return rune;
}
/// <summary>
/// Goes to the next character, incrementing the <see cref="Start" /> position.
@@ -166,6 +221,50 @@ public struct StringSlice : ICharIterator
return Text[start];
}
/// <summary>
/// Goes to the next <see cref="Rune"/>, incrementing the <see cref="Start"/> position.
/// If <see cref="CurrentRune"/> is a supplementary character, <see cref="Start"/> will be advanced by 2.
/// </summary>
/// <returns>The current rune or <see langword="default"/> if the next position contains an incomplete surrogate pair or <see cref="IsEmpty"/>.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
#if NET
public
#else
internal
#endif
Rune NextRune()
{
int start = Start;
if (start >= End)
{
Start = End + 1;
return default;
}
// Start may be pointing at the start of a previous surrogate pair. Check if we have to advance by 2 chars.
if (
// Advance to the next character, checking for a valid surrogate pair
char.IsHighSurrogate(Text[start++])
// Don't unconditionally increment `start` here. First check that the code unit at `start` is part of a valid surrogate pair.
&& start <= End
&& char.IsLowSurrogate(Text[start]))
{
// Valid surrogate pair representing a supplementary character
start++;
}
Start = start;
var first = Text[start];
// '\0' is stored in `rune` if `TryCreate` returns false
if (!Rune.TryCreate(first, out Rune rune) && start + 1 <= End)
{
// Supplementary character
Rune.TryCreate(first, Text[start + 1], out rune);
}
return rune;
}
/// <summary>
/// Goes to the next character, incrementing the <see cref="Start" /> position.
/// </summary>
@@ -231,7 +330,7 @@ public struct StringSlice : ICharIterator
}
/// <summary>
/// Peeks a character at the specified offset from the current begining of the slice
/// Peeks a character at the specified offset from the current beginning of the slice
/// without using the range <see cref="Start"/> or <see cref="End"/>, returns `\0` if outside the <see cref="Text"/>.
/// </summary>
/// <param name="offset">The offset.</param>
@@ -244,6 +343,60 @@ public struct StringSlice : ICharIterator
return (uint)index < (uint)text.Length ? text[index] : '\0';
}
/// <summary>
/// Peeks a <see cref="Rune"/> at the specified offset from the current beginning of the slice
/// without using the range <see cref="Start"/> or <see cref="End"/>, returns <see langword="default"/> if outside the <see cref="Text"/>.
/// Recognizes supplementary code points that cannot be covered by a single <see cref="char"/>.
/// A positive <paramref name="offset"/> value expects the <em>high</em> surrogate and a negative <paramref name="offset"/> expects the <em>low</em> surrogate of the surrogate pair of a supplementary character at that position.
/// </summary>
/// <param name="offset">The offset.</param>
/// <returns>The rune at the specified offset, returns default if none.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
#if NET
public
#else
internal
#endif
readonly Rune PeekRuneExtra(int offset)
{
int index = Start + offset;
string text = Text;
if ((uint)index >= (uint)text.Length)
{
return default;
}
var bmpOrNearerSurrogate = text[index];
if (Rune.TryCreate(bmpOrNearerSurrogate, out var rune))
{
// BMP
return rune;
}
// Check if we have a valid surrogate pair
if (offset < 0)
{
// The code unit at `index` should be a low surrogate
// The scalar value (rune) of a supplementary character should start at `index - 1`, which should be a high surrogate
// By casting to uint and comparing with (uint)text.Length ("abusing" overflow), we can check both >= 0 and < text.Length in one comparison
if ((uint)(index - 1) < (uint)text.Length)
{
// Stores '\0' in `rune` if `TryCreate` returns false
Rune.TryCreate(text[index - 1], bmpOrNearerSurrogate, out rune);
}
}
else
{
// The code unit at `index` should be a high surrogate and the start of a scalar value (rune) of a supplementary character
if ((uint)(index + 1) < (uint)text.Length)
{
Rune.TryCreate(bmpOrNearerSurrogate, text[index + 1], out rune);
}
}
return rune;
}
/// <summary>
/// Matches the specified text.
/// </summary>
@@ -291,7 +444,7 @@ public struct StringSlice : ICharIterator
var c = Text[i];
if (c.IsWhitespace())
{
if (c == '\0' || c == '\n' || (c == '\r' && i + 1 <= End && Text[i + 1] != '\n'))
if (c == '\n' || (c == '\r' && i + 1 <= End && Text[i + 1] != '\n'))
{
return true;
}
@@ -474,7 +627,7 @@ public struct StringSlice : ICharIterator
return default;
}
#if NETCOREAPP3_1_OR_GREATER
#if NET
return MemoryMarshal.CreateReadOnlySpan(ref Unsafe.Add(ref Unsafe.AsRef(in text.GetPinnableReference()), start), length);
#else
return text.AsSpan(start, length);
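
The new `RuneAt` helper decodes a supplementary character by trying a single-`char` `Rune` first and falling back to the two-`char` overload. The standalone sketch below shows the same pattern on a plain string, plus the surrogate split formula used in `UnicodeUtility.GetUtf16SurrogatesFromSupplementaryPlaneScalar` (whose assert was corrected to require a non-BMP value). Both function names are hypothetical here; `index + 1 < text.Length` is equivalent to the diff's `(uint)` comparison for a non-negative index.

```csharp
using System;
using System.Text;

// Returns the scalar at `index`, or default (U+0000) for a lone or trailing surrogate.
Rune RuneAt(string text, int index)
{
    char first = text[index];
    if (!Rune.TryCreate(first, out Rune rune) && index + 1 < text.Length)
    {
        // `first` is a surrogate: try to pair it with the next code unit.
        Rune.TryCreate(first, text[index + 1], out rune);
    }
    return rune;
}

// Splits a supplementary-plane scalar (U+10000..U+10FFFF) into UTF-16 surrogates.
(char High, char Low) ToSurrogates(uint value)
{
    if (value < 0x10000 || value > 0x10FFFF) throw new ArgumentOutOfRangeException(nameof(value));
    char high = (char)((value + ((0xD800u - 0x40u) << 10)) >> 10);
    char low = (char)((value & 0x3FFu) + 0xDC00u);
    return (high, low);
}
```

For U+1F600 this gives `high = 0x7D + 0xD7C0 = 0xD83D` and `low = 0x200 + 0xDC00 = 0xDE00`, matching `char.ConvertFromUtf32(0x1F600)`.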


@@ -22,9 +22,147 @@ internal static class UnicodeUtility
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void GetUtf16SurrogatesFromSupplementaryPlaneScalar(uint value, out char highSurrogateCodePoint, out char lowSurrogateCodePoint)
{
Debug.Assert(IsValidUnicodeScalar(value) && IsBmpCodePoint(value));
Debug.Assert(IsValidUnicodeScalar(value) && !IsBmpCodePoint(value));
highSurrogateCodePoint = (char)((value + ((0xD800u - 0x40u) << 10)) >> 10);
lowSurrogateCodePoint = (char)((value & 0x3FFu) + 0xDC00u);
}
#if !NETCOREAPP3_0_OR_GREATER
// The following section is used only for the implementation of Rune.
/// <summary>
/// The Unicode replacement character U+FFFD.
/// </summary>
public const uint ReplacementChar = 0xFFFD;
/// <summary>
/// Returns the Unicode plane (0 through 16, inclusive) which contains this code point.
/// </summary>
public static int GetPlane(uint codePoint)
{
UnicodeDebug.AssertIsValidCodePoint(codePoint);
return (int)(codePoint >> 16);
}
/// <summary>
/// Returns a Unicode scalar value from two code points representing a UTF-16 surrogate pair.
/// </summary>
public static uint GetScalarFromUtf16SurrogatePair(uint highSurrogateCodePoint, uint lowSurrogateCodePoint)
{
UnicodeDebug.AssertIsHighSurrogateCodePoint(highSurrogateCodePoint);
UnicodeDebug.AssertIsLowSurrogateCodePoint(lowSurrogateCodePoint);
// This calculation comes from the Unicode specification, Table 3-5.
// Need to remove the D800 marker from the high surrogate and the DC00 marker from the low surrogate,
// then fix up the "wwww = uuuuu - 1" section of the bit distribution. The code is written as below
// to become just two instructions: shl, lea.
return (highSurrogateCodePoint << 10) + lowSurrogateCodePoint - ((0xD800U << 10) + 0xDC00U - (1 << 16));
}
/// <summary>
/// Returns <see langword="true"/> iff <paramref name="value"/> is an ASCII
/// character ([ U+0000..U+007F ]).
/// </summary>
/// <remarks>
/// Per http://www.unicode.org/glossary/#ASCII, ASCII is only U+0000..U+007F.
/// </remarks>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsAsciiCodePoint(uint value) => value <= 0x7Fu;
/// <summary>
/// Returns <see langword="true"/> iff <paramref name="value"/> is a UTF-16 high surrogate code point,
/// i.e., is in [ U+D800..U+DBFF ], inclusive.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsHighSurrogateCodePoint(uint value) => IsInRangeInclusive(value, 0xD800U, 0xDBFFU);
/// <summary>
/// Returns <see langword="true"/> iff <paramref name="value"/> is between
/// <paramref name="lowerBound"/> and <paramref name="upperBound"/>, inclusive.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsInRangeInclusive(uint value, uint lowerBound, uint upperBound) => (value - lowerBound) <= (upperBound - lowerBound);
/// <summary>
/// Returns <see langword="true"/> iff <paramref name="value"/> is a UTF-16 low surrogate code point,
/// i.e., is in [ U+DC00..U+DFFF ], inclusive.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsLowSurrogateCodePoint(uint value) => IsInRangeInclusive(value, 0xDC00U, 0xDFFFU);
/// <summary>
/// Returns <see langword="true"/> iff <paramref name="value"/> is a UTF-16 surrogate code point,
/// i.e., is in [ U+D800..U+DFFF ], inclusive.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsSurrogateCodePoint(uint value) => IsInRangeInclusive(value, 0xD800U, 0xDFFFU);
/// <summary>
/// Returns <see langword="true"/> iff <paramref name="codePoint"/> is a valid Unicode code
/// point, i.e., is in [ U+0000..U+10FFFF ], inclusive.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsValidCodePoint(uint codePoint) => codePoint <= 0x10FFFFU;
/// <summary>
/// Given a Unicode scalar value, gets the number of UTF-16 code units required to represent this value.
/// </summary>
public static int GetUtf16SequenceLength(uint value)
{
UnicodeDebug.AssertIsValidScalar(value);
value -= 0x10000; // if value < 0x10000, high byte = 0xFF; else high byte = 0x00
value += (2 << 24); // if value < 0x10000, high byte = 0x01; else high byte = 0x02
value >>= 24; // shift high byte down
return (int)value; // and return it
}
/// <summary>
/// Given a Unicode scalar value, gets the number of UTF-8 code units required to represent this value.
/// </summary>
public static int GetUtf8SequenceLength(uint value)
{
UnicodeDebug.AssertIsValidScalar(value);
// The logic below can handle all valid scalar values branchlessly.
// It gives generally good performance across all inputs, and on x86
// it's only six instructions: lea, sar, xor, add, shr, lea.
// 'a' will be -1 if input is < 0x800; else 'a' will be 0
// => 'a' will be -1 if input is 1 or 2 UTF-8 code units; else 'a' will be 0
int a = ((int)value - 0x0800) >> 31;
// The number of UTF-8 code units for a given scalar is as follows:
// - U+0000..U+007F => 1 code unit
// - U+0080..U+07FF => 2 code units
// - U+0800..U+FFFF => 3 code units
// - U+10000+ => 4 code units
//
// If we XOR the incoming scalar with 0xF800, the chart mutates:
// - U+0000..U+F7FF => 3 code units
// - U+F800..U+F87F => 1 code unit
// - U+F880..U+FFFF => 2 code units
// - U+10000+ => 4 code units
//
// Since the 1- and 3-code unit cases are now clustered, they can
// both be checked together very cheaply.
value ^= 0xF800u;
value -= 0xF880u; // if scalar is 1 or 3 code units, high byte = 0xFF; else high byte = 0x00
value += (4 << 24); // if scalar is 1 or 3 code units, high byte = 0x03; else high byte = 0x04
value >>= 24; // shift high byte down
// Final return value:
// - U+0000..U+007F => 3 + (-1) * 2 = 1
// - U+0080..U+07FF => 4 + (-1) * 2 = 2
// - U+0800..U+FFFF => 3 + ( 0) * 2 = 3
// - U+10000+ => 4 + ( 0) * 2 = 4
return (int)value + (a * 2);
}
#endif
}
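The bit tricks in the helpers above can be checked exhaustively. The standalone sketch below assumes C# 9 top-level statements on .NET 5 or later. It copies the arithmetic of IsInRangeInclusive, GetUtf16SequenceLength and GetUtf8SequenceLength into local functions. The Utf16LengthBranchless and Utf8LengthBranchless names are illustrative and are not Markdig APIs. For every scalar value, skipping the surrogate range, the sketch compares each result with Encoding.UTF8.GetByteCount and with the length of char.ConvertFromUtf32.

```csharp
using System;
using System.Text;

// Single-comparison range test: if value < lowerBound, the unsigned subtraction
// wraps to a huge number, so one <= rejects values below and above the range.
static bool IsInRangeInclusive(uint value, uint lowerBound, uint upperBound) =>
    (value - lowerBound) <= (upperBound - lowerBound);

// Same arithmetic as GetUtf16SequenceLength (debug assertion omitted).
static int Utf16LengthBranchless(uint value)
{
    value -= 0x10000;   // high byte 0xFF if value < 0x10000, else 0x00
    value += (2 << 24); // high byte becomes 0x01 or 0x02
    return (int)(value >> 24);
}

// Same arithmetic as GetUtf8SequenceLength (debug assertion omitted).
static int Utf8LengthBranchless(uint value)
{
    int a = ((int)value - 0x0800) >> 31; // -1 if value < 0x800, else 0
    value ^= 0xF800u;                    // cluster the 1- and 3-unit ranges below 0xF880
    value -= 0xF880u;                    // high byte 0xFF for 1 or 3 units, else 0x00
    value += (4 << 24);                  // high byte becomes 0x03 or 0x04
    value >>= 24;
    return (int)value + (a * 2);
}

int mismatches = 0;
for (uint v = 0; v <= 0x10FFFF; v++)
{
    if (IsInRangeInclusive(v, 0xD800, 0xDFFF)) continue; // surrogates are not scalar values
    string s = char.ConvertFromUtf32((int)v);
    if (Utf8LengthBranchless(v) != Encoding.UTF8.GetByteCount(s)) mismatches++;
    if (Utf16LengthBranchless(v) != s.Length) mismatches++;
}
Console.WriteLine($"mismatches: {mismatches}"); // mismatches: 0
```

The loop covers all 1,112,064 scalar values. A count of 0 means the branchless arithmetic agrees with the framework's encoders everywhere, not only at the range boundaries listed in the comments.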

@@ -8,13 +8,13 @@
<TargetFrameworks>net462;netstandard2.0;netstandard2.1;net8.0;net9.0</TargetFrameworks>
<CheckEolTargetFramework>false</CheckEolTargetFramework>
<PackageTags>Markdown CommonMark md html md2html</PackageTags>
<PackageReleaseNotes>https://github.com/lunet-io/markdig/blob/master/changelog.md</PackageReleaseNotes>
<PackageReleaseNotes>https://github.com/xoofx/markdig/blob/master/changelog.md</PackageReleaseNotes>
<PackageLicenseExpression>BSD-2-Clause</PackageLicenseExpression>
<PackageReadmeFile>readme.md</PackageReadmeFile>
<PackageIcon>markdig.png</PackageIcon>
<PackageProjectUrl>https://github.com/lunet-io/markdig</PackageProjectUrl>
<PackageProjectUrl>https://github.com/xoofx/markdig</PackageProjectUrl>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<LangVersion>12</LangVersion>
<LangVersion>preview</LangVersion>
<Nullable>enable</Nullable>
<NoWarn>$(NoWarn);CS1591</NoWarn>
<GenerateDocumentationFile>true</GenerateDocumentationFile>
@@ -24,18 +24,17 @@
<SymbolPackageFormat>snupkg</SymbolPackageFormat>
</PropertyGroup>
<ItemGroup Condition=" '$(TargetFramework)' == 'net462' OR '$(TargetFramework)' == 'netstandard2.0'">
<PackageReference Include="System.Memory" Version="4.5.5" />
<ItemGroup Condition=" '$(TargetFramework)' == 'net462' OR '$(TargetFramework)' == 'netstandard2.0'">
<PackageReference Include="System.Memory" />
</ItemGroup>
<ItemGroup>
<None Include="../../img/markdig.png" Pack="true" PackagePath="" />
<None Include="../../readme.md" Pack="true" PackagePath="/"/>
<PackageReference Include="MinVer" Version="4.3.0">
<None Include="../../readme.md" Pack="true" PackagePath="/" />
<PackageReference Include="MinVer">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
<PackageReference Include="Microsoft.SourceLink.GitHub" Version="8.0.*" PrivateAssets="All"/>
</ItemGroup>
<Target Name="PatchVersion" AfterTargets="MinVer">

@@ -2,8 +2,8 @@
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Linq;
using System.Reflection;
using Markdig.Helpers;
@@ -19,13 +19,13 @@ namespace Markdig;
/// </summary>
public static class Markdown
{
public static string Version =>
s_version ??= typeof(Markdown).Assembly.GetCustomAttribute<AssemblyFileVersionAttribute>()?.Version ?? "Unknown";
private static string? s_version;
[field: MaybeNull]
public static string Version => field ??= typeof(Markdown).Assembly.GetCustomAttribute<AssemblyFileVersionAttribute>()?.Version ?? "Unknown";
internal static readonly MarkdownPipeline DefaultPipeline = new MarkdownPipelineBuilder().Build();
private static readonly MarkdownPipeline _defaultTrackTriviaPipeline = new MarkdownPipelineBuilder().EnableTrackTrivia().Build();
[field: MaybeNull]
private static MarkdownPipeline DefaultTrackTriviaPipeline => field ??= new MarkdownPipelineBuilder().EnableTrackTrivia().Build();
private static MarkdownPipeline GetPipeline(MarkdownPipeline? pipeline, string markdown)
{
@@ -90,8 +90,8 @@ public static class Markdown
/// <param name="markdown">A Markdown text.</param>
/// <param name="pipeline">The pipeline used for the conversion.</param>
/// <param name="context">A parser context used for the parsing.</param>
/// <returns>The result of the conversion</returns>
/// <exception cref="ArgumentNullException">if markdown variable is null</exception>
/// <returns>The HTML string.</returns>
/// <exception cref="ArgumentNullException">If <paramref name="markdown"/> is null.</exception>
public static string ToHtml(string markdown, MarkdownPipeline? pipeline = null, MarkdownParserContext? context = null)
{
if (markdown is null) ThrowHelper.ArgumentNullException_markdown();
@@ -108,8 +108,8 @@ public static class Markdown
/// </summary>
/// <param name="document">A Markdown document.</param>
/// <param name="pipeline">The pipeline used for the conversion.</param>
/// <returns>The result of the conversion</returns>
/// <exception cref="ArgumentNullException">if markdown document variable is null</exception>
/// <returns>The HTML string.</returns>
/// <exception cref="ArgumentNullException">If <paramref name="document"/> is null.</exception>
public static string ToHtml(this MarkdownDocument document, MarkdownPipeline? pipeline = null)
{
if (document is null) ThrowHelper.ArgumentNullException(nameof(document));
@@ -131,8 +131,8 @@ public static class Markdown
/// <param name="document">A Markdown document.</param>
/// <param name="writer">The destination <see cref="TextWriter"/> that will receive the result of the conversion.</param>
/// <param name="pipeline">The pipeline used for the conversion.</param>
/// <returns>The result of the conversion</returns>
/// <exception cref="ArgumentNullException">if markdown document variable is null</exception>
/// <returns>The HTML string.</returns>
/// <exception cref="ArgumentNullException">If <paramref name="document"/> is null.</exception>
public static void ToHtml(this MarkdownDocument document, TextWriter writer, MarkdownPipeline? pipeline = null)
{
if (document is null) ThrowHelper.ArgumentNullException(nameof(document));
@@ -165,11 +165,7 @@ public static class Markdown
var document = MarkdownParser.Parse(markdown, pipeline, context);
using var rentedRenderer = pipeline.RentHtmlRenderer(writer);
HtmlRenderer renderer = rentedRenderer.Instance;
renderer.Render(document);
writer.Flush();
ToHtml(document, writer, pipeline);
return document;
}
@@ -206,7 +202,7 @@ public static class Markdown
{
if (markdown is null) ThrowHelper.ArgumentNullException_markdown();
MarkdownPipeline? pipeline = trackTrivia ? _defaultTrackTriviaPipeline : null;
MarkdownPipeline? pipeline = trackTrivia ? DefaultTrackTriviaPipeline : null;
return Parse(markdown, pipeline);
}

@@ -538,7 +538,7 @@ public static class MarkdownExtensions
var inlineParser = pipeline.InlineParsers.Find<AutolinkInlineParser>();
if (inlineParser != null)
{
inlineParser.EnableHtmlParsing = false;
inlineParser.Options.EnableHtmlParsing = false;
}
return pipeline;
}

@@ -127,7 +127,7 @@ public sealed class MarkdownPipeline
}
}
internal readonly struct RentedHtmlRenderer : IDisposable
internal readonly ref struct RentedHtmlRenderer : IDisposable
{
private readonly HtmlRendererCache _cache;
public readonly HtmlRenderer Instance;

@@ -16,7 +16,7 @@ namespace Markdig;
/// <remarks>NOTE: A pipeline is not thread-safe.</remarks>
public class MarkdownPipelineBuilder
{
private MarkdownPipeline? pipeline;
private MarkdownPipeline? _pipeline;
/// <summary>
/// Initializes a new instance of the <see cref="MarkdownPipeline" /> class.
@@ -95,9 +95,9 @@ public class MarkdownPipelineBuilder
/// <exception cref="InvalidOperationException">An extension cannot be null</exception>
public MarkdownPipeline Build()
{
if (pipeline != null)
if (_pipeline != null)
{
return pipeline;
return _pipeline;
}
// TODO: Review the whole initialization process for extensions
@@ -115,7 +115,7 @@ public class MarkdownPipelineBuilder
extension.Setup(this);
}
pipeline = new MarkdownPipeline(
_pipeline = new MarkdownPipeline(
new OrderedList<IMarkdownExtension>(Extensions),
new BlockParserList(BlockParsers),
new InlineParserList(InlineParsers),
@@ -125,6 +125,6 @@ public class MarkdownPipelineBuilder
PreciseSourceLocation = PreciseSourceLocation,
TrackTrivia = TrackTrivia,
};
return pipeline;
return _pipeline;
}
}

@@ -493,8 +493,34 @@ public class BlockProcessor
ContinueProcessingLine = true;
ResetLine(newLine);
ResetLine(newLine, 0);
Process();
LineIndex++;
}
/// <summary>
/// Processes part of a line.
/// </summary>
/// <param name="line">The line.</param>
/// <param name="column">The column.</param>
public void ProcessLinePart(StringSlice line, int column)
{
CurrentLineStartPosition = line.Start - column;
ContinueProcessingLine = true;
ResetLine(line, column);
Process();
}
/// <summary>
/// Process current string slice.
/// </summary>
private void Process()
{
TryContinueBlocks();
// If the line was not entirely processed by pending blocks, try to process it with any new block
@@ -502,8 +528,6 @@ public class BlockProcessor
// Close blocks that are no longer opened
CloseAll(false);
LineIndex++;
}
internal bool IsOpen(Block block)
@@ -956,18 +980,17 @@ public class BlockProcessor
ContinueProcessingLine = !result.IsDiscard();
}
private void ResetLine(StringSlice newLine)
private void ResetLine(StringSlice newLine, int column)
{
Line = newLine;
Column = 0;
Column = column;
ColumnBeforeIndent = 0;
StartBeforeIndent = Start;
originalLineStart = newLine.Start;
originalLineStart = newLine.Start - column;
TriviaStart = newLine.Start;
}
[MemberNotNull(nameof(Document), nameof(Parsers))]
internal void Setup(MarkdownDocument document, BlockParserList parsers, MarkdownParserContext? context, bool trackTrivia)
{

@@ -322,7 +322,7 @@ public abstract class FencedBlockParserBase<T> : FencedBlockParserBase where T :
if (fence.OpeningFencedCharCount <= closingCount &&
!processor.IsCodeIndent &&
(c == '\0' || c.IsWhitespace()) &&
c.IsWhiteSpaceOrZero() &&
line.TrimEnd())
{
block.UpdateSpanEnd(startBeforeTrim - 1);

@@ -139,9 +139,7 @@ public class HtmlBlockParser : BlockParser
c = line.NextChar();
}
if (
!(c == '>' || (!hasLeadingClose && c == '/' && line.PeekChar() == '>') || c.IsWhitespace() ||
c == '\0'))
if (!(c == '>' || (!hasLeadingClose && c == '/' && line.PeekChar() == '>') || c.IsWhiteSpaceOrZero()))
{
return BlockState.None;
}
@@ -297,7 +295,7 @@ public class HtmlBlockParser : BlockParser
return BlockState.Continue;
}
private static readonly CompactPrefixTree<int> HtmlTags = new(66, 94, 83)
private static readonly CompactPrefixTree<int> HtmlTags = new(67, 96, 86)
{
{ "address", 0 },
{ "article", 1 },
@@ -364,6 +362,7 @@ public class HtmlBlockParser : BlockParser
{ "title", 62 },
{ "tr", 63 },
{ "track", 64 },
{ "ul", 65 }
{ "ul", 65 },
{ "search", 66 },
};
}

@@ -3,6 +3,7 @@
// See the license.txt file in the project root for more information.
using Markdig.Helpers;
using Markdig.Renderers.Html;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;
@@ -14,19 +15,21 @@ namespace Markdig.Parsers.Inlines;
/// <seealso cref="InlineParser" />
public class AutolinkInlineParser : InlineParser
{
/// <summary>
/// Initializes a new instance of the <see cref="AutolinkInlineParser"/> class.
/// </summary>
public AutolinkInlineParser()
public AutolinkInlineParser() : this(new AutolinkOptions())
{
OpeningCharacters = ['<'];
EnableHtmlParsing = true;
}
/// <summary>
/// Gets or sets a value indicating whether to enable HTML parsing. Default is <c>true</c>
/// Initializes a new instance of the <see cref="AutolinkInlineParser"/> class.
/// </summary>
public bool EnableHtmlParsing { get; set; }
public AutolinkInlineParser(AutolinkOptions options)
{
Options = options ?? throw new ArgumentNullException(nameof(options));
OpeningCharacters = ['<'];
}
public readonly AutolinkOptions Options;
public override bool Match(InlineProcessor processor, ref StringSlice slice)
{
@@ -42,8 +45,12 @@ public class AutolinkInlineParser : InlineParser
Line = line,
Column = column
};
if (Options.OpenInNewWindow)
{
processor.Inline.GetAttributes().AddPropertyIfNotExist("target", "_blank");
}
}
else if (EnableHtmlParsing)
else if (Options.EnableHtmlParsing)
{
slice = saved;
if (!HtmlHelper.TryParseHtmlTag(ref slice, out string? htmlTag))
@@ -57,6 +64,10 @@ public class AutolinkInlineParser : InlineParser
Line = line,
Column = column
};
if (Options.OpenInNewWindow)
{
processor.Inline.GetAttributes().AddPropertyIfNotExist("target", "_blank");
}
}
else
{

@@ -0,0 +1,13 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
namespace Markdig.Parsers.Inlines;
public class AutolinkOptions : LinkOptions
{
/// <summary>
/// Gets or sets a value indicating whether to enable HTML parsing. Default is <c>true</c>
/// </summary>
public bool EnableHtmlParsing { get; set; } = true;
}

@@ -4,6 +4,7 @@
using System.Diagnostics;
using Markdig.Extensions.Tables;
using Markdig.Helpers;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;
@@ -35,6 +36,7 @@ public class CodeInlineParser : InlineParser
Debug.Assert(match is not ('\r' or '\n'));
// Match the opened sticks
int openingStart = slice.Start;
int openSticks = slice.CountAndSkipChar(match);
// A backtick string is a string of one or more backtick characters (`) that is neither preceded nor followed by a backtick.
@@ -75,8 +77,22 @@ public class CodeInlineParser : InlineParser
{
break;
}
else if (closeSticks == 0)
if (closeSticks == 0)
{
if (span.TrimStart(['\r', '\n']).StartsWith('|'))
{
// We saw the start of a code inline, but the close sticks are not present on the same line.
// If the next line starts with a pipe character, this is likely an incomplete CodeInline within a table.
// Treat it as regular text to avoid breaking the overall table shape.
// Use ContainsParentOrSiblingOfType to handle both nested and flat pipe table structures.
if (processor.Inline != null && processor.Inline.ContainsParentOrSiblingOfType<PipeTableDelimiterInline>())
{
slice.Start = openingStart;
return false;
}
}
containsNewLines = true;
span = span.Slice(1);
}
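The new CodeInlineParser branch treats an unclosed code span as literal text when the next line looks like another pipe-table row. The sketch below covers only that look-ahead. It assumes the remaining span starts at the line break that ended the unclosed span. NextLineLooksLikeTableRow is a hypothetical name. The real parser also requires a PipeTableDelimiterInline parent or sibling before it backs out, and the sketch omits that check.

```csharp
using System;

// Hypothetical helper: 'remaining' starts at the line break that ended an
// unclosed code span. Skip leading CR/LF characters (several blank lines are
// skipped too) and report whether the next line begins with '|'.
static bool NextLineLooksLikeTableRow(ReadOnlySpan<char> remaining)
{
    ReadOnlySpan<char> next = remaining.TrimStart("\r\n".AsSpan());
    return next.Length > 0 && next[0] == '|';
}

Console.WriteLine(NextLineLooksLikeTableRow("\n| c | d |".AsSpan()));  // True
Console.WriteLine(NextLineLooksLikeTableRow("\nplain text".AsSpan())); // False
```

Only CR and LF are trimmed, so a row indented with spaces before its '|' does not trigger the fallback under this check.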

@@ -3,12 +3,14 @@
// See the license.txt file in the project root for more information.
using Markdig.Helpers;
using System.Diagnostics;
namespace Markdig.Parsers.Inlines;
/// <summary>
/// Descriptor for an emphasis.
/// </summary>
[DebuggerDisplay("Emphasis Char={Character}, Min={MinimumCount}, Max={MaximumCount}, EnableWithinWord={EnableWithinWord}")]
public sealed class EmphasisDescriptor
{
/// <summary>

@@ -4,7 +4,7 @@
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Text;
using Markdig.Helpers;
using Markdig.Renderers.Html;
using Markdig.Syntax;
@@ -125,7 +125,10 @@ public class EmphasisInlineParser : InlineParser, IPostInlineProcessor
}
// Follow DelimiterInline (EmphasisDelimiter, TableDelimiter...)
child = delimiterInline.FirstChild;
// If the delimiter has IsClosed=true (e.g., pipe table delimiter), it has no children
// In that case, continue to next sibling instead of stopping
var firstChild = delimiterInline.FirstChild;
child = firstChild ?? delimiterInline.NextSibling;
}
else
{
@@ -150,18 +153,19 @@ public class EmphasisInlineParser : InlineParser, IPostInlineProcessor
var delimiterChar = slice.CurrentChar;
var emphasisDesc = emphasisMap![delimiterChar]!;
char pc = (char)0;
Rune pc = (Rune)0;
if (processor.Inline is HtmlEntityInline htmlEntityInline)
{
if (htmlEntityInline.Transcoded.Length > 0)
{
pc = htmlEntityInline.Transcoded[htmlEntityInline.Transcoded.End];
pc = htmlEntityInline.Transcoded.RuneAt(htmlEntityInline.Transcoded.End);
}
}
if (pc == 0)
if (pc.Value == 0)
{
pc = slice.PeekCharExtra(-1);
if (pc == delimiterChar && slice.PeekCharExtra(-2) != '\\')
pc = slice.PeekRuneExtra(-1);
// delimiterChar is BMP, so slice.PeekCharExtra(-2) is (a part of) the character two positions back.
if (pc == (Rune)delimiterChar && slice.PeekCharExtra(-2) != '\\')
{
// If we get here, we determined that either:
// a) there weren't enough delimiters in the delimiter run to satisfy the MinimumCount condition
@@ -179,12 +183,13 @@ public class EmphasisInlineParser : InlineParser, IPostInlineProcessor
return false;
}
char c = slice.CurrentChar;
Rune c = slice.CurrentRune;
// The following character is actually an entity, we need to decode it
if (HtmlEntityParser.TryParse(ref slice, out string? htmlString, out int htmlLength))
{
c = htmlString[0];
// Note: c is U+FFFD when decode error
Rune.DecodeFromUtf16(htmlString, out c, out _);
}
// Calculate Open-Close for current character
@@ -233,9 +238,9 @@ public class EmphasisInlineParser : InlineParser, IPostInlineProcessor
continue;
}
if ((closeDelimiter.Type & DelimiterType.Close) != 0 && closeDelimiter.DelimiterCount >= emphasisDesc.MinimumCount)
if ((closeDelimiter.Type & DelimiterType.Close) != 0)
{
while (true)
while (closeDelimiter.DelimiterCount >= emphasisDesc.MinimumCount)
{
// Now, look back in the stack (staying above stack_bottom and the openers_bottom for this delimiter type)
// for the first matching potential opener (“matching” means same delimiter).
@@ -245,8 +250,7 @@ public class EmphasisInlineParser : InlineParser, IPostInlineProcessor
{
var previousOpenDelimiter = delimiters[j];
var isOddMatch = ((closeDelimiter.Type & DelimiterType.Open) != 0 ||
(previousOpenDelimiter.Type & DelimiterType.Close) != 0) &&
var isOddMatch = ((closeDelimiter.Type & DelimiterType.Open) != 0 || (previousOpenDelimiter.Type & DelimiterType.Close) != 0) &&
previousOpenDelimiter.DelimiterCount != closeDelimiter.DelimiterCount &&
(previousOpenDelimiter.DelimiterCount + closeDelimiter.DelimiterCount) % 3 == 0 &&
(previousOpenDelimiter.DelimiterCount % 3 != 0 || closeDelimiter.DelimiterCount % 3 != 0);
@@ -357,7 +361,8 @@ public class EmphasisInlineParser : InlineParser, IPostInlineProcessor
}
// The current delimiters are matching
if (openDelimiter.DelimiterCount >= emphasisDesc.MinimumCount)
if (openDelimiter.DelimiterCount >= emphasisDesc.MinimumCount &&
closeDelimiter.DelimiterCount >= emphasisDesc.MinimumCount)
{
goto process_delims;
}
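Changing pc and c from char to Rune lets the emphasis flanking checks see a whole code point instead of half of a surrogate pair. The sketch below calls System.Text.Rune (.NET Core 3.0 or later) directly instead of Markdig's StringSlice helpers. It places U+1D538, a letter outside the BMP, right before a '*' delimiter and classifies the preceding character both ways.

```csharp
using System;
using System.Text;

// "\U0001D538" is U+1D538 MATHEMATICAL DOUBLE-STRUCK CAPITAL A, stored as the
// surrogate pair D835 DD38, followed by an emphasis delimiter.
string text = "\U0001D538*";
int delimiterIndex = text.IndexOf('*');

// char view: the code unit just before the delimiter is only the low surrogate.
char previousChar = text[delimiterIndex - 1];
bool charSaysLetter = char.IsLetter(previousChar);

// Rune view: decode the complete scalar value that ends at the delimiter.
Rune.DecodeLastFromUtf16(text.AsSpan(0, delimiterIndex), out Rune previousRune, out int consumed);
bool runeSaysLetter = Rune.IsLetter(previousRune);

Console.WriteLine($"char: U+{(int)previousChar:X4} letter={charSaysLetter}");            // char: U+DD38 letter=False
Console.WriteLine($"rune: U+{previousRune.Value:X} letter={runeSaysLetter}, {consumed} code units"); // rune: U+1D538 letter=True, 2 code units
```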

@@ -3,6 +3,7 @@
// See the license.txt file in the project root for more information.
using Markdig.Helpers;
using Markdig.Renderers.Html;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;
@@ -17,11 +18,22 @@ public class LinkInlineParser : InlineParser
/// <summary>
/// Initializes a new instance of the <see cref="LinkInlineParser"/> class.
/// </summary>
public LinkInlineParser()
public LinkInlineParser() : this(new LinkOptions())
{
}
/// <summary>
/// Initializes a new instance of the <see cref="LinkInlineParser"/> class.
/// </summary>
public LinkInlineParser(LinkOptions options)
{
Options = options ?? throw new ArgumentNullException(nameof(options));
OpeningCharacters = ['[', ']', '!'];
}
public readonly LinkOptions Options;
public override bool Match(InlineProcessor processor, ref StringSlice slice)
{
// The following methods are inspired by the "An algorithm for parsing nested emphasis and links"
@@ -137,6 +149,9 @@ public class LinkInlineParser : InlineParser
if (linkRef.CreateLinkInline != null)
{
link = linkRef.CreateLinkInline(state, linkRef, parent.FirstChild);
link.Span = new SourceSpan(parent.Span.Start, endPosition);
link.Line = parent.Line;
link.Column = parent.Column;
}
// Create a default link if the callback was not found
@@ -145,8 +160,8 @@ public class LinkInlineParser : InlineParser
// Inline Link
var linkInline = new LinkInline()
{
Url = HtmlHelper.Unescape(linkRef.Url),
Title = HtmlHelper.Unescape(linkRef.Title),
Url = HtmlHelper.Unescape(linkRef.Url, removeBackSlash: false),
Title = HtmlHelper.Unescape(linkRef.Title, removeBackSlash: false),
Label = label,
LabelSpan = labelSpan,
UrlSpan = linkRef.UrlSpan,
@@ -166,6 +181,11 @@ public class LinkInlineParser : InlineParser
linkInline.LocalLabel = localLabel;
}
if (Options.OpenInNewWindow)
{
linkInline.GetAttributes().AddPropertyIfNotExist("target", "_blank");
}
link = linkInline;
}
@@ -256,8 +276,8 @@ public class LinkInlineParser : InlineParser
// Inline Link
link = new LinkInline()
{
Url = HtmlHelper.Unescape(url),
Title = HtmlHelper.Unescape(title),
Url = HtmlHelper.Unescape(url, removeBackSlash: false),
Title = title is null ? null : HtmlHelper.Unescape(title, removeBackSlash: false),
IsImage = openParent.IsImage,
LabelSpan = openParent.LabelSpan,
UrlSpan = inlineState.GetSourcePositionFromLocalSpan(linkSpan),
@@ -382,11 +402,11 @@ public class LinkInlineParser : InlineParser
return new LinkInline()
{
TriviaBeforeUrl = wsBeforeLink,
Url = HtmlHelper.Unescape(url),
Url = HtmlHelper.Unescape(url, removeBackSlash: false),
UnescapedUrl = unescapedUrl,
UrlHasPointyBrackets = urlHasPointyBrackets,
TriviaAfterUrl = wsAfterLink,
Title = HtmlHelper.Unescape(title),
Title = HtmlHelper.Unescape(title, removeBackSlash: false),
UnescapedTitle = unescapedTitle,
TitleEnclosingCharacter = titleEnclosingCharacter,
TriviaAfterTitle = wsAfterTitle,

@@ -0,0 +1,19 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Markdig.Parsers;
public class LinkOptions
{
/// <summary>
/// Should the link open in a new window when clicked (false by default)
/// </summary>
public bool OpenInNewWindow { get; set; }
}

@@ -145,6 +145,7 @@ public class ListBlockParser : BlockParser
if (list.CountBlankLinesReset == 1 && listItem.ColumnWidth < 0)
{
state.Close(listItem);
list.CountBlankLinesReset = 0;
// Leave the list open
list.IsOpen = true;

@@ -38,11 +38,10 @@ public abstract class ParserList<T, TState> : OrderedList<T> where T : notnull,
{
foreach (var openingChar in parser.OpeningCharacters)
{
if (!charCounter.ContainsKey(openingChar))
if (!charCounter.TryAdd(openingChar, 1))
{
charCounter[openingChar] = 0;
charCounter[openingChar]++;
}
charCounter[openingChar]++;
}
}
else

@@ -0,0 +1,23 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
#if !(NETSTANDARD2_1_OR_GREATER || NET)
namespace System.Collections.Generic;
internal static class DictionaryExtensions
{
public static bool TryAdd<TKey, TValue>(this Dictionary<TKey, TValue> dictionary, TKey key, TValue value) where TKey : notnull
{
if (!dictionary.ContainsKey(key))
{
dictionary[key] = value;
return true;
}
return false;
}
}
#endif
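ParserList now counts opening characters with Dictionary.TryAdd. On targets without that method, it falls back to the polyfill above. The standalone sketch below shows the same counting pattern. The per-parser character lists are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Made-up opening characters for four parsers; '*' is claimed twice.
char[][] openingCharactersPerParser =
{
    new[] { '<' },
    new[] { '[', ']', '!' },
    new[] { '*', '_' },
    new[] { '*' },
};

var charCounter = new Dictionary<char, int>();
foreach (var openingCharacters in openingCharactersPerParser)
{
    foreach (var openingChar in openingCharacters)
    {
        // TryAdd inserts a new key with one lookup; an existing key
        // falls back to the indexer to increment its count.
        if (!charCounter.TryAdd(openingChar, 1))
        {
            charCounter[openingChar]++;
        }
    }
}

foreach (var pair in charCounter)
{
    Console.WriteLine($"'{pair.Key}' -> {pair.Value}");
}
```

The savings apply to new keys. The old sequence was ContainsKey, a set, a get and a set, and it becomes a single TryAdd. A repeated key still costs TryAdd plus the indexer's get and set. On .NET 6 or later, CollectionsMarshal.GetValueRefOrAddDefault could handle both cases with one lookup, but the diff does not use it.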

@@ -2,7 +2,7 @@
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
#if !NETSTANDARD2_1_OR_GREATER
#if !NETCOREAPP2_1_OR_GREATER && !NETSTANDARD2_1_OR_GREATER
using System.Runtime.InteropServices;

@@ -0,0 +1,38 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
// This file is licensed under the BSD-Clause 2 license.
// See the license.txt file in the project root for more information.
#if !NET8_0_OR_GREATER
namespace System.Collections.Frozen;
// We're using a polyfill instead of conditionally referencing the package as the package is untested on older TFMs, and
// brings in a reference to System.Runtime.CompilerServices.Unsafe, which conflicts with our polyfills of that type.
internal sealed class FrozenDictionary<TKey, TValue> : Dictionary<TKey, TValue>
{
public FrozenDictionary(Dictionary<TKey, TValue> dictionary) : base(dictionary) { }
}
internal static class FrozenDictionaryExtensions
{
public static FrozenDictionary<TKey, TValue> ToFrozenDictionary<TKey, TValue>(this Dictionary<TKey, TValue> dictionary)
{
return new FrozenDictionary<TKey, TValue>(dictionary);
}
}
internal sealed class FrozenSet<T> : HashSet<T>
{
public FrozenSet(HashSet<T> set, IEqualityComparer<T> comparer) : base(set, comparer) { }
}
internal static class FrozenSetExtensions
{
public static FrozenSet<T> ToFrozenSet<T>(this HashSet<T> set, IEqualityComparer<T> comparer)
{
return new FrozenSet<T>(set, comparer);
}
}
#endif

@@ -18,6 +18,9 @@ internal sealed class NotNullWhenAttribute : Attribute
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field | AttributeTargets.Parameter, Inherited = false)]
internal sealed class AllowNullAttribute : Attribute { }
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Parameter | AttributeTargets.Property | AttributeTargets.ReturnValue, Inherited = false)]
internal sealed class MaybeNullAttribute : Attribute { }
#endif
#if !NET5_0_OR_GREATER

File diff suppressed because it is too large

@@ -26,6 +26,8 @@ internal static class SearchValues
internal abstract class SearchValues<T>
{
public abstract bool Contains(T value);
public abstract int IndexOfAny(ReadOnlySpan<char> span);
public abstract int IndexOfAnyExcept(ReadOnlySpan<char> span);
@@ -52,6 +54,10 @@ internal sealed class PreNet8CompatSearchValues : SearchValues<char>
}
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public override bool Contains(char value) =>
value < 128 ? _ascii[value] : (_nonAscii is { } nonAscii && nonAscii.Contains(value));
public override int IndexOfAny(ReadOnlySpan<char> span)
{
if (_nonAscii is null)
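The Contains override added to PreNet8CompatSearchValues answers ASCII queries from a 128-entry table. It consults the non-ASCII set only when one exists. The standalone sketch below reproduces that lookup shape with a closure. BuildContains is a hypothetical helper and is not part of Markdig. On .NET 8 or later, System.Buffers.SearchValues.Create provides an optimized equivalent.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a membership test shaped like the compat Contains,
// with a bool table for ASCII plus an optional HashSet for everything else.
static Func<char, bool> BuildContains(string values)
{
    var ascii = new bool[128];
    HashSet<char>? nonAscii = null;
    foreach (char c in values)
    {
        if (c < 128)
        {
            ascii[c] = true;
        }
        else
        {
            nonAscii ??= new HashSet<char>();
            nonAscii.Add(c);
        }
    }

    // An ASCII query is one array read. A non-ASCII query returns false
    // without hashing when no non-ASCII values were supplied.
    return value => value < 128 ? ascii[value] : (nonAscii is { } others && others.Contains(value));
}

Func<char, bool> isDelimiter = BuildContains("*_`|\u2192");
Console.WriteLine(isDelimiter('*'));      // True
Console.WriteLine(isDelimiter('a'));      // False
Console.WriteLine(isDelimiter('\u2192')); // True
```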

Some files were not shown because too many files have changed in this diff