Converting emphasis with angled quotation marks #541

claunia · 2026-01-29T14:39:14Z

Originally created by @alexeyfv on GitHub (Jun 19, 2022).

Originally assigned to: @MihaZupan on GitHub.

Hi,

I'm trying to convert a document that contains the string "«_word_»". As the example below shows, the parser does not recognize it as emphasis:

```csharp
var html1 = Markdown.ToHtml("«_word_»"); // "<p>«_word_»</p>\n"
```

But "_«word»_" is converted correctly:

```csharp
var html2 = Markdown.ToHtml("_«word»_"); // "<p><em>«word»</em></p>\n"
```

I'm using Markdig 0.30.2. Is this a bug? If so, is there a workaround? Thanks.

claunia added the question and bug labels 2026-01-29 14:39:14 +00:00

claunia closed this issue

2026-01-29 14:39:15 +00:00

@xoofx commented on GitHub (Jun 19, 2022):

Oh, interesting... you might be hitting a specific case of the spec: the different CommonMark parsers disagree on this input (https://babelmark.github.io/?text=%C2%AB_word_%C2%BB).

The spec section on emphasis is here: https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis. I would think this is not a bug, per the rule:

A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
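To see why a punctuation misclassification would break only the first input, here is a minimal C# sketch (C# 9 top-level statements) of the left-flanking rule quoted above, plus the spec's symmetric right-flanking rule and its extra conditions for `_`: an underscore can open emphasis only if it is left-flanking and either not right-flanking or preceded by punctuation. All names (`At`, `IsWs`, `IsSpecPunct`, `LeftFlanking`, `RightFlanking`, `CanPair`) are hypothetical, `CanPair` ignores the delimiter stack entirely, and the `buggy` classifier is a crude stand-in that assumes every non-ASCII character is non-punctuation. This is not Markdig's implementation.

```csharp
using System;
using System.Globalization;

// Characters outside the string are treated as '\n', because the spec says
// the beginning and end of the line count as Unicode whitespace.
static char At(string s, int i) => i >= 0 && i < s.Length ? s[i] : '\n';

// Unicode whitespace per CommonMark 0.30: category Zs, or tab, LF, FF, CR.
static bool IsWs(char c) =>
    (c is '\t' or '\n' or '\f' or '\r') ||
    char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

// Spec punctuation: ASCII punctuation, or Pc/Pd/Ps/Pe/Pi/Pf/Po
// (the categories char.IsPunctuation accepts) plus the ASCII symbols.
static bool IsSpecPunct(char c) =>
    char.IsPunctuation(c) || (c < 128 && char.IsSymbol(c));

static bool LeftFlanking(char before, char after, Func<char, bool> isPunct) =>
    !IsWs(after) && (!isPunct(after) || IsWs(before) || isPunct(before));

static bool RightFlanking(char before, char after, Func<char, bool> isPunct) =>
    !IsWs(before) && (!isPunct(before) || IsWs(after) || isPunct(after));

// Simplified: only checks whether the first '_' can open emphasis and the
// last '_' can close it (no delimiter stack, no nesting).
static bool CanPair(string s, Func<char, bool> isPunct)
{
    int open = s.IndexOf('_'), close = s.LastIndexOf('_');
    if (open < 0 || close <= open) return false;
    char ob = At(s, open - 1), oa = At(s, open + 1);
    char cb = At(s, close - 1), ca = At(s, close + 1);
    // '_' opens only if left-flanking and (not right-flanking or preceded by punctuation).
    bool canOpen = LeftFlanking(ob, oa, isPunct) &&
                   (!RightFlanking(ob, oa, isPunct) || isPunct(ob));
    // '_' closes only if right-flanking and (not left-flanking or followed by punctuation).
    bool canClose = RightFlanking(cb, ca, isPunct) &&
                    (!LeftFlanking(cb, ca, isPunct) || isPunct(ca));
    return canOpen && canClose;
}

// Crude model of the reported bug: '«' and '»' are not seen as punctuation.
Func<char, bool> buggy = c => c < 128 && IsSpecPunct(c);

Console.WriteLine(CanPair("«_word_»", IsSpecPunct)); // True
Console.WriteLine(CanPair("«_word_»", buggy));       // False
Console.WriteLine(CanPair("_«word»_", buggy));       // True
```

With a spec-conforming classifier, the opening `_` in «_word_» is preceded by punctuation, so it may open even though nothing about it is unusual. If « is treated as an ordinary character, that `_` becomes both left- and right-flanking (an intraword underscore) and cannot open, which matches the output reported for Markdig 0.30.2. The `*` delimiter has no such intraword condition, so by the same reasoning «*word*» should pair even under the buggy classifier; that is an inference from the spec, not something checked against Markdig.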

I haven't checked, but it is highly likely that the characters « and » are Unicode punctuation characters.
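The guess can be checked against .NET's own Unicode data. This short sketch (C# 9 top-level statements, standard library only) prints the general category of each guillemet: « is Pi (initial quote punctuation) and » is Pf (final quote punctuation), both of which are among the categories CommonMark 0.30 counts as Unicode punctuation.

```csharp
using System;
using System.Globalization;

// Print the .NET general category of each guillemet and whether
// char.IsPunctuation (Pc, Pd, Ps, Pe, Pi, Pf, Po) accepts it.
foreach (char c in new[] { '«', '»' })
{
    UnicodeCategory category = char.GetUnicodeCategory(c);
    Console.WriteLine($"U+{(int)c:X4} {category} IsPunctuation={char.IsPunctuation(c)}");
}
// U+00AB InitialQuotePunctuation IsPunctuation=True
// U+00BB FinalQuotePunctuation IsPunctuation=True
```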

cc: @MihaZupan thoughts?

@MihaZupan commented on GitHub (Jul 17, 2022):

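This is a bug: our CheckUnicodeCategory helper (https://github.com/xoofx/markdig/blob/5f80d86265fbcfccc0c452ca88c46f35b31c4dfa/src/Markdig/Helpers/CharHelper.cs#L172) does not match what CommonMark defines as Unicode whitespace (https://spec.commonmark.org/0.30/#unicode-whitespace-character) and Unicode punctuation (https://spec.commonmark.org/0.30/#unicode-punctuation-character).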

Specifically, we are wrong in the 128-255 range (where « and » are) and for the Unicode space categories:

```
  11 (U+000B, vertical tab)         Space should be False
 133 (U+0085, next line)            Space should be False
 161 ('¡')                          Punctuation should be True
 167 ('§')                          Punctuation should be True
 171 ('«')                          Punctuation should be True
 182 ('¶')                          Punctuation should be True
 183 ('·')                          Punctuation should be True
 187 ('»')                          Punctuation should be True
 191 ('¿')                          Punctuation should be True
8232 (U+2028, line separator)       Space should be False
8233 (U+2029, paragraph separator)  Space should be False
```
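For comparison, here is a minimal sketch of classifiers that follow the CommonMark 0.30 definitions of Unicode whitespace and Unicode punctuation, run over the code points in the list above. It assumes C# 9 top-level statements; `IsUnicodeWhitespace` and `IsUnicodePunctuation` are hypothetical names, not Markdig's CharHelper API, and the printed values depend on the runtime's Unicode tables, so no expected output is given here.

```csharp
using System;
using System.Globalization;

// Unicode whitespace: any Zs code point, or tab, line feed, form feed, carriage return.
// Vertical tab, NEL (U+0085), U+2028 and U+2029 are deliberately excluded.
static bool IsUnicodeWhitespace(char c) =>
    (c is '\t' or '\n' or '\f' or '\r') ||
    char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

// Unicode punctuation: ASCII punctuation, or Pc, Pd, Pe, Pf, Pi, Po, Ps.
// char.IsPunctuation covers those seven categories; the second clause adds
// the ASCII symbols $ + < = > ^ ` | ~ that the spec also counts.
static bool IsUnicodePunctuation(char c) =>
    char.IsPunctuation(c) || (c < 128 && char.IsSymbol(c));

// (code point, property, value the spec expects), taken from the list above.
var cases = new (int Code, string Property, bool Expected)[]
{
    (11, "Space", false), (133, "Space", false), (8232, "Space", false), (8233, "Space", false),
    (161, "Punctuation", true), (167, "Punctuation", true), (171, "Punctuation", true),
    (182, "Punctuation", true), (183, "Punctuation", true), (187, "Punctuation", true),
    (191, "Punctuation", true),
};

foreach (var (code, property, expected) in cases)
{
    char ch = (char)code;
    bool actual = property == "Space" ? IsUnicodeWhitespace(ch) : IsUnicodePunctuation(ch);
    Console.WriteLine($"{code,4} {property,-11} expected={expected} actual={actual}");
}
```

Deriving both predicates from general categories, instead of a hand-maintained range table, keeps them aligned with the spec wording as long as the runtime's Unicode data is current.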

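IsWhitespace (https://github.com/xoofx/markdig/blob/5f80d86265fbcfccc0c452ca88c46f35b31c4dfa/src/Markdig/Helpers/CharHelper.cs#L145) also doesn't match the spec right now.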

Reference: starred/markdig#541