In the Chinese environment, some specific symbols are not parsed correctly at the beginning and the end. #491

Closed
opened 2026-01-29 14:38:03 +00:00 by claunia · 4 comments
Originally created by @pengqian089 on GitHub (Jan 17, 2022).

Emphasis is not parsed correctly when the emphasized text starts with 《 or ends with 》.

var md = "不如人意的**《李茶的姑妈》**已经为开心麻花影业敲响警钟";
var pipeline = new MarkdownPipelineBuilder()
    .UsePipeTables()
    .UseTaskLists()
    .UseEmphasisExtras()
    .UseAutoIdentifiers()
    .UseAdvancedExtensions()
    .DisableHtml()
    .Build();
var html = Markdown.ToHtml(md, pipeline);
Console.WriteLine(html);

Result:

<p>不如人意的**《李茶的姑妈》**已经为开心麻花影业敲响警钟</p>

Expected:

<p>不如人意的<strong>《李茶的姑妈》</strong>已经为开心麻花影业敲响警钟</p>

I found that GitHub does not parse it correctly either.

**《李茶的姑妈》**
**李茶的姑妈**

**《strong》**
**strong**

《李茶的姑妈》
李茶的姑妈

《strong》
strong

Sorry, my English is very poor.
I am still working on my English, but it's getting better.
I use it every day now.

@MihaZupan commented on GitHub (Jan 17, 2022):

It's not so much the 《 》 characters, but that emphasis requires a space before it:

a**《text》**b

a **《text》** b

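See babelmark (https://babelmark.github.io/?text=a**%E3%80%8Atext%E3%80%8B**b%0A%0Aa+**%E3%80%8Atext%E3%80%8B**+b) - most CommonMark parsers behave the same way as Markdig.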

@pengqian089 commented on GitHub (Jan 17, 2022):

> It's not so much the 《 》 characters, but that emphasis requires a space before it:
>
> a**《text》**b
>
> a **《text》** b
>
> See babelmark - most CommonMark parsers behave the same way as Markdig

Thanks.
Can this be considered a bug?
It is parsed correctly when I use Typora.

@MihaZupan commented on GitHub (Jan 17, 2022):

It is not a bug - Markdig is behaving according to the CommonMark specification here.

While CommonMark allows intraword emphasis like foo**bar**baz, it has restrictions when the emphasis is preceded/followed by punctuation.

See left-flanking-delimiter-run (https://spec.commonmark.org/0.30/#left-flanking-delimiter-run) from the spec.

> A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

Note (2b): followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. In your case, 《 falls under the Open Punctuation Unicode category, so CommonMark requires that the opening ** be preceded by whitespace or another punctuation character.

The same holds for 》, which falls under Close Punctuation. The right-flanking-delimiter-run (https://spec.commonmark.org/0.30/#right-flanking-delimiter-run) definition requires that the closing ** be followed by whitespace or another punctuation character.

Following these rules, the following are valid:

**《text》**
a **《text》**
a **《text》**
a **《text》** b
a **《text》**.

But

a**《text》**b

is not.
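
The two flanking rules can be sketched in a few lines of C#. This is only an illustration of the CommonMark 0.30 definitions quoted above, not Markdig's actual implementation: `FlankingSketch` and all of its members are hypothetical names, it works on UTF-16 `char` values (so characters outside the Basic Multilingual Plane are not handled), and `StrongCanOpenAndClose` assumes the text contains exactly one pair of `**` runs.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of Markdig.
public static class FlankingSketch
{
    // Unicode whitespace per CommonMark 0.30: category Zs, tab, line feed,
    // form feed, or carriage return.
    private static bool IsSpecWhitespace(char c) =>
        c == '\t' || c == '\n' || c == '\f' || c == '\r' ||
        char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

    // Punctuation per CommonMark 0.30: ASCII punctuation (which includes
    // symbols such as '$' or '+') or any character in a Unicode P* category.
    private static bool IsSpecPunctuation(char c) =>
        char.IsPunctuation(c) || (c < 128 && char.IsSymbol(c));

    // before/after are the characters on either side of the delimiter run.
    // null stands for the beginning or end of the line (counts as whitespace).
    public static bool IsLeftFlanking(char? before, char? after)
    {
        if (after == null || IsSpecWhitespace(after.Value)) return false;  // (1)
        if (!IsSpecPunctuation(after.Value)) return true;                  // (2a)
        return before == null || IsSpecWhitespace(before.Value)
            || IsSpecPunctuation(before.Value);                            // (2b)
    }

    // Mirror image of IsLeftFlanking, looking backwards instead of forwards.
    public static bool IsRightFlanking(char? before, char? after)
    {
        if (before == null || IsSpecWhitespace(before.Value)) return false;
        if (!IsSpecPunctuation(before.Value)) return true;
        return after == null || IsSpecWhitespace(after.Value)
            || IsSpecPunctuation(after.Value);
    }

    // True when the first "**" is left-flanking (can open) and the last "**"
    // is right-flanking (can close). Assumes one pair of "**" runs in text.
    public static bool StrongCanOpenAndClose(string text)
    {
        int open = text.IndexOf("**", StringComparison.Ordinal);
        int close = text.LastIndexOf("**", StringComparison.Ordinal);
        if (open < 0 || close < open + 3) return false; // need content between runs

        char? beforeOpen = open > 0 ? text[open - 1] : (char?)null;
        char? afterOpen = text[open + 2];
        char? beforeClose = text[close - 1];
        char? afterClose = close + 2 < text.Length ? text[close + 2] : (char?)null;

        return IsLeftFlanking(beforeOpen, afterOpen)
            && IsRightFlanking(beforeClose, afterClose);
    }
}
```

Under these assumptions, `FlankingSketch.StrongCanOpenAndClose("a **《text》** b")` returns `true` and `FlankingSketch.StrongCanOpenAndClose("a**《text》**b")` returns `false`, while `foo**bar**baz` still passes because neither delimiter run is adjacent to punctuation.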

> It is parsed correctly when I use Typora.

Typora may be using a more relaxed definition of Markdown, but it isn't compliant with the spec here.

@pengqian089 commented on GitHub (Jan 18, 2022):

Thank you for your reply and answer.

I'll get to work on those characters next.

Reference: starred/markdig#491