Possible bug with string *bob*&_margaret_ #396

Closed
opened 2026-01-29 14:35:43 +00:00 by claunia · 2 comments

Originally created by @xakpc on GitHub (Aug 21, 2020).

Hey!

I switched from CommonMark to Markdig to get more control over which Markdown constructs are parsed and which are ignored, and found that one of my unit tests now fails.

Here is the test case:
`[TestCase("*bob*&_margaret_", "<em>bob</em>&amp;<em>margaret</em>")]`

CommonMark result: `<em>bob</em>&amp;<em>margaret</em>`
Markdig result: `<em>bob</em>&amp;_margaret_`

Is this a bug?

P.S. Check how it renders on GitHub: *bob*&_margaret_
claunia added the bug and PR Welcome! labels 2026-01-29 14:35:43 +00:00

@MihaZupan commented on GitHub (Aug 21, 2020):

Looks like this could be a bug: [BabelMark](https://babelmark.github.io/?text=*bob*%26_margaret_)

@iskcal commented on GitHub (Aug 25, 2020):

The problem may be in the function `CheckUnicodeCategory` of class `CharHelper`, starting at line 174. This function does not treat the character `&` as punctuation, because `&` may introduce an HTML entity or a Unicode character reference. As a result, Markdig considers `&_margaret_` to be a single word, and `_` has no effect inside a word. If `&` is replaced with another punctuation character, the text is parsed properly. For example, `*bob*+_margaret_` is transformed into `<em>bob</em>+<em>margaret</em>`.

Interestingly, the code of this function seems to come from CommonMark.NET, yet CommonMark.NET behaves correctly. If we don't want to change `CheckUnicodeCategory`, we could add a function that recognizes what `&` stands for, and it might be a good idea to see how CommonMark.NET implements it.
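To make the effect of the punctuation check concrete, here is a minimal sketch of the CommonMark left-flanking/right-flanking test for a single `_` delimiter. It is not Markdig's or CommonMark.NET's actual code: the class `UnderscoreFlanking` and all of its members are hypothetical names, and the sketch assumes a delimiter run of exactly one character. The only thing that varies between the two classifiers is whether `&` counts as punctuation.

```csharp
using System;

// Hypothetical helper for illustration only; not taken from Markdig or CommonMark.NET.
public static class UnderscoreFlanking
{
    // ASCII punctuation as listed in the CommonMark spec; both '&' and '+' are in this set.
    private const string AsciiPunctuation = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    // Classifier that follows the spec: ASCII punctuation plus Unicode punctuation.
    public static bool IsSpecPunctuation(char c) =>
        AsciiPunctuation.IndexOf(c) >= 0 || char.IsPunctuation(c);

    // Same classifier, but '&' is treated as an ordinary character,
    // mimicking the behavior described in the comment above (an assumption).
    public static bool IsPunctuationWithoutAmpersand(char c) =>
        c != '&' && IsSpecPunctuation(c);

    // Returns true if the single '_' at text[index] may open emphasis.
    public static bool CanOpenEmphasis(string text, int index, Func<char, bool> isPunctuation)
    {
        // The start and end of the text count as whitespace.
        char prev = index > 0 ? text[index - 1] : ' ';
        char next = index + 1 < text.Length ? text[index + 1] : ' ';

        bool prevSpace = char.IsWhiteSpace(prev);
        bool nextSpace = char.IsWhiteSpace(next);
        bool prevPunct = isPunctuation(prev);
        bool nextPunct = isPunctuation(next);

        bool leftFlanking = !nextSpace && (!nextPunct || prevSpace || prevPunct);
        bool rightFlanking = !prevSpace && (!prevPunct || nextSpace || nextPunct);

        // '_' may open emphasis only when it is not intraword: it must be
        // left-flanking and either not right-flanking or preceded by punctuation.
        return leftFlanking && (!rightFlanking || prevPunct);
    }
}
```

For the first `_` in `*bob*&_margaret_`, `CanOpenEmphasis` returns `true` with `IsSpecPunctuation` and `false` with `IsPunctuationWithoutAmpersand`, which matches the difference between the two reported outputs. For `*bob*+_margaret_` both classifiers return `true`.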
Reference: starred/markdig#396