Parser: Inline HTML combined with code block (if no blank line) #430

Closed
opened 2026-01-29 14:36:31 +00:00 by claunia · 2 comments

Originally created by @paultechguy on GitHub (Feb 2, 2021).

When using the parser, if a code block immediately follows inline HTML, the two are combined into a single HTML block. (See the end of this post for what the CommonMark spec says.) The CommonMark online editor handles both cases below fine, parsing them as two separate blocks. In Markdig, for example, if the Markdown is:

    <img src="http://domain.com" />
    ```html
      <strong>Foo</strong>
    ```

Markdig's Parse method returns the above as one whole HtmlBlock. The workaround is to separate the inline HTML from the code block with one or more blank lines:

    <img src="http://domain.com" />
    
    ```html
      <strong>Foo</strong>
    ```

The above is returned as two separate blocks by the Markdig parser.

Is there a way for the Markdig parser to return two separate blocks regardless of format?

The CommonMark spec (https://spec.commonmark.org/0.29/#fenced-code-blocks) indicates:
A fenced code block may interrupt a paragraph, and does not require a blank line either before or after.
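That spec sentence is about paragraphs. Our reading of the CommonMark 0.29 HTML block rules is that a line consisting only of a complete tag, such as the img tag above, does not start a paragraph at all: it starts an HTML block (start condition 7), and that kind of block ends only at a blank line, so a fence on the very next line becomes part of the raw HTML. The Python sketch below is a deliberately simplified model of that precedence, not Markdig's code or any real parser's: the function names split_blocks and html_block_type, the abbreviated type 6 tag list, and the simplified attribute grammar are all assumptions for illustration.

```python
import re

# Abbreviated subset of the CommonMark 0.29 "type 6" block-level tag names
# (assumption: the spec lists many more).
TYPE6_TAGS = {"address", "article", "aside", "blockquote", "details", "div",
              "dl", "figure", "footer", "form", "h1", "h2", "h3", "header",
              "hr", "li", "main", "nav", "ol", "p", "section", "table", "ul"}

# Simplified attribute grammar (assumption: close to, not identical to, the spec's).
_ATTR = r"""(?:\s+[A-Za-z_:][A-Za-z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'=<>`]+))?)"""
_OPEN_TAG = re.compile(r"^ {0,3}<([A-Za-z][A-Za-z0-9-]*)" + _ATTR + r"*\s*/?>\s*$")
_CLOSE_TAG = re.compile(r"^ {0,3}</([A-Za-z][A-Za-z0-9-]*)\s*>\s*$")
_TYPE6 = re.compile(r"^ {0,3}</?([A-Za-z][A-Za-z0-9]*)(?:\s|/?>|$)")
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def html_block_type(line):
    """Return 6 or 7 if the line starts an HTML block (simplified), else None.
    Types 1-5 (script/pre/style, comments, etc.) are not modelled here."""
    m6 = _TYPE6.match(line)
    if m6 and m6.group(1).lower() in TYPE6_TAGS:
        return 6
    # Type 7: one complete open or closing tag alone on its line.
    m7 = _OPEN_TAG.match(line) or _CLOSE_TAG.match(line)
    if m7 and m7.group(1).lower() not in {"script", "style", "pre"}:
        return 7
    return None


def _is_closing_fence(line, marker):
    # Same fence character, at least as long as the opener, indent <= 3,
    # and nothing else on the line.
    stripped = line.strip()
    indent = len(line) - len(line.lstrip(" "))
    return (indent <= 3 and len(stripped) >= len(marker)
            and set(stripped) == {marker[0]})


def split_blocks(markdown):
    """Return the kind of each top-level block under this simplified model."""
    lines = markdown.split("\n")
    kinds = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():                 # blank lines only separate blocks
            i += 1
            continue
        fence = _FENCE.match(line)
        if fence:                            # runs until its closing fence
            marker = fence.group(1)
            i += 1
            while i < len(lines):
                i += 1
                if _is_closing_fence(lines[i - 1], marker):
                    break
            kinds.append("fenced_code")
            continue
        if html_block_type(line):
            # Types 6 and 7 end only at a blank line, so a fence that
            # directly follows is swallowed into the raw HTML.
            while i < len(lines) and lines[i].strip():
                i += 1
            kinds.append("html_block")
            continue
        # Paragraph: ends at a blank line, or when a fence or a type 6 block
        # interrupts it (type 7 cannot interrupt a paragraph).
        i += 1
        while (i < len(lines) and lines[i].strip()
               and not _FENCE.match(lines[i])
               and html_block_type(lines[i]) != 6):
            i += 1
        kinds.append("paragraph")
    return kinds


print(split_blocks('<img src="http://domain.com" />\n```html\n  <strong>Foo</strong>\n```'))
# ['html_block']
print(split_blocks('<img src="http://domain.com" />\n\n```html\n  <strong>Foo</strong>\n```'))
# ['html_block', 'fenced_code']
print(split_blocks('Some text\n```html\n  <strong>Foo</strong>\n```'))
# ['paragraph', 'fenced_code']
```

In this model the third case is where the quoted spec sentence applies: ordinary text followed directly by a fence splits into two blocks without any blank line.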

claunia added the question and invalid labels 2026-01-29 14:36:31 +00:00

@xoofx commented on GitHub (Feb 2, 2021):

> Is there a way for the Markdig parser to return two separate blocks regardless of format?
> The CommonMark spec indicates:
> A fenced code block may interrupt a paragraph, and does not require a blank line either before or after.

Nope, but it's indeed a corner case. I believe most implementers took the HTML block rule as superseding the fenced code block rule.

Almost all CommonMark implementations follow the same behavior as markdig (see Babelmark: https://babelmark.github.io/?text=%3Cimg+src%3D%22http%3A%2F%2Fdomain.com%22+%2F%3E%0A%60%60%60html%0A++++++%3Cstrong%3EFoo%3C%2Fstrong%3E%0A%60%60%60). It seems GitHub Flavored Markdown chose a different path in this particular case, which is unfortunate...

So we can't really change that, unless you are willing to make a PR adding an extension with an option to turn it off, but I'm not sure how easy that would be to fit in...
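Short of such an extension, one workaround is to normalize the text before handing it to Markdig's Parse method. The Python sketch below is hypothetical: the function name separate_fences is made up and none of it is Markdig API. It inserts a blank line before every opening code fence that directly follows a non-blank line, which ends any preceding HTML block; the same loop should port directly to C#.

```python
import re

# An opening or closing code fence: up to 3 spaces, then ``` or ~~~ (or longer).
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def separate_fences(markdown):
    """Hypothetical pre-processing step: put a blank line before every opening
    fence that directly follows a non-blank line. A blank line is what ends an
    HTML block, so the fence can no longer be swallowed by one."""
    out = []
    open_marker = None               # marker of the fence we are inside, if any
    for line in markdown.split("\n"):
        m = _FENCE.match(line)
        if open_marker is None:
            if m:                    # opening fence
                if out and out[-1].strip():
                    out.append("")
                open_marker = m.group(1)
        elif m:
            stripped = line.strip()
            # A closing fence repeats the opening character, at least as many
            # times, with nothing else on the line.
            if set(stripped) == {open_marker[0]} and len(stripped) >= len(open_marker):
                open_marker = None
        out.append(line)
    return "\n".join(out)


text = '<img src="http://domain.com" />\n```html\n  <strong>Foo</strong>\n```'
print(separate_fences(text))
```

Since a fence can already interrupt a paragraph, the extra blank line should not change how plain paragraphs parse. Caveats we would expect: inside a list item it can turn a tight list into a loose one, it changes the output when a fence inside raw HTML was intentional, and fences inside block quotes (lines starting with >) are not touched by this sketch.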


@paultechguy commented on GitHub (Feb 2, 2021):

To add more context from my testing: I've tried a lot of online parsers, and most of them are sensitive to the type of inline HTML that appears before a code block. If the HTML is a CSS block element (div, img, p), the parser treats the HTML and the code block as a single HTML block. If the HTML is a CSS inline element (span, a, em), the parser detects them separately, as inline HTML (inside a paragraph) and a fenced code block respectively. This sort of makes sense. The Markdig parser is sensitive to the element type in the same way. A flag to toggle this handling might be useful.
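For what it's worth, under our reading of the CommonMark 0.29 spec the deciding factor is not the CSS display type but which HTML block start condition the line matches. Tags from the spec's block-level list (div, p, and so on) start a type 6 block no matter what follows on the line, while any other tag, img and span included, starts a type 7 block only when the complete tag stands alone on its line. Both kinds end only at a blank line. The sketch below is a simplified classifier, not Markdig's implementation; classify_first_line is a hypothetical name, and the tag list and attribute grammar are abbreviated assumptions.

```python
import re

# Abbreviated subset of the CommonMark 0.29 "type 6" tag names (assumption:
# the spec's list is longer).
TYPE6_TAGS = {"address", "article", "aside", "blockquote", "div", "dl",
              "figure", "footer", "form", "h1", "h2", "h3", "header", "hr",
              "li", "nav", "ol", "p", "section", "table", "ul"}

# Type 1: <script, <pre or <style followed by whitespace, ">" or end of line.
_TYPE1 = re.compile(r"^ {0,3}<(?:script|pre|style)(?:\s|>|$)", re.IGNORECASE)
# Type 6: a listed tag name, whatever else follows on the line.
_TYPE6 = re.compile(r"^ {0,3}</?([A-Za-z][A-Za-z0-9]*)(?:\s|/?>|$)")
# Type 7: one complete open or closing tag alone on its line
# (simplified attribute grammar, assumption).
_ATTR = r"""(?:\s+[A-Za-z_:][A-Za-z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'=<>`]+))?)"""
_TYPE7 = re.compile(r"^ {0,3}(?:<([A-Za-z][A-Za-z0-9-]*)" + _ATTR
                    + r"*\s*/?>|</([A-Za-z][A-Za-z0-9-]*)\s*>)\s*$")


def classify_first_line(line):
    """How a line at the start of a new block is treated (simplified model)."""
    if _TYPE1.match(line):
        return "html_block (type 1)"
    m6 = _TYPE6.match(line)
    if m6 and m6.group(1).lower() in TYPE6_TAGS:
        return "html_block (type 6)"
    m7 = _TYPE7.match(line)
    if m7 and (m7.group(1) or m7.group(2)).lower() not in {"script", "style", "pre"}:
        return "html_block (type 7)"
    # Everything else is a paragraph in this model: its tags are inline HTML,
    # and a fence on the next line can interrupt it.
    return "paragraph"


samples = ['<div>', '<p>Foo</p>', '<img src="http://domain.com" />',
           '<span>', '<span>Foo</span>', '<a href="#">link</a>', '<em>hi</em>']
for sample in samples:
    print(f"{sample!r:35} -> {classify_first_line(sample)}")
```

Run on these samples, it reports type 6 for the div and p lines, type 7 for the img tag and for a span tag alone on its line, and paragraph for the span, a and em samples that have text after the opening tag. That would be consistent with the observations above if, in those tests, the inline elements were followed by text on the same line.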


Reference: starred/markdig#430