HTML block parsed incorrectly? #555

Closed
opened 2026-01-29 14:39:36 +00:00 by claunia · 2 comments
Owner

Originally created by @MihailsKuzmins on GitHub (Aug 29, 2022).

I am referring to this issue https://github.com/MyNihongo/MudBlazor.Markdown/issues/117

Please refer to this sample:

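```cs
using Markdig;

// Pipeline with all of Markdig's advanced extensions enabled.
var pipeline = new MarkdownPipelineBuilder().UseAdvancedExtensions().Build();

// Verbatim string: "" is an escaped double quote.
const string value =
@"<details>
  <summary markdown=""span"">Release 1.0.1</summary>

**New**
-  Error fixes.

</details>";

var parsedText = Markdown.Parse(value, pipeline);
```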

The input is this HTML string, but Markdig returns 4 elements (the `<details>` and `<summary>` tags, some text in the middle, and the closing `</details>` tag). I would expect it to return a single element, namely one `HtmlBlock`, but maybe I am wrong in my assumption.

Could you please comment whether or not this behaviour is correct?

![image](https://user-images.githubusercontent.com/47413092/187207204-609aeb74-7480-45f7-8c69-6ddc2b02426d.png)

@MihaZupan commented on GitHub (Aug 29, 2022):

Markdig is parsing according to the [CommonMark spec](https://spec.commonmark.org/0.30/#html-blocks) here; HTML parsing is limited to what the spec defines.
`HtmlBlock` is more of a "this is the part we can't treat as Markdown" marker than a full HTML AST.
Consider [this example](https://babelmark.github.io/?text=%3Cdiv%3E%0A*foo*%0A%0A%3Cdiv%3E%0A%0A*bar*), where `foo` is not treated as Markdown, while `bar` is (because of the extra blank line).

In this case the `HtmlBlock` starts with `<details>` and ends at the first blank line.

If you want an actual HTML syntax tree, pass Markdig's output to a library like `AngleSharp`.

Regarding the HTML Markdig generates, it [matches what all the other CommonMark-compliant parsers do](https://babelmark.github.io/?text=%3Cdetails%3E%0A++%3Csummary+markdown%3D%22%22span%22%22%3ERelease+1.0.1%3C%2Fsummary%3E%0A%0A**New**%0A-++Error+fixes.%0A%0A%3C%2Fdetails%3E).
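
For readers who want to see the block boundaries themselves, the following is a minimal sketch rather than anything posted in the thread. It reuses the sample from the question, prints the type and starting line of each top-level block, and then renders the document to HTML and hands it to AngleSharp, as suggested above. It assumes the `Markdig` and `AngleSharp` NuGet packages are referenced; the variable names `document`, `html`, `dom` and `details` are illustrative only.

```cs
using System;
using AngleSharp.Html.Parser;
using Markdig;
using Markdig.Syntax;

var pipeline = new MarkdownPipelineBuilder().UseAdvancedExtensions().Build();

const string value =
@"<details>
  <summary markdown=""span"">Release 1.0.1</summary>

**New**
-  Error fixes.

</details>";

// MarkdownDocument is a container of top-level blocks, so it can be enumerated.
MarkdownDocument document = Markdown.Parse(value, pipeline);
foreach (Block block in document)
{
    // Line is the zero-based source line on which the block starts.
    Console.WriteLine($"{block.GetType().Name} (line {block.Line})");
}

// For an actual HTML tree, render to HTML first and parse that with AngleSharp.
string html = Markdown.ToHtml(value, pipeline);
var dom = new HtmlParser().ParseDocument(html);

// QuerySelector returns null when nothing matches, hence the null-conditional access.
var details = dom.QuerySelector("details");
Console.WriteLine(details?.QuerySelector("summary")?.TextContent);
```

The loop should make it visible that the opening tags and the closing `</details>` end up in separate `HtmlBlock` instances, with ordinary Markdown blocks between them. The AngleSharp step resolves the opening and closing tags into nested elements.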

@MihailsKuzmins commented on GitHub (Aug 30, 2022):

OK, thanks, I just wanted to confirm it. Indeed, the empty line seems to be the end of the HTML block according to the link you sent.

![image](https://user-images.githubusercontent.com/47413092/187346126-473e3afb-7ad6-4ff9-b6ff-23df353bfc2c.png)
Reference: starred/markdig#555