Understanding the rendering of related MarkdownObjects #564

Closed
opened 2026-01-29 14:39:50 +00:00 by claunia · 12 comments

Originally created by @pm64 on GitHub (Sep 22, 2022).

I'm experimenting heavily with Markdig to better understand some of the magic it does. Recently I've observed some confusing results, so I'm posting here for clarification and to confirm that the behavior is expected.

Consider the following:

	using System;
	using System.IO;
	using Markdig;
	using Markdig.Renderers;
	using Markdig.Syntax;

	var markdown = "- test list item";

	MarkdownPipeline pipeline = new MarkdownPipelineBuilder()
		.UsePreciseSourceLocation()
		.Build();

	var writer = new StringWriter();

	var renderer = new HtmlRenderer(writer);
	pipeline.Setup(renderer);

	MarkdownDocument document = Markdown.Parse(markdown, pipeline);

	foreach (MarkdownObject obj in document.Descendants())
	{
		renderer.Render(obj);
		string html = writer.ToString();
		writer.GetStringBuilder().Length = 0; // Reset writer

		Console.WriteLine($"{obj.GetType().Name} HTML: [{html}]\n");		
	}

Note: brackets used in output to clearly show newlines.

Output
Note: lines in curly braces ("{ITEM A}", etc) have been added for reference and are not part of the actual output.

{ITEM A}
ListBlock HTML: [<ul>
<li>test list item</li>
</ul>
]

{ITEM B}
ListItemBlock HTML: [<p>test list item</p>
]

{ITEM C}
ParagraphBlock HTML: [<p>test list item</p>
]

{ITEM D}
LiteralInline HTML: [test list item]

Each individual MarkdownObject looks great -- but when compared to one another, there are some apparent inconsistencies:

  1. The ListBlock HTML includes its outer <ul> tag. But the ListItemBlock HTML does not include its outer <li> tag. Why is that?
  2. The list items in the ListBlock HTML (item A) are rendered differently (without the <p> tag in list items) than the ListItemBlock and ParagraphBlock (which include the <p> tag). Should the list item fragment not be rendered identically in each case?
claunia added the question label 2026-01-29 14:39:50 +00:00

@xoofx commented on GitHub (Sep 22, 2022):

When the rendering was created, the focus was on rendering a whole document, not pieces of one, so I think I took some shortcuts in the renderers. For example, rather than creating a separate renderer for ListItemBlock, I inlined the loop over the items in the ListBlock renderer (here: https://github.com/xoofx/markdig/blob/bce4b70dc69803b9f45f00c542f6715bc6934981/src/Markdig/Renderers/Html/ListRenderer.cs#L53-L69).
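
The missing <p> inside the list item can be reproduced in isolation. The following is a minimal sketch, assuming the Markdig NuGet package is referenced; the class and member names are made up for illustration. It renders the same two items once as a tight list and once as a loose list. CommonMark does not wrap paragraphs of a tight list in <p>, which would explain why the ListBlock output in the first post shows <li>test list item</li> while the ParagraphBlock rendered on its own shows <p>.

```csharp
using System.Linq;
using Markdig;
using Markdig.Syntax;

// Hypothetical helper names; only Markdown.ToHtml, Markdown.Parse,
// Descendants<T>() and ListBlock.IsLoose come from Markdig.
public static class TightLooseDemo
{
    // Tight list: no blank line between the items.
    public const string Tight = "- one\n- two";

    // Loose list: the items are separated by a blank line.
    public const string Loose = "- one\n\n- two";

    // Renders Markdown to HTML with Markdig's default pipeline.
    public static string Render(string markdown) => Markdown.ToHtml(markdown);

    // Reports whether the first list in the document was parsed as loose.
    public static bool FirstListIsLoose(string markdown) =>
        Markdown.Parse(markdown).Descendants<ListBlock>().First().IsLoose;
}
```

Calling TightLooseDemo.Render(TightLooseDemo.Tight) yields items of the form <li>one</li>, while the loose variant wraps each item's text in <p>...</p>.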

@pm64 commented on GitHub (Sep 22, 2022):

Understood, thanks @xoofx for the explanation. Should we consider this a defect, or expected behavior? In the example code, I'm not technically seeking to render a piece of a document. I'm rendering a whole document, then trying to correlate sections of the resultant HTML back to the Markdown source. Maybe there's a better way?

@xoofx commented on GitHub (Sep 23, 2022):

> I'm rendering a whole document, then trying to correlate sections of the resultant HTML back to the Markdown source. Maybe there's a better way?

Couldn't you attach an id attribute to each Markdown object so that you can correlate it back more easily? For example, the Pragma line extension (https://github.com/xoofx/markdig/blob/master/src/Markdig/Extensions/PragmaLines/PragmaLineExtension.cs) was used to let the Visual Studio extension MarkdownEditor (https://github.com/madskristensen/MarkdownEditor2022) sync scrolling between the original Markdown document and the resulting HTML.

It might not be enough for your use case, but it shows that something similar is possible.

@pm64 commented on GitHub (Sep 23, 2022):

Thanks @xoofx, I wasn't aware of the Pragma line extension. On the surface, this information does not appear to address my use case. However, I'm a little confused by the extension's behavior.

I tried running the following test:

	MarkdownPipeline pipeline = new MarkdownPipelineBuilder()
		.UsePragmaLines()
		.Build();

	var md = @"- Test item";

	var html = Markdown.ToHtml(md, pipeline);

The result is as follows:

<ul id="pragma-line-0">
<li id="pragma-line-0">Test item</li>
</ul>

According to the HTML 4.01 spec (https://www.w3.org/TR/html4/struct/global.html#h-7.5.2), the id attribute value "must be unique in a document", and HTML5 keeps the uniqueness requirement. As such, the HTML generated by this extension would appear to be invalid, right?

Just want to make sure I'm not misusing or misunderstanding the extension.

@xoofx commented on GitHub (Sep 23, 2022):

Sorry, I forgot the details, but you need to add .UsePreciseSourceLocation() to your pipeline after the pragma extension.

It is not perfect and there are a few glitches here and there, so you likely won't find an exact solution to your requirement, but you may find some inspiration in this extension (e.g. where the extension uses lines, you could use a unique id per block instead). Matching a MarkdownObject against just the raw HTML output is unlikely to be safe enough.

@pm64 commented on GitHub (Sep 23, 2022):

Thank you @xoofx, I agree a solution seems within reach, and I appreciate you working through this with me. I tried adding .UsePreciseSourceLocation() to the pipeline after pragma and it did not affect the result. Do you recall if anything else is needed to render unique ids with that extension?

@xoofx commented on GitHub (Sep 23, 2022):

> and it did not affect the result

You have only one line, so the line index is the same 🙂

Copy the extension into your codebase, replace block.Line with your own unique id per MarkdownObject, and keep a dictionary that maps each object to its id. Profit.

@pm64 commented on GitHub (Sep 23, 2022):

I see, so use something like block.GetHashCode() instead of block.Line. Makes sense.

Is it possible to access the Markdown document source text from within an IMarkdownExtension? If so, I can solve my problem by embedding Markdown fragments in the HTML as attribute values, like <li data-md="- Test item">Test item</li>.

Edit: I see it's possible to just pass the Markdown source to the extension as an option, just want to confirm the IMarkdownExtension doesn't already have access to it somehow.

@xoofx commented on GitHub (Sep 23, 2022):

> I see, so use something like block.GetHashCode() instead of block.Line .. makes sense.

Not a hash code: it's not unique and not fun to read. 🙂 Use a unique id instead (keep a Dictionary<MarkdownObject, int> and maintain it as you visit the graph).

> Is it possible to access the Markdown document source text from within an IMarkdownExtension? If so, I can solve my problem by embedding Markdown fragments in the HTML as attribute values, like <li data-md="- Test item">Test item</li>.

You can access the MarkdownDocument, which is what most extensions do (e.g. the Pragma line extension just above). The source text is less accessible: it's possible but complicated, or you could go through the source span/location and extract the Markdown source from that. You can add attributes directly yourself with an extension.
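
The "unique id per object plus a dictionary" idea could be sketched roughly as follows. This is only an illustration, not the Pragma line extension itself: it assumes the Markdig NuGet package, it walks the document after parsing instead of hooking in as an IMarkdownExtension, and the class name and the data-md-id attribute name are made up. The Markdig calls it relies on are Descendants(), GetAttributes(), AddProperty() and the HtmlRenderer setup from the first post.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Markdig;
using Markdig.Renderers;
using Markdig.Renderers.Html;
using Markdig.Syntax;

public static class BlockIdTagger
{
    // Hypothetical attribute name; Markdig does not define it.
    public const string AttributeName = "data-md-id";

    // Walks the parsed document, gives every block a sequential id, and
    // records that id both in a dictionary and as an HTML attribute.
    public static Dictionary<MarkdownObject, int> Tag(MarkdownDocument document)
    {
        var ids = new Dictionary<MarkdownObject, int>();
        foreach (MarkdownObject obj in document.Descendants())
        {
            if (!(obj is Block)) continue; // inlines are skipped in this sketch
            int id = ids.Count;            // 0, 1, 2, ... in visiting order
            ids.Add(obj, id);
            obj.GetAttributes().AddProperty(
                AttributeName, id.ToString(CultureInfo.InvariantCulture));
        }
        return ids;
    }

    // Renders the already tagged document the same way as the first post.
    public static string Render(MarkdownDocument document, MarkdownPipeline pipeline)
    {
        var writer = new StringWriter();
        var renderer = new HtmlRenderer(writer);
        pipeline.Setup(renderer);
        renderer.Render(document);
        writer.Flush();
        return writer.ToString();
    }
}
```

Typical use would be to call Markdown.Parse, then BlockIdTagger.Tag, then BlockIdTagger.Render, and to invert the dictionary when going from an id found in the HTML back to its MarkdownObject. One caveat: a paragraph inside a tight list item gets an id in the dictionary, but since no <p> tag is written for it (see the ListBlock output in the first post), that id is not expected to show up in the HTML.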

@pm64 commented on GitHub (Sep 23, 2022):

Sorry, would you mind elaborating on the "through the source span/location" bit? I see that block.Span gives me the Start and Length of the source fragment, but are you saying the full source is also available, or should I just pass that in as an option to the extension?

@xoofx commented on GitHub (Sep 23, 2022):

> block.Span gives me the Start and Length of the source fragment, but are you saying the full source is also available, or should I just pass that in as an option to the extension?

No, just the span. The source is not passed to the Markdown extension, but since you necessarily have it somewhere else, that should not be a problem.
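
Since the caller still holds the source string, the span can be turned back into a Markdown fragment with a plain substring. The following is a minimal sketch, assuming the Markdig NuGet package and assuming the spans index into exactly the string that was passed to Markdown.Parse. The class name is made up, and data-md is the attribute name proposed earlier in this thread, not a Markdig feature.

```csharp
using Markdig.Renderers.Html;
using Markdig.Syntax;

public static class SourceFragments
{
    // Attribute name taken from the data-md idea above (hypothetical).
    public const string AttributeName = "data-md";

    // Returns the slice of the original Markdown covered by obj.Span,
    // or an empty string when the span is empty or out of range.
    public static string Slice(string source, MarkdownObject obj)
    {
        SourceSpan span = obj.Span;
        if (span.IsEmpty || span.Start < 0 || span.End >= source.Length)
            return string.Empty;
        return source.Substring(span.Start, span.Length);
    }

    // Attaches the source slice of every block as an HTML attribute.
    public static void Attach(MarkdownDocument document, string source)
    {
        foreach (MarkdownObject obj in document.Descendants())
        {
            if (obj is Block)
                obj.GetAttributes().AddProperty(AttributeName, Slice(source, obj));
        }
    }
}
```

After Markdown.Parse(source, pipeline), calling SourceFragments.Attach(document, source) and then rendering the document with an HtmlRenderer, as in the first post, should put the attribute on every tag that is actually written. Building the pipeline with .UsePreciseSourceLocation() is the safer choice here, as suggested earlier in the thread.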

@pm64 commented on GitHub (Sep 23, 2022):

OK, then it seems I have an ideal path. Immense thanks for being so generous with your time and expertise; it's greatly appreciated.

Reference: starred/markdig#564