Intercepting builtin tags? #57

Open
opened 2026-01-29 14:23:56 +00:00 by claunia · 10 comments

Originally created by @yetanotherchris on GitHub (Sep 26, 2016).

In the readme you mention you can plug into the core parsing:

Even the core Markdown/CommonMark parsing is pluggable, so it allows to disable builtin Markdown/Commonmark parsing (e.g Disable HTML parsing) or change behaviour (e.g change matching # of a headers with @)

Is there an example of this you could share? I'm looking specifically for image and link tags (as I mentioned in your blog post) - I want to rewrite the urls.

The MarkdownSharp way of doing this is to hack the source, for example add an event handler call inside private string DoImages(string text). Given your architecture I'm guessing it's a lot less messy in Markdig.

claunia added the enhancement label 2026-01-29 14:23:56 +00:00

@xoofx commented on GitHub (Sep 26, 2016):

There are currently no public callbacks specifically for post-processing image links (or any block/inline elements, in fact). The only callback exposed is MarkdownPipelineBuilder.DocumentProcessed, from which you can post-process a MarkdownDocument.

You can also do this by calling Markdown.Parse directly, post-processing the document, and then rendering it with an HtmlRenderer.

Then you can iterate over the inline link elements like this: doc.Descendants().OfType<LinkInline>()

That's the easiest solution for now, though it is not the most efficient one, as the Descendants() method walks through all blocks and inlines only to return the ones you are interested in.

While developing Markdig, I tried to add some callbacks, but I was not really satisfied with the impact they had (in terms of performance, the verbosity they impose on extension developers, etc.). The problem is that some extensions transform the tree, and I was not sure how to handle this nicely. For example:

  • a block could be created first and we get an event, then another extension replaces it with another one: what should we do? Should we send an event saying that a previously created block doesn't exist anymore?
  • would a callback want to iterate over the tree being built (get the parent of the element, etc.), even if it is still not complete/stable?

So there is some more thinking/work to be done in order to support this kind of scenario efficiently. I might be able to have a look later this week.
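
For readers who want to try the DocumentProcessed route mentioned above, here is a minimal sketch. It assumes the event hands the parsed MarkdownDocument to the handler once parsing has finished; the class name, the Render method and the baseUrl parameter are hypothetical names chosen for illustration, not part of Markdig.

```csharp
using System;
using System.Linq;
using Markdig;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

public static class DocumentProcessedExample
{
    // Hypothetical helper: prefixes every relative link/image URL with baseUrl.
    public static string Render(string markdown, string baseUrl)
    {
        var builder = new MarkdownPipelineBuilder();

        // The handler runs after the whole document has been parsed,
        // so the tree is complete when we walk it.
        builder.DocumentProcessed += document =>
        {
            foreach (LinkInline link in document.Descendants().OfType<LinkInline>())
            {
                // Same "educated guess" used elsewhere in this thread:
                // anything not starting with "http" is treated as relative.
                if (!string.IsNullOrEmpty(link.Url) &&
                    !link.Url.StartsWith("http", StringComparison.OrdinalIgnoreCase))
                {
                    link.Url = baseUrl + link.Url;
                }
            }
        };

        MarkdownPipeline pipeline = builder.Build();
        return Markdown.ToHtml(markdown, pipeline);
    }
}
```

Because the rewrite is attached to the pipeline, every call that parses with that pipeline picks it up without a separate post-processing step at each call site.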


@Kryptos-FR commented on GitHub (Sep 26, 2016):

I'm also interested in this. Currently I can render a MarkdownDocument into XAML (as text) with my custom renderer.

But for creating a WPF document (i.e. an instance of the FlowDocument class), using a renderer is not ideal: some post-processing and transformations are required. Should I work directly on the syntax tree inside the MarkdownDocument?


@xoofx commented on GitHub (Sep 26, 2016):

@Kryptos-FR I'm not sure that the feature requested here (a selective callback that avoids revisiting the tree) would help your work. In your case, you need to traverse all block and inline elements and create a WPF tree from them. The renderer mostly provides a visitor infrastructure, but you can roll your own if it doesn't match your process. If you find no way to do this efficiently with the current API, or if something missing in the renderer API could be changed to help you, feel free to open another issue and we will look at that problem separately.
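
As an illustration of "rolling your own" traversal, the sketch below walks every block and inline by hand and records the node types as an indented outline. It is only a stand-in for building a real target tree such as a FlowDocument; the TreeDumper class and its methods are hypothetical names, and the traversal relies on ContainerBlock and ContainerInline being enumerable and on LeafBlock exposing its inlines through the Inline property.

```csharp
using System.Text;
using Markdig;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

public static class TreeDumper
{
    // Hypothetical entry point: parses the text and returns an indented outline.
    public static string Dump(string markdown)
    {
        var output = new StringBuilder();
        Visit(Markdown.Parse(markdown), 0, output);
        return output.ToString();
    }

    private static void Visit(MarkdownObject node, int depth, StringBuilder output)
    {
        // A real converter would create the matching target element here.
        output.Append(' ', depth * 2).AppendLine(node.GetType().Name);

        switch (node)
        {
            case ContainerBlock container:
                // Blocks that hold other blocks (document, quote, list...).
                foreach (Block child in container)
                    Visit(child, depth + 1, output);
                break;

            case LeafBlock leaf when leaf.Inline != null:
                // Leaf blocks (paragraph, heading...) hold a chain of inlines.
                Visit(leaf.Inline, depth + 1, output);
                break;

            case ContainerInline inlines:
                // Inlines that hold other inlines (emphasis, link...).
                foreach (Inline child in inlines)
                    Visit(child, depth + 1, output);
                break;
        }
    }
}
```

Each node is visited exactly once, and the parent is always handled before its children, which is usually what a tree-to-tree conversion needs.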


@yetanotherchris commented on GitHub (Sep 26, 2016):

Thanks for the pointer. The AST is actually fine for my needs, although a walker (similar to the pattern ANTLR uses) might be a good strategy going forward. The way it works now is fairly intuitive; it just needs a few docs. I'll happily add some examples.

Here's how I got it working for now. I haven't tested it with large documents yet, but I can't see there being an issue.

class Program
{
    static void Main(string[] args)
    {
        var doc = Markdown.Parse("This [link test](http://www.google.com) is a text with some *emphasis*");

        Walk(doc);

        var builder = new StringBuilder();
        var textwriter = new StringWriter(builder);

        var renderer = new HtmlRenderer(textwriter);
        renderer.Render(doc);

        Console.WriteLine(builder.ToString());
        Console.WriteLine("");
        Console.WriteLine("Press any key...");
        Console.ReadKey();
    }

    static void Walk(MarkdownObject markdownObject)
    {
        foreach (MarkdownObject child in markdownObject.Descendants())
        {
            // A LinkInline can be either an image or an <a href="...">
            LinkInline link = child as LinkInline;
            if (link != null)
            {
                HtmlAttributes attributes = link.GetAttributes();
                if (attributes == null)
                {
                    attributes = new HtmlAttributes();
                    attributes.Classes = new List<string>();
                }

                if (attributes.Classes == null)
                {
                    attributes.Classes = new List<string>();
                }

                attributes.Classes.Add("btn");
                attributes.Classes.Add("btn-primary");

                link.SetAttributes(attributes);
                Console.WriteLine(link.Url);
            }
        }
    }
}

Edit by @MihaZupan: Remove the recursive call to Walk that would cause N^2 visits.


@jasel-lewis commented on GitHub (Sep 26, 2019):

@xoofx First off, LOVE Markdig - THANK YOU!

+1 for me on this topic as well. I'm using Markdig within an ASP.NET MVC app and would like to manipulate the URLs generated for an inline image. The static Markdown content resides within a route construct and I'd like to pass in the controller and action names so that I can just use the image's filename in the Markdown (i.e. ![Alternate Image Title](filename.jpg)*Image Caption*) and get a full absolute path in the HTML output.

I was super excited when I noticed the GetDynamicUrl property of a LinkInline and I see what AutoIdentifierExtension is doing with it, but my hopes were dashed when I noticed the InlineProcessor does not fire any events such as the Closed event that AutoIdentifierExtension is using on the HeadingBlockParser.

I read your reply above and I understand the complexities involved. I'll probably end up just walking the Descendants as provided in the code sample posted by @yetanotherchris (thanks, @yetanotherchris!!). Nevertheless, it would be SUPER nice to hook into the processors using delegates to manipulate certain properties of the differing Syntaxes.


@MihaZupan commented on GitHub (Sep 26, 2019):

@jasel-lewis Does the Func<string, string> LinkRewriter exposed on the HtmlRenderer solve your use case?
renderer.LinkRewriter = link => "somethingElse/" + link;

I feel that post-processing the MarkdownDocument at the end is more appropriate for such changes.

You should know that now there is a Descendants<T>() method available to make simple modifications easier. It is currently missing an overload where T: Inline, but that is a simple PR change away.
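
A fuller version of the LinkRewriter one-liner above might look like the following sketch. The class name, method name and prefix parameter are hypothetical; the pipeline is a default one built only for the example.

```csharp
using System.IO;
using Markdig;
using Markdig.Renderers;
using Markdig.Syntax;

public static class LinkRewriterExample
{
    // Hypothetical helper: renders markdown with every URL prefixed by `prefix`.
    public static string RenderWithPrefix(string markdown, string prefix)
    {
        MarkdownPipeline pipeline = new MarkdownPipelineBuilder().Build();

        var writer = new StringWriter();
        var renderer = new HtmlRenderer(writer);

        // Called for each URL the renderer writes; it only receives the
        // URL string, not the syntax node the URL belongs to.
        renderer.LinkRewriter = link => prefix + link;

        pipeline.Setup(renderer);

        MarkdownDocument document = Markdown.Parse(markdown, pipeline);
        renderer.Render(document);
        writer.Flush();
        return writer.ToString();
    }
}
```

Since the delegate sees only a string, it cannot distinguish an image from an ordinary link, so this fits best when every URL should be rewritten the same way.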


@jasel-lewis commented on GitHub (Sep 27, 2019):

@MihaZupan Nice find! ...but unfortunately, no. Using LinkRewriter rewrites every link (even header references). I want to solely rewrite image links (because they exist in static-content folders that are physically buried within the MVC construct). There is no way to tell (with LinkRewriter) if the link currently being rewritten belongs to a LinkInline.

I like the way you think, however. I may create a PR which does something similar and adds a LinkRewriter delegate property to the LinkInlineRenderer because you can do something like this: htmlRenderer.ObjectRenderers.Find<LinkInlineRenderer>();.

As an extension to my prior post, this is how I went about modifying @yetanotherchris's solution - just in case any future on-looker cared:

private void Walk(MarkdownObject markdownObject)
{
    var links = markdownObject
        .Descendants()
        .Where(o => o is LinkInline)
        .Cast<LinkInline>()
        .Where(l => l.IsImage && !l.Url.StartsWith("http"));

    foreach (var link in links)
    {
        link.GetDynamicUrl = () =>
            Markdig.Helpers.HtmlHelper.Unescape(this.absoluteUrlPath + link.Url);
    }
}

Note: The !l.Url.StartsWith("http") is to make an educated guess to ensure we didn't already assign a static URL within the Markdown syntax.

Note2: The code was originally a recursive function, per the code above from @yetanotherchris. As @MihaZupan pointed out, the Walk(child) call causes a huge and unnecessary performance issue. I got rid of it and reduced the number of objects being inspected via the LINQ query (until Descendants<T>() gets fleshed out for Inlines).
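
The StartsWith("http") guess described in the first note would also skip a relative file such as httpfoo.jpg. One possible stricter check, sketched here with a hypothetical helper name, parses the URL and looks at its scheme instead. It deliberately treats only http, https and protocol-relative URLs as already absolute, so other schemes (data:, mailto:) would need their own handling.

```csharp
using System;

public static class UrlChecks
{
    // Hypothetical helper: true when the URL already points at a remote
    // http(s) location and therefore should not be given a base path.
    public static bool IsRemoteUrl(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
            return false;

        // Protocol-relative URLs ("//cdn.example.com/a.png") are remote too.
        if (url.StartsWith("//", StringComparison.Ordinal))
            return true;

        // Checking the scheme explicitly avoids treating rooted paths as
        // absolute on platforms where they parse as file URIs.
        return Uri.TryCreate(url, UriKind.Absolute, out Uri parsed)
            && (parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps);
    }
}
```

In the query above, the filter would then read l.IsImage && !UrlChecks.IsRemoteUrl(l.Url).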


@MihaZupan commented on GitHub (Sep 27, 2019):

Descendants already walks through all the child nodes. Doing it again recursively means you're visiting nodes N^2 times.
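
To make the cost concrete, the sketch below counts node visits for a single Descendants() pass and for the recursive pattern that was removed from the sample above. The VisitCounter class is a hypothetical name used only for this comparison.

```csharp
using System.Linq;
using Markdig;
using Markdig.Syntax;

public static class VisitCounter
{
    // One pass: Descendants() already yields every nested node once.
    public static int CountFlat(MarkdownObject root)
    {
        return root.Descendants().Count();
    }

    // The removed pattern: calling Descendants() again on every node that
    // Descendants() returned, so deeper nodes are visited repeatedly.
    public static int CountRecursive(MarkdownObject root)
    {
        int visits = 0;
        foreach (MarkdownObject child in root.Descendants())
        {
            visits++;
            visits += CountRecursive(child);
        }
        return visits;
    }
}
```

Running both on the same parsed document, for example Markdown.Parse("> A *nested* [link](http://example.com)"), shows the recursive count exceeding the flat one, and the gap widens as nesting gets deeper.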


@JamesQMurphy commented on GitHub (Jan 14, 2020):

@jasel-lewis Thank you for posting that code! I'm just wondering if you or @yetanotherchris or anyone else considered hooking into the RendererBase.ObjectWriteBefore event. This event, along with the ObjectWriteAfter event, does make the MarkdownObject available, allowing you to determine if it's a LinkInline or not.

This is the approach I took:

public string RenderHtml(string markdown)
{
    if (markdown == null) throw new ArgumentNullException(nameof(markdown));

    var writer = new StringWriter();
    var renderer = new Markdig.Renderers.HtmlRenderer(writer);
    renderer.ObjectWriteBefore += Renderer_ObjectWriteBefore;
    pipeline.Setup(renderer);

    var document = Markdown.Parse(markdown, pipeline);
    renderer.Render(document);
    writer.Flush();
    return writer.ToString();
}

private void Renderer_ObjectWriteBefore(Markdig.Renderers.IMarkdownRenderer arg1, Markdig.Syntax.MarkdownObject obj)
{
    var link = obj as Markdig.Syntax.Inlines.LinkInline;
    if (link != null && link.IsImage && !(link.Url.StartsWith("http")))
    {
        link.Url = Markdig.Helpers.HtmlHelper.Unescape(this.absoluteUrlPath + link.Url);
    }
}

@xoofx Also a big fan of Markdig, so let me echo @jasel-lewis 's thanks! 😃


@MihaZupan commented on GitHub (Jan 14, 2020):

I haven't considered it before, but I suppose it should work just as well.

I personally prefer post-processing the AST prior to rendering like so:

MarkdownDocument document = Markdown.Parse(markdown, pipeline);

foreach (LinkInline link in document.Descendants().OfType<LinkInline>())
{
    if (link.IsImage && !link.Url.StartsWith("http"))
    {
        link.Url = HtmlHelper.Unescape("https://base.com/" + link.Url);
    }
}

renderer.Render(document);

Where
document.Descendants().OfType<LinkInline>()
can become
document.Descendants<LinkInline>() with a trivial PR.

Reference: starred/markdig#57