Includes Extension #62

Closed
opened 2026-01-29 14:24:18 +00:00 by claunia · 14 comments

Originally created by @daveaglick on GitHub (Oct 18, 2016).

It would be cool to be able to include content from another file before processing. There's precedent for this functionality in other Markdown implementations:

  • This extension for Python-Markdown (https://github.com/cmacmackin/markdown-include) uses the syntax {!filename!}.
  • This NPM module (https://www.npmjs.com/package/markdown-include) uses the syntax #include filename to pre-process Markdown files for inclusion.
  • Pandoc-Include (https://hackage.haskell.org/package/pandoc-include) uses code fences with the include label to indicate inclusion, though it doesn't appear to be general-purpose inclusion since it still keeps the code block.
  • This Python preprocessor (https://noswap.com/projects/markdownpp#includes) uses !INCLUDE filename.

Of these, I prefer the {!filename!} syntax, but I don't have a strong preference. I would happily defer on the syntax if you have thoughts on this. A final requirement is that this should work recursively.

I'd also like to see the includes processed through a swappable I/O abstraction. The default would use the local file system, but an alternate implementation could be provided to the extension. It wouldn't have to be a full file system abstraction; in the simplest form, this interface would just ask for content given a path. Perhaps this flexibility could be supported with a delegate that can be specified when activating the extension. Regardless, it's important for use cases like mine (Wyam, http://wyam.io) where all file system interaction is through the host.
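As a rough illustration of that idea, the swappable abstraction could be nothing more than a delegate on an options object; IncludeOptions and GetContent are hypothetical names for this sketch, not anything that exists in Markdig:

```
using System;
using System.IO;

// Hypothetical options for an include extension: the host supplies GetContent,
// and the default simply falls back to the local file system.
public class IncludeOptions
{
    // Given an include path, return the raw content to splice in before parsing.
    public Func<string, string> GetContent { get; set; } = path => File.ReadAllText(path);
}

// A host such as Wyam could swap in its own I/O, e.g.:
// var options = new IncludeOptions { GetContent = path => virtualFileSystem.ReadText(path) };
// (virtualFileSystem is a placeholder for whatever abstraction the host provides.)
```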

I'm going to attempt an implementation and PR for this if you think it's in-scope.

claunia added the enhancement label 2026-01-29 14:24:18 +00:00

@xoofx commented on GitHub (Oct 18, 2016):

While it could be implemented, I'm not really convinced that it should be part of a Markdown processor. Usually, this kind of thing is done through a text templating engine on top of Markdown (or whatever text files). For example, scriban (https://github.com/lunet-io/scriban) provides an include directive (https://github.com/lunet-io/scriban/blob/master/doc/language.md#99-include-name-arg1argn)... The same goes for Liquid, Handlebars, etc. (And once that Pandora's box is opened, there is the risk of follow-up requests for loops, conditions, variables, etc.)
I'm intrigued: why isn't Wyam using Razor for such things?

@daveaglick commented on GitHub (Oct 18, 2016):

Yeah, I generally agree with the sentiment about things like includes, variable substitutions, control flow, etc. being the job of templating engines instead of markup processors.

This stems from an interesting use case I had. The conventional usage pattern in Wyam is to set up some pipelines that process Markdown files first and then run both the rendered Markdown and any additional Razor files through a Razor processor to add layouts, process includes, etc.

That usually works great. However, I ran across a pattern where it doesn't work when dealing with artifacts from an externally obtained source like a git repository (I'm working on using Wyam for API docs). In those cases, the Markdown files (like contributing.md) aren't part of the site data. They don't have site-specific front matter and aren't placed into the appropriate folder structure for navigation. To support the static build you need another file (Markdown or otherwise) that has the correct front matter, etc. But it needs to include the artifact from the repo. If that include is done as a Razor file, we've already processed any Markdown content so you'll just get the raw Markdown markup. You could include these types of files onesy-twosy in the build by copying them, but I don't love that either. It's kind of a chicken-and-egg problem.

Anyway, TL;DR: the Markdig extensibility story is fantastic. I can see arguments for this feature being included in the core distribution, and just as valid reasons why it shouldn't be. I'll probably implement it either way, so I'll defer to you on whether you want me to build it as a generic extension and submit a PR or keep it local to Wyam. No hard feelings if you'd prefer to keep this out of the official distribution.

@xoofx commented on GitHub (Oct 18, 2016):

I see. So yep, I think it would be best developed outside the core implementation for now, and if there is more interest in this, I'll reconsider.
The tricky part is that it will require some internal changes to Markdig (ah! an extension system is always limited!), as you will need to replace the string currently being parsed with a new, expanded string... so most likely a PR will be needed to open up this possibility (note that the source code map/span will not work in this mode).

@daveaglick commented on GitHub (Oct 18, 2016):

Okay, great - I'll see how far I get.

I was thinking I might get around adding the included, unprocessed content to the current pipeline by recursively calling Markdown.ToHtml() from within the extension and then just passing the result to the renderer. That obviously has some limitations, like each included file having independent state, but maybe that's for the best anyway.

Now that I think about it, maybe there's potential for a very general DelegateExtension. It could use a user-specified block delimiter to identify blocks. Then a user-supplied Func<string, string> that takes the original content inside the block and returns the HTML to render would provide the rendering.

So for this particular case, I'd use it like this:

```
var pipelineBuilder = new MarkdownPipelineBuilder();
pipelineBuilder.Configure(_configuration);
var includeExtension = new DelegateExtension(
    "{!", "!}",
    input =>
    {
        var nestedContent = GetFile(input);
        var nestedPipeline = pipelineBuilder.Build();
        return Markdown.ToHtml(nestedContent, nestedPipeline);
    });
pipelineBuilder.Extensions.Add(includeExtension);
var pipeline = pipelineBuilder.Build();
var result = Markdown.ToHtml(content, pipeline);
```

It seems like that might open the door to more special-purpose extensions. What do you think?

@xoofx commented on GitHub (Oct 19, 2016):

> I was thinking I might get around adding the included unprocessed content to the current pipeline by recursively calling Markdown.ToHtml()

Good idea. This is not entirely the expected behavior for a renderer, but it is an acceptable implementation and it solves the span problem without needing to modify the internals. 👍

> It seems like that might open the door to more special-purpose extensions.

You could, but... while developing all the existing extensions I never really had a need for this, as most of the time special-purpose extensions require very specific handling/matching/rendering. So I'm not sure it's worth it, but by all means, you can experiment with it.

@daveaglick commented on GitHub (Oct 28, 2016):

I ended up solving this within my own project, and after thinking about it I tend to agree that file inclusion should be out of scope for the Markdown renderer. I'll close the issue - thanks for the feedback and tips!

@cpfr commented on GitHub (Oct 5, 2017):

Sorry for resurrecting a one-year-old issue, but I am also interested in an import extension and I would love to see how you (@daveaglick) solved this within your own project. A problem I encountered was that Markdig only works with strings and has no clue about file paths. This is problematic for a potential include extension, because included file paths should be relative to the main file. A preprocessor, on the other hand, would have to understand at least a bit of Markdown syntax in order to not include things inside code blocks or in the middle of a sentence. How did you work around those problems?

@xoofx commented on GitHub (Oct 5, 2017):

> This is problematic for a potential include extension, because included file paths should be relative to the main file.

@cpfr I don't think there is anything problematic here, as it is up to the extension to define what a relative path is relative to and how it will handle it (errors, etc.).

You can have a look at the in-development port of the docfx markdown engine to Markdig, with its inclusion extension (https://github.com/docascode/markdown/tree/dev/MarkdigEngine.Extensions/Inclusion) for Markdown.

The idea is that you can propagate whatever context you want through a Markdig extension.

> A preprocessor, on the other hand, would have to understand at least a bit of Markdown syntax in order to not include things inside code blocks or in the middle of a sentence. How did you work around those problems?

In the case of a templating system on top of Markdown, it doesn't have to understand Markdown, just its own templating language. It is a different approach (with different features) from something integrated into the Markdown parsing.
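For illustration, one concrete way an extension (or a preprocessor in front of Markdig) could define what relative paths resolve against is to carry the including file's directory through each recursive expansion. This is only a sketch with a hypothetical helper, not the docfx implementation:

```
using System.IO;

public static class IncludePaths
{
    // Resolve an include target relative to the file that contains the include.
    public static string ResolveInclude(string includingFile, string includePath)
    {
        string baseDir = Path.GetDirectoryName(Path.GetFullPath(includingFile)) ?? ".";
        return Path.GetFullPath(Path.Combine(baseDir, includePath));
    }
}

// e.g. an include of "sections/intro.md" found inside "docs/guide.md" resolves to the
// absolute path of "docs/sections/intro.md", and that resolved path then serves as the
// base for any includes nested inside intro.md.
```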

@daveaglick commented on GitHub (Oct 5, 2017):

@cpfr I ended up going out of band for includes. Markdown (and every other file) is processed by a separate bit of code that looks for a specific include syntax and places the included file(s) inside the Markdown content before it ever reaches Markdig. That made sense for me since my project processes lots of different kinds of files and this was a way to standardize includes across all of them.
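For readers curious what such an out-of-band pass might look like, here is a rough sketch assuming a {!filename!} include syntax and plain local-file I/O; it is not the actual Wyam module, and it deliberately ignores edge cases such as include cycles and includes inside code blocks:

```
using System.IO;
using System.Text.RegularExpressions;

public static class IncludePreprocessor
{
    private static readonly Regex Include = new Regex(@"\{!\s*(?<path>[^!]+?)\s*!\}");

    // Recursively expand {!path!} markers before the text ever reaches Markdig.
    // Note: no cycle detection, so mutually-including files would recurse forever.
    public static string Expand(string text, string baseDir)
    {
        return Include.Replace(text, match =>
        {
            string path = Path.Combine(baseDir, match.Groups["path"].Value);
            string included = File.ReadAllText(path);
            return Expand(included, Path.GetDirectoryName(path) ?? baseDir);
        });
    }
}

// Usage: expand first, then render as usual, e.g.
// string html = Markdig.Markdown.ToHtml(IncludePreprocessor.Expand(source, "docs"), pipeline);
```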

@cpfr commented on GitHub (Oct 9, 2017):

Thanks for the replies. I finally got my import extension working. Here's a hint for everyone who also wants to implement one: since the AST nodes don't contain any strings for the text content, but only provide views into the original string, the nodes are coupled to the parser classes. I originally intended to directly append the parse results of the imported documents to the main document tree, without the need to write a custom renderer. It turned out that this was a bad idea, due to the parent-node references in the sub-document. Instead, I implemented an ImportBlockNode that contains the parsed sub-document and an ImportBlockRenderer that simply renders the sub-document in the ImportBlockNode.
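As a companion to that hint, here is a bare-bones sketch of the node/renderer pairing being described. The class names follow the comment; the block parser that would actually create ImportBlockNode instances is omitted, and the rest is an assumption rather than cpfr's actual code:

```
using Markdig.Parsers;
using Markdig.Renderers;
using Markdig.Renderers.Html;
using Markdig.Syntax;

// Holds the fully parsed sub-document produced from the included file.
public class ImportBlockNode : LeafBlock
{
    public ImportBlockNode(BlockParser parser) : base(parser) { }

    public MarkdownDocument SubDocument { get; set; }
}

// Renders the sub-document in place, so its nodes never need to be
// re-parented into the main document tree.
public class ImportBlockRenderer : HtmlObjectRenderer<ImportBlockNode>
{
    protected override void Write(HtmlRenderer renderer, ImportBlockNode block)
    {
        if (block.SubDocument != null)
        {
            // MarkdownDocument is a ContainerBlock, so its children can be written directly.
            renderer.WriteChildren(block.SubDocument);
        }
    }
}
```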

@xoofx commented on GitHub (Oct 9, 2017):

@cpfr thanks for the feedback, good catch about the parent thingy; it sounds like a suitable approach to sub-render a doc.

@AceTheWiz commented on GitHub (Jan 25, 2021):

> Thanks for the replies. I finally got my import extension working. Here's a hint for everyone who also wants to implement one: since the AST nodes don't contain any strings for the text content, but only provide views into the original string, the nodes are coupled to the parser classes. I originally intended to directly append the parse results of the imported documents to the main document tree, without the need to write a custom renderer. It turned out that this was a bad idea, due to the parent-node references in the sub-document. Instead, I implemented an ImportBlockNode that contains the parsed sub-document and an ImportBlockRenderer that simply renders the sub-document in the ImportBlockNode.

Hi @cpfr, can you possibly share your solution?

@cpfr commented on GitHub (Jan 27, 2021):

Hi,

Unfortunately I am not able to share the solution, because I don't have access to it anymore and, to be honest, I don't really remember the details since this was more than three years ago. I bet the current Markdig source code would be incompatible with my old solution by now (because I haven't updated it since then).

@AceTheWiz commented on GitHub (Jan 29, 2021):

Understood. Thank you for taking the time to answer.
