How to remove things like CodeBlocks from ToPlainText rendering #669

Open
opened 2026-01-29 14:42:33 +00:00 by claunia · 3 comments

Originally created by @kaylumah on GitHub (Apr 5, 2024).

Hi Xoofx,

The repo does not have discussions enabled, so I am submitting it here. I apologise in advance if there is a better place to put this kind of question.

For my blog I am looking into a clean way to count the number of words present in a specific article.

I came across the `ToPlainText` method for my Markdown. That appears to make it mostly clean text.
However, it leaves in things like the code blocks (my blog is technical, so lots of code snippets).
Is there an extension point I missed, in which I can remove code blocks from the PlainText view?

Any pointers would be appreciated

Thanks for the awesome work you did on both Markdig and Scriban
Max

claunia added the question label 2026-01-29 14:42:33 +00:00

@xoofx commented on GitHub (Apr 5, 2024):

> Is there an extension point I missed, in which I can remove code blocks from the PlainText view?

Not that I'm aware of, but you can just take the Markdown AST, search for and remove the code blocks, and call PlainText afterwards.

In my own blog post engine I do it differently: I convert to HTML and extract the text from there with NUglify [here](https://github.com/lunet-io/lunet/blob/8595d9caf7acedfde3ae9aa360470fcb372c4249/src/Lunet.Summarizer/SummarizerHelper.cs).

@kaylumah commented on GitHub (Apr 5, 2024):

I don't see an equivalent ToText as an extension: https://github.com/xoofx/markdig/blob/8e22754db405ccd1ac6b01eca33c8931c56cf6c1/src/Markdig/Markdown.cs#L136

So, based on https://github.com/xoofx/markdig/blob/8e22754db405ccd1ac6b01eca33c8931c56cf6c1/src/Markdig/Markdown.cs#L240

I think I need to do something like this:

```csharp
// Requires: using System.IO; using Markdig; using Markdig.Renderers; using Markdig.Syntax;
StringWriter writer = new StringWriter();
MarkdownDocument document = Markdown.Parse(source, pipeline);
// TODO: remove code blocks found via document.Descendants

// Configure the HTML renderer to emit plain text instead of markup.
HtmlRenderer renderer = new HtmlRenderer(writer)
{
    EnableHtmlForBlock = false,
    EnableHtmlForInline = false,
    EnableHtmlEscape = false,
};
pipeline.Setup(renderer);

renderer.Render(document);
writer.Flush();
string result = writer.ToString();
return result;
```

@BeneHenke commented on GitHub (Sep 10, 2024):

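You can iterate through your MarkdownDocument and remove blocks like this.

```csharp
// Requires: using System.Linq; using Markdig.Syntax;
// ToList() takes a snapshot so the tree is not modified while it is being enumerated.
foreach (CodeBlock item in document.Descendants<CodeBlock>().ToList())
{
    // Remove through the block's own parent so that code blocks nested
    // inside lists or quotes are removed as well, not only top-level ones.
    item.Parent?.Remove(item);
}
```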
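
Putting the pieces from this thread together, a word count that ignores code blocks could look roughly like the sketch below. This is a minimal sketch, not an official Markdig API: the class `PlainTextWithoutCode` and its methods `Render` and `CountWords` are hypothetical names, the renderer settings are copied from the snippet earlier in this thread, and splitting on whitespace is only one assumed definition of a "word". Inline code spans (`CodeInline`) are left in place here.

```csharp
using System;
using System.IO;
using System.Linq;
using Markdig;
using Markdig.Renderers;
using Markdig.Syntax;

// Hypothetical helper; the names are illustrative and not part of Markdig.
public static class PlainTextWithoutCode
{
    public static string Render(string source, MarkdownPipeline pipeline)
    {
        MarkdownDocument document = Markdown.Parse(source, pipeline);

        // Snapshot the matches first, then detach each block from its parent.
        // Going through Parent also covers code blocks nested in lists or quotes.
        foreach (CodeBlock block in document.Descendants<CodeBlock>().ToList())
        {
            block.Parent?.Remove(block);
        }

        // Same plain-text renderer configuration as in the earlier comment.
        using var writer = new StringWriter();
        var renderer = new HtmlRenderer(writer)
        {
            EnableHtmlForBlock = false,
            EnableHtmlForInline = false,
            EnableHtmlEscape = false,
        };
        pipeline.Setup(renderer);
        renderer.Render(document);
        writer.Flush();
        return writer.ToString();
    }

    // Assumption: a "word" is any run of non-whitespace characters.
    public static int CountWords(string text) =>
        text.Split(new[] { ' ', '\t', '\r', '\n' },
                   StringSplitOptions.RemoveEmptyEntries).Length;
}
```

It would be called along the lines of `PlainTextWithoutCode.CountWords(PlainTextWithoutCode.Render(markdown, new MarkdownPipelineBuilder().Build()))`. Both fenced and indented code blocks should be covered, since `FencedCodeBlock` derives from `CodeBlock`.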
Reference: starred/markdig#669