Support for extracting raw text, and replacing raw text. #205

Closed
opened 2026-01-29 14:30:20 +00:00 by claunia · 5 comments

Originally created by @pauldotknopf on GitHub (May 3, 2018).

I have a requirement to support generating translated markdown documents from POT/PO files.

I need to parse a document for all raw string values. I will use this to generate a POT document that translators will use to create new languages.
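
A minimal sketch of that extraction step, assuming the translatable strings are the `LiteralInline` nodes of the parsed document. `PotSketch`, `ExtractLiterals` and `ToPot` are hypothetical names, and the POT output omits the header entry and source references that real gettext tooling normally adds.

```c#
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Markdig;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

// Hypothetical helper, not part of Markdig.
public static class PotSketch
{
    // Collect the distinct, non-blank text of every LiteralInline in document order.
    public static List<string> ExtractLiterals(string markdown)
    {
        var pipeline = new MarkdownPipelineBuilder().UseAdvancedExtensions().Build();
        MarkdownDocument document = Markdown.Parse(markdown, pipeline);
        return document.Descendants()
            .OfType<LiteralInline>()
            .Select(literal => literal.Content.ToString())
            .Where(text => !string.IsNullOrWhiteSpace(text))
            .Distinct()
            .ToList();
    }

    // Escape backslashes, double quotes and newlines for a PO string literal.
    static string Escape(string value) =>
        value.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n");

    // Emit one msgid/msgstr pair per message, with an empty msgstr for translators to fill in.
    public static string ToPot(IEnumerable<string> messages)
    {
        var pot = new StringBuilder();
        foreach (var message in messages)
        {
            pot.Append("msgid \"").Append(Escape(message)).Append("\"\n");
            pot.Append("msgstr \"\"\n\n");
        }
        return pot.ToString();
    }
}
```

Calling `PotSketch.ToPot(PotSketch.ExtractLiterals(markdown))` would then give a template with one entry per literal. Whether single literals are the right translation unit is a separate question, since emphasis and links split a sentence into several literals.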

I also need to support replacing raw text during rendering to enable replacing text with the translated value.
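
The replacement side can be sketched as a lookup built from a PO file, falling back to the original text when no translation exists. This assumes single-line `msgid`/`msgstr` entries only (no plural forms, contexts or multi-line strings), and `PoLookupSketch` is a hypothetical name rather than an existing library.

```c#
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: a deliberately small PO reader, not a full gettext parser.
public static class PoLookupSketch
{
    // Turn a line such as: msgid "some \"text\"" into the unescaped string.
    static string Unquote(string line, string keyword)
    {
        var body = line.Substring(keyword.Length).Trim();
        if (body.Length < 2) return string.Empty;
        body = body.Substring(1, body.Length - 2); // strip the surrounding quotes
        var text = new StringBuilder();
        for (int i = 0; i < body.Length; i++)
        {
            if (body[i] == '\\' && i + 1 < body.Length)
            {
                char next = body[++i];
                text.Append(next == 'n' ? '\n' : next); // \n becomes a newline, \" and \\ lose the backslash
            }
            else
            {
                text.Append(body[i]);
            }
        }
        return text.ToString();
    }

    // Read msgid/msgstr pairs into a dictionary keyed by the source text.
    public static Dictionary<string, string> Parse(string po)
    {
        var catalog = new Dictionary<string, string>();
        string currentId = null;
        foreach (var raw in po.Split('\n'))
        {
            var line = raw.Trim();
            if (line.StartsWith("msgid \"", StringComparison.Ordinal))
            {
                currentId = Unquote(line, "msgid");
            }
            else if (line.StartsWith("msgstr \"", StringComparison.Ordinal) && currentId != null)
            {
                catalog[currentId] = Unquote(line, "msgstr");
                currentId = null;
            }
        }
        return catalog;
    }

    // Build a replacement callback that keeps the source text when the translation is missing or empty.
    public static Func<string, string> ToReplacement(IReadOnlyDictionary<string, string> catalog) =>
        text => catalog.TryGetValue(text, out var translated) && translated.Length > 0 ? translated : text;
}
```

The resulting `Func<string, string>` has the same shape as a callback that receives each raw string and returns its replacement.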

Any guidance would be greatly appreciated!


@pauldotknopf commented on GitHub (May 4, 2018):

Ok, so I figured it out. Let me know if I am heading down a wrong/stupid track?

Here is my test case, which illustrates my requirements.

```c#
[Fact]
public void Can_replace_text()
{
    var result = _markdownRenderer.TransformMarkdown("test", val => $"{val}-append");
    Assert.Equal("<p>test-append</p>", result);

    result = _markdownRenderer.TransformMarkdown("test*another*", val => $"{val}-append");
    Assert.Equal("<p>test-append<em>another-append</em></p>", result);

    var content = new StringBuilder();
    content.AppendLine("a | b");
    content.AppendLine("- | -");
    content.AppendLine("0 | 0");
    result = _markdownRenderer.TransformMarkdown(content.ToString(), val => $"{val}-append");
    content = new StringBuilder();
    content.AppendLine("<table>");
    content.AppendLine("<thead>");
    content.AppendLine("<tr>");
    content.AppendLine("<th>a-append</th>");
    content.AppendLine("<th>b-append</th>");
    content.AppendLine("</tr>");
    content.AppendLine("</thead>");
    content.AppendLine("<tbody>");
    content.AppendLine("<tr>");
    content.AppendLine("<td>0-append</td>");
    content.AppendLine("<td>0-append</td>");
    content.AppendLine("</tr>");
    content.AppendLine("</tbody>");
    content.Append("</table>");
    Assert.Equal(content.ToString(), result);
}
```

Here is what my internal implementation looks like.

```c#
public string TransformMarkdown(string markdown, Func<string, string> replacement)
{
    var pipeline = new MarkdownPipelineBuilder()
        .UseAdvancedExtensions()
        .Build();
    var document = Markdig.Markdown.Parse(markdown, pipeline);

    // Do a fake rendering that replaces content.
    var replacementRenderer = new TextReplacementRenderer(TextWriter.Null, replacement);
    pipeline.Setup(replacementRenderer);
    replacementRenderer.Render(document);

    using (var stringWriter = new StringWriter())
    {
        var htmlRenderer = new HtmlRenderer(stringWriter);
        pipeline.Setup(htmlRenderer);
        htmlRenderer.Render(document);
        stringWriter.Flush();
        return stringWriter.ToString().TrimEnd(Environment.NewLine.ToCharArray());
    }
}
```

The key part in the above code is my custom `TextReplacementRenderer`.

```c#
using System;
using System.IO;
using Markdig.Helpers;
using Markdig.Renderers;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

public class TextReplacementRenderer : TextRendererBase<TextReplacementRenderer>
{
    public TextReplacementRenderer(TextWriter writer, Func<string, string> replacement) : base(writer)
    {
        // Only paragraphs and literals are registered; other leaf blocks (e.g. headings) are not handled here.
        ObjectRenderers.Add(new CustomParagraphRenderer());
        ObjectRenderers.Add(new CustomLiteralInlineRenderer(replacement));
    }

    class CustomParagraphRenderer : MarkdownObjectRenderer<TextReplacementRenderer, ParagraphBlock>
    {
        protected override void Write(TextReplacementRenderer renderer, ParagraphBlock obj)
        {
            // Descend into the paragraph's inlines so the literal renderer below is reached.
            renderer.WriteLeafInline(obj);
        }
    }

    class CustomLiteralInlineRenderer : MarkdownObjectRenderer<TextReplacementRenderer, LiteralInline>
    {
        readonly Func<string, string> _replacement;

        public CustomLiteralInlineRenderer(Func<string, string> replacement)
        {
            _replacement = replacement;
        }

        protected override void Write(TextReplacementRenderer renderer, LiteralInline obj)
        {
            // Nothing is written; the literal is mutated in place so a later HTML rendering picks up the new text.
            obj.Content = new StringSlice(_replacement(obj.Content.ToString()));
        }
    }
}
```

The main thing I'm worried about is that the start/end/position properties used all over the place would be invalid. Should I be? Is this approach ok?
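
On the span question: assigning `Content` does not modify `Span`, so the spans keep describing positions in the original source rather than in the replaced text. That should only matter to code that reads the spans after the replacement. As an alternative to the fake rendering pass, the sketch below mutates the literals by walking the tree directly. It is an illustration only: `DirectReplaceSketch` is a hypothetical name, and unlike the renderer above it also touches literals outside paragraphs, such as heading text.

```c#
using System;
using System.IO;
using System.Linq;
using Markdig;
using Markdig.Helpers;
using Markdig.Renderers;
using Markdig.Syntax.Inlines;

// Hypothetical alternative to a separate replacement renderer.
public static class DirectReplaceSketch
{
    public static string Transform(string markdown, Func<string, string> replacement)
    {
        var pipeline = new MarkdownPipelineBuilder().UseAdvancedExtensions().Build();
        var document = Markdown.Parse(markdown, pipeline);

        // Snapshot the literals first, then replace each one's content in place.
        foreach (var literal in document.Descendants().OfType<LiteralInline>().ToList())
        {
            literal.Content = new StringSlice(replacement(literal.Content.ToString()));
        }

        using (var writer = new StringWriter())
        {
            var htmlRenderer = new HtmlRenderer(writer);
            pipeline.Setup(htmlRenderer);
            htmlRenderer.Render(document);
            // Drop the trailing line break the HTML renderer emits after the last block.
            return writer.ToString().TrimEnd('\r', '\n');
        }
    }
}
```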


@pauldotknopf commented on GitHub (May 6, 2018):

Ok, so I've done some more research, and here are my thoughts. Forgive me for thinking out loud on a public forum. Any input would be appreciated!

I've implemented a custom renderer that just outputs the syntax in an easy-to-read structure.

Markdown:

```
Test paragraph *test* with some em
testt

a | b
- | -
0 | 0

<div class="test"><img src="test.jpg" /></div>

And [this](/somewhere) is a ~~link~~.
And a new line without a space
```

Syntax:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<MarkdownDocument>
   <ParagraphBlock>
      <ContainerInline>
         <LiteralInline>Test paragraph</LiteralInline>
         <EmphasisInline>
            <LiteralInline>test</LiteralInline>
         </EmphasisInline>
         <LiteralInline>with some em</LiteralInline>
         <LineBreakInline>LineBreakInline</LineBreakInline>
         <LiteralInline>testt</LiteralInline>
      </ContainerInline>
   </ParagraphBlock>
   <Table>
      <TableRow>
         <TableCell>
            <ParagraphBlock>
               <ContainerInline>
                  <LiteralInline>a</LiteralInline>
               </ContainerInline>
            </ParagraphBlock>
         </TableCell>
         <TableCell>
            <ParagraphBlock>
               <ContainerInline>
                  <LiteralInline>b</LiteralInline>
               </ContainerInline>
            </ParagraphBlock>
         </TableCell>
      </TableRow>
      <TableRow>
         <TableCell>
            <ParagraphBlock>
               <ContainerInline>
                  <LiteralInline>0</LiteralInline>
               </ContainerInline>
            </ParagraphBlock>
         </TableCell>
         <TableCell>
            <ParagraphBlock>
               <ContainerInline>
                  <LiteralInline>0</LiteralInline>
               </ContainerInline>
            </ParagraphBlock>
         </TableCell>
      </TableRow>
   </Table>
   <HtmlBlock />
   <ParagraphBlock>
      <ContainerInline>
         <LiteralInline>And</LiteralInline>
         <LinkInline>
            <LiteralInline>this</LiteralInline>
         </LinkInline>
         <LiteralInline>is a</LiteralInline>
         <EmphasisInline>
            <LiteralInline>link</LiteralInline>
         </EmphasisInline>
         <LiteralInline>.</LiteralInline>
         <LineBreakInline>LineBreakInline</LineBreakInline>
         <LiteralInline>And a new line without a space</LiteralInline>
      </ContainerInline>
   </ParagraphBlock>
</MarkdownDocument>
```

I think I've identified the type whose raw contents will be translated: `ContainerInline`. I will have a custom renderer that simply passes the raw markdown through to the new content, except for `ContainerInline`, whose content will be translated.


@pauldotknopf commented on GitHub (May 6, 2018):

@xoofx, can `ContainerInline` be nested? I'm wondering if I need to support translations like `This is {0} a translation`, where `{0}` is replaced with the inner `ContainerInline`.
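
If nesting did need support, the placeholder idea could be sketched with `string.Format`, assuming the outer text is stored as a template and the inner content is rendered separately. `PlaceholderSketch` and `Translate` are hypothetical names, and the catalog is a plain dictionary standing in for loaded PO data.

```c#
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating placeholder-based translation of nested content.
public static class PlaceholderSketch
{
    // Look up the template, fall back to the source text, then substitute the inner fragments.
    public static string Translate(
        string template,
        IReadOnlyDictionary<string, string> catalog,
        params string[] innerFragments)
    {
        var translated = catalog.TryGetValue(template, out var found) && !string.IsNullOrEmpty(found)
            ? found
            : template;

        // Literal '{' or '}' in the text would have to be doubled for string.Format.
        return string.Format(translated, innerFragments);
    }
}
```

For example, with a catalog mapping `This is {0} a translation` to `Ceci est {0} une traduction`, passing `*nested*` as the inner fragment yields `Ceci est *nested* une traduction`. The translator is free to move `{0}` within the sentence.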


@pauldotknopf commented on GitHub (May 6, 2018):

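I created a custom `ContainerInlineRenderer` that simply passes through the raw inner markdown. My translation engine will eventually be translating this value, as is.

```c#
// DebugOutputRenderer is the custom renderer; it is assumed to expose the source text as OriginalMarkdown.
public class ContainerInlineRenderer : MarkdownObjectRenderer<DebugOutputRenderer, ContainerInline>
{
    protected override void Write(DebugOutputRenderer renderer, ContainerInline obj)
    {
        // An empty container has no children and therefore no span to copy.
        if (obj.FirstChild == null || obj.LastChild == null)
        {
            return;
        }

        // Span.End is inclusive, hence the + 1 when computing the length.
        var start = obj.FirstChild.Span.Start;
        var length = obj.LastChild.Span.End + 1 - start;
        renderer.Write(renderer.OriginalMarkdown.Substring(start, length));
    }
}
```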

My new markdown syntax:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<MarkdownDocument>
   <ParagraphBlock>
      <ContainerInline>Test paragraph *test* with some em
testt</ContainerInline>
   </ParagraphBlock>
   <Table>
      <TableRow>
         <TableCell>
            <ParagraphBlock>
               <ContainerInline>a</ContainerInline>
            </ParagraphBlock>
         </TableCell>
         <TableCell>
            <ParagraphBlock>
               <ContainerInline>b</ContainerInline>
            </ParagraphBlock>
         </TableCell>
      </TableRow>
      <TableRow>
         <TableCell>
            <ParagraphBlock>
               <ContainerInline>0</ContainerInline>
            </ParagraphBlock>
         </TableCell>
         <TableCell>
            <ParagraphBlock>
               <ContainerInline>0</ContainerInline>
            </ParagraphBlock>
         </TableCell>
      </TableRow>
   </Table>
   <HtmlBlock />
   <ParagraphBlock>
      <ContainerInline>And [this](/somewhere) is a ~~link~~.
And a new line without a space</ContainerInline>
   </ParagraphBlock>
</MarkdownDocument>
```
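
Once each `ContainerInline` is reduced to a source span, a translated markdown document can be produced by substituting those spans in the original text. The sketch below shows one way to do that, assuming inclusive, non-overlapping spans like the `Span.Start`/`Span.End` values used in the renderer above. `SpanRewriteSketch` is a hypothetical name, and replacing from the last span to the first is a choice made here so that earlier offsets stay valid.

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: rewrites parts of the source text identified by inclusive spans.
public static class SpanRewriteSketch
{
    public static string ReplaceSpans(
        string source,
        IEnumerable<(int Start, int End)> spans,
        Func<string, string> translate)
    {
        var result = new StringBuilder(source);

        // Work backwards so a replacement never shifts the offsets of spans still to be processed.
        foreach (var (start, end) in spans.OrderByDescending(span => span.Start))
        {
            int length = end + 1 - start; // End is inclusive
            string original = source.Substring(start, length);
            result.Remove(start, length).Insert(start, translate(original));
        }

        return result.ToString();
    }
}
```

Because the text between spans is copied untouched, block-level syntax such as table pipes and HTML blocks passes through as is.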

@pauldotknopf commented on GitHub (May 7, 2018):

I finished the library for translating markdown files [here](https://github.com/pauldotknopf/markdown-translator). Looking at the [tests](https://github.com/pauldotknopf/markdown-translator/blob/develop/test/MarkdownTranslator.Tests/MarkdownTransformerTests.cs#L87) can give you a good idea of what it does.

With that said, I have effectively answered the original question.

Thanks for listening to me :)
Reference: starred/markdig#205