Line breaks in convert to plain text #181


claunia · 2026-01-29T14:29:39Z

claunia commented

2026-01-29 14:29:39 +00:00

Originally created by @DDAndyChen on GitHub (Jan 16, 2018).

Using the Markdown.ToPlainText method, the following markdown

"## New Employee Name:\r\n* First: John \r\n* Last: Smith\r\n\r\n## Start Date:\r\n2018-02-07 \r\n\r\n## Position Title:\r\nCustoms Broker"

is converted to

"New Employee Name:\nFirst: John\nLast: SmithStart Date:2018-02-07Position Title:Customs Broker"

Some line breaks are dropped, so "Smith" and "Start" run together as one word, as do "2018-02-07" and "Position".
I think the output would be better like this:

"New Employee Name:\nFirst: John\nLast: Smith\n\nStart Date:\n2018-02-07\n\nPosition Title:\nCustoms Broker"

that is

New Employee Name:
First: John 
Last: Smith

Start Date:
2018-02-07  

Position Title:
Customs Broker
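
For anyone who wants to check this locally, here is a minimal repro sketch. It assumes the Markdig NuGet package is referenced; the class name `PlainTextRepro` is made up for illustration. It prints the result with the line-break characters escaped so that dropped breaks are easy to spot. What it prints depends on the Markdig version installed.

```csharp
using System;
using Markdig;

public static class PlainTextRepro
{
    public static void Main()
    {
        // Input copied from the report above.
        const string markdown =
            "## New Employee Name:\r\n* First: John \r\n* Last: Smith\r\n\r\n" +
            "## Start Date:\r\n2018-02-07 \r\n\r\n" +
            "## Position Title:\r\nCustoms Broker";

        string plain = Markdown.ToPlainText(markdown);

        // Escape CR and LF so missing line breaks are visible on one line.
        Console.WriteLine(plain.Replace("\r", "\\r").Replace("\n", "\\n"));
    }
}
```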


claunia added the "bug" and "PR Welcome!" labels 2026-01-29 14:29:39 +00:00

claunia closed this issue

2026-01-29 14:29:39 +00:00

claunia commented

2026-01-29 14:29:40 +00:00

@xoofx commented on GitHub (Jan 16, 2018):

Yep, good catch, likely a bug


claunia commented

2026-01-29 14:29:42 +00:00

@hemantkd commented on GitHub (Mar 18, 2018):

Hi @xoofx,

I'm interested in fixing this bug.
I have cloned the repository and am using Visual Studio 2017.

I'm getting the following Build error:
"The specified language targets for uap10.0 is missing. Ensure correct tooling is installed.\r\nMissing File: C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\Microsoft\WindowsXaml\v15.0\Microsoft.Windows.UI.Xaml.CSharp.targets Markdig C:\Users\XXX\.nuget\packages\msbuild.sdk.extras\1.0.9\build\netstandard1.0\MSBuildSdkExtras.Common.targets"

Does the Solution absolutely need the UWP tooling enabled to be able to Build successfully?


claunia commented

2026-01-29 14:29:42 +00:00

@xoofx commented on GitHub (Mar 18, 2018):

> Does the Solution absolutely need the UWP tooling enabled to be able to Build successfully?

Yes


claunia commented

2026-01-29 14:29:43 +00:00

@hemantkd commented on GitHub (Mar 22, 2018):

Hi @xoofx,

Thanks for earlier.

After spending some time and having to manually install the UAP version required by the project using the Visual Studio Installer, I finally managed to build the solution successfully.

I spent some time looking at the code and the CommonMark spec too.

[Removed the test scenario from the comment as I have realised that it was incorrect]


claunia commented

2026-01-29 14:29:43 +00:00

@xavierdecoster commented on GitHub (Oct 5, 2018):

Any plans on fixing this bug? :) Or is there a work-around?


claunia commented

2026-01-29 14:29:44 +00:00

@xoofx commented on GitHub (Oct 6, 2018):

> Any plans on fixing this bug? :) Or is there a work-around?

No, I don't have any spare time left, PR welcome


claunia commented

2026-01-29 14:29:45 +00:00

@xavierdecoster commented on GitHub (Oct 8, 2018):

Sorry to hear that.

It's easily reproducible using the following test (the top one for HTML is already there, I just replicated it for Markdown.ToPlainText() to reproduce the issue):

    using Markdig;
    using NUnit.Framework;

    [TestFixture]
    public class TestConfigureNewLine
    {
        [Test]
        [TestCase(/* newLineForWriting: */ "\n",   /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "<p><em>1</em>\n<em>2</em></p>\n")]
        [TestCase(/* newLineForWriting: */ "\n",   /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>\n<em>2</em></p>\n")]
        [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "<p><em>1</em>\r\n<em>2</em></p>\r\n")]
        [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>\r\n<em>2</em></p>\r\n")]
        [TestCase(/* newLineForWriting: */ "!!!" , /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "<p><em>1</em>!!!<em>2</em></p>!!!")]
        [TestCase(/* newLineForWriting: */ "!!!" , /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>!!!<em>2</em></p>!!!")]
        public void TestHtmlOutputWhenConfiguringNewLine(string newLineForWriting, string markdownText, string expected)
        {
            var pipeline = new MarkdownPipelineBuilder()
                .ConfigureNewLine(newLineForWriting)
                .Build();

            var actual = Markdown.ToHtml(markdownText, pipeline);
            Assert.AreEqual(expected, actual);
        }

        [Test]
        [TestCase(/* newLineForWriting: */ "\n",   /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "1\n2\n")]
        [TestCase(/* newLineForWriting: */ "\n",   /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1\n2\n")]
        [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "1\r\n2\r\n")]
        [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1\r\n2\r\n")]
        [TestCase(/* newLineForWriting: */ "!!!", /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "1!!!2!!!")]
        [TestCase(/* newLineForWriting: */ "!!!", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1!!!2!!!")]
        public void TestPlainTextOutputWhenConfiguringNewLine(string newLineForWriting, string markdownText, string expected)
        {
            var pipeline = new MarkdownPipelineBuilder()
                .ConfigureNewLine(newLineForWriting)
                .Build();

            var actual = Markdown.ToPlainText(markdownText, pipeline);
            Assert.AreEqual(expected, actual);
        }
    }

You'll notice this new test fails.

I noticed it's parsing a single `ParagraphBlock`, and the `LineReader` parses 2 lines (`*1*` and `*2*`), whilst dropping/ignoring the CRLF characters at the end. (`LineReader.cs` ln 55: https://github.com/lunet-io/markdig/blob/427dc849f7194ba55e09825eb3cdb3168f7eefc1/src/Markdig/Helpers/LineReader.cs#L55)

I've no idea why the last occurrence of the new line character is discarded when converting to plain text, whereas it's not when converting to HTML... Trying to spot the difference.

If you have any idea/pointers where to look at, please share :)


claunia commented

2026-01-29 14:29:46 +00:00

@xoofx commented on GitHub (Oct 8, 2018):

The problem is not in the parser but in the renderer. Today the HTML renderer is also used to render plain text. There are likely a few places in the code where we don't output a newline: HTML doesn't care, but the text version does. I don't like the idea of using the HTML renderer to output plain text, but that was the quickest solution. I don't think it is much work in the codebase, but you need to dig into it (HtmlRenderer is quite simple, so it should not be difficult).
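
To illustrate why a missing newline is harmless in HTML but visible in plain text, here is a toy sketch. It is not Markdig's actual renderer: `ToyRenderer`, `Render`, and both flags are hypothetical names invented for this example. With tags on, the closing and opening tags keep adjacent blocks apart even without a newline. With tags off and no newline, the text of adjacent blocks fuses.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy model only; every name here is hypothetical and unrelated to Markdig's API.
public static class ToyRenderer
{
    // Writes each block as <tag>text</tag>. With htmlTags == false the tags are
    // skipped, mimicking an HTML renderer reused for plain-text output.
    public static string Render(IEnumerable<(string Tag, string Text)> blocks,
                                bool htmlTags, bool newlineAfterBlock)
    {
        var sb = new StringBuilder();
        foreach (var (tag, text) in blocks)
        {
            if (htmlTags) sb.Append('<').Append(tag).Append('>');
            sb.Append(text);
            if (htmlTags) sb.Append("</").Append(tag).Append('>');
            if (newlineAfterBlock) sb.Append('\n');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var blocks = new[] { ("li", "Last: Smith"), ("h2", "Start Date:") };

        // HTML without newlines is still unambiguous:
        // <li>Last: Smith</li><h2>Start Date:</h2>
        Console.WriteLine(Render(blocks, htmlTags: true, newlineAfterBlock: false));

        // Plain text without newlines fuses the blocks:
        // Last: SmithStart Date:
        Console.WriteLine(Render(blocks, htmlTags: false, newlineAfterBlock: false));

        // Plain text with a newline after each block keeps them apart.
        Console.Write(Render(blocks, htmlTags: false, newlineAfterBlock: true));
    }
}
```

In this toy model the fix is simply to emit the newline after each block regardless of whether tags are written. Where the equivalent spots are in the real renderer would need to be found by reading its code.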


claunia commented

2026-01-29 14:29:46 +00:00

@neilha commented on GitHub (Nov 27, 2018):

Hi @xoofx - thanks for merging the pull request. Do you have any timeline for publishing a new version to NuGet? Thanks.


claunia commented

2026-01-29 14:29:47 +00:00

@bstoked commented on GitHub (Jan 14, 2020):

Wondering if this fix has been published to NuGet yet? I'm on a project that definitely needs it. :) Thanks.


claunia commented

2026-01-29 14:29:47 +00:00

@MihaZupan commented on GitHub (Jan 14, 2020):

The last NuGet release is from 3 months ago, so definitely yes.


Reference: starred/markdig#181