Line breaks in convert to plain text #181

Closed
opened 2026-01-29 14:29:39 +00:00 by claunia · 11 comments
Owner

Originally created by @DDAndyChen on GitHub (Jan 16, 2018).

By using Markdown.ToPlainText method, the following markdown

"## New Employee Name:\r\n* First: John \r\n* Last: Smith\r\n\r\n## Start Date:\r\n2018-02-07 \r\n\r\n## Position Title:\r\nCustoms Broker"

is converted to

"New Employee Name:\nFirst: John\nLast: SmithStart Date:2018-02-07Position Title:Customs Broker"

some line breaks are dropped, so "Smith" and "Start" become one word, the same as "2018-02-07" and "Position".
I think it the output would be better in this:

"New Employee Name:\nFirst: John\nLast: Smith\n\nStart Date:2018-02-07\n\nPosition Title:\nCustoms Broker"

that is

New Employee Name:
First: John 
Last: Smith

Start Date:
2018-02-07  

Position Title:
Customs Broker
Originally created by @DDAndyChen on GitHub (Jan 16, 2018). By using *Markdown.ToPlainText* method, the following markdown >"## New Employee Name:\r\n* First: John \r\n* Last: Smith\r\n\r\n## Start Date:\r\n2018-02-07 \r\n\r\n## Position Title:\r\nCustoms Broker" is converted to > "New Employee Name:\nFirst: John\nLast: SmithStart Date:2018-02-07Position Title:Customs Broker" some line breaks are dropped, so "Smith" and "Start" become one word, the same as "2018-02-07" and "Position". I think it the output would be better in this: > "New Employee Name:\nFirst: John\nLast: Smith\n\nStart Date:2018-02-07\n\nPosition Title:\nCustoms Broker" that is ``` New Employee Name: First: John Last: Smith Start Date: 2018-02-07 Position Title: Customs Broker ```
claunia added the bugPR Welcome! labels 2026-01-29 14:29:39 +00:00
Author
Owner

@xoofx commented on GitHub (Jan 16, 2018):

Yep, good catch, likely a bug

@xoofx commented on GitHub (Jan 16, 2018): Yep, good catch, likely a bug
Author
Owner

@hemantkd commented on GitHub (Mar 18, 2018):

Hi @xoofx,

I'm interested in fixing this bug.
I have cloned the repository and using Visual Studio 2017.

I'm getting the following Build error:
"The specified language targets for uap10.0 is missing. Ensure correct tooling is installed.\r\nMissing File: C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\Microsoft\WindowsXaml\v15.0\Microsoft.Windows.UI.Xaml.CSharp.targets Markdig C:\Users\XXX\.nuget\packages\msbuild.sdk.extras\1.0.9\build\netstandard1.0\MSBuildSdkExtras.Common.targets"

Does the Solution absolutely need the UWP tooling enabled to be able to Build successfully?

@hemantkd commented on GitHub (Mar 18, 2018): Hi @xoofx, I'm interested in fixing this bug. I have cloned the repository and using Visual Studio 2017. I'm getting the following Build error: "The specified language targets for uap10.0 is missing. Ensure correct tooling is installed.\r\nMissing File: C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\Microsoft\WindowsXaml\v15.0\Microsoft.Windows.UI.Xaml.CSharp.targets Markdig C:\Users\XXX\\.nuget\packages\msbuild.sdk.extras\1.0.9\build\netstandard1.0\MSBuildSdkExtras.Common.targets" Does the Solution absolutely need the UWP tooling enabled to be able to Build successfully?
Author
Owner

@xoofx commented on GitHub (Mar 18, 2018):

Does the Solution absolutely need the UWP tooling enabled to be able to Build successfully?

Yes

@xoofx commented on GitHub (Mar 18, 2018): > Does the Solution absolutely need the UWP tooling enabled to be able to Build successfully? Yes
Author
Owner

@hemantkd commented on GitHub (Mar 22, 2018):

Hi @xoofx,

Thanks for earlier.

After spending some time and having to manually install the UAP version required by the project using the Visual Studio Installer, finally managed to build the solution successfully.

Spent some time looking at the code and common mark spec too.

[Removed the test scenario from the comment as I have realised that it was incorrect]

@hemantkd commented on GitHub (Mar 22, 2018): Hi @xoofx, Thanks for earlier. After spending some time and having to manually install the UAP version required by the project using the Visual Studio Installer, finally managed to build the solution successfully. Spent some time looking at the code and common mark spec too. _[Removed the test scenario from the comment as I have realised that it was incorrect]_
Author
Owner

@xavierdecoster commented on GitHub (Oct 5, 2018):

Any plans on fixing this bug? :) Or is there a work-around?

@xavierdecoster commented on GitHub (Oct 5, 2018): Any plans on fixing this bug? :) Or is there a work-around?
Author
Owner

@xoofx commented on GitHub (Oct 6, 2018):

Any plans on fixing this bug? :) Or is there a work-around?

No, I don't have any spare time left, PR welcome

@xoofx commented on GitHub (Oct 6, 2018): > Any plans on fixing this bug? :) Or is there a work-around? No, I don't have any spare time left, PR welcome
Author
Owner

@xavierdecoster commented on GitHub (Oct 8, 2018):

Sorry to hear that.

It's easily reproducible using the following test (the top one for HTML is already there, I just replicated it for Markdown.ToPlainText() to reproduce the issue):

    [TestFixture]
    public class TestConfigureNewLine
    {
        [Test]
        [TestCase(/* newLineForWriting: */ "\n",   /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "<p><em>1</em>\n<em>2</em></p>\n")]
        [TestCase(/* newLineForWriting: */ "\n",   /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>\n<em>2</em></p>\n")]
        [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "<p><em>1</em>\r\n<em>2</em></p>\r\n")]
        [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>\r\n<em>2</em></p>\r\n")]
        [TestCase(/* newLineForWriting: */ "!!!" , /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "<p><em>1</em>!!!<em>2</em></p>!!!")]
        [TestCase(/* newLineForWriting: */ "!!!" , /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>!!!<em>2</em></p>!!!")]
        public void TestHtmlOutputWhenConfiguringNewLine(string newLineForWriting, string markdownText, string expected)
        {
            var pipeline = new MarkdownPipelineBuilder()
                .ConfigureNewLine(newLineForWriting)
                .Build();

            var actual = Markdown.ToHtml(markdownText, pipeline);
            Assert.AreEqual(expected, actual);
        }

        [Test]
        [TestCase(/* newLineForWriting: */ "\n",   /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "1\n2\n")]
        [TestCase(/* newLineForWriting: */ "\n",   /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1\n2\n")]
        [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "1\r\n2\r\n")]
        [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1\r\n2\r\n")]
        [TestCase(/* newLineForWriting: */ "!!!", /* markdownText: */ "*1*\n*2*\n",     /* expected: */ "1!!!2!!!")]
        [TestCase(/* newLineForWriting: */ "!!!", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1!!!2!!!")]
        public void TestPlainTextOutputWhenConfiguringNewLine(string newLineForWriting, string markdownText, string expected)
        {
            var pipeline = new MarkdownPipelineBuilder()
                .ConfigureNewLine(newLineForWriting)
                .Build();

            var actual = Markdown.ToPlainText(markdownText, pipeline);
            Assert.AreEqual(expected, actual);
        }
    }

You'll notice this new test fails.

I noticed it's parsing a single ParagraphBlock, and the LineReader parses 2 lines (*1* and *2*), whilst dropping/ignoring the CRLF characters at the end. (LineReader.cs ln 55)

I've no idea why the last occurrence of the new line character is discarded when parsing to plain text, whereas it's not when parsing to html... Trying to spot the difference.

If you have any idea/pointers where to look at, please share :)

@xavierdecoster commented on GitHub (Oct 8, 2018): Sorry to hear that. It's easily reproducible using the following test (the top one for HTML is already there, I just replicated it for `Markdown.ToPlainText()` to reproduce the issue): ``` [TestFixture] public class TestConfigureNewLine { [Test] [TestCase(/* newLineForWriting: */ "\n", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "<p><em>1</em>\n<em>2</em></p>\n")] [TestCase(/* newLineForWriting: */ "\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>\n<em>2</em></p>\n")] [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "<p><em>1</em>\r\n<em>2</em></p>\r\n")] [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>\r\n<em>2</em></p>\r\n")] [TestCase(/* newLineForWriting: */ "!!!" , /* markdownText: */ "*1*\n*2*\n", /* expected: */ "<p><em>1</em>!!!<em>2</em></p>!!!")] [TestCase(/* newLineForWriting: */ "!!!" , /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>!!!<em>2</em></p>!!!")] public void TestHtmlOutputWhenConfiguringNewLine(string newLineForWriting, string markdownText, string expected) { var pipeline = new MarkdownPipelineBuilder() .ConfigureNewLine(newLineForWriting) .Build(); var actual = Markdown.ToHtml(markdownText, pipeline); Assert.AreEqual(expected, actual); } [Test] [TestCase(/* newLineForWriting: */ "\n", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "1\n2\n")] [TestCase(/* newLineForWriting: */ "\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1\n2\n")] [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "1\r\n2\r\n")] [TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1\r\n2\r\n")] [TestCase(/* newLineForWriting: */ "!!!", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "1!!!2!!!")] [TestCase(/* newLineForWriting: */ "!!!", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1!!!2!!!")] public void TestPlainTextOutputWhenConfiguringNewLine(string newLineForWriting, string markdownText, string expected) { var pipeline = new MarkdownPipelineBuilder() .ConfigureNewLine(newLineForWriting) .Build(); var actual = Markdown.ToPlainText(markdownText, pipeline); Assert.AreEqual(expected, actual); } } ``` You'll notice this new test fails. I noticed it's parsing a single `ParagraphBlock`, and the `LineReader` parses 2 lines (`*1*` and `*2*`), whilst dropping/ignoring the CRLF characters at the end. ([`LineReader.cs` ln 55](https://github.com/lunet-io/markdig/blob/427dc849f7194ba55e09825eb3cdb3168f7eefc1/src/Markdig/Helpers/LineReader.cs#L55)) I've no idea why the last occurrence of the *new line character* is discarded when parsing to plain text, whereas it's not when parsing to html... Trying to spot the difference. If you have any idea/pointers where to look at, please share :)
Author
Owner

@xoofx commented on GitHub (Oct 8, 2018):

The problem is not in the parser but in the renderer. The HTML renderer is used today to render as a text. There are likely a few places in the code where we don't output newline, while the HTML doesn't care, the text version would care. I don't like the idea of using the HTML renderer to output plain text, but for some that was the quickest solution. I don't think it is much work in the codebase, but you need to dig into it (HtmlRenderer is quite simple, so it should not be difficult)

@xoofx commented on GitHub (Oct 8, 2018): The problem is not in the parser but in the renderer. The HTML renderer is used today to render as a text. There are likely a few places in the code where we don't output newline, while the HTML doesn't care, the text version would care. I don't like the idea of using the HTML renderer to output plain text, but for some that was the quickest solution. I don't think it is much work in the codebase, but you need to dig into it (HtmlRenderer is quite simple, so it should not be difficult)
Author
Owner

@neilha commented on GitHub (Nov 27, 2018):

Hi @xoofx - thanks for merging pull request. Do you have any timelines for publishing a new version to nuget? Thanks.

@neilha commented on GitHub (Nov 27, 2018): Hi @xoofx - thanks for merging pull request. Do you have any timelines for publishing a new version to nuget? Thanks.
Author
Owner

@bstoked commented on GitHub (Jan 14, 2020):

Wondering if this fix has been published to nuget yet? On a project that definitely needs it. :) Thanks.

@bstoked commented on GitHub (Jan 14, 2020): Wondering if this fix has been published to nuget yet? On a project that definitely needs it. :) Thanks.
Author
Owner

@MihaZupan commented on GitHub (Jan 14, 2020):

Last NuGet is from 3 months ago so definitely yes.

@MihaZupan commented on GitHub (Jan 14, 2020): Last NuGet is from 3 months ago so definitely yes.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/markdig#181