Pipe characters detected as part of URL #419

Closed
opened 2026-01-29 14:36:12 +00:00 by claunia · 3 comments
Owner

Originally created by @hamvocke on GitHub (Dec 1, 2020).

Over at Stack Exchange we're seeing an issue with Markdig's link detection within pipe tables (example post here for the curious). I think it breaks down to Markdig's LinkHelper detecting pipes as part of a URL.

Let's assume I've got the following snippet (from a pipe table structure)

| a cell | https://example.com|anothercell |

Ideally, there'd be a space after the example.com URL - but sometimes users don't enter one. If Markdig's UseAutoLinks is enabled, Markdig will turn that URL above into a link like <a href="https://example.com|anothercell">. I'd expect the |anothercell bit to be excluded from the URL in that hyperlink.

If I read RFC 3986 correctly (apologies for pulling out the big guns, I always find RFCs rather unhandy - maybe this answer on Stack Overflow is more approachable), a literal | character is not a valid character in a URI (unless percent-encoded of course).

I think this unit test might reproduce the issue at hand:

[Test]
public void TestUrlWithPipes()
{
    var text = new StringSlice("http://google.com|");
    Assert.True(LinkHelper.TryParseUrl(ref text, out string link));
    Assert.AreEqual("http://google.com", link);
    Assert.AreEqual('|', text.CurrentChar);
}

Should Markdig's LinkHelper.TryParseUrl() be tweaked to exclude | characters from URLs?

Originally created by @hamvocke on GitHub (Dec 1, 2020). Over at Stack Exchange we're seeing an issue with Markdig's link detection within pipe tables (example post [here](https://meta.stackexchange.com/a/357189/507720) for the curious). I think it breaks down to Markdig's `LinkHelper` detecting pipes as part of a URL. Let's assume I've got the following snippet (from a pipe table structure) ``` | a cell | https://example.com|anothercell | ``` Ideally, there'd be a space after the `example.com` URL - but sometimes users don't enter one. If Markdig's `UseAutoLinks` is enabled, Markdig will turn that URL above into a link like `<a href="https://example.com|anothercell">`. I'd expect the `|anothercell` bit to be **excluded** from the URL in that hyperlink. If I read [RFC 3986](https://tools.ietf.org/html/rfc3986#section-2) correctly (apologies for pulling out the big guns, I always find RFCs rather unhandy - maybe [this answer on Stack Overflow](https://stackoverflow.com/a/1547940/208660) is more approachable), a literal `|` character is not a valid character in a URI (unless percent-encoded of course). I _think_ this unit test might reproduce the issue at hand: ```csharp [Test] public void TestUrlWithPipes() { var text = new StringSlice("http://google.com|"); Assert.True(LinkHelper.TryParseUrl(ref text, out string link)); Assert.AreEqual("http://google.com", link); Assert.AreEqual('|', text.CurrentChar); } ``` Should Markdig's `LinkHelper.TryParseUrl()` be tweaked to exclude `|` characters from URLs?
claunia added the question label 2026-01-29 14:36:12 +00:00
Author
Owner

@xoofx commented on GitHub (Dec 1, 2020):

Pipe in table has been a trouble design for Markdig since the beginning

So for example, on GitHub, this:

| table | header|
|-------|---------|
| a cell | https://example.com|anothercell |

Renders to this:

table header
a cell https://example.com

For which Markdig seems to follow the same rule. But if you look at how other parsers are doing on this case, there is not a clear winner...

Markdig is not fully compatible with GitHub, but overall it tries to be more compatible with it.

@xoofx commented on GitHub (Dec 1, 2020): Pipe in table has been a trouble design for Markdig [since the beginning](https://talk.commonmark.org/t/parsing-strategy-for-tables/2027) So for example, on GitHub, this: ``` | table | header| |-------|---------| | a cell | https://example.com|anothercell | ``` Renders to this: | table | header| |-------|---------| | a cell | https://example.com|anothercell | For which Markdig seems to follow the same rule. But if you look at how [other parsers are doing on this case](https://babelmark.github.io/?text=%7C+table+%7C+header%7C%0A%7C-------%7C---------%7C%0A%7C+a+cell+%7C+https%3A%2F%2Fexample.com%7Canothercell+%7C), there is not a clear winner... Markdig is not fully compatible with GitHub, but overall it tries to be more compatible with it.
Author
Owner

@hamvocke commented on GitHub (Dec 1, 2020):

Your example seems to render as I'd expect on GitHub if I include the proper amount of table headers/separators:

| table | header| one more|
|-------|---------|-------|
| a cell | https://example.com|anothercell |
table header one more
a cell https://example.com anothercell
@hamvocke commented on GitHub (Dec 1, 2020): Your example seems to render as I'd expect on GitHub if I include the proper amount of table headers/separators: ``` | table | header| one more| |-------|---------|-------| | a cell | https://example.com|anothercell | ``` | table | header| one more| |-------|---------|-------| | a cell | https://example.com|anothercell |
Author
Owner

@xoofx commented on GitHub (Dec 1, 2020):

Right, and this is were we are departing with GitHub... GitHub before using CommonMark, didn't have clear rules about it (it was an implementation detail). Markdig started to implement some "rules" before GitHub started, and had to make some choices.
GitHub did something differently (which I think is slightly explained in the thread I posted above).

Fixing Markdig is probably a good goal here, because GitHub is likely the "de-facto reference" (because used a lot).
Though I don't have personally any sparetime interest solving this issue, so a PR would be welcome.
Note that the code in Markdig for handling pipe tables is awful, I'm really not proud of it. So if someone is enough brave to challenge it, I would not mind. 😅

@xoofx commented on GitHub (Dec 1, 2020): Right, and this is were we are departing with GitHub... GitHub before using CommonMark, didn't have clear rules about it (it was an implementation detail). Markdig started to implement some "rules" before GitHub started, and had to make some choices. GitHub did something differently (which I think is slightly explained in the thread I posted above). Fixing Markdig is probably a good goal here, because GitHub is likely the "de-facto reference" (because used a lot). Though I don't have personally any sparetime interest solving this issue, so a PR would be welcome. Note that the code in Markdig for handling pipe tables is awful, I'm really not proud of it. So if someone is enough brave to challenge it, I would not mind. 😅
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/markdig#419