[PR #105] [MERGED] Treat trailing full stop after a URL as not being part of the URL #812

Open
opened 2026-01-29 14:45:54 +00:00 by claunia · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/xoofx/markdig/pull/105
Author: @MatthewRichards
Created: 3/29/2017
Status: Merged
Merged: 4/4/2017
Merged by: @xoofx

Base: master ← Head: DotTerminatedUrls


📝 Commits (2)

  • 6312bc0 Treat trailing full stop after a URL as not being part of the URL
  • 3305a74 Additional test case

📊 Changes

5 files changed (+60 additions, -1 deletions)


📝 src/Markdig.Tests/TestLinkHelper.cs (+14 -0)
📝 src/Markdig/Helpers/ICharIterator.cs (+6 -0)
📝 src/Markdig/Helpers/LinkHelper.cs (+13 -1)
📝 src/Markdig/Helpers/StringLineGroup.cs (+16 -0)
📝 src/Markdig/Helpers/StringSlice.cs (+11 -0)

📄 Description

A bare URL in a Markdown file is recognised as a link and rendered as one in HTML. However, if the URL is followed by a full stop (.), that full stop is treated as part of the URL. For example, consider the URL www.google.com. Because a full stop immediately follows the URL, Markdig renders it in HTML as <a href="www.google.com.">www.google.com.</a>.

This behaviour does not seem terribly unreasonable, given that the trailing dot may constitute a valid URI (see e.g. http://webmasters.stackexchange.com/questions/73934/how-can-urls-have-a-dot-at-the-end-e-g-www-bla-de). However, it's rather inconvenient, and other markdown parsers (such as GitHub's) do not behave this way - see the rendering of the www.google.com link in this comment, above.

I've had a go at implementing this "fix" in Markdig. It's my first look at this source code so I may have done something unpleasant by mistake, but all the tests pass. Basic approach:

  • Add a lookahead (PeekChar) feature to ICharIterator
  • If we find a "." while parsing a URL, peek ahead and if that "." would be the end of the URL, terminate parsing now without including the "."
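
The two steps above can be sketched roughly as follows. This is a simplified, language-neutral illustration in Python, not the C# code from this PR: `CharIterator`, `is_url_end` and `read_bare_url` are hypothetical names, and the sketch assumes a URL ends only at whitespace or end of input, which is far cruder than Markdig's real link parsing.

```python
class CharIterator:
    """Hypothetical stand-in for a character iterator with one-char lookahead."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._pos = 0

    @property
    def current(self) -> str:
        # "\0" signals that the input is exhausted.
        return self._text[self._pos] if self._pos < len(self._text) else "\0"

    def next_char(self) -> str:
        # Advance by one character (never past the end) and return the new current.
        if self._pos < len(self._text):
            self._pos += 1
        return self.current

    def peek_char(self) -> str:
        # Look at the character after the current one without advancing.
        nxt = self._pos + 1
        return self._text[nxt] if nxt < len(self._text) else "\0"


def is_url_end(c: str) -> bool:
    # Assumption for this sketch: only whitespace or end of input ends a URL.
    return c == "\0" or c.isspace()


def read_bare_url(it: CharIterator) -> str:
    """Read a URL, leaving a trailing '.' unconsumed if it would end the URL."""
    chars = []
    c = it.current
    while not is_url_end(c):
        # A "." followed by a URL terminator is sentence punctuation, not URL text.
        if c == "." and is_url_end(it.peek_char()):
            break
        chars.append(c)
        c = it.next_char()
    return "".join(chars)


print(read_bare_url(CharIterator("www.google.com. More text")))
```

Dots inside the URL are unaffected, because the character after them is not a terminator. Only a dot that would be the very last character of the URL is left behind for the surrounding text.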

I chose to implement a single-character peek function, partly because that's all we need, partly because it felt most natural in ICharIterator, and partly because, although StringSlice already has an arbitrary-offset peek, the equivalent looked rather fiddly to implement in StringLineGroup.Iterator.
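
To illustrate why a one-character peek is the easy case for an iterator that spans several lines, here is a rough Python sketch. It is not the `StringLineGroup.Iterator` code from this PR: `LineGroupIterator` is a hypothetical class, and the convention that every line is followed by a synthetic "\n", with "\0" after the last one, is an assumption made for the example.

```python
from typing import List, Tuple


class LineGroupIterator:
    """Hypothetical iterator over several lines with a one-char lookahead."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines
        self._line = 0  # index of the current line
        self._col = 0   # offset in that line; len(line) means "at the line break"

    def _char_at(self, line: int, col: int) -> str:
        if line >= len(self._lines):
            return "\0"  # past the last line: end of input
        text = self._lines[line]
        # One past the end of a line stands for the newline that follows it.
        return text[col] if col < len(text) else "\n"

    def _advance(self, line: int, col: int) -> Tuple[int, int]:
        # Compute the position one character further on, crossing a line
        # boundary when needed, without touching the iterator's state.
        if line >= len(self._lines):
            return line, col
        if col < len(self._lines[line]):
            return line, col + 1
        return line + 1, 0

    @property
    def current(self) -> str:
        return self._char_at(self._line, self._col)

    def next_char(self) -> str:
        self._line, self._col = self._advance(self._line, self._col)
        return self.current

    def peek_char(self) -> str:
        # A single step crosses at most one line boundary, so peeking is
        # just "advance once, read, and discard the position".
        return self._char_at(*self._advance(self._line, self._col))
```

An arbitrary-offset peek over the same structure would have to repeat the advance step across any number of lines, including empty ones, which is presumably the fiddliness the single-character version avoids.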

Feedback welcomed! I'm happy to have a go at implementing any further improvements necessary, if you agree with the basic premise but have concerns over the implementation.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 14:45:54 +00:00

Reference: starred/markdig#812