Non-ASCII characters should be supported in AutoLinkInline #278

Closed
opened 2026-01-29 14:32:36 +00:00 by claunia · 5 comments

Originally created by @OpportunityLiu on GitHub (Mar 7, 2019).

https://babelmark.github.io/?text=%3Chttp%3A%2F%2F%E2%98%83.net%3F%E2%98%83%3E

Input:

<http://☃.net?☃>

Expected:

<p>
  <a href="http://xn--n3h.net?%E2%98%83">http://☃.net?☃</a>
</p>
claunia added the bug and PR Welcome! labels 2026-01-29 14:32:36 +00:00

@OpportunityLiu commented on GitHub (Mar 7, 2019):

The same applies to AutoLink:

https://babelmark.github.io/?text=http%3A%2F%2F%E2%98%83.net%3F%E2%98%83+https%3A%2F%2Fcommonmark.org%2F


@xoofx commented on GitHub (Mar 7, 2019):

Good catch, PR welcome!


@MihaZupan commented on GitHub (Mar 7, 2019):

I implemented the `IsValidDomain` function according to the GFM spec (https://github.github.com/gfm/#extended-www-autolink).

> A valid domain consists of alphanumeric characters, underscores (_), hyphens (-) and periods (.). There must be at least one period, and no underscores may be present in the last two segments of the domain.

That said, it seems that even GFM renders URLs such as `http://☃.net?☃` as `<a href="http://%E2%98%83.net?%E2%98%83" rel="nofollow">http://☃.net?☃</a>`.

@xoofx With reference to #252, should the test be changed from `alphanumeric characters` to `is not ASCII punctuation (with exceptions)`? The snowman character, for example, falls under "other symbols" (Unicode category So), not alphanumerics.
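
The two candidate checks can be compared in a small sketch. It is written in Python rather than Markdig's C#, and the names `is_valid_domain_strict` and `is_valid_domain_relaxed` are hypothetical, not Markdig's `IsValidDomain`. The relaxed rule is only one possible reading of "not ASCII punctuation (with exceptions)", assuming the exceptions are `_`, `-` and `.` and that whitespace is still rejected.

```python
import string
import unicodedata

ALLOWED_PUNCTUATION = "_-."  # assumed exceptions, taken from the quoted GFM rule


def _last_two_segments_ok(domain: str) -> bool:
    # No underscores may appear in the last two dot-separated segments.
    return not any("_" in segment for segment in domain.split(".")[-2:])


def is_valid_domain_strict(domain: str) -> bool:
    """Quoted GFM rule: alphanumerics, '_', '-', '.', and at least one period."""
    if "." not in domain:
        return False
    # str.isalnum() is Unicode-aware, so non-ASCII letters pass but symbols do not.
    if not all(ch.isalnum() or ch in ALLOWED_PUNCTUATION for ch in domain):
        return False
    return _last_two_segments_ok(domain)


def is_valid_domain_relaxed(domain: str) -> bool:
    """Assumed relaxed rule: reject whitespace and ASCII punctuation other than '_', '-', '.'."""
    if "." not in domain:
        return False
    for ch in domain:
        if ch.isspace():
            return False
        if ch in string.punctuation and ch not in ALLOWED_PUNCTUATION:
            return False
    return _last_two_segments_ok(domain)


# The snowman is a symbol ("So"), so only the relaxed check accepts it.
print(unicodedata.category("☃"))
print(is_valid_domain_strict("☃.net"), is_valid_domain_relaxed("☃.net"))
```

Under these assumptions the strict check rejects `☃.net` while the relaxed one accepts it; both still reject a domain such as `foo.bar_baz.com`, where an underscore appears in one of the last two segments.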

Also, as far as encoding is concerned, would you prefer that we encode such domain names using IDNA ("xn--") or percent-escaping? The majority of other parsers seem to use percent-escaping, as Markdig does now.


@OpportunityLiu commented on GitHub (Mar 8, 2019):

Although most browsers can handle it, the correct result is `<a href="http://xn--n3h.net?%E2%98%83">http://☃.net?☃</a>`.

https://en.wikipedia.org/wiki/Punycode
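
The difference between the two encodings can be reproduced in a few lines. This sketch uses Python's built-in `idna` codec (which implements IDNA 2003) and `urllib.parse.quote` as stand-ins; it is not the Markdig implementation, and the helper names `encode_host_idna`, `encode_percent` and `build_href` are hypothetical.

```python
from urllib.parse import quote


def encode_host_idna(host: str) -> str:
    # Convert each label of the host to its ASCII-compatible "xn--" form.
    return host.encode("idna").decode("ascii")


def encode_percent(text: str) -> str:
    # Percent-escape the UTF-8 bytes of every character outside the unreserved set.
    return quote(text, safe="")


def build_href(host: str, query: str, use_idna: bool = True) -> str:
    # The host is encoded with IDNA or percent-escaping; the query is always percent-escaped.
    encoded_host = encode_host_idna(host) if use_idna else encode_percent(host)
    return f"http://{encoded_host}?{encode_percent(query)}"


print(build_href("☃.net", "☃"))                  # IDNA host
print(build_href("☃.net", "☃", use_idna=False))  # percent-escaped host
```

With the IDNA host the result should be `http://xn--n3h.net?%E2%98%83`, matching the expected `href` in this issue; with percent-escaping it should be `http://%E2%98%83.net?%E2%98%83`, matching what GFM produces.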


@MihaZupan commented on GitHub (Mar 9, 2019):

I've implemented the changes needed for this, but there is one minor problem.
The `IdnMapping` class is not available for netstandard1.1 or for portable.
Currently I've put it in an `#if` region, where those two targets fall back to percent-escaping.
@xoofx Would you be okay with inconsistent behavior on those legacy platforms in this case?

Reference: starred/markdig#278