mirror of
https://github.com/tenox7/wrp.git
synced 2026-05-21 15:46:40 +00:00
Feature request - Unicode UTF8 translation table #151
Originally created by @bobbimanners on GitHub (Dec 18, 2024).
I am enjoying using WRP to enable web browsing on my vintage machines. One feature I would like to see is some sort of table to allow translation of commonly encountered Unicode UTF-8 byte sequences to ASCII equivalents.
For English-language web readers, I note that a lot of newspapers and web pages use Unicode versions of dashes (em dash, etc.), quotation marks, apostrophes, etc. In my own software I have previously implemented a filter to convert a few of these common cases into ASCII equivalents.
We'll never get them all (and non-European languages are obviously a hopeless case), but we could clean up English text very easily, and probably make French, Spanish, German, etc. much easier to read by simply omitting 'accents' / diacritics. (Or, to use German as an example, o-umlaut -> "oe".)
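To make the idea concrete, here is a minimal sketch of such a table in Go, using the standard library's `strings.NewReplacer`. The names `asciiReplacer` and `toASCII` are hypothetical and not part of WRP, and the table entries are only an illustrative subset. The umlaut mappings follow the German convention mentioned above, which would not suit every language.

```go
package main

import "strings"

// asciiReplacer is a hypothetical translation table (not part of WRP).
// Arguments are old/new pairs; each old string is a UTF-8 sequence.
var asciiReplacer = strings.NewReplacer(
	"\u2018", "'", // left single quotation mark
	"\u2019", "'", // right single quotation mark / apostrophe
	"\u201C", "\"", // left double quotation mark
	"\u201D", "\"", // right double quotation mark
	"\u2013", "-", // en dash
	"\u2014", "--", // em dash
	"\u2026", "...", // horizontal ellipsis
	"\u00A0", " ", // no-break space
	"\u00E4", "ae", // a-umlaut (German convention, an assumption)
	"\u00F6", "oe", // o-umlaut
	"\u00FC", "ue", // u-umlaut
	"\u00DF", "ss", // sharp s
	"\u00E9", "e", // e-acute
	"\u00E8", "e", // e-grave
	"\u00F1", "n", // n-tilde
)

// toASCII applies the table in a single pass. Characters that are
// not in the table are passed through unchanged.
func toASCII(s string) string {
	return asciiReplacer.Replace(s)
}
```

Calling `toASCII` on text containing curly quotes, an em dash, and accented letters would return the same text with those characters swapped for the ASCII strings in the table. Anything not listed is left as-is, so the table can grow one entry at a time.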
You may well consider this out-of-scope.