mirror of
https://github.com/xoofx/markdig.git
synced 2026-02-04 05:44:50 +00:00
Parsing HTML into tree nodes instead of HtmlInline or HtmlBlock objects
#413
Originally created by @KvanTTT on GitHub (Nov 9, 2020).
Consider parsing of the following string:
The parser returns ContainerInline with the following children:
But I want to get not just a mix of HtmlInline and other markdown objects, but an HTML tree structure like this:
The same applies to HtmlBlock:
Should such a feature be included in the base library or implemented as an external extension?
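To make the request concrete, here is a minimal sketch of how a flat sequence of tag fragments and text pieces could be folded into a tree using a stack. It is written in Python purely for illustration and is not Markdig code. The names Node, build_tree, TAG_RE and VOID_TAGS are hypothetical, and the input list is an assumption standing in for the flat children (HtmlInline-like tag strings mixed with literal text) that the parser currently returns.

```python
import re
from dataclasses import dataclass, field


# Hypothetical node type for this sketch; not part of Markdig.
@dataclass
class Node:
    tag: str                      # element name, or "#text" / "#root"
    text: str = ""
    children: list = field(default_factory=list)


# Matches one whole opening, closing, or self-closing tag token.
TAG_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][A-Za-z0-9-]*)[^>]*?(/?)\s*>")

# Elements that never have a closing tag (a small, non-exhaustive set).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def build_tree(tokens):
    """Fold a flat sequence of tag and text tokens into a tree."""
    root = Node("#root")
    stack = [root]                # stack[-1] is the currently open element
    for token in tokens:
        match = TAG_RE.fullmatch(token)
        if match is None:
            # Anything that is not a single tag is treated as text.
            stack[-1].children.append(Node("#text", text=token))
            continue
        closing, name, self_closing = match.group(1), match.group(2).lower(), match.group(3)
        if closing:
            # Close the nearest matching open element; ignore stray closers.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tag == name:
                    del stack[i:]
                    break
            continue
        node = Node(name)
        stack[-1].children.append(node)
        if not self_closing and name not in VOID_TAGS:
            stack.append(node)    # later tokens become children of this node
    return root


root = build_tree(["Hello ", "<b>", "bold ", "<i>", "both", "</i>", "</b>", " tail", "</u>"])
```

In this sketch an unmatched closing tag such as the trailing `</u>` is simply dropped, and elements left open are closed implicitly when an outer element closes. A real extension would also need to decide how to treat markdown inlines that sit between the tags and how to keep source positions.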
@gagahpangeran commented on GitHub (Aug 1, 2021):
I am also interested in this feature.
Is there any extension to achieve this?
@MihaZupan commented on GitHub (Aug 1, 2021):
I would turn to the AngleSharp library for parsing the HtmlInline content.
@KvanTTT commented on GitHub (Aug 1, 2021):
@gagahpangeran I've implemented such a parser in one of my projects; see Parser and an example test file that is parsed correctly. An ANTLR-based lexer and parser are used for HTML. You can extract this into your project and/or convert it into an extension.
@MihaZupan I tried different HTML-parsing libraries: Html Agility Pack and AngleSharp. But they work badly with HTML fragments and with invalid or unknown tags. Eventually, I decided to write my own HTML lexer/parser based on ANTLR. It works fine and is much more customizable.