StreamReader #412

Closed
opened 2026-01-29 14:36:05 +00:00 by claunia · 2 comments

Originally created by @Feofilakt on GitHub (Nov 2, 2020).

Hello!
Does it make sense to use `StreamReader` for Markdown file parsing? I've noticed that Markdig only has a `Parse` method that takes a string. I haven't dealt with Markdown parsing before, so maybe I'm missing something.

claunia added the question label 2026-01-29 14:36:05 +00:00

@MihaZupan commented on GitHub (Nov 2, 2020):

Short answer: no.

If we were to accept a `StreamReader`, the implementation would 100% be to just call `ReadToEnd` (allocating a new string for the entire content), then delegate to the existing parsing logic.
As such, I don't think it adds any value.
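
For illustration, the delegation described above could look roughly like the sketch below. `MarkdownFileLoader` and `ParseFrom` are hypothetical names, not part of Markdig; the only Markdig call used is the existing string-based `Markdown.Parse`.

```csharp
using System.IO;
using Markdig;
using Markdig.Syntax;

// Hypothetical helper (not part of Markdig): shows that a reader-based
// overload would only wrap the existing string-based API.
public static class MarkdownFileLoader
{
    public static MarkdownDocument ParseFrom(TextReader reader)
    {
        // Allocates one string holding the entire remaining content.
        string markdown = reader.ReadToEnd();

        // Delegates to the existing parsing logic.
        return Markdown.Parse(markdown);
    }
}
```

A caller could pass `new StreamReader(path)` here, which is no different in effect from reading the file into a string first and calling `Markdown.Parse` directly.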

Markdig's `ToHtml` has 2 phases:

  1. `Parse` takes in the entire markdown text and parses it to an abstract syntax tree.
  2. `Render` takes the AST and spits out the corresponding HTML.

This approach gives Markdig a lot of customization and extensibility opportunities.
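
A minimal sketch of running the two phases separately, assuming the default pipeline and Markdig's `HtmlRenderer`. The class and method names (`TwoPhaseExample`, `Convert`) are made up for this example; the point is that the AST is available between the two steps, which is where custom inspection or rewriting would go.

```csharp
using System.IO;
using Markdig;
using Markdig.Renderers;
using Markdig.Syntax;

// Hypothetical wrapper, for illustration only.
public static class TwoPhaseExample
{
    public static string Convert(string markdown)
    {
        MarkdownPipeline pipeline = new MarkdownPipelineBuilder().Build();

        // Phase 1: parse the whole text into an abstract syntax tree.
        MarkdownDocument document = Markdown.Parse(markdown, pipeline);

        // The AST could be inspected or modified here before rendering.

        // Phase 2: render the AST to HTML.
        using var writer = new StringWriter();
        var renderer = new HtmlRenderer(writer);
        pipeline.Setup(renderer);
        renderer.Render(document);
        writer.Flush();
        return writer.ToString();
    }
}
```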

Parsing and the resulting AST make heavy use of the `string` type all over the place (albeit in an efficient way). While it could technically be changed to something like `Memory<char>`, it would likely be slower than just allocating a new string once at the start.

Markdown is complex, so parsing can't be implemented in a single-pass, non-seeking manner. Therefore we don't really benefit from streaming APIs (for either source or destination).
Markdig is also very fast, and the single string allocation at the start shouldn't make or break the performance of Markdown processing.

@ondrejpialek commented on GitHub (Jan 7, 2024):

A nice use case these days is feeding streamed Markdown from LLMs, such as what OpenAI exposes in its API (I have an `IAsyncEnumerable<string>` as a response from a particular library).

It would be cool to be able to render the HTML for whatever Markdown the parser has received so far, but I understand it's not trivial, especially since tags would need to be closed and then that closing undone as more data comes in.
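
One possible workaround with the existing string-based API is sketched below. It is not incremental parsing: it assumes it is acceptable to buffer every chunk and re-run `Markdown.ToHtml` on the full text each time, so total work grows quadratically with the response length. `StreamingPreview` and `RenderSnapshots` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;
using Markdig;

// Hypothetical helper, not part of Markdig: yields a full HTML snapshot
// after each incoming chunk by re-parsing everything received so far.
public static class StreamingPreview
{
    public static async IAsyncEnumerable<string> RenderSnapshots(
        IAsyncEnumerable<string> chunks)
    {
        var buffer = new StringBuilder();

        await foreach (string chunk in chunks)
        {
            buffer.Append(chunk);

            // Re-parse and re-render the whole buffer; unfinished constructs
            // are rendered however the parser interprets the partial text.
            yield return Markdown.ToHtml(buffer.ToString());
        }
    }
}
```

Because each snapshot is produced from scratch, nothing has to be "undone" when later chunks change how earlier text is interpreted; the cost is repeated parsing of the same prefix.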

Reference: starred/markdig#412