Question: Determine Word Count? #409

Closed
opened 2026-01-29 14:35:58 +00:00 by claunia · 2 comments

Originally created by @Mike-E-angelo on GitHub (Oct 22, 2020).

Greetings... thank you for making this great library!

In my case, I am looking for a Markdown parser/library/API that can quickly tell me the line count and word count of a given Markdown document. I see that there is a `MarkdownDocument.LineCount`, so that takes care of one of the requirements.

Is there a way of determining the word count of a document in a similar fashion? I see there's a way of iterating through the document via #381, so I will be attempting to do this, but wanted to quickly ping here to see if there is an easier/more obvious way of doing so.

Thank you for any assistance you can provide. 👍

claunia added the question label 2026-01-29 14:35:58 +00:00

@xoofx commented on GitHub (Oct 23, 2020):

It's not straightforward to go through all the AST to collect words. Why don't you perform that on the input string directly with a regex?

  • count words: `Regex.Matches(markdownText, @"\b\w+\b").Count`
  • count lines: `Regex.Matches(markdownText, "\n").Count`
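
For reference, the two expressions above can be wrapped into a small helper. This is only a minimal sketch: the class name `MarkdownStats` and its method names are hypothetical and are not part of Markdig. It works on the raw input string, so Markdown syntax is not stripped, and words inside URLs or code spans are counted as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Markdig: it only wraps the two
// regular expressions suggested in the comment above.
public static class MarkdownStats
{
    // Counts runs of word characters (\w+) in the raw text.
    // An apostrophe is not a word character, so "don't" counts as two words.
    public static int CountWords(string markdownText) =>
        Regex.Matches(markdownText, @"\b\w+\b").Count;

    // Counts "\n" characters, i.e. line breaks rather than lines:
    // a last line without a trailing newline is not included.
    public static int CountLineBreaks(string markdownText) =>
        Regex.Matches(markdownText, "\n").Count;
}
```

For the input `"# Hello *world*\n\nSecond paragraph."`, `CountWords` returns 4 and `CountLineBreaks` returns 2. If the text does not end with a newline, the number of lines is one more than the number of line breaks.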

@Mike-E-angelo commented on GitHub (Oct 23, 2020):

That's actually what I am doing with plain text documents, @xoofx. 😁 The thought did occur to do the same with Markdown documents but wanted to ensure there wasn't some obvious feature I was overlooking with the API here to be more accurate and/or engrained, for lack of a better word.

So, it looks like I will go with `RegEx`, then. Thank you for your time and input!

Reference: starred/markdig#409