Unsafe caching of a previous renderer #457

Closed
opened 2026-01-29 14:37:14 +00:00 by claunia · 9 comments
Owner

Originally created by @AndreyChechel on GitHub (Apr 29, 2021).

There seems to be an issue in the way how a previous renderer is cached in RendererBase.Write method.

Each registered renderer has an Accept method saying whether the renderer can process a particular object. Since the Accept method is virtual, it's possible to create custom selective renderers. The problem with a cached renderer is that it's never tested whether it can accept another object or not - the implementation makes an assumption just by the object type, whereas a custom Accept method can have much more criteria to check.

Moreover, even if the Accept method had invoked on the cached previous render before using it, we'd have still had an issue, because some renderer(s) located in the pipeline prior to the cached one could accept a new object while rejecting the previous object of the same type. That contemplation leads us to the conclusion that previous renderer cache will always be unsafe and risky to use.

Here's an issue we ran into. We needed to emit custom markup for HTML anchors defined as [label](#anchor). Those anchors are actually regular LinkInline objects. We wrote a custom renderer with an overridden Accept method checking that a link URL is an anchor URL, so it's only invoked for anchors. The AnchorRenderer was placed before LinkInlineRenderer in the pipeline, so the AnchorRenderer was supposed to be invoked for each link parsed from a markdown document. However, if a paragraph has two URLs: a regular one, and an anchor - the first URL is processed with LinkInlineRenderer that is cached and then reused for the anchor. This way the AnchorRenderer is never actually invoked.

The workaround is to create a parser that would produce an object of different type, smth like AnchorLinkInline so it's not picked up by LinkInlineRenderer. Which is fine, but requires more entities implemented and adds extra computation costs on parsing stage.

Originally created by @AndreyChechel on GitHub (Apr 29, 2021). There seems to be an issue in the way how a previous renderer is cached in [RendererBase.Write](https://github.com/xoofx/markdig/blob/0a0040450fd4f774dacc71bcaf0ed1651a792cea/src/Markdig/Renderers/RendererBase.cs#L134-L158) method. Each registered renderer has an `Accept` method saying whether the renderer can process a particular object. Since the `Accept` method is virtual, it's possible to create custom selective renderers. The problem with a cached renderer is that it's never tested whether it can accept another object or not - the implementation makes an assumption just by the object type, whereas a custom `Accept` method can have much more criteria to check. Moreover, even if the `Accept` method had invoked on the cached previous render before using it, we'd have still had an issue, because some renderer(s) located in the pipeline prior to the cached one could accept a new object while rejecting the previous object of the same type. That contemplation leads us to the conclusion that previous renderer cache will always be unsafe and risky to use. Here's an issue we ran into. We needed to emit custom markup for HTML anchors defined as `[label](#anchor)`. Those anchors are actually regular `LinkInline` objects. We wrote a custom renderer with an overridden `Accept` method checking that a link URL is an anchor URL, so it's only invoked for anchors. The `AnchorRenderer` was placed before `LinkInlineRenderer` in the pipeline, so the `AnchorRenderer` was supposed to be invoked for each link parsed from a markdown document. However, if a paragraph has two URLs: a regular one, and an anchor - the first URL is processed with `LinkInlineRenderer` that is cached and then reused for the anchor. This way the `AnchorRenderer` is never actually invoked. The workaround is to create a parser that would produce an object of different type, smth like `AnchorLinkInline` so it's not picked up by `LinkInlineRenderer`. Which is fine, but requires more entities implemented and adds extra computation costs on parsing stage.
claunia added the wontfix label 2026-01-29 14:37:14 +00:00
Author
Owner

@xoofx commented on GitHub (Apr 29, 2021):

Yes, it was a choice in design and yes, it was known that, wrongly used, you could get invalid behavior.

The workaround is to create a parser that would produce an object of different type, smth like AnchorLinkInline so it's not picked up by LinkInlineRenderer. Which is fine, but requires more entities implemented and adds extra computation costs on parsing stage.

Yep, that's the reason why it was not like that 😉

So It's by design. The Accept can't rely on anything else than statically type information, not runtime information.

@xoofx commented on GitHub (Apr 29, 2021): Yes, it was a choice in design and yes, it was known that, wrongly used, you could get invalid behavior. > The workaround is to create a parser that would produce an object of different type, smth like AnchorLinkInline so it's not picked up by LinkInlineRenderer. Which is fine, but requires more entities implemented and adds extra computation costs on parsing stage. Yep, that's the reason why it was not like that 😉 So It's by design. The Accept can't rely on anything else than statically type information, not runtime information.
Author
Owner

@xoofx commented on GitHub (Apr 29, 2021):

Yep, that's the reason why it was not like that 😉

I mean, that whatever the solution is, it has a cost, dynamic dispatch of renderer based on runtime data was not a goal, and is left to specialized users that can re-implement an entirely new renderer pipeline if they really want to go that route.

@xoofx commented on GitHub (Apr 29, 2021): > Yep, that's the reason why it was not like that 😉 I mean, that whatever the solution is, it has a cost, dynamic dispatch of renderer based on runtime data was not a goal, and is left to specialized users that can re-implement an entirely new renderer pipeline if they really want to go that route.
Author
Owner

@AndreyChechel commented on GitHub (Apr 29, 2021):

Thanks for the quick response.

I see your point regarding the performance - it totally makes sense. Well, one thing that can be still confusing is that the Accept method gets an actual object instance instead of the object type. Passing a type will likely eliminate potential issues with custom renderers misusing the Accept method.

Anyways, I'm glad that the issue is known and being taken into the account 👍

@AndreyChechel commented on GitHub (Apr 29, 2021): Thanks for the quick response. I see your point regarding the performance - it totally makes sense. Well, one thing that can be still confusing is that the `Accept` method gets an actual object instance instead of the object type. Passing a type will likely eliminate potential issues with custom renderers misusing the `Accept` method. Anyways, I'm glad that the issue is known and being taken into the account 👍
Author
Owner

@MihaZupan commented on GitHub (Apr 29, 2021):

Passing a type will likely eliminate potential issues with custom renderers misusing the Accept method.

That sounds right. We should also fix the one location where we override this internaly: 0a0040450f/src/Markdig/Extensions/AutoLinks/NormalizeAutoLinkRenderer.cs (L14-L27)

(&& link.IsAutoLink; above)

@MihaZupan commented on GitHub (Apr 29, 2021): > Passing a type will likely eliminate potential issues with custom renderers misusing the Accept method. That sounds right. We should also fix the one location where we override this internaly: https://github.com/xoofx/markdig/blob/0a0040450fd4f774dacc71bcaf0ed1651a792cea/src/Markdig/Extensions/AutoLinks/NormalizeAutoLinkRenderer.cs#L14-L27 (`&& link.IsAutoLink;` above)
Author
Owner

@xoofx commented on GitHub (Apr 29, 2021):

Well, one thing that can be still confusing is that the Accept method gets an actual object instance instead of the object type. Passing a type will likely eliminate potential issues with custom renderers misusing the Accept method.

I can explain why: This was actually for performance reason 😉

Initially the method was taking Type, but Type.IsAssignableFrom was slower than performing a check through is directly on the instance, hence the reason why the instance was passed. I haven't benchmarked this with a recent runtime, it could have changed, but yeah, that was - afair - the underlying reason

@xoofx commented on GitHub (Apr 29, 2021): > Well, one thing that can be still confusing is that the Accept method gets an actual object instance instead of the object type. Passing a type will likely eliminate potential issues with custom renderers misusing the Accept method. I can explain why: This was actually for performance reason 😉 Initially the method was taking Type, but Type.IsAssignableFrom was slower than performing a check through `is` directly on the instance, hence the reason why the instance was passed. I haven't benchmarked this with a recent runtime, it could have changed, but yeah, that was - afair - the underlying reason
Author
Owner

@MihaZupan commented on GitHub (Apr 29, 2021):

This should now be a one-time cost per pipeline since we're caching the renderers (and the renderersPerType is not being reset).

@MihaZupan commented on GitHub (Apr 29, 2021): This should now be a one-time cost per pipeline since we're caching the renderers (and the `renderersPerType` is not being reset).
Author
Owner

@xoofx commented on GitHub (Apr 29, 2021):

This should now be a one-time cost per pipeline since we're caching the renderers (and the renderersPerType is not being reset).

the caching was already done I think at that time, but it was still a (small) perf hit. But we can double check for sure if it is no longer an issue.

@xoofx commented on GitHub (Apr 29, 2021): > This should now be a one-time cost per pipeline since we're caching the renderers (and the renderersPerType is not being reset). the caching was already done I think at that time, but it was still a (small) perf hit. But we can double check for sure if it is no longer an issue.
Author
Owner

@AndreyChechel commented on GitHub (Apr 29, 2021):

Adding a rough benchmark result for is operator vs. Type.IsAssignableFrom for .NET5

  • When tested for a base type:
|           Method |      Mean |     Error |    StdDev | Rank |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------- |----------:|----------:|----------:|-----:|-------:|------:|------:|----------:|
|               Is |  4.895 ns | 0.1253 ns | 0.1539 ns |    1 | 0.0057 |     - |     - |      24 B |
| IsAssignableFrom | 16.733 ns | 0.1498 ns | 0.1251 ns |    2 | 0.0057 |     - |     - |      24 B |
  • When tested for an exact type:
|           Method |     Mean |     Error |    StdDev | Rank |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------- |---------:|----------:|----------:|-----:|-------:|------:|------:|----------:|
|               Is | 4.708 ns | 0.0301 ns | 0.0281 ns |    1 | 0.0057 |     - |     - |      24 B |
| IsAssignableFrom | 7.899 ns | 0.0453 ns | 0.0378 ns |    2 | 0.0057 |     - |     - |      24 B |
@AndreyChechel commented on GitHub (Apr 29, 2021): Adding a rough benchmark result for `is` operator vs. `Type.IsAssignableFrom` for .NET5 - **When tested for a base type:** ``` | Method | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | |----------------- |----------:|----------:|----------:|-----:|-------:|------:|------:|----------:| | Is | 4.895 ns | 0.1253 ns | 0.1539 ns | 1 | 0.0057 | - | - | 24 B | | IsAssignableFrom | 16.733 ns | 0.1498 ns | 0.1251 ns | 2 | 0.0057 | - | - | 24 B | ``` - **When tested for an exact type:** ``` | Method | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | |----------------- |---------:|----------:|----------:|-----:|-------:|------:|------:|----------:| | Is | 4.708 ns | 0.0301 ns | 0.0281 ns | 1 | 0.0057 | - | - | 24 B | | IsAssignableFrom | 7.899 ns | 0.0453 ns | 0.0378 ns | 2 | 0.0057 | - | - | 24 B | ```
Author
Owner

@MihaZupan commented on GitHub (Apr 29, 2021):

It was already cached in the sense that we would only perform the check once per type/renderer. So for a very large document this would be just noise.
renderersPerType is rebuilt for each renderer instance, so now that the renderers are getting reused this will always be noise.

@MihaZupan commented on GitHub (Apr 29, 2021): It was already cached in the sense that we would only perform the check once per type/renderer. So for a very large document this would be just noise. `renderersPerType` is rebuilt for each renderer instance, so now that the renderers are getting reused this will always be noise.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/markdig#457