Command Palette search algorithm needs to prioritize "longest substring" match, may need multiple passes #9294

Closed
opened 2026-01-31 01:50:59 +00:00 by claunia · 5 comments
Owner

Originally created by @DHowett on GitHub (Jun 26, 2020).

Originally assigned to: @DHowett on GitHub.

I have two commands:

  1. Split Pane, split: horizontal
  2. Split Pane, split: horizontal, profile: SSH: Antares

If I search for sp anta, I expect 2 to weight higher than 1. I'm getting the opposite, though:

image

We only run one pass over each command for the whole search term; [SP]lit p[AN]e, spli[T]: horizont[A]l is a complete match; since it's a subset of the Antares version, we never get to the more specific ... profile: SSH:... part.

Originally created by @DHowett on GitHub (Jun 26, 2020). Originally assigned to: @DHowett on GitHub. I have two commands: 1. Split Pane, split: horizontal 2. Split Pane, split: horizontal, profile: SSH: Antares If I search for `sp anta`, I expect 2 to weight higher than 1. I'm getting the opposite, though: ![image](https://user-images.githubusercontent.com/189190/85883518-f8c6a700-b795-11ea-85d6-ff84b4c4918d.png) We only run one pass over each command for the whole search term; `[SP]lit p[AN]e, spli[T]: horizont[A]l` is a complete match; since it's a subset of the Antares version, we never get to the more specific `... profile: SSH:...` part.
Author
Owner

@DHowett commented on GitHub (Jul 23, 2020):

@zadjii-msft more modest proposal, and one I'd like to work on for Hackathon (!)

It looks like VSCode has three matchers. The first one to return a result just straight-up wins, and there is no weighting. This is important.

  1. Full prefix matching ("needle" will match "[Needle]ss to say")
  2. Word-by-word matching ("needle" will match "[Ne]ed [Ed]ward to [Le]t Alphonse go")
  3. Contiguous substring matching only ("space" will match "sub[space]", but not match "sub[s] must [pace] themselves")

All other matches are sorted strictly alphabetically. This continues to support the case where users memorize commands by their first initial, and doesn't break the case where users might want "p" to match "S[p]lit Pane" for some reason. It'd sort below anything that started with a P because S>P.

I'm guessing that they found users really want a constrained search, not a generic fuzzy search, for their commands. We might be in the same boat because our users will remember key things about the commands (profile name, index, split type, action, etc.).

VSC also returns, as a result of fuzzy match, a list of ranges. If we adopt that, it will make it easier to get the range of text we are supposed to embolden.

@DHowett commented on GitHub (Jul 23, 2020): @zadjii-msft more modest proposal, and one I'd like to work on for Hackathon (!) It looks like VSCode has three matchers. _The first one to return a result just straight-up wins, and there is no weighting._ This is important. 1. Full prefix matching ("needle" will match "[Needle]ss to say") 2. Word-by-word matching ("needle" will match "[Ne]ed [Ed]ward to [Le]t Alphonse go") 3. Contiguous substring matching only ("space" will match "sub[space]", but **not** match "sub[s] must [pace] themselves") All other matches are sorted strictly alphabetically. This continues to support the case where users memorize commands by their first initial, and doesn't break the case where users might want "p" to match "S[p]lit Pane" for some reason. It'd sort below anything that started with a P because S>P. I'm guessing that they found users really want a constrained search, not a generic fuzzy search, for their commands. We might be in the same boat because our users will remember _key things_ about the commands (profile name, index, split type, action, etc.). VSC also returns, as a result of fuzzy match, a list of ranges. If we adopt that, it will make it easier to get the range of text we are supposed to embolden.
Author
Owner

@zadjii-msft commented on GitHub (Jul 23, 2020):

Wait but why no weighting?

@zadjii-msft commented on GitHub (Jul 23, 2020): Wait but why no weighting?
Author
Owner

@DHowett commented on GitHub (Jul 23, 2020):

So, I think that with "first match wins" and running matchers in that order we'll get the vast majority of things users want without having to get into relative ordering discussions like the one that made me file this bug 😄

This is unsubstantiated and, perhaps, unknowable, but I think that most commands are going to be matched by either full-prefix or word-by-word-prefix. There's a higher likelihood that somebody is going to disambiguate an ambiguous match by typing either successive word-start letters or by simply choosing the right one from the very reduced list.

Unfold for a bunch of matching cases. It was longer than I expected.

As an example, take the command names from the OP (plus a couple more)

  • Split Pane, split: horizontal
  • Split Pane, split: horizontal, profile: SSH: Antares
  • Split Pane, split: horizontal, profile: SSH: Vaudeville
  • Split Pane, split: vertical

and the needles sp anta, span and spv and spve

weights

If we do weighted matching, we get (approximately) the following.

sp anta

 33     1   12         1            1 = 12
[Sp]lit[ ]P[an]e, spli[t]: horizont[a]l
 33     1   12         1            1 = 12
[Sp]lit[ ]P[an]e, spli[t]: horizont[a]l, profile: SSH: Antares
 33     1   12         1            1 = 12
[Sp]lit[ ]P[an]e, spli[t]: horizont[a]l, profile: SSH: Vaudeville
 33     1   12         1          1 = 12
[Sp]lit[ ]P[an]e, spli[t]: vertic[a]l

span

weight <span> in <Split Pane, split: horizontal>: 9
weight <span> in <Split Pane, split: horizontal, profile: SSH: Antares>: 9
weight <span> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 9
weight <span> in <Split Pane, split: vertical>: 9

spv

weight <spv> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 8
weight <spv> in <Split Pane, split: vertical>: 8

it's a toss-up, which is fine - the user may disambiguate later

spve

weight <spve> in <Split Pane, split: vertical>: 11
weight <spve> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 9
(others 0)

spva flips them around (which is fine)

first-past-the-post

sp anta

Split Pane, split: horizontal
* prefix: not literal `sp anta` = FALSE
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `sp anta` = FALSE

Split Pane, split: horizontal, profile: SSH: Antares
* prefix: not literal `sp anta`
* wordwise: [Sp]lit ... [Anta]res = TRUE

Split Pane, split: horizontal, profile: SSH: Vaudeville
* prefix: not literal `sp anta`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `sp anta` = FALSE

Split Pane, split: vertical
* prefix: not literal `sp anta`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `sp anta` = FALSE

Final list:
- Split Pane, split: horizontal, profile: SSH: Antares

span

Same as above, actually.

spv

Split Pane, split: horizontal
* prefix: not literal `spv` = FALSE
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spv` = FALSE

Split Pane, split: horizontal, profile: SSH: Antares
* prefix: not literal `spv`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spv` = FALSE

Split Pane, split: horizontal, profile: SSH: Vaudeville
* prefix: not literal `spv`
* wordwise: [Sp]lit ... [V]audeville = TRUE

Split Pane, split: vertical
* prefix: not literal `spve`
* wordwise: [Sp]lit ... [v]ertical = TRUE

Final list:
- Split Pane, split: horizontal, profile: SSH: Vaudeville
- Split Pane, split: vertical

user disambiguates w/ a or e

spve

Split Pane, split: horizontal
* prefix: not literal `spve` = FALSE
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spve` = FALSE

Split Pane, split: horizontal, profile: SSH: Antares
* prefix: not literal `spve`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spve` = FALSE

Split Pane, split: horizontal, profile: SSH: Vaudeville
* prefix: not literal `spve`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spve` = FALSE

Split Pane, split: vertical
* prefix: not literal `spve`
* wordwise: [Sp]lit ... [ve]rtical = TRUE

Final list:
- Split Pane, split: vertical

Anyway, I'm not sure fuzzy plus weighting is better. If we want to render the results to the user we are going to bold a bunch of weird letters or we're going to have to do a runoff search to re-weight the matches and optimize for longer substring runs.

If we're going to do that, and reprioritize longer substring runs, we may want to just ... try to guess at the substrings people are going to use (my theory is that they're the beginnings of their action-words) ... which eventually takes us to "why not just match word prefixes?"

@DHowett commented on GitHub (Jul 23, 2020): So, I think that with "first match wins" and running matchers in that order we'll get the vast majority of things users want without having to get into relative ordering discussions like the one that made me file this bug :smile: This is unsubstantiated and, perhaps, unknowable, but I think that most commands are going to be matched by either full-prefix or word-by-word-prefix. There's a higher likelihood that somebody is going to _disambiguate_ an ambiguous match by typing either successive word-start letters or by simply choosing the right one from the very reduced list. <details> <summary>Unfold for a bunch of matching cases. It was longer than I expected.</summary> As an example, take the command names from the OP (plus a couple more) * Split Pane, split: horizontal * Split Pane, split: horizontal, profile: SSH: Antares * Split Pane, split: horizontal, profile: SSH: Vaudeville * Split Pane, split: vertical and the needles `sp anta`, `span` and `spv` and `spve` ## weights If we do weighted matching, we get (approximately) the following. ### `sp anta` ``` 33 1 12 1 1 = 12 [Sp]lit[ ]P[an]e, spli[t]: horizont[a]l 33 1 12 1 1 = 12 [Sp]lit[ ]P[an]e, spli[t]: horizont[a]l, profile: SSH: Antares 33 1 12 1 1 = 12 [Sp]lit[ ]P[an]e, spli[t]: horizont[a]l, profile: SSH: Vaudeville 33 1 12 1 1 = 12 [Sp]lit[ ]P[an]e, spli[t]: vertic[a]l ``` ### `span` ``` weight <span> in <Split Pane, split: horizontal>: 9 weight <span> in <Split Pane, split: horizontal, profile: SSH: Antares>: 9 weight <span> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 9 weight <span> in <Split Pane, split: vertical>: 9 ``` ### `spv` ``` weight <spv> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 8 weight <spv> in <Split Pane, split: vertical>: 8 ``` it's a toss-up, which is fine - the user may disambiguate later ### `spve` ``` weight <spve> in <Split Pane, split: vertical>: 11 weight <spve> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 9 (others 0) ``` `spva` flips them around (which is fine) ## first-past-the-post ### `sp anta` ``` Split Pane, split: horizontal * prefix: not literal `sp anta` = FALSE * wordwise: `Sp` at start, no further words = FALSE * substring: no literal `sp anta` = FALSE Split Pane, split: horizontal, profile: SSH: Antares * prefix: not literal `sp anta` * wordwise: [Sp]lit ... [Anta]res = TRUE Split Pane, split: horizontal, profile: SSH: Vaudeville * prefix: not literal `sp anta` * wordwise: `Sp` at start, no further words = FALSE * substring: no literal `sp anta` = FALSE Split Pane, split: vertical * prefix: not literal `sp anta` * wordwise: `Sp` at start, no further words = FALSE * substring: no literal `sp anta` = FALSE Final list: - Split Pane, split: horizontal, profile: SSH: Antares ``` ### `span` Same as above, actually. ### `spv` ``` Split Pane, split: horizontal * prefix: not literal `spv` = FALSE * wordwise: `Sp` at start, no further words = FALSE * substring: no literal `spv` = FALSE Split Pane, split: horizontal, profile: SSH: Antares * prefix: not literal `spv` * wordwise: `Sp` at start, no further words = FALSE * substring: no literal `spv` = FALSE Split Pane, split: horizontal, profile: SSH: Vaudeville * prefix: not literal `spv` * wordwise: [Sp]lit ... [V]audeville = TRUE Split Pane, split: vertical * prefix: not literal `spve` * wordwise: [Sp]lit ... [v]ertical = TRUE Final list: - Split Pane, split: horizontal, profile: SSH: Vaudeville - Split Pane, split: vertical ``` user disambiguates w/ `a` or `e` ### `spve` ``` Split Pane, split: horizontal * prefix: not literal `spve` = FALSE * wordwise: `Sp` at start, no further words = FALSE * substring: no literal `spve` = FALSE Split Pane, split: horizontal, profile: SSH: Antares * prefix: not literal `spve` * wordwise: `Sp` at start, no further words = FALSE * substring: no literal `spve` = FALSE Split Pane, split: horizontal, profile: SSH: Vaudeville * prefix: not literal `spve` * wordwise: `Sp` at start, no further words = FALSE * substring: no literal `spve` = FALSE Split Pane, split: vertical * prefix: not literal `spve` * wordwise: [Sp]lit ... [ve]rtical = TRUE Final list: - Split Pane, split: vertical ``` </details> Anyway, I'm not sure fuzzy plus weighting is _better_. If we want to render the results to the user we are going to bold a bunch of weird letters _or_ we're going to have to do a runoff search to re-weight the matches and optimize for longer substring runs. If we're going to do that, and reprioritize longer substring runs, we may want to just ... try to guess at the substrings people are going to use (my theory is that they're the beginnings of their action-words) ... which eventually takes us to "why not just match word prefixes?"
Author
Owner

@DHowett commented on GitHub (Jul 23, 2020):

Further discussion: maybe we should do weighting by algorithm -- full prefix is better than wordwise prefix which is further better than contiguous substring.

@DHowett commented on GitHub (Jul 23, 2020): Further discussion: maybe we should do weighting by algorithm -- full prefix is better than wordwise prefix which is further better than contiguous substring.
Author
Owner

@lhecker commented on GitHub (Aug 5, 2020):

For reference and further research I'd like to mention the relatively popular project fzf and its algorithm.

@lhecker commented on GitHub (Aug 5, 2020): For reference and further research I'd like to mention the relatively popular project [fzf](https://github.com/junegunn/fzf) and its [algorithm](https://github.com/junegunn/fzf/blob/e2ae1b249cf2d5258b552cfd682c7c0911981e9b/src/algo/algo.go).
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#9294