Compare commits

...

5 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Mike Griese | 41343de212 | add image | 2024-12-04 11:54:27 -06:00 |
| Mike Griese | df9bfb9cab | sometimes units are hard for brains | 2024-02-26 11:56:54 -06:00 |
| Mike Griese | ba9307e4a1 | this is the most important paragraph in the spec | 2024-02-26 11:37:32 -06:00 |
| Mike Griese | 3e347ef8bd | more notes | 2024-02-23 07:00:41 -06:00 |
| Mike Griese | 3f38c78bcb | committing some VERY old spec drafts that are clearly no longer relevant | 2024-02-23 06:33:14 -06:00 |
9 changed files with 1217 additions and 0 deletions

View File

@@ -0,0 +1,101 @@
---
author: Mike Griese
created on: 2023-01-26
last updated: 2023-01-26
issue id: n/a
---
# Windows Terminal Copilot | Explain that
## Abstract
## Background
### Inspiration
### Execution Strategy
### User Stories
### Elevator Pitch
It's Copilot. For the command line.
## Business Justification
## Scenario Details
### UI/UX Design
### Implementation Details
## Tenets
<table>
<tr><td><strong>Compatibility</strong></td><td>
[comment]: # Will the proposed change break existing code/behaviors? If so, how, and is the breaking change "worth it"?
</td></tr>
<tr><td><strong>Accessibility</strong></td><td>
[comment]: # TODO!
</td></tr>
<tr><td><strong>Sustainability</strong></td><td>
[comment]: # TODO!
</td></tr>
<tr><td><strong>Localization</strong></td><td>
[comment]: # TODO!
</td></tr>
</table>
[comment]: # If there are any other potential issues, make sure to include them here.
## To-do list
### 🐣 Crawl
* [ ]
### 🚶 Walk
* [ ]
### 🏃‍♂️ Run
* [ ]
### 🚀 Sprint
* [ ]
## Conclusion
[comment]: # Of the above proposals, which should we decide on, and why?
### Future Considerations
[comment]: # Are there other future features planned that might affect the current design of this setting? The team can help with this section during the review.
## Resources
### Footnotes
<a name="footnote-1"></a> [1]:
[Fig]: https://github.com/withfig/autocomplete
[Warp]: https://www.warp.dev/
[Terminal North Star]: ../Terminal-North-Star.md
[Tasks]: ../Tasks.md
[Shell Integration]: ../Shell-Integration-Marks.md
[Suggestions UI]: ../Suggestions-UI.md
[Extensions]: ../Suggestions-UI.md
<!-- TODO! -->
[shell-driven autocompletion]: ../Terminal-North-Star.md#Shell_autocompletion

View File

@@ -0,0 +1,102 @@
---
author: Mike Griese
created on: 2023-01-26
last updated: 2023-01-26
issue id: n/a
---
# Windows Terminal Copilot | Implicit Suggestions
## Abstract
## Background
### Inspiration
### Execution Strategy
### User Stories
### Elevator Pitch
It's Copilot. For the command line.
## Business Justification
## Scenario Details
### UI/UX Design
### Implementation Details
## Tenets
<table>
<tr><td><strong>Compatibility</strong></td><td>
[comment]: # Will the proposed change break existing code/behaviors? If so, how, and is the breaking change "worth it"?
</td></tr>
<tr><td><strong>Accessibility</strong></td><td>
[comment]: # TODO!
</td></tr>
<tr><td><strong>Sustainability</strong></td><td>
[comment]: # TODO!
</td></tr>
<tr><td><strong>Localization</strong></td><td>
[comment]: # TODO!
</td></tr>
</table>
[comment]: # If there are any other potential issues, make sure to include them here.
## To-do list
### 🐣 Crawl
* [ ]
### 🚶 Walk
* [ ]
### 🏃‍♂️ Run
* [ ]
### 🚀 Sprint
* [ ]
## Conclusion
[comment]: # Of the above proposals, which should we decide on, and why?
### Future Considerations
[comment]: # Are there other future features planned that might affect the current design of this setting? The team can help with this section during the review.
## Resources
### Footnotes
<a name="footnote-1"></a> [1]:
[Fig]: https://github.com/withfig/autocomplete
[Warp]: https://www.warp.dev/
[Terminal North Star]: ../Terminal-North-Star.md
[Tasks]: ../Tasks.md
[Shell Integration]: ../Shell-Integration-Marks.md
[Suggestions UI]: ../Suggestions-UI.md
[Extensions]: ../Suggestions-UI.md
<!-- TODO! -->
[shell-driven autocompletion]: ../Terminal-North-Star.md#Shell_autocompletion

View File

@@ -0,0 +1,284 @@
---
author: Mike Griese
created on: 2023-01-26
last updated: 2023-01-26
issue id: n/a
---
# Windows Terminal Copilot | Overview
## Abstract
GitHub Copilot is a fairly revolutionary tool that offers complex predictions
for code from the context of the file you're working on and some simple
comments. However, there's more potential to use it beyond the text editor.
Imagine integration directly with the commandline, where Copilot can offer
suggestions based on descriptions of what you'd like to do. Recent
advances in AI models can enable dramatic new features like this, which can be
added to the Terminal.
## Background
Imagine Copilot turning "get the process using the most CPU" into `Get-Process |
Sort-Object CPU -Desc | Select-Object ID, Name, CPU -First 1`. Both [Fig] and
[Warp] have produced similar compelling user experiences already, powered by AI.
GitHub Next is also working on a similar natural language-to-command model with
[Copilot CLI].
Or imagine suggestions based on your command history itself - I just ran `git
add --all`, and Copilot can suggest `git commit ; git push ; gh pr create`. It
remains an open question whether existing AI models are capable of predicting
commands based on what the user has previously done at the command line. If it
isn't yet possible, then undoubtedly it will be possible soon. This is an
idealized future vision for AI in the Terminal. Imagine "**Intelli**sense for
the commandline, powered by artificial **intelligence**".
Another scenario that current models excel at is explaining code in natural
human language. The commandline is an experience that's frequently filled with
esoteric commands and error messages that might be unintuitive. Imagine if the
Terminal could automatically provide an explanation for error messages right in
the context of the Terminal itself. No need to copy the message, leave what
you're doing and search the web to find an explanation - the answer is right
there.
### Execution Strategy
Executing on this vision will require a careful hand. As much delight as this
feature might bring, it has equal potential for PR backlash. Developers already
hate the concept of "telemetry" on Windows. The idea that the Windows Terminal
has built-in support for logging _every command_ run on the command line, and
sending it to a Microsoft server, is absolutely a recipe for a PR nightmare.
Under no circumstances should this be built directly into the Terminal.
This doc outlines how the Terminal might enable this functionality via a "GitHub
Copilot Extension". Instead of building Copilot straight into the Terminal, it
would become an optional extension users could install. By making this
explicitly a "GitHub Copilot" branded extension, it's clear to the users how the
extension is maintained and operated - it's not a feature of _Windows_, but
instead a _GitHub_ feature.
### User Stories
For Copilot integration in the Terminal, we're considering the following four scenarios.
1. **[Prompting]**: The User types a prompt, and the AI suggests some commands given that prompt
- For example, the user literally types "give me a list of all processes with
port 12345 open", and that prompt is sent to the AI model to generate
suggestions.
2. **[Implicit Suggestions]**: A more seamless suggestion based solely on what the user has already typed
- In this scenario, the user can press a keybinding to summon the AI to
suggest a command based solely on the contents of the buffer.
- This version will more heavily rely on [Shell Integration]
- This will be referred to as **"Implicit suggestions"**
3. **"[Explain that]"**: Highlight some command, and ask Copilot to explain what it does.
- Additionally, a quick icon that appears when a command fails, to ask AI to
try and explain what an error means.
4. Long lived context - the AI learns over time from your own patterns, and
makes personalized suggestions.
For the sake of this document, we're going to focus on the first three
experiences. The last, while an interesting future idea, is not something we
have the engineering resources to build. In all likelihood, we can leverage
existing AI models for the first three.
Each of the first three scenarios is broken down in greater detail in their linked docs.
The following plan refers to specifically overarching elements of the Copilot
extension, which are the same regardless of individual features of the
extension. This list was made with consideration for what's possible _before
Build 2023_, alongside what we want to do _in the fullness of time_.
#### By Build
Story | Size | Description
--|-----------|--
A | 🐣 Crawl | The Terminal can use an authentication token hardcoded in the settings for OpenAI requests
A | 🚶 Walk | The Terminal can load the user's GitHub identity from Windows
#### After Build
Story | Size | Description
--|-----------|--
A | 🐣 Crawl | The Terminal can load in-proc extensions via Dynamic Dependencies
A | 🚶 Walk | Terminal Extensions can provide their own action handlers
A | 🚶 Walk | Terminal Extensions can query the contents of the text buffer
A | 🚶 Walk | [Shell integration] marks can be used to help make AI suggestions more context-relevant
A | 🏃‍♂️ Run | Extensions can provide their own UI elements into the Terminal
A | 🏃‍♂️ Run | Copilot is delivered as an extension to the Terminal
A | 🚀 Sprint | The Terminal supports a status bar that shows the state of the Copilot extension
> **Warning**: TODO! How much of this spec should be the "extensions" spec, vs the
> "copilot" spec? Most of the "work" described by this spec is just "Make
> extensions work". Might want to flesh out that one then.
#### North star user experience
As the user is typing at the commandline, suggestions appear as they type, with
AI-driven suggestions for what to complete. These suggestions are driven by the
context of the commands they've previously run (and possibly other contents of
the buffer).
The user can highlight parts of a command that they don't understand, and have
the command explained in natural language. Commands that result in errors can
provide a menu for explaining what the error is, and how to remedy the issue.
### Elevator Pitch
It's Copilot. For the command line.
## Business Justification
It will delight developers.
## Scenario Details
"AI in the Terminal" covers a number of features each powered by AI. Each of
those features is broken into their own specs (linked above). Please refer to
those docs for details about each individual scenario.
This doc will largely focus on the overarching goal of "how do we deliver
Copilot in the Terminal?".
### Implementation Details
#### GitHub Authentication
<sup>_By Build 2023_</sup>
We don't know if this will be powered by GitHub Copilot, or some other
authentication method. This section is left blank while we await those answers.
> **Warning**: TODO! do this
#### Extensions implementation
<sup>_After Build 2023_</sup>
> **Warning**: TODO! do this
Extensions for the Terminal are made possible by
[Dynamic Dependencies for Main packages]. This is a new feature in Windows SV2
(build 22533, I believe). This enables the Terminal to "pin" another
application to the Terminal's own package graph, and load binaries from that
package.
Main Packages can declare themselves with the following:
```xml
<Package>
<Properties>
<uap15:dependencyTarget>true</uap15:dependencyTarget>
</Properties>
</Package>
```
This is a new property in the SV2 SDK that allows a package to be the target of
a Dynamic Dependency. This means that **extensions will be limited to users
running SV2+ builds of Windows**.
```xml
<Package>
<Properties>
<uap15:dependencyTarget>true</uap15:dependencyTarget>
</Properties>
<Applications>
<Application Id="App"
Executable="$targetnametoken$.exe"
EntryPoint="$targetentrypoint$">
<Extensions>
<uap3:Extension Category="windows.appExtension">
<uap3:AppExtension Name="com.microsoft.windows.terminal.extension"
Id="MyTerminalExtension"
DisplayName="...">
<uap3:Properties>
<!-- TODO! Determine what properties we want to put in here -->
<Clsid>{2EACA947-FFFF-4CFA-BA87-BE3FB3EF83EF}</Clsid>
</uap3:Properties>
</uap3:AppExtension>
</uap3:Extension>
</Extensions>
</Application>
</Applications>
</Package>
```
#### Consuming extensions from the Terminal
<sup>_After Build 2023_</sup>
> **Warning**: TODO! do this
## Tenets & Potential Issues
See the individual docs for compatibility, accessibility, and localization
concerns relevant to each feature.
## To-do list
> **Note**: Refer to the individual docs for more detailed plans specific to
> each feature. This section is dedicated to covering only the broad tasks that
> are relevant to the Copilot extension as a whole.
## Before Build To-dos
### 🐣 Crawl
* [ ] Allow the user to store their OpenAI API key in the `settings.json`,
which we'll use for authentication
* This is just a placeholder task for the sake of prototyping, until a real
authentication method is settled on.
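As a concrete sketch, a placeholder setting like this might be enough for prototyping (the property name `experimental.ai.openAiApiKey` is hypothetical, and whatever we ship here would be removed once real authentication lands):

```json
{
    "experimental.ai.openAiApiKey": "<paste your OpenAI API key here>"
}
```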
### 🚶 Walk
* [ ] Actually do proper authentication.
* This might be through a Github device flow, or a DevID login.
* Remove the support for just pasting the API key in `settings.json` at this point.
## After Build To-dos
> **Warning**: TODO! Almost everything here is just "enable extensions". That might deserve a separate spec.
### 🐣 Crawl
* [ ]
### 🚶 Walk
* [ ]
### 🏃‍♂️ Run
* [ ]
### 🚀 Sprint
* [ ]
## Conclusion
### Future Considerations
#### Shell-driven AI
This document focuses mainly on Terminal-side AI features. We are not precluding
the possibility that an individual shell may want to implement AI-driven
suggestions as well. Consider PowerShell - they may want to deliver AI-powered
suggestions as a completion handler. We will want to provide ways of helping to
promote their experience, rather than focus on a single implementation.
The best way for us to help elevate their experience would be through the
[Suggestions UI] and [shell-driven autocompletion]. This will allow us to
promote their results to a first-class UI control. This is a place where we can
work together better, rather than trying to pick one singular design in this
space and discarding the others.
Similarly, [Copilot CLI] could deliver their results as [shell-driven
autocompletion], to further elevate the experience in the terminal.
## Resources
### Footnotes
<a name="footnote-1"></a> [1]:
[Fig]: https://fig.io/user-manual/ai
[Warp]: https://docs.warp.dev/features/entry/ai-command-search
[Copilot CLI]: https://githubnext.com/projects/copilot-cli/
[Terminal North Star]: ../Terminal-North-Star.md
[Tasks]: ../Tasks.md
[Shell Integration]: ../Shell-Integration-Marks.md
[Suggestions UI]: ../Suggestions-UI.md
[Extensions]: ../Suggestions-UI.md
[Implicit Suggestions]: ./Implicit-Suggestions.md
[Prompting]: ./Prompting.md
[Explain that]: ./Explain-that.md
<!-- TODO! -->
[shell-driven autocompletion]: ../Terminal-North-Star.md#Shell_autocompletion
[Dynamic Dependencies for Main packages]: TODO!

View File

@@ -0,0 +1,320 @@
---
author: Mike Griese
created on: 2023-01-26
last updated: 2023-01-27
issue id: n/a
---
# Windows Terminal Copilot | Prompting
## Abstract
GitHub Copilot is a fairly revolutionary tool that offers complex predictions
for code from the context of the file you're working on and some simple
comments. We envision a scenario where this AI model can be integrated directly
within the Terminal application. This would enable users to type a natural
language description of what they're hoping to do, and receive suggested
commands to accomplish that task. This has the potential to remove the need for
commandline users to memorize long sets of esoteric flags and options for
commands. Instead, they can simply describe what they want done, and _do it_.
This is one of the many scenarios being considered under the umbrella of "AI in the Terminal". For the other scenarios, see [Overview].
## Background
### Inspiration
GitHub's own Copilot service was what sparked the initial interest in this area.
This quickly led to the thought "If it can do this for code, can it work for
command lines too?".
This likely started a cascade of similar implementations across the command-line
ecosystem. Both [Fig] and [Warp] have produced similar compelling user
experiences already, powered by AI. GitHub Next is also working on a similar
natural language-to-command model with [Copilot CLI].
This seems to be one of the scenarios that can generate the most value quickly
with existing AI models, which is why it's generated so much interest.
### User Stories
The following plan was made with consideration for what's possible _before Build 2023_, alongside what we want to do _in the fullness of time_.
#### By Build
Story | Size | Description
--|-----------|--
A | ✅ Done | The user can "disable" the extension (by unbinding the action)
A | 🐣 Crawl | The user can use an action to open a dedicated "AI Palette" for prompt-driven AI suggestions.
A | 🐣 Crawl | Suggested results appear as text in the Terminal Control, before the user accepts the command
A | 🐣 Crawl | The AI palette can use a manual API key in the settings to enable OpenAI access
A | 🚶 Walk | The AI Palette uses an official authentication method (GitHub login, DevID, etc.)
A | 🚶 Walk | The AI Palette remembers previous queries, for quick recollection and modification.
A | 🚶 Walk | The AI Palette informs the user if they're not authenticated to use the extension
#### After Build
Story | Size | Description
--|-----------|--
A | 🚶 Walk | The AI palette is delivered as an extension to the Terminal
A | 🏃‍♂️ Run | The AI Palette can be moved, resized while hovering
A | 🏃‍♂️ Run | The AI Palette can be docked from a hovering control to a Pane
### Elevator Pitch
It's Copilot. For the command line.
## Business Justification
It will delight developers.
## Scenario Details
### UI/UX Design
![A VERY rough mockup of what this UI might look like](./img/Copilot-in-cmdpal.png)
> **Warning**: TODO! Get mocks from Rodney
### Implementation Details
We'll add a new Control to the Terminal, which we'll dub the `AiPalette`. This
will be forked from the `CommandPalette` code initially, but not built directly
into it. This `AiPalette` will have a text box, and should be capable of
"previewing" actions, in the same way that the Command Palette is. The only
action it should need to preview is `sendInput` (which has a prototype
implementation linked in [#12861]).
We'll add a new action to invoke this `AiPalette`, which we'll temporarily call
`experimental.ai.prompt`. This will work vaguely like the `commandPalette`
action.
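For illustration, binding the action might look something like this in `settings.json` (we'd ship it unbound by default; the key chord and the final action name are placeholders):

```json
{
    "actions": [
        { "command": "experimental.ai.prompt", "keys": "ctrl+shift+i" }
    ]
}
```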
Considering the UX pattern for the OpenAI models is largely conversational, it
will be helpful to users to have a history of the requests they've made, and the
results the model returned, in the UI somewhere. We can store these previous
commands and results in an array in `state.json`. This would work similarly to
the way the Command Palette's Commandline mode works currently. We'll need to
make a small modification to store an array of `{prompt, result}` objects, but
that should be fairly trivial.
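A sketch of what that `state.json` blob could look like (the property name is hypothetical; the stored fields are just the `{prompt, result}` pairs described above):

```json
{
    "aiPromptHistory": [
        {
            "prompt": "get the process using the most CPU",
            "result": "Get-Process | Sort-Object CPU -Desc | Select-Object ID, Name, CPU -First 1"
        }
    ]
}
```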
#### Authentication
<sup>_By Build 2023_</sup>
We don't know if this will be powered by Github Copilot, or some other
authentication method.
While we sort that out, we'll need to make engineering progress regardless. To
facilitate that, we should just add temporary support for a user to paste an
OpenAI API key in the `settings.json`. This should be good enough to get us
unblocked and making progress with at least one AI model, while we sort out the
specifics of authentication and the backend.
> **Warning**: TODO! Figure out what the official plan here will be, and do that.
#### `HoverPane`
<sup>_By Build 2023_</sup>
After the initial implementation of the `AiPalette`, we'll want to refactor the code slightly to enable arbitrary content to float above the Terminal. This would provide a consistent UI experience for transient content.
This would be something like a `HoverPane` control, which accepts a
`FrameworkElement` as the `Content` property. We'd extract out the actual list
view, text box, etc. of the `AiPalette` and instead invoke a new `HoverPane`
with that `AiPalette` as the content.
We want to do this _before_ Build. This same `HoverPane` could be used to
support **[Explain that]**. That's another scenario we'd like demoed by Build,
so being able to re-use the same UI base would make sense.
This would also make it easy to swap out the `Content` of the `HoverPane` to
replace it with whatever we need to support authentication flows.
> **Warning**: TODO! Refine this idea after we get mocks from design.
#### Pinning a `HoverPane` to a first-class `Pane`
<sup>_After Build 2023_</sup>
This will require us to support non-terminal content in `Pane`s ([#977]). `Pane`
as a class is very special-cased for hosting a `TermControl`, and hosting other
types of `FrameworkElement`s is something that will take some refactoring to
enable. For more details, refer to the separate spec detailing [non-terminal panes](https://github.com/microsoft/terminal/blob/main/doc/specs/drafts/%23997%20Non-Terminal-Panes.md).
Pinning the `HoverPane` would create a new pane, split from the currently active pane.
> **Warning**: TODO! Refine this idea after we get mocks from design.
#### Moving and resizing the `HoverPane`
<sup>_After Build 2023_</sup>
> **Warning**: TODO! after build.
#### Send feedback on the quality of the suggestions
<sup>_After Build 2023_</sup>
> **Warning**: TODO! after build.
## Tenets
<table>
<tr><td><strong>Compatibility</strong></td><td>
We don't expect any regressions while implementing these new features.
</td></tr>
<tr><td><strong>Accessibility</strong></td><td>
Largely, we expect the `AiPalette` to follow the same UIA patterns laid out by the Command Palette before it.
</td></tr>
<tr><td><strong>Localization</strong></td><td>
This feature might end up making the Terminal _more_ accessible to users whose
primary language is not English. The commandline is a fairly ASCII-centric
experience in general. It might be a huge game changer for users from
less-represented languages to be able to describe in their native language what
they want to do. They wouldn't need to parse search results from the web that
might not be in their native language. The AI model would do that for them.
</td></tr>
</table>
[comment]: # If there are any other potential issues, make sure to include them here.
## To-do list
## Before Build To-dos
### 🐣 Crawl
* [ ] Introduce a new `AiPalette` control, initially forked from the
`CommandPalette` code
* [ ] TODO! We need design comps to really know what to build here.
* [ ] For the initial commit, just have it accept a prompt and generate a fake
/ placeholder "response"
* [ ] Add a placeholder `experimental.ai.prompt` `ShortcutAction` to open that
`AiPalette`. Bind to no key by default.
* [ ] Make `sendInput` actions previewable, so the text will appear in the
`TermControl` as a _preview_.
* [ ] Hook up an AI model to it. Doesn't have to be the real, final one. Just
_an_ AI model.
* [ ]
### 🚶 Walk
* [ ] Stash the queries (and responses?) in `state.json`, so that we can bring
them back immediately (like the Commandline Mode of the CommandPalette)
* [ ] Move the content of the `AiPalette` into one control, that's hosted by a
`HoverPane` control
* this would be to allow **[Explain that]** to reuse the `HoverPane`.
* This can easily be moved to post-Build if we don't intend to demo [Explain
that] at Build.
* [ ] If the user isn't authenticated when running the `experimental.ai.prompt`
action, open the `HoverPane` with a message telling them how to (or a control
enabling them to)
* [ ] If the user **is** authenticated when running the `experimental.ai.prompt`
action, **BUT** not authorized to use that model/billing/whatever, open the
`HoverPane` with a message explaining that / telling them how to.
* Thought process: Copilot is another fee on top of your GH subscription. You
might be able to log in with your GH account, but not be allowed to use
copilot.
* [ ]
## After Build To-dos
### 🚶 Walk
* [ ] Extensions can add custom `ShortcutAction`s to the Terminal
* [ ] Change the string for this action to something more final than `experimental.ai.prompt`
* [ ] Extensions can add UI elements to the Terminal window
* [ ] Extensions can request the Terminal open a `HoverPane` and specify the
content for that pane.
* [ ] Extensions can add `Page`s to the Terminal settings UI for their own settings
* [ ] The `AiPalette` control is moved out of the Terminal proper and into a
separate app package
* [ ] ...
### 🏃‍♂️ Run
> The AI Palette can be moved, resized while hovering
> The AI Palette can be docked from a hovering control to a Pane
* [ ] Enable the `HoverPane` control to be resizable with the mouse
* [ ] Enable the `HoverPane` control to be draggable with the mouse
* i.e., instead of being strictly docked to the left of the screen, it's got a
little grabby icon / titlebar that can be used to reposition it.
* [ ] Enable `Pane`s to host non-terminal content
* [ ] Add a button to `HoverPane` to cause it to be docked to the currently active pane
* this will open a new `auto` direction split, taking up whatever percent of
the parent is necessary to achieve the same size as the `HoverPane` had
before(?)
* [ ] ...
### 🚀 Sprint
* [ ] ...
## Conclusion
### Rejected ideas
**Option 1**: Use the [Suggestions UI] for this.
* **Pros**:
* the UI appears right at the place the user is typing, keeping them exactly in
the context they started in.
* Suggestion `source`s would be easy/cheap to add as an extension, with
relatively few Terminal changes (especially compared with adding
extension-specific actions)
* **Cons**:
* The model of prompting, then navigating results that are delivered
asynchronously, is fundamentally not compatible with the way the suggestions
UI works.
**Option 2**: Create a new Command Palette Mode for this. This was explored in greater detail
over in the [Extensions] doc.
* **Pros**: "cheap", we can just reuse the Command Palette for this. _Perfect_, right?
* **Cons**:
* Probably more expensive than it's worth to combine the functionality with
the Command Palette. Best to just start fresh with a new control that
doesn't need to carry the baggage of the other Command Palette modes.
* When this does end up being delivered as a separate package (extension), the
fullness of what we want to customize about this UX would be best served by
another UI element anyways. It'll be VERY expensive to instead expose knobs
for extensions to fully customize the existing palette.
### Future Considerations
The flexibility of the `HoverPane` to display arbitrary content could be
exceptionally useful in the future. All sorts of UI elements that we've had no
place to put before could be placed into `HoverPane`s. [#644], [#1595], and
[#8647] are all extension scenarios that would be able to leverage this.
## Resources
### Footnotes
<a name="footnote-1"></a> [1]:
[Fig]: https://fig.io/user-manual/ai
[Warp]: https://docs.warp.dev/features/entry/ai-command-search
[Copilot CLI]: https://githubnext.com/projects/copilot-cli/
[Terminal North Star]: ../Terminal-North-Star.md
[Tasks]: ../Tasks.md
[Shell Integration]: ../Shell-Integration-Marks.md
[Suggestions UI]: ../Suggestions-UI.md
[Extensions]: ../Suggestions-UI.md
[Overview]: ./Overview.md
[Implicit Suggestions]: ./Implicit-Suggestions.md
[Prompting]: ./Prompting.md
[Explain that]: ./Explain-that.md
<!-- TODO! -->
[shell-driven autocompletion]: ../Terminal-North-Star.md#Shell_autocompletion
[#977]: https://github.com/microsoft/terminal/issues/997
[#12861]: https://github.com/microsoft/terminal/issues/12861
[#4000]: https://github.com/microsoft/terminal/issues/4000
[#644]: https://github.com/microsoft/terminal/issues/644
[#1595]: https://github.com/microsoft/terminal/issues/1595
[#8647]: https://github.com/microsoft/terminal/issues/8647

View File

@@ -0,0 +1,410 @@
---
author: Mike Griese @zadjii-msft
created on: 2023-02-13
last updated: 2023-02-23
issue id: n/a
---
# Terminal AI Extensions
## Abstract
This is a quick and dirty description of how the Terminal could implement our AI
experiences using an extensible backend. This will allow the Terminal to iterate
on AI-powered experiences, without any dedicated AI code in the Terminal itself.
This enables multiple different AI models to be plugged in to the Terminal, each
hosted in their own app package. The Terminal will communicate with these
packages over a well-defined [App Service Connection].
- [Terminal AI Extensions](#terminal-ai-extensions)
- [Abstract](#abstract)
- [Solution Details](#solution-details)
- [Declaring the Extension \& Host](#declaring-the-extension--host)
- [Picking a backend](#picking-a-backend)
- [Establishing the connection](#establishing-the-connection)
- [Connection "API"](#connection-api)
- [Note on responses](#note-on-responses)
- [Prompting](#prompting)
- [Explain this](#explain-this)
- [User Experience and Design](#user-experience-and-design)
- [Potential Issues](#potential-issues)
- [Tenents](#tenents)
- [Before spec is done TODO!s](#before-spec-is-done-todos)
- [Future considerations](#future-considerations)
- [Resources](#resources)
- [Footnotes](#footnotes)
## Solution Details
Below is a very technical description of how we will put this support together.
For the remainder of this doc, we'll be using a hypothetical "GitHub Copilot for
the Terminal" extension for our examples. We'll cover first how the apps will
need to be manifested so they can communicate with one another. Then we'll
briefly touch on how the Terminal can use this model to pick from different
extensions to choose its AI model. Lastly, we'll describe the API the Terminal
will use to communicate with these extensions.
![AI providers plus PowerShell](./img/ai-providers-plus-powershell.png)
### Declaring the Extension & Host
Terminal becomes an app _service client_. It is also an app _extension host_. It
will register as the host for `com.microsoft.terminal.aiHost` extensions in the
following way:
```xml
<uap3:Extension Category="windows.appExtensionHost">
<uap3:AppExtensionHost>
<uap3:Name>com.microsoft.terminal.aiHost</uap3:Name>
</uap3:AppExtensionHost>
</uap3:Extension>
```
The GitHub extension app registers as a `com.microsoft.terminal.aiHost`
extension. It also declares a `windows.appService`, which it will use to service
the extension. In the blob for the `aiHost` extension, the app should add a
property indicating the name of the AppService that should be used for the
extension. For example:
```xml
<!-- <Package.Applications.Application.Extensions>... -->
<uap:Extension Category="windows.appService" EntryPoint="CopilotService.AiProviderTask">
<uap3:AppService Name="com.github.copilot.terminalAiProvider" />
</uap:Extension>
<uap3:Extension Category="windows.appExtension">
<uap3:AppExtension Name="com.microsoft.terminal.aiHost"
Id="GitHubCopilot"
DisplayName="GitHub Copilot"
Description="whatever"
PublicFolder="Public">
<uap3:Properties>
<ProviderName>com.github.copilot.terminalAiProvider</ProviderName>
</uap3:Properties>
</uap3:AppExtension>
</uap3:Extension>
```
Extension authors should then refer to [this
example](https://github.com/microsoft/Windows-universal-samples/blob/main/Samples/AppServices/cppwinrt/RandomNumberService/RandomNumberGeneratorTask.cpp)
for how they might implement the `Task` to handle these incoming requests.
### Picking a backend
Terminal will be able to enumerate the apps that implement the `aiHost`
extension. We'll use that as a list for a combobox in the settings to give users
a choice of which backend to choose (or to disable the experience entirely).
When we enumerate those packages, we'll get the `ProviderName` property out of
their manifest, and stash that, so we know how to build the app service
connection to that app. The code conhost & Terminal use today for defterm
already does something similar to get a CLSID out of the manifest.
If the user chooses to set the chosen provider to "None", then when they invoke
one of the AI experiences, we'll simply inform them that no AI provider is set
up, and provide a deep link to the Settings UI to point them at where to pick
one.
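The selection logic described above can be sketched as follows. This is an illustrative TypeScript sketch, not actual Terminal code; the `AiProvider` shape and both function names are hypothetical stand-ins for whatever we stash at enumeration time:

```ts
// Hypothetical shape of the data we'd pull from each enumerated
// `com.microsoft.terminal.aiHost` extension's manifest.
interface AiProvider {
    displayName: string;       // e.g. "GitHub Copilot"
    packageFamilyName: string; // used to open the AppServiceConnection
    providerName: string;      // the <ProviderName> property from the manifest blob
}

// Build the list of choices for the settings combobox. "None" is always
// present, so the user can disable the experience entirely.
function settingsChoices(providers: AiProvider[]): string[] {
    return ["None", ...providers.map(p => p.displayName)];
}

// Resolve the user's choice back to the connection info we stashed at
// enumeration time. Returns undefined for "None" (or a provider that has
// since been uninstalled); in that case the AI experiences show a deep
// link to the Settings UI instead.
function resolveProvider(
    choice: string,
    providers: AiProvider[]
): AiProvider | undefined {
    if (choice === "None") {
        return undefined;
    }
    return providers.find(p => p.displayName === choice);
}
```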
### Establishing the connection
_[Sample code](https://github.com/microsoft/Windows-universal-samples/blob/ad9a0c4def222aaf044e51f8ee0939911cb58471/Samples/AppServices/cppwinrt/AppServicesClient/KeepConnectionOpenScenario.cpp#L52-L57)_
When the Terminal needs to invoke the AI provider, it will do so in the following fashion:
```c++
// Set up a new app service connection
connection = AppServiceConnection();
connection.AppServiceName(L"com.github.copilot.terminalAiProvider");
connection.PackageFamilyName(L"Microsoft.GithubWhatever.YouGet.ThePoint_8wekyb3d8bbwe");
connection.ServiceClosed({ get_weak(), &KeepConnectionOpenScenario::Connection_ServiceClosed });
AppServiceConnectionStatus status = co_await connection.OpenAsync();
```
This will create an `AppServiceConnection` that the Terminal can use to pass
`ValueSet` messages to the extension provider. These messages aren't great for
any sort of real-time communication, but are performant enough for "the user
clicked a button, now they want a response back".
Once we've got a connection established, we'll need to verify that the app is
authenticated before beginning to send queries to that connection. TODO! how?
### Connection "API"
> [!IMPORTANT]
>
> TODO!
>
> This section was authored at the start of 2023. We since moved from
> just "a list of commands" to a more chat-like experience. This section is
> super out of date.
>
Terminal will fire off `ValueSet`s to the provider to perform various tasks we
need[^1]. Depending on what's needed, we'll send different requests, with
different expected payloads.
Terminal will only communicate messages on predefined "verbs". This will allow
the Terminal to build its UI and experience regardless of how the backend has
decided to implement its own API. So long as the backend AI provider implements
this API interface, the Terminal will be able to build a consistent UI
experience.
Terminal will keep its manipulation of the input request to a minimum. It is up
to each model provider to craft how it wants to handle each scenario. Different
models might have different APIs for requests and responses. Different apps may
want to customize the context that they provide with the prompt the user typed,
to give more relevant responses. The Terminal tries to not declare how each
extension should interface with a particular AI backend. Instead, the Terminal
only provides a general description of what it would like to happen.
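As a sketch of what this verb-based contract buys us on the Terminal side: the UI only dispatches on the predefined verbs, never on anything backend-specific. The verb names come from the sections below; everything else here (types, handler bodies) is illustrative:

```ts
// The ValueSet payloads are modeled here as plain objects keyed by verb.
type AiResponse = {
    verb: string;
    result: number;
    message: string;
    [key: string]: unknown;
};

// The Terminal's UI only knows about these predefined verbs. Any backend
// that answers them correctly works, regardless of which model or API it
// talks to internally.
const handlers: Record<string, (r: AiResponse) => string> = {
    promptForCommands: r => `got ${(r.commands as string[]).length} suggested commandlines`,
    explainThis: r => `explanation: ${r.response as string}`,
};

function handleResponse(r: AiResponse): string {
    const handler = handlers[r.verb];
    if (!handler) {
        // Forward-compat: ignore verbs we don't know about.
        return `unknown verb '${r.verb}'`;
    }
    return handler(r);
}
```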
#### Note on responses
In each response below, there are `result` and `message` properties returned to
the Terminal. These allow the app to indicate some sort of error message to the
user. This will most likely be used for authentication errors and network
errors.
In those cases, the Terminal will be able to provide dedicated UI messages to
indicate the error. For example, in the case of an authentication failure, the
Terminal may provide a button to send a message to the service host so that it
can re-authenticate the user.
Or, perhaps, the user might be authenticated, but might not have a particular AI
experience enabled for their account. The Terminal could similarly provide a
button to prompt the user to remedy this. TODO! should we? Or is that the
responsibility of the extension?
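One hypothetical way the Terminal could map `result`/`message` pairs onto dedicated UI. The numeric codes here are placeholders that the real API would need to pin down:

```ts
// Placeholder result codes; the actual values would be defined by the API.
const AiResult = {
    Ok: 0,
    AuthenticationFailed: 1,
    NetworkError: 2,
    FeatureNotEnabled: 3,
} as const;

interface UiAction {
    text: string;          // message shown to the user
    buttonLabel?: string;  // optional remediation button
}

function uiForResult(result: number, message: string): UiAction {
    switch (result) {
        case AiResult.AuthenticationFailed:
            // e.g. a button that asks the service host to re-authenticate
            return { text: message, buttonLabel: "Sign in again" };
        case AiResult.FeatureNotEnabled:
            // authenticated, but the AI experience isn't enabled for this account
            return { text: message, buttonLabel: "Enable this feature" };
        case AiResult.Ok:
        case AiResult.NetworkError:
        default:
            return { text: message };
    }
}
```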
#### Prompting
<table>
<thead>
<td>Request</td>
<td>Response</td>
</thead>
<tr>
<td>
```ts
{
"verb": "promptForCommands",
"prompt": string,
"context": {}
}
```
</td>
<td>
```ts
{
"verb": "promptForCommands",
"result": number,
  "message": string,
"commands": string[],
}
```
</td>
</tr>
</table>
**Purpose**: The `prompt` is a natural-language description of a command to run.
The provider should take this and turn it into a list of `commands` that the
user could run. The `commands` should be commandlines that could be run
directly at the prompt, without any additional context accompanying them.
We could theoretically put command history in `context`, if that's not PII / if
we're allowed to. That might help refine results. For example, knowing if the
commandline should be a CMD/PowerShell/bash (or other \*nix-like shell) would
greatly refine results.
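A sketch of building such a request with the shell hint discussed above, and validating the response shape. The `shell` context key is a hypothetical name, not part of any settled API:

```ts
interface PromptRequest {
    verb: "promptForCommands";
    prompt: string;
    context: { shell?: string }; // hypothetical hint: "cmd" | "pwsh" | "bash" | ...
}

function buildPromptRequest(prompt: string, shell?: string): PromptRequest {
    return { verb: "promptForCommands", prompt, context: shell ? { shell } : {} };
}

// Extract only directly runnable commandline strings from a response;
// anything malformed or errored yields an empty list.
function validCommands(response: unknown): string[] {
    const r = response as { verb?: string; result?: number; commands?: unknown };
    if (r.verb !== "promptForCommands" || r.result !== 0 || !Array.isArray(r.commands)) {
        return [];
    }
    return r.commands.filter((c): c is string => typeof c === "string");
}
```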
#### Explain this
<table>
<thead>
<td>Request</td>
<td>Response</td>
</thead>
<tr>
<td>
```ts
{
"verb": "explainThis",
"prompt": string,
"context": {}
}
```
</td>
<td>
```ts
{
  "verb": "explainThis",
  "result": number,
  "message": string,
"response": string,
}
```
</td>
</tr>
</table>
**Purpose**: The `prompt` is a string of text in the user's terminal. They would
like more information on what it means.
We could theoretically put additional command history in `context`, if that's
not PII / if we're allowed to. Most specifically, I think it might be helpful to
give the entirety of the command that the prompt appeared in, if that's known to
us. Again, that might be PII.
This could be used in two contexts:
* A terminal-buffer initiated "what does this mean" scenario. This could be
something like:
* The user selected some text in the buffer and wants to know what it means
* A command exited with an error, and the Terminal provided a shortcut to
inquire what that particular error means.
* A UI-driven "I need some help" scenario. This is a more ChatGPT-like
experience. The user wants to know more about something, with more context
than just commands as responses.
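Both entry points above could funnel into the same request builder. A sketch; the `fullCommand` context key is a hypothetical name for the extra-history idea discussed above:

```ts
interface ExplainRequest {
    verb: "explainThis";
    prompt: string;
    context: { fullCommand?: string };
}

// Buffer-initiated: the user selected `selection`, and we may know the
// whole commandline it appeared in (subject to the PII questions above).
function explainSelection(selection: string, fullCommand?: string): ExplainRequest {
    return {
        verb: "explainThis",
        prompt: selection,
        context: fullCommand ? { fullCommand } : {},
    };
}

// UI-driven: a free-form, ChatGPT-like question. Same verb, no buffer context.
function explainQuestion(question: string): ExplainRequest {
    return { verb: "explainThis", prompt: question, context: {} };
}
```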
## User Experience and Design
![](./img/llm-providers-settings-000.png)
_programmer art mockup of settings page displaying list of available Terminal LLM providers_
Presumably each drill-in page would then have individual settings that the
Terminal can control for each provider. For example, controlling permissions
for what the plugin can or cannot do or access.
## Potential Issues
* [ ] TODO! Branding - how do we want to allow individual providers to specify
branding elements in the AI experiences? I'm thinking things like title text,
logos, etc.
* [ ] TODO! As noted above - how exactly should authentication/subscription
failures be handled?
* Do we need a dedicated "authenticate" verb on the API?
* [ ] TODO! Determine how much additional context can be sent to extensions.
* [ ] TODO! We need to also add a way for Terminal to securely store the allowed
permissions per-provider. For example, if we're even thinking about providing
profile/commandline/history/command context to the provider, the user needs to
be able to disable that on a per-provider basis.
* [ ] TODO! ...
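The per-provider permission storage might look something like the following sketch. The permission names and the in-memory store are illustrative; the real mechanism would live outside `settings.json`, as noted above:

```ts
// Permissions the user can grant per provider. The default is deny-all.
interface ProviderPermissions {
    selectedText: boolean;
    typedCommandline: boolean;
    commandHistory: boolean;
    environmentVariables: boolean;
}

const denyAll: ProviderPermissions = {
    selectedText: false,
    typedCommandline: false,
    commandHistory: false,
    environmentVariables: false,
};

// Keyed by package family name, stored outside settings.json so a
// provider can't grant itself permissions by editing our settings file.
const store = new Map<string, ProviderPermissions>();

function permissionsFor(packageFamilyName: string): ProviderPermissions {
    return store.get(packageFamilyName) ?? denyAll;
}

function grant(packageFamilyName: string, p: Partial<ProviderPermissions>): void {
    store.set(packageFamilyName, { ...permissionsFor(packageFamilyName), ...p });
}
```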
### Tenets
<table>
<tr><td><strong>Sustainability</strong></td><td>
[comment]: # What kind of impacts, if any, will this feature have on the environment?
It's not good, that's for sure.
* This [source] estimated a single ChatGPT query at 6.79 Wh.
* An iPhone 15 has a 12.98 Wh battery
  * So a single query is like 0.5 phone batteries of power.
* According to [the EIA], the US contributes 0.86 pounds of CO2 per kWh
* Napkin math: We've got 1M users with one query a day. (Obviously, it might be
more users with fewer queries, or fewer with more.)
* That's (6.79Wh * 1000000/day) = 6790000 Wh = 6790 kWh / day
* That's (6790kWh * 0.86 lb CO2 / kWh) = 5839.4 lbs CO2 / day
* = 2.64870729 metric tons CO2 / day
  * = 966.78 metric tons CO2 / year
Author note: I'd rather not build a product that adds measurable tons of CO2 a
day. Not sure how we can justify this until the power consumption of LLMs comes
down dramatically.
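The napkin math above, spelled out (the per-query and grid figures are the cited estimates, not measurements; the 1M-queries-per-day load is the same assumption as above):

```ts
const whPerQuery = 6.79;          // estimated Wh per ChatGPT query ([source])
const queriesPerDay = 1_000_000;  // assumed: 1M users, one query each per day
const lbCo2PerKwh = 0.86;         // US grid average ([the EIA])
const lbPerMetricTon = 2204.62;   // pounds per metric ton

const kwhPerDay = (whPerQuery * queriesPerDay) / 1000;   // 6790 kWh / day
const lbCo2PerDay = kwhPerDay * lbCo2PerKwh;             // 5839.4 lb CO2 / day
const tonsCo2PerDay = lbCo2PerDay / lbPerMetricTon;      // ~2.649 metric tons / day
const tonsCo2PerYear = tonsCo2PerDay * 365;              // ~967 metric tons / year
```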
</td></tr>
<tr><td><strong>Privacy</strong></td><td>
[comment]: # How will user data be handled? What data will be shared with extension providers?
Terminal will present users with a number of settings to control how much
context plugins are able to receive from the Terminal.
* Currently selected text
* Currently typed commandline
* Most recent (N) command(s)
* and errorlevels
* and output
* Environment variables
* profile commandline(?) (we may want to always provide the target exe, without args, as a bare min)
* Other panes too?
TODO! This list is incomplete; you can help by adding missing items
</td></tr>
<tr><td><strong>Accessibility</strong></td><td>
[comment]: # How will the proposed change impact accessibility for users of screen readers, assistive input devices, etc.
</td></tr>
<tr><td><strong>Security</strong></td><td>
[comment]: # How will the proposed change impact security?
Terminal will have per-provider settings that it controls OUTSIDE of
`settings.json`, which control the permissions for each individual plugin. This
will ensure that plugins cannot grant themselves additional permissions by
writing to the Terminal's settings file themselves.
</td></tr>
<tr><td><strong>Reliability</strong></td><td>
[comment]: # Will the proposed change improve reliability? If not, why make the change?
</td></tr>
<tr><td><strong>Compatibility</strong></td><td>
[comment]: # Will the proposed change break existing code/behaviors? If so, how, and is the breaking change "worth it"?
</td></tr>
<tr><td><strong>Performance, Power, and Efficiency</strong></td><td>
[comment]: # How will the proposed change impact performance, power, and efficiency?
</td></tr>
</table>
## Before spec is done TODO!s
* [ ] TODO! PowerShell folks would like to have the connection be two-way. Can we have extensions invoke experiences in the Terminal?
* [ ] TODO! add interface for Terminal to query what providers are available in each terminal extension
- I think we should do that within a single `uap3:AppExtension`, so that apps
can change the list of providers on the fly, without an update to the app
package
* [ ] ...
## Future considerations
* Maybe it'd be cool if profiles could specify a default LLM provider? So if you
opened the chat / whatever with that pane active, we'd default to that
provider, rather than the one that is otherwise selected as the default?
## Resources
* The [App Service Connection Sample](https://github.com/Microsoft/Windows-universal-samples/tree/main/Samples/AppServices) is basically mandatory reading for how this will work.
### Footnotes
[^1]: A `ValueSet` isn't exactly JSON, but it's close enough that this spec uses JSON-like notation for simplicity.
[App Service Connection]: https://learn.microsoft.com/en-us/windows/uwp/launch-resume/how-to-create-and-consume-an-app-service
[source]: https://medium.com/@zodhyatech/how-much-energy-does-chatgpt-consume-4cba1a7aef85
[the EIA]: https://www.eia.gov/tools/faqs/faq.php?id=74&t=11
