How Autocomplete Works

Timing

To display suggestions quickly without sending too many requests, we do the following:

Debouncing: If you are typing quickly, we won’t make a request on each keystroke. Instead, we wait until you pause.
Caching: If your cursor is in a position that we’ve already generated a completion for, this completion is reused. For example, if you backspace, we’ll be able to immediately show the suggestion you saw before.
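
The two techniques above can be sketched together as follows. This is a minimal illustration, not Continue's actual implementation: the names `DebouncedCompleter`, `Scheduler`, and `onKeystroke` are hypothetical, the model call is simplified to a synchronous function, and the cache is assumed to be keyed on the file path plus the text before the cursor.

```typescript
type Completion = string;

// Runs `fn` after `delayMs`. In an editor this could be
// (fn, ms) => { setTimeout(fn, ms); }. It is injected here so the
// sketch does not depend on a particular runtime.
type Scheduler = (fn: () => void, delayMs: number) => void;

class DebouncedCompleter {
  private latestId = 0;
  private cache = new Map<string, Completion>();
  private generate: (prefix: string) => Completion;
  private delayMs: number;
  private schedule: Scheduler;

  constructor(
    generate: (prefix: string) => Completion, // stand-in for the model request
    delayMs: number,                          // assumed debounce window
    schedule: Scheduler,
  ) {
    this.generate = generate;
    this.delayMs = delayMs;
    this.schedule = schedule;
  }

  onKeystroke(filepath: string, prefix: string, show: (c: Completion) => void): void {
    // Every keystroke gets a new id, which invalidates older pending requests.
    const id = ++this.latestId;
    const key = JSON.stringify([filepath, prefix]);

    // Caching: a cursor state seen before (for example after a backspace)
    // is answered immediately, with no request.
    const cached = this.cache.get(key);
    if (cached !== undefined) {
      show(cached);
      return;
    }

    // Debouncing: only the most recent keystroke is still "latest" when the
    // delay elapses, so earlier ones never reach the model.
    this.schedule(() => {
      if (id !== this.latestId) {
        return; // superseded by a newer keystroke
      }
      const completion = this.generate(prefix);
      this.cache.set(key, completion);
      show(completion);
    }, this.delayMs);
  }
}
```

With this shape, typing "f" and then "fo" within the delay window produces a single request for "fo", and returning to the "fo" state later is served from the cache.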

Context

Continue uses a number of retrieval methods to find relevant snippets from your codebase to include in the prompt.

Filtering

Language models aren’t perfect, but adjusting their output gets them much closer. We do extensive post-processing on responses before displaying a suggestion, including:

filtering out special tokens
stopping early when re-generating code
fixing indentation

We also occasionally discard a response entirely if it is bad, most often because of extreme repetition.
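
A rough sketch of this kind of post-processing is shown below. It is an illustration under stated assumptions rather than Continue's real pipeline: the token list, the function names, and the repetition threshold are all hypothetical, the "stop early" rule is a crude heuristic that cuts the completion where it starts repeating the next line already in the file, and indentation fixing is left out.

```typescript
// Assumed examples of special tokens; each model defines its own set.
const SPECIAL_TOKENS = ["<|endoftext|>", "<fim_prefix>", "<fim_suffix>", "<fim_middle>"];

// Remove any special tokens that leaked into the completion text.
function stripSpecialTokens(text: string): string {
  let out = text;
  for (const token of SPECIAL_TOKENS) {
    out = out.split(token).join("");
  }
  return out;
}

// Cut the completion at the first line that matches the first non-blank
// line after the cursor, since from there on the model is re-generating
// code that already exists.
function stopAtExistingCode(completion: string, suffix: string): string {
  const nextLine = suffix
    .split("\n")
    .map((line) => line.trim())
    .find((line) => line.length > 0);
  if (nextLine === undefined) {
    return completion;
  }
  const lines = completion.split("\n");
  const cut = lines.findIndex((line) => line.trim() === nextLine);
  return cut === -1 ? completion : lines.slice(0, cut).join("\n");
}

// True when the same non-blank line occurs more than `maxRepeats` times in a row.
function isExtremelyRepetitive(completion: string, maxRepeats: number = 3): boolean {
  let run = 0;
  let previous: string | undefined;
  for (const raw of completion.split("\n")) {
    const line = raw.trim();
    if (line.length === 0) {
      continue;
    }
    run = line === previous ? run + 1 : 1;
    previous = line;
    if (run > maxRepeats) {
      return true;
    }
  }
  return false;
}

// Returns the cleaned completion, or undefined when it should not be shown.
function postProcess(completion: string, suffix: string): string | undefined {
  const cleaned = stopAtExistingCode(stripSpecialTokens(completion), suffix);
  if (cleaned.trim().length === 0 || isExtremelyRepetitive(cleaned)) {
    return undefined;
  }
  return cleaned;
}
```

Returning `undefined` rather than an empty string keeps "nothing worth showing" distinct from a valid completion, so the caller can simply skip rendering.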

You can learn more about how it works in the Autocomplete deep dive.
