If you haven’t already installed Continue, you can do that here for VS Code or here for JetBrains. For more general information on customizing Continue, read our customization docs.
Below we share some of the easiest ways to get up and running, depending on your use-case.
Ollama
Ollama is the fastest way to get up and running with local language models. We recommend trying Llama 3.1 8b, which is impressive for its size and will perform well on most hardware.
- Download Ollama here (it should walk you through the rest of these steps)
- Open a terminal and run
ollama run llama3.1:8b
- Change your Continue config file like this:
config.yaml
models:
  - name: Llama 3.1 8b
    provider: ollama
    model: llama3.1-8b
config.json
{
"models": [
{
"title": "Llama 3.1 8b",
"provider": "ollama",
"model": "llama3.1-8b"
}
]
}
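Once the model is pulled, you can sanity-check the local server outside of Continue. Below is a minimal sketch against Ollama's default REST endpoint at `http://localhost:11434` (the `/api/generate` route and payload shape follow Ollama's API; the prompt is just an example):

```python
# Smoke test for a local Ollama server (assumes the default port 11434).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="llama3.1:8b"):
    # "stream": False asks Ollama for one complete JSON response
    # instead of a stream of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3.1:8b"):
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires `ollama run llama3.1:8b` (or `ollama serve`) to be running:
# print(generate("Say hello in one word."))
```

If this returns a response, Continue's ollama provider should work with the same model tag.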
Groq
Check if your chosen model is still supported by referring to the model documentation. If a model has been deprecated, you may encounter a 404 error when attempting to use it.
Groq provides the fastest available inference for open-source language models, including the entire Llama 3.1 family.
- Obtain an API key here
- Update your Continue config file like this:
config.yaml
models:
  - name: Llama 3.3 70b Versatile
    provider: groq
    model: llama-3.3-70b-versatile
    apiKey: <YOUR_GROQ_API_KEY>
config.json
{
"models": [
{
"title": "Llama 3.3 70b Versatile",
"provider": "groq",
"model": "llama-3.3-70b-versatile",
"apiKey": "<YOUR_GROQ_API_KEY>"
}
]
}
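To verify your key works before wiring it into Continue, you can call Groq's OpenAI-compatible chat completions endpoint directly. A minimal standard-library sketch (the request format follows the OpenAI chat API; the key shown is a placeholder):

```python
# Build an OpenAI-style chat completion request for Groq's endpoint.
import json
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(api_key, model="llama-3.3-70b-versatile"):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "Hello!"}],
    }
    return urllib.request.Request(
        GROQ_URL, data=json.dumps(body).encode("utf-8"), headers=headers
    )

# With a real key:
# with urllib.request.urlopen(build_request("<YOUR_GROQ_API_KEY>")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

A successful response here means the same `apiKey` and `model` values will work in your Continue config.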
Together AI
Together AI provides fast and reliable inference for open-source models; even the 405B model runs at good speed.
- Create an account here
- Copy your API key that appears on the welcome screen
- Update your Continue config file like this:
config.yaml
models:
  - name: Llama 3.1 405b
    provider: together
    model: llama3.1-405b
    apiKey: <YOUR_TOGETHER_API_KEY>
config.json
{
"models": [
{
"title": "Llama 3.1 405b",
"provider": "together",
"model": "llama3.1-405b",
"apiKey": "<YOUR_TOGETHER_API_KEY>"
}
]
}
Replicate
Replicate makes it easy to host and run open-source AI with an API.
- Get your Replicate API key here
- Change your Continue config file like this:
config.yaml
models:
  - name: Llama 3.1 405b
    provider: replicate
    model: llama3.1-405b
    apiKey: <YOUR_REPLICATE_API_KEY>
config.json
{
"models": [
{
"title": "Llama 3.1 405b",
"provider": "replicate",
"model": "llama3.1-405b",
"apiKey": "<YOUR_REPLICATE_API_KEY>"
}
]
}
SambaNova
SambaNova Cloud provides world-record serving speeds for Llama 3.1 70B and 405B.
- Create an account here
- Copy your API key
- Update your Continue config file like this:
config.yaml
models:
  - name: SambaNova Llama 3.1 405B
    provider: sambanova
    model: llama3.1-405b
    apiKey: <YOUR_SAMBA_API_KEY>
config.json
{
"models": [
{
"title": "SambaNova Llama 3.1 405B",
"provider": "sambanova",
"model": "llama3.1-405b",
"apiKey": "<YOUR_SAMBA_API_KEY>"
}
]
}
Cerebras Inference
Cerebras Inference uses specialized silicon to provide fast inference for Llama 3.1 8B and 70B.
- Create an account in the portal here.
- Create and copy the API key for use in Continue.
- Update your Continue config file:
config.yaml
models:
  - name: Cerebras Llama 3.1 70B
    provider: cerebras
    model: llama3.1-70b
    apiKey: <YOUR_CEREBRAS_API_KEY>
config.json
{
"models": [
{
"title": "Cerebras Llama 3.1 70B",
"provider": "cerebras",
"model": "llama3.1-70b",
"apiKey": "<YOUR_CEREBRAS_API_KEY>"
}
]
}
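You are not limited to a single provider: the `models` list in your Continue config accepts multiple entries, and you can switch between them from the model dropdown. A sketch combining two of the configs above (same keys and values as in the individual snippets):

```yaml
models:
  - name: Llama 3.1 8b
    provider: ollama
    model: llama3.1-8b
  - name: Llama 3.3 70b Versatile
    provider: groq
    model: llama-3.3-70b-versatile
    apiKey: <YOUR_GROQ_API_KEY>
```

This is handy for keeping a fast local model for everyday use alongside a larger hosted model for harder tasks.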