# Deploying LoRA models
This guide will walk you through uploading and deploying your own fine-tuned LoRA models.
## Installing firectl

The `firectl` command-line interface (CLI) will be used to manage your LLM models.
**macOS (Apple Silicon):**

```shell
curl https://storage.googleapis.com/fireworks-public/firectl/stable/darwin-arm64.gz -o firectl.gz
gzip -d firectl.gz && chmod a+x firectl
sudo mv firectl /usr/local/bin/firectl
sudo chown root: /usr/local/bin/firectl
```

**macOS (Intel):**

```shell
curl https://storage.googleapis.com/fireworks-public/firectl/stable/darwin-amd64.gz -o firectl.gz
gzip -d firectl.gz && chmod a+x firectl
sudo mv firectl /usr/local/bin/firectl
sudo chown root: /usr/local/bin/firectl
```

**Linux:**

```shell
wget -O firectl.gz https://storage.googleapis.com/fireworks-public/firectl/stable/linux-amd64.gz
gunzip firectl.gz
sudo install -o root -g root -m 0755 firectl /usr/local/bin/firectl
```
## Signing in

Run the following command to sign into Fireworks:

```shell
firectl signin
```

Confirm that you have successfully signed in by listing your account:

```shell
firectl list accounts
```

You should see your account ID.
## Uploading a fine-tuned model

Make sure to review the requirements for a fine-tuned model. Sample configs for supported models are available here.

To upload a fine-tuned model located at `/tmp/falcon-7b-addon/`, run:

```shell
firectl create model my-model /tmp/falcon-7b-addon/
```

Once uploaded, you can see your model with:

```shell
firectl list models
```
## Deploying your model

To deploy the model for inference, run:

```shell
firectl deploy my-model
```
## Testing your model

Once your model is deployed, you can query it on the model page.

- Visit the list of your models.
- Click on the model you deployed.
- Enter your text prompt and click "Generate Completion".

You should see your model's response streamed below.
## Using the API

You can also query the model directly using the `/v1/completions` API:

```shell
curl \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"model": "accounts/<ACCOUNT_ID>/models/my-model", "prompt": "hello, the sky is"}' \
  https://api.fireworks.ai/inference/v1/completions
```
Or with the Python client, using the same fully qualified model name:

```python
import fireworks.client

fireworks.client.configure(
    api_base="https://api.fireworks.ai/inference",
    api_key="<API_KEY>",
)
fireworks.client.Completion.create(
    model="accounts/<ACCOUNT_ID>/models/my-model",
    prompt="Say this is a test",
    max_tokens=7,
    temperature=0,
)
```
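If you prefer not to use the client library, the same call can be made with plain HTTP from any language. Below is a minimal sketch using Python's `requests` package; the helper names (`build_completion_request`, `complete`) and the 30-second timeout are illustrative choices, not part of the Fireworks API:

```python
import requests

API_BASE = "https://api.fireworks.ai/inference"

def build_completion_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for a /v1/completions call."""
    return {
        "url": f"{API_BASE}/v1/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"model": model, "prompt": prompt},
    }

def complete(model: str, prompt: str, api_key: str) -> dict:
    """POST the completion request and return the parsed JSON response."""
    req = build_completion_request(model, prompt, api_key)
    resp = requests.post(
        req["url"], headers=req["headers"], json=req["json"], timeout=30
    )
    resp.raise_for_status()
    return resp.json()
```

This mirrors the curl example above: a bearer-token header, a JSON body with the fully qualified model name, and a POST to the completions endpoint.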
## Cleaning up

Now that you are finished with the guide, you can undeploy the model to avoid accruing charges on your account:

```shell
firectl undeploy my-model
```

You can also delete the model from your account:

```shell
firectl delete model my-model
```
## Deployment limits

Non-enterprise accounts are limited to a maximum of 100 deployed models.