Local LLM Deployment

To create assistants that run entirely on your machine, you need to run a model locally. We recommend the OpenHermes-NeuralChat merged model, a 7-billion-parameter model that is ~6 GB on disk. We have tested Rubra with this model; you can use any other model at your own risk. Let us know if you'd like support for other models by opening a GitHub issue!

We leverage llamafile to distribute and run local LLMs.

Prerequisites

Make sure you meet the prerequisites before you start.

Setup

  1. Manually download the Llamafile for your OS from Hugging Face or run this command:

    curl -L -o rubra.llamafile https://huggingface.co/rubra-ai/rubra-llamafile/resolve/main/rubra.llamafile
  2. Give the file executable permissions:

    chmod +x rubra.llamafile
  3. Run the model:

    ./rubra.llamafile --ctx-size 16000
    note
    • (Optional) Increase or decrease the context window size with the --ctx-size flag. The default is 16000. A larger context window increases the model's memory usage but yields higher-quality responses. Those without a GPU and/or with limited RAM (e.g., 8 GB) should keep this value low.
    • GPU Support:
      • -ngl sets the number of layers offloaded to the GPU. The default is 35. You can adjust this value to offload more or fewer layers to the GPU. Add it to your command: ./rubra.llamafile --ctx-size 16000 -ngl 35
      • Apple Silicon on macOS
        • You need the Xcode Command Line Tools installed for llamafile to be able to bootstrap itself.
        • If you use zsh and have trouble running llamafile, try sh -c "./rubra.llamafile --ctx-size 16000". This works around a bug that was fixed in zsh 5.9+.
      • NVIDIA GPUs
      • AMD GPUs
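Since the note above recommends keeping --ctx-size low on machines with limited RAM, it can help to pick the value programmatically. The helper below is a rough sketch: the thresholds are illustrative and not from the llamafile documentation, so tune them for your hardware.

```shell
# suggest_ctx: given total RAM in GB, echo a conservative --ctx-size value.
# The cutoffs below are illustrative assumptions, not llamafile defaults
# (except 16000, which is the default used in the steps above).
suggest_ctx() {
    ram_gb=$1
    if [ "$ram_gb" -le 8 ]; then
        echo 4096      # limited RAM: keep the context window small
    elif [ "$ram_gb" -le 16 ]; then
        echo 8192
    else
        echo 16000     # default from the run command above
    fi
}
```

You could then launch the model with, for example, `./rubra.llamafile --ctx-size "$(suggest_ctx 8)"`.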

Testing

Congrats! You have a model running on your machine. To test it out, you can run the following command:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a friendly assistant"
      },
      {
        "role": "user",
        "content": "Hello world!"
      }
    ]
  }'
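If the server is up, the call above returns an OpenAI-style chat completion as JSON. To pull out just the assistant's reply, you can pipe the output through a short Python one-liner. The sample response below is illustrative; in practice you would pipe the curl output directly.

```shell
# Extract the assistant's reply from an OpenAI-style chat completion response.
# The sample JSON here stands in for real output from the curl command above.
response='{"choices":[{"message":{"role":"assistant","content":"Hello! How can I help?"}}]}'
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```

With a live server, replace the `echo` with `curl -s http://localhost:1234/v1/chat/completions ...` and keep the same Python filter.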