Generative AI can be an incredibly powerful tool with a wide range of applications: taking notes, drafting documents, and providing customer support, to name a few.
But today’s commercial offerings from Google or OpenAI come with a major risk: privacy. To use their services, you must send your data to their AI – meaning that the privacy of your data is only as good as their security. Many services also use your data directly for training future AI systems, so you never know where that info may pop up in the future.
However, there are other options. There is a vibrant open-source community developing generative AI, and it offers a number of great off-the-shelf systems that perform at ChatGPT levels. One of the easiest to use is llamafile from Mozilla (the company behind the Firefox web browser). Llamafile packs commercial-grade generative AI into a single file that works seamlessly on every major operating system, no fancy hardware needed. Let’s get started!
Downloading the llamafile
First, head over to the llamafile GitHub page and scroll down to the table titled “Other example llamafiles”.
You’ll see there are a number of excellent open-source models to choose from. I’d recommend starting with LLaVA 1.5, which is a great combination of “small” (a few GB is small in the gen AI world), high-performing, and multimodal, meaning it can take both text and images as input. Click on the appropriate link in the “llamafile” column of the table, and save the file to your computer.
One-step system configuration for llamafile
Once the file is finished downloading, the next step depends on your operating system.
If you’re using Windows, open the folder with the downloaded file, right-click on the file, and rename it so that it ends in “.exe”.
(If you’re using Mac or Linux, please see the official quickstart section for next steps.)
Then, double-click the newly renamed “.exe” file. Your computer will likely throw a warning; that’s normal when running an unrecognized .exe file for the first time. Click “More info”, then the “Run anyway” button. You will likely need some level of Administrator rights on your computer to do this.
Generative AI on your own computer
A command window will pop up, showing the details of the system running in the background. You can ignore this.
A few seconds later, a tab will also pop up in your browser, showing the web page for your new Generative AI assistant. The title says “llama.cpp”; this is the name of the engine running your model. There is also a lot of other technical material on the page that lets you tune your assistant if you choose. Ignore that for now, and scroll to the box at the bottom with the note “Say something…” (circled in red in my screenshot). This is where you’ll enter your prompt.
After entering your prompt, click the “Send” button just below the box, and that will start your chat session. If you want to interrupt the system, click “Stop”. If you want to clear the chat memory and start fresh, just click “Reset”.
I mentioned above that LLaVA is a multimodal model. If you have a picture you want to feed it, use the “upload image” button, then select an image. It will likely take a little while to process the image (my picture from the screenshot below took a few minutes), but it does a nice job of responding.
To end the session, just close the tab and the command window.
What to do next with your own Generative AI
Congratulations! You now have your own Generative AI assistant running on your own computer! You can use it just like you would ChatGPT, Microsoft Copilot, or any other chat-based gen AI system, but this time without sharing any private information with a third party. Llamafile can also be connected to other applications (like VS Code) via APIs; information on that is available in the llamafile documentation.
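If you want to try the API route, here’s a minimal Python sketch of what a connection might look like. It assumes the llamafile server is running locally and exposing its OpenAI-compatible chat endpoint on port 8080 (the default in my setup; check your command window or the llamafile documentation for the exact address and port on yours):

```python
# Minimal sketch: chat with a locally running llamafile via its
# OpenAI-compatible HTTP endpoint. Assumes the server is listening at
# http://localhost:8080 (verify the port shown in your command window).
import json
import urllib.request

def ask_local_llm(prompt: str) -> str:
    payload = {
        "model": "LLaVA",  # placeholder name; the local server runs whatever model you launched
        "messages": [{"role": "user", "content": prompt}],
    }
    request = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Standard OpenAI-style response: the reply lives in choices[0].message.content
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_local_llm("Summarize the benefits of running AI locally in one sentence."))
```

Because the endpoint follows the familiar OpenAI chat format, many tools and libraries that can talk to OpenAI can instead be pointed at your local llamafile, with your data never leaving your machine.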
If you’re interested, you can also explore some of the other models on the llamafile site and see which works best for your needs. The process for using them is the same as above.
As of today, llamafile is by far the easiest way to get started with local generative AI models. Many thanks to Mozilla AI and Justine Tunney for creating llamafile and for their continued work to make generative AI accessible to everyone.
Become your company’s AI expert in under 30 minutes a month by signing up for the Executive Summary newsletter by AI For Business.
If you liked this post, have future posts delivered straight to your inbox by subscribing to the AI For Business newsletter. Thank you!