Image generated with MS Designer: an LLM generates code

Code faster for FREE with the TabbyML coding assistant

Free and open source LLMs

Cedric Ferry
4 min read · Dec 27, 2023


With the recent advances in the fields of Artificial Intelligence and Large Language Models, developers are at the forefront of this innovation. You have heard of ChatGPT, Google Bard, GitHub Copilot and Llama. Today I want to talk about an Open Source player that leverages LLMs to help developers write code faster, and for free.

TabbyML: an open source Coding Assistant

TabbyML is a relatively new project, but it is already very popular, with more than 14 thousand stars on GitHub and counting. The goal of the project is to offer a Copilot-like companion that developers can run locally or on their own infrastructure, for free. Here, I will talk exclusively about what you can do locally.

TabbyML has already raised more than 3 million dollars and is being actively developed by the community, which distinguishes it from other Open Source projects like FauxPilot.

What is TabbyML?

TabbyML is an LLM server that can be activated from an IDE plug-in for editors such as VS Code, IntelliJ, Android Studio or Vim.

TabbyML uses contextual code and comments to generate snippets of code.

Its particularity is that, unlike GitHub Copilot or ChatGPT, the model runs on your own infrastructure, including on your own computer.

Tabby is designed for performance and is written in Rust.

TabbyML logo

The beauty of customisation thanks to open source

TabbyML is highly customisable. You can select which model you want to run among a set of Open Source LLMs such as StarCoder, CodeLlama and DeepseekCoder.

You can give the model access to your own code repositories, so TabbyML has additional context.

You can run it on your computer, on your own server, or via a service like Hugging Face.
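As a sketch of that repository customisation: at the time of writing, Tabby's documentation described declaring repositories in a `config.toml` file under `~/.tabby/` so the server can index them for context. The repository name and URL below are placeholders, and the exact layout may have changed, so check the current Tabby docs.

```toml
# ~/.tabby/config.toml — hypothetical example entry
[[repositories]]
name = "my-project"                                   # placeholder name
git_url = "https://github.com/example/my-project.git" # placeholder URL
```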

Why run the model locally?

Large language models require powerful computers to run, but in recent years hardware manufacturers have been adding specialized processing units to the CPU package: Apple Silicon Macs and iPhones have the Neural Engine, Google Pixel devices have Tensor with its EdgeTPU, and Intel recently announced that its Core line-up will get NPUs (Neural Processing Units). Clearly there is a push across the entire industry to bring AI closer to users. This is a different approach from services like OpenAI's ChatGPT and GitHub Copilot, which run AI in the cloud and require users to pay a monthly fee for access.

Google Tensor and Apple Silicon M1

Training vs Inference

In LLMs, there are essentially two distinct processes. Training requires massive super-computers with Nvidia GPUs and other specialized hardware to process enormous amounts of data from open, public datasets (the Internet, Wikipedia, GitHub, books in the public domain…). Once the model is trained and fine-tuned, it is ready for users to query it, a step known as inference.

Inference requires less power and can run locally on consumer products. The most recent example is Google's announcement of Gemini Nano, which already runs on Pixel 8 Pro devices. TabbyML only performs inference and is therefore able to run on consumer hardware such as Apple Silicon or Nvidia GPUs.

Kotlin support added, feedback welcome

TabbyML already supports many popular languages such as JavaScript, Python, Rust, Java, Go, C/C++… Kotlin was recently added, which opens new possibilities for Android developers.

Getting started

I encourage you to try Tabby yourself by playing in the playground, checking the supported programming languages and installing Tabby on your computer.
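To give a flavour of the install step, this is roughly the quick-start shape from the Tabby documentation around the time of writing: a single Docker command that serves the model on port 8080. The model name and flags below reflect the docs of that period and may differ in current releases, so verify against the up-to-date instructions.

```shell
# Serve Tabby with a small StarCoder model on an Nvidia GPU
# (flags and model name as documented at the time; check current docs)
docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda
```

Once the server is up, the IDE plug-in is pointed at it and completions start appearing as you type.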

Left: TabbyML suggesting a Fibonacci function in Python. Right: TabbyML suggesting tests for the Fibonacci function. VS Code.
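To make the screenshot concrete, here is the kind of completion it shows: the developer types a short comment, and Tabby proposes a function and then a matching test. The code below is my own reconstruction of that flow, not Tabby's verbatim output.

```python
# Prompt typed by the developer:
# compute the nth Fibonacci number

def fibonacci(n: int) -> int:
    """Return the nth Fibonacci number (0-indexed)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The kind of test Tabby might suggest next:
def test_fibonacci():
    assert [fibonacci(i) for i in range(7)] == [0, 1, 1, 2, 3, 5, 8]

test_fibonacci()
```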


With powerful chips, we can now run LLM inference locally, and this will eventually democratise AI.

With a tool like TabbyML, software developers can take advantage of this technology in just a few clicks and commands.

If you enjoyed reading this article, please consider clapping and sharing. Thanks for your time!