Large Language Models (LLMs) have exploded in popularity, and typically they run in massive datacenters operated by tech giants. But here on LowEndBox, we’re all about democratizing computing. Is it possible to run your own AI on a cheap VPS?
Yes!
Now, to be clear, you can’t self-host a digital brain the size of ChatGPT or Claude. Those models have billions upon billions of parameters and require a lot of specialized compute (GPUs) to run. Even something like Llama 3, which is a generation behind Meta’s latest model, is not going to be happy without a GPU, and consumer GPUs push prices well outside the LowEnd realm.
So how can we run an LLM on a LowEnd VPS? Quantization.
What is Quantization?
Quantization is the process of reducing the precision of the numbers used to represent a neural network’s weights and activations. Most AI models are originally trained using float32 (32-bit floating-point numbers). These offer high precision but consume lots of memory and compute. Quantization shrinks this down to 8-bit, 4-bit, or even fewer bits, drastically reducing the model size.
So instead of 2^32 possible values per weight, you get 2^4, which is 16 possible values. This can radically shrink the size of models and hence the amount of RAM they use.
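To put that in LowEnd terms, here’s a rough back-of-the-envelope estimate for a 1.1B-parameter model like the one we’ll use below (ignoring overhead such as the runtime and context cache):

1.1B parameters × 4 bytes (float32) ≈ 4.4 GB
1.1B parameters × 0.5 bytes (4-bit) ≈ 0.55 GB

The full-precision version wouldn’t even fit in our 3.5 GB of RAM, while the 4-bit version fits comfortably.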
Of course, there’s a tradeoff. Because of the loss of precision, you lose subtlety in the language and get weaker performance on complex tasks. Or to put it more simply, a 4-bit model is not going to be anywhere near as good as a 32-bit model.
In this example, we’ll be using TinyLlama, a 1.1B-parameter LLM that we’ll run as a 4-bit quantized model.
What Kind of VPS Do You Need?
I’m going to be using a RackNerd VPS with these specs:
- 3 vCPU Cores
- 60 GB Pure SSD Storage
- 3.5 GB RAM
- 5TB Monthly Transfer
- Only $29.89/YEAR!
That’s only $2.49/month, which is a fantastic deal. You can GET YOUR OWN HERE. I placed mine in Los Angeles, CA, but these are available in multiple datacenters in North America, including San Jose, Seattle, Chicago, Dallas, New York, Ashburn, and Toronto.
Setup
I’m using Debian 12 and Ollama, a tool for running LLMs.
Ollama is not in Debian’s apt repositories, so you have a couple of options.
Option #1: Install Script (recommended)
Ollama provides a nice install script that will install the product and configure it to start at boot.
I recommend going this route. All you need to do is:
- Go to ollama.com
- Click Download in the upper right
- Select Linux
- Click the copy icon on the script and paste it into a root shell
That will install Ollama, enable it, and start it.
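A quick sanity check never hurts. Assuming the script registered Ollama as a systemd service (which it does on a standard Debian 12 install), you can verify with:

ollama --version
systemctl status ollama

The first prints the installed version, and the second should show the service as active and enabled.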
Option #2: Homebrew
Ollama is in Homebrew. I initially went down this path, but I’ve found Homebrew to be a bit awkward and buggy on Linux. It’s awesome on macOS. But on Linux, it sets up its own parallel service management alongside systemd, and I kept running into errors and found others had the same experience. So again…almost mandatory for macOS, but I prefer not to use it on Linux.
But if you want to try it out, perhaps your experience will be different (or perhaps I’m clueless).
To install Homebrew, visit brew.sh, copy the one-liner under “Install Homebrew,” and paste it in your Linux terminal. Homebrew will not mess with apt or your system package manager. You can have both Homebrew and apt running side by side with no issues. apt manages things it installs and Homebrew manages things it installs. My advice would be to use apt primarily, fill in anything missing with Homebrew, and don’t install the same packages with both.
Note: Homebrew will not install as root, so paste it in a non-root terminal. However, the user who’s installing Homebrew should have sudo capabilities. There are different ways to configure sudo, but the easiest is to install sudo (apt install sudo) and then add whatever user you’re using into the ‘sudo’ group in /etc/group.
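In practice, that setup is just a couple of commands from a root shell (youruser is a placeholder for whatever account you’re using; log out and back in for the group change to take effect):

apt install sudo
usermod -aG sudo youruser

You can also edit /etc/group by hand, as described above; either way works.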
Now install Ollama:
brew install ollama
Start Ollama:
brew services start ollama
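To confirm Homebrew actually started it, check:

brew services list

Ollama should appear in the list with a status of started.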
Trying It Out
We need to get the TinyLlama LLM. Ollama makes this easy:
ollama pull tinyllama
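Once the pull finishes, you can see what’s sitting on disk and how big the quantized model actually is:

ollama list

The 4-bit TinyLlama should come in at well under a gigabyte, which is exactly why quantization makes this possible on a LowEnd box.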
Now run it:
ollama run tinyllama
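You’ll get an interactive prompt; type /bye (or press Ctrl+D) to exit. If you’d rather skip the interactive session, for example to script things, you can also pass a prompt directly on the command line:

ollama run tinyllama "Explain quantization in one sentence."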
Here’s a video showing the experience of running a prompt. As you can see, it starts up quickly and output is pretty sprightly:
While this was running, the system was under heavy load:
Still plenty of available RAM:
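If you want to watch the same thing on your own VPS, open a second SSH session while a prompt is generating and run:

uptime
free -h

uptime shows the load average, and free -h shows how much RAM is used and available.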
Now that we’ve got Ollama up and running, let’s turn this into a full-fledged web UI like ChatGPT. Stay tuned for part two!