LowEndBox - Cheap VPS, Hosting and Dedicated Server Deals

ELI5: Why is Everything Being Rewritten in Rust?

Community legend @MannDude asked a question on LowEndTalk:

> ELI5: Why is everything being rewritten in Rust?

> Like, whatever happened to the whole, “If it’s not broke, don’t fix it” mantra?

I started to write a long response but then realized this would be better as a LEB article, so here we are.

Recently some core Linux/Unix utilities have been rewritten in Rust and there will probably be more of this.  Why rewrite?

It’s not that these utilities are broken, it’s that the language they were written in (C) is very unsafe and there’s the possibility of lurking bugs and vulnerabilities.  You could say it’s preventative maintenance.

Why Do We Say C is Unsafe?

When the C programming language was developed in the early 1970s, computers had vastly smaller resources.  Early PDP-11 configurations, for example, shipped with only a few kilobytes of memory.  Not gigabytes or megabytes – kilobytes.  In such an environment, assembler was typically used, and C – sometimes known as “portable assembler” – is a very low-level language.  With C, every bit counts, and one of its strengths is producing tight code and small executables, which is why it’s so often used for embedded systems.  (Amusingly, what’s considered a “tiny embedded system” in 2026 would have outclassed a massive room-sized computer in 1970.)

C is a wonderful language, but it comes with some limitations.  These reflect the constraints that C was created within, but also the era.  Many modern concepts simply hadn’t been invented yet.

Here are some of C’s limitations.

No Bounds Checking

int a[10];
a[100] = 42; // Compiles, but writes to arbitrary memory

Of course, you wouldn’t actually write that code – it’d be something like a[index_variable] and the issue would surface at runtime.  But the point is that the program doesn’t crash or throw an exception – it just writes 42 to arbitrary memory.  This is a major source of vulnerabilities, because an attacker can feed the program input that writes code (say, code that grabs a root shell) into memory, and then trick the program into jumping to that location and executing it.

With modern languages, arrays are handled differently.  You can create an array of 10 items, but if you try to write to the 11th slot, an error is thrown.  It’s still a bug but it’s not a vulnerability.
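For contrast, here is a minimal Rust sketch of the same situation.  The helper name write_slot is made up for this illustration; get_mut is the standard slice method that returns None for an out-of-range index instead of touching memory.

```rust
/// Try to store `value` at `index`. Returns false instead of writing
/// outside the array. (`write_slot` is a hypothetical helper.)
fn write_slot(a: &mut [i32; 10], index: usize, value: i32) -> bool {
    match a.get_mut(index) {
        Some(slot) => {
            *slot = value;
            true
        }
        None => false, // out of bounds: nothing is written
    }
}

fn main() {
    let mut a = [0i32; 10];
    println!("index 3 written: {}", write_slot(&mut a, 3, 42));
    println!("index 100 written: {}", write_slot(&mut a, 100, 42));
    // Plain indexing with a runtime index (a[index] = 42) is checked too:
    // an out-of-range index panics rather than writing to memory.
}
```

Either way, the bad index is a bug you find out about, not a write to somebody else's memory.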

Buffer Overflows

If you’ve ever read a CVE, you know this term.

char name[10];
strcpy(name, "This string is much too long");

This is a crude example but that is legal C code.  Modern languages handle strings differently.  They’re not fixed-length, and often they’re immutable, so if you want to copy or alter a string, you create a whole new variable.  And as before, if you violate an allocation you get an error or a crash, not a situation where you’re writing to random memory.
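As a rough Rust counterpart to the strcpy example, the sketch below shows both sides: a String that simply grows, and a fixed 10-byte buffer that refuses a too-long copy.  The helper copy_into is a hypothetical name invented for this illustration.

```rust
/// Copy `src` into a fixed 10-byte buffer only if it fits.
/// (`copy_into` is a hypothetical helper for this illustration.)
fn copy_into(buf: &mut [u8; 10], src: &str) -> Result<usize, String> {
    let bytes = src.as_bytes();
    if bytes.len() > buf.len() {
        return Err(format!("{} bytes do not fit in {}", bytes.len(), buf.len()));
    }
    buf[..bytes.len()].copy_from_slice(bytes);
    Ok(bytes.len())
}

fn main() {
    // A String owns a growable heap buffer, so there is no fixed size to overrun.
    let mut name = String::from("short");
    name.push_str(" ... now much longer than ten bytes");
    println!("{} ({} bytes)", name, name.len());

    let mut buf = [0u8; 10];
    println!("{:?}", copy_into(&mut buf, "fits"));
    println!("{:?}", copy_into(&mut buf, "This string is much too long"));
}
```

Even without the explicit length check, copy_from_slice panics when the lengths differ, so the overflow still could not happen silently.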

As you can see from these two examples, C makes it easy to shoot yourself in the foot.  Modern languages “fail safe”.

Manual Memory Management

This gives rise to three different problems.

First, you can leak memory.  If you have a subroutine that allocates 1MB of memory, does some work with it, and then forgets to release it, then every time that subroutine is called, you’re going to chew up another 1MB of memory.  Over time, your program will grow and grow until it consumes all system memory.  In a small subroutine, it’s easy to keep track of memory, but if you’re calling many subroutines, and maybe one of them takes a different execution path so the cleanup is never called, you’ve introduced a maddening error.

Second, you can accidentally free the memory twice.  That is undefined behavior: it may crash the program, or it may quietly corrupt the allocator’s bookkeeping.

But the worst is allocating memory, freeing it, and then using that variable after it’s been freed.  You have no idea what’s in that memory space then, and C is not going to prevent you from doing that.

For example, take this code, which is mentioned in Jim Blandy’s Programming Rust:

int main(int argc, char **argv) {
  unsigned long a[1];
  a[3] = 0x7ffff7b36cebUL;
  return 0;
}

Blandy says that on his laptop, this prints:

undef: Error: .netrc file is readable by others.
undef: Remove password or make file unreadable by others.

What?!?!  Blandy explains:

> The array a is only one element long, so using a[3] is, according to the C programming language standard, undefined behavior:
>
> > Behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
>
> Undefined behavior doesn’t just have an unpredictable result: the standard explicitly permits the program to do anything at all. In our case, storing this particular value in the fourth element of this particular array happens to corrupt the function call stack such that returning from the main function, instead of exiting the program gracefully as it should, jumps into the midst of code from the standard C library for retrieving a password from a file in the user’s home directory. It doesn’t go well.

There are techniques and tools to analyze C programs to prevent this, but the main benefit of modern languages is that this can’t happen.  Languages such as Java or Go use garbage collection, in which the runtime periodically scans allocated memory and de-allocates what is no longer in use.  This is a fine approach, but it does make the runtime a bit chunkier, and it’s still possible to have a memory leak (though a lot more difficult).  Rust takes a different approach: its ownership rules let the compiler work out at compile time when each piece of memory can be freed, so use-after-free and double-free mistakes are rejected before the program ever runs, with no garbage collector needed.  Leaks are still possible in Rust, but you generally have to go out of your way to create one.
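To make the ownership idea a little more concrete, here is a small sketch.  The Buffer type and its counter are invented for this illustration (a stand-in for a real allocation); Drop is the standard trait Rust calls when a value’s owner goes out of scope.

```rust
use std::cell::Cell;

/// A stand-in for a heap buffer that counts how many times it is released.
/// (`Buffer` and the counter are hypothetical, for illustration only.)
struct Buffer<'a> {
    frees: &'a Cell<u32>,
}

impl Drop for Buffer<'_> {
    fn drop(&mut self) {
        // Runs exactly once, when the owner goes out of scope.
        self.frees.set(self.frees.get() + 1);
    }
}

/// Takes ownership of the buffer; it is released when this function returns.
fn consume(_buf: Buffer<'_>) {}

fn main() {
    let frees = Cell::new(0);
    {
        let buf = Buffer { frees: &frees };
        consume(buf); // ownership moves into `consume`
        // println!("{}", buf.frees.get()); // compile error: use of moved value
    } // no second release here, because `buf` was moved
    println!("released {} time(s)", frees.get());
}
```

There is no free() call to forget or to repeat, and the commented-out line shows that using the value after it has been handed off does not compile.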

Raw Pointers

A pointer is a variable that holds the address of a location in memory.  Pointers are very useful, but also quite dangerous.  For example:

int *p = (int *)0x12345678;
*p = 5;

That is going to write that integer at that memory address.  C treats all of memory as a giant array and you can go poking around in it and writing whatever you want wherever you want.  If you have a program that writes to specific memory addresses, it’s very easy to make a mistake and overwrite parts of your code.  The OS will prevent you from going outside your program’s assigned space, but regardless this is an avenue for exploitation.  And if your code is running as root…

Weak Type Safety

int x = 5;
float *f = (float *)&x;

Here we’ve created an integer variable and then pointed a float pointer at it, so reading through f treats the integer’s bits as if they were a floating point number.  Will it work?  Surprisingly, the answer is that it depends on your compiler and OS, because the permitted sizes of things like ints and floats are implementation-specific, and the C standard’s aliasing rules make reading through a pointer like this undefined behavior.  Modern languages type-check more strongly so if you try to put a string into a float, your program will either not compile (ideally) or error out.
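In Rust, both things you might have meant by that cast have to be spelled out.  The sketch below uses two helper names made up for this illustration (convert and reinterpret); the as cast and f32::from_bits are standard.

```rust
/// Value conversion: the integer 5 becomes the float 5.0.
/// (`convert` is a hypothetical helper.)
fn convert(x: i32) -> f32 {
    x as f32
}

/// Bit reinterpretation: the same 32 bits, read as a float.
/// (`reinterpret` is a hypothetical helper.)
fn reinterpret(x: i32) -> f32 {
    f32::from_bits(x as u32)
}

fn main() {
    let x: i32 = 5;
    println!("{} {:e}", convert(x), reinterpret(x));

    // let f: f32 = x; // compile error: mismatched types
}
```

Both operations are well defined, and neither happens by accident.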

Undefined Behavior

int x = 2147483647;
x = x + 1; // Signed integer overflow

…which means that the value of x is…what exactly?  -2147483648?  The C standard doesn’t say, because (like the example above from Blandy’s book) this is “undefined” behavior.  In theory, a compiler author could decide that overflowing an integer variable should erase your hard drive and that would conform to the standard.  Safe Rust, like most modern languages, doesn’t have “undefined” behavior: by default an overflow like this panics in debug builds and wraps around in release builds, and either way the result is defined.
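When the overflow behavior matters, Rust’s integer types let you pick it explicitly.  A brief sketch follows; increment is a hypothetical helper name, while the checked_add, wrapping_add, saturating_add and overflowing_add methods are part of the standard library.

```rust
/// Add one, reporting overflow as None instead of producing a bogus value.
/// (`increment` is a hypothetical helper.)
fn increment(x: i32) -> Option<i32> {
    x.checked_add(1)
}

fn main() {
    let x: i32 = i32::MAX; // 2147483647

    println!("{:?}", increment(x));         // None
    println!("{}", x.wrapping_add(1));      // -2147483648
    println!("{}", x.saturating_add(1));    // 2147483647
    println!("{:?}", x.overflowing_add(1)); // (-2147483648, true)
}
```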

C is the Wild West

This isn’t to say that C is a bad language.  If you’re writing a kernel like Linux, you’re going to need to manipulate memory and hardware at a very low level, and C (or assembler) is ideal for this.  The same is true for robotic controls, elevator circuits, and other embedded environments.

But C is often a foot-gun.  Humans are not perfect and we make mistakes.

Why Rust is Better

With Rust, those mistakes are usually discovered at compile-time.  This is the ideal.  Find your issues when you’re working on the code, not years later when it’s running in production.

Or if the mistake can’t be discovered at compile-time, fail without opening up avenues for exploitation.  For example, let’s say your program is processing data and expects no more than 500 items, so it creates a 500-item array.  When the unexpected happens and it’s asked to place a 501st item into that array, it’s going to throw an error and abort, not overwrite unknown memory.  In reality, programmers would probably choose one of Rust’s growable collections, such as a vector, but the point is that it fails gracefully.
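Here is one way the 500-item scenario might look, as a sketch rather than a recommendation.  MAX_ITEMS and store are hypothetical names; the idea is that the 501st item produces an ordinary error value the caller can handle.

```rust
const MAX_ITEMS: usize = 500;

/// Append `item` unless the collection already holds MAX_ITEMS entries.
/// (`store` and `MAX_ITEMS` are hypothetical names for this sketch.)
fn store(items: &mut Vec<u32>, item: u32) -> Result<(), String> {
    if items.len() >= MAX_ITEMS {
        return Err(format!(
            "refusing item {}: limit is {}",
            items.len() + 1,
            MAX_ITEMS
        ));
    }
    items.push(item); // a Vec grows as needed, so push cannot overrun
    Ok(())
}

fn main() {
    let mut items = Vec::new();
    for i in 0..501 {
        if let Err(e) = store(&mut items, i) {
            println!("{}", e);
        }
    }
    println!("stored {} items", items.len());
}
```

Dropping the limit check would also be safe here: the vector would simply grow to hold 501 items.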

Ordinary Rust code doesn’t manage memory by hand: the compiler inserts the cleanup when a value’s owner goes out of scope.  This alone eliminates a huge class of errors and possible exploits.

Rust does allow unsafe things – there is an “unsafe” keyword – but when that’s necessary, those parts of the code are isolated and clearly marked.
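As a small illustration of what “isolated and clearly marked” can look like, the sketch below wraps an unchecked read in a safe function.  The name first_unchecked is hypothetical; get_unchecked is a standard slice method that skips the bounds check and can therefore only be called inside an unsafe block.

```rust
/// Read the first element without a bounds check.
/// (`first_unchecked` is a hypothetical helper.)
fn first_unchecked(data: &[i32]) -> Option<i32> {
    if data.is_empty() {
        return None;
    }
    // SAFETY: the slice was just checked to be non-empty, so index 0 is valid.
    Some(unsafe { *data.get_unchecked(0) })
}

fn main() {
    let x = 5;
    let p = &x as *const i32; // creating a raw pointer is allowed in safe code

    // let y = *p; // compile error: dereferencing a raw pointer requires unsafe
    // SAFETY: `p` points at `x`, which is still alive and initialized.
    let y = unsafe { *p };

    println!("{} {:?}", y, first_unchecked(&[7, 8, 9]));
}
```

When something does go wrong with memory, the unsafe blocks are the short list of places to audit first.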

TLDR

When many things were written in the past – including nearly all of the typical Unix environment – C was the only game in town.  Today, rewriting that code in Rust eliminates many classes of errors and prevents future bugs and exploits.

 
