Understanding Rust Smart Pointers

What are they and how do they work?

In this post I intend to walk down memory lane a bit to understand what exactly Smart Pointers are, where they come from and, of course, how they work.

Very simply, a smart pointer is an abstract data type that simulates a pointer while providing added features, such as automatic memory management and bounds checking. These are intended to reduce bugs caused by the misuse of pointers, while retaining efficiency.

Before we talk about how Rust deals with Smart Pointers, let’s go back a little and take a look at the history and where smart pointers came from.

A little bit of history

Smart pointers were first popularized in the programming language C++ during the first half of the 1990s as a rebuttal to criticisms of C++’s lack of automatic garbage collection.

Even though the concept of smart pointers was popularized with C++, especially the reference-counted type of smart pointers, the immediate predecessor of one of the languages that inspired C++’s design had reference-counted references built into the language. C++ was inspired in part by Simula67, and Simula67’s ancestor was Simula I. Insofar as Simula I’s element is analogous to C++’s pointer without null, and insofar as Simula I’s process with a dummy-statement as its activity body is analogous to C++’s struct, Simula I had reference-counted elements (i.e., pointer-expressions that house indirection) to processes (i.e., records) no later than September 1965, as shown in the quoted paragraph below.

Processes can be referenced individually. Physically, a process reference is a pointer to an area of memory containing the data local to the process and some additional information defining its current state of execution.

Because C++ borrowed Simula’s approach to memory allocation — the new keyword when allocating a process/record to obtain a fresh element to that process/record — it is not surprising that C++ eventually resurrected Simula’s reference-counted smart-pointer mechanism within element as well.

Smart Pointers in C++

Very simply, a smart pointer in C++ is a class with overloaded operators that behaves like a conventional pointer. Yet it supplies additional value by ensuring proper and timely destruction of dynamically allocated data and by facilitating a well-defined object lifecycle.

The Problem with Using Conventional (Raw) Pointers in C++

Unlike many other programming languages, C++ gives the programmer full flexibility in memory allocation, deallocation and management. Unfortunately, this flexibility is a double-edged sword. On one hand it makes C++ a powerful language, but on the other hand it allows the programmer to create memory-management problems, such as memory leaks, especially when dynamically allocated objects are not released at the right time.

Here is an example:

SomeClass* pointerData = anObject.GetData();

pointerData->DoSomething();

In the above example, there is no obvious way to tell whether the memory pointed to by pointerData

  • Was allocated on the heap and needs to be deallocated

  • Is the responsibility of the caller to deallocate

  • Will be destroyed automatically by the object’s destructor

You could say something like, “The programmer just has to be careful and follow proper best practices”. In an ideal world that would be enough, but as you probably already know, in the real world it isn’t. That is why we need some mechanisms to protect us from ourselves.

How do Smart Pointers Help in C++?

Just to be clear, even with all the conventional pointer and conventional memory management techniques available, C++ programmers are not forced to use any of them when they need to manage data on the heap/free store. They can choose a smarter way to allocate and manage dynamic data by adopting smart pointers in their programs:

smart_pointer<SomeClass> smartPointerData = anObject.GetData();
smartPointerData->Display();
(*smartPointerData).Display();

// No need to worry about deallocation
// The Smart Pointer destructor will take care of that for you.

At first glance, smart pointers behave like conventional pointers, but in reality they supply useful features via their overloaded operators and destructors to ensure that dynamically allocated data is destroyed in a timely manner.

Types of Smart Pointers in C++

The management of the memory resource (that is, the ownership model implemented) is what sets smart pointer classes apart. Smart Pointers decide what to do with the resource when they are copied and assigned to. The simplest implementation often does not give the best performance, whereas the fastest ones might not suit all applications. In the end, it is up to programmers to understand their needs before deciding which one to use in their programs.

The classification of smart pointers in C++ is basically the classification of their memory resource management strategies. These are:

  • Deep copy

  • Copy on Write (COW)

  • Reference counted

  • Reference linked

  • Destructive copy

I am not going to dive into what each of these is because, although the concepts are similar, they are C++ related and I think this is enough C++ for now. Otherwise we will get too far into the weeds of C++, which is not the purpose of this post. The purpose until now was to give a little bit of background on what Smart Pointers are and where they came from.

Now it is time to dig into Rust Smart Pointers!!

Smart Pointers in Rust

We have been talking about Smart Pointers this, Smart Pointers that, but what about plain Pointers?

What in the hell is a Pointer?

Well, a pointer is a variable that contains an address in memory. It points to, or refers to, some other data, so the pointer variable itself does not contain the actual data. You can think of it as an arrow to that value.

[Figure: pointer variable pointer_to_value referencing a point in memory with a value]

Let me try to explain with an example. Let’s imagine that our friend John Smith has invited us to visit him at his house. He lives in one of those gated communities, and he gives us the address of the community.

So now we know the general location where he lives, just like your program knows generally where the temporary data for your application is stored in memory: either the Stack or the Heap.

Once we get to the gated community, we need the specific address of his house, so when we get to the front gate we ask the guard in which house our friend John Smith lives. The guard has stored the address of our friend’s house, just like your application would request the address of a specific piece of data from a pointer variable.

Pointers are quite simple in theory: they simply point us to the address where our data is in memory, much like the guard points us to our friend’s house in the gated community.
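To make the arrow idea a bit more concrete, here is a minimal sketch in Rust (references are introduced properly in the next section). The function read_through is a hypothetical helper I made up for illustration, and pointer_to_value is just the name from the picture above. The printed address is whatever your machine happens to pick, so it will differ from run to run.

```rust
/// Follows the "arrow" and returns the value the reference points at.
/// (Hypothetical helper, only here for illustration.)
fn read_through(pointer_to_value: &i32) -> i32 {
    *pointer_to_value
}

fn main() {
    let value = 42;
    // The variable holds an address, not the number itself.
    let pointer_to_value = &value;

    // {:p} prints the address stored in the reference.
    println!("address: {:p}", pointer_to_value);
    // Dereferencing follows the address back to the data.
    println!("value:   {}", read_through(pointer_to_value));
}
```

The guard in the story plays the role of pointer_to_value: he does not hold the house, only the directions to it.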

Alright, that is enough about pointers in general.

What about pointers in Rust?

Well, Rust has two regular kinds of pointers, called references. They are recognized by the ampersand in front of the type (or in front of the value being borrowed).

& for an immutable reference (which is the default behaviour, by the way).

fn my_function(my_variable: &String) {
    // do something
}

&mut for a mutable reference.

fn my_function(my_mutable_variable: &mut String) {
    // do something
}

References to a value don’t actually own the value (usually); they just borrow it. In other words, the reference can disappear and the value it pointed at will still exist. This is related to the concept of Ownership in Rust, which I am not going to get into because this rabbit hole has already gone too deep and we need to start climbing our way out.
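Here is a small sketch of both kinds of borrow in action. The function names length_of and add_exclamation are made up for this example; the point is only that the caller keeps ownership of the String after each borrow ends.

```rust
/// Borrows the String immutably: it can read it but not change it.
fn length_of(my_variable: &String) -> usize {
    my_variable.len()
}

/// Borrows the String mutably: it may change the caller's value.
fn add_exclamation(my_mutable_variable: &mut String) {
    my_mutable_variable.push('!');
}

fn main() {
    let mut greeting = String::from("hello");

    let len = length_of(&greeting); // immutable borrow starts and ends here
    add_exclamation(&mut greeting); // mutable borrow starts and ends here

    // Both references are gone, but the value they pointed at still exists.
    println!("{} (length before the change: {})", greeting, len);
}
```

This should print "hello! (length before the change: 5)". Idiomatic Rust would usually take &str instead of &String for the read-only case; I kept &String to match the signatures above.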

Finally! Smart Pointers in Rust!

The references I explained above are regular pointers: they point to some piece of data, but they don’t do much else. Smart Pointers in Rust are data structures that not only act like a pointer, but also carry additional metadata and extra features that a regular pointer would not have.

You could say that they are smart cookies…

Smart Pointers are usually implemented using structs.

One difference between regular references and Smart Pointers in Rust is that references only borrow the data, while in most cases a smart pointer owns the data it points to. In other words, when the smart pointer gets dropped, the data it points to gets dropped as well.

Another important point is that Smart Pointers implement the [Deref](https://doc.rust-lang.org/std/ops/trait.Deref.html) and [Drop](https://doc.rust-lang.org/std/ops/trait.Drop.html) traits. The [Deref](https://doc.rust-lang.org/std/ops/trait.Deref.html) trait allows an instance of the smart pointer struct to behave like a reference, so you can write code that works with either references or Smart Pointers. The [Drop](https://doc.rust-lang.org/std/ops/trait.Drop.html) trait allows you to customize the code that is run when an instance of the Smart Pointer goes out of scope, which can be very useful.
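To see the two traits working together, here is a minimal sketch of a home-made smart pointer. MyBox and greet are hypothetical names I am using for illustration; unlike the real Box, this MyBox does not allocate on the heap, it only wraps a value so we can watch Deref and Drop do their job.

```rust
use std::ops::Deref;

/// A toy smart pointer: a tuple struct that owns one value.
struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(value: T) -> MyBox<T> {
        MyBox(value)
    }
}

/// Deref lets `&MyBox<T>` be used wherever `&T` is expected.
impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

/// Drop runs custom code when the value goes out of scope.
impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        println!("MyBox is going out of scope");
    }
}

/// Works with a plain string slice; knows nothing about MyBox.
fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let boxed = MyBox::new(String::from("Rust"));

    // Deref coercion: &MyBox<String> -> &String -> &str
    println!("{}", greet(&boxed));
    // Method calls also see through the wrapper.
    println!("length = {}", boxed.len());
} // `boxed` is dropped here, so the Drop message is printed last
```

Notice that greet never had to know about MyBox, which is exactly the "works with either references or Smart Pointers" part.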

I am not going to go into detail on every type of Smart Pointer in Rust, but I do want to cover at least the most common ones in the standard library.

  • Box<T> for always allocating values on the heap

  • Rc<T>, a reference counting type that enables multiple ownership

  • Ref<T> and RefMut<T>, accessed through RefCell<T>, a type that enforces the borrowing rules at runtime instead of compile time (to be avoided when possible).

Box

Box is the simplest Smart Pointer. It is used to store data on the heap rather than on the stack, even when the size of the data is known at compile time. The Box Smart Pointer itself is stored on the Stack, but the data it points to is stored on the Heap.
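A tiny sketch of that, before we get to the more interesting use case. The function boxed_square is a made-up helper; the only real API used here is Box::new.

```rust
/// Computes n * n and moves the result to the heap.
/// (Hypothetical helper, only here for illustration.)
fn boxed_square(n: i32) -> Box<i32> {
    Box::new(n * n)
}

fn main() {
    // `b` lives on the stack; the i32 it points to lives on the heap.
    let b = boxed_square(5);

    // A Box can be printed and dereferenced like a regular reference.
    println!("b = {}", b);
    println!("b + 1 = {}", *b + 1);
} // `b` goes out of scope here and the heap memory is freed with it
```

Boxing a single i32 like this is not something you would normally do; it just shows the mechanics.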

Box comes in handy when working with recursive types, for example. At compile time the Rust compiler wants to know how much space a type requires, and that becomes difficult with recursion because, in theory, the type could be infinitely large. A Box is just a pointer, so its size is always known. By putting the recursive part behind a Box, we give the compiler a type whose size it can calculate.

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

fn main() {
    let list = Cons(1,
        Box::new(Cons(2,
            Box::new(Cons(3,
                Box::new(Nil))))));
}

In the above example, the Cons variant needs the size of an i32 plus the space to store the box’s pointer data. By using the box we have broken the infinite recursive chain, and the compiler can now figure out the size of List.
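If you want to check this for yourself, the sketch below asks the compiler for the sizes and then walks the list. The sum function is something I added for illustration. I am not stating the exact byte counts because they depend on your platform, but a Box<List> should be exactly as big as a pointer (a usize).

```rust
use std::mem::size_of;

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

/// Walks the list recursively and adds up its elements.
/// (Hypothetical helper, only here for illustration.)
fn sum(list: &List) -> i32 {
    match list {
        // `rest` is a &Box<List>; deref coercion turns it into &List.
        Cons(value, rest) => value + sum(rest),
        Nil => 0,
    }
}

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));

    // Both sizes are fixed and known at compile time.
    println!("Box<List> takes {} bytes", size_of::<Box<List>>());
    println!("List takes {} bytes", size_of::<List>());
    println!("sum = {}", sum(&list));
}
```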

Rc

Rc stands for Reference Counted. This type of Smart Pointer is used to enable multiple ownership of the data, which Rust’s ownership model does not allow by default. The Rc smart pointer keeps track of the number of references to a value, so we can tell in how many places the value is being used. When the reference count drops to zero, the value is no longer used anywhere and can be cleaned up from memory safely.

enum List {
    Cons(i32, Rc<List>),
    Nil,
}

use List::{Cons, Nil};
use std::rc::Rc;

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    println!("count after creating a = {}", Rc::strong_count(&a));
    let b = Cons(3, Rc::clone(&a));
    println!("count after creating b = {}", Rc::strong_count(&a));
    {
        let c = Cons(4, Rc::clone(&a));
        println!("count after creating c = {}", Rc::strong_count(&a));
    }
    println!("count after c goes out of scope = {}", Rc::strong_count(&a));
}

Outputs:

count after creating a = 1
count after creating b = 2
count after creating c = 3
count after c goes out of scope = 2

Here we can see that the Rc count is 1 when we create variable a. After that, when we create another variable b that holds a clone of a, the count goes up by 1 to 2. Similarly, when we create another variable c that also holds a clone of a, the count goes up by 1 again to 3. After c goes out of scope, the count goes down by 1 to 2. Note that Rc::clone does not copy the list; it only creates another pointer to the same data and increments the count.

RefCell

RefCell is what makes it possible to work with the Interior Mutability Pattern, a design pattern in Rust that allows you to mutate data even when there are one or more immutable references to that data, which would normally go against the borrowing rules of the Ownership model. To be able to do that, the pattern uses unsafe code inside a data structure to bend Rust’s rules about mutation and borrowing. The rules still apply, but RefCell checks them at runtime instead of at compile time.

If you are interested in learning more about working with unsafe code, you can check out the book The Rustonomicon.
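Before combining RefCell with Rc, here is a minimal sketch of RefCell on its own. The helper add_to is a made-up name; borrow, borrow_mut and try_borrow_mut are the real RefCell methods. It shows a value being mutated through a shared reference, and the borrowing rules being enforced while the program runs.

```rust
use std::cell::RefCell;

/// Mutates the value inside the cell through a shared (&) reference.
/// (Hypothetical helper, only here for illustration.)
fn add_to(cell: &RefCell<i32>, amount: i32) {
    *cell.borrow_mut() += amount;
}

fn main() {
    // Note: `cell` is not declared `mut`, yet its contents can change.
    let cell = RefCell::new(5);
    add_to(&cell, 10);
    println!("cell = {}", cell.borrow());

    // Keep one mutable borrow alive...
    let first = cell.borrow_mut();
    // ...and a second one is refused at runtime, not at compile time.
    println!("second borrow allowed? {}", cell.try_borrow_mut().is_ok());

    // Once the first borrow is released, borrowing works again.
    drop(first);
    println!("borrow allowed after drop? {}", cell.try_borrow_mut().is_ok());
}
```

This should print 15, then false, then true. If I had called borrow_mut instead of try_borrow_mut for the second borrow, the program would have panicked, which is the price of moving the check to runtime.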

A common way to use RefCell is in combination with Rc, the reference-counted pointer from the previous section. If we have multiple owners of some data and we want each of them to be able to mutate it, we can use an Rc that holds a RefCell.

#[derive(Debug)]
enum List {
    Cons(Rc<RefCell<i32>>, Rc<List>),
    Nil,
}

use List::{Cons, Nil};
use std::rc::Rc;
use std::cell::RefCell;

fn main() {
    let value = Rc::new(RefCell::new(5));

    let a = Rc::new(Cons(Rc::clone(&value), Rc::new(Nil)));

    *value.borrow_mut() += 10;

    println!("a after = {:?}", a);
}

Outputs:

a after = Cons(RefCell { value: 15 }, Nil)

In the example above, we create an instance of Rc<RefCell<i32>> and store it in a variable named value. We also create a list a with a Cons variant that holds the value. We clone value, rather than transferring ownership from value to a, so that both a and value own the inner cell, which starts with a value of 5. After that, we add 10 to it by calling borrow_mut() on value. This method returns a RefMut smart pointer, and we use the dereference operator (*) on it to change the inner value.

Conclusion

One of my favorite quotes is “What I cannot create, I do not understand” from Richard Feynman. Of course, you are not going to re-create everything and reinvent the wheel over and over again, but I do believe that it is of the utmost importance to at least understand how things work up to a certain level. I hope that by understanding what Smart Pointers are, where they came from and what the different types are, you can use this knowledge to design better systems and make better decisions in the future.

Another quote that I like is “Bad programmers worry about the code. Good programmers worry about data structures and their relationships.” from Linus Torvalds. Smart Pointers are basically a specific type of data structure that relates to data in a certain way, so understanding them can only make you a better programmer, and I hope I could help at least a little.
