Demystifying Rust pt 2: Vec

February 19, 2024

In the first part, we learned about the allocator api and how to use it. This will assume you've read that part, and build off of knowledge from it.

Structure

Heres our base:

pub struct Vec<T> {
    ptr: NonNull<T>,
    len: usize,
    cap: usize,
}

impl<T> Vec<T> {
    const LAYOUT: Layout = Layout::new::<T>();
    const GLOBAL: Global = Global;
    
    pub fn new() -> Self {
        Self { ptr: NonNull::dangling(), len: 0, cap: 0 }
    }
}

A couple things are structured a little differently than before.

we have addictional len and cap fields
a Layout is generated as an associated constant for any given type
We get a “hook” to global so that we dont need to call default() every time

This way we don't need to call Global::default() or Layout::new::<T>() every time we want to use them. For new, we get a “dangling” pointer, or one that doesn't point to allocated memory, just so we have something to put in our struct. This is okay, because no elements will be read if cap, and by extention len, is 0.

Push

The first real code we'll write is for adding an element to the end of the block. Lets map out what should happen.

pub fn push(&mut self, x: T) {
	if self.len == self.cap {
		//allocate more memory
		//increase self.cap
	}

	//increase self.len
	//write x into self[self.len-1]
}

First, lets handle reallocating our vector when it needs to grow. I'll be using the alloc_layout_extra feature for better layout syntax, but this isnt necessary. The Allocator trait has a method called grow, which takes a pointer, its layout, and the new layout to fit.

// repeat() will error on overflow
// also returns a tuple with other data
// that we dont need
let curr_lay = Self::LAYOUT.repeat(self.cap).unwrap().0;
let new_lay = Self::LAYOUT.repeat(self.cap+1).unwrap().0;

let new_ptr = unsafe {
	Self::GLOBAL.grow(
		self.ptr.cast(),
		curr_lay,
		new_lay,
	)
};

The repeat() method creates a new layout based on size_of::<T>() * n, so we use it to create layouts representative of our current and desired allocations.

new_ptr is currently a result whos error type is equivalent to the pointer just being null, so lets unwrap it and update our data.

self.ptr = new_ptr.unwrap().cast();
self.cap += 1;

Actually, lets extract this procedure into a more generic function:

fn realloc(&mut self, size: usize) {
	let curr_lay = Self::LAYOUT.repeat(self.cap).unwrap().0;
	let new_lay = Self::LAYOUT.repeat(self.cap+size).unwrap().0;
	
	let new_ptr = unsafe { Self::GLOBAL.grow(
		self.ptr.cast(),
		curr_lay,
		new_lay,
	)};
	
	self.ptr = new_ptr.unwrap().cast();
	self.cap += size;	
}

Thats the hard part, so lets go back to our push function. Now the rest of the function is pretty self-explanatory:

pub fn push(&mut self, x: T) {
	if self.len == self.cap {
		self.realloc(1);
	}

	//increase self.len
	//write x into self[self.len-1]
	self.len += 1;
	unsafe { self.ptr.add(self.len-1).write(x) };
}

Pop

Next lets make a function to remove the last element in the vector.

pub fn pop(&mut self) -> Option<T> {
	if self.len == 0 { return None }
	self.len -= 1;
}

Now we need to get the last element. We want to preserve move semantics, but this isn't safely possible with raw pointers.

A Note on Memory Safety

When we read the data from the pointer, we effectively make a bitwise copy of that data. As is discussed in the Copy trait docs, doing this when you shouldn't can lead to some very serious problems. We need to guarantee that only 1 instance of that data can ever be accessed safely [^1].

So, if we want to move data from the heap to the stack, we need to make sure that the heap copy is dead in the water. For that rule to be violated, self.len needs to be increased without writing data to the expanded memory. We aren't responsible for memory past self.len, and it could be literally anything. Therefore, we need to make sure self.len can only be changed when we take special precautions [^2]. Luckily our push and pop methods will do exactly that.

So with that out of the way, lets write our 3 lines of code that necessitated that divergence:

let last = unsafe { self.ptr.add(self.len) };
self.len -= 1;

//moves out of pointer
Some(unsafe { last.read() })

Theres our core functionality of a vector! Lets write a couple index functions:

pub fn get(&self, idx: usize) -> Option<&T> {
    if self.len == 0 { return None }

    Some(unsafe { self.ptr.add(idx).as_ref() })
}
pub fn get_mut(&mut self, idx: usize) -> Option<&mut T> {
    if self.len == 0 { return None }
    Some(unsafe { self.ptr.add(idx).as_mut() })
}

And quickly implement drop:

impl<T> Drop for Vec<T> {
    fn drop(&mut self) {
        for i in 0..self.len {
		//call all destructors
        	unsafe { self.ptr.add(i).drop_in_place() };
        }
    }
}

And we've recreated the most important functions of Vec!

we dont need to make it completely impossible, we just need to make sure that safe rust can't
it wouldnt be a bad idea to consider changing self.len unsafe, in fact std's Vec does this

thanks for reading part 2! This was super fun to write, and i might make a part 3 to implement a bunch of common traits on our vec, lmk if you think that would be cool!