Lesson 3: Types and Patterns

You will recall from last lesson that memory safety requires that all memory accessed

  1. Exists
  2. Contains valid data

In the previous lesson, we looked at how move semantics are used to ensure memory is only accessed when it exists. In this lesson, we will look at how Rust ensures that the memory contains valid data.

In Rust, all values have exactly one type. This type defines the space a value will take up (which is needed to allocate that space for it in memory, typically on the stack), and which possible memory values can occupy this space.

The Rust compiler is very good at inferring types from context for you. Often you will only need to explicitly state your types in a function signature (as here they are required so that you can build a stable API).

Let Statements and Patterns

I've used this before in our previous lesson, but the way I used them will have lulled you into a false sense of familiarity.

let pattern = expression; // type is inferred
let pattern: Type = expression; // type is explicit

You'll notice I say "pattern" and not variable name on the left. This is because whenever new variables are introduced in Rust, it is not done through simple variable declaration as you may be familiar with from other languages.

Patterns are very powerful, and allow you to do all kinds of powerful things like destructuring and pattern matching. There will be more details on the many different things you can do with patterns over the next few chapters.

For now, know that if you encounter an identifier in a pattern, which is basically any valid variable name, it will create a new variable, allocate memory for it based on the size of the type of the variable being created, and bind all or some part of the value the expression on the right hand side of the = evaluates to.

let x = 12; // creates `x` as an `i32`, binds 12 to `x`
let z = 'q'; // creates `z` as a `char`, binds 'q' to `z`

I'll teach you your first complex pattern already! The wildcard pattern. This pattern matches any single value from the expression and ignores it.

let _ = "Hello, world!"; // creates no variable, binds nothing

This is useful if you have a function that returns data, but you wish to discard the data without the compiler complaining at you for having an unused return value.

Another pattern that is useful is the mut identifier pattern. By default, all variables created are immutable and cannot be reassigned or mutated in place. By using the mut identifier pattern you bypass this.

The following fails to compile.

let x = 12;
x = 9; // reassignment
x += 8; // mutation
println!("{number}");
error[E0384]: cannot assign twice to immutable variable `number`
 --> src/main.rs:3:5
  |
1 |     let number = 12;
  |         ------
  |         |
  |         first assignment to `number`
  |         help: consider making this binding mutable: `mut number`
2 |     number = 9; // reassignment
  |     ^^^^^^^^^^ cannot assign twice to immutable variable

error[E0384]: cannot assign twice to immutable variable `number`
 --> src/main.rs:4:5
  |
1 |     let number = 12;
  |         ------
  |         |
  |         first assignment to `number`
  |         help: consider making this binding mutable: `mut number`
2 |     number = 9; // reassignment
3 |     number += 8; // mutation
  |     ^^^^^^^^^^^ cannot assign twice to immutable variable

But by using mut number we can make the number variable mutable, and it will compile and run successfully instead.

let mut number = 12;
number = 9;
number += 8;
println!("{number}");
17

Note that while you cannot reassign a variable, you can "shadow" it by using A new let statement. This creates a new variable instead and gives it the name the old one had.

let number = 12;
let number = 9;
println!("{number}");
9

Fixed Size Primitive Types

Now that you know how to assign a variable, lets look at the simplest data types a variable might have. These are the fixed size primitive types. They take up a fixed amount of space, and have well defined restrictions on which values are legal.

TypeSizeValues
!0
()0()
bool10x00 meaning false, and 0x01 meaning true
u81Natural numbers from 00 to 2812^{8}-1 (unsigned 8 bit integers)
i81Integers from (27)-(2^{7}) to 2712^{7}-1 (signed 8 bit integers)
u162Natural numbers from 00 to 21612^{16}-1 (unsigned 16 bit integers)
i162Integers from (215)-(2^{15}) to 21512^{15}-1 (signed 16 bit integers)
u324Natural numbers from 00 to 23212^{32}-1 (unsigned 32 bit integers)
i324Integers from (231)-(2^{31}) to 23112^{31}-1 (signed 32 bit integers)
f324IEEE 754-2008 binary32 format values (IEEE 754-1985 single precision floats)
char4Any single unicode scalar value.
u648Natural numbers from 00 to 26412^{64}-1 (unsigned 64 bit integers)
i648Integers from (263)-(2^{63}) to 26312^{63}-1 (signed 64 bit integers)
f648IEEE 754-2008 binary64 format values (IEEE 754-1985 dounle precision floats)
u12816Natural numbers from 00 to 212812^{128}-1 (unsigned 128 bit integers)
i12816Integers from (2127)-(2^{127}) to 212712^{127}-1 (signed 128 bit integers)
usize*As per u16, u32, or u64 depending on target platform's pointer type
isize*As per i16, i32, or i64 depending on target platform's pointer type

*the usize and isize types have a platform specific size. For a 32 bit computer they will be 4 bytes large, for a 64 bit computer they will be 8 bytes large. 16 bit architectures are officially supported, but often require extra work as most libraries do not support them.

If you want to tell Rust which numeric type to use for an integer or float without using a let binding with explicit type, you can suffix the literal with the type. With no suffix or typed binding integers will be i32 and floats will be f32.

let b: bool = true;
let n: u8 = 56;
let p = 81u8;
let s: i8 = -9;
let i = 93; // infers i32 by default
let f = 12.4; // infers f32 by default
let c = 'z';

You'll notice there are two fixed size primitive types that I've not included in the above list, and that's because these will be less familiar to you unless you are coming from a small handful of functional languages.

(), the "Unit" type, is a type with only one value: (). Because the type only has one value, it needs 0 bytes of information to store this value. Anywhere the unit type occurs, it's value is already known.

This type is seen most often as a kind of void return type. Indicating that a function did not return any meaningful data, but did terminate normally and return control to its caller. If you create a function with no explicit return type then a type of -> () is implicit. In addition, any expression terminated with a ; evaluates to ().

fn foo() -> () {
    println!("Function returns no data");
}
fn bar() {
    let x = ();
    println!("This also returns no data");
}

It is also possible to use a unit pattern to bind a unit value. This doesn't seem very useful, and on its own it isn't and is equivalent to using a wildcard binding, but you can do it if you really want. There may be some niche case in more advanced patterns where () is desirable over _, I'm not sure though.

// these lines are equivalent
let () = ();
let _ = ();

The final type in the above list is !, the "Never" type. It is a type with zero values that cannot be constructed. Because of this, Rust can coerce it into any other type. This is the type returned by control like return, break, and continue, by functions like std::process::exit which get the operating system to terminate the program, and by infinite loops or other similar non returning structures.

Any code that falls after a variable with the Never type is initialised is unreachable, because the compiler knows it's actually impossible to do that initialisation.

// This `x` is an i32, because the loop breaks with a value of 12.
let x = loop {
    break 12;
};
// This `y` is a `!`, because the loop never terminates.
let y = loop { };
// All code after the above is unreachable, but this would also have the type `!`
let z = std::process::exit(0);
// Because they can't be constructed, it's fine to coerce a `!` type to anything
let a_number: i32 = z;

Arrays

If you want to store multiple of the same type in sequence (directly adjacent) in memory, you can use an array! Arrays have a fixed size known at compile time, and their size will be the size of the type stored in the array, multiplied by the number of elements in the array.

let a: [i32; 4] = [12, -8, 92, 3];
let b: [char; 3] = ['r', 'f', 'd'];
let c = [true, false, false, true, false]; // infers [bool; 5]

If you want to access the values in an array, you can index into it.

let a: [i32; 4] = [12, -8, 92, 3];
let p = a[0]; // p = 12
let q = a[1]; // q = -8
let r = a[2]; // r = 92
let s = a[3]; // s = 3

There is also a pattern we can use that allows us to destructure an array. This is called a slice pattern.

let a: [i32; 4] = [12, -8, 92, 3];
let [p, q, r, s] = a; // slice pattern, p = 12, q = -8, r = 92, s = 3

Each of the identifiers inside the slice pattern is another pattern, which lets you ignore some elements in an array using the wildcard pattern.

let b: [char; 3] = ['r', 'f', 'd'];
let [t, _, u] = b; // t = 'r', u = 'd'

There is also the "rest" pattern, which will match a sequence of zero or more values that are not otherwise bound within the slice pattern.

let c = [true, false, false, true, false];
let [v, .., w] = c; // v = true, w = false

This is also a good time to introduce an extension to the identifier pattern. If you use the form identifier @ pattern, then instead of binding a single value, the value that matches the entier pattern will be bound to the identifier. This is very useful with the rest pattern as it allows you to create a new array from a subset of the source array.

let d = ['H', 'e', 'l', 'l', 'o'];
let [head, tail @ ..] = d; // head = 'H', tail = ['e', 'l', 'l', 'o']
let [mid @ .., _] = tail; // mid = ['e', 'l', 'l']

Tuples

If you want to group together multiple different types into a single type, the simplest way to do it is with a tuple! Like arrays, these must have a fixed number of elements. Unlike arrays, each element has it's own type.

let a: (i32, char) = (12, 'C');
let b = (false, 12.5, ()); // infers type `(bool, f64, ())`

Values inside a tuple can be accessed using numeric field accessors.

let a = (12, 'C', false, 12.5);
let p = a.0; // p = 12
let q = a.1; // q = 'C'
let r = a.2; // r = false
let s = a.3; // s = 12.5

There is also a pattern that allows us to destructure a tuple, called a tuple pattern!

let a = (12, 'C', false, 12.5);
let (p, q, r, s) = a; // p = 12, q = 'C', r = false, s = 12.5

As with arrays, this pattern allows us to use other patterns like rest and wildcard inside of it.

let a = (12, 'C', false, 12.5);
let (head, ..) = a; // head = 12
let (.., penultimate, _) = a; // penultimate = false

Unlike with slices patterns however, you cannot use the ident @ .. syntax, probably while arrays are sequential in memory and easy to copy, tuples can have their fields re-ordered and padding inserted between them according to arcane rules known as the "default representation", meaning a trivial "sub-tuple" is not guarunteed to exist in the way that a subsection of an array is.

References

Last lesson I mentioned that Rust is trying to solve both memory safety and thread safety. To achieve thread safety, values must only be accessed and modified in a way that ensures all threads behave properly. To achieve this, Rust disallows simultaneous mutable access.

To further motivate references, consider the undesirable effect of move semantics that no value of a type with move semantics could be passed to a function without either being explicitly dropped when the function ends, or manually returned to the caller to hand back ownership.

let x = String::from("Hello");
fn exclaim(s: String) -> String {
    format!("{s}!")
}
let x = exclaim(x);
let x = exclaim(x);
println!("{x}");
Hello!!

Consider further that there would be no way for a function to modify a value in place, regardless of if the value has move or copy semantics. In the above example we create a new string with it's own memory each time exclaim is called.

References provide indirection and are the solution to all of these issues, but in order to ensure thread safety Rust has two reference types instead of one.

Immutable References

The first, &T, is called a reference and provides read only access. Many of these references may coexist. Because multiple of these references to the same value can exist, &T has copy semantics regardless of the semantics of T. If a reference to a value exists, we say that the reference borrows the value.

let string: String = String::from("Hello!");
let string_reference: &String = &value;
let another_string_reference = &value;
let number: i32 = 12;
let number_reference: &i32 = &number;
let number_copy: i32 = *number_reference; // can only dereference inner type with copy semantics

References allow you to pass the same value with move semantics to a function multiple times. Something you could not do with only move semantics without explicitly returning the value from each call.

fn print_string(s: &String) {
    println!("{s}");
}
fn main() {
    let string = String::from("Hello, world!");
    print_string(&string);
    print_string(&string);
    print_string(&string);
}
Hello, world!
Hello, world!
Hello, world!

We also have two more patterns for use with references. The first destructures a reference similarly to the * operator, but on the pattern side rather than on the value side. The other one introduces a reference similar to the & operator but on the pattern side.

let number: i32 = 12;
let ref number_reference: i32 = number; // creates a reference similar to `&number`
let &number_copy: &i32 = number_reference; // destructures the reference similar to `*number_reference`

Notice the types included in the let statement match the type of the expression (on the right side of the =), not of the variables created by the pattern. number_reference in the above is still of type &i32, and number_copy is still of type i32. This is true for all patterns in let expressions, it's just less obvious and feels weirder when you use it with & and ref patterns.

let array: [i32; 4] = [1, 2, 3, 4];
let [p, q, r, s]: [i32; 4] = array;

You could obviously also just exclude the types from the above examples and Rust will figure out what you want just fine.

These patterns may not feel immediately useful, but consider the case where we have a reference to an array, and want to destructure a reference to a single value in that array without copying its contents. We can do this using the above pattern.

let array = [1, 2, 3, 4];
let ref_array = &array;
let &[_, ref q, ..] = ref_array; // q = &array[1] = &2

The above desire, of extracting a reference through destructuring, is so common that about a year after Rust hit version 1.0 the identifier pattern was extended to support it1. By default, an identifier will move (or copy if the value has copy semantics) its binding, however if the value being matches is a reference and the pattern does not have an explicit reference destructure, then Rust adds the ref pattern for you behind the scenes.

let [_, q, ..] = ref_array; // As before, q = &array[1] = &2

Mutable References

The second reference type, &mut T, is called a mutable reference and provides read and write access. Only one such mutable reference to a value may exist at a time. A mutable reference cannot coexist with an immutable reference. &mut T always has move semantics (moving the reference, not the value). If a mutable reference to a value exists, we say the mutable reference borrows the value as mutable.

let mut x = 12; // `x` must be `mut` to make `&mut`
let ref_x = &mut x;
*ref_x += 10; // we can mutate through dereferencing
println!("{x}")
22

We also have two new patterns for working with mutable strings, &mut for destructuring and ref mut to introduce a mutable reference. These work the same as their immutable reference counterparts, including with the automatic reference creation when your pattern doesn't destructure a reference which is present.

let mut array = [1, 2, 3, 4];
let ref_array = &mut array;
let &mut [_, ref mut q, ..] = ref_array; // or equivalently
let [_, q, ..] = ref_array; // q = &mut array[2]
*q += 10;
println!("{array:?}");
[1, 12, 3, 4]

Mutable references cannot be coppied, which you may think means they cannot be used to pass a value to a function multiple times. However, because of a convenience called "automatic reborrowing" this is not the case.

fn exclaim_and_say(s: &mut String) {
    s.push_str("!"); // mutate in place
    println!("{s}");
}
fn main() {
    let mut x = String::from("Hello, world");
    let ref_x = &mut x;
    exclaim_and_say(ref_x); // this should move ref_x
    exclaim_and_say(ref_x); // so this should not compile
}
Hello, world!
Hello, world!!

So... how did that work? Automatic reborrowing means that whenever a mutable reference is passed into a function, if it can Rust will create a new mutable reference to pass in instead, preserving your old one. This allows &mut T to pretend to have copy semantics when being passed into functions. There are other circumstances where reborrowing is not possible however, so it's important to remember that this is a convenience and &mut T actually does have move semantics under the hood.

fn main() {
    let mut x = String::from("Hello, world");
    let ref_x = &mut x;
    exclaim_and_say(&mut *ref_x); // manual reborrow for illustration
    exclaim_and_say(&mut *ref_x);
}

Statics and Reference Lifetimes

Both mutable and immutable references have an additional piece of information bound to them called a lifetime. A lifetime is a guard that prevents dropping or otherwise moving out the value being referenced by the lifetime. This ensures that a reference always points to valid data. Consider the following example:

let x = String::from("Hello!");
let y = &x;
drop(x);
println!("{y}");
   Compiling playground v0.0.1 (/playground)
error[E0505]: cannot move out of `x` because it is borrowed
 --> src/main.rs:5:6
  |
1 | let x = String::from("Hello!");
  |     - binding `x` declared here
2 | let y = &x;
  |         -- borrow of `x` occurs here
3 | drop(x);
  |      ^ move out of `x` occurs here
4 | println!("{y}");
  |           --- borrow later used here

Here you can see that Rust prevents us from dropping (moving out) x, because there is a borrow still outstanding.

Lifetimes can be named, but by default all but one lifetime is anonymous and does not have a fixed name that can be referred to. They are generated by the compiler to be as short as possible, so that you can drop the resource being referenced as early as possible.

The one special named lifetime is called 'static, which refers to a lifetime which never drops. This allows the compiler to reason about references that know the value they reference will never go out of scope. This is the type of string literals, as the raw bytes of their string are written into the binary and will never be cleaned up by the program.

let s: &'static str = "String literal with static lifetime!";

You can also create other static references by placing data in a static.

static DATA: [i32; 4] = [1, 2, 4, 8];
let d: &'static i32 = &DATA[2]; // d = &4

This may not seem obviously clear or useful to you right now, but within a few lessons you should have a stronger grasp over lifetimes.

Structs

I've shown you how you can group together types using a tuple, but sometimes you want to encode a little bit more information. After all, .0 isn't the most informative accessor I've ever seen. Structs are the solution to this.

struct MyData {
    my_character: char,
    my_number: i32,
}
let my_data: MyData = MyData {
    my_character: 'q',
    my_number: 94,
};
let a = my_data.my_character; // a = 'q';
let b = my_data.my_number; // b = 94

If the value you want to put in a struct is already in a variable with the same name as the struct field, you can ommit the : value.

let my_character = 'X',
let my_data = MyData {
    my_character, // equivalent to `my_character: my_character`
    my_number: 12,
}

We also have a struct pattern which can be used to destructure this.

let MyData {
    my_character: a,
    my_data: b,
} = my_data; // a = 'q', b = 94

The values after the : inside the struct pattern are themselves patterns.

let MyData {
    my_character: ref a,
    my_data: _,
} = my_data; // a = &my_data.my_character = &'q'

In addition to this, you can use syntax that intentionally resembles the rest pattern if you have many fields and only want to grab some of them.

struct BigStruct {
    integer: i32,
    boolean: bool,
    character: char,
    real: f32,
}
let my_struct = BigStruct {
    integer: 12,
    boolean: true,
    character: 'Q',
    real: 6.5,
};
let BigStruct { integer: a, .. } = my_struct; // a = 12

If you want to name the variable you create the same thing as one of the fields, you do not need to write field_name: field_name.

let BigStruct { real: real, .. } = my_struct; // equivalent to
let BigStruct { real, .. } = my_struct; // real = 6.5

Generic Types

You can give your structs generic type parameters, allowing the user of your struct to determine a type inside one of its fields. This is useful for types like Vec<T>, which store a kind of dynamically sized array (more on these in a later lesson): Vec<i32> gives you a dynamic array of integer, Vec<String> gives you a dynamic array of strings.

You can create your own generic parameters, and put them on both structs and functions. When working with types, you can simply put the <T> adjacent to the type. When calling functions you must use "turbofish" syntax ::<T>, due to a syntax ambiguity in the Rust language which would be present without this.

struct MyWrapper<T> {
    data: T,
    number: usize,
}
fn wrap_data<T>(data: T, number: usize) -> MyWrapper<T> {
    MyWrapper {
        data,
        number
    }
}
let x: MyWrapper<char> = wrap_data::<char>('Q', 12);
let x: MyWrapper<_> = wrap_data::<_>('Q', 12); // `_` gives generic type inferrence
let x = wrap_data('Q', 12); // Rust can of course infer *all* the types instead
let c = x.data; // c: char = 'Q'

When you create a concrete version of these types, such as MyWrapper<char> in the above, Rust generates a new struct or function with the generic substituted for your concrete type. This is called "monomorphisation". Essentially creating the following:

struct MyWrapper<char> {
    data: char,
    number: usize
}
fn wrap_data<char>(data: char, number: usize) -> MyWrapper<char> {
    MyWraper<char> {
        data,
        number
    }
}

Generic Lifetimes

Just like you can add generic types to your structs and functions, you can add generic lifetimes. This is useful as it enables you to name the nomrally anonymous lifetimes, which is necessary to store them inside a struct for example.

struct TwoResources<'a, 'b> {
    first: &'a i32,
    second: &'b i32
}

The above struct for example references resources with two different lifetimes. While an instance of the struct exists neither of those lifetimes can expire. For the purpose of generic parameters '_ instructs Rust to infer the anonymous local lifetime.

static STATIC: i32 = 18;
let local: i32 = 24;
let two_resources: TwoResources<'static, '_> = TwoResources {
    first: &STATIC,
    second: &local
};

Another useful reason for lifetime generics naming lifetimes is it allows you to disambiguate from a type signature which lifetimes will persist in the return from a function.

fn first_string<'a, 'b>(string_a: &'a String, string_b: &'b String) -> &'a String {
    string_a
}

let x = String::from("First resource");
let y = String::from("Second resource");
let z = first_string(&x, &y);
drop(y);
drop(x);
println!("{z}");

In the above, without knowing the internals of the function (if it was a closed source library you were linking to, for example), you still know that z has a lifetime which matches the lifetime of the first parameter, not the second parameter: just from it's public function signature. Rust can also reason about this, and so drop(y) does not cause a compilation error, but drop(x) does!

error[E0505]: cannot move out of `x` because it is borrowed
  --> src/main.rs:10:6
   |
6  | let x = String::from("First resource");
   |     - binding `x` declared here
7  | let y = String::from("Second resource");
8  | let z = first_string(&x, &y);
   |                      -- borrow of `x` occurs here
9  | drop(y);
10 | drop(x);
   |      ^ move out of `x` occurs here
11 | println!("{z}");
   |           --- borrow later used here

Footnotes

  1. You can learn more about this in the RFC that made this change here, though it motivates it with syntax and examples that I have not taught you yet.