Lesson 3: Types and Patterns
You will recall from last lesson that memory safety requires that all memory accessed
- Exists
- Contains valid data
In the previous lesson, we looked at how move semantics are used to ensure memory is only accessed when it exists. In this lesson, we will look at how Rust ensures that the memory contains valid data.
In Rust, all values have exactly one type. This type defines the space a value will take up (which is needed to allocate that space for it in memory, typically on the stack), and which possible memory values can occupy this space.
The Rust compiler is very good at inferring types from context for you. Often you will only need to explicitly state your types in a function signature (as here they are required so that you can build a stable API).
Let Statements and Patterns
I've used this before in our previous lesson, but the way I used them will have lulled you into a false sense of familiarity.
let pattern = expression; // type is inferred
let pattern: Type = expression; // type is explicit
You'll notice I say "pattern" and not variable name on the left. This is because whenever new variables are introduced in Rust, it is not done through simple variable declaration as you may be familiar with from other languages.
Patterns are very powerful, and allow you to do all kinds of powerful things like destructuring and pattern matching. There will be more details on the many different things you can do with patterns over the next few chapters.
For now, know that if you encounter an identifier in a pattern, which is
basically any valid variable name, it will create a new variable, allocate
memory for it based on the size of the type of the variable being created, and
bind all or some part of the value the expression on the right hand side of the
=
evaluates to.
let x = 12; // creates `x` as an `i32`, binds 12 to `x`
let z = 'q'; // creates `z` as a `char`, binds 'q' to `z`
I'll teach you your first complex pattern already! The wildcard pattern. This pattern matches any single value from the expression and ignores it.
let _ = "Hello, world!"; // creates no variable, binds nothing
This is useful if you have a function that returns data, but you wish to discard the data without the compiler complaining at you for having an unused return value.
Another pattern that is useful is the mut identifier
pattern. By default, all
variables created are immutable and cannot be reassigned or mutated in place. By
using the mut identifier
pattern you bypass this.
The following fails to compile.
let x = 12;
x = 9; // reassignment
x += 8; // mutation
println!("{number}");
error[E0384]: cannot assign twice to immutable variable `number`
--> src/main.rs:3:5
|
1 | let number = 12;
| ------
| |
| first assignment to `number`
| help: consider making this binding mutable: `mut number`
2 | number = 9; // reassignment
| ^^^^^^^^^^ cannot assign twice to immutable variable
error[E0384]: cannot assign twice to immutable variable `number`
--> src/main.rs:4:5
|
1 | let number = 12;
| ------
| |
| first assignment to `number`
| help: consider making this binding mutable: `mut number`
2 | number = 9; // reassignment
3 | number += 8; // mutation
| ^^^^^^^^^^^ cannot assign twice to immutable variable
But by using mut number
we can make the number
variable mutable, and it will
compile and run successfully instead.
let mut number = 12;
number = 9;
number += 8;
println!("{number}");
17
Note that while you cannot reassign a variable, you can "shadow" it by using A
new let
statement. This creates a new variable instead and gives it the name
the old one had.
let number = 12;
let number = 9;
println!("{number}");
9
Fixed Size Primitive Types
Now that you know how to assign a variable, lets look at the simplest data types a variable might have. These are the fixed size primitive types. They take up a fixed amount of space, and have well defined restrictions on which values are legal.
Type | Size | Values |
---|---|---|
! | 0 | |
() | 0 | () |
bool | 1 | 0x00 meaning false , and 0x01 meaning true |
u8 | 1 | Natural numbers from to (unsigned 8 bit integers) |
i8 | 1 | Integers from to (signed 8 bit integers) |
u16 | 2 | Natural numbers from to (unsigned 16 bit integers) |
i16 | 2 | Integers from to (signed 16 bit integers) |
u32 | 4 | Natural numbers from to (unsigned 32 bit integers) |
i32 | 4 | Integers from to (signed 32 bit integers) |
f32 | 4 | IEEE 754-2008 binary32 format values (IEEE 754-1985 single precision floats) |
char | 4 | Any single unicode scalar value. |
u64 | 8 | Natural numbers from to (unsigned 64 bit integers) |
i64 | 8 | Integers from to (signed 64 bit integers) |
f64 | 8 | IEEE 754-2008 binary64 format values (IEEE 754-1985 dounle precision floats) |
u128 | 16 | Natural numbers from to (unsigned 128 bit integers) |
i128 | 16 | Integers from to (signed 128 bit integers) |
usize | * | As per u16 , u32 , or u64 depending on target platform's pointer type |
isize | * | As per i16 , i32 , or i64 depending on target platform's pointer type |
*the usize
and isize
types have a platform specific size. For a 32 bit
computer they will be 4 bytes large, for a 64 bit computer they will be 8 bytes
large. 16 bit architectures are officially supported, but often require extra
work as most libraries do not support them.
If you want to tell Rust which numeric type to use for an integer or float
without using a let binding with explicit type, you can suffix the literal with
the type. With no suffix or typed binding integers will be i32
and floats will
be f32
.
let b: bool = true;
let n: u8 = 56;
let p = 81u8;
let s: i8 = -9;
let i = 93; // infers i32 by default
let f = 12.4; // infers f32 by default
let c = 'z';
You'll notice there are two fixed size primitive types that I've not included in the above list, and that's because these will be less familiar to you unless you are coming from a small handful of functional languages.
()
, the "Unit" type, is a type with only one value: ()
. Because the type
only has one value, it needs 0 bytes of information to store this value.
Anywhere the unit type occurs, it's value is already known.
This type is seen most often as a kind of void
return type. Indicating that a
function did not return any meaningful data, but did terminate normally and
return control to its caller. If you create a function with no explicit return
type then a type of -> ()
is implicit. In addition, any expression terminated
with a ;
evaluates to ()
.
fn foo() -> () {
println!("Function returns no data");
}
fn bar() {
let x = ();
println!("This also returns no data");
}
It is also possible to use a unit pattern to bind a unit value. This doesn't
seem very useful, and on its own it isn't and is equivalent to using a wildcard
binding, but you can do it if you really want. There may be some niche case
in more advanced patterns where ()
is desirable over _
, I'm not sure though.
// these lines are equivalent
let () = ();
let _ = ();
The final type in the above list is !
, the "Never" type. It is a type with
zero values that cannot be constructed. Because of this, Rust can coerce it into
any other type. This is the type returned by control like return
, break
, and
continue
, by functions like std::process::exit
which get the operating
system to terminate the program, and by infinite loops or other similar non
returning structures.
Any code that falls after a variable with the Never type is initialised is unreachable, because the compiler knows it's actually impossible to do that initialisation.
// This `x` is an i32, because the loop breaks with a value of 12.
let x = loop {
break 12;
};
// This `y` is a `!`, because the loop never terminates.
let y = loop { };
// All code after the above is unreachable, but this would also have the type `!`
let z = std::process::exit(0);
// Because they can't be constructed, it's fine to coerce a `!` type to anything
let a_number: i32 = z;
Arrays
If you want to store multiple of the same type in sequence (directly adjacent) in memory, you can use an array! Arrays have a fixed size known at compile time, and their size will be the size of the type stored in the array, multiplied by the number of elements in the array.
let a: [i32; 4] = [12, -8, 92, 3];
let b: [char; 3] = ['r', 'f', 'd'];
let c = [true, false, false, true, false]; // infers [bool; 5]
If you want to access the values in an array, you can index into it.
let a: [i32; 4] = [12, -8, 92, 3];
let p = a[0]; // p = 12
let q = a[1]; // q = -8
let r = a[2]; // r = 92
let s = a[3]; // s = 3
There is also a pattern we can use that allows us to destructure an array. This is called a slice pattern.
let a: [i32; 4] = [12, -8, 92, 3];
let [p, q, r, s] = a; // slice pattern, p = 12, q = -8, r = 92, s = 3
Each of the identifiers inside the slice pattern is another pattern, which lets you ignore some elements in an array using the wildcard pattern.
let b: [char; 3] = ['r', 'f', 'd'];
let [t, _, u] = b; // t = 'r', u = 'd'
There is also the "rest" pattern, which will match a sequence of zero or more values that are not otherwise bound within the slice pattern.
let c = [true, false, false, true, false];
let [v, .., w] = c; // v = true, w = false
This is also a good time to introduce an extension to the identifier pattern. If
you use the form identifier @ pattern
, then instead of binding a single value,
the value that matches the entier pattern will be bound to the identifier.
This is very useful with the rest pattern as it allows you to create a new array
from a subset of the source array.
let d = ['H', 'e', 'l', 'l', 'o'];
let [head, tail @ ..] = d; // head = 'H', tail = ['e', 'l', 'l', 'o']
let [mid @ .., _] = tail; // mid = ['e', 'l', 'l']
Tuples
If you want to group together multiple different types into a single type, the simplest way to do it is with a tuple! Like arrays, these must have a fixed number of elements. Unlike arrays, each element has it's own type.
let a: (i32, char) = (12, 'C');
let b = (false, 12.5, ()); // infers type `(bool, f64, ())`
Values inside a tuple can be accessed using numeric field accessors.
let a = (12, 'C', false, 12.5);
let p = a.0; // p = 12
let q = a.1; // q = 'C'
let r = a.2; // r = false
let s = a.3; // s = 12.5
There is also a pattern that allows us to destructure a tuple, called a tuple pattern!
let a = (12, 'C', false, 12.5);
let (p, q, r, s) = a; // p = 12, q = 'C', r = false, s = 12.5
As with arrays, this pattern allows us to use other patterns like rest and wildcard inside of it.
let a = (12, 'C', false, 12.5);
let (head, ..) = a; // head = 12
let (.., penultimate, _) = a; // penultimate = false
Unlike with slices patterns however, you cannot use the ident @ ..
syntax,
probably while arrays are sequential in memory and easy to copy, tuples can have
their fields re-ordered and padding inserted between them according to
arcane rules known as the "default representation", meaning a trivial
"sub-tuple" is not guarunteed to exist in the way that a subsection of an array
is.
References
Last lesson I mentioned that Rust is trying to solve both memory safety and thread safety. To achieve thread safety, values must only be accessed and modified in a way that ensures all threads behave properly. To achieve this, Rust disallows simultaneous mutable access.
To further motivate references, consider the undesirable effect of move semantics that no value of a type with move semantics could be passed to a function without either being explicitly dropped when the function ends, or manually returned to the caller to hand back ownership.
let x = String::from("Hello");
fn exclaim(s: String) -> String {
format!("{s}!")
}
let x = exclaim(x);
let x = exclaim(x);
println!("{x}");
Hello!!
Consider further that there would be no way for a function to modify a value
in place, regardless of if the value has move or copy semantics. In the above
example we create a new string with it's own memory each time exclaim
is
called.
References provide indirection and are the solution to all of these issues, but in order to ensure thread safety Rust has two reference types instead of one.
Immutable References
The first, &T
, is called a reference and provides read only access. Many of
these references may coexist. Because multiple of these references to the same
value can exist, &T
has copy semantics regardless of the semantics of T
.
If a reference to a value exists, we say that the reference borrows the value.
let string: String = String::from("Hello!");
let string_reference: &String = &value;
let another_string_reference = &value;
let number: i32 = 12;
let number_reference: &i32 = &number;
let number_copy: i32 = *number_reference; // can only dereference inner type with copy semantics
References allow you to pass the same value with move semantics to a function multiple times. Something you could not do with only move semantics without explicitly returning the value from each call.
fn print_string(s: &String) {
println!("{s}");
}
fn main() {
let string = String::from("Hello, world!");
print_string(&string);
print_string(&string);
print_string(&string);
}
Hello, world!
Hello, world!
Hello, world!
We also have two more patterns for use with references. The first destructures a
reference similarly to the *
operator, but on the pattern side rather than on
the value side. The other one introduces a reference similar to the &
operator
but on the pattern side.
let number: i32 = 12;
let ref number_reference: i32 = number; // creates a reference similar to `&number`
let &number_copy: &i32 = number_reference; // destructures the reference similar to `*number_reference`
Notice the types included in the let statement match the type of the expression
(on the right side of the =
), not of the variables created by the pattern.
number_reference
in the above is still of type &i32
, and number_copy
is
still of type i32
. This is true for all patterns in let expressions, it's just
less obvious and feels weirder when you use it with &
and ref
patterns.
let array: [i32; 4] = [1, 2, 3, 4];
let [p, q, r, s]: [i32; 4] = array;
You could obviously also just exclude the types from the above examples and Rust will figure out what you want just fine.
These patterns may not feel immediately useful, but consider the case where we have a reference to an array, and want to destructure a reference to a single value in that array without copying its contents. We can do this using the above pattern.
let array = [1, 2, 3, 4];
let ref_array = &array;
let &[_, ref q, ..] = ref_array; // q = &array[1] = &2
The above desire, of extracting a reference through destructuring, is so common
that about a year after Rust hit version 1.0 the identifier pattern was extended
to support it1. By default, an identifier will move (or copy if the value has
copy semantics) its binding, however if the value being matches is a reference
and the pattern does not have an explicit reference destructure, then Rust adds
the ref
pattern for you behind the scenes.
let [_, q, ..] = ref_array; // As before, q = &array[1] = &2
Mutable References
The second reference type, &mut T
, is called a mutable reference and provides
read and write access. Only one such mutable reference to a value may exist at a
time. A mutable reference cannot coexist with an immutable reference. &mut T
always has move semantics (moving the reference, not the value). If a mutable
reference to a value exists, we say the mutable reference borrows the value as
mutable.
let mut x = 12; // `x` must be `mut` to make `&mut`
let ref_x = &mut x;
*ref_x += 10; // we can mutate through dereferencing
println!("{x}")
22
We also have two new patterns for working with mutable strings, &mut
for
destructuring and ref mut
to introduce a mutable reference. These work the
same as their immutable reference counterparts, including with the automatic
reference creation when your pattern doesn't destructure a reference which is
present.
let mut array = [1, 2, 3, 4];
let ref_array = &mut array;
let &mut [_, ref mut q, ..] = ref_array; // or equivalently
let [_, q, ..] = ref_array; // q = &mut array[2]
*q += 10;
println!("{array:?}");
[1, 12, 3, 4]
Mutable references cannot be coppied, which you may think means they cannot be used to pass a value to a function multiple times. However, because of a convenience called "automatic reborrowing" this is not the case.
fn exclaim_and_say(s: &mut String) {
s.push_str("!"); // mutate in place
println!("{s}");
}
fn main() {
let mut x = String::from("Hello, world");
let ref_x = &mut x;
exclaim_and_say(ref_x); // this should move ref_x
exclaim_and_say(ref_x); // so this should not compile
}
Hello, world!
Hello, world!!
So... how did that work? Automatic reborrowing means that whenever a mutable
reference is passed into a function, if it can Rust will create a new mutable
reference to pass in instead, preserving your old one. This allows &mut T
to
pretend to have copy semantics when being passed into functions. There are other
circumstances where reborrowing is not possible however, so it's important to
remember that this is a convenience and &mut T
actually does have move
semantics under the hood.
fn main() {
let mut x = String::from("Hello, world");
let ref_x = &mut x;
exclaim_and_say(&mut *ref_x); // manual reborrow for illustration
exclaim_and_say(&mut *ref_x);
}
Statics and Reference Lifetimes
Both mutable and immutable references have an additional piece of information bound to them called a lifetime. A lifetime is a guard that prevents dropping or otherwise moving out the value being referenced by the lifetime. This ensures that a reference always points to valid data. Consider the following example:
let x = String::from("Hello!");
let y = &x;
drop(x);
println!("{y}");
Compiling playground v0.0.1 (/playground)
error[E0505]: cannot move out of `x` because it is borrowed
--> src/main.rs:5:6
|
1 | let x = String::from("Hello!");
| - binding `x` declared here
2 | let y = &x;
| -- borrow of `x` occurs here
3 | drop(x);
| ^ move out of `x` occurs here
4 | println!("{y}");
| --- borrow later used here
Here you can see that Rust prevents us from dropping (moving out) x
, because
there is a borrow still outstanding.
Lifetimes can be named, but by default all but one lifetime is anonymous and does not have a fixed name that can be referred to. They are generated by the compiler to be as short as possible, so that you can drop the resource being referenced as early as possible.
The one special named lifetime is called 'static
, which refers to a lifetime
which never drops. This allows the compiler to reason about references that know
the value they reference will never go out of scope. This is the type of string
literals, as the raw bytes of their string are written into the binary and will
never be cleaned up by the program.
let s: &'static str = "String literal with static lifetime!";
You can also create other static references by placing data in a static
.
static DATA: [i32; 4] = [1, 2, 4, 8];
let d: &'static i32 = &DATA[2]; // d = &4
This may not seem obviously clear or useful to you right now, but within a few lessons you should have a stronger grasp over lifetimes.
Structs
I've shown you how you can group together types using a tuple, but sometimes you
want to encode a little bit more information. After all, .0
isn't the most
informative accessor I've ever seen. Structs are the solution to this.
struct MyData {
my_character: char,
my_number: i32,
}
let my_data: MyData = MyData {
my_character: 'q',
my_number: 94,
};
let a = my_data.my_character; // a = 'q';
let b = my_data.my_number; // b = 94
If the value you want to put in a struct is already in a variable with the same
name as the struct field, you can ommit the : value
.
let my_character = 'X',
let my_data = MyData {
my_character, // equivalent to `my_character: my_character`
my_number: 12,
}
We also have a struct pattern which can be used to destructure this.
let MyData {
my_character: a,
my_data: b,
} = my_data; // a = 'q', b = 94
The values after the :
inside the struct pattern are themselves patterns.
let MyData {
my_character: ref a,
my_data: _,
} = my_data; // a = &my_data.my_character = &'q'
In addition to this, you can use syntax that intentionally resembles the rest pattern if you have many fields and only want to grab some of them.
struct BigStruct {
integer: i32,
boolean: bool,
character: char,
real: f32,
}
let my_struct = BigStruct {
integer: 12,
boolean: true,
character: 'Q',
real: 6.5,
};
let BigStruct { integer: a, .. } = my_struct; // a = 12
If you want to name the variable you create the same thing as one of the fields,
you do not need to write field_name: field_name
.
let BigStruct { real: real, .. } = my_struct; // equivalent to
let BigStruct { real, .. } = my_struct; // real = 6.5
Generic Types
You can give your structs generic type parameters, allowing the user of your
struct to determine a type inside one of its fields. This is useful for types
like Vec<T>
, which store a kind of dynamically sized array (more on these in
a later lesson): Vec<i32>
gives you a dynamic array of integer, Vec<String>
gives you a dynamic array of strings.
You can create your own generic parameters, and put them on both structs and
functions. When working with types, you can simply put the <T>
adjacent to the
type. When calling functions you must use "turbofish" syntax ::<T>
, due to a
syntax ambiguity in the Rust language which would be present without this.
struct MyWrapper<T> {
data: T,
number: usize,
}
fn wrap_data<T>(data: T, number: usize) -> MyWrapper<T> {
MyWrapper {
data,
number
}
}
let x: MyWrapper<char> = wrap_data::<char>('Q', 12);
let x: MyWrapper<_> = wrap_data::<_>('Q', 12); // `_` gives generic type inferrence
let x = wrap_data('Q', 12); // Rust can of course infer *all* the types instead
let c = x.data; // c: char = 'Q'
When you create a concrete version of these types, such as MyWrapper<char>
in
the above, Rust generates a new struct or function with the generic substituted
for your concrete type. This is called "monomorphisation". Essentially creating
the following:
struct MyWrapper<char> {
data: char,
number: usize
}
fn wrap_data<char>(data: char, number: usize) -> MyWrapper<char> {
MyWraper<char> {
data,
number
}
}
Generic Lifetimes
Just like you can add generic types to your structs and functions, you can add generic lifetimes. This is useful as it enables you to name the nomrally anonymous lifetimes, which is necessary to store them inside a struct for example.
struct TwoResources<'a, 'b> {
first: &'a i32,
second: &'b i32
}
The above struct for example references resources with two different lifetimes.
While an instance of the struct exists neither of those lifetimes can expire.
For the purpose of generic parameters '_
instructs Rust to infer the anonymous
local lifetime.
static STATIC: i32 = 18;
let local: i32 = 24;
let two_resources: TwoResources<'static, '_> = TwoResources {
first: &STATIC,
second: &local
};
Another useful reason for lifetime generics naming lifetimes is it allows you to disambiguate from a type signature which lifetimes will persist in the return from a function.
fn first_string<'a, 'b>(string_a: &'a String, string_b: &'b String) -> &'a String {
string_a
}
let x = String::from("First resource");
let y = String::from("Second resource");
let z = first_string(&x, &y);
drop(y);
drop(x);
println!("{z}");
In the above, without knowing the internals of the function (if it was a closed
source library you were linking to, for example), you still know that z
has
a lifetime which matches the lifetime of the first parameter, not the second
parameter: just from it's public function signature. Rust can also reason about
this, and so drop(y)
does not cause a compilation error, but drop(x)
does!
error[E0505]: cannot move out of `x` because it is borrowed
--> src/main.rs:10:6
|
6 | let x = String::from("First resource");
| - binding `x` declared here
7 | let y = String::from("Second resource");
8 | let z = first_string(&x, &y);
| -- borrow of `x` occurs here
9 | drop(y);
10 | drop(x);
| ^ move out of `x` occurs here
11 | println!("{z}");
| --- borrow later used here