Lesson 2: Move and Copy
It is time for us to begin exploring the theory behind Rust, and what makes it tick.
The first thing to understand is the two problems that Rust solves.
- Memory safety
- Thread safety
These are two classes of error entirely prevented by Rust design, and are significant problems to try and solve. Errors in these domains make up a huge share of the security vulnerabilities and major errors that have plagued software. Many errors would be prevented by Rusts language design, including but not limited to:
- OpenSSL Heartbleed
- NetUSB run_init_netbus
- SecureTransfer goto fail
- GNUTLS goto cleanup
- SChannel Winshock
- JCE Bleichenbacher Attack
- glibc gethostby (GHOST)
To begin solving these, we will first address memory safety. This is, at a high level, ensuring that all memory accessed
- Exists
- Contains valid data
To address this lets start with an example: the following is not Rust code you can compile, because im focusing on concepts not concretes, but consider the following.
function print_string(s) {
/* internals hidden */
}
function main() {
let my_string = "Hello";
print_string(my_string);
print_string(my_string);
}
What would you expect the above code to do. For many languages the answer is "print Hello twice". Rust takes the more controversial answer of "not compiling". The following is an approximation of the error you would see if the above was real rust code.
error[E0382]: use of moved value: `my_string`
--> src/main.rs:8:18
|
6 | let my_string = "Hello";
| --------- move occurs because `my_string` has type `String`, which does not implement the `Copy` trait
7 | print_string(my_string);
| --------- value moved here
8 | print_string(my_string);
| ^^^^^^^^^ value used here after move
|
This is because of the first and most critical thing to understand about Rust: move semantics.
One of the steps Rust must take to ensure memory safety is to know exactly when every piece of data is destroyed. If you attempt to use memory after your program releases it to the operating system you create a "use after free", this is undefined behaviour and may cause nasal demons.
In order to ensure memory safety the Rust compiler meticulously tracks and enforces when all memory is acquired and released, when all values are created and destroyed.
- In Rust all values have a scope.
- Values are destroyed (and any memory they use is freed) when they go out of scope.
- All values have an owner.
- Values go out of scope when their owner goes out of scope.
- The default owner of a value in a variable is the block the variable was declared in.
- A block is a collection of statements enclosed by braces.
- A block goes out of scope when it ends (
}
).
In the example I gave earlier, there are two blocks, one for the main
function, and one for the print_string
function.
In real Rust code, you can create blocks anywhere you like by sprinkling {}
around the statements you want in the block. We aren't writing real Rust code
yet though, so let's not worry about that yet.
Move Semantics
So, values are in scope from when they are created, and in scope until the block
they were created in ends. This doesn't answer why our code from earlier will
refuse to compile though, as it would imply that the "Hello"
value should be
in scope until the end of main
.
However this value is used inside the print_string
function, so the value
must also be available inside that functions block. There are two approaches
most languages take to solving this problem:
Some languages would choose to make a copy of the data, duplicating it in memory, and making the duplicate available in the function. This is called pass by value and may look something like this:
function print_string(s) {
/* internals hidden */
}
function main() {
let my_string = "Hello";
let my_string_copy0 = copy(my_string);
print_string(my_string_copy0);
let my_string_copy1 = copy(my_string);
print_string(my_string_copy1);
}
Others might choose to use indirection and have the "Hello"
value be allocated
in some other location and then give both main
and print_string
access to
this indirection, this is called pass by reference and may look something like
this:
function print_string(s) {
/* internals hidden */
}
function main() {
let my_string_ref = allocate("Hello");
let my_string_ref0 = copy(my_string_ref);
print_string(my_string_ref0);
let my_string_ref1 = copy(my_string_ref);
print_string(my_string_ref1);
}
Rust chooses neither of these approaches, and uses "pass by move" instead. When a function call occurs, parameters are moved into the scope (block) of the function being called. We say the function being called "takes ownership" of the values. This reveals why the example code is rejected in Rust.
When my_string
is passed to print_string
, "Hello"
is moved from the block
created by the main
function into the block created by the print_string
function (and is now named s
). Then, when that first print_string
function
ends and the parameter s
goes out of scope, the value "Hello"
is released
and destroyed. This means that "Hello"
no longer exists to be passed to the
second print_string
call.
Hopefully the error message should make more sense now.
You are probably now thinking something along the lines of "Rust is really inconvenient because it doesn't let me use any values twice", but the above isn't the whole story.
Copy Semantics
Move semantics aren't the only way values are used in Rust. Values which perform no large memory allocation and require no complex cleanup, such as many of Rust's primitive types, use copy semantics. This makes them pass by value. If the first example used a number, for example:
function print_number(n) {
/* internals hidden */
}
function main() {
let my_number = 7;
print_number(my_number);
print_number(my_number);
}
Then the code would actually compile, and output:
7
7
You can tell if a value will use copy semantics instead of move semantics if the
type of the value implements the Copy
trait, but those are Rust things I've
not explained yet—so just trust me that you never really run into issues knowing
if a value will move or copy.
In general, a value will copy instead of move if it performs no allocations
external to the type itself. A number just takes up the space defined by its
type, for example i32
takes up 32 bits so it will use copy semantics. A type
like String
does need an external allocation as the type itself cannot know
how long the text will be, so it uses move semantics.
What about strings though?
It would still be inconvenient for strings to have move semantics. They are a common data type and so it would be nice if Rust could give us a solution to this.
Fortunately, Rust does have a solution. There are two string types1.
When you define a string literal, you create a value with the &str
type.
You'll learn more about this type next lesson, but for now think of it as an
immutable sequence of UTF-8 bytes sitting somewhere in memory. It can't be
modified, resized, or do anything other than be read. Because of these
restrictions, this type can have copy semantics! In the case of a string
literal, that sequence of bytes is embeded directly into your application
binary.
We can create a string with move semantics, which I will do in the examples in
the next section, using String::from("a string literal")
. This gives you a
String
instead of an &str
, which can be resized and controls its own memory
allocation.
This is why the first sample I gave isn't real Rust code. If it was it wouldn't have nicely illustrated move semantics. The following examples show real rust code equivalents of the samples in this article, followed by their outputs when attempting to compile and run them.
Real Rust Examples
In these examples, so thay you can compile them yourself, I have unhidden the
implementation of print_{string,number}
. Do not worry about println!
for
now, as to understand it fully requires understanding macro variadics,
borrowing, and reference coercion. For now, understand that println!("{}", x)
will invoke the necessary operating system machinery to print the value of x
.
If you wish to compile and run them yourself, use
cargo new example
to create an empty Rust project in the example/
directory. Then copy the
example code into src/main.rs
. After this, run the example with
cargo run
Example 1: moving strings
fn print_string(s: String) {
println!("{}", s);
}
fn main() {
let my_string = String::from("Hello"); // String::from gives move semantics
print_string(my_string);
print_string(my_string);
}
Outputs
Compiling example v0.0.1 (/example)
error[E0382]: use of moved value: `my_string`
--> src/main.rs:8:18
|
6 | let my_string = String::from("Hello"); // String::from gives move semantics
| --------- move occurs because `my_string` has type `String`, which does not implement the `Copy` trait
7 | print_string(my_string);
| --------- value moved here
8 | print_string(my_string);
| ^^^^^^^^^ value used here after move
|
note: consider changing this parameter type in function `print_string` to borrow instead if owning the value isn't necessary
--> src/main.rs:1:20
|
1 | fn print_string(s: String) {
| ------------ ^^^^^^ this parameter takes ownership of the value
| |
| in this function
help: consider cloning the value if the performance cost is acceptable
|
7 | print_string(my_string.clone());
| ++++++++
For more information about this error, try `rustc --explain E0382`.
error: could not compile `example` (bin "example") due to previous error
Example 2: copying numbers, pass by value
fn print_number(n: i32) {
println!("{}", n);
}
fn main() {
let my_number = 7;
print_number(my_number);
print_number(my_number);
}
Outputs
Compiling example v0.0.1 (/example)
Finished dev [unoptimized + debuginfo] target(s) in 0.53s
Running `target/debug/example`
7
7
Example 3: copying string literals, pass by static reference
fn print_string(s: &str) {
println!("{}", s);
}
fn main() {
let my_string = "Hello";
print_string(my_string);
print_string(my_string);
}
Outputs
Compiling example v0.0.1 (/example)
Finished dev [unoptimized + debuginfo] target(s) in 0.78s
Running `target/debug/example`
Hello
Hello
Footnotes
-
There's actually at least 8 string types, but only two that show up in 99% of all programming. The rest exist for over-optimisation, compatability with C, and strange operating systems that insist on using UTF-16. ↩