Lesson 2: Move and Copy

It is time for us to begin exploring the theory behind Rust, and what makes it tick.

The first thing to understand is the two problems that Rust solves.

  1. Memory safety
  2. Thread safety

These are two classes of error entirely prevented by Rust design, and are significant problems to try and solve. Errors in these domains make up a huge share of the security vulnerabilities and major errors that have plagued software. Many errors would be prevented by Rusts language design, including but not limited to:

  • OpenSSL Heartbleed
  • NetUSB run_init_netbus
  • SecureTransfer goto fail
  • GNUTLS goto cleanup
  • SChannel Winshock
  • JCE Bleichenbacher Attack
  • glibc gethostby (GHOST)

To begin solving these, we will first address memory safety. This is, at a high level, ensuring that all memory accessed

  1. Exists
  2. Contains valid data

To address this lets start with an example: the following is not Rust code you can compile, because im focusing on concepts not concretes, but consider the following.

function print_string(s) {
  /* internals hidden */
}

function main() {
  let my_string = "Hello";
  print_string(my_string);
  print_string(my_string);
}

What would you expect the above code to do. For many languages the answer is "print Hello twice". Rust takes the more controversial answer of "not compiling". The following is an approximation of the error you would see if the above was real rust code.

error[E0382]: use of moved value: `my_string`
 --> src/main.rs:8:18
  |
6 |     let my_string = "Hello";
  |         --------- move occurs because `my_string` has type `String`, which does not implement the `Copy` trait
7 |     print_string(my_string);
  |                  --------- value moved here
8 |     print_string(my_string);
  |                  ^^^^^^^^^ value used here after move
  |

This is because of the first and most critical thing to understand about Rust: move semantics.

One of the steps Rust must take to ensure memory safety is to know exactly when every piece of data is destroyed. If you attempt to use memory after your program releases it to the operating system you create a "use after free", this is undefined behaviour and may cause nasal demons.

In order to ensure memory safety the Rust compiler meticulously tracks and enforces when all memory is acquired and released, when all values are created and destroyed.

  • In Rust all values have a scope.
  • Values are destroyed (and any memory they use is freed) when they go out of scope.
  • All values have an owner.
  • Values go out of scope when their owner goes out of scope.
  • The default owner of a value in a variable is the block the variable was declared in.
  • A block is a collection of statements enclosed by braces.
  • A block goes out of scope when it ends (}).

In the example I gave earlier, there are two blocks, one for the main function, and one for the print_string function.

In real Rust code, you can create blocks anywhere you like by sprinkling {} around the statements you want in the block. We aren't writing real Rust code yet though, so let's not worry about that yet.

Move Semantics

So, values are in scope from when they are created, and in scope until the block they were created in ends. This doesn't answer why our code from earlier will refuse to compile though, as it would imply that the "Hello" value should be in scope until the end of main.

However this value is used inside the print_string function, so the value must also be available inside that functions block. There are two approaches most languages take to solving this problem:

Some languages would choose to make a copy of the data, duplicating it in memory, and making the duplicate available in the function. This is called pass by value and may look something like this:

function print_string(s) {
  /* internals hidden */
}

function main() {
  let my_string = "Hello";
  let my_string_copy0 = copy(my_string);
  print_string(my_string_copy0);
  let my_string_copy1 = copy(my_string);
  print_string(my_string_copy1);
}

Others might choose to use indirection and have the "Hello" value be allocated in some other location and then give both main and print_string access to this indirection, this is called pass by reference and may look something like this:

function print_string(s) {
  /* internals hidden */
}

function main() {
  let my_string_ref = allocate("Hello");
  let my_string_ref0 = copy(my_string_ref);
  print_string(my_string_ref0);
  let my_string_ref1 = copy(my_string_ref);
  print_string(my_string_ref1);
}

Rust chooses neither of these approaches, and uses "pass by move" instead. When a function call occurs, parameters are moved into the scope (block) of the function being called. We say the function being called "takes ownership" of the values. This reveals why the example code is rejected in Rust.

When my_string is passed to print_string, "Hello" is moved from the block created by the main function into the block created by the print_string function (and is now named s). Then, when that first print_string function ends and the parameter s goes out of scope, the value "Hello" is released and destroyed. This means that "Hello" no longer exists to be passed to the second print_string call.

Hopefully the error message should make more sense now.

You are probably now thinking something along the lines of "Rust is really inconvenient because it doesn't let me use any values twice", but the above isn't the whole story.

Copy Semantics

Move semantics aren't the only way values are used in Rust. Values which perform no large memory allocation and require no complex cleanup, such as many of Rust's primitive types, use copy semantics. This makes them pass by value. If the first example used a number, for example:

function print_number(n) {
  /* internals hidden */
}

function main() {
  let my_number = 7;
  print_number(my_number);
  print_number(my_number);
}

Then the code would actually compile, and output:

7
7

You can tell if a value will use copy semantics instead of move semantics if the type of the value implements the Copy trait, but those are Rust things I've not explained yet—so just trust me that you never really run into issues knowing if a value will move or copy.

In general, a value will copy instead of move if it performs no allocations external to the type itself. A number just takes up the space defined by its type, for example i32 takes up 32 bits so it will use copy semantics. A type like String does need an external allocation as the type itself cannot know how long the text will be, so it uses move semantics.

What about strings though?

It would still be inconvenient for strings to have move semantics. They are a common data type and so it would be nice if Rust could give us a solution to this.

Fortunately, Rust does have a solution. There are two string types1.

When you define a string literal, you create a value with the &str type. You'll learn more about this type next lesson, but for now think of it as an immutable sequence of UTF-8 bytes sitting somewhere in memory. It can't be modified, resized, or do anything other than be read. Because of these restrictions, this type can have copy semantics! In the case of a string literal, that sequence of bytes is embeded directly into your application binary.

We can create a string with move semantics, which I will do in the examples in the next section, using String::from("a string literal"). This gives you a String instead of an &str, which can be resized and controls its own memory allocation.

This is why the first sample I gave isn't real Rust code. If it was it wouldn't have nicely illustrated move semantics. The following examples show real rust code equivalents of the samples in this article, followed by their outputs when attempting to compile and run them.

Real Rust Examples

In these examples, so thay you can compile them yourself, I have unhidden the implementation of print_{string,number}. Do not worry about println! for now, as to understand it fully requires understanding macro variadics, borrowing, and reference coercion. For now, understand that println!("{}", x) will invoke the necessary operating system machinery to print the value of x.

If you wish to compile and run them yourself, use

cargo new example

to create an empty Rust project in the example/ directory. Then copy the example code into src/main.rs. After this, run the example with

cargo run

Example 1: moving strings

fn print_string(s: String) {
    println!("{}", s);
}

fn main() {
    let my_string = String::from("Hello"); // String::from gives move semantics
    print_string(my_string);
    print_string(my_string);
}

Outputs

Compiling example v0.0.1 (/example)
error[E0382]: use of moved value: `my_string`
 --> src/main.rs:8:18
  |
6 |     let my_string = String::from("Hello"); // String::from gives move semantics
  |         --------- move occurs because `my_string` has type `String`, which does not implement the `Copy` trait
7 |     print_string(my_string);
  |                  --------- value moved here
8 |     print_string(my_string);
  |                  ^^^^^^^^^ value used here after move
  |
note: consider changing this parameter type in function `print_string` to borrow instead if owning the value isn't necessary
 --> src/main.rs:1:20
  |
1 | fn print_string(s: String) {
  |    ------------    ^^^^^^ this parameter takes ownership of the value
  |    |
  |    in this function
help: consider cloning the value if the performance cost is acceptable
  |
7 |     print_string(my_string.clone());
  |                           ++++++++

For more information about this error, try `rustc --explain E0382`.
error: could not compile `example` (bin "example") due to previous error

Example 2: copying numbers, pass by value

fn print_number(n: i32) {
    println!("{}", n);
}

fn main() {
    let my_number = 7;
    print_number(my_number);
    print_number(my_number);
}

Outputs

   Compiling example v0.0.1 (/example)
    Finished dev [unoptimized + debuginfo] target(s) in 0.53s
     Running `target/debug/example`

7
7

Example 3: copying string literals, pass by static reference

fn print_string(s: &str) {
    println!("{}", s);
}

fn main() {
    let my_string = "Hello";
    print_string(my_string);
    print_string(my_string);
}

Outputs

   Compiling example v0.0.1 (/example)
    Finished dev [unoptimized + debuginfo] target(s) in 0.78s
     Running `target/debug/example`

Hello
Hello

Footnotes

  1. There's actually at least 8 string types, but only two that show up in 99% of all programming. The rest exist for over-optimisation, compatability with C, and strange operating systems that insist on using UTF-16.