Aliasing, Capabilities, and Ownership in Rust

Felix Klock (@pnkfelix), Mozilla

space: next slide; shift space: prev slide; esc: overview; arrows navigate http://bit.ly/2rM9w0r

What is Rust?

New programming language

  • (... and ecosystem, and development community)

Emphasizing control, safety, and speed

  • (... especially when it comes to concurrency)

Low-level hacking without fear of segfaults nor data races

  • or more simply: "Hack without Fear"

Why ...?

Why use Rust?

  • Fast code, low memory footprint
  • Go from bare metal (assembly; C FFI) ...
    ... to high-level (closures, generic containers) ...
    with zero cost (no GC, unboxed closures, monomorphization); no runtime, cross-lang interop
  • Safety and Parallelism

Safety and Parallelism

Safety

  • No segmentation faults

  • No undefined behavior

  • No data races

(Multi-paradigm) Parallelism

  • msg passing via channels

  • shared state (R/W-capabilities controlled via types)

  • use native threads... or scoped threads... or work-stealing...

Algebraic Data Types

enum BinaryTree { Leaf(i32), Node(Box<BinaryTree>, i32, Box<BinaryTree>) }
fn sample_tree() -> BinaryTree {
    let l1 = Box::new(BinaryTree::Leaf(1));
    let l3 = Box::new(BinaryTree::Leaf(3));
    let n2 = Box::new(BinaryTree::Node(l1, 2, l3));
    let l5 = Box::new(BinaryTree::Leaf(5));
    BinaryTree::Node(n2, 4, l5) }
CONCRETE: ABSTRACT: 4 (STACK) (HEAP) 2 5 sample_tree: 1 3 Node: Box Node: Leaf: Leaf: Leaf: Box (4) (1) (3) (5) (2) Box Box
  • Syntax: enum declares an algebraic (sum-of-product) data type
  • Box<T> is an owning reference to heap-allocated data of type T
  • Boxing of values is explicit (+ all variants of an enum have same size)

Ownership

enum BinaryTree { Leaf(i32), Node(Box<BinaryTree>, i32, Box<BinaryTree>) }
  • match moves input to Leaf(n) or Node(l, n, r) as appropriate
  • *expr dereferences; here, moves tree out of box to local stack slot
  • fn signature: pass-by-value + move semantics = args consumed
fn tree_weight(t: BinaryTree) -> i32 {
    match t {
        BinaryTree::Leaf(payload) => payload,
        BinaryTree::Node(left, payload, right) => {
            tree_weight(*left) + payload + tree_weight(*right)
        }
    }
}

Arguments Consumed

fn tree_weight(t: BinaryTree) -> i32 { ... }

fn consume_semantics() {
    let t = sample_tree();
    let w1 = tree_weight(t);
    let w2 = tree_weight(t);
    println!("w1: {} w2: {}", w1, w2);
}
error[E0382]: use of moved value: `t`
  --> src/a00.rs:67:17
   |
66 |     tree_weight(t);
   |                 - value moved here
67 |     tree_weight(t);
   |                 ^ value used here after move
   |
   = note: move occurs because `t` has type `BinaryTree`,
           which does not implement the `Copy` trait

Sending resources (std::thread)

let (tx, rx) = channel(); // create transmitter/receiver pair
thread::spawn(move || match rx.recv() { // spawn A1, taking ownership of `rx`
                          Ok(v_end) => {
                              println!("A1 received {:?}", v_end)
                          } // (end of scope for `v_end`; automatic clean up.)
                          Err(err) => println!("A1 receive failure {:?}", err),
                   });
let t1 = tx.clone();
thread::spawn(move || { // spawn A2, taking ownership of `t1`
                       let mut v = Vec::new(); v.push('a'); v.push('b'); v.push('c');
                       t1.send(v).expect("A2 send failure");
                   });
thread::spawn(move || { // spawn A3, taking ownership of `tx`
                       let v = vec!['1', '2', '3'];
                       tx.send(v).expect("A3 send failure");
                   });
  • Syntax: |arg, ...| expr (incl. || { expr; ... }) are closures
  • move |arg, ...| expr captures by ownership xfer ("moving")
  • println! and vec! are macros (typesafe printf; Vec building sugar)
  • Destruction 1: at scope end for v_end, buffer + contents reclaimed
  • Destruction 2: at scope end for rx, t1 + tx, channel reclaimed

Sending between threads

(tx, rx) = channel(); rx t1 = tx.clone() t1 tx tx v = Vec v = Vec v.push('1') v.push('a') v.push('2') v.push('b') v.push('3') v.push('c') tx.send(v) t1.send(v) | Ok(v) = rx.recv() println!("A1 received {:?}", v)
  • rx, t1, tx, and each v are all moved from one thread to another
  • t1 cloned from tx; (if tx moved to middle subthread, would not be available for left one)

Not everything can be sent

fn demo_sendable() {
    let (tx, rx) = channel();
    let r1 = Rc::new(vec!['a', 'b', 'c', '1', '2', '3']);
    tx.send(r1.clone()).unwrap();
    let _r2 = rx.recv().unwrap();
    println!("r1: {:?}", r1);
}

(above is fine)

  • FYI: Rc<T> is a reference-counted pointer to a heap-allocated T
  • (Like Box<T>, but with dynamically-tracked shared ownership rather than statically-tracked sole ownership)

Not everything can be sent (really)

fn demo_unsendable() {
    let (tx, rx) = channel();
    let r1 = Rc::new(vec!['a', 'b', 'c', '1', '2', '3']);
    thread::spawn(move || { let r2 = rx.recv().unwrap();
                            println!("received: {:?}", r2); });
    tx.send(r1.clone()).unwrap();
    println!("r1: {:?}", r1);
}
error[E0277]: the trait bound `Rc<Vec<char>>: Send` is not satisfied
   --> src/a00.rs:405:5
    |
405 |     thread::spawn(move || {
    |     ^^^^^^^^^^^^^ `Rc<Vec<char>>` cannot be sent between threads safely
    |
    = help: the trait `Send` is not implemented for `Rc<Vec<char>>`
    = note: required because of the requirements on the impl of `Send`
            for `Receiver<Rc<Vec<char>>>`
    = note: required because it appears within the type
            `[closure@src/a00.rs:405:19: 410:6 Receiver<Rc<Vec<char>>>]`
    = note: required by `std::thread::spawn`

Will revisit these constraints a bit more later

Digression: talk about sharing for a moment

Move semantics alone: insufficient

We are computer scientists

We can solve any problem by introducing an extra level of indirection

-- David J. Wheeler

We need references; allows distinguishing:

  • ownership transfer, versus
  • lending access temporarily

Importance of Ownership

Ownership enables: which removes:
RAII-style destructors source of memory leaks (or fd leaks, etc)
no dangling pointers many resource management bugs
no data races many multithreading heisenbugs

Do I need to take ownership here, accepting the associated resource management responsibility? Would temporary access suffice?

Good developers ask this already!

Rust forces function signatures to encode the answers

(and they are checked by the compiler)

Sharing Data

Pointers: Perhaps the Pandora's Box of Computer Science

  • &mut T, &T
  • (plus library reference types: Box<T>, Cow<T>, Rc<T>, ...)
  • T above ranges over both so-called "sized" and "unsized" types
  • Sized: &char, &mut Vec<u8>, &[i32; 16]
  • Unsized: &str, &mut [u8], &Fn() -> i32

Menagerie: &T, Box<T>, Rc<T> etc

  • T: owned instance of T, stored inline (e.g. in stack frame or record)
  • Box<T>: owned instance of T, stored on heap (so Box<T> itself is just a pointer)
  • &T: pointer to T, but not owned. Extent is limited to some static scope (possibly a scope known only to function's caller).
  • Rc<T>: ref-counted pointer to T; shared ownership. (At end of scope for our Rc<T>, we might be responsible for resource cleanup.)

Why multiple &-reference types?

  • Distinguish exclusive access from shared access

  • Enables safe, parallel API's

Ownership T
Exclusive access &mut T ("mutable")
Shared access &T ("read-only")

A &T provides limited access; cannot call methods that require ownership or exclusive access to T

A &mut T provides temporary exclusive access; even the original owner cannot mutate the object while you have the reference

But cannot claim ownership of T yourself via &mut T, unless you swap in another T to replace one you take (Rust coding pattern)

References and Slices

fn demo_references() {
    let v = vec!['a', 'b', 'c', '1', '2', '3'];
    let ref1 = &v[0..3];
    let ref2 = &v[3..6];
    let f1 = |i| println!("ref1[{}]: {:?}", i, ref1[i]);
    let f2 = |i| println!("ref2[{}]: {:?}", i, ref2[i]);
    f1(1);
    f2(2);
}
v (r) (r) 'a' ref1 'b' 'c' (r) '1' ref2 '2' '3'

Rc<T> (shared ownership)

fn demo_refcounts() {
    let r1 = Rc::new(vec!['a', 'b', 'c', '1', '2', '3']);
    let r2 = r1.clone();
    let f1 = |i| println!("v[{}]: {:?}", i, r1[i]);
    let f2 = |i| println!("v[{}]: {:?}", i, r2[i]);
    f1(1);
    f2(2);
}
r1 count: 2 Vec buf len 6 r2 cap 8 'a' 'b' 'c' '1' '2' '3'

Is &mut only way to encode mutation via reference?

(No!)

Interior Mutability: Cell + RefCell

  • Both types have mutator methods that take &self (not &mut self)
  • Cell<T>: move values in+out (via swapping, or other methods that ensure cell remains sound, e.g. get/set if T: Copy)
  • RefCell<T> provides refs to the T it holds, but dynamically enforces the rules.
  • RefCell::borrow() returns read-only Ref (many-readers), and panic!'s on outstanding mut-borrow (∴ no previous writers)
  • RefCell::borrow_mut() returns read/write RefMut (unique), and panic!'s on any outstanding borrow (∴ no previous readers or writers).

Sharing resources (rayon)

let temp_data = vec!['a', 'b', 'c', '1', '2', '3'];
rayon::scope(|s| {
    let (tx, rx) = channel(); // create transmitter/receiver pair
    s.spawn(move |_s| { // spawn A1, taking ownership of `rx`
        match rx.recv() {
            Ok(v) => println!("A1 received {:?}", v),
            Err(err) => println!("A1 receive failure {:?}", err),
        }
    });
    let data: &[char] = &temp_data; // (N.B. assigned type is *not* `&Vec`)
    let t1 = tx.clone();

    // spawn A2, taking ownership of `t1` (and a copy of `data`)
    s.spawn(move |_s| t1.send(&data[0..4]).expect("A2 send failure"));

    // spawn A3, taking ownership of `tx` (and a copy of `data`)
    s.spawn(move |_s| tx.send(&data[2..6]).expect("A3 send failure"));

}); // (`rayon::scope` waits here until all `s.spawn`'ed threads finish)

Sharing resources between threads

temp_data: (tx, rx) = channel(); rx 'a' 'b' t1 = tx.clone() 'c' t1 , data '1' '2' tx '3' tx , data s s tx.send(s) t1.send(s) | Ok(s) = rx.recv() println!("A1 received {:?}", s)
  • rx, t1, tx still moved from one thread to another.
  • data is shared reference to character slice: freely copyable
  • s's are subslices &data[0..4] + &data[2..6] (overlap is safe!)

Sending and sharing intertwine

Can send &-references and &mut-references

  • &-refs copy (as usual).
  • &mut-refs obey move semantics when sent (as usual)

Examples

fn send_ref_i32(arg: &i32) {
    rayon::scope(move |s| {
        s.spawn(move |_s| println!("arg: {:?}", arg));
    });
}

fn send_ref_vec(arg: &Vec<i32>) {
    rayon::scope(move |s| {
        s.spawn(move |_s| println!("arg: {:?}", arg));
    });
}

fn send_mut_vec(arg: &mut Vec<i32>) {
    rayon::scope(move |s| {
        s.spawn(move |_s| println!("arg: {:?}", arg));
    });
}

So far so good

"Can't always get what you want ..."

Can send &-references and &mut-references ... if data synchronized!

fn send_ref_to_cell(arg: &Cell<i32>) {
    rayon::scope(move |s| {
        s.spawn(move |_s| println!("arg: {:?}", arg));
    });
}
error[E0277]: the trait bound `Cell<i32>: Sync` is not satisfied
   --> src/a00.rs:547:5
    |
547 |     rayon::scope(move |s| {
    |     ^^^^^^^^^^^^ `Cell<i32>` cannot be shared between threads safely
    |
    = help: the trait `Sync` is not implemented for `Cell<i32>`
    = note: required because of the requirements on the impl of `Send` for `&Cell<i32>`
    = note: required because it appears within the type `[closure@src/a00.rs:547:18: 549:6 arg:&Cell<i32>]`
    = note: required by `rayon::scope`

"but [...] you get what you need"

Cell<T>: unsynchronized mutation; incompatible with Sync.

(Synchronous alternatives include AtomicUsize, Mutex<T>)

The Send traits

Send: focused on ownership transfer

But we already know that move semantics alone does not suffice

(We need our references!)

The Send and Sync traits

  • Send and Sync control cross-thread capabilities
  • T: Send implies ownership of T can be tranferred across threads
  • T: Sync implies a reference to T (e.g. &T, Arc<T>) can be shared across threads
  • Send enables Message Passing style concurrency, via channels
  • Sync enables Shared State style concurrency; only T with synchronized access is allowed
  • Rust compiler automatically marks types as Send or Sync, based on a recursive analysis of their structure.
  • Data-structure designer can opt out of above analysis. (If doing unsafe stuff, they may need to do so for soundness!)

See also: https://www.ralfj.de/blog/2017/06/09/mutexguard-sync.html

Rust as enabler of individuals

From "mere script programmer"

to "lauded systems hacker"

Programming in Rust has made me look at C++ code in a whole new light

After experiencing Rust, I dread looking at code from prior projects ... I will now see how riddled with races it was

Thanks