Rust: Fearless at all Levels

Felix Klock (@pnkfelix), Mozilla

space: next slide; esc: overview; arrows navigate https://bit.ly/2pWTjC6

What is Rust?

What is Rust?

New programming language

  • (... and ecosystem, and development community)

Emphasizing control, safety, and speed

  • (... especially when it comes to concurrency)

Low-level hacking without fear of segfaults nor data races

  • or more simply: "Hack without Fear"

Why use Rust?

Safety and Parallelism

Safety

  • No segmentation faults

  • No undefined behavior

  • No data races

(Multi-paradigm) Parallelism

  • msg passing via channels

  • shared state (R/W-capabilities controlled via types)

  • use native threads... or scoped threads... or work-stealing...

Lets talk performance

Demo

Example: Sum of (even) numbers

Add even values in an integer range.

Rust:

fn do_partial_sum(start: i64, to_do: i64) -> i64 {
    let mut result = 0;
    for i in start..(start + to_do) {
        if i % 2 == 0 { result += i; }
    }
    result
}

C++:

int64_t do_partial_sum(int64_t start, int64_t to_do) {
    int64_t result = 0;
    for (int64_t i = start; i < start + to_do; i++) {
        if (i % 2 == 0) { result += i; }
    }
    return result;
}

(avoiding reduction to closed form solution; we're benchmarking!)

Parallelism

Parallelism: Rust (v0)

play: https://bit.ly/2PH9te8

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(|| {
        result += do_partial_sum(start, incr);
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

Parallelism: C++

int64_t each = (MAX + thread_count - 1) / thread_count;
result = 0;
std::vector<std::thread> threads;
for (int64_t val = 0; val < MAX; val += each) {
    int64_t incr = (each + val > MAX) ? (MAX - val) : each;
    threads.push_back(std::thread([&, val, incr] () {
                result += do_partial_sum(val, incr);
            }));
}
for (int i = 0; i < thread_count; i++) {
    threads[i].join();
}

Let's try our C++ code

threads average time notes
1 (TBD)
4 (TBD)
1000 (TBD)

Four runs, 1 thread

.
.
.
.
.
.
.
.

Let's try our C++ code

threads average time notes
1 (TBD)
4 (TBD)
1000 (TBD)

Four runs, 1 thread

sum of evens < 2000000000 result: 999999999000000000
1.08s
sum of evens < 2000000000 result: 999999999000000000
1.08s
sum of evens < 2000000000 result: 999999999000000000
1.07s
sum of evens < 2000000000 result: 999999999000000000
1.07s

Let's try our C++ code

threads average time notes
1 1.075s
4 (TBD)
1000 (TBD)

Four runs, 1 thread

sum of evens < 2000000000 result: 999999999000000000
1.08s
sum of evens < 2000000000 result: 999999999000000000
1.08s
sum of evens < 2000000000 result: 999999999000000000
1.07s
sum of evens < 2000000000 result: 999999999000000000
1.07s

Let's try our C++ code

threads average time notes
1 1.075s
4 (TBD)
1000 (TBD)

Four runs, 4 threads

.
.
.
.
.
.
.
.

Let's try our C++ code

threads average time notes
1 1.075s
4 (TBD)
1000 (TBD)

Four runs, 4 threads

sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s

Let's try our C++ code

threads average time notes
1 1.075s
4 0.290s
1000 (TBD)

Four runs, 4 threads

sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s

Let's try our C++ code

threads average time notes
1 1.075s
4 0.290s
1000 (TBD)

Four runs, 1000 threads

.
.
.
.
.
.
.
.

Let's try our C++ code

threads average time notes
1 1.075s
4 0.290s
1000 (TBD)

Four runs, 1000 threads

sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.30s
sum of evens < 2000000000 result: 999598999001000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.30s

Let's try our C++ code

threads average time notes
1 1.075s (but
4 0.290s its all
1000 0.295s bogus!)

Four runs, 1000 threads

sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.30s
sum of evens < 2000000000 result: 999598999001000000
0.29s                                ---------[uh oh]
sum of evens < 2000000000 result: 999999999000000000
0.30s

Data Race!

Data Race!

            threads.push_back(std::thread([&, val, incr] () {
                                        // ~
                        result += do_partial_sum(val, incr);
                     // ~~~~~~
                    }));

Revisiting Rust version

play: https://bit.ly/2PH9te8

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(|| {
        result += do_partial_sum(start, incr);
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

Revisiting Rust version

play: https://bit.ly/2PH9te8

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(|| {
        result += do_partial_sum(start, incr);
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

Why no timings for Rust version

play: https://bit.ly/2PH9te8

error[E0373]: closure may outlive the current function, but it 
      borrows `result`, which is owned by the current function
  --> partsum/src/v0.rs:47:37
   |
47 |     threads.push(std::thread::spawn(|| {
   |                                     ^^ may outlive borrowed
   |                                        value `result`
48 |         result += do_partial_sum(start, incr);
   |         ------ `result` is borrowed here
help: to force the closure to take ownership of `result` (and any
      other referenced variables), use the `move` keyword

The next version

Version 1

(our array indexes start at 0)

Lets try doing exactly what the compiler suggests

Parallelism: Rust, v1

play: https://bit.ly/2OuTOC8

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(move || { // <-- `move` is new
        result += do_partial_sum(start, incr);
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

It compiles!

Let's try our Rust code (v1)

play: https://bit.ly/2OuTOC8

Four runs, 1 thread

.
.
.
.
.
.
.
.

Let's try our Rust code (v1)

play: https://bit.ly/2OuTOC8

Four runs, 1 thread

result: 0
0.03s
result: 0
0.03s
result: 0
0.03s
result: 0
0.03s

"Once it compiles, it works"

Not this time

(sorry to disappoint)

Rust will not prove your code correct

What went wrong in v1

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(move || { // <-- `move` is new
        result += do_partial_sum(start, incr); // <-- `result` copied!
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

Ownership

Rust types

Generic Types, Bounded Polymorphism

fn choose<T>(x: T, y: T) -> T { return x; }

fn sum<T: Add>(x: T, y: T) -> T::Output { return x + y; }

fn sum3<T>(x: T, y: T, z: T) -> T where T: Add<Output=T> {
    return x + y + z;
}

Type Constructions

Move Copy Copy if T:Copy
Vec<T>, String, ... i32, char, ... [T; n], (T1,T2,T3), ...

Ownership is important

Ownership enables: which removes:
RAII-style destructors a source of memory leaks (or fd leaks, etc)
no dangling pointers many resource management bugs
no data races many multithreading heisenbugs

Do I need to take ownership here, accepting the associated resource management responsibility? Would temporary access suffice?

Good developers ask this already!

Rust forces function signatures to encode the answers

(and they are checked by the compiler)

Sharing Data: Ownership and References

More Rust types

References to data

  • &mut T, &T

  • (plus library reference types: Box<T>, Cow<T>, Rc<T>, ...)

Version 2

Parallelism: Rust, v2

play: https://bit.ly/2Crmsx1

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let result_ptr = &mut result; // <-- *borrowing* explictly
    let incr = if each + start > MAX { MAX - start } else { each };
    threads.push(std::thread::spawn(move || {
        *result_ptr += do_partial_sum(start, incr);
    }));
    start += each;
}
for t in threads {
    t.join().unwrap();
}

No go

play: https://bit.ly/2Crmsx1

error[E0597]: `result` does not live long enough
  --> partsum/src/v2.rs:46:35
   |
46 |             let result_ptr = &mut result;
   |                                   ^^^^^^ borrowed value does not
   |                                          live long enough
...
59 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...

Borrows cannot outlive their owners

and std::thread::spawn creates a thread that may outlive its parent!

Must threads do this?

Do all threads terminate at arbitrary times?

Alternative threading APIs

Cargo and crates.io

Rayon

https://crates.io/crates/rayon

Crossbeam

https://crates.io/crates/crossbeam

Version 3

Parallelism: Rust, v3

play: https://bit.ly/2pZOTe4

let mut result = 0;
crossbeam::thread::scope(|the_scope| { // <-- create scope here
    let mut threads = Vec::new();
    let mut start = 0;
    let incr = if each + start > MAX { MAX - start } else { each };
    while start < MAX {
        let result_ptr = &mut result; // <-- (still borrowing explicitly)
        threads.push(the_scope.spawn(move || {
            *result_ptr += do_partial_sum(start, incr);
        }));
        start += each;
    }
    for t in threads {
        t.join().unwrap();
    } // now we *know* all threads finish while `result` still on stack
});

Still no go

play: https://bit.ly/2pZOTe4

error[E0499]: cannot borrow `result` as mutable more than once at a
              time
  --> src/main.rs:24:30
   |
24 |    let result_ptr = &mut result;
   |                     ^^^^^^^^^^^ mutable borrow starts here in
   |                                 previous iteration of loop

Why multiple &-reference types?

  • Distinguish exclusive access from shared access

  • Enables safe, parallel API's

Ownership T
Exclusive access &mut T ("mutable")
Shared access &T ("read-only")

Exclusive access

Rust does not allow simultaneous mutable borrows of unsynchronized data

This is usually ensured via static analysis.

(But some standard library types enforce this rule via dynamic checks.)

So what can we do?

  • In both C++ and in Rust, you can easily eliminate the data race
  • common approach: can wrap a mutex around result
  • better: use atomic operation (e.g. compare-and-swap, fetch-and-add)
  • better still: use separate result receiver for each thread, then add those up at end.
  • Point is: Rust forces you to resolve this bug

  • C++ allows silent, scheduler dependent failure (which may arise only rarely)

Version 4

Parallelism: Rust, v4

play: https://bit.ly/2QXsCsq

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + start > MAX { MAX - start } else { each };
    threads.push(std::thread::spawn(move || {
        return do_partial_sum(start, incr);
    }));
    start += each;
}
for t in threads {
    // updates result solely on main thread, after each join.
    result += t.join().unwrap();
}

In case of pedantry

Rust timings

play: https://bit.ly/2QXsCsq

threads average time notes
1 0.820s Slightly faster
4 0.220s than earlier C++ code;
1000 0.235s dunno why

Four runs, 1000 threads

sum of evens < 2000000000 result: 999999999000000000
0.23s
sum of evens < 2000000000 result: 999999999000000000
0.23s
sum of evens < 2000000000 result: 999999999000000000
0.24s
sum of evens < 2000000000 result: 999999999000000000
0.24s

Other cool stuff

Other cool stuff

Many things in Rust that leverage or extend ownership and borrowing to get fast, flexible, safe code

  • Send and Sync traits
  • Lifetimes
  • Object-orientation via trait-references
  • Unboxed closures
  • FFI interop

Growing community

http://rustconf.com/, https://www.rust-belt-rust.com/ (USA)

https://rustfest.eu/ (EU)

Rust Paris: https://www.meetup.com/Rust-Paris/

Rust as enabler of individuals

From "mere script programmer"

to "lauded systems hacker"

Programming in Rust has made me look at C++ code in a whole new light

Thanks