There’s a running joke in the functional programming community. Any Scala program can be written by combining the traverse
function the correct number of times. This blog post is dedicated to that joke.
In Rust, the Iterator
trait defines a stream of values of a specific type. Many common types provide an Iterator
interface. And the built in for
loop construct works directly with the Iterator
trait. Using that, we can easily do something like “print all the numbers in a Vec
“:
fn main() {
let myvec: Vec<i32> = vec![1, 2, 3, 4, 5];
for num in myvec {
println!("{}", num);
}
}
Let’s say we want to do something a bit different: double every value in the Vec
. The most idiomatic and performant way to do that in Rust is with mutable references, e.g.:
fn main() {
let mut myvec: Vec<i32> = vec![1, 2, 3, 4, 5];
for num in &mut myvec {
*num *= 2;
}
println!("{:?}", myvec);
}
Since we’re dedicating this post to functional programmers, it’s worth noting: this looks decidedly not-functional. “Take a collection and apply a function over each value” is well understood in FP circles—and increasingly in non-FP circles—as a map
. Or using more category-theoretic nomenclature, it’s a Functor
. Fortunately, Rust provides a map
method for Iterator
s. Unfortunately, unlike Scala or Haskell, map
doesn’t work on data types like Vec
. Let’s compare, using Haskell:
list :: [Int]
list = [1, 2, 3, 4, 5]
main :: IO ()
main = do
let newList :: [Int]
newList = map (* 2) list
print newList
The map
function from the Functor
typeclass works directly on a list. It produces a new list with the function applied to each value. Let’s try to do the most equivalent thing in Rust:
fn main() {
let myvec: Vec<i32> = vec![1, 2, 3, 4, 5];
let new_vec: Vec<i32> = myvec.map(|x| x * 2);
println!("{:?}", new_vec);
}
This fails with the error message:
no method named `map` found for struct `std::vec::Vec<i32>` in the current scope
That’s because, in Rust, map
applies to the Iterator
itself, not the underlying data structures. In order to use map
on a Vec
, we have to:
Vec
into an Iterator
map
on the Iterator
Iterator
back into a Vec
(1) can be performed using the IntoIterator
trait, which provides a method named into_iter
. And for (3), we could write our own for
loop that fills up a Vec
. But the right way is to use the FromIterator
trait. And the easiest way to do that is with the collect
method on Iterator
.
Let’s write a program that properly uses map
:
fn main() {
let myvec: Vec<i32> = vec![1, 2, 3, 4, 5];
let new_vec: Vec<i32> = myvec
.into_iter()
.map(|x| x * 2)
.collect();
println!("{:?}", new_vec);
}
Fairly straightforward, and our 3 steps turn into three chained method calls. Unfortunately, in practice, using collect
is often not quite as straightforward as this. That’s due to type inference. To see what I mean, let’s take all of the type annotations out of the program above:
fn main() {
let myvec = vec![1, 2, 3, 4, 5];
let new_vec = myvec
.into_iter()
.map(|x| x * 2)
.collect();
println!("{:?}", new_vec);
}
This gives us the very friendly message:
error[E0282]: type annotations needed
--> srcmain.rs:4:9
|
4 | let new_vec = myvec
| ^^^^^^^ consider giving `new_vec` a type
The issue here is that we don’t know which implementation of FromIterator
we should be using. This is a problem that didn’t exist in the pure FP world with map
and Functor
. In that world, Functor
‘s map
is always “shape preserving.” When you map
over a list in Haskell, the result will always be a list.
That’s not the case with the IntoIterator
/FromIterator
combination. IntoIterator
destroys the original data structure, fully consuming it and producing an Iterator
. Similarly, FromIterator
produces a brand new data structure out of thin air, without any reference to the original data structure. Therefore, an explicit type annotation saying what the output type should be is necessary. In our program above, we did this by annotating new_vec
. Another way is to use “turbofish” to annotate which collect
to use:
fn main() {
let myvec = vec![1, 2, 3, 4, 5];
let new_vec = myvec
.into_iter()
.map(|x| x * 2)
.collect::<Vec<_>>();
println!("{:?}", new_vec);
}
Note that we only needed indicate that we were collecting into a Vec
. Rust’s normal type inference was able to figure out:
myvec
should be a Vec
, since it was produced by the vec!
macroAlright, I want to announce to the world that I’ll be doubling these values. It’s easy to modify our map
-using code in Rust to do this:
fn main() {
let myvec = vec![1, 2, 3, 4, 5];
let new_vec = myvec
.into_iter()
.map(|x| {
println!("About to double {}", x);
x * 2
})
.collect::<Vec<_>>();
println!("{:?}", new_vec);
}
But Haskellers will warn you that this isn’t quite so simple. map
in Haskell is a pure function, meaning it doesn’t allow for any side-effects (like printing to the screen). You can see this in action fairly easily:
list :: [Int]
list = [1, 2, 3, 4, 5]
main :: IO ()
main = do
let newList :: [Int]
newList =
map
(x -> do
putStrLn ("About to double " ++ show x)
pure (x * 2))
list
print newList
This code won’t compile, due to the mismatch between an Int
(a pure number) and an IO Int
(an action with side effects which produces an Int
):
Couldn't match type 'IO Int' with 'Int'
Expected type: [Int]
Actual type: [IO Int]
Instead, we need to use map
‘s more powerful cousin, traverse
(a.k.a. mapM
, or “monadic map”). traverse
allows us to perform a series of actions, and produce a new list with all of their results. This looks like:
list :: [Int]
list = [1, 2, 3, 4, 5]
main :: IO ()
main = do
newList <-
traverse
(x -> do
putStrLn ("About to double " ++ show x)
pure (x * 2))
list
print newList
So why the difference between Haskell and Rust here? That’s because Rust is not a pure language. Any function can perform side effects, like printing to the screen. Haskell, on the other hand, doesn’t allow this, and therefore we need special helper functions like traverse
to account for the potential side effects.
I won’t get into the philosophical differences between the two languages. Suffice it to say that both approaches have merit, and both have advantages and disadvantages. Let’s see where the Rust approach “breaks down”, and how FromIterator
steps up to the plate.
In the example above with Haskell, we used side effects via the IO
type. However, traverse
isn’t limited to working with IO
. It can work with many different types, anything which is considered Applicative
. And this covers many different common needs, including error handling. For example, we can change our program to not allow doubling “big” numbers greater than 5:
list :: [Int]
list = [1, 2, 3, 4, 5, 6]
main :: IO ()
main = do
let newList =
traverse
(x ->
if x > 5
then Left "Not allowed to double big numbers"
else Right (x * 2))
list
case newList of
Left err -> putStrLn err
Right newList' -> print newList
Either
is a sum type, like an enum
in Rust. It’s equivalent to Result
in Rust, but with different names. Instead of Ok
and Err
, we have Right
(used by convention for success) and Left
(used by convention for failure). The Applicative
instance for it will stop processing when it encounters the first Left
. So our program above will ultimately produce the output Not allowed to double big numbers
. You can put as many values after the 6
in list
as you want, and it will produce the same output. In fact, it will never even inspect those numbers.
Coming back to Rust, let’s first simply collect all of our Result
s together into a single Vec
to make sure the basics work:
fn main() {
let myvec = vec![1, 2, 3, 4, 5, 6];
let new_vec: Vec<Result<i32, &str>> = myvec
.into_iter()
.map(|x| {
if x > 5 {
Err("Not allowed to double big numbers")
} else {
Ok(x * 2)
}
})
.collect();
println!("{:?}", new_vec);
}
That makes sense. We’ve already seen that .collect()
can take all the values in an Iterator
‘s stream and stick them into a Vec
. And the map
method is now generating Result<i32, &str>
values, so everything lines up.
But this isn’t the behavior we want. We want two changes:
new_vec
should result in a Result<Vec<i32>, &str>
. In other words, it should result in either a single Err
value, or a vector of successful results. Right now, it has a vector of success-or-failure values.Vec
once we see a value that’s too big.To make it a bit more clear, it’s easy enough to implement this with a for
loop:
fn main() {
let myvec = vec![1, 2, 3, 4, 5];
let mut new_vec = Vec::new();
for x in myvec {
if x > 5 {
println!("Not allowed to double big numbers");
return;
} else {
new_vec.push(x);
}
}
println!("{:?}", new_vec);
}
But now we’ve lost out on our map
entirely, and we’re dropping down to using explicit loops, mutation, and short-circuiting (via return
). In other words, this code doesn’t feel nearly as elegant to me.
It turns out that our original code was almost perfect. Let’s see a bit of magic, and then explain how it happend. Our previous version of the code used map
and resulted in a Vec<Result<i32, &str>>
. And we wanted Result<Vec<i32>, &str>
. What happens if we simply change the type to what we want?
fn main() {
let myvec = vec![1, 2, 3, 4, 5, 6];
let new_vec: Result<Vec<i32>, &str> = myvec
.into_iter()
.map(|x| {
if x > 5 {
Err("Not allowed to double big numbers")
} else {
Ok(x * 2)
}
})
.collect();
match new_vec {
Ok(new_vec) => println!("{:?}", new_vec),
Err(e) => println!("{}", e),
}
}
Thanks to the power of FromIterator
, this simply works! To understand why, let’s see some documentation on FromIterator
:
Takes each element in the
Iterator
: if it is anErr
, no further elements are taken, and theErr
is returned. Should noErr
occur, a container with the values of eachResult
is returned.
And suddenly it seems that Rust has implemented traverse
all along! This extra flexibility in the FromIterator
setup allows us to regain the short-circuiting error-handling behavior that FP people are familiar with in traverse
.
In contrast to traverse
, we’re still dealing with two different traits (IntoIterator
and FromIterator
), and there’s nothing preventing these from being different types. Therefore, some kind of type annotation is still necessary. On the one hand, that could be seen as a downside of Rust’s approach. On the other hand, it allows us to be more flexible in what types we generate, which we’ll look at in the next section.
And finally, it turns out we can use turbofish to rescue us yet again. For example:
fn main() {
let myvec = vec![1, 2, 3, 4, 5, 6];
let new_vec = myvec
.into_iter()
.map(|x| {
if x > 5 {
Err("Not allowed to double big numbers")
} else {
Ok(x * 2)
}
})
.collect::<Result<Vec<_>, _>>();
match new_vec {
Ok(new_vec) => println!("{:?}", new_vec),
Err(e) => println!("{}", e),
}
}
So far, we’ve only seen two implementations of FromIterator
: Vec
and Result
. There are many more available. One of my favorite is HashMap
, which lets you collect a sequence of key/value pairs into a mapping.
use std::collections::HashMap;
fn main() {
let people = vec![
("Alice", 30),
("Bob", 35),
("Charlies", 25),
].into_iter().collect::<HashMap<_, _>>();
println!("Alice is: {:?}", people.get("Alice"));
}
And due to how the FromIterator
impl for Result
works, you can layer these two together to collect a stream of Result
s of pairs into a Result<HashMap<_, _>, _>
:
use std::collections::HashMap;
fn main() {
let people = vec![
Ok(("Alice", 30)),
Ok(("Bob", 35)),
Err("Uh-oh, this didn't work!"),
Ok(("Charlies", 25)),
].into_iter().collect::<Result<HashMap<_, _>, &str>>();
match people {
Err(e) => println!("Error occurred: {}", e),
Ok(people) => {
println!("Alice is: {:?}", people.get("Alice"));
}
}
}
In the Haskell world, we have two different concepts of error collection:
Either
, which says “stop on the first error”Validation
, which says “collect all of the errors together”Validation
can be very useful for things like parsing web forms. You don’t want to generate just the first failure, but collect all of the failures together for producing a more user-friendly experience. For fun, I decided to implement this in Rust as well:
I’m tempted to write a Validation “Applicative” in Rust with a FromIterator impl to collect multiple Err values. I have no real need for this, but it still seems fun.
— Michael Snoyman (@snoyberg) October 1, 2020
Subscribe to our blog via email
Email subscriptions come from our Atom feed and are handled by Blogtrottr. You will only receive notifications of blog posts, and can unsubscribe any time.