One of the things I've always found mysterious about the Rust type system is the distinction between generic type parameters and associated types. They both allow one to define a trait with a placeholder type. When does one choose one over the other?
I recently ran into a surprising bug in GoogleTest Rust. The investigation of the cause and the eventual solution helped to shed some light on the difference between these concepts. I've tried to distill my learnings here.
A surprising limitation
GoogleTest is a test assertion library which uses matchers to specify what aspect of a value is to be asserted. For example, eq(123) asserts that the value is equal to the number 123, lt(123) asserts that it is a number less than 123, and gt(123) asserts that it is a number greater than 123. Matchers can be combined with the and method. For example, to assert that a number is strictly between 0 and 10, one can use:
verify_that!(value, gt(0).and(lt(10)))
To my surprise, I discovered that this did not work when the actual value being matched is a string. If we try the following with GoogleTest 0.5.0:
verify_that!("A string", starts_with("A").and(ends_with("string")))
it does not compile:
error[E0282]: type annotations needed
--> example/src/example.rs:123:51
|
123 | verify_that!("A string", starts_with("A").and(ends_with("string")))
| ^^^
|
help: try using a fully qualified path to specify the expected types
|
123 | verify_that!("A string", <str_matcher::StrMatcher<&str> as conjunction_matcher::AndMatcherExt<T>>::and::<str_matcher::StrMatcher<&str>>(starts_with("A"), ends_with("string")))
| +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ~
For more information about this error, try `rustc --explain E0282`.
To understand why this is happening, we need to look at how matchers are built.
The Matcher trait and generics
At the centre of GoogleTest is the trait Matcher. It uses a placeholder, which we call ActualT, to specify the type of the value being matched. Rust provides two ways to express this. One can define a generic trait with a type parameter:
trait Matcher<ActualT> {
fn matches(&self, actual: &ActualT) -> bool;
}
Or one can add an associated type to the trait:
trait Matcher {
type ActualT;
fn matches(&self, actual: &Self::ActualT) -> bool;
}
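To make the two options concrete, here is a minimal, compilable sketch with both declarations side by side. The names `MatcherWithParam`, `MatcherWithAssoc`, `EqMatcher` and the two `check_*` functions are hypothetical and exist only for this illustration. The main thing to notice is how differently a caller writes the bound in each case.

```rust
// Option 1: the actual type is a generic type parameter of the trait.
trait MatcherWithParam<ActualT> {
    fn matches(&self, actual: &ActualT) -> bool;
}

// Option 2: the actual type is an associated type of the trait.
trait MatcherWithAssoc {
    type ActualT;
    fn matches(&self, actual: &Self::ActualT) -> bool;
}

// Hypothetical matcher: equality against a stored i32.
struct EqMatcher(i32);

impl MatcherWithParam<i32> for EqMatcher {
    fn matches(&self, actual: &i32) -> bool {
        *actual == self.0
    }
}

impl MatcherWithAssoc for EqMatcher {
    type ActualT = i32;
    fn matches(&self, actual: &i32) -> bool {
        *actual == self.0
    }
}

// With a type parameter, the bound supplies the type argument directly.
fn check_with_param<M: MatcherWithParam<i32>>(matcher: &M, actual: i32) -> bool {
    matcher.matches(&actual)
}

// With an associated type, the bound constrains the associated type by name.
fn check_with_assoc<M: MatcherWithAssoc<ActualT = i32>>(matcher: &M, actual: i32) -> bool {
    matcher.matches(&actual)
}

fn main() {
    let m = EqMatcher(123);
    assert!(check_with_param(&m, 123));
    assert!(!check_with_assoc(&m, 124));
}
```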
We initially opted for the first option. Let’s consider what this implies.
A concrete type substituted for the type parameter becomes part of the trait's name. One needs to specify it in order to name the trait at all: Matcher on its own is not a complete trait, but Matcher<String> is. Using a different type argument yields a different trait; for example, Matcher<&str> and Matcher<String> are two distinct traits in Rust. One consequence of this is that a struct can implement the same generic trait twice, as long as different type arguments are provided to the trait:
impl Matcher<&str> for StringMatcher {…}
impl Matcher<String> for StringMatcher {…}
This means that the same matcher can match a string slice &str or an owned string String. Both of the following work:
let string_reference = "A string";
verify_that!(string_reference, starts_with("A"))
let owned_string = String::from("A string");
verify_that!(owned_string, starts_with("A"))
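A runnable sketch of this double implementation follows. It assumes a simplified trait, a hypothetical `StartsWithMatcher`, and a plain `verify_that` function standing in for the macro; GoogleTest's real `starts_with` and `verify_that!` are richer than this. The type of the value passed in is what selects one of the two implementations.

```rust
trait Matcher<ActualT> {
    fn matches(&self, actual: &ActualT) -> bool;
}

// Hypothetical stand-in for the matcher returned by starts_with().
struct StartsWithMatcher {
    prefix: String,
}

fn starts_with(prefix: &str) -> StartsWithMatcher {
    StartsWithMatcher { prefix: prefix.to_string() }
}

// One implementation for string slices...
impl<'a> Matcher<&'a str> for StartsWithMatcher {
    fn matches(&self, actual: &&'a str) -> bool {
        actual.starts_with(self.prefix.as_str())
    }
}

// ...and a second, distinct implementation for owned strings.
impl Matcher<String> for StartsWithMatcher {
    fn matches(&self, actual: &String) -> bool {
        actual.starts_with(self.prefix.as_str())
    }
}

// Simplified stand-in for verify_that!: T is inferred from `actual`,
// which in turn selects the Matcher<T> implementation.
fn verify_that<T, M: Matcher<T>>(actual: T, matcher: M) -> bool {
    matcher.matches(&actual)
}

fn main() {
    let string_reference = "A string";
    assert!(verify_that(string_reference, starts_with("A")));

    let owned_string = String::from("A string");
    assert!(verify_that(owned_string, starts_with("A")));
}
```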
Now consider what happens when we try to combine matchers with and(). The method and() must be generic with respect to the matcher supplied as its argument:
trait Matcher<ActualT> {
…
fn and(self, right: impl Matcher<ActualT>) -> impl Matcher<ActualT> {…}
}
To resolve the call to and(), the compiler needs to identify the trait implementation being used. To do this, it must know the type ActualT. If the concrete matcher implements the generic Matcher trait for more than one type – as is the case when matching strings – the compiler does not know which one to pick! After all, the two implementations could define different and() bodies:
impl<'a> Matcher<&'a str> for StringMatcher {
…
fn and(self, right: impl Matcher<&'a str>) -> impl Matcher<&'a str> {
…
FooMatcher(…)
}
}
impl Matcher<String> for StringMatcher {
…
fn and(self, right: impl Matcher<String>) -> impl Matcher<String> {
…
BarMatcher(…)
}
}
See this playground for a complete example.
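The compiler's suggestion in the error message points at one escape hatch: name the trait instantiation explicitly with fully qualified syntax. The sketch below assumes, for simplicity, that and() returns a concrete, hypothetical `And` struct instead of `impl Matcher<ActualT>`; it is not GoogleTest's code. In this sketch, writing `starts_with("A").and(ends_with("string"))` without the qualification should reproduce a "type annotations needed" error, because nothing in the type `And<StartsWith, EndsWith>` records which `Matcher<T>` implementation the call went through.

```rust
trait Matcher<ActualT> {
    fn matches(&self, actual: &ActualT) -> bool;

    // Hypothetical combinator returning a concrete struct.
    fn and<R: Matcher<ActualT>>(self, right: R) -> And<Self, R>
    where
        Self: Sized,
    {
        And { left: self, right }
    }
}

struct And<L, R> {
    left: L,
    right: R,
}

impl<T, L: Matcher<T>, R: Matcher<T>> Matcher<T> for And<L, R> {
    fn matches(&self, actual: &T) -> bool {
        self.left.matches(actual) && self.right.matches(actual)
    }
}

// Each matcher implements Matcher twice, once for &str and once for String.
struct StartsWith(String);
struct EndsWith(String);

impl<'a> Matcher<&'a str> for StartsWith {
    fn matches(&self, actual: &&'a str) -> bool { actual.starts_with(self.0.as_str()) }
}
impl Matcher<String> for StartsWith {
    fn matches(&self, actual: &String) -> bool { actual.starts_with(self.0.as_str()) }
}
impl<'a> Matcher<&'a str> for EndsWith {
    fn matches(&self, actual: &&'a str) -> bool { actual.ends_with(self.0.as_str()) }
}
impl Matcher<String> for EndsWith {
    fn matches(&self, actual: &String) -> bool { actual.ends_with(self.0.as_str()) }
}

fn starts_with(s: &str) -> StartsWith { StartsWith(s.to_string()) }
fn ends_with(s: &str) -> EndsWith { EndsWith(s.to_string()) }

fn main() {
    // Ambiguous: which Matcher<T> implementation of StartsWith provides and()?
    // let m = starts_with("A").and(ends_with("string"));

    // Fully qualified syntax picks the Matcher<String> implementation explicitly.
    let m = <StartsWith as Matcher<String>>::and(starts_with("A"), ends_with("string"));
    assert!(m.matches(&String::from("A string")));
}
```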
Associated types
Now we consider associated types and how they differ from generic type parameters. It will turn out that using an associated type for ActualT in the Matcher trait fixes the aforementioned ambiguity.
An associated type in a trait is a requirement, in the same way that the methods declared in a trait are requirements, and every implementation of the trait must satisfy it. For a method without a default, the implementation must provide a body. For an associated type, it must name a concrete type.
Associated types are sometimes called a trait’s output types and generic parameters the trait’s input types. The user of a trait specifies arguments for its generic parameters. An implementation of the trait specifies its associated types.
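The standard library's `std::ops::Add` happens to use both kinds at once, which makes the input/output distinction easy to see: the right-hand-side type `Rhs` is a generic parameter (defaulting to `Self`) picked by the user of the trait, while `Output` is an associated type fixed by each implementation. The `Meters` type below is made up for the example.

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(u32);

// Input type Rhs = Meters (the default); the caller's operand selects this impl.
impl Add for Meters {
    type Output = Meters; // output type, chosen by the implementation
    fn add(self, rhs: Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

// A second impl of the same trait with a different input type, Rhs = u32.
impl Add<u32> for Meters {
    type Output = Meters;
    fn add(self, rhs: u32) -> Meters {
        Meters(self.0 + rhs)
    }
}

fn main() {
    assert_eq!(Meters(2) + Meters(3), Meters(5));
    assert_eq!(Meters(2) + 3u32, Meters(5));
}
```

Several implementations that differ only in `Rhs` can coexist, but each of them fixes exactly one `Output`.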
So, if we used an associated type, how would the implementation specify the type of the actual value being matched? Consider the situation above, where a single matcher can be matched against both &str and String. Unlike the case with type parameters, we can’t implement the trait twice with two different associated types:
impl Matcher for StringMatcher {
type ActualT = String;
…
}
impl Matcher for StringMatcher { // Error! Implementing Matcher twice for StringMatcher!
type ActualT = &'static str;
…
}
This is disallowed because the compiler needs to pick a trait implementation every time we use StringMatcher as a Matcher. For example:
let x: <StringMatcher as Matcher>::ActualT; // Is this a String or an &str?
So, StringMatcher can't match both &str and String. We need two distinct types:
struct StringMatcherValue;
impl Matcher for StringMatcherValue {
type ActualT = String;
…
}
struct StringMatcherSlice<'a> {…}
impl<'a> Matcher for StringMatcherSlice<'a> {
type ActualT = &'a str;
…
}
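A compilable version of these two types might look like the following. It assumes the simplified trait from above plus a hypothetical `prefix` field. Note that the slice variant carries a lifetime parameter: an associated type cannot use an elided `&str`, and the lifetime in `type ActualT = &'a str` has to be tied to the `Self` type so that it is constrained.

```rust
use std::marker::PhantomData;

trait Matcher {
    type ActualT;
    fn matches(&self, actual: &Self::ActualT) -> bool;
}

// Matches owned strings only.
struct StringMatcherValue {
    prefix: String,
}

impl Matcher for StringMatcherValue {
    type ActualT = String;
    fn matches(&self, actual: &String) -> bool {
        actual.starts_with(self.prefix.as_str())
    }
}

// Matches string slices only. PhantomData mentions 'a in the struct
// so that the impl below can use it in its associated type.
struct StringMatcherSlice<'a> {
    prefix: String,
    _actual: PhantomData<&'a str>,
}

impl<'a> Matcher for StringMatcherSlice<'a> {
    type ActualT = &'a str;
    fn matches(&self, actual: &&'a str) -> bool {
        actual.starts_with(self.prefix.as_str())
    }
}

fn main() {
    let value = StringMatcherValue { prefix: "A".to_string() };
    assert!(value.matches(&String::from("A string")));

    let slice = StringMatcherSlice { prefix: "A".to_string(), _actual: PhantomData };
    assert!(slice.matches(&"A string"));
}
```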
More succinctly, we can make StringMatcher itself generic with respect to the type it matches:
struct StringMatcher<ActualT> {…}
impl<ActualT> Matcher for StringMatcher<ActualT> {
type ActualT = ActualT;
…
}
In other words, the type of the matcher itself, rather than its implementation of the Matcher trait, encodes the type of the value it matches.
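One way to make the generic version compile is sketched below. It assumes `std::marker::PhantomData` to mention `ActualT` in the struct, and an `AsRef<str>` bound so that a single `matches` body serves both `String` and `&str`; it is an illustration, not GoogleTest's actual `StrMatcher`.

```rust
use std::marker::PhantomData;

trait Matcher {
    type ActualT;
    fn matches(&self, actual: &Self::ActualT) -> bool;
}

// The matched type is a parameter of the matcher type itself.
struct StringMatcher<ActualT> {
    prefix: String,
    // Records ActualT without storing a value of that type.
    _actual: PhantomData<ActualT>,
}

fn starts_with<ActualT>(prefix: &str) -> StringMatcher<ActualT> {
    StringMatcher { prefix: prefix.to_string(), _actual: PhantomData }
}

// A single impl: each instantiation StringMatcher<X> gets ActualT = X.
impl<ActualT: AsRef<str>> Matcher for StringMatcher<ActualT> {
    type ActualT = ActualT;
    fn matches(&self, actual: &ActualT) -> bool {
        actual.as_ref().starts_with(self.prefix.as_str())
    }
}

fn main() {
    // Two distinct types, each with exactly one Matcher implementation.
    let for_owned: StringMatcher<String> = starts_with("A");
    let for_slice: StringMatcher<&str> = starts_with("A");
    assert!(for_owned.matches(&String::from("A string")));
    assert!(for_slice.matches(&"A string"));
}
```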
How this change fixes the ambiguity
Adding the constraint that concrete matchers must specify the concrete type they match solves the problem we had with and() and string matching. Let’s look at (roughly) the new definition of and():
trait Matcher {
type ActualT;
…
fn and(self, right: impl Matcher<ActualT = Self::ActualT>) ->
impl Matcher<ActualT = Self::ActualT> {…}
}
This works with strings as one would expect:
verify_that!("A string", starts_with("A").and(ends_with("string")))
// ActualT is &str because that is the type on which the final matcher is used
verify_that!(String::from("A string"), starts_with("A").and(ends_with("string")))
// ActualT is String because that is the type on which the final matcher is used
Rust’s type inference is powerful enough to infer ActualT from how the matcher is used. The matcher starts_with("A").and(ends_with("string")) is matched against a &str in the first example above and against a String in the second. Each usage constrains what ActualT must be for the program to type-check, and the type constraints in the definition of and() carry the inferred type for ActualT through each of the constituent matchers.
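The whole mechanism fits in a small, self-contained sketch. It is not GoogleTest's implementation: here and() returns a concrete, hypothetical `And` struct instead of `impl Matcher`, `StrMatcher` is a simplified stand-in, and `verify_that` is a plain function in place of the macro. `ActualT` is never written down; the value passed to `verify_that` fixes it, and the `ActualT = Self::ActualT` bound propagates it to both halves of the conjunction.

```rust
use std::marker::PhantomData;

trait Matcher {
    type ActualT;
    fn matches(&self, actual: &Self::ActualT) -> bool;

    // Both halves must agree on ActualT.
    fn and<R>(self, right: R) -> And<Self, R>
    where
        Self: Sized,
        R: Matcher<ActualT = Self::ActualT>,
    {
        And { left: self, right }
    }
}

struct And<L, R> {
    left: L,
    right: R,
}

impl<L, R> Matcher for And<L, R>
where
    L: Matcher,
    R: Matcher<ActualT = L::ActualT>,
{
    type ActualT = L::ActualT;
    fn matches(&self, actual: &L::ActualT) -> bool {
        self.left.matches(actual) && self.right.matches(actual)
    }
}

enum StrOp {
    StartsWith,
    EndsWith,
}

// Simplified string matcher, generic over the type it matches.
struct StrMatcher<A> {
    op: StrOp,
    expected: String,
    _actual: PhantomData<A>,
}

impl<A: AsRef<str>> Matcher for StrMatcher<A> {
    type ActualT = A;
    fn matches(&self, actual: &A) -> bool {
        let s = actual.as_ref();
        match self.op {
            StrOp::StartsWith => s.starts_with(self.expected.as_str()),
            StrOp::EndsWith => s.ends_with(self.expected.as_str()),
        }
    }
}

fn starts_with<A>(prefix: &str) -> StrMatcher<A> {
    StrMatcher { op: StrOp::StartsWith, expected: prefix.to_string(), _actual: PhantomData }
}

fn ends_with<A>(suffix: &str) -> StrMatcher<A> {
    StrMatcher { op: StrOp::EndsWith, expected: suffix.to_string(), _actual: PhantomData }
}

// Plain-function stand-in for verify_that!.
fn verify_that<M: Matcher>(actual: M::ActualT, matcher: M) -> bool {
    matcher.matches(&actual)
}

fn main() {
    // ActualT is inferred as &str here...
    assert!(verify_that("A string", starts_with("A").and(ends_with("string"))));
    // ...and as String here, from the same matcher expression.
    assert!(verify_that(String::from("A string"), starts_with("A").and(ends_with("string"))));
}
```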
Importantly, each concrete matcher type has exactly one implementation of the trait Matcher, and therefore exactly one implementation of the and() method. The Rust compiler never has to choose between two trait implementations, as it did with the generic type parameter. Type inference can work out ActualT from how a value is used, but it cannot choose between two trait implementations just by looking at how the value returned by one of them is used.
What can we say in general?
The above case feels quite special. What can we say in general about which option to choose?
The Rust Book as well as Rust by Example both frame the choice to use associated types instead of generic type parameters in terms of ergonomics. With generic type parameters, a type may implement the trait more than once, but the trait user must specify the parameters. With associated types, each type can implement the trait only once, but the name of the trait does not include the associated type. The example above shows that there is more to this difference: the wrong choice can create surprising limitations on the use of your API.
A quick look at the Rust standard library shows a general preference for associated types in traits. Common traits like Iterator (with its associated type Item) and Future (with Output) use associated types rather than type parameters.
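For instance, because `Iterator` declares `type Item`, each iterator type yields exactly one item type, and generic code can refer to it as `I::Item` without the caller spelling it out. The `Countdown` type below is made up for illustration.

```rust
// A made-up iterator counting down from n to 1.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32; // fixed by the implementation, not chosen by the caller
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0 + 1)
        }
    }
}

// Generic code names the item type through the associated type.
fn first_item<I: Iterator>(mut iter: I) -> Option<I::Item> {
    iter.next()
}

fn main() {
    assert_eq!(Countdown(3).collect::<Vec<_>>(), vec![3, 2, 1]);
    assert_eq!(first_item(Countdown(5)), Some(5));
}
```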
There are some cases where generic type parameters are necessary:
- A single type must implement the same generic trait several times, with different type arguments. For example, with the From and TryFrom traits, one may want to convert many different types into a single type, so one implements the trait once for each possible source type.
- The associated type would have to carry additional information, such as a lifetime. The recently introduced generic associated types can help, but they still have some limitations. A use case that runs into one of these limitations may require the use of type parameters.
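The `From` case can be sketched as follows: one target type implements `From` once per source type, which is only possible because the source type is a generic parameter of the trait. `Label` is a hypothetical type used only for this example; the standard library's blanket `Into` implementation then makes `.into()` available for each source type.

```rust
#[derive(Debug, PartialEq)]
struct Label(String);

// Same trait, two different type arguments, one target type.
impl From<&str> for Label {
    fn from(s: &str) -> Label {
        Label(s.to_string())
    }
}

impl From<u32> for Label {
    fn from(n: u32) -> Label {
        Label(format!("#{n}"))
    }
}

fn main() {
    // The source type selects the impl.
    let a: Label = "A string".into();
    let b = Label::from(7u32);
    assert_eq!(a, Label("A string".to_string()));
    assert_eq!(b, Label("#7".to_string()));
}
```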
As a general rule, it's best to use the least powerful feature of a language to achieve your goal. Thus I would recommend using associated types by default. Only if you need the extra flexibility should you use generic type parameters.
Conclusion
The decision of whether to use a type parameter or an associated type is important. I had naively made the decision to use a type parameter early on, and this turned out to lead to a surprising limitation in the library. Changing course after the fact was fairly expensive. Understanding conceptually what is behind the difference may help in making an informed decision. But when in doubt, use associated types.