Phil Booth

Existing by coincidence, programming deliberately

Refactoring with Rust macros

Refactoring boilerplate code is usually easy in dynamically-typed languages, but it can take a bit more effort when you're constrained by a strict static type system. This is something I was puzzling over recently, when the penny dropped for me about how Rust's macros can be used to bridge the gap.

If you have control over all of the code in question, macros probably aren't needed of course. Some combination of generics, traits and enums would typically provide a better (and more readable) solution. But there are times when the types involved are out of your control and that is a niche which macros can thrive in.

Here is a basic example I encountered last week. Actix-web exports a TestServer struct that helps you test your endpoints. TestServer::new expects a configuration function that takes one argument of type TestApp:

TestServer::new(|app| {
    // Set up routes, resources and middleware
    // by calling methods on `app`...
})

For the production code there is a similar App type, which has methods with the same signatures as the ones on TestApp. Since the routes you want to test are likely the same as the routes on your production server, a natural next step might be to try and write a common route-setup function that works in both contexts. Unfortunately though, TestApp and App are not related. Those "common" methods aren't inherited from some shared trait; they're defined independently on each struct.

So in order to write a function that sets routes on either App or TestApp, you'd have to wrap them in an enum and write code to manually forward all of the method calls to the inner structs. Looking up the associated type information to get those definitions right is tedious busywork, but a simple macro allows you to skip it instead:

macro_rules! init_routes {
    ($app:expr) => {
        // All the initialisation code stays the same,
        // methods are just called on `$app` instead...
    }
}

Now you can invoke the macro from both your test setup and the production code, without it needing to know anything about the type information for $app:

TestServer::new(|app| {
    init_routes!(app);
})

The compiler will concern itself with type-checking later, after the macro has been expanded. All the macro cares about is whether its match arm is satisfied.
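To see that none of this depends on actix itself, here is a minimal sketch that compiles without it. The App and TestApp structs below are hypothetical stand-ins rather than the real actix-web types, and route is an assumed method name chosen purely for illustration. The only thing that matters is that two unrelated types expose a method with the same name and signature:

```rust
// Hypothetical stand-ins for actix-web's `App` and `TestApp`: two unrelated
// types that happen to expose a method with the same name and signature.
#[derive(Debug, Default)]
pub struct App {
    pub routes: Vec<String>,
}

#[derive(Debug, Default)]
pub struct TestApp {
    pub routes: Vec<String>,
}

impl App {
    pub fn route(&mut self, path: &str) -> &mut Self {
        self.routes.push(path.to_string());
        self
    }
}

impl TestApp {
    pub fn route(&mut self, path: &str) -> &mut Self {
        self.routes.push(path.to_string());
        self
    }
}

// The macro only pastes method calls onto whatever expression it is given.
// Type-checking happens after expansion, separately at each call site.
macro_rules! init_routes {
    ($app:expr) => {
        $app.route("/users").route("/health");
    };
}

// "Production" setup: `init_routes!` expands against `App`.
pub fn production_routes() -> Vec<String> {
    let mut app = App::default();
    init_routes!(app);
    app.routes
}

// "Test" setup: the same macro expands against the unrelated `TestApp`.
pub fn test_routes() -> Vec<String> {
    let mut app = TestApp::default();
    init_routes!(app);
    app.routes
}
```

If one of the types lacked a route method, the error would only show up when the macro is expanded for that type, which is the "type-checking later" behaviour described above.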

Refactoring types

Macros can also be used to eliminate duplication where there is no executable code involved at all, only type information.

For instance, staying with actix, let's say you have an actor that handles some messages:

pub struct MyActor {
    // Whatever state your actor needs...
}

impl Handler<Foo> for MyActor {
    type Result = Result<HashMap<String, String>, Error>;

    fn handle(&mut self, msg: Foo, context: &mut Self::Context) -> Self::Result {
        let mut result = HashMap::new();

        // Populate `result` somehow...

        Ok(result)
    }
}

impl Handler<Bar> for MyActor {
    type Result = Result<bool, Error>;

    fn handle(&mut self, msg: Bar, context: &mut Self::Context) -> Self::Result {
        let mut result = false;

        // ...

        Ok(result)
    }
}

The code to define the Foo and Bar messages might look something like this:

pub struct Foo {
    pub id: String,
    pub foo: String,
}

impl Message for Foo {
    type Result = <MyActor as Handler<Foo>>::Result;
}

pub struct Bar {
    pub id: String,
    pub bar: String,
}

impl Message for Bar {
    type Result = <MyActor as Handler<Bar>>::Result;
}

And that pattern might be repeated many more times for other message types too. Conventional refactoring is immediately off the agenda here because we only have types, properties and the Message trait to work with. But that's all meat and drink for a macro:

macro_rules! message {
    ($message:ident {$($property:ident: $type:ty),*}) => {
        pub struct $message {
            $(pub $property: $type,)*
        }

        impl Message for $message {
            type Result = <MyActor as Handler<$message>>::Result;
        }
    }
}

With that in place your message boilerplate now looks like this:

message!(Foo {
    id: String,
    foo: String
});

message!(Bar {
    id: String,
    bar: String
});

An important new piece of syntax here is the pair of $( and ),* wrapping the declarations of $property and $type in the match arm of the macro. That denotes repetition and says the enclosed portion of the match can be repeated zero or more times, with a , separating each item. Replacing the * with a + would change that to one or more times, and the , could be swapped for most other single tokens or left out altogether (the fragment types involved restrict which tokens may follow them; a ty, for example, can't be followed by an arbitrary token). But , is a good choice here because it makes the macro more intuitive.
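The repetition rules are easier to see in isolation. The three macros below are hypothetical examples of my own (names!, sum! and count! are not part of actix or the standard library), each using a different combination of separator and repetition operator:

```rust
// `*` with a `,` separator: zero or more comma-separated identifiers.
macro_rules! names {
    ($($name:ident),*) => {
        vec![$(stringify!($name)),*]
    };
}

// `+` with a `;` separator: at least one expression is required,
// so `sum!()` with no arguments would fail to match.
macro_rules! sum {
    ($($value:expr);+) => {
        0 $(+ $value)*
    };
}

// No separator at all: items are simply written one after another.
macro_rules! count {
    ($($name:ident)*) => {
        0usize $(+ { let _ = stringify!($name); 1 })*
    };
}

// Small helpers so the macros can be exercised as ordinary functions.
pub fn field_names() -> Vec<&'static str> {
    names!(id, wibble)
}

pub fn total() -> i32 {
    sum!(1; 2; 3)
}

pub fn how_many() -> usize {
    count!(a b c)
}
```

Note that the separator only appears between items, which is why a trailing comma after the last property would not match the message! macro as written.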

Nested macros

That's all well and good for straightforward cases, but what about when a refactoring has many levels? Perhaps there is a core pattern to be extracted in addition to higher levels that depend on the core? This can also be achieved, although there are a couple of gotchas to be careful of.

Returning to the previous example, you'll have noticed that Foo and Bar share a common id property. If there are no other message types, we could just move id into the macro body inline. But what if there are Baz and Qux messages, which don't have an id property? Nested macros can help you with that:

macro_rules! id_message {
    ($message:ident {$($property:ident: $type:ty),*}) => {
        message!($message {
            id: String
            $(, $property: $type)*
        });
    }
}

id_message!(Foo {
    foo: String
});

id_message!(Bar {
    bar: String
});

message!(Baz {
    baz: String
});

message!(Qux {
    qux: String
});
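For anyone who wants to try this without pulling in actix, the following is a minimal self-contained sketch. The Message trait here is a hypothetical stand-in for the actix one, and the Result type is fixed to bool as a simplifying assumption, since there is no MyActor to look it up from:

```rust
// Hypothetical stand-in for actix's `Message` trait, so the sketch
// compiles without the actix crate.
pub trait Message {
    type Result;
}

macro_rules! message {
    ($message:ident {$($property:ident: $type:ty),*}) => {
        #[derive(Debug, Default)]
        pub struct $message {
            $(pub $property: $type,)*
        }

        impl Message for $message {
            // Assumption: fixed result type, instead of the
            // `<MyActor as Handler<_>>::Result` lookup used in the post.
            type Result = bool;
        }
    }
}

// Prepends the common `id` property, then forwards everything else
// to `message!` unchanged.
macro_rules! id_message {
    ($message:ident {$($property:ident: $type:ty),*}) => {
        message!($message {
            id: String
            $(, $property: $type)*
        });
    }
}

id_message!(Foo {
    foo: String
});

message!(Baz {
    baz: String
});

// `Foo` gained an `id` field from `id_message!`; `Baz` has only `baz`.
pub fn make_foo() -> Foo {
    Foo {
        id: "abc".to_string(),
        foo: "wibble".to_string(),
    }
}
```

The leading comma inside $(, $property: $type)* is what keeps the forwarded list well-formed whether or not any extra properties were supplied.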

Another way of expressing the same thing might have been to try and write a higher-order macro, nesting one macro_rules! directly inside another. But in this case that wouldn't work because nested repetition is ambiguous syntax:

macro_rules! message_macro {
    ($macro:ident {$($common_property:ident: $common_type:ty),*}) => {
        macro_rules! $macro {
            ($message:ident {$($property:ident: $type:ty),*}) => {
                pub struct $message {
                    $(pub $common_property: $common_type,)*
                    $(pub $property: $type,)*
                }

                impl Message for $message {
                    type Result = <MyActor as Handler<$message>>::Result;
                }
            }
        }
    }
}

message_macro!(message {});

message_macro!(id_message {
    id: String
});

Here, it would require some magical thinking to infer that we meant for $($property:ident: $type:ty),* to be interpreted as part of the child macro's match arm rather than the parent macro's body. The compiler very reasonably points this out to us with the following error message:

error: attempted to repeat an expression containing no syntax variables matched as repeating at this depth
 --> src/main.rs:4:31
  |
4 |             ($message:ident {$($property:ident: $type:ty),*}) => {
  |
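For what it's worth, there is a commonly-used workaround for this, although it isn't the approach I took and it does nothing for readability. The idea is to have the caller pass a literal $ token into the outer macro as a tt fragment, so the outer macro can emit the inner macro's dollar signs without trying to interpret them itself. The sketch below assumes a stand-in Message trait with a fixed bool result, and the names $d and $field_type are my own choices for illustration:

```rust
// Hypothetical stand-in for actix's `Message` trait.
pub trait Message {
    type Result;
}

macro_rules! message_macro {
    // The caller binds `$d` to a literal `$` token. Wherever the generated
    // macro needs its own `$`, the outer macro writes `$d` instead.
    ($d:tt $macro:ident {$($common_property:ident: $common_type:ty),*}) => {
        macro_rules! $macro {
            ($d message:ident {$d($d property:ident: $d field_type:ty),*}) => {
                pub struct $d message {
                    // Expanded by the outer macro, once per common property.
                    $(pub $common_property: $common_type,)*
                    // Expanded later by the generated macro.
                    $d(pub $d property: $d field_type,)*
                }

                impl Message for $d message {
                    // Assumption: fixed result type for the sketch.
                    type Result = bool;
                }
            }
        }
    }
}

message_macro!($ message {});

message_macro!($ id_message {
    id: String
});

id_message!(Foo {
    foo: String
});

message!(Baz {
    baz: String
});

// `Foo` carries the common `id` property plus its own `foo`.
pub fn make_foo() -> Foo {
    Foo {
        id: "abc".to_string(),
        foo: "wibble".to_string(),
    }
}
```

Having to write message_macro!($ ...) at every call site is ugly, which is one reason I'd still reach for the simpler id_message! wrapper shown earlier.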

The second obstacle to be wary of with nesting is macro hygiene. Rust's macros are hygienic, which means each expansion gets its own syntax context for the local variables it introduces. An identifier declared inside one macro's body can't be referenced by name from code written elsewhere, including the body of another macro that invokes it. Instead, you must pass the identifier in from outside. Even if the expansion looks like it would make sense when rolled out manually, the compiler will still complain.

More concretely, consider the following macros for defining route handlers with actix-web:

macro_rules! endpoint {
    ($handler:ident: $dispatcher:ident ($path_type:ty) {$($property:ident: $value:expr),*}) => {
        pub fn $handler(
            (path, state): (Path<$path_type>, State<ServerState>),
        ) -> FutureResponse<HttpResponse> {
            state
                .actor
                .send($dispatcher {
                    $($property: $value),*
                })
                .from_err()
                .and_then(|res| match res {
                    Ok(body) => Ok(HttpResponse::Ok().json(body)),
                    Err(_) => Ok(HttpResponse::InternalServerError().into()),
                })
                .responder()
        }
    }
}

macro_rules! uid_endpoint {
    ($handler:ident: $dispatcher:ident) => {
        endpoint! {
            $handler: $dispatcher (UidParam) {
                uid: path.uid.clone()
            }
        }
    }
}

uid_endpoint! would fail compilation here with:

error[E0425]: cannot find value `path` in this scope

Even though we know path exists because endpoint! always declares it, we still have to pass the identifier in so that uid_endpoint! is allowed to reference it:

macro_rules! endpoint {
    ($handler:ident: $dispatcher:ident ($path:ident: $path_type:ty) {$($property:ident: $value:expr),*}) => {
        pub fn $handler(
            ($path, state): (Path<$path_type>, State<ServerState>),
        ) -> FutureResponse<HttpResponse> {
            state
                .actor
                .send($dispatcher {
                    $($property: $value),*
                })
                .from_err()
                .and_then(|res| match res {
                    Ok(body) => Ok(HttpResponse::Ok().json(body)),
                    Err(_) => Ok(HttpResponse::InternalServerError().into()),
                })
                .responder()
        }
    }
}

macro_rules! uid_endpoint {
    ($handler:ident: $dispatcher:ident) => {
        endpoint! {
            $handler: $dispatcher (path: UidParam) {
                uid: path.uid.clone()
            }
        }
    }
}
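The same rule can be reproduced without actix-web. In this minimal sketch, with_binding!, doubled! and hidden_binding! are hypothetical macros invented for the example. The first two mirror the working endpoint! and uid_endpoint! pair, and the third mirrors the version that failed:

```rust
// The caller supplies the identifier, so the `let` binding and any use of
// it inside `$body` share the caller's syntax context.
macro_rules! with_binding {
    ($name:ident = $init:expr; $body:expr) => {{
        let $name = $init;
        $body
    }};
}

// A layer on top, in the same spirit as `uid_endpoint!`: it picks the name
// `value` itself and hands that identifier down, so its own reference to
// `value` resolves.
macro_rules! doubled {
    ($init:expr) => {
        with_binding!(value = $init; value * 2)
    };
}

// The unhygienic-looking variant hard-codes the binding name. Defining it is
// fine, but `hidden_binding!(21; value * 2)` would be rejected with
// "cannot find value `value` in this scope", because the `value` written by
// the caller is not the `value` declared in here.
#[allow(unused_macros)]
macro_rules! hidden_binding {
    ($init:expr; $body:expr) => {{
        let value = $init;
        $body
    }};
}

pub fn answer() -> i32 {
    doubled!(21)
}
```

Calling with_binding! directly works for the same reason: in with_binding!(x = 5; x + 1), both occurrences of x come from the call site.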

Macro debugging

Something that's obvious from the examples in this post is that macros can get pretty grawlixy and hard to read at times. If the compiler is complaining about some code in one of your macros and you're struggling to identify the problem, it can be helpful to look at the rolled-out macro expansions. There is a compiler flag that lets you print them from the command line:

rustc -Z unstable-options --pretty expanded <MODULE PATH>

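Where <MODULE PATH> is the path to the source module containing the macro you want to print. The flag is unstable, so it needs a nightly toolchain. On more recent nightlies the --pretty option has been removed and the equivalent is spelled -Zunpretty=expanded. The third-party cargo-expand subcommand wraps the same functionality if you'd rather not invoke rustc directly.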

A general rule of thumb I've found helpful is to try and keep lower-level macros as simple as you can, limiting all repetition to just the outermost macros where possible.