One project I’ve been working on for a while now is WhatYouMean - A terminal interface to see word definitions. I’m reasonably proud of it, even having it linked on the homepage, but it’s really nothing special. Just a pretty simple thing that nobody but me really uses that I for some reason ended up really liking. It was only started as an “oooo, that seems like it’d be fun to do” project rather than to solve any actual problem I had - normally googling a definition is basically no hassle at all, especially when I’ve got my phone available - but alas, it ended up becoming something I’ve worked on repeatedly for nearly a year. But enough about that, let’s get into the actual story.

The Basics

About 1 month ago at the time of writing this (early Dec 2022 if you’re too lazy to calculate yourself), I decided that I’d change the API that WYM uses to Wordnik. It’s used by quite a few applications that people actually use (DuckDuckGo’s definition embeds for example) and it has a free tier, so why not give it a shot? For some reason, I also decided that I wanted to have an API key embedded into the application itself so people didn’t need their own key. Not the best of ideas in hindsight, but there’s no real harm in it, so I gave it a shot.

Initially, the code for selecting an API key to use looked like this:

let key = &args.use_key.unwrap_or(std::env::var("WORDNIK_API_KEY")?);

It may be just a one-liner, but I’ll go over it for those unfamiliar with Rust syntax. let key begins an assignment to the variable key. args.use_key accesses the use_key field on the args variable, which is a struct (called Args because I’m lazy at naming things). unwrap_or is a method on Option<T> (an enum with 2 variants, being Some(T) & None) that either gives you the T in Some(T) or gives you the value passed to it, which must also be of type T. T here is used as a stand-in for any type, such as a String. std::env::var("WORDNIK_API_KEY") looks for an environment variable at runtime called WORDNIK_API_KEY & returns it, or returns an error if it’s missing or isn’t valid Unicode. The ? automatically returns from the outer function (in this case main) with that error, or gives you the value. This is necessary because the return type of std::env::var is Result<String, VarError>, but unwrap_or needs its argument to be of type T. One catch: the argument to unwrap_or is evaluated before unwrap_or itself runs, so the environment lookup (and its ?) happens even when the user supplied a key. All that is a really roundabout way of saying that this uses the value the user supplies as an API key, falling back to the one on their system.
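To make the evaluation order of unwrap_or concrete, here is a minimal sketch. The helper names (missing_var, pick_key_eager, pick_key_lazy) are made up for illustration and aren't part of WYM; missing_var stands in for std::env::var when the variable isn't set, so the example doesn't depend on your environment.

```rust
use std::env::VarError;

/// Stand-in for `std::env::var("WORDNIK_API_KEY")` when the variable is unset.
fn missing_var() -> Result<String, VarError> {
    Err(VarError::NotPresent)
}

/// Mirrors the one-liner: the lookup is an argument to `unwrap_or`,
/// so it (and the `?`) runs before `unwrap_or` is even called.
fn pick_key_eager(
    cli_key: Option<String>,
    lookup: fn() -> Result<String, VarError>,
) -> Result<String, VarError> {
    Ok(cli_key.unwrap_or(lookup()?))
}

/// Only performs the lookup when no key was passed by the user.
fn pick_key_lazy(
    cli_key: Option<String>,
    lookup: fn() -> Result<String, VarError>,
) -> Result<String, VarError> {
    match cli_key {
        Some(key) => Ok(key),
        None => lookup(),
    }
}

fn main() {
    let cli = Some(String::from("key-from-flag"));
    // The eager version fails even though a key was supplied.
    println!("{:?}", pick_key_eager(cli.clone(), missing_var));
    // The lazy version never touches the lookup.
    println!("{:?}", pick_key_lazy(cli, missing_var));
}
```

The lazy shape is essentially what an if let chain gives you: each fallback is only tried once the previous one has come up empty.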

Did you really have to say all that?

Nah, but it’s nice to know, right?

This line later became the following:

let key = if let Some(key) = args.use_key {
    key
} else if let Ok(key) = std::env::var("WORDNIK_API_KEY") {
    key
} else {
    // Using `.into()` because `include_str!` yields a `&'static str`, not a `String`
    include_str!("../api_key.txt").into()
};

This is essentially doing the same thing, but with different syntax due to needing 3 branches instead of just 2. The if let Some(key) = ... & if let Ok(key) = ... essentially do the same thing. They check whether the given enum is the right variant, then take the value out of it & let you do stuff with it. include_str is a macro. This means that it runs at compile-time instead of at runtime, so the string it returns is stored within the binary itself - meaning I can avoid putting the API key directly into the source code.
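For comparison, the same three-way fallback can be sketched with Option combinators instead of an if let chain. select_key is a hypothetical helper (WYM does this inline in main), and the embedded key is passed in as a plain argument so the sketch doesn't need an api_key.txt to compile.

```rust
/// Hypothetical helper: user-supplied key first, then the environment,
/// then whatever was baked into the binary.
fn select_key(
    cli_key: Option<String>,
    env_key: Option<String>,
    embedded: &'static str,
) -> String {
    cli_key
        .or(env_key)
        // Only allocate the embedded fallback if both of the others were `None`.
        .unwrap_or_else(|| embedded.to_owned())
}

fn main() {
    // `.ok()` turns the `Result` from `std::env::var` into an `Option`.
    let env_key = std::env::var("WORDNIK_API_KEY").ok();
    let key = select_key(None, env_key, "embedded-placeholder");
    println!("using a key of length {}", key.len());
}
```

Whether this reads better than the if let version is a matter of taste; the behaviour is the same.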

Grave Mistake

It was at this point that I became a little bit stupid. Despite this method working perfectly fine, I for some reason wanted to change it. Rather than keeping the text file I’d simply added to my .gitignore & leaving it be, I decided to change it to read the API key from an environment variable on my machine at compile time.

// ...
    } else {
        // `std::env!` is (basically) just `include_str!`
        // but for environment variables instead of files
        std::env!("WORDNIK_API_KEY").into()
    }

This seems like a decent enough idea… until you realise that not everyone is on my machine. Prebuilt binaries work perfectly fine due to them being built on my machine, but the moment someone tried to install it via cargo[1], the build would fail if they didn’t already have a WORDNIK_API_KEY environment variable.

This was unacceptable. The whole point of doing this was so people didn’t need that environment variable set in order to use the program.
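For what it’s worth, the standard library also has option_env!, which reads an environment variable at compile time like env! but yields an Option instead of failing the build. This isn’t what I ended up doing; it’s just a minimal sketch of how that could look, with embedded_key being a made-up helper name.

```rust
/// `Some(key)` if WORDNIK_API_KEY was set while compiling, `None` otherwise.
/// Unlike `env!`, a missing variable is not a compile error.
const EMBEDDED_KEY: Option<&'static str> = option_env!("WORDNIK_API_KEY");

/// Hypothetical helper: turns a missing embedded key into a runtime error message.
fn embedded_key() -> Result<&'static str, String> {
    EMBEDDED_KEY.ok_or_else(|| {
        String::from("no API key was embedded at build time; set WORDNIK_API_KEY or pass a key explicitly")
    })
}

fn main() {
    match embedded_key() {
        Ok(key) => println!("embedded key has {} characters", key.len()),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

The trade-off is that the problem moves from build time to runtime: the install succeeds, but anyone who built it without the variable still needs their own key.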

You could always just go back to the way it was before. There was nothing wrong with that ¯\_(ツ)_/¯

Technically yeah, but I much preferred to find a different solution. Perhaps even a crate[2] that could help me? 🤔🤔🤔

Quit stalling, we haven’t got time to waste

Alright, fine. I decided that I’d store the API_KEY in a .env file (also added to the .gitignore) & use the dotenvy_macro crate to read it at compile time. With that, the relevant code was now this:

use dotenvy_macro::dotenv;
/* ... Skipping many lines ... */
    } else {
        dotenv!("API_KEY").into()
    }

This only made the problem worse. Much worse.

Whereas before, the user could at least fix the issue by themselves if they read the error message, this one couldn’t be rectified. This made the program completely impossible to install using cargo, which is very bad. So, as soon as I found out that this was happening, I yanked[3] the offending versions from crates.io & speed-released a fix (it wasn’t really a fix since I just removed the functionality altogether, but this is my blog post & I’ll call it what I want).

How did it take you 3 whole versions to realise that???

Interlude - Realisation

The way I realised it is pretty weird actually. It wasn’t from a bug report or anything like that (nobody even uses WYM, so there’s nobody to send one); it was because I tried installing it on my phone. Android devices can use Rust tools through the rust package in Termux, so I decided to see how well my program would run, if at all. I only learnt of the issue because it failed to compile when I tried to cargo install it, with an error message showing that it wasn’t just because I was on a mobile device.

And now, back to our regularly scheduled rambling

Inspecting the Macros

Let’s back up for a moment. Earlier, I showed you this code:

// ...
    } else {
        include_str!("../api_key.txt").into()
    }

This actually does work, even though api_key.txt isn’t part of the package file generated by cargo (check it yourself by cloning the repo, git checkouting the v3.0.0 tag, & running cargo package --list), which is where cargo install pulls from (you can try it yourself by running cargo install whatyoumean@3.0.0). This is weird because the dotenv! version errors because it can’t find the .env file anywhere, which makes sense because, as far as crates.io is concerned, it doesn’t exist. Exactly like how api_key.txt doesn’t exist. So why does include_str! work but dotenv! doesn’t?

From looking at the source code, we find that dotenv! is defined in dotenvy_codegen_impl. If we follow the yellow brick road, we find

// In `dotenvy_codegen_impl/lib.rs`
#[proc_macro_hack]
pub fn dotenv(input: TokenStream) -> TokenStream {
    if let Err(err) = dotenvy::dotenv() {
        let msg = format!("Error loading .env file: {}", err);
        return quote! {
            compile_error!(#msg);
        }
        .into();
    }

    match expand_env(input) {
        Ok(stream) => stream,
        Err(e) => e.to_compile_error().into(),
    }
}

This is surprisingly simple. It calls dotenvy::dotenv, which looks for a .env file in the current folder or one of its parent folders & loads its variables into the environment, bailing out with a compile error if that fails. Otherwise, it returns the result of expand_env. expand_env is just a line below:

29fn expand_env(input_raw: TokenStream) -> syn::Result<TokenStream> {
30    let args = <Punctuated<syn::LitStr, Token![,]>>::parse_terminated
31        .parse(input_raw)
32        .expect("expected macro to be called with a comma-separated list of string literals");
33
34    let mut iter = args.iter();
35
36    let var_name = iter
37        .next()
38        .ok_or_else(|| syn::Error::new(args.span(), "dotenv! takes 1 or 2 arguments"))?
39        .value();
40    let err_msg = iter.next();
41
42    if iter.next().is_some() {
43        return Err(syn::Error::new(
44            args.span(),
45            "dotenv! takes 1 or 2 arguments",
46        ));
47    }
48
49    match env::var(&var_name) {
50        Ok(val) => Ok(quote!(#val).into()),
51        Err(e) => Err(syn::Error::new(
52            var_name.span(),
53            err_msg.map_or_else(
54                || match e {
55                    VarError::NotPresent => {
56                        format!("environment variable `{}` not defined", var_name)
57                    }
58                    VarError::NotUnicode(s) => format!(
59                        "environment variable `{}` was not valid unicode: {:?}",
60                        var_name, s
61                    ),
62                },
63                |lit| lit.value(),
64            ),
65        )),
66    }
67}

This has quite a bit more macro shenanigans, but the point is still not all that complicated. Lines 30-32 ensure that the macro is called with the correct arguments (string literals, which have the type &str), lines 34-47 make sure there are the right number of arguments, then lines 49 onwards grab the actual environment variable. This explains the failure pretty well. We don’t even need to look at dotenvy::dotenv to know that if it can’t find a .env file anywhere, it’ll return a compile error & the build will fail.
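Stripped of the proc-macro machinery, the control flow of those two functions boils down to something like the sketch below. It’s a simplified model rather than the real implementation: dotenv_model is a made-up name, the .env file is represented by an optional HashMap, and the real macro reads from the process environment after loading the file rather than from the file directly.

```rust
use std::collections::HashMap;

/// Simplified model of `dotenv!`. `env_file` is `None` when no `.env` file can be found.
fn dotenv_model(
    env_file: Option<&HashMap<&str, &str>>,
    var_name: &str,
) -> Result<String, String> {
    // Step 1 (like the `dotenvy::dotenv()` check): a missing file is an
    // error before any variable is looked up.
    let vars = env_file.ok_or_else(|| String::from("Error loading .env file"))?;

    // Step 2 (like the end of `expand_env`): look the variable up, or report it missing.
    vars.get(var_name)
        .map(|value| value.to_string())
        .ok_or_else(|| format!("environment variable `{}` not defined", var_name))
}

fn main() {
    let mut file = HashMap::new();
    file.insert("API_KEY", "placeholder");

    println!("{:?}", dotenv_model(Some(&file), "API_KEY"));
    // No file at all: this is the situation `cargo install` ends up in.
    println!("{:?}", dotenv_model(None, "API_KEY"));
}
```

The important part is step 1: without the file, the macro never gets as far as looking for the variable.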

But then why doesn’t the same happen for include_str!?

Well, if we look at the source code for include_str!

    #[stable(feature = "rust1", since = "1.0.0")]
    #[rustc_builtin_macro]
    #[macro_export]
    #[cfg_attr(not(test), rustc_diagnostic_item = "include_str_macro")]
    macro_rules! include_str {
        ($file:expr $(,)?) => {{ /* compiler built-in */ }};
    }

Ah. So it works because it’s a compiler builtin.

So basically it cheats

Well, I wouldn’t put it that way. More like “we need to do some more digging to see what it’s actually doing”.

Inspecting the compiler

The rustc source code is… complicated, but after some digging, you can find include_str here

/// `include_str!`: read the given file, insert it as a literal string expr
pub fn expand_include_str(
    cx: &mut ExtCtxt<'_>,
    sp: Span,
    tts: TokenStream,
) -> Box<dyn base::MacResult + 'static> {
    let sp = cx.with_def_site_ctxt(sp);
    let Some(file) = get_single_str_from_tts(cx, sp, tts, "include_str!") else {
        return DummyResult::any(sp);
    };
    let file = match resolve_path(&cx.sess.parse_sess, file.as_str(), sp) {
        Ok(f) => f,
        Err(mut err) => {
            err.emit();
            return DummyResult::any(sp);
        }
    };
    match cx.source_map().load_binary_file(&file) {
        Ok(bytes) => match std::str::from_utf8(&bytes) {
            Ok(src) => {
                let interned_src = Symbol::intern(&src);
                base::MacEager::expr(cx.expr_str(sp, interned_src))
            }
            Err(_) => {
                cx.span_err(sp, &format!("{} wasn't a utf-8 file", file.display()));
                DummyResult::any(sp)
            }
        },
        Err(e) => {
            cx.span_err(sp, &format!("couldn't read {}: {}", file.display(), e));
            DummyResult::any(sp)
        }
    }
}

Hmm. Interesting. Very indicative.

You have no idea what’s going on, do you?

Of course I do! It reads a file, checks that the bytes are valid UTF-8, then returns them as a string literal, emitting an error if the file can’t be read or isn’t valid UTF-8.
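As a rough runtime analogue (the compiler does this during macro expansion, not when your program runs), the same read-then-validate shape looks something like this. The function names are made up; only the two error messages are borrowed from the rustc snippet above.

```rust
use std::fs;
use std::path::Path;

/// Turns raw bytes into text, reusing the error wording from `expand_include_str`.
fn bytes_to_text(bytes: Vec<u8>, path: &Path) -> Result<String, String> {
    String::from_utf8(bytes).map_err(|_| format!("{} wasn't a utf-8 file", path.display()))
}

/// Reads the whole file, then validates it as UTF-8.
fn include_str_at_runtime(path: &Path) -> Result<String, String> {
    let bytes = fs::read(path)
        .map_err(|e| format!("couldn't read {}: {}", path.display(), e))?;
    bytes_to_text(bytes, path)
}

fn main() {
    match include_str_at_runtime(Path::new("api_key.txt")) {
        Ok(text) => println!("read {} bytes", text.len()),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

Running it in a folder without an api_key.txt takes the "couldn't read" branch, which is the same branch the compiler would report as a build error.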

And why exactly does this mean it works where dotenv! doesn’t?

Alright, I don’t know the answer to that, but I’m betting it has something to do with resolve_path. Or maybe the ExtCtxt that gets passed in? That source_map method feels promising.

// in `compiler/rustc_expand/src/base.rs`
pub fn source_map(&self) -> &'a SourceMap {
    self.sess.parse_sess.source_map()
}

SourceMap feels like it might have something… let’s take a look

// In `compiler/rustc_span/src/source_map.rs`
pub struct SourceMap {
    /// The address space below this value is currently used by the files in the source map.
    used_address_space: AtomicU32,

    files: RwLock<SourceMapFiles>,
    file_loader: Box<dyn FileLoader + Sync + Send>,
    // This is used to apply the file path remapping as specified via
    // `--remap-path-prefix` to all `SourceFile`s allocated within this `SourceMap`.
    path_mapping: FilePathMapping,

    /// The algorithm used for hashing the contents of each source file.
    hash_kind: SourceFileHashAlgorithm,
}

And its method load_binary_file

    /// Loads source file as a binary blob.
    ///
    /// Unlike `load_file`, guarantees that no normalization like BOM-removal
    /// takes place.
    pub fn load_binary_file(&self, path: &Path) -> io::Result<Vec<u8>> {
        // Ideally, this should use `self.file_loader`, but it can't
        // deal with binary files yet.
        let bytes = fs::read(path)?;

        // We need to add file to the `SourceMap`, so that it is present
        // in dep-info. There's also an edge case that file might be both
        // loaded as a binary via `include_bytes!` and as proper `SourceFile`
        // via `mod`, so we try to use real file contents and not just an
        // empty string.
        let text = std::str::from_utf8(&bytes).unwrap_or("").to_string();
        self.new_source_file(path.to_owned().into(), text);
        Ok(bytes)
    }

Hmm… this gave less info than I thought. Let’s go back to SourceMap itself. Specifically, its files field.

pub struct SourceMap {
    // Other fields omitted
    files: RwLock<SourceMapFiles>
}

RwLock stands for “Read-Write Lock”. It lets any number of readers access the SourceMapFiles at once, but only one writer at a time (& no readers while that writer holds it). Shenanigans could ensue without it. So, let’s look at SourceMapFiles

// Literally directly above `SourceMap`
#[derive(Default)]
pub(super) struct SourceMapFiles {
    source_files: monotonic::MonotonicVec<Lrc<SourceFile>>,
    stable_id_to_source_file: FxHashMap<StableSourceFileId, Lrc<SourceFile>>,
}

Hmm, so it stores a MonotonicVec of Lrc<SourceFile>s.

“Monotonic” means it either always increases or always decreases - in this case, only increasing in size. I had to look it up too.
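To get a feel for that combination (a grow-only list sitting behind a read-write lock), here’s a small sketch using only the standard library. AppendOnlyVec & register are hypothetical names made up for this post; this is not how rustc’s MonotonicVec is actually implemented, just the same idea in miniature.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical append-only list: it exposes `push` & `get`,
/// but nothing that removes or reorders items.
struct AppendOnlyVec<T> {
    items: Vec<T>,
}

impl<T> AppendOnlyVec<T> {
    fn new() -> Self {
        Self { items: Vec::new() }
    }

    /// Adds an item & returns the index it will keep forever.
    fn push(&mut self, item: T) -> usize {
        self.items.push(item);
        self.items.len() - 1
    }

    fn get(&self, index: usize) -> Option<&T> {
        self.items.get(index)
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

/// Registers a file name, holding the write lock only while pushing.
fn register(files: &RwLock<AppendOnlyVec<Arc<String>>>, name: &str) -> usize {
    let mut guard = files.write().unwrap();
    guard.push(Arc::new(name.to_owned()))
}

fn main() {
    let files = RwLock::new(AppendOnlyVec::new());
    let index = register(&files, "src/main.rs");

    // Readers on other threads could hold this lock at the same time as us.
    let reader = files.read().unwrap();
    println!("{} file(s); entry {} is {:?}", reader.len(), index, reader.get(index));
}
```

Because nothing is ever removed, an index handed out by push stays valid for as long as the list exists, which is handy when lots of other data refers to files by position.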

Lrc looks like it might be similar to Arc, which is an “Atomically Reference Counted” value - a thread-safe version of Rc. But what does the “L” mean?

Lrc is an alias of Arc if cfg!(parallel_compiler) is true, Rc otherwise

  • Line 3 of compiler/rustc_data_structures/sync.rs
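In code, that kind of switch can be sketched with a pair of cfg-gated type aliases. This is an illustration of the idea rather than the compiler’s actual definition; share is a made-up helper, and since parallel_compiler isn’t set in a normal build, the sketch ends up using Rc (newer toolchains may also warn that they don’t recognise the cfg name).

```rust
// Pick the pointer type once, at compile time, based on a cfg flag.
#[cfg(parallel_compiler)]
type Lrc<T> = std::sync::Arc<T>;
#[cfg(not(parallel_compiler))]
type Lrc<T> = std::rc::Rc<T>;

/// Hypothetical helper: hands out a second owner of the same allocation.
/// Works unchanged whichever alias is active, since `Rc` & `Arc` share this API.
fn share(text: &Lrc<String>) -> Lrc<String> {
    Lrc::clone(text)
}

fn main() {
    let original = Lrc::new(String::from("fn main() {}"));
    let copy = share(&original);
    println!("owners: {}", Lrc::strong_count(&original));
    drop(copy);
}
```

The rest of the code only ever mentions Lrc, so flipping the flag swaps the cost of atomic counting in or out without touching anything else.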

Ahh, so it’s just either Arc or Rc depending on whether or not the compiler is running in parallel. That makes sense, now let’s take a look at SourceFile

// In `compiler/rustc_span/src/lib.rs`
/// A single source in the [`SourceMap`].
#[derive(Clone)]
pub struct SourceFile {
    /// The name of the file that the source came from. Source that doesn't
    /// originate from files has names between angle brackets by convention
    /// (e.g., `<anon>`).
    pub name: FileName,
    /// The complete source code.
    pub src: Option<Lrc<String>>,
    /// The source code's hash.
    pub src_hash: SourceFileHash,
    /// The external source code (used for external crates, which will have a `None`
    /// value as `self.src`.
    pub external_src: Lock<ExternalSource>,
    /// The start position of this source in the `SourceMap`.
    pub start_pos: BytePos,
    /// The end position of this source in the `SourceMap`.
    pub end_pos: BytePos,
    /// Locations of lines beginnings in the source code.
    pub lines: Lock<SourceFileLines>,
    /// Locations of multi-byte characters in the source code.
    pub multibyte_chars: Vec<MultiByteChar>,
    /// Width of characters that are not narrow in the source code.
    pub non_narrow_chars: Vec<NonNarrowChar>,
    /// Locations of characters removed during normalization.
    pub normalized_pos: Vec<NormalizedPos>,
    /// A hash of the filename, used for speeding up hashing in incremental compilation.
    pub name_hash: u128,
    /// Indicates which crate this `SourceFile` was imported from.
    pub cnum: CrateNum,
}

Aha! So during compilation, every source file is stored with its filename and all of its contents. I’m willing to bet that this also happens when a crate is packaged to be uploaded to crates.io, so the compiler knows that we need api_key.txt & decides to store it like any other source file!

Then why doesn’t this happen for dotenv!?

Well, I’d bet that it’s because dotenv! isn’t a compiler builtin, so the compiler doesn’t know that it needs to store the .env file as a SourceFile, so it can’t be found when cargo install tries to grab it from crates.io. dotenv! simply has no way of telling the compiler “Hey, Imma need this .env file later, so make sure you store it when you package the crate for publishing”.

So cheating then

No, just a slight limitation - that’s all. The language can’t be perfect.

Back to the Real World

Hard to believe that this began with a simple silly mistake from me, but it can be fun looking through compiler internals when you have a specific goal to achieve. I could’ve gone deeper into implementation details, such as how cargo packages crates, but this is about as deep as is necessary to understand why include_str! works where dotenv! doesn’t - which is helpful to prevent you from making the same mistake I did. I may have been stupid, but you’re now able to not be.

Moral of the story: never assume that just because something works on your machine, it’ll work on other people’s. Always double-check things outside of your normal development environment.

Well said cat. See you around, folks!

Also remember that the compiler is a cheater

That really wasn’t necessary


  1. Rust’s package manager

  2. Rust’s name for packages/libraries

  3. Pulled, removed, retracted; take your pick