The Curse of “It Works on My Machine”
One project I’ve been working on for a while now is WhatYouMean, a terminal interface for looking up word definitions. I’m reasonably proud of it (it’s even linked on the homepage), but it’s really nothing special: just a pretty simple thing that nobody but me really uses, and that I for some reason ended up really liking. I only started it as an “oooo, that seems like it’d be fun to do” project rather than to solve an actual problem I had (googling a definition is basically no hassle at all, especially when I’ve got my phone nearby), but it ended up becoming something I’ve worked on repeatedly for nearly a year. But enough about that, let’s get into the actual story.
The Basics⌗
About 1 month ago at the time of writing (early Dec 2022, if you’re too lazy to calculate it yourself), I decided to switch the API that WYM uses to Wordnik. It’s used by quite a few applications that people actually use (DuckDuckGo’s definition embeds, for example) and it has a free tier, so why not? For some reason, I also decided that I wanted an API key embedded into the application itself so people didn’t need their own. Not the best of ideas in hindsight, but there’s no real harm in it, so I gave it a shot.
Initially, the code for selecting an API key to use looked like this:
```rust
let key = &args.use_key.unwrap_or(std::env::var("WORDNIK_API_KEY")?);
```
It may be just a one-liner, but I’ll go over it for those unfamiliar with Rust syntax. `let key` begins an assignment to the variable `key`. `args.use_key` accesses the `use_key` field on the `args` variable, which is a struct (called `Args` because I’m lazy at naming things). `unwrap_or` is a method on `Option<T>` (an `enum` with 2 variants, `Some(T)` & `None`) that either gives you the `T` inside `Some(T)` or gives you the value passed to it, which must also be of type `T`. `T` here is a stand-in for any type, such as a `String`. `std::env::var("WORDNIK_API_KEY")` looks up an environment variable called `WORDNIK_API_KEY` at runtime & returns its value, or an error if it isn’t set (or isn’t valid Unicode). The `?` either gives you the value or returns early from the enclosing function (in this case `main`) with the error. This is necessary because the return type of `std::env::var` is `Result<T, E>`, but `unwrap_or` needs its argument to be of type `T`.
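All that is a really roundabout way of saying that this either uses the value the user supplies as an API key or looks for it on their system. (One catch: the argument to `unwrap_or` is evaluated before `unwrap_or` runs, so a missing environment variable makes the `?` bail out even when the user did supply a key.)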
Did you really have to say all that?
Nah, but it’s nice to know, right?
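Because Rust evaluates a method’s arguments before calling the method, the eager version and a lazy version behave differently when a key is supplied but the environment variable is missing. Here is a small, testable sketch: `pick_key_eager` and `pick_key_lazy` are hypothetical names, `use_key` stands in for `args.use_key`, and the variable name is passed in rather than hard-coded so nothing depends on `WORDNIK_API_KEY` actually being set.

```rust
use std::env::{self, VarError};

// Mirrors the original one-liner: `env::var(var)?` runs *before*
// `unwrap_or` is called, so it can return early even if `use_key` is `Some`.
fn pick_key_eager(use_key: Option<String>, var: &str) -> Result<String, VarError> {
    Ok(use_key.unwrap_or(env::var(var)?))
}

// Lazy alternative: the closure (and so the environment lookup)
// only runs when `use_key` is `None`; a supplied key is wrapped in `Ok`.
fn pick_key_lazy(use_key: Option<String>, var: &str) -> Result<String, VarError> {
    use_key.map_or_else(|| env::var(var), Ok)
}
```

`map_or_else` is used instead of `unwrap_or_else` because the fallback here is itself a `Result`, so both branches have to produce one.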
This line later became the following:
```rust
let key = if let Some(key) = args.use_key {
    key
} else if let Ok(key) = std::env::var("WORDNIK_API_KEY") {
    key
} else {
    // Using `.into()` because this returns a `&'static str`, not a `String`
    include_str!("../api_key.txt").into()
};
```
This is essentially doing the same thing, but with different syntax due to needing 3 branches instead of just 2.
The `if let Some(key) = ...` & `if let Ok(key) = ...` essentially do the same thing: they check whether the given `enum` is the right variant, then take the value out of it & let you do stuff with it. `include_str!` is a macro, which means that it runs at compile time instead of at runtime, so the string it returns is stored within the binary itself. That means I can avoid putting the API key directly into the source code.
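To see the three-way fallback on its own, here is a minimal sketch with every input passed in as a parameter. The function name `select_key` is hypothetical: `use_key` stands in for `args.use_key`, `env_key` for the result of `std::env::var("WORDNIK_API_KEY")`, and `built_in` for the `&'static str` that `include_str!("../api_key.txt")` would produce.

```rust
use std::env::VarError;

// Same branch order as the real code: user-supplied key first,
// then the environment variable, then the compiled-in fallback.
fn select_key(use_key: Option<String>, env_key: Result<String, VarError>, built_in: &'static str) -> String {
    if let Some(key) = use_key {
        key
    } else if let Ok(key) = env_key {
        key
    } else {
        // `&'static str` -> `String`, like the `.into()` in the original
        built_in.into()
    }
}
```

Unlike the one-liner, a failed environment lookup here just falls through to the next branch instead of returning an error.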
Grave Mistake⌗
It was at this point that I became a little bit stupid. Despite this method working perfectly fine, I for some reason wanted to change it. Instead of using a text file I’d simply added to my `.gitignore` & leaving it be, I decided to use the environment variable on my machine to get an API key at compile time.
```rust
// ...
} else {
    // `std::env!` is (basically) just `include_str!`
    // but for environment variables instead of files
    std::env!("WORDNIK_API_KEY").into()
}
```
This seems like a decent enough idea… until you realise that not everyone is on my machine. Prebuilt binaries work perfectly fine because they’re built on my machine, but the moment someone tried to install it via `cargo`¹, the build would fail if they didn’t already have a `WORDNIK_API_KEY` environment variable, since `env!` turns a missing variable into a compile error.
This was unacceptable. The whole point of doing this was so people didn’t need that environment variable set in order to use the program.
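For comparison only (this is not what WYM ended up doing): the standard library also has `option_env!`, which reads a variable at compile time like `env!` but expands to `None` instead of a compile error when the variable is missing. A minimal sketch, where `WYM_SKETCH_BUILD_TIME_KEY` and `key_or_fallback` are hypothetical names:

```rust
// `Some(&'static str)` if the variable was set when this crate was
// compiled, `None` otherwise; a missing variable never fails the build.
const BUILT_IN_KEY: Option<&str> = option_env!("WYM_SKETCH_BUILD_TIME_KEY");

// Prefer a key found at runtime, otherwise fall back to the baked-in one (if any).
fn key_or_fallback(runtime_key: Option<String>) -> Option<String> {
    runtime_key.or_else(|| BUILT_IN_KEY.map(String::from))
}
```

The trade-off is that a `cargo install` on someone else’s machine would simply build without a baked-in key, so those users would still need their own.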
You could always just go back to the way it was before. There was nothing wrong with that ¯\_(ツ)_/¯
Technically yeah, but I’d much rather find a different solution. Perhaps even a crate² that could help me? 🤔🤔🤔
Quit stalling, we haven’t got time to waste
Alright, fine. I decided that I’d store the `API_KEY` in a `.env` file (also added to the `.gitignore`) & use the dotenvy_macro crate to read it at compile time. With that, the relevant code was now this:
```rust
use dotenvy_macro::dotenv;

/* ... Skipping many lines ... */

} else {
    dotenv!("API_KEY").into()
}
```
This only made the problem worse. Much worse.
Whereas before the user could at least fix the issue themselves if they read the error message, this one couldn’t be rectified. This made the program completely impossible to install using `cargo`, which is very bad. So, as soon as I found out that this was happening, I yanked³ the offending versions from crates.io & speed-released a fix (it wasn’t really a fix since I just removed the functionality altogether, but this is my blog post & I’ll call it what I want).
How did it take you 3 whole versions to realise that???
Interlude - Realisation⌗
The way I realised it is pretty weird actually. It wasn’t from a bug report or anything like that (nobody even uses WYM, so there’s nobody to send one); it was because I tried installing it on my phone. Android devices can use Rust tools through the `rust` package in Termux, so I decided to see how well my program would run, if at all. I only learnt of the issue because it failed to compile when I tried to `cargo install` it, with an error message showing that it wasn’t just because I was on a mobile device.
And now, back to our regularly scheduled rambling
Inspecting the Macros⌗
Let’s back up for a moment. Earlier, I showed you this code:
```rust
// ...
} else {
    include_str!("../api_key.txt").into()
}
```
This actually does work, even though `api_key.txt` isn’t part of the package file generated by `cargo` (check it yourself by cloning the repo, `git checkout`-ing the `v3.0.0` tag, & running `cargo package --list`), which is where `cargo install` pulls from (you can try it yourself by running `cargo install whatyoumean@3.0.0`). This is weird, because the `dotenv!` version errors out since it can’t find the `.env` file anywhere, which makes sense because, as far as crates.io is concerned, it doesn’t exist. Exactly like how `api_key.txt` doesn’t exist. So why does `include_str!` work while `dotenv!` doesn’t?
From looking at the source code, we find that `dotenv!` is defined in `dotenvy_codegen_impl`. If we follow the yellow brick road, we find:
```rust
// In `dotenvy_codegen_impl/lib.rs`
#[proc_macro_hack]
pub fn dotenv(input: TokenStream) -> TokenStream {
    if let Err(err) = dotenvy::dotenv() {
        let msg = format!("Error loading .env file: {}", err);
        return quote! {
            compile_error!(#msg);
        }
        .into();
    }

    match expand_env(input) {
        Ok(stream) => stream,
        Err(e) => e.to_compile_error().into(),
    }
}
```
This is surprisingly simple. It uses `dotenvy::dotenv` to load a `.env` file from the current directory or one of its parent directories into the environment (returning a compile error if it can’t), then returns the result of `expand_env`. `expand_env` is just a line below:
```rust
29 fn expand_env(input_raw: TokenStream) -> syn::Result<TokenStream> {
30     let args = <Punctuated<syn::LitStr, Token![,]>>::parse_terminated
31         .parse(input_raw)
32         .expect("expected macro to be called with a comma-separated list of string literals");
33
34     let mut iter = args.iter();
35
36     let var_name = iter
37         .next()
38         .ok_or_else(|| syn::Error::new(args.span(), "dotenv! takes 1 or 2 arguments"))?
39         .value();
40     let err_msg = iter.next();
41
42     if iter.next().is_some() {
43         return Err(syn::Error::new(
44             args.span(),
45             "dotenv! takes 1 or 2 arguments",
46         ));
47     }
48
49     match env::var(&var_name) {
50         Ok(val) => Ok(quote!(#val).into()),
51         Err(e) => Err(syn::Error::new(
52             var_name.span(),
53             err_msg.map_or_else(
54                 || match e {
55                     VarError::NotPresent => {
56                         format!("environment variable `{}` not defined", var_name)
57                     }
58                     VarError::NotUnicode(s) => format!(
59                         "environment variable `{}` was not valid unicode: {:?}",
60                         var_name, s
61                     ),
62                 },
63                 |lit| lit.value(),
64             ),
65         )),
66     }
67 }
```
This has quite a bit more macro shenanigans, but the point is still not all that complicated. Lines 30-32 ensure that the macro is called with a comma-separated list of string literals (parsed as `syn::LitStr`), lines 34-47 make sure there are exactly 1 or 2 of them, then lines 49 onwards look up the environment variable and either expand to its value as a string literal or produce an error.
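Stripped of the `syn` and `quote` plumbing, those checks boil down to ordinary Rust. The sketch below is a hypothetical plain-function analogue (`expand_env_like` is not part of dotenvy): `args` plays the role of the parsed string literals, and errors come back as `String`s rather than `syn::Error`s that get turned into `compile_error!` invocations.

```rust
use std::env::{self, VarError};

fn expand_env_like(args: &[&str]) -> Result<String, String> {
    // Argument-count check (1 or 2 arguments), like lines 34-47
    let (var_name, err_msg) = match args {
        [name] => (*name, None),
        [name, msg] => (*name, Some(*msg)),
        _ => return Err("dotenv! takes 1 or 2 arguments".to_string()),
    };
    // Lookup and error messages, like lines 49 onwards
    env::var(var_name).map_err(|e| match (err_msg, e) {
        // A custom message, if given, replaces the default one
        (Some(msg), _) => msg.to_string(),
        (None, VarError::NotPresent) => {
            format!("environment variable `{}` not defined", var_name)
        }
        (None, VarError::NotUnicode(s)) => format!(
            "environment variable `{}` was not valid unicode: {:?}",
            var_name, s
        ),
    })
}
```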
This explains why it fails pretty well. We don’t even need to look at `dotenvy::dotenv` to know that if it can’t find a `.env` file anywhere, it’ll return a compile error & the build will fail.
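The “current directory or its parents” search can be sketched with `Path::ancestors`, which yields a path followed by each of its parents. This `find_dotenv` is hypothetical and not dotenvy’s implementation; it assumes the search simply checks every directory on that chain for a file named `.env`.

```rust
use std::path::{Path, PathBuf};

// Walk from `start` up to the filesystem root, returning the first `.env` file found.
fn find_dotenv(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .map(|dir| dir.join(".env"))
        .find(|candidate| candidate.is_file())
}
```

If no directory on that chain has a `.env`, there is nothing to load, and the macro turns that into a compile error.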
But then why doesn’t the same happen for `include_str!`?
Well, if we look at the source code for `include_str!`…
```rust
#[stable(feature = "rust1", since = "1.0.0")]
#[rustc_builtin_macro]
#[macro_export]
#[cfg_attr(not(test), rustc_diagnostic_item = "include_str_macro")]
macro_rules! include_str {
    ($file:expr $(,)?) => {{ /* compiler built-in */ }};
}
```
Ah. So it works because it’s a compiler builtin.
So basically it cheats
Well, I wouldn’t put it that way. More like “we need to do some more digging to see what it’s actually doing”.
Inspecting the Compiler⌗
The rustc source code is… complicated, but after some digging, you can find `include_str!` here:
```rust
/// `include_str!`: read the given file, insert it as a literal string expr
pub fn expand_include_str(
    cx: &mut ExtCtxt<'_>,
    sp: Span,
    tts: TokenStream,
) -> Box<dyn base::MacResult + 'static> {
    let sp = cx.with_def_site_ctxt(sp);
    let Some(file) = get_single_str_from_tts(cx, sp, tts, "include_str!") else {
        return DummyResult::any(sp);
    };
    let file = match resolve_path(&cx.sess.parse_sess, file.as_str(), sp) {
        Ok(f) => f,
        Err(mut err) => {
            err.emit();
            return DummyResult::any(sp);
        }
    };
    match cx.source_map().load_binary_file(&file) {
        Ok(bytes) => match std::str::from_utf8(&bytes) {
            Ok(src) => {
                let interned_src = Symbol::intern(&src);
                base::MacEager::expr(cx.expr_str(sp, interned_src))
            }
            Err(_) => {
                cx.span_err(sp, &format!("{} wasn't a utf-8 file", file.display()));
                DummyResult::any(sp)
            }
        },
        Err(e) => {
            cx.span_err(sp, &format!("couldn't read {}: {}", file.display(), e));
            DummyResult::any(sp)
        }
    }
}
```
Hmm. Interesting. Very indicative.
You have no idea what’s going on, do you?
Of course I do! It resolves the path, reads the file, & checks that the bytes are valid UTF-8, then expands to a string literal containing the file’s contents, emitting an error if the file can’t be read or isn’t valid UTF-8.
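In ordinary runtime Rust, those steps look roughly like this. `include_str_like` is a hypothetical name: it takes the path directly (the real macro first resolves it relative to the file containing the invocation, via `resolve_path`) and returns the contents as a `String`, whereas the macro expands to a string literal baked into the compiled code. The error messages copy the wording from `expand_include_str`.

```rust
use std::fs;
use std::path::Path;

fn include_str_like(path: &Path) -> Result<String, String> {
    // Mirrors the `Err(e)` arm: the file is missing or unreadable
    let bytes = fs::read(path).map_err(|e| format!("couldn't read {}: {}", path.display(), e))?;
    // Mirrors the `from_utf8` check: reject anything that isn't valid UTF-8
    String::from_utf8(bytes).map_err(|_| format!("{} wasn't a utf-8 file", path.display()))
}
```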
And why exactly does this mean it works where `dotenv!` doesn’t?
Alright, I don’t know the answer to that, but I’m betting it has something to do with `resolve_path`. Or maybe the `ExtCtxt` that gets passed in? That `source_map` method feels promising.
```rust
// In `compiler/rustc_expand/src/base.rs`
pub fn source_map(&self) -> &'a SourceMap {
    self.sess.parse_sess.source_map()
}
```
`SourceMap` feels like it might have something… let’s take a look:
```rust
// In `compiler/rustc_span/src/source_map.rs`
pub struct SourceMap {
    /// The address space below this value is currently used by the files in the source map.
    used_address_space: AtomicU32,

    files: RwLock<SourceMapFiles>,
    file_loader: Box<dyn FileLoader + Sync + Send>,
    // This is used to apply the file path remapping as specified via
    // `--remap-path-prefix` to all `SourceFile`s allocated within this `SourceMap`.
    path_mapping: FilePathMapping,

    /// The algorithm used for hashing the contents of each source file.
    hash_kind: SourceFileHashAlgorithm,
}
```
And its method `load_binary_file`:
```rust
/// Loads source file as a binary blob.
///
/// Unlike `load_file`, guarantees that no normalization like BOM-removal
/// takes place.
pub fn load_binary_file(&self, path: &Path) -> io::Result<Vec<u8>> {
    // Ideally, this should use `self.file_loader`, but it can't
    // deal with binary files yet.
    let bytes = fs::read(path)?;

    // We need to add file to the `SourceMap`, so that it is present
    // in dep-info. There's also an edge case that file might be both
    // loaded as a binary via `include_bytes!` and as proper `SourceFile`
    // via `mod`, so we try to use real file contents and not just an
    // empty string.
    let text = std::str::from_utf8(&bytes).unwrap_or("").to_string();
    self.new_source_file(path.to_owned().into(), text);
    Ok(bytes)
}
```
Hmm… this gave less info than I thought. Let’s go back to `SourceMap` itself. Specifically, its `files` field.
```rust
pub struct SourceMap {
    // Other fields omitted
    files: RwLock<SourceMapFiles>,
}
```
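As a std-only illustration of what wrapping a collection in an `RwLock` buys you, here is a hypothetical `register_files` that lets several threads add file names to one shared list. It borrows the idea of `SourceMapFiles`, but none of rustc’s code.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

fn register_files(names: Vec<String>) -> Vec<String> {
    // `Arc` shares ownership across threads; `RwLock` guards the `Vec` inside.
    let files = Arc::new(RwLock::new(Vec::new()));
    let writers: Vec<_> = names
        .into_iter()
        .map(|name| {
            let files = Arc::clone(&files);
            // `write()` waits until no reader or other writer holds the lock
            thread::spawn(move || {
                files.write().unwrap().push(name);
            })
        })
        .collect();
    for w in writers {
        w.join().unwrap();
    }
    // `read()` grants shared access; other readers could hold it at the same time
    let mut snapshot = files.read().unwrap().clone();
    snapshot.sort(); // thread scheduling decides the push order, so sort for a stable result
    snapshot
}
```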
`RwLock` stands for “Read-Write Lock”: it lets any number of threads read the `SourceMapFiles` at the same time, or exactly one thread write to it, but never both at once. Shenanigans (data races) could ensue without it. So, let’s look at `SourceMapFiles`:
```rust
// Literally directly above `SourceMap`
#[derive(Default)]
pub(super) struct SourceMapFiles {
    source_files: monotonic::MonotonicVec<Lrc<SourceFile>>,
    stable_id_to_source_file: FxHashMap<StableSourceFileId, Lrc<SourceFile>>,
}
```
Hmm, so it stores a `MonotonicVec` of `Lrc<SourceFile>`s.
“Monotonic” means it either always increases or always decreases - in this case, only increasing in size. I had to look it up too.
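A minimal sketch of such a grow-only vector, assuming the point is that it only exposes operations that add elements. This is a hypothetical wrapper, not rustc’s actual `MonotonicVec`.

```rust
// Wraps a private `Vec` and exposes no way to remove or reorder elements.
pub struct MonotonicVec<T>(Vec<T>);

impl<T> MonotonicVec<T> {
    pub fn new() -> Self {
        MonotonicVec(Vec::new())
    }

    // Appends a value and returns the index it was stored at.
    pub fn push(&mut self, value: T) -> usize {
        self.0.push(value);
        self.0.len() - 1
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        self.0.get(index)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}
```

Because nothing can be removed, an index returned by `push` keeps pointing at the same element for the vector’s whole lifetime.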
`Lrc` looks like it might be similar to `Arc`, which stands for “Atomically Reference Counted”: the thread-safe counterpart of `Rc` (plain “Reference Counted”). But what does the “L” mean?
> “`Lrc` is an alias of `Arc` if `cfg!(parallel_compiler)` is true, `Rc` otherwise.”
>
> - Line 3 of `compiler/rustc_data_structures/sync.rs`
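The pattern that comment describes, one alias name backed by two possible types chosen at compile time, takes only a few lines. In this sketch `parallel_sketch` is a hypothetical cfg flag standing in for rustc’s `parallel_compiler`; code written against `Lrc` compiles either way because `Arc` and `Rc` share the same basic API (`new`, `clone`, `strong_count`).

```rust
// Built with `--cfg parallel_sketch`: thread-safe, atomic reference counts.
#[cfg(parallel_sketch)]
pub type Lrc<T> = std::sync::Arc<T>;

// Default build: cheaper, non-atomic reference counts (single-threaded only).
#[cfg(not(parallel_sketch))]
pub type Lrc<T> = std::rc::Rc<T>;
```

Compiling with `rustc --cfg parallel_sketch` selects the `Arc` version; without the flag you get `Rc`.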
Ahh, so it’s just either `Arc` or `Rc`, depending on whether the compiler was built with parallel support. That makes sense. Now let’s take a look at `SourceFile`:
```rust
// In `compiler/rustc_span/src/lib.rs`
/// A single source in the [`SourceMap`].
#[derive(Clone)]
pub struct SourceFile {
    /// The name of the file that the source came from. Source that doesn't
    /// originate from files has names between angle brackets by convention
    /// (e.g., `<anon>`).
    pub name: FileName,
    /// The complete source code.
    pub src: Option<Lrc<String>>,
    /// The source code's hash.
    pub src_hash: SourceFileHash,
    /// The external source code (used for external crates, which will have a `None`
    /// value as `self.src`.
    pub external_src: Lock<ExternalSource>,
    /// The start position of this source in the `SourceMap`.
    pub start_pos: BytePos,
    /// The end position of this source in the `SourceMap`.
    pub end_pos: BytePos,
    /// Locations of lines beginnings in the source code.
    pub lines: Lock<SourceFileLines>,
    /// Locations of multi-byte characters in the source code.
    pub multibyte_chars: Vec<MultiByteChar>,
    /// Width of characters that are not narrow in the source code.
    pub non_narrow_chars: Vec<NonNarrowChar>,
    /// Locations of characters removed during normalization.
    pub normalized_pos: Vec<NormalizedPos>,
    /// A hash of the filename, used for speeding up hashing in incremental compilation.
    pub name_hash: u128,
    /// Indicates which crate this `SourceFile` was imported from.
    pub cnum: CrateNum,
}
```
Aha! So during compilation, every source file is stored with its filename and all of its contents. I’m willing to bet that this also happens when a crate is packaged to be uploaded to crates.io, so the compiler knows that we need `api_key.txt` & decides to store it like any other source file!
Then why doesn’t this happen for `dotenv!`?
Well, I’d bet that it’s because `dotenv!` isn’t a compiler builtin, so the compiler doesn’t know that it needs to store the `.env` file as a `SourceFile`, and it can’t be found when `cargo install` tries to grab it from crates.io. `dotenv!` simply has no way of telling the compiler “Hey, Imma need this `.env` file later, so make sure you store it when you package the crate for publishing”.
So cheating then
No, just a slight limitation - that’s all. The language can’t be perfect.
Back to the Real World⌗
Hard to believe that this began with a simple, silly mistake on my part, but it can be fun looking through compiler internals when you have a specific goal to achieve. I could’ve gone deeper into implementation details, such as how `cargo` packages crates, but this is about as deep as is necessary to understand why `include_str!` works where `dotenv!` doesn’t, which should help prevent you from making the same mistake I did. I may have been stupid, but you’re now able to not be.
Moral of the story: never assume that just because something works on your machine, it’ll work on other people’s. Always double-check things outside of your normal development environment.
Well said, cat. See you around, folks!
Also remember that the compiler is a cheater
That really wasn’t necessary