Easy password-based encryption utility

Published on 2024-08-26 by TomatoSoup

A bit ago I made a utility called Warren. I wanted something rabbit themed, and I eventually settled on the name because a warren is a safe place to put things that are important to you.

Specifically, I wanted to encrypt my backups before uploading them to a remote server. This would have been simple enough, but I really don't like the idea of leaving symmetric keys just lying around. That leaves asymmetric encryption, but I also don't like the idea of needing to manage a private key. I don't remember exactly where the idea came from (possibly the concept of a brainwallet), but I came up with a scheme where the keypair would be wholly derived from a password. The public key would be written to disk and used by the program to encrypt files. When it came time to decrypt files, you would enter your password, the private key would be rederived, and decryption would proceed.

I also wanted to get some experience with Go, so I wrote the program in that. As far as Go is concerned, it's probably not idiomatic. As far as the crypto is concerned, I don't think I made any minor mistakes; either my scheme is rock solid or I fucked everything up.

So what's the scheme?

First of all, let me talk about a Golang design philosophy that I really appreciate but that also made this more complicated than necessary: Nondeterminism.

Many languages feature structures like hashmaps. When you iterate them, you start at bucket 0 and go through the elements in order. The items are bucketed according to the language specific hashcode algorithm. In Java, lists update their hashcode according to hashCode = 31*hashCode + e.hashCode(), for every element in the list e. In C#, tuples combine their values by a combination of rolls, additions, and XORs. Critically, however, C# also hashes in a random value that is determined at application start time.

Java does not do this, which means that the hashCode of List.of(1,2,3) will always be 30817. Because Java has always worked this way and there's too much enterprise software out there that depends on this behavior, they can't ever change this. Consequently, Java is vulnerable to a savvy attacker making a lot of objects that hash to the same value and tanking performance. Every running C# process hashes things differently. Consistently, but differently. This makes it way harder to attack.
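To make the 30817 concrete, here is a minimal sketch of that recurrence written in Go. The function name javaListHash is mine, and it assumes the elements are Java Integers, whose hashCode is just their value.

```go
package main

import "fmt"

// javaListHash mirrors the recurrence quoted above: start at 1, then
// hashCode = 31*hashCode + e.hashCode() for every element e.
// int32 is used so that overflow wraps the same way a Java int does.
func javaListHash(elems []int32) int32 {
	h := int32(1)
	for _, e := range elems {
		// Assumption: each element is a Java Integer, so e.hashCode() == e.
		h = 31*h + e
	}
	return h
}

func main() {
	fmt.Println(javaListHash([]int32{1, 2, 3})) // 30817
}
```

Nothing in there depends on the process, the machine, or the time of day, which is exactly why an attacker can precompute collisions.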

Golang does the same thing. It also will start iterating a hashmap from a random bucket every time you create a new iterator. I don't know if C# has this same feature, but I suspect it does. Golang's standard structures ensure that devs can't accidentally depend on any default behavior not explicitly part of the spec by randomizing behavior. This goal is totally compatible with security and I like it!
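A quick way to see the random starting point for yourself is something like the sketch below. The helper names firstKey and sortedKeys are made up for this post; sorting the keys is the usual workaround when you actually need a stable order.

```go
package main

import (
	"fmt"
	"sort"
)

// firstKey returns whichever key a fresh range loop happens to visit first.
func firstKey(m map[string]int) string {
	for k := range m {
		return k
	}
	return ""
}

// sortedKeys collects the keys and sorts them, giving a stable order
// that does not depend on the map's internal layout.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
	seen := map[string]bool{}
	for i := 0; i < 1000; i++ {
		seen[firstKey(m)] = true
	}
	// Expect more than one distinct first key; the exact count varies per run.
	fmt.Println("distinct first keys:", len(seen))
	fmt.Println("sorted:", sortedKeys(m))
}
```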

Unfortunately for me, when Golang's built-in crypto libraries generate asymmetric keys, the random source used for key generation gets passed through a function called MaybeReadByte. This function has a 50/50 chance of consuming one extra byte from the source before key generation reads from it, ensuring that even if the source of random bytes is deterministic, the generated key won't be. You might notice that my goal requires deterministically generating keypairs.
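The effect is easy to model. The sketch below is not the standard library's code: maybeReadByte and keyMaterial are hypothetical stand-ins I wrote to show the behavior, with math/rand supplying the coin flip. Given the same "deterministic" seed, the bytes that end up as key material shift by one position about half the time.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"math/rand"
)

// maybeReadByte is a hypothetical stand-in for the internal helper:
// about half the time it consumes one byte from r, otherwise nothing.
func maybeReadByte(r io.Reader) {
	if rand.Intn(2) == 0 {
		var buf [1]byte
		_, _ = r.Read(buf[:])
	}
}

// keyMaterial reads n bytes from a deterministic seed the way a key
// generator that calls maybeReadByte first would see them.
func keyMaterial(seed []byte, n int) []byte {
	r := bytes.NewReader(seed)
	maybeReadByte(r)
	out := make([]byte, n)
	_, _ = io.ReadFull(r, out)
	return out
}

func main() {
	seed := []byte("0123456789")
	for i := 0; i < 4; i++ {
		// Each line is either "0123" or "1234", chosen at random.
		fmt.Printf("%s\n", keyMaterial(seed, 4))
	}
}
```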

So sure, it might appear to suffer from the more-is-better fallacy of crypto, but NaCl had to be brought in for deterministic keygen, and SHA3 for an extendable-output function to feed the keygen with secure bytes.

Key generation is simple: Canonicalize a password by trimming off newline bytes at the end. It turns out that Linux and Windows differ in the bytes returned from a simple bufio reader that ends on \n. Feed this into Argon2id with default parameters for some bruteforce resistance, feed that into SHA3 to create a stream of secure bytes, and feed that as the random source to NaCl's GenerateKey function to create a key pair. Depending on the mode of operation, one half will be tossed: Either you're generating a keypair and saving the public half to disk or you're decrypting something and the private half is all you'll use.
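The canonicalization step is the only part of that pipeline that fits in the standard library, so here is a rough sketch of just that piece. The function name canonicalize is made up, and I'm assuming the platform difference is the usual one where Windows hands back \r\n and Linux hands back \n. The Argon2id, SHA3, and GenerateKey steps are not shown.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// canonicalize strips trailing carriage returns and newlines so the same
// typed password yields the same bytes regardless of platform.
// Assumption: the only platform difference is "\n" versus "\r\n".
func canonicalize(line string) string {
	return strings.TrimRight(line, "\r\n")
}

func main() {
	for _, input := range []string{"hunter2\n", "hunter2\r\n"} {
		// ReadString keeps the delimiter, which is where the stray bytes come from.
		line, _ := bufio.NewReader(strings.NewReader(input)).ReadString('\n')
		fmt.Printf("%q -> %q\n", line, canonicalize(line))
	}
}
```

Both inputs come out as "hunter2", which matters a lot when every byte of the password feeds the key derivation.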

NaCl has a function to just outright Box something using a public key. Unfortunately, this requires all the bytes upfront. If you're running this on a pi or something similar, you might be encrypting gigabytes of data with only two gigs of RAM total. Even worse, you're probably running some app on your pi that is the source of the data you wanna remotely back up, so some of that RAM is already spoken for.

In order to use a minimum of RAM, Warren streams files through AES in counter mode. It generates an AES key, Boxes that, and then uses that key to encrypt the data. Now, as I understand it, Box already works by deriving a symmetric key from a Diffie-Hellman exchange with the recipient's public key and then encrypting everything with that symmetric key. This means we have two symmetric keys and two Message Authentication Codes, with the first set being used only to protect the second set. This is absolutely, strictly speaking, unnecessary overhead. Avoiding it would have required diving deeper into the crypto libraries than I would have liked and it's using up only double-digit extra bytes so, who gives a shit?

Warren generates the AES and the MAC keys, boxes those, and then streams the payload to encrypt it, encrypting each byte and then MACing the resulting ciphertext so that the message's integrity can be verified before it is decrypted. Written to disk are the 112 bytes from NaCl's box (the 64 bytes of the AES key and the HMAC key encrypted to the public key, plus 48 bytes of Box overhead), then the payload, then the 32 bytes of the HMAC.

AES is run in counter mode using an IV of straight 0-bytes. Normally, this would be a sin. The entire point of using an IV is so that the same message encrypted under the same key doesn't yield the same ciphertext. It's even worse using counter mode, where any block repeated at the same position will yield the same ciphertext under the same key, regardless of what came before it. Because CTR works to create a keystream like a stream cipher, it becomes trivial to decode messages: Once you have two messages with the same key, you know that XORing the two ciphertexts together will cancel out the keystream and result in the XOR of the plaintexts. The XOR of two plaintexts creates a stream that is vulnerable to a tremendous number of classic attacks.
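Here is a small demonstration of that cancellation, deliberately doing the wrong thing by reusing one key with the zero IV. The helper names ctrZeroIV and xor are mine.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// ctrZeroIV encrypts msg with AES-CTR under key using an all-zero IV.
func ctrZeroIV(key, msg []byte) []byte {
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	out := make([]byte, len(msg))
	cipher.NewCTR(block, make([]byte, aes.BlockSize)).XORKeyStream(out, msg)
	return out
}

// xor returns the byte-wise XOR of two equal-length slices.
func xor(a, b []byte) []byte {
	out := make([]byte, len(a))
	for i := range a {
		out[i] = a[i] ^ b[i]
	}
	return out
}

func main() {
	key := make([]byte, 32) // deliberately reused for both messages
	p1, p2 := []byte("attack at dawn!!"), []byte("retreat at dusk!")
	c1, c2 := ctrZeroIV(key, p1), ctrZeroIV(key, p2)
	// The keystream cancels, so c1 XOR c2 equals p1 XOR p2. Anyone who
	// knows or guesses p2 recovers p1 without ever touching the key.
	fmt.Println(string(xor(xor(c1, c2), p2))) // attack at dawn!!
}
```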

Notice, however, that every weakness described assumes key reuse. IVs are basically a way to reuse keys without needing to securely redistribute any key material. Because Warren generates new keys for every encryption, it is not vulnerable to IV reuse. Any IV is as good as any other, so straight zeroes are used.

Decryption reads the file twice. The first pass verifies the HMAC to ensure the file hasn't been tampered with. It's harder to tamper with CBC mode crypto, but CTR is very vulnerable to it: flipping a bit in the ciphertext flips the same bit in the plaintext. The second pass then proceeds to decrypt.

Encryption reads from standard in and writes to standard out so that the file may be directly sent to a remote server. Because decryption requires that it reads the file twice, it reads from disk (and from standard in for the password) and writes to disk.

There are two issues.

One, Warren provides no versioning and no way to tweak the Argon2id parameters. It's defaulted to 1 time, 64k memory, and 4 threads. Are those reasonable? The draft RFC says so!

The second issue, somewhat larger and somewhat smaller at the same time, is that no salt is provided. The entire point of this is to deterministically provide the same keypair for a given input password. Salting it would ruin that and mean that you need to provide the keyfile used to encrypt in order to decrypt a file. The salt used for keygen could be written into the keyfile, and then into the encrypted file itself. But the main benefit of this would only be to enable the user to create non-identical keyfiles from the same password.

Similarly, the derivation difficulty parameters could be customizable at keyfile generation time and written to the encrypted file. There's not a particularly good reason why I'm not doing this. The 0th version of Warren didn't use Argon2id but instead generated and tossed 2^n bytes from the SHA3 extendable-output function. It was up to the user to remember what the n value was. I realized this was some roll-your-own-crypto bullshit and replaced it with Argon2id and the recommended defaults, and I began using it before I came up with the idea of encoding the values into the encrypted files.

In terms of use, there are three main operations:

Generation: ./warren.exe -keyfile key -generate where key is the file to write to.
Encryption: ./warren.exe -keyfile key < test | tee > encrypted where key is the file from the previous step, test is the file to encrypt, and encrypted is the output ciphertext.
Decryption: ./warren.exe -decrypt encrypted -plaintext result where encrypted is the file from the previous step and result is the file to write to. Running this command immediately asks the user for their password.

The next version will probably have some features for salting the keyfile or configuring the difficulty of rederiving the keypair. It will definitely include versioning, so that Warren can recognize what version of the protocol is required to decrypt the file. It'll also have something less hacky than the current way of reading passwords because yikes is that ad hoc. I know that password-reading libraries exist, it's just that I discovered the Windows-Linux incompatibility with ending newline characters after I already deployed this. So I hacked around it to consistencize the output and put it down as a TODO.

Like I said, I don't think I made any minor mistakes. I'm aware of a lot of common protocol pitfalls, one of the largest of which is touching crypto primitives unnecessarily. The only place I touch primitives is to stream data, and that was a necessary goal. I know where common mistakes could be made and I think I correctly avoided them: Encrypt-then-MAC, avoid key-IV-reuse by never reusing keys, and protect all of this with the battle-hardened NaCl library. If any mistakes exist, I am very confident they exist in the password-to-keypair phase of the protocol. There I think the weaknesses are constrained to the lack of salt for keyfile uniqueness and the lack of customization for the difficulty parameters for key stretching. Both of these are resolvable, but I don't think they're particularly lethal. I think these weaknesses are about the same threat-level as a weak password.