Hi there! I'm Cathy, and I solve difficult problems in a variety of fields. This website contains links to a few things that I've worked on in the past.
Most people have data they would prefer not to lose. However, this doesn't always translate into the development of good backup practices.
Back on August 25, 2013, smashboards.com came under a DDOS attack. The site's host has done a very questionable job handling the attack; at the time of writing, almost two weeks later, the website is still down. Perhaps the most troubling aspect of the whole scenario is that the site owner apparently doesn't have any backups of the thirteen years of data stored on the server, which is currently inaccessible.
As catastrophically bad as this sounds, it's a relatively easy mistake for an inexperienced administrator to make; after all, backups might not seem obviously useful until you actually need them, and you rarely need them. And it can take a lot of effort to set up a secure and maintainable backup system, effort that minor website operators might prefer to invest in making the site itself better.
On occasion, I've been lazy with backups myself. Around four years ago, the internal hard drive on my desktop computer failed and started producing many read errors. It contained a lot of data that I wanted to keep but had no backups of (such as old records and documents). Fortunately, I was able to recover all of it using recovery tools; I believe I used GNU ddrescue.
However, you won't always get so lucky; backups are important, and the time spent getting them right should be viewed as an investment.
One very important principle in making backups is that a problem on the primary machine should not be able to result in the loss of the backups. I plan to expound on that in this article.
Many providers are now offering various forms of cloud storage; the most well-known are probably Amazon Web Services and Google Drive. These services provide large amounts of highly redundant storage at low prices. (This means that your data is replicated across multiple computers in different facilities so that if any one fails, another copy is available.) These services might seem ideal for backing up your data. However, there are a couple of issues to keep in mind.
Any plaintext data that you upload to these cloud services may be easily accessible by the NSA. Even if you aren't concerned about that (and maybe you should be), if any of the data you are backing up touches on the privacy rights of other people — such as private chat logs from IM programs, or user data from an online service that you operate — then I believe you have an ethical duty to protect the privacy of those third parties. In practice, that data was probably accessible to the NSA at some earlier point (for example, during the chat conversations), but it's still beneficial to reduce the attack surface area by minimising the number of copies of that data available.
To be on the safe side, any backups you upload into these cloud services should be encrypted — and not by using a proprietary encryption suite with NSA backdoors. (The linked article contains a quote from an ACLU analyst that "backdoors are fundamentally in conflict with good security" — perhaps a bit of an understatement.) I recommend using GPG to encrypt your backups. The Amazon S3 client s3cmd provides GPG encryption out of the box.
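As a rough sketch of what "encrypt before you upload" can look like, here is a small Python wrapper around the `gpg` command line. The function names, the archive path and the recipient address are hypothetical and only there for illustration; it assumes `gpg` is installed and that the recipient's public key has already been imported.

```python
import subprocess
from pathlib import Path


def build_gpg_command(archive: Path, recipient: str) -> list[str]:
    """Build the gpg invocation that encrypts `archive` to `recipient`'s public key.

    Both `archive` and `recipient` are illustrative inputs, not fixed names.
    """
    encrypted = archive.with_name(archive.name + ".gpg")
    return [
        "gpg",
        "--batch",                   # fail instead of prompting interactively
        "--encrypt",                 # public-key encryption
        "--recipient", recipient,    # key to encrypt to
        "--output", str(encrypted),  # write ciphertext next to the original
        str(archive),
    ]


def encrypt_archive(archive: Path, recipient: str) -> Path:
    """Run gpg and return the path of the encrypted file (requires gpg on PATH)."""
    subprocess.run(build_gpg_command(archive, recipient), check=True)
    return archive.with_name(archive.name + ".gpg")
```

Because this uses public-key encryption, only the public key has to live on the machine being backed up; the private key needed to decrypt can be kept somewhere else entirely.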
The more interesting topic I am going to discuss might be termed "backup isolation". Basically, your backups need to be isolated in the following sense — if an attacker gains control over the original machine, can she modify or delete the backups of that machine? If so, your backups can't be trusted, which largely defeats the point of making backups in the first place.
One obvious example of a failure of backup isolation would be if backups are stored on the same disc as the original data. An attacker who gained control over that computer could then arbitrarily modify the backups. Worse still, if the disc fails, you also lose your backups.
A more subtle failure would be if the backups are stored on another computer on the same network, accessible over vnc, or some other remote access protocol. An attacker who compromises the primary computer can then simply ssh into the other machine and arbitrarily modify your backups. Even if you use password authentication or password-encrypted ssh private keys, it poses little additional barrier to the attacker; she can simply log your passwords, since she has control over your machine.
Still more subtle, if you make backups to some cloud storage provider, then the credentials for that cloud storage provider are probably stored on the same computer, which typically means that an attacker who gains control over the computer can arbitrarily modify your cloud storage. For example, if you use s3cmd, your credentials are stored in ~/.s3cfg, which an attacker can simply view. As far as I can tell, most cloud storage solutions offer no way to create a credential set that can be used to upload new files, but do nothing else. Even Tarsnap, advertised as "Online backups for the truly paranoid", does not appear to offer this feature.
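To make the s3cmd example concrete, the sketch below shows how little work it takes for any code running as your user to read those credentials. It assumes the usual s3cmd layout, an INI-style file with a `[default]` section containing `access_key` and `secret_key`; the function name is made up for illustration.

```python
import configparser
from pathlib import Path


def read_s3cmd_credentials(cfg_path: Path) -> dict[str, str]:
    """Return the access and secret key found in an s3cmd-style config file."""
    # interpolation=None so a '%' inside a secret is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    section = parser["default"]  # assumed section name
    return {
        "access_key": section["access_key"],
        "secret_key": section["secret_key"],
    }
```

An attacker in control of the machine would just call something like `read_s3cmd_credentials(Path.home() / ".s3cfg")`; no privilege escalation or password logging is needed.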
Surprisingly, Amazon S3 actually does offer the ability to craft a suitably limited credential set through bucket policies (but not through traditional ACLs). However, you presumably signed up for Amazon S3 on your personal computer, so your Amazon credentials may be stored in your browser. At the very least, you are likely to enter them again at some point, when an attacker can log them, modify your bucket policies, and make arbitrary changes to your cloud storage. So bucket policies don't really provide a solution for backing up your personal computer, though they are helpful for other applications.
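For those other applications, a bucket policy along these lines might look roughly like the following. This is a minimal sketch, not a vetted policy: the bucket name, account number and user name are placeholders, and it assumes the uploading user has no other permissions attached.

```python
import json


def upload_only_policy(bucket: str, uploader_arn: str) -> dict:
    """Bucket policy that lets one principal add objects and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowUploadOnly",
                "Effect": "Allow",
                "Principal": {"AWS": uploader_arn},
                "Action": "s3:PutObject",  # no GetObject, DeleteObject or ListBucket
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket and principal, for illustration only.
    policy = upload_only_policy(
        "example-backups",
        "arn:aws:iam::123456789012:user/backup-uploader",
    )
    print(json.dumps(policy, indent=2))
```

One caveat: `s3:PutObject` also allows writing to a key that already exists, so on its own this does not stop an attacker from overwriting an old backup. Enabling versioning on the bucket is the usual way to keep the earlier copies around.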
The theoretical solution to backup isolation might be to introduce a new computer to your network that does literally nothing other than upload backups to your cloud storage solution of choice. This new machine will expose an API that allows client computers to push files to be uploaded, but does not expose any commands to modify the files once uploaded. Credentials for your cloud storage solution will be stored only on the new machine, and never entered anywhere else, ever. Assuming your client API is vulnerability-free, this scheme should be fairly safe, but:
Is it easier to do backup isolation correctly if you rely on physical isolation rather than technically-enforced isolation? If you simply keep your backups on physical detached drives which are not connected to the internet (rather than a fancy cloud storage solution), there's seemingly no way that a remote attacker can access them. Of course, rolling your own backup infrastructure comes with less redundancy than Amazon provides — if any of your drives is lost, that's the end of that data; you don't have multiple facilities.
Alas, it turns out that even physical backup isolation has a potential flaw. Suppose you keep multiple backups on the same physical volume. If the computer you insert that volume into has already been compromised, the attacker can alter or erase every backup on the volume. To mitigate the potential damage, you'll have to "roll" your physical volumes, introducing a new one from time to time or whenever you make a particularly important backup.
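One way to keep yourself honest about rolling volumes is to track it explicitly. The sketch below is just one possible rule, under my own assumptions: a volume is retired (never plugged in again) once it holds an important backup or once it has reached a fixed number of backups. The class, the function names and the limit of four are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    label: str
    backups_written: int = 0
    holds_important_backup: bool = False


def record_backup(volume: Volume, important: bool = False) -> None:
    """Note that one more backup has been written to this volume."""
    volume.backups_written += 1
    if important:
        volume.holds_important_backup = True


def should_retire(volume: Volume, max_backups: int = 4) -> bool:
    """True once the volume should be shelved and replaced by a fresh one."""
    return volume.holds_important_backup or volume.backups_written >= max_backups
```

With this rule, a compromised computer can damage at most `max_backups` backups at once, and an important backup is never exposed to a computer again after it has been written.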