
In this article I will go through the procedure I use to back a web server up directly to an Amazon S3 bucket. I won't go over the many reasons to back up your web server, but in addition to routine full-server backups--snapshots on a VPS, or bare-metal backups on a physical server--keeping a separate backup of the web portion of your server gives you a quick way to restore individual sites without rolling back the entire machine.

The reason I use S3 is that it is inexpensive, reliable, and fast. The reason I don't use rsync is that s3fs, the tool I use to mount the S3 bucket, hasn't proven reliable or fast enough with the number and size of files involved in backing up a web server's main directories. This may not be the best way to back up a web server to an S3 bucket, but in the absence of a better method it has worked very well for me to date.

This guide was written using an Ubuntu 12.04 server installation on a VPS with a basic LAMP setup. I will assume you have a basic understanding of the Linux command line and can access your server comfortably over SSH. I will also assume you have your AWS account set up and ready to create buckets. Because installation and setup procedures change, I will link to documentation wherever possible so this guide stays relevant as long as it can, and I will keep the steps generic enough to cover a range of scenarios. Please feel free to shoot me a note or leave a comment below if I can improve this guide or if anything is in error.

The first thing you need to do is create a bucket, then a user and key pair with the proper rights to access that bucket. This procedure has changed recently and may very well change again. Creating an S3 bucket is simple enough; it's the user and permissions where things get a little complicated and not so intuitive. Once you've created your bucket, go into AWS IAM and create a new user for this bucket--or add permissions to an existing user--and create a key pair and store it somewhere safe.
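If you'd rather use the command line than the AWS console, the same steps can be sketched with the AWS CLI, assuming it is installed and configured with credentials that can manage IAM; the user name below is just a placeholder:

# create the bucket, a dedicated backup user, and a key pair
aws s3 mb s3://yourbucket
aws iam create-user --user-name yourbucket-backup
aws iam create-access-key --user-name yourbucket-backup

Note that create-access-key prints the secret key only once, so copy it down right away.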

The permissions will look something like this, per this guide. They will allow you to access your backup bucket, and only this bucket.

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::yourbucket"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:*"],
      "Resource": ["arn:aws:s3:::yourbucket", "arn:aws:s3:::yourbucket/*"]
    }
  ]
}
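If you went the CLI route above, the same policy can be attached by saving that JSON to a file and running put-user-policy (the user name, policy name, and file name here are again placeholders):

# attach the inline policy above to the backup user
aws iam put-user-policy --user-name yourbucket-backup --policy-name yourbucket-backup-policy --policy-document file://policy.json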

You will probably want to test these permissions to make sure you have access before going any further. If you're on a Windows computer, S3 Browser is a good option. I use several Android devices, and S3Anywhere is an excellent choice there. I don't have any recommendations for iOS or OS X, and if you're on a Linux machine, getting familiar with s3fs is probably a good idea at this point.
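If you have the AWS CLI configured with the new key pair, a quick smoke test from any machine looks something like this (the file name is arbitrary):

# upload, list, and delete a test object using the new credentials
echo "backup test" > test.txt
aws s3 cp test.txt s3://yourbucket/test.txt
aws s3 ls s3://yourbucket
aws s3 rm s3://yourbucket/test.txt

With the policy as written, listing all buckets (aws s3 ls with no bucket argument) will be denied; the note below covers the change that allows it.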

Note that you may need to change

      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::yourbucket"]

to

      "Action": ["s3:ListAllMyBuckets"],
      "Resource": ["arn:aws:s3:::*"]

in the above permissions if you want to access this bucket in programs such as S3 Browser. While you won't be able to view the contents of other available buckets, you will be able to see the names of the buckets in the account.

Next you will need a way to mount the S3 bucket on your Linux server. I will be using s3fs, a FUSE-based file system. I won't go over the details here because the steps vary depending on which distribution of Linux you're using; instead I will link to the official s3fs setup wiki as well as the overview of setting up s3fs. If that's all Greek to you, then this method isn't for you. There are other ways to set up and back up to an S3 file system in Linux, but this is the best I have found. The only things I will add: have all the prerequisites installed before attempting to compile and install s3fs, and create the directory you will mount to before attempting to mount your bucket, making sure the directory is empty. I have chosen to mount under the /mnt directory for obvious reasons.
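As a rough sketch of what that can look like on Ubuntu 12.04 (the package list and the s3fs version number are assumptions on my part; check the s3fs wiki for current instructions):

# prerequisites for compiling s3fs from source (Ubuntu package names)
sudo apt-get install build-essential libfuse-dev libcurl4-openssl-dev libxml2-dev mime-support

# unpack the source tarball from the s3fs project page, then build and install
tar xzf s3fs-1.61.tar.gz && cd s3fs-1.61
./configure && make && sudo make install

# create an empty mount point ahead of time
sudo mkdir /mnt/yourmount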

Once you get s3fs installed, you'll need to mount your bucket, and make sure the bucket is mounted every time the server is started. I use Webmin to simplify some of these back-end tasks, but that's entirely my preference. Through Webmin, I set up an Upstart job at /etc/init/s3fsonboot.conf that looks like this:

# s3fsonboot
#
# Mount my s3fs
description  "Mount my s3fs"
start on runlevel [2345]
stop on runlevel [!2345]
exec s3fs yourbucket /mnt/yourmount -oallow_other -ouse_cache=/tmp
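One assumption the job above makes is that s3fs can find your key pair without it being passed on the command line. By default s3fs reads /etc/passwd-s3fs; a minimal version looks like this (placeholder values shown; s3fs will refuse a password file that is readable by others, so chmod it to 640 or stricter):

# /etc/passwd-s3fs
# format: bucketName:accessKeyId:secretAccessKey
yourbucket:YOUR_ACCESS_KEY_ID:YOUR_SECRET_ACCESS_KEY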

The -oallow_other and -ouse_cache=/tmp options are the s3fs settings I've found work best; for a description of these and other options, see the s3fs wiki articles linked above. Reboot a couple of times to make sure the bucket mounts properly and works correctly.
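Before relying on reboots alone, you can also mount by hand once and verify the mount end to end:

# mount manually, check it, write a test file, then unmount
s3fs yourbucket /mnt/yourmount -oallow_other -ouse_cache=/tmp
mount | grep s3fs
touch /mnt/yourmount/mount-test && rm /mnt/yourmount/mount-test
fusermount -u /mnt/yourmount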

Now that we have the bucket mounted, we are ready to get the backup script going. I chose not to use rsync because every time I tried it, it timed out or took forever on all the small files that have to be transferred. rsync is great in theory--maybe I didn't have my settings right, or maybe it was a bug in the particular version of s3fs I was using--but it didn't work out for me in the real world. What I decided on instead was a complete backup every day to a .zip file (.tar or whatever your favorite format is will also work; I chose .zip for its simplicity: zip, unzip, plain and simple).

So I set up a simple bash script, named it backup.sh, put it in my default user's home directory, and created a cron job to run it at midnight every night. The script starts by declaring a shell variable called "now" that holds the current date; it then zips the www and MySQL directories (the Apache and MySQL default directories under Ubuntu), appending the "now" variable to the end of each file name so you can keep multiple copies. Finally it deletes files older than 7 days so you don't end up with a huge backup directory; this can be set to however many days you want. You can also set up lifecycle rules in your bucket to push older backups off to RRS or Glacier storage instead of, or in addition to, this daily cleaning.

#!/bin/bash
# date stamp appended to each archive name, e.g. mysql_01312014.zip
now=$(date +"%m%d%Y")
# zip the database and web directories into the mounted bucket
zip -r /mnt/yourmount/mysql_$now /var/lib/mysql/
zip -r /mnt/yourmount/www_$now /var/www/
# prune backups older than 7 days
find /mnt/yourmount/* -mtime +7 -exec rm {} \;
The cron entry lives outside the script; added with crontab -e, it looks like this:

@daily /bin/bash /home/user/backup.sh

This usually completes in less than a minute, even with a /var/www that's 300+MB in size. This time will of course depend on what type of connection you have going out of your server.

That's it! You should now be set up to back up your web server's main web directories daily, keeping a week's worth of history in case you need to roll one of your sites back to a previous date. If you enable versioning or lifecycle rules on your bucket, you will have even more backup history at little added cost.
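For completeness, a restore is just the reverse; the date stamp below is hypothetical, and since zip stores paths relative to the root, the archive unpacks into a var/www/ tree under wherever you extract it:

# unpack a backup somewhere safe, then copy back what you need
mkdir /tmp/restore && cd /tmp/restore
unzip /mnt/yourmount/www_01312014.zip
# files land in /tmp/restore/var/www/...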
