Using S3 as a File System

Ignoring the fact that there are very good reasons not to do this (latency, eventual consistency, etc.), using S3 buckets as filesystems actually fills a niche that AWS doesn’t currently even try to fill. Sure, you could just set up an NFS instance on EC2, or even use the Storage Gateway service to go to an off-service data store, but that is overkill for many use cases.

So here’s how I got it done.

First of all, there are several ways to implement this. Here are the ones I tried out :
http://code.google.com/p/s3ql/
https://code.google.com/p/s3backer/
http://code.google.com/p/s3fs/wiki/FuseOverAmazon
https://github.com/tongwang/s3fs-c

I ended up choosing s3fs-c. I used s3fs at first, but its lack of compatibility with other S3 clients was a killer. I’m still not sure why this is even a problem to be solved, given that the s3fs-c fork didn’t make major changes to get that feature. Or maybe there are major changes that I didn’t notice in my admittedly cursory review.

So I took an S3 bucket and an EC2 instance running Ubuntu 12.04, and did the following.

I’ll be using the following placeholders for this writeup :
IAM_USER = the IAM user created for this
IAM_ID = the AWS account ID that owns the IAM user (the 12-digit number in the user’s ARN)
FS_BUCKET = the S3 bucket being used
ACCESS_KEY = the IAM access key to use
SECRET_KEY = the IAM secret key to use

First of all, you need to set up an S3 bucket and monkey with it just a little bit. You might also want to set up an IAM user to access the bucket. If you don’t do this, you’ll need to use your root AWS credentials, which isn’t a great idea for what I hope are obvious reasons.

Make a note of the Access and Secret keys for the user. I also set up the following user policy for my ‘s3fs’ user :

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "arn:aws:s3:::FS_BUCKET/*"
    }
  ]
}

I’m 85% sure this isn’t needed as long as you set up the bucket policy described later, but it doesn’t hurt, and it helps limit the damage the user could do if those credentials got out.

Now set up a bucket. Give the bucket the following policy :

{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::IAM_ID:user/IAM_USER"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::FS_BUCKET"
    }
  ]
}

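I’ve omitted some lines here, but the policy wizard can fill in the blanks for you. Note that the resource in this policy matches only the bucket itself, while the FS_BUCKET/* resource in the user policy matches the objects inside it.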
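If you would rather not hand-edit the placeholders, here is a minimal sketch that renders the bucket policy above into a file. The default values (123456789012, s3fs, my-fs-bucket) and the POLICY_FILE path are made-up examples for illustration, not real account data, and the output carries the same omissions as the policy shown in this post.

```shell
#!/bin/sh
# Sketch: render the bucket policy from this post with placeholders filled in.
# All defaults below are dummy example values (assumptions); override them
# through the environment with your own account ID, user, and bucket.
IAM_ID="${IAM_ID:-123456789012}"
IAM_USER="${IAM_USER:-s3fs}"
FS_BUCKET="${FS_BUCKET:-my-fs-bucket}"
POLICY_FILE="${POLICY_FILE:-./bucket-policy.json}"

# Unquoted heredoc delimiter, so the shell expands the ${...} variables.
cat > "$POLICY_FILE" <<EOF
{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${IAM_ID}:user/${IAM_USER}"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::${FS_BUCKET}"
    }
  ]
}
EOF

# Print the result so it can be pasted into the bucket policy editor.
cat "$POLICY_FILE"
```

Run it as something like IAM_ID=... IAM_USER=... FS_BUCKET=... sh render-policy.sh (the script name is hypothetical) and paste the output into the bucket’s policy editor.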

Now we move onto the client system.

Update the Ubuntu host and install the packages you’ll need to build and verify the s3fs utility.

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential libfuse-dev fuse-utils libcurl4-openssl-dev libxml2-dev mime-support s3cmd unzip

Grab the latest s3fs-c from here : https://github.com/tongwang/s3fs-c
Unzip it, and configure/make/make install it.

wget -O s3fs-c-master.zip https://github.com/tongwang/s3fs-c/archive/master.zip
unzip s3fs-c-master.zip
cd s3fs-c-master
./configure
make
sudo make install

Now you need to configure s3fs access. Create an ‘/etc/passwd-s3fs’ file with these contents :

ACCESS_KEY:SECRET_KEY

Make that file mode 600 (chmod 600), so that only its owner can read it.
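The credentials step can be sketched roughly like this. PASSWD_FILE defaults to a scratch path in the current directory so you can try it without root; that default, and the variable names, are my own assumptions. The real target in this writeup is /etc/passwd-s3fs.

```shell
#!/bin/sh
# Sketch: write an s3fs credentials file that only its owner can read.
# PASSWD_FILE defaults to a scratch path (an assumption, for safe testing);
# set PASSWD_FILE=/etc/passwd-s3fs and run as root for the real thing.
PASSWD_FILE="${PASSWD_FILE:-./passwd-s3fs.example}"
ACCESS_KEY="${ACCESS_KEY:-ACCESS_KEY}"
SECRET_KEY="${SECRET_KEY:-SECRET_KEY}"

# Tighten the umask first so a newly created file is never world-readable.
umask 077
printf '%s:%s\n' "$ACCESS_KEY" "$SECRET_KEY" > "$PASSWD_FILE"

# Set the mode explicitly as well, in case the file already existed.
chmod 600 "$PASSWD_FILE"
ls -l "$PASSWD_FILE"
```

Setting the umask before writing avoids a short window where the secret key sits in a file with looser permissions.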

Edit the ‘/etc/fuse.conf’ file so that the following line is NOT commented :

user_allow_other
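If you want to script that edit, something like the following should do it. This is a sketch that assumes GNU sed (as shipped on Ubuntu) and works on a scratch copy by default. The FUSE_CONF variable and the sample file contents are made up for illustration; point FUSE_CONF at /etc/fuse.conf and run as root for the real file.

```shell
#!/bin/sh
# Sketch: uncomment the user_allow_other line in a fuse.conf-style file.
# FUSE_CONF defaults to a scratch file (an assumption, for safe testing).
FUSE_CONF="${FUSE_CONF:-./fuse.conf.example}"

# Create a small sample file if none exists. Its contents are illustrative only.
[ -f "$FUSE_CONF" ] || printf '# sample fuse.conf\n#user_allow_other\n' > "$FUSE_CONF"

# Drop a leading "#" (plus optional whitespace) from the user_allow_other line.
# "sed -i" edits in place; this form is the GNU sed one.
sed -i 's/^#[[:space:]]*user_allow_other[[:space:]]*$/user_allow_other/' "$FUSE_CONF"

# Print the uncommented line; grep exits non-zero if it is still missing.
grep -x 'user_allow_other' "$FUSE_CONF"
```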

Create a mount point. Something like /mnt/FS_BUCKET should work just fine, but that’ll depend on your use case.

Now you can mount the bucket :

sudo s3fs FS_BUCKET /mnt/FS_BUCKET -oallow_other

And you should be able to read/write to the S3 bucket.
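A quick way to confirm that is a write-then-read smoke test. The sketch below is my own helper, not part of s3fs: the rw_check function name and the scratch default for MOUNT_POINT are assumptions. Set MOUNT_POINT=/mnt/FS_BUCKET to test the real mount.

```shell
#!/bin/sh
# Sketch: check that a directory accepts a write and returns the same data.
# rw_check is a hypothetical helper name, not an s3fs command.
rw_check() {
    dir="$1"
    probe="$dir/.s3fs-rw-check.$$"
    # Write a known string to a probe file.
    echo "s3fs rw check" > "$probe" || return 1
    # Read it back and compare; clean up the probe on failure too.
    [ "$(cat "$probe")" = "s3fs rw check" ] || { rm -f "$probe"; return 1; }
    rm -f "$probe"
}

# Defaults to a scratch directory (an assumption) so this is safe to try.
MOUNT_POINT="${MOUNT_POINT:-./mnt-example}"
mkdir -p "$MOUNT_POINT"

if rw_check "$MOUNT_POINT"; then
    echo "read/write OK: $MOUNT_POINT"
else
    echo "read/write FAILED: $MOUNT_POINT" >&2
fi
```

Since s3cmd was installed earlier, you could also cross-check from the S3 side by leaving a file in the mount and listing the bucket with s3cmd ls s3://FS_BUCKET/ (after running s3cmd --configure with the same keys).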

You can even add this to your /etc/fstab like so :

s3fs#FS_BUCKET /mnt/FS_BUCKET fuse allow_other 0 0

However, if the S3 mount doesn’t work quite right on boot, then your instance will get stuck in the famous AWS infinite reboot loop. This should be rare, but it might be worth having a config management system, or even a cron script, test/mount as needed.
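The cron approach could look something like this. It is a sketch under a few assumptions: the BUCKET, MOUNT_POINT, and DRY_RUN variables and the ensure_mounted function are names I made up, and DRY_RUN defaults to 1 so the script only prints what it would do until you switch it off.

```shell
#!/bin/sh
# Sketch: mount the bucket only if it is not already mounted.
# Variable and function names are hypothetical; the s3fs invocation is the
# same one used earlier in this post.
BUCKET="${BUCKET:-FS_BUCKET}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/FS_BUCKET}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command instead of running it

ensure_mounted() {
    # mountpoint -q exits 0 when the directory is already a mount point.
    if mountpoint -q "$MOUNT_POINT" 2>/dev/null; then
        echo "already mounted: $MOUNT_POINT"
        return 0
    fi
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: s3fs $BUCKET $MOUNT_POINT -oallow_other"
        return 0
    fi
    # Create the mount point if needed, then mount.
    mkdir -p "$MOUNT_POINT" && s3fs "$BUCKET" "$MOUNT_POINT" -oallow_other
}

ensure_mounted
```

With this in place you can leave the bucket out of /etc/fstab entirely and run the script from root’s crontab every few minutes with DRY_RUN=0, so a failed mount never blocks booting.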

Here are some caveats to keep in mind :

  • Permissions on the mount are wide open, so be careful how you use this. Newer versions of s3fs (not s3fs-c) claim to fix this, but not in any way I was able to see.
  • The documentation claims that use_cache is a valid option. While that seems like a great thing, and appears to not cause problems at first, I kept getting bad file descriptor errors when I turned it on.
  • If you add this to your fstab, don’t change your instance size afterwards. For some reason this broke booting for me. Changing back to the original instance size worked, though. Crazy.

UPDATE
As of version 1.74, s3fs handles the items that I used the -c fork for. It is possible, even likely, that it has done so further back than 1.74, but that’s the version I just tried. With that in mind, I’d recommend https://code.google.com/p/s3fs/wiki/FuseOverAmazon over the above. Configuration and install are the same, though, so I haven’t updated this article.
