zefer's posterous

Copy data between S3 buckets using EC2 and Ruby

This week, I had to transfer about 200GB of images from an Amazon S3 bucket in the EU to a bucket in the US-East region. I spent a while looking for a good way to do this, so I'm sharing my solution: it works very well and will hopefully save someone some time!

I used a 32-bit Ubuntu 10.04 official AMI, on a Micro EC2 instance. Here's the code, and the Linux commands I used, below:

If you want to move (rather than copy), simply change line 27 from s3.copy to s3.move.
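As a rough illustration of the same copy loop in Python (this is not the Ruby script this post describes), the sketch below does a server-side copy of every object from one bucket to another. It assumes a boto3-style S3 client is passed in; the bucket names in the usage comment are placeholders, and a single `copy_object` call only suits objects up to 5GB, which is fine for images.

```python
def copy_bucket(client, source_bucket, dest_bucket, prefix=""):
    """Copy every object under `prefix` from one bucket to another.

    `client` is assumed to behave like a boto3 S3 client. The copy happens
    inside S3, so the file data never passes through this machine.
    Returns the list of keys that were copied.
    """
    copied = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=source_bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            client.copy_object(
                Bucket=dest_bucket,
                Key=key,
                CopySource={"Bucket": source_bucket, "Key": key},
            )
            copied.append(key)
    return copied


# Usage (placeholder bucket names, requires boto3 and AWS credentials):
#   import boto3
#   copy_bucket(boto3.client("s3"), "my-eu-bucket", "my-us-bucket")
```

To move rather than copy, you would delete each source object once its copy has succeeded.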

Filed under  //   S3   ec2   ruby  

HTML & AJAX solutions to upload files to S3

Uploading directly to S3 from the browser is a great way to avoid putting long-running requests through your web servers and tying up their precious resources. In some cases this might be essential. For example, on the Heroku platform all user requests are limited to 30 seconds, which means that if a user tries to upload a large file, the request is likely to time out before it completes.

There are a number of ways to do a direct browser-to-S3 upload. A popular method is to use Flash, but this post shows how it can be done with pure HTML, and also describes a workaround to allow AJAX uploads. Both solutions use a simple HTTP form POST, and neither exposes your S3 credentials. Before you get your hopes too high, the AJAX workaround does require a proxy, but this is a pure Nginx solution that can be run cheaply on an EC2 micro instance.

Key findings

  • You can upload direct to S3 from regular HTML forms, without exposing your S3 credentials
  • You need some form of proxy to AJAX POST files to S3
  • You can AJAX POST files to S3 using a simple Nginx server to handle the CORS permissions
  • S3 may make it possible to do direct AJAX uploads in the future, using CORS

Regular HTML form

Amazon Web Services have published a great article on how to POST a file to an S3 bucket using a form. You create a pre-authorised form by including a Base64-encoded 'policy' field describing some conditions, and a 'signature' string, which is a keyed hash (HMAC) of that policy created using your AWS secret key, which only you know. S3 checks the 'policy' and 'signature', and authenticates the upload.

The article provides code examples showing how to 'sign' the form by generating the 'policy' and 'signature' fields, or you may be interested in Ungulate, a Ruby gem which makes the process even easier; you probably won't need the ungulate_server part.
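For a feel of what the signing step involves, here is a minimal Python sketch of the HMAC-SHA1 scheme: Base64-encode the JSON policy, then sign that encoded string with the secret key. The bucket name, key prefix, expiry date and secret key are all placeholders, and you should check the AWS documentation for the exact fields and signature version your bucket expects.

```python
import base64
import hashlib
import hmac
import json


def sign_s3_post_policy(policy, secret_key):
    """Return the (policy, signature) form field values for a browser POST."""
    # The policy document is JSON, sent to S3 as a Base64 string.
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8"))
    # The signature is an HMAC-SHA1 of the *encoded* policy, itself Base64-encoded.
    digest = hmac.new(secret_key.encode("utf-8"), policy_b64, hashlib.sha1).digest()
    signature = base64.b64encode(digest)
    return policy_b64.decode("ascii"), signature.decode("ascii")


# Hypothetical policy: every value here is a placeholder.
example_policy = {
    "expiration": "2011-01-01T00:00:00Z",
    "conditions": [
        {"bucket": "my-upload-bucket"},
        ["starts-with", "$key", "uploads/"],
        {"acl": "private"},
    ],
}
policy_field, signature_field = sign_s3_post_policy(example_policy, "MY-SECRET-KEY")
```

The two returned strings go into the form's 'policy' and 'signature' fields; the secret key itself never leaves your server.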

Pure AJAX

Unfortunately, you cannot just POST the file/form via AJAX. This is due to cross-domain security restrictions, as we are posting to a different domain. Modern browsers now allow us to do cross-site AJAX using CORS. This works by an exchange of HTTP headers between the browser and server, which tells the browser that it may allow the cross-site requests. Unfortunately, S3 have not implemented this; until they do, we cannot POST a file direct to S3 via AJAX.

My solution is to configure an Nginx proxy, which responds to the cross-domain HTTP OPTIONS requests and adds the necessary cross-domain headers. In my case, I am running Nginx on a micro EC2 instance. Here's the rhubarb:

Compile Nginx with the Headers More module

Oh boy, I really wanted to avoid the dreaded words 'compile from source' in this post, but we need to use the 3rd-party Headers More Nginx module to add HTTP headers to our responses. The compile process is actually quite straightforward, and this is the preferred way of using Nginx. Here is how I did it on Ubuntu 10.04:

If you get stuck, the Nginx wiki is pretty good. You may prefer to use the optional non-3rd-party Headers module, in which case you will need to modify my example config, below.

Simple proxy configuration

We need to tell Nginx to do 2 things:

  1. Respond to the CORS OPTIONS requests with the necessary permissions as HTTP headers
  2. Pass the POST (upload) requests to our S3 bucket

The config is nice and simple:

You can either replace /etc/nginx/sites-available/default conf with the above, or manage your nginx config differently. And make sure you restart Nginx after making config changes.

I also have a slightly more complex nginx config example with added security and additional timeout / upload limit parameters to allow large uploads.

I highly recommend you read the CORS specification, or the easier-to-follow Mozilla article, to understand what these headers are doing and what the implications may be.
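To make the header exchange a little more concrete, here is a small Python sketch of the decision the proxy has to make for each request. It is not the Nginx config, just the logic: the allowed origin, the allowed request headers and the max-age value are assumptions you would replace with your own.

```python
# Hypothetical: the one site that is allowed to upload through the proxy.
ALLOWED_ORIGINS = {"http://www.example.com"}


def cors_response_headers(method, origin, allowed_origins=ALLOWED_ORIGINS):
    """Return the CORS headers to add to a response, or {} to add none."""
    if origin not in allowed_origins:
        # Unknown origin: send no CORS headers, so the browser blocks the call.
        return {}
    headers = {"Access-Control-Allow-Origin": origin}
    if method == "OPTIONS":
        # Preflight request: tell the browser what the real request may use.
        headers["Access-Control-Allow-Methods"] = "POST, OPTIONS"
        headers["Access-Control-Allow-Headers"] = "Content-Type"
        headers["Access-Control-Max-Age"] = "3600"
    return headers
```

The browser sends the OPTIONS 'preflight' first, and only goes on to send the real POST if the headers in the reply permit it.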

AJAX file POST

Firstly, you need to create a signed form, as described above. You then post the form to the Nginx proxy and watch the magic happen! You should send an empty string as the success_action_redirect field, as this ensures S3 responds with a 204 NO CONTENT response, not a 303 redirect. This is useful to ensure our JavaScript treats the response as successful. Note that you need to sign your form with the same field contents or it will be rejected by S3.

Here is my simple example which uses some of the new HTML5 file, AJAX, and progress goodness. I based my example on this HTML5 upload article.

Done!

So there you have it. Hopefully you'll be able to get a working mock-up from this article. Here are a few additional things to consider:

Filed under  //   AJAX   CORS   HTML5   Nginx   S3   upload  

My Twitter mood-sensitive Arduino robot

I've wanted to try out Arduino for a while now. I finally bought some kit from Oomlout and on Sunday, I got stuck in. I started with lesson 1 in my kit and got an LED to turn on and off. At this point I realised I was going to have to get more ambitious! So, I came up with this...

I called it Tweetmooduino: some hardware that reacts to changes in 'mood' using a Twitter sentiment analysis of any subject. When the mood improves, a smiley face is displayed; when it worsens, a frowny face is displayed. The idea is you get a real-time sense of 'mood' for a topic.

I've pushed my code to GitHub, hopefully someone might find it useful: https://github.com/zefer/Tweetmooduino

The project uses an ethernet shield to access the tweetsentiments.com API, which provides a sentiment analysis of any given subject. Rather than access the data directly, I use YQL to format it as XML and return a much smaller dataset.

The API request is repeated every 30 seconds. The code parses and analyses the response, and works out the mood change since the last request.
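The comparison step is tiny. Sketched in Python rather than Arduino code, and assuming the API response has already been parsed down to a single numeric sentiment score (the function name and threshold are mine, not from the repo), it looks something like this:

```python
def mood_change(previous_score, current_score, threshold=0.0):
    """Compare two sentiment readings and say which face to show.

    Returns "smile" if the mood improved, "frown" if it worsened and
    "same" if it did not move by more than `threshold`.
    """
    if previous_score is None:
        # First reading after power-up: nothing to compare against yet.
        return "same"
    delta = current_score - previous_score
    if delta > threshold:
        return "smile"
    if delta < -threshold:
        return "frown"
    return "same"
```

Each 30-second poll calls this with the last reading and the new one, then stores the new one for next time.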

The output is an 8x8 LED matrix. This was quite puzzling at first, when I realised there were 64 LEDs but only 16 inputs! You control it by continuously scanning over the rows of LEDs and turning the appropriate columns on or off, much like a TV. It took a bit of getting used to, but after a short while, I managed to get a smiley face out of it, and this made me happy too!
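Here is the row-scanning idea sketched in Python (the real thing is Arduino code driving pins). The bitmap and the function name are illustrative: each step selects exactly one row and supplies the on/off pattern for its eight columns.

```python
# One string per row, '1' meaning the LED in that column is lit.
SMILEY = [
    "00111100",
    "01000010",
    "10100101",
    "10000001",
    "10100101",
    "10011001",
    "01000010",
    "00111100",
]


def scan_frames(bitmap):
    """Yield (row_select, column_bits) pairs, one per row of the bitmap.

    row_select has a single bit set (the row being driven) and column_bits
    holds that row's pattern, so 8 + 8 outputs can address all 64 LEDs.
    """
    for row_index, row in enumerate(bitmap):
        row_select = 1 << row_index
        column_bits = int(row, 2)
        yield row_select, column_bits
```

On the hardware you loop over these frames continuously, fast enough that the eye sees one steady picture rather than a row at a time.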

Once I had both the smiley face output and the Twitter mood data working, it was time to put the two together. This is where I got confused. When the ethernet controller was active, the 8x8 LED matrix started acting strangely: whole rows were failing to illuminate, and others were showing at different brightness. It took a while to realise that the ethernet shield was sending its own output on some of the pins I was using for the matrix. I managed to isolate them to pins 11 and 13, so I moved these to analogue 18 and 19 and tweaked my code to ensure I associated the correct columns with the correct output pins. If you look at the LED matrix code, this is why there is a seemingly random non-sequential mapping of pins to columns.

Here's the circuit for the 8x8 LED matrix (pdf), but bear in mind I moved pins 11 and 13 to 18 and 19 respectively, to stop the ethernet shield from interfering.

So that's it! I've set it to analyse the subject "6music". I listen to BBC 6 Music at work, and the result is that it smiles when people like what tracks are played, and frowns when they don't!

Filed under  //   arduino   twitter   yql  

Recession music and the best of 2010

I've struggled with 2010's new music. For a short while I thought I must be getting old, and losing touch. I've always listened to music from a massive range of genres and origins. I need variety, and want to hear creativity; I want to hear sounds and techniques that I wasn't expecting, but most of all, I want to hear things I haven't heard before. 2010 just hasn't delivered. It's a fashion that has spread across so many genres of modern music, and the fashion is bland, predictable, reserved, uneventful recession music!

The Mercury Music Prize sums it up really, The XX. I really find The XX boring to listen to, but I can recognise that their sound is a brilliantly minimal representation of the zeitgeist, i.e. nothing much happens in a nothing much kind of way. No thanks.

But hey, it's not all bad, despite their low numbers, some albums have escaped, and some of them are brilliant. This is a list of the new albums in 2010 that really stood out.

!!! - Strange Weather, Isn't It

Even Warp Records have been churning out dullness lately, but this is definitely an exception. Upbeat, great rhythms, and unpredictable: everything I love about !!!. The drummer fell down a lift shaft and died last November. He was an awesome drummer with a distinct 'loose' and syncopated style, and it's a real shame. The new drum sound is a great fit, however: a tighter, electronic production giving the album a bit more of a dance music feel. It's bloody good!

The Fall - Your Future Our Clutter

I was utterly blown away by the sheer awesomeness of the production on this album. The drum sound is ridiculously good, and the whole thing basically just rocks. You might be thinking I'm one of those obsessive The Fall fans who will rave about anything they produce, well I'm not, I can take them or leave them, but I'm going to rave about this album because it's truly brilliant.

The Roots - How I Got Over

I stuck this on my phone before jumping on the train to Carlisle, on crutches, armed with two cans of cold Stella and a Cornish pasty. The pasty was tasty, the beer hit the spot, and The Roots had me nodding my head in my own little headphone-enclosed world. Like the previous two albums, this is best listened to loud (with beer and a pasty).

The Heligoats - Goodness Gracious

Unlike the previous three albums, I know nothing about the origin of this band, I had never heard them before stumbling across this album, and I found myself listening to it quite a lot, mostly on the walk to and from work (on bloody crutches, of course!). I get bored of stuff quickly, but I didn't get bored of this. I'm in no rush to put it back on (oh, maybe I am a bit bored of it!), anyway, it represents a time-slice of 2010 and that's a good thing.

The Black Keys - Brothers

It took me ages before I listened to this. I listen to BBC 6 Music every day at work, and they had murdered a couple of tracks by over playing them (I have a low overplay threshold!), so I kept it aside. I jumped on the turbo-trainer at home to do 45 mins of the most boring possible form of exercise, and as I set off, I realised I hadn't picked anything to listen to. I reached for my headphones, hit play, and inadvertently had selected this album. I was in 'the zone' immediately, it's a damn fine album. If I hadn't heard any of The Black Keys before listening to it, I'd have gone bonkers with excitement, if you're fortunate enough to be in that position, grab this album, you're in for a treat!

Grinderman - Grinderman 2

I was really 'into' the first Grinderman album, and was excited about this follow-up ever since the day it was announced, bloody ages ago! It's rough around the edges, in more ways than one, but that's the general ethos with this side project from Nick Cave. However, as much as I like this album, I do miss the superior writing and production of the Nick Cave & The Bad Seeds stuff; "Dig, Lazarus, Dig!!!" absolutely blew me away, it's a kick-ass rocking masterpiece!

Gil Scott-Heron - I'm New Here

I don't really know how to describe this album, ageing-legend-poetry-story-soul, or maybe not. It's a fantastic album, but don't listen to it on your own, in a dark room, after half a bottle of whisky; that's the only time you shouldn't listen to it.

And here are a handful of other great 2010 albums which didn't quite make the list:

  • Stornoway - Beachcomber's Windowsill
  • Four-Tet - There is Love in You
  • Hybrid - Disappear Here
  • Drive-By Truckers - The Big To-Do
  • Black Mountain - Wilderness Heart
  • Squarepusher - Shobaleader One

So, that's my list, I listened to a lot, I've only listed a little. There must be so much I'm missing out on, so please comment below with your favourite albums of 2010 so I can feed these hungry ears!

Check out my last.fm profile, and add me as a friend.

Filed under  //   albums   music  

Railo Cache Benchmark - CouchDB, MongoDB, RAM

Having recently built an API which makes heavy use of caching, I have been trying out different cache storage methods. Here is a quick benchmark of CouchDB vs MongoDB vs RAM. All software was run locally on my dev machine. The load-testing was performed using Apache JMeter; the results are based on 2000 samples of identical, mixed requests, using a primed cache.

The API I load-tested is a CFML application, which uses the MachII framework and is running on Railo 3.2. No code changes were made to run the different tests; switching cache storage was achieved by simple config changes in the Railo admin.

Here are the results (response times in milliseconds):

They're all fast, but what amazes me is how little overhead MongoDB adds on top of direct RAM storage!

Filed under  //   CouchDB   MongoDB   Railo   caching  

Running Closure Compiler via ANT - Combine, compress & error-check your JavaScript at build-time

Using ANT, it is really easy to combine and minify your multiple JavaScript files into a single file. This can really improve website performance, and it's one of many optimisations you should consider.

Here is a simple example ANT script that will:

  1. Combine all your JavaScript into a single file all.js
  2. Use Closure Compiler to check for 'compilation' errors and minify/compress the script to all.min.js
  3. Copy index.dev.html > index.html
  4. Replace everything between <!-- [start combine] --> .. <!-- [end combine] --> (i.e. the individual JavaScript <include> tags) in index.html with a single all.min.js include
  5. Update index.html with a timestamp indicating when the build was performed
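To show what steps 1 and 4 amount to, here is a rough Python equivalent of the text manipulation (this is not the ANT script, and it leaves out the Closure Compiler step). The function names are made up, and I have assumed the marker comments are replaced along with everything between them.

```python
import re

START_MARKER = "<!-- [start combine] -->"
END_MARKER = "<!-- [end combine] -->"


def combine_sources(sources):
    """Step 1: join the individual JavaScript sources into one script."""
    return "\n".join(sources)


def replace_combine_block(html, include_tag):
    """Step 4: swap the marked block of script tags for a single include."""
    pattern = re.compile(
        re.escape(START_MARKER) + ".*?" + re.escape(END_MARKER),
        re.DOTALL,  # the block normally spans several lines
    )
    # A function is used as the replacement so that backslashes in the tag
    # are never interpreted as regex escapes.
    return pattern.sub(lambda _match: include_tag, html, count=1)
```

You would run the combined script through the compiler to produce all.min.js, then write the result of `replace_combine_block` out as index.html.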

If this sort of optimising and caching floats your boat, then the YSlow tool and its accompanying documentation are a great source of inspiration.

If you're looking for an easy way to combine JavaScript and CSS at runtime, using Coldfusion, then check out my Combine project.

You may want to consider adding a timestamp to the URL path of the final <include>. This would allow you to deliver your JavaScript with a far-future expires header, reducing server load and improving performance by making better use of the browser cache.

Filed under  //   ANT   Closure Compiler   JavaScript  

Allow cross-domain AJAX access to your ColdFusion application

Browsers have in-built restrictions to prevent cross-site scripting. The downside of this is that cross-domain AJAX requests will fail, unless you use the JSONP script 'hack'. I call this a hack because it bypasses the usual XMLHttpRequest (XHR) method of making AJAX requests, and as a result, the requests are taken out of your control and into the hands of the browser. In a lot of cases this is fine, but I found myself in the position where I had to make extensive use of HTTP caching, and the inflexibility of the JSONP script method was quite restrictive.

This is where Cross-Origin Resource Sharing comes in: a server can be configured to allow browsers to have cross-domain access. But note that this is not supported by all browsers.

Mozilla have an excellent document describing the protocol: https://developer.mozilla.org/En/HTTP_access_control but sometimes, you just want to see code! So here is a quick CFML method I knocked up to enable cross-domain access to an application. Fire the method at the start of every request, and you're done.

I recommend that you spend time understanding the protocol by debugging the HTTP requests, using a tool like Charles.

Filed under  //   AJAX   Coldfusion   cfml  

Apache log rotation - Windows batch script

Here is the batch script I use to rotate the Apache log files on our Windows servers. My script is based on this one http://www.sprint.net.au/~terbut/usefulbox/apachelogrot.htm with a few small modifications.

Here's what it does:

  • stops the Apache service
  • renames all the .log files to include a timestamp
  • starts Apache
  • zips all the timestamped log files into an archive
  • deletes old archives (the noatk variable specifies number of archives to keep)

To use it:

  • copy it in to a text file named something.bat
  • modify the variables: logpath, servicename, noatk to match your environment
  • tweak the timestamp variables (if required) to get the correct format, depending on your system
  • run it! I run it weekly via scheduled task

@echo off
:: Name - svrlogmng.bat
:: Description - Server Log File Manager
::
:: History
:: Date         Author        Change
:: 22-May-2005  AGButler    Original
:: 04-Mar-2009  AGButler    Updates - thanks to Geoff Brisbine
::
:: 10-Mar-2010  JRoberts    Modified for my specific needs:
::                          - corrected timestamp for our systems
::                          - log path and service name as variables
::                          - use pushd/popd rather than dir - allows UNC paths
::                          - archives ALL .log files in the log path

:: ========================================================
:: setup variables and parameters
:: ========================================================
 
:: generate date and time variables
:: may need to swap around the k j i variables to get the yyyymmdd format
for /F "tokens=5,6,7 delims=/ " %%i in ('echo.^|date^|find "current" ') do set trdt=%%k%%j%%i
:: the following should get the windows time to get the hhmmsstt format
for /F "tokens=5,6,7,8 delims=:. " %%i in ('echo.^|time^|find "current" ') do set trtt=%%i%%j%%k%%l
set nftu=%trdt%-%trtt%
 
:: the directory path of the Apache log files
set logpath=C:\Program Files\Apache Software Foundation\Apache2.2\logs
:: the name of the Apache Windows service
set servicename=Apache2.2
 
:: set the Number Of Archives To Keep
set /a noatk=26


:: ========================================================
:: turn over log files
:: ========================================================
 
:: change to the apache log file directory
pushd "%logpath%"

:: stop Apache Service
net stop "%servicename%"

:: rename all .log files with the timestamp
forfiles -m *.log -c "cmd /c move @file %nftu%_@file"
 
:: start Apache Service
net start "%servicename%"

:: ========================================================
:: zip todays Access and Error log files, then delete old logs
:: ========================================================
 
:: zip the files
7za a -tzip %nftu%_logs.zip %nftu%_*.log

:: del the files
del /Q %nftu%_*.log


:: ========================================================
:: rotate the zip files
:: ========================================================
 
:: make list of archive zip files
type NUL > arclist.dat
for /F "tokens=1,2 delims=[] " %%i in ('dir /B *_logs.zip ^| find /N "_logs.zip"') do echo  %%i = %%j>> arclist.dat
 
:: count total number of files
for /F "tokens=1 delims=" %%i in ('type arclist.dat ^| find /C "_logs.zip"') do set tnof=%%i

:: setup for and create the deletion list
set /a negtk=%noatk%*-1
set /a tntd=%tnof% - %noatk%

type NUL>dellist.dat
for /L %%i in (%negtk%,1,%tntd%) do find " %%i = " arclist.dat >> dellist.dat

:: del the old files
for /F "tokens=3 delims= " %%i in ('find "_logs.zip" dellist.dat') do del /Q %%i
 
:: remove temp files
del /Q arclist.dat
del /Q dellist.dat

:end
popd
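
The rotation step at the end of the script is the least obvious part, so here is the same idea as a small Python sketch (the function name is made up, and this is not part of the batch file). It relies on the timestamp prefix making the archive names sort oldest-first, just as the script relies on the order of the dir listing.

```python
def archives_to_delete(archive_names, keep):
    """Return the oldest archives, leaving the `keep` newest ones in place."""
    # Names start with yyyymmdd-hhmmsstt, so sorting by name sorts by age.
    ordered = sorted(archive_names)
    excess = len(ordered) - keep
    return ordered[:excess] if excess > 0 else []
```

With noatk set to 26 and a weekly schedule, this keeps roughly the last six months of logs.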
 

Filed under  //   Apache   Windows   batch  

Handy Google Analytics tracking snippet for Mura CMS

This is the tracking code I have added to a new Mura CMS site. It's dead simple, with a couple of useful extras:
  1. It only tracks when the site is running in 'production' (use settings.ini.cfm to define the environment modes)
  2. 404 pages are tracked as 404.html along with the actual site URL the user hit, and the referrer; this allows you to keep track of your SEO and broken links
Hint: to create a custom 404 page in Mura, simply create a page named "404" in the Mura site manager

<cfif getConfigBean().getMode() eq 'production'>

<cfoutput><script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-xxxxxxx-x");
<cfif $.content().getfilename() eq '404'>
pageTracker._trackPageview("/404.html?page=" + document.location.pathname + document.location.search + "&from=" + document.referrer);
<cfelse>
pageTracker._trackPageview();
</cfif>
} catch(err) {}</script></cfoutput>

</cfif>

Filed under  //   Coldfusion   Mura   cfml   google analytics  
