Bulk Programmatic Conversion of Photos from flickr to SmugMug 
2007-02-04, 18:38 - General
Having moved from flickr to SmugMug, I needed to bring all my flickr photos across. A quick look at flickr reveals there is no easy way to get your photos back (not even a "Zip & Download" for galleries), so rather than spend a weekend clicking picture by picture I decided to move my stuff across in bulk, programmatically. Here's how I did it ...

General Approach


I wanted to preserve the titles of each of the pictures, but I was happy to dump everything from flickr into a single 'import' gallery on SmugMug and then use SmugMug's civilised organising features to get everything into the galleries I wanted.

Looking at flickr's Terms of Use (which also points out that the terms change when moving to a Yahoo! Id) I notice that users "must not modify, adapt or hack Flickr.com" -- so thoughts of a grand adapting web app for flickr (with a 'copy to SmugMug' button) went by the wayside, and instead I opted for a command-line application.

As a programming language I chose Groovy, partly as learning a new programming language on the fly makes it interesting :-)

Preliminaries


Groovy runs on the Java platform, but a couple of third-party libraries are helpful to keep things nice: the Jakarta Commons HttpClient provides an excellent higher-level abstraction for dealing with the network interaction, and Elliotte Rusty Harold's XOM is a lovely library for dealing with XML without too much syntactic cruft.

Logging In


First up, we need to get a couple of sessions going: one with flickr and one with SmugMug. Here's some code (I'm not posting the whole thing -- but am happy to share if anyone's interested. Oh, and I hope the variable names make it 'self documenting' ;-))

First, to log into flickr:
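import org.apache.commons.httpclient.methods.PostMethod

// Login to flickr by POSTing to the "old skool" form target
def post = new PostMethod( flickrLoginUrl )
post.addParameter( "email", flickrUsername )
post.addParameter( "password", flickrPassword )
def status = flickrClient.executeMethod( post )
println "Flickr login, status=$status"
// Hand the connection back; the session cookies stay with flickrClient
post.releaseConnection()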

And then, to SmugMug ...
// Login to SmugMug over SSL using their REST API.
// Each "+" ends its line: Groovy treats a line that *starts* with "+"
// as a separate statement, which would leave the URL incomplete.
def smLoginUrl = this.smugMugApiUrlStub +
    "method=smugmug.login.withPassword" +
    "&APIKey=" + smugMugApiKey +
    "&EmailAddress=" + smugMugUsername +
    "&Password=" + smugMugPassword
def get = new GetMethod( smLoginUrl )
smClient.executeMethod( get )

It's interesting to note the difference of approach here. When using flickr, it appears the username and password are sent as plain text - yikes! SmugMug's REST API, OTOH, is exposed through an SSL layer.
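
One thing the login URL above glosses over: it concatenates the username and password as-is, so a password containing "&" or a space would mangle the query string. Here's a minimal sketch of one way round that, using the JDK's URLEncoder. The buildQuery helper and the values passed to it are made up for illustration; they are not part of the SmugMug API or of my original script.

```groovy
// Hypothetical helper: builds a URL-encoded query string from a map
String buildQuery( Map<String, String> params )
{
    params.collect { name, value ->
        URLEncoder.encode( name, "UTF-8" ) + "=" + URLEncoder.encode( value, "UTF-8" )
    }.join( "&" )
}

// Placeholder values, not real credentials
def query = buildQuery( [
    method      : "smugmug.login.withPassword",
    EmailAddress: "me@example.com",
    Password    : "p&ss word"
] )
println query
```

The result can then be appended to the API URL stub in place of the hand-built string.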

Also note, that in best REST fashion, the login GET to SmugMug responds with an XML document. We can then use XOM to extract the SessionID from this response -- we'll need it later to interact with SmugMug.

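// Build a XOM XML document from the returned bytes
// (Builder is nu.xom.Builder; false means non-validating)
def doc = new Builder( false ).
    build( new ByteArrayInputStream( get.responseBody ) )
get.releaseConnection()

// Use XPath to get the SM SessionID
def nodes = doc.query( "/rsp/SessionID" )
if( nodes.size() == 0 )
    throw new IllegalStateException( "No SessionID in SmugMug response" )
sessionId = nodes.get( 0 ).value
println "SmugMug session ID is " + sessionId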
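
If you want to try the XPath step without XOM on the classpath (or a SmugMug account to hand), something like the following should behave similarly. It is only a sketch: it swaps XOM for the JDK's built-in javax.xml.xpath, and the sample XML is invented to match the "/rsp/SessionID" path used above, not copied from a real SmugMug response.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathFactory

// Made-up response body, shaped to fit the XPath used above
def sampleXml = '<rsp stat="ok"><SessionID>abc123</SessionID></rsp>'

def docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def parsed = docBuilder.parse( new ByteArrayInputStream( sampleXml.getBytes( "UTF-8" ) ) )

// evaluate() returns the text of the first matching node,
// or an empty string when nothing matches
def extractedId = XPathFactory.newInstance().newXPath().evaluate( "/rsp/SessionID", parsed )
println "Session ID is " + extractedId
```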

Enumerating the flickr photos



The approach here is to enumerate the sets, and then for each set to enumerate the photos. From the "Your sets" page, I was hoping to get hold of the XML and pull out the URLs for each of the sets' own pages.

It was here I got a nasty shock - the flickr pages aren't XHTML. They're not even (according to the W3C Markup Validation Service) valid HTML!

So, time to observe, and scrape with a regexp:
/*
 * Given the bytes of a flickr "sets" page, this
 * iterates over each set ...
 */
static void processAllSets( byte[] page )
{
    String s = new String( page )

    def pat = /a class="Seta" href="([^"]+)" title="([^"]+)"/
    def matcher = ( s =~ pat )
    println "Found $matcher.count sets"

    // Exclusive range: unlike 0 .. count - 1, it is empty when count is 0
    for( index in 0 ..< matcher.count )
    {
        def setUrl = matcher[ index ][ 1 ]
        def setTitle = matcher[ index ][ 2 ]
        println "Got set at $setUrl, entitled '$setTitle'. Processing ..."
        processSet( setUrl, setTitle )
    }
}

Note Groovy's nice syntax for handling regexps.
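
For anyone who hasn't met it, here is that regexp syntax in isolation: a slashy string for the pattern (no double-escaping), the =~ operator to get a Matcher, and a closure that receives the whole match followed by each capture group. The markup below is invented to imitate the shape the pattern expects; it is not a real flickr page.

```groovy
// Invented markup, not real flickr output
def html = '''
<a class="Seta" href="/photos/someone/sets/1/" title="Holiday">x</a>
<a class="Seta" href="/photos/someone/sets/2/" title="Garden">y</a>
'''

def pattern = /a class="Seta" href="([^"]+)" title="([^"]+)"/
def sets = []

// Iterating the matcher directly avoids any index arithmetic
( html =~ pattern ).each { whole, url, title ->
    sets << [ url: url, title: title ]
}

sets.each { println "${it.title} -> ${it.url}" }
```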

Now that we've got a URL for each set, we can (using a regexp again) pull out the title and thumbnail URL for every photo on that set's page:
/*
 * Given a flickr set, this iterates over each of the photos in it
 */
static void processSet( String setUrl, String setTitle )
{
    def get = new GetMethod( "http://flickr.com" + setUrl )
    def status = flickrClient.executeMethod( get )

    println "Got setUrl, status $status"
    String s = new String( get.responseBody )
    get.releaseConnection()

    // get the photo titles and thumbnail URLs
    def pat = /title="([^"]+)" class="thumb_link"[^>]+><img src="([^"]+)/
    def matcher = ( s =~ pat )
    println "Found $matcher.count photos in set"

    // Exclusive range, so an empty set is skipped cleanly
    for( i in 0 ..< matcher.count )
    {
        def photoTitle = matcher[ i ][ 1 ]
        def photoThumbUrl = matcher[ i ][ 2 ]
        println "Got thumbnail at $photoThumbUrl, for photo entitled '$photoTitle'. Processing ..."
        processPhoto( photoThumbUrl, photoTitle )
    }
}

With this we end up with the URL of each photo's thumbnail, plus its caption. But it's our full-size original pictures we want to upload to SmugMug, not the thumbnails. Luckily, flickr appears to follow a naming convention so that by changing the "_s.jpg" to "_o.jpg" in our URLs, we can synthesise the URL of the original photo.
/*
 * Given a flickr photo, this copies it to SmugMug
 */
static void processPhoto( String photoThumbUrl, String photoTitle )
{
    def photoUrl = photoThumbUrl.replace( "_s.jpg", "_o.jpg" )
    def get = new GetMethod( photoUrl )
    def status = this.flickrClient.executeMethod( get )
    byte[] raw = get.responseBody

This gives us (in the byte array called raw) our original image data. Next we have to generate an MD5 checksum, as this is required by the SmugMug upload mechanism. It's here that having Java on tap comes in very handy ...
    // MessageDigest is java.security.MessageDigest
    def md = MessageDigest.getInstance( "MD5" )
    md.update( raw )
    def digestBytes = md.digest()
    def checksum = ""
    for( index in 0 ..< digestBytes.length )
    {
        // Adding 0x100 and dropping the first hex digit keeps leading zeros
        checksum += Integer.
            toString( ( digestBytes[ index ] & 0xff ) + 0x100, 16 ).
            substring( 1 )
    }
    println "Content length is: " + raw.length
    println "MD5 is: " + checksum
    println "Starting POST ..."
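
In hindsight, Groovy can do the hex conversion itself: the GDK adds an encodeHex() method to byte arrays. A shorter sketch of the same checksum step follows; md5Hex is a helper name I've made up here, and the input is just the test string "abc" rather than image data.

```groovy
import java.security.MessageDigest

// Hypothetical helper: lowercase hex MD5 of a byte array
String md5Hex( byte[] data )
{
    MessageDigest.getInstance( "MD5" ).digest( data ).encodeHex().toString()
}

def abcChecksum = md5Hex( "abc".getBytes( "UTF-8" ) )
println "MD5 is: " + abcChecksum   // 900150983cd24fb0d6963f7d28e17f72
```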

Finally, we're ready to do the upload. Again, the REST way allows us to do this by POSTing our raw image data to SmugMug with the headers correctly set:
    // This is an HTTP POST, so the variable is named for what it does
    def upload = new PostMethod( smugMugUploadUrl )
    upload.addRequestHeader( "Content-Length", "" + raw.length )
    upload.addRequestHeader( "Content-MD5", checksum )
    upload.addRequestHeader( "X-Smug-SessionID", this.sessionId )
    upload.addRequestHeader( "X-Smug-Version", "1.1.1" )
    upload.addRequestHeader( "X-Smug-ResponseType", "REST" )
    upload.addRequestHeader( "X-Smug-AlbumID", this.smugMugUploadGalleryId )
    upload.addRequestHeader( "X-Smug-Caption", photoTitle )
    upload.setRequestBody( new ByteArrayInputStream( raw ) )

    def uploadStatus = smClient.executeMethod( upload )
    upload.releaseConnection()
    println "POST complete, status=$uploadStatus"
}

Et voilà! Automatic transfer of images from flickr to SmugMug.

The transfer process takes a while, as every image needs to get downloaded to, and then uploaded from, the client machine. It would be nice if SmugMug allowed pictures to be uploaded by URL (thereby bypassing the need to route the data through client machines with their measly domestic bandwidth) -- but maybe this opens up too much opportunity for abuse.

Conclusions


This does the trick, but for flickr users with larger collections (multi-page sets, etc), some more code will be required. It might well be worth investigating flickr's own API for a more robust approach ...
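
I haven't written the multi-page version, but the shape of it would probably be a loop that keeps asking for the next page until one comes back empty. The sketch below shows only that loop. Everything in it is an assumption: collectAllPages and fetchPage are invented names, the idea that pages are numbered from 1 is a guess rather than anything I checked against flickr, and the data is a stand-in for real HTTP requests.

```groovy
// Hypothetical: fetchPage( n ) returns the items found on page n,
// or an empty list once we have run past the last page
List<String> collectAllPages( Closure<List<String>> fetchPage, int maxPages = 100 )
{
    def all = []
    // maxPages is a safety cap in case a site never returns an empty page
    for( page in 1 .. maxPages )
    {
        def items = fetchPage( page )
        if( !items ) break
        all.addAll( items )
    }
    all
}

// Stand-in data instead of real requests
def fakePages = [ 1: [ "a", "b" ], 2: [ "c" ] ]
def titles = collectAllPages { n -> fakePages[ n ] ?: [] }
println titles
```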

In general though, I wish flickr had provided a better way for getting photos back in bulk - it would have made life a lot easier.


