Attended Hadoop Day today at the Amazon PacMed building. Free “conference” setup – and it was very worthwhile.

The morning was panels and general discussions – the brightest spot being learning about the Whirr project. I’m afraid the link has 100% suck to it because it says nothing about what the project is. I haven’t read through all the code, but reports from others at the conference say that it is “nice tools to work with Hadoop”. Hopefully so. It’s a Maven/Java project with zippo documentation – not even a README with an overview of how to use it.

In the afternoon I sat in the “intermediate” track and picked up some really interesting pieces. I got the in-depth scoop on what’s happening with adding security to Hadoop from Jakob Homan, got a great introduction to Mahout from Jake Mannix (about to be a search geek employed at Twitter), learned about Prezi, which I’m thinking I’ll inflict on my coworkers some time, and was both amused by and interested in Cascalog.

Cascalog = Clojure + Datalog + Cascading

It turned out to be surprisingly (to me – I’m being unfair to Clojure really) expressive and readable for making interesting and complex queries against Hadoop data structures – a very nice abstraction setup. I snorted at the thought of handing it to someone who had trouble with SQL though – it’s for programmers, not business analysts. I do wish I’d been able to get more of the skinny on Cascading and a more in-depth look at it – it looks really effective at making queries and processing Hadoop-based data. I would have also liked to get some real meat and details on Oozie, which is Yahoo’s workflow engine for submitting MapReduce jobs into their Hadoop clusters.

I took off a little early from the conference, but it was definitely worthwhile. I wish the Amazon environment had better wifi connectivity (it rather sucked for guests), but in the end I didn’t really need it for what I gathered.

