Cassandra Day Denver 2014
Tuesday, October 14 • 1:00pm - 1:40pm
Reading Cassandra SSTables Directly for Offline Data Analysis

Here at FullContact we have lots and lots of contact data. In particular, we have more than a billion profiles over which we would like to perform ad hoc data analysis. Much of this data resides in Cassandra, and many of our analytics MapReduce jobs need to iterate across terabytes of it. To solve this problem, we've implemented our own splittable input format, which lets us quickly process large SSTables for downstream analytics.

Ben Vanberg

Software Engineer, FullContact
Professional software engineer since 1999, working on big data solutions for the past 5 years. Currently at FullContact, where Cassandra is at the center of our ecosystem.

Tuesday October 14, 2014 1:00pm - 1:40pm PDT
Track A