Tuesday, October 14 • 1:00pm - 1:40pm
Reading Cassandra SSTables Directly for Offline Data Analysis

Here at FullContact we have a large volume of contact data. In particular, we have more than a billion profiles over which we would like to perform ad hoc data analysis. Much of this data resides in Cassandra, and many of our analytics MapReduce jobs must iterate across terabytes of Cassandra data. To solve this problem, we implemented our own splittable input format, which lets us quickly process large SSTables for downstream analytics.

Speakers
Ben Vanberg

Software Engineer, FullContact
Professional software engineer since 1999, working on big data solutions for the past 5 years. Currently at FullContact, where Cassandra is at the center of our ecosystem.


Tuesday October 14, 2014 1:00pm - 1:40pm
Track A

Attendees (15)