Cassandra project chair: We're taking on Oracle

With Cassandra 2.0 due in July, Apache project chair Jonathan Ellis says his team is focusing on such issues as ease of use

Apache Cassandra is an open source NoSQL database that handles large-scale workloads and has attracted a lot of attention, with deployments at organizations such as Netflix, eBay, and Twitter. It was developed at Facebook, which open-sourced it in 2008, and it can be deployed across multiple data centers and in cloud environments.

Jonathan Ellis is the chair of the project at Apache, and he serves as chief technical officer at DataStax, which has built a business around Cassandra. InfoWorld Editor-at-Large Paul Krill spoke with Ellis at the company's recent Cassandra Summit 2013 conference in San Francisco, where Ellis discussed efforts to make the database easier to use and how it has become a viable competitor to Oracle's relational database technology.

InfoWorld: What is the biggest value-add for Cassandra?

Ellis: It's driving the Web applications. We're the ones who power Netflix and Spotify; Cassandra is powering their applications directly. It lets you scale to millions of operations per second, and software-as-a-service, machine-generated data, and Web applications are all really hot spots for Cassandra.

InfoWorld: What is the value added by Cassandra over the rival NoSQL database, MongoDB?

Ellis: I think [the MongoDB developers would] be willing to concede that Cassandra does better on the scaling and performance end of things. MongoDB would also claim, and I would concede to them, that they're probably doing better on the ease-of-use front. That's part of what's driving our CQL (Cassandra Query Language) story -- we're aware that historically it's been a weak point for us. We're fixing that with CQL.

InfoWorld: How does that fix it?

Ellis: It's hard to show without showing you kind of the train wreck that you had with Thrift (Cassandra's legacy API). But since CQL is so similar to SQL, it makes it much easier for relational developers to wrap their heads around what Cassandra is giving them. You get indexes out of it, and you get things like the collections that make building your applications a lot easier with Cassandra. [CQL] reduces the learning curve, first of all, because you do have similar syntax to SQL, but it also increases your productivity even after you've gotten over the learning curve by giving you some syntactic sugar for common constructs. The production-ready [version] of CQL hit in January of this year.
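To illustrate the kind of convenience Ellis attributes to CQL collections, here is a minimal Python sketch. It assumes a hypothetical in-memory `UsersTable` class standing in for a CQL table with a `set<text>` column; the CQL statements appear only as comments, and the class models just the client-visible effect of adding to or removing from a set, not Cassandra's storage engine or any real driver API.

```python
from collections import defaultdict


class UsersTable:
    """Hypothetical in-memory stand-in for this CQL table:

        CREATE TABLE users (user_id text PRIMARY KEY, emails set<text>);

    It models only the client-visible behavior of set updates,
    not Cassandra's storage engine or any real driver API.
    """

    def __init__(self):
        # user_id -> set of e-mail addresses
        self._emails = defaultdict(set)

    def add_emails(self, user_id, *emails):
        # UPDATE users SET emails = emails + {...} WHERE user_id = ...
        # A blind union: the client does not read the old set first.
        self._emails[user_id] |= set(emails)

    def remove_emails(self, user_id, *emails):
        # UPDATE users SET emails = emails - {...} WHERE user_id = ...
        self._emails[user_id] -= set(emails)

    def select_emails(self, user_id):
        # SELECT emails FROM users WHERE user_id = ...
        # Sorted here so the output is deterministic.
        return sorted(self._emails.get(user_id, set()))


users = UsersTable()
users.add_emails("jsmith", "js@example.com", "john@example.com")
users.add_emails("jsmith", "js@example.com")  # adding a duplicate is a no-op
print(users.select_emails("jsmith"))  # ['john@example.com', 'js@example.com']
```

The point of the sketch is that a set column accepts an "add these elements" write without the application first fetching the current value, which is the sort of common construct CQL expresses directly.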

InfoWorld: Does Oracle have anything to fear from Cassandra? Are you displacing any Oracle installations?

Ellis: Yes. It's getting hard to count them, almost. We're really doing a lot of that. Yesterday we put out a press release about Ooyala, Netflix, and Openwave that goes into a lot of detail about their experiences replacing Oracle with Cassandra.

InfoWorld: You mentioned native Cassandra drivers for languages including Java, .Net, PHP, Python, and Ruby. What is the significance of that for Cassandra?

Ellis: Native drivers give us more performance, and they allow the server to notify the client asynchronously of events the client might be interested in, like a new node joining the cluster, so the client will want to load-balance across that node as well. Or say a user on a different connection created a table. You want to tell this client, "Hey, the schema has changed, and you might want to be aware of that."
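The client-side half of that push model can be sketched in a few lines of Python. The event kinds `TOPOLOGY_CHANGE` and `SCHEMA_CHANGE` and the change types `NEW_NODE`, `REMOVED_NODE`, `CREATED`, and `DROPPED` follow the names used in Cassandra's native protocol specification; everything else here (the `Event` and `ClientModel` classes, the round-robin policy, and packing keyspace and table into one `"keyspace.table"` string) is a hypothetical simplification, not the API of any real driver. In the actual protocol a client first registers for the event types it wants and the server then pushes matching events on that connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "TOPOLOGY_CHANGE" or "SCHEMA_CHANGE"
    change: str  # "NEW_NODE", "REMOVED_NODE", "CREATED", "DROPPED"
    target: str  # node address, or "keyspace.table" (simplified)


class ClientModel:
    """Hypothetical client that reacts to server-pushed events."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0
        self.known_tables = set()

    def pick_node(self):
        # Simple round-robin load balancing over the known nodes.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node

    def on_event(self, event):
        # Called whenever the server pushes an event to this client.
        if event.kind == "TOPOLOGY_CHANGE":
            if event.change == "NEW_NODE" and event.target not in self.nodes:
                self.nodes.append(event.target)  # start sending it traffic
            elif event.change == "REMOVED_NODE" and event.target in self.nodes:
                self.nodes.remove(event.target)
        elif event.kind == "SCHEMA_CHANGE":
            if event.change == "CREATED":
                self.known_tables.add(event.target)
            elif event.change == "DROPPED":
                self.known_tables.discard(event.target)


client = ClientModel(["10.0.0.1", "10.0.0.2"])
client.on_event(Event("TOPOLOGY_CHANGE", "NEW_NODE", "10.0.0.3"))
print([client.pick_node() for _ in range(3)])  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```

Without pushed events, a client would have to poll the cluster to discover a new node or a newly created table; with them, the load-balancing pool and cached schema update as soon as the server reports the change.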

InfoWorld: So the drivers are mostly for Java, .Net, and other client applications to tunnel back to Cassandra databases?

Ellis: Right. Yes.

InfoWorld: You talked about Cassandra 2.0. You said it is coming out in July?

Ellis: Right. The end of July is what we're targeting.

InfoWorld: What are the most important new features, and what are the most important deletions?

Ellis: The most important new features are the lightweight transactions and triggers. Triggers let you push some computation into the Cassandra cluster. I wouldn't classify any of the deletions as especially important. I guess, from a historical perspective, getting rid of super columns makes us veterans happy, since they caused a lot of grief for us early on. But I wouldn't characterize the deletions as interesting to a broader audience. They're mostly interesting to the technical Cassandra people.
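Lightweight transactions in Cassandra 2.0 are conditional writes, expressed in CQL as `INSERT ... IF NOT EXISTS` or `UPDATE ... IF column = value`, that apply only when the condition holds; across replicas, Cassandra coordinates them with the Paxos consensus protocol. The Python sketch below is a hypothetical single-process `ConditionalStore` that models only the outcome a client sees (whether the write was applied and, if not, the current value), not the distributed Paxos rounds underneath.

```python
class ConditionalStore:
    """Hypothetical single-node model of lightweight-transaction results."""

    def __init__(self):
        self._rows = {}

    def insert_if_not_exists(self, key, value):
        # INSERT INTO t (k, v) VALUES (key, value) IF NOT EXISTS
        if key in self._rows:
            return False, self._rows[key]  # not applied; report existing value
        self._rows[key] = value
        return True, None

    def update_if(self, key, expected, new):
        # UPDATE t SET v = new WHERE k = key IF v = expected
        if key not in self._rows or self._rows[key] != expected:
            return False, self._rows.get(key)  # not applied
        self._rows[key] = new
        return True, None


# Two clients racing to claim the same username: only one insert applies.
store = ConditionalStore()
print(store.insert_if_not_exists("user1", "alice"))  # (True, None)
print(store.insert_if_not_exists("user1", "bob"))    # (False, 'alice')
```

A plain Cassandra write would let the second insert silently overwrite the first; the condition turns it into a compare-and-set, which is why features like unique usernames need it.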

InfoWorld: What is a super column?

Ellis: Think of it as an early attempt to do what we're doing now with CQL collections. The problem we had with super columns was that they did not perform well enough.

This story, "Cassandra project chair: We're taking on Oracle," was originally published at InfoWorld.com.

Copyright © 2013 IDG Communications, Inc.