How do I freaking scale Oracle?

A quick-and-dirty guide to scaling your apps with Oracle DB

For the record, I'm a NoSQLer and a big data guy. After nearly 20 years of dealing with what I've come to refer to as "the beast" (aka Oracle), I've turned coat and decided I prefer solving the problem rather than mitigating it.

That said, many companies I work with have spent 20 years painting themselves into an Oracle corner. While they may have one eye on a brighter future, they still must ensure their Oracle database is high-performance and highly available -- and scales as well as possible. Despite what you may read in NoSQL vendor marketing materials (or even in my blog), it is possible to scale Oracle.


There's just one catch: Nothing I'm going to suggest costs less than my house for a nontrivial installation. Heck, when you include the hardware and data center resources, some cost more than my whole neighborhood.

What availability and scalability really mean

Before I describe how to accomplish high availability or scalability, we'd all be better served if you know what I mean by those terms.

High availability means close to zero downtime -- in other words, a system that can tolerate a one-node failure with either zero downtime or minimal downtime (usually less than 5 minutes) for the failover process. Sometimes this can be done across data centers to allow for high availability even in a disaster scenario.

As for scalability, it's important to understand from the outset that performance is a different concept. Performance is the time it takes to service a request. Scalability means flexibility in expanding or reducing your server capacity and infrastructure. Higher scalability means handling a larger number of requests while still meeting your performance expectations.

There is some overlap, however, between performance and scalability. Ideally, a database and its related infrastructure handle all availability issues under the hood so that the client can stay "dumb." However, if certain aspects of transaction logic are outsourced to the client, then performance and scalability can both be at risk.

Take sharding, for example. The dragon is always complexity: putting sharding logic in application code means the developer has to "roll" a sharding algorithm, with all the bug risk that implies. That threatens scalability, because a home-grown sharding algorithm may be neither efficient nor correct (see Foursquare's outage in October 2010).
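
To make that concrete, here's a minimal sketch, in Java, of the kind of hand-rolled shard routing that ends up in application code. The shard URLs, key choice, and modulo scheme are hypothetical; the point is that correctness, hot spots, and any later rebalancing are entirely the developer's problem.

    import java.util.Arrays;
    import java.util.List;

    // A sketch of hand-rolled shard routing in application code. The JDBC URLs,
    // key choice, and modulo scheme are hypothetical; correctness, hot spots,
    // and any later rebalancing are the developer's problem.
    public class NaiveShardRouter {
        private final List<String> shardJdbcUrls;

        public NaiveShardRouter(List<String> shardJdbcUrls) {
            this.shardJdbcUrls = shardJdbcUrls;
        }

        // Pick a shard by hashing the customer ID. If the shard count ever changes,
        // nearly every existing key maps to a different shard, a classic source of bugs.
        public String shardFor(long customerId) {
            int index = (int) Math.floorMod(customerId, (long) shardJdbcUrls.size());
            return shardJdbcUrls.get(index);
        }

        public static void main(String[] args) {
            NaiveShardRouter router = new NaiveShardRouter(Arrays.asList(
                "jdbc:oracle:thin:@shard1.example.com:1521/orders",
                "jdbc:oracle:thin:@shard2.example.com:1521/orders"));
            System.out.println(router.shardFor(42L)); // routes customer 42 to one shard
        }
    }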

What do you need? You need fault tolerance against server outages, storage outages, and data center outages (aka disaster). You'll also need a way to balance load against servers and/or CPU cores as well as disks. Several products and techniques can accomplish this.

Oracle RAC

Oracle RAC is mainly about load balancing and high availability for Oracle processes. It uses a shared-disk architecture, which means that while it keeps the database available through a server outage, it does not protect against outages that affect the storage itself.

Moreover, Oracle RAC doesn't really help with overall scalability concerns if your database wasn't originally CPU-bound -- the load on the disk, if anything, increases. As a database is tuned, it eventually becomes disk-bound; RAC isn't much help with this. Also, if your data center goes dark, RAC probably isn't your wide-area solution.

Nonetheless, with any large Oracle system, you need RAC. It helps with a lot of common RDBMS and Oracle-specific problems:

  1. Connection management. In cooperation with Oracle's JDBC driver, connections from database clients such as your application servers are load-balanced across Oracle instances (see the sketch after this list).
  2. High availability. In cooperation with Oracle Clusterware, the system can "hide" downed back-end processes or even entire instances from end-users.
  3. Load management. Different types of application requests can be grouped and automatically routed to specific instances or restricted in the amount of resources they use.
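
To give a feel for the first item, here's a minimal sketch of a JDBC thin-driver connect descriptor that lists two RAC nodes with client-side load balancing and failover switched on. The host names, port, service name, and credentials are placeholders, and the Oracle JDBC driver (ojdbc jar) is assumed to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    // Sketch: open a connection against a two-node RAC cluster with client-side
    // load balancing and failover enabled in the connect descriptor. Host names,
    // port, service name, and credentials are placeholders.
    public class RacConnectionExample {
        private static final String RAC_URL =
            "jdbc:oracle:thin:@(DESCRIPTION="
          + "(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON)"
          + "(ADDRESS=(PROTOCOL=TCP)(HOST=rac1.example.com)(PORT=1521))"
          + "(ADDRESS=(PROTOCOL=TCP)(HOST=rac2.example.com)(PORT=1521)))"
          + "(CONNECT_DATA=(SERVICE_NAME=orders_svc)))";

        public static Connection connect(String user, String password) throws SQLException {
            // The driver parses the descriptor and spreads new connections
            // across the listed addresses; unreachable nodes are skipped.
            return DriverManager.getConnection(RAC_URL, user, password);
        }
    }

In practice you would layer Oracle's Universal Connection Pool and Fast Connection Failover on top of this, so the pool reacts to node failures rather than individual connections.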

Oracle brands the management tooling for RAC separately as Oracle Clusterware, lumping the likes of Oracle Enterprise Manager under it; Clusterware provides much of the underlying infrastructure for RAC.

Mirroring

Mirroring can be useful for a hot standby, although you probably won't be balancing load between mirrored sites.

There are several mirroring products from the block layer on up. They are helpful for availability, but may not be that great for disaster recovery, as mirroring doesn't really scale across the WAN -- although you may be able to achieve acceptable performance within a 20-mile radius. Conventional wisdom says mirroring within that radius protects against most disasters, and statistical analysis backs that up.

That said, few were prepared for a major hurricane in New York City. No one generally thinks about massive Northeastern blackouts, either, but there have been a few. Twenty miles away may not be far enough.

Another major disadvantage of storage-level solutions is that you can't mix versions of Oracle. You'll have to either combine your storage-level solution with something else or create scheduled maintenance windows for outages.

Mirroring technologies at the block layer include products like:

  1. Storage replication solutions like SRDF. These allow for both synchronous and asynchronous transfer. The synchronous mode ensures consistency between the primary device and the backup. I've yet to see a working Oracle cluster that did not run in the lower-performing synchronous mode.
  2. Storage virtualization products like VPLEX. These can be thought of as really advanced SAN versions of what Linux offers as "logical volume management." Storage virtualization is mainly for growing and shrinking storage, as well as balancing load across disks, but it can also mirror logical volumes (a logical volume is a group of disks that act as if they were one disk to the layer above). This is sometimes done at the hardware layer; it can also be achieved at the file system/operating system layer.
  3. RAID 1. At the lower end you have traditional RAID disk mirroring, which can be implemented at the hardware or controller level or the software level. Most Linux distributions can do this out of the box.

Transaction replication

Oracle bought GoldenGate and has lately been pushing it in place of its older Oracle Streams product, which debuted as a built-in feature of Oracle 9. GoldenGate captures DML events from the transaction log, writes them to "trail files," then pumps them to the target, even across a WAN.

You can configure the database in "multimaster" setups where both instances accept writes. In that case you can have write conflicts, which must be resolved; GoldenGate has mechanisms for this, but they may require work on your part. From a scalability standpoint, multimaster can help read performance but does not help at all with write performance: the same systemwide set of writes is applied on every node, so write throughput is no greater than before.

Implementing GoldenGate requires that you create mapping files. This will feel a lot like a system integration project, and depending on the size and complexity of the database, it may have the scope and risk of one. A key advantage to GoldenGate is that it allows you to mix and match Oracle versions.

You need to do due diligence in performance testing and ensure this product meets your performance goals. Check out Oracle's diagram of how GoldenGate works and you'll ask: Wow, how long will it take to do this on my database and network infrastructure?

A number of other third-party products do transaction replication as well; Oracle itself has yet another product, Data Guard, which can do transaction replication in single-master mode. However, Oracle seems to be struggling to explain why it has both Data Guard and GoldenGate. The simpler explanation may be that GoldenGate will take you longer to implement.

Partitioning

The primary "data" way to scale in Oracle comes down to partitioning. Here's Oracle's definition:

Partitioning allows a table, index, or index-organized table to be subdivided into smaller pieces, where each piece of such a database object is called a partition. Each partition has its own name, and may optionally have its own storage characteristics.

Now here's 10gen's explanation of sharding in MongoDB:

Sharding distributes a single logical database system across a cluster of machines. Sharding uses range-based partitioning to distribute documents based on a specific shard key.

If you have RAC for the instance balancing and partitioning for the data balancing, then you have much of what sharding provides while staying in your familiar Oracle database, right? Almost. There's more involved in setting up and maintaining partitioning in Oracle, and there are multiple schemes for how to partition (range, list, hash, and composites of these, among others).
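
As a rough illustration of the setup work, here's a sketch of declaring a range-partitioned table through plain JDBC. The table, columns, and yearly partition boundaries are made up; a real partitioning design needs far more thought about keys, local versus global indexes, and ongoing partition maintenance.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Sketch: create a range-partitioned orders table, one partition per year,
    // issued as plain DDL over JDBC. Table, column, and partition names are made up.
    public class PartitionedTableSetup {
        private static final String CREATE_ORDERS =
            "CREATE TABLE orders ("
          + "  order_id    NUMBER PRIMARY KEY,"
          + "  customer_id NUMBER NOT NULL,"
          + "  order_date  DATE   NOT NULL,"
          + "  amount      NUMBER(12,2)"
          + ") PARTITION BY RANGE (order_date) ("
          + "  PARTITION orders_2011 VALUES LESS THAN (TO_DATE('2012-01-01','YYYY-MM-DD')),"
          + "  PARTITION orders_2012 VALUES LESS THAN (TO_DATE('2013-01-01','YYYY-MM-DD')),"
          + "  PARTITION orders_max  VALUES LESS THAN (MAXVALUE)"
          + ")";

        public static void createOrdersTable(Connection conn) throws SQLException {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(CREATE_ORDERS);
            }
        }
    }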

Hybrid NoSQL/cache

All of that helps spread the load and provides ways to avoid a heart attack if a server or disk is lost. But what about taking some load off the database entirely? Oracle again has two products it bought for the purpose:

  1. TimesTen. This can be used as an in-memory database cache and is entirely JDBC/SQL-based. There are alternatives such as VMware's SQLFire that actually may be more mature and provide similar functionality, but require ANSI SQL and lack support for PL/SQL. TimesTen also has some limitations and quirks related to Oracle RDBMS-specific features. The sales pitch is that TimesTen is just "thrown in front" and you press the Go button. The truth is that if you have a bunch of PL/SQL and have coded to Oracle and not ANSI, you may be headed for a substantial system integration project.
  2. Coherence. This is a traditional replicated cache similar to JBoss's Infinispan, VMware's GemFire, and Software AG's Terracotta. Since Oracle acquired Coherence, development appears to have dropped off somewhat; Oracle seems to be pushing TimesTen more aggressively. All of these products allow you to write a custom CacheLoader and CacheWriter (see the sketch after this list), and it may actually be easier to do so than to port all your queries to a more standard dialect of SQL or even a subset of PL/SQL. All of these products have some interesting features for localizing and distributing data. In general, it's best to think of them as in-memory key-value stores (a type of NoSQL database) with a write-behind queue to a SQL database or other storage mediums. With both LAN and WAN replication and other features, replicated caches can also serve as at least a partial DR and HA solution, depending on the size and complexity of your database and whether you can fit most or all of it in memory.
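
To show the shape of that cache-loader work, here's a minimal, product-neutral sketch of the read-through pattern those custom loaders implement: on a cache miss, fetch the value from Oracle over JDBC and keep it in memory. The class, map, query, and column names are hypothetical; each product (Coherence, Infinispan, GemFire) exposes its own loader and writer interfaces with roughly this contract, and a write-behind path would queue updates back to the database in the same style.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import javax.sql.DataSource;

    // Product-neutral sketch of a read-through cache sitting in front of Oracle.
    // A real deployment would plug this logic into the cache product's own
    // CacheLoader/CacheWriter interfaces; query and column names are made up.
    public class CustomerNameCache {
        private final DataSource oracleDataSource;
        private final ConcurrentMap<Long, String> cache = new ConcurrentHashMap<>();

        public CustomerNameCache(DataSource oracleDataSource) {
            this.oracleDataSource = oracleDataSource;
        }

        // On a miss, load the value from the database and remember it.
        public String nameFor(long customerId) throws SQLException {
            String cached = cache.get(customerId);
            if (cached != null) {
                return cached;
            }
            String loaded = loadFromOracle(customerId);
            if (loaded != null) {
                cache.putIfAbsent(customerId, loaded);
            }
            return loaded;
        }

        private String loadFromOracle(long customerId) throws SQLException {
            String sql = "SELECT name FROM customers WHERE customer_id = ?";
            try (Connection conn = oracleDataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, customerId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }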

Conclusion

As I've outlined here, Oracle offers various methods and technologies to achieve most of what a NoSQL database can do in terms of scalability and availability. So why NoSQL if Oracle can do it all?

Well, we haven't yet addressed cost. It isn't that you can't scale Oracle -- but you must weigh the costs of doing so. Most of the companies I work with have existing systems, and the cost of migration to a different type of database after years of coding directly to Oracle is daunting by comparison. For new systems, however, NoSQL can look a lot more attractive.

This article, "How do I freaking scale Oracle?," was originally published at InfoWorld.com.

Copyright © 2013 IDG Communications, Inc.