What Big Data Strategists Can Learn From A Con Artist

Oracle

Thirty years ago, a con man named John Drewe pulled off what might be the most elaborate and damaging art fraud of the twentieth century.

He befriended and persuaded a talented painter named John Myatt to create forgeries of famous works of art, and then infiltrated the archives of art institutions to place false documentation about the paintings in their files.

As a result, leading art buyers and galleries believed the authenticity of the forged works because the provenance—the record of ownership—appeared valid.

Drewe sold some 200 forgeries before he was arrested. But because of his talent for faking documentation, fewer than half have been identified. Many are likely still out there, proudly displayed as the work of Alberto Giacometti, Graham Sutherland, Jean Dubuffet, and others.

The scandal shook the art world. Just how many high-dollar art sales were made based on bad information may never be known.

While Drewe’s deception was intentional, the risk of making bad decisions based on seemingly valid information is one that businesses face every day.

In this era of big data, with unprecedented quantities of information at hand and growing pressure on businesses to make sense of it quickly, it is entirely possible to design and invest in a masterpiece of a strategy or marketing campaign that is based on bad data.

It’s not that the data necessarily was bad initially. But because technologies like Hadoop have made it easy and inexpensive for organizations to store lots and lots of data, keeping track of what’s been collected, where it came from, and how those bits of data are connected is a huge undertaking.

And it’s far more complex than the comparatively linear provenance of a work of art.

“By the time data shows up in a report or in a dashboard, it’s gone through multiple hops and multiple tools,” explains Jeff Pollock, Oracle vice president of product marketing. “The data was likely manipulated several times by other programs. And if you are not able to see or visualize or deconstruct all of the things that happened to that data before it got to the point where it’s displayed in a pretty graph in a dashboard, you may make business assumptions based on what has become bad data.”

Making Data Traceable

Some industries, like banking and insurance, are required by law to archive data for a long time, and they often store old data in Hadoop because it’s relatively inexpensive to do so. However, Pollock says, the lineage of that data needs to be archived as well to satisfy audit and compliance requirements.

The mortgage crisis of 2008 showed what happens when provenance is lost. Subprime mortgages were granted to borrowers, then packaged with others and sold, then repackaged and resold multiple times until it was impossible to trace the resulting securities back to the original loans.

While Cloudera and Hortonworks, the vendors behind two of the most popular Hadoop distributions, are addressing the issue of provenance, Pollock notes that a limitation of their approach is that “almost no organizations use their big data environment in isolation from the rest of their enterprise assets.”

“They’ll bring data from existing applications and from existing IT environments into the Hadoop cluster,” he explains. “There, they do work with the data and then often the data actually moves on to a data warehouse, which in turn supports a variety of business intelligence tools.”

In fact, the full lineage of that data may trace back to CRM or billing or e-commerce applications that are way upstream. To ensure data integrity, organizations must be able to trace data from the original source all the way to the final report used by the business, including how the data was connected and transferred along the way.
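The idea of tracing a report back to its original sources can be sketched as a walk over a dependency graph. This is a minimal illustration only, not how any particular product works: the dataset names, the `LINEAGE` mapping, and both function names are hypothetical.

```python
from collections import deque

# Hypothetical lineage records: each dataset maps to the datasets it was derived from.
LINEAGE = {
    "sales_dashboard": ["warehouse.sales_fact"],
    "warehouse.sales_fact": ["hadoop.cleaned_orders"],
    "hadoop.cleaned_orders": ["crm.customers", "billing.invoices"],
    "crm.customers": [],
    "billing.invoices": [],
}


def trace_upstream(dataset, lineage):
    """Return every dataset upstream of `dataset`, nearest hops first."""
    seen = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


def original_sources(dataset, lineage):
    """Return the upstream datasets that have no recorded parents of their own."""
    return sorted(d for d in trace_upstream(dataset, lineage) if not lineage.get(d))


if __name__ == "__main__":
    print(trace_upstream("sales_dashboard", LINEAGE))
    print(original_sources("sales_dashboard", LINEAGE))
```

In this toy graph, the dashboard traces back through the warehouse and the Hadoop cluster to the CRM and billing systems, which is the kind of end-to-end path the paragraph above describes.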

Oracle Enterprise Metadata Management can show that extreme breadth of data lineage across many different enterprise layers, Pollock says. “This is really valuable for businesses that use Hadoop clusters with other non-Hadoop applications,” he says.

Governance from the Get-Go

Too often, organizations establish a Hadoop data lake but don’t make data governance controls a priority, Pollock says. “When that happens, it’s very easy for your data to become so polluted that the amount of work to clean it up later becomes cost-prohibitive,” he says.

Pollock likens this situation to a library that just piles books on the floor rather than cataloging and organizing them under the Dewey Decimal System. The more books that come in, the harder it is to go back, organize the inventory, and find what you need.

Metadata management enables a business to inventory data and establish a record of its source and how it’s affiliated with other data.

Pollock compares Oracle Enterprise Metadata Management’s Business Glossary to the card catalog of the library, “so you can look up what’s there and even find correlations between the subjects—even using multiple business terms,” he says. “While it’s really one big pile of data, you can have multiple business units and multiple users each effectively with their own Dewey Decimal System of terminology for organizing the data according to how they need to see it.”
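The card-catalog analogy can be pictured as a lookup table in which different business units use their own terms for the same underlying data. The sketch below is a hypothetical illustration, not the Business Glossary's actual data model; the `GLOSSARY` entries and function names are assumptions made for the example.

```python
# Hypothetical glossary: (business unit, business term) -> physical data asset.
GLOSSARY = {
    ("marketing", "member"): "customers.id",
    ("finance", "account holder"): "customers.id",
    ("finance", "receivable"): "billing.invoices",
}


def lookup(unit, term, glossary):
    """Resolve a business unit's term to the data asset it refers to, if any."""
    return glossary.get((unit, term.lower()))


def terms_for(asset, glossary):
    """List every unit-specific term that points at the same data asset."""
    return sorted(f"{unit}: {term}" for (unit, term), a in glossary.items() if a == asset)


if __name__ == "__main__":
    print(lookup("marketing", "Member", GLOSSARY))
    print(terms_for("customers.id", GLOSSARY))
```

Here marketing's "member" and finance's "account holder" resolve to the same column, so each unit keeps its own vocabulary over one shared pile of data.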

But what if several departments use multiple applications to query the same table of data, and one department drops a column or alters a table? For example, if the marketing department launches a customer loyalty program and adds a “loyalty status” column to a data file about customers, other applications that use that data could experience errors.

And if a table is removed—say, during an application upgrade—the other applications depending on that table can then get caught in a domino effect of bad data. Oracle’s metadata management tool addresses both of those issues, Pollock says.
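One way to picture this kind of impact analysis is a registry of which applications read which tables and columns, checked before a change is made. The following is a minimal sketch under that assumption; the application names, the `DEPENDENCIES` registry, and the function names are all hypothetical and do not describe Oracle's tool.

```python
# Hypothetical registry: application -> set of (table, column) pairs it reads.
DEPENDENCIES = {
    "billing_app": {("customers", "id"), ("customers", "address")},
    "loyalty_report": {("customers", "id"), ("customers", "loyalty_status")},
    "shipping_app": {("orders", "id"), ("customers", "address")},
}


def affected_by_column_drop(table, column, deps):
    """Applications that read the given column and would break if it were dropped."""
    return sorted(app for app, cols in deps.items() if (table, column) in cols)


def affected_by_table_drop(table, deps):
    """Applications that read any column of the given table."""
    return sorted(app for app, cols in deps.items() if any(t == table for t, _ in cols))


if __name__ == "__main__":
    print(affected_by_column_drop("customers", "address", DEPENDENCIES))
    print(affected_by_table_drop("orders", DEPENDENCIES))
```

Running such a check before an upgrade would show which downstream applications need attention, rather than discovering the breakage after the fact.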

Obviously, good data can go bad in a variety of ways. Unless a business has the right tools to identify problems and prevent all of those scenarios from causing damage, a company may not realize anything’s wrong until it’s too late.

Just like the art galleries and buyers who were duped by Drewe 30 years ago.
