Information in the Clouds Grounded by Data Quality?
By: David Loshin
21-Apr-2010
Having recently attended the annual Mecca for data management professionals, the 2010 Enterprise Data World (EDW) conference, I noticed how the growing interest in cloud computing had evolved into its own track of presentations, including one full-day tutorial and a small variety of sessions. Paraphrasing Wikipedia, cloud computing is a means for exploiting shared computational resources (both storage and CPU) to provide on-demand computing capability as a “for-pay” utility.
While the term is relatively new, the technical concepts are not; they build on ideas that have been promoted for the past 30 years, if not more. What distinguishes cloud computing from a typical “network of workstations” or a grid computing deployment is not technical but economic. Instead of making a capital investment in hardware, software, and services, an organization that uses computing as a utility incurs an expense with little or no up-front cost, thereby reducing overall costs and risk.
This model sounds great, especially for smaller businesses that are not prepared to spend huge amounts of money to buy computers but still want to do “big projects.” One of the greatest challenges, though, has to do with the safety, security, and quality of data injected into a cloud environment. And conveniently, the EDW sessions touched on some of these points, such as data governance, master data management, and data protection.
But in my opinion, the discussion of managing cloud data quality did not go far enough, especially since the increasing availability of large-scale application capabilities (particularly analytical applications) drives the focus towards functionality and away from oversight. The discussions typically focus on what is being done with little or no regard for ensuring that the results are trustworthy.
How do we address this? One approach is to look at the operationalization of “cloud data quality services”: identifying key data quality capabilities that are suited to the cloud model, such as data profiling, parsing and standardization, cleansing, and enhancement, and then materializing them as services.
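To make the idea of a data quality capability packaged as a service slightly more concrete, here is a minimal Python sketch of two of the capabilities named above, profiling and standardization. The function names `profile_column` and `standardize_phone`, the choice of statistics, and the 10-digit phone format are all illustrative assumptions, not part of any particular product or of the EDW sessions; a real cloud service would wrap logic like this behind a network interface.

```python
import re
from collections import Counter


def profile_column(values):
    """Return simple profiling statistics for one column of string values.

    Hypothetical helper: the statistics chosen here are an assumption
    about what a basic profiling service might report.
    """
    # Treat None and blank strings as missing values.
    non_null = [v for v in values if v is not None and v.strip() != ""]
    # Replace digits with 9 and letters with A to expose format patterns.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in non_null
    )
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "patterns": dict(patterns),
    }


def standardize_phone(value):
    """Normalize a 10-digit phone number to 999-999-9999, else return None.

    Hypothetical helper: the target format is an assumption for illustration.
    """
    digits = re.sub(r"\D", "", value or "")
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


if __name__ == "__main__":
    phones = ["301-555-0100", "(301) 555-0101", "", None, "3015550102"]
    print(profile_column(phones))
    print([standardize_phone(p) for p in phones])
```

Profiling the column first shows how many distinct formats are present, which in turn indicates whether a standardization step is needed before the data feeds an analytical application.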