Grant Title: Measuring Data Quality Using Provenance
Information is one of the biggest assets for most enterprises. In today’s information age, almost every decision is based on a detailed analysis of data recorded in diverse sources ranging from structured databases to the World Wide Web. To ensure that data retrieved from different sources is used appropriately and within context, it is imperative that the provenance of the data be recorded and made available to its users. Provenance refers to the knowledge that enables a piece of data to be interpreted correctly. It is the essential ingredient that ensures that users of data (for whom the data may or may not have been originally intended) understand its background. This includes elements such as who (person) or what (process) created the data, where it came from, how it was transformed, the assumptions made in generating it, and the processes used to modify it. In a nutshell, provenance is a critical element in determining overall data quality. Businesses that do not understand the quality of their data are at great risk of making products that fail, and may lose their competitive edge.
In this research we propose to develop a comprehensive framework for evaluating data quality and understanding the relationship between provenance and quality. Using new product design and development as our domain, we will partner with Raytheon Missile Systems (RMS), a defense contractor located in Tucson, Arizona, to develop this framework. We will specify the dimensions along which data quality is evaluated, including accuracy, completeness, and timeliness. We will also develop a clear definition for each dimension and relate it to the stages of the data life cycle: creation, management, consumption, and eventual archiving and/or disposal. We will then define metrics to represent data quality along each of these dimensions and show how these metrics can be combined to provide a comprehensive data quality index. These metrics will be based on elements of provenance and how they interact with individual quality dimensions. Finally, we will evaluate the effectiveness of our data quality framework using a series of case studies at Raytheon.
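As a purely illustrative sketch of how per-dimension metrics might be combined into a single index, the example below uses a weighted average. The project's actual metrics and combination rule are not specified here, so the function name quality_index, the 0-to-1 score scale, and the weights are all assumptions made for illustration only.

```python
from typing import Mapping


def quality_index(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Combine per-dimension quality scores into one index (hypothetical rule).

    Assumes each score lies in [0, 1] and that the index is a weighted
    average; neither assumption comes from the project description.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must be between 0 and 1")
    # Weighted average: dimensions with larger weights influence the index more.
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Illustrative values only; these are not measurements from the project.
example_scores = {"accuracy": 0.9, "completeness": 0.6, "timeliness": 0.75}
example_weights = {"accuracy": 2.0, "completeness": 1.0, "timeliness": 1.0}
print(quality_index(example_scores, example_weights))
```

A weighted average is only one possible choice. A rule that takes the minimum score, for example, would penalize a single weak dimension more heavily.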
The total amount of funding for this project is $540,000 for 2 years.
-----------
Contact information:
Dr. Sudha Ram
McClelland Professor of MIS
Department of MIS
430J McClelland Hall
Eller College of Management
University of Arizona
Tucson, AZ 85721
URL: http://vishnu.eller.arizona.edu/ram
Phone: (520)-621-2748
Fax: (520)-621-2433