Enterprise data quality is the measure of data accuracy, completeness, and integrity across a business. Within enterprise business systems, data quality problems are notorious for compromising timely and accurate customer communication.

In the past, the most common cause of poor data quality was error-prone manual entry; today, enterprise data has a lifecycle of its own, moving in many directions within and beyond a business.

As a result, organizations with poor data quality spend enormous amounts of time reconciling conflicting reports and flawed business plans, and end up making erroneous decisions based on outdated, inconsistent, and invalid data.

Best Practices

Data quality issues can seriously obstruct the operation of critical systems such as Business Intelligence, Customer Relationship Management, and Supply Chain Management. In this context, data quality best practices act as the basic building blocks that enable business improvement and better results.

The following best practices help ensure that organizational databases exhibit the required degrees of completeness, integrity, accuracy, validity, and uniqueness, whether at the data-entry or data-migration stage.

Define the data items that are causing problems

In a perfect world, we take a highly organized approach to data quality. That may well involve a full data audit, examining every data item (EmployeeFirstName, CustomerLastName, etc.) in every transactional system. In such audits, each data item is evaluated for:

  • Its quality level
  • The damage that poor quality data is causing
  • The cost of fixing bad data

In practice, most enterprises only realize the importance of data quality when they encounter specific, and usually expensive, problems.

Profile the data

Data profiling is simply a way of examining the variety of data in a given data item. For numerical items, profiling will tell us the smallest and largest values, the mean, the mode, the standard deviation, the distribution and so on. For text data, it will tell us things such as the mean length, the distribution and frequency of values.
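The profiling measures described above can be sketched with the Python standard library alone. The column names and sample values below are hypothetical, purely for illustration:

```python
# A minimal data-profiling sketch using only the Python standard library.
# Column names and sample values are hypothetical.
import statistics
from collections import Counter

def profile_numeric(values):
    """Summarize a numeric column: smallest, largest, mean, mode, standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
    }

def profile_text(values):
    """Summarize a text column: mean length and frequency of each value."""
    return {
        "mean_length": statistics.mean(len(v) for v in values),
        "frequencies": Counter(values),
    }

ages = [34, 29, 41, 29, 52]
genders = ["M", "F", "F", "Male", "F"]

print(profile_numeric(ages))
print(profile_text(genders))
```

Even a profile this simple surfaces quality problems immediately: the frequency count on the text column reveals that "Male" and "M" coexist in the same data item.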

Define what the data should be

Profiling is easy; working out what the data should look like is more difficult. This involves determining the acceptable domain of values – essentially a list of all acceptable values. For example, for a Gender data item, only “Male” or “Female” might be considered acceptable, or “M” or “F” in abbreviated form.

We can also determine the rules that can be used to verify the data (e.g., those entitled “Mr.” can only be male, whereas a “Professor” may be of either gender). This step usually involves a great deal of discussion with users of the systems to identify what constitutes acceptable data.
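The domain and rule checks just described might look like the following sketch, which follows the Gender/Title example above; the record structure and field names are assumptions for illustration:

```python
# A sketch of validating a record against a domain list and a cross-field rule.
# Field names and the record structure are hypothetical.
GENDER_DOMAIN = {"M", "F"}

def validate(record):
    """Return a list of rule violations for one customer record."""
    errors = []
    # Domain check: Gender must be one of the acceptable values.
    if record.get("Gender") not in GENDER_DOMAIN:
        errors.append(f"Gender {record.get('Gender')!r} outside domain {sorted(GENDER_DOMAIN)}")
    # Cross-field rule: "Mr." implies male; "Professor" may be either gender.
    if record.get("Title") == "Mr." and record.get("Gender") != "M":
        errors.append("Title 'Mr.' requires Gender 'M'")
    return errors

print(validate({"Title": "Mr.", "Gender": "Female"}))
```

Rules like these are exactly what the discussions with system users should produce: a checkable definition of acceptable data.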

Define how to clean the data

This is a challenging process. What updates can be run to clean existing data? And what changes should be made to the applications to ensure that poor-quality data cannot be entered? (If the data enters through third-party applications, such changes may be impossible to make in practice – so the only alternative may be to run cleaning processes every night.)
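A nightly cleaning process of the kind mentioned above can be as simple as normalizing free-text values into the accepted domain. The mapping and field below are hypothetical:

```python
# A minimal nightly-cleaning sketch: normalize free-text gender values that
# arrived from third-party applications into the accepted domain.
# The mapping table and sample data are hypothetical.
CANONICAL = {
    "male": "M", "m": "M", "mr": "M",
    "female": "F", "f": "F", "mrs": "F",
}

def clean_gender(raw):
    """Map a raw gender string to 'M'/'F', or None if unrecognized."""
    return CANONICAL.get((raw or "").strip().lower())

dirty = ["Male", " F ", "unknown", "m"]
cleaned = [clean_gender(v) for v in dirty]
print(cleaned)  # ['M', 'F', None, 'M']
```

Values the mapping cannot resolve (returned as None here) are the ones worth routing to a manual review queue rather than guessing.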

Decide if the changes are worth it

The cost of cleaning data varies. If you start with really bad data, it is fairly easy to spot the gross errors and fix them, so the initial cost of fixing the data is relatively low. As you get closer to data quality perfection, you tend to be hunting far subtler errors, which are much harder to find and fix. As a result, the cost of fixing each remaining error increases.

Conclusion

The bottom line is that data quality is an important but complex process, and it is often avoided because of its complications and the misunderstandings surrounding it. However, once data quality problems have a significant effect on enterprise data, applying formal data quality best practices delivers significant benefits: consistent data across enterprise systems and processes, resulting in better decision making.

Note: The above article is authored by Mr. Suresh Chintala from the Enterprise Integration & Information Management Practice @ Aspire Systems.