This post is a very brief read on the concepts of Data Quality, a cornerstone of Data Governance and an important component of Data Strategy. Here are those concepts:
- Completeness: Whether the data has been populated or not. For example, if 25% (¼) of customer address records were missing city or email, this data should be considered incomplete.
- Accuracy: Similar to Completeness, but instead of missing/Null entries, the data itself could be suspect. For example, a customer address record shows Miami/TX as city/state. When combined with a multitude of other discrepancies, the entire record (Database row) can become suspect.
- Staleness: How old is this data, and when was it last confirmed? For time-series data, Staleness means something different than it does for mostly static data, so different Staleness rules should be applied to different data elements.
- Relevance: This should be #1, but I place it last because it only makes sense after understanding the previous items. While implementing decision systems, keep in mind that not all data is equal. Some data points are much more significant than others, depending on the question one is trying to answer. Relevance is derived from a combination of factors, and we cover Data Relevance more completely in another post.
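To make the first three concepts concrete, here is a minimal Python sketch of what such checks could look like. Everything in it is an assumption for illustration: the sample records, the field names (`city`, `state`, `email`, `last_confirmed`), the tiny `KNOWN_CITY_STATES` lookup, and the one-year staleness threshold are all hypothetical, and a real system would rely on a proper reference dataset and per-field rules.

```python
from datetime import date, timedelta

# Hypothetical customer address records; field names are illustrative only.
records = [
    {"city": "Miami", "state": "FL", "email": "a@example.com", "last_confirmed": date(2024, 1, 10)},
    {"city": "Miami", "state": "TX", "email": "b@example.com", "last_confirmed": date(2023, 12, 1)},
    {"city": None, "state": "TX", "email": "c@example.com", "last_confirmed": date(2021, 6, 15)},
    {"city": "Austin", "state": "TX", "email": None, "last_confirmed": date(2024, 2, 20)},
]

# Tiny illustrative lookup of expected city/state pairs (an assumption,
# standing in for a full reference dataset).
KNOWN_CITY_STATES = {("Miami", "FL"), ("Austin", "TX")}


def completeness(rows, fields):
    """Completeness: fraction of rows in which every listed field is populated."""
    if not rows:
        return 1.0
    complete = sum(
        all(row.get(field) not in (None, "") for field in fields) for row in rows
    )
    return complete / len(rows)


def is_suspect(row):
    """Accuracy: flag rows whose city/state pair is not in the lookup."""
    city, state = row.get("city"), row.get("state")
    if not city or not state:
        return False  # missing values are a Completeness problem, not Accuracy
    return (city, state) not in KNOWN_CITY_STATES


def is_stale(row, as_of, max_age):
    """Staleness: True when the row was last confirmed more than max_age ago."""
    return as_of - row["last_confirmed"] > max_age


as_of = date(2024, 3, 1)          # fixed reference date so results are repeatable
max_age = timedelta(days=365)     # hypothetical rule for mostly static data

print("complete:", completeness(records, ["city", "email"]))
print("suspect rows:", [i for i, r in enumerate(records) if is_suspect(r)])
print("stale rows:", [i for i, r in enumerate(records) if is_stale(r, as_of, max_age)])
```

In practice, `max_age` would vary by data element, in line with the point above that different Staleness rules apply to different data.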
Let’s face it: many Enterprise Data Assets are a complete mess, due to a variety of causes. By employing a number of techniques, we can fix those data problems and convert your data into a well-performing Enterprise asset.