
Awesome Data Quality Resources
A curated list of resources for testing, monitoring, and improving data quality across various data environments.
Frameworks and Libraries
Open Source
- elementary - Data monitoring and observability tailored to dbt. GitHub
- mobydq - Tool for data engineering teams to run and automate data quality checks on their data pipelines. GitHub
- ydata-quality - Python library for assessing data quality throughout the stages of data pipeline development. GitHub
- great-expectations - Tool for data testing, documentation, and profiling. GitHub
- deequ - Library by Amazon, built on Apache Spark, for defining unit tests for data, with a focus on large datasets. GitHub
- soda - Enables data testing through extended SQL queries. GitHub
- dqm - Data quality monitoring tool built on Spark. GitHub
- owl-sanitizer - Lightweight data validation framework based on Spark. GitHub
- griffin - Data quality solution for distributed data systems at any scale, for both streaming and batch data. GitHub
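Several of the libraries above (deequ, great-expectations, soda) are built around declarative checks such as "this column is mostly non-null" or "this key is unique". The sketch below shows what such checks compute. It is written in plain Python and is not the API of any of these tools. The helper names `is_complete`, `is_unique`, `in_range` and `run_suite` are hypothetical, as is the assumption that rows arrive as a list of dictionaries.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical representation: one dict per row (not any library's data model).
Row = dict[str, Any]


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


Check = Callable[[list[Row]], CheckResult]


def is_complete(column: str, min_fraction: float = 1.0) -> Check:
    """Pass if at least `min_fraction` of rows have a non-null value in `column`."""
    def check(rows: list[Row]) -> CheckResult:
        non_null = sum(1 for r in rows if r.get(column) is not None)
        fraction = non_null / len(rows) if rows else 1.0
        return CheckResult(f"completeness({column})", fraction >= min_fraction,
                           f"{fraction:.2%} non-null")
    return check


def is_unique(column: str) -> Check:
    """Pass if no value in `column` appears more than once (values must be hashable)."""
    def check(rows: list[Row]) -> CheckResult:
        values = [r.get(column) for r in rows]
        duplicates = len(values) - len(set(values))
        return CheckResult(f"uniqueness({column})", duplicates == 0,
                           f"{duplicates} duplicate(s)")
    return check


def in_range(column: str, low: float, high: float) -> Check:
    """Pass if every non-null value in `column` lies within [low, high]."""
    def check(rows: list[Row]) -> CheckResult:
        values = (r.get(column) for r in rows)
        bad = [v for v in values if v is not None and not (low <= v <= high)]
        return CheckResult(f"range({column})", not bad,
                           f"{len(bad)} value(s) outside [{low}, {high}]")
    return check


def run_suite(rows: list[Row], checks: list[Check]) -> list[CheckResult]:
    """Run every check against the same batch of rows."""
    return [check(rows) for check in checks]


# Illustrative data: one missing email, one duplicated key, one negative amount.
orders = [
    {"order_id": 1, "amount": 25.0, "email": "a@example.com"},
    {"order_id": 2, "amount": -5.0, "email": None},
    {"order_id": 2, "amount": 40.0, "email": "c@example.com"},
]

results = run_suite(orders, [
    is_complete("email", min_fraction=0.6),
    is_unique("order_id"),
    in_range("amount", 0, 10_000),
])
for result in results:
    print(result.name, "PASS" if result.passed else "FAIL", result.detail)
```

Each check is a closure that holds its parameters and returns a result rather than raising. This mirrors the "collect all failures, then report" style common to these tools, but real libraries add scheduling, persistence and SQL/Spark execution on top.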
 
Commercial
- Bigeye - Continuous data quality monitoring and anomaly detection. Website
- Soda - Data testing and monitoring platform. Website
- Databand - Data pipeline observability and monitoring. Website
- Monte Carlo - Data observability platform. Website
- Sifflet - Data quality monitoring and observability. Website
- Validio - Real-time data quality monitoring. Website
- Lightup - Data quality checks and monitoring. Website
- Lantern - Data quality and observability. Website
- Metaplane - Data quality monitoring for data teams. Website
- Datafold - Proactive data quality platform. Website
- Acceldata - Data observability and quality management. Website
- Anomalo - Automated data quality monitoring. Website
- Marquez - Open-source metadata service for collecting, aggregating, and visualizing a data ecosystem's metadata. GitHub
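Many of the platforms above advertise anomaly detection on pipeline metrics such as daily row counts. Their models are proprietary and usually account for trends and seasonality. As a rough illustration only, here is a minimal z-score check in plain Python that flags a new value lying far from the recent mean. The function name `is_anomalous`, the default threshold of 3 standard deviations, and the sample counts are assumptions made for this example.

```python
import statistics


def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `z_threshold` sample standard deviations
    from the mean of `history` (a simple baseline, not any vendor's model)."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean  # constant history: any change is unusual
    return abs(latest - mean) / stdev > z_threshold


# Illustrative daily row counts for one table over the past week.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_090, 10_005]

print(is_anomalous(daily_row_counts, 10_070))  # False: within normal variation
print(is_anomalous(daily_row_counts, 4_300))   # True: sudden volume drop
```

A fixed z-score threshold is easy to reason about but produces false alarms on metrics with weekly cycles or growth trends. This is one reason monitoring products typically fit seasonal or learned baselines instead.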
 
Books and Methodologies
- Complete Data Quality Methodology (CDQM) - By Carlo Batini and Monica Scannapieco. Book
- Data Quality Assessment Framework - By Arkady Maydanchik. Book
- CIHI Information Quality Framework - From the Canadian Institute for Health Information. Resource
- Enterprise Knowledge Management - By David Loshin. Book
- MIKE2.0 - Open-source initiative for enterprise information management. Website
- Ten Steps to Quality Data and Trusted Information - By Danette McGilvray. Book
- Total Information Quality Management (TIQM) - By Larry English. Book
 
Tools
Open Source Tools
- Deequ - Library for defining unit tests for data. GitHub
- dbt Core - Data transformation tool with built-in testing capabilities. GitHub
- MobyDQ - Automates data quality checks. GitHub
- Great Expectations - Data validation and profiling. GitHub
- Soda Core - Python library for data reliability. GitHub
- Cucumber - General-purpose behavior-driven development (BDD) tool that can be used to write data quality tests as readable scenarios. GitHub
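Profiling, listed above as a Great Expectations feature, usually starts from simple per-column statistics that help in choosing which checks to write. A minimal plain-Python sketch follows. It assumes rows arrive as dictionaries and that each column holds mutually comparable values. `profile` is a hypothetical helper, not Great Expectations' API.

```python
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Compute per-column null count, distinct count, min and max.

    Assumes values within a column are hashable and comparable with each other."""
    columns = sorted({key for row in rows for key in row})
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # missing keys count as nulls
        present = [v for v in values if v is not None]
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return report


# Illustrative data.
rows = [
    {"country": "DE", "age": 34},
    {"country": "FR", "age": None},
    {"country": "DE", "age": 51},
]

report = profile(rows)
for column, stats in report.items():
    print(column, stats)
```

A profile like this can suggest checks such as completeness thresholds or value ranges. Dedicated tools add histograms, type inference and sampling for large tables.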
 
Commercial Tools
- Ataccama - Comprehensive data quality and catalog suite. Website
- Informatica - Data quality and observability platform. Website
- Talend - Data quality solutions with real-time monitoring. Website
- IBM InfoSphere QualityStage - Data quality and governance. Website
- Precisely Trillium Quality - Enterprise data quality tool. Website
- Adverity - Marketing data integration with data quality management. Website
- Oracle Enterprise Data Quality - Data profiling and cleansing. Website
 
Articles and Guides
- A Guide to Data Quality Tools: The 4 Leading Solutions - Zendata. Article
- Top Data Quality Management Tools to Choose in 2024 - Mad Devs. Article
- Data Quality Management: Tools, Pillars, and Best Practices - lakeFS. Article
- Best Data Quality Tools for 2024: Top 10 Choices - Adverity. Article
- The 8 Best Data Quality Management Tools and Software for 2025 - Solutions Review. Article
- 9 Best Tools for Data Quality in 2024 - Datafold. Article
- Data Quality Management Best Practices: A Short Guide - Zendata. Article