Gain a practical introduction to DataOps, a new discipline for delivering data science at scale inspired by practices at companies such as Facebook, Uber, LinkedIn, Twitter, and eBay. Organizations need more than the latest AI algorithms, hottest tools, and best people to turn data into insight-driven action and useful analytical data products. Processes and thinking employed to manage and use data in the 20th century are a bottleneck for working effectively with the variety of data and advanced analytical use cases that organizations have today. This book provides the approach and methods to ensure continuous rapid use of data to create analytical data products and steer decision making.
Practical DataOps shows you how to optimize the data supply chain from diverse raw data sources to the final data product, whether the goal is a machine learning model or other data-orientated output. The book provides an approach to eliminate wasted effort and improve collaboration between data producers, data consumers, and the rest of the organization through the adoption of lean thinking and agile software development principles.
This book helps you to improve the speed and accuracy of analytical application development through data management and DevOps practices that securely expand data access, and rapidly increase the number of reproducible data products through automation, testing, and integration. The book also shows how to collect feedback and monitor performance to manage and continuously improve your processes and output.
- Develop a data strategy for your organization to help it reach its long-term goals
- Recognize and eliminate barriers to delivering data to users at scale
- Work on the right things for the right stakeholders through agile collaboration
- Create trust in data via rigorous testing and effective data management
- Build a culture of learning and continuous improvement through monitoring deployments and measuring outcomes
- Create cross-functional, self-organizing teams focused on goals, not reporting lines
- Build robust, trustworthy data pipelines in support of AI, machine learning, and other analytical data products
This book is a very detailed and practical guide to becoming proficient at data operations – data production, data management, testing, integration, data science and reporting. The book accomplishes this in three ways:
1) The book provides an overarching framework for operationalizing data in an enterprise.
2) It provides a logical progression of the key concepts, processes and responsibilities that championing data operationalization entails.
3) It delves deeply into all the key components necessary for data operationalization including:
a) Data management – detailed discussion about all the key data management concepts including data integration, ETL, data masking, data workflow, data governance, data curation, and how all these concepts tie into an enterprise-wide data management framework.
b) Testing – Detailed discussion about testing methods, strategies, responsibilities, how one should approach testing, and how one can achieve testing goals.
c) Automation – Automation is the engine that drives rapid and reliable data operationalization. This book provides strategies for how to achieve automation.
d) Data Science – Briefly discusses key concepts in modern data science, but focuses primarily on the practices necessary to operationalize data science, including the need for integration testing and a stable pipeline for deploying machine learning models.
e) Reporting – What one should know about reporting, what good reporting vs bad reporting is, and how organizations should approach reporting.
f) Deployment – What matters, how to get the most out of your data, and what is needed to be successful. The book provides a very useful checklist to get started with operationalizing data.
The book provides a comprehensive and pragmatic approach to operationalizing data. Chapters are organized logically, so the important concepts and practices are easy to understand, and each chapter builds upon its predecessors such that by the end of the book one has a good grasp of the entire process.
Sections such as “Why Operationalization Matters” are particularly helpful to one’s understanding of the rationale behind data operationalization.
I highly recommend this book to anyone who is interested in efficiently and rapidly gaining experience and knowledge in the important world of data operations.
Admittedly, the book is very data-science heavy, and there isn’t much discussion of statistics, machine learning methods, or SQL. Even so, this book is a must-read for anyone interested in data management, data integration, data science, testing, and deployment methods in the highly important field of data operations.