Chapter 1: What is Productive and Efficient Data Science?
Chapter Goal: Introduce readers to the concept of doing data science tasks efficiently and productively, and illustrate potential pitfalls in their everyday work.
No of pages - 10
Subtopics
. Typical data science pipeline
. Short examples of inefficient programming in data science
. Some pitfalls to avoid
. Efficiency and productivity go hand in hand
. Overview of tools and techniques for a productive data science pipeline
. Skills and attitude for productive data science
Chapter 2: Better Programming Principles for Efficient Data Science
Chapter Goal: Help readers grasp the idea of efficient programming techniques and how they can be applied to a typical data science task flow.
No of pages - 15
Subtopics
. The concept of time and space complexity, Big-O notation
. Why complexity matters for data science
. Examples of inefficient programming in data science tasks
. What you can do instead
. Measuring code execution timing
Chapter 3: How to Use Python Data Science Packages more Productively
Chapter Goal: Illustrate a handful of tricks and techniques for using the most well-known Python data science packages - NumPy, Pandas, Matplotlib, Seaborn, SciPy - more productively.
No of pages - 20
Subtopics
. Why NumPy is faster than regular Python code, and by how much
. Using NumPy efficiently
. Using Pandas productively
. Matplotlib and Seaborn code for productive EDA
. Using SciPy for common data science tasks
Chapter 4: Writing Machine Learning Code More Productively
Chapter Goal: Teach the reader to write efficient and modular machine learning code for a productive data science pipeline, with hands-on examples using Scikit-learn.
No of pages - 15
Subtopics
. Why modular code for machine learning and deep learning
. Scikit-learn tools and techniques
. Systematic evaluation of Scikit-learn ML algorithms in an automated fashion
. Decision boundary visualization with a custom function
. Hyperparameter search in Scikit-learn
Chapter 5: Modular and Productive Deep Learning Code
Chapter Goal: Teach the reader to bring a modular programming style into deep learning code, with hands-on examples using Keras/TensorFlow.
No of pages - 25
Subtopics
. Why modular code and object-oriented style for deep learning
. Wrapper functions with Keras for faster deep learning experimentation
. A single function to streamline image classification task flow
. Visualize activation functions of neural networks
. Custom callback functions in Keras and their utilities
. Using the Scikit-learn wrapper for hyperparameter search in Keras
Chapter 6: Build Your Own Machine Learning Estimator/Package
Chapter Goal: Illustrate how to build a new Python machine learning module/package from scratch.
No of pages - 15
Subtopics
. Why write your own ML package/module?
. A simple example vs. a data scientist's example
. A good, old Linear Regression estimator - with a twist
. How do you start building?
. Add utility functions
. Do more with an object-oriented approach
Chapter 7: Some Cool Utility Packages
Chapter Goal: Introduce readers to the idea of executing data science tasks efficiently by going beyond the traditional stack and using exciting new libraries.
No of pages - 20
Subtopics
. The great Python data science ecosystem
. Build pipelines using "pdpipe"
. Check data integrity and expectations with "great_expectations"
. Speed up NumPy and Pandas using Numexpr
. Discover the best-fitting distributions using "distfit"
Chapter 8: Testing the Machine Learning Code
Chapter Goal: Teach readers some basic principles of testing Python code and how to apply them to the specific case of a machine learning module.
No of pages - 20
Subtopics
. Why testing boosts productivity
. Basic principles and variations of testing
. Data science or machine learning testing is somewhat different
. A pytest module for an ML module
Chapter 9: Memory and Timing Profiling
Chapter Goal: Illustrate how to measure and profile typical data science and machine learning code and modules.
No of pages - 15
Subtopics
. Why profiling is important
. Well-known profilers out there
. cProfile
. memory_profiler
. Scalene
Chapter 10: Scalable Data Science
Chapter Goal: Demonstrate the importance of scalability in data science tasks with hands-on examples.
No of pages - 15
Subtopics
. Data science pipelines need to be easily scalable
. Common problems - out-of-memory and single-threading
. What options are out there?
. Hands-on example with Vaex
. Hands-on example with Modin
Chapter 11: Parallelized Data Science
Chapter Goal: Demonstrate the importance of parallel processing in data science tasks with hands-on examples.
No of pages - 15
Subtopics
. Data science pipelines should take advantage of parallel computing
. Two great options - Ray and Dask
. Hands-on example with a Dask cluster
. Hands-on example with "Ray Serve" and actors
Chapter 12: GPU-Based Data Science for High Productivity
Chapter Goal: Illustrate how to harness the power of GPU-based hardware for common data science tasks and classical machine learning.
No of pages - 20
Subtopics
. GPU-powered data science (not deep learning)
. The RAPIDS ecosystem
. CuPy vs. NumPy
. cuDF vs. Pandas
. cuML vs. Scikit-learn
Chapter 13: Other Useful Skills to Master
Chapter Goal: Give an overview of other related skills to master for executing data science tasks more efficiently.
No of pages - 25
Subtopics
. Key things to learn
. Understanding the basics of web technologies
. Going from local to cloud
. Simple web app to showcase a data science project
. GUI programming for a quick demo
. Being comfortable with container technologies
. Putting it all together
Chapter 14: Wrapping It Up
Chapter Goal: Show a summary of all the things discussed and some future projections.
No of pages - 10
Subtopics
. Chapter-wise summary
. What was not discussed in this book
. Future projections
. General advice for upcoming data scientists