Name: Productive and Efficient Data Science with Python: With Modularizing, Memory Profiles, and Parallel/GPU Processing
Brand: Apress
SKU: 319158285
Price: 79.95 SGD
Availability: InStock

Productive and Efficient Data Science with Python

With Modularizing, Memory Profiles, and Parallel/GPU Processing

By Sarkar, Tirthajyoti

Format

Paperback, 383 pages

Published

United States, 1 July 2022

Chapter 1: What is Productive and Efficient Data Science?

Chapter Goal: Introduce readers to the concept of doing data science tasks efficiently and productively, and illustrate potential pitfalls in their everyday work.

No of pages - 10

Subtopics

. Typical data science pipeline

. Short examples of inefficient programming in data science

. Some pitfalls to avoid

. Efficiency and productivity go hand in hand

. Overview of tools and techniques for a productive data science pipeline

. Skills and attitude for productive data science

Chapter 2: Better Programming Principles for Efficient Data Science

Chapter Goal: Help readers grasp the idea of efficient programming techniques and how they can be applied to a typical data science task flow.

No of pages - 15

Subtopics

. The concept of time and space complexity, Big-O notation

. Why complexity matters for data science

. Examples of inefficient programming in data science tasks

. What you can do instead

. Measuring code execution timing

Chapter 3: How to Use Python Data Science Packages more Productively

Chapter Goal: Illustrate a handful of tricks and techniques for using the most well-known Python data science packages - NumPy, pandas, Matplotlib, Seaborn, SciPy - more productively.

No of pages - 20

Subtopics

. Why NumPy is faster than regular Python code, and by how much

. Using Numpy efficiently

. Using Pandas productively

. Matplotlib and Seaborn code for productive EDA

. Using SciPy for common data science tasks

Chapter 4: Writing Machine Learning Code More Productively

Chapter Goal: Teach the reader to write efficient and modular machine learning code for a productive data science pipeline, with hands-on examples using Scikit-learn.

No of pages - 15

Subtopics

. Why modular code for machine learning and deep learning

. Scikit-learn tools and techniques

. Systematic evaluation of Scikit-learn ML algorithms in an automated fashion

. Decision boundary visualization with a custom function

. Hyperparameter search in Scikit-learn

Chapter 5: Modular and Productive Deep Learning Code

Chapter Goal: Teach the reader to bring a modular programming style to deep learning code, with hands-on examples using Keras/TensorFlow.

No of pages - 25

Subtopics

. Why modular code and object-oriented style for deep learning

. Wrapper functions with Keras for faster deep learning experimentation

. A single function to streamline image classification task flow

. Visualize activation functions of neural networks

. Custom callback functions in Keras and their utilities

. Using a Scikit-learn wrapper for hyperparameter search in Keras

Chapter 6: Build Your Own Machine Learning Estimator/Package

Chapter Goal: Illustrate how to build a new Python machine learning module/package from scratch.

No of pages - 15

Subtopics

. Why write your own ML package/module?

. A simple example vs. a data scientist's example

. A good, old Linear Regression estimator - with a twist

. How do you start building?

. Add utility functions

. Do more with an object-oriented approach

Chapter 7: Some Cool Utility Packages

Chapter Goal: Introduce readers to the idea of executing data science tasks efficiently by going beyond the traditional stack and using exciting new libraries.

No of pages - 20

Subtopics

. The great Python data science ecosystem

. Build a pipeline using "pdpipe"

. Check data integrity and expectations with "great_expectations"

. Speed up NumPy and pandas using NumExpr

. Discover best fitted distributions using "distfit"

Chapter 8: Testing the Machine Learning Code

Chapter Goal: Teach readers some basic principles of testing Python code and how to apply them to the specific case of a machine learning module.

No of pages - 20

Subtopics

. Why testing boosts productivity

. Basic principles and variations of testing

. Data science or machine learning testing is somewhat different

. A pytest module for an ML module

Chapter 9: Memory and Timing Profiling

Chapter Goal: Illustrate how to measure and profile typical data science and machine learning code/modules.

No of pages - 15

Subtopics

. Why profiling is important

. Well-known profilers out there

. cProfile

. memory_profiler

. Scalene

Chapter 10: Scalable Data Science

Chapter Goal: Demonstrate the importance of scalability in data science tasks with hands-on examples.

No of pages - 15

Subtopics

. A data science pipeline needs to be easily scalable

. Common problems - out-of-memory and single-threading

. What options are out there?

. Hands-on example with Vaex

. Hands-on example with Modin

Chapter 11: Parallelized Data Science

Chapter Goal: Demonstrate the importance of parallel processing in data science tasks with hands-on examples.

No of pages - 15

Subtopics

. A data science pipeline should take advantage of parallel computing

. Two great options - Ray and Dask

. Hands-on example with Dask cluster

. Hands-on example with "Ray Serve" and actors

Chapter 12: GPU-Based Data Science for High Productivity

Chapter Goal: Illustrate how to harness the power of GPU-based hardware for common data science tasks and classical machine learning.

No of pages - 20

Subtopics

. GPU-powered data science (not deep learning)

. The RAPIDS ecosystem

. CuPy vs. NumPy

. CuDF vs. Pandas

. CuML vs. Scikit-learn

Chapter 13: Other Useful Skills to Master

Chapter Goal: Give an overview of other related skills to master for executing data science tasks more efficiently.

No of pages - 25

Subtopics

. Key things to learn

. Understanding the basics of web technologies

. Going from local to cloud

. Simple web app to showcase a data science project

. GUI programming for a quick demo

. Being comfortable with container technologies

. Putting it all together

Chapter 14: Wrapping It Up

Chapter Goal: Show a summary of all the things discussed and some future projections.

No of pages - 10

Subtopics

. Chapter-wise summary

. What was not discussed in this book

. Future projections

. General advice for upcoming data scientists

Our Price

$79.95

Elsewhere

$86.78

Save $6.83 (8%)

Ships from USA. Estimated delivery date: 30th Apr - 8th May.

Free Shipping Worldwide

Product Description

EAN

9781484281208

ISBN

1484281209

Writer

Sarkar, Tirthajyoti

Publisher

Apress

Other Information

Illustrated

Dimensions

25.4 x 17.8 x 2.1 centimeters (0.70 kg)

About the Author

Dr. Tirthajyoti Sarkar lives in the San Francisco Bay Area and works as a Data Science and Solutions Engineering Manager at Adapdix Corp., where he architects artificial intelligence and machine learning solutions for edge-computing-based systems powering the Industry 4.0 and smart manufacturing revolution across a wide range of industries. Before that, he spent more than a decade developing best-in-class semiconductor technologies for power electronics.
He has published data science books and regularly contributes highly cited AI/ML-related articles on top platforms such as KDnuggets and Towards Data Science. Tirthajyoti has developed multiple open-source software packages in the field of statistical modeling and data analytics. He has five US patents and more than thirty technical publications in international journals and conferences.
He conducts regular workshops, participates in expert panels on various AI/ML topics, and contributes to the broader data science community in numerous ways. Tirthajyoti holds a Ph.D. from the University of Illinois and a B.Tech degree from the Indian Institute of Technology, Kharagpur.
