Python Data Science Cookbook
Practical solutions across fast data cleaning, processing, and machine learning workflows with pandas, NumPy, and scikit-learn
I wrote this cookbook so you spend less time troubleshooting and more time discovering insights. These recipes tackle the real problems you'll face (mismatched keys, shape errors, memory leaks, rate limits) so that each step builds toward a smooth, automated workflow.
About the Book
This book collects handy recipes that help data science professionals get through the most common challenges they face when using Python tools and libraries. Each recipe shows you exactly how to do something, step by step. You'll load CSVs directly from a URL, flatten nested JSON, query SQL and NoSQL databases, import Excel sheets, and stream large files in memory-safe batches.
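As a rough illustration of two of the ingestion tasks named above, the sketch below flattens nested JSON with `pd.json_normalize` and reads a CSV in batches with the `chunksize` argument of `pd.read_csv`. The sample records, column names, and the in-memory CSV are made up for this example and do not come from the book; a real recipe would point `read_csv` at a file path or URL instead.

```python
import io

import pandas as pd

# Hypothetical nested API response (illustrative data only).
records = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}},
    {"id": 2, "user": {"name": "Linus", "address": {"city": "Helsinki"}}},
]

# json_normalize flattens nested dicts into dot-separated column names,
# e.g. "user.address.city".
flat = pd.json_normalize(records)

# Stand-in for a large CSV file; io.StringIO keeps the example offline.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

# With chunksize set, read_csv returns an iterator of DataFrames, so only
# one batch is held in memory at a time.
total = 0
n_chunks = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk["value"].sum())
    n_chunks += 1
```

Aggregating inside the loop, rather than concatenating the chunks, is what keeps memory use bounded by the batch size.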
Once the data's loaded, you'll find simple ways to spot and fill in missing values, standardize inconsistent categories, clip outliers, normalize features, get rid of duplicates, and extract the year, month, or weekday from timestamps. You'll learn how to run quick analyses, like generating descriptive statistics, plotting histograms and correlation heatmaps, building pivot tables, creating scatter-matrix plots, and drawing time-series line charts to spot trends. The feature engineering recipes show you how to build polynomial features, compare MinMax, Standard, and Robust scaling, smooth data with rolling averages, apply PCA to reduce dimensions, and encode high-cardinality fields with sparse one-hot encoding.
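The cleaning steps listed above can be sketched in a few lines of pandas. This is a minimal example under stated assumptions: the tiny DataFrame, its column names, and the clipping bounds (0 to 100) are hypothetical and chosen only to make each step visible, not taken from the book.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: one duplicate row, inconsistent category labels,
# one missing value, and one obvious outlier.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03", "2024-01-04"]
    ),
    "category": ["A", "a ", "a ", "B", " b"],
    "value": [10.0, 11.0, 11.0, np.nan, 500.0],
})

# Drop exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# Standardize categories: trim whitespace and unify case.
df["category"] = df["category"].str.strip().str.upper()

# Count missing values, then fill them with the column median.
n_missing = int(df["value"].isna().sum())
df["value"] = df["value"].fillna(df["value"].median())

# Clip outliers to assumed bounds (0 to 100 is an arbitrary choice here).
df["value"] = df["value"].clip(lower=0, upper=100)

# Extract date parts from the timestamp column (Monday is weekday 0).
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.weekday
```

In practice the clipping bounds would more likely come from quantiles or an IQR rule than from fixed constants.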
As for machine learning, you'll learn to put together end-to-end pipelines that handle imputation, scaling, feature selection, and modeling in one object, create custom transformers, automate hyperparameter searches with GridSearchCV, save and load your pipelines, and let SelectKBest pick the top features automatically. You'll learn how to test hypotheses with t-tests and chi-square tests, build linear and Ridge regressions, work with decision trees and random forests, segment countries using clustering, and evaluate models using MSE, classification reports, and ROC curves. Finally, you'll get a handle on debugging and integration: fixing pandas merge errors, correcting NumPy broadcasting mismatches, and making sure your plots are consistent.
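To make the "one object" idea concrete, here is a minimal sketch of a scikit-learn `Pipeline` that chains imputation, scaling, `SelectKBest`, and a Ridge model, tuned with `GridSearchCV`. The synthetic dataset, the step names, and the parameter grid are assumptions made for illustration; the book's own recipes may be structured differently.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative only), with a few values
# blanked out so the imputation step has something to do.
X, y = make_regression(
    n_samples=120, n_features=8, n_informative=3, noise=5.0, random_state=0
)
X[::15, 0] = np.nan

# Imputation, scaling, feature selection, and modeling in one object.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_regression)),
    ("model", Ridge()),
])

# Step-name prefixes ("select__", "model__") route each parameter
# to the matching pipeline step.
param_grid = {
    "select__k": [3, 5, 8],
    "model__alpha": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)

# The fitted search object predicts with the best pipeline found.
preds = search.predict(X[:5])
```

Because every preprocessing step lives inside the pipeline, each cross-validation fold refits the imputer, scaler, and selector on its own training split, which avoids leaking information from the held-out fold.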
Key Learnings
- You can load remote CSVs directly into pandas using read_csv, so you don't have to deal with manual downloads and file clutter.
- Use json_normalize to convert nested JSON responses into simple tables, making them easy to analyze.
- You can query relational and NoSQL databases directly from Python, and the results will merge seamlessly into pandas.
- Find and fill in missing values using isna(), forward-fill, and median strategies across your time-ordered data.
- You can free up a lot of memory by converting string columns to pandas' Categorical dtype.
- You can speed up computations with NumPy vectorization and chunked CSV reading to prevent RAM exhaustion.
- You can build feature pipelines using custom transformers, scaling, and automated hyperparameter tuning with GridSearchCV.
- Use regression, tree-based, and clustering algorithms to show linear, nonlinear, and group-specific vaccination patterns.
- Evaluate models using MSE, R², precision, recall, and ROC curves to assess their performance.
- Set up automated data retrieval with scheduled API pulls, cloud storage, Kafka streams, and GraphQL queries.
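Two of the performance points in the list above lend themselves to a quick check: the memory saved by the Categorical dtype and the effect of NumPy vectorization. The following is a small sketch with made-up data (the region labels and array sizes are hypothetical); actual savings depend on how many distinct values a column holds.

```python
import numpy as np
import pandas as pd

n = 100_000
rng = np.random.default_rng(0)

# A string column with only four distinct values (illustrative data).
regions = pd.Series(rng.choice(["north", "south", "east", "west"], size=n))

# Compare memory before and after converting to the category dtype.
# Categorical stores small integer codes plus one copy of each label.
string_bytes = regions.memory_usage(deep=True)
category_bytes = regions.astype("category").memory_usage(deep=True)

# Vectorization: one array expression replaces an explicit Python loop.
x = np.arange(n, dtype=np.float64)
loop_result = [v * 2.0 + 1.0 for v in x[:5]]  # element by element
vec_result = x * 2.0 + 1.0                    # whole array at once
```

Comparing `string_bytes` with `category_bytes` on your own columns is a cheap way to decide whether the conversion is worth it; columns with mostly unique values gain little.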
Table of Contents
- Data Ingestion from Multiple Sources
- Preprocessing and Cleaning Complex Datasets
- Performing Quick Exploratory Analysis
- Optimizing Data Structures and Performance
- Feature Engineering and Transformation
- Building Machine Learning Pipelines
- Implementing Statistical and Machine Learning Techniques
- Debugging and Troubleshooting
- Advanced Data Retrieval and Integration
Pick Your Package
All packages include the ebook in the following formats: PDF and EPUB
This Book Only
Minimum price: $29.99
Suggested price: $29.99
This Book + Extras Downloadable (Learning PyTorch 2.0, Second Edition)
This package consists of 2 books. The primary purchased book is available in your library. To download the extra ebook, go to your Leanpub Library and select the book you purchased; you will then see a link (or links) to download the extras included with the book.
- Learning PyTorch 2.0, Second Edition: Utilize PyTorch 2.3 and CUDA 12 to experiment with neural networks and deep learning models
- Minimum price: $49.99
- Suggested price: $56.99
This book is also available in the following packages:
This Book + Extras Downloadable (Google JAX Cookbook)
This package consists of 2 books. The primary purchased book is available in your library. To download the extra ebook, go to your Leanpub Library and select the book you purchased; you will then see a link (or links) to download the extras included with the book.
- Google JAX Cookbook: Perform machine learning and numerical computing with combined capabilities of TensorFlow and NumPy
- Minimum price: $49.99
- Suggested price: $56.99
This Book + Extras Downloadable (Python AI Programming)
This package consists of 2 books. The primary purchased book is available in your library. To download the extra ebook, go to your Leanpub Library and select the book you purchased; you will then see a link (or links) to download the extras included with the book.
- Python AI Programming: Navigating fundamentals of ML, deep learning, NLP, and reinforcement learning in practice
- Minimum price: $49.99
- Suggested price: $56.99
This Book + Extras Downloadable (Learning PyTorch 2.0, Second Edition + Google JAX Cookbook)
This package consists of 3 books. The primary purchased book is available in your library. To download the extra ebooks, go to your Leanpub Library and select the book you purchased; you will then see a link (or links) to download the extras included with the book.
- Learning PyTorch 2.0, Second Edition: Utilize PyTorch 2.3 and CUDA 12 to experiment with neural networks and deep learning models
- Google JAX Cookbook: Perform machine learning and numerical computing with combined capabilities of TensorFlow and NumPy
- Minimum price: $69.99
- Suggested price: $76.99
About the Author
GitforGits | Asian Publishing House
We are the engineer's publisher, the coder's mentor, and the content alchemist, meticulously turning dense tech into practical gold. With a growing library of 100+ titles, we don't just develop technical books; we build roadmaps for professionals across Python, MySQL, DevOps, Rust, AI, Kotlin, Arduino, Golang, and everything around the massive IT ecosystem. Every chapter, every script, every project is a tool in the hands of developers who want to get things done.
Where others summarize, we construct step-by-step learning blueprints, cutting through clutter, banning the fluff, and ensuring every paragraph delivers hands-on value. Our audience isn’t learning from scratch—they’re leveling up with purpose, and we stand by them with code-first content, consistent project workflows, and a zero-redundancy approach.