Advanced Python for Data Science

Python is a rich and powerful language, but many data scientists merely scratch the surface, and often feel uncertain about what lies beneath. This book goes deep into the heart of Python to help you truly understand its components and how to stitch them together to build better scientific workflows and machine learning systems.

Scott Gorlin

Minimum price

$39.99

Formats: PDF, EPUB, WEB

365 pages, 55,818 words

About the Book

Today, most people enter the world of Data Science through the buzz and allure of “AI.” We tackle Kaggle challenges, voraciously consume Stack Overflow, and eat, live, and breathe through the Jupyter Notebook. Python, along with its “killer app” of Machine Learning, has done nothing short of revolutionizing the way we “do data science,” and the world is a more interesting place because of it!

The big cloud providers and many open-source tools have done wonders to democratize this technology. But ‘easy access’ to high technology comes at a cost: we can easily go too far, rely too much on the tools we have today, and forget how to build the tools we need to truly transform our individual projects.

Most of the time, your impact as a Data Scientist is limited by your ability to enact your ideas - not by the ideas themselves. You can train a model on ‘clean’ data using scikit-learn or fastai, or run an ANOVA, in a notebook. But enacting that idea means getting to the data in the first place. It means knowing how to store it. It means processing your data at scale. It means running your processing script, reliably, every day on fresh data. It means testing that script. It means collaborating on that script with a coworker - or 10 - as the project scales. It means curating a library and building tools to solve the same problem for 5 new projects. It means packaging a model up for distribution - sharing it with another data scientist, or deploying it as a service.

It means changing the way you think about problems by adopting new paradigms that accelerate you - and your work - across your organization. It means building an approach to data science within the broader Python ecosystem.

This book is about Python, and how to be an effective Python programmer as a Data Scientist. You will learn the advanced Python skills you need to accelerate your work and solve the real, daily problems you face in your data science role.

About the Author

Scott Gorlin

Scott Gorlin, Ph.D., is an executive and leader of applied science and machine learning, with a particular emphasis on enabling data scientists to enhance - and reproduce - their work through code. A programmer by hobby since elementary school, Dr. Gorlin quickly realized the potential of applying foundational and advanced programming concepts to research and development, and has worked to enhance his and his teams’ coding standards and practices since the beginning of his professional career.

This aspect of his work evolved into teaching a formal graduate course, Advanced Python for Data Science, which is offered through Harvard Extension School’s degree and certificate programs in machine learning.

Dr. Gorlin received his doctoral degree in Computational and Systems Neuroscience from the Massachusetts Institute of Technology in 2011, and went on to lead R&D for Choicestream, an ad-tech firm. He is currently Senior Director of Enterprise Science and Trusted AI at Liberty Mutual, and continues to teach Advanced Python at Harvard Extension School in his free time.

Table of Contents

Preface

Introduction

What is this book?
Who is it for?
What will you learn?
The state of the book

An introduction to Advanced Python

What is Advanced Python?
Tech Requirements
Helpful Themes
Readings

Workflows

Continuous science

Debugging
“Bug Report” Rules
Primers
Higher Levels
Testing
Readings

Scientific workflows

What is Python?
Config
Decorators
Bootstrapping
Readings

Packages and iteration

Packages
Versioning
Functional Programming

Avoiding the `for` loop

Primitives
Vectorizing
Einstein Summation
Iteration Primitives
Vectorization: A Case Study
Iterators
Readings

Skeletons

Classes, composition, and graphs

Inheritance
Composition

The DAG

Graphical Programs
What does Data Science look like?
The Revelation

Luigi

Project scaffolding
The Task
The Pieces
The Big Picture
Atomicity
Readings

Graphs

Luigi
The Big Picture
Salted Graphs
The Sorry State of Stateful Data
Advanced Luigi

Data

Dask and Parquet

Micro Sciences
Dask - Basics
Rookie Mistakes
Executive Summary
Split, Apply, Combine
Data Containers
Parquet
Dask - Partitioning
Case Study - Fancy Indexing
Dask and Luigi
Readings

Django and SQL

Mutability
Living Data
Django
ORM
Django Code
ORM Breakdowns
The Competition
Readings

API’s and Data

Metaprogramming
DB Design
Atomic Targets
Migrations
The Web
APIs
Reading

More Meta

API’s and Clients
Factories
Optimization
Readings

Algorithms

Smart & Lazy Coding

Parallel Code
Memory Views
Memoization
Sketching
Readings

Visualization

Data Viz
Declarative Grammars
Javascript and HTML5
Colormaps
Data Shading
Readings

Where We Are

“Python”
Testing
Workflows
Higher Levels
Deployment
Looping
Functional Coding
Composition
Graphical Programs
Data Scaling
The Web
DB’s
API’s
Meta
Optimization
Visualization

Appendix

Changelog
