Learn Data with Bash Shell
Explore real-world data at the Linux command line
Can you build a script that counts the sequences in a big data set of hundreds of thousands of nucleotide sequences in 30 seconds? You may be surprised to learn that in Bash this takes only a few words: grep -c "^>" data.fa. This book will help you become an expert in Bash and learn to explore large real-world data sets.
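As a minimal sketch of the one-liner above: FASTA files mark the start of every record with a header line beginning with ">", so counting those lines counts the sequences. The sample records and the count_sequences helper name are assumptions made for illustration; they are not taken from the book.

```shell
#!/usr/bin/env bash
# Hypothetical sample: a tiny FASTA file standing in for a real data.fa.
fasta_file="$(mktemp)"
cat > "$fasta_file" <<'EOF'
>seq1 sample record
ACGTACGT
GGCCTTAA
>seq2 sample record
TTGACA
>seq3 sample record
ACGT
EOF

# Each record starts with a header line beginning with '>',
# so the number of matching lines is the number of sequences.
# (count_sequences is an illustrative helper name.)
count_sequences() {
  grep -c '^>' "$1"
}

count_sequences "$fasta_file"   # prints 3
```

Because grep reads the file as a stream and only counts matching lines, the same command should scale to files with hundreds of thousands of records.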
About the Book
Bash may not be the best way to handle all kinds of data. But there often comes a time when you are given a pure Bash environment, such as the one on common Linux-based supercomputers, and you just want an early result or a first view of the data before you dive into the real programming in Python, R, SQL, SPSS and so on.
Expertise in these data-intensive languages comes at the price of spending a lot of time on them. In contrast, Bash scripting is simple, easy to learn and well suited to mining textual data, particularly if you work with genomics, microarrays, social networks, life sciences and so on. It can help you quickly sort, search, match, replace, clean and optimise various aspects of your data without a steep learning curve.
Many practical data mining tasks follow the same flow: specific data resources are first imported into flat text files. Bash can then run different programs (grep, sort, sed and so on) on those files to clean and optimise the data, and extract preliminary views of it (cut, csvlook, view, cat, head, etc.). Another part of data mining involves taking unstructured data and transforming it into structured data (awk, shell), and a scripting language like Bash can be very useful for that transformation. We strongly believe that learning and using Bash shell scripting should be the first step if you want to say, Hello Big Data!
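The flow described above (clean a flat file, preview it, then summarise it) might look like the sketch below. The sample rows, the column layout and the helper names clean, per_state and avg_tuition are all hypothetical and chosen only for illustration; the book's own data sets and scripts may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: institute name, state, tuition,
# with some stray spaces around the commas.
data_file="$(mktemp)"
cat > "$data_file" <<'EOF'
name,state,tuition
Alpha University, CA ,45000
Beta College,NY,38000
Gamma Institute, CA,41000
Delta University,TX,29000
EOF

# Clean: strip the spaces around every comma with sed.
clean() {
  sed 's/ *, */,/g' "$1"
}

# Preview: header plus the first two data rows of the cleaned file.
clean "$data_file" | head -n 3

# Summarise: number of institutes per state, most frequent first.
per_state() {
  clean "$1" | tail -n +2 | cut -d, -f2 | sort | uniq -c | sort -k1,1nr -k2
}
per_state "$data_file"

# Transform: average tuition per state, computed with awk arrays.
avg_tuition() {
  clean "$1" | awk -F, 'NR > 1 { sum[$2] += $3; n[$2]++ }
    END { for (s in sum) printf "%s %.0f\n", s, sum[s] / n[s] }' | sort
}
avg_tuition "$data_file"
```

Each stage reads from a pipe and writes plain text, so any step can be inspected on its own by ending the pipeline early with head.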
This book starts with some practical bash-based flat file data mining projects involving:
- University ranking data [Previews: Part I, Part II, Part III] Sample video lectures.
- Facebook data [Previews: Part I, Part II]
- Crime Data
- Shakespeare-era plays and poems data
If you haven’t used Bash before, feel free to skip the projects and go to the tutorials first. Read the tutorials and then come back to the projects. The tutorial section introduces Bash scripting, regular expressions, AWK, sed, grep and so on.
Finally, it gives you a concise, beginner-friendly guide to the big data landscape, including an overview of critical Big Data tools such as HDFS, MapReduce, YARN, Flume, Hive and more. The book finishes with a near-complete list of references to all the relevant command-line and Big Data tools.
Get the interactive version!
Pick Your Package
All packages include the ebook in the following formats: PDF, EPUB, and Web
The Book
The Book only!
Minimum price: $9.99
Suggested price: $15.00
The Book + Data sets + Code Samples + Video Lectures
The Book + Data sets + Code Samples + Video Lectures (animated)!
Minimum price: $15.00
Suggested price: $25.00
- Data sets: project data sets for a) University ranking data, b) Facebook data, c) AU Crime Data, d) Shakespeare-era plays and poems data
- Code samples: code samples for the Learn Data with Bash Shell projects
- Video Tutorials: instructional videos and whiteboard animations covering every project in this book
About the Author
Scientific Programmer
Scientific programming is a rapidly growing multidisciplinary field that uses advanced computing capabilities to understand and solve complex problems.
The Scientific Programming School team helps you learn to use scientific programming languages and tools, such as CUDA, Julia, OpenMP, MPI, C++, Matlab, Octave, Bash, Python, Sed and AWK, including RegEx, for processing scientific and real-world data. The team is made up of PhD-educated instructors in the computational sciences.
The team deploys interactive courses at Scientific Programming School (now Learnitive.com), an interactive and advanced e-learning platform for learning scientific coding. It gives you the opportunity to run scientific codes and OS commands as you learn, with playgrounds and interactive shells inside your browser.
Table of Contents
About
Introduction
- What is Bash?
- When is Bash useful?
- Bash in data mining
- Who is this book for?
- How to read this book?
- Part 1: Projects
Project 1: The ‘US News’ Uni Ranks
- Learning objectives
- Data download
- Dataset Preview
- Data Analysis
- Find the colleges
- Finding the percent of colleges in the ranklist
- Listing the Institutes from a given state
- Finding the number of Institutes from each state
- Finding a correlation between ranks and tuition fees?
- Chapter Summary
Project 2: Facebook Data Mining
- Data download
- Learning objectives
- Dataset Preview
- How many columns and rows?
- What does the data look like?
- Data Analysis
- How many statuses in each status type?
- Find the most popular status entry
- Chapter Summary
Project 3: Best Australian Cities - Least Crimes
- Learning objectives
- Data download
- Data Preview
- Finding the number of rows and columns
- The hard way
- The easy way
- Data Analysis
- Finding the top crime in the whole country
- Finding the top crime per city
- Finding the best city in Australia!
- Chapter Summary
Project 4: Mining Shakespeare-era Plays and Poems
- Learning objectives
- Data download
- Data Preview
- Analysis
- How many plays/poems?
- How many plays/poems by each author?
- What are the most frequent words?
- Chapter Summary
- Part 2: Tutorials
Hello Bash!
- Important Information
- which bash?
- Hello world! bash
- Bash variables
- Bash functions
- Bash meta characters
- Bash quotation basics
- Read and store user input
- Bash redirections
- Bash if-else (conditional statements)
- Bash case statement
- Bash loop statements
- Bash arithmetic
- Bash arrays
Hello! Regular Expressions
- Important Information
- REGEX Types
- Basic Regular Expressions
- Metachar .
- Metachar [ ]
- Metachar [^ ]
- Metachar ^
- Metachar $
- Metachar ( )
- Metachar *
- Metachar {m,n}
- Extended Regular Expressions
- Metachar ?
- Metachar +
- Metachar |
- REGEX Character Classes
- REGEX Look Arounds
- REGEX Atomic Groups (?>)
- How to Use REGEX in Bash?
Hello! AWK
- Important Information
- AWK Built-in Variables
- AWK statements
- AWK built-in functions
- AWK Examples
- Example 1. AWK print function
- Example 2. AWK print specific field
- Example 3. AWK’s BEGIN and END Actions
- Example 4. AWK fields variable ($1, $2 and so on)
- Example 5. AWK built-in variables
- Example 6. AWK fields comparison >
- Self-contained AWK scripts
Hello! SED, GREP and Find
- SED - Stream Editor
- Important Information
- SED substitution
- Some important SED options
- SED substitute and regular expressions
- SED delete
- SED print
- SED grouping
- GREP
- GREP and regular expressions
- Find command (find)
- Part 3: Hello Big Data!
- Big Data Terminologies
- HDFS
- Map Reduce
- YARN
- Flume
- Sqoop
- Hive
- Pig
- Spark
- HBase
- Big Data file formats
Conclusion
- References
- Bash
- REGEX
- AWK
- SED
- GREP
- Big data
- A companion book