Python Data Science Handbook: An essential handbook for Python enthusiasts

Aug. 24, 2017, 9:06 p.m. By: Prakarsh Saxena

Python Data Science Handbook

In a world where each of us is surrounded by data and the insights it holds, Data Science looks like a promising field of work and research to many. Plenty of books and courses help newcomers dive into the area from scratch, but experienced practitioners also need a separate reference that collects the terminology and techniques they use most often at their level, so that they don’t have to dig through an entire book or course to find them. For those wanting to switch to Python from an R or SAS background, this book, the Python Data Science Handbook, is also well worth picking up.

About the book

Python has become popular among researchers because of its large community of developers, its versatility, and its powerful libraries and packages for storing, manipulating, and gaining insights from data. What sets the book apart is that it covers the most important uses of the Python libraries and packages widely used in Data Science, namely IPython, NumPy, Pandas, Matplotlib, Scikit-Learn and related tools.

The book, written by Jake VanderPlas and published by O’Reilly, does not claim to be useful to someone who wants to enter the field from the very start. As the name suggests, the Handbook is aimed at working professionals and researchers, and helps them with the common uses of the Python libraries and methods related to Data Science. Readers will find it an ideal reference for day-to-day tasks: manipulating, transforming, cleaning and visualizing data, or building statistical or machine learning models for better forecasting. In simple words, this is a must-have reference for those already adept at scientific and statistical computing in Python, and also for those who have a command of the subject in other languages and want to apply the same skills in Python.

The book contains 5 chapters. Each one introduces one of the libraries listed above and gradually goes deeper into its usage, with common examples to clear up confusion. The book starts by describing IPython and how to work efficiently in the IPython shell, including common errors and debugging techniques. It moves on to NumPy and the elements of it that are useful for analytics, explaining how its array data structure simplifies working with large data. The third chapter covers data manipulation with Pandas and its DataFrames: it teaches techniques for working with this data type and handling missing data in it, and also covers pivot tables, time series data and vectorized string operations. Matplotlib forms the fourth chapter, which covers techniques for visualizing data for better insights using plot types such as line, scatter, 3D and contour plots and histograms. It also explains how to include multiple subplots within the same figure, each distinguishable from the others through legends and colours. The concluding chapter revolves around Machine Learning in Python with Scikit-Learn, going into depth on models such as Support Vector Machines (SVM), Decision Trees, k-Means Clustering and Principal Component Analysis (PCA). Overall, the handbook is an interesting guide to working smoothly in Python for Data Science.
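To give a flavour of the kind of day-to-day task the Pandas chapter deals with, here is a minimal sketch that handles a missing value and then builds a pivot table. It is not taken from the book: the `sales` table, its column names and its values are made up for illustration, and filling the gap with the mean is just one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; column names and values are invented for this example.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, np.nan, 80.0, 120.0],  # one value is missing
})

# Handle the missing value by filling it with the mean of the observed revenues.
# Series.mean() skips NaN entries by default.
mean_revenue = sales["revenue"].mean()
sales["revenue"] = sales["revenue"].fillna(mean_revenue)

# Summarize with a pivot table: one row per region, one column per quarter.
summary = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)
print(summary)
```

Dropping the incomplete row with `dropna()` instead of filling it would be an equally reasonable choice here; which one is appropriate depends on the data.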

About the Author

Jake VanderPlas is a Senior Data Science Fellow at the University of Washington’s eScience Institute. He researches and teaches in a variety of areas, including Astronomy, Astrostatistics, Machine Learning and Scalable Computation. After an undergraduate degree in Physics from Calvin College, he took a few years off to teach English at a non-profit student center in Japan. He later returned to the USA and spent time as an outdoor educator, which rekindled his passion for Science. He then completed his PhD at the University of Washington in 2012 and joined the eScience Institute in 2014. He is also active on GitHub, with over 4 thousand followers and 160+ repositories on his profile. You can view his GitHub page here.

GitHub Link: Python Data Science Handbook