Tag Archives: learn data science

How To Be A Data Scientist: The Comprehensive Guide

BY Roger Huang

What is Data Science?

Data scientist roles are often one of the most highly-paid and highly-rewarding jobs out there. Glassdoor has cited data scientist at the #1 position for most-satisfying job in the United States. With the explosive growth of unstructured data, there has never been a greater need for data scientists.

This has prompted a wave of questioning about how to be a data scientist, with upwards of 600 people a month searching for that on Google.

IBM predicts demand for data scientists will increase 28% by 2020. Data science roles are among LinkedIn’s fastest and most growing emerging fields with about 650% growth since 2012.

Data science combines statistical knowledge, programming chops and domain expertise/communication skills. You’ll work on dealing with large amounts of data and get as much insight at scale as you can.

Job Prerequisites

To become a data scientist, you have to have a solid understanding of statistics, mathematics and the theory behind different algorithms.

You also have to have enough programming chops, usually in a language such as Python or R to iterate with data science models.

You also have to be able to communicate your findings to top executives. You need to have enough domain expertise to understand your data and the implications of it.

Typically, most roles will need advanced degrees and programming experience. STEM degrees are preferred. However, some companies will hire undergraduates straight from school — and advanced degrees, while preferred, are not a hard prerequisite. You can do data science without a PhD or even a Master’s degree.

Data Science Salary

Based on a Kaggle survey, data scientists and the adjacent field of machine learning engineers earn the highest median salary ($120,000 USD) in the United States of America. Australia closely follows at about $110,000 USD. Other countries fall swiftly down the median, with data scientists earning close to $15,000 USD in both Russia and India.

While it’s clear that you can earn a lot being a data scientist, it’s also true that there are nuances.

The division in the United States makes this clear. States like California and New York have the highest volume of data science jobs. California data scientists average about $140,000 USD in yearly salary. Washington and New York State follow up in the $115,000 USD to $120,000 USD range. New Jersey, Maryland, and North Carolina are around there as well.

California is home to Silicon Valley and the growing startups in San Francisco. Washington state headquarters both Microsoft and Amazon. New York state and adjacent states like New Jersey host large vibrant startup ecosystems including Silicon Alley. While all these figures need to be adjusted for cost of living (different states like Kansas come first due to their low cost of living in another analysis), they show a key tenet of raw data science salaries: to earn as much as possible, you’ll have to go to where data is most valuable.

Factors That Increase Your Data Science Salary

I wrote this handy guide from Springboard on factors that increase your data science salary after doing some research. The most important ones are the data science tools you have experience with, the industry you work in, the location you choose to work in (as discussed above), the data science role you choose, your experience and degrees, and the individual negotiation for each salary.

Understanding big data tools like Spark and data visualization tools like D3.js, a powerful and advanced custom library for strong visualization might increase your yearly salary by between $8,000 USD and $15,000 USD.

It’s not just data science in general that drives your salary, it’s also the individual components you’re familiar with. Premiums are paid for data scientists who know how to handle large amounts of data in a distributed fashion, and those who can work with powerful data visualization libraries.

If you have up to 15 data science tools mastered, it can increase your salary about $30,000 USD.

You’ll also want to work in an industry that has access to a lot of valuable data. This tends to be software or social media companies who pay the highest for data scientists (think Facebook or Google).

You’ll want to make sure you’re working as a data scientist or data engineer, not a data analyst. Most intro-level roles in the data space are data analyst roles. It will affect your future salary if you stay in data analyst roles or only apply to them.

As discussed, your location is key as well. If you want the absolutely highest raw salary, you’ll have to move to the United States, and you’re likely going to be working in one of the tech hubs there (either San Francisco/Silicon Valley, New York City, or the DC area). However, you should note that the amount of salary you can gain on location, while high, may not be as high as other factors that don’t need you to move.

Finally, your level of experience can make a dramatic difference. Having ten or more years of experience can add around $30,000 USD to your yearly salary as a data scientist. And while degrees might not be a hard prerequisite, those with advanced degrees do tend to earn more as data scientists.

Data Science Curriculum/Checklist

First, you’ll want to start with enough programming knowledge so you can play around with the different concepts and libraries. In practice, a lot of the statistics and mathematics is abstracted away by different programming libraries. It’s best to learn some of the basics of statistics and programming at the same time. If you had to focus on one area, start with the programming practice.

Most machine learning and data science libraries (including Pandas, Numpy and scikit-learn, the mainstays of data science) are compiled in Python. You’ll want to start there, and work with Anaconda so you can manage different packages and dependencies. Once you’re in, you can find different courses to practice your Python programming, and practice live in the Jupyter Notebook offered, which is an intuitive and easy-to-access editor for code that can be run locally and uploaded or given version control quite easily by hooking it up with Git and a Github account.

Here’s the documentation for how to get started with Anaconda and Jupyter notebook. The following post summarizes different ways of working with Jupyter notebooks and version control. Finally, this post from freeCodeCamp explains Git and the importance of version control.

While you can work on Jupyter Notebook in a local context by yourself and seldom do anything but upload your finished experiments and files to Github (something I’ve often done), building in the habit of working with version control is a great practice.

It’s the default method of collaboration between different programmers, who must ensure that code doesn’t conflict — so if you want to work on a data science team, or any software team for that matter, it’s always good to start with good habits.

You’ll also want to use version control to revert back in case something goes wrong and to maintain a steady thread of progress.

Programming

R vs Python

A large part of the data science ecosystem debate is whether or not to use R or Python as an intro-level programming language to get started. In this article for The Next Web, I wrote that it was ideal to know both. Realistically, if you had to choose, I would go with Python. We’ll start there, but I’ll add some R resources in case.

R

R on Codecademy

Codecademy can help you practice your R skills before you start applying it to data science use cases.

Introduction to R for Data Science

This interactive course is given by Microsoft on the edX platform, and is completely free to access. You will need to pay $99 USD if you want to have a verified certificate on your profile.

Python

49 Essential Resources to Learn Python

I wrote this list of resources to learn Python, going from beginner to advanced. Go through and pick out the resources that are data science and machine learning-specific.

Learning Python: From Zero to Hero

A text-based tutorial that summarizes the basics of Python. It will get you from knowing zero to Python hero.

Python Tutorial: Learn Python For Free | Codecademy

Codecademy was how I learned Python. Working through the interactive course modules will help you move through and learn by doing and practicing.

SQL

21 of the Best Free Resources to Learn SQL

You’ll want to practice your SQL as well if you’re looking to become a data scientist. A large amount of data is still held in structured SQL tables. Practicing with SQL will help you extract that data and work with it.

SQLZoo

SQLZoo works partly as a Wiki, partly as a set of interactive exercises. I use this to sharpen my SQL skills when I need to practice.

Pandas

Pandas dataframes are the default unit of data wrangling in data science work. Pandas allows you to organize your data in a tabular, structural fashion similar to a SQL table or an Excel spreadsheet. It also allows you to use Python to programmatically treat data.

Pandas Cookbook

This handy guide goes over the Pandas library and different things you can do with it from grouping to aggregation functions. It’s a handy interactive guide to Pandas — and it’s how I first started getting familiar with the library and data science in general.

A Comprehensive Guide to Data Wrangling

This guide helps define data wrangling, why it’s important, and introduces a few new functions and situations in Pandas to get you comfortable with it.

Statistics

Once you’re able to source data, you’ll need the statistical ability to be able to draw insight from the data you’ve collected.

As you’re learning the programming you need, you need to be able to understand statistics to manipulate data, understand it, and evaluate different models. This often involves at least a basic understanding of probability, frequentist and Bayesian statistics.

Statistics and Probability: KhanAcademy

This interactive video-filled course will help you catch up on frequentist statistics, confidence intervals, p-values, and more. It’ll serve as a refresher if you’ve encountered these concepts in university, and a learning opportunity if you haven’t.

A Concrete Introduction to Probability by Peter Norvig

This iPython Notebook allows you to directly work with probability concepts in your own version of Jupyter Notebook should you desire. It expresses probability ideas in very readable Python code, helping to combine both your programming practice and statistics knowledge.

Bayes’s Theorem: A Visual Introduction

This post introduces Bayesian theory with a lot of visualizations. It can take the visualizations to really crystalize Bayesian thinking, especially since it involves a lot of segmentation on probability.

Introduction to Bayesian Inference

This tutorial uses a Python library to explain Bayesian reasoning through a model of click-throughs on ads. Use it to understand Bayesian inference in practice.

Mathematics

Once you’re done with the statistics, it can be good to understand some of the mathematics behind data science and machine learning even if most of the detail is not something you’ll confront everyday given how abstracted away most of the math is.

The Mathematics of Machine Learning

The Towards Data Science article sums up the categories of mathematics you need to learn as well as links to different courses.

Mathematics of Machine Learning

This book is offered as a free PDF, covering several sections of machine learning math in detail from analytic geometry to vector calculus.

Machine Learning

Now that you’ve refreshed or embraced statistics and programming concepts, it’s time to take it all together and learn the machine learning algorithms you can use on your data.

A Tour of Machine Learning Algorithms

Starting with foundational concepts in machine learning such as the difference between supervised and unsupervised learning (and semi-supervised) we can then drill down into the different categories of machine learning algorithms and broadly see how the logic works with a set of visualizations.

10 Machine Learning Algorithms You Need To Know

This Towards Data Science Medium post then dives a bit deeper into ten specific machine learning algorithms, giving code implementations of a few so you can see them in practice on data.

Data Modelling/Evaluation

After all the work on different algorithms, it’s time to refresh what makes for a good data model. How do you know if your model is working? This section of resources will help you put that together.

Part-4 Data Science Methodology From Modelling to Evaluation

The article summarizes the data science methodology. In this section, it focuses on how evaluating your model fits with the broader work of machine learning and data science.

Various ways to evaluate a machine learning model’s performance

The following tutorial includes a breakdown of evaluation metrics beyond accuracy such as the confusion matrix and the ROC curve.

Data Visualization

Python Matplotlib Guide

Matplotlib is the default data visualization library embedded in Python, and something designed to be used off-the-bat with Pandas. You can use its visualizations to get a quick sense of the data yourself without needing to export it. This guide goes over the basics of Matplotlib and how it’s constructed.

Visualization With Seaborn

Seaborn is a Python library that provides more compelling data visualizations than the default Matplotlib library. Use this tutorial to get familiar with it.

Intro to D3.js with ten examples

This D3.js introduction helps get you started with the powerful JavaScript library. The related chart collection helps collect tons of examples of different charts you can use to visualize your data in R, Python and more.

Datasets to Practice With

Datasets | Kaggle

Kaggle, the online data science competition platform, offers a variety of datasets you can use to practice your data science skills. The datasets feature ranking and comments so you can follow the most trending datasets. You can study what others have done with them as inspiration for your own projects.

19 Free Public Data Sets for Your Data Science Project

The link above is a list of 19 free, public datasets ranging from United States census data to FBI crime data.

Awesome Public Datasets

A Github repository that hosts a wide variety of open, public databases. They are organized by their domain. This is a great definitive resource for free datasets.

Awesome IPFS Datasets

This website hosts datasets, some of them quite large, hosted on IPFS (the interplanetary file system). This is a distributed, decentralized protocol of storing data that goes beyond HTTP’s standard server-client relationships. In theory, this means that datasets downloaded through IPFS might be faster to get. After all, you’ll be working with a swarm of hosts rather than just a single one.

Registry of Open Data on AWS

Amazon Web Services, which helps host much of the content on the web today, also has this registry that helps people find open data hosted on its cloud services. It includes examples of what people have done with that data.

BigQuery Open Datasets

Google hosts the above datasets on BigQuery, its big data storage solution. They include the complete revision history of Wikipedia up to April 2010, and weather information from the NOAA since late 1929.

Data Science Courses/Bootcamps

The curriculum might be a bit too much to handle as a learner — and that’s perfectly fine. It’s meant as a bare-bones categorization of the material you need to learn to get into data science. However, if you want to refine your learning, there are a few options out there. I’ve linked to a list of bootcamps and courses. Be aware that I worked for Springboard.

Data Science Bootcamps

Data Science Bootcamps, CourseReport

CourseReport has a list of different data science bootcamps, with ratings and real student reviews given for each course.

Springboard

Springboard offers a variety of mentored bootcamps where you’re given personal attention from a data science expert and career coaching. It also comes with a job guarantee. Either get a job or your tuition back once you’re accepted.

Udacity

Udacity offers nanodegrees where you can dive deep into specific data science topics such as self-driving cars.

Data Science Courses

Coursera (Data Science)

Coursera offers a variety of online course options for data science. Use them well to deepen your learning in the field.

Udemy Data Science

Udemy offers a variety of data science courses created by different independent teachers on its platform.

Data Science Interview Questions

The data science interview tends to fall into many steps, with some being technical and some being non-technical. I wrote this guide for Springboard on the data science interview process to fully flesh it out. I’ve added some sample questions you might expect, some with solutions, under each section.

Initial Recruiter Call

Before you’re assessed by a hiring manager, you’ll usually have a call with a recruiter to determine if you’re a fit with the company. They’ll ask general questions about your motivations and career path and see if you’re a fit with what the hiring manager wants.

Sample questions

Why this company?
What interests you about the role?
What are your salary expectations?

Technical Interview

A hiring manager will ask you technical questions related to your knowledge of statistics and programming. Here are about 109 data science questions with solutions. For programming, you can try HackerRank challenges as well to stay sharp before your interview.

Technical Case Study

Part of the interview process will involve either an in-depth review of a project you worked on or a case study where you work with your (hopefully) future team. This will involve detailed questions about work you’ve done or how you’d approach a project. You might have to do a take-home assignment or to work on a problem with the hiring manager.

Behavioral Interview

The behavioral part of the interview will test your management and communication skills as well as fit with the team. It’s usually done by the hiring manager rather than the recruiter.

Job Boards And Resources

There are many data science specific job resources and career sites out there worth following. Here are a few where you can find resources and data science job postings.

KDNuggets

KDNuggets features a job board and many resources for aspiring data scientists and the community at large.

Kaggle

Kaggle features a host of different resources for data scientists, including datasets that are free and public for use, a customized version of a Python kernel that allows for automated version control as well as collaboration with other Kaggle users and a host of competitions that can help you practice and show your data science skills.

Data Elixir

Data Elixir is a newsletter dedicated to data science resources and jobs. Sign up to get a periodical update on the data science ecosystem.

Data Science Weekly

Another great newsletter filled with breaking news and tons of learning opportunities. Data Science Weekly is well worth subscribing to.

Hacker News Jobs

Hacker News Jobs is a great spot to cleanly aggregate machine learning and data science job positions from technologists who post on Hacker News “Who’s Hiring?” threads.

What’s great about these postings is that you’ll often find a lot of context and a direct connection to a hiring manager, who will often leave their email directly on a posting to make themselves available for connection. You can easily search for data science specific postings.

Angellist Jobs

AngelList is the world’s largest repository of startups, many of whom are looking to hire for data science roles. You can filter specifically for data science roles, location and industry.

Do You Need A Degree Or Not?

This is an ongoing discussion. Advanced degrees help increase your data science salary and some hiring managers display a bias towards those degrees. Many hiring positions demand a minimum of a bachelor’s degree.

However, DJ Patil, the fromer Chief Data Scientist of the United States, called on recruiters and companies to judge candidates based on what they did with data, not their education.

While the data science community often draws from the same ethos of do-it-yourself learning-by-doing that typifies the open source community, it can be a more gated process because of the statistics and math knowledge needed, as well as the communication skills data scientists need to develop.

Work experience can fill a lot of gaps here, but to get into the industry, it’s possible you might have to start with a data analyst role then move up in a data scientist role, or settle for a junior data science role or internship if you have no experience and no degree.

Despite the emergence of Masters programs targeted for data science, the truth is that you don’t absolutely need a degree to succeed in data science.

Sample Data Science Job Roles

Data Scientist – Personalization @ Spotify

This role at Spotify involves a lot of teamwork and data exploration. It focuses on data modeling. Data engineers help to bring pipelines of data for you to model properly. This role is more focused on the product analytics team, and as a result, is cross-functional in nature. While there is a demand for degrees, most of the other requirements involve applied experience.

Entry Level Data Scientist @ IBM

This entry level role doesn’t need a degree — rather just the skills that make up the data science curriculum. The focus is on communication, tools, and the different models that make up data science roles.

Candid Data Science Career Advice On How To Be A Data Scientist

Here’s some candid career advice from different data scientists in the field to help you with how to be a data scientist:

Claire Longo (Senior Machine Learning Engineer @ Twilio): Beat imposter syndrome by choosing a focus area to master. Talk about the stuff you don’t know as well as the stuff that you do.

Jess Zhang (Inference Data Scientist @ Airbnb): Throw out the first number and do your research when it comes to negotiations. Look through courses and continually refresh and learn so you have a toolbox you can rely on. Find somebody who believes in you, sometimes through networking at data science meetups.

Anmol Rajpurohit (Senior Software Engineer @ Splunk Enterprise Cloud): Data science isn’t for everybody. Make sure you know what you’re getting into before you start a career in data science.

Checklist On How To Be A Data Scientist

1- Learn basic statistics, including frequentist, Bayesian thinking and probability theory.

2- Learn how to programmatically source and organize data, preferably with Python.

3- Learn more the more advanced statistics and mathematics behind data science at scale, from linear algebra, to model evaluation.

4- Practice your learning and work on projects with production-level datasets. Build a portfolio for hiring managers.

5- Prepare for the data science job interview process.

6- Accept a data science job offer (after many months of effort, most likely).

7- Continually practice!

101+ Resources to Learn Data Science

BY Roger Huang

Many people are seeking to learn data science these days. It’s become a trendy topic associated with high salaries and some of the most interesting problems in the world. This demand has created many different resources in the data science space. People have curated their selection of favorite resources to learn data science, but I was seeking out something more comprehensive — so I built this list. Here’s my attempt at getting you my favorite resources in the data science space so you can understand what’s going on in the field — and how you can get your hands dirty and start learning right away.

Full disclosure: I work for Springboard (one of the data science education providers listed below).

What is data science?

First, let’s start with an overview of what seems to have become a popularized buzzword and defining exactly what you want to learn: data science. Data science is the combination of three kinds of skillsets: statistics, programming and business knowledge. It’s the interplay between these crafts where you’ll find a data scientist — somebody who will programmatically examine large data sets for precious business insights — somebody who can combine computer science knowledge with business insight.

You can use data science concepts and training to do data mining and get statistical inferences from large datasets. Using advanced techniques such as natural language processing and unsupervised learning, you can tame the power of computation and get precious data insights others simply cannot access. That will be attractive to all sorts of potential employers in the data science field, from Silicon Valley to Wall Street.

In order to get there though, you have to start with the basic techniques and basic concepts that underlie data science. Learning data science requires having an understanding of the process that goes behind it, and the various components that are required to bring everything together. Let’s get started on getting you know that knowledge.

Overview

You’ll want to get an overview of the field and the processes and concepts that make up data science so you can learn data science.

1- Data Scientist: The Sexiest Job of the 21st Century

In this seminal article, ex-Chief Data Scientist of the United States, DJ Patil, goes into exactly what makes a career in data science so compelling. It’s great fuel to the fire if you’re looking to learn data science.

2– What is data science?

This overview of data science by Berkeley delves into how data science came to be — and the average salary you can expect in the field.

3– Data Science Salary Survey (2016) – O’Reilly

O’Reilly, a leading publication and media company on the cutting edge of technology, dives deeper into what tools and factors go into higher data science salaries. They’ve surveyed hundreds of data scientists in the field. Learn what pays and what doesn’t with data science careers through their research!

4– Data Science (Wikipedia)

Wikipedia’s overview of data science goes over the history of the field and points to many different resources in the field. It can be a handy jumping-off point for further research.

5– Building Data Science Teams

This piece by DJ Patil goes into the different roles inherent in a data scientist’s job — and exactly how best to build out a data science team.

6- Data Science Process

This piece by Springboard goes into what the day-to-day of data science looks like — tracing it all the way to a first principles view of exactly what steps effective data science requires.

Interactive Tutorials

Now that you’re done with an overview of the topic, it’s time to get your hands a bit dirty with interactive tutorials that will help you learn different parts of data science — whether that’s the statistical theories behind machine learning algorithms, or the programming skills you’ll need to implement those theories.

Statistics/Math

Understanding probability and the basics of statistics is essential to being able to understand machine learning methods and how to handle massive amounts of data. Linear algebra and the ability to manipulate different expressions of data (in matrix form or otherwise) will also be incredibly helpful in detailing what data scientists do. You’ll want to refresh your statistics knowledge and get a handle on the math you need to know to join their ranks.

7– KhanAcademy (Statistics/Probability)

This free course from KhanAcademy serves as a great catch-up on the basics of probability and statistics.

8– Introduction to Statistics in R (Datacamp)

Learn a bit of R (a programming language commonly used in data science) and statistics at the same time with this interactive walkthrough from Datacamp.

9– Statistics 101

This Youtube playlist from the Harvard Extension School covers everything from random variables to different statistical distributions.

SQL

Knowing SQL and how to query from relational databases is a skill that is one of the building blocks of data science. You’ll often use SQL to source your data for further analysis — or even to transform your data on the spot.

10– Mode Analytics SQL School

Mode Analytics teaches SQL through the use of case studies with real data. It’s an interactive experience that’ll teach you the basics of SQL by having you run through a dataset with some simple yet powerful commands.

11– Learn SQL (Codecademy)

Codecademy, well known for its basic curated tutorials in different programming languages, has this simple interactive module that will help you learn SQL.

12– SQLCourse

This is an older tutorial, but one that still holds up as an example of an organized approach to learning SQL.

Python

Python is one of the workhorse languages of data science — one of the most popular along with R. The large open-source community that powers Python enables it to be a powerful, versatile programming language that can help facilitate everything from data wrangling to training powerful machine learning models. It’s a powerful tool you’ll want to learn as you learn data science.

13– Pandas Cookbook

This interactive set of code examples walks you through how to get started with Pandas, the data processing library most commonly used in Python. It’s built by Julia Evans.

14–Intro to Python for Data Science (DataCamp)

This interactive course will walk you through the basics of the data science libraries for Python.

15– Gentle introduction to scikit-learn

This gentle introductory tutorial will help you understand one of the most powerful machine learning and data science libraries out there: Python’s scikit-learn. You’ll be able to train simple, off-the-shelf data models in a matter of minutes.

16– A dramatic tour through Python’s data visualization landscape

This somewhat witty and whimsical walkthrough will help you explore the difference between the major data visualization tools in the Python ecosystem — including some options that were ported from R!

17- Web scraping with BeautifulSoup

This short guide will teach you how to take information from different websites and render it into a format that is easy for machines to process — a handy skill for anybody looking to work with many different datasets. I often use the set of techniques described to scrape tables from Wikipedia so I can process that data in Python.

R is another popular programming language used for data science — in fact, it’s often pitted against Python as a comparable tool. The truth is that you can use both — and in fact, being conversant in both can only help you progress faster and further as a data scientist.

18– Introduction to R (Datacamp)

Here is the equivalent of the Datacamp introduction to Python — except this time for R, another common data science programming language.

19– A complete tutorial to learn R from scratch

This tutorial, rendered as a blog post, offers a comprehensive A to Z guide to getting started in R. It covers everything from importing data into R to creating predictive models with it.

20 – Try R

Sponsored by O’Reilly Media, this interactive course will reward you with a badge for each fundamental building block of R you learn.

Hadoop

Hadoop is a big data framework meant to facilitate the treatment and storage of large data sets that have be processed in parallel by many different servers in order to yield actionable insights.

21– Hadoop Tutorial (Tutorialspoint)

This set of tutorials on Hadoop will help you understand how big data frameworks work — and how you can apply Hadoop to your data.

22– Hortonworks Sandbox tutorial on Hadoop

This interactive Hadoop sandbox by Hortonworks lets you play with Hadoop code.

Spark

Spark helps solve some speed, flexibility and efficiency issues with Hadoop through the use of a new data structure: the RDD or resilient distributed dataset.

23– Apache Spark Tutorial (TutorialsPoint)

TutorialsPoint offers a similar tutorial to Spark as it does for Hadoop.

24– Hands-on introduction to Spark

Hortonworks has a sandbox that will let you play around with Spark code.

Courses/Workshops

The following courses and online workshops will help you learn data science in an organized fashion. Use these resources to accelerate your learning of data science if you need to. A lot of these courses will help you find data science work, and you’ll likely be able to do data science projects after finishing them.

25– Fast.ai

This massive online course, built by a Kaggle champion in machine learning, will help you learn about neural networks and how to train machine learning models.

26– Foundations of Data Science (Springboard)

This course offered by Springboard features a curated selection of resources in R, SQL and the basics of machine learning, as well as personalized mentoring from data science experts who work in the field.

27– Data Science Intensive (Springboard)

Yet another course offered by Springboard, though this one is more advanced. Focused on Python and teaching the intricacies of machine learning methods, this course will help you use different machine learning techniques with ease.

28–Data Science Career Track (Springboard)

Springboard’s Data Science Career Track is the first online bootcamp to offer a data science job or your tuition back. With personalized career coaching, mentorship from data scientists and exclusive employer partnerships, Springboard is putting it all on the line to help you get a job in data science.

29– Data Science (Coursera)

Coursera partnered with Johns Hopkins University to deliver this nine-course series on data science, covering everything from tools to advanced machine learning methods.

30– Machine Learning (Coursera)

This curated set of machine learning courses taught by Andrew Ng (the famous Stanford professor who founded Coursera in the first place) is one of the best resources to consult as you start understanding data science.

31– Thinkful Data Science Bootcamp

Thinkful, an online education provider, provides a data science bootcamp that will curate your learning of data science and Python.

32– Intro to Machine Learning (Udacity)

Udacity offers a free mini-course curated by Facebook and Tableau to help guide you through to doing analysis of the Enron email database.

33– Data Science Certificate (Harvard Extension School)

This data science certificate offered by the Harvard Extension School can help you learn data science while getting credits and credibility from one of the leading universities in the world.

34– Statistics with R (Coursera)

This selection of courses created in partnership with Duke University will help you understand basic probability and the use of Bayes’ Rule through the use of R.

35– Data Science (EdX)

This set of curated learning paths in data science can help you get accreditation in the field — if you’re willing to pay for it.

36– Insight Data Science Fellowship

The Insight Data Science Fellowship is a special type of data science education program — it takes talented PhD. students who have already demonstrated technical skills and aptitude, and helps them bridge the gap between academia and industry with a postgraduate fellowship that combines the best of academic rigor with industry knowledge.

37–Data Science (General Assembly)

General Assembly, one of the largest online education providers in the world, offers courses in data science.

38– Galvanize

If you’re looking for an in-person experience to learn data science instead of something online, Galvanize can help. This link leads to the San Francisco experience — however, Galvanize itself is present in many different other cities.

39– Coursereport Data Science Reviews

Here are some reviews of different data science courses in Coursereport — this will allow you to pick and choose between many different options with fair reviews from previous students on display.

40– Switchup Data Science Reviews

Here are some more reviews of different data science courses, this time from Switchup, another course review site.

Books

Oftentimes it’s not a great course that helps you learn the most — it can be one single resource within that course — say a particularly well-written book. This selection of data science books can help you understand data science in detail.

41– Bayesian Methods for Hackers

This book, delivered as an extended Github repository, can help you understand Bayesian inference and how to think about probabilities by working through them in code.

42– Think Stats

This O’Reilly book helps you conceptualize statistical concepts by having you work with them in Python.

43– Think Bayes

This book combines Python programming with Bayesian inference, and can be a handy resource in case the books above aren’t enough.

44– Deep Learning

This free technical book by some of the scions of deep learning and artificial intelligence (Ian Goodfellow, Yoshua Bengio and Aaron Courville) will help you understand exactly how to think about deep learning and neural networks.

45– Learn Python the Hard Way

In case you need a refresher on Python, Learn Python the Hard Way will help you break down exactly what you need to do to master Python. While it focuses on an older version of Python, the first principles taught here can be useful to those looking to freshen up their knowledge of Python — though you shouldn’t become overly dependent on this book as it has quite a rigid philosophy on one particular version of Python.

46– The Data Science Handbook

This Data Science Handbook curates insights from 25 data science leaders and distills what it truly means to work in this exciting new field.

47– Data Science from Scratch

This book from O’Reilly goes into the first principles of data science, looking beyond the programming tools and frameworks.

48– Storytelling with Data

This book will help you visualize insights that you find within your data and teach you how to communicate them effectively so that you can drive impact with your data findings.

49–Exploratory Data Analysis with R

Roger D. Peng, an expert in statistics, has written this book to teach how to look through datasets with the R programming language.

50–Interactive Data Visualization for the Web

This online book will teach you how to use frameworks such as D3.js to make your visualizations fully interactive on the web.

51–Machine Learning Yearning

This book by Andrew Ng, the famous artificial intelligence leader who founded Coursera, is going to be released soon — sign up to get drafts of new chapters as they come in!

Curated Collections

I know you’re looking for curated resources to learn data science. There’s more than just this list right here — and each collection will help you expand your knowledge and collection of great data science resources even further.

52– Awesome Machine Learning

This Github repository follows the “Awesome” method of curating the best resources in a particular space — in this case, all the different resources you’d need to learn machine learning.

53– Awesome Deep Learning Papers

In case you ever wanted to get a handle on the science behind the amazing technology being built out of artificial intelligence, this awesome curation of deep learning papers will help you continually be on top of exciting new developments.

54– Awesome TensorFlow

TensorFlow is an awesome deep learning framework: this Github repository will have everything you need to learn more.

55– Awesome Data Science

This repository is everything it promises: an awesome curation of different data science resources.

56– Data Analysis Learning Path (Springboard)

This learning path curates different resources in an intuitive fashion so that you can learn the data analysis skills required for data science.

57–The Open Source Data Science Masters

This is a curated curriculum of free, open-source resources to learn data science — consider it a masters’ degree for a fraction of the price.

General Resources

58– A visual intro to machine learning

This interactive, visual view of data science in action can help you conceptualize data science, especially if you prefer to learn visually.

59– Deep Learning Review (Nature)

This paper summarizes some of the latest findings in deep learning and artificial intelligence and it is written by one of the founding fathers of modern artificial intelligence research: Geoffrey Hinton.

60– Build a deep learning machine

This fun little tutorial by O’Reilly will teach you how to build a computer that you can use specifically for data science purposes.

61- How can I become a data scientist (Quora)

This Quora thread contains different thoughtful replies on how to become a data scientist — and includes a bevy of free resources to boot!

62– Becoming a Data Scientist

This blog charts the author, Renee, and her path from being a SQL analyst to becoming a full-fledged data scientist.

Career Advice

Becoming a data scientist is now a career path that many envy — however, getting started and placing yourself in a position where you are paid to practice data science doesn’t start and end with technical skills. Here’s a set of resources that will help spell out exactly what you need to do to have a successful data science career.

63– 2015 Data Science Salary Survey (O’Reilly)

This salary survey by O’Reilly was curated from about 600 respondents who divulged their salary and what they did at work. It’s an informative read on what the average salaries are like in data science and what factors or technical skills can either increase your data science salary — or set it on the path to stagnating.

We already highlighted the 2016 survey as part of our general overview of data science, but the 2015 survey will add even more context on how the data science industry works — and how much you should expect to be paid.

64– Guide to Data Science Jobs (Springboard)

This guide to Data Science Jobs by Springboard curates a variety of job seeker and hiring manager stories and seeks to inform you on every element of what it takes to get a data science job: from how to get hiring managers to notice your profile, to advice on what technologies and skills you should practice before doing a data science interview.

65– Guide to Data Science Interviews (Springboard)

This companion guide to the Guide to Data Science Jobs by Springboard runs you through different interview questions and exactly what hiring managers are thinking when they are on the other side of the table. It’s a comprehensive overview of the data science interview process — and it provides you actionable tips on how to ace the data science interview.

66– Getting your first job in data science

This blog post goes over different general tips on how to get that first job in data science.

67– Data Science Career Paths

This blog post by Springboard breaks down the difference between data analysts, data scientists and data engineers.

Datasets

In order to really get started and to learn data science, you have to have datasets to play with. The following resources will link you to different datasets you can experiment with as you’re learning data science techniques and putting them into practice.

68– 19 Free Public Datasets (Springboard)

This curated list of 19 free public datasets will help you get started on your path to learn data science!

69– Kaggle Datasets

This list of datasets curated by Kaggle comes with upvote functionality as well as comments, so you can exactly which datasets are the most exciting — and what work has already been done with them.

70– Reddit Datasets

This subreddit can be a handy way to pick out new datasets, and see some of the most popular ones.

71– Data.world

This new social network has evolved around sharing great datasets and bringing data fans together!

72– Google BigQuery Datasets

Google BigQuery has open-sourced some interesting big data sets–from Reddit comments to Github activity.

73– Quandl

Quandl is a search engine mostly used for financial and economic data. Comb through if you’re looking in that space for data to play with.

74– Public Big Datasets

This curated list of big datasets can help you practice with Hadoop or Spark.

75– Wikipedia dumps

Wikipedia dumps data from its database and makes it free to analyze every so often. Sift through here if you want to query the world’s largest collection of knowledge on your quest to learn data science.

76– Open Street Map

This collection of open-source geographic data extends around the world in its reach!

Resources/Blogs to Follow

You’ll want to keep an eye on different resources and blogs that update frequently as you learn data science. This ensures that you’re always on top of the latest developments — and it can be a stimulating way to keep your data science skills sharp.

77– Top data scientists to follow on Twitter

This is a list of data science influencers you’ll want to consider following to get to know more about the industry.

78– 50 of the best data science blogs

This curated list of data science blogs will help you find the best blogs to follow as you learn data science.

79– Ultimate guide to data science blogs

This larger, extended guide to data science blogs has a lot more entries — feel free to take a look if you feel like you want something comprehensive to digest.

80– KDNuggets

KDNuggets is one of the largest data science communities on the web, and their blog regularly posts interesting data science content.

81– R-bloggers

R Bloggers is a data science blog focused on tutorials to learn R and different resources in the R ecosystem.

82– Dataconomy

Dataconomy focuses on larger trends in data science rather than many technical tutorials. It’s the data science blog with the largest focus on the European data science scene as well.

83– Analytics Vidhya

Analytics Vidhya contains plenty of technical tutorials on many data science topics.

84– Big Data Made Simple

Big Data Made Simple is a relatable blog that conveys different topics in data science in an approachable manner.

85– Yhat blog

The Yhat blog is always filled with interesting tutorials and data science case studies.

86– Machine Learning Mastery

Machine Learning Mastery focuses on the intricacies of machine learning.

87– Learndatasci

Learndatasci is a blog that offers a broad overview of different data science topics.

88– Mastersindatascience

Mastersindatascience is the resource to consult if you wanted to look at paid offerings to learn data science.

Newsletters

If you want regular updates in your inbox on the latest news in data science, there’s no better way to do that than to subscribe to the following data science newsletters.

89 – Data Science Weekly

This weekly newsletter summarizes the latest tutorials and resources in data science. It’s a very useful resource if you’re looking to learn data science.

90– Data Elixir

Another data science newsletter that will keep you informed on the latest happenings in data science.

91– Python Weekly

This weekly Python newsletter curates a selection of the finest Python resources, many of them related to data science.

92– Datafloq

This handy newsletter promises to be a one-stop shop for you when it comes to big data trends.

93- The Analytics Dispatch

Mode Analytics provides a dispatch to keep you informed on all things analytics and BI-related.

94- Postgres Weekly

This Postgres Weekly newsletter keeps you informed on the latest Postgres updates.

95- O’Reilly Data Newsletter

A premium data science newsletter, O’Reilly will often curate the best data science resources that have popped up.

Communities

While newsletters and blogs are great, interactive communities where participants share articles and comment on them together can truly help you entrench your data science knowledge. Here are just a few of those communities where you can learn data science and interact with different data science practitioners.

96– Datatau

Datatau is a sort of Hacker News for data science resources where data science practitioners discuss the latest news and upvote the best articles.

97– Reddit Datascience

This subreddit deals with general data science topics.

98– Reddit Machine Learning

This subreddit deals with more in-depth machine learning materials and discussions.

99– Reddit Deep Learners

This subreddit deals with how to learn artificial intelligence and deep learning.

100– Reddit Data is Beautiful

This subreddit contains impactful data visualizations that are visually appealing — and a true set of examples if you want to display your data in a beautiful manner.

101– Data Science Stack Exchange

This subcomponent of the Stack Exchange network deals with technical questions and solutions in data science.

102– Quora Data Science

This section of Quora is composed of many of the questions posed about data science — it is an awesome resource for those looking to learn data science.

Hopefully the resources above have been helpful for you to learn data science: let me know in the comments below what you think about them or whether you think there are some I missed!