Data Science/Artificial Intelligence

Where To Get Free GPU Cloud Hours For Machine Learning

An Introduction To The Need For Free GPU Cloud Compute

GPUs were once used solely for video games. Now, they power machine learning models around the world with their unique configuration and processing power. Getting free GPU cloud hours has become a need for many machine learning practitioners and hobbyists.

In brief, traditional CPUs are good at complex calculations performed sequentially, while GPUs excel at many simple calculations performed in parallel across many cores. GPU hardware is designed to run large numbers of shallow calculations in parallel faster than a CPU can run them in sequence.

That makes them the perfect fit for training deep neural networks. The RAPIDS framework also extends this to regular machine learning work and to data visualization tasks. In some cases, this has cut the runtime of algorithms that normally take upwards of 30 minutes down to about 3 seconds.

How do we take best advantage of this scenario? Fortunately, there are many GPU cloud providers that are offering free GPU cloud compute time so you can run experiments and try out these new processes.

1 – Google Colab

Google Colab lets you easily upload Python notebooks to the cloud and work with GitHub/Git, either pulling repositories to modify or pushing work from Colab files to GitHub. If you have a Google Drive account, you can access your Colab notebooks in your Google Drive. You can switch into GPU runtime mode by clicking Runtime in the menu bar at the top.
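Once you've switched the runtime, it can be worth confirming that a GPU is actually visible before you start a long job. Here is a minimal sketch that uses only the Python standard library. It assumes the NVIDIA driver's `nvidia-smi` command-line tool is on the PATH of the machine; the helper name `gpu_visible` is my own and not part of Colab.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if the `nvidia-smi` tool is present and exits successfully."""
    # Assumption: an NVIDIA GPU runtime ships the `nvidia-smi` tool on the PATH.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```

If this prints `False` in a notebook, the runtime is most likely still in CPU mode.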

Specs:

  1. Free access to Tesla K80 GPU
  2. Up to 12 hours of consecutive runtime per day
  3. 12 GB of RAM

2- Kaggle GPU (30 hours a week)

Kaggle is a platform that lets data scientists and machine learning engineers demonstrate their ability to build accurate models.

They offer 30 hours a week of free GPU time through their Kernels. The hardware they use is the NVIDIA Tesla P100 GPU. Kaggle intends the GPUs for deep learning, and they don’t accelerate other workflows, though you might try using RAPIDS for pandas-like and scikit-learn-like functions.

While the GPU time is offered for free, they do offer certain recommendations. You should, as with Google Compute, monitor when you’re using GPU time and switch it off when you’re not. Even if it’s monetarily free, you’ll want to be careful with the time you’re allotted. The limit of six hours of consecutive runtime means that you won’t be able to train complex state-of-the-art models that often take days to fully train.

Specs:

  1. Free access to NVIDIA Tesla P100 GPUs
  2. Up to 30 hours a week of free GPU time, with six hours of consecutive runtime
  3. 13 GB of RAM

3- Google Cloud GPU

For each Google account that you register with Google Cloud, you can get $300 USD worth of credit to spend on GPUs. That can get you over 850 hours of GPU training time on their NVIDIA Tesla T4. In practice though, you’ll want to try more powerful GPU instances with Google Cloud, since you can get a baseline for free with Google Colaboratory. You’d be able to train relatively powerful models in that time, or use it to practice machine learning work with RAPIDS. This tutorial goes over the setup of the GPU.

Note that when you set up the virtual machine, if you don’t turn it off when you’re not using it, you’ll still get billed, and you’ll get billed if you go past the $300 USD quota, so be careful to avoid unneeded charges.
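To get a feel for how quickly a credit can run out, here is a small back-of-the-envelope sketch. The hourly rate is an assumption derived from the numbers in this article ($300 for roughly 850 hours), not a current price list, and the function name is my own.

```python
def hours_for_credit(credit_usd: float, hourly_rate_usd: float) -> float:
    """Return how many GPU hours a credit buys at a given hourly rate."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly_rate_usd must be positive")
    return credit_usd / hourly_rate_usd


# Assumption: rate implied by the article's figures, $300 / 850 hours.
implied_rate = 300 / 850
print(round(implied_rate, 2))  # 0.35

hours = hours_for_credit(300, 0.35)
print(int(hours))  # 857

# If the virtual machine is left running around the clock, the same
# 850 hours are gone in about five weeks.
days_if_left_running = round(850 / 24, 1)
print(days_if_left_running)  # 35.4
```

The last figure is the reason to shut the machine down whenever you aren't training.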

4- Microsoft Azure

Microsoft Azure also offers a $200 credit when you sign up, which you can use for Azure’s GPU options. This blog post explains how you can get up to $500 a year in credits.

5- Gradient (Free community GPUs)

Tired of using Google/Microsoft infrastructure or want to try something new? Gradient offers free community GPU cloud usage attached to their notebooks. This blog post offers a more in-depth perspective on their community notebooks.

6- Twitter Search for Free GPU Cloud Hours

You can always keep an eye out for promo codes and other cloud providers offering free GPU Cloud Hours by looking at Twitter and searching for relevant keywords.

With the right search query, you’ll be alerted to the latest offerings. I’ll try to retweet a few if you want to follow my personal Twitter account.

7- An alternative: build your own machine learning computer with a GPU

If you’re tired of the constraints of cloud compute, from cost to execution time limits, one solution might be to go as far as building your own machine. Your only constraint is the power cost, which can be higher than expected with these powerful machines.

Still, you’ll be able to fully control your configuration and the hardware you use. It can be very cost-efficient, since you can run your own machine 24/7 — and you can build your own machine learning GPU rig for less than $1,000.

Data Science/Artificial Intelligence, Learning Lists, Resources Lists

49 Essential Resources To Learn Python

Hi, I’m Roger, and I’m a self-taught data analyst/scientist (but only on my good days). I spent a lot of time thinking about Python — and here’s a compilation of resources that helped me learn Python and can hopefully help you.

I’ve broken it down to:

Beginner resources for those just starting with programming and Python

Intermediate resources for those looking to apply the basics of Python knowledge to fields like data science and web development

Advanced resources for those looking to get into concepts like deep learning and big data with Python

Exercises that help you practice and cement Python skills

Beginner Resources To Learn Python

1- Welcome to Python.org

The official Python site offers a good way to get started with the Python ecosystem and to learn Python, including a place to register for upcoming events, and documentation to get started.

2-Learn Python the Hard Way

An online book with a paid and a free version. The free version goes into an outline of the content and can be a useful to-do list.

3-Basic Data Types in Python – Real Python

Real Python dives into the different data types in Python in detail. Learn the difference between floating-point numbers and integers, which special characters can be used in Python, and more.
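As a quick taste of what that covers, here is a short sketch of my own (not taken from the Real Python article) showing how integers, floating-point numbers, and an escaped special character behave.

```python
whole = 7        # an int
fraction = 7.0   # a float

type_names = (type(whole).__name__, type(fraction).__name__)
print(type_names)         # ('int', 'float')
print(whole == fraction)  # True: equal in value, different in type

true_division = 7 / 2     # always produces a float
floor_division = 7 // 2   # rounds down to a whole number
print(true_division, floor_division)  # 3.5 3

# Floats are binary approximations, so some decimal sums are not exact.
print(0.1 + 0.2 == 0.3)   # False

# "\t" is an escape sequence: two characters typed, one tab stored.
tabbed = "a\tb"
print(len(tabbed))        # 3
```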

4-How to Run Your Python Scripts – Real Python

This simple intro to Python scripts through the command line and text editors will get you up and running for your first Python experiments — a handy tool to get you started as you learn Python.

5-Python Tutorial: Learn Python For Free | Codecademy

Codecademy offers a free interactive course that helps you practice the fundamentals of Python while giving you instant, game-like feedback. A great device for learning Python for those who like to practice their way to expertise.

6-Google’s Python Class | Python Education | Google Developers

The official Python development class from Google’s developers. This tutorial is a mix of interactive code snippets that can be copied and run on your end and contextual text. This is a semi-interactive way to learn Python from one of the world’s leading technology companies.

7-Learn Python – Free Interactive Python Tutorial

This interactive tutorial relies on live code snippets that can be implemented and practiced with. Use this resource as a way to learn interactively with a bit of guidance.

8-Jupyter Notebook: An Introduction – Real Python

Want an easy, intuitive way to access and work with Python functions? Look no further than Jupyter Notebook. It’s much easier to work with than the command line and a set of cobbled-together scripts. It’s the setup I use myself. This tutorial will help you get started on your path to learn Python.

9-Python Tutorial – W3Schools

W3Schools teaches Python in the same format it uses for HTML and other languages. Practice with interactive and text snippets for different basic functions. Use this tutorial to get a firm grounding in the language and to learn Python.

10-Python | Kaggle

Kaggle is a platform that hosts data science and machine learning competitions. Competitors work with datasets and create as accurate a predictive model as possible. They also offer interactive Python notebooks that help you learn the basics of Python. Choose the daily delivery option to have it become an email course instead.

11-Learning Python: From Zero to Hero – freeCodeCamp.org

This text-based tutorial aims to summarize all of the basic data and functional concepts in Python. It dives into the versatility of the language by focusing on the object and class portions of the object-oriented part of Python. By the end of it, you should have a neat summary of objects in Python as well as different data types and how to iterate or loop over them.

12-BeginnersGuide – Python Wiki

This simple tutorial on the official Python Wiki is chock-full of resources, and even includes a Chinese translation for non-English speakers looking to learn Python.

13-Python Tutorial – Tutorialspoint

Set up in a similar fashion to W3Schools, use Tutorialspoint as an alternative or a refresher for certain functions and sections.

14-Python (programming language) – Quora

The Quora community is populated with many technologists that learn Python. This section devoted to Python includes running analysis and pressing questions on the state of Python and its practical application in all sorts of different fields, from data visualization to web development.

15-Python – DEV Community – Dev.to

Dev.to has user-submitted articles and tutorials about Python from developers who are working with it every day. Use these perspectives to help you learn Python.

16-Python Weekly: A Free, Weekly Python E-mail Newsletter

If you’re a fan of weekly newsletters that summarize the latest developments and news and curate interesting articles about Python, you’ll be in luck with Python Weekly. I’ve been a subscriber for many months, and I’ve always been pleased with the degree of effort and dedication placed towards highlighting exceptional resources.

17-The Ultimate List of Python YouTube Channels – Real Python

For those who like to learn by video, this list of YouTube channels can help you learn in your preferred medium.

18-The Hitchhiker’s Guide to Python

Unlike the rest of the resources listed above, the Hitchhiker’s guide is much more opinionated and fixated on finding the best way to get set up with Python. Use it as a reference and a way to make sure you’re optimally set up to be using and learning Python.

19-Python: Online Courses from Harvard, MIT, Microsoft | edX

edX uses corporate and academic partners to curate content about Python. The content is often free, but you will have to pay for a verified certificate showing that you have passed a course.

20-Python Courses | Coursera

Coursera’s selection of Python courses can help you get access to credentials and courses from university and corporate providers. If you feel like you need some level of certification, similar to edX, Coursera offers a degree of curation and authentication that may suit those needs.

Intermediate Resources

21-Getting started with Django | Django

The official Django framework introduction will help you set up so that you can do web development in Python.

22-LEARNING PATH: Django: Modern Web Development with Django

This resource from O’Reilly helps fashion a more curated path to learning Django and web development skills in Python.

23-A pandas cookbook – Julia Evans

I learned how to clean and process data with the Pandas Cookbook. Working with it enabled me to clean data to the level that I needed in order to do machine learning and more.

It works through an example so you can learn how to filter and group your data, perform functions on it, and then visualize the data as needed. The Pandas library is tailor-built to allow you to clean up data efficiently, and to transform it and see trends at an aggregate level (with handy one-line functions such as head() or describe()).

The Pandas cookbook is the perfect intro to it.
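To show what those one-liners look like, here is a minimal sketch. It assumes pandas is installed, and the small table of complaint counts is made up for illustration; it is not the cookbook's dataset.

```python
import pandas as pd

# Hypothetical toy data, invented for this example only.
df = pd.DataFrame(
    {
        "borough": ["Brooklyn", "Queens", "Brooklyn", "Bronx", "Queens", "Brooklyn"],
        "complaints": [12, 7, 9, 4, 11, 15],
    }
)

first_rows = df.head(3)                  # peek at the first three rows
summary = df["complaints"].describe()    # count, mean, std, min, quartiles, max
by_borough = df.groupby("borough")["complaints"].sum()  # aggregate per group
busy = df[df["complaints"] > 10]         # filter rows with a boolean mask

print(first_rows)
print(summary)
print(by_borough)
print(busy)
```

Each line answers a different question about the same table, which is most of what day-to-day data cleaning comes down to.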

24-Newest ‘python’ Questions – Stack Overflow

The Stack Overflow community is filled with pressing questions and tangible solutions. Use it as a resource for implementing Python and on your path to learn Python.

25-Python – Reddit

The Python subreddit offers a bunch of different news articles and tutorials in Python.

26-Data Science – Reddit

The Data Science subreddit offers tons of resources on how to use Python to work with large datasets and process them in interesting ways.

27-Data science sexiness: Your guide to Python and R

I wrote this guide for The Next Web in order to distinguish between Python and R and their usages in the data science ecosystem. Since then, Python has pushed ever-forward and taken on many of the libraries that once formed the central basis of R’s strength in data analysis, visualization and exploration, while also welcoming in the cornerstone machine learning libraries that are driving the world. Still, it serves as a useful point of comparison and a list of resources for Python as well.

28-Data Science Tutorial: Introduction to Using APIs in Python – Dataquest

One essential skill when it comes to working with data is accessing the APIs that services like Twitter, Reddit and Facebook use to expose some of the data they hold. This tutorial will walk you through an example with the Reddit API and help you understand the different response codes you’ll get as you query an API.
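Before you query a live API, it helps to know how to read the status code that comes back. The sketch below is my own and makes no network calls; it uses the standard library's `http.HTTPStatus` to name a code and sorts it into a broad family. The helper name `describe_status` is hypothetical.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short, human-readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code not defined in the standard library's table

    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase} ({kind})"


for code in (200, 401, 404, 429, 500):
    print(describe_status(code))
```

A 4xx code usually means the request needs fixing (credentials, URL, rate limit), while a 5xx code points at the service itself.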

29-Introduction to Data Visualization in Python – Towards Data Science

Once you’re done crunching the data, you need to present it to get insights and share them with others. This guide to data visualization summarizes the data visualization options you have in Python including Pandas, Seaborn and a Python implementation of ggplot.

30-Top Python Web Development Frameworks to Learn in 2019

If you want a suite of options beyond Django to develop in Python and learn Python for web applications, look no further than this compilation. The Hacker Noon publication will often feature useful resources on Python outside of this article as well. It’s worth a follow.

Advanced Resources

31-Beginner’s Guide to Machine Learning with Python

This text-based tutorial helps introduce people to the basics of machine learning with Python. Towards Data Science, the Medium outlet with the article in question, is an excellent source for machine learning and data science resources.

32-Free Machine Learning in Python Course – Springboard

This free learning path from Springboard helps curate what you need to learn and practice machine learning in Python.

33-Machine Learning – Reddit

The Machine Learning subreddit oftentimes focuses on the latest papers and empirical advances. Python implementations of those advances are discussed as well.

34-Python – KDnuggets

KDnuggets offers advanced content on data science, data analysis and machine learning. Its Python section deals with how to implement these ideas in Python.

35-Learn Python – Beginner through Advanced Online Courses – Udemy

Udemy offers a selection of Python courses, with many advanced options to teach you the intricacies of Python. These courses tend to be cheaper than the certified ones, though you’ll want to look carefully at the reviews.

36-A Brief Introduction to PySpark – Towards Data Science

This introduction to PySpark will help you get started with distributed computing tools that let you work with much larger datasets than is possible with Pandas on a single machine.

37-scikit-learn: machine learning in Python

The default way most data scientists use Python is to try out model ideas with scikit-learn: a simple, optimized implementation of different machine learning models. Learn a bit of machine learning theory then implement and practice with the scikit-learn framework.

38-The Next Level of Data Visualization in Python – Towards Data Science

This tutorial walks through more advanced data visualizations and how to implement them, giving you a preview of the different ways you can slice your data, from correlation heatmaps to scatterplot matrices.

39-Machine Learning with Python | Coursera

Coursera’s selection of courses on machine learning with Python is very well-known. This introduction offered with IBM helps to walk you through videos and explanations of machine learning concepts.

40-Home – deeplearning.ai

Deeplearning.ai is an attempt by Andrew Ng (the famous Stanford AI professor and co-founder of Coursera) to bring deep learning to the masses. I ended up finishing all of the courses: they offer certification and are a refreshing mix of videos from Andrew Ng himself and interactive notebooks where you can work with the different concepts.

41-fast.ai · Making neural nets uncool again

This curated course on deep learning helps break down section-by-section aspects of machine learning. Best of all, it’s completely free. I often use fast.ai as a refresher or a deep dive into a deep learning idea I don’t quite understand.

42-Learn and use machine learning | TensorFlow Core | TensorFlow

This tutorial helps you use the high-level Keras component of TensorFlow and Google cloud infrastructure to do deep learning on a set of fashion images. It’s a great way to learn and practice your deep learning skills.

Exercises To Learn Python

43-Datasets | Kaggle

Kaggle offers a variety of datasets with user examples and upvoting to guide you to the most popular datasets. Use the examples and datasets to create your own data analysis, visualization, or machine learning model.

44-Practice Python

Practice Python has a bunch of beginner exercises that can help you ease into using Python and practicing it. Use this as an initial warmup exercise before you tackle different projects and exercises.

45-Python Exercises – W3Schools

The Python exercises on W3Schools follow the sections in their tutorials, and allow you to get some interactive practice with Python (though the exercises are in practice very simple).

46-Solve Python | HackerRank

HackerRank offers a bunch of exercises that you solve without any surrounding context. It’s the best way to practice different functions and outputs in Python in isolation (though you’ll still want to do different projects to cement your Python skills). You’ll earn points and badges as you complete more challenges. This certainly motivates me to learn more. A very useful sandbox for you to learn Python with.

47-Project Euler: About

Project Euler offers a variety of ever-harder programming challenges that aim to test whether you can solve mathematical problems with Python. Use it to practice your mathematical reasoning and your Pythonic abilities.

48-Writing your first Django app, part 1 | Django documentation | Django

This documentation helps you get on the ground with your first Django app, allowing you to use Python to get something up on the web. Once you’ve started with it, you can build anything you want.

49-Top 100 Python Interview Questions & Answers For 2019 | Edureka

Should you ever be in an interview where your Python skills are in question, this list of interview questions will serve as a useful reminder and refresher, and a good way for you to practice and cement different Python concepts.

Data Science/Artificial Intelligence, Learning Guides

Learn machine learning with Python: a free curated curriculum

How to learn data science and deep learning in Python

I recently wrote an 80-page guide on how to get a programming job without a degree, curated from my experience helping students do just that at Springboard. This excerpt is the part where I focus on how to learn machine learning in Python.

How to learn machine learning in Python is a very popular topic: with the rise of artificial intelligence, programmers have been able to do everything from beating human masters at Go to replicating human-like speech. At the foundation of this fantastic technological advance are programming and statistics principles you can learn.

Here’s how to learn machine learning in Python:

Python Basics

Before you learn how to run, you have to learn how to walk. Most people who start learning machine learning and deep learning come from a programming background: if you do, you can skip this section. However, if you’re new to programming or you’re new to Python, you’ll want to take a look through this section.

Codecademy for Python

Codecademy is an online platform for learning programming, with free interactive courses that encourage you to fully type out your code to solve simple programming problems.

Introduction to Python for Data Science

This interactive Python tutorial is created by Datacamp, and is more suited to introducing how Python basics work in the context of data science.

11 Great Resources to Learn and Work in Python

This list of resources will point you to great ways to immerse yourself in Python learning. It’s a broad list filled with different resources that will help you, no matter your learning style.

Installing Jupyter Notebook

These are instructions for installing Jupyter Notebook, an intuitive interface for Python code. You’ll have all of the important Python libraries you need pre-installed and you’ll be easily able to export out and show all of your work in an easy-to-visualize fashion. I strongly suggest that you use Jupyter as your default tool for Python, and the rest of this learning path assumes that you are.

Statistics Basics

In order to learn machine learning in Python, you not only have to learn the programming behind it — you’ll also have to learn statistics. Here are some resources that can help you gain that fundamental knowledge.

Khan Academy, Math, and Statistics

Khan Academy is one of the largest sources of free online education, with an array of free video and online courses. This section on Khan Academy will teach you the basic statistics concepts you need to know to understand machine learning, deep learning and more, from mode, median and mean to probability concepts.

Probabilistic Programming & Bayesian Methods for Hackers

This book will delve into Bayesian methods and how to program with probabilities. Combined with your budding knowledge of Python, you’ll quickly be able to reason with different statistical concepts. It’s a book the author gave out for free, and its deeply interactive nature promises to engage you with these new concepts.

Pandas

The main workhorse of data science in Python is the Pandas library, an open-source tool that organizes large datasets in tabular form and contains a whole array of functions and tools that can help you with data organization, manipulation, and visualization. In this section, you’ll be given the resources needed to learn Pandas, which will help you to learn machine learning in Python.

Cooking with Pandas

Julia Evans, a programmer based in Montreal, has created this simple step-by-step tutorial on how to analyze data in Pandas using noise complaint and bike data. It starts with how to read CSV data into Pandas and goes through how to group data, clean it, and how to parse data.

Official Pandas Cookbook

The official Pandas cookbook involves a number of simple functions that can help you with different datasets and hypothetical transformations you might want to do on your data. Take a look and play with it to extend your knowledge of Pandas.

Data Exploration and Wrangling

Before you can do anything with the data, you’ll want to explore it, and do what is called exploratory data analysis (EDA) — summarize your dataset and get different insights from it so you know where to dig deeper. Fortunately, tools like Pandas are built to give you relevant and surprisingly deep summary insights into your data, allowing you to shape which questions you want to explore next.

By looking through your dataset from afar, you’ll already be able to understand what faults the dataset might have that will keep you from completing your analysis: missing values, wrongly formatted data and so on. This is where you can start processing and transforming the data into a form that can answer your questions. This is called “data wrangling”: in this step, you are cleaning the data and making sure that it is able to answer all of your questions.

Python Exploratory Data Analysis with Pandas

This article from Datacamp goes through all of the nuts and bolts functions you need in order to take a slightly deeper look at your data. It covers topics ranging from summarization of data to understanding how to select certain rows of data. It also goes into basic data wrangling steps such as filling in null values. There are interactive embedded code workspaces so you can play with the code in the article while you are digesting its concepts.

A Comprehensive Introduction to Data Wrangling

This blog article from Springboard is filled with code examples that describe how you can filter data, detect and drop invalid/null values from your dataset, how to group data such that you can perform aggregated analyses on different groups of data (ex: doing an analysis of survival rate on the Titanic by gender or passenger class) and how to handle time series data in Python. Finally, you’ll learn how to export out all of your work in Python so that you and others can play around with it in different file formats such as the Excel-friendly CSV.
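The wrangling steps described above can be sketched in a few lines. This assumes pandas is installed. The passenger rows are invented for illustration and are not the real Titanic dataset, and the column names are my own choice.

```python
import pandas as pd

# Hypothetical toy rows, made up for this example; not the real Titanic data.
passengers = pd.DataFrame(
    {
        "sex": ["female", "male", "female", "male", "male", "female"],
        "pclass": [1, 3, 3, 1, 2, 2],
        "age": [29.0, None, 22.0, 40.0, 35.0, None],
        "survived": [1, 0, 1, 0, 1, 0],
    }
)

# Detect missing values per column, then drop rows with no age recorded.
missing_per_column = passengers.isna().sum()
complete = passengers.dropna(subset=["age"])

# Filter: keep passengers aged 25 or older.
adults = complete[complete["age"] >= 25]

# Group: survival rate by gender (the mean of a 0/1 column is a rate).
survival_by_sex = passengers.groupby("sex")["survived"].mean()

# Export: with no path given, to_csv returns the CSV text as a string.
csv_text = passengers.to_csv(index=False)

print(missing_per_column)
print(adults)
print(survival_by_sex)
print(csv_text)
```

Passing a file name to `to_csv` instead would write the same text to disk, ready to open in Excel.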

Pandas Cheat Sheet

This Pandas cheat sheet, hosted on Github, can be an easy, visual way to remember the Pandas functions most essential to data exploration and wrangling. Keep it as a handy reference as you go out and practice some more.

Data Visualization

Data exploration and data visualization work hand-in-hand. Learning how to visualize data in different plots can be important in seeing underlying trends.

Beginner’s Guide to Matplotlib

This collection of resources on the official matplotlib library (the workhorse library for Python data visualization) will help you understand the theory behind data visualization and how to build basic plots from your data.

Seaborn Python Tutorial

The Seaborn library allows people to create intuitive plots that the standard matplotlib library doesn’t cover easily: things like violin plots and box plots. Seaborn comes with very compelling graphics right out of the box.

Introduction to Machine Learning

Machine learning is a set of programming techniques that allow computers to do work that can simulate or augment human cognition without the need to have all parameters or logic explicitly defined.

The following section will delve into how to use machine learning to create powerful models that can help you do everything from translating human speech to machine code, to beating human grandmasters at complex games such as Go.

It’s important before we get started implementing ideas in code that you understand the fundamentals of machine learning. This section will help you understand how to test your machine learning models, and what statistics you should use to measure your performance. It is an essential cornerstone to your drive to learn machine learning in Python. 

A Visual Introduction to Machine Learning

This handy visualization will allow you to understand what machine learning is and the basic mechanisms behind it through a visual display of how machines can classify whether a home is in New York or in San Francisco.

Train/Test Split and Cross-Validation in Python

This article explains why you need to split your dataset into training and test sets and why you need to perform cross-validation in order to avoid either underfitting or overfitting your data. Does that seem like a lot of jargon to you? The article will define all of these different concepts, and show you how to implement them in code.
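To make the two ideas concrete, here is a dependency-free sketch of my own. It only produces row indices, and the function names are hypothetical; scikit-learn offers `train_test_split` and `KFold` in `sklearn.model_selection` for real projects.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k=5):
    """Yield (train, validation) index lists; each row is validated exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


train_idx, test_idx = train_test_split_indices(20, test_fraction=0.25, seed=42)
print(len(train_idx), len(test_idx))  # 15 5

folds = list(k_fold_indices(10, k=5))
for train, validation in folds:
    print(validation)
```

The model is fitted on the training rows only. Scoring it on rows it has never seen is what reveals overfitting, and averaging over several folds makes that score less dependent on one lucky split.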

Scikit-Learn

Scikit-learn is the workhorse of classical machine learning in Python, a library that contains standard functions that help you apply machine learning algorithms to datasets.

It also has a bunch of functions that will allow you to easily transform your data and split it into training and test sets — a critical part of machine learning. Finally, the library has many tools that can evaluate the performance of your machine learning models and allow you to choose the best for your data.

You’ll want to make sure you know how to effectively use the library if you want to learn machine learning in Python.

A Gentle Introduction to Scikit-Learn

This post introduces a lot of the history and context of the scikit-learn library and gives you a list of resources and documentation you can pursue to further your learning and practice with this library.

Scikit-Learn Documentation

The official scikit-learn documentation is filled with resources and quick start guides that will help you get started with Scikit-Learn and which will help you entrench your learning.

Regression

Regression involves a breakdown of how much movement in a trend can be explained by certain variables. You can think about it as plotting a Y or dependent variable versus a slew of X or explanatory variables and determining how much of the movement in Y is dependent on individual factors of X, and how much is due to statistical noise.

There are two main types of regression that we’re going to talk about here: linear regression and logistic regression. Linear regression measures how much of the variability in a dependent factor is explained by an explanatory factor: you might, for example, find out that poverty levels explain 40% of the variability in the crime rate. Logistic regression transforms a weighted combination of the explanatory factors into a probability between 0 and 1, which is then turned into a binary outcome. In that way, you might classify whether a name is most likely to be male or female. Instead of a continuous value, logistic regression produces categories.

You’ll want to study both types of regression so you can get the results you need.
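Here is a small, dependency-free sketch of both ideas, written for illustration rather than taken from the tutorials below. The data points are made up, the function names are my own, and the logistic weight and bias are hand-picked rather than fitted.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variability in y that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def sigmoid(z):
    """Logistic function: squashes any number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))


def classify(x, weight, bias, threshold=0.5):
    """Turn the logistic probability into a 0/1 category."""
    return 1 if sigmoid(weight * x + bias) >= threshold else 0


# Hypothetical toy data, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
score = r_squared(xs, ys, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(score, 2))  # 0.6 2.2 0.6

# Hand-picked (not fitted) weight and bias, just to show the thresholding step.
print(classify(2.0, weight=1.5, bias=-2.0))  # 1
print(classify(0.5, weight=1.5, bias=-2.0))  # 0
```

An R-squared of 0.6 reads the same way as the crime-rate example: x explains 60% of the variability in y, and the rest is noise or other factors.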

Simple and Multiple Linear Regression in Python

This informative Medium piece goes into the theory and statistics behind linear regression, and then describes how to implement it in scikit-learn.

Building a Logistic Regression in Python, Step-by-Step

This Medium tutorial uses the scikit-learn tools available to implement a logistic regression model. The amount of detail in each step will help you follow along.

Clustering

Another type of machine learning model is called clustering. This is where datasets are grouped into different categories of data points based on the proximity between one point and other groups of points. Mastering clustering is an important part of learning machine learning in Python. 
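One common way to group points by proximity is K-means. Below is a minimal, dependency-free sketch of my own with made-up 2D points and hand-picked starting centroids; in practice you would more likely reach for scikit-learn's `KMeans`.

```python
def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move centroids to cluster means."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            # Squared Euclidean distance from this point to every centroid.
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)

        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place

        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 9)])
print(clusters[0])  # [(1, 1), (1, 2), (2, 1)]
print(clusters[1])  # [(8, 8), (9, 8), (8, 9)]
```

Notice that no labels were supplied: the groups come purely from how close the points are to each other.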

An Introduction to Clustering and different methods of clustering

Analytics Vidhya has presented this comprehensive introduction to clustering methods: it’s good to get a handle on this theory before you try implementing it in code.

Customer Segmentation using Python

This article from Yhat demonstrates how to do simple K-means clustering across different wine customers. It’ll take your learning in Pandas and Scikit-Learn and combine them into a useful clustering example.

Deep Learning/Neural Networks

Neural networks are an attempt to simulate how the human mind works (on a very simplified level) in computational code. They have been a great advance in artificial intelligence, and while in some ways they are a black box of complex algorithms working in tandem to learn how data generalizes, their practical applications have multiplied rapidly in the last few years. Deep learning encompasses neural networks as well as other approaches meant to simulate human intelligence. Both are important to study if you want to learn machine learning in Python.

In a huge breakthrough, Google’s AI beats a top player at the game of Go

This short Wired article isn’t a technical tutorial: it recounts an epic match against a human grandmaster at Go, a game considered so complex that the technology to beat top players wasn’t expected to arrive until around the 2030s. By leveraging the power of neural networks, Google was able to bring AI victory forward some two decades or so. This article should give you a great glimpse at the potential and power of neural networks.

A Beginner’s Guide to Neural Networks in Python and SciKit Learn 0.18

This example-laden tutorial uses the neural networks module in the Scikit-Learn library to build a simple neural network that can classify different types of wine. Follow along and play with the code so you can get a feel for how to build neural networks.

Develop Your First Neural Network in Python With Keras Step-By-Step

This tutorial from Machine Learning Mastery uses the Python implementation of the Keras library to build slightly more powerful and intricate neural networks. Keras is a code library built to optimize for speed when experimenting with different deep learning models.

Big Data

Big data involves a lot of volume and velocity of data. It’s an amount of data, measured in petabytes, that can’t be processed easily with tools like Pandas, which are based on the processing power of one laptop or computer.

You’ll want to scale out to controlling many processors and servers, passing data through a network to process it at scale. Tools such as Spark and Hadoop, which let you map and reduce data across multiple servers, play an important role here. It’s time to take the learning you’ve had before this and apply it to massive data sets! You can’t learn machine learning in Python without dealing with big data.
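
The map and reduce idea can be sketched in plain Python on a single machine. This is only an illustration: the chunks are made-up text, the function names (`map_chunk`, `reduce_counts`) are hypothetical, and in a real Spark or Hadoop job each chunk would live on a different server.

```python
from collections import Counter
from functools import reduce

def map_chunk(lines):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right

# Each inner list stands in for the slice of data held by one worker.
chunks = [
    ["the quick brown fox", "the lazy dog"],
    ["the dog barks", "quick quick"],
]

partials = [map_chunk(chunk) for chunk in chunks]
totals = reduce(reduce_counts, partials, Counter())
print(totals.most_common(3))
```

Because each map step only looks at its own chunk, the work can be spread across as many machines as there are chunks, and only the small partial counts need to travel over the network.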

Get Started With Pyspark and Jupyter Notebook in 3 Minutes

This blog post will help you get set up with PySpark, a Python library that brings the full power of Spark to you in the Jupyter Notebook format you’ve been used to working in. PySpark can be used to process large datasets that can go all the way to petabytes of data!

PySpark Video Tutorial

This video tutorial will help you get more context about PySpark and will provide sample code for tasks such as doing word counts over a large collection of documents.

Using Jupyter on Apache Spark: Step-by-Step with a Terabyte of Reddit Data

This tutorial from Insight goes a little further than installation instructions and gets you working with Spark on a terabyte (that’s 1024 gigabytes!) of Reddit comment data.

Machine Learning Evaluation

Now that you’ve learned a baseline for all of the theory and code you need to learn machine learning in practice, it’s time to learn what metrics and approaches you can use to evaluate your machine learning models.  
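
As a small preview of what such metrics look like, here is a sketch in plain Python that computes accuracy, precision, recall and F1 for a binary classifier. The labels and predictions are made up, and the function name `evaluate` is hypothetical. Scikit-Learn provides ready-made versions of these metrics, which the tutorials below cover.

```python
def evaluate(y_true, y_pred):
    """Basic binary classification metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 1 is the positive class, 0 the negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead on imbalanced data, which is why precision and recall are usually reported alongside it.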

Metrics to Evaluate Machine Learning Algorithms in Python

In this tutorial, you’ll learn about the different metrics used to evaluate the performance of different machine learning approaches. You’ll be able to implement them in Scikit-Learn and Jupyter right away!

Model evaluation, model selection, and algorithm selection in machine learning

This long six-part series (check the end of this blog post for more posts after) goes deep into the theory and math behind machine learning evaluation metrics. You’ll come out of the whole thing with a deeper knowledge of how to measure machine learning models and compare them against one another.

Suggested daily routine

Learning isn’t often a static thing. You need ongoing practice to master a skill. Here’s a suggested learning routine you can implement in your day to make sure you practice and expand your knowledge and learn machine learning in Python.

Here’s my suggested daily routine:

  1. Continue working on something in machine learning at all times
  2. Go to StackOverflow, ask and answer questions
  3. Read the latest machine learning papers, try to understand them
  4. Practice your code whenever you can by looking through Github machine learning repositories
  5. Do Kaggle competitions so you can extend your learning and practice new machine learning concepts

At the end, you’ll have effectively mastered how to learn machine learning in Python!

Want more material like this? Check out my guide on how to get a programming job without a degree.

Learning Lists

99+ Places To Get Free Courses

Here is a list of places where you can find collections of free courses. It’s important to start learning with different resources, and to get different topics under your wing – and this is a list of resources that can help you get there.

General collections of free courses

Consult this section of the list for course providers and universities that have decided to release some of their catalog for free, and which include a variety of free courses in different subjects.

  • Coursera has a list of about 3,000 free courses in a variety of subjects ready for you to take.
  • Stanford Online offers free courses in a variety of topics, from health & medicine, education, to engineering.
  • Not to be outdone, Harvard has a selection of free courses on offer, about 150, including sections from the famous CS50 — which has offered a free introductory computer science education for countless individuals.
  • OpenLearn is an initiative of the Open University, which offers remote distance learning for free, with courses ranging from money and business to languages.
  • Springboard is an online bootcamp that specializes in teaching digital skills. The platform also offers a variety of free resources, from e-books to curated learning paths that help you tackle and organize your learning in a variety of subjects, from business to mobile development.
  • Khan Academy offers free courses in many subjects, from mathematics to history. The courses are offered on the Khan Academy platform, which comes with everything from AI tools that help with learning to a gamified system that allows you to earn badges as you learn.
  • Learn with Google is a specific initiative within Google that focuses on helping people take free online courses in everything from the fundamentals of digital marketing to the basics of code. There is a selection of courses in the catalog that are free.
  • Udemy has a section dedicated to all free courses offered on their platform.
  • FutureLearn is another company that offers free offerings from their catalogue, everything from IT to medicine.
  • MITOpenCourseware offers a selection of free material from MIT courses. There are free lecture notes, exams and videos for you to look through on a lot of the active course catalog within MIT itself.
  • Open Yale Courses does the same thing as other Ivy League universities in offering a selection of introductory level courses to take for free.
  • The UCLA Extension School offers a variety of free courses from the UCLA catalog.
  • Saylor Academy offers a variety of courses that can be used to create a for-credit pathway that is entirely free. In that respect, it goes a step beyond some of the university offerings here by allowing people to get not just the course material, but actual class credit.
  • Mooc.org is an initiative from edX that allows access to 3,000+ massively online courses.
  • Openculture offers a list of university courses that are for free from a wide variety of universities, and includes a listing of resources like the above.
  • Learning platform Skillshare offers a variety of free courses from their platform creators.
  • Simplilearn is another platform that offers free courses for a variety of topics. Most of these are focused on cutting-edge digital skills like data analysis and cybersecurity.
  • Lighthouse Labs is a bootcamp that teaches cutting-edge digital skills: there’s an online catalog of introductory tech courses.
  • IBM offers different training options, including a variety of free ones.
  • Deakin is another university platform for free courses that boasts of a community of 60,000+ learners.
  • Oxford Home Study is a platform focused on free UK-based courses. Though the courses are free, they require a small fee for a certificate of completion.
  • Upgrad is a site that offers courses in emerging technology skills. A selection of their catalog is available for free.
  • LinkedIn Learning has video courses on pretty much any topic in the world. There is a free one-month trial so you can try it as a series of free courses, but you’ll have to pay for it later.
  • Kadenze is a platform that offers certification and material from free online courses spread from university material.
  • Brigham Young University offers a selection of their courses for free in this catalog. They’re not for credit, but they cover a vast array of subjects from astronomy to interior design.
  • LINCS has a section filled with free courses in nine topics. This is an initiative of the United States Department of Education.
  • University of Washington has a website that shows the free or discounted courses it has on offer through either edX or Coursera.
  • East Sussex College is another university offering free courses, but there’s a couple of twists. The first is that the college is UK-based unlike the US-based ones listed previously. The second is that these courses are fully funded for those 19 and over and eligible to study in the UK – the courses are also fully accredited, and completers get a certification.
  • Hugh Baird College also offers free courses, but you must be based in Liverpool.
  • Pearson is an edtech platform and company that offers an extensive amount of services in edtech. One of these is a selection of free courses for adult learners.
  • British Columbia Institute of Technology offers free courses in vocational and technical fields. The institute is based in British Columbia in Canada, and was established in 1964.

Specific-topic free course providers

In this section, we have providers that are more niche: rather than general platforms with free courses on many topics, these course providers tend to focus on one area.

  • freeCodeCamp offers a free course to teach you how to code and learn the fundamentals of many coding technologies, from web development, to basic data science and analysis. The blog is filled with useful resources as well.
  • Code.org gives you the option of doing one hour of training in code with a variety of free tutorials.
  • MakeCode is a Microsoft initiative to help teach free coding courses, including an option to advance to JavaScript and Python, within Minecraft.
  • Programiz is ready to help you train up in some of the most popular and useful programming languages of today with free courses.
  • Sabio is a coding bootcamp that offers its prep course for free.
  • reedsy offers a selection of 50+ free courses for writers that allows you to refine your ability to write books.
  • Nonprofitready offers a series of free courses on non-profit skill development, from fundraising to grant writing.
  • Webflow, a UI/UX tool, offers a university of sorts with free courses and tutorials for design.
  • Canva is a tool that lets you easily create your own graphic design templates and assets. They offer a school that allows you to master graphic design elements as well as the tool itself, with all of the courses being free.
  • Baseline offers a free UI/UX design bootcamp which features a free Slack community, and lessons in everything from an introduction to design, to digital product design.
  • Figma Crash Course is a resource of free Youtube tutorials put together to give you a powerful deep dive into one of the most popular design tools out there.
  • DesignBetter is an initiative by UI/UX design prototyping tool Invision: it offers free handbooks that dig deep into design topics.
  • Awwwards Academy is an educational resource created by the well-known Awwwards site. It offers interactive sessions that have been recorded.
  • Envato Tuts+ has an outline for self-study for graphic design that will help organize your learning here.
  • CreativeLive is a platform that offers free courses on graphic design and other design skills. The full courses sometimes have to be paid for, but there are free trials.
  • Domestika offers free courses for creatives, everything from art styles to photography.
  • MongoDB offers a university for teaching MongoDB skills which will allow you to learn the skills needed to navigate modern non-SQL databases for web applications.
  • SAP offers free courses for preparation for SAP certifications that will help with cloud computing and architecture work.
  • Chessable helps you develop your chess skills for free, with video courses that help explain openings, midgame, and endgame patterns as well as common tactics.
  • Lectera has free courses that span the gamut from business to public speaking.
  • Shopify, the leading online platform for e-commerce, offers courses from experts on how to start your own digital business.
  • Meta has a learning section and courses for people trying to learn how to run Facebook ads and more. Note that you need to be logged into your Meta account.
  • Firstaidforfree has free online courses with no hidden fees that revolve around first aid. For example, you can learn CPR online for free on the platform.
  • Berklee Online offers a variety of free practical music courses, everything from theory to how to do marketing with music.
  • EC Council offers a 3-part introductory cybersecurity course series for educators and beginners in the field.
  • Hubspot, a marketing and sales tool, offers free courses in both topics.
  • Hootsuite gives you social media certification and courses. The platform training is free, but there is a $99 one-time fee for certification.
  • The Corporate Finance Institute offers a selection of free finance courses, from building pitch decks to financial analysis. There are about 40 free courses if you click on the free checkbox in the filtering options.
  • Palo Alto Networks offers a variety of free cybersecurity courses to sharpen your skills, from the fundamentals of network security to the basics behind cloud security.
  • CBTNuggets has some IT courses and trainings available for free after a 7-day trial – and there’s continued support for 53 hours of free training even if you cancel past the free trial.
  • History U offers high school students free courses on American history from leading university lecturers. Critical moments from the Civil War to the American Revolution are covered: leading American thinkers such as Frederick Douglass are profiled in detail.
  • The Institute of Historical Research is a United Kingdom-based organization that offers free online courses on the tools behind history, from digital preservation, to using digital tools and techniques to advance historical research.
  • MadameNoire offers a list of five free courses focused specifically on black history.
  • Duolingo is an application that is the world’s leading way to learn languages. You can learn almost every language in the world with free interactive courses that allow you to practice the languages of your choice on your phone.
  • USALearns has a bunch of free courses that teach English – and can help learners understand the path to US citizenship as well.
  • ESOL Courses has a catalog of free courses on English for young learners, and those looking to learn English as a second language.
  • BBC Languages has free courses and tutorials on 40+ languages. While the site is not updated anymore, it remains active with links to resources and courses.
  • LanguageGuide offers interactive courses with sound exercises in 25 languages.
  • Nike Training is ready to offer free training and courses when it comes to fitness.
  • Do Yoga With Me helps you with the training you need to become a yoga instructor, with 200-hour yoga teaching certification.
  • Github Free Courses Repository is a Github page over an underlying Github repository that curates free software engineering courses, from web development frameworks, to mobile development.
  • Dash is built by bootcamp General Assembly to learn software engineering through interactive exercises and building projects like a sample website.
  • Rithm School is a bootcamp that focuses on software development and which offers free coding courses.
  • Bento curates free coding tutorials so you can access a bunch of learning materials at once.
  • Microsoft Learn has free learning paths for people looking to learn software engineering topics.
  • CodeGym offers a free set of software engineering courses but mostly focused on the Java programming language.
  • Joy of WP gives you the opportunity to build your own WordPress site with free videos.
  • Santa Clara University offers free business courses that can come with a certificate from the university. The courses are about how to start and expand your own business.
  • Harvard Business School has previews of its learning experience through free e-books and sample lessons.
  • The Small Business Administration, a federal agency of the United States, offers free trainings in a variety of business topics.
  • SERanking has a free video course by Joe Williams on how to rank content for SEO.
  • Quintly has this free course on social media analytics that covers how to find KPIs and create automated reporting.
  • Yoast is an incredibly popular WordPress plugin that helps you with SEO optimization. They offer free SEO training through their academy.
  • Neil Patel, an online digital marketing expert, offers this course on email marketing.
  • Wordstream is an online tool for digital marketers looking to optimize their paid ads. Their academy is a series of blog posts that, taken together, form a free course of sorts on many advanced digital marketing concepts.
  • SEMRush is an SEO and Google Adwords tool. This free course dives into the basics of pay-per-click bidding from an expert.
  • DataCamp is a data science focused education company. Understanding Data Science is a free introductory course for people who want to get started.
  • Kaggle is a platform that offers data science competitions. Its learning section has interactive free courses that will help you upskill as a data scientist with basics like Python and Pandas knowledge.
  • Fast.ai is a series of educational resources including free courses that aim to make “neural nets uncool again”. Unlike the other data science resources here, Fast.ai is really focused on helping you learn and understand the basics and advanced topics required to train your own neural nets.
  • Infosys Springboard is an initiative by Infosys to give free courses and training on cutting-edge tech topics like the basics of data science and data analytics.
  • RSquared Academy has free courses in R — the programming language that was the default for data analysis and data visualization, and is still a strong component of the data ecosystem.
  • Stat 101 brings Harvard’s introductory statistics course for free on Youtube.
  • Harvard University also offers free law courses, separate from its other offerings.
  • LA Law Library has a selection of free courses based on law.
  • The National Center for State Courts offers free evidence-based courses to both court professionals as well as members of the general public.
  • The AFP News Agency offers a free course in journalism, specifically how to use digital means to investigate and track down stories.
  • The Thriving Corals Project has a free course in marine biology – helping people to learn more about the oceans.
  • Jobberman has a curriculum structured around soft skills in the modern workplace.
  • Clever Girl Finance offers a catalog of free courses in personal finance – a critical topic.
Uncategorized

Introduction to RAPIDS and GPU Data Science: CUDF/Dask vs. Pandas

RAPIDS is the new framework for distributed data science and machine learning provided by NVIDIA. You can use software optimized to do distributed work over GPU hardware rather than just standard CPU cores.

This provides a lot more computational speedup for machine learning training and tasks, with many people reporting speedups on the order of 10x to 100x over large datasets and common machine learning tasks.

RAPIDS is actually a set of APIs in both Python and C++ to implement common machine learning tasks on GPUs instead of CPUs. It integrates with CUDA, which is an NVIDIA framework for parallel computing.

The purpose of this article is to build something like the Pandas Cookbook together for RAPIDS. I want to make it easy and intuitive to go from the Pandas and CPU ecosystem to taking advantage of GPUs and the increased computational power they can deliver.  

GPUs vs CPUs

GPUs are an odd product of the need for humans to game. Gaming is a computationally expensive activity. Tons of memory and processing needs to happen behind the scenes to simulate nearly real universes for gamers. 

This typically means that GPUs work with many cores (hundreds or even thousands) that can perform simultaneous and parallel processing, while CPUs are focused on a few threads with sequential calculations. While each individual GPU thread may be slower than a CPU thread, taken together on many shallow calculations, GPUs can vastly outperform CPUs by working on them all at once. There is some overhead to this, but on sufficiently large datasets or data pipelines, the differences lead to large speedups.

Getting Access To GPUs or TPUs

Getting access to GPUs and TPUs can be quite difficult. TPUs are Tensor Processing Units, accelerators designed by Google for machine learning workloads. For GPUs, the options are to work with the cloud or to build your own GPU machine.

On the cloud, there are several services that offer free GPU cloud hours, most prominently Kaggle and Google Colaboratory.

Google Colab offers the ability to use GPUs for free. However, the GPUs are randomly allocated and it’s hard to get good ones. The free one also cuts off after 12 hours. 

The pro version of Google Colab, at $9.99 a month, is available in the United States and offers premium availability for Nvidia GPUs. It also offers more uptime (up to 24 hours) and more lax restrictions when it comes to idle times. 

You can also create your own deep learning hardware. This tutorial shows you how to do it in under $1000, though it’ll require some setup and some patience on your side — though at the end, you’ll have a machine that will save you some variable cost. In practice, you’ll only want to do this if you’re serious about machine learning use cases, and using as much of the fixed cost compute as you can. 

Of course, if you use AWS, Microsoft Azure, or Google Cloud solutions, you can pay for GPU access on those platforms, though that may end up costing a lot.  

For the purposes of this playbook, we’re going to start using Kaggle, which comes with access to both TPUs and GPUs as accelerators, though you’ll need to verify a phone number to get access to that. Once you do, you can set up GPUs then install RAPIDS through the handy dataset

Then you can import datasets from Kaggle datasets and you’re off and running. However, Kaggle has a 41 hour weekly quota on GPU usage — which means that it’s ideal for short experiments and learning examples.

The Kaggle instance will pause every 40 minutes or so. The best practice would be to pause the instance and turn it off when you’re not using it.

The above is a screenshot of my GPU usage. The usage will reset every week. You also won’t get access to the latest NVIDIA architecture, but it will be free. There’s an easy-to-access dashboard on your GPU and CPU usage as well. 

If you want to get started right away with powerful infrastructure, BlazingSQL offers a free hosted Jupyter environment with the latest version of the RAPIDS stack pre-installed and a bit of GPU memory to play with. They’re also offering beta access to clusters of cloud GPUs.

RAPIDS will be pre-installed, but it’ll be harder to get intuitive access to different datasets as you might on Kaggle, so we’ll stick with Kaggle for now for the importing data part. But BlazingSQL can be used in practice, especially since Kaggle’s data and compute limits are set more towards learning rather than production.

cuDF’s role in RAPIDS

cuDF is meant to be the data manipulation layer of RAPIDS, allowing for the rapid manipulation of dataframes over GPUs. In the documentation, it describes cuDF as being useful for loading, joining, aggregating and filtering data.

You can think of it as a Pandas equivalent within RAPIDS, and in fact many of the functions from cuDF map pretty closely to their Pandas equivalents.

In practice, when you’re dealing with data or wrangling data, you’re likely going to have to deal with cuDF if you want to work on RAPIDS on a GPU.

Dask-cuDF vs. cuDF vs. Pandas

Dask is a parallel processing library that slices up Pandas dataframes on CPUs. Combined with cuDF as Dask-cuDF, it chunks dataframes across multiple GPUs. This documentation from the RAPIDS team summarizes the difference and goes into detailed documentation of the different functions possible, from your standard filtering and value_counts() to more complex groupbys and aggregations.

The syntax here is very similar to Pandas — in practice, you’ll see using cuDF and Dask-cuDF as very similar experiences to the Pandas API, just with slightly less function completeness. 

It’s of course important to also note when it’s best to use each framework: 

  • Dask-cuDF for when you have very large datasets that you need multiple GPUs to train on and you have more memory in your dataset than the GPU can handle 
  • cuDF for when you have a large dataset that can be trained and wrangled on a single GPU and when you maybe don’t have access to multiple GPUs, such as our example on Kaggle, where you only have access to one GPU
  • Pandas for a small enough dataset that can fit and be trained on CPU only. In practice, for most standard setups, unless you have a particularly strong computer with a GPU installed, Pandas will be “good enough” for now, especially with smaller datasets.

Dask-cuDF, cuDF Import/Export Data With Pandas and CSV

It’s relatively simple to go from different datatypes into the three frameworks, and pass dataframes from framework to framework. Let’s discuss how to transfer between the different frameworks.

It’s quite simple to go from a Pandas dataframe to a cuDF dataframe: it’s a one-line command. In this case, we take our predefined dataframe (seattlelibrarydf) of the Seattle Library inventory and convert it into a cuDF dataframe with many of the same properties (seattlelibrarycudf). 

This simple function helps turn a Pandas Dataframe into the cuDF equivalent. 
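
A minimal sketch of that one-line conversion, assuming cuDF’s `cudf.DataFrame.from_pandas` constructor and its `to_pandas()` method for the return trip. The tiny dataframe below is a made-up stand-in for the real inventory, and the `ItemCollection` column name is an assumption. The `try`/`except` is only there so the sketch also runs on a machine without a GPU.

```python
import pandas as pd

try:
    import cudf  # needs an NVIDIA GPU and a RAPIDS install
except ImportError:
    cudf = None  # lets the sketch run on a CPU-only machine

# Made-up stand-in for the Seattle Library inventory dataframe.
seattlelibrarydf = pd.DataFrame(
    {"BibNumber": [101, 102, 103], "ItemCollection": ["nanf", "cafic", "nanf"]}
)

if cudf is not None:
    # The one-line conversion: copy the Pandas dataframe into GPU memory.
    seattlelibrarycudf = cudf.DataFrame.from_pandas(seattlelibrarydf)
    # And back again, from GPU memory to a regular Pandas dataframe.
    roundtrip = seattlelibrarycudf.to_pandas()
else:
    roundtrip = seattlelibrarydf.copy()

print(roundtrip)
```

Converting back with `to_pandas()` is handy for small results you want to plot or pass to CPU-only libraries.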

Common Functions

It’s time to put cuDF to the test and actually get working on a large dataset. In the case of Kaggle, we’re going to work with the Seattle Public Library dataset, a large collection of CSV files that tabulate the inventory of the Seattle Public Library as well as a set of CSVs that describe the yearly checkout patterns. Specifically, we’re going to join together the yearly checkout data and then do analysis on the inventory.

Let’s now look at a common function, the ubiquitous value_counts in Pandas. This takes a column and returns an aggregated count of cell values. In this case, we’ll do it on the different collection codes we can later join onto descriptions of the categories.

Note here that the syntax for the function in both Pandas and cuDF is essentially the same — but by using cuDF on an average-power GPU and a slightly larger than 1 GB dataset, with more than 2.5 million records, we achieve about a 10x speedup even in pre-processing the data, from 679 milliseconds to 78.9 milliseconds. 
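
Here is a small sketch of how the same call looks in both libraries. The collection codes are made up, the `ItemCollection` column name is an assumption, and the alias `xdf` is a hypothetical convenience: it points at cuDF when RAPIDS is installed and falls back to Pandas otherwise, so the identical line runs either way.

```python
import pandas as pd

try:
    import cudf as xdf  # GPU dataframes; needs RAPIDS and an NVIDIA GPU
except ImportError:
    xdf = pd            # CPU fallback so the sketch still runs

# Made-up collection codes standing in for the inventory column.
inventory = xdf.DataFrame(
    {"ItemCollection": ["nanf", "cafic", "nanf", "nadvd", "nanf", "cafic"]}
)

# Same method name and call shape in Pandas and cuDF.
collection_counts = inventory["ItemCollection"].value_counts()
print(collection_counts)
```

In a notebook, prefixing the `value_counts()` line with the IPython `%timeit` magic is one way to compare timings between the two libraries on your own data.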

Groupby/Aggregations

Now let’s get to the meat of the dataset and join together different datasets that form the yearly check-ins per each item. This is not a trivial exercise. By the end, we’ll have combined together a large dataset of multiple GB (about 7 gb, or slightly under 50% of the GPU memory allocation Kaggle gives us) with about 90 million rows. 

The read_csv function of cudf will also have some issues, specifically with the datatypes you need to define and validate. It will sometimes take columns and mix up datatypes, meaning you have to set them manually with the dtype variable.

However, cuDF is decently finicky about how you do this. So far, I’ve found that ‘int64’, ‘timestamp’, and ‘str’ (and it’s important that they be passed as strings) work, unlike the numpy variants suggested in the basic documentation. You can track the progress on this open Github issue.
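
A hedged sketch of setting datatypes by hand through the `dtype` argument of `read_csv`, with each type passed as a string. The sample rows, file name and the `ItemType` column are made up. The date column is read as a plain string here because the Pandas fallback used on machines without a GPU does not accept a ‘timestamp’ dtype string; that choice is an assumption of this sketch, not a recommendation from the cuDF documentation.

```python
import os
import tempfile

import pandas as pd

try:
    import cudf as xdf  # GPU dataframes; needs RAPIDS and an NVIDIA GPU
except ImportError:
    xdf = pd            # CPU fallback so the sketch still runs

# Two made-up checkout rows written to a temporary CSV file.
csv_text = (
    "BibNumber,ItemType,CheckoutDateTime\n"
    "101,acbk,01/02/2017 10:15:00 AM\n"
    "102,acdvd,01/02/2017 11:30:00 AM\n"
)

# Datatypes passed as strings, one per column.
column_types = {"BibNumber": "int64", "ItemType": "str", "CheckoutDateTime": "str"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkouts_sample.csv")
    with open(path, "w") as handle:
        handle.write(csv_text)
    checkouts = xdf.read_csv(path, dtype=column_types)

print(checkouts.dtypes)
```

Declaring the types up front avoids the mixed-type guesses described above, at the cost of having to know the columns in advance.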

Let’s now do some data wrangling and joins. We want to see what ten items are most frequently checked out in the dataset. We can do this really quickly by slicing a value_counts() method call just like you might do in Pandas.  We’ll do this on the BibNumber column which serves as a primary key that unites both checkout data and the underlying information about each inventory item. 

We get a bunch of item numbers. But what are the actual items here? Who are the authors? Can we say anything about these items beyond their key numbers?

To find that insight, we’ll have to perform a join of both the inventory information and the aggregated checkout data — and we’ll have to clean up the dates and times represented in the last CheckoutDateTime column.  This is something we may cover in another tutorial. 
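
Leaving the date cleanup aside, the aggregate-then-join step could look roughly like the sketch below. All data is made up, the `Title` and `CheckoutYear` columns are assumptions, and the alias `xdf` is a hypothetical convenience that points at cuDF when available and at Pandas otherwise. It uses a groupby count rather than `value_counts()` so that `BibNumber` stays a regular column for the join.

```python
import pandas as pd

try:
    import cudf as xdf  # GPU dataframes; needs RAPIDS and an NVIDIA GPU
except ImportError:
    xdf = pd            # CPU fallback so the sketch still runs

# Made-up stand-ins for the checkout records and the inventory table.
checkouts = xdf.DataFrame(
    {
        "BibNumber": [101, 102, 101, 103, 101, 102],
        "CheckoutYear": [2017, 2017, 2018, 2018, 2019, 2019],
    }
)
inventory = xdf.DataFrame(
    {"BibNumber": [101, 102, 103], "Title": ["Book A", "Film B", "Book C"]}
)

# Aggregate: number of checkouts per item, keeping BibNumber as a column.
counts = (
    checkouts.groupby("BibNumber")
    .agg({"CheckoutYear": "count"})
    .reset_index()
    .rename(columns={"CheckoutYear": "Checkouts"})
)

# Keep the two most checked-out items, then join on the shared key.
top_items = counts.sort_values("Checkouts", ascending=False).head(2)
joined = top_items.merge(inventory, on="BibNumber", how="left")
joined = joined.sort_values("Checkouts", ascending=False)

# Bring the small result back to Pandas for printing if it lives on the GPU.
result = joined.to_pandas() if hasattr(joined, "to_pandas") else joined
print(result)
```

Sorting again after the merge is deliberate: the join is not guaranteed to preserve row order on every backend.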

For now, hopefully, you’ve learned enough to get set up on RAPIDS and cuDF and why you might want to use it.

Learning Lists, Resources Lists

99+ Places To Find Remote Coding Jobs

COVID-19 has turned the world around in a short period of time. In a span of a few months, the global economy has fundamentally shifted. There’s now a premium on jobs that can be done remotely as people around the world are instructed to shelter in place. Here are 99+ places you can find remote coding jobs, organized by category. 

Let’s start out with some spreadsheets and websites forwarded by some people at Springboard, specifically Siya. Each one of these links is worth looking at.

Candor

A website that tracks which companies are and aren’t hiring during COVID-19.

• Floodgate Capital/Unshackled/Awesome People Ventures spreadsheet of companies hiring.

• NEA’s start-ups actively hiring list.

Breakout List

1575 Remote Jobs From 100+ Companies Hiring Remotely in February 2020

30 Co’s Still Hiring

Torch Capital

Summer internships for students

Rock Health Job Board

Remote Developer Job Boards With Remote Coding Jobs

These job boards are focused on remote developer jobs in general, without a particular speciality. Some of them are slightly more general, but have categories that make it easy to find remote developer jobs within. 

Remotely Awesome Jobs

There are hundreds of remote-friendly developer jobs added each week in this remote jobs search engine. You can sign up for daily or weekly job alerts for free. 

Remote Jobs Club

A biweekly job newsletter that offers remote jobs served directly to your inbox, with many developer remote jobs featured. Sign up and get access to new postings. 

Remotive

Remotive is a large remote jobs community, with a friendly Doge mascot. There are about 250 jobs in April 2020 that are remote-friendly and developer-based, with technology-based tags so you can search through companies that use your preferred technology stack. 

ZipRecruiter Remote Developer Jobs

ZipRecruiter’s job board offers email updates, as well as rough salary averages for the field as a whole so you know the ballpark you’re in when you’re looking at the different jobs displayed. 

Glassdoor Remote Software Engineer Jobs

Glassdoor is known more for its employee ratings of workspaces, though it also offers this job board. Though this is the Canadian version (as the author of this piece is based in Canada), there are international versions. There was a good volume of remote developer jobs posted here (300+), and you can filter through various variables like time posted. 

Workopolis Remote Software Developer Jobs

Workopolis is one of the largest job search sites in the world. Here, this particular link is focused on remote jobs based in Canada, but you can toggle back and forth to your region of choice. 

StackOverflow Remote Developer Jobs

StackOverflow is a community for Q&A for programming questions across different languages. The jobs section offers a view of different jobs tagged by the technologies required. There are remote-friendly roles along with a handy section that specifies the preferred timezone of the employer, as well as other factors like company size, and company industry. It’s a great resource for finding remote jobs. 

Remote Developer Jobs on LinkedIn

LinkedIn, the largest career-based social network in the world, offers a remote toggle option for its job boards — so you can focus on remote-friendly jobs as a developer by searching for developer jobs and then toggling by remote location. 

RemoteOK

One of the biggest remote working job boards, with jobs in other categories, like marketing, as well. There are 12,000+ listings in the remote coding jobs section.

Remote.co

A listing of remote jobs for developers on one of the largest remote work job boards in the world. Remote.co serves as an extension of FlexJobs, but unlike FlexJobs, many of the listings seem accessible for free. 

JustRemote

This job board has a few positions but they tend to be senior remote-friendly roles, which is a bit of a rarity — companies tend to like to hire individual contributors in remote roles, but not so much for more senior roles. You can use this job board to filter through for those rare coveted senior roles. 

Remote jobs are telecommute jobs here. There are about 1,000 freelance listings for software development jobs that you can search through (a lot more than its name would imply).

Remotees

Remote jobs here are sorted by tags — you can sort through the engineer tag, for example, to find about 350 remote software developer jobs. Or you can tag through to more specific technologies. 

Remoters

Another remote-focused job board. You can filter here by salary range, which can be helpful to quickly pick through job postings that fit within your personal salary aspirations.

Remote Software Developer Jobs On Indeed

Indeed is a job search aggregator and search engine, one of the largest in the world if not the largest. They also offer a category for remote software developer jobs as well, which you can consult from your region of choice. 

We Work Remotely

We Work Remotely is another awesome remote working community. Jobs cost $299 to post here, so you should expect that those featured ads have very highly motivated hiring managers who are explicitly looking for remote programmers. Though the community has jobs in other fields, there are plenty of jobs for remote coding. 

Hired

Hired is a marketplace where employers reach out to vetted candidates on the platform. While there are city-based jobs on the platform, you might get some more remote-friendly options given that the hiring managers tend to be from startups and technology companies.

Vettery

Vettery works by connecting you with hiring managers directly who might want to interview you. You’ll get interview requests with salary levels attached. Currently, it’s mostly focused on American cities, though it’s possible that with hiring manager interest, you might be able to convert opportunities into more remote work. 

SkipTheDrive

Just like the name implies, this job board is about telecommuting in general so you can skip the drive and work remotely. There’s a category for software engineering remote jobs called “software development”. There’s also a note that the desired skills include Python, Ruby, C#, and Java, though later on we’ll see there are JavaScript-specific remote job boards that can help you land remote jobs with JavaScript.

Joblist

This list of developer positions seems to be focused on European companies as well as remote jobs from the United States, which is a bit different from the usual North America focus on remote developer jobs. The application offers email updates as well. 

Outsourcely

Outsourcely gives you access to remote full-time jobs. It’s not really set up for contractor or freelance roles; it’s more a curated marketplace for people who are looking for remote full-time employees.

Jobspresso

A job board focused on remote work, with postings from top organizations like Medium and Mozilla. There seems to be a focus on the quality of job postings with this board, making it slightly different from the rest of them. 

Careerjet

Careerjet is a job search engine with the option of toggling for remote developers. You can scan through remote jobs based in some countries with filters. The site is more of an aggregator than an original repository of jobs. 

WorkingNomads

WorkingNomads curates lists of the most attractive remote working options — with most of the jobs being development-focused. 

Virtual Vocations

A remote jobs board that lets you toggle between the “levels of telecommuting” that you want in a remote job. There are also COVID-19-specific tips that might be worth consulting.

CloudPeeps

CloudPeeps is a platform for talented freelancers. While there are fewer software engineering jobs here, you might be able to find some remote work if you work on, for example, SEO or more design-oriented tasks.

Startuphire

Startuphire is focused on finding jobs within San Francisco for now, but it looks like they’re looking to expand out to other cities. While not the most remote-friendly of sites currently, it’s worth keeping an eye out for, especially as they begin to expand out to projected cities such as Berlin, London and Toronto.

FlexJobs

FlexJobs is one of the original remote work communities. Unlike the other job boards posted here, however, you need to pay a small membership fee to get access to vetted remote jobs. They’ve also provided a free webinar and free course on remote working as a part of their COVID-19 response. 

MyRemoteTeam

MyRemoteTeam is a beta platform dedicated to providing best in class full-time remote developers in the latest technology stacks, from JavaScript frameworks to machine learning. While it’s in beta mode, you can apply as a freelancer. 

Dice

Dice is one of the largest technology job boards. There are a bunch of COVID-19 specific resources on the site and a nice, curated list of remote roles in programming. 

PowerToFly

PowerToFly is a job board based on organizations that pride themselves on diversity throughout their hiring practices. They also offer video chats, including ones filled with resources on remote teams and working from home that can be useful to anybody looking to apply for remote programming jobs.  

The Muse

The Muse is one of the largest job resources sites in the world, filled with actionable and useful advice that can help job seekers of any kind. Their job board has jobs tagged as remote/flexible for the location. 

Geekwire

Geekwire has a bunch of technology and startup focused jobs listed. There aren’t as many explicitly labelled remote jobs here as there might be in other job boards, as the categories are more around the general type of work (ex: freelancer vs full time) vs. the location of it. Nevertheless, it’s a good resource to consult if you’re digging through and trying to get the most remote job listings possible. 

Mashable Jobs

Mashable is one of the most popular blogs in the world. It offers a technology section and covers up-and-coming startups. While its job board is more general, you can find remote developer jobs using some fairly tailored searches (I’ve narrowed it down so you can at least filter through the existing remote roles by keyword in the link above).

Women Who Code

Women Who Code exists to inspire women to excel in tech careers. It’s filled with technical resources and more, set up to help women of all kinds begin and accelerate their tech careers. Their jobs board has technical jobs, with some marked as being remote-friendly in the location field. 

Jobs.Crossover

Crossover is the largest remote hiring firm in the world, with a portfolio of over 100 cloud SaaS products. Their posting for a software engineering manager will give you a good idea of the kind of company they are and the skills they’re hiring for. These tend to be long-term roles; however, it seems like you can work from anywhere in them.

RemoteCircle

A job board for remote jobs, where you can toggle by category (you might scan through programmer jobs, for example) as well as type of work (ex: full-time vs. contract). There are a few hundred remote jobs for developers. 

DynamiteJobs

This remote job board has had a few thousand postings since 2017, and about 700 active ones now. There is a higher degree of curation, and there are plenty of remote resources as well, differentiating DynamiteJobs from other remote job boards by their focus on quality, and truly contributing knowledge to remote workers with resources on remote interviews, job applications and more. 

BuiltIn

Another tech-focused jobs board, with the ability to filter by developer jobs. You can also toggle to locations across the United States or the remote option. 

RemoteLeads

RemoteLeads is really interesting — it’s not a job board, but rather a curator of remote jobs sent straight to your inbox. You choose the technologies you’re interested in as well as your pace of work (full-time or part-time), then you get a selection of 500+ remote job postings sent and curated in a personalized email funnel for you. It can be a great way to sit back and let remote job opportunities come to you rather than actively hunting for them across the web. 

ITJobPros bills itself as the most popular tech jobs site. Because it emphasizes IT and its hiring managers focus on larger, more established companies, you’ll see more than the usual startup and tech selection here, with postings from large corporations all the way up to the Fortune 500. This comes at the cost of a lower observed rate of remote-friendly roles. Still, it’s worth a look, especially if you want remote work but don’t want to join the startups or emerging tech organizations that tend to support that kind of work the most.

WFHJobs

With thousands of remote work opportunities (most based in the United States) here, you can look through developer and data science jobs. The job board seems to be a remote extension of a collection of technology-specific job boards we discuss below in the context of JavaScript-related remote job boards. 

Honeypot

Honeypot is the European version of Hired for developers, giving developers the chance to apply for remote-friendly companies and positions closer to the European continent. While it may not be explicitly remote-friendly, that’s something that hiring managers may be able to offer given that the platform is meant to recruit elite talent. 

Landing.jobs

Landing.jobs also offers European-based developers an option for looking for job postings that are based around Europe, including some remote-friendly options. 

Remote-friendly programming job communities with remote coding jobs

AngelList

AngelList is one of the largest startup communities in the world. Startup founders will post their companies here in order to get financing from venture capitalists and other investors. This also creates volume for job postings for startups that tend to hire for tech roles. You can look through the postings with your own profile, and apply to hiring managers directly with one click. 

Hacker News

Hacker News is an upvote-driven message board run by Y Combinator and populated largely by startup founders. HN Hiring is a service that combs through the monthly “Who’s Hiring” posts, which surface jobs directly from hiring managers who browse Hacker News, many of whom tag remote-friendly options. It tends to be a high-quality source of remote programming work on interesting problems and businesses.

Metafilter

Metafilter is a community where anybody can post links, sort of like Hacker News, but more focused on creatives and music and more free-flowing and less startup-oriented. The jobs section includes a variety of gigs and jobs with remote coding jobs spelled out (and some jobs which are tagged as remote-only). 

Nocabins

Nocabins is focused on hiring remote experts from around the world for startups. You’ll be able to join and benefit from the remote coding jobs on tap. 

GitHub

The jobs section of GitHub, the hosting platform for Git repositories and code from around the world, is remote-friendly. Take a look, especially if you have a background in the open source space and the contributions to prove it.

F6S

F6S is a startup directory that helps startups connect with investors, similar to AngelList, but with more of a European focus. Their jobs site has some jobs marked with remote in the title, and since it’s mostly startups, you can be sure there are developer jobs, and that some will be remote-friendly even if they are not explicitly tagged as such.

Startupers

Startupers focuses on the startup community, and different jobs you can find within. As with many startups, there are many remote-friendly roles ready for you to explore. 

Product Hunt

Product Hunt is a community where people post their startup/product ideas, in order to gain upvotes. It has a high concentration of Silicon Valley elites, from product leaders to venture capitalists. You can click a toggle to focus on remote jobs only as part of the Product Hunt community’s job posts. Most will be coming from elite SF startups. 

Twitter

Twitter oftentimes hosts searches for remote workers, including freelance programming gigs. You just have to look through the hashtags to find those offers. 

Reddit

This subreddit focuses on different job offers, more like gigs than anything else. However, you’ll often find offers for remote development jobs.

Remote-friendly industry-specific job boards

Cryptocurrency

Cryptocurrency Remote Jobs

CryptoJobsList

A frequently updated email newsletter and database that highlights top cryptocurrency jobs. I’ve conducted analysis before that has shown that cryptocurrency startups tend to be more remote-friendly than even conventional startups — some of this has to do with the decentralized nature of the tech, as well as different regulatory environments around the world. Jobs in Crypto is a great place to take a look at the industry and its remote jobs. 

Remote Blockchain Jobs On CryptoJobsList

Here is the remote blockchain jobs section of Crypto Jobs List. This has been a recent focus of the site since the COVID-19 pandemic broke out.

Crypto.Jobs

Another cryptocurrency jobs board, with remote jobs tagged. There is a category search too, where you can specify that you’re looking for developer jobs.

Cryptocurrency Jobs

Cryptocurrency Jobs bills itself as the leading job board for blockchain and cryptocurrency. It also offers guides on salaries and a newsletter.  There’s an explicit callout to search for remote jobs as part of the search engine. 

Blockew

Blockew is a cryptocurrency and blockchain job board that has been featured on TechCrunch, Forbes and more. Unlike other job sites, it also marks regions that are remote-friendly within the job postings.

Mobile Development

Remote Mobile Development Jobs

Android Jobs

Android native development is the focus of this job board. There’s a remote section, though there aren’t many listings there currently (just one active when I took a look).

App Futura

Another mobile development job board, with job postings specifically focused on mobile development. You might be able to find some remote positions here, though some postings from 2018-2019 still seem to be listed.

WordPress

WordPress Remote Developer Jobs

Development on WordPress is remote-friendly: the niche has a couple of job boards to highlight remote jobs. 

WordPress Jobs

Automattic, the company behind WordPress, is remote-only, so it makes sense that WordPress development jobs have a remote bias. The official WordPress job board, which gathers WordPress jobs from around the world, reflects that bias with many remote jobs. 

WPHired

WPHired is not affiliated with the company behind WordPress, but it is a robust job board for WordPress developers. Many of the jobs are marked “Anywhere”. Some of those job postings specify a preferred timezone.

Data Science/Machine Learning

AI Jobs

A job board focused on AI/Machine Learning and data science positions that is more of an aggregator. You can sign up for job alerts in the space, and toggle for remote roles. There are plenty of those, with many elite remote-only or remote-friendly companies like Github, Invision, and the Wikimedia Foundation represented. 

AnalyticsVidhya

AnalyticsVidhya is an India-based site with a focus on data analytics and data science tutorials. Their jobs board offers you the option to search for freelance jobs as a proxy for remote jobs in those positions with companies that are based in Asia (rather than the usual North America/Europe positions advertised in other job boards). 

Data Elixir

Data Elixir is a newsletter that reaches about 30,000 data scientists and engineers. The jobs board has the ability to toggle for remote jobs, most of them data science or machine learning based. 

Kaggle

Kaggle is a community site owned by Google. Aspiring data scientists and established ones go to work on various datasets here and to try to win machine learning challenges. As a result, the jobs board section of Kaggle has a bunch of ML-specific job postings.

KDNuggets

KDNuggets is a top data science publication, with tons of tutorials on the latest data science and machine learning topics. This job board includes tons of data science jobs, including several that are remote in nature. Data science roles tend to have fewer remote options — however, KDNuggets hosts a few of those rare remote listings. 

JavaScript/Ruby on Rails/Web Development

Web Developer Remote Developer Jobs

JSRemotely

This job board is specifically focused on curating all of the remote JavaScript jobs in the world: as such, it should be your first stop if you’re looking for JavaScript remote jobs. Unlike curation sites, it costs money to post a job here — so you know that hiring managers are fairly serious about landing a hire soon. 

CSS Tricks

This job board is, as befits CSS Tricks’s theme, mostly focused on frontend web developers. The remote jobs are specified in the location tab: a quick CTRL + F, and you’ll be able to find all of the most recently posted ones. 

Javascript Job Board

This job board, unlike the others below associated with JavaScript, seems not to focus on JavaScript frameworks but the language as a whole — there are positions for a variety of JavaScript skill levels as a result. Remote jobs can be found with CTRL + F — unfortunately there is no explicit categorization. 

Vuejobs

This job board is focused on jobs for Vue.js, a lightweight JavaScript framework that helps people build interactive web applications with a minimum of hassle. Although the board focuses on Vue.js, there are other JavaScript jobs here. The category specified here is all remote jobs in JavaScript.

AngularJobs

Here is a job board dedicated to Angular.js, a JavaScript framework that was supported by Google to easily build JavaScript frontends. You can scan through remote jobs by looking at keywords in the titles. There’s a section devoted to remote jobs everywhere, as we’ve covered before, entitled wfh.us. 

FindBacon

FindBacon exists to help developers and designers find new opportunities at the click of a button. You can search for remote-tagged jobs. Most of the jobs are more design or front-end oriented, but there are quite a few to search through. It costs $99 to feature a job on this job board, a sign that the hiring manager really wants to fill that particular position.

Smashing Magazine

Smashing Magazine is known more for its UI/UX content, but there’s a smattering of really good remote developer jobs here. They tend towards front-end roles and WordPress roles, but it’s worth a look: there are plenty of remote developer roles here to canvass should you want to.

RORJobs

Moving on to another popular web development framework, this job board focuses on Rails jobs, many of which are remote and tagged as such in the title. A quick CTRL + F will help unearth remote coding jobs if you’re in the Ruby ecosystem rather than the JavaScript one.

Ruby Now

The original Ruby job board, around since 2005, offers you even more remote Ruby/Rails jobs. You’ll be able to find remote jobs by looking through tags on the title. 

Codepen.io

Codepen is a really interesting HTML/CSS/JavaScript sandbox which is more focused on front-end development, and elite skills at implementing CSS and JS to create simple, yet elegant interfaces. The jobs that are marked remote pop out here — most of the job board is focused on front-end web development jobs, but it’s a good job board to consult if you’re looking for remote web development work in general. 

Game Development

Game Development Remote Developer Jobs

Reddit/r/gameDevClassifieds

A subreddit for game developer jobs — a possible place to stake out remote jobs in the space. If the original poster is also a Reddit user and the hiring manager, you might be able to scope it out more directly by talking to them before applying. 

GameJobHunter

A job board focused on development jobs within video game companies. While there is no explicit remote category here, you can search for remote jobs with keyword-based search.

IndieDB

A job board focused on indie game studios. There is a preponderance of remote roles here, with anywhere/remote being the norm. It’s a great resource if you’re looking to do remote software engineering work and you want to help build the next indie video game hit.

Gamedevjobs.io

This job board focuses on game development jobs. You can allow it to know your geolocation information to source job opportunities in the space near to you, or you can click through the remote tag to find opportunities in the space that are purely remote. 

Remote freelancer platforms for remote coding jobs

You can also find remote freelancer jobs for developers on platforms where you can bid for freelance jobs and where there are jobs posted for employers explicitly looking for remote freelancers. 

Upwork

Upwork is one of the largest remote freelancer job platforms, if not the largest. You can bid for open jobs and build a profile there to attract work. Use the platform to line up shorter gigs that can tide you over on cash flow as you look for full-time remote work.

Freelancer

Freelancer serves a very similar function to Upwork: you might want to build profiles for both sites in order to have the fullest access to remote freelance jobs. 

Toptal

Toptal is a marketplace for experts curated by the Toptal team. You have to apply to work with them, and Toptal takes a cut of any project, but you can earn a higher rate than you might on other platforms. It’s a place where you can find premium remote coding jobs.

Fiverr

Fiverr started as a platform where you could get services for $5. Now, with add-ons, it has become a place where freelancers can advertise their wares.

Gigster

Gigster is a platform for hiring exceptional remote teams. You can join the Gigster talent network in order to start working on projects like this. They’re looking for software engineers, project managers, and UX designers to stay on top of their client needs. 

Gun.io

Gun.io promises exceptional remote engineers. It promises to abstract away all of the problems of working remotely for exceptional independent software engineers, from billing and contract details, to finding and landing good contracts in the first place. 

PeoplePerHour

A more curated version of Upwork or Freelancer where you can pay for remote experts per hour. You can look for remote coding jobs here as a result.

Guru

Guru has a platform of about 500,000 software engineer freelancers: join them and find remote coding jobs through the platform.

Career Paths and Job Reports

How To Be A Data Scientist: The Comprehensive Guide

What is Data Science?

Data scientist roles are often among the most highly paid and rewarding jobs out there. Glassdoor has ranked data scientist as the #1 most satisfying job in the United States. With the explosive growth of unstructured data, there has never been a greater need for data scientists.

This has prompted a wave of questioning about how to be a data scientist, with upwards of 600 people a month searching for that on Google.

IBM predicts demand for data scientists will increase 28% by 2020. Data science roles are among LinkedIn’s fastest-growing emerging fields, with about 650% growth since 2012.

Data science combines statistical knowledge, programming chops and domain expertise/communication skills. You’ll work with large amounts of data and try to extract as much insight from it at scale as you can.

Job Prerequisites

To become a data scientist, you have to have a solid understanding of statistics, mathematics and the theory behind different algorithms.

You also have to have enough programming chops, usually in a language such as Python or R, to iterate on data science models.

You also have to be able to communicate your findings to top executives. You need to have enough domain expertise to understand your data and the implications of it. 

Typically, most roles will need advanced degrees and programming experience. STEM degrees are preferred. However, some companies will hire undergraduates straight from school — and advanced degrees, while preferred, are not a hard prerequisite. You can do data science without a PhD or even a Master’s degree. 

Data Science Salary  

[Figure: author screenshot]

Based on a Kaggle survey, data scientists and machine learning engineers, an adjacent field, earn the highest median salary ($120,000 USD) in the United States of America. Australia follows closely at about $110,000 USD. Median salaries drop off swiftly in other countries, with data scientists earning close to $15,000 USD in both Russia and India.

While it’s clear that you can earn a lot being a data scientist, it’s also true that there are nuances.

The breakdown by state within the United States makes this clear. States like California and New York have the highest volume of data science jobs. California data scientists average about $140,000 USD in yearly salary. Washington and New York State follow in the $115,000 USD to $120,000 USD range. New Jersey, Maryland, and North Carolina are around there as well.

California is home to Silicon Valley and the growing startups in San Francisco. Washington state is home to the headquarters of both Microsoft and Amazon. New York state and adjacent states like New Jersey host large, vibrant startup ecosystems, including Silicon Alley. While all these figures need to be adjusted for cost of living (in another analysis, states like Kansas come first because of their low cost of living), they show a key tenet of raw data science salaries: to earn as much as possible, you’ll have to go to where data is most valuable.

Factors That Increase Your Data Science Salary

I wrote this handy guide for Springboard on factors that increase your data science salary after doing some research. The most important ones are the data science tools you have experience with, the industry you work in, the location you choose to work in (as discussed above), the data science role you choose, your experience and degrees, and the individual negotiation for each salary.

Understanding big data tools like Spark and data visualization tools like D3.js, a powerful JavaScript library for custom visualizations, might increase your yearly salary by between $8,000 USD and $15,000 USD.

It’s not just data science in general that drives your salary; it’s also the individual components you’re familiar with. Premiums are paid for data scientists who know how to handle large amounts of data in a distributed fashion, and those who can work with powerful data visualization libraries.

Mastering up to 15 data science tools can increase your salary by about $30,000 USD.

You’ll also want to work in an industry that has access to a lot of valuable data. Software and social media companies tend to pay the most for data scientists (think Facebook or Google).

You’ll want to make sure you’re working as a data scientist or data engineer, not a data analyst. Most intro-level roles in the data space are data analyst roles. It will affect your future salary if you stay in data analyst roles or only apply to them. 

As discussed, your location is key as well. If you want the absolute highest raw salary, you’ll have to move to the United States, and you’re likely going to be working in one of the tech hubs there (San Francisco/Silicon Valley, New York City, or the DC area). However, you should note that the salary boost from location, while high, may not be as high as the boost from other factors that don’t need you to move.

Finally, your level of experience can make a dramatic difference. Having ten or more years of experience can add around $30,000 USD to your yearly salary as a data scientist. And while degrees might not be a hard prerequisite, those with advanced degrees do tend to earn more as data scientists. 

Data Science Curriculum/Checklist

First, you’ll want to start with enough programming knowledge so you can play around with the different concepts and libraries. In practice, a lot of the statistics and mathematics is abstracted away by different programming libraries. It’s best to learn some of the basics of statistics and programming at the same time. If you had to focus on one area, start with the programming practice. 
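
To make the point about abstraction concrete, here is a small sketch that computes a sample standard deviation by hand and then with Python’s built-in statistics module. The salary figures are made up for illustration and are not taken from any survey.

```python
import math
import statistics

# Made-up yearly salaries in USD, for illustration only.
salaries = [95_000, 110_000, 120_000, 140_000, 135_000]

# By hand: take the mean, then the square root of the summed squared
# deviations divided by n - 1 (the sample standard deviation).
mean = sum(salaries) / len(salaries)
squared_deviations = [(s - mean) ** 2 for s in salaries]
manual_stdev = math.sqrt(sum(squared_deviations) / (len(salaries) - 1))

# With a library: the same calculation in a single call.
library_stdev = statistics.stdev(salaries)

print(f"mean: {mean:.2f}")
print(f"manual stdev:  {manual_stdev:.2f}")
print(f"library stdev: {library_stdev:.2f}")
```

Both approaches should agree. Writing the calculation out once helps you understand what the library call is doing for you.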

Most machine learning and data science libraries (including pandas, NumPy and scikit-learn, the mainstays of data science) are built for Python. You’ll want to start there, and work with Anaconda so you can manage different packages and dependencies. Once you’re set up, you can find different courses to practice your Python programming, and practice live in the Jupyter Notebook that ships with Anaconda. It is an intuitive, easy-to-access code editor that runs locally, and notebooks can be uploaded or put under version control quite easily by hooking them up with Git and a GitHub account.

Here’s the documentation for how to get started with Anaconda and Jupyter notebook. The following post summarizes different ways of working with Jupyter notebooks and version control. Finally, this post from freeCodeCamp explains Git and the importance of version control.

While you can work in Jupyter Notebook locally by yourself and seldom do anything but upload your finished experiments and files to GitHub (something I’ve often done), building the habit of working with version control is a great practice.

It’s the default method of collaboration between different programmers, who must ensure that code doesn’t conflict — so if you want to work on a data science team, or any software team for that matter, it’s always good to start with good habits.

You’ll also want to use version control so you can revert if something goes wrong and maintain a steady thread of progress.

Programming

R vs Python

A large debate in the data science ecosystem is whether to use R or Python as an intro-level programming language. In this article for The Next Web, I wrote that it was ideal to know both. Realistically, if you had to choose, I would go with Python. We’ll start there, but I’ll add some R resources in case you want them.

R

R on Codecademy

Codecademy can help you practice your R skills before you start applying them to data science use cases.

Introduction to R for Data Science


This interactive course is given by Microsoft on the edX platform, and is completely free to access. You will need to pay $99 USD if you want to have a verified certificate on your profile. 

Python

49 Essential Resources to Learn Python

I wrote this list of resources to learn Python, going from beginner to advanced. Go through and pick out the resources that are data science and machine learning-specific. 

Learning Python: From Zero to Hero

A text-based tutorial that summarizes the basics of Python. It will get you from knowing zero to Python hero.

Python Tutorial: Learn Python For Free | Codecademy

Codecademy was how I learned Python. Working through the interactive course modules will help you move through and learn by doing and practicing. 

SQL

21 of the Best Free Resources to Learn SQL

You’ll want to practice your SQL as well if you’re looking to become a data scientist. A large amount of data is still held in structured SQL tables. Practicing with SQL will help you extract that data and work with it.

SQLZoo

SQLZoo works partly as a Wiki, partly as a set of interactive exercises. I use this to sharpen my SQL skills when I need to practice.
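To make the idea of extracting data from structured SQL tables concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table name, columns and rows are made up for illustration; a real database would have its own schema.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (role TEXT, city TEXT, salary INTEGER)")

# Hypothetical rows, invented for this example.
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?)",
    [
        ("data scientist", "New York", 120000),
        ("data scientist", "Toronto", 90000),
        ("data analyst", "New York", 80000),
        ("data analyst", "Toronto", 60000),
    ],
)

# Aggregate with SQL: average salary per role, highest first.
rows = conn.execute(
    "SELECT role, AVG(salary) FROM salaries "
    "GROUP BY role ORDER BY AVG(salary) DESC"
).fetchall()
conn.close()

for role, avg_salary in rows:
    print(f"{role}: {avg_salary:.0f}")
```

The same SELECT, GROUP BY and ORDER BY pattern carries over to most SQL databases you are likely to meet in practice.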

Pandas 

Pandas dataframes are the default unit of data wrangling in data science work. Pandas allows you to organize your data in a tabular, structured fashion similar to a SQL table or an Excel spreadsheet. It also allows you to manipulate that data programmatically with Python.
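As a rough sketch of what that looks like, the snippet below builds a small dataframe, filters it, and aggregates it. The column names and values are invented for illustration, and it assumes Pandas is installed (it comes with Anaconda).

```python
import pandas as pd

# Made-up rows for illustration; any tabular data works the same way.
df = pd.DataFrame(
    {
        "role": ["data scientist", "data scientist", "data analyst", "data analyst"],
        "city": ["New York", "Toronto", "New York", "Toronto"],
        "salary": [120000, 90000, 80000, 60000],
    }
)

# Filter rows, much like a SQL WHERE clause.
new_york = df[df["city"] == "New York"]

# Group and aggregate, much like SQL GROUP BY.
mean_by_role = df.groupby("role")["salary"].mean()

print(new_york)
print(mean_by_role)
```

Filtering and grouping like this cover a large share of day-to-day data wrangling.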

Pandas Cookbook

This guide goes over the Pandas library and the different things you can do with it, from grouping to aggregation functions. It’s a handy interactive guide to Pandas, and it’s how I first started getting familiar with the library and data science in general.

A Comprehensive Guide to Data Wrangling

This guide helps define data wrangling, why it’s important, and introduces a few new functions and situations in Pandas to get you comfortable with it. 

Statistics

Once you’re able to source data, you’ll need the statistical ability to be able to draw insight from the data you’ve collected. 

As you learn the programming you need, you also need to understand statistics well enough to manipulate data, interpret it, and evaluate different models. This often involves at least a basic understanding of probability, frequentist statistics, and Bayesian statistics.

Statistics and Probability: KhanAcademy 

This interactive video-filled course will help you catch up on frequentist statistics, confidence intervals, p-values, and more. It’ll serve as a refresher if you’ve encountered these concepts in university, and a learning opportunity if you haven’t. 

A Concrete Introduction to Probability by Peter Norvig

This IPython Notebook lets you work directly with probability concepts in your own copy of Jupyter Notebook. It expresses probability ideas in very readable Python code, helping to combine both your programming practice and statistics knowledge.

Bayes’s Theorem: A Visual Introduction

This post introduces Bayesian theory with a lot of visualizations. Seeing the ideas drawn out can really help crystallize Bayesian thinking, especially since it involves splitting probabilities across many different cases.

Introduction to Bayesian Inference

This tutorial uses a Python library to explain Bayesian reasoning through a model of click-throughs on ads. Use it to understand Bayesian inference in practice. 
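For a feel of the arithmetic behind Bayes’ theorem, here is a small worked sketch in plain Python. It is not taken from either resource above, and the click-through numbers are hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of users are interested in a product,
# 90% of interested users click an ad, 5% of uninterested users click it.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a 90% click rate among interested users, a click only raises the probability of interest to roughly 15%, because uninterested users vastly outnumber interested ones.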

Mathematics

Once you’re done with the statistics, it’s worth understanding some of the mathematics behind data science and machine learning, even if you won’t confront most of the detail every day given how much of the math is abstracted away.

The Mathematics of Machine Learning

This Towards Data Science article sums up the categories of mathematics you need to learn and links to different courses.

Mathematics of Machine Learning

This book is offered as a free PDF, covering several sections of machine learning math in detail from analytic geometry to vector calculus. 

Machine Learning

Now that you’ve refreshed or embraced statistics and programming concepts, it’s time to put it all together and learn the machine learning algorithms you can use on your data.

A Tour of Machine Learning Algorithms

This post starts with foundational concepts in machine learning, such as the difference between supervised, unsupervised and semi-supervised learning. It then drills down into the different categories of machine learning algorithms and shows broadly how their logic works with a set of visualizations.

10 Machine Learning Algorithms You Need To Know

This Towards Data Science Medium post then dives a bit deeper into ten specific machine learning algorithms, giving code implementations of a few so you can see them in practice on data. 

Data Modelling/Evaluation

After all the work on different algorithms, it’s time to refresh what makes for a good data model. How do you know if your model is working? This section of resources will help you put that together.

Part-4 Data Science Methodology From Modelling to Evaluation 

The article summarizes the data science methodology. In this section, it focuses on how evaluating your model fits with the broader work of machine learning and data science. 

Various ways to evaluate a machine learning model’s performance

The following tutorial includes a breakdown of evaluation metrics beyond accuracy such as the confusion matrix and the ROC curve.
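As a minimal sketch of why accuracy alone is not the whole story, the snippet below computes the four confusion matrix counts for a binary classifier by hand, along with precision and recall. The labels and predictions are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn


# Hypothetical ground truth and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were found

print(tp, fp, fn, tn)
print(accuracy, precision, recall)
```

In practice you would likely reach for the equivalent helpers in scikit-learn, but writing the counts out once makes the definitions stick.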

Data Visualization

Python Matplotlib Guide

Matplotlib is the de facto standard data visualization library for Python, and Pandas uses it for its built-in plotting. You can use its visualizations to get a quick sense of the data yourself without needing to export it. This guide goes over the basics of Matplotlib and how it’s constructed.

Visualization With Seaborn

Seaborn is a Python library built on top of Matplotlib that provides more polished data visualizations out of the box. Use this tutorial to get familiar with it.

Intro to D3.js with ten examples

This D3.js introduction helps get you started with the powerful JavaScript library. The related chart collection gathers tons of examples of different charts you can use to visualize your data in R, Python and more.

Datasets to Practice With

Datasets | Kaggle

Kaggle, the online data science competition platform, offers a variety of datasets you can use to practice your data science skills. The datasets feature ranking and comments so you can follow the most trending datasets. You can study what others have done with them as inspiration for your own projects.

19 Free Public Data Sets for Your Data Science Project

The link above is a list of 19 free, public datasets ranging from United States census data to FBI crime data. 

Awesome Public Datasets

A GitHub repository that hosts a wide variety of open, public databases, organized by domain. This is a great, definitive resource for free datasets.

Awesome IPFS Datasets

This website lists datasets, some of them quite large, hosted on IPFS (the InterPlanetary File System). IPFS is a distributed, decentralized protocol for storing data that goes beyond HTTP’s standard client-server relationship. In theory, this means that datasets downloaded through IPFS might be faster to get, since you’ll be working with a swarm of hosts rather than just a single one.

Registry of Open Data on AWS

Amazon Web Services, which helps host much of the content on the web today, also has this registry that helps people find open data hosted on its cloud services. It includes examples of what people have done with that data. 

BigQuery Open Datasets

Google hosts the above datasets on BigQuery, its big data storage solution. They include the complete revision history of Wikipedia up to April 2010, and weather information from the NOAA since late 1929. 

Data Science Courses/Bootcamps

how to be a data scientist
Source: Pixabay

The curriculum might be a bit too much to handle as a learner — and that’s perfectly fine. It’s meant as a bare-bones categorization of the material you need to learn to get into data science. However, if you want to refine your learning, there are a few options out there. I’ve linked to a list of bootcamps and courses. Be aware that I worked for Springboard.

Data Science Bootcamps

Data Science Bootcamps, CourseReport

CourseReport has a list of different data science bootcamps, with ratings and real student reviews given for each course.

Springboard

Springboard offers a variety of mentored bootcamps where you’re given personal attention from a data science expert and career coaching. It also comes with a job guarantee: once you’re accepted, you either get a job or get your tuition back.

Udacity

Udacity offers nanodegrees where you can dive deep into specific data science topics such as self-driving cars.

Data Science Courses

Coursera (Data Science)

Coursera offers a variety of online course options for data science. Use them well to deepen your learning in the field.

Udemy Data Science

Udemy offers a variety of data science courses created by different independent teachers on its platform. 

Data Science Interview Questions

The data science interview tends to fall into many steps, with some being technical and some being non-technical. I wrote this guide for Springboard on the data science interview process to fully flesh it out. I’ve added some sample questions you might expect, some with solutions, under each section. 

Initial Recruiter Call

Before you’re assessed by a hiring manager, you’ll usually have a call with a recruiter to determine if you’re a fit with the company. They’ll ask general questions about your motivations and career path and see if you’re a fit with what the hiring manager wants. 

Sample questions

  1. Why this company?
  2. What interests you about the role?
  3. What are your salary expectations?

Technical Interview

A hiring manager will ask you technical questions related to your knowledge of statistics and programming. Here are 109 data science questions with solutions. For programming, you can also try HackerRank challenges to stay sharp before your interview.

Technical Case Study

Part of the interview process will involve either an in-depth review of a project you worked on or a case study where you work with your (hopefully) future team. This will involve detailed questions about work you’ve done or how you’d approach a project. You might have to do a take-home assignment or to work on a problem with the hiring manager. 

Behavioral Interview

The behavioral part of the interview will test your management and communication skills as well as fit with the team. It’s usually done by the hiring manager rather than the recruiter. 

Job Boards And Resources

There are many data science specific job resources and career sites out there worth following. Here are a few where you can find resources and data science job postings. 

KDNuggets

KDNuggets features a job board and many resources for aspiring data scientists and the community at large.

Kaggle

Kaggle features a host of resources for data scientists: free, public datasets; a customized version of a Python kernel that allows for automated version control and collaboration with other Kaggle users; and a host of competitions that can help you practice and show your data science skills.

Data Elixir

Data Elixir is a newsletter dedicated to data science resources and jobs. Sign up to get a periodic update on the data science ecosystem.

Data Science Weekly

Another great newsletter filled with breaking news and tons of learning opportunities. Data Science Weekly is well worth subscribing to. 

Hacker News Jobs

Hacker News Jobs is a great spot to cleanly aggregate machine learning and data science job positions from technologists who post on Hacker News “Who’s Hiring?” threads.

What’s great about these postings is that you’ll often find a lot of context and a direct connection to a hiring manager, who will often leave their email directly on a posting to make themselves available for connection. You can easily search for data science specific postings.

AngelList Jobs

AngelList is the world’s largest repository of startups, many of which are looking to hire for data science roles. You can filter specifically for data science roles, location and industry.

Do You Need A Degree Or Not?

This is an ongoing discussion. Advanced degrees help increase your data science salary and some hiring managers display a bias towards those degrees. Many hiring positions demand a minimum of a bachelor’s degree.

However, DJ Patil, the former Chief Data Scientist of the United States, called on recruiters and companies to judge candidates based on what they did with data, not their education.

While the data science community often draws from the same ethos of do-it-yourself learning-by-doing that typifies the open source community, it can be a more gated process because of the statistics and math knowledge needed, as well as the communication skills data scientists need to develop.

Work experience can fill a lot of gaps here, but to get into the industry, you might have to start with a data analyst role and then move up into a data scientist role, or settle for a junior data science role or internship if you have no experience and no degree.

Despite the emergence of master’s programs targeted at data science, the truth is that you don’t absolutely need a degree to succeed in data science.

Sample Data Science Job Roles

Author screenshot

Data Scientist – Personalization @ Spotify

This role at Spotify involves a lot of teamwork and data exploration. It focuses on data modeling. Data engineers help to bring pipelines of data for you to model properly. This role is more focused on the product analytics team, and as a result, is cross-functional in nature. While there is a demand for degrees, most of the other requirements involve applied experience.

Author screenshot

Entry Level Data Scientist @ IBM 

This entry level role doesn’t need a degree — rather just the skills that make up the data science curriculum. The focus is on communication, tools, and the different models that make up data science roles. 

Candid Data Science Career Advice On How To Be A Data Scientist

Here’s some candid career advice from different data scientists in the field to help you with how to be a data scientist:

Claire Longo (Senior Machine Learning Engineer @ Twilio): Beat imposter syndrome by choosing a focus area to master. Talk about the stuff you don’t know as well as the stuff that you do. 

Jess Zhang (Inference Data Scientist @ Airbnb): Throw out the first number and do your research when it comes to negotiations. Look through courses and continually refresh and learn so you have a toolbox you can rely on. Find somebody who believes in you, sometimes through networking at data science meetups. 

Anmol Rajpurohit (Senior Software Engineer @ Splunk Enterprise Cloud): Data science isn’t for everybody. Make sure you know what you’re getting into before you start a career in data science. 

Checklist On How To Be A Data Scientist

1- Learn basic statistics, including frequentist, Bayesian thinking and probability theory.

2- Learn how to programmatically source and organize data, preferably with Python.

3- Learn the more advanced statistics and mathematics behind data science at scale, from linear algebra to model evaluation.

4- Practice your learning and work on projects with production-level datasets. Build a portfolio for hiring managers.

5- Prepare for the data science job interview process.

6- Accept a data science job offer (after many months of effort, most likely).

7- Continually practice!

Learning Guides, Web Development/UX Design

How to use GatsbyJS to build a blazing fast Drupal website

This is a guest post from Sujit Kumar. If you want to contribute guest posts to code(love), email [email protected].

What is Gatsby?

Gatsby is a static site generator that uses popular technologies such as ReactJS, JavaScript and GraphQL in a way that does not depend on external resources. This makes websites faster, more secure and more resistant to DDoS attacks. It is also really easy to integrate with common content management systems like Drupal.

Why use Gatsby?

  • Unlike dynamic sites which render the pages on demand, static site generators pre-generate all the pages of the website.
  • No more live database querying and no more running through the template engine each time you load a page.
  • Performance goes up and maintenance cost goes down.
  • Using Gatsby means you can host the CMS in-house and publish the content generated by Gatsby as a static website.

It’s always good to increase the performance of Angular and React applications. This is one way you can do it.

GatsbyJS covers all the buzzwords out there like ReactJS, GraphQL, webpack, etc., but the coolest part is that you’re up and running in no time!

Since Gatsby is built on React, you get everything we love about React straight away, like composability, one-way binding, reusability and a great environment.

Gatsby lets Drupal work as the backend, which means we get a modern frontend stack and a completely static site with Drupal as a powerful backend.

Set up Drupal

  • You have to install and configure the JSON API module for Drupal 8. This assumes you already have a Drupal 8 site running.
  • With Composer and Drupal Console, download and install the module:
    composer require drupal/jsonapi
    drupal module:install jsonapi
    Or install it manually on Drupal 8 sites.
  • Next, we must ensure that anonymous users are granted read-only access to the API. To do this, go to the permissions page and check the “Anonymous users” checkbox next to the “Access JSON API resource list” permission. If you skip this step, you’ll get an endless stream of 406 error codes.

After this, you should be all set. Try visiting http://yoursite.com/jsonapi and you should see a list of links.

Install Gatsby

Now we need to work on Gatsby. The first thing we need to do is install the Gatsby CLI. If you don’t have it installed already, run this through npm to grab it:

npm install --global gatsby-cli

That’ll give you the gatsby CLI tool, which you can then use to create a new project, like so:

 gatsby new my-gatsbyjs-app

That command basically just clones the default Gatsby starter repository and then installs its dependencies inside it. Note that you can include another parameter on that command which tells Gatsby that you want to use one of the starter repositories, but to keep things simple we’ll stick with the default. Now if we look at the project we can see a few different directories.

ls -la my-gatsbyjs-app/src/
#> /components
#> /layouts
#> /pages

Pages

The pages directory contains the pages. Each file becomes one page, and the page’s name is based on the file name. Each of these files contains a React component.

This is the index.js that we just created.

Gist: https://gist.github.com/nehajmani6/d0509a7b7bf0d8c2e7cf2e4634812155

Layouts

The layouts directory contains layouts that wrap our pages. These layouts are higher-order React components that define common layouts and how they should wrap a page. We can place the page wherever we want within the layout using the children prop.

Let’s look at a simple layout component:

Gist: https://gist.github.com/nehajmani6/2e23c6ce6f152dfe5619c4c17394efaf

As you can see, our layout component takes two props.

The first is the children prop, which holds the page being wrapped.

The second prop is data. This is the data we fetch with the GraphQL query at the end of the code snippet, which in this example fetches the title from gatsby-config.

Components

The last directory is components, which holds general-purpose components.

To run the newly generated site in development mode and get a rough idea of how it looks, run the command:

gatsby develop  
#> DONE Compiled successfully

We’re now up and running! See for yourself at http://localhost:8000

Once complete, you have the basis for a working Gatsby site. But that’s not good enough for us! We need to tell Gatsby about Drupal first.

For this part, we’ll be using the gatsby-source-drupal plugin for Gatsby. First, we need to install it:

cd my-gatsbyjs-app
npm install --save gatsby-source-drupal

Once that’s done, we just need to add a tiny bit of configuration for it, so that Gatsby knows the URL of our Drupal site. To do this, edit the gatsby-config.js file and add this little snippet to the “plugins” section:

plugins: [
  {
    resolve: `gatsby-source-drupal`,
    options: {
      baseUrl: `http://yoursite.com`, // Drupal site URL
      apiBase: `jsonapi`, // the JSON API endpoint
    },
  },
],

You’re all set. That’s all the setup that’s needed, and now we’re ready to run Gatsby and have it consume Drupal data.

Run Gatsby

Let’s start the development environment to see Gatsby running. Run this command:

gatsby develop

If all goes well, you should see output like this from the Gatsby default starter:

You can now view gatsby-starter-default in the browser.

http://localhost:8000/

View GraphiQL, an in-browser IDE, to explore your site’s data and schema

http://localhost:8000/___graphql

Note that the development build is not optimized.
To create a production build, use gatsby build

(If you see an error message instead, there’s a good chance your Drupal site isn’t set up correctly and is erroring. Try manually running “curl yoursite.com/jsonapi” in that case to see if Drupal is throwing an error when Gatsby tries to query it.)

You can load http://localhost:8000/ but you won’t see anything particularly interesting yet. It’ll just be a default Gatsby starter page. It’s more interesting to visit the GraphQL browser and start querying Drupal data, so let’s do that.

Fetching data from Drupal with GraphQL

Load up http://localhost:8000/___graphql in a browser and you should see a GraphQL UI called GraphiQL (pronounced “graphical”) with cool stuff like auto-completion of field names and a schema explorer.

Clear everything in the left pane and type an opening curly bracket; the closing bracket is inserted automatically. Then press Ctrl+Space to view the auto-complete, which lists all the entity types and bundles that we can query.

It should look something like this:

GatsbyJS

For example, if you want to query Event nodes, you’ll enter “allNodeEvent” there, and drill down into that object.

Here’s an example which grabs the fields (field_task_name, field_date and nid) of the TodoList nodes on your Drupal site:


{
   allNodeTodoList{
       edges{
           node{
               nid
               field_task_name
               field_date
           }
       }
   }
}

Note that “edges” and “node” are concepts from Relay, the GraphQL library that Gatsby uses under the hood. If you think of your data like a graph of dots with connections between them, then the dots in the graph are called “nodes” and the lines connecting them are called “edges.”

Once you have that snippet written, press Ctrl+Enter to run it, and you should see a result like this on the right side:


{
 "data": {
   "allNodeTodoList": {
     "edges": [
       {
         "node": {
           "nid": 1,
           "field_task_name": "Learn Drupal",
           "field_date": "2018-12-14"
         }
       },
       {
         "node": {
           "nid": 2,
           "field_task_name": "Complete drupal task",
           "field_date": "2018-12-15"
         }
       },
       {
         "node": {
           "nid": 3,
           "field_task_name": "Learn gatsby",
           "field_date": "2018-12-16"
         }
       },
       {
         "node": {
           "nid": 4,
           "field_task_name": "Gatsby Project",
           "field_date": "2019-01-10"
         }
       }
     ]
   }
 }
}

Note that the same kind of query can also return other data from Drupal, including referenced entities, URIs, and so on.
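To make the “edges” and “node” shape of the result easier to picture, here is a small sketch that flattens a trimmed copy of the response above into a plain list of tasks. It is written in Python purely to illustrate the data shape; inside Gatsby you would do the equivalent in JavaScript within a React component.

```python
import json

# A trimmed copy of the response shape shown above.
response = json.loads("""
{
  "data": {
    "allNodeTodoList": {
      "edges": [
        {"node": {"nid": 1, "field_task_name": "Learn Drupal", "field_date": "2018-12-14"}},
        {"node": {"nid": 2, "field_task_name": "Complete drupal task", "field_date": "2018-12-15"}}
      ]
    }
  }
}
""")

# Each edge wraps exactly one node, so flattening is a single comprehension.
tasks = [edge["node"] for edge in response["data"]["allNodeTodoList"]["edges"]]

for task in tasks:
    print(task["field_date"], task["field_task_name"])
```

Once the nodes are in a flat list, rendering them as table rows is a simple loop over that list.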

Pretty cool, right? Everything you need from Drupal, in one GraphQL query.

So now we have Gatsby and Drupal all set up and we know how to grab data from Drupal, but we haven’t actually changed anything on the Gatsby site yet. Let’s change that.

Displaying Drupal data on the Gatsby site

The cool thing about Gatsby is that GraphQL is so baked in that it assumes that you’ll be writing GraphQL queries directly into the pages or the components.

In your codebase, check out src/pages/displaynodes.js.

Gist: https://gist.github.com/sourabhsp21/1f69d5cffc5a4bd220b243a2dd8fb3a5

(Note, this assumes you have a node type named “Page”).

All we’re doing here is grabbing the node fields (task name and task date) via the GraphQL query at the bottom, and then displaying them in a table.

Here’s how that looks on the frontend:

GatsbyJS

And that’s it! We are displaying Drupal data on our Gatsby site!

Author Bio:

Sujit Kumar is VP of Strategy & Marketing at Valuebound, taking care of all aspects of lead generation, company and brand promotion, and sales activity. He brings 14+ years of marketing experience, strategic thinking, creativity, and operational effectiveness. Prior to joining Valuebound, Sujit worked in marketing management positions with professional services firms.

Meme Review

Programming Meme Review #1

I crawled through the Internet and found you these potential gems. Don’t judge them too harshly.

Subtext: O’Reilly really does have awesome programming books.

Subtext: HTML is not a programming language, y’all

Subtext: StackOverflow does have some pretty savage answers.

Subtext: Always write comments for your (imaginary) team members, and most likely, the most important team member of all: future you.

Subtext: This is how I remember the difference between false positives and false negatives for data science.

Source: code(love) — i maded this

Subtext: Python installation errors are basically hell. Still waiting for the day where I can “import everything”.

Subtext: XOR confuses me too.