Learning Lists, Uncategorized

Learn Machine Learning With These Six Great Resources

Learn Machine Learning 

A friend of code(love), Matt Fogel is doing awesome things with machine learning at fuzzy.io. He’s shared this valuable list of resources to learn machine learning that he usually gives his friends who ask him for more information.

You’ll see his original post here: https://medium.com/@mattfogel/master-the-basics-of-machine-learning-with-these-6-resources-63fea5a21c1c#.ta2bhsq8y

Learn machine learning with code(love)

Learn machine learning with code(love)

Great blog posts, podcasts and online courses to help you get started

It seems like machine learning and artificial intelligence are topics at the top of everyone’s mind in tech. Be it autonomous cars, robots, or machine intelligence in general, everyone’s talking about machines getting smarter and being able to do more.

Yet for many developers, machine learning and artificial intelligence are dense terms representing complex problems they just don’t have time to learn.

I’ve spoken with lots of developers and CTOs about Fuzzy.io and our mission to make it easy for developers to start bringing intelligent decision-making to their software without needing huge amounts of data or AI expertise. A lot of them were curious to learn more about the greater landscape of machine learning.

You can describe machine learning as using techniques to help computers learn new ways of uncovering insights from data. This deep dive into the topic will explore many elements outside of this short guide if you’re interested in learning more.

What you need to understand before you learn machine learning is that it’s not a magic buzzword that will help solve every problem with you. Machine learning is a practical way to get more data insights with less work. Nothing more, nothing less. 

To quote a professor in the field, “Machine learning is not magic; it can’t get something from nothing. What it does is get more from less. Programming, like all engineering, is a lot of work: we have to build everything from scratch. Learning is more like farming, which lets nature do most of the work. Farmers combine seeds with nutrients to grow crops. Learners combine knowledge with data to grow programs.”

If that excites you, here are some of the links to articles, podcasts and courses about machine learning that I’ve shared with my friends who were eager to learn more. I hope you enjoy!

Learn machine learning with code(love)

Learn machine learning with code(love)

1A Gentle Guide to Machine Learning

This guide, written by the awesome Raul Garreta of MonkeyLearn, is perhaps one of the best I’ve read. In one easy-to-read article, he describes a number of applications of machine learning, the types of algorithms that exist, and how to choose which algorithm to use.

2A Visual Introduction to Machine Learning

This piece by Stephanie Yee and Tony Chu of the R2D3 project gives a great visual overview of the creation of a machine learning model that determines whether an apartment is located in San Francisco or New York based on the traits they hold. It’s a great look into how machine learning models are created and how they work in practice.

Podcasts

3Data Skeptic

A great starting point on some of the basics of data science and machine learning. Every other week, they release a 10–15 minute episode where the hosts (Kyle and Linhda Polich) give a short primer on topics like k-means clustering, natural language processing and decision tree learning. They often use analogies related to their pet parrot, Yoshi. This is the only place where you’ll learn about k-means clustering via placement of parrot droppings.

4Linear Digressions

This weekly podcast, hosted by Katie Malone and Ben Jaffe, covers diverse topics in data science and machine learning. They teach specific advanced concepts like Hidden Markov Models and how they apply to real-world problems and datasets. They make complex topics extremely accessible, and teach you new words like clbuttic.

Online Courses

5Intro to Artificial Intelligence

Plan for this online course to take several months, but you’d be hard-pressed to find better teachers than Peter Norvig and Sebastian Thrun. Norvig quite literally wrote the book on AI, having co-authored Artificial Intelligence: A Modern Approach, the most popular AI textbook in the world. Thrun’s no slouch either. He previously led the Google driverless car initiative.

6Machine Learning

This 11-week long Stanford course is available online via Coursera. Its instructor is Andrew Ng, Chief Scientist at Chinese internet giant Baidu and one of the pioneers of online education. 

This list is really only scratching some of the complex and multifaceted topic that is machine learning.  If you have your own favorite resource, please suggest it in the comments and start a discussion around it!

 

Learning Lists

The most popular deep learning libraries

You might have heard of artificial intelligence, deep learning and neural networks, and wanted to get a path into this exciting new technology. This article will help you get a comprehensive overview of the tools and frameworks you can use to accelerate your impact and learning in artificial intelligence. It will help you understand what deep learning library you should use to accelerate your learning. 

This article assumes your familiarity with basic deep learning concepts — if you need to catch up on those, the following Wikipedia article will help you get to scratch.

This is a walkthrough of the most popular deep learning libraries, examples of clever projects and implementations that have used the different libraries to create something awesome.

I’ve ranked the deep learning libraries in question by the number of Github stars their repositories have collected as of June 2017 — a great way to see how much traction these different frameworks have with programmers.

Deep learning library


1- TensorFlow

Overview: TensorFlow is the open-source machine learning library developed by Google (and still used in both research and production level applications at the company). The library allows you to bring machine intelligence capabilities to all sorts of devices, from those equipped with GPUs to mobile devices such as the Raspberry Pi. With over open-source 6,000 repositories using TensorFlow, it has quickly become one of the most popular frameworks out there for those looking to build something with deep learning. TensorFlow is very accessible, with APIs for Python, C++, Haskell, Java, Go and Rust and a 3rd party package built in R.

Introductory Tutorial: Get introduced to TensorFlow with this official tutorial by Google by using TensorFlow with the famous MNIST dataset.

Use Cases: TensorFlow has become of the most popularized deep learning frameworks, and as such, it has seen a wide array of uses from powering cutting-edge machine learning work at different Silicon Valley companies to classifying cucumbers for farmers.

Resources:

  1. This Github repository TensorFlow-World contains a bunch of introductory tutorials with code compiled together to give you a better sense of how to do deep learning on TensorFlow.
  2. TensorFlow-101 contains tutorials on how to get started with TensorFlow in Jupyter Notebook with Python.
  3. This Github repository contains example code that will help you work through different analyses in TensorFlow.  

How to get started: Get all of the documentation and installation instructions here, then start practicing and training deep learning models on different datasets! TensorFlow is one of the most popular and powerful deep learning frameworks out there: take advantage! 


2- Scikit-learn

Overview: Scikit-learn is the versatile machine learning knife in Python, used for simple experimentation and iteration with different templated machine learning models. With modules like the MLPClassifier, you can easily bring deep learning approaches to your datasets and use the rest of the trusty scikit-learn ensemble (such as train_test_split) to validate and evaluate your model. You can also combine the scikit-learn interface with different deep learning libraries if you want to do more powerful analyses.

Introductory Tutorial: This tutorial runs through how to use scikit-learn as a deep learning library and a multilevel perceptron model to classify different types of wine. You can see how with a few lines of code, you can create a very accurate model with many variables.  

Use Cases: scikit-learn is often the first go-to deep learning tool for people working in the Python data science ecosystem: it comes pre-installed with Jupyter Notebook and it comes with powerful functions that are already-optimized versions of essential deep learning functions. You’ll be able to quickly build together machine learning models, evaluate them, and split different use cases into either test or training sets.

Resources:

  1. This cheatsheet by Datacamp will help you with many of the essential functions in scikit-learn.
  2. Springboard has a tutorial that will teach you how to build a simple neural network with scikit-learn and its MLPClassifier module.
  3. This gentle introduction into scikit-learn can help you ease into this machine learning package.

How to get started: Use jupyter notebook, which comes with scikit-learn installed by default. Start training machine learning models, then move into Scikit Flow, an interface that combines the code you’d use in scikit-learn augmented with functionality from Google’s TensorFlow library (we’ll dive into TensorFlow a little bit later in this article).


3- Caffe

Overview: Caffe is a deep learning framework built by Berkeley’s AI Research department (BAIR) and sustained through the use of community contributors. It features speedy application of deep learning approaches, with the ability to classify up to 60 million images a day on a single GPU. It powers a variety of deep learning projects in machine vision, speech and more — with projects ranging from fully fleshed out applications to academic papers.

Introductory Tutorial: This tutorial built by Berkeley AI researchers will help you get up to speed with this powerful deep learning framework.

Use Cases: Caffe is used for a variety of academic research — this web page has a ton of examples ranging from image classification to training the classic LeNet model.

Resources:

  1. This blog post will help you get up to scratch with using Caffe and Python with a relevant classification example from Kaggle on how to distinguish between dogs and cats.
  2. This tutorial will help you with loading Caffe onto iPython Notebook and also with C++ implementations of the library.
  3. This tutorial will get you to understand how you can build a layer of a neural network within Caffe.

How to get started: Use the command line and get started with different use cases of Caffe. You can use this tutorial to get installation instructions across a whole array of different platforms.


4- Keras

Overview: Keras is an open source deep learning library for neural networks written in Python. Authored by François Chollet, the library was meant to be a quick and easy way to experiment with different deep learning models — as a high-level API written entirely in Python, the library is easy to debug and navigate. It supports both convolutional and recurrent neural networks and it is designed to be as intuitive as possible for users to grasp.

Introductory Tutorial: This introductory tutorial will run over how to get started with Keras — all the way from installing the package to training a model with 99% predictive accuracy on the seminal MNIST dataset.  

Use Cases: Most of the time, Keras is used to build simple deep learning models as conceptual sketches: you would validate crude ideas using Keras, and get a first glance at whether or not you had a good idea for a deep learning architecture that can tackle a problem.

Resources:

  1. This course by DataCamp helps you dive deeper into Keras, even if you’re just a deep learning beginner!
  2. Here is a link to the official Keras documentation which allows you to get access to the inner working of the framework from the creators of it.
  3. If you’re more comfortable with the R programming language, you can use the R interface to Keras.

5- Torch

Overview: Torch is an open-source deep learning framework based on optimizing performance on GPUs based on the programming language Lua with an underlying C/CUDA implementation. It allows for the parallelization of neural networks across different CPUs and GPUs. Torch is used by a lot of organizations at the cutting edge of machine learning, from Google to Facebook. It has been extended for use in mobile settings, with the ability to perform on iOS and Android.

Introductory Tutorial: This 60 minute blitz into Torch will get you started and ready to use this powerful deep learning tool.

Use Cases: The deep learning library Torch is used for a lot of machine learning and deep learning research programs at leading companies such as Google, Twitter, and Facebook. Given its origins as Facebook Research’s default deep learning framework, you can be sure that it comes with all of the support it needs from one of the largest tech companies in the world.

Resources:

You’ll want to get started understanding Lua before you work with Torch. This handy guide will help you get up to scratch.

This Github repository contains a whole host of Torch tutorials.

The following blog article by Facebook Research contains a lot of related work done in Torch.

How to get started: Get the deep learning library Torch installed, then start running it on different deep learning problems. You might want to refer to the Facebook research blog for inspiration.


6- Theano

Overview: Theano is a deep learning framework built deeply into the Python data science ecosystem, with deep integration with the NumPy numerical computing library. You can use C to generate code as well to make it even speedier — and it is the default teaching tool used in deep learning founding father Yoshua Bengio’s lab.  

Introductory Tutorial: This Theano tutorial on Jupyter Notebook will help you understand the nuances of the deep learning library Theano, and will get you up to scratch with different conventions within the library.

Use Cases: This forum will allow you to understand different problems and use cases Theano users create with the deep learning library.

Resources:

  1. This documentation will help you understand more about the deep learning library Theano. 
  2. This blog post will help you understand the performance differences between Theano and Torch.
  3. Here is an introductory tutorial to Theano.

How to get started: Use the instructions here to get started with installing Theano.


7- Neon

Overview: Neon is an open-source deep learning platform built on Python that is committed to the most powerful implementation of deep learning possible, with a consideration towards simplicity.

Introductory Tutorial: Work with this Github repository which contains a Model Zoo that contains different scenarios of working with Neon.

Use Cases: This Youtube playlist walks through different use cases with Neon, including playing Pong, and speech recognition.

Resources:

  1. Learn how to do basic classification with the MNIST database and the deep learning library neon with this introductory-level tutorial.
  2. Use this video course to get you started in Neon.
  3. This compilation of Neon resources will help you learn the framework.

How to get started: Get the deep learning library Neon installed with the following instructions.

Learning Lists

101+ Resources to Learn Data Science

Many people are seeking to learn data science these days. It’s become a trendy topic associated with high salaries and some of the most interesting problems in the world. This demand has created many different resources in the data science space. People have curated their selection of favorite resources to learn data science, but I was seeking out something more comprehensive — so I built this list. Here’s my attempt at getting you my favorite resources in the data science space so you can understand what’s going on in the field — and how you can get your hands dirty and start learning right away.

Full disclosure: I work for Springboard (one of the data science education providers listed below). 

What is data science?

learn data science

First, let’s start with an overview of what seems to have become a popularized buzzword and defining exactly what you want to learn: data science. Data science is the combination of three kinds of skillsets: statistics, programming and business knowledge. It’s the interplay between these crafts where you’ll find a data scientist — somebody who will programmatically examine large data sets for precious business insights — somebody who can combine computer science knowledge with business insight.

You can use data science concepts and training to do data mining and get statistical inferences from large datasets. Using advanced techniques such as natural language processing and unsupervised learning, you can tame the power of computation and get precious data insights others simply cannot access. That will be attractive to all sorts of potential employers in the data science field, from Silicon Valley to Wall Street.

In order to get there though, you have to start with the basic techniques and basic concepts that underlie data science. Learning data science requires having an understanding of the process that goes behind it, and the various components that are required to bring everything together. Let’s get started on getting you know that knowledge. 

Overview

learn data science

You’ll want to get an overview of the field and the processes and concepts that make up data science so you can learn data science.

1- Data Scientist: The Sexiest Job of the 21st Century

In this seminal article, ex-Chief Data Scientist of the United States, DJ Patil, goes into exactly what makes a career in data science so compelling. It’s great fuel to the fire if you’re looking to learn data science. 

2What is data science?

This overview of data science by Berkeley delves into how data science came to be — and the average salary you can expect in the field.

3Data Science Salary Survey (2016) – O’Reilly

O’Reilly, a leading publication and media company on the cutting edge of technology, dives deeper into what tools and factors go into higher data science salaries. They’ve surveyed hundreds of data scientists in the field. Learn what pays and what doesn’t with data science careers through their research!

4Data Science (Wikipedia)

Wikipedia’s overview of data science goes over the history of the field and points to many different resources in the field. It can be a handy jumping-off point for further research.

5Building Data Science Teams

This piece by DJ Patil goes into the different roles inherent in a data scientist’s job — and exactly how best to build out a data science team.

6- Data Science Process

This piece by Springboard goes into what the day-to-day of data science looks like — tracing it all the way to a first principles view of exactly what steps effective data science requires.  

Interactive Tutorials

learn data science

Now that you’re done with an overview of the topic, it’s time to get your hands a bit dirty with interactive tutorials that will help you learn different parts of data science — whether that’s the statistical theories behind machine learning algorithms, or the programming skills you’ll need to implement those theories.

Statistics/Math

Understanding probability and the basics of statistics is essential to being able to understand machine learning methods and how to handle massive amounts of data. Linear algebra and the ability to manipulate different expressions of data (in matrix form or otherwise) will also be incredibly helpful in detailing what data scientists do. You’ll want to refresh your statistics knowledge and get a handle on the math you need to know to join their ranks.

7KhanAcademy (Statistics/Probability)

This free course from KhanAcademy serves as a great catch-up on the basics of probability and statistics.

8Introduction to Statistics in R (Datacamp)

Learn a bit of R (a programming language commonly used in data science) and statistics at the same time with this interactive walkthrough from Datacamp.

9Statistics 101 

This Youtube playlist from the Harvard Extension School covers everything from random variables to different statistical distributions. 

SQL

Knowing SQL and how to query from relational databases is a skill that is one of the building blocks of data science. You’ll often use SQL to source your data for further analysis — or even to transform your data on the spot.

10Mode Analytics SQL School

Mode Analytics teaches SQL through the use of case studies with real data. It’s an interactive experience that’ll teach you the basics of SQL by having you run through a dataset with some simple yet powerful commands.

11Learn SQL (Codecademy)

Codecademy, well known for its basic curated tutorials in different programming languages, has this simple interactive module that will help you learn SQL.

12SQLCourse

This is an older tutorial, but one that still holds up as an example of an organized approach to learning SQL.

Python

Python is one of the workhorse languages of data science — one of the most popular along with R. The large open-source community that powers Python enables it to be a powerful, versatile programming language that can help facilitate everything from data wrangling to training powerful machine learning models. It’s a powerful tool you’ll want to learn as you learn data science. 

13Pandas Cookbook

This interactive set of code examples walks you through how to get started with Pandas, the data processing library most commonly used in Python. It’s built by Julia Evans

14Intro to Python for Data Science (DataCamp)

This interactive course will walk you through the basics of the data science libraries for Python.

15Gentle introduction to scikit-learn

This gentle introductory tutorial will help you understand one of the most powerful machine learning and data science libraries out there: Python’s scikit-learn. You’ll be able to train simple, off-the-shelf data models in a matter of minutes.

16A dramatic tour through Python’s data visualization landscape

This somewhat witty and whimsical walkthrough will help you explore the difference between the major data visualization tools in the Python ecosystem — including some options that were ported from R!

17- Web scraping with BeautifulSoup

This short guide will teach you how to take information from different websites and render it into a format that is easy for machines to process — a handy skill for anybody looking to work with many different datasets. I often use the set of techniques described to scrape tables from Wikipedia so I can process that data in Python.

R

R is another popular programming language used for data science — in fact, it’s often pitted against Python as a comparable tool. The truth is that you can use both — and in fact, being conversant in both can only help you progress faster and further as a data scientist.

18– Introduction to R (Datacamp)

Here is the equivalent of the Datacamp introduction to Python — except this time for R, another common data science programming language.

19A complete tutorial to learn R from scratch

This tutorial, rendered as a blog post, offers a comprehensive A to Z guide to getting started in R. It covers everything from importing data into R to creating predictive models with it.

20Try R

Sponsored by O’Reilly Media, this interactive course will reward you with a badge for each fundamental building block of R you learn.

Hadoop

Hadoop is a big data framework meant to facilitate the treatment and storage of large data sets that have be processed in parallel by many different servers in order to yield actionable insights. 

21Hadoop Tutorial (Tutorialspoint)

This set of tutorials on Hadoop will help you understand how big data frameworks work — and how you can apply Hadoop to your data.

22Hortonworks Sandbox tutorial on Hadoop

This interactive Hadoop sandbox by Hortonworks lets you play with Hadoop code.

Spark

Spark helps solve some speed, flexibility and efficiency issues with Hadoop through the use of a new data structure: the RDD or resilient distributed dataset.

23Apache Spark Tutorial (TutorialsPoint)

TutorialsPoint offers a similar tutorial to Spark as it does for Hadoop.

24Hands-on introduction to Spark

Hortonworks has a sandbox that will let you play around with Spark code.

Courses/Workshops

learn data science

The following courses and online workshops will help you learn data science in an organized fashion. Use these resources to accelerate your learning of data science if you need to. A lot of these courses will help you find data science work, and you’ll likely be able to do data science projects after finishing them. 

25Fast.ai

This massive online course, built by a Kaggle champion in machine learning, will help you learn about neural networks and how to train machine learning models.

26Foundations of Data Science (Springboard)

This course offered by Springboard features a curated selection of resources in R, SQL and the basics of machine learning, as well as personalized mentoring from data science experts who work in the field.

27Data Science Intensive (Springboard)

Yet another course offered by Springboard, though this one is more advanced. Focused on Python and teaching the intricacies of machine learning methods, this course will help you use different machine learning techniques with ease.

28Data Science Career Track (Springboard)

Springboard’s Data Science Career Track is the first online bootcamp to offer a data science job or your tuition back. With personalized career coaching, mentorship from data scientists and exclusive employer partnerships, Springboard is putting it all on the line to help you get a job in data science.

29Data Science (Coursera)

Coursera partnered with Johns Hopkins University to deliver this nine-course series on data science, covering everything from tools to advanced machine learning methods.

30Machine Learning (Coursera)

This curated set of machine learning courses taught by Andrew Ng (the famous Stanford professor who founded Coursera in the first place) is one of the best resources to consult as you start understanding data science.

31Thinkful Data Science Bootcamp

Thinkful, an online education provider, provides a data science bootcamp that will curate your learning of data science and Python.

32Intro to Machine Learning (Udacity)

Udacity offers a free mini-course curated by Facebook and Tableau to help guide you through to doing analysis of the Enron email database.

33Data Science Certificate (Harvard Extension School)

This data science certificate offered by the Harvard Extension School can help you learn data science while getting credits and credibility from one of the leading universities in the world.

34Statistics with R (Coursera)

This selection of courses created in partnership with Duke University will help you understand basic probability and the use of Bayes’ Rule through the use of R.

35Data Science (EdX)

This set of curated learning paths in data science can help you get accreditation in the field — if you’re willing to pay for it.

36Insight Data Science Fellowship

The Insight Data Science Fellowship is a special type of data science education program — it takes talented PhD. students who have already demonstrated technical skills and aptitude, and helps them bridge the gap between academia and industry with a postgraduate fellowship that combines the best of academic rigor with industry knowledge.

37Data Science (General Assembly)

General Assembly, one of the largest online education providers in the world, offers courses in data science.

38Galvanize

If you’re looking for an in-person experience to learn data science instead of something online, Galvanize can help. This link leads to the San Francisco experience — however, Galvanize itself is present in many different other cities.

39Coursereport Data Science Reviews

Here are some reviews of different data science courses in Coursereport — this will allow you to pick and choose between many different options with fair reviews from previous students on display.

40Switchup Data Science Reviews

Here are some more reviews of different data science courses, this time from Switchup, another course review site.

Books

learn data science

Oftentimes it’s not a great course that helps you learn the most — it can be one single resource within that course — say a particularly well-written book. This selection of data science books can help you understand data science in detail.

41Bayesian Methods for Hackers

This book, delivered as an extended Github repository, can help you understand Bayesian inference and how to think about probabilities by working through them in code. 

42Think Stats

This O’Reilly book helps you conceptualize statistical concepts by having you work with them in Python.

43Think Bayes

This book combines Python programming with Bayesian inference, and can be a handy resource in case the books above aren’t enough.

44Deep Learning

This free technical book by some of the scions of deep learning and artificial intelligence (Ian Goodfellow, Yoshua Bengio and Aaron Courville) will help you understand exactly how to think about deep learning and neural networks.

45Learn Python the Hard Way

In case you need a refresher on Python, Learn Python the Hard Way will help you break down exactly what you need to do to master Python. While it focuses on an older version of Python, the first principles taught here can be useful to those looking to freshen up their knowledge of Python — though you shouldn’t become overly dependent on this book as it has quite a rigid philosophy on one particular version of Python. 

46The Data Science Handbook

This Data Science Handbook curates insights from 25 data science leaders and distills what it truly means to work in this exciting new field.

47Data Science from Scratch

This book from O’Reilly goes into the first principles of data science, looking beyond the programming tools and frameworks.

48Storytelling with Data

This book will help you visualize insights that you find within your data and teach you how to communicate them effectively so that you can drive impact with your data findings.

49Exploratory Data Analysis with R

Roger D. Peng, an expert in statistics, has written this book to teach how to look through datasets with the R programming language.

50Interactive Data Visualization for the Web

This online book will teach you how to use frameworks such as D3.js to make your visualizations fully interactive on the web.

51Machine Learning Yearning

This book by Andrew Ng, the famous artificial intelligence leader who founded Coursera, is going to be released soon — sign up to get drafts of new chapters as they come in!

Curated Collections

learn data science

I know you’re looking for curated resources to learn data science. There’s more than just this list right here — and each collection will help you expand your knowledge and collection of great data science resources even further.

52Awesome Machine Learning

This Github repository follows the “Awesome” method of curating the best resources in a particular space — in this case, all the different resources you’d need to learn machine learning.  

53Awesome Deep Learning Papers

In case you ever wanted to get a handle on the science behind the amazing technology being built out of artificial intelligence, this awesome curation of deep learning papers will help you continually be on top of exciting new developments.

54Awesome TensorFlow

TensorFlow is an awesome deep learning framework: this Github repository will have everything you need to learn more.

55Awesome Data Science

This repository is everything it promises: an awesome curation of different data science resources.

56Data Analysis Learning Path (Springboard)

This learning path curates different resources in an intuitive fashion so that you can learn the data analysis skills required for data science.

57The Open Source Data Science Masters

This is a curated curriculum of free, open-source resources to learn data science — consider it a masters’ degree for a fraction of the price.

General Resources

learn data science

58A visual intro to machine learning

This interactive, visual view of data science in action can help you conceptualize data science, especially if you prefer to learn visually. 

59Deep Learning Review (Nature)

This paper summarizes some of the latest findings in deep learning and artificial intelligence and it is written by one of the founding fathers of modern artificial intelligence research: Geoffrey Hinton.

60Build a deep learning machine

This fun little tutorial by O’Reilly will teach you how to build a computer that you can use specifically for data science purposes.

61- How can I become a data scientist (Quora)

This Quora thread contains different thoughtful replies on how to become a data scientist — and includes a bevy of free resources to boot!

62Becoming a Data Scientist

This blog charts the author, Renee, and her path from being a SQL analyst to becoming a full-fledged data scientist.

Career Advice

learn data science

Becoming a data scientist is now a career path that many envy — however, getting started and placing yourself in a position where you are paid to practice data science doesn’t start and end with technical skills. Here’s a set of resources that will help spell out exactly what you need to do to have a successful data science career.

632015 Data Science Salary Survey (O’Reilly)

This salary survey by O’Reilly was curated from about 600 respondents who divulged their salary and what they did at work. It’s an informative read on what the average salaries are like in data science and what factors or technical skills can either increase your data science salary — or set it on the path to stagnating.

We already highlighted the 2016 survey as part of our general overview of data science, but the 2015 survey will add even more context on how the data science industry works — and how much you should expect to be paid.

64Guide to Data Science Jobs (Springboard)

This guide to Data Science Jobs by Springboard curates a variety of job seeker and hiring manager stories and seeks to inform you on every element of what it takes to get a data science job: from how to get hiring managers to notice your profile, to advice on what technologies and skills you should practice before doing a data science interview. 

65Guide to Data Science Interviews (Springboard)

This companion guide to the Guide to Data Science Jobs by Springboard runs you through different interview questions and exactly what hiring managers are thinking when they are on the other side of the table. It’s a comprehensive overview of the data science interview process — and it provides you actionable tips on how to ace the data science interview.

66Getting your first job in data science

This blog post goes over different general tips on how to get that first job in data science.

67Data Science Career Paths

This blog post by Springboard breaks down the difference between data analysts, data scientists and data engineers.

Datasets

learn data science

In order to really get started and to learn data science, you have to have datasets to play with. The following resources will link you to different datasets you can experiment with as you’re learning data science techniques and putting them into practice.

6819 Free Public Datasets (Springboard)

This curated list of 19 free public datasets will help you get started on your path to learn data science!

69Kaggle Datasets

This list of datasets curated by Kaggle comes with upvote functionality as well as comments, so you can exactly which datasets are the most exciting — and what work has already been done with them.

70Reddit Datasets

This subreddit can be a handy way to pick out new datasets, and see some of the most popular ones.

71Data.world

This new social network has evolved around sharing great datasets and bringing data fans together!

72Google BigQuery Datasets

Google BigQuery has open-sourced some interesting big data sets–from Reddit comments to Github activity.

73Quandl

Quandl is a search engine mostly used for financial and economic data. Comb through if you’re looking in that space for data to play with. 

74Public Big Datasets

This curated list of big datasets can help you practice with Hadoop or Spark.

75Wikipedia dumps

Wikipedia dumps data from its database and makes it free to analyze every so often. Sift through here if you want to query the world’s largest collection of knowledge on your quest to learn data science.

76Open Street Map

This collection of open-source geographic data extends around the world in its reach!

Resources/Blogs to Follow

learn data science

You’ll want to keep an eye on different resources and blogs that update frequently as you learn data science. This ensures that you’re always on top of the latest developments — and it can be a stimulating way to keep your data science skills sharp.

77Top data scientists to follow on Twitter

This is a list of data science influencers you’ll want to consider following to get to know more about the industry.

7850 of the best data science blogs

This curated list of data science blogs will help you find the best blogs to follow as you learn data science.

79Ultimate guide to data science blogs

This larger, extended guide to data science blogs has a lot more entries — feel free to take a look if you feel like you want something comprehensive to digest.

80KDNuggets

KDNuggets is one of the largest data science communities on the web, and their blog regularly posts interesting data science content.

81R-bloggers

R Bloggers is a data science blog focused on tutorials to learn R and different resources in the R ecosystem. 

82Dataconomy

Dataconomy focuses on larger trends in data science rather than many technical tutorials. It’s the data science blog with the largest focus on the European data science scene as well.

83Analytics Vidhya

Analytics Vidhya contains plenty of technical tutorials on many data science topics.

84Big Data Made Simple

Big Data Made Simple is a relatable blog that conveys different topics in data science in an approachable manner.

85Yhat blog

The Yhat blog is always filled with interesting tutorials and data science case studies.

86Machine Learning Mastery

Machine Learning Mastery focuses on the intricacies of machine learning.

87Learndatasci

Learndatasci is a blog that offers a broad overview of different data science topics.

88Mastersindatascience

Mastersindatascience is the resource to consult if you wanted to look at paid offerings to learn data science.

Newsletters

learn data science

If you want regular updates in your inbox on the latest news in data science, there’s no better way to do that than to subscribe to the following data science newsletters.

89Data Science Weekly

This weekly newsletter summarizes the latest tutorials and resources in data science. It’s a very useful resource if you’re looking to learn data science. 

90Data Elixir

Another data science newsletter that will keep you informed on the latest happenings in data science. 

91Python Weekly

This weekly Python newsletter curates a selection of the finest Python resources, many of them related to data science.

92Datafloq

This handy newsletter promises to be a one-stop shop for you when it comes to big data trends.

93- The Analytics Dispatch

Mode Analytics provides a dispatch to keep you informed on all things analytics and BI-related.

94- Postgres Weekly

This Postgres Weekly newsletter keeps you informed on the latest Postgres updates.

95- O’Reilly Data Newsletter

A premium data science newsletter, O’Reilly will often curate the best data science resources that have popped up.

Communities

learn data science

While newsletters and blogs are great, interactive communities where participants share articles and comment on them together can truly help you entrench your data science knowledge. Here are just a few of those communities where you can learn data science and interact with different data science practitioners.

96Datatau

Datatau is a sort of Hacker News for data science resources where data science practitioners discuss the latest news and upvote the best articles.

97Reddit Datascience

This subreddit deals with general data science topics.

98Reddit Machine Learning

This subreddit deals with more in-depth machine learning materials and discussions.

99Reddit Deep Learners

This subreddit deals with how to learn artificial intelligence and deep learning.

100Reddit Data is Beautiful

This subreddit contains impactful data visualizations that are visually appealing — and a true set of examples if you want to display your data in a beautiful manner.

101Data Science Stack Exchange

This subcomponent of the Stack Exchange network deals with technical questions and solutions in data science.

102Quora Data Science

This section of Quora is composed of many of the questions posed about data science — it is an awesome resource for those looking to learn data science. 

Hopefully the resources above have been helpful for you to learn data science: let me know in the comments below what you think about them or whether you think there are some I missed!

Learning Guides

How to do common Excel and SQL tasks in Python

How to do common Excel and SQL tasks in Python

The code and data for this tutorial can be found in this Github repository. For more information on how to use Github, check out this guide

Data practitioners have many tools that they use to slice and dice data. Some people use Excel, some people use SQL — and some people use Python. The advantages of using Python are obvious when it comes to certain tasks. You can process much bigger datasets at much faster speeds. You can use open source machine learning libraries built on top of Python. You can easily import and export data in different formats. 

Python can become an essential part of any data analyst’s toolbox due to its versatility. However, it can be hard to get started. Most data analysts are probably familiar with either SQL or Excel. This tutorial is structured to help you transfer over skills and techniques from those two programs to Python.

First, let’s get you set up on Python. The easiest way to get started is to use Jupyter Notebook and Anaconda. This visual interface will allow you to plug Python code in and immediately see the output of your results. It’ll make it easy for you to follow along with the rest of this tutorial as well.

I highly recommend using Anaconda, but this beginners guide will also help you with installing Python directly — though that’ll make following this tutorial harder. 

Let’s start with the basics: opening up a dataset.

IMPORTING DATA

You can import .sql databases and process them in SQL queries. On Excel, you could double-click a file and then start working with it in spreadsheet mode. In Python, there’s slightly more complexity that comes at the benefit of being able to work with many different types of file formats and data sources.

Using Pandas, a data processing library, you can import a variety of file formats using the read function. A full list of the file formats you can import using this function is in the Pandas documentation. You can import everything from CSV and Excel files to the whole content of HTML files!

One of the biggest advantages of using Python is the ability to be able to source data from the vast confines of the web instead of only being able to access files you’ve downloaded manually. The Python requests library can help you sort through different websites and take data from them while the BeautifulSoup library can help you process and filter the data so you get exactly what you need. Be careful of usage rights issues if you’re going to go down this route.

(Don’t worry if you want to skip this part, you can! The raw csv file is here, and you can download it at will if you’d rather start this exercise without taking data from the web. Or you can git clone the entire repository.)

In this example, we’re going to take a Wikipedia table of countries by their nominal GDP per capita (a technical term that means an amount of income a country earns divided over the number of its population), and use the Pandas library in Python to sort through the data.

First, let’s import the different libraries we need. For more information on how imports work in Python, click here.

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import re

We’ll need the Pandas library to process our data. We’ll need the numpy library to perform manipulations and transformations of numeric data. We’ll need the requests library to get HTML data from a website. We’ll need BeautifulSoup to process that data. Finally, we’ll need the regular expression library of Python (re) to change certain strings that will come up as we process the data. 

It’s not necessary to know much about regular expressions in Python, but they are a powerful tool you can use to match and replace certain strings or substrings. Here’s a tutorial if you wanted to learn more.

r = requests.get('https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita')

gdptable = r.text
soup = BeautifulSoup(gdptable, 'lxml')
table = soup.find('table', attrs = {"class" :"wikitable sortable"})

theads=[]
for tx in table.findAll('th'):
    theads.append(tx.text)

data =[]
for rows in table.findAll('tr'):
        row={}
        i=0
        for cell in rows.findAll('td'):
            row[theads[i]]=re.sub('\xa0', '',cell.text)
            i+=1
        if len(row)!=0:
            data.append(row)
print(data)

Credit to this website for some of the code.

Here’s a more technical explanation of how to grab HTML tables with Python code with more step-by-step instructions.

You can copy + paste the code above into your own Anaconda setup, and iterate with it if you want to play with some Python code!

The output from the code below, if you don’t modify it, is what is known as a list of dictionaries.

You’ll notice commas separating bracketed lists of key-value pairs. Each bracketed list represents a row in our dataframe, and each column is represented by the keys within: we are working with a country’s rank, its GDP per capita (expressed as US$), and its name (in ‘Country’).

For some more information on how data structures such as lists and dictionaries work in Python, this tutorial will help.

Thankfully, we don’t need to understand much of that in order to move this data into a Pandas dataframe, a similar way of aggregating data to a SQL table or an Excel spreadsheet. With one line of code, we’ve assigned and saved this data into a Pandas dataframe — as it turns out to be the case, lists of dictionaries are the perfect data format to be converted to a dataframe.

gdp = pd.DataFrame(data)

With this simple Python assignment to the variable gdp, we now have a dataframe we can open up and explore anytime we write out the word gdp. We can add Python functions to that word to create curated views of the data within. For a bit more of an in-depth look at what we just did with the equal sign and assignment in Python, this tutorial is helpful.

TAKING A QUICK LOOK AT THE DATA

Now, if we want to take a quick look at what we’ve done, we can use the head() function, which works very similarly to selecting a few rows in Excel or the LIMIT function in SQL. Use it handily to take a quick look at datasets without loading the whole thing! You can also insert a number within the head function if you want to look at a particular number of rows.

gdp.head()

The output we get are the first five rows of the GDP per capita dataset (the default value of the head function), which we can see are neatly arranged into three columns as well as an index column. Be aware that Python starts indexes at 0 and not 1, such that if you wanted to call up the first value in a dataframe, you’d use 0 instead of 1! You can change the number of rows displayed by adding a number of your choice within the parentheses. Try it out!

RENAMING COLUMNS

One thing you’ll quickly realize in Python is that names with certain special characters (such as $) can become very annoying to handle. We’ll want to rename certain columns, something you can do easily in Excel by clicking on the column name and typing over the old name and something you can do in SQL either with the ALTER TABLE statement or sp_rename in SQL server.

In Pandas, the way to do it is with the rename function.

gdp = gdp.rename(columns = {'US$':'gdp_per_capita'}) 

In implementing the above function, we’ll be replacing the column header ‘US$’ with the column header ‘gdp_per_capita’. A quick .head() function call confirms that this change has been made.

DELETING COLUMNS

There’s been some data corruption! If you look at the Rank column, you’ll notice that there are random dashes scattered throughout it. That’s not good, and since the actual number order is disrupted, this makes the Rank column quite useless, especially with the numbered index column that Pandas gives you by default.

Fortunately, deleting a column is easy with a built-in Python function: del. By selecting columns through the use of square brackets appended to the dataframe name.

del gdp['Rank']

Now, with another call to the head function, we can confirm that the dataframe no longer contains a rank column.

CONVERTING DATA TYPES WITHIN COLUMNS

Sometimes, a given data type is hard to work with.This handy tutorial will break down the differences between the different data types in Python in case you need a refresher.

In Excel, you could right-click and find ways of converting columns of data to a different type of data quite easily. You could copy a set of cells rendered by formulas and paste special as values, and you can use formatting options to quickly switch between numbers, dates, and strings. 

It’s not as easy in Python to switch between one data type to the other sometimes, but it’s certainly possible.

Let’s first use the re library in Python. We will regular expressions to replace the commas within the gdp_per_capita column so we can more easily work with that column.

gdp['gdp_per_capita'] = gdp['gdp_per_capita'].apply(lambda x: re.sub(',','',x))

The re.sub function essentially takes every comma and replaces it with a blank space. This following tutorial goes into each function of the re library in detail.

Now that we’ve gotten rid of the commas, we can easily convert the column into a numeric one.

gdp['gdp_per_capita'] = gdp['gdp_per_capita'].apply(pd.to_numeric)

Now we can calculate a mean for the column.

We can see that the mean of the GDP per capita column is about $13037.27, something we couldn’t do if the column were classified as strings (which you can’t perform arithmetic operations on). We can now do all sorts of calculations on the GDP per capita column that we weren’t able to do before — including filtering the columns by different values and determining what percentile rank values are for the column.   

SELECTING/FILTERING DATA

The basic need of any data analyst is to slice and dice a large dataset into actionable insights. In order to do that, you have to go through a subset of the data you have: this is where selecting and filtering data is very helpful. In SQL, this is accomplished with a mix of SELECT and different other functions, while in Excel, this can be done by dragging and dropping through data and implementing filters.

Using the Pandas library, you can quickly filter down with different functions or queries.

Let’s, as a quick proxy, only show countries that have a GDP per capita above $50,000.

This is how to do it:

gdp50000 = gdp[gdp['gdp_per_capita'] > 50000]

We assign a new dataframe with a filter that takes a column and creates a boolean variable — this function above essentially says “create a new dataframe for which there is a GDP per capita above 50000”. Now we can display gdp50000.

And now we see that there are 12 countries with a GDP above 50000!

Now let’s select only rows that belong to a country that start with s.

We can now display a new dataframe containing only countries that start with s. A quick check with the len function (a life-saver for counting the number of rows in a dataframe!) indicates that we have 25 countries that fit the bill.

Now what if we want to chain those two filter conditions together?

Here’s where chained filtering comes in handy. You’ll want to understand how this works before filtering with multiple conditions. You’ll also want to understand the basic operators in Python. For the purposes of this exercise you just need to know that ‘&’ stands for AND — and that ‘ | ‘ stands for OR in Python. However, with a deeper understanding of all basic operators, you can easily manipulate data with all sorts of conditions. 

Let’s go ahead and work on filtering countries that both start with ‘S’ AND that have a GDP per capita above 50,000.

sand500gdp = gdp[(gdp.gdp_per_capita > 50000) & (gdp.Country.str.startswith('S'))]

Now let’s work on those that start with S OR have over 50000 GDP per capita.

sor500gdp = gdp[(gdp.gdp_per_capita > 50000) | (gdp.Country.str.startswith('S'))]

There we go! We’re well on our way to working with filtered views in Pandas.

MANIPULATE DATA WITH CALCULATIONS

What would Excel be without functions that help you calculate different results?

Pandas in this case leans heavily on the numpy library and general Python syntax to put calculations together. We’re going to go through a simple series of calculations on the GDP dataset we’ve been working on. Let’s for example, calculate the sum total of all GDP per capita countries that are over 50,000.

gdp50000.gdp_per_capita.sum()

That’ll give you the answer of 770046. Using that same logic we can calculate all sorts of things — the full list can be located at the Pandas documentation under the computation/descriptive statistics section located on the menu bar at the left.

DATA VISUALIZATION (CHARTS/GRAPHS)

Data visualization is a very powerful tool — it allows you to share insights you’ve gained with others in an accessible format. A picture, after all, is worth a thousand words. SQL and Excel both have the capability to translate queries into charts and graphs. With the seaborn and matplotlib libraries, you can do the same with Python.

There are far more comprehensive tutorials on data visualization options — a favorite of mine is this Github readme document (all in text) which explains how to build probability distributions and a wide variety of plots in Seaborn. That should give you an idea of how powerful data visualization can be in Python. If you’re ever feeling overwhelmed, you can use a solution such as Plot.ly which might be more intuitive to grasp.

We’re not going to go through each and every data visualization option — suffice it to say that with Python, you’re going to have a lot more power to visualize things than anything SQL can offer, and you’ll have to trade-off the additional flexibility you gain with Python for how easy it is in Excel for generating charts from templates.

In this case, we’re going to build a simple histogram to show the distribution of GDP per capita for those countries that have more than $50,000 in GDP per capita.

gdp50000.hist() 

With this powerful histogram function (hist()) we can now generate a histogram that shows that most of the countries with a high GDP per capita cluster around the $50000 to $70000 range!

GROUPING AND JOINING DATA TOGETHER

Within Excel and SQL, powerful tools such as the JOIN function and pivot tables allow for the rapid aggregation of data.

Pandas and Python share many of the same functions that have been ported over from both SQL and Excel. You’ll be able to group data within datasets and join different datasets together. You can take a look here at the documentation. You’ll find that the join functionality offered by the merge function in Pandas is very similar to the one offered by SQL through the join command, while Pandas also offers pivot table functionality for those who are used to it in Excel.

We’re going to do a simple join here between the table we’ve developed with GDP per capita, and a list of world development indices from the World Bank.

Let’s first import the csv of country-level indicators.

country = pd.read_csv("Country.csv")

Let’s do a quick .head() function to take a look at the different columns in this dataset.

Now that we’re done, we can take a quick look and see that we’ve added a few columns that we can play with, including different years where data was sourced.

Now let’s merge the data:

gdpfinal = pd.merge(gdp,country, how = 'inner', left_on='Country', right_on = 'TableName')

We can now see the table incorporates elements of both our GDP per capita column and our new country-wide table with different data columns. For those familiar with SQL joins, you can see that we’re doing an inner join on the Country column of our original dataframe. 

Now that we have a joined table, we may want to group countries and their GDP per capita by the region of the world they’re in.

We can now use the group by functions in Pandas to play around with the data grouped by region.

gdpregion = gdpfinal.groupby(['Region']).mean()

What if we want to see a permanent view of groupby summation? Groupby operations create a temporary object that can be manipulated, but they don’t create a permanent interface to aggregated results that can be built upon. For that, we’ll have to go through an old favorite of Excel users: the pivot table. Fortunately, pandas has a robust pivot table function.

gdppivot = gdpfinal.pivot_table(index=['Region'], margins=True, aggfunc=np.mean)

gdppivot

You’ll see we’ve picked up some extra columns we don’t need. Fortunately, with the drop function in Pandas, you can easily delete several columns.

gdppivot.drop(['LatestIndustrialData', 'LatestTradeData', 'LatestWaterWithdrawalData'], axis=1, inplace=True)

gdppivot

Now we can see that the GDP per capita differs depending on the regions in different parts of the world. We have a clean table with the data we want.

This is a very superficial analysis: you’d want to actually do a weighted mean since a GDP per capita for each nation is not representative of the GDP per capita of every nation in a group since populations differ across the nations within a group.

In fact, you’ll want to redo all of our calculations involving means to reflect a population column for each country! See if you can do that within the Python notebook you’ve just started. If you can figure it out, you’ll have been well on your way to transferring your SQL or Excel knowledge to Python. 

Got any comments or questions? Please leave them in the comments section on this blog post 🙂 

Open News

Shake: The Bitcoin Debit Card Perfect for Travel or Anything Else

If you’ve ever been hit by foreign transaction fees, you’ll probably have remembered your dream trip around the world less fondly.

We live in a global economy, but the infrastructure to deal with it doesn’t seem to have caught up. Financial companies still charge you for the crossing of borders and you’re largely restricted to a set of charge and bank cards you have to collect in your mailbox or go to a branch to get.

Shake: A new Bitcoin debit card

Shake aims to change all of that. You can issue as many cards as you’d like digitally for a variety of expenses by loading them with Bitcoin.

Importantly, you can choose to issue a card in different foreign currencies. Foreign currency charges still apply if you charge a card differently than the currency you issued it in, but since Shake seamlessly allows you to create as many cards as you want, you can simply prevent those charges by making sure that you issue a card for every situation.

You can also choose to receive SMS notifications every time a transaction is approved or denied on your Shake Bitcoin debit card.

This bypasses several financial constraints. You can travel around the world without worrying about foreign transaction fees. You can load your card with Bitcoin, and spend it wherever you want in whatever currency you issue, bypassing stores that don’t accept Bitcoin. Shake allows you to take advantage of NFC (near-field communications) payment technology, the same technology that powers Apple Pay.

Shake uses Visa’s financial infrastructure to back an innovative approach to democratizing the spend of bitcoin that comes with the security and ease of use required for anybody to start spending their money around the world.

I played around with it and figured out that you could issue a Bitcoin debit card with no daily purchase limit. The first tier of cards (dubbed the KYC Level 1) allowes you to issue cards up to a value of $2,500 USD. If you want an unlimited amount, you’ll have to upgrade to the KYC Level 2, though that’s free of charge. The interface was slick and easy to navigate: in other words, nothing like your typical experience with a bank.

While I was there, I thought I glimpsed a bit of the financial future, one where transactions were as seemless and as costless as possible, and one where banks cared about end users in every way. I don’t know if Shake will be a large part of that vision in the future, but I do know they are moving the needle on it, and that given the right moves, the company could help transform financial transactions.

For now though, they’re in Alpha, and Shake is merely your key to unlocking an ultramodern financial system, on-demand–which for most people, may be more than they’ll ever need.

Learning Guides

Python List Comprehension: An Intro and 5 Learning Tips

Python list comprehension: an introduction and 5 great tips to learn

Python list comprehension empowers you to do something productive with code. This applies even if you’re a total code newbie. At code(love), we’re all about teaching you how to code and embrace the future, but you should never use technology just for its own sake.

Python list comprehension allows you to do something useful with code by filtering out certain values you don’t need in your data and changing lists of data to other lists that fit specifications you design. Python list comprehension can be very useful and it has many real-world applications: it is technology that can add value to your work and your day-to-day.

To start off, let’s talk a bit more about Python lists. A Python list is an organized collection of data. It’s perhaps easiest to think of programming as, among other things, the manipulation of data with certain rules. Lists simply arrange your data so that you can access them in an ordered fashion.

Let’s create a simple list of numbers in Python.

numbers = [5,34,324,123,54,5,3,12,123,657,43,23]
print (numbers)
[5, 34, 324, 123, 54, 5, 3, 12, 123, 657, 43, 23]

You can see that we have all of the values we put into the variable numbers neatly arranged and accessible at any time. In fact, we can access say, the fifth variable in this list (54) at any time with Python list notation, or we can access the first 5 and last 5 values in the list.

print(numbers[:5]); print(numbers[-5:]); print(numbers[4])
[5, 34, 324, 123, 54]
[12, 123, 657, 43, 23]
54

If you want to learn more about how to work with Python lists, here is the official Python documentation and an interactive tutorial from Learn Python to help you play with Python lists.

Python list comprehensions are a way to condense Python for loops into lists so that you apply a formula to each value in the old list to create a new one. In other words, you loop a formula or a set of formulae to create a new list from an old one.

What can Python list comprehensions do for you?

Here’s a simple example where we filter out exactly which values in our numbers list are below 100. We start by applying the [ bracket, then add the formula we want to apply (x < 100) and the values we want to apply it to for (x in numbers -> numbers being the list we just defined). Then we close with a final ] bracket.

lessthan100 = [x < 100 for x in numbers]
print (lessthan100)
[True, True, False, False, True, True, True, True, False, False, True, True]
#added for comparision purposes
[5, 34, 324, 123, 54, 5, 3, 12, 123, 657, 43, 23]

See how everything above 100 now gives you the value FALSE?

Now we can only display which values are below 100 in our list and filter out the rest with an if filter implemented in the next, which is followed by the if trigger.

lessthan100values = [x for x in numbers if x < 100]
print(lessthan100values)
[5, 34, 54, 5, 3, 12, 43, 23]

We can do all sorts of things with a list of numbers with Python list comprehension.

We can add 2 to every value in the numbers list with Python list comprehension.

plus2 = [x + 2 for x in numbers]
print (plus2)
[7, 36, 326, 125, 56, 7, 5, 14, 125, 659, 45, 25]

We can multiply every value by 2 in the numbers list with Python list comprehension.

multiply2 = [x * 2 for x in numbers]
print(multiply2)
[10, 68, 648, 246, 108, 10, 6, 24, 246, 1314, 86, 46]

And this isn’t just restricted to numbers: we can play with all kinds of data types such as strings of words as well. Let’s say we wanted to create a list of capitalized words in a string for the sentence “I love programming.”

codelove = "i love programming".split()
codelovecaps = [x.upper() for x in codelove]
print(codelove); print(codelovecaps)
['i', 'love', 'programming']
['I', 'LOVE', 'PROGRAMMING']

Hopefully by now, you can grasp the power of Python list comprehension and how useful it can be. Here are 5 tips to get you started on learning and playing with data with Python list comprehensions. 

1) Have the right Python environment set up for quick iteration

When you’re playing with Python data and building a Python list comprehension, it can be hard to see what’s going on with the standard Python interpreter. I recommend checking out iPython Notebook: all of the examples in this post are written in it. This allows you to quickly print out and change list comprehensions on the fly. You can check out more tips on how to get the right Python setup with my list of 11 great resources to learn and work in Python.

2) Understand how Python data structures work

In order for you to really work with Python list comprehensions, you should understand how data structures work in Python. In other words, you should know how to play with your data before you do anything with it. The official documentation on the Python website for how you can work with data in Python is here. You can also refer again to our resources on Python.

3) Have real-world data to play with

I cannot stress enough that while a Python list comprehension is useful even with pretend examples, you’ll never really understand how to work with them and get things done until you have a real-world problem that requires list comprehensions to solve.

Many of you came to this post with something you thought list comprehensions could solve: that doesn’t apply to you. If you’re one of those people who are looking to get ahead and learn without a pressing problem, do look at public datasets filled with interesting data. There’s even a subreddit filled with them!

Python list comprehension with code(love)

Real-world data with code(love)

4) Understand how to use conditionals in list comprehensions

One of the most powerful applications of Python list comprehensions is the ability to be able to selectively apply different treatments to different values in a list of values. We saw some of that power in some of our first examples.

If you can use conditionals properly, you can filter out values from a list of data and selectively apply formulas of any kind to different values.

The logic for this real-life example comes to us from this blog post.

Imagine you wanted to find every even power of 2 from 1 to 20.

In mathematical notation, this would look like the following:

A = {x² : x in {0 … 20}}

B = {x | x in A and x even}

square20 = [x ** 2 for x in range(21)]
print(square20)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]
evensquare20 = [x for x in square20 if x % 2 == 0]
print (evensquare20)
[0, 4, 16, 36, 64, 100, 144, 196, 256, 324, 400]

In this example, we first find every square power of the range of numbers from 1 to 20 with a list comprehension.

Then we can filter which ones are even by adding in a conditional that only returns TRUE for values that when divided by 2 return a remainder of 0 (even numbers, in other words).

We can then combine the two into one list comprehension.

square20combined = [x ** 2 for x in range(21) if x % 2 == 0]
print(square20combined)
[0, 4, 16, 36, 64, 100, 144, 196, 256, 324, 400]

Sometimes, it’s better not to do this if you want things to be more readable for your future self and any audience you’d like to share your code with, but it can be more efficient.

5) Understand how to nest list comprehensions in list comprehensions and manipulate lists with different chained expressions

The power of list comprehensions doesn’t stop at one level. You can nest list comprehensions within list comprehensions to make sure you chain multiple treatments and formulae to data easily.

At this point, it’s important to understand just what list comprehensions do again. Because they’re condensed for loops for lists, you can think about how combining outer and inner for loops together. If you’re not familiar with Python for loops, please read the following tutorial.

This real-life example is inspired from the following Python blog.

list = [(x,y) for x in range(1,10) for y in range(0,x)]
print(list)
[(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2), (4, 3), (5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (6, 0), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (7, 0), (7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (7, 6), (8, 0), (8, 1), (8, 2), (8, 3), (8, 4), (8, 5), (8, 6), (8, 7), (9, 0), (9, 1), (9, 2), (9, 3), (9, 4), (9, 5), (9, 6), (9, 7), (9, 8)]

If we were to represent this as a series of Python for loops instead, it might be easier to grasp the logic of a Python list comprehension. As we move from the outer loop to the inner loop, what happens is that for each x value from 1 to 9 (for x in range(1,10)), we print out a range of values from 0 to x.

for x in range(1,10):
    for y in range(0,x):
        print(x,y)
1 0
2 0
2 1
3 0
3 1
3 2
4 0
4 1
4 2
4 3
5 0
5 1
5 2
5 3
5 4
6 0
6 1
6 2
6 3
6 4
6 5
7 0
7 1
7 2
7 3
7 4
7 5
7 6
8 0
8 1
8 2
8 3
8 4
8 5
8 6
8 7
9 0
9 1
9 2
9 3
9 4
9 5
9 6
9 7
9 8

The chain of for loops we just went over has the exact same logic as our initial list comprehension. You’ll notice though that in a for loop, you will print seperate values while in a list comprehension it will produce a new list, which allows us to use Python list notation to play with the data.

With this in mind, you can make your code more efficient and easily manipulable with a Python list comprehension.

I hope you enjoyed my introduction to Python List Comprehensions. If you want to check out more content on learning code, check out the rest of my content at code-love.com! Please comment if you want to join the discussion, and share if this created value for you 🙂

Interactive Items

The best programming language for beginners to learn (an interactive list)

[playbuzz-item url=”//www.playbuzz.com/rogerhuang10/what-is-the-best-programming-language-for-beginners-to-learn-interactive”]

At code(love), we believe in interactivity and participation.

Instead of treating our opinions as facts, we’ve decided to let people from all over vote and comment on what the best programming language for beginners is. Feel free to participate: all input is valued!

Comment below on why you voted for the programming language you chose, and argue your way civilly with proponents of other languages.

Learning Lists

11 Great Resources to Learn and Work in Python

Python is one of our favorite languages at code(love). Versatile, and yet easy to grasp, it’s one of the best languages at expressing the logic behind code with a simplicity that is sometimes breathtaking in its elegance.

If you happen to be more practical, Python always ranks among one of the programming languages that draws the highest median annual salaries, hovering around the magic $100,000 USD mark.

Despite how simple it is, Python is also surprisingly powerful. It can help introduce you to the basics of machine learning, it can slice and dice relatively big datasets for you, and it can even help you build entire web platforms. Pinterest often uses Python to serve millions of images around the world.

The language itself grows ever more versatile with its community. If you want to join this healthy, vibrant network of builders and learn how to do awesome things with Python, you’ve come to the right place. Here are eleven places you should start.

Python is poetry

Python is poetry

1-Read about Python

Learn Python the Hard Way was my first introduction to Python and several programming concepts. Author Zed A. Shaw made the book accessible online for free, but he has a special place in his heart and inbox for people who pay the small sum of $29.95. The practical exercises within are well worth going through. Make sure you write out as much of the code as possible: it’s only through mastery of the basics that you can become an expert.

2-Watch Python Videos

If you’re more of a visual learner, you can learn about the fundamentals of using Python for the web with this excellent free Udacity course. Of course, there’s more where that came from, with a variety of courses from everything to data fundamentals in Python to machine learning. I went through the series myself, and though it’s a bit long (and there are a lot of exercises that I didn’t think added that much value), the end result was that I came out of the tutorial with a deeper understanding of how data moves across the web.

You can also catch plenty of Python videos on Coursera, Treehouse and Udemy.

Udacity with code(love)

Udacity with code(love)

3- Look through lists of Python Learning Resources

This might be a little bit meta, but I love lists of resources. One of the hidden secrets to finding those great resources are going through Github repositories. Github is the Google Docs of code, a great collection of “repositories” where coders can “commit” their code to a shared codebase. It’s also a place where people love compiling great collections of programming resources.

This particular link above is a favorite collection of mine. I hope you enjoy it as much as I do.

4- Anaconda and iPython Notebook

Anaconda and iPython Notebook are what I commonly refer to as the “Excel” of Python. It can be hard to work with the Python interpreter (the command line prompt where you enter Python code if you install it from Python.org) as is. You can’t really refer back to the work that you’ve done before very easily without saving a whole variety of Python files, and it can be pretty hard to share your code with the web at large in HTML form, especially with different charts and graphs and a structured flow you want to convey that goes beyond just one Python script.

iPython Notebooks allow you to write your code in Notebook form.

iPython Notebook with Python

This is what Notebook form looks like.

Python Interpreter with code(love)

This is what the Python interpreter looks like. Source: http://2.bp.blogspot.com/-Duisv8kz1l0/T9q30qpexeI/AAAAAAAAAAM/hxsQB-tLt7E/s1600/python-interpreter.png

Anaconda and iPython Notebook make it intuitive and visually appealing to organize different Python software modules, and bring them together so that you can work and show your results as easily as possible with nbviewer, which generates a HTML version of your Notebooks that you can share on Github. A lot of popular modules we talk about like Pandas are pre-installed, saving you some time. When you click on the next link, you’ll see exactly what it looks like using iPython Notebook.

5-Slice and dice data with Pandas

Built on the aforementioned iPython Notebook, Julia Evans has created a “cookbook” for the Pandas module, a collection of Python code that can help you handle relatively large data sets with ease.

Python can only help you process what you can fit in memory on your computer, but that’s more than enough for most of your data needs. Pandas will help you efficiently process that data: you’ll be able to read from very large CSVs and clean them up so you can find great data insights and visualize them (more on that in point #10!)

6-Build something small with Flask

Flask is what is termed as a micro-framework, a set of code that you can lean on to build small web projects. It has a bunch of reuseable components that help you build interactive websites that can both receive and transmit data. Give it a try: in a few lines of codes, you can get something interactive going on the web!

7-Build something big with Django

If you’re tired of the word micro, and want to go with a full web framework, build something with Django! Django is used to this day to build very large websites including Pinterest, and Instagram.

django with code(love)

Take a bite out of the web with Django!

8-Play around with Python APIs and even more!

We had a list of learning resources before on Github, now we can explore a list of the things that make Python awesome! I especially love using Python to play with Application Programming Interfaces or APIs. APIs are a set of rules for servers to communicate data with one another: what this means is that with Python, you can scrape your personal fitness information from your Fitbit or work with Google Sheets automation easily. You can do anything that involves getting data from a server willing to give it to you.

You’ll find a list of really cool APIs above that will allow you to play with all sorts of cool data!

9-Do some machine learning with Python

Have you heard of machine learning? It’s all the rage today and the reason why is because it allows you to do more with less. By having machines learn patterns in your data and by being able to infer conclusions from smaller data sets to larger populations with their insights, machine learning lets you know more about the world around you with less data points.

This Github repository offers a fantastic dive into the fundamentals of machine learning, and gets you to practically embark on your machine learning adventure with sample code sets.

10-Tell data stories with Plotly

Data doesn’t mean anything unless you can storytell with it. You can throw all the numbers in the world at people but it won’t mean they’re any closer to understanding your point. You really have to break down your data into meaningful chunks for it to go anywhere.

Thankfully, Plot.ly can help with that. With a few lines of Python, you’ll be well on your way to doing bar graphs, charts, and figures of all kinds.

Plotly with code(love)

An example of what you can do with Plotly!

11-Do coding challenges in Python

Now that you’re done learning all of the fun stuff in Python, it’s time to put yourself up to the test! Use HackerRank challenges to test your skills: you could even get a job out of it!

HackerRank allows you to complete problems in the coding language of your choice and allows you to demonstrate your skill with clean code that solves problems in a short amount of time.

Python is a wonderful language for programming beginners, and powerful enough to explore multiple areas of data, machine learning, artificial intelligence and other advanced computer science concepts. It’s the perfect mix for anybody who is getting into programming or who wants to develop their skills further. With these resources you’ll be able to learn and work in Python!

Share this list of resources if it can help somebody–and let me know what else could be added to this list in the comments 🙂

Source for featured image: http://www.slideshare.net/audreyr/python-tricks-that-you-cant-live-without

Open News

The 15 Most Popular Programming Languages on Github

There’s always a lot of questioning when it comes to the most popular programming languages in use.

Github, the network of programming repositories, is always a good place to gauge programmer activity and the trending languages you want to know.

Loggly, “a fast-growing startup helping thousands of cloud-centric organizations to turn log file data into insight and action” has helped do the hard work of finding those languages. Here are the results:

15 Most Popular Programming Languages on Github

15 Most Popular Programming Languages on Github

Technology and Society

The real reason why net neutrality matters

A lot of people think the core of net neutrality is site speed: the amount of time information is served to users. They’re partially right, but there’s a fundamental flaw in keeping the explanation to just those confines.

The Internet at its core is a bunch of servers (computers up 24/7) that receive HTTP requests from clients: your web browser or mine.

The whole point of the Internet is that it abstracts away physical location so that you can consume data created elsewhere: data in the form of textual input/images/ and technical assets such as CSS/Javascript files (NYT’s digital website) or video (Netflix) or in the case of things like Kimono which creates what is known as an Application Programming Interface out of static websites, a structured auto-updated data feed that can be interpreted by your server so you can, for example, scrape data from Yahoo Finance and create your own auto-updating personal dashboard of leading stock picks.

Now the reason why the net neutrality debate has focused on bandwidth and speed of transfer rather than the fundamentals of the Internet are because most people approach it from a user point of view rather than a server/builder point of view, as there are vastly more Internet users than builders so we focus on the paid connections clients have to use to access servers.

Net Neutrality with code(love)

Net Neutrality with code(love)

The crux of the debate isn’t that your Netflix is slower than it should be or that the “tubes” carrying data are filled up and so you will get shittier Internets.

The real core of the debate is that from the builder side, if one were to discriminate based on content type or volume, services like blogs, peer-to-peer cryptocurrency, and more would be threatened because as soon as they show business viability, a monopoly in another industry can arbitrarily decide to toll them either to discourage that growth or to profit from it as much as possible.

This kills innovation. We saw it with the destruction of Google Wallet and the degradation of bittorrent. We will see it when the next Netflix or Spotify fails to ever start because the cost of paying monopoly fees at an early stage will crush any hopes of late-stage returns.

The real argument around net neutrality is whether you trust a monopoly of telecom companies, users, or the government to determine what services the Internet should provide.

I obviously prefer users, but given that the power of the government is being balanced with corporate power, I lean towards the former not because I love governmental intervention but because it is the lesser of two evils. The US government barring its recent spate of backdoor hacking has done a reasonably good job with, for example, giving more power to ICANN (the organization responsible for managing the domain name system) so that innovation is spurred by non-government sources.

Meanwhile, new technologies have constantly been attacked by ISPs.

https://www.techdirt.com/articles/20…-problem.shtml

“Even in the U.S., there have been some major violations by small and large ISPs. These include:

The largest ISP, Comcast, secretly interfering with peer-to-peer technologies, including some of the most popular basic technologies used to distribute online TV and music (2005-2008);

A small telephone ISP called Madison River blocking Vonage, a company providing competing telephone service online (2005);

Apple blocking Skype on the iPhone, subject to a secret contract with AT&T, a company that competes with Skype in providing telephone service (2008-2009);

Verizon, AT&T, and T-Mobile blocking the functionality of Google Wallet on Nexus devices, while all three of those ISPs are part of a competing mobile payments joint venture called Isis (late 2011- +today);

and Comcast’s disputes with Level 3 and Netflix over termination fees, and the appearance that Comcast is deliberately congesting its network connections to force Netflix to pay Comcast for an acceptable connection (2010- +today).

In other countries, including democracies, there are numerous violations. In Canada, rather than seeking a judicial injunction, a telephone ISP used its control of the wires to block the website of a union member during a strike against that very company in July 2005. In the Netherlands, in 2011, the dominant ISP expressed interest in blocking against U.S.-based Whatsapp and Skype.”

I don’t want to live in a world where monopolistic ISPs determine what innovations thrive and which ones die.

IN SUMMARY

The fundamental problem in net neutrality isn’t how fast services can be rendered to clients, it’s that if ISPs have their way, those services users want will never get the chance to prove themselves and survive.

Photo credit: https://www.flickr.com/photos/36540382@N08/3419555567/