Roger Huang

Roger has worked in user acquisition and marketing roles at startups that have raised $200M+ in funding. He taught himself machine learning and data science in Python, and has an active interest in all sorts of technical fields. He’s currently working on boosting personal cybersecurity (youarecybersecure.com).


The Best Programming Language to Learn: a Definitive Guide.

Most people who approach me ask the same question: what’s the best programming language to learn? The answer is: it depends. I once wrote an article arguing that the mathematical and analytical skills behind programming are what really matter. Now I’m a bit wiser, and I’ve had the time to break that down into a more tangible and useful answer.

What is the best programming language to learn? It depends, and you can be much more efficient with your time by knowing which programming language is the best for what you want.

So I’ve broken down the best programming language to learn for a variety of needs. I took into consideration the amount of time you need to invest in a programming language and the power you need for different tasks.

You want a versatile, general-purpose language that can be applied to different tasks without too much hassle.


Python is a programming ecosystem with a vast array of communities and libraries for different use cases. From Django for web development to Pandas for data, Python is the Swiss Army knife of programming languages. Its syntax is also very approachable, and there are tons of tutorials and documentation for beginners. That said, each of these libraries can feel almost like learning a new syntax or paradigm.

Still, the ability to import libraries of different kinds and have a relatively consistent experience puts Python at the top of this list. If you want a simple, intro-level programming language, Python is a great choice. With the second most active community on GitHub (slightly under 15% of all active users), you’re sure to find many projects and usable components to play with in Python.

Python Resources:

Learn Python

This step-by-step tutorial teaches Python in an accessible manner. It makes it easy for you to go through the basics of everything from data structures to how to structure functions. That makes it ideal for people who don’t have programming experience.

11 Beginner Tips for Learning Python

This set of tips is a handy primer not only for learning Python but also for learning and practicing all kinds of other programming languages.

Zen of Python

The Zen of Python is more philosophical than practical. Still, it serves as a useful reminder of the ideals of Python programming and the ideals one should strive for. Simple, after all, is better than complex.

Codecademy: Learn Python for Free

This free interactive Codecademy course is a great way to start with Python basics and syntax. Use it to cement the theory you’ve learned and start practicing with Python.

Web Development Using Python and Django

Python is versatile mostly because of its wealth of documentation and frameworks. Django is a web framework built on Python. This curated curriculum will help you learn what you need to build fully-fledged websites with Python by tapping into Django.

You’re interested in working with data, in a data analysis or data science capacity or as a data engineer/machine learning engineer


When it comes to the data ecosystem, you’ll want to learn SQL as a domain-specific way to work with data. However, SQL is not a general-purpose programming language. It is a query language for storing, retrieving, and transforming data held in relational databases (tables of rows and columns). You’ll still need a general-purpose language for everything around those queries.
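To make the distinction concrete, here is a minimal sketch of SQL running inside a general-purpose language. It uses Python’s built-in sqlite3 module with an in-memory database. The sales table and its columns are hypothetical and exist only for illustration.

```python
import sqlite3

# Create a throwaway in-memory database with hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0)],
)

# The SQL describes *what* result we want: total sales per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 150.0), ('west', 80.0)]
conn.close()
```

The SQL string handles the querying. Python handles the data loading and whatever analysis comes next.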

There are two obvious choices here: R or Python. Academics tend to use R, which used to have the bulk of good data visualization and analytics libraries. Now, however, the open-source Python community has sprinted to catch up, and with the advent of machine learning, the balance has shifted towards Python.

A few years ago, I wrote about R vs. Python and concluded that both had their uses, so it was perhaps best to learn both. Practically speaking, however, if you’re dealing with large amounts of data, Python gets the slightest of edges as the best programming language to learn for data purposes. This is especially true if you’re coming from a programming background, since you’ll find Python’s syntax easier to work with than R’s.

Python and Data Resources:

Data Science Sexiness, R vs Python

I wrote this guide describing the differences between R and Python, and listed a bunch of learning resources for both. I concluded it might be best to learn both, but I’ve since become immersed in the Python ecosystem when it comes to data.

Introduction to the Machine Learning Stack

I wrote this tutorial to summarize the frameworks and libraries you need to know to get started with machine learning. Most of them have Python ports or APIs, so you can write code in Python (or at least in Pythonic syntax) and get started.

Pandas Cookbook

Pandas was where I really started practicing programming: wrangling datasets is a passion of mine. This tutorial walks through how to use Pandas with an example dataset. You’ll learn how to import data of different formats, transform it in different ways, and then extract and export it.

How to do Common Excel and SQL Tasks in Python

Another tutorial I wrote helps you port some of the logic and functions in both Excel and SQL to Python. Do everything from importing data to analyzing it in the summary form or filtered form you’ve come to expect.
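Here is a rough illustration of that kind of port, as a minimal sketch using only Python’s standard library. It reads CSV data, filters rows (like an Excel filter or a SQL WHERE clause), and summarizes by group (like a pivot table or SQL GROUP BY). The inline CSV and its column names are made up for the example. In practice you would likely reach for Pandas for this kind of work.

```python
import csv
import io
from collections import defaultdict

# Hypothetical inline data standing in for a CSV file or spreadsheet export.
RAW = """name,department,salary
Ana,Engineering,120000
Ben,Marketing,70000
Cho,Engineering,95000
Dee,Marketing,82000
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Filter: like an Excel filter or SQL "WHERE salary > 80000".
high_earners = [r["name"] for r in rows if int(r["salary"]) > 80000]

# Summarize: like a pivot table or SQL "GROUP BY department" with AVG(salary).
by_department = defaultdict(list)
for r in rows:
    by_department[r["department"]].append(int(r["salary"]))
averages = {dept: sum(vals) / len(vals) for dept, vals in by_department.items()}

print(high_earners)  # ['Ana', 'Cho', 'Dee']
print(averages)      # {'Engineering': 107500.0, 'Marketing': 76000.0}
```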

Machine Learning in Python

This curated curriculum takes your Python skills and helps you learn machine learning theory. By pairing the two, you can start working on machine learning projects by the end.

You want to build mobile applications that require access to native functionalities such as a phone camera


The best language depends heavily on which ecosystem you want to build in. If it’s the Android ecosystem, you’re going to have to learn Java (or Kotlin, which Google also officially supports).

Meanwhile, if you’re interested in building for the iOS ecosystem and getting placed on Apple’s App Store, you’ll have to learn Swift. Swift is Apple’s official programming language for building apps for macOS, iOS, and the Apple Watch.

There are other ecosystems, such as Microsoft’s, which uses C#. There are also cross-platform frameworks such as React Native, which uses JavaScript. Microsoft doesn’t have as much market share as either Android phones or iPhones. React Native doesn’t have access to as many of the specific native functions on either device, so you may have to write native modules in Swift or Java to get those features. Still, these are handy tools to know about, even if they might not be the best choice unless you’re trying to launch on as many platforms as possible.

Resources:

Android Application Development

Learn the ins and outs of Android application development, from building an application to debugging common issues.

Introduction to Swift

This interactive set of courses will help you get through the basics of Swift and building iOS applications. Once you’re done, you can move on to an intermediate course.

400+ Swift Language Video Tutorials

If video learning is more of your thing, look no further than this series of video tutorials on Swift topics. They’re broken down into sets of continuous playlists, so you can pick and choose a particular curated playlist or choose a particular topic to focus on.

React Native Tutorial

This React Native tutorial and documentation from Facebook is a fairly comprehensive overview of how the versatile cross-platform framework works.

Expo

If you want to speed up your app development cycle across multiple platforms and want to stick to using JavaScript for your mobile app coding, Expo can be a quick, iterable solution.

You want to build the latest web applications


For web development, PHP used to be the default, powering everything from e-commerce sites to WordPress itself. Now though, most people have shifted to JavaScript and different frameworks within it. There’s a bit of a fight going on between the major tech companies on building web development interfaces, with Google sponsoring Angular.js while Facebook builds React.js. In practice, these mega-corporations are building the most recent web development frameworks for their needs and then open-sourcing and supporting the developments.

Both are building on JavaScript, so if you want to build the latest and greatest web applications and benefit from their work and others’, look no further. JavaScript is the best programming language to learn for cutting-edge web development. It boasts the most active community on GitHub, with over 22% of all active users participating in the JavaScript community.

JavaScript Resources:

An Introduction to JavaScript

This wiki gives you a broad overview of JavaScript and how it serves web content. You’ll understand basic concepts like how JavaScript interacts with browsers once you’re done. You can then take the next step towards learning more powerful frameworks.

jQuery Intro

jQuery is a popular JavaScript library that lets you do things such as animations with a single function call. Use this tutorial to grasp the basics, and combine it with HTML and CSS to serve dynamic web content easily.

How to Learn React — A roadmap from beginner to advanced

This roadmap will help you plan how to learn JavaScript frameworks like React.js.

React.js, Codecademy

React.js is a powerful framework to create web interfaces. Practice with this Codecademy course.

MEAN Stack Tutorial

This tutorial ties your theory-based learning together by building a MEAN stack application: a Reddit clone. It is a full-featured web app with user authentication, a MongoDB database, routing and linking through Express, a back-end server on Node.js, and a front end built with Angular (though React can also be used here). By the end of this tutorial, you should be able to extend what you’ve learned and build full-fledged web apps.

You need to do something that requires very high performance (ex: cryptography)


For tasks that require a lot of compute power and control over lower-level processes such as dynamic memory allocation, it’s best to work in C++. Lower-level tasks require more efficient use of memory and space, and involve working closer to the hardware to get higher performance. C++ is lower-level than all of the languages discussed so far. It is still readable enough, though, that with some practice you can become conversant in it.

Python can reach lower-level code through Cython, a Python-like language that compiles to C and can call C and C++ functions. Bitcoin’s reference implementation, Bitcoin Core, is written in C++, including its advanced cryptographic features. To do something at a highly performant level, you’ll likely need C++ and its superior lower-level flexibility.

C++ Resources:

Introduction to C++

This edX course, provided by Microsoft, will help you get started with C++ and its basics.

C++ Language

This wiki helps you tackle C++ from A to Z. There are different sections dedicated to everything from how to write functions in the language to how to deal with different variables and types.

C++ Codecademy

Consolidate all of the theory you’ve learned by practicing with this free C++ course with Codecademy.

Cython Tutorial

Cython allows you to access C++ functions while using Python, combining the versatility of the Python ecosystem with the power of C++.

C++ Cryptography Libraries

If you want to look into advanced functions such as cryptography, look through this list of C++ cryptography libraries to get you started.

—–

I hope this tutorial has helped you determine the best programming language for you to learn. If you have any questions, feel free to ask me at [email protected]. Please leave a comment below if you want to give feedback or if you think I’m missing something 🙂


A Comprehensive Introduction to Quantum Computing

If you’ve heard about quantum computers, you might get the itch to start working on something in the field. What is quantum computing? How do you get started?

Full disclosure: I’m not an expert in the field. I’m just a regular (self-taught) coder. I compiled this tutorial because I was interested in exploring quantum computing. The goal was to define the use cases that made it stand out from classical computing. I also didn’t want to dive too deep into the quantum physics part. Many of the explanations below will be basic, and assume that you have little context in quantum computing.

Also, if this is inartfully explained, or flagrantly wrong, I welcome feedback and will make corrections. And if this is helpful, I appreciate knowing as well 🙂

Introduction to Quantum Computing

Unlike classical computing, quantum computing relies on quantum-mechanical phenomena such as superposition and entanglement. Classical binary code stores data as either a definite 0 or a definite 1. Quantum computing uses qubits: units of data that can coherently exist in a combination of the 0 and 1 states, each with its own probability. In theory, a qubit’s state carries more information than a classical bit. Unfortunately, it is impractical to extract a large amount of information from a qubit, because measurement disturbs a quantum system. To get any further, we have to define three concepts.

Quantum superposition: Quantum superposition allows qubits to coherently hold many states of data at once until they are measured. A piece of data can coherently be in two states before it is measured as one. The most well-known illustration of this is Schrödinger’s cat. This thought experiment posits that a cat in a sealed box might be simultaneously alive and dead, depending on the probability that a poison has been released inside. Only once the observer opens the box is the final state of the cat revealed. Quantum superposition works, metaphorically, the same way.

Quantum superposition is what makes quantum computing extraordinary. For certain problems, the ability to superimpose extraordinary amounts of data allows for much faster calculations than classical computing can achieve. Mathematically speaking, quantum superposition allows qubits to be linear combinations of different quantum states rather than fixed, mutually exclusive categories. This is why a qubit’s state is richer than the strictly binary value of a classical bit, even though measuring it still yields a single 0 or 1.
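Here is a minimal classical simulation in plain Python that makes “linear combination” concrete. It is not a quantum program, and the function names are illustrative. It represents one qubit as two complex amplitudes, applies a Hadamard gate to put |0⟩ into an equal superposition, and computes measurement probabilities as the squared magnitudes of the amplitudes.

```python
import math
import random

# A single qubit state a|0> + b|1> is a pair of complex amplitudes
# satisfying |a|^2 + |b|^2 = 1.
ZERO = (1 + 0j, 0 + 0j)

def hadamard(state):
    """Apply the Hadamard gate, which maps |0> to an equal superposition."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

def probabilities(state):
    """Born rule: the chance of measuring 0 or 1 is the squared magnitude."""
    a, b = state
    return (abs(a) ** 2, abs(b) ** 2)

def measure(state, rng=random):
    """Measurement returns a single classical bit, 0 or 1."""
    p0, _ = probabilities(state)
    return 0 if rng.random() < p0 else 1

plus = hadamard(ZERO)
p0, p1 = probabilities(plus)
print(round(p0, 3), round(p1, 3))  # 0.5 0.5
print(measure(plus))               # 0 or 1, at random
```

The state holds two continuous amplitudes, but each measurement still produces only one classical bit.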

Quantum entanglement: Entanglement refers to the correlation between different quantum-level particles. If one entangled particle is measured with a clockwise spin, its entangled partner might be measured with a counter-clockwise spin, no matter the distance between them. Entanglement has even been demonstrated with large molecules and some small diamonds.

Entanglement means you have to read a whole system of data rather than individual data points. The “information” contained in entangled quantum data includes how the entire system is structured, so you cannot isolate it from individual particles or parts.

This is where the constraints of quantum computing begin. Quantum states can capture more data, but you have to capture the entire entangled system to do something useful with it. Recent scientific advances in extending the lifetime of quantum entanglement have helped push quantum computing further.
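A similar plain-Python sketch shows the correlation that entanglement produces. It is again a classical simulation, with illustrative names. It prepares the two-qubit Bell state (|00⟩ + |11⟩)/√2 and samples joint measurements. Each qubit on its own looks like a fair coin, but the two results always agree. This is a “same outcome” Bell state rather than the opposite-spin example in the text, but both are entangled.

```python
import math
import random

# Two-qubit state as amplitudes over the basis |00>, |01>, |10>, |11>.
s = 1 / math.sqrt(2)
bell = {"00": s, "01": 0.0, "10": 0.0, "11": s}

def sample(state, rng):
    """Measure both qubits at once; each outcome's probability is |amplitude|^2."""
    outcomes = list(state)
    weights = [abs(amp) ** 2 for amp in state.values()]
    return rng.choices(outcomes, weights=weights)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
results = [sample(bell, rng) for _ in range(1000)]

# Each qubit alone is random...
first_is_one = sum(r[0] == "1" for r in results) / len(results)
# ...but the pair is perfectly correlated: "01" and "10" never occur.
always_agree = all(r[0] == r[1] for r in results)
print(round(first_is_one, 2), always_agree)
```

You only see the correlation by looking at both qubits together, which matches the point above about reading the whole system.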

Quantum decoherence: Decoherence is the bogeyman of quantum computing. Whenever quantum states interact with their environment, including an observer, they start to decohere, meaning information gets lost as time goes on. Quantum decoherence is a major bottleneck to quantum computing at scale.

TLDR (too long; didn’t read): Quantum computers are amazing because they can encode a lot of data into quantum states rather than just the old “0,1” of physical binary code. For certain problems, they can make simultaneous calculations orders of magnitude beyond what your regular computer can do.

Yet you have to deal with the messy problem of entangled quantum particles. You have to read the state of the whole system rather than its individual components, and you have to do all that before the system loses coherence with the passage of time.


Quantum Use Cases

What can that extraordinary quantum computational power allow you to do beyond classical computing if you’re able to capture the data in a coherent manner? Here are some examples.

Quantum annealers

Perhaps the most well-known example of quantum computing is D-Wave. One common misconception is that D-Wave is building full quantum computers; they’re really building quantum annealers. What’s the difference? In summary, a quantum annealer is designed to find a local, “good enough” minimum of an optimization problem, potentially much faster than a classical computer. That makes quantum annealers promising for factoring numbers and network analysis/optimization. Some complex machine learning models may also run on a quantum annealer in much less time if you don’t care as much about finding the absolute best answer. Yet quantum annealers are not set up to run general quantum algorithms.

Boeing uses quantum annealers to facilitate plane research, and healthcare providers use them to calculate optimal radiation treatment plans for cancer patients.

Yet you won’t be able to run Shor’s algorithm, or any other general quantum algorithm, on a D-Wave quantum annealer. That means you couldn’t use D-Wave to fully break cryptographic schemes (except on a limited basis). That requires a universal gate-based quantum computer, a different beast than a quantum annealer.
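Quantum annealing itself can’t be shown in a few lines of ordinary Python. Its classical cousin, simulated annealing, does illustrate the “good enough” minimum idea above. The search accepts some uphill moves early on, at a high “temperature,” so it can escape local minima, and then it gradually cools. This is a swapped-in classical technique, not how D-Wave hardware works. The function being minimized and the cooling schedule are arbitrary choices made for the example.

```python
import math
import random

def energy(x):
    """A bumpy 1-D function with many local minima (arbitrary example)."""
    return x * x + 10 * math.sin(3 * x)

def simulated_annealing(f, x0, steps=20000, t0=10.0, seed=1):
    """Classical simulated annealing; returns the best (x, f(x)) it visited."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-9         # linear cooling schedule
        candidate = x + rng.uniform(-0.5, 0.5)  # small random move
        fc = f(candidate)
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

x, fx = simulated_annealing(energy, x0=8.0)
print(round(x, 2), round(fx, 2))
```

The result is a low point of the function, but nothing guarantees it is the global minimum. That trade-off is the same “good enough” one described for annealers.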

Shor’s algorithm

There is a comprehensive catalog of about 50 quantum algorithms. Among the most interesting is Shor’s algorithm, which can efficiently find the prime factors of very large integers. When people talk about securing devices, blockchains and more for a “post-quantum” world, they mean a world where a quantum computer can run Shor’s algorithm and break parts of modern cryptography, such as RSA.
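Shor’s algorithm has a quantum part and a classical part. The quantum computer’s job is to quickly find the period (order) r of a^x mod N. Once r is known, ordinary arithmetic turns it into factors. The sketch below brute-forces the period classically. That is exactly the step that becomes infeasible for large N and that a quantum computer would speed up. It is a toy for tiny numbers like 15, not an implementation of Shor’s algorithm itself, and the function names are illustrative.

```python
import math

def find_order(a, n):
    """Smallest r > 0 with a**r % n == 1, assuming gcd(a, n) == 1.
    Brute force: this is the step Shor's algorithm does efficiently
    on a quantum computer."""
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r

def factor_from_order(a, n):
    """Classical post-processing used in Shor's algorithm."""
    g = math.gcd(a, n)
    if g != 1:
        return g, n // g          # lucky guess: a already shares a factor
    r = find_order(a, n)
    if r % 2 == 1 or pow(a, r // 2, n) == n - 1:
        return None               # this choice of a fails; try another
    half = pow(a, r // 2, n)
    return math.gcd(half - 1, n), math.gcd(half + 1, n)

print(find_order(7, 15))         # 4
print(factor_from_order(7, 15))  # (3, 5)
```

For N = 15 and a = 7, the powers of 7 cycle through 7, 4, 13, 1. That gives r = 4, and gcd(7² ± 1, 15) yields the factors 3 and 5.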

Grover’s algorithm

Grover’s algorithm helps invert functions. Usually, given an input X you compute the output Y. Here, given a desired output Y, you can find the input X that produces it. For N unstructured possibilities, it needs only about √N queries instead of the roughly N that a classical search requires. This is useful for database search: you can check whether a given X is present in a certain set of data. It could also, in principle, be used to reverse-engineer user credentials, which might allow attackers to create counterfeit blocks on a blockchain or steal user passwords.
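Here is a small classical simulation of Grover’s algorithm in plain Python. Grover’s search needs only about (π/4)·√N oracle queries to find one marked item among N. A classical search needs about N. The sketch searches N = 8 items for one arbitrarily chosen marked index. It alternates the oracle step (flip the sign of the marked amplitude) with the diffusion step (reflect every amplitude about the mean). Simulating the amplitudes on a classical machine gives no speedup; it only shows the mechanics.

```python
import math

def grover(n_items, marked):
    """Simulate Grover search over n_items with a single marked index.
    Returns the final measurement probabilities and the iteration count."""
    # Start in an equal superposition over all items.
    amps = [1 / math.sqrt(n_items)] * n_items
    iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked item's amplitude.
        amps[marked] = -amps[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amps) / n_items
        amps = [2 * mean - a for a in amps]
    return [a * a for a in amps], iterations

probs, its = grover(8, marked=5)
print(its)                 # 2
print(round(probs[5], 3))  # 0.945
```

After just two iterations, the marked item is measured with about 94.5% probability. Each other item has under 1%.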

Quantum algorithms in general

There are plenty of algorithms that run better in quantum settings than in classical computing. The roughly 50 examples range from verifying matrix products to solving Pell’s equation, with polynomial to superpolynomial (exponential) speedups over their classical counterparts. Whether those speedups hold up under rigorous testing, however, is still an open academic question.

Quantum programming frameworks

Now that you’ve run through some of the theory, what programming frameworks are out there to implement quantum computing concepts?

Qiskit

Qiskit is an open-source quantum computing framework developed by IBM. You can run it on quantum computers built by IBM through the IBM Q platform. This gives educators, researchers and businesspeople a first look at the possibilities of quantum computing without owning a quantum computer themselves.

Resource: Qiskit-tutorials, available on GitHub, is a series of Jupyter notebooks that cover the basics of programming with Qiskit. They are community notebooks that serve as both an interactive tutorial and a wiki of sorts on quantum computing in general.

Q#

Called “Q Sharp”, Q# is Microsoft’s effort to join the quantum computing fray. Most Q# subroutines will run on a simulator instead of an actual quantum chip. Microsoft’s Visual Studio supports Q#. As Microsoft offers more quantum products, Q# is likely to become the de facto language of the Microsoft quantum computing ecosystem.

Resource: With this quickstart tutorial, Microsoft gets you up to speed with how to use Q#.

QCL

QCL is a high-level programming language for quantum computing that abstracts away some of the physics associated with quantum phenomena.

Resource: This simple primer explains the roots of QCL and its similarity to existing traditional programming languages. It also covers a few specific differences, such as the dump function, which returns the current quantum state of all qubits. These make QCL suited to quantum computing while keeping it comfortable for traditional computer scientists.

Project Q

Project Q is an open-source programming framework for quantum computing developed at ETH Zurich. It features a high-level programming language for quantum programming, the ability to customize the compiler, and specific libraries to solve for quantum problems. You can run Project Q on quantum simulators or run it on IBM’s 5-qubit quantum computer.

Resource: This GitHub repo of Project Q code examples serves as a useful reference and tutorial to explore.

Cirq

Cirq is Google’s effort to help developers work with today’s limited-qubit quantum computers, where errors (and correcting them) remain a chronic problem. It’s a Python library you can install via pip (pip install cirq), so you can access it right away if you’re running a Python environment.

Resource: Use this step-by-step Medium tutorial on Cirq to understand its capabilities.

D-Wave Leap

D-Wave Leap offers an interactive cloud platform where you can operate on D-Wave annealers online. You can work in Python and Jupyter notebooks and have immediate access to a D-Wave 2000Q quantum computer. You get a minute of free QPU time which you can use to solve between 400 and 4000 problems.

Resource: This link gives you access to a set of Jupyter notebooks where you can try D-Wave Leap.

Quantum Computing Careers

What are the career prospects in quantum computing? For now, the field is mostly academic, and there are few commercial use cases. A search on Indeed.com returns no results for “quantum computer” or “quantum programmer”. There are some research roles and internships, such as the following from Microsoft. With IBM, Microsoft and Google making big bets in the space, however, more quantum careers are likely to follow.


Quantum Computing Resources

If you want to follow the space, here are a few great communities and resources to keep track of.

Quantum Bits

This Medium publication hasn’t been updated recently, but it features many interesting articles on quantum computing concepts. Anastasia Marchenkova, a quantum physicist whose passion is quantum computing, writes most of the content.

Microsoft Quantum Computing Newsletter

Microsoft offers a newsletter dedicated to the latest quantum computing updates as well as industry news. While it’s focused on selling Microsoft products, you can still gain valuable insights here.

Quanta Magazine (Quantum Computing Section)

Quanta Magazine takes a different approach from the rest of the resources in this space, focusing on quality storytelling. It offers a compelling, story-driven overview of advances in quantum computing and the people who make them.

Stack Exchange (Quantum Computing)

The Stack Exchange for Quantum Computing offers deeper answers on quantum computing theory and quantum programming frameworks.

Reddit Quantum Computing

Check out this subreddit for the latest trending quantum computing discussions and articles. With over 10,000 subscribers, it is one of the largest communities dedicated to quantum computing.

Quantum Computing Courses

Quantum Learning Algorithms (Coursera)

Coursera offers this course from Saint Petersburg State University. It covers quantum algorithms, including the two most commonly discussed: Shor’s algorithm and Grover’s algorithm.

Quantum Machine Learning (edX)

This free course offered by the University of Toronto (with an optional verified certificate for $49 USD) covers the use cases of quantum computing in machine learning, and where machine learning can benefit from quantum computing advantages.

Quantum Computing for the Determined

This free video series from Michael Nielsen goes over the theory of qubits in detail, giving you an introductory view of quantum computing theory. Buckle up and finish the whole series, and you’ll be capable of tackling basic implementations of that theory.

Quantum Computation (MIT Open Courseware)

This free course on MIT’s open platform teaches the theory behind quantum computation. It is taught by Professor Peter Shor, the inventor of Shor’s algorithm.

Quantum Computing: Lecture Notes

This set of notes on quantum computing by Ronald de Wolf (a full-time professor at the University of Amsterdam) is a text-heavy and notation-heavy deep dive into quantum computing topics. Treat it as a textbook for whenever you need a deep dive on a particular subject.


I hope you enjoyed this introduction — I’d love feedback on what specific topics and resources I can build in the space. Comment below if you have any ideas!


What is Digital Literacy? A Comprehensive Guide

What is digital literacy?

The American Library Association gives the textbook definition: “Digital literacy is the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills.” At code(love), we think it has to go further.

Digital literacy involves a set of foundational skills required to navigate the 21st century. These new 21st-century skills will allow anybody to navigate today’s emerging technologies and will empower everybody to fully interface with the rich ecosystem of applications and digital services being developed.


Why does it matter?

With technical jobs such as data scientist offering high job satisfaction and high compensation, the ability to create and interact with new digital technologies has never been more important.

Digital literacy skills are needed to thrive in a world where many of the world’s richest companies are software and hardware technology companies such as Facebook, Google, and Microsoft. 

It also matters because of the flip side. 72% of Americans are scared of a future where robots and machines do most of the jobs currently done by humans. That’s almost twice as many as those excited about that possibility. The divide in politics doesn’t seem to be between liberals and conservatives so much as between people who embrace the future and people who are afraid of it.

Today’s students are going to confront a world that is very different from the one their high schools and universities are preparing them for. Even so-called digital natives will need to quickly improve their information literacy skills for the 21st century.

The digital divide between those who are digitally literate and those who are not will soon extend to wealth and life outcomes across the board as the digital world takes over.

Given how much it matters, we have to dig deeper into the specific components that underlie digital literacy and these new literacy skills.

What are the specific components of digital literacy?

  1. The ability to find relevant and reliable information
  2. The ability to work with applications
  3. The ability to build a relevant audience
  4. The ability to build a website
  5. The ability to make payments and hold balances securely 
  6. The ability to understand and control your own data 
  7. The ability to understand new technologies

Let’s look at each item in depth:

1- The ability to find relevant and reliable information

Helping people find relevant information is how Google has built a multi-billion dollar business. In 2017, people were producing about 2.5 quintillion bytes of data a day, most of it unstructured and hard to query. The Internet isn’t just the world’s largest container of data; it is also the largest attempt at structuring and classifying that data.

To be digitally literate, you should be able to navigate that vast realm of data, pick out the pieces you need, and find your way around the web.

This is an increasingly relevant skill in a world where media sources are disputed and where more and more convincing replicas of human behavior are being created. Take a look at this photorealistic video of President Obama, whose words were completely faked using artificial intelligence. The ability to tell what information is relevant, credible and substantive is critical for digital literacy.

Digital content can be filled with inaccuracies, so determining reliable sources is a critical digital skill. This new form of media literacy means understanding digital media well enough to get the best information possible, and it is a core part of 21st-century skills.

A nation with many digital citizens should have ready internet access, a way to curate and access information, and a way to quickly get relevant data.  

Sample Stat: Only 17% of people were illiterate in 2018. This is a reversal from 1820, when only 12% of the world could read and write. Hopefully, digital literacy will follow the same trend, so that 80% of people will be able to find relevant information on the Internet.

Skills Required:

  • Reading comprehension
  • Writing or voice-to-text capability
  • The ability to quickly navigate search engines and get the most relevant results
  • The ability to authenticate information via secondary sources
  • The ability to verify providers of information and data

In general, you should be able to write out or communicate your search intent in a way that helps frame the most helpful results, understand how search engines surface certain results and the algorithms they use to determine the best results, and you should be able to quickly evaluate new sources of data for authenticity and reliability.

Resources:

Global Search Engine Market Share In The Top 15 Countries By GDP

This Medium article uses StatCounter data to suss out which search engines have the most penetration and market share in each market. Google tends to dominate in most countries with above 70% search engine market share, though Yandex leads in Russia, Baidu leads in China, and Yahoo has a significant share as a search engine in Japan.

How to Search on Google: 31 Google Advanced Search Tips

This guide to search modifiers will help you narrow your searches to exactly the sort of information you’re looking for on the world’s most popular search engine.

2- The ability to work with applications

The world runs on different digital applications. If you’re a salesperson or somebody who has to chase down a list of people as part of your work, you’ve probably used customer relationship management software to keep track of everybody.

Your day-to-day routine might involve looking through social media applications and all sorts of different work and productivity apps, from spreadsheet software to document processors. Understanding how to work with these tools is a critical part of digital literacy.

The ability to navigate online communities, social networks and more and leave your own digital footprints is a critical part of digital citizenship as well — without participating in the digital discourse and lending your voice to it, your perspective may get lost in a world that has shifted from analog to digital.

Sample Stat: There were 171.8 billion mobile app downloads worldwide in 2017.

Skills Required:

  • Reading comprehension
  • Writing or voice-to-text interface capability
  • The ability to quickly navigate application user interfaces
  • The ability to navigate accessibility issues
  • The ability to use shortcuts
  • The ability to recognize app interface cues

Resources:

Usability 101: Introduction to Usability

This handy guide explains what makes a website easier to use and lays out a process for making apps more usable. It then covers why usability itself is critical. These ten usability heuristics dig into the rules behind making sites easy to use.

Accessibility for iPhone and iPad: the Ultimate Guide

This guide runs through how to interact with an iPhone or iPad, two of the most popular screen interfaces for browsing the web. Learn how to do everything from using voice commands to increasing the legibility of text.

3- The ability to build your own website

From being an application user, the next important step for digital literacy is to be able to build your own online media. In order to be fully digitally literate, it’s important not just to be a consumer and user, but also a producer or curator.

Having the ability to build your own website opens up a whole new world of potential. It is akin to the writing aspect of literacy: it marks the difference between merely browsing and absorbing the Internet and being able to broadcast one’s thoughts on it, taking full advantage of the two-way street the Internet was always meant to be.

You can build simple webpages without a line of code, to do everything from displaying your CV and portfolio to sharing your thoughts on different matters. You might build a virtual store to sell your wares, or a site to promote your business. With some basic knowledge of code, you can build much more.

Sample Stat: There were about 1 billion websites in 2015 and close to 2 billion in 2018. Of those 2 billion, only about 200 million (roughly 10%) are active.

Skills Required:

  • Reading comprehension
  • Writing or voice-to-text interface capability
  • Ability to work with applications/landing page generators
  • Ability to interact with text editors
  • Ability to understand basic HTML/CSS and ideally some JavaScript

Resources:

HTML and CSS Basics

This interactive tutorial helps cover the steps and resources you’d need to understand HTML and CSS, the building blocks of the modern Internet. Once you understand HTML and CSS, you’ll understand how the skeletons of websites are built, and you’ll be able to analyze different webpages.

Website Builders

This review of different website builders gives you a handy way to build your own webpages even if you don’t know any code. 

4- The ability to build a relevant audience

Reddit co-founder Aaron Swartz once said that “Everybody has the right to speak on the Internet, what matters is who is heard.”

The ability to create a website or application means very little if you don’t understand how to draw a relevant audience to it, and if you don’t understand how content is surfaced to users around the world.

Writing something, after all, isn’t the same as sharing it with millions of people around the world. Making an impact on the Internet means getting your content seen by a targeted audience at scale.

This means working with digital marketing techniques and understanding how to spread content with social media and a variety of digital tools. It means knowing how search engines rank content and then using that knowledge to help showcase your content to people around the Internet.

Sample Stat: Of the Alexa Top 50 websites by visitor traffic, the top ten represent only three countries: India, China, and the United States.

Skills Required:

  • Reading comprehension
  • Writing or voice-to-text interface capability
  • Ability to use analysis and statistics tools for web traffic such as Google Analytics
  • Understanding of social media platforms and how to use them to distribute content
  • Understanding of search engines and how to use them to distribute content
  • Understanding of how social communities evolve on the Internet, and how to post and distribute content within those communities (ex: Reddit).

Resources:

Digital Marketing Made Simple: A Step-By-Step Guide

Neil Patel has made his living building large audiences for his ventures. Here he walks through all of the different tactics and approaches you can use to build your own relevant audience on the web.

SEO Starter Guide

This guide by Google will help you understand what it takes to rank in their search engine index. While everybody can create content, it’s really content that holds staying power in search engine rankings that creates lasting impact. Getting ranked on Google and other search engines the right way and with the right relevant keywords will certainly help you drive relevant audiences.

5- The ability to make payments and hold balances securely

As the Internet gradually moves to a place where payments become part of the infrastructure, to become digitally literate is to combine your financial ability with your technological capabilities.

A decade ago, only about 5% of all retail operations were conducted on the Internet in the United States: now in those same categories, about 13% of retail sales are conducted online. In 2017, online retail sales to American customers crossed the $450bn mark, with rapid year-on-year growth of 16% from 2016.

With a growing number of payment processors vying to help you send money online, from Apple Pay to China’s WeChat Pay, it’s clear that e-commerce, unlike during the dot-com bust of the early 2000s, is here to stay.

This trend has been accentuated by the rise of blockchain technologies and cryptocurrencies, new and entirely virtual forms of money, and accelerated by the drive toward online banking. With virtual assets coming into play and more real-world assets being digitized, the ability to securely maintain balances and handle transactions online will grow ever more important.

Sample Stat: According to a survey of 2,000 Americans, only about 8% of Americans hold digital cryptocurrencies.

Skills Required:

  • Reading comprehension
  • Ability to write or give voice-to-text commands
  • Basic statistics knowledge
  • Understanding of safe practices around authentication and passwords
  • Understanding of financial interfaces around value transfer
  • Basic knowledge of how to maintain privacy and security on the Internet

Resources:

The 15 Most Popular Online Payment Solutions

The following list of payment solutions will get you introduced to the services that help you both receive and send payments online.

Free Introductory Course to Digital Currencies

This online video series teaches the foundations of digital currencies and how they have evolved into the current stage of financial and technological innovation. It covers the basics of the blockchain, Bitcoin, and cryptocurrencies.

6- The ability to understand and control your own data 

We all generate data as we interact with the Internet. A critical part of understanding the Internet and how to use it safely and consensually is knowing what data is captured from us, and where and how we can consent to particular uses of it. We can then weigh the attention and data we give a company against the utility that the company provides us.

We can also keep our data private, deliberately choose who we share it with and for what purpose, and consciously avoid companies that violate our data principles. By browsing the Web, we constantly give away data about ourselves; having control over that data lets us keep our privacy and security while still benefiting from applications.

Sample Stat: 93% of Americans believe it is important to be in control of who gets information about them.

Skills Required:

  • Reading comprehension
  • Ability to write or give voice-to-text commands
  • Understanding of how data is transmitted and processed on the web
  • Understanding of what data is used for
  • Basic knowledge of how to maintain privacy and security on the Internet

Resources:

How to Encrypt Your Entire Life in One Hour

This handy guide will walk you through how to leave as little of a digital profile as possible by using encrypted chat and by making sure that the data you share with the world is the sort of data that you want shared.

Europe’s New Privacy Law Will Change the Web

This article covers the sweeping changes that new European privacy legislation (GDPR) will bring, and serves as a case study of how legislation can affect collective and individual data rights.

7- The ability to understand new technologies

As new technologies evolve, the ability to master them serves as the ultimate foundation of digital literacy. In order to be fully digitally literate, you need to have the foundation to be able to anticipate new technological advances, and to be fully ready to be an early adopter or creator with new trends.

We live in an age where each year brings drastic innovation, from biotechnology advances that give individuals the power to modify genomes to artificial intelligence models that can do tasks that once would have taken thousands of humans. Understanding those advances and creating with them makes your digital literacy flexible and adaptable to new developments, just as a full grasp of literacy allows you to understand and take in new ideas.

Sample Stat: Americans are more afraid of robots than death.

Skills Required:

  • Reading comprehension
  • Ability to write or give voice-to-text commands
  • Basic statistics knowledge
  • Ability to work with applications/landing page generators
  • Ability to understand basic HTML/CSS and ideally some JavaScript
  • The ability to quickly navigate search engines and get the most relevant results
  • The ability to authenticate information via secondary sources
  • The ability to verify who is a provider of information and data

Resources:

Learning How to Learn

Drawing on her background in engineering, Dr. Oakley introduces powerful mental frameworks and tools for working quickly and efficiently with new information and challenges. It’s a strong primer on how to adapt to an ever-changing world where information is king.

The Gartner Hype Cycle

The Gartner Hype Cycle walks through the stages of excitement a new technology goes through, and how that excitement can settle into lasting change. You can use it as a framework for placing new technologies in context.

Digital literacy shouldn’t just be a rehash of literacy principles for the digital age and our new digital world. It should be a whole new set of metrics and capabilities that can be measured as an indicator of whether countries, nation-states, and their citizens are ready for the 21st century. By evolving our understanding of what digital literacy means, we can more meaningfully prepare people for a future too many are currently afraid of.

Data Science/Artificial Intelligence, Learning Lists

21 of the best free resources to learn SQL

I taught myself SQL after I bombed a technical interview that involved it. I was a bit mad at myself, so I started looking for resources to help me practice and learn SQL. I didn’t want to spend any money, so I focused on the best free resources. The list below is the fruit of my efforts. I hope it helps you on your journey to learn SQL.

Learning SQL can be handy. You might want to do data analysis with it, or use it to answer questions you have about your data. A quarter of data users work with MySQL databases, and to get at that data, you’ll need to learn SQL.


1- Codecademy

Codecademy is one of my go-to resources for learning programming because its interactive exercises give you immediate feedback, which makes practicing and learning programming fluid and rapid. Its SQL section meets the same bar as the rest of its courses, and of course, it’s free.

2- Stanford Databases Mini-Course

This free self-paced mini-lesson on databases and SQL allows you to follow along with what top computer science students at Stanford are learning — all this at a self-directed pace, and for the total cost of zero dollars.

3- Head-First SQL (O’Reilly)

This O’Reilly book, Head First SQL, dives deep into SQL topics, with tons of examples and explanations to help you learn SQL. You can read it for free with a free trial of O’Reilly’s Safari product (though the trial will run out). I’ve classified it as a free resource because O’Reilly doesn’t seem to have paywalled it.

4- SQL Tutorial – W3Schools

W3Schools offers a text-plus-exercises tutorial that breaks down individual SQL functions. I like looking through examples of individual functions, especially when I’ve forgotten the nuances of one or another. Several other providers do the same thing as W3Schools, but for me, W3Schools is the best of the lot.

5- SQLZoo

SQLZoo is one of the coolest free SQL resources out there. The set of rich, interactive exercises placed in real-world settings is about as close as you’ll get to working with SQL in complex production-level environments without actually being hired. I used it a lot to practice SQL as I was picking up data analysis skills and I still use it every once in a while as a refresher.

6- Mode Analytics

Mode Analytics offers a cool, intuitive interface along with a case-study approach built around Crunchbase data. It can help you rapidly place your SQL skills in a context similar to real-world use, and once again, it’s free. Mode offers this educational resource partly to upsell you to its paid product, but that type of content tends to be good: because it isn’t sold on its own, the team’s focus is on delivering maximum quality and substance for your learning objectives.

7- Sololearn 

This Sololearn course is interesting because it uses quizzes to make sure you retain material as you learn it. It follows the conventional structure: first teaching you how to select and query data, then how to sort and manipulate it within a table, and finally how to join and modify tables. Notably, there isn’t much here in the way of aggregation functions or subqueries, which other courses above will teach.
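To make that select, sort, join, and modify progression concrete, here is a minimal sketch using Python’s built-in sqlite3 module with an in-memory database. The `customers` and `orders` tables and their rows are made up for illustration. SQLite’s dialect also differs in small ways from MySQL or SQL Server.

```python
import sqlite3

# In-memory SQLite database; the tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers (id, name, city) VALUES
    (1, 'Ada', 'London'), (2, 'Grace', 'New York'), (3, 'Linus', 'Helsinki');
INSERT INTO orders (customer_id, amount) VALUES
    (1, 30.0), (2, 12.5), (1, 7.5), (3, 50.0);
""")

# 1. Select and filter rows.
london = cur.execute("SELECT name FROM customers WHERE city = 'London'").fetchall()

# 2. Sort results, largest amount first.
by_amount = cur.execute("SELECT id, amount FROM orders ORDER BY amount DESC").fetchall()

# 3. Join two tables on a shared key.
joined = cur.execute("""
    SELECT c.name, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# 4. Modify a table.
cur.execute("UPDATE customers SET city = 'Portland' WHERE name = 'Linus'")
conn.commit()

print(london)     # [('Ada',)]
print(joined)     # [('Ada', 30.0), ('Grace', 12.5), ('Ada', 7.5), ('Linus', 50.0)]
```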

8- SQL for Beginners (Youtube)

If you prefer learning from video, this one-hour tutorial on SQL might do the trick. It starts with the basics of SQL, then moves on to walking through different functions and examples. It is, however, meant as an introduction: don’t expect to become an SQL expert just by watching it.

9- SQL Tutorial

SQL Tutorial offers a W3Schools-like interface without the exercises. If none of the resources above work for you, this is another free resource where you can learn SQL functions one by one.

10- SQLCourse

SQLCourse offers a curriculum with specific exercises for different functions. It is another handy way to practice your SQL skills for free. The interface is a bit janky, but it’s workable.

11- Welcome to SQL (Khan Academy)

If you prefer a curated set of video tutorials covering different topics, look no further than Khan Academy’s introductory SQL course. It runs from the basics through aggregating data, with projects along the way. If you’re already on Khan Academy, this could be a natural way to start learning SQL.

12- TutorialsPoint

Yet another text-plus-examples approach to teaching SQL, TutorialsPoint offers more of the same. Still, if you’re looking for more practice or different ways to digest SQL material, it might be handy for you.

13- SQLBolt 

SQLBolt is constructed a bit like a virtual, interactive book on SQL. You can read through it and then go through each exercise, feeling like you’re being guided on your journey by the book-like structure. If you want a more focused and curated approach to learning SQL — this might be it for you.

 

14- GalaXQL

Another interactive SQL course, meant to run in the browser. It’s free to use, but it may be quite resource-intensive while running (I’m not sure whether that’s down to the tutorial’s age or just an honest warning). Give it a try if you want to take a look.

15- Use the Index 

Use the Index is a very useful advanced SQL training site. Once you have the basics down, check it out to learn how to optimize your SQL databases and queries so you can get maximum performance from them.
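Here is a rough illustration of what index tuning looks like. The sketch asks SQLite, through Python’s sqlite3 module, for its query plan before and after adding an index. The `events` table and `idx_events_user` index are hypothetical. The exact wording of SQLite’s plan output varies between versions, so the sketch only checks whether the index name appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 1000, "click") for i in range(10_000)],
)

query = "SELECT * FROM events WHERE user_id = ?"

def plan(sql, params):
    # EXPLAIN QUERY PLAN returns rows whose last column describes each step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = plan(query, (42,))  # without an index: a full table scan
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan(query, (42,))   # with the index: a search using idx_events_user

print(before)
print(after)
```

The query returns the same rows either way. Only the access path changes, which is the kind of difference Use the Index teaches you to read.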

16- Schemaverse

Schemaverse is a game built entirely on SQL. I’m a big fan of using games to motivate people to learn. I think it could give a big boost to your motivation and your SQL skills if you want to learn but get bored out of your mind with conventional tutorials or videos.

17- SQL Server Microsoft Documentation

The Microsoft documentation around SQL Server can help you understand how to use their software to manage SQL databases and tables — it’s offered for free, so you can take advantage and have a look even if you’re just curious.

18- SQLFiddle

SQLFiddle allows you to upload data and play around with different SQL queries to see what data comes back. It is a great way to practice your SQL skills, especially if you don’t have a real-world database to play with. You can take .sql files from elsewhere, upload them, and then try to manipulate the data contained within or query certain selections of data. It is also free to use — which never hurts.
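If you’d rather experiment offline, Python’s sqlite3 module can play a similar role: load a SQL script, then query it. This is a minimal sketch, not SQLFiddle itself. The script is an inline string standing in for a .sql file, and the `employees` table is made up. The query also demonstrates a subquery, one of the topics mentioned earlier in this list.

```python
import sqlite3

# Stand-in for the contents of a .sql file; with a real file you might use
# open("dump.sql").read() (the filename is hypothetical).
SQL_SCRIPT = """
CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Ana', 'eng', 120), ('Ben', 'eng', 100),
    ('Cai', 'ops', 90),  ('Dee', 'ops', 70);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SQL_SCRIPT)

# A subquery computes the overall average salary (95); the outer query keeps
# only the people paid above it.
above_avg = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

print(above_avg)  # [('Ana',), ('Ben',)]
```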

19- MySQL Sandbox

MySQL Sandbox allows you to install several MySQL servers in sandbox mode, so you can rapidly experiment, test, and learn with different data files. It can be a useful way to practice what you can do with multiple MySQL instances.

20- Stack Overflow (SQL) 

One of my favorite free resources for learning SQL, or really any programming language, is Stack Overflow. It’s an online Q&A community focused on technical topics, where you can ask experts questions or browse solutions to existing problems. It’s a fantastic community to immerse yourself in while learning SQL, and the dedicated SQL tag keeps you focused on SQL topics.

21- r/LearnSQL (Reddit)

Reddit is another great community for learning, where users post and rank different resources as the days go by. Take part in discussions here, or use the articles and discussions posted to push your own learning forward.


Hopefully, all these free resources are helpful to you on your journey to learn SQL — they were helpful to me as I started learning. While I leaned on some a bit more than others, I’m always a big fan of curating as many resources as possible because my learning style (very text-heavy) may not be the same as yours.

A lot of research has tested whether matching how a concept is presented to a learner’s preferred learning style improves learning. Few of those models pass the test, but digesting new material through several different modes or media formats does seem to help.

So if you want to learn SQL, try a few of these resources, then stick with the ones you like while attacking the learning problem from many different angles.

Data Science/Artificial Intelligence, Learning Guides

Learn machine learning with Python: a free curated curriculum

How to learn data science and deep learning in Python

I recently wrote an 80-page guide to getting a programming job without a degree, drawn from my experience helping students do just that at Springboard. This excerpt focuses on how to learn machine learning in Python.

How to learn machine learning in Python is a very popular topic: with the rise of artificial intelligence, programmers have been able to do everything from beating human masters at Go to replicating human-like speech. At the foundation of this fantastic technological advance are programming and statistics principles you can learn.

Here’s how to learn machine learning in Python:

Python Basics

Before you learn how to run, you have to learn how to walk. Most people who start learning machine learning and deep learning come from a programming background: if you do, you can skip this section. However, if you’re new to programming or you’re new to Python, you’ll want to take a look through this section.

Codecademy for Python

Codecademy is an online platform for learning programming, with free interactive courses that encourage you to fully type out your code to solve simple programming problems.

Introduction to Python for Data Science

This interactive Python tutorial is created by Datacamp, and is more suited to introducing how Python basics work in the context of data science.

11 Great Resources to Learn and Work in Python

This list of resources will point you to great ways to immerse yourself in Python learning. It’s a broad list filled with different resources that will help you, no matter your learning style.

Installing Jupyter Notebook

These are instructions for installing Jupyter Notebook, an intuitive interface for writing and running Python code. You’ll have all of the important Python libraries you need pre-installed, and you’ll be able to easily export and show all of your work in an easy-to-visualize fashion. I strongly suggest that you use Jupyter as your default tool for Python, and the rest of this learning path assumes that you are.

Statistics Basics

In order to learn machine learning in Python, you not only have to learn the programming behind it — you’ll also have to learn statistics. Here are some resources that can help you gain that fundamental knowledge.

Khan Academy, Math, and Statistics

Khan Academy is the largest source of free online education with an array of free video and online courses. This section on Khan Academy will teach you the basic statistics concepts you need to know to understand machine learning, deep learning and more — from mode, median, mean to probability concepts.
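The `statistics` module in Python’s standard library covers the basic measures mentioned above. Here is a minimal sketch with made-up scores, plus one simple probability calculation:

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # made-up data

mean = statistics.mean(scores)      # arithmetic average: 30 / 6
median = statistics.median(scores)  # even count, so the average of the two middle values (3 and 5)
mode = statistics.mode(scores)      # most common value

# A basic probability: the chance a fair six-sided die shows an even number.
p_even = len([face for face in range(1, 7) if face % 2 == 0]) / 6

print(mean, median, mode, p_even)  # 5 4.0 3 0.5
```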

Probabilistic Programming & Bayesian Methods for Hackers

This book delves into Bayesian methods and how to program with probabilities. Combined with your budding knowledge of Python, it will quickly have you reasoning with different statistical concepts. The author gives it away for free, and its deeply interactive nature promises to engage you with these new concepts.
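The book works through Bayesian ideas using probabilistic programming libraries. The sketch below is a much smaller taste of the same reasoning: it substitutes a conjugate Beta-Binomial update written in plain Python. The coin-flip counts are made up. The helper functions are hypothetical names for illustration, not code from the book.

```python
from fractions import Fraction

def beta_binomial_update(alpha, beta, heads, tails):
    """Conjugate update: a Beta(alpha, beta) prior on a coin's heads probability,
    after observing `heads` and `tails`, becomes Beta(alpha + heads, beta + tails)."""
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    # Mean of a Beta(alpha, beta) distribution, kept exact with Fraction.
    return Fraction(alpha, alpha + beta)

# Uniform prior Beta(1, 1): every bias is equally plausible before seeing data.
prior = (1, 1)
# Observe 7 heads and 3 tails (made-up data).
posterior = beta_binomial_update(*prior, heads=7, tails=3)

print(posterior)              # (8, 4)
print(beta_mean(*posterior))  # 2/3
```

The posterior mean of 2/3 lies between the prior mean (1/2) and the observed frequency (7/10). That pull toward the prior is the core Bayesian intuition.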

Pandas

The main workhorse of data science in Python is the Pandas library, an open-source tool that organizes large datasets into tables and contains a whole array of functions and tools for data organization, manipulation, and visualization. This section gives you the resources you need to learn Pandas, which will help you learn machine learning in Python.
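Here is a minimal sketch of that organize-and-manipulate workflow. It uses a small table of cities whose population figures are illustrative values, not official statistics. The column names are assumptions for the example.

```python
import pandas as pd

# A small table; populations (in thousands) are illustrative only.
df = pd.DataFrame({
    "city": ["Paris", "Lagos", "Lima", "Osaka"],
    "country": ["FR", "NG", "PE", "JP"],
    "population_k": [2100, 15400, 10000, 2700],
})

# Organization: sort rows by a column, largest first.
largest_first = df.sort_values("population_k", ascending=False)

# Manipulation: filter rows, then derive a new column from an existing one.
big = df[df["population_k"] > 5000].copy()
big["population"] = big["population_k"] * 1000

print(largest_first["city"].tolist())  # ['Lagos', 'Lima', 'Osaka', 'Paris']
print(big[["city", "population"]])
```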

Cooking with Pandas

Julia Evans, a programmer based in Montreal, created this simple step-by-step tutorial on analyzing data in Pandas using noise complaint and bike data. It starts with reading CSV data into Pandas and goes through how to group, clean, and parse data.
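The sketch below follows the spirit of that tutorial but is not taken from it. It reads CSV data into Pandas, parses a date column, and groups by weekday. The bike counts are made up, and an in-memory string stands in for a real CSV file.

```python
import io
import pandas as pd

# Made-up bike-counter data standing in for a real CSV file.
csv_text = """date,count
2024-06-03,120
2024-06-04,95
2024-06-08,40
2024-06-09,35
2024-06-10,130
"""

# parse_dates turns the 'date' column into real datetimes instead of strings.
bikes = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Derive the weekday name, then group and aggregate.
bikes["weekday"] = bikes["date"].dt.day_name()
per_weekday = bikes.groupby("weekday")["count"].sum()

print(per_weekday)  # Monday 250, Saturday 40, Sunday 35, Tuesday 95
```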

Official Pandas Cookbook

The official Pandas cookbook involves a number of simple functions that can help you with different datasets and hypothetical transformations you might want to do on your data. Take a look and play with it to extend your knowledge of Pandas.

Data Exploration and Wrangling

Before you can do anything with the data, you’ll want to explore it through what is called exploratory data analysis (EDA): summarizing your dataset and drawing different insights from it so you know where to dig deeper. Fortunately, tools like Pandas are built to give you relevant and surprisingly deep summary insights into your data, allowing you to shape which questions you want to explore next.

By looking over your dataset from afar, you’ll already be able to spot faults that might keep you from completing your analysis: missing values, wrongly formatted data, and so on. This is where you start processing and transforming the data into a form that can answer your questions. This step is called “data wrangling”: you clean the data and make sure it can answer all of your questions.
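Here is a minimal sketch of that explore-then-wrangle loop on a tiny, made-up dataset with the faults described above. One column holds numbers stored as text, including an unparseable entry, and another has a missing value. Filling gaps with the median is just one possible choice, not a universal rule.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "age": ["34", "29", "n/a", "41"],         # numbers stored as text, one bad value
    "income": [52000, np.nan, 61000, 48000],  # one missing value
})

# Exploration: summarize every column and count missing values.
print(raw.describe(include="all"))
print(raw.isna().sum())  # note that 'n/a' is not yet recognized as missing

# Wrangling: coerce badly formatted values to NaN, then fill gaps with each column's median.
clean = raw.copy()
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")
clean = clean.fillna(clean.median())

print(clean)
```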

Python Exploratory Data Analysis with Pandas

This article from Datacamp goes through all of the nuts and bolts functions you need in order to take a slightly deeper look at your data. It covers topics ranging from summarization of data to understanding how to select certain rows of data. It also goes into basic data wrangling steps such as filling in null values. There are interactive embedded code workspaces so you can play with the code in the article while you are digesting its concepts.

A Comprehensive Introduction to Data Wrangling

This blog article from Springboard is filled with code examples that describe how you can filter data, detect and drop invalid/null values from your dataset, how to group data such that you can perform aggregated analyses on different groups of data (ex: doing an analysis of survival rate on the Titanic by gender or passenger class) and how to handle time series data in Python. Finally, you’ll learn how to export all of your work in Python so that you and others can play around with it in different file formats such as the Excel-friendly CSV.
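The filter, drop, group, and export steps described above might look like this on a handful of made-up, Titanic-style rows. The real dataset has many more rows and columns. Writing to an in-memory buffer keeps the sketch self-contained; in practice you would pass a file path to `to_csv`.

```python
import io
import pandas as pd

# Made-up passenger records in the style of the Titanic dataset.
passengers = pd.DataFrame({
    "sex": ["female", "male", "female", "male", "male", None],
    "pclass": [1, 3, 2, 1, 3, 2],
    "survived": [1, 0, 1, 1, 0, 1],
})

# Drop rows with missing values, then filter to a subset.
valid = passengers.dropna()
first_class = valid[valid["pclass"] == 1]

# Group and aggregate: survival rate by sex (mean of a 0/1 column).
survival_by_sex = valid.groupby("sex")["survived"].mean()

# Export to CSV.
buffer = io.StringIO()
survival_by_sex.to_csv(buffer)

print(len(first_class))  # 2
print(buffer.getvalue())
```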

Pandas Cheat Sheet

This Pandas cheat sheet, hosted on Github, can be an easy, visual way to remember the Pandas functions most essential to data exploration and wrangling. Keep it as a handy reference as you go out and practice some more.

Data Visualization

Data exploration and data visualization go hand in hand. Learning how to visualize data in different plots can be important in seeing underlying trends.

Beginner’s Guide to Matplotlib

This collection of resources on the matplotlib library (the workhorse library for Python data visualization) will help you understand the theory behind data visualization and how to build basic plots from your data.
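Here is a basic matplotlib line plot, as a minimal sketch with made-up monthly numbers. The non-interactive Agg backend and in-memory PNG buffer let the example run without a display; in Jupyter you would normally just show the figure.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
visits = [120, 135, 160, 150]  # made-up numbers

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, visits, marker="o")
ax.set_title("Monthly visits")
ax.set_xlabel("Month")
ax.set_ylabel("Visits")
fig.tight_layout()

# Save the rendered figure as PNG bytes in memory.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```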

Seaborn Python Tutorial

The Seaborn library allows people to create intuitive plots that the standard matplotlib library doesn’t cover easily: things like violin plots and box plots. Seaborn comes with very compelling graphics right out of the box.
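Here is a sketch of the two plot types mentioned above, side by side. It uses seaborn’s `violinplot` and `boxplot` on a small made-up DataFrame. The Agg backend and in-memory buffer are there only so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up measurements for two groups.
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "value": [1.0, 1.4, 2.2, 2.5, 3.1, 2.0, 3.5, 3.9, 4.4, 5.0],
})

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
sns.violinplot(data=df, x="group", y="value", ax=left)  # distribution shape per group
sns.boxplot(data=df, x="group", y="value", ax=right)    # quartiles and outliers per group
left.set_title("Violin plot")
right.set_title("Box plot")

buf = io.BytesIO()
fig.savefig(buf, format="png")
```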

Introduction to Machine Learning

Machine learning is a set of programming techniques that allow computers to do work that can simulate or augment human cognition without the need to have all parameters or logic explicitly defined.

The following section will delve into how to use machine learning to create powerful models that can help you do everything from translating human speech into text to beating human grandmasters at complex games such as Go.

Before we start implementing ideas in code, it’s important that you understand the fundamentals of machine learning. This section will help you understand how to test your machine learning models and which statistics to use to measure their performance. It is an essential cornerstone of learning machine learning in Python.

A Visual Introduction to Machine Learning

This handy visualization will allow you to understand what machine learning is and the basic mechanisms behind it through a visual display of how machines can classify whether a home is in New York or in San Francisco.

Train/Test Split and Cross-Validation in Python

This article explains why you need to split your dataset into training and test sets and why you need to perform cross-validation in order to avoid either underfitting or overfitting your data. Does that seem like a lot of jargon to you? The article will define all of these different concepts, and show you how to implement them in code.
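Here is a minimal sketch of a train/test split plus 5-fold cross-validation with scikit-learn, using the bundled iris dataset. The decision tree and its `max_depth=3` setting are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows as a test set the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 5-fold cross-validation on the training set only.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Final check on the untouched test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(cv_scores.mean(), test_accuracy)
```

Cross-validation scores that are much lower than training accuracy hint at overfitting. Low scores everywhere hint at underfitting.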

Scikit-learn

Scikit-learn is the workhorse of classical machine learning in Python: a library of standard functions that help you map machine learning algorithms to datasets.

It also has a bunch of functions that let you easily transform your data and split it into training and test sets, a critical part of machine learning. Finally, the library has many tools for evaluating the performance of your machine learning models so you can choose the best one for your data.

You’ll want to make sure you know how to effectively use the library if you want to learn machine learning in Python.
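This sketch shows those pieces working together. It standardizes features inside a pipeline and compares two candidate models by cross-validated accuracy on scikit-learn’s bundled wine dataset. The two candidates are arbitrary picks, and any estimators could be swapped in.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Each candidate standardizes features first; inside cross-validation the
# scaler is refit on every training fold, so no test-fold data leaks in.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Score each candidate with 5-fold cross-validation and keep the best mean.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in candidates.items()}
best_name = max(scores, key=scores.get)

print(scores)
print("best:", best_name)
```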

A Gentle Introduction to Scikit-Learn

This post covers much of the history and context of the scikit-learn library, and gives you a list of resources and documentation to further your learning and practice with it.

Scikit-Learn Documentation

The official scikit-learn documentation is filled with resources and quick-start guides that will help you get started with scikit-learn and reinforce your learning.

Regression

Regression breaks down how much of the movement in a trend can be explained by certain variables. You can think of it as plotting a Y (dependent) variable against a slew of X (explanatory) variables and determining how much of the movement in Y depends on individual factors of X, and how much is due to statistical noise.

There are two main types of regression we’re going to talk about here: linear regression and logistic regression. Linear regression measures how much of the variability in a dependent factor can be explained by an explanatory factor: you might, for example, find that poverty levels explain 40% of the variability in the crime rate. Logistic regression instead models the probability of a binary outcome and uses it to assign a category: you might, for example, classify whether a name is more likely to be male or female.

You’ll want to study both types of regression so you can get the results you need.

Simple and Multiple Linear Regression in Python

This informative Medium piece covers the theory and statistics behind linear regression, then describes how to implement it in scikit-learn.
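Here is a minimal sketch of simple and multiple linear regression in scikit-learn, on synthetic data where the true relationship is known. The coefficients (intercept 3.0, slopes 2.0 and -1.5) and the noise level are made up. The R² score corresponds to the “share of variability explained” idea from the poverty example above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: y depends on two explanatory variables plus a little noise.
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.5, size=200)

# Simple linear regression: one explanatory variable.
simple = LinearRegression().fit(X[:, [0]], y)

# Multiple linear regression: both explanatory variables.
multiple = LinearRegression().fit(X, y)

# R^2 is the share of variance in y the model explains.
print("simple R^2:", simple.score(X[:, [0]], y))
print("multiple R^2:", multiple.score(X, y))
print("coefficients:", multiple.coef_, "intercept:", multiple.intercept_)
```

The multiple model should recover coefficients close to 2.0 and -1.5. Its R² is higher because the second variable explains variance that the simple model leaves as “noise.”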

Building a Logistic Regression in Python, Step-by-Step

This Medium tutorial uses the tools available in scikit-learn to implement a logistic regression model. The level of detail in each step makes it easy to follow along.
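This sketch is not taken from the tutorial. It fits a logistic regression on scikit-learn’s bundled breast cancer dataset to show the probability-then-category behavior described above. Scaling the features first is a common choice that also helps the solver converge.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # binary target: 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# The model first outputs a probability for each class...
probabilities = clf.predict_proba(X_test[:3])
# ...and predict() turns those into a category (the more probable class).
labels = clf.predict(X_test[:3])

print(probabilities.round(3))
print(labels)
print("test accuracy:", clf.score(X_test, y_test))
```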

Clustering

Another type of machine learning model is clustering, where data points are grouped into categories based on how close each point is to other groups of points. Mastering clustering is an important part of learning machine learning in Python.

An Introduction to Clustering and different methods of clustering

Analytics Vidhya has presented this comprehensive introduction to clustering methods: it’s good to get a handle on this theory before you try implementing it in code.

Customer Segmentation using Python

This article from Yhat demonstrates how to do simple K-means clustering of different wine customers. It combines what you've learned in pandas and scikit-learn into a useful clustering example.

Deep Learning/Neural Networks

Neural networks are an attempt to simulate, at a very simplified level, how the human mind works in computational code. They have been a great advance in artificial intelligence, and while in some ways they are a black box of complex algorithms working in tandem to learn how data generalizes, their practical applications have multiplied rapidly in the last few years. Deep learning refers to neural networks with many stacked layers. Neural networks are an important topic to master if you want to learn machine learning in Python.
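To demystify the "black box" a little, here is a tiny hand-wired network in plain Python: each neuron takes a weighted sum of its inputs plus a bias and squashes it with a sigmoid, and two layers of such neurons compute XOR, something no single neuron can do. The weights are hand-picked for illustration, not learned; finding weights automatically (via backpropagation) is what libraries such as scikit-learn and Keras do for you.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then a sigmoid."""
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_network(x1, x2):
    """A 2-2-1 network with hand-picked (not learned) weights that computes XOR."""
    h_or = neuron([x1, x2], [20, 20], -10)        # fires if either input is 1
    h_nand = neuron([x1, x2], [-20, -20], 30)     # fires unless both inputs are 1
    return neuron([h_or, h_nand], [20, 20], -30)  # fires only if both hidden units fire


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_network(a, b)))
```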

In a huge breakthrough, Google’s AI beats a top player at the game of Go

This short Wired article isn't a technical tutorial: it recounts an epic match between Google's AI and a top human Go player. Go was thought to be so complex that computers weren't expected to beat top players until around the 2030s; by leveraging the power of neural networks, Google brought that AI victory forward by some two decades. This article should give you a great glimpse of the potential and power of neural networks.

A Beginner’s Guide to Neural Networks in Python and SciKit Learn 0.18

This example-laden tutorial uses the neural networks module in the Scikit-Learn library to build a simple neural network that can classify different types of wine. Follow along and play with the code so you can get a feel for how to build neural networks.
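A minimal sketch in the spirit of that tutorial, assuming a recent scikit-learn release that includes the bundled wine dataset (load_wine) and the MLPClassifier module. The three hidden layers of 13 neurons, max_iter and random_state are illustrative choices, not necessarily the tutorial's exact settings. Features are standardized first because multilayer perceptrons are sensitive to feature scale.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 178 wines, 13 chemical measurements each, 3 cultivar classes.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale the features, then train a small multilayer perceptron on them.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(13, 13, 13), max_iter=1000, random_state=42),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of held-out wines classified correctly
print(f"Test accuracy: {accuracy:.2f}")
```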

Develop Your First Neural Network in Python With Keras Step-By-Step

This tutorial from Machine Learning Mastery uses the Keras library in Python to build slightly more powerful and intricate neural networks. Keras is a high-level library designed to make experimenting with different deep learning models fast.

Big Data

Big data is defined by a high volume and velocity of data: amounts of data, sometimes measured in petabytes, that can't easily be processed with tools like pandas, which are limited to the memory and processing power of a single laptop or computer.

To process data at this scale, you'll need to scale out across many processors and servers that pass data to one another over a network. Frameworks such as Hadoop and Spark, which map and reduce data across multiple servers, play an important role here. It's time to take what you've learned so far and apply it to massive datasets! You can't learn machine learning in Python without dealing with big data.
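Here is a single-machine sketch of the map-and-reduce pattern, using word counts, the classic example that also shows up in PySpark tutorials. Each document is "mapped" to partial counts independently (the part a cluster would run in parallel on different servers), and the partial counts are then "reduced" by merging them. This is plain Python with collections.Counter, not Spark; in PySpark's RDD API the same job is usually written with flatMap, map and reduceByKey.

```python
from collections import Counter
from functools import reduce


def map_phase(document):
    """Map: turn one document (think: one partition) into partial word counts."""
    return Counter(document.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial results by summing the counts for each word."""
    return left + right


# Hypothetical documents standing in for a much larger, distributed corpus.
documents = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog",
]
partials = map(map_phase, documents)              # could run in parallel, one per server
totals = reduce(reduce_phase, partials, Counter())  # combine the partial counts
print(totals.most_common(3))
```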

Get Started With Pyspark and Jupyter Notebook in 3 Minutes

This blog post will help you get set up with PySpark, the Python API for Spark, so you can use the full power of Spark in the Jupyter Notebook format you're used to working in. PySpark can process datasets that scale all the way up to petabytes!

PySpark Video Tutorial

This video tutorial will help you get more context about PySpark and will provide sample code for tasks such as doing word counts over a large collection of documents.

Using Jupyter on Apache Spark: Step-by-Step with a Terabyte of Reddit Data

This tutorial from Insight goes beyond installation instructions and gets you working with Spark on a terabyte (roughly a thousand gigabytes!) of Reddit comment data.

Machine Learning Evaluation

Now that you have a baseline in the theory and code you need to practice machine learning, it's time to learn which metrics and approaches you can use to evaluate your machine learning models.

Metrics to Evaluate Machine Learning Algorithms in Python

In this tutorial, you’ll learn about the different metrics used to evaluate the performance of different machine learning approaches. You’ll be able to implement them in Scikit-Learn and Jupyter right away!
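As a taste of what those metrics measure, here is a small pure-Python sketch that computes accuracy, precision, recall and F1 for a binary classifier from the four confusion-matrix counts. The labels are hypothetical; sklearn.metrics provides accuracy_score, precision_score, recall_score and f1_score for the same quantities.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }


# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```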

Model evaluation, model selection, and algorithm selection in machine learning

This long six-part series (links to the later posts are at the end of this first one) goes deep into the theory and math behind machine learning evaluation metrics. You'll come out of it with a deeper understanding of how to measure machine learning models and compare them against one another.

Suggested daily routine

Learning isn't a one-time event: you need ongoing practice to master a skill. You can build the following routine into your day to keep practicing and expanding your knowledge of machine learning in Python.

Here’s my suggested daily routine:

  1. Always have a machine learning project in progress
  2. Go to Stack Overflow to ask and answer questions
  3. Read the latest machine learning papers and try to understand them
  4. Practice coding whenever you can by working through GitHub machine learning repositories
  5. Enter Kaggle competitions to extend your learning and practice new machine learning concepts

Over time, you'll effectively master machine learning in Python!

Want more material like this? Check out my guide on how to get a programming job without a degree.

Blockchain Learning, Cryptocurrency/Blockchain

Nine Free Resources to Learn Solidity

If you're here, it's likely because you've heard about Solidity and blockchain development. You probably want to get involved, build your own DApps and learn Solidity. If that's the case, this resource is the perfect place to get started.

First, let's start with a primer on Solidity: what it is, why it matters, and what you can build with it. If you want to skip ahead to the resources, feel free to click here.

What is Solidity?

We have to start with the concept of blockchains: immutable, distributed datastores that are verified by a network of participants rather than one centralized source. Bitcoin is the most famous example of this technology and its first widely adopted application.
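To see where the "immutable" part comes from, here is a toy Python model of a hash-linked chain: each block stores the SHA-256 hash of the block before it, so changing an earlier record no longer matches the hash stored in the next block. This is a deliberate simplification: there is no network, consensus or mining, and the block layout (index, data, prev_hash) is hypothetical rather than Bitcoin's or Ethereum's actual format.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block):
    """SHA-256 of the block's contents, serialized deterministically."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_chain(records):
    """Link records into blocks, each pointing at the previous block's hash."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, data in enumerate(records):
        block = {"index": index, "data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def is_valid(chain):
    """Every block must reference the current hash of the block before it."""
    return all(
        block["prev_hash"] == block_hash(prev_block)
        for prev_block, block in zip(chain, chain[1:])
    )


ledger = make_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
print(is_valid(ledger))                   # True
ledger[0]["data"] = "alice pays bob 500"  # tamper with history
print(is_valid(ledger))                   # False
```

In a real blockchain, many independent nodes hold copies of the chain, so a tampered copy is simply rejected by the rest of the network.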

Solidity grew out of a simple realization: while Bitcoin is a highly secure blockchain, it was not very scalable, either in a technical sense or in a community development sense. People had to either create entirely new blockchains or fork the existing chain to create new innovations or iterations.

Solidity is the programming language associated with Ethereum, a blockchain designed for developers to build on: Ethereum aimed to provide a more complete platform where developers could define more of the logic behind which data and payments get recorded on the blockchain, and which do not. Solidity is the programming tool that makes that possible.

Ethereum rests on the principle of decentralization: as with other blockchains, there is no centralized data storage that declares a certain state for the entire system. Data and the state of the system flow through a series of decentralized nodes that can be run on a number of different servers. For example, you could run an Ethereum node on your PC, and it would contribute to the collective computing power dedicated to sending and verifying data on the Ethereum blockchain.

In technical terms, Solidity is Turing complete, meaning it can express the same general-purpose logic as a language like JavaScript: you can program loops, if statements and more. It also supports contract inheritance and function modifiers. It is a contract-oriented language influenced by C++, Python and JavaScript and designed to be executed on the Ethereum Virtual Machine (EVM). As a high-level language, it abstracts away many low-level memory and storage problems so that programmers can more easily build their own DApps.

Why Solidity?

Solidity code is compiled into bytecode that the Ethereum Virtual Machine (EVM) inside each Ethereum node executes. The language comes with a set of conventions and global variables (such as msg.sender, the Ethereum address that triggered the current function) that can make your life easier, and its contracts are easy to call from web3.js, a JavaScript library for talking to Ethereum nodes that you can use alongside web frameworks such as React.js. In short, it is one of the easiest ways to connect your apps to Ethereum wallets and give your functions the ability to transact monetary value.

While there are other Ethereum programming languages (such as the more Python-like Serpent), Solidity is the most popular and has the most documentation available so far. You'll be able to access complementary libraries such as those offered by OpenZeppelin and utility tools such as Truffle. If you want to build something with Ethereum, or you want to monetize your functions or build your own token, you're probably going to end up using Solidity.

What can you build with Solidity?

You might have heard of smart contracts — you’ll be able to build smart contracts that execute different functions with Solidity. This will allow you to write data to the Ethereum blockchain or to receive or send Ether when people trigger different functions. Think of it as integrating a wallet directly in your code.

An interesting limitation you'll have to deal with is that every function call that writes to the Ethereum blockchain consumes gas, which the caller pays for in Ether at the current gas price. In other words, people will pay to execute functions on your platform. This is something you'll have to keep in mind as you scale out new applications: putting a price on every function call can be a double-edged sword.
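Here is that fee arithmetic as a small Python sketch, assuming the classic pricing model the paragraph describes: fee = gas used × gas price. The 20 gwei price and the 60,000-gas contract call are hypothetical figures; 21,000 gas is the fixed cost of a plain Ether transfer. (Since EIP-1559, the per-gas price is a base fee plus a priority tip, but the multiplication is the same.) Amounts are kept in wei, the smallest unit, so integer math avoids rounding errors.

```python
WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def tx_fee_wei(gas_used, gas_price_gwei):
    """Transaction fee = gas used x gas price, returned in wei."""
    return gas_used * gas_price_gwei * WEI_PER_GWEI


# A plain Ether transfer always uses 21,000 gas; the 20 gwei price is hypothetical.
transfer_fee = tx_fee_wei(21_000, 20)
# A hypothetical contract call that happens to consume 60,000 gas.
call_fee = tx_fee_wei(60_000, 20)

print(transfer_fee, "wei =", transfer_fee / WEI_PER_ETHER, "ETH")
print(call_fee, "wei =", call_fee / WEI_PER_ETHER, "ETH")
```

The more computation and storage a function performs, the more gas it consumes, which is why efficient contract code directly lowers what your users pay.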

As for real-life examples, you can build your own token according to the ERC-20 standard and start distributing it through initial coin offerings. You can create your own game that might go viral (such as CryptoKitties). You could even create your own decentralized exchange for buying and selling other tokens, as the people behind EtherDelta have done. You can build any number of Ethereum DApps; the possibilities are there for you to explore once you understand the technology.
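To see what an ERC-20 token actually keeps track of, here is a hypothetical Python model of the standard's core bookkeeping: balances, transfer, approve, allowance and transferFrom (snake_cased here). It is not Solidity and cannot be deployed; a real token would be a Solidity contract, often built on OpenZeppelin's ERC20 base contract, where the caller comes from msg.sender rather than an explicit argument and events such as Transfer and Approval are emitted.

```python
class ToyERC20:
    """Hypothetical Python model of ERC-20 bookkeeping (not a smart contract)."""

    def __init__(self, creator, total_supply):
        self.total_supply = total_supply
        self.balances = {creator: total_supply}  # the creator starts with every token
        self.allowances = {}                     # (owner, spender) -> approved amount

    def balance_of(self, owner):
        return self.balances.get(owner, 0)

    def allowance(self, owner, spender):
        return self.allowances.get((owner, spender), 0)

    def transfer(self, sender, to, amount):
        # In Solidity, `sender` would be msg.sender rather than an argument.
        if amount < 0 or self.balance_of(sender) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[sender] -= amount
        self.balances[to] = self.balance_of(to) + amount
        return True

    def approve(self, owner, spender, amount):
        # Lets `spender` move up to `amount` of `owner`'s tokens later.
        self.allowances[(owner, spender)] = amount
        return True

    def transfer_from(self, spender, owner, to, amount):
        # `spender` moves tokens on `owner`'s behalf, using up part of the allowance.
        if self.allowance(owner, spender) < amount:
            raise ValueError("allowance exceeded")
        self.transfer(owner, to, amount)
        self.allowances[(owner, spender)] -= amount
        return True


token = ToyERC20("alice", 1_000)
token.transfer("alice", "bob", 100)
token.approve("bob", "carol", 30)
token.transfer_from("carol", "bob", "dave", 20)
print(token.balance_of("alice"), token.balance_of("bob"), token.balance_of("dave"))
```

The approve/transferFrom pair is what lets other contracts, such as a decentralized exchange, move tokens on a holder's behalf.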

Ok, what are the nine resources you promised?

Fair is fair. If you've made it this far, or clicked straight through to the bottom, you're probably looking for the resources and Solidity tutorials I promised to help you learn Solidity.

Here they are, in the order in which I think it makes sense for you to consult them.

1. ConsenSys Resources

You'll want to start off with a general overview of Ethereum, different blockchain concepts, and a feel for how Solidity fits into that framework. This compilation of resources will help flesh out the context of the field you're getting into when you start building smart contracts. You'll be able to see, and get inspired by, the vast potential of what has been done with smart contracts, and what you can do with them in turn.

2. How to set up an Ethereum Node

Next up, you’ll want to set up an Ethereum node yourself. This is useful for local testing of the apps you build but also, in a greater sense, provides you with the bridge you need to be part of the Ethereum community. The guide in question links out to a section where you can install an Ethereum node under different operating systems — it also contains sections dedicated to other aspects of Ethereum you may find interesting, including the mining mechanism for it.

3. Ethereum for Web Developers

This guide on Ethereum concepts can help you really understand the promise of Solidity and how to carry your thinking about web concepts over to blockchain development.

It also helps you distinguish between centralized and decentralized datastores and understand how you should conceptually think about the programming logic for each.

The guide then moves onto a free tutorial (though the rest of the site offers paid lessons) on how to build a ballot voting system in Solidity that is much better than any “Hello World” tutorial could be in terms of getting you started on your path to learning Solidity.

4. BlockGeeks Guide to Solidity

This guide offers you yet another text-based case study for creating something in Solidity — this time though, you’ll be able to get an overview of how to build a web app in conjunction with Ethereum functionality.

Think of this as a case study upon which you can layer the Solidity concepts you’re learning and put them into practice. It will also teach you how to build a development environment for Solidity apps so that you can build your own smart contract and iterate on it in real-time without the fear of breaking anything as you learn Solidity.

5. Smart Contracts Best Practices

This resource from ConsenSys, an accelerator that powers different teams working on Ethereum-based projects, helps you with more of the higher-level thinking behind Ethereum smart contracts.

It will help you design smart contracts and tailor your learning around best practices that will keep your contracts performant and secure as you learn Solidity — highly desirable factors in an emerging tech ecosystem that is often in flux.

6. CryptoZombies

This handy game explains Ethereum functions in more depth and helps you learn Solidity interactively by building your own version of CryptoKitties (called CryptoZombies). As you go through it, you'll explore, among other things, function creation, function calls and function modifiers, standard programming constructs such as assert and if statements, the different data types within Solidity, how to make contracts inherit from one another, how to return data and, finally, how to control who can securely access and trigger different functions within your code.

7. YouTube Intro to Solidity

If you're more of a visual learner than a textual one, you'll find this YouTube video series better suited to your desire to learn Solidity.

Use it to catch up on Solidity concepts or to refresh your knowledge — or try this learning perspective first if you know a video is how you learn best.

8. Remix for Ethereum

With this browser-based compiler for Solidity code, you can experiment with different contracts and functions cheaply, seeing right away what compiles properly and what doesn't. It's a great, experimental way for you to learn Solidity and practice with it.

Consider it a fun sandbox for you to test different functions, similar to what JSFiddle provides for JavaScript, and your first line of validation and defense against improperly built code. It can also serve as the easiest way for you to experiment and build things in a sandbox setting you might not want to deploy.

9. Ethereum StackExchange

Finally, the last resource is the Stack Exchange forum for Ethereum and Solidity questions — a solid resource for you to consult and to pose questions as you’re stumbling around as a beginner, and a community you can give back to once you’ve practiced and built different things with Solidity.

I hope this guide helps you get started on your Solidity programming journey. If you want to join a newsletter packed with cutting-edge resources on how to learn new technology skills and leverage them for a meaningful and socially impactful life, look no further than mine.

Resources Lists

32 Free Tech Job Boards for Programming Job Seekers

If you're here, it's probably because you're looking for a job in technology. This excerpt from our upcoming guide on how to get a programming job without a degree will help you do just that by giving you categories of tech job resources, tech job boards and tech job sites to consult. I've picked out some of the best among the many tech job boards out there. Hopefully, this resource will help you land a new job!

General

The following tech job boards often list a broad selection of general jobs, but they can also be useful for finding technical jobs if you filter them carefully. Tech companies abound on these general resources.

LinkedIn

Sometimes it’s good to start at the most obvious place: LinkedIn has a large number of technology jobs that you can find quite easily. You can sign up for a free trial of the premium version and quickly look through different jobs.

LinkedIn can also be a great way to research hiring managers and get a sense of what a company is like before you even apply there. You’ll be able to see what the organizational hierarchy looks like by scrolling from one profile to another — and you’ll be able to see what skills the company emphasizes, either by looking at the profiles of those who were hired or by using your trial Premium account and looking at job postings or company pages.

You’ll want to think about how to optimize your LinkedIn profile so you can get the most out of this career-oriented social network. Among tech job boards, it is easily one of the largest. 

Crunchboard

Crunchboard is the job board associated with TechCrunch, a publication that specializes in writing about emerging technologies and new companies. As you can imagine, their job board is filled with a lot of technology and web development positions due to their audience.

Another technique you can use is to look for startups that have just raised a large funding round, as reported on TechCrunch or CrunchBase, and reach out to hiring managers or executives at those companies: immediately after raising a round, a company is in aggressive growth mode and is most likely looking to hire many qualified people to fill different and interesting job roles.

Hacker News

Besides being a great repository of technical articles and a community of people interested in the cutting edge of technology, Hacker News also serves as a job portal of sorts for Y Combinator companies: technology companies ranging from two-person startups to fully mature businesses (Dropbox, Airbnb, and Quora, for example, were all incubated by Y Combinator at one time or another). The jobs section of the site features different YC companies and their hiring needs. There are also monthly “Ask HN: Who is hiring?” threads, started by a bot, that surface urgent job opportunities that may be hard to find elsewhere. Here's an example of a “Who is hiring?” thread from May 2017.

By commenting on different articles and reaching out to different members in the Hacker News community, many of whom are senior figures in the startup world, you might also find your way to different mentors — and somebody who can introduce you to the right hiring manager.

AngelList

AngelList is an online repository for different startups. The jobs on offer here tend to be with earlier stage companies working at the edge of technology. One great perk about this is that entrepreneurs may be more willing to accept people from non-traditional backgrounds to work with them — especially if you’re willing to accept and maybe even embrace the risk that comes with working in a startup.

GitHub

GitHub, the living repository of code collaboration, also offers a selection of curated jobs for developers around the world. You can even search by programming language here, ensuring the best match for your skills.

Stack Overflow Jobs

Stack Overflow, the popular Q&A site for programming questions, offers a selection of different programming jobs, many of them posted by hiring managers who are trying to find top talent within the Stack Overflow community.

Glassdoor

Glassdoor is an interesting job board since you’ll be able to see what employees think about the company and you can get some transparency on the salary range the company offers as well. All in all, Glassdoor is a great general place to find technology jobs — but its greatest value probably rests in the additional data on employee satisfaction and approximate salary ranges that can help guide your career decisions.

Mashable

Mashable, the popular digital media site based in New York City, also has a job board with a lot of different technology job postings.

The Muse

The Muse is a unique jobs resource, with tons of personalized career coaching and resources related to career development. It can be well worth browsing the content on the site itself if you want to learn about salary negotiation, interviews and career progression from a somewhat general perspective. The jobs board section also boasts a selection of technical and developer jobs.  

Startupers

Startupers is another community oriented towards posting startup jobs, many of them programming-related.

Dice

One of the leading repositories of tech jobs in the world, Dice offers nearly 80,000 jobs in technology for you to consider.

Cybercoders

Run by a placement agency for engineers, Cybercoders offers an easy way to search across 10,000+ different technology jobs across different industries.

Front-End/Design

The following tech job boards focus on jobs that are oriented towards front-end work and user design. Check these out if you’re looking to work on how the user experience of digital products feels for different people.

Smashing Magazine

Smashing Magazine is one of the premier web development and design resources on the web. They offer a selection of jobs tailored to front-end web development. It’s a perfect selection among a number of tech job boards if you’re looking for more design and development-driven work. 

Codepen Jobs

Codepen is a great interactive sandbox for front-end code, where you can use HTML/CSS/JavaScript to generate awesome interactive graphics — or where you can copy those snippets of code for use on your own website. The site also offers a job board that tilts towards front-end web development and design jobs, as you might expect.

Web Development

The following job boards will help you find work in web development if that's the technical career path you want to choose.

Sensational Jobs

Sensational Jobs curates a selection of different positions for web professionals of all sorts and stripes.

WordPress Jobs

The official WordPress jobs board will help you curate a selection of jobs in web development specifically focused on building things with the WordPress platform — a popular, open-source content-management system that serves as the back-end framework for nearly one in six of all websites on the Internet.

WPHired

WPHired is another great selection among this list of tech job boards — that is if you’re looking for development jobs oriented around WordPress.

Data Science

Data science entails a mix of statistics, programming and communication skills that are quite specialized. Oftentimes, data science job postings will be found in these specialized communities that have grown to help support the data science community. These tech job boards are often the result of careful curation and community-building. 

Kaggle Data Science Jobs

Kaggle is an online community centered around machine learning competitions. Here, they’ve used their reach in the data science community to curate a selection of data science jobs for you.

Data Elixir Job Board

Data Elixir offers a newsletter filled with data science resources and also curates this job board to help data science job seekers.

KDNuggets Jobs

KDNuggets is one of the leading data science content hubs, filled with useful tutorials and resources to help you understand different topics in data science. This static jobs page is updated quite frequently with different job postings in data science.

Mobile Development

The following tech job boards curate different opportunities for those looking to build mobile apps on a variety of platforms. The most common tend to be iOS or Android-oriented.

Android Jobs

Android Jobs curates a selection of jobs for developers interested in building Android applications. Come here if you want to make your mark in mobile development.

Core Intuition

Core Intuition features a selection of curated Mac Cocoa and iOS development jobs — if you want to develop apps for Apple products, there are few job boards as well-placed as Core Intuition to help you advance along that career path.

Language-Specific

The following tech job boards are specific to a type of programming language. It can be a handy place to look if you plan to specialize in one language and grow your career there.

AngularJobs

AngularJobs is a job board curated around the Google-backed front-end JavaScript framework. Come here if you want to work with Angular.js and develop your JavaScript skills.

We Work Meteor

We Work Meteor is a job board focused on meteor.js, a full-stack JavaScript framework that can handle every part of web development. If you’re interested in pursuing a career using Meteor as your tool of choice, or if you’re interested in developing your JavaScript skills — coming to this job board wouldn’t be a bad choice.

Ruby Now

Ruby Now is a job board focused on jobs for Ruby on Rails specialists. Given how widely Ruby on Rails is used for web development, you'll mostly find web development positions on this job board, though there are some more senior positions in back-end development.

Python Jobs (official Python website)

Python.org (the official centerpiece of the Python programming community) hosts a small repository of curated and interesting jobs that involve the use of Python. It’s one of the best among these tech job boards for those looking to work with Python. 

Python Jobs

Python Jobs (unaffiliated with the official Python programming community) is a great free resource for looking up Python jobs and web development jobs associated with the Django web development framework.

R-Users

R-Users is the place to go if you're proficient in R or if you're a statistician looking for work that develops your programming skills in R.

Remote

One of the luxuries of working in a technology-oriented career is the ability to work remotely from anywhere in the world. The following job boards curate remote opportunities in technology.

We Work Remotely

We Work Remotely curates a selection of jobs that are online and remote, with a section dedicated to just programming jobs.

Remote OK

Remote OK is another job board that curates jobs where remote work is available. It has a large selection of technology jobs and a neat categorization of the highest-paying remote jobs and the technologies involved in them.

AngelList Remote Jobs

AngelList curates a selection of startup jobs where it's acceptable to work remotely. Again, as with the rest of AngelList, most of the jobs are at earlier-stage startups, so keep that in mind as you browse through this selection.

Upwork Jobs

Upwork is a curated marketplace where freelancers can meet potential employers. The entire process of payment, job search, and work management can be completely managed on Upwork. As a result, it can be a great place to find remote work in different technical fields.


Want more content like this? Be one of the first to get our Guide to Getting a Programming Job without a Degree!

Data Science/Artificial Intelligence, Learning Lists

The most popular deep learning libraries

You might have heard of artificial intelligence, deep learning and neural networks, and wanted to find a path into this exciting new technology. This article gives you a comprehensive overview of the tools and frameworks you can use to accelerate your impact and learning in artificial intelligence, and it will help you decide which deep learning library you should use.

This article assumes you're familiar with basic deep learning concepts; if you need to catch up on those, the following Wikipedia article will help you get up to scratch.

This is a walkthrough of the most popular deep learning libraries, with examples of clever projects and implementations that have used each library to create something awesome.

I’ve ranked the deep learning libraries in question by the number of Github stars their repositories have collected as of June 2017 — a great way to see how much traction these different frameworks have with programmers.


1- TensorFlow

Overview: TensorFlow is the open-source machine learning library developed by Google (and still used in both research and production-level applications at the company). The library allows you to bring machine intelligence capabilities to all sorts of devices, from machines equipped with GPUs to mobile and embedded devices such as the Raspberry Pi. With over 6,000 open-source repositories using TensorFlow, it has quickly become one of the most popular frameworks for those looking to build something with deep learning. TensorFlow is very accessible, with APIs for Python, C++, Haskell, Java, Go and Rust and a third-party package for R.

Introductory Tutorial: Get introduced to TensorFlow with this official Google tutorial, which uses TensorFlow with the famous MNIST dataset of handwritten digits.

Use Cases: TensorFlow has become one of the most popular deep learning frameworks, and as such, it has seen a wide array of uses, from powering cutting-edge machine learning work at different Silicon Valley companies to classifying cucumbers for farmers.

Resources:

  1. This GitHub repository, TensorFlow-World, contains a set of introductory tutorials with code to give you a better sense of how to do deep learning in TensorFlow.
  2. TensorFlow-101 contains tutorials on how to get started with TensorFlow in Jupyter Notebook with Python.
  3. This GitHub repository contains example code that will help you work through different analyses in TensorFlow.

How to get started: Get all of the documentation and installation instructions here, then start practicing and training deep learning models on different datasets! TensorFlow is one of the most popular and powerful deep learning frameworks out there: take advantage! 


2- Scikit-learn

Overview: Scikit-learn is the versatile machine learning knife in Python, used for simple experimentation and iteration with different templated machine learning models. With modules like the MLPClassifier, you can easily bring deep learning approaches to your datasets and use the rest of the trusty scikit-learn toolkit (such as train_test_split) to validate and evaluate your model. You can also combine the scikit-learn interface with different deep learning libraries if you want to do more powerful analyses.

Introductory Tutorial: This tutorial runs through how to use scikit-learn as a deep learning library, with a multilayer perceptron model that classifies different types of wine. You can see how a few lines of code can create a very accurate model with many variables.

Use Cases: scikit-learn is often the first machine learning tool people reach for in the Python data science ecosystem: it comes pre-installed with the Anaconda distribution (a popular way to install Jupyter Notebook) and provides optimized implementations of essential machine learning algorithms, including a basic neural network. You'll be able to quickly build machine learning models, evaluate them, and split your data into training and test sets.

Resources:

  1. This cheatsheet by Datacamp will help you with many of the essential functions in scikit-learn.
  2. Springboard has a tutorial that will teach you how to build a simple neural network with scikit-learn and its MLPClassifier module.
  3. This gentle introduction into scikit-learn can help you ease into this machine learning package.

How to get started: Use Jupyter Notebook through the Anaconda distribution, which comes with scikit-learn installed by default. Start training machine learning models, then move into Scikit Flow, an interface that combines the code you’d use in scikit-learn with functionality from Google’s TensorFlow library (covered earlier in this article).


3- Caffe

Overview: Caffe is a deep learning framework built by Berkeley AI Research (BAIR) and sustained by community contributors. It is built for speed and can classify up to 60 million images a day on a single GPU. It powers a variety of deep learning projects in machine vision, speech and more, ranging from fully fleshed-out applications to academic papers.

Introductory Tutorial: This tutorial built by Berkeley AI researchers will help you get up to speed with this powerful deep learning framework.

Use Cases: Caffe is used for a variety of academic research — this web page has a ton of examples ranging from image classification to training the classic LeNet model.

Resources:

  1. This blog post will help you get up to speed with using Caffe and Python, using a relevant classification example from Kaggle on how to distinguish between dogs and cats.
  2. This tutorial will help you load Caffe into IPython Notebook and also covers C++ implementations of the library.
  3. This tutorial shows you how to build a neural network layer within Caffe.

How to get started: Use the command line and get started with different use cases of Caffe. You can use this tutorial to get installation instructions across a whole array of different platforms.


4- Keras

Overview: Keras is an open source deep learning library for neural networks written in Python. Authored by François Chollet, the library was meant to be a quick and easy way to experiment with different deep learning models — as a high-level API written entirely in Python, the library is easy to debug and navigate. It supports both convolutional and recurrent neural networks and it is designed to be as intuitive as possible for users to grasp.

Introductory Tutorial: This introductory tutorial will run over how to get started with Keras — all the way from installing the package to training a model with 99% predictive accuracy on the seminal MNIST dataset.  

Use Cases: Keras is often used to build simple deep learning models as conceptual sketches. You can validate rough ideas quickly and get a first glance at whether a deep learning architecture is a good fit for a problem.

Resources:

  1. This course by DataCamp helps you dive deeper into Keras, even if you’re just a deep learning beginner!
  2. Here is a link to the official Keras documentation, which gives you access to the inner workings of the framework, written by its creators.
  3. If you’re more comfortable with the R programming language, you can use the R interface to Keras.

5- Torch

Overview: Torch is an open-source deep learning framework built around the Lua programming language, with an underlying C/CUDA implementation optimized for performance on GPUs. It allows neural networks to be parallelized across multiple CPUs and GPUs. Many organizations at the cutting edge of machine learning use Torch, from Google to Facebook. It has been extended for mobile settings and can run on iOS and Android.

Introductory Tutorial: This 60-minute blitz into Torch will get you started and ready to use this powerful deep learning tool.

Use Cases: The deep learning library Torch is used for a lot of machine learning and deep learning research programs at leading companies such as Google, Twitter, and Facebook. Facebook AI Research adopted it as its default deep learning framework, so it has had strong support from one of the largest tech companies in the world.

Resources:

  1. You’ll want to understand Lua before you work with Torch. This handy guide will help you get up to speed.
  2. This Github repository contains a whole host of Torch tutorials.
  3. The following blog article by Facebook Research contains a lot of related work done in Torch.

How to get started: Get the deep learning library Torch installed, then start running it on different deep learning problems. You might want to refer to the Facebook research blog for inspiration.


6- Theano

Overview: Theano is a deep learning framework built into the Python data science ecosystem, with tight integration with the NumPy numerical computing library. It compiles your expressions into C code to make them even speedier, and it is the default teaching tool used in deep learning founding father Yoshua Bengio’s lab.

Introductory Tutorial: This Theano tutorial on Jupyter Notebook will help you understand the nuances of the deep learning library Theano, and it will get you up to speed with the library’s different conventions.

Use Cases: This forum shows the different problems and use cases that Theano users tackle with the library.

Resources:

  1. This documentation will help you understand more about the deep learning library Theano. 
  2. This blog post will help you understand the performance differences between Theano and Torch.
  3. Here is an introductory tutorial to Theano.

How to get started: Use the instructions here to get started with installing Theano.


7- Neon

Overview: Neon is an open-source deep learning platform built on Python that aims for the most powerful implementation of deep learning possible while keeping simplicity in mind.

Introductory Tutorial: Work with this Github repository, whose Model Zoo contains different scenarios for working with Neon.

Use Cases: This Youtube playlist walks through different use cases with Neon, including playing Pong and speech recognition.

Resources:

  1. Learn how to do basic classification with the MNIST database and the deep learning library Neon with this introductory-level tutorial.
  2. Use this video course to get started with Neon.
  3. This compilation of Neon resources will help you learn the framework.

How to get started: Get the deep learning library Neon installed with the following instructions.

101+ Resources to Learn Data Science

Many people are seeking to learn data science these days. It’s become a trendy topic associated with high salaries and some of the most interesting problems in the world. This demand has created many different resources in the data science space. Others have curated their favorite resources to learn data science, but I was looking for something more comprehensive, so I built this list. Here are my favorite data science resources, so you can understand what’s going on in the field and start getting your hands dirty right away.

Full disclosure: I work for Springboard (one of the data science education providers listed below). 

What is data science?

First, let’s start with an overview of what has become a popular buzzword and define exactly what you want to learn: data science. Data science combines three skill sets: statistics, programming and business knowledge. You’ll find a data scientist at the interplay between these crafts. This is somebody who programmatically examines large datasets for valuable business insights, combining computer science knowledge with business sense.

You can use data science concepts and training to do data mining and get statistical inferences from large datasets. Using advanced techniques such as natural language processing and unsupervised learning, you can tame the power of computation and get precious data insights others simply cannot access. That will be attractive to all sorts of potential employers in the data science field, from Silicon Valley to Wall Street.

To get there, though, you have to start with the basic techniques and concepts that underlie data science. Learning data science requires understanding the process behind it and the various components needed to bring everything together. Let’s get you that knowledge.

Overview

Start with an overview of the field and the processes and concepts that make up data science.

1- Data Scientist: The Sexiest Job of the 21st Century

In this seminal article, DJ Patil, former Chief Data Scientist of the United States, and Thomas H. Davenport explain exactly what makes a career in data science so compelling. It’s great fuel for the fire if you’re looking to learn data science.

2- What is data science?

This overview of data science by Berkeley delves into how data science came to be — and the average salary you can expect in the field.

3- Data Science Salary Survey 2016 (O’Reilly)

O’Reilly, a leading publication and media company on the cutting edge of technology, dives deeper into what tools and factors go into higher data science salaries. They’ve surveyed hundreds of data scientists in the field. Learn what pays and what doesn’t with data science careers through their research!

4- Data Science (Wikipedia)

Wikipedia’s overview of data science goes over the history of the field and points to many different resources in the field. It can be a handy jumping-off point for further research.

5- Building Data Science Teams

This piece by DJ Patil goes into the different roles inherent in a data scientist’s job — and exactly how best to build out a data science team.

6- Data Science Process

This piece by Springboard goes into what the day-to-day of data science looks like — tracing it all the way to a first principles view of exactly what steps effective data science requires.  

Interactive Tutorials

Now that you’re done with an overview of the topic, it’s time to get your hands a bit dirty with interactive tutorials that will help you learn different parts of data science — whether that’s the statistical theories behind machine learning algorithms, or the programming skills you’ll need to implement those theories.

Statistics/Math

Understanding probability and the basics of statistics is essential to understanding machine learning methods and handling massive amounts of data. Linear algebra and the ability to manipulate different representations of data (in matrix form or otherwise) will also be incredibly helpful for the work data scientists do. You’ll want to refresh your statistics knowledge and get a handle on the math you need to know to join their ranks.
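Here is one way to see statistics turn into code: a minimal, dependency-free sketch of ordinary least squares for a single predictor. It uses the closed-form slope and intercept formulas. `fit_line` is a hypothetical helper name, and in practice you would usually reach for a library such as NumPy or scikit-learn.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```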

7- Khan Academy (Statistics/Probability)

This free course from Khan Academy serves as a great catch-up on the basics of probability and statistics.

8- Introduction to Statistics in R (Datacamp)

Learn a bit of R (a programming language commonly used in data science) and statistics at the same time with this interactive walkthrough from Datacamp.

9- Statistics 101

This Youtube playlist from the Harvard Extension School covers everything from random variables to different statistical distributions. 

SQL

Knowing SQL and how to query from relational databases is a skill that is one of the building blocks of data science. You’ll often use SQL to source your data for further analysis — or even to transform your data on the spot.
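Here is a minimal sketch of sourcing and transforming data with one SQL query. It uses Python’s built-in sqlite3 module with an in-memory database, so nothing needs to be installed. The `sales` table, its columns and the `revenue_by_region` helper are hypothetical examples.

```python
import sqlite3


def revenue_by_region(rows):
    """Load (region, amount) rows into an in-memory table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        # Filter, group, aggregate and sort in a single query.
        query = """
            SELECT region, COUNT(*) AS n_orders, SUM(amount) AS revenue
            FROM sales
            WHERE amount > 0
            GROUP BY region
            ORDER BY revenue DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


sample = [("west", 120.0), ("east", 80.0), ("west", 30.0), ("east", -5.0)]
print(revenue_by_region(sample))  # [('west', 2, 150.0), ('east', 1, 80.0)]
```

The `?` placeholders let the driver handle quoting. This is the usual way to pass values into SQL from Python rather than formatting them into the query string.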

10- Mode Analytics SQL School

Mode Analytics teaches SQL through the use of case studies with real data. It’s an interactive experience that’ll teach you the basics of SQL by having you run through a dataset with some simple yet powerful commands.

11- Learn SQL (Codecademy)

Codecademy, well known for its basic curated tutorials in different programming languages, has this simple interactive module that will help you learn SQL.

12- SQLCourse

This is an older tutorial, but one that still holds up as an example of an organized approach to learning SQL.

Python

Python is one of the workhorse languages of data science — one of the most popular along with R. The large open-source community that powers Python enables it to be a powerful, versatile programming language that can help facilitate everything from data wrangling to training powerful machine learning models. It’s a powerful tool you’ll want to learn as you learn data science. 

13- Pandas Cookbook

This interactive set of code examples walks you through how to get started with Pandas, the data processing library most commonly used in Python. It was built by Julia Evans.

14- Intro to Python for Data Science (DataCamp)

This interactive course will walk you through the basics of the data science libraries for Python.

15- Gentle introduction to scikit-learn

This gentle introductory tutorial will help you understand one of the most powerful machine learning and data science libraries out there: Python’s scikit-learn. You’ll be able to train simple, off-the-shelf machine learning models in a matter of minutes.

16- A dramatic tour through Python’s data visualization landscape

This somewhat witty and whimsical walkthrough will help you explore the difference between the major data visualization tools in the Python ecosystem — including some options that were ported from R!

17- Web scraping with BeautifulSoup

This short guide will teach you how to take information from different websites and render it into a format that is easy for machines to process — a handy skill for anybody looking to work with many different datasets. I often use the set of techniques described to scrape tables from Wikipedia so I can process that data in Python.
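The linked guide uses BeautifulSoup. The sketch below uses a different tool so it runs without extra installs: the standard library’s `html.parser`, which it uses to turn an HTML table into lists of rows. The HTML string is a made-up sample, not a live Wikipedia page. The hypothetical `TableParser` class also assumes well-formed markup with explicit closing `</td>`, `</th>` and `</tr>` tags, which BeautifulSoup would handle more forgivingly.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags such as <a> still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)


html = """
<table>
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>France</td><td>Paris</td></tr>
  <tr><td>Japan</td><td><a href="#">Tokyo</a></td></tr>
</table>
"""
parser = TableParser()
parser.feed(html)
print(parser.rows)
# [['Country', 'Capital'], ['France', 'Paris'], ['Japan', 'Tokyo']]
```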

R

R is another popular programming language used for data science — in fact, it’s often pitted against Python as a comparable tool. The truth is that you can use both — and in fact, being conversant in both can only help you progress faster and further as a data scientist.

18- Introduction to R (Datacamp)

Here is the equivalent of the Datacamp introduction to Python — except this time for R, another common data science programming language.

19- A complete tutorial to learn R from scratch

This tutorial, rendered as a blog post, offers a comprehensive A to Z guide to getting started in R. It covers everything from importing data into R to creating predictive models with it.

20- Try R

Sponsored by O’Reilly Media, this interactive course will reward you with a badge for each fundamental building block of R you learn.

Hadoop

Hadoop is a big data framework designed to store and process large datasets in parallel across many different servers in order to yield actionable insights.
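Hadoop’s classic processing model is MapReduce: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. The sketch below imitates those three stages on a single machine with a word count. It does not use Hadoop itself, and the function names are hypothetical. On a real cluster, each stage would be spread across many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data big ideas", "data beats ideas"]
# On a cluster, each document could be mapped on a different server.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
print(reduce_phase(shuffle_phase(mapped)))
# {'big': 2, 'data': 2, 'ideas': 2, 'beats': 1}
```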

21- Hadoop Tutorial (Tutorialspoint)

This set of tutorials on Hadoop will help you understand how big data frameworks work — and how you can apply Hadoop to your data.

22- Hortonworks Sandbox tutorial on Hadoop

This interactive Hadoop sandbox by Hortonworks lets you play with Hadoop code.

Spark

Spark helps solve some speed, flexibility and efficiency issues with Hadoop through the use of a new data structure: the RDD or resilient distributed dataset.

23- Apache Spark Tutorial (TutorialsPoint)

TutorialsPoint offers a Spark tutorial similar to its Hadoop one.

24- Hands-on introduction to Spark

Hortonworks has a sandbox that will let you play around with Spark code.

Courses/Workshops

The following courses and online workshops will help you learn data science in an organized fashion. Use these resources to accelerate your learning of data science if you need to. A lot of these courses will help you find data science work, and you’ll likely be able to do data science projects after finishing them. 

25- Fast.ai

This massive online course, built by a Kaggle champion in machine learning, will help you learn about neural networks and how to train machine learning models.

26- Foundations of Data Science (Springboard)

This course offered by Springboard features a curated selection of resources in R, SQL and the basics of machine learning, as well as personalized mentoring from data science experts who work in the field.

27- Data Science Intensive (Springboard)

Yet another course offered by Springboard, though this one is more advanced. Focused on Python and teaching the intricacies of machine learning methods, this course will help you use different machine learning techniques with ease.

28- Data Science Career Track (Springboard)

Springboard’s Data Science Career Track is the first online bootcamp to offer a data science job or your tuition back. With personalized career coaching, mentorship from data scientists and exclusive employer partnerships, Springboard is putting it all on the line to help you get a job in data science.

29- Data Science (Coursera)

Coursera partnered with Johns Hopkins University to deliver this nine-course series on data science, covering everything from tools to advanced machine learning methods.

30- Machine Learning (Coursera)

This machine learning course taught by Andrew Ng (the famous Stanford professor who co-founded Coursera) is one of the best resources to consult as you start understanding data science.

31- Thinkful Data Science Bootcamp

Thinkful, an online education provider, provides a data science bootcamp that will curate your learning of data science and Python.

32- Intro to Machine Learning (Udacity)

Udacity offers a free mini-course, curated by Facebook and Tableau, that guides you through an analysis of the Enron email database.

33- Data Science Certificate (Harvard Extension School)

This data science certificate offered by the Harvard Extension School can help you learn data science while getting credits and credibility from one of the leading universities in the world.

34- Statistics with R (Coursera)

This selection of courses created in partnership with Duke University will help you understand basic probability and the use of Bayes’ Rule through the use of R.

35- Data Science (edX)

This set of curated learning paths in data science can help you get accreditation in the field — if you’re willing to pay for it.

36- Insight Data Science Fellowship

The Insight Data Science Fellowship is a special type of data science education program. It takes talented PhD students who have already demonstrated technical skills and aptitude and helps them bridge the gap between academia and industry with a postgraduate fellowship that combines academic rigor with industry knowledge.

37- Data Science (General Assembly)

General Assembly, one of the largest online education providers in the world, offers courses in data science.

38- Galvanize

If you’re looking for an in-person experience to learn data science instead of something online, Galvanize can help. This link leads to the San Francisco campus, but Galvanize is present in many other cities.

39- Coursereport Data Science Reviews

Here are some reviews of different data science courses in Coursereport — this will allow you to pick and choose between many different options with fair reviews from previous students on display.

40- Switchup Data Science Reviews

Here are some more reviews of different data science courses, this time from Switchup, another course review site.

Books

Oftentimes it’s not a great course that helps you learn the most — it can be one single resource within that course — say a particularly well-written book. This selection of data science books can help you understand data science in detail.

41- Bayesian Methods for Hackers

This book, delivered as an extended Github repository, can help you understand Bayesian inference and how to think about probabilities by working through them in code. 
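Here is a small worked example of the probabilistic reasoning these books teach: Bayes’ rule applied to a hypothetical diagnostic test. The prevalence, sensitivity and false-positive rate are made-up numbers, and `posterior` is a hypothetical helper. Python’s `fractions` module keeps the arithmetic exact.

```python
from fractions import Fraction


def posterior(prior, likelihood, false_positive_rate):
    """Bayes' rule for a binary hypothesis H given evidence E:
    P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical test: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
print(p, round(float(p), 4))  # 2/13 0.1538
```

A positive result raises the probability of the condition from 1% to only about 15%, because healthy people vastly outnumber sick ones. This is the kind of intuition that working through probabilities in code helps build.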

42- Think Stats

This O’Reilly book helps you conceptualize statistical concepts by having you work with them in Python.

43- Think Bayes

This book combines Python programming with Bayesian inference, and can be a handy resource in case the books above aren’t enough.

44- Deep Learning

This free technical book by leading figures in deep learning and artificial intelligence (Ian Goodfellow, Yoshua Bengio and Aaron Courville) will help you understand exactly how to think about deep learning and neural networks.

45- Learn Python the Hard Way

In case you need a refresher on Python, Learn Python the Hard Way breaks down exactly what you need to do to master the language. It focuses on an older version of Python. Its first principles can still help you freshen up your knowledge, but you shouldn’t become overly dependent on the book, because its philosophy is rigidly tied to that one version.

46- The Data Science Handbook

This Data Science Handbook curates insights from 25 data science leaders and distills what it truly means to work in this exciting new field.

47- Data Science from Scratch

This book from O’Reilly goes into the first principles of data science, looking beyond the programming tools and frameworks.

48- Storytelling with Data

This book will help you visualize insights that you find within your data and teach you how to communicate them effectively so that you can drive impact with your data findings.

49- Exploratory Data Analysis with R

Roger D. Peng, an expert in statistics, has written this book to teach how to look through datasets with the R programming language.

50- Interactive Data Visualization for the Web

This online book will teach you how to use frameworks such as D3.js to make your visualizations fully interactive on the web.

51- Machine Learning Yearning

This book by Andrew Ng, the famous artificial intelligence leader who co-founded Coursera, is going to be released soon. Sign up to get drafts of new chapters as they come in!

Curated Collections

I know you’re looking for curated resources to learn data science. There’s more than just this list right here — and each collection will help you expand your knowledge and collection of great data science resources even further.

52- Awesome Machine Learning

This Github repository follows the “Awesome” method of curating the best resources in a particular space — in this case, all the different resources you’d need to learn machine learning.  

53- Awesome Deep Learning Papers

In case you ever wanted to get a handle on the science behind the amazing technology being built out of artificial intelligence, this awesome curation of deep learning papers will help you continually be on top of exciting new developments.

54- Awesome TensorFlow

TensorFlow is an awesome deep learning framework: this Github repository will have everything you need to learn more.

55- Awesome Data Science

This repository is everything it promises: an awesome curation of different data science resources.

56- Data Analysis Learning Path (Springboard)

This learning path curates different resources in an intuitive fashion so that you can learn the data analysis skills required for data science.

57- The Open Source Data Science Masters

This is a curated curriculum of free, open-source resources to learn data science. Consider it a master’s degree for a fraction of the price.

General Resources

58- A visual intro to machine learning

This interactive, visual view of data science in action can help you conceptualize data science, especially if you prefer to learn visually. 

59- Deep Learning Review (Nature)

This paper summarizes some of the key findings in deep learning and artificial intelligence. It is written by three of the founding fathers of modern artificial intelligence research: Yann LeCun, Yoshua Bengio and Geoffrey Hinton.

60- Build a deep learning machine

This fun little tutorial by O’Reilly will teach you how to build a computer that you can use specifically for data science purposes.

61- How can I become a data scientist (Quora)

This Quora thread contains different thoughtful replies on how to become a data scientist — and includes a bevy of free resources to boot!

62- Becoming a Data Scientist

This blog charts the author, Renee, and her path from being a SQL analyst to becoming a full-fledged data scientist.

Career Advice

Becoming a data scientist is now a career path that many envy — however, getting started and placing yourself in a position where you are paid to practice data science doesn’t start and end with technical skills. Here’s a set of resources that will help spell out exactly what you need to do to have a successful data science career.

63- 2015 Data Science Salary Survey (O’Reilly)

This salary survey by O’Reilly was curated from about 600 respondents who divulged their salary and what they did at work. It’s an informative read on what the average salaries are like in data science and what factors or technical skills can either increase your data science salary — or set it on the path to stagnating.

We already highlighted the 2016 survey as part of our general overview of data science, but the 2015 survey will add even more context on how the data science industry works — and how much you should expect to be paid.

64- Guide to Data Science Jobs (Springboard)

This guide to Data Science Jobs by Springboard curates a variety of job seeker and hiring manager stories and seeks to inform you on every element of what it takes to get a data science job: from how to get hiring managers to notice your profile, to advice on what technologies and skills you should practice before doing a data science interview. 

65- Guide to Data Science Interviews (Springboard)

This companion guide to the Guide to Data Science Jobs by Springboard runs you through different interview questions and exactly what hiring managers are thinking when they are on the other side of the table. It’s a comprehensive overview of the data science interview process — and it provides you actionable tips on how to ace the data science interview.

66- Getting your first job in data science

This blog post goes over different general tips on how to get that first job in data science.

67- Data Science Career Paths

This blog post by Springboard breaks down the difference between data analysts, data scientists and data engineers.

Datasets

In order to really get started and to learn data science, you have to have datasets to play with. The following resources will link you to different datasets you can experiment with as you’re learning data science techniques and putting them into practice.

68- 19 Free Public Datasets (Springboard)

This curated list of 19 free public datasets will help you get started on your path to learn data science!

69- Kaggle Datasets

This list of datasets curated by Kaggle comes with upvotes and comments, so you can see exactly which datasets are the most exciting and what work has already been done with them.

70- Reddit Datasets

This subreddit can be a handy way to pick out new datasets, and see some of the most popular ones.

71- Data.world

This new social network has evolved around sharing great datasets and bringing data fans together!

72- Google BigQuery Datasets

Google BigQuery hosts some interesting public big datasets, from Reddit comments to Github activity.

73- Quandl

Quandl is a search engine mostly used for financial and economic data. Comb through it if you’re looking for data to play with in that space.

74- Public Big Datasets

This curated list of big datasets can help you practice with Hadoop or Spark.

75- Wikipedia dumps

Wikipedia periodically publishes dumps of its database that are free to analyze. Sift through them if you want to query the world’s largest collection of knowledge on your quest to learn data science.

76- OpenStreetMap

This collection of open-source geographic data extends around the world in its reach!

Resources/Blogs to Follow

You’ll want to keep an eye on different resources and blogs that update frequently as you learn data science. This ensures that you’re always on top of the latest developments — and it can be a stimulating way to keep your data science skills sharp.

77- Top data scientists to follow on Twitter

This is a list of data science influencers you’ll want to consider following to get to know more about the industry.

78- 50 of the best data science blogs

This curated list of data science blogs will help you find the best blogs to follow as you learn data science.

79- Ultimate guide to data science blogs

This larger, extended guide to data science blogs has a lot more entries — feel free to take a look if you feel like you want something comprehensive to digest.

80- KDNuggets

KDNuggets is one of the largest data science communities on the web, and their blog regularly posts interesting data science content.

81- R-bloggers

R Bloggers is a data science blog focused on tutorials to learn R and different resources in the R ecosystem. 

82- Dataconomy

Dataconomy focuses on larger trends in data science rather than many technical tutorials. It’s the data science blog with the largest focus on the European data science scene as well.

83- Analytics Vidhya

Analytics Vidhya contains plenty of technical tutorials on many data science topics.

84- Big Data Made Simple

Big Data Made Simple is a relatable blog that conveys different topics in data science in an approachable manner.

85- Yhat blog

The Yhat blog is always filled with interesting tutorials and data science case studies.

86- Machine Learning Mastery

Machine Learning Mastery focuses on the intricacies of machine learning.

87- Learndatasci

Learndatasci is a blog that offers a broad overview of different data science topics.

88- Mastersindatascience

Mastersindatascience is the resource to consult if you want to look at paid offerings to learn data science.

Newsletters

If you want regular updates in your inbox on the latest news in data science, there’s no better way to do that than to subscribe to the following data science newsletters.

89- Data Science Weekly

This weekly newsletter summarizes the latest tutorials and resources in data science. It’s a very useful resource if you’re looking to learn data science. 

90- Data Elixir

Another data science newsletter that will keep you informed on the latest happenings in data science. 

91- Python Weekly

This weekly Python newsletter curates a selection of the finest Python resources, many of them related to data science.

92- Datafloq

This handy newsletter promises to be a one-stop shop for you when it comes to big data trends.

93- The Analytics Dispatch

Mode Analytics provides a dispatch to keep you informed on all things analytics and BI-related.

94- Postgres Weekly

This Postgres Weekly newsletter keeps you informed on the latest Postgres updates.

95- O’Reilly Data Newsletter

O’Reilly’s premium data newsletter often curates the best data science resources that have popped up.

Communities

While newsletters and blogs are great, interactive communities where participants share articles and comment on them together can truly help you entrench your data science knowledge. Here are just a few of those communities where you can learn data science and interact with different data science practitioners.

96- Datatau

Datatau is a sort of Hacker News for data science resources where data science practitioners discuss the latest news and upvote the best articles.

97- Reddit Datascience

This subreddit deals with general data science topics.

98- Reddit Machine Learning

This subreddit deals with more in-depth machine learning materials and discussions.

99- Reddit Deep Learners

This subreddit deals with how to learn artificial intelligence and deep learning.

100. Reddit Data is Beautiful

This subreddit collects impactful, visually appealing data visualizations, and it is a great set of examples if you want to display your data beautifully.

101. Data Science Stack Exchange

This site on the Stack Exchange network deals with technical questions and solutions in data science.

102. Quora Data Science

This section of Quora is composed of many of the questions posed about data science — it is an awesome resource for those looking to learn data science. 

Hopefully the resources above have been helpful for you to learn data science: let me know in the comments below what you think about them or whether you think there are some I missed!


How to do common Excel and SQL tasks in Python


The code and data for this tutorial can be found in this GitHub repository. For more information on how to use GitHub, check out this guide.

Data practitioners have many tools that they use to slice and dice data. Some people use Excel, some use SQL, and some use Python. For certain tasks, Python has clear advantages: you can process much bigger datasets, more quickly, than a spreadsheet comfortably allows; you can use open source machine learning libraries built on top of Python; and you can easily import and export data in many different formats.

Python can become an essential part of any data analyst’s toolbox due to its versatility. However, it can be hard to get started. Most data analysts are probably familiar with either SQL or Excel. This tutorial is structured to help you transfer over skills and techniques from those two programs to Python.

First, let’s get you set up on Python. The easiest way to get started is to install Anaconda, a Python distribution that includes Jupyter Notebook. Jupyter’s visual, browser-based interface lets you plug in Python code and immediately see the output, which will make it easy to follow along with the rest of this tutorial.

I highly recommend using Anaconda, but this beginner’s guide will also help you install Python directly, though that will make following this tutorial harder.

Let’s start with the basics: opening up a dataset.

IMPORTING DATA

In SQL, you might import a .sql file into a database and then process it with queries. In Excel, you can double-click a file and start working with it in spreadsheet mode. In Python, there’s slightly more complexity, which comes with the benefit of being able to work with many different file formats and data sources.

Using Pandas, a data processing library, you can import a variety of file formats with its family of read functions (read_csv, read_excel, read_html, and so on). A full list of the supported formats is in the Pandas documentation. You can import everything from CSV and Excel files to tables embedded in HTML pages!
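Here is a minimal sketch of read_csv in action, assuming a tiny CSV held in memory: io.StringIO stands in for a file on disk, and the country names and figures are made up for illustration. The thousands=',' argument tells Pandas to read "52,100" as the number 52100.

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for a file on disk (hypothetical values)
csv_text = """Country,US$
Examplestan,"52,100"
Samplevania,"1,250"
"""

# read_csv accepts a path, a URL, or any file-like object such as StringIO;
# thousands=',' strips digit-group commas so the column is parsed as numbers
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df)
print(df.dtypes)
```

In practice you would pass a file path instead, e.g. pd.read_csv("Country.csv"), as we do later in this tutorial.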

(Don’t worry if you want to skip this part, you can! The raw csv file is here, and you can download it at will if you’d rather start this exercise without taking data from the web. Or you can git clone the entire repository.)

In this example, we’re going to take a Wikipedia table of countries by their nominal GDP per capita (a country’s total economic output, measured at current prices in US dollars, divided by its population), and use the Pandas library in Python to sort through the data.

First, let’s import the different libraries we need. For more information on how imports work in Python, click here.

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import re

We’ll need the Pandas library to process our data. We’ll need the numpy library to perform manipulations and transformations of numeric data. We’ll need the requests library to get HTML data from a website. We’ll need BeautifulSoup to process that data. Finally, we’ll need the regular expression library of Python (re) to change certain strings that will come up as we process the data. 

It’s not necessary to know much about regular expressions, but they are a powerful tool for matching and replacing strings or substrings. Here’s a tutorial if you want to learn more.

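r = requests.get('https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita')
r.raise_for_status()  # stop early if the page didn't load

soup = BeautifulSoup(r.text, 'lxml')
# Grab the first table on the page whose class is "wikitable sortable"
table = soup.find('table', attrs={'class': 'wikitable sortable'})

# Column headers, with stray whitespace and newlines stripped
theads = [th.text.strip() for th in table.find_all('th')]

data = []
for tr in table.find_all('tr'):
    # Clean each cell: drop non-breaking spaces (\xa0) and surrounding whitespace
    cells = [re.sub('\xa0', '', td.text).strip() for td in tr.find_all('td')]
    row = dict(zip(theads, cells))  # pair each cell with its column header
    if row:  # header rows have no <td> cells, so they produce an empty dict
        data.append(row)
print(data)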

Credit to this website for some of the code.

Here’s a more technical explanation of how to grab HTML tables with Python code with more step-by-step instructions.

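You can copy and paste the code above into your own Anaconda setup and iterate on it if you want to play with some Python code! Keep in mind that Wikipedia edits its tables over time, so if the scrape no longer lines up with this walkthrough, you can fall back on the raw CSV file linked above.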

The output of the code above, if you don’t modify it, is what is known as a list of dictionaries.

You’ll notice curly braces enclosing sets of key-value pairs, separated by commas. Each dictionary represents a row in our data, and its keys represent the columns: we are working with a country’s rank, its GDP per capita (expressed in US$), and its name (in ‘Country’).

For more information on how data structures such as lists and dictionaries work in Python, see this tutorial as well as this course: Intermediate Data Science Course by Springboard.

Thankfully, we don’t need to understand much of that to move this data into a Pandas dataframe, a way of organizing data similar to a SQL table or an Excel spreadsheet. Lists of dictionaries happen to convert neatly into dataframes, so one line of code is enough to build the dataframe and save it to a variable:

gdp = pd.DataFrame(data)

With this simple assignment, the variable gdp now holds a dataframe we can open up and explore whenever we refer to gdp. We can call methods on it to create curated views of the data within. For a more in-depth look at what we just did with the equal sign and assignment in Python, this tutorial is helpful.
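To see why a list of dictionaries converts so cleanly, here’s a small sketch with two hypothetical rows shaped like the scraped table (the names and values are placeholders, not real Wikipedia data). Each dictionary becomes a row, and its keys become the column names.

```python
import pandas as pd

# Each dictionary is one row; its keys become column names
# (hypothetical sample rows mirroring the scraped table's shape)
data = [
    {"Rank": "1", "Country": "Examplestan", "US$": "52,100"},
    {"Rank": "2", "Country": "Samplevania", "US$": "1,250"},
]

gdp = pd.DataFrame(data)
print(gdp)
print(gdp.shape)  # (number of rows, number of columns)
```

If one dictionary is missing a key, Pandas fills that cell with NaN rather than raising an error.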

TAKING A QUICK LOOK AT THE DATA

Now, if we want to take a quick look at what we’ve done, we can use the head() method, which works much like eyeballing the first few rows in Excel or adding a LIMIT clause in SQL. It’s a handy way to inspect a dataset without printing the whole thing! You can also pass a number to head() if you want to see a particular number of rows.

gdp.head()

The output is the first five rows of the GDP per capita dataset (five is the default for head()), neatly arranged into three columns plus an index column. Be aware that Python starts indexes at 0 rather than 1, so if you want the first row of a dataframe, you use position 0, not 1!
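A quick sketch of head() and zero-based positions, using a made-up six-row dataframe (the country labels and values are placeholders). iloc selects rows by position, so iloc[0] is the first row.

```python
import pandas as pd

# Hypothetical six-row dataframe, for illustration only
gdp = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "gdp_per_capita": [60000, 55000, 40000, 30000, 20000, 10000],
})

print(gdp.head())    # first 5 rows (the default)
print(gdp.head(3))   # first 3 rows

first_row = gdp.iloc[0]      # position 0 is the FIRST row
print(first_row["Country"])  # A
```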

RENAMING COLUMNS

One thing you’ll quickly realize in Python is that column names with special characters (such as $) can be annoying to handle. We’ll want to rename certain columns, something you can do in Excel by clicking on the column name and typing over it, and in SQL with the ALTER TABLE statement (or sp_rename in SQL Server).

In Pandas, the way to do it is with the rename method.

gdp = gdp.rename(columns = {'US$':'gdp_per_capita'}) 

This replaces the column header ‘US$’ with ‘gdp_per_capita’. A quick call to .head() confirms that the change has been made.
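A minimal sketch on a hypothetical two-row dataframe. Note that rename returns a new dataframe rather than changing the old one in place, which is why the result is assigned back to gdp.

```python
import pandas as pd

# Hypothetical data, for illustration only
gdp = pd.DataFrame({"Country": ["A", "B"], "US$": ["52,100", "1,250"]})

# rename returns a NEW dataframe; without reassigning, the change is lost
gdp = gdp.rename(columns={"US$": "gdp_per_capita"})
print(gdp.columns.tolist())  # ['Country', 'gdp_per_capita']

# Names without special characters can also be used as attributes
print(gdp.gdp_per_capita)
```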

DELETING COLUMNS

There’s been some data corruption! If you look at the Rank column, you’ll notice random dashes scattered throughout it. Since the number order is disrupted, the Rank column is fairly useless, especially because Pandas already gives you a numbered index column by default.

Fortunately, deleting a column is easy with Python’s built-in del statement: select the column by putting its name in square brackets after the dataframe name.

del gdp['Rank']

Now, with another call to the head function, we can confirm that the dataframe no longer contains a rank column.
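Here’s a small sketch of del on a made-up dataframe, alongside drop(columns=...), a Pandas method that returns a new dataframe instead of modifying the original. Which one you use is mostly a matter of taste.

```python
import pandas as pd

# Hypothetical data with a corrupted Rank column, for illustration only
gdp = pd.DataFrame({
    "Rank": ["1", "-", "3"],
    "Country": ["A", "B", "C"],
    "gdp_per_capita": [60000, 55000, 40000],
})

del gdp["Rank"]  # removes the column from gdp itself
print(gdp.columns.tolist())  # ['Country', 'gdp_per_capita']

# Non-mutating alternative: gdp is left untouched
trimmed = gdp.drop(columns=["Country"])
print(trimmed.columns.tolist())  # ['gdp_per_capita']
```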

CONVERTING DATA TYPES WITHIN COLUMNS

Sometimes, a given data type is hard to work with. This handy tutorial breaks down the different data types in Python in case you need a refresher.

In Excel, you could right-click and find ways of converting columns of data to a different type of data quite easily. You could copy a set of cells rendered by formulas and paste special as values, and you can use formatting options to quickly switch between numbers, dates, and strings. 

Switching from one data type to another isn’t always as easy in Python, but it’s certainly possible.

Let’s first use the re library in Python. We will use regular expressions to remove the commas within the gdp_per_capita column so we can more easily work with that column.

gdp['gdp_per_capita'] = gdp['gdp_per_capita'].apply(lambda x: re.sub(',','',x))

The re.sub function replaces every comma with an empty string, effectively deleting it. This tutorial goes through each function of the re library in detail.

Now that we’ve gotten rid of the commas, we can easily convert the column into a numeric one.

gdp['gdp_per_capita'] = pd.to_numeric(gdp['gdp_per_capita'])

Now we can calculate a mean for the column with gdp['gdp_per_capita'].mean().

We can see that the mean of the GDP per capita column is about $13,037.27, something we couldn’t compute if the column were stored as strings (which don’t support arithmetic). We can now do all sorts of calculations on the GDP per capita column that we couldn’t before, including filtering rows by value and computing percentiles.
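The whole cleanup can be sketched end to end on a hypothetical three-row column of comma-formatted strings: strip the commas with re.sub, convert with pd.to_numeric, then take the mean. The values are illustrative, not the Wikipedia figures.

```python
import re

import pandas as pd

# Hypothetical comma-formatted strings, for illustration only
gdp = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "gdp_per_capita": ["52,100", "1,250", "9,800"],
})

# Strip commas: re.sub(',', '', x) replaces each comma with an empty string
gdp["gdp_per_capita"] = gdp["gdp_per_capita"].apply(lambda x: re.sub(",", "", x))

# Convert the whole column at once (errors="coerce" would turn bad values into NaN)
gdp["gdp_per_capita"] = pd.to_numeric(gdp["gdp_per_capita"])

print(gdp["gdp_per_capita"].dtype)   # now a numeric dtype
print(gdp["gdp_per_capita"].mean())  # (52100 + 1250 + 9800) / 3 = 21050.0
```

Pandas also offers a vectorized shortcut for the first step, gdp["gdp_per_capita"].str.replace(",", "", regex=False), which avoids the lambda.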

SELECTING/FILTERING DATA

The basic need of any data analyst is to slice and dice a large dataset into actionable insights. To do that, you have to work with subsets of your data: this is where selecting and filtering come in. In SQL, this is done with SELECT statements and WHERE clauses, while in Excel, you select ranges and apply filters.

Using the Pandas library, you can quickly filter down with different functions or queries.

As a quick example, let’s show only countries that have a GDP per capita above $50,000.

This is how to do it:

gdp50000 = gdp[gdp['gdp_per_capita'] > 50000]

We assign a new dataframe using a filter that compares the column against 50000 and produces a boolean mask (a True or False value for each row). In plain terms, the line above says “keep only the rows where GDP per capita is above 50000.” Now we can display gdp50000.

And now we see that there are 12 countries with a GDP per capita above 50000!

Now let’s select only the rows for countries whose names start with S, using the str.startswith method on the Country column: gdp[gdp.Country.str.startswith('S')].

We can now display a new dataframe containing only countries that start with S. A quick check with the len function (a life-saver for counting the rows in a dataframe!) indicates that we have 25 countries that fit the bill.
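Under the hood, each comparison produces a boolean mask: a Series of True/False values, one per row, that Pandas uses to pick rows. Here’s a sketch on four hypothetical countries (names and figures invented for illustration), combining the numeric filter, the startswith filter, and len.

```python
import pandas as pd

# Hypothetical countries and figures, for illustration only
gdp = pd.DataFrame({
    "Country": ["Samplestan", "Southland", "Northland", "Eastland"],
    "gdp_per_capita": [56000, 30000, 75000, 700],
})

mask = gdp["gdp_per_capita"] > 50000  # one True/False per row
print(mask.tolist())                  # [True, False, True, False]
gdp50000 = gdp[mask]                  # keeps only rows where mask is True

starts_s = gdp[gdp["Country"].str.startswith("S")]
print(len(gdp50000), len(starts_s))   # 2 2
```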

Now what if we want to chain those two filter conditions together?

Here’s where chained filtering comes in handy. You’ll want to understand how it works, as well as Python’s basic operators, before filtering with multiple conditions. For this exercise, you just need to know that when combining Pandas filters, ‘&’ stands for AND and ‘|’ stands for OR, with each condition wrapped in its own parentheses. With a deeper understanding of all the basic operators, you can manipulate data with all sorts of conditions.

Let’s go ahead and work on filtering countries that both start with ‘S’ AND that have a GDP per capita above 50,000.

sand500gdp = gdp[(gdp.gdp_per_capita > 50000) & (gdp.Country.str.startswith('S'))]

Now let’s work on those that start with S OR have over 50000 GDP per capita.

sor500gdp = gdp[(gdp.gdp_per_capita > 50000) | (gdp.Country.str.startswith('S'))]

There we go! We’re well on our way to working with filtered views in Pandas.
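Why the parentheses? In Python, & and | bind more tightly than comparison operators such as >, so each condition needs its own parentheses. Python’s plain and/or keywords don’t work on a whole Series either; they raise a ValueError. A sketch on hypothetical data (names and figures invented):

```python
import pandas as pd

# Hypothetical data, for illustration only
gdp = pd.DataFrame({
    "Country": ["Samplestan", "Southland", "Northland", "Eastland"],
    "gdp_per_capita": [56000, 30000, 75000, 700],
})

rich = gdp["gdp_per_capita"] > 50000
starts_s = gdp["Country"].str.startswith("S")

both = gdp[rich & starts_s]    # AND: rich AND starts with S
either = gdp[rich | starts_s]  # OR: rich OR starts with S
print(both["Country"].tolist())    # ['Samplestan']
print(either["Country"].tolist())  # ['Samplestan', 'Southland', 'Northland']

# `and` tries to turn a whole Series into one True/False, which is ambiguous
try:
    gdp[rich and starts_s]
except ValueError as err:
    print("ValueError:", err)
```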

MANIPULATE DATA WITH CALCULATIONS

What would Excel be without functions that help you calculate different results?

In this case, Pandas leans heavily on the numpy library and general Python syntax to put calculations together. We’re going to go through a simple series of calculations on the GDP dataset we’ve been working on. For example, let’s add up the GDP per capita values of all countries above 50,000.

gdp50000.gdp_per_capita.sum()

That gives the answer 770046. Using the same logic, we can calculate all sorts of things: the full list is in the Pandas documentation, under the computations/descriptive statistics section.
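The same pattern works for other summary statistics. A sketch on four hypothetical values above 50,000 (not the real figures behind 770046):

```python
import pandas as pd

# Hypothetical GDP-per-capita values above 50,000, for illustration only
gdp50000 = pd.DataFrame({"gdp_per_capita": [52000, 58000, 61000, 75000]})

col = gdp50000["gdp_per_capita"]
print(col.sum())          # 246000
print(col.mean())         # 61500.0
print(col.median())       # 59500.0
print(col.quantile(0.9))  # 90th percentile
print(col.describe())     # count, mean, std, min, quartiles, max
```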

DATA VISUALIZATION (CHARTS/GRAPHS)

Data visualization is a very powerful tool: it allows you to share the insights you’ve gained with others in an accessible format. A picture, after all, is worth a thousand words. Excel and many SQL tools can turn query results into charts and graphs. With the seaborn and matplotlib libraries, you can do the same with Python.

There are far more comprehensive tutorials on data visualization options — a favorite of mine is this Github readme document (all in text) which explains how to build probability distributions and a wide variety of plots in Seaborn. That should give you an idea of how powerful data visualization can be in Python. If you’re ever feeling overwhelmed, you can use a solution such as Plot.ly which might be more intuitive to grasp.

We’re not going to go through every data visualization option. Suffice it to say that Python gives you far more power to visualize data than SQL can offer, though you trade away some of the ease with which Excel generates charts from templates.

In this case, we’re going to build a simple histogram to show the distribution of GDP per capita for those countries that have more than $50,000 in GDP per capita.

gdp50000.hist() 

With this powerful histogram function (hist()) we can now generate a histogram that shows that most of the countries with a high GDP per capita cluster around the $50000 to $70000 range!
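What a histogram actually plots is a count of values per equal-width bin. As a sketch, numpy.histogram computes those counts directly (matplotlib, which Pandas’ hist() uses for drawing, relies on it too). The values below are hypothetical; with real data you could also pass bins=... to gdp50000.hist() to control the number of bars, which defaults to 10.

```python
import numpy as np

# Hypothetical GDP-per-capita values above 50,000, for illustration only
values = np.array([52000, 55000, 58000, 61000, 64000, 69000, 83000, 101000])

# Five equal-width bins spanning 50,000 to 110,000 (each 12,000 wide)
counts, edges = np.histogram(values, bins=5, range=(50000, 110000))

# Each bar of the chart corresponds to one (bin, count) pair
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:>9,.0f} to {hi:>9,.0f}: {n}")
```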

GROUPING AND JOINING DATA TOGETHER

Within Excel and SQL, powerful tools such as pivot tables and the JOIN clause allow for the rapid aggregation of data.

Pandas offers counterparts to many of these SQL and Excel features. You’ll be able to group data within datasets and join different datasets together; take a look at the documentation here. The merge function in Pandas works much like a SQL JOIN, while Pandas also offers pivot table functionality for those who are used to it in Excel.

We’re going to do a simple join here between the table we’ve developed with GDP per capita and a list of World Development Indicators from the World Bank.

Let’s first import the csv of country-level indicators.

country = pd.read_csv("Country.csv")

Let’s call .head() to take a quick look at the different columns in this dataset.

The output shows a number of new columns we can play with, including the years in which different kinds of data were last collected.

Now let’s merge the data:

gdpfinal = pd.merge(gdp,country, how = 'inner', left_on='Country', right_on = 'TableName')

The resulting table combines our GDP per capita column with the columns of the new country-level table. For those familiar with SQL joins, we’re doing an inner join that matches the Country column of our original dataframe against the TableName column of the World Bank data, so only countries present in both tables are kept.
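Here’s a sketch of the same inner join on two tiny hypothetical tables standing in for our GDP data and Country.csv (the names and regions are invented). It is roughly the Pandas version of SELECT * FROM gdp INNER JOIN country ON gdp.Country = country.TableName. A second merge with how='outer' and indicator=True flags rows that failed to match, a quick way to catch country names spelled differently in the two sources.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped GDP table and Country.csv
gdp = pd.DataFrame({"Country": ["Samplestan", "Northland", "Eastland"],
                    "gdp_per_capita": [56000, 75000, 700]})
country = pd.DataFrame({"TableName": ["Samplestan", "Northland", "Westland"],
                        "Region": ["Europe", "Europe", "Africa"]})

# Inner join: keep only rows whose Country matches a TableName
gdpfinal = pd.merge(gdp, country, how="inner",
                    left_on="Country", right_on="TableName")
print(gdpfinal)

# Outer join plus a _merge column saying where each row came from
check = pd.merge(gdp, country, how="outer", left_on="Country",
                 right_on="TableName", indicator=True)
print(check[check["_merge"] != "both"])  # unmatched rows on either side
```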

Now that we have a joined table, we may want to group countries and their GDP per capita by the region of the world they’re in.

We can now use the groupby method in Pandas to play around with the data grouped by region.

gdpregion = gdpfinal.groupby(['Region']).mean(numeric_only=True)
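A sketch of the groupby step on a hypothetical merged table. In pandas 2.0 and later, mean() raises a TypeError when text columns such as Country are included, so numeric_only=True averages only the numeric ones. The .agg() call shows how to request several statistics at once.

```python
import pandas as pd

# Hypothetical merged table, for illustration only
gdpfinal = pd.DataFrame({
    "Country": ["Samplestan", "Northland", "Eastland", "Southland"],
    "Region": ["Europe", "Europe", "Africa", "Africa"],
    "gdp_per_capita": [56000, 75000, 700, 1300],
})

# numeric_only=True skips text columns such as Country
gdpregion = gdpfinal.groupby(["Region"]).mean(numeric_only=True)
print(gdpregion)

# Several aggregations of one column at once
summary = gdpfinal.groupby("Region")["gdp_per_capita"].agg(["mean", "count"])
print(summary)
```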

What if we want a summary table in one step? A groupby call on its own returns an intermediate GroupBy object that still needs an aggregation applied to it. Excel users will recognize another route: the pivot table. Fortunately, pandas has a robust pivot table function, and margins=True adds an ‘All’ row with the overall figures.

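# Average only the numeric columns; recent pandas versions raise an error on text columns
numeric_cols = list(gdpfinal.select_dtypes(include='number').columns)
gdppivot = gdpfinal.pivot_table(index=['Region'], values=numeric_cols, margins=True, aggfunc='mean')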

gdppivot

You’ll see we’ve picked up some extra columns we don’t need. Fortunately, with the drop function in Pandas, you can easily delete several columns.

gdppivot = gdppivot.drop(columns=['LatestIndustrialData', 'LatestTradeData', 'LatestWaterWithdrawalData'])

gdppivot

Now we can see that GDP per capita differs by region, and we have a clean table with the data we want.
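For comparison, here’s a pivot_table sketch on hypothetical data where values= picks the column to average up front, so nothing needs dropping afterward. Note that the margins row labeled All averages all the underlying rows, not the regional averages.

```python
import pandas as pd

# Hypothetical merged table, for illustration only
gdpfinal = pd.DataFrame({
    "Region": ["Europe", "Europe", "Africa", "Africa", "Africa"],
    "gdp_per_capita": [56000, 75000, 700, 1300, 1000],
    "LatestTradeData": [2018, 2018, 2016, 2017, 2015],
})

# values= restricts the pivot to the column we care about
gdppivot = gdpfinal.pivot_table(index=["Region"], values=["gdp_per_capita"],
                                aggfunc="mean", margins=True)
print(gdppivot)
# All = 134000 / 5 rows = 26800.0, not (65500 + 1000) / 2
```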

This is a very superficial analysis: you’d actually want a population-weighted mean, because a simple average of each nation’s GDP per capita doesn’t represent the GDP per capita of a whole region when populations differ across the nations within it.
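A population-weighted mean is just total GDP divided by total population for each region: sum(gdp_per_capita × population) / sum(population). Here’s a sketch, assuming a population column that our merged table doesn’t actually have (you’d have to source it separately; the figures below are invented):

```python
import pandas as pd

# Hypothetical data: the population column is an assumption, not part of
# the scraped table or Country.csv
gdpfinal = pd.DataFrame({
    "Region": ["Europe", "Europe", "Africa", "Africa"],
    "gdp_per_capita": [60000, 20000, 2000, 1000],
    "population": [1_000_000, 9_000_000, 5_000_000, 5_000_000],
})

# Total GDP per country = per-capita GDP x population
gdpfinal["gdp_total"] = gdpfinal["gdp_per_capita"] * gdpfinal["population"]

# Weighted mean per region = total GDP / total population
sums = gdpfinal.groupby("Region")[["gdp_total", "population"]].sum()
weighted = sums["gdp_total"] / sums["population"]
simple = gdpfinal.groupby("Region")["gdp_per_capita"].mean()

print(pd.DataFrame({"simple_mean": simple, "weighted_mean": weighted}))
```

In this made-up Europe, the populous, poorer country pulls the weighted figure (24000.0) well below the simple mean (40000.0).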

In fact, you’ll want to redo all of our calculations involving means to account for each country’s population! See if you can do that within the Python notebook you’ve just started. If you can figure it out, you’ll be well on your way to transferring your SQL or Excel knowledge to Python.

Got any comments or questions? Please leave them in the comments section on this blog post 🙂