Roger Huang

Roger has worked in user acquisition and marketing roles at startups that have raised 200m+ in funding. He taught himself machine learning and data science in Python, and has an active interest in all sorts of technical fields. He's currently working on boosting personal cybersecurity (youarecybersecure.com).

Technology and Society

The 21st Century Prisoner’s Dilemma

Decreasing labor in order to salvage profits, to the detriment of both

A prisoner’s dilemma arises when two parties who would both be better off cooperating instead give up that higher combined payout, because each one’s individual incentives lead it away from cooperation.

Typically represented in matrix form, it can be conceptualized with the following scenario: Stephen Colbert and I are both in prison for being fearless conservatives. We are each given the choice to either stay silent or to cooperate with the statist authorities by informing on the other. If we both stay silent, we each get two years in prison. If we both inform, we each get three years. If one of us informs and the other stays silent, the informer goes free, while the one who stayed silent gets five years.

It is in each of our private interests to inform on the other. If I inform, I face either freedom (if the other stayed silent) or three years (if the other informed). If I stay silent, I face either two years (if the other stayed silent) or five years (if the other informed). Whatever the other does, informing leaves me better off.

In aggregate, that means that instead of serving two years each by both staying silent, we will both inform on one another and serve three years each. It is in our private interest to arrive at this equilibrium, since each of us will seek the better payoff of informing on the other. We will harm each other as we seek to help ourselves.
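For readers who prefer to see the matrix spelled out, here is a minimal sketch in Python using the sentences from the scenario above. The names `YEARS`, `best_response`, and `nash_equilibria` are my own labels for illustration, not standard terminology from any library.

```python
# Years in prison for (my_choice, other_choice), taken from the scenario above.
# Each entry is (my_years, other_years). Lower is better.
YEARS = {
    ("silent", "silent"): (2, 2),
    ("silent", "inform"): (5, 0),
    ("inform", "silent"): (0, 5),
    ("inform", "inform"): (3, 3),
}
CHOICES = ("silent", "inform")


def best_response(other_choice):
    """Return the choice that minimizes my own sentence, given the other's choice."""
    return min(CHOICES, key=lambda mine: YEARS[(mine, other_choice)][0])


def nash_equilibria():
    """Return every pair of choices where neither prisoner gains by switching alone."""
    # The game is symmetric, so the same best_response function serves both prisoners.
    return [
        (a, b)
        for a in CHOICES
        for b in CHOICES
        if best_response(b) == a and best_response(a) == b
    ]


if __name__ == "__main__":
    for other in CHOICES:
        print(f"If the other prisoner chooses {other!r}, I should choose {best_response(other)!r}")
    print("Equilibria:", nash_equilibria())
    print("Total years if both stay silent:", sum(YEARS[("silent", "silent")]))
    print("Total years if both inform:", sum(YEARS[("inform", "inform")]))
```

Informing is the best response to either choice, so the only stable outcome is the one where both prisoners inform, even though it costs more total years than mutual silence.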


The 21st century’s prisoner’s dilemma will be that no firm will want to hire workers, but every firm will want every other firm to hire workers so that it has a consumer base for itself. This is because the private payoff of having less labor (and saving on what for many businesses is the largest cost) is such a powerful incentive. Regardless of what other businesses do in aggregate, it will almost always be better for the individual firm to shed workers.

Unfortunately, this will lead to a worse social equilibrium. Castes of the unemployed, political and economic volatility, and staggering inequality may become the norm. Ironically, this economic chaos will then lead to lower profits, as fewer consumers will be able to buy most products. If left unchecked, lower private costs will be overwhelmed by higher social costs.
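The same logic can be put into a toy model. The sketch below is only an illustration under loudly stated assumptions: every number in it (`N_FIRMS`, `WAGE_BILL`, `BASE_REVENUE`, `SPEND_FACTOR`) is made up, and the rule that wages flow back as revenue shared equally among firms is a simplification of mine, not an empirical claim.

```python
# All values below are hypothetical and chosen only to illustrate the argument.
N_FIRMS = 10          # number of identical firms in the toy economy
WAGE_BILL = 100.0     # cost to one firm of keeping its workforce
BASE_REVENUE = 120.0  # revenue a firm earns even if nobody is employed
SPEND_FACTOR = 3.0    # total revenue generated per dollar of wages paid


def profit(i_hire, others_hiring):
    """Profit of one firm, given its own choice and how many other firms hire."""
    hiring_firms = others_hiring + (1 if i_hire else 0)
    # Assumption: wages paid anywhere come back as consumer spending,
    # split equally among all firms.
    revenue = BASE_REVENUE + SPEND_FACTOR * WAGE_BILL * hiring_firms / N_FIRMS
    cost = WAGE_BILL if i_hire else 0.0
    return revenue - cost


if __name__ == "__main__":
    # Whatever the others do, shedding workers pays better for the single firm.
    for others in range(N_FIRMS):
        print(others, "others hire: hire =", profit(True, others),
              "| shed =", profit(False, others))
    # Yet every firm earns less when all shed than when all hire.
    print("Everyone hires:", profit(True, N_FIRMS - 1))
    print("Everyone sheds:", profit(False, 0))
```

Under these assumed numbers a firm gains from shedding workers no matter what the other firms do, yet each firm's profit when everyone sheds is lower than when everyone hires. That is the same shape as the prison example.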

The mid-20th century saw the rise of the Great Society, the establishment of the welfare state, and mass infrastructure projects that set the foundation of the 20th century. We will have to do even better to build the 21st century, and ensure a balance between private and social incentives.

(Now please help break me and Stephen out of prison!)

Defining the Future

Defining Big Data in Less Than Three Minutes

I remember the first time I said the words “big data” with pride when describing my work. The phrase, like every good buzzword, meant nothing to me, but conveyed a lot to my imagined prospective audience. It said something about my intelligence that I was working in “big data”, plugging away at Excel sheets with way too many lines: a sure sign of a “big data” expert!

I know better now. After doing some research, I’m proud to say that I knew absolutely nothing about the topic at the time. In many ways, I still don’t—but I know enough to talk about the basics of “big data” and what it really represents, so you can explore with me.

The first step is to realize that big data refers to data so large and complex that conventional data tools, such as table-based SQL databases, cannot handle the load. Big data is not simply a big dataset that can be handled with Excel. Think of, for example, someone tracking every social media comment on Ahnold’s accent, along with the commenter’s location and other user attributes, in a mad quest to find who had the best “get to the choppa!” or “there is no bathroom!” quote variations. You’d quickly go mad trying to pass every single one of those data points through a relational table or an Excel file, even if you worked for a large Arnold-watching company and had a set data process.

An easy rule of thumb is that big data refers to data sets that become difficult for an organization with a conventional data process to handle. That threshold can differ by several orders of magnitude: a smaller business may begin to struggle at a far lower volume than a larger one. Either way, it is the beginning of that struggle, and the search for alternatives to bread-and-butter SQL and Excel, that is at the core of big data.

Traditional data tools tend to group data into tables and run on a small number of servers. Big data tools tend to ungroup data, and organize and analyze it through parallel processing across a larger number of servers.
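To make "parallel processing across servers" a little more concrete, here is a minimal sketch of the split, map, and reduce pattern in plain Python. It is not Hadoop or any real big data tool. The posts are made up, the function names are my own, and a thread pool stands in for the separate servers as an assumption of convenience.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical, made-up records standing in for social media posts.
POSTS = [
    {"city": "Austin", "text": "Get to the choppa!"},
    {"city": "Boston", "text": "There is no bathroom!"},
    {"city": "Austin", "text": "get to the choppa, now"},
    {"city": "Denver", "text": "I'll be back"},
    {"city": "Boston", "text": "GET TO THE CHOPPA"},
]


def split_into_chunks(records, n_chunks):
    """Deal records out into n_chunks lists, as if sending them to separate servers."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map step: count 'choppa' mentions per city within one chunk only."""
    return Counter(post["city"] for post in chunk if "choppa" in post["text"].lower())


def count_mentions(records, n_workers=3):
    """Run the map step on every chunk in parallel, then merge the partial counts."""
    chunks = split_into_chunks(records, n_workers)
    # Threads stand in for servers here; a real cluster would ship chunks over a network.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    # Reduce step: add the per-chunk counters together.
    return reduce(lambda a, b: a + b, partial_counts, Counter())


if __name__ == "__main__":
    print(dict(count_mentions(POSTS)))
```

No single worker ever sees the whole dataset. Each one summarizes its own slice, and only the small summaries are combined at the end.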

When people in the field talk about the possibilities offered by big data, they are pointing to the collection of the unfathomable amount of detail we now leave on the web. That was impossible five or ten years ago, because there were not so many details on the web and there were no tools to collect them. Now, with smartphones, sensors, and social media, data points are multiplying exponentially. Those who cast a dragnet over all of this data, run it through tools not traditionally used in data collection that spread the volume and velocity of data over many servers instead of one or two, and then emerge with finely combed and actionable insights despite the overbearingly massive amount of data, are dealing with big data. This includes the NSA, but also the data scientists who helped win the 2012 election, and health analysts working to ensure better care for all.

Please contribute to big data by commenting or forwarding me your terabytes of favorite Ahnold quotes.

It’s probably big data: new tools and terms

Hadoop

NoSQL

MapReduce

MongoDB

Look at me in very not-tabled JavaScript Object Notation, a favorite of web-based big data databases:

[Image: JSON in relation to Big Data]
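For a feel of what a JSON record looks like in practice, here is a minimal sketch using Python's built-in `json` module. The post, its field names, and the user handles are all invented for illustration.

```python
import json

# A hypothetical, made-up record: one social media post stored as a single document.
post = {
    "user": {"handle": "choppa_fan", "city": "Austin"},
    "text": "Get to the choppa!",
    "tags": ["arnold", "predator"],
    "replies": [
        {"user": "no_bathroom", "text": "There is no bathroom!"},
    ],
}

# Serialize the record to JSON text.
as_json = json.dumps(post, indent=2)
print(as_json)

# Parse it back and reach into the nested fields directly, with no table joins.
loaded = json.loads(as_json)
print(loaded["user"]["city"])
print(loaded["replies"][0]["text"])
```

The nesting is the point: the user, the tags, and the replies all travel inside one record instead of being spread across several tables.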

 

It’s probably not big data

Your Excel spreadsheets of political enemies, no matter how many you have

Your Excel spreadsheets of dateable people, no matter how many you have

Your SQL tables of your favorite Arnold movies, and quotes contained within

Your handwritten list of things you would do for a Klondike bar

Look at me in traditional SQL table form:

[Image: SQL in relation to Big Data]
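And for the traditional side, here is a minimal sketch of the table-based approach using Python's built-in `sqlite3` module. The `quotes` table and its columns are hypothetical names chosen for this example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: every row must fit the same fixed set of columns.
conn.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, movie TEXT, quote TEXT)")
conn.executemany(
    "INSERT INTO quotes (movie, quote) VALUES (?, ?)",
    [
        ("Predator", "Get to the choppa!"),
        ("Kindergarten Cop", "There is no bathroom!"),
        ("The Terminator", "I'll be back"),
    ],
)

# A declarative query over the table: count the stored quotes per movie.
rows = conn.execute(
    "SELECT movie, COUNT(*) FROM quotes GROUP BY movie ORDER BY movie"
).fetchall()
print(rows)
conn.close()
```

This works nicely for a short list of favorite quotes on one machine, which is exactly why it is probably not big data.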