Monday, August 26, 2013

Using R to visualize Karpov-vs-Kasparov Lifetime winner-take-all tally

The Karpov vs. Kasparov rivalry holds a special place in the chess world.

The idea behind this analysis is simple. If we take their lifetime games and plot the wins, what would it look like? We introduce one twist: we'll be plotting the "winner-take-all" tallies, meaning that for every year, every five years, and every decade, we declare one person to be the 'winner'.

First, a note of caution: "Winner-take-all" type analyses lose a lot of information due to the roll-up. Whether a GM wins by 1 game or by a dozen games in a given year, he still gets only one "win".

At the outset, I must mention that this is NOT a chess exercise. I am ignoring the colors (whether each player had the White or Black pieces) and, even more egregiously, I don't differentiate between standard, rapid, and exhibition games. Time controls are ignored, as are openings.

This is a visualization exercise, and the idea is to see how it all looks when plotted.

I scraped the data from chessgames.com, which lists 201 games that the two have played. (I cleaned up the data, and the CSV file is available on GitHub for anyone who wants to do their own analysis.) I use plyr to aggregate the data, and ggplot for the visualization. I wanted to try out this "pianogram" type of visualization, where each plot looks like piano keys.

Let's get the basics out of the way:

201 games: 138 draws, 37 wins for Kasparov, 26 for Karpov

Overall, Kasparov comes out ahead of Karpov, 37 wins to 26. But how are these wins and losses spread across time? The two played each other for a little over 30 years.

The Winner-Take-All Method
In any given time period, say 1990, there are 4 possible outcomes:
no games played, an equal number of wins, Karpov won, or Kasparov won. (If both players had the exact same number of wins in a given time period, we label that a Draw.)
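The four outcomes above can be sketched as a small base R function. This is only an illustration, not the code from the gist: the name period_winner and the result labels "Karpov", "Kasparov", and "Draw" are assumptions about how the cleaned-up data might be coded.

```r
# Hypothetical helper: decide the winner-take-all outcome for one time period.
# `results` holds one label per game: "Karpov", "Kasparov", or "Draw"
# (assumed labels; the real CSV may code results differently).
period_winner <- function(results) {
  if (length(results) == 0) return("No games")
  karpov   <- sum(results == "Karpov")
  kasparov <- sum(results == "Kasparov")
  if (karpov > kasparov) return("Karpov")
  if (kasparov > karpov) return("Kasparov")
  "Draw"  # equal number of wins, including periods where every game was drawn
}

period_winner(c("Draw", "Kasparov", "Draw", "Karpov", "Kasparov"))
period_winner(c("Draw", "Draw"))
period_winner(character(0))
```

Note that the margin never enters the result: a period won 2-1 and a period won 12-0 produce the same label.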

Thanks to the 'plyr' package, we can calculate the by-year winner, the "5-year winner" for each half-decade, and the decade-wise winner by writing one function and calling it with ddply. ggplot then takes care of the plotting.
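To make the "one function, three period widths" idea concrete, here is a minimal sketch in base R. It swaps in tapply for ddply so that it runs without extra packages, and it is not the code from the gist. The column names year and result, the helper name winners_by_period, and the toy rows are all assumptions made for illustration.

```r
# Toy rows for illustration only; these are NOT the real chessgames.com data.
games <- data.frame(
  year   = c(1975, 1984, 1984, 1985, 1990, 1990, 2009),
  result = c("Karpov", "Karpov", "Draw", "Kasparov", "Kasparov", "Draw", "Kasparov"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one winner per period of `width` years (1, 5, or 10).
winners_by_period <- function(games, width) {
  # Map each year to the start of its period, e.g. 1984 -> 1980 for width 5 or 10.
  period   <- (games$year %/% width) * width
  # Count each player's wins within every period that has at least one game.
  karpov   <- tapply(games$result == "Karpov",   period, sum)
  kasparov <- tapply(games$result == "Kasparov", period, sum)
  winner   <- ifelse(karpov > kasparov, "Karpov",
              ifelse(kasparov > karpov, "Kasparov", "Draw"))
  data.frame(period = as.integer(names(karpov)),
             winner = as.vector(winner),
             stringsAsFactors = FALSE)
}

yearly      <- winners_by_period(games, 1)
half_decade <- winners_by_period(games, 5)
decade      <- winners_by_period(games, 10)
```

Periods with no games simply do not appear in the output of this sketch; they would have to be filled in separately to show the "no games played" outcome on a plot.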


So here's what the Yearly-Winner-Take-All looks like:

Next, let's look at the half-decade and decade plots. Again, note that only one GM is declared the winner for the entire decade, no matter what the difference in the scores is.



Now, we can put it all together, in one graph.


As a very quick summary, we can see that Karpov started out strong, the 1980s as a whole were a draw, and then Kasparov took over.

The complete R code to reproduce this analysis is available in this gist, along with the data file in CSV format.




Wednesday, August 21, 2013

What are the hottest areas for CS Research? (Based on Google Research 2013)

What are some of the hottest areas of research in Computer Science at the moment (August 2013)? And at which universities is this research being carried out?
The answers are subjective by definition, but looking at the numbers behind the Google Research awards announced yesterday can provide some quick insights. Using the grants as a proxy for what the current hottest areas of research are, and where the work is being done, here's what we get:

A total of 105 grants were awarded. 81 universities in all got the awards, in 19 different research areas.

Here are the top research areas:





As far as institutions go, MIT, Georgia Tech and CMU got 5 grants each, with Cornell and Stanford getting 3 each.
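Counts like the ones above (total grants, distinct universities and areas, and the per-area and per-institution tallies) can be produced with a few lines of base R. The sketch below is only an illustration and not necessarily how the linked code does it. The column names university and area are assumptions, and the rows are placeholders rather than the actual awards list.

```r
# Placeholder rows for illustration only; not the actual awards list.
grants <- data.frame(
  university = c("Univ X", "Univ X", "Univ Y", "Univ Z", "Univ Z", "Univ Z"),
  area       = c("Area A", "Area B", "Area A", "Area A", "Area C", "Area B"),
  stringsAsFactors = FALSE
)

n_grants       <- nrow(grants)                      # one row per grant
n_universities <- length(unique(grants$university)) # distinct institutions
n_areas        <- length(unique(grants$area))       # distinct research areas

# Tallies sorted from most to fewest grants.
by_area <- sort(table(grants$area), decreasing = TRUE)
by_univ <- sort(table(grants$university), decreasing = TRUE)
```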

The R code I used to generate this can be found here. In case anyone is interested in performing their own analysis, I've also included the CSV data file.