The popularity of infographics, particularly interactive maps, is on the rise in popular culture and daily life. Recently, I've seen glimpses of boring information, in graphical form, popping up in popular entertainment. Today, I'll talk about the British TV series 'Sherlock'. Note: this post would be better with a nice action, instrumental soundtrack. I'm just saying.
Who doesn't love a good chase scene? Cops jumping over buildings, secret agents commandeering motorcycles in pursuit of thuggish Russian agents, Wile E. Coyote mounting an ACME rocket in pursuit of that dastardly Road Runner. What can be better than well-shot sequences of gunshots, jumps, swinging, running, ducking and breaking through glass? What's missing? I think the answer is 'context.'
Single-perspective views cannot capture both the chaser and his or her prey. It's not always easy to know whether either is making a good move to catch or evade the other. Enter the infographic.
In the pilot episode of 'Sherlock' ('A Study in Pink'), Sherlock and his reluctant sidekick Watson are in pursuit of a cab through the streets of London. The cab is bound to stay on roads. The heroes are on foot but are free to climb stairs, jump across buildings, cut through gardens and romp through small alleyways.
The scene is populated with cut sequences of cars whizzing and our heroes moving in a totally non-overlapping geographical space. To help the viewer keep track of this scene, the show displays a map of the London neighborhood (frame 1), with the baddies in red and the goodies in green (can it be any other way? I maintain that it cannot). These graphics provide strategic context by illustrating Sherlock's thought process, and they update the audience on the geographical history and future of the chase.
Starting at around the intersection of Ingestre Pl and Hopkins St, Sherlock deduces that the cab is unlikely to take the route along Wardour to Broadwick as in frame 1 and instead assumes the cab will take Warwick to Wardour Street (frame 2). The purple dot in frame 2 presumably indicates Sherlock's intended point of interception.
Running and jumping ensues. I have no idea exactly where they are, but probably on the roof of someone who lives near the intersection of Broadwick and Poland Street (not shown). Poor bastards. Sherlock and Watson arrive on D'Arblay St, but oh no, they've missed them. Frame 3 clearly shows that the green line overlaps with the red, and we're left to assume that the two parties did not reach that point at the same time. Sherlock picks a new interception point (see purple dot), and instead of following the cab by turning left along Poland Street, he turns right. The combination of the physical actors prancing off in the wrong direction and the clear graphic helps the audience understand what is going on, and thereby CARE about all the action.
Running and trampling of nicely landscaped gardens ensues. Frame 4 is the final graphic we are given. It shows only slight progress over frame 3. This is the least informative frame. More running, and then, BAM, Sherlock has his man, thus demonstrating that logic, deduction, and an unnatural amount of cartographical memory coupled with obscene amounts of mundane municipal construction knowledge will triumph over cab drivers (who don't even know they're being chased).
Success: These map frames definitely help the viewer understand the logic and agency in Sherlock's pursuit. Without them, the scene would appear to be gratuitous running and jumping followed by a fortunate and inexplicable interception of his target. We see glimpses of why Sherlock assumes the cab will take one route over another. We also see two different instances of where Sherlock wants to intercept his target. This makes the running and jumping over buildings meaningful. We also see, in sort-of real time, the progress he's making whilst running and jumping, which makes it more suspenseful.
"Powering the Cell: mitochondria" and the "Inner Life of the Cell", videos produced and distributed by XVIVO, are two of the most sensational examples of modern scientific animation. An ensuing story in the New York Times solidified for me that scientific animation was not only a blooming industry producing eye candy for fund raisers but was also an active area of research and the beginnings of a community trying to push the boundaries of scientific communication. This inspired me to get a copy of Maya (an industry-leading 3D animation software package freely available to the academic community) and get playing.
Even with all the inspiration from personal heroes Gael McGill and Drew Berry, the short animation above was the most style I could muster with my feeble newbie Maya skills. The subject of this video is a particular example of the bump-hole method, pioneered by Kevan Shokat [paper], which in general refers to the strategy of introducing a genetic mutation to a native enzyme in such a way that the enzyme can catalyze a specific reaction between the native substrate and another molecule. In this case, the lab at Memorial Sloan Kettering Cancer Center (MSKCC) at whose request I made this video used the bump-hole method to enable a transferase reaction that could attach a label to a substrate. For a more detailed narrative of the video, please see the end of the post.
The stated purpose of this animation was to replace a simple 2D schematic illustration of the entire problem and its solution. The video is intended to accompany a live presenter that will narrate the video. Its primary function as a schematic allows us flexibility to deviate from scientific accuracy, when necessary. The scientific accuracy here is limited to the shape of the molecules and the positioning of the substrate and cofactor relative to the enzyme. All colors (obviously), transparencies and glow effects are for illustration. Furthermore, the magnitude of the mutation's effect on the enzyme geometry is greatly exaggerated to clearly show a perceptible change in the enzyme structure. Walking the line between scientific representation and interpretation is something all scientific animators will have to deal with, and the rules for scientific integrity and responsibility in this arena are still up for discussion. I hope that here I don't exemplify any egregious violation.
A few tidbits for the interested. I used the free and brilliant Maya plugin for molecular animators called molecularMaya. As far as I understand, molecularMaya is the brainchild of Digizyme owner Gael McGill (and his super friendly and helpful team). It allows for automatic importing of PDB files from the PDB website or locally from your machine. The plugin provides a set of menu options for viewing the protein as a set of atoms, a mesh, or a ribbon. Each viewing mode is coupled to different style options. For example, I used a mesh resolution of 1.714 for the enzyme to show more detail. The mesh resolution does not, as far as I am aware, translate to an Angstrom resolution. molecularMaya is due for a much anticipated new release and I believe it will be significantly more than just a few new features. One feature that I would really like would be the ability to select individual residues. I trust additional representations such as beta-sheet and alpha-helix cartoons will be included.
I hope to extend the utility of this animation with interactive labels and overlaid figures to supplement the content with scientific evidence. In a dream world, scientists will be communicating with each other and with the public through such interactive media. I expect also that 3D animations can be a valuable part of that media experience. New presentation modalities are here and new ways of learning need to be explored. We might as well also have some fun with it.
I apologize in advance for the generics, but the specific names and information about the enzymes, substrates, cofactors and mutations are privileged until publication. The characters of this animation include a 'blue' enzyme, a 'red' substrate, and a ball-and-stick model of a cofactor that consists of a base and a clickable moiety, which I'll refer to as the tag finger.
Scene 1 begins with an introduction to the native enzyme and its substrate. Scene 2 introduces the cofactor and its constituent parts. Scene 3 consists of a demonstration of the problem, which is that the full cofactor does not bind to the native enzyme. We tried to use the effect of the cofactor bouncing off the enzyme to clearly illustrate that the cofactor does not fit. Scene 3 continues with the placement of the cofactor in its intended position. Here we use a simple rotation to get a better view and a transparency on the enzyme mesh to give the viewer an idea of where the cofactor sits relative to the native enzyme. As the transparency goes away and the mesh returns to opaque, we see that the cofactor's tag finger is no longer visible. We hope this clearly suggests that this part of the cofactor doesn't fit the native enzyme structure. Scene 3 concludes with a slow morph from a representation of the native enzyme into a representation of the mutated enzyme. This part should conceptually explain that the effect of the mutation is to 'make room' for the cofactor tag finger. In this part of scene 3 we take artistic license and deviate from scientific accuracy. We exaggerate the size of the hole, since at that mesh resolution the deleted residue would otherwise be barely noticeable. Scene 4 shows the consecutive binding of the substrate and cofactor to the enzyme followed by an artistic (non-scientifically accurate) representation of the reaction carried out by the transferase, where the tag finger breaks from the cofactor and attaches to a specific lysine residue on the substrate. I use a glow effect to represent the start of the reaction. Why? Scientists love glow effects, don't we? Scene 5 is the money shot. It shows the individual components breaking off after the reaction, with special emphasis on the newly tagged substrate.
Music genres are serious business -- the source of debate, speculation, fights, and of course, mockery. What seems like a fairly clear-cut concept in a record store is less clear when debating with your friends whether Brian Eno makes electronica or ambient music, or what kind of hip-hop this Kid Cudi album is (if it's hip-hop at all). Or even worse, how do genres link together -- is hip-hop a descendant of R&B, or are they both sibling children of soul music? Is indie folk closer to 60s and 70s folk music, or to indie rock (or are all three just branches of the same limb stretching back to the blues)?
The goal of this post is to take advantage of some of the available social data (in this case, tags on last.fm) to form a sort of consensus on music genre classification. This isn't meant to produce an authoritative ground truth on music classification (I doubt such a thing could exist), but rather to try to get at the most widely-held conception in a somewhat objective and perhaps novel way.
note -- I saved the technical details for the end; if you want to read them before seeing the results, skip to the Mining Details section below
Pop Music Genre Tree
As my source of data, I took the most common genre-related tags on last.fm for songs from the Whitburn project. To work out the relationships between all these tags (and by extension the genres themselves), I used some phylogenetic software to produce a family tree of tags. The logic of using phylogenetics algorithms for this is explained in the Mining Details below. Here's the tree, with colors and (terrible) labels added by me (click for bigger version):
This tree serves two purposes: it works as a map from the varied and whimsical landscape of social tags onto a concise and recognizable group of genres, and it also reveals some surprising insights about how genres (are perceived to) actually relate to one another.
For instance, the R&B tags seem to cluster into two groups – a 70s and 80s R&B closely aligned with soul music, and a later R&B aligned with hip-hop. It's also surprising that country music seems to cluster very closely to folk rock and southern rock, both genres I expected to see closer to the pure rock camp. Speaking of which, a few other genres I associate with rock (soft rock / ballads, alternative / punk / grunge, and pop rock) defied expectation by branching out on their own rather than falling under the rock umbrella.
Less surprising was the close association of electronica with other dance music including disco, and the very broad nature of the rock genre (which includes classic rock, hard rock, psychedelic rock, glam rock, progressive rock, etc.).
One caveat -- I do expect the exact structure of this tree to be somewhat sensitive to things like which songs are included in the dataset. Still, even if slightly rearranged versions of the tree are valid themselves, that really doesn't make this less valid, as it's still a representation of genre relationships based on input from perhaps millions of last.fm users.
Pop Genres Through History
Having a sensible map of social tags to song genres also gave me the chance to take a look at pop history -- to take a look at the growth of, the decline of, and in some cases the resurgence of genres over time.
Taking a look at the number of songs associated with each derived genre over time reveals a few cool things. The first thing to notice is that the total number of tagged songs each year varies quite a bit -- from a few in 1920 to a few hundred by the 1980s. Though the number of songs in the Whitburn project does vary a little from year to year, most of this variation is due to a lot of songs, especially old songs, just not being tagged or even present in last.fm. This means some (real) genres of music are completely absent; after all, users of last.fm are people who live in the 21st century and listen to digital music, which for better or worse does not include old gospel recordings of Homer Rodeheaver or ragtime covers by the US Marine Band (though I'm sure a few people will be saddened by the lack of tagged Broadway showtunes). I prefer to take this as a reminder that history (or maybe I should say culture) is in the eyes of the beholders. When we think of music of the 30s, we think of blues and jazz, and that's what's represented here.
Of the songs that are tagged, a few interesting patterns emerge. First, except for the explosion of rock and soul in the late 50s / early 60s (fairly quickly after the respective introductions of the two genres), most genres seem to grow at the expense of others. The growth in hip-hop and alternative music in the late 80s / early 90s coincides with the decline of rock (and to a lesser extent dance and soul music) in the same period. Second, just because a genre of music is down doesn't mean it'll stay down -- country / americana might have looked like it was on its last legs by the late 80s, but by the 2000s it actually had a bigger market share than ever.
Normalizing the songs per year to produce a genre ratio plot makes a few things a bit more visible. One of these is that out of all these genres, the one with the best longevity seems to be soul music, though I do have to qualify that somewhat, as the tag "soul" is pretty ambiguous, so I might be picking up some songs that are just soulful without being soul.
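For the curious, the normalization itself is simple: divide each genre's song count in a year by that year's total. Here is a minimal Python sketch of the idea. It is an illustration rather than the code behind the plot, and the function name `genre_ratios` and the toy `(year, genre)` pairs are made up for the example.

```python
from collections import Counter, defaultdict


def genre_ratios(songs):
    """Turn (year, genre) pairs into per-year genre shares.

    Returns {year: {genre: fraction of that year's tagged songs}}.
    """
    counts = defaultdict(Counter)
    for year, genre in songs:
        counts[year][genre] += 1

    ratios = {}
    for year, genre_counts in counts.items():
        total = sum(genre_counts.values())
        ratios[year] = {g: n / total for g, n in genre_counts.items()}
    return ratios


# Toy data (hypothetical), one derived genre per song.
example = [(1960, "rock"), (1960, "soul"), (1960, "rock"), (1990, "hip-hop")]
ratios = genre_ratios(example)
```

Because each year's shares sum to 1, a year with only a handful of tagged songs carries the same visual weight as a year with hundreds, which is worth keeping in mind when reading the early decades.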
Finally, I do have to point out that tagging each song as a member of a single genre only gives part of the story: a lot of songs are tagged as members of several genres. For the curious, out of this dataset the artists with the most genre-spanning power were Prince, Phil Collins, Peter Gabriel, and Michael Jackson. Taking a closer look at genre blending and fusion will most likely be the topic of a future post.
Mining Details (for the curious)
My basic strategy for this analysis was to link up two pieces of data. The first was pop music charts, and the second was the social tags associated with these songs on last.fm. The second piece was straightforward to obtain thanks to the well-maintained last.fm API, but the first required some curated and maintained dataset. My original plan was to use the publicly-available data in the Billboard Charts API to gather a list of popular songs over the last century. Sadly, as of right now the service is completely broken and useless. But where Billboard's effort falls short, the Whitburn project managed to make up for it by releasing a meticulously gathered and annotated list of 37000 chart-hitting songs since the 1890s.
Here are the most common tags for the Whitburn project songs represented as a word cloud (I highlighted genre-specific tags in red):
The first thing to notice (which will be nothing new to people who work with this kind of data professionally) is that the tags are, for lack of a better term, "messy". For instance, there are about eight different tags for R&B, including the alternative spelling "rhythum and blues". Several tags are ambiguous -- does "soul" mean that the song is in the genre of soul music, or that the song is soulful? Since this is social data, we have to contend with people using a single tag for more than one meaning, and using different tags to mean the same thing.
Rather than simply letting it be an annoyance though, the idea here was to treat ambiguity itself as a source of information. Grabbing the 100-odd common tags that have to do with genre, I labelled each by which songs have that tag. I admit this sounds somewhat backwards; to use a metaphor, we can think of each genre as having a sort of genotype -- a sequence that defines it. To get that sequence, I look through the set of songs and mark down 1 where that tag is mentioned and 0 where it is not (this means that the songs are basically being treated as alleles).
To help visualize, here's a raster image of a section of this "genotype" map. For each genre tag (y-axis) there's a mark if the song on the x-axis has been tagged with that genre.
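The genotype construction described above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the dictionary `song_tags` (song title to set of tags) and the function name `build_genotypes` are hypothetical, not the structures used in the actual analysis.

```python
def build_genotypes(song_tags, genre_tags):
    """Build one 0/1 sequence per genre tag, with one position per song.

    song_tags:  {song: set of tags attached to that song}
    genre_tags: the genre-related tags to keep
    Returns the song order used and {tag: [0/1, ...]}.
    """
    songs = sorted(song_tags)  # fix a column order so every sequence lines up
    genotypes = {
        tag: [1 if tag in song_tags[song] else 0 for song in songs]
        for tag in genre_tags
    }
    return songs, genotypes


# Toy data (hypothetical).
song_tags = {
    "song A": {"rock", "classic rock"},
    "song B": {"rock"},
    "song C": {"soul", "rnb"},
}
songs, genotypes = build_genotypes(song_tags, ["rock", "classic rock", "soul"])
```

Each row of the raster image corresponds to one of these sequences, and each column to one song.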
The first thought that comes to mind looking at this kind of data is to use a standard clustering algorithm (e.g. hierarchical clustering or PCA followed by k-means) on it to find groups of related tags. The problem with that is coming up with a sensible distance metric -- one that puts a large distance say between two rarely-used tags with few overlapping songs, but also puts a small distance between a common tag and a rare tag whose songs overlap with the common tag (i.e. its parent).
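To make the distance problem concrete, here is a small Python sketch comparing two set-based distances on toy data. The Jaccard distance penalizes a rare tag for being rare, while the overlap coefficient (Szymkiewicz-Simpson) only asks how much of the smaller tag sits inside the larger one. Neither is the approach used in this post; they are here just to show the property a sensible metric would need. The tag names and song ids are invented.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: large whenever the two sets differ in size."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def overlap_distance(a, b):
    """1 - |A & B| / min(|A|, |B|): zero when one set contains the other."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 1.0
    return 1 - len(a & b) / smaller


# Toy data (hypothetical): songs are integer ids.
rock = set(range(10))   # a common tag covering 10 songs
glam_rock = {0, 1}      # a rare tag whose songs are all tagged rock too
polka = {20, 21}        # a rare tag sharing no songs with glam rock

parent_child_jaccard = jaccard_distance(rock, glam_rock)
parent_child_overlap = overlap_distance(rock, glam_rock)
unrelated_overlap = overlap_distance(glam_rock, polka)
```

On this toy data the Jaccard distance between the common tag and its rare child is 0.8, while the overlap distance is 0, and the two unrelated rare tags sit at the maximum overlap distance of 1.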
This is actually where the genotype metaphor came in handy. I simply took it literally, and used an algorithm developed by evolutionary biologists that does exactly what I want: it produces a tree of the relationship between the tags assuming that losing songs from parents to children is common, but gaining songs is very rare (for the even more technically-curious, I produced the maximum parsimony character tree for the genre tags by taking the consensus tree for 100 bootstrap rounds). Once I had the tree, using it to classify songs based on their tags was straightforward.
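To give a flavor of what "parsimony" and "bootstrap" mean here, below is a rough Python sketch. It is not the phylogenetics software used for the tree above, and it makes simplifying assumptions: it scores a fixed, hand-written tree with Fitch's algorithm, which counts gains and losses equally, and it does not search over trees or build a consensus. All names (`fitch_changes`, `parsimony_score`, `bootstrap_columns`) and the toy genotypes are hypothetical.

```python
import random


def fitch_changes(tree, states):
    """Minimum number of 0/1 changes one song needs on a rooted binary tree.

    tree:   nested 2-tuples with tag names (strings) at the leaves
    states: {tag: 0 or 1} for a single song
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left = visit(node[0])
        right = visit(node[1])
        common = left & right
        if common:
            return common
        changes += 1  # the two subtrees disagree, so one change is needed
        return left | right

    visit(tree)
    return changes


def parsimony_score(tree, genotypes, columns=None):
    """Sum of Fitch changes over the chosen songs (all songs by default)."""
    n_songs = len(next(iter(genotypes.values())))
    if columns is None:
        columns = range(n_songs)
    return sum(
        fitch_changes(tree, {tag: seq[i] for tag, seq in genotypes.items()})
        for i in columns
    )


def bootstrap_columns(n_songs, rng):
    """One bootstrap round: resample song columns with replacement."""
    return [rng.randrange(n_songs) for _ in range(n_songs)]


# Toy genotypes (hypothetical): four tags, four songs.
genotypes = {
    "rock": [1, 1, 1, 0],
    "hard rock": [1, 1, 0, 0],
    "soul": [0, 0, 0, 1],
    "r&b": [0, 0, 0, 1],
}
sensible_tree = (("rock", "hard rock"), ("soul", "r&b"))
scrambled_tree = (("rock", "soul"), ("hard rock", "r&b"))

sensible_score = parsimony_score(sensible_tree, genotypes)
scrambled_score = parsimony_score(scrambled_tree, genotypes)
resampled = bootstrap_columns(4, random.Random(0))
```

The tree that groups tags sharing songs needs fewer changes, which is why it would be preferred. A full analysis would repeat the tree search on each bootstrap resample of the songs and keep the groupings that show up consistently across rounds.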