My activity on Twitter revolves around four accounts.
I try to segregate what happens on each account, but there’s inevitably some overlap. But what about overlap in followers?
Which lucky people are following all four? How many only see the individual accounts?
It’s quite easy to look at this in R.
So there are 36 lucky people (or bots!) following all four accounts. I was interested in the followers of the quantixed account since it seemed to me that it attracts people from a slightly different sphere. It looks like about one-third of quantixed followers only follow quantixed, about one-third follow clathrin also and more or less the remainder are “all in” following three accounts or all four. CMCB followers are split about the same. The lab account is a bit different, with close to one-half of the followers also following clathrin.
Extra nerd points:
This is a Venn diagram and not an Euler plot. A Venn diagram just shows the intersections schematically and does not attempt to encode information in the area of each part. Euler plots for more than three groups are hard to generate, and it is hard to make sense of what they show. Looking at the proportions across lots of groups is a dataviz problem. A solution here would be to generate a further four Venn diagrams and, on each, display the proportion for one category as a fraction or percentage.
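The per-account proportions mentioned above can be worked out before any plotting. This is only a rough sketch: the follower IDs are made up, and `region_percent` is a hypothetical helper written for this illustration, not part of rtweet or VennDiagram. It labels each follower of one focal account by the full set of accounts they follow and reports each region as a percentage.

```r
# Toy follower IDs (made up for illustration; real IDs would come from rtweet)
followers <- list(
  clathrin  = c("1", "2", "3", "4", "5"),
  quantixed = c("2", "3", "4", "6"),
  cmcb      = c("3", "4", "7"),
  roylelab  = c("4", "8")
)

# Hypothetical helper: percentage of the focal account's followers in each region
region_percent <- function(followers, focal) {
  ids <- followers[[focal]]
  # Logical membership: one row per follower, one column per account
  member <- vapply(followers, function(x) ids %in% x, logical(length(ids)))
  member <- matrix(member, nrow = length(ids),
                   dimnames = list(ids, names(followers)))
  # Name each follower's region by the accounts they follow
  region <- apply(member, 1, function(r) {
    paste(names(followers)[r], collapse = "+")
  })
  round(100 * prop.table(table(region)), 1)
}

quantixed_regions <- region_percent(followers, "quantixed")
print(quantixed_regions)
```

Running the same function once per account gives the four sets of percentages that could be written onto four separate Venn diagrams.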
How to do it:
Last time, I described how to set up rtweet and make a Twitter app for use in R. You can use this to pull down lists of followers and extract their data. Using the intersect function you can work out the numbers of followers at each intersection. For four accounts, there will be 1 group of four, 4 groups of three, 6 groups of two. The VennDiagram package just needs the total numbers for all four groups and then details of the intersections, i.e. you don’t need to work out the groups minus their intersections – it does this for you.
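To make the counting concrete, here is a minimal sketch using made-up follower IDs rather than real Twitter data. The list `followers` and the names `all_four` and `sizes` are assumptions for the illustration. It uses only base R: `intersect` for a pair, `Reduce` to chain `intersect` over several sets, and `combn` to enumerate every combination.

```r
# Toy follower IDs (made up for illustration; real IDs would come from rtweet)
followers <- list(
  a = c("1", "2", "3", "4", "5"),
  b = c("2", "3", "4", "6"),
  c = c("3", "4", "7"),
  d = c("4", "8")
)

# Number of groups of two, three and four for four accounts
n_groups <- choose(4, 2:4)

# A pairwise intersection, as in the post
anb <- intersect(followers$a, followers$b)

# Intersection of all four sets in one step
all_four <- Reduce(intersect, followers)

# Size of every 2-, 3- and 4-way intersection, named like "anb", "anbnc", ...
sizes <- unlist(lapply(2:4, function(k) {
  combos <- combn(names(followers), k, simplify = FALSE)
  setNames(
    vapply(combos, function(s) length(Reduce(intersect, followers[s])),
           integer(1)),
    vapply(combos, paste, character(1), collapse = "n")
  )
}))
print(sizes)
```

The eleven values in `sizes` correspond to the `n12` to `n1234` arguments that `draw.quad.venn` expects, alongside the four group totals.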
library(rtweet)
library(httpuv)
library(VennDiagram)

## whatever name you assigned to your created app
appname <- "whatever_name"
## api key (example below is not a real key)
key <- "blah614h"
## api secret (example below is not a real key)
secret <- "blah614h"
## create token named "twitter_token"
twitter_token <- create_token(
  app = appname,
  consumer_key = key,
  consumer_secret = secret)

clathrin_followers <- get_followers("clathrin", n = "all")
clathrin_followers_names <- lookup_users(clathrin_followers)
quantixed_followers <- get_followers("quantixed", n = "all")
quantixed_followers_names <- lookup_users(quantixed_followers)
cmcb_followers <- get_followers("Warwick_CMCB", n = "all")
cmcb_followers_names <- lookup_users(cmcb_followers)
roylelab_followers <- get_followers("roylelab", n = "all")
roylelab_followers_names <- lookup_users(roylelab_followers)

# a = clathrin
# b = quantixed
# c = cmcb
# d = roylelab

## now work out intersections
anb <- intersect(clathrin_followers_names$user_id, quantixed_followers_names$user_id)
anc <- intersect(clathrin_followers_names$user_id, cmcb_followers_names$user_id)
and <- intersect(clathrin_followers_names$user_id, roylelab_followers_names$user_id)
bnc <- intersect(quantixed_followers_names$user_id, cmcb_followers_names$user_id)
bnd <- intersect(quantixed_followers_names$user_id, roylelab_followers_names$user_id)
cnd <- intersect(cmcb_followers_names$user_id, roylelab_followers_names$user_id)
anbnc <- intersect(anb, cmcb_followers_names$user_id)
anbnd <- intersect(anb, roylelab_followers_names$user_id)
ancnd <- intersect(anc, roylelab_followers_names$user_id)
bncnd <- intersect(bnc, roylelab_followers_names$user_id)
anbncnd <- intersect(anbnc, roylelab_followers_names$user_id)

## four-set Venn diagram
venn.plot <- draw.quad.venn(
  area1 = nrow(clathrin_followers_names),
  area2 = nrow(quantixed_followers_names),
  area3 = nrow(cmcb_followers_names),
  area4 = nrow(roylelab_followers_names),
  n12 = length(anb),
  n13 = length(anc),
  n14 = length(and),
  n23 = length(bnc),
  n24 = length(bnd),
  n34 = length(cnd),
  n123 = length(anbnc),
  n124 = length(anbnd),
  n134 = length(ancnd),
  n234 = length(bncnd),
  n1234 = length(anbncnd),
  category = c("Clathrin", "quantixed", "CMCB", "RoyleLab"),
  fill = c("dodgerblue1", "red", "goldenrod1", "green"),
  lty = "dashed",
  cex = 2,
  cat.cex = 1.5,
  cat.col = c("dodgerblue1", "red", "goldenrod1", "green"),
  fontfamily = "Helvetica",
  cat.fontfamily = "Helvetica"
)

# write to file
png(filename = "Quad_Venn_diagram.png")
grid.draw(venn.plot)
dev.off()
I’ll probably return to rtweet in future and will recycle the title if I do.
Like last time, the post title is from “I’m Not Following You”, the final track on the 1997 Edwyn Collins LP of the same name.
2 thoughts on “I’m not following you II: Twitter data and R”
Well, that’s my education for today! I’d never heard of Euler diagrams but that’s a useful and interesting distinction with Venn ones. I can see how they’d be harder to generate by authors (far easier just to show the intersections equally and the numbers in each, as with Venn), but in theory they can contain a lot, right? There’s a Euler diagram on the Wikipedia page showing 10 groups, unless I’m getting the semantics wrong…
Thanks for the comment. Hmmm the wiki pages are not clear on the differences between the two. My understanding was that Euler is scaled and wouldn’t show combinations with zero members.