Another post using R and looking at Twitter data.
As I was typing out a tweet, I had the feeling that my vocabulary is a bit limited. Papers I tweet about are either “great”, “awesome” or “interesting”. I wondered what my most frequently tweeted words are.
Like the last post, you can (probably) do what I’ll describe online somewhere, but why would you want to do that when you can DIY in R?
First, I requested my tweets from Twitter. I wasn’t sure how many tweets rtweet could retrieve, and the request only takes a few minutes. This gives you a download of everything, including a csv of all your tweets. The text of those tweets is in a column called ‘text’.
```r
## for text mining (Corpus, tm_map, stopwords)
library(tm)
## for stemming; stemDocument relies on this package
library(SnowballC)
## for making wordclouds
library(wordcloud)

## read in your tweets
tweets <- read.csv('tweets.csv', stringsAsFactors = FALSE)
## make a corpus of the text of the tweets
tCorpus <- Corpus(VectorSource(tweets$text))
## convert to lower case so that "Paper" and "paper" are counted together
## and capitalised stopwords such as "The" are caught below
tCorpus <- tm_map(tCorpus, content_transformer(tolower))
## remove all the punctuation from tweets
tCorpus <- tm_map(tCorpus, removePunctuation)
## good idea to remove stopwords: high-frequency words such as I, me and so on
tCorpus <- tm_map(tCorpus, removeWords, stopwords('english'))
## next step is to stem the words, so that talking and talked both become talk
tCorpus <- tm_map(tCorpus, stemDocument)
## now display your wordcloud
wordcloud(tCorpus, max.words = 100, random.order = FALSE)
```
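If you would rather see the numbers behind the picture, you can tabulate word frequencies directly. When wordcloud() is handed a corpus and no frequencies, it counts the words itself; the sketch below does a rough base-R equivalent on a plain character vector and prints a ranked table instead of a cloud. The function `top_words`, its short stopword list and the example tweets are made up for illustration, and unlike the tm pipeline above it does no stemming.

```r
## hypothetical helper: rank the most frequent words in a character vector
top_words <- function(text,
                      stop = c("the", "a", "an", "and", "of", "to", "in", "is", "i"),
                      n = 10) {
  ## lower-case, then strip punctuation (similar to removePunctuation)
  clean <- gsub("[[:punct:]]", "", tolower(text))
  ## split each tweet on whitespace and pool all the words
  words <- unlist(strsplit(clean, "[[:space:]]+"))
  ## drop empty strings and stopwords
  words <- words[nzchar(words) & !(words %in% stop)]
  ## count each word and sort, most frequent first
  counts <- sort(table(words), decreasing = TRUE)
  head(counts, n)
}

## made-up example tweets
tweets_text <- c("Great paper from the lab!",
                 "Another great paper",
                 "Cell biology paper")
res <- top_words(tweets_text, n = 3)
print(res)
```

On your own archive you would call `top_words(tweets$text, n = 20)`; words that tie on count can come out in either order.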
For my @clathrin account this gave:
So my most tweeted word is “paper”, followed by “cell” and “lab”. I’m quite happy about that. I noticed that “great” is also high frequency, which I had a feeling would be the case. It looks like @christlet, @davidsbristol, @jwoodgett and @cshperspectives are among the accounts I mention most; this is probably a function of how long we’ve been using Twitter. The cloud was generated from 10.9K tweets over seven years, so it might be interesting to look at any changes over this time…
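One way to look at changes over time is to split the tweets by year and rank the words within each year. This is a minimal sketch that assumes the archive’s tweets.csv has a `timestamp` column beginning with the year (e.g. "2011-03-01 10:00:00 +0000"); check the column names in your own download. The small data frame stands in for the real file, and the stopword removal from the tm pipeline is left out to keep it short.

```r
## hypothetical stand-in for read.csv('tweets.csv', stringsAsFactors = FALSE);
## assumes a 'timestamp' column that starts with the year
tweets <- data.frame(
  timestamp = c("2011-03-01 10:00:00 +0000",
                "2011-07-15 12:00:00 +0000",
                "2017-02-20 09:00:00 +0000",
                "2017-06-02 16:30:00 +0000"),
  text = c("Great paper", "Interesting cell paper",
           "Awesome lab meeting", "Lab retreat"),
  stringsAsFactors = FALSE
)

## the year is the first four characters of the timestamp
tweets$year <- substr(tweets$timestamp, 1, 4)

## for each year, find the single most frequent word
top_by_year <- sapply(split(tweets$text, tweets$year), function(txt) {
  words <- unlist(strsplit(gsub("[[:punct:]]", "", tolower(txt)), "[[:space:]]+"))
  words <- words[nzchar(words)]
  names(sort(table(words), decreasing = TRUE))[1]
})
print(top_by_year)
```

The same `split()` could feed each year’s text into the tm pipeline to draw one wordcloud per year.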
The cloud is a bit rough and ready. Further filtering would be a good idea, but this quick exercise just took a few minutes.
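As an example of further filtering, a few clean-up steps on the raw text before it goes into `Corpus(VectorSource(...))` remove common tweet noise: retweets, links (which removePunctuation turns into fragments like “httptco…”), the HTML-escaped ampersand that turns up as “amp”, and, if you like, @mentions. This is a sketch on made-up tweets; which steps you want is a judgement call.

```r
## made-up example tweets; in practice start from tweets$text
tweet_text <- c("RT @someone: Great paper http://t.co/abc123",
                "New paper from the lab &amp; friends https://t.co/xyz",
                "Interesting talk by @colleague")

## 1. drop retweets, which are mostly other people's words
tweet_text <- tweet_text[!grepl("^RT @", tweet_text)]
## 2. strip links
tweet_text <- gsub("https?://[^[:space:]]+", "", tweet_text)
## 3. decode the escaped ampersand so it doesn't survive as "amp"
tweet_text <- gsub("&amp;", "&", tweet_text, fixed = TRUE)
## 4. optionally remove @mentions
tweet_text <- gsub("@[[:alnum:]_]+", "", tweet_text)
## 5. collapse repeated spaces and trim the ends
tweet_text <- trimws(gsub("[[:space:]]+", " ", tweet_text))
print(tweet_text)
```

Words you simply don’t want in the cloud can also be added to the stopword list, since tm’s removeWords takes any character vector, e.g. `c(stopwords('english'), 'great')`.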
The post title comes from “The Sound of Clouds” by The Posies from their Solid States LP.