Mr. Mastodon Farm: analysing a mastodon ActivityPub outbox.json file

I recently migrated my personal Mastodon account from one instance to another. If you’d like to do the same, I found this guide very useful. Note that, once you move, all your previous posts are left behind on the old instance.

Before I migrated, I downloaded all of my data from the old instance. I thought I’d take a look at what I had posted to see if anything was worth reposting on the new instance. I also took a look at my first 9 months on Mastodon at the old one. To do all of this, I used R.

The previous content is still visible at the old instance, unless you delete that account. However, divining your original content from the boosts is hard. This is where R can help!

Getting started

When downloading your data from a Mastodon instance, you get a zip file which contains a bunch of files, e.g. your avatar, header etc. as well as a file called outbox.json – this is the file we’ll use.
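To get a feel for what is inside the file before loading the real thing, here is a minimal sketch using a tiny, made-up outbox (the JSON below is invented for illustration, and only the fields used later in this post are included). It assumes the jsonlite package is installed. The point is that a "Create" item carries a nested object with the post content, whereas a boost ("Announce") just points to a URL.

```r
library(jsonlite)

# A tiny, made-up stand-in for outbox.json (not real data)
toy_json <- '{
  "type": "OrderedCollection",
  "totalItems": 3,
  "orderedItems": [
    {"type": "Create", "published": "2023-01-02T09:00:00Z",
     "object": {"content": "<p>Hello</p>", "inReplyTo": null}},
    {"type": "Announce", "published": "2023-01-03T10:00:00Z",
     "object": "https://example.org/users/someone/statuses/1"},
    {"type": "Create", "published": "2023-01-04T11:00:00Z",
     "object": {"content": "<p>A reply</p>",
                "inReplyTo": "https://example.org/users/someone/statuses/2"}}
  ]
}'

# keep everything as nested lists so the structure is easy to inspect
outbox <- fromJSON(toy_json, simplifyVector = FALSE)

top_level_names <- names(outbox)
first_item_names <- names(outbox$orderedItems[[1]])

# a "Create" item holds a nested object, a boost holds a plain URL string
create_has_nested_object <- is.list(outbox$orderedItems[[1]]$object)
boost_is_plain_url <- is.character(outbox$orderedItems[[2]]$object)

print(top_level_names)
print(first_item_names)
```

This mix of nested objects and plain strings is why the object column needs unnesting before the content can be filtered.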

I set up an RStudio project with standardised structure (using this) and copied the outbox.json file into the Data folder.

The code

I have annotated the code as we go. The first thing is to make an HTML file so that I can read the contents of my posts.

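# this script can be found here:

library(jsonlite) # fromJSON()
library(dplyr)    # filter(), select(), %>%
library(tidyr)    # unnest_wider()

# load only the "orderedItems" part of outbox.json
df <- fromJSON("Data/outbox.json")[["orderedItems"]]

# we just need a list of times and text/content of posts
# here we filter for "raw posts" and not other outbox items
# NOTE: the arguments of the last two filters and the final step of this pipe
# were lost from the post; the versions below are assumed reconstructions
posts <- df %>%
  unnest_wider(object, names_sep = "_") %>%
  filter(type == "Create") %>%
  filter(!is.na(object_content)) %>%   # assumed: keep items that have content
  filter(is.na(object_inReplyTo)) %>%  # assumed: drop replies
  select(published, object_content)    # assumed: keep the two columns used below

# a quick kludge to make something that will display in html
output <- paste0("<p><b>", posts$published, "</b></p>", posts$object_content, "<hr>")

# write the file, then close the connection
fileConn <- file("Output/Data/output.html")
writeLines(output, fileConn)
close(fileConn)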

Now, in the directory Output/Data/ we have a little output.html file that can be opened in a browser.

It contains a list of time/dates of each toot and the text content. Note the aim here was just for me to be able to read the content easily. There are other projects out there to make a fully functional repository of Mastodon content.

All the links are live and apart from some borked special characters, the html is very readable.
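I have not confirmed what caused the borked characters, but a common culprit is a file with no declared character set, which leaves the browser to guess. A minimal sketch of one possible fix, assuming that is the problem, is to wrap the fragments in a small HTML document that declares UTF-8 and to write the bytes out as UTF-8. The `posts` data frame here is a made-up stand-in, and the file goes to a temporary location rather than Output/Data/.

```r
# Toy stand-in for the `posts` data frame built earlier (made-up content)
posts <- data.frame(
  published = c("2023-01-02T09:00:00Z", "2023-01-04T11:00:00Z"),
  object_content = c("<p>Caf\u00e9 post</p>", "<p>It\u2019s a second post</p>"),
  stringsAsFactors = FALSE
)

# one fragment per post, as in the main script
body <- paste0("<p><b>", posts$published, "</b></p>", posts$object_content, "<hr>")

# wrap the fragments in a minimal document that declares its encoding
html <- c("<!DOCTYPE html>",
          "<html>",
          "<head>",
          "<meta charset=\"utf-8\">",
          "<title>Toots</title>",
          "</head>",
          "<body>",
          body,
          "</body>",
          "</html>")

# write the strings as UTF-8 bytes, regardless of the session locale
out_file <- tempfile(fileext = ".html")
con <- file(out_file, open = "w")
writeLines(enc2utf8(html), con, useBytes = TRUE)
close(con)

# read the file back to check what was written
check <- readLines(out_file, encoding = "UTF-8")
```

If the characters are already garbled in the JSON itself, this will not help, so it is worth checking the source file first.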

My first toot was:


So I made an account here and now I’m not sure what to do. I also missed the opportunity to ditch my biogeek handle from the bird site. Anyway, this is my first post.

and my last toot on the old instance was:


Therapy dogs are so last year. Free ice cream and llamas to pet today at #WarwickUni

The photos are not displayed since we only extracted two columns (date and post content) from the JSON file.

In my archive, there were 1192 items of content, of which only 206 were original toots. I found a couple of things that might be fun to repost, and a bunch of things I’d forgotten about. So, mission accomplished!
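For anyone wanting the same kind of tally, here is a minimal sketch in base R. The data frame is made up, and the column names `content` and `in_reply_to` are hypothetical stand-ins for the unnested fields; the rule for "original" (a Create item with content that is not a reply) is my reading of the filtering described above. In my archive, 206 of 1192 works out at roughly 17% original content.

```r
# Made-up items standing in for the outbox contents
items <- data.frame(
  type = c("Create", "Announce", "Create", "Announce", "Create"),
  content = c("<p>A post</p>", NA, "<p>A reply</p>", NA, "<p>Another post</p>"),
  in_reply_to = c(NA, NA, "https://example.org/statuses/2", NA, NA),
  stringsAsFactors = FALSE
)

# an "original toot" here: a Create item with content that is not a reply
is_original <- items$type == "Create" &
  !is.na(items$content) &
  is.na(items$in_reply_to)

n_items <- nrow(items)
n_original <- sum(is_original)
type_counts <- table(items$type)

print(type_counts)
cat(n_original, "of", n_items, "items are original toots\n")
```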

Post frequency

Finally, I had a look at posting frequency.

library(dplyr)   # %>% and n()
library(timetk)  # summarize_by_time()
library(ggplot2) # theme(), ggsave()
# calendarHeatmap() is not from a package; source its definition first (see note below)

# transform the created date/time to POSIX (the trailing Z means UTC)
posts$created <- as.POSIXct(posts$published, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
# summarise the posts
df_day <- posts %>%
  summarize_by_time(.date_var = created,
                    .by = "day",
                    n = n())
# add social media icon, see
sysfonts::font_add(family = "Font Awesome 6 Brands",
                   regular = "/Users/steve/Library/Fonts/Font Awesome 6 Brands-Regular-400.otf")
# define the caption before it is used as the plot subtitle
social_caption <- paste0(
  "<span style='font-family:\"Font Awesome 6 Brands\";'>&#xf4f6;</span>
  <span style='color: #3b528b'>@clathrin</span>"
)
# generate the plot using calendarHeatmap function
p <- calendarHeatmap(as.Date(df_day$created), df_day$n, title = "Toots", subtitle = social_caption)
p <- p +
  theme(plot.subtitle = ggtext::element_textbox_simple())
# save the plot
ggsave("Output/Plots/all_calendar.png", p, width = 4, height = 4)

which resulted in this graphic:

Note that the code above uses the calendarHeatmap function from here, and the method for adding social media icons to ggplots is described by Nicola Rennie here.

OK, so 9 months is not a lot of data compared to my 12 years on Twitter, but there are a couple of insights. I am using Mastodon every day, but here we are looking at when I post – not when I reply or anything else – just post. From this data, it looks like I was settling into a pattern of posting Monday-Friday and not on the weekend. My maximum number of toots in a day was 8, which coincided with a trip to HHMI Janelia for the Recognizing Preprint Peer Review meeting.
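The weekday pattern and the busiest day can also be checked numerically rather than by eye. This is a minimal sketch in base R on a handful of made-up timestamps (not my data), using the same timestamp format as the published field.

```r
# Made-up timestamps in the same format as the `published` field
published <- c("2023-03-06T09:15:00Z",  # Monday
               "2023-03-06T14:30:00Z",  # Monday
               "2023-03-07T08:00:00Z",  # Tuesday
               "2023-03-10T17:45:00Z",  # Friday
               "2023-03-11T11:00:00Z")  # Saturday

created <- as.POSIXct(published, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
days <- as.Date(created, tz = "UTC")

# posts per calendar day and the busiest day
per_day <- table(days)
max_per_day <- max(per_day)

# weekday as a number (1 = Monday ... 7 = Sunday), independent of locale
wday <- as.integer(format(days, "%u"))
is_weekend <- wday >= 6
weekday_counts <- table(factor(wday, levels = 1:7,
                               labels = c("Mon", "Tue", "Wed", "Thu",
                                          "Fri", "Sat", "Sun")))

print(weekday_counts)
cat("Busiest day had", max_per_day, "posts;",
    sum(is_weekend), "post(s) fell on a weekend\n")
```

Using the numeric weekday rather than weekdays() keeps the result the same whatever language the R session is running in.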

It will be interesting to have a look at my data after a few months on the new instance to see how my Mastodon usage is developing.

The post title comes from “Mr. Mastodon Farm” by Cake from their Motorcade of Generosity album.

4 thoughts on “Mr. Mastodon Farm: analysing a mastodon ActivityPub outbox.json file”

  1. Nice to see you easing into this new environment. Out of interest, do you know if there’s a way of seeing how many referrals are coming to a WordPress blog via Mastodon? It’s already obvious that the level of community engagement I get on Mastodon is greater than on Twitter, but I’ve no idea how much that translates into page visits for the blog (and knowing that would have a large bearing on how much time I allocate to Mastodon).

    1. There are a few stumbling blocks to doing this. First, Mastodon traffic is (as I understand it) impossible to discern from someone just typing the URL into the browser. I guess this could itself be useful if you don’t get any direct traffic: any direct traffic you see would be coming from Mastodon! Second, if you are tracking using Jetpack then I think the options are a bit limited.
      The longer answer is that the links in Mastodon are converted to “rel=noreferrer” when they appear for someone to click on. So the traffic appears direct. A way around this is to add some tracking stuff (campaign parameters) to the URL you post. If you are looking at traffic with Google Analytics, this can be highly customised (that’s how companies try to track us). You can add whatever gubbins you like after a question mark at the end of the URL and the user just gets the page. What needs to be determined is how those click throughs appear in Jetpack. If they show up distinct from “direct” and other sources, this would be a solution.
