By 30th September 2022, I had clocked up over 2000 km of running for the year. This milestone was a good opportunity to look at how I got to this point.

The code is shown below. First, we can make a histogram to look at the distance of runs.

From this type of plot it’s clear that my runs this year consist of a lot of 4-5 km runs and then a chunk of 21 km plus. This is because my run commute is ~5 km (5.5 km, with a summer-only shorter route of 4.4 km), which I do a lot, and I also do a weekly long run of at least 21.1 km.

A histogram like this obscures how much distance these runs contribute to the total, since one 10 km run is worth two 5 km runs. We need a better way to visualise this info.

Enter `treemap`, a way to see this information more clearly.

## Treemap

This visualisation shows the total distance in each category as an area. The runs are organised into bins of 1 km distance and then grouped by 5 km distance intervals.

Although the runs of 20-25 km in distance were far fewer in number, they make up more distance than the 5-10 km bracket. This was not so easy to see in the histogram.
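The same comparison can be checked numerically by counting runs and summing distance per 5 km bracket. This is only a minimal sketch: the distances in `sample_runs` are made up for illustration and are not my real data, so the totals are not the ones behind the plot.

```r
library(dplyr)

# Made-up distances in km (illustrative only, not the real run data)
sample_runs <- data.frame(Distance = c(4.4, 4.4, 5.5, 5.5, 5.5, 8.2, 21.1, 23.4))

# Count the runs and sum the distance within each 5 km bracket
bracket_summary <- sample_runs %>%
  mutate(km5 = cut(Distance, breaks = seq(from = 0, to = 45, by = 5))) %>%
  group_by(km5) %>%
  summarise(runs = n(), total_km = sum(Distance))

print(bracket_summary)
```

In this toy sample the `(20,25]` bracket has half as many runs as `(5,10]` but nearly twice the total distance, which is the effect the treemap makes visible.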

## The code

```
library(treemap)
library(ggplot2)
library(dplyr)

# load the data (output from process_data() within a timeframe of interest)
all_data <- read.csv("Output/Data/alldata_2022-01-01_2022-12-31.txt", sep = "\t")

# make histogram of running distances
p <- ggplot(all_data, aes(x = Distance)) +
  geom_histogram(breaks = seq(from = 0, to = 45, by = 1)) +
  labs(x = "Distance (km)", y = "Runs")
ggsave("Output/Plots/distanceHist.png", p)

# bin the data at 5 km and 1 km resolution
all_data <- all_data %>%
  mutate(km5 = cut(Distance, breaks = seq(from = 0, to = 45, by = 5)),
         km1 = cut(Distance, breaks = seq(from = 0, to = 45, by = 1)))

# two functions to rename the categories
rename_km5 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub("\\]", " km", x)
  x <- sub(",", " - ", x)
  return(x)
}

rename_km1 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub(",[[:digit:]]+\\]", "", x)
  return(x)
}

# rename the categories to give nice labels
all_data$labelkm5 <- rename_km5(all_data$km5)
all_data$labelkm1 <- rename_km1(all_data$km1)

# PNG device
png("Output/Plots/tremap.png", width = 800, height = 800)
treemap(all_data,
        index = c("labelkm5", "labelkm1"),
        vSize = "Distance",
        type = "index",
        align.labels = list(
          c("left", "top"),
          c("center", "center")
        ),
        palette = "Set2",
        overlap.labels = 1,
        title = "")
dev.off()
```

A few comments on the code for anyone interested in replicating the plot. The data loaded in are runs within a time-frame of interest. I generated the file to load using some code I wrote previously. All that is needed is a dataframe of runs with a column called Distance.
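If you don't have a file like mine, a stand-in can be built by hand. The sketch below uses made-up values, and the `Date` column is a hypothetical extra (only `Distance` is needed); it writes a tab-separated file to a temporary path and reads it back the same way the script does.

```r
# Made-up runs (illustrative only); Date is a hypothetical extra column,
# the plots only need Distance
sample_runs <- data.frame(
  Date = c("2022-01-03", "2022-01-05", "2022-01-09"),
  Distance = c(5.5, 4.4, 21.1)
)

# Write a tab-separated file, then read it back as the script above does
path <- tempfile(fileext = ".txt")
write.table(sample_runs, path, sep = "\t", row.names = FALSE, quote = FALSE)
all_data <- read.csv(path, sep = "\t")

str(all_data)
```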

Binning the data can be done with `mutate` and `cut`, which factorise the distances into bins of defined width. Unfortunately, the names of the bins don’t look great on the plot, so I made two functions to reformat them into something nicer. In this way `(0,5]` turns into `0 - 5 km`, for example.
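To see what the binning and renaming do, here is a small sketch on three made-up distances (illustrative values, not taken from the data file). It applies the same substitutions as `rename_km5()` and `rename_km1()`, written inline so the snippet runs on its own.

```r
# Made-up distances in km (illustrative only)
distance <- c(4.4, 5.5, 21.1)

# cut() assigns each distance to a bin and labels it in interval notation
km5 <- cut(distance, breaks = seq(from = 0, to = 45, by = 5))
km1 <- cut(distance, breaks = seq(from = 0, to = 45, by = 1))
as.character(km5)  # "(0,5]" "(5,10]" "(20,25]"
as.character(km1)  # "(4,5]" "(5,6]" "(21,22]"

# Same substitutions as rename_km5(): drop "(", turn "]" into " km", "," into " - "
label_km5 <- sub(",", " - ", sub("\\]", " km", sub("\\(", "", km5)))
# Same substitutions as rename_km1(): drop "(" and everything from the comma on
label_km1 <- sub(",[[:digit:]]+\\]", "", sub("\\(", "", km1))

label_km5  # "0 - 5 km" "5 - 10 km" "20 - 25 km"
label_km1  # "4" "5" "21"
```

Note that the 1 km label is the lower edge of the bin, so a 4.4 km run is labelled `4`.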

There are several ways to customise the treemap and I didn’t go crazy optimising it. The palette (Set2) looked good to me and specifying `type` as `index` worked well for my needs.

—

The post title comes from “Get Miles” by Gomez from their debut LP “Bring It On”.

Looks nice. Thanks for sharing the code!

Thank you!

Have you tried a self-weighted histogram? Each value is weighted by itself.

Great suggestion.

p2 <- ggplot(all_data, aes(x = Distance, weight = Distance)) + geom_histogram(breaks = seq(from = 0, to = 45, by = 1)) + labs(x = "Distance (km)", y = "Total (km)")

This is a good way to show the total distance in each bin.
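As a quick sanity check on the weighting, the bar heights of a self-weighted histogram should add up to the total distance. The sketch below assumes made-up distances (not my real data) and uses ggplot2's `layer_data()` to pull out the computed bins.

```r
library(ggplot2)

# Made-up distances in km (illustrative only)
sample_runs <- data.frame(Distance = c(4.4, 5.5, 5.5, 21.1))

# Each run is weighted by its own distance, so a bar shows km rather than runs
p2 <- ggplot(sample_runs, aes(x = Distance, weight = Distance)) +
  geom_histogram(breaks = seq(from = 0, to = 45, by = 1)) +
  labs(x = "Distance (km)", y = "Total (km)")

# Extract the computed bins; count holds the sum of weights in each bin
bars <- layer_data(p2)
weighted_total <- sum(bars$count)

# Should be TRUE: the bars add up to the total distance run
isTRUE(all.equal(weighted_total, sum(sample_runs$Distance)))
```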