Target: monitoring a running goal in R

Many running races and events have been cancelled or disrupted due to the pandemic. To fill the void, I’ve been taking on Garmin’s “Challenges”.

In Garmin Connect, you can accept a challenge set by Garmin (I think users can challenge each other too). Completion of these challenges gives the user points, which I’ve become somewhat obsessed with.

In late December I accepted the 2021 Running Stage 1 challenge: to run 505 km by the end of March. The name Stage 1 suggests that there will be four similar challenges, which would presumably take me over 2000 km of running for the year. This sounds like a fun goal – off we go!

Challenge accepted

The app shows progress with a bar and you can tap through to see the distance covered so far, but it is difficult to know whether I am “on target” with respect to the time remaining for the challenge.

Monitoring progress using R

As described previously, Garmin Connect in a browser allows for easy export of summary data in csv format.

I downloaded my data, ensuring that it included activities from the start of 2021, and added the csv to the Data/ folder of my RStudio project. I use the same folder structure for all of my data processing, which makes coding easier, and the script starts by checking that this structure is in place. As the challenge progresses, I can add further csv files containing the latest activities to Data/ and keep monitoring the target.

The script loads all of the csv files into a single data frame and removes any duplicates. I created a function that filters the data for the time window of the challenge and keeps only Running (and Treadmill Running) activities. It also calculates the cumulative distance run during this window. This distance then needs to be compared with the target at the same point in time.
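
The loading step can be tried out on its own. Here is a minimal sketch using two small, made-up csv files written to a temporary folder (the file names and activities are invented for illustration; the column names follow the ones my script uses). It stacks the files, drops the activity that appears in both exports and keeps only the running activities.

```r
# Write two small, overlapping csv files to a temporary folder (made-up data)
tmp_dir <- file.path(tempdir(), "garmin_demo")
dir.create(tmp_dir, showWarnings = FALSE)
first <- data.frame(Activity.Type = c("Running", "Cycling"),
                    Date = c("2021-01-01 08:00:00", "2021-01-02 09:00:00"),
                    Distance = c(5.2, 20.1))
second <- data.frame(Activity.Type = c("Cycling", "Treadmill Running"),
                     Date = c("2021-01-02 09:00:00", "2021-01-03 07:30:00"),
                     Distance = c(20.1, 6.0))
write.csv(first, file.path(tmp_dir, "export1.csv"), row.names = FALSE)
write.csv(second, file.path(tmp_dir, "export2.csv"), row.names = FALSE)

# Read every csv in the folder and stack them into one data frame
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

# Drop the activity that appears in both exports
combined <- combined[!duplicated(combined), ]

# Keep running activities only (matches "Running" and "Treadmill Running")
runs <- combined[grepl("running", tolower(combined$Activity.Type)), ]
print(runs)
```

Matching on the lower-cased word "running" is what lets a single filter pick up both Running and Treadmill Running.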

I could do this by plotting Cumulative Distance by Date and adding an abline. However, I also wanted to look at my distance “balance”, i.e. am I ahead or behind target?

So, another function creates a day-by-day data frame of the target. This can be used for plotting, and it allows me to calculate my distance “balance”.
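
To make the idea of a distance "balance" concrete, here is a small sketch with made-up numbers (the runs and the 7-day, 35 km target are invented for illustration; they are not my data). The target accumulates linearly over the window, and the balance is the actual cumulative distance minus the target on the same date.

```r
# Hypothetical 7-day window with a 35 km target (5 km per day)
dates <- seq(as.Date("2021-01-01"), as.Date("2021-01-07"), by = "day")
target <- data.frame(Date = dates,
                     Target = 35 * seq_along(dates) / length(dates))

# Made-up runs: no activity on 3 and 6 January
runs <- data.frame(Date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-04",
                                    "2021-01-05", "2021-01-07")),
                   Distance = c(6, 5, 10, 4, 8))
runs$Cumulative <- cumsum(runs$Distance)

# Match each run to the target for that date and compute the balance
balance <- merge(runs, target, by = "Date", all.x = TRUE)
balance$Difference <- balance$Cumulative - balance$Target
print(balance)
```

In this toy example the balance is +1 km after each of the first three runs, exactly on target after the fourth, and 2 km behind at the end. Days without a run do not appear in the result, because the merge keeps only the dates that have an activity.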

The script is below. This code is reusable for future stages of the challenge or for monitoring progress against an annual/monthly running target you’ve set yourself.

The challenge means running just over 5.5 km every day and so far progress is good.
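
As a quick check on that figure: the required daily average is the target distance divided by the number of days in the window. A minimal sketch (the variable names are only for illustration):

```r
# Challenge window and target, as stated above
from_date <- as.Date("2021-01-01")
to_date   <- as.Date("2021-03-31")
target_km <- 505

# Number of days in the window, counting both the first and the last day
n_days <- as.integer(to_date - from_date) + 1

# Required average distance per day
km_per_day <- target_km / n_days

print(n_days)                # 90
print(round(km_per_day, 2))  # 5.61
```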

Generally my running balance has been “in the black” and I could see that it was OK to take a rest day yesterday!

Just show me the code


## Load the plotting library used at the end of the script
library(ggplot2)

## Setup preferred directory structure in wd
ifelse(!dir.exists("Data"), dir.create("Data"), "Folder exists already")
ifelse(!dir.exists("Output"), dir.create("Output"), "Folder exists already")
ifelse(!dir.exists("Output/Data"), dir.create("Output/Data"), "Folder exists already")
ifelse(!dir.exists("Output/Plots"), dir.create("Output/Plots"), "Folder exists already")
ifelse(!dir.exists("Script"), dir.create("Script"), "Folder exists already")

## functions

compare2target <- function(activity,fromStr,toStr,df) {
  # filter for activity (case-insensitive match within the data frame passed in)
  df_window <- subset(df,grepl(tolower(activity),tolower(df$Activity.Type)))
  # activities within the window
  fromDate <- as.Date(fromStr)
  toDate <- as.Date(toStr)
  df_window <- subset(df_window, as.Date(df_window$Date) >= fromDate & as.Date(df_window$Date) <= toDate)
  # put them in order
  df_window <- df_window[order(as.Date(df_window$Date)),]
  # running total of distance over the window
  df_window$Cumulative.Distance <- cumsum(df_window$Distance)
  return(df_window)
}

maketarget <- function(fromStr,toStr,km) {
  # one row per day in the window
  temp <- seq(as.Date(fromStr), as.Date(toStr), by="days")
  # target accumulates linearly, reaching km on the final day
  cumdist <- seq(km / length(temp), km, by = km / length(temp))
  df <- data.frame(Date = as.POSIXct(temp),
                   Cumulative.Distance = cumdist)
  return(df)
}


## main script

# pattern is a regular expression, so match a literal ".csv" at the end of the name
all_files <- list.files("Data", pattern = "\\.csv$", full.names = TRUE)
df_all <- read.csv(all_files[1], header = TRUE, stringsAsFactors=FALSE)
for (filename in all_files[-1]) {
    df_temp <- read.csv(filename, stringsAsFactors=FALSE)
    df_all <- rbind(df_all, df_temp)
}
# remove duplicates
df_all <- df_all[!duplicated(df_all), ]
# format Date column to POSIXct
df_all$Date <- as.POSIXct(strptime(df_all$Date, format = "%Y-%m-%d %H:%M:%S"))
df_all <- compare2target("running","2021-01-01","2021-03-31",df_all)
df_target <- maketarget("2021-01-01","2021-03-31", 505)
# wrangle data frames to have matching date columns and then merge, then find difference
# between the cumulative distance and the target
df_temp <- df_all
df_temp$Date <- as.Date(df_all$Date)
df_temp2 <- df_target
df_temp2$Date <- as.Date(df_target$Date)
df_merge <- merge(x = df_temp,
                  y = df_temp2,
                  by = "Date",
                  all.x = TRUE)
df_merge$Difference <- df_merge$Cumulative.Distance.x - df_merge$Cumulative.Distance.y

# plot out cumulative distance over time compared to target
p1 <- ggplot(data = df_all, aes(x = Date, y = Cumulative.Distance)) + 
  geom_line(colour = "blue", size = 1.2) +
  geom_line(data = df_target, linetype = 2) +
  labs(x = "Date", y = "Cumulative Distance (km)")
# plot out how it's going with respect to the target
p2 <- ggplot(data = df_merge, aes(x = Date, y = Difference)) + 
  geom_line(colour = "blue", size = 1.2) +
  geom_hline(yintercept = 0, linetype = 2) +
  ylim(-max(abs(df_merge$Difference)),max(abs(df_merge$Difference))) +
  labs(x = "Date", y = "Difference (km)")

# save all plots
ggsave("Output/Plots/progress.png", plot = p1, width = 8, height = 4, dpi = "print")
ggsave("Output/Plots/difference.png", plot = p2, width = 8, height = 4, dpi = "print")

The trickiest part, as usual with this type of data, is wrangling with date/time formats. I’m not happy with the part of the script where I switch Date/POSIX. It works but it’s hacky. I’ll use this code quite a bit this year, so I might edit this later. Always happy for suggestions to improve the code.
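
One possible way to avoid the Date/POSIXct switching, offered as a sketch rather than a tested replacement for the script: since the comparison is day-by-day, the timestamp could be parsed to a Date once, straight after reading the csv, and the target built from Date values too. The timestamps below are made up, and the time of day is simply discarded as written, with no time zone conversion.

```r
# Made-up timestamps in the format parsed by the script above
stamps <- c("2021-01-01 08:15:42", "2021-01-03 18:02:10")

# Parse straight to Date; the time of day is discarded
run_dates <- as.Date(stamps, format = "%Y-%m-%d %H:%M:%S")

# A target built on Date values needs no conversion before merging
target_dates <- seq(as.Date("2021-01-01"), as.Date("2021-01-05"), by = "day")

print(class(run_dates))
print(run_dates %in% target_dates)
```

With both data frames holding a Date column from the start, the merge and the plots could use the same column without the temporary copies.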

The post title comes from “Target” by Fugazi from their Red Medicine LP

6 thoughts on “Target: monitoring a running goal in R”

  1. Great post; thanks! I got here from asking for how to best export Garmin running data for my own visuals and progress measurements:

    I’ll definitely try out the Activities / CSV export. Do you happen to know whether there’s also an easy way to extract things like lap times or stats for individual runs? I’d like to analyse at which kilometers I’m drifting into slower paces.

    As for your date/time wrangling question: have you tried out the lubridate package? It works great for me for those kind of applications:

    1. Hello! Thanks for the comment. I have used lubridate in the past for other stuff and agree it is fantastic.

      For working with individual runs, yes. There are a few options. You can work with a gpx file directly as described here: Running Free: Calculating Efficiency Factor in R. If you want to look at many activities in that kind of detail you could have a look at the trackeR package, see here: Cluster One: comparing running performance on similar courses. I have also tried to use the Garmin API directly but found it complicated. While trying to figure that out, I came across an iOS app called ConnectStats which does loads of useful analysis.

      1. This is great; many thanks! I wasn’t aware of the trackeR package. That should be very helpful. Your posts have a lot of great detail, and I will study them in more detail when playing with those gpx files.

        I also found 2 other interesting things:

        – there’s a Python package that uses the Garmin API and might be easier to use

        – for individual running activities on Garmin Connect there’s an option to “export splits to csv” that might cover most of what I need for now

        I’m a big fan of maps, though, so I probably won’t be able to resist those gpx files 😉
