Probot 2: upgrading a Mastodon bot

Earlier this year I set up a bot on Mastodon. The bot, AlbumsX3, posts an album suggestion twice a day.

Performance has been good: it has only missed a few posts, which I think were due to server glitches. I have made a couple of tweaks to the bot since my last post, so I thought I would detail them here.

Preventing duplicate posts

In the last post I wrote:

“This means that duplicate posts will happen, but I didn’t try too hard to find a way around this. I might revisit it in the future.”

Well, it wasn’t long before I needed to revisit this issue. The bot posted one album twice in a 24-hour window and I knew this needed fixing. Jacob’s Mouse – I’m Scared is a great record, but double recommendations are bad.

Solution: keep a list of unposted records and make the random selection from this list, rather than from the whole database, before posting.

import pandas as pd
from mastodon import Mastodon
import os
 
# function to make hashtag
def to_hashtag(text):
    # separators become spaces
    s = text.replace("-", " ").replace("_", " ")
    # split on spaces
    s = s.split()
    # nothing to tag if the text is empty or contains only separators
    if len(s) == 0:
        return ""
    # capitalise every word after the first (the first word is kept as written)
    new_s = s[0] + ''.join(i.capitalize() for i in s[1:])
    # remove non-alphanumeric characters (hashtag-friendly)
    hashtag = ''.join(c for c in new_s if c.isalnum())
    return "#" + hashtag

# relative path of data
dfFile = os.path.realpath(os.path.join(os.path.dirname(__file__), '.', 'bot_df.csv'))

# import the data into data frame
df = pd.read_csv(dfFile, sep=",")

# import the list of unposted rows
rowFile = os.path.realpath(os.path.join(os.path.dirname(__file__), '.', 'bot_df_rows.csv'))
rows = pd.read_csv(rowFile, sep=",")

# select a random row of allowed rows
selectRow = rows.sample()
# this is the index of the row we picked
rowIndex = selectRow.index[0]
# read which row in df will be used, will be different from index
theRow = selectRow.loc[rowIndex,'A']

# build the text string - this will be the message in the post.
textString = " - ".join([df['artist'].loc[theRow],
                        df['album'].loc[theRow],
                        str(df['year'].loc[theRow])])
# convert artist and album strings to hashtags
artistTag = to_hashtag(df['artist'].loc[theRow])
albumTag = to_hashtag(df['album'].loc[theRow])

# add hashtags
textString = textString + "\n#Music #AlbumSuggestions #NowPlaying #NowListening"
textString = textString + "\n" + artistTag + " " + albumTag
 
# build image path
imgPath = os.path.realpath(os.path.join(os.path.dirname(__file__), 'img', df['img_name'].loc[theRow]))
 
# write apologetic alt text
altText = "The image shows the album cover. Sorry for lack of a better description; I am just a bot!"
 
# Set up Mastodon
mastodon = Mastodon(
    access_token = 'foobar',
    api_base_url = 'https://botsin.space/'
)
 
media = mastodon.media_post(imgPath, "image/jpeg", description=altText)
mastodon.status_post(textString, media_ids=media)

# find the row we selected earlier and drop it
rows = rows.drop([rowIndex])

# write the integers to each line of file
rows.to_csv(rowFile, encoding='utf-8', index=False)

To get this to run, I generated a text file containing all the row numbers. This list gets loaded along with the database, and the random pick is made from the list rather than from the database itself. Finally, the entry that got posted is removed from the list, which is then resaved. If you can think of a better way to do this, leave a comment!
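One gap in this approach is what happens on the very first run, or once every album has been posted and the list is empty. Here is a minimal sketch of one way to handle both cases, using only the standard library. The function names load_unposted and save_unposted and the n_albums parameter are made up for illustration; the only thing taken from the script is the single column called 'A'.

```python
import csv
from pathlib import Path


def load_unposted(path, n_albums):
    """Return the unposted row numbers, starting a fresh cycle if none are left."""
    path = Path(path)
    if path.exists():
        with path.open(newline="") as f:
            rows = [int(r["A"]) for r in csv.DictReader(f)]
        if rows:
            return rows
    # first run, or every album has been posted: allow all rows again
    return list(range(n_albums))


def save_unposted(path, rows):
    """Write the remaining row numbers, one per line, under the 'A' header."""
    with Path(path).open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["A"])
        writer.writerows([r] for r in rows)
```

Picking with random.choice(rows), posting, and only then removing the pick and calling save_unposted would keep the behaviour of the script, where a row is dropped after the post has gone out.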

Improving searchability/discoverability

The main way to discover content on Mastodon is via hashtags. Adding the band name as a hashtag should help with discoverability. Having said this, random people do seem to find posts by the bot. A recent post of They Might Be Giants’ “Here Comes Science” got several likes and boosts from people who don’t follow the bot. So people can discover the bot’s posts, but more hashtags should help.

I am already posting the artist and album title – how hard can it be to add hashtags with the same information? Well, Mastodon hashtags can only contain alphanumeric characters and the underscore. A hashtag made up only of numbers is not allowed. So, some wrangling is needed.
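These two rules can be written down as a quick check. This is only a sketch of the rules as stated here, not the exact pattern a Mastodon server applies, and the name is_valid_hashtag is made up for illustration.

```python
import re

# letters, digits and underscores only (Unicode-aware)
_TAG_BODY = re.compile(r"\w+")


def is_valid_hashtag(tag):
    """Check a tag such as '#TheDarkSideOfTheMoon' against the two rules."""
    if not tag.startswith("#"):
        return False
    body = tag[1:]
    # rule 1: only alphanumeric characters and underscores
    if _TAG_BODY.fullmatch(body) is None:
        return False
    # rule 2: digits on their own are not enough
    return not body.isdigit()
```

A check like this could be run on a generated tag before posting, so that an album whose title is just a number is posted without a tag rather than with a dead one.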

It is also preferable to use CamelCase for hashtags for readability and so that folks using screen readers hear the hashtags correctly. This presents a further challenge. Almost all of the artist names and album titles in the database use title capitalisation, which is not the same as CamelCase with spaces. Consider “The Dark Side of the Moon”. We would need “#TheDarkSideOfTheMoon” and not “#TheDarkSideoftheMoon”.

Solution: split the string using space (or other separator), capitalise, then strip non-alphanumeric characters and add the hash sign.

# (excerpt from the full script above)

# function to make hashtag
def to_hashtag(text):
    # separators become spaces
    s = text.replace("-", " ").replace("_", " ")
    # split on spaces
    s = s.split()
    # nothing to tag if the text is empty or contains only separators
    if len(s) == 0:
        return ""
    # capitalise every word after the first (the first word is kept as written)
    new_s = s[0] + ''.join(i.capitalize() for i in s[1:])
    # remove non-alphanumeric characters (hashtag-friendly)
    hashtag = ''.join(c for c in new_s if c.isalnum())
    return "#" + hashtag

# convert artist and album strings to hashtags
artistTag = to_hashtag(df['artist'].loc[theRow])
albumTag = to_hashtag(df['album'].loc[theRow])

# add hashtags
textString = textString + "\n#Music #AlbumSuggestions #NowPlaying #NowListening"
textString = textString + "\n" + artistTag + " " + albumTag

The solution is achieved via a function which generates a legal and legible hashtag from either the artist name or the album title. There are some edge cases that will get borked with this method. For example, any brackets in the title will mean that the first word in brackets does not get capitalised. Also, any band names with special characters will have a mangled hashtag.
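One possible way around the bracket and punctuation cases is to treat any run of non-alphanumeric characters as a word break, instead of splitting on spaces and stripping afterwards. The sketch below is a hypothetical variant (the name to_hashtag_robust is made up), not the function the bot runs. It also differs in two ways that are choices rather than fixes: it capitalises the first word too, and it leaves the rest of each word as written instead of lower-casing it.

```python
import re


def to_hashtag_robust(text):
    """Hypothetical variant of to_hashtag that copes with brackets and punctuation."""
    # drop apostrophes (straight and curly) so that "I'm" stays one word
    text = re.sub(r"['\u2019]", "", text)
    # every run of letters and digits is a word; anything else is a break
    words = re.findall(r"[^\W_]+", text)
    if not words:
        return ""
    # upper-case the first letter of each word, leave the rest untouched
    return "#" + "".join(w[0].upper() + w[1:] for w in words)
```

With this version a made-up title such as "Some Title (live)" would become "#SomeTitleLive", and "The Dark Side of the Moon" still becomes "#TheDarkSideOfTheMoon". Titles made up only of numbers would still need handling separately.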

The post title comes, as before, from the Probot band-and-album project from Dave Grohl.