Libraries and Data

First, I’ll import the data I downloaded through Spotify’s API: an egocentric network of The Grateful Dead (the ego) and their related artists, up to 10 steps removed. I’ll also load all of the libraries I’m using.

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
library(rio)
library(tidyverse)
library(tidygraph)
library(visNetwork)
library(rtweet)
library(igraph)
library(ggpubr)
library(ggimage) 
library(magick)
library(ggrepel)

# Import Dataset of Grateful Dead related artists
# out to max of 10 steps
dead_relatives <- import("dead_relative_10steps.csv", skip = 1) %>% 
  select(-V1) %>% 
  rename(artist = V2,
         relative = V3)

# load the image file for plots 
# at the end of the script
img <- "support/bears_t.png"

Recommendation Network for the Dead

Right now we have an edge list, where each row corresponds to an edge in the Dead’s recommendation network between an artist and a related/recommended artist. I want to turn that into a graph object.

I’ll be using the tidygraph library, which is basically a tidy wrapper around igraph, for all of the network manipulation. I’ll use visNetwork to visualize the networks, since it makes nice, animated, interactive graphs.

dead_relatives %>% 
  # turn edgelist into network
  as_tbl_graph() %>% 
  # activate the nodes to
  # compute new node-level variables
  activate(nodes) %>% 
  # get each node's shortest-path distance from node 1 (the ego),
  # ignoring edge direction
  mutate(distance = node_distance_to(nodes = 1, mode = 'all'),
         group = distance) %>% 
  # arrange based on distance and name
  # so that the distance legend is in order
  arrange(distance, name) %>% 
  # use visIgraph to plot the network
  visIgraph(physics = TRUE) %>%
  # set options
  visOptions(highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE),
             # this allows people to select nodes by
             # distance
             selectedBy = "distance",
             # and this allows people to select nodes by name
             nodesIdSelection = TRUE) %>% 
  visLayout(randomSeed = 23) %>% 
  # avoidOverlap worked well here to make a clean graph
  visPhysics(forceAtlas2Based = list("avoidOverlap" = 1)) %>%
  # label the group (distance) legend, and tell it to not
  # allow zooming on the legend
  visLegend(main = "Distance", zoom = FALSE)
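To make the distance step concrete, here is a minimal base R sketch of what an unweighted, direction-ignoring shortest-path distance from the ego amounts to: a breadth-first search over the edge list. The toy edge list and the `bfs_distance()` helper are made up for illustration; the actual pipeline relies on `node_distance_to()`, not on this function.

```r
# Toy edge list in the same artist/relative shape (names are invented)
edges <- data.frame(
  artist   = c("A", "A", "B", "C"),
  relative = c("B", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: breadth-first search from the ego over an edge list
bfs_distance <- function(edges, ego) {
  nodes <- unique(c(edges$artist, edges$relative))
  dist <- setNames(rep(NA_integer_, length(nodes)), nodes)
  dist[ego] <- 0L
  frontier <- ego
  step <- 0L
  while (length(frontier) > 0) {
    step <- step + 1L
    # neighbours in either direction, analogous to mode = "all"
    nbrs <- unique(c(edges$relative[edges$artist %in% frontier],
                     edges$artist[edges$relative %in% frontier]))
    # keep only nodes that have not been reached yet
    new <- nbrs[is.na(dist[nbrs])]
    dist[new] <- step
    frontier <- new
  }
  dist
}

toy_dist <- bfs_distance(edges, "A")
print(toy_dist)
```

In the toy graph, "D" is reachable only through "B" or "C", so it ends up two steps from the ego. That is the same quantity the legend groups nodes by.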

Twitter API

Next, I’ll look up the artists on Twitter and get their list of followers.

Authentication

First, we need to authenticate the session so that we can access the Twitter API. You need your own keys for this part; mine are hidden from the markdown (HTML) file.

Now set up the authorization (again, those variables were defined in a hidden chunk above).

create_token(app = "",
             consumer_key = c_key,
             consumer_secret = c_secret,
             access_token = a_token,
             access_secret = a_secret)

Search for Artists

Here I search for each artist’s Twitter account using their name as it appears on Spotify, keep the top search result alongside the artist’s name, and collect the results in a data frame.

distinct_artists <- dead_relatives %>% 
  # remove repeat artists for twitter lookup
  distinct(relative) %>% 
  as.data.frame() 

user_names <- NULL
for (each in distinct_artists$relative) {
  # search for the top user that matches the artist name
  username <- search_users(each, n = 1)
  # tag the result with the artist name and add it to the df
  user_names <- rbind(user_names, mutate(username, relative = each))
}
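Growing a data frame with `rbind()` inside a loop works at this scale, but an alternative is to collect the per-artist results in a list and bind them once at the end. The sketch below shows that pattern in base R. `fake_search()` is a hypothetical stand-in for `search_users()` so the example runs without API access; it does not mimic what Twitter actually returns.

```r
# Hypothetical stand-in for search_users(): returns one row per query
fake_search <- function(query) {
  data.frame(screen_name = tolower(gsub("[^A-Za-z]", "", query)),
             stringsAsFactors = FALSE)
}

artists <- c("moe.", "Phish", "The Band")

# One list element per artist; NULL for artists with no search hit
results <- lapply(artists, function(a) {
  hit <- fake_search(a)
  if (nrow(hit) == 0) return(NULL)
  hit$relative <- a  # remember which artist produced this hit
  hit
})

# rbind() skips the NULL entries, so only real hits are combined
user_names_demo <- do.call(rbind, results)
print(user_names_demo)
```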

Here I flag search results whose screen name falls below a similarity threshold with the search string, in order to catch mismatches (e.g., searching for ‘moe’ and getting an account other than the band). I played around with the threshold and landed on this one, as it seemed to catch most of the actual misses.

distinct_artists %>% 
  left_join(user_names, by = "relative") %>% 
  select(relative, screen_name) %>% 
  # calculate the Levenshtein similarity index,
  # which quantifies the similarity of two strings
  mutate(str_sim = RecordLinkage::levenshteinSim(relative, screen_name)) %>% 
  # keep usernames that are below .4 similarity (range 0 to 1)
  filter(str_sim < .40) 
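For intuition about the similarity score, here is a base R sketch of a Levenshtein similarity: one minus the edit distance divided by the length of the longer string. This follows my reading of the formula documented for `levenshteinSim()`, but `lev_sim()` is a hypothetical helper written for illustration and may not match the package in every edge case.

```r
# Hypothetical helper: 1 - edit distance / length of the longer string
lev_sim <- function(a, b) {
  # adist() compares all pairs, so take one pair at a time
  d <- mapply(function(x, y) utils::adist(x, y)[1, 1], a, b,
              USE.NAMES = FALSE)
  1 - d / pmax(nchar(a), nchar(b))
}

sim_same  <- lev_sim("Phish", "Phish")     # identical strings
sim_close <- lev_sim("abc", "abd")         # one substitution out of three
sim_moe   <- lev_sim("moe.", "moeperiod")  # corrected handle from this post

print(c(sim_same, sim_close, sim_moe))
```

The comparison is case-sensitive, so a handle that differs from the artist name only in capitalization still loses some similarity. Lowercasing both strings first would be one way to relax that.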

Then, I set them to corrected account names if I could find them easily and otherwise set them to missing.

correct_handles <- distinct_artists %>% 
  left_join(user_names, by = "relative") %>%
  select(relative, screen_name) %>%
  # replacing screen names with corrected ones 
  # or missing if the artists aren't on Twitter
  mutate(screen_name = 
         case_when(relative == "moe." ~ "moeperiod",
                   relative == "New Riders of the Purple Sage" ~ NA_character_,
                   relative == "The New Deal" ~ NA_character_,
                   relative == "Doc & Merle Watson" ~ NA_character_,
                   relative == "Flatt & Scruggs" ~ NA_character_,
                   relative == "Doc Watson" ~ NA_character_,
                   relative == "Bill Monroe" ~ "BILLMONROE1911",
                   relative == "Hot Rize" ~ NA_character_,
                   relative == "Lowell George" ~ NA_character_,
                   relative == "Leon Russell" ~ NA_character_,
                   relative == "Stephen Stills" ~ NA_character_,
                   relative == "The Derek Trucks Band" ~ NA_character_,
                   relative == "Derek & the Dominos" ~ NA_character_,
                   relative == "Dave Mason" ~ "davemasonband",
                   relative == "Tony Rice" ~ NA_character_,
                   relative == "The Band" ~ NA_character_,
                   relative == "Traffic" ~ NA_character_,
                   relative == "Ry Cooder" ~ NA_character_,
                   relative == "David Bromberg" ~ NA_character_,
                   relative == "Pure Prairie League" ~ NA_character_,
                   relative == "Al Kooper" ~ NA_character_,
                   relative == "Quicksilver Messenger Service" ~ NA_character_,
                   relative == "Savoy Brown" ~ NA_character_, 
                   relative == "The Electric Flag" ~ NA_character_,
                   relative == "Peter Rowan" ~ NA_character_,
                   relative == "Poco" ~ NA_character_,
                   relative == "firefall" ~ NA_character_,
                   relative == "Dan Fogelberg" ~ NA_character_,
                   relative == "Seals and Crofts" ~ NA_character_,
                   relative == "Orleans" ~ NA_character_,
                   relative == "Crosby & Nash" ~ NA_character_,
                   relative == "Rick Danko" ~ NA_character_,
                   relative == "The Youngbloods" ~ NA_character_,
                   relative == "John Phillips" ~ NA_character_,
                   relative == "Steve Young" ~ NA_character_,
                   relative == "Jim Ford" ~ NA_character_,
                   relative == "Cowboy" ~ NA_character_,
                   relative == "Henry Paul Band" ~ NA_character_,
                   relative == "Kelly Joe Phelps" ~ NA_character_,
                   relative == "John Hammond" ~ NA_character_,
                   relative == "Herb Pedersen" ~ NA_character_,
                   relative == "Gene Parsons" ~ NA_character_,
                   relative == "Joe Ely" ~ NA_character_,
                   relative == "New Monsoon" ~ "newmonsoon",
                   relative == "Brothers Past" ~ "BrothersPast",
                   relative == "The Werks" ~ "TheWerksMusic",
                   relative == "Space Bacon" ~ "SpaceBaconMusic",
                   relative == "Kung Fu" ~ NA_character_,
                   relative == "Future Rock" ~ NA_character_,
                   relative == "Exmag" ~ NA_character_,
                   relative == "SuperVision" ~ "ThatSuperVision",
                   relative == "Love & Light" ~ "LoveNLightMusic",
                   relative == "Nanda" ~ "Nanda_Musica",
                   relative == "R/D" ~ NA_character_,
                   relative == "Bird of Prey" ~ NA_character_,
                   relative == "Raq" ~ NA_character_,
                   relative == "U-Self" ~ NA_character_,
                   relative == "Life Force" ~ NA_character_,
                   TRUE ~ screen_name))
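A long `case_when()` like this can also be expressed as a small corrections table that is matched against the artist names, which keeps the overrides in data rather than in code. The sketch below uses base R with invented starting handles; the `fixes` table and the other object names are illustrative and not part of the original analysis.

```r
# Search results to be corrected (starting handles are made up)
handles <- data.frame(
  relative    = c("moe.", "Phish", "The Band"),
  screen_name = c("moe", "phish", "band"),
  stringsAsFactors = FALSE
)

# Corrections table: a handle to use instead, or NA if not on Twitter
fixes <- data.frame(
  relative    = c("moe.", "The Band"),
  screen_name = c("moeperiod", NA),
  stringsAsFactors = FALSE
)

# For each artist, find its row in the corrections table (NA if none)
idx <- match(handles$relative, fixes$relative)
has_fix <- !is.na(idx)

# Overwrite only the artists that have a correction, NA corrections included
corrected <- handles
corrected$screen_name[has_fix] <- fixes$screen_name[idx[has_fix]]
print(corrected)
```

Artists without an entry in `fixes` keep their original handle, which plays the same role as the `TRUE ~ screen_name` fallback.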

Then, I looked up users based on the corrected Twitter handles so that I would actually get those artists’ data.

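dead_relatives_tw_handles <- correct_handles %>% 
  # look up only the non-missing handles, then join the account data back on
  left_join(lookup_users(as.character(na.omit(correct_handles$screen_name))),
            by = "screen_name")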

Finally, I got the followers for each of the artists. Because of the API’s rate limits, this step took a fair amount of time, so I wrote each follower list out to a file in a subdirectory to be read back in later.

# make sure the output directory exists
dir.create("twitter_data", showWarnings = FALSE)

for (i in seq_len(nrow(dead_relatives_tw_handles))) {
  # check to make sure screen name isn't missing
  if (!is.na(dead_relatives_tw_handles$screen_name[i])) {
    # get that user's followers
    get_followers(dead_relatives_tw_handles$screen_name[i],
                  n = dead_relatives_tw_handles$followers_count[i],
                  retryonratelimit = TRUE) %>%
      # write out their follower list to a subdirectory and
      # label it with the artist name
      export(., paste0("./twitter_data/", dead_relatives_tw_handles$relative[i], "_followers.csv"))
  }
}
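As a rough sketch of the "read back in" step, the per-artist files can be listed, read, and stacked, with the artist name recovered from each file name. The `read_followers()` helper and the demo files below are hypothetical, and the `user_id` column is an assumption about what the exported files contain. IDs are read as character because Twitter IDs are too large to store exactly as doubles.

```r
# Hypothetical helper: stack every *_followers.csv in a directory
read_followers <- function(dir) {
  files <- list.files(dir, pattern = "_followers\\.csv$", full.names = TRUE)
  out <- lapply(files, function(f) {
    # read every column as character so large IDs are not rounded
    df <- read.csv(f, stringsAsFactors = FALSE, colClasses = "character")
    # recover the artist name from the file name
    df$relative <- sub("_followers\\.csv$", "", basename(f))
    df
  })
  do.call(rbind, out)
}

# Demo with two tiny made-up files in a temporary directory
demo_dir <- file.path(tempdir(), "twitter_data_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(user_id = c("1", "2")),
          file.path(demo_dir, "Phish_followers.csv"), row.names = FALSE)
write.csv(data.frame(user_id = "2"),
          file.path(demo_dir, "moe._followers.csv"), row.names = FALSE)

followers <- read_followers(demo_dir)
print(followers)
```

The result is one long table with a row per artist and follower pair, which makes it easy to count followers shared between artists.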