The Medical Library Association (MLA) has just held its annual conference in Toronto, Canada. Being based in Australia makes it hard for me to attend, but the Twitter stream is a very useful way of keeping up with what's going on at the conference. I thought I would use that stream to test the R programming language for some very basic Social Network Analysis (SNA). See my previous post for background on my interest in R and SNA.
I used the twitteR package for R to retrieve all the tweets with the hashtag #mlanet16 that had been sent between 13th and 18th May (there were 9,985). Next, I used the graphTweets package to turn these tweets into a data frame containing only the Twitter accounts that had been included in a mention (i.e. an @ message) and/or had sent a retweet. I then converted this data frame into a graphml file, which I opened in Gephi, a free data visualisation tool. If you're interested, the code I used was:
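library(twitteR)
library(igraph)
library(graphTweets)

# Authenticate with the Twitter API (substitute your own credentials)
setup_twitter_oauth("API key", "API secret", "access token", "access secret")

# Retrieve English-language tweets carrying the conference hashtag
tweets <- searchTwitter("#mlanet16", n=15000, lang="en", since="2016-05-13", until="2016-05-18")

# Convert the list of tweets into a data frame
tw_df <- twListToDF(tweets)

# Build the edge list (tweeting account -> mentioned account) and the node list
edges <- getEdges(data = tw_df, tweets = "text", source = "screenName", "retweetCount", "favorited", str.length = 20)
nodes <- getNodes(edges)

# Create a directed graph and export it as graphml for Gephi
g <- graph.data.frame(edges, directed = TRUE, vertices = nodes)
write.graph(g, "F:/mlagraph.graphml", format="graphml")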
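To make the edge-building step a little more concrete, here is a minimal sketch in base R of how mentions in tweet text can be turned into an edge list. It is not how graphTweets works internally (I haven't checked), just an illustration of the idea. The toy data frame, the invented account names and the simple `@handle` pattern are all assumptions made for the example.

```r
# Toy stand-in for the data frame returned by twListToDF();
# the accounts and tweets are invented for illustration.
tw_df <- data.frame(
  screenName = c("alice", "bob", "carol"),
  text = c("Great session with @bob and @carol #mlanet16",
           "RT @alice: Great session #mlanet16",
           "Coffee queue is long #mlanet16"),
  stringsAsFactors = FALSE
)

# Pull every @handle out of each tweet (one character vector per tweet).
# The pattern is a simplifying assumption about what a handle looks like.
mentions <- regmatches(tw_df$text, gregexpr("@[A-Za-z0-9_]+", tw_df$text))

# One row per (sender, mentioned account) pair.
# Tweets with no mention contribute no rows.
edges <- data.frame(
  source = rep(tw_df$screenName, lengths(mentions)),
  target = sub("^@", "", unlist(mentions)),
  stringsAsFactors = FALSE
)

print(edges)
```

Each row is one directed tie from the account that sent the tweet to the account it mentioned, which is the shape of data that igraph needs to build a directed graph.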
The raw, unfiltered data looks like this:
Each node represents an individual Twitter account (I've left them unlabelled in order not to identify anyone). This is a bit messy and hard to read, so I filtered the data to make the graph easier to interpret. The graph below shows the top nodes based on their "out-degree", with larger nodes having a larger out-degree. Out-degree is a measure of the influence of a node, i.e. how many outward ties it has to other nodes. In this network, an outward tie is a mention or retweet sent by that account.
Another filter I applied was in-degree, which is the number of inward connections that each node has. Nodes with a high in-degree have high prestige, as other nodes try to connect with them. The in-degree graph looks like this:
There is a range of other measures that can be used to filter the data, so I'll have a play around a bit more. I certainly wouldn't call myself an expert in R or social network analysis after doing this, but it has been a great introduction to what R can do.
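I did my filtering in Gephi, but the same degree measures can be calculated in R with igraph. The sketch below uses a small invented edge list rather than the conference data. The threshold of 2 is arbitrary, and betweenness is included only as one example of another measure to try.

```r
library(igraph)

# Invented edge list: each row is a mention or retweet from source to target.
edges <- data.frame(
  source = c("alice", "alice", "bob", "carol", "dave"),
  target = c("bob", "carol", "alice", "alice", "alice"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Out-degree: number of ties each account sends.
out_deg <- degree(g, mode = "out")

# In-degree: number of ties each account receives.
in_deg <- degree(g, mode = "in")

# Betweenness: one of the other measures that could be used as a filter.
btw <- betweenness(g, directed = TRUE)

# Keep only the accounts with an in-degree of at least 2 (arbitrary cut-off).
top <- induced_subgraph(g, which(in_deg >= 2))

print(out_deg)
print(in_deg)
print(V(top)$name)
```

A filtered graph like `top` could then be exported to graphml in the same way as the full graph, if you prefer to do the visualisation in Gephi.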