Twitter activity at the MLA conference

The Medical Library Association (MLA) has just held its annual conference in Toronto, Canada. Being based in Australia means that it’s hard for me to attend, but the Twitter stream is a very useful way of keeping up with what’s going on at the conference. I thought I would use that stream to test out the R programming language by conducting some very basic Social Network Analysis (SNA) on it. See my previous post for some background on my interest in R and SNA.

I used the twitteR package for R to retrieve all the tweets with the hashtag #mlanet16 that were sent between 13th and 18th May (there were 9,985). Next, I used the graphTweets package to turn these tweets into a data frame containing only the Twitter accounts that had been included in a mention (i.e. an @ message) and/or had sent a retweet. I then used the igraph package to convert this data frame into a graph and save it as a GraphML file, which I opened in Gephi, a free data visualisation tool. If you’re interested, the code I used was:

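library(twitteR)      # Twitter search API client
library(igraph)       # builds and exports the graph
library(graphTweets)  # turns tweets into edge and node lists
# Authenticate; replace the four placeholder strings with your own credentials
setup_twitter_oauth("API key", "API secret", "access token", "access secret")
# Search for up to 15,000 English-language tweets with the conference hashtag
tweets <- searchTwitter("#mlanet16", n=15000, lang="en", since="2016-05-13", until="2016-05-18")
# Convert the list of tweets into a data frame
tw_df <- twListToDF(tweets)
# One edge per mention, running from the tweet's author to the account named in the text
edges <- getEdges(data = tw_df, tweets = "text", source = "screenName", "retweetCount", "favorited", str.length = 20)
nodes <- getNodes(edges)
# Build a directed graph and save it as GraphML so that Gephi can open it
g <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)
write_graph(g, "F:/mlagraph.graphml", format = "graphml")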
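
If you’re wondering what the edge-building step actually amounts to, here is a rough sketch in base R. It is not how graphTweets does it internally (I haven’t checked), just my own illustration of the idea: find every @handle in a tweet and pair it with the account that sent the tweet. The account names and tweet text are made up.

```r
# Toy stand-in for the data frame returned by twListToDF();
# the accounts and tweets here are invented for illustration.
tw_df <- data.frame(
  screenName = c("alice", "bob", "carol"),
  text = c("Great session with @bob and @carol #mlanet16",
           "RT @alice: Great session #mlanet16",
           "Poster hall is open #mlanet16"),
  stringsAsFactors = FALSE
)

# Pull every @handle out of each tweet (letters, digits and underscores).
handles <- regmatches(tw_df$text, gregexpr("@[A-Za-z0-9_]+", tw_df$text))

# One row per (sender, mentioned account) pair.
# Tweets without any mention contribute no rows.
edges <- data.frame(
  source = rep(tw_df$screenName, lengths(handles)),
  target = sub("^@", "", unlist(handles)),
  stringsAsFactors = FALSE
)

print(edges)
```

The third tweet mentions nobody, so it adds no edge. That is why the graph only includes accounts that took part in a mention or a retweet.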

The raw, unfiltered data looks like this:

[Figure: MLA complete]

Each node represents an individual Twitter account (I’ve left them unlabelled in order not to identify anyone). This is a bit messy and hard to read, so I filtered the data to make the graph easier to interpret. The graph below shows the top nodes based on their “out-degree”, with larger nodes having a larger out-degree. Out-degree is the number of outward ties a node has to other nodes, which here means the mentions and retweets it sent. It is often treated as a measure of a node’s influence.

[Figure: MLA out-degree]
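
I did this filtering in Gephi, but the same idea can be sketched in R with igraph. The edge list below is made up, and the cut-off of 1 is an arbitrary assumption for the sake of the example.

```r
library(igraph)

# Hypothetical edge list: each row means "source mentioned or retweeted target".
edges <- data.frame(
  source = c("alice", "alice", "alice", "bob", "carol"),
  target = c("bob", "carol", "dave", "alice", "alice")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Out-degree: the number of outgoing edges from each account.
# Repeated mentions of the same account would each count as a separate edge.
out_deg <- degree(g, mode = "out")
print(out_deg)

# Keep only the accounts at or above an arbitrary out-degree threshold.
keep <- V(g)[out_deg >= 1]
g_top <- induced_subgraph(g, keep)
print(V(g_top)$name)
```

In this toy example “dave” is only ever mentioned and never mentions anyone, so he has an out-degree of zero and drops out of the filtered graph.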

Another filter I applied was in-degree, which is the number of inward connections that each node has, i.e. how often other accounts mentioned or retweeted it. Nodes with a high in-degree are said to have high prestige, as other nodes try to connect with them. The in-degree graph looks like this:

[Figure: MLA in-degree]
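
In-degree can be calculated in R in much the same way. This sketch (again on made-up data) ranks accounts by in-degree and stores the value on each node. The attribute name "indegree" is just my own choice.

```r
library(igraph)

# Hypothetical edge list: each row means "source mentioned or retweeted target".
edges <- data.frame(
  source = c("bob", "carol", "dave", "alice"),
  target = c("alice", "alice", "alice", "bob")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# In-degree: the number of incoming edges to each account.
in_deg <- degree(g, mode = "in")

# Rank accounts from most to least mentioned.
ranked <- sort(in_deg, decreasing = TRUE)
print(ranked)

# Store the value as a vertex attribute so it travels with the graph
# if the graph is later written out to a file.
V(g)$indegree <- in_deg
```

Here “alice” comes out on top because three other accounts point to her, even though she only sent one mention herself.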

There is a range of other measures that can be used to filter the data, so I’ll have a play around a bit more. I certainly wouldn’t call myself an expert in R or social network analysis after doing this, but it has been a great introduction to what R can do.
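
As one example of those other measures (I haven’t applied it to the conference data), betweenness counts how often a node sits on the shortest path between two other nodes. A minimal sketch with igraph on a made-up three-account chain:

```r
library(igraph)

# Hypothetical chain: alice mentions bob, and bob mentions carol.
edges <- data.frame(
  source = c("alice", "bob"),
  target = c("bob", "carol")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Betweenness: the number of shortest paths between other nodes
# that pass through each node, following edge direction.
bt <- betweenness(g, directed = TRUE)
print(bt)
```

Only “bob” lies between two other accounts here (on the path from “alice” to “carol”), so he is the only node with a non-zero score.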

Tinkering with R and Social Network Analysis

My interest in Social Network Analysis (SNA) began when I was studying the Data, Analytics, and Learning MOOC (DALMOOC) through edX a couple of years ago (see my posts from during the course here). During the course it was mentioned that Twitter lends itself to SNA, so I did some fiddling around with analysing the Twitter streams of various library conferences. I used some of the tools that I was introduced to during the DALMOOC, such as Gephi and NodeXL, and managed to produce some graphs. However, I put this on the back burner while I focussed on preparing my poster for the EBLIP8 Conference.

Earlier this year, though, I got the urge to start learning the R programming language. I have absolutely no background in coding or programming (unless you count copying BASIC programs out of a book for my Commodore 128 when I was a kid), but I’d heard about R and wanted to find out a bit more about it. I came across the free DataCamp course on R and did the first few lessons, but haven’t worked on it for a while now. I started looking around to see if there were any R packages that could do SNA on Twitter data, and I found a few that I could use. Some websites had example code that I was able to copy and tweak (such as this one and this one), and before long I was collecting and analysing my own data.

I still wouldn’t call myself a coder or programmer, but I’m starting to get the hang of using R. It’s pretty easy to use, especially when you’re using code that is freely available and not having to develop your own. In my next post I’ll show some examples of SNA that I prepared based on the tweets sent at the 2016 Medical Library Association conference.