Twitter activity at the MLA conference

The Medical Library Association (MLA) has just held its annual conference in Toronto, Canada. Being based in Australia means that it’s hard for me to attend, but the Twitter stream is a very useful way of keeping up with what’s going on at the conference. I thought I would use it as a test of using the R programming language to conduct some very basic Social Network Analysis (SNA) on the Twitter stream of the conference. See my previous post for a background of my interest in R and SNA.

I used the twitteR package for R to retrieve all the tweets with the hashtag #mlanet16 which had been sent between 13th and 18th May (there were 9,985). Next, the graphTweets package was used to turn these tweets into a data frame which included only the Twitter accounts that had been included in a mention (i.e. an @ message) and/or had sent a retweet. This data frame was then converted into a graphml file, which I opened in Gephi, a free data visualisation tool. If you’re interested, the code I used was:

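library(twitteR)      # Twitter search API client
library(graphTweets)  # getEdges() and getNodes()
library(igraph)       # graph construction and export
# Authenticate, then fetch the English tweets carrying the conference hashtag
setup_twitter_oauth("API key", "API secret", "access token", "access secret")
tweets <- searchTwitter("#mlanet16", n=15000, lang="en", since="2016-05-13", until="2016-05-18")
tw_df <- twListToDF(tweets)  # list of status objects -> data frame
# Build the edge list (who mentioned or retweeted whom) and the matching node list
edges <- getEdges(data = tw_df, tweets = "text", source = "screenName", "retweetCount", "favorited", str.length = 20)
nodes <- getNodes(edges)
# Assemble a directed graph and export it for Gephi
g <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)
write_graph(g, "F:/mlagraph.graphml", format="graphml")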

The raw, unfiltered data looks like this:

MLA complete

Each node represents an individual Twitter account (I’ve left them unlabelled in order not to identify anyone). This is a bit messy and hard to read, so I filtered the data to make the graph easier to interpret. The graph below shows the top nodes based on their “out-degree”, with the larger nodes having a larger out-degree. Out-degree is a measure of the influence of a node, i.e. how many outward ties they have to other nodes.

MLA out-degree

Another filter I applied was in-degree, which is a measure of the number of inward connections that each node has. Nodes with a high in-degree have a high prestige, as other nodes try to connect with them. The in-degree graph looks like this:

MLA in-degree
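Both measures can also be calculated in R before the graph ever reaches Gephi. The following is a minimal sketch using the igraph package; the edge list and account names are invented for illustration and are not the conference data, and the cut-off of 2 is an arbitrary assumption.

```r
library(igraph)

# Hypothetical edge list: each row means "source mentioned or retweeted target"
edges <- data.frame(
  source = c("alice", "alice", "alice", "bob",   "carol", "dave"),
  target = c("bob",   "carol", "dave",  "carol", "bob",   "carol"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

out_deg <- degree(g, mode = "out")  # ties sent by each account
in_deg  <- degree(g, mode = "in")   # ties received by each account

# A simple filter: keep the accounts that sent at least 2 ties
busiest <- names(out_deg)[out_deg >= 2]

print(out_deg)
print(in_deg)
print(busiest)
```

Here "alice" sends the most ties while "carol" receives the most, which is the kind of difference the two filtered graphs are meant to bring out.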

There are a range of other measures that can be used to filter the data, so I’ll have a play around a bit more. I certainly wouldn’t call myself an expert in R or social network analysis after doing this, but it has been a great introduction to what R can do.

Tinkering with R and Social Network Analysis

My interest in Social Network Analysis (SNA) began when I was studying the Data, Analytics, and Learning MOOC (DALMOOC) through edX a couple of years ago (see my posts from during the course here). During the course it was mentioned that Twitter lends itself to SNA, so I did some fiddling around with analysing the Twitter streams of various library conferences. I used some of the tools that I was introduced to during the DALMOOC, such as Gephi and NodeXL, and managed to produce some graphs. However, I put this on the backburner while I focussed on preparing my poster for the EBLIP8 Conference.

Earlier this year, though, I got the urge to start learning more about the R programming language. Although I have absolutely no background in coding or programming (unless you count copying BASIC programs out of a book for my Commodore 128 when I was a kid), I’d heard about the R programming language, and wanted to find out a bit more about it. I came across the free Datacamp course on R and did the first few lessons, but haven’t worked on it for a while now. I started looking around to see if there were any R packages that could do SNA on Twitter data, and I found that there were a few that I could use. There were websites which had some example code which I was able to copy and do some tweaking on (such as this one and this one), and before long I was collecting and analysing my own data.

I still wouldn’t call myself a coder or programmer, but I’m starting to get the hang of using R. It’s pretty easy to use, especially when you’re using code that is freely available and not having to develop your own. In my next post I’ll show some examples of SNA that I prepared based on the tweets sent at the 2016 Medical Library Association conference.

Social network analysis for the study of learning

Social network analysis (SNA) can be used in many different ways in the study of learning. Some examples of these are:

Learning design – finding ways to design courses which don’t follow the instructor-led model but which allow students to ask and answer questions with each other. Involving the students in this way leads to higher levels of engagement with and understanding of the course material. The paper by Lockyer and colleagues gives more detail about this use of SNA.

Sense of community – identifying students who may not feel like they are part of the community of learners in a course, and coming up with ways to improve their sense of community. SNA based on online discussion forums can be used instead of questionnaires to identify these students. See the paper by Dawson for more information about this approach.

Creative potential – trying to identify the network brokers, i.e. the students who are the link between communities within the network, as they are often the students with the greatest creative potential. This is due to them being exposed to information and ideas from multiple networks, so they have the chance to put all the information together in new and creative ways. For an example of this, see the paper by Burt and colleagues.

Academic performance – there is a link between academic performance and a student’s position within a network. If there are cross-class networks, i.e. the same students enrol in the same subjects, then the performance of these students is higher than that of those who take classes separately. Students who were at the centre of the network typically performed better. Gasevic and colleagues have written a paper on this topic.

Social presence – students who are able to present themselves and their personality are said to have social presence. The online interactions of students can be investigated to try and identify the level of social presence that each student has. This allows instructors to develop and implement strategies to encourage those students with a low social presence to improve it. See the paper by Kovanovic for an explanation of this use of SNA.

MOOCs – identifying the effectiveness of connectivist MOOCs (cMOOCs), i.e. those which encouraged students to acquire knowledge for themselves rather than be led by an instructor. SNA can be used to see if the information flows and community formation within the cMOOC reflect the goal of moving the responsibility for learning from the instructor to the students. Skrypnyk and colleagues have written a paper on this use of SNA in learning.

Before taking this MOOC, I wasn’t aware of the wide range of potential uses of learning analytics. I thought that they were designed for identifying students currently at risk or trying to predict those students who might fall into this category later in their studies. However, after seeing the case studies for this week, I now realise how powerful a tool they are and that they can be used in many different settings.

So what is Social Network Analysis, anyway?

One of the data analysis methods that I’m learning about in the Data, Analytics and Learning MOOC is social network analysis (SNA). As the name suggests, SNA investigates social processes and interactions, rather than looking at numerical data. SNA has applications in many different disciplines, and it doesn’t have to concern itself only with humans. Any system where there is interaction between distinct entities could be analysed using SNA. All that is required is to have some “actors” i.e. the individuals or entities within the network, and “relations” i.e. links between actors.
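To make the idea of actors and relations concrete, here is a minimal sketch in R with the igraph package. The actor names, their "type" attribute and the links between them are all hypothetical, chosen only to show that a network is just two tables: one of entities and one of links.

```r
library(igraph)

# Hypothetical actors: one row per entity, with any attributes you like
actors <- data.frame(
  name = c("library", "faculty", "students", "archive"),
  type = c("service", "group", "group", "service"),
  stringsAsFactors = FALSE
)

# Hypothetical relations: one row per link between two actors
relations <- data.frame(
  from = c("library", "faculty"),
  to   = c("students", "students"),
  stringsAsFactors = FALSE
)

net <- graph_from_data_frame(relations, directed = FALSE, vertices = actors)

vcount(net)  # number of actors, including "archive", which has no relations
ecount(net)  # number of relations
```

Keeping the actors in their own table means that an entity with no links at all still appears in the network, rather than silently dropping out.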

There are a wide range of potential data sources for SNA. In a learning context (which this MOOC is focussed on) data could be obtained from Twitter to see how students in a particular class or unit are interacting on that platform, or from the interactions on discussion forums within a Learning Management System (LMS).

There are several measures of a social network that can be calculated. Some of them relate to the size of the network e.g. diameter, or the “connectedness” of the nodes within the network e.g. degree centrality and closeness centrality. It’s also possible to investigate the modularity of the network, which looks at whether there are smaller modules, or communities, within the network.
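As a rough illustration of the size and connectedness measures, the sketch below calculates diameter, degree centrality and closeness centrality in R using the igraph package. The five-node network is made up for the example and does not come from the MOOC data.

```r
library(igraph)

# Made-up undirected network: "a" is a hub, with a short tail a - d - e
net <- graph_from_literal(a-b, a-c, a-d, d-e)

net_diameter <- diameter(net)   # longest shortest path between any two nodes
deg          <- degree(net)     # degree centrality: number of ties per node
clo          <- closeness(net)  # closeness: inverse of total distance to all others

print(net_diameter)
print(deg)
print(round(clo, 3))
```

In this toy network "a" has both the highest degree and the highest closeness, because it is directly tied to three of the other four nodes.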

I’m keen to have a go at exploring SNA, especially with regards to Twitter networks. I’m working on a research project looking at Twitter use at conferences, and I think the tools and measures that I’ve learnt about will be useful for this project. As far as library-specific use of SNA is concerned, I’m having trouble coming up with possible uses for it. Most of our systems produce numerical data about items or people – there aren’t many networks involved. Unless there was a librarian who was involved in teaching a unit which had a presence within the LMS, I’m not sure how else libraries could take advantage of SNA. Maybe by the end of the MOOC I might have some more ideas.

Using some “real” data for social network analysis

This week’s content in the Data, Analytics and Learning MOOC concentrated on some real-world case studies of how social network analysis (SNA) has been used in the study of learning. The studies which were discussed looked at the application of SNA in areas such as learning design, sense of community, creative potential, academic performance, social presence, and MOOCs. The activities we had to complete involved using Gephi and Tableau to analyse and interpret SNA for the study of learning. Seeing that I don’t have access to any data from a Learning Management System, I had to come up with another source of data that I could play with. I decided to use some Twitter data that’s related to a research project that I’m working on at the moment. I used the NodeXL template for Excel to collect the tweets which were sent during a recent conference, and then analysed the network in Gephi and Tableau. Although the data didn’t come from learners in a traditional setting, such as a Learning Management System, I think that a conference is a learning experience, so I think it’s an appropriate source to use for this exercise. This was the first time I’ve used NodeXL, and I found it very easy to use. For ethical and privacy reasons I’ve left the names of the nodes off in each of these visualisations.

This is what the network looks like in Gephi:

In this representation of the network, each circle represents a Twitter user who retweeted or mentioned another Twitter user. This retweeting and mentioning is represented by the lines connecting the nodes. The darker the colour of the node, the more connections it has to other nodes in the network, i.e. the user sent more retweets and mentions. In SNA this is known as degree centrality.

I imported the network data into Tableau to plot some of the measures against each other to try and see if there are any relationships between them. This graph shows the number of followers, the degree centrality, and betweenness centrality (a measure which indicates whether a node is acting as a bridge between distinct communities within the network) for each of the nodes:

Conference tweets 2

From this, it looks like (without conducting any statistical analysis) there is no relationship, for this network, between the number of followers on Twitter that a network member has and their degree or betweenness centrality. I can see that combining these two analysis tools (Gephi and Tableau) can be very useful to gain all sorts of insights. I certainly think I’ll be using these tools in some upcoming work that I’ll be doing looking at Twitter networks.
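One way to put a number on a relationship like this, rather than judging it by eye, is a rank correlation. The sketch below is only an illustration of the technique in R: it uses the Zachary karate club network that ships with igraph, not the conference tweets, and the follower counts are randomly generated stand-ins.

```r
library(igraph)

net <- make_graph("Zachary")  # built-in example network, not the conference data

deg <- degree(net)
btw <- betweenness(net)

# Hypothetical follower counts, generated at random purely for illustration
set.seed(1)
followers <- sample(50:5000, vcount(net))

# Spearman rank correlation: values near 0 suggest no monotonic relationship
rho_degree      <- cor(followers, deg, method = "spearman")
rho_betweenness <- cor(followers, btw, method = "spearman")

print(rho_degree)
print(rho_betweenness)
```

With real data, the followers vector would be replaced by the follower count collected for each account, in the same order as the nodes in the network.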

Social Network Analysis with Gephi

The next software package that we were introduced to in the DALMOOC was Gephi, which is an open source tool for conducting social network analysis. I found Gephi an easier tool to use than Tableau, and it was fairly straightforward to load the sample data that was provided and start analysing it.

These were the results of my analysis to determine the density and centrality measures of each dataset:

For the example_1 dataset:


For the example_2 dataset:


For the CCK11 dataset (Twitter network):


For the CCK11 dataset (blog network):


These were the results of using the Giant Component filter, and then determining the modularity for each dataset:

For the example_1 dataset:

Example_1 modularity

For the example_2 dataset:

Example_2 modularity

For the CCK11 dataset (Twitter network):

CCK Twitter modularity

For the CCK11 dataset (blog network):

CCK blogs modularity

It was also fun to play around with the various network representations, and the options for partitioning and highlighting various properties of the network. This is the example_1 network with a few changes made to it: it’s in the Fruchterman Reingold representation, nodes are sized according to betweenness centrality, labels are turned on, and each community is a different colour.

Example_1 extra

Here’s the example_2 network with similar changes:


And for the CCK dataset (Twitter network):


And finally the CCK dataset (blogs network):

CCK_blogs extra

I found these exercises a useful way to get some experience with social network analysis, and I have some ideas of how I could use Gephi in a project that I’m working on.
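The Gephi steps in this post (density, the Giant Component filter, modularity, sizing by betweenness and the Fruchterman Reingold layout) can be roughly approximated in R with the igraph package. This is a sketch under assumptions: the small network is made up rather than taken from the course datasets, and the community detection uses igraph's fast greedy method, which may differ from the algorithm Gephi uses, so the numbers would not necessarily match.

```r
library(igraph)

# Made-up network: two triangles joined by one tie, plus a detached pair
net <- graph_from_literal(a-b, b-c, a-c, c-d, d-e, e-f, d-f, g-h)

dens <- edge_density(net)  # share of all possible ties that actually exist

# Rough equivalent of the Giant Component filter: keep the largest component
comp  <- components(net)
giant <- induced_subgraph(net, which(comp$membership == which.max(comp$csize)))

# Community detection by greedy modularity optimisation
comm <- cluster_fast_greedy(giant)
print(modularity(comm))
print(membership(comm))

# Betweenness for node sizing, Fruchterman-Reingold for the layout
btw    <- betweenness(giant)
coords <- layout_with_fr(giant)
plot(giant,
     layout       = coords,
     vertex.size  = 10 + 3 * btw,
     vertex.color = as.vector(membership(comm)))
```

The two nodes that join the triangles ("c" and "d") end up with the largest circles, since every path between the two communities has to pass through them.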