Using some “real” data for social network analysis

This week’s content in the Data, Analytics and Learning MOOC concentrated on some real-world case studies of how social network analysis (SNA) has been used in the study of learning. The studies discussed looked at the application of SNA in areas such as learning design, sense of community, creative potential, academic performance, social presence, and MOOCs. The activities we had to complete involved using Gephi and Tableau to analyse and interpret SNA for the study of learning. Since I don’t have access to any data from a Learning Management System, I had to come up with another source of data that I could play with. I decided to use some Twitter data that’s related to a research project that I’m working on at the moment. I used the NodeXL template for Excel to collect the tweets which were sent during a recent conference, and then analysed the network in Gephi and Tableau. Although the data didn’t come from learners in a traditional setting, such as a Learning Management System, I think that a conference is a learning experience, so it’s an appropriate source to use for this exercise. This was the first time I’ve used NodeXL, and I found it very easy to use. For ethical and privacy reasons I’ve left the names of the nodes off in each of these visualisations.

This is what the network looks like in Gephi:

[Image: Gephi network screenshot]

In this representation of the network, each circle (node) represents a Twitter user who retweeted or mentioned another Twitter user. These retweets and mentions are represented by the lines connecting the nodes. The darker the colour of a node, the more connections it has to other nodes in the network, i.e. the user sent more retweets and mentions. In SNA this is known as degree centrality.

I imported the network data into Tableau to plot some of the measures against each other, to see if there are any relationships between them. This graph shows the number of followers, the degree centrality, and the betweenness centrality (a measure which indicates whether a node is acting as a bridge between distinct communities within the network) for each of the nodes:

[Image: Conference tweets 2]

From this, it looks like (without conducting any statistical analysis) there is no relationship in this network between the number of Twitter followers that a network member has and their degree or betweenness centrality. I can see that combining these two analysis tools (Gephi and Tableau) can be a very useful way to gain all sorts of insights. I certainly think I’ll be using these tools in some upcoming work that I’ll be doing looking at Twitter networks.
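For anyone curious about what sits behind these measures, here is a minimal Python sketch of degree centrality, betweenness centrality, and a simple Pearson correlation against follower counts. It is not what Gephi or Tableau do internally. The edge list and follower numbers are made up for illustration, the network is treated as undirected (an assumption), and the function names are my own.

```python
from collections import deque
from math import sqrt

# Hypothetical edge list: each pair is a retweet or mention between two users.
# Treated as undirected here, which is a simplifying assumption.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]

# Hypothetical follower counts for the same users.
followers = {"a": 1200, "b": 50, "c": 300, "d": 8000, "e": 15}


def build_adjacency(edge_list):
    """Map each node to the set of nodes it is connected to."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Share of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness (Brandes' algorithm, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies back towards s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


adj = build_adjacency(edges)
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)

nodes = sorted(adj)
r_degree = pearson([followers[n] for n in nodes], [degree[n] for n in nodes])
r_between = pearson([followers[n] for n in nodes], [betweenness[n] for n in nodes])
print("followers vs degree:", round(r_degree, 3))
print("followers vs betweenness:", round(r_between, 3))
```

With only five made-up nodes the correlation values mean nothing in themselves. The point is just to show the kind of check that could back up (or contradict) an impression formed from a scatter plot.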

Data wrangling with Tableau

The first hands-on assignment for the Data, Analytics and Learning MOOC was designed to give us some experience with using the Tableau software package to analyse and visualise data. It was a straightforward process to download and install the software, and then it was time to find some data to analyse. I decided to use the data about overseas students who had come to study in Australia, for the period 2004-2013, from the Australian Higher Education Statistics. Before I could import the data into Tableau, I had to do a bit of cleanup on it. I had to combine the data from each year into a single spreadsheet, and I also had to delete countries which were not listed in the data every year. I wanted to compare the number of students coming from each country to see which countries had grown and which had shrunk. One of the tables had a column for “Country of permanent residence”, so that’s what I used. The source data is limited to countries with more than 20 students, which is why there is variation in the number of countries which are included from year to year.
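I did this cleanup by hand in a spreadsheet, but the same two steps (stacking the yearly tables and keeping only the countries that appear in every year) could be sketched in Python roughly like this. The numbers and the dictionary layout are invented for illustration and are not the real statistics.

```python
# Hypothetical per-year tables: {year: {country: student_count}}.
# These figures are placeholders, not the real Higher Education Statistics.
yearly = {
    2004: {"China": 100, "India": 50, "Nepal": 25},
    2005: {"China": 120, "India": 60},
    2013: {"China": 300, "India": 40, "Nepal": 90},
}


def countries_in_every_year(tables):
    """Return the countries that are listed in every yearly table."""
    country_sets = [set(table) for table in tables.values()]
    return set.intersection(*country_sets)


def combine(tables):
    """Stack the yearly tables into one list of (year, country, count) rows,
    dropping any country that is missing from at least one year."""
    keep = countries_in_every_year(tables)
    rows = []
    for year in sorted(tables):
        for country in sorted(keep):
            rows.append((year, country, tables[year][country]))
    return rows


for row in combine(yearly):
    print(row)
```

In this made-up example "Nepal" is dropped because it is missing from 2005, which is the same thing that happens to a country that falls below the reporting threshold in any one year.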

After a bit of fiddling with the dimensions, measures and table calculations, I managed to produce the map I was after.

[Image: Overseas students map]

In order to create the map, I used the “Table Calculation” function to calculate the percentage difference between 2004 and 2013. This produced a map for each year, so I used the “Hide” command to hide the results for every year except 2013, and bingo – I had my map.
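The arithmetic behind a percentage difference is simple enough to sketch in a few lines of Python. This only mirrors the calculation as I understand it and is not Tableau's own implementation. The student counts are invented, and `percent_difference` is just a name I've picked.

```python
def percent_difference(start, end):
    """Percentage change from start to end, relative to start."""
    return (end - start) * 100 / start


# Hypothetical student counts for the first and last year of the period.
counts = {
    "China": {2004: 100, 2013: 300},
    "India": {2004: 50, 2013: 40},
}

change = {
    country: percent_difference(years[2004], years[2013])
    for country, years in counts.items()
}

for country, pct in sorted(change.items()):
    direction = "grew" if pct > 0 else "shrank"
    print(f"{country} {direction} by {abs(pct):.1f}%")
```

A positive value means the number of students from that country grew over the period and a negative value means it shrank, which is what the colouring on a map like this would be based on.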

All in all I found working with Tableau fairly straightforward, although I did find it took a bit of trial and error to produce the analysis I was after. However, the aim of the assignment (and DALMOOC in general) wasn’t to turn us into Tableau experts, but to expose us to some of the tools which can be used for data analysis and visualisation. I now have enough of an idea of how Tableau works to be able to consider how I can use it in the future. It will be interesting to see how my introduction to Tableau compares to the other software packages that we’ll use over the next few weeks.