Using some “real” data for social network analysis

This week’s content in the Data, Analytics and Learning MOOC concentrated on some real-world case studies of how social network analysis (SNA) has been used in the study of learning. The studies discussed covered the application of SNA in areas such as learning design, sense of community, creative potential, academic performance, social presence, and MOOCs. The activities we had to complete involved using Gephi and Tableau to analyse and interpret social networks in the study of learning.

Since I don’t have access to any data from a Learning Management System, I had to come up with another source of data to play with. I decided to use some Twitter data related to a research project I’m working on at the moment. I used the NodeXL template for Excel to collect the tweets sent during a recent conference, and then analysed the network in Gephi and Tableau. Although the data didn’t come from learners in a traditional setting such as a Learning Management System, a conference is a learning experience, so I think it’s an appropriate source for this exercise. This was the first time I’d used NodeXL, and I found it very easy to use. For ethical and privacy reasons I’ve left the names of the nodes out of each of these visualisations.

This is what the network looks like in Gephi:

[Image: screenshot_214311]

In this representation of the network, each circle represents a Twitter user who retweeted or mentioned another Twitter user. The retweets and mentions are represented by the lines connecting the nodes. The darker the colour of a node, the more connections it has to other nodes in the network, i.e. the user sent more retweets and mentions. In SNA this is known as degree centrality. I imported the network data into Tableau to plot some of the measures against each other, to see whether there are any relationships between them. This graph shows the number of followers, the degree centrality, and the betweenness centrality (a measure which indicates whether a node is acting as a bridge between distinct communities within the network) for each of the nodes:

[Image: Conference tweets 2]

From this it looks (without conducting any statistical analysis) as though, for this network, there is no relationship between the number of Twitter followers a network member has and their degree or betweenness centrality. I can see that combining these two analysis tools (Gephi and Tableau) can be very useful for gaining all sorts of insights. I certainly think I’ll be using these tools in some upcoming work that I’ll be doing looking at Twitter networks.
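Since I only eyeballed the scatterplot, it’s worth noting the claim could actually be tested. Here’s a minimal sketch in Python using networkx and pandas; all of the names and numbers below are invented stand-ins for the exported retweet/mention edge list and follower counts, not the real conference data.

```python
import networkx as nx
import pandas as pd

# Hypothetical data: an edge list of retweets/mentions and follower counts
# per user. All names and numbers here are invented stand-ins.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("dave", "alice")]
followers = {"alice": 1200, "bob": 90, "carol": 15000, "dave": 300}

# Directed graph, since a retweet or mention points from one user to another.
g = nx.DiGraph()
g.add_edges_from(edges)

# Degree centrality: how connected a node is relative to the rest of the network.
# Betweenness centrality: how often a node lies on shortest paths between other
# nodes, i.e. whether it bridges otherwise separate parts of the network.
df = pd.DataFrame({
    "followers": pd.Series(followers),
    "degree": pd.Series(nx.degree_centrality(g)),
    "betweenness": pd.Series(nx.betweenness_centrality(g)),
})

# Spearman rank correlation is a sensible first check, since follower
# counts tend to be heavily skewed.
print(df.corr(method="spearman"))
```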

What are learning analytics, and what can we learn from them?

Learning analytics (LA) are certainly becoming a hot topic within the education sector. There are conferences, societies and journals where new developments in the LA field are presented and discussed. But what are LA, and how can academic libraries use them to learn more about our users?

The goal of LA is to use the data generated by the various systems on campus to improve the teaching and learning experience for students. It’s about bringing together the data from these disparate systems, e.g. the Learning Management System (LMS) and the student administration system, to look for patterns and trends. Once these patterns and trends have been identified, they can be used to inform changes to teaching practices that assist students. Traditionally LA have been used by departments other than the library, as their systems can provide more information about a student’s progress and background. The LMS, for example, is a rich source of data on student behaviour during a semester. However, data from library systems can be combined with data from other systems on campus to make use of LA. For example, library staff at Curtin University combined data from library systems and the campus student administration system to “explore if an association between library use and student retention is evident”. As they describe in their paper, they found that “[a]lthough student retention was associated with high levels of library use generally, it was the finding that use of electronic Library resources early in the semester appears to lead to an improved likelihood of remaining enrolled that is most useful.”
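The Curtin paper describes their actual methodology; purely to illustrate the kind of data joining this sort of study involves, here’s a rough Python sketch. Every file and column name below is invented.

```python
import pandas as pd

# Hypothetical extracts from the library system and the student
# administration system. All file and column names are invented.
library_use = pd.read_csv("library_logins.csv")  # student_id, login_count
enrolments = pd.read_csv("enrolments.csv")       # student_id, still_enrolled (0/1)

merged = enrolments.merge(library_use, on="student_id", how="left")
merged["login_count"] = merged["login_count"].fillna(0)

# Bucket students by how heavily they used the library, then compare
# retention rates across the buckets.
merged["use_level"] = pd.cut(
    merged["login_count"],
    bins=[-1, 0, 5, 20, float("inf")],
    labels=["none", "low", "medium", "high"],
)
print(merged.groupby("use_level", observed=True)["still_enrolled"].mean())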

Another potential use of LA by academic libraries is to investigate whether embedding library content in the LMS can be linked to student performance. Librarians are increasingly collaborating with teaching staff to include library content directly in the LMS for individual units or subjects. It should be possible to examine the LMS data showing how many times a link to library content is clicked, and see whether students who access library resources regularly achieve better results than students who don’t.
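Again as a sketch only, with invented table and column names, that comparison might start out as simply as this:

```python
import pandas as pd

# Hypothetical extracts: LMS click counts on embedded library links,
# and final marks for the unit. All file and column names are invented.
clicks = pd.read_csv("lms_library_link_clicks.csv")  # student_id, clicks
results = pd.read_csv("unit_results.csv")            # student_id, final_mark

df = results.merge(clicks, on="student_id", how="left")
df["clicks"] = df["clicks"].fillna(0)
df["used_library_content"] = df["clicks"] > 0

# Compare average final marks for students who did and didn't click
# through to the embedded library content.
print(df.groupby("used_library_content")["final_mark"].agg(["count", "mean"]))
```

A real analysis would of course need to control for other factors, but even a simple comparison like this would show whether the question is worth pursuing.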

I think there is great potential for libraries to use the data our systems produce to learn more about our students and improve their learning experience. It won’t be an easy process, as there are institutional barriers which need to be overcome. I’ll discuss these in a future post.

Social Network Analysis with Gephi

The next software package that we were introduced to in the DALMOOC was Gephi, which is an open source tool for conducting social network analysis. I found Gephi an easier tool to use than Tableau, and it was fairly straightforward to load the sample data that was provided and start analysing it.

These were the results of my analysis to determine the density and centrality measures for each dataset (a rough scripted equivalent follows the screenshots below):

For the example_1 dataset:

[Image: Example_1]

For the example_2 dataset:

[Image: Example_2]

For the CCK11 dataset (Twitter network):

[Image: CCK_Twitter]

For the CCK11 dataset (blog network):

[Image: CCK_blogs]
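For anyone wanting to reproduce this kind of result outside Gephi, here’s a minimal sketch using Python’s networkx. The edge-list file name is invented, and networkx’s normalisations won’t necessarily match Gephi’s output exactly.

```python
import networkx as nx

# Load a dataset as an undirected edge list (file name is hypothetical).
g = nx.read_edgelist("example_1_edges.csv", delimiter=",")

# Density: the fraction of possible edges that actually exist in the network.
print("density:", nx.density(g))

# Average degree: how many connections a node has on average.
print("average degree:", sum(dict(g.degree()).values()) / g.number_of_nodes())

# Per-node centrality measures, like those reported in the screenshots above.
degree = nx.degree_centrality(g)
betweenness = nx.betweenness_centrality(g)
closeness = nx.closeness_centrality(g)
```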

These were the results of applying the Giant Component filter and then determining the modularity for each dataset (a rough scripted equivalent of this step is sketched at the end of this post):

For the example_1 dataset:

[Image: Example_1 modularity]

For the example_2 dataset:

[Image: Example_2 modularity]

For the CCK11 dataset (Twitter network):

[Image: CCK Twitter modularity]

For the CCK11 dataset (blog network):

[Image: CCK blogs modularity]

It was also fun to play around with the various network representations, and the options for partitioning and highlighting various properties of the network. This is the example_1 network with a few changes made to it: it’s in the Fruchterman Reingold layout, nodes are sized according to betweenness centrality, labels are turned on, and each community is a different colour.

[Image: Example_1 extra]

Here’s the example_2 network with similar changes:

[Image: Example_2_extra]

And for the CCK dataset (Twitter network):

[Image: CCK_Twitter_extra]

And finally the CCK dataset (blogs network):

[Image: CCK_blogs extra]

I found these exercises a useful way to get some experience with social network analysis, and I have some ideas about how I could use Gephi in a project I’m working on.
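As promised above, here’s a rough idea of how the giant component and modularity step could be scripted with networkx. Gephi’s modularity report uses the Louvain method, while the function below uses greedy modularity maximisation, so the community counts and scores may differ slightly; the file name is again invented.

```python
import networkx as nx
from networkx.algorithms import community

# Load a dataset as an edge list (file name is hypothetical).
g = nx.read_edgelist("example_1_edges.csv", delimiter=",")

# Giant Component filter: keep only the largest connected component.
giant = g.subgraph(max(nx.connected_components(g), key=len)).copy()

# Detect communities and report the modularity of the resulting partition.
# Gephi uses the Louvain method; greedy modularity maximisation is a
# different algorithm, so the results won't match Gephi's exactly.
communities = community.greedy_modularity_communities(giant)
print("communities:", len(communities))
print("modularity:", community.modularity(giant, communities))
```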


Data wrangling with Tableau

The first hands-on assignment for the Data, Analytics and Learning MOOC was designed to give us some experience with using the Tableau software package to analyse and visualise data. It was a straightforward process to download and install the software, and then it was time to find some data to analyse. I decided to use the data about overseas students who had come to study in Australia for the period 2004-2013, from the Australian Higher Education Statistics. Before I could import the data into Tableau, I had to do a bit of cleanup on it: I had to combine the data from each year into a single spreadsheet, and I also had to delete countries which were not listed in the data every year. I wanted to compare the number of students coming from each country to see which countries had grown and which had shrunk. One of the tables had a column for “Country of permanent residence”, so that’s what I used. The source data is limited to countries with more than 20 students, which is why the number of countries included varies from year to year.
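That cleanup could also be scripted. Here’s a minimal sketch in Python with pandas, assuming one spreadsheet per year with hypothetical “country” and “students” columns; the file names are invented.

```python
import pandas as pd

# One spreadsheet per year, each with "country" and "students" columns.
# File and column names here are invented stand-ins.
years = range(2004, 2014)
frames = []
for year in years:
    df = pd.read_excel(f"overseas_students_{year}.xlsx")
    df["year"] = year
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)

# Drop countries that don't appear in every year, since the source data
# omits countries with 20 or fewer students in a given year.
years_per_country = combined.groupby("country")["year"].nunique()
complete = years_per_country[years_per_country == len(years)].index
combined = combined[combined["country"].isin(complete)]
```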

After a bit of fiddling with the dimensions, measures and table calculations, I managed to produce the map I was after.

[Image: Overseas students map]

In order to create the map, I used the “Table Calculation” function to calculate the percentage difference between 2004 and 2013. This produced a map for each year, so I used the “Hide” command to hide the results for every year except 2013, and bingo – I had my map.
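For comparison, the same percentage-difference calculation is easy to reproduce in pandas, continuing from the “combined” frame in the earlier sketch. A tiny invented stand-in is included here so the snippet runs on its own; the figures are not the real data.

```python
import pandas as pd

# Invented stand-in for the "combined" frame built in the previous sketch
# (columns: country, year, students). These numbers are not real data.
combined = pd.DataFrame({
    "country": ["China", "China", "India", "India"],
    "year": [2004, 2013, 2004, 2013],
    "students": [20000, 30000, 10000, 25000],
})

# Pivot to one row per country, then compute the 2004-to-2013 percentage
# difference, mirroring Tableau's "Percent Difference" table calculation.
wide = combined.pivot(index="country", columns="year", values="students")
wide["pct_diff"] = (wide[2013] - wide[2004]) / wide[2004] * 100
print(wide["pct_diff"].sort_values())
```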

All in all I found working with Tableau fairly straightforward, although it took a bit of trial and error to produce the analysis I was after. However, the aim of the assignment (and DALMOOC in general) wasn’t to turn us into Tableau experts, but to expose us to some of the tools which can be used for data analysis and visualisation. I now have enough of an idea of how Tableau works to be able to consider how I can use it in the future. It will be interesting to see how my introduction to Tableau compares to the other software packages that we’ll use over the next few weeks.

A MOOC with a difference

Last week I started the Data, Analytics, and Learning MOOC (DALMOOC) through edX. I signed up for this course because I’m a bit of a data nerd, but have never really got into it in any depth. This course seemed to be a good way to get a basic understanding of what learning analytics are, and the sorts of tools which can be used to analyse and visualise the data.

The content from the first week was, as I expected, an introduction to the concept of learning analytics, and the tools which we’ll be using later in the course. It was all fairly straightforward, and it was presented through a mix of recorded videos and Google Hangouts. DALMOOC is structured a little differently to the other MOOCs which I’ve taken. Rather than being driven by an instructor who releases content each week with corresponding assessment tasks, DALMOOC includes a social learning pathway as well. It uses a tool called ProSolo to facilitate this, and I’ll admit I was a bit wary of using it. I’m not used to having my peers assess my work, so I might use ProSolo to track how I’m progressing but submit my assignments through edX. The distributed nature of the course content has confused a few people (myself included), but I think it’s becoming a bit clearer now.

I’m looking forward to getting some hands-on experience with using these tools over the next few weeks, and seeing if I get inspired to use learning analytics within the library.