Social network analysis for the study of learning

Social network analysis (SNA) can be used in many different ways in the study of learning. Some examples of these are:

Learning design – finding ways to design courses which don’t follow the instructor-led model but which allow students to ask and answer questions with each other. Involving the students in this way leads to higher levels of engagement with and understanding of the course material. The paper by Lockyer and colleagues gives more detail about this use of SNA.

Sense of community – identifying students who may not feel like they are part of the community of learners in a course, and coming up with ways to improve their sense of community. SNA based on online discussion forums can be used instead of questionnaires to identify these students. See the paper by Dawson for more information about this approach.

Creative potential – trying to identify the network brokers, i.e. the students who are the link between communities within the network, as they are often the students with the greatest creative potential. This is because they are exposed to information and ideas from multiple networks, so they have the chance to put all the information together in new and creative ways. For an example of this, see the paper by Burt and colleagues.

Academic performance – there is a link between academic performance and a student’s position within a network. If there are cross-class networks, i.e. the same students enrol in the same subjects, then the performance of these students is higher than that of students who take classes separately. Students who are at the centre of the network typically perform better. Gasevic and colleagues have written a paper on this topic.

Social presence – students who are able to present themselves and their personality are said to have social presence. The online interactions of students can be investigated to try and identify the level of social presence that each student has. This allows instructors to develop and implement strategies to encourage those students with a low social presence to improve it. See the paper by Kovanovic for an explanation of this use of SNA.

MOOCs – evaluating the effectiveness of connectivist MOOCs (cMOOCs), i.e. those which encourage students to acquire knowledge for themselves rather than be led by an instructor. SNA can be used to see if the information flows and community formation within the cMOOC reflect the goal of moving the responsibility for learning from the instructor to the students. Skrypnyk and colleagues have written a paper on this use of SNA in learning.

Before taking this MOOC, I wasn’t aware of the wide range of potential uses of learning analytics. I thought that they were designed for identifying students currently at risk or trying to predict those students who might fall into this category later in their studies. However, after seeing the case studies for this week, I now realise how powerful a tool they are and that they can be used in many different settings.

So what is Social Network Analysis, anyway?

One of the data analysis methods that I’m learning about in the Data, Analytics and Learning MOOC is social network analysis (SNA). As the name suggests, SNA investigates social processes and interactions, rather than looking at numerical data. SNA has applications in many different disciplines, and it doesn’t have to concern itself only with humans. Any system where there is interaction between distinct entities could be analysed using SNA. All that is required is to have some “actors” i.e. the individuals or entities within the network, and “relations” i.e. links between actors.

There is a wide range of potential data sources for SNA. In a learning context (which this MOOC is focussed on) data could be obtained from Twitter to see how students in a particular class or unit are interacting on that platform, or from the interactions on discussion forums within a Learning Management System (LMS).

There are several measures of a social network that can be calculated. Some of them relate to the size of the network, e.g. diameter, or the “connectedness” of the nodes within the network, e.g. degree centrality and closeness centrality. It’s also possible to investigate the modularity of the network, which looks at whether there are smaller modules, or communities, within the network.
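To make these measures a little more concrete, here is a minimal Python sketch that calculates degree centrality, closeness centrality and diameter for a tiny made-up network. The actor names and relations are purely hypothetical, the network is treated as undirected, and the closeness calculation assumes every actor can reach every other actor. A dedicated tool such as Gephi may normalise these values differently, so treat this as an illustration rather than a reference implementation.

```python
from collections import deque

# Hypothetical toy network: each pair is a relation between two actors.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
         ("bob", "dan"), ("dan", "eve")]


def build_graph(edges):
    """Return an undirected adjacency map {actor: set of neighbours}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_paths(graph, start):
    """Breadth-first search: steps from start to every reachable actor."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def degree_centrality(graph):
    """Share of the other actors that each actor is directly linked to."""
    n = len(graph)
    return {node: len(nbs) / (n - 1) for node, nbs in graph.items()}


def closeness_centrality(graph):
    """Reachable actors divided by the total distance to them."""
    result = {}
    for node in graph:
        dist = shortest_paths(graph, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total else 0.0
    return result


def diameter(graph):
    """Longest shortest path between any two actors (connected network)."""
    return max(max(shortest_paths(graph, node).values()) for node in graph)


graph = build_graph(edges)
print("degree:", degree_centrality(graph))
print("closeness:", closeness_centrality(graph))
print("diameter:", diameter(graph))
```

In this toy network "bob" and "dan" come out as the best-connected actors, which matches what you would expect from looking at the list of relations.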

I’m keen to have a go at exploring SNA, especially with regards to Twitter networks. I’m working on a research project looking at Twitter use at conferences, and I think the tools and measures that I’ve learnt about will be useful for this project. As far as library-specific use of SNA is concerned, I’m having trouble coming up with possible uses for it. Most of our systems produce numerical data about items or people – there aren’t many networks involved. Unless there was a librarian who was involved in teaching a unit which had a presence within the LMS, I’m not sure how else libraries could take advantage of SNA. Maybe by the end of the MOOC I might have some more ideas.

The Learning Analytics data cycle

There are several steps to the learning analytics (LA) data cycle. These include:

  • Collection and Acquisition: data is collected and acquired from one, or several, sources.
  • Storage: data is stored so it can be worked on. This storage may be located within the system which is used to produce the data, or the data may need to be exported and stored elsewhere.
  • Cleaning: there will usually be a need for some cleaning of the data so that it is in a format which can be used by the analysis software. This will be especially true if the data has been collected from a range of different sources, as each source will have its own data format.
  • Integration: if data is collected from multiple sources, it needs to be integrated into a single file so that it can be analysed.
  • Analysis: a software package is used to analyse the data to produce statistics about it.
  • Representation and Visualisation: in order to make the results of the data analysis easier to understand, they need to be represented and visualised in some way e.g. as a graph or chart, or a network diagram.
  • Action: finally, some action should be taken on the basis of the results of the data analysis. There is no point in initiating this LA data cycle if there is not going to be an action at the end of it.
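As a rough illustration of how the cleaning, integration, analysis and action steps fit together, here is a small Python sketch. Everything in it is an assumption made for the example: the two data sources, the field names, the student IDs and the rule for flagging students are all hypothetical, and a real project would also need the collection, storage and visualisation steps.

```python
# Hypothetical records collected from two systems, each in its own format.
lms_rows = [
    {"student": "S001 ", "forum_posts": "12"},
    {"student": "s002", "forum_posts": "0"},
    {"student": "S003", "forum_posts": "5"},
]
library_rows = [("s001", 4), ("S003", 9)]  # (student id, number of loans)


def clean_id(raw):
    """Cleaning: make the student ids from both sources look the same."""
    return raw.strip().upper()


def integrate(lms_rows, library_rows):
    """Integration: combine both sources into one record per student."""
    merged = {}
    for row in lms_rows:
        sid = clean_id(row["student"])
        merged[sid] = {"forum_posts": int(row["forum_posts"]), "loans": 0}
    for raw_id, loans in library_rows:
        sid = clean_id(raw_id)
        merged.setdefault(sid, {"forum_posts": 0, "loans": 0})["loans"] = loans
    return merged


def analyse(merged):
    """Analysis: list students with no recorded activity in either system."""
    return sorted(
        sid for sid, record in merged.items()
        if record["forum_posts"] == 0 and record["loans"] == 0
    )


merged = integrate(lms_rows, library_rows)
inactive = analyse(merged)

# Representation: a plain-text summary stands in for a chart here.
for sid in sorted(merged):
    print(sid, merged[sid])

# Action: the list of students someone might decide to follow up with.
print("Follow up with:", inactive)
```

The point of the sketch is the shape of the cycle rather than the particular rule: the output of each step feeds the next, and the last line is only useful if somebody acts on it.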

Although LA have traditionally been used by departments other than the library, there are library systems which could produce data which could be analysed using this cycle. We can collect data about loans (from our catalogue), database access (from proxy server logs), and website usage. Librarians are very good at collecting data and statistics about our patrons and collections, but often there is no particular reason for collecting them. LA ties nicely into the philosophy of Evidence Based Library and Information Practice (EBLIP), which is defined as:

Evidence based librarianship (EBL) is an approach to information science that promotes the collection, interpretation, and integration of valid, important and applicable user reported, librarian observed, and research derived evidence. The best available evidence moderated by user needs and preferences is applied to improve the quality of professional judgments.

Booth, A. (2002). From EBM to EBL: Two steps forward or one step back? Medical Reference Services Quarterly, 21(3), 51-64. doi: 10.1300/J115v21n03_04

By using an approach similar to the LA data cycle, it’s possible for librarians to collect the evidence that they can use to improve existing services or develop new ones.

Before LA are used at an institution, there needs to be consideration of policies and planning around it. There should be policies dealing with the ethical collection and use of the data, as well as a clear outline of how the results of the data analysis will be used to improve the learning experience. LA is nicely suited to be part of the quality and evaluation system within an institution, and the LA cycle could be incorporated into a continual improvement process.

As LA can potentially use data from a range of units from across the university, there needs to be some strategic planning around how it will be implemented and used. The results of LA data analysis could be used to inform changes to teaching practice, and these changes need to have a sound planning framework associated with them. Strategic planning could also help mitigate the “bright and shiny syndrome”, where institutions rush to embrace the latest new technology without a plan for how it will be used. LA is a powerful tool for providing insight into the learner experience, but it should not be relied on as the sole driver for change.

Finding my way around in the DALMOOC

The Data, Analytics and Learning MOOC (DALMOOC) that I’m taking via edX is structured a little differently to the MOOCs that I’ve completed previously. Rather than relying solely on the MOOC platform for providing content and submitting assessment tasks, DALMOOC also provides an option for using other tools and social media to complete the course. It did take me a while to get my head around the distributed nature of the course, but I think I’ve got a handle on it now.

I’m mostly following the “traditional” pathway through the course, with the occasional detour down the “social” pathway. This means that edX is the main platform that I’m using to access the course content – videos, exercises, and assessment tasks. However, some of my fellow students are using a platform called ProSolo to do this. ProSolo is a social learning tool which lets you select a competency that you would like to complete, and provides you with a list of tasks that you need to complete in order to meet that competency. You can upload completed tasks and link to blog posts you write which provide evidence that you’ve met the requirements of each competency. It’s also possible to receive feedback from your peers on your work and to provide feedback on theirs, which is the “social” aspect of learning. I dip into ProSolo now and then, but I’m not a heavy user of it.

Peer feedback is also possible via the discussion forums on edX; these also allow further discussion with fellow students and course instructors. There’s a Facebook page and a Twitter hashtag (#dalmooc) to facilitate discussion, too. A tool which I used for the first time as part of DALMOOC was Google Hangouts. There are weekly Hangouts scheduled with the course instructors, where they share their thoughts about the content for the week, as well as provide feedback on the previous week. Luckily, some of them are held at a time which is convenient for those of us in Australia – most webinar-type activities from the US are at a very early hour in the morning for us. All the Hangouts are recorded, so we can still access the ones that are on too early.

I appreciate the effort that the DALMOOC instructors have put into providing different options for learning to suit the varied preferred learning styles of the participants. It’s certainly an interesting course to be part of.

Using some “real” data for social network analysis

This week’s content in the Data, Analytics and Learning MOOC concentrated on some real-world case studies of how social network analysis (SNA) has been used in the study of learning. The studies which were discussed looked at the application of SNA in areas such as learning design, sense of community, creative potential, academic performance, social presence, and MOOCs. The activities we had to complete involved using Gephi and Tableau to analyse and interpret SNA for the study of learning. Seeing that I don’t have access to any data from a Learning Management System, I had to come up with another source of data that I could play with. I decided to use some Twitter data that’s related to a research project that I’m working on at the moment. I used the NodeXL template for Excel to collect the tweets which were sent during a recent conference, and then analysed the network in Gephi and Tableau. Although the data didn’t come from learners in a traditional setting, such as a Learning Management System, I think that a conference is a learning experience, so I think it’s an appropriate source to use for this exercise. This was the first time I’ve used NodeXL, and I found it very easy to use. For ethical and privacy reasons I’ve left the names of the nodes off in each of these visualisations.

This is what the network looks like in Gephi:

In this representation of the network, each circle represents a Twitter user who retweeted or mentioned another Twitter user. This retweeting and mentioning is represented by the lines connecting the nodes. The darker the colour of the node, the more connections it has to other nodes in the network, i.e. the user sent more retweets and mentions. In SNA this is known as degree centrality.

I imported the network data into Tableau to plot some of the measures against each other to see if there are any relationships between them. This graph shows the number of followers, the degree centrality, and the betweenness centrality (a measure which indicates whether a node is acting as a bridge between distinct communities within the network) for each of the nodes:

Conference tweets 2

From this it looks like (without conducting any statistical analysis) there is no relationship in this network between the number of followers that a network member has on Twitter and their degree or betweenness centrality. I can see that combining these two analysis tools (Gephi and Tableau) can be very useful for gaining all sorts of insights. I certainly think I’ll be using these tools in some upcoming work that I’ll be doing looking at Twitter networks.
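If I wanted to go beyond eyeballing the graph, one simple check would be to calculate a correlation coefficient between follower counts and degree. The sketch below uses Pearson's correlation, which is my own choice for illustration and not something from the course; the numbers are invented and are not the conference data. Follower counts tend to be very skewed, so a rank-based measure such as Spearman's might be a better fit in practice.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical values for five network members (not real data).
followers = [120, 4500, 300, 980, 15000]
degree = [14, 3, 9, 21, 5]

r = pearson(followers, degree)
print(f"Correlation between followers and degree: {r:.2f}")
```

A value close to 0 would support the impression of no linear relationship, while values near 1 or -1 would suggest a strong positive or negative one.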

What are learning analytics, and what can we learn from them?

Learning analytics (LA) are certainly becoming a hot topic within the education sector. There are conferences, societies and journals where new developments in the LA field are discussed and developed. But what are LA, and how can academic libraries use them to learn more about our users?

The goal of LA is to use the data generated by the various systems on campus to improve the teaching and learning experience for students. It’s about bringing together the data from these disparate systems, e.g. the Learning Management System (LMS) and the student administration system, to look for patterns and trends. Once these patterns and trends have been identified, they can be used to inform changes to teaching practices to assist students. Traditionally LA have been used by departments other than the library, as their systems can provide more information about a student’s progress and background. The LMS, for example, is a rich source of data on student behaviour during a semester. However, data from library systems can be combined with data from other systems on campus to make use of LA. For example, library staff at Curtin University combined the data from library systems and the campus student administration system to “explore if an association between library use and student retention is evident”. As they describe in their paper, they found that “[a]lthough student retention was associated with high levels of library use generally, it was the finding that use of electronic Library resources early in the semester appears to lead to an improved likelihood of remaining enrolled that is most useful.”

Another potential use of LA by academic libraries is to investigate whether embedding library content in the LMS can be linked to student performance. Increasingly librarians are collaborating with teaching staff to include library content directly in the LMS for individual units or subjects. It should be possible to examine the data produced by the LMS which shows how many times a link to library content is clicked on, and see if students who access library resources regularly achieve better results than those students who don’t use these resources.

I think there is great potential for libraries to use the data that our systems produce to try and learn more about our students, and try and improve their learning experience. It will not be an easy process, as there are institutional barriers which need to be overcome. I’ll discuss these in a future post.

Social Network Analysis with Gephi

The next software package that we were introduced to in the DALMOOC was Gephi, which is an open source tool for conducting social network analysis. I found Gephi an easier tool to use than Tableau, and it was fairly straightforward to load the sample data that was provided and start analysing it.

These were the results of my analysis to determine the density and centrality measures of each dataset:

For the example_1 dataset:


For the example_2 dataset:


For the CCK11 dataset (Twitter network):


For the CCK11 dataset (blog network):


These were the results of using the Giant Component filter, and then determining the modularity for each dataset:

For the example_1 dataset:

Example_1 modularity

For the example_2 dataset:

Example_2 modularity

For the CCK11 dataset (Twitter network):

CCK Twitter modularity

For the CCK11 dataset (blog network):

CCK blogs modularity

It was also fun to play around with the various network representations, and the options for partitioning and highlighting various properties of the network. This is the example_1 network with a few changes made to it: it’s in the Fruchterman Reingold representation, nodes are sized according to betweenness centrality, labels are turned on, and each community is a different colour.

Example_1 extra

Here’s the example_2 network with similar changes:


And for the CCK dataset (Twitter network):


And finally the CCK dataset (blogs network):

CCK_blogs extra

I found these exercises a useful way to get some experience with social network analysis, and I have some ideas of how I could use Gephi in a project that I’m working on.
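For anyone wondering what the Giant Component filter used above is doing, my understanding is that it keeps only the largest connected group of nodes and drops everything else. Here is a minimal Python sketch of that idea on a made-up undirected network. The node names are hypothetical, and this is not Gephi's own code, just one straightforward way to get the same kind of result.

```python
def connected_components(graph):
    """Split an undirected adjacency map into its connected components."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def giant_component(graph):
    """Keep only the nodes and links in the largest connected component."""
    keep = max(connected_components(graph), key=len)
    return {node: graph[node] & keep for node in keep}


# Hypothetical network: one group of three, one pair, one isolated node.
graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "x": {"y"}, "y": {"x"},
    "z": set(),
}

giant = giant_component(graph)
print(sorted(giant))
```

Measures such as modularity are then calculated on the filtered network, so isolated nodes and small fragments don't each show up as their own tiny community.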


Data wrangling with Tableau

The first hands-on assignment for the Data, Analytics and Learning MOOC was designed to give us some experience with using the Tableau software package to analyse and visualise data. It was a straightforward process to download and install the software, and then it was time to find some data to analyse. I decided to use the data about overseas students who had come to study in Australia, for the period 2004-2013, from the Australian Higher Education Statistics. Before I could import the data into Tableau, I had to do a bit of cleanup on it. I had to combine the data from each year into a single spreadsheet, and I also had to delete countries which were not listed in the data every year. I wanted to compare the number of students coming from each country to see which countries had grown and which had shrunk. One of the tables had a column for “Country of permanent residence”, so that’s what I used. The source data is limited to countries with more than 20 students, which is why there is variation in the number of countries which are included from year to year.

After a bit of fiddling with the dimensions, measures and table calculations, I managed to produce the map I was after.

Overseas students map

In order to create the map, I used the “Table Calculation” function to calculate the percentage difference between 2004 and 2013. This produced a map for each year, so I used the “Hide” command to hide the results for every year except 2013, and bingo – I had my map.
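The calculation behind the map is simple enough to sketch in a few lines of Python. The country names and student numbers below are invented for the example and are not taken from the Australian Higher Education Statistics. The sketch also mimics the cleanup step by skipping any country that doesn't have a figure for both years.

```python
def percent_difference(start, end):
    """Percentage change from the start value to the end value."""
    return (end - start) / start * 100


# Hypothetical student numbers by country and year (not real data).
students = {
    "Country A": {2004: 1000, 2013: 1500},
    "Country B": {2004: 800, 2013: 600},
    "Country C": {2013: 40},  # no 2004 figure, so it is left out
}

change = {
    country: percent_difference(years[2004], years[2013])
    for country, years in students.items()
    if 2004 in years and 2013 in years
}

for country, value in sorted(change.items()):
    print(f"{country}: {value:+.1f}%")
```

A positive value means the number of students from that country grew between 2004 and 2013, and a negative value means it shrank.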

All in all I found working with Tableau fairly straightforward, although I did find it took a bit of trial and error to produce the analysis I was after. However, the aim of the assignment (and DALMOOC in general) wasn’t to turn us into Tableau experts, but to expose us to some of the tools which can be used for data analysis and visualisation. I now have enough of an idea of how Tableau works to be able to consider how I can use it in the future. It will be interesting to see how my introduction to Tableau compares to the other software packages that we’ll use over the next few weeks.