And that’s a wrap!

As learning analytics has emerged as a discipline over the last few years, several organisations have been founded with the aim of conducting research in the field as well as bringing together professionals to discuss the latest developments. Some of them are listed below:

Overall I found the DALMOOC interesting, and I was certainly introduced to tools and ideas that I can use during my research project. Here is my first ever attempt at a concept map for what was covered in the course:


I didn’t really engage with the social learning aspects of the course – I preferred to work through the edX platform in the traditional way. I’ve always been a bit wary and nervous of putting my work out there for my peers to assess, so that’s why I stuck with edX. As far as the structure is concerned, I did find it a little disorienting in the first week, but soon got the hang of it. I didn’t really get much out of the Hangouts – I was expecting that they were going to be a bit more interactive and allow some participation from students, rather than only having the instructors involved.

As a complete newbie to learning analytics I found the content manageable and fairly easy to understand. The exception to this was the unit on prediction modeling and behaviour detection in weeks 5 and 6. I found it all quite technical and confusing, and I didn’t complete any of the assessments during those weeks. It was nice to be exposed to it, but I don’t think it’s an area that I’ll be using in my small research project. The tools that we were introduced to in the DALMOOC were pretty easy to use, and I can see that I’ll find Tableau, Gephi and LightSide useful in my Twitter research, at least at a basic level. On a side note, it was nice to see the work of researchers at other Australian universities, e.g. Shane Dawson and Lori Lockyer, mentioned during the course.

The DALMOOC has given me a taste of what’s involved in working with learning analytics, and the tools and techniques that are available. There are certainly opportunities for libraries to get involved and make use of the data that our systems produce.

Now we’re on to text mining

Text mining is the next type of data analysis that we’re looking at in the Data, Analytics and Learning MOOC. I’m looking forward to the next couple of weeks, as I think that some of these tools and techniques might be useful for my research project, which is based on analysing tweets. Text mining is all about trying to find patterns in large collections of text, and using these patterns as a basis for identifying data that is worth investigating further. It’s this search for patterns in textual data that interests me, as that’s the vision that I’ve got for my Twitter research project.

One of the subareas of text mining is analysing the collaborative learning process that occurs in online courses via the discussion forums. This analysis involves modelling conversational interactions between students, and using those models to find out what it is about conversations that makes them valuable for online learning. Based on this understanding it’s then possible to design interventions to support learning in online settings. Analysing conversations in online courses draws on knowledge from a number of fields, such as education, psychology, and sociolinguistics. This knowledge is used to determine the cognitive processes associated with collaborative learning, investigate what conversational interactions look like, and build models of how psychological signals are revealed through language. All this ultimately allows the development of models showing where processes are happening during interactions.

An example of how these models can be used in learning analytics research is assessing some reasons for attrition along the way in MOOCs. The models are based on the analysis of the posts in discussion forums, both from the point of view of individual students and from the overall tone of individual threads. The negativity and positivity of the posts and threads are calculated, and then survival modeling is carried out to determine the probability that a student will have dropped out of the course by the following week.
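To give a feel for the survival side of this, here is a minimal sketch of a week-by-week survival estimate. It is not the model used in the research described in the course: it ignores the sentiment of posts entirely and simply assumes we know the last week in which each student was active. The function name and the sample data are hypothetical.

```python
from collections import Counter


def weekly_survival(last_active_week, n_weeks):
    """Return (hazard, survival) lists covering weeks 1..n_weeks.

    last_active_week holds, for each student, the final week in which they
    were active. A student still active in week n_weeks is treated as
    enrolled (censored) rather than as a dropout.
    """
    dropouts = Counter(w for w in last_active_week if w < n_weeks)
    at_risk = len(last_active_week)
    surviving = 1.0
    hazard, survival = [], []
    for week in range(1, n_weeks + 1):
        # Hazard: share of students still enrolled who leave after this week.
        h = dropouts[week] / at_risk if at_risk else 0.0
        surviving *= 1.0 - h
        hazard.append(h)
        survival.append(surviving)
        at_risk -= dropouts[week]
    return hazard, survival


# Hypothetical cohort of eight students in a four-week course.
hazard, survival = weekly_survival([1, 1, 2, 3, 4, 4, 4, 4], 4)
print([round(s, 3) for s in survival])
```

A fuller analysis would compare these curves between groups of students, for example those whose posts were mostly negative versus mostly positive.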

This sort of detailed modeling is out of scope for my research project, but some of the aspects of conversation analysis could be useful, as many of the interactions between Twitter users could be characterised as conversations. At this stage I think I’ll be learning some useful stuff over the next couple of weeks.

Working with models in LightSide

Most of the exercises for this week were concerned with building models in LightSide and comparing their performance.

The first exercise dealt with using different feature spaces within the model and seeing how this affected the model’s performance. The initial model, using unigrams, resulted in an accuracy of 75.9% and a kappa value of 0.518. This is OK, but would including bigrams and trigrams as features improve these results? They might, by providing further context for each word, thus reducing the number of incorrect predictions. By including these extra features, there was a slight improvement in the model – an accuracy of 76.5% and a kappa value of 0.530. However, by increasing the number of features there is a risk of creating a model which overfits the data, and can’t be applied to other data sets. To overcome this there is a Feature Selection tool, which only uses the 3,500 (in this case) most predictive features in the model. The result of using this select group of features was a statistically significant improvement in the quality of the model.
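For anyone wondering how accuracy and kappa relate, here is a small sketch of the standard Cohen's kappa calculation, which corrects accuracy for the agreement you would expect by chance. The function name and the labels are made up for illustration, and this is not a claim about how LightSide computes its figures internally.

```python
from collections import Counter


def accuracy_and_kappa(actual, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    if expected == 1:
        return observed, 1.0  # only one label in use, nothing left to correct
    return observed, (observed - expected) / (1 - expected)


# Hypothetical labels: 7 of 10 predictions are correct.
actual = ["pos"] * 5 + ["neg"] * 5
predicted = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos", "pos"]
accuracy, kappa = accuracy_and_kappa(actual, predicted)
print(round(accuracy, 3), round(kappa, 3))  # prints 0.7 0.4
```

Working backwards from the figures above, (0.759 - 0.5) / (1 - 0.5) = 0.518 and (0.765 - 0.5) / (1 - 0.5) = 0.530, so the reported values are consistent with a chance agreement of about 0.5, which is what two evenly balanced classes would give.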


Getting on the right side of LightSide

As I was watching one of the text mining lecture videos this morning, I experienced a “lightbulb moment” with regards to using LightSide. Up until now I didn’t think that LightSide would be useful for my Twitter research project, as I wasn’t interested in building models, I just wanted to analyse the content of the tweets. However, I now realise that I don’t need to use the model-building features of LightSide for my Twitter data; I can just use it to extract features to get a count of the number of times each word (or group of words) appears in all the tweets. This is the type of analysis that I’m interested in. I was really pleased that I’ve managed to find a tool to help me with this part of the data analysis.
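The counting idea itself is simple enough to sketch in a few lines of Python. This is only an illustration of counting words and word groups (n-grams) across a set of tweets; the tokenising rule and the sample tweets are assumptions, and LightSide's own feature extraction may well split text differently.

```python
import re
from collections import Counter


def ngram_counts(texts, n=1):
    """Count how often each run of n consecutive words appears across texts."""
    counts = Counter()
    for text in texts:
        # Assumed tokenisation: lower-case words, keeping leading # and @.
        tokens = re.findall(r"[#@]?\w+", text.lower())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical conference tweets.
tweets = [
    "Great keynote at #conf14",
    "Great keynote slides now online",
    "Slides from the #conf14 keynote",
]
print(ngram_counts(tweets, n=1).most_common(3))
print(ngram_counts(tweets, n=2)["great keynote"])  # prints 2
```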

I couldn’t wait to get home and try using LightSide on some of the tweets that I’ve already collected. I had to do a bit of a clean-up of the Excel file to make it ready to import into LightSide, but once that was done everything worked fine. The image below shows the LightSide workspace once I’d extracted the features.

Tweets in LightSide

Once I had the Feature Table prepared, I exported it as a .csv file, and was able to use the Sum feature in Excel to quickly tally the occurrence of each term. I’m going to play around with LightSide a bit more to explore the other features that can be extracted, but I’m pretty sure that it can do exactly what I need it to do. Time to crunch some data!
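The same tally could be done outside Excel. The sketch below assumes the exported file has one row per tweet and one numeric column per term, which is a guess at the layout rather than a documented LightSide format; the column names and data are invented.

```python
import csv
import io


def term_totals(csv_file):
    """Sum every numeric column of a feature table, highest total first."""
    totals = {}
    for row in csv.DictReader(csv_file):
        for term, value in row.items():
            try:
                totals[term] = totals.get(term, 0) + float(value)
            except (TypeError, ValueError):
                continue  # skip non-numeric cells such as IDs or tweet text
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical export; in practice this would be open("features.csv", newline="").
sample = io.StringIO("tweet_id,keynote,slides\nt1,1,0\nt2,1,1\nt3,1,1\n")
for term, total in term_totals(sample):
    print(term, total)
```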

Getting my head around text mining

In response to a tweet from one of the instructors of the Data, Analytics and Learning MOOC, I wanted to try and unpack how I think I can use text mining for my Twitter research project.

As a complete novice at this whole text mining caper, I’m still coming to terms with all the concepts behind it. To answer Carolyn’s question, I guess I see classification models working like this:

1. Take a subset of the data, and classify each item in the subset e.g. individual tweets, by hand. The classification scheme I’m thinking of would have categories such as “administrative”, “presentation summary”, “marketing”.

2. Build a model which will take this subset and learn the characteristics of the tweets which are in each category.

3. Apply the model to the remaining unclassified data so that each tweet is assigned the correct classification.
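The three steps above can be sketched with a very small Naive Bayes classifier. This is just one common technique, chosen here for illustration, and not necessarily what LightSide uses; the hand-labelled tweets are invented. With a subset this tiny, the "correct classification" in step 3 is far from guaranteed.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[#@]?\w+", text.lower())


def train(labelled_tweets):
    """Step 2: learn word frequencies for each hand-assigned category."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled_tweets:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Step 3: pick the category with the highest (log) probability."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Step 1: a hypothetical hand-classified subset.
labelled = [
    ("Registration desk opens at 8am", "administrative"),
    ("Room change for the afternoon workshop", "administrative"),
    ("Speaker says open data improves library services", "presentation summary"),
    ("Keynote speaker argues for better metadata", "presentation summary"),
    ("Visit our booth for a free demo", "marketing"),
    ("Free trial of our new product at booth 12", "marketing"),
]
model = train(labelled)
print(classify("Free demo at our booth", model))  # prints marketing
```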

My take on predictive models is similar, I suppose, but I see them as more theoretical rather than practical. I guess by their nature models are theoretical, but the application of the models is still something I’m not sure about. The basic premise of training the model and then applying it to the data is the same, but I haven’t yet seen what happens at the end of the process i.e. the predictive side of things. This may be covered in next week’s content, so hopefully I’ll have a better understanding of the process then.

From the perspective of the Twitter analysis project that I’m working on, I don’t think the text mining tools will do what I need them to do. My aim is to categorise all the tweets that I’ve collected, based on their content. This is something that needs to be 100% accurate so that I can get an accurate picture of what was tweeted about. Perhaps I might do a bit of playing around with LightSide as part of the data analysis, but I won’t be relying on it to categorise all the tweets.

This whole course has been a great introduction to data analysis and mining, and the tools which are available. I think I’ll be trying to think of future projects to utilise them.

Conducting text mining with LightSide

The next topic in the Data, Analytics and Learning MOOC is text mining, which I’ll explain further in my next post. We were introduced to the last software tool that we’ll be using in the course – LightSide (the Star Wars fan in me is wondering if there’s a competing program called DarkSide which does the opposite of LightSide). It seems fairly simple to use, and I managed to get the correct answer for the exercise we were given:

LightSide exercise

I’m still not 100% sure if text mining is going to be useful for the Twitter analysis project I’m working on. I think I need a tool which will categorise the data, rather than try and build predictive models of it. Anyway, it’s always good to learn about a new tool – you never know when it might come in handy.

Social network analysis for the study of learning

Social network analysis (SNA) can be used in many different ways in the study of learning. Some examples of these are:

Learning design – finding ways to design courses which don’t follow the instructor-led model but which allow students to ask and answer questions with each other. Involving the students in this way leads to higher levels of engagement with and understanding of the course material. The paper by Lockyer and colleagues gives more detail about this use of SNA.

Sense of community – identifying students who may not feel like they are part of the community of learners in a course, and coming up with ways to improve their sense of community. SNA based on online discussion forums can be used instead of questionnaires to identify these students. See the paper by Dawson for more information about this approach.

Creative potential – trying to identify the network brokers i.e. the students who are the link between communities within the network, as they are often the students with the greatest creative potential. This is due to them being exposed to information and ideas from multiple networks, so they have the chance to put all the information together in new and creative ways. For an example of this, see the paper by Burt and colleagues.

Academic performance – there is a link between a student’s academic performance and their position within a network. If there are cross-class networks i.e. the same students enrol in the same subjects, then the performance of these students is higher than that of those who take classes separately. Students who were at the centre of the network typically performed better. Gasevic and colleagues have written a paper on this topic.

Social presence – students who are able to present themselves and their personality are said to have social presence. The online interactions of students can be investigated to try and identify the level of social presence that each student has. This allows instructors to develop and implement strategies to encourage those students with a low social presence to improve it. See the paper by Kovanovic for an explanation of this use of SNA.

MOOCs – identifying the effectiveness of connectivist MOOCs (cMOOCs) i.e. those which encouraged students to acquire knowledge for themselves rather than be led by an instructor. SNA can be used to see if the information flows and community formation within the cMOOC reflect the goal of moving the responsibility for learning from the instructor to the students. Skrypnyk and colleagues have written a paper on this use of SNA in learning.

Before taking this MOOC, I wasn’t aware of the wide range of potential uses of learning analytics. I thought that they were designed for identifying students currently at risk or trying to predict those students who might fall into this category later in their studies. However, after seeing the case studies for this week, I now realise how powerful a tool they are and that they can be used in many different settings.

So what is Social Network Analysis, anyway?

One of the data analysis methods that I’m learning about in the Data, Analytics and Learning MOOC is social network analysis (SNA). As the name suggests, SNA investigates social processes and interactions, rather than looking at numerical data. SNA has applications in many different disciplines, and it doesn’t have to concern itself only with humans. Any system where there is interaction between distinct entities could be analysed using SNA. All that is required is to have some “actors” i.e. the individuals or entities within the network, and “relations” i.e. links between actors.

There is a wide range of potential data sources for SNA. In a learning context (which this MOOC is focussed on) data could be obtained from Twitter to see how students in a particular class or unit are interacting on that platform, or from the interactions on discussion forums within a Learning Management System (LMS).

There are several measures of a social network that can be calculated. Some of them relate to the size of the network e.g. diameter, or the “connectedness” of the nodes within the network e.g. degree centrality and closeness centrality. It’s also possible to investigate the modularity of the network, which looks at whether there are smaller modules, or communities, within the network.
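As a rough illustration of three of these measures, the sketch below computes degree centrality, closeness centrality and diameter for a small undirected network, using the usual textbook definitions. It assumes the network is connected and has at least two actors, leaves out modularity, and uses invented names; a tool like Gephi does all of this (and much more) for you.

```python
from collections import deque


def build_graph(edges):
    """Turn a list of (actor, actor) relations into an adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, start):
    """Shortest path length from start to every reachable actor (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def network_measures(edges):
    graph = build_graph(edges)
    n = len(graph)
    degree, closeness, diameter = {}, {}, 0
    for node in graph:
        dist = distances_from(graph, node)
        degree[node] = len(graph[node]) / (n - 1)         # share of others linked to
        closeness[node] = (n - 1) / sum(dist.values())    # inverse of mean distance
        diameter = max(diameter, max(dist.values()))      # longest shortest path
    return degree, closeness, diameter


# Hypothetical network of who replied to whom.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("bob", "dan")]
degree, closeness, diameter = network_measures(edges)
print(degree["bob"], closeness["ann"], diameter)  # prints 1.0 0.6 2
```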

I’m keen to have a go at exploring SNA, especially with regards to Twitter networks. I’m working on a research project looking at Twitter use at conferences, and I think the tools and measures that I’ve learnt about will be useful for this project. As far as library-specific use of SNA is concerned, I’m having trouble coming up with possible uses for it. Most of our systems produce numerical data about items or people – there aren’t many networks involved. Unless there was a librarian who was involved in teaching a unit which had a presence within the LMS, I’m not sure how else libraries could take advantage of SNA. Maybe by the end of the MOOC I might have some more ideas.

The Learning Analytics data cycle

There are several steps to the learning analytics (LA) data cycle. These include:

  • Collection and Acquisition: data is collected and acquired from one, or several, sources.
  • Storage: data is stored so it can be worked on. This storage may be located within the system which is used to produce the data, or the data may need to be exported and stored elsewhere.
  • Cleaning: there will usually be a need for some cleaning of the data so that it is in a format which can be used by the analysis software. This will be especially true if the data has been collected from a range of different sources, as each source will have its own data format.
  • Integration: if data is collected from multiple sources, it needs to be integrated into a single file so that it can be analysed.
  • Analysis: a software package is used to analyse the data to produce statistics about it.
  • Representation and Visualisation: in order to make the results of the data analysis easier to understand, they need to be represented and visualised in some way e.g. as a graph or chart, or a network diagram.
  • Action: finally, some action should be taken on the basis of the results of the data analysis. There is no point in initiating this LA data cycle if there is not going to be an action at the end of it.

Although LA has traditionally been used by departments other than the library, there are library systems which could produce data which could be analysed using this cycle. We can collect data about loans (from our catalogue), database access (from proxy server logs), and website usage. Librarians are very good at collecting data and statistics about our patrons and collections, but often there is no particular reason for collecting them. LA ties nicely into the philosophy of Evidence Based Library and Information Practice (EBLIP), which is defined as:

Evidence based librarianship (EBL) is an approach to information science that promotes the collection, interpretation, and integration of valid, important and applicable user reported, librarian observed, and research derived evidence. The best available evidence moderated by user needs and preferences is applied to improve the quality of professional judgments.

Booth, A. (2002). From EBM to EBL: Two steps forward or one step back? Medical Reference Services Quarterly, 21(3), 51-64. doi: 10.1300/J115v21n03_04

By using an approach similar to the LA data cycle, it’s possible for librarians to collect the evidence that they can use to improve existing services or develop new ones.
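To make the cycle a bit more concrete, here is a rough sketch of the cleaning, integration and analysis steps using two made-up library data sources. The field names, ID formats and the "uses only one service" question are all hypothetical, and any real version would need to follow the institution's policies on ethical data use.

```python
from collections import defaultdict

# Collection: two hypothetical sources with different formats.
loans = [{"patron": " S001 ", "items": "3"}, {"patron": "s002", "items": "1"}]
proxy_log = ["S001,database_a", "S001,database_b", "S003,database_a"]


def clean_id(raw):
    """Cleaning: normalise patron IDs so both sources agree."""
    return raw.strip().upper()


def integrate(loans, proxy_log):
    """Integration: combine both sources into one record per patron."""
    combined = defaultdict(lambda: {"loans": 0, "db_sessions": 0})
    for row in loans:
        combined[clean_id(row["patron"])]["loans"] += int(row["items"])
    for line in proxy_log:
        patron, _database = line.split(",")
        combined[clean_id(patron)]["db_sessions"] += 1
    return dict(combined)


def analyse(combined):
    """Analysis: find patrons who use only one of the two services."""
    return sorted(
        patron
        for patron, use in combined.items()
        if use["loans"] == 0 or use["db_sessions"] == 0
    )


combined = integrate(loans, proxy_log)
print(analyse(combined))  # prints ['S002', 'S003']
```

The action step is the part code can't supply: deciding what, if anything, to do for the patrons the analysis picks out.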

Before LA is used at an institution, there needs to be consideration of policies and planning around it. There should be policies dealing with the ethical collection and use of the data, as well as a clear outline of how the results of the data analysis will be used to improve the learning experience. LA is nicely suited to be part of the quality and evaluation system within an institution, and the LA cycle could be incorporated into a continual improvement process.

As LA can potentially use data from a range of units from across the university, there needs to be some strategic planning around how it will be implemented and used. The results of LA data analysis could be used to inform changes to teaching practice, and these changes need to have a sound planning framework associated with them. Strategic planning could also help mitigate the “bright and shiny syndrome”, where institutions rush to embrace the latest new technology without a plan for how it will be used. LA is a powerful tool for providing insight into the learner experience, but it should not be relied on as the sole driver for change.

Finding my way around in the DALMOOC

The Data, Analytics and Learning MOOC (DALMOOC) that I’m taking via edX is structured a little differently to the MOOCs that I’ve completed previously. Rather than relying solely on the MOOC platform for providing content and submitting assessment tasks, DALMOOC also provides an option for using other tools and social media to complete the course. It did take me a while to get my head around the distributed nature of the course, but I think I’ve got a handle on it now.

I’m mostly following the “traditional” pathway through the course, with the occasional detour down the “social” pathway. This means that edX is the main platform that I’m using to access the course content – videos, exercises, and assessment tasks. However, some of my fellow students are using a platform called ProSolo to do this. ProSolo is a social learning tool which lets you select a competency that you would like to complete, and provides you with a list of tasks that you need to complete in order to meet that competency. You can upload completed tasks and link to blog posts you write which provide evidence that you’ve met the requirements of each competency. It’s also possible to receive feedback from your peers on your work and to give feedback on theirs, which is the “social” aspect of learning. I dip into ProSolo now and then, but I’m not a heavy user of it.

Peer feedback is also possible via the discussion forums on edX; these also allow further discussion with fellow students and course instructors. There’s a Facebook page and a Twitter hashtag (#dalmooc) to facilitate discussion, too. A tool which I used for the first time as part of DALMOOC was Google Hangouts. There are weekly Hangouts scheduled with the course instructors, where they share their thoughts about the content for the week, as well as provide feedback on the previous week. Luckily, some of them are held at a time which is convenient for those of us in Australia – most webinar-type activities from the US are at a very early hour in the morning for us. All the Hangouts are recorded, so we can still access the ones that are on too early.

I appreciate the effort that the DALMOOC instructors have put into providing different options for learning to suit the varied preferred learning styles of the participants. It’s certainly an interesting course to be part of.