As I was watching one of the text mining lecture videos this morning, I had a “lightbulb moment” with regard to using LightSide. Until now I didn’t think LightSide would be useful for my Twitter research project, as I wasn’t interested in building models; I just wanted to analyse the content of the tweets. However, I now realise that I don’t need LightSide’s model-building features for my Twitter data. I can simply use it to extract features and count the number of times each word (or group of words) appears across all the tweets. This is exactly the type of analysis I’m interested in, and I was really pleased to have found a tool to help me with this part of the data analysis.
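To make the idea of “counting each word (or group of words)” concrete, here is a minimal Python sketch of the same kind of tally done by hand. It is not how LightSide works internally. The function name `ngram_counts`, the simple lowercase tokenizer (which keeps #hashtags and @mentions but splits words like “don’t”), and the example tweets are all hypothetical, for illustration only.

```python
import re
from collections import Counter


def ngram_counts(tweets, n=1):
    """Count every n-gram (a word, or a run of n consecutive words) across all tweets."""
    counts = Counter()
    for tweet in tweets:
        # Simple tokenizer (an assumption, not LightSide's): lowercase, then keep
        # runs of word characters, optionally prefixed by # or @.
        tokens = re.findall(r"[#@]?\w+", tweet.lower())
        # Join each run of n consecutive tokens into one space-separated term.
        grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        counts.update(grams)
    return counts


# Hypothetical example tweets.
tweets = [
    "Loving the #edtech conference today",
    "The conference keynote was great",
    "Great ideas at the conference",
]

print(ngram_counts(tweets).most_common(3))  # [('the', 3), ('conference', 3), ('great', 2)]
print(ngram_counts(tweets, n=2)["the conference"])  # 2
```

Setting `n=1` gives single-word counts and `n=2` gives two-word phrases, which mirrors the choice between unigram and bigram features.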
I couldn’t wait to get home and try LightSide on some of the tweets I’ve already collected. I had to clean up the Excel file a bit before importing it into LightSide, but once that was done everything worked fine. The image below shows the LightSide workspace once I’d extracted the features.
Once I had the Feature Table prepared, I exported it as a .csv file and used Excel’s SUM function to quickly tally the occurrences of each term. I’m going to play around with LightSide a bit more to explore the other types of features it can extract, but I’m pretty sure it can do exactly what I need. Time to crunch some data!
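The same column totals can also be computed outside Excel. The minimal Python sketch below assumes the exported .csv has a header row of feature names followed by one row per tweet. The layout, the file name `feature_table.csv` and the function name `column_totals` are assumptions, not LightSide’s documented export format. Any non-numeric column (such as the tweet text or a label) is simply skipped. One caveat: if the features were extracted as present/absent (binary) values rather than counts, these totals give the number of tweets containing each term, not the total number of occurrences.

```python
import csv
import io
from collections import Counter


def column_totals(csv_file):
    """Sum each numeric column of a feature table (assumed: one row per tweet)."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        for feature, value in row.items():
            try:
                totals[feature] += float(value)
            except (TypeError, ValueError):
                # Skip non-numeric cells, e.g. a text or class-label column.
                continue
    return totals


# Hypothetical sample standing in for an exported feature table.
sample = io.StringIO(
    "text,the,conference,great\n"
    "Loving the #edtech conference today,1,1,0\n"
    "The conference keynote was great,1,1,1\n"
    "Great ideas at the conference,1,1,1\n"
)
totals = column_totals(sample)
for feature, total in totals.most_common():
    print(feature, total)

# With a real export (hypothetical file name):
# with open("feature_table.csv", newline="") as f:
#     totals = column_totals(f)
```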