Twitter analysis project

Research Report – Twitter Analysis and Presentation

The aim of this assignment is to collect Twitter data, summarise the data using a spreadsheet or other tool, and then write a report about that data. The purpose of the report is to investigate and discuss the use of Twitter analysis by researchers, brands or journalists (depending on your major). The report is not meant to be written as a public-facing report or feature, but rather as an internal research report that might be used in a professional context or to inform your own practices.

You can choose to follow a group of people or a hashtag/hashtags over a period of time that will yield a reasonably sized data set (from a few thousand tweets up to a maximum of about 250 thousand is the right size for this task; Excel will struggle to open a file much bigger than that). Suitable targets could be hashtags for a TV show or media event, a new or defective product, a group of journalists attending a conference or the conference hashtag, a brand campaign, or a news event as it happens.

You may have to try a few different scenarios before you get some data you can use. For example, broad hashtags like christmas or happy are bad choices; corbyn (during PMQs) or MUFC (during a game) are probably better ones. Spend a little time exploring how hashtags are used together (co-hashtags), partly to make sure that you have all the relevant tags covered (e.g. ‘Manu’ as well as ‘MUFC’); this can be done with the Twitter advanced search page. You should write about this hashtag research as part of your reflection.
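
If you already have a small trial sample of tweets, the co-hashtag check can also be done with a few lines of code. The following is a minimal Python sketch, not a required method: it assumes the tweet texts are available as plain strings, and the function name co_hashtags and the sample tweets are invented for illustration.

```python
import re
from collections import Counter

# Matches a '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def co_hashtags(tweet_texts, target):
    """Count hashtags that appear in the same tweet as `target` (case-insensitive)."""
    target = target.lower().lstrip("#")
    counts = Counter()
    for text in tweet_texts:
        # A set, so a tag repeated inside one tweet is only counted once.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if target in tags:
            counts.update(tags - {target})
    return counts


# Invented sample tweets, for illustration only.
sample = [
    "Great goal! #MUFC #ManUtd",
    "Team news is out #mufc #manutd #PremierLeague",
    "Nothing to do with football #happy",
]

print(co_hashtags(sample, "#MUFC").most_common())
```

Tags that co-occur often with your target are candidates to add to your collection query.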

Once you have collected the tweets and profile data, use this data set to discuss the following questions in your report. You can do more analysis than this, but these questions are expected as part of the report.

Required analysis

a) Who were the top tweeters and retweeters?

b) How many of your top tweeters are bots? (Remove as many as possible from your data set before performing the rest of the analysis.)

c) What was the top retweet, and what was the ratio of tweets to retweets in your data set?

d) What % of tweets/retweets in your data set came from the top 10 tweeters?

e) Use a word cloud or word tree of the most used words in your data set to show the type of language being used. Was the hashtag being used in conjunction with other hashtags?

f) Where do the tweets come from? What % are geocoded, and what % of profiles have a location?

g) Do the tweeters fall into any demographic groupings that you can see? (Look at follower counts, friend counts, total number of tweets etc.)
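
Several of these questions (a, c, d and f) come down to simple counts and percentages, which you can produce in a spreadsheet or in code. As a rough sketch in Python, assuming each tweet is a dictionary with the hypothetical keys user, is_retweet, geo and profile_location (your collection tool will almost certainly use different column names):

```python
from collections import Counter


def summarise(rows, top_n=10):
    """Basic counts for questions a, c, d and f.

    The keys 'user', 'is_retweet', 'geo' and 'profile_location' are
    assumed names; rename them to match your own export.
    """
    if not rows:
        raise ValueError("no tweets to summarise")

    authors = Counter(row["user"] for row in rows)
    retweet_count = sum(1 for row in rows if row["is_retweet"])
    original_count = len(rows) - retweet_count

    top = authors.most_common(top_n)
    top_total = sum(count for _, count in top)

    geocoded = sum(1 for row in rows if row["geo"])
    users_with_location = {row["user"] for row in rows if row["profile_location"]}

    return {
        "top_tweeters": top,
        "tweets_to_retweets": (original_count, retweet_count),
        "top_share_pct": 100 * top_total / len(rows),
        "geocoded_pct": 100 * geocoded / len(rows),
        "profiles_with_location_pct": 100 * len(users_with_location) / len(authors),
    }


# Invented rows, for illustration only.
rows = [
    {"user": "alice", "is_retweet": False, "geo": "53.4,-2.2", "profile_location": "Manchester"},
    {"user": "alice", "is_retweet": True, "geo": "", "profile_location": "Manchester"},
    {"user": "bob", "is_retweet": True, "geo": "", "profile_location": ""},
    {"user": "carol", "is_retweet": False, "geo": "", "profile_location": "Leeds"},
]

result = summarise(rows, top_n=1)
print(result)
```

Note that the geocoded percentage is measured per tweet, while the profile-location percentage is measured per unique user; say which you used when you report the figures.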

In addition to answering these questions you can perform other types of acquisition and/or analysis and you may be awarded extra marks for doing so.

Visualisations you should also include in your report

Time series for your tweets on a suitable timeframe unit

Word Cloud or Word Tree of language in popular retweets (or co-hashtag use)

Chart showing the % of tweets to retweets

Chart showing the % tweets geocoded and the % of profiles with locations

Histogram of tweeter volumes (i.e. 1 person tweeted more than 100 times, 5 people tweeted 50-100 times, 50 people tweeted 10-50 times, 1,000 people tweeted 5-10 times etc)
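
The tweeter volume histogram needs the number of tweets per user sorted into bands before it can be charted. One possible way to do this in Python is sketched below. The band edges are an assumption: the ranges in the example above overlap at 10, 50 and 100, so this sketch uses non-overlapping bands (1-4, 5-9, 10-49, 50-99, 100+), and the function and variable names are invented for illustration.

```python
from collections import Counter

# Assumed non-overlapping bands; None means "no upper limit".
BINS = [(1, 4), (5, 9), (10, 49), (50, 99), (100, None)]


def label(lo, hi):
    """Human-readable name for a band, e.g. '5-9' or '100+'."""
    return f"{lo}+" if hi is None else f"{lo}-{hi}"


def volume_histogram(users):
    """Given one user name per tweet, count how many users fall in each band."""
    per_user = Counter(users)  # tweets per user
    histogram = {label(lo, hi): 0 for lo, hi in BINS}
    for count in per_user.values():
        for lo, hi in BINS:
            if count >= lo and (hi is None or count <= hi):
                histogram[label(lo, hi)] += 1
                break
    return histogram


# Invented data: one entry per tweet, holding the author's name.
users = ["alice"] * 120 + ["bob"] * 7 + ["carol"] * 7 + ["dave"]
print(volume_histogram(users))
```

The resulting band counts can be pasted into a spreadsheet or passed to a charting library to draw the histogram.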

The report should be ~1500 words, presented as a basic but well-styled HTML page that includes some visualisations to help illustrate your data. MSc students should attempt visualisations using a JavaScript library rather than iframe embeds. You should try to use a template system to start your page. As well as answering the questions above in your report, you should do some research on social media analytics and Twitter use in journalism and consider how the types of analysis you have performed can be used in a professional context. Include references to research material you used in your report. You can also talk about the 5th estate and Twitter more generally and its effect on journalism and society, making reference to your own data where possible.

You should also supply a written ~500 word reflection. The reflection should consider the following points. Why did you settle on the hashtag(s) and timeframe that you did? What issues did you encounter in gathering the tweets and analysing them, and how did you overcome these problems? How would you extend or improve your study given more time and/or resources? Include attributions for any code libraries or images used in your report.

Submission

The submission should consist of a single Word or RTF document that contains your reflection and a link to the online report. If you have used code in the acquisition or analysis of your tweets, you should also provide a link to a GitHub Gist for each piece of code. You should add plenty of comments to this code to demonstrate your understanding of how it works.

Marking scheme

Marks will be allocated based on the following scheme:

5/25 Acquisition – Research and discussion of method used to acquire tweets and data obtained

5/25 Presentation – HTML/CSS, layout, quality of writing, overall quality

5/25 Visualisation – Quality, scope/difficulty, integration with report

5/25 Analysis – Quality, depth, difficulty

5/25 Reflection – Discussion of techniques, self critique, journalistic context

Note that extra marks can be awarded for using acquisition, analysis and presentation techniques beyond those taught in class.

Code is not required for the acquisition stage to pass the assignment unless you are on an MSc award. Note that use of code in this coursework can contribute towards the award of the MSc for journalism students.