First, please get some background on this event if you don't remember it. Here are some articles to begin with. I recommend you read these before reading the rest of the final, as they will add important context.
I have a dataset of tweets posted on the day of the cancelation. For this project, the entire class will work as a group to label the tweets, building a publicly accessible dataset. Then, we will break into two groups. One will analyze the tweets' content. The other will analyze the network of people around the event.
A few notes before we get into it. I have run many, many large group projects like this. I have learned in painful ways that, to be successful, we need very strict organization. What that means in practice is that I keep tight control over our activities and processes. If we were working 1-on-1, I'd be very open to discussing alternative ways to store or format data, better schemes for working, etc. But with this many people, we don't have that kind of flexibility (trust me - I've tried it and it has been a nightmare for everyone). So even if you can see a way to do this that seems more efficient, please don't change up what we're doing. I promise I've made all the decisions about how to handle this to work well given my deep knowledge about the dynamics of these large virtual teams and where things go wrong. If I say to use a Google Doc, please use one. If I give you a spreadsheet, please do not change the formatting. What I'm sharing is simplified for a reason and added complexity is going to break things - even if it seems like it might be better to you with your changes.
In phase 1 of the project, we will build a dataset. This involves labeling each tweet with one of four categories: Neutral, Unclear/Unrelated, Pro-Roseanne, or Anti-Roseanne.
To categorize the tweets, read only the text provided. Do not follow the links or look up the tweets. Make your judgements only on the text you can see.
When doing this kind of labeling, we need to make sure we all understand how to do it. Thus, you will begin by coding a set of sample tweets.
When you are done, you can compare your answers to others here: https://docs.google.com/spreadsheets/d/1cBxlb8Ks_bgipOUZJ6Nz9DCUtX81no4RTzSCKeHpL8E/edit?usp=sharing (I have provided the correct answers there). If you don't understand why your answers don't match mine, please discuss in the Final: Sample Coding Discussion page on Canvas.
Next, we will label the larger dataset. Check the email I sent to the class for a link to that spreadsheet. There are two tabs, so find the one whose range includes the first letter of your last name. You'll see I put your name in the first column. DO NOT RE-SORT THIS SPREADSHEET! Just scroll to find your name in the right tab. Names are in alphabetical order.
You each have 1500 tweets to label. You must use one of the codes we have trained on: Neutral, Unclear/Unrelated, Pro-Roseanne, Anti-Roseanne. It must have that exact capitalization and spelling. I put in a few codes on each sheet so it should auto-fill for you, which will minimize the chance of error.
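To make the "exact capitalization and spelling" rule concrete, here is a minimal sketch of what an exact-match check looks like. It is only an illustration: the four codes come from the text above, but the function name and the sample labels are hypothetical, and this is not part of the class workflow. Keep entering your labels in the spreadsheet exactly as described.

```python
# The four codes are taken from the assignment text.
# Everything else here (function name, sample data) is a hypothetical illustration.
ALLOWED_CODES = {"Neutral", "Unclear/Unrelated", "Pro-Roseanne", "Anti-Roseanne"}


def find_invalid_labels(labels):
    """Return (row_number, label) pairs for labels that are not an exact match."""
    problems = []
    for row_number, label in enumerate(labels, start=1):
        # Exact comparison: wrong case, a shortened code, or a stray space all fail.
        if label not in ALLOWED_CODES:
            problems.append((row_number, label))
    return problems


# Hypothetical labels, including three common mistakes.
sample = ["Neutral", "pro-roseanne", "Anti-Roseanne", "Unclear", "Pro-Roseanne "]
invalid = find_invalid_labels(sample)
for row_number, label in invalid:
    print(f"Row {row_number}: {label!r} is not one of the four codes")
```

In this sample, rows 2, 4, and 5 are flagged: one has the wrong capitalization, one is a shortened code, and one has a trailing space. Errors like these are the reason to rely on auto-fill.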
Each tweet is being coded by two people, which will allow me to reconcile the disagreements later on. I will also use this to check you - if you have a LOT of disagreements, I'm going to take a very close look to make sure you were doing this correctly. If you just randomly or thoughtlessly enter labels, I will know.
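For anyone curious how agreement between two coders can be measured, the sketch below computes simple percent agreement and Cohen's kappa, a common statistic that corrects for agreement expected by chance. This is an assumption about one reasonable way to do it, not a description of how I will check your codes. The function names and sample labels are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Fraction of tweets on which the two coders chose the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of tweets")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    # Chance agreement: probability both coders pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two coders for the same four tweets.
coder_a = ["Neutral", "Pro-Roseanne", "Anti-Roseanne", "Anti-Roseanne"]
coder_b = ["Neutral", "Pro-Roseanne", "Anti-Roseanne", "Neutral"]
print(percent_agreement(coder_a, coder_b))
print(cohens_kappa(coder_a, coder_b))
```

Kappa is lower than raw agreement because some matches would occur even if both coders labeled at random. Thoughtless labeling tends to show up as a low kappa.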
That said, disagreements happen and I do not want you to agonize over the codes. Give the tweet a careful read, make a pretty fast decision (there are only 4 categories, so this shouldn't be a big struggle), and label it. I know people work at different speeds, but if you find this is on pace to take more than a few hours, you are probably spending too much time.
This should take around 2 hours to complete. If you find yourself agonizing over how to categorize a tweet, you are doing it wrong! Give the tweet a careful read, decide the best label, and put it there. Do not try to be creative or clever - everyone should be able to understand what you did. If you are not sure, put it in the Unclear/Unrelated category.
I will send you a set of tweets to code by July 11th. They must be coded by July 18th. I will be checking everyone's coding and if you have a bunch of incorrect codes, I will know and you will lose points. Pay close attention to this.
You must have your team selected by July 18. Pick your team here: https://goo.gl/forms/4aj0Rg6gUqKhMvup2
This team will develop a deeper analysis of the themes within the tweets. Specifically, we will look at the pro-Roseanne and anti-Roseanne tweets. Within each group, we will use a thematic analysis (a common qualitative analysis technique) to develop a set of high-level themes discussed by each group. Please read this article on thematic analysis: https://www.psych.auckland.ac.nz/en/about/our-research/research-groups/thematic-analysis/about-thematic-analysis.html. This is the process we will follow.
Example pro-Roseanne themes that we come up with may be something like "Donald Trump", "Insulting liberals", "Denial of Racism". Anti-Roseanne tweets may have themes like "Thanking ABC", "Criticizing Trump", "Insulting/Mocking Roseanne", etc. (I don't know if these are right - they are just rough guesses based on reading a few tweets). As a group, we will iteratively develop this set of themes by reading the tweets and discussing together. The working document for our thematic analysis is here: https://docs.google.com/document/d/1vwKVSI-mteP4MYEM_utTl8BhniCqlyJBkjA2K-nZ0lE/edit?usp=sharing
Our set of themes should be complete by July 25. You are all expected to actively participate in developing the themes over the course of the week - you can't just jump in on the morning of the 25th and participate. I will review the document over the week, but you must also submit an activity log that roughly details your participation in the discussion over the course of the week. It's fine if you only spend a few minutes checking in and commenting each day, but you as a group (i.e. without my intervention) must have a final list of themes by the 25th. If you slack off, I will know. Submit your activity log via Canvas under Final: Team Activity Log 1.
Once we have the themes finalized, I will assign each of you a batch of tweets. Like you did in the whole class Phase 1 activity, you will label each tweet, but you will label them with all the relevant themes instead of pro/anti/neutral labels. Each tweet will be labeled by 2 people. If there are disagreements, those 2 people will sort them out or decide to leave the disagreement (I will resolve these).
All tweets must be labeled with themes by August 3. I will spend that weekend resolving any differences. Nothing to turn in here other than your labels which will go in a shared spreadsheet we create together.
Finally, your group will write up the results of your thematic analysis. You will describe the themes and how often each was found in your datasets. You will share any additional insights about what this says about the supporters/opponents and the larger issues for them. You will produce one document which will become a major section of the resulting paper we submit for publication. You are responsible not just for writing up your results but for producing a smooth, well-written document. That means editing, integrating text, and making the writeup work is an important part of this task.
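As a rough illustration of "how often each was found", the sketch below counts theme frequencies when a tweet can carry several themes. The theme names are the rough guesses mentioned earlier, and the sample data and function name are hypothetical. Treat it as one possible way to tally, assuming each tweet's themes are available as a list.

```python
from collections import Counter


def theme_frequencies(tweet_themes):
    """Map each theme to (number of tweets with it, share of all tweets).

    tweet_themes is a list with one entry per tweet; each entry is the list
    of themes assigned to that tweet (possibly empty).
    """
    total = len(tweet_themes)
    if total == 0:
        return {}
    counts = Counter()
    for themes in tweet_themes:
        # set() makes sure a theme is counted at most once per tweet.
        counts.update(set(themes))
    return {theme: (count, count / total) for theme, count in counts.items()}


# Hypothetical labeled tweets from the anti-Roseanne set.
labeled = [
    ["Thanking ABC"],
    ["Thanking ABC", "Criticizing Trump"],
    ["Insulting/Mocking Roseanne"],
    [],  # a tweet that matched none of the themes
]
frequencies = theme_frequencies(labeled)
for theme, (count, share) in sorted(frequencies.items()):
    print(f"{theme}: {count} tweets ({share:.0%})")
```

Because a tweet can have more than one theme, the shares can add up to more than 100%. Say so in the writeup so readers do not misread the numbers.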
Final writeup is due. Your group should email Jen a link to your shared Google doc when you begin work on it. That is what will be graded for this section.
This team will analyze a network of participants in the Roseanne discussion. On July 18, I will provide you with a Gephi network file. As a group, you need to divide up the work of calculating statistics and performing an analysis on this network similar to what you did on the Enron assignment. You will try to find important nodes, clusters, and explain why they are linked. This will involve computing statistics, creating visualizations, and connecting the stats to things you can get from reading the tweets of these people on Twitter.
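To illustrate what "important nodes" and "clusters" mean in their simplest form, here is a small sketch that ranks nodes by degree and finds connected components on a toy edge list. You will compute the real statistics in Gephi. This is only a conceptual example, and the account names, edges, and function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def degree_ranking(adjacency):
    """Nodes sorted by number of neighbors (highest first, ties by name)."""
    pairs = ((node, len(neighbors)) for node, neighbors in adjacency.items())
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


def connected_components(adjacency):
    """Groups of nodes that are reachable from one another."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Hypothetical edges: who mentioned or replied to whom.
edges = [
    ("user_a", "user_b"),
    ("user_a", "user_c"),
    ("user_a", "user_d"),
    ("user_b", "user_c"),
    ("user_e", "user_f"),
]
adjacency = build_adjacency(edges)
ranking = degree_ranking(adjacency)
components = connected_components(adjacency)
print(ranking)
print(components)
```

Degree is only one notion of importance, and connected components are the crudest notion of a cluster. Richer measures such as betweenness or modularity-based communities are available in Gephi and are worth discussing as a group.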
The first step will be to run a bunch of statistics and create some good visualizations. As a group, you need to lay out what statistics you want and who will do what. You need to organize this amongst yourselves. Notes, statistics, and organization should happen in this Google Doc: https://docs.google.com/document/d/1N4y6ackCArsl8Sn_i1T2Ew4_eMFUjb2qgm3V9sqvHNg/edit?usp=sharing
Your initial statistics and visualizations should be complete by July 25th, along with a list of analysis activities to carry out going forward that will let you connect those statistics to the content. You are all expected to actively participate in the group organization and computation over the course of the week - you can't just jump in on the morning of the 25th and participate. I will review the document over the week, but you must also submit an activity log that roughly details your participation in the discussion over the course of the week. It's fine if you only spend a few minutes checking in and commenting each day, but you as a group (i.e. without my intervention) must have a final list of statistics, shared Gephi files to work from, and analyses to complete by the 25th. If you slack off, I will know. Submit your activity log via Canvas under Final: Team Activity Log 1.
Next, you will carry out all that analysis. Each person should claim certain tasks and connect those statistics to the content. Keep detailed notes on your stats, the analysis you do, and the insights you find.
All analysis linking the stats to Twitter content must be complete by August 3. Everyone's analysis notes must be in a shared document by this date.
Final writeup is due. Your group should email Jen a link to your shared Google doc when you begin work on it. That is what will be graded for this section.