How to: A Quick n’ Dirty Tweet Data Analysis of European Refugee Crisis


Humanity Washed Ashore… In what languages are global tweeters talking about the Syrian refugees? With this subset of tweets and metadata associated with #KiyiyaVuranInsanlik, you can explore the tweet network as of 9AM September 9, 2015 (the day I’m writing this post). There are 23 languages in this dataset. The most frequent user account languages (Harken, budding data scientists! User account language is NOT necessarily the language of the tweet text), ordered by frequency of tweets, are English (en), French (fr), German (de), Spanish (es), Turkish (tr), and English (UK). We also see tweeters, though far less frequently (perhaps an artifact of a smaller population or a relatively lower Twitter presence), with accounts set to Portuguese (pt), Japanese (ja), Arabic (ar), Dutch (nl), and a few (17 tweets) with Greek (el) and Italian (it), as well as Polish, Indonesian, Chinese, Finnish, Russian, Norwegian, Azerbaijani (az), Danish, Swedish, Serbian, and Hungarian. Source for language codes: http://www.lingoes.net/en/translator/langcode.htm

Twitter, by nature, is about real-time, by-the-minute trending topics. Analysis of Twitter data is best when it is real-time as well. If you want a sense of who’s tweeting, what languages are participating in a particular topic of conversation, and where (IF their geo-coordinate data is not set to private… it usually is private when I’m visualizing librarian conference data; they are privacy aware ;)), here are my go-to quick and dirty exploratory tools. They are super easy to use, and give me an initial sense of the scope and “feel” of the data.

  1. What’s your topic? I’m taking data from the outcry in reaction to the photograph of Syrian boy Aylan Kurdi, who drowned on his way from Turkey to Greece, just one of the souls who died while fleeing in the recent migrant crisis. This Twitter data was gathered today at 9AM (Sept. 3, 2015), and showed 23 unique user account languages among those tweeting with the hashtag #KiyiyaVuranInsanlik.
  2. Follow your curiosity to shape the questions to ask your data. Before analysis, I always manually go to the head of the tweet stream of the hashtag I’m harvesting, to get a sense of its popularity, the type of media being posted, the users, etc.
  3. Get TAGS. It’s a Google spreadsheet that uses the Twitter REST API (credit: Martin Hawksey!). https://tags.hawksey.info/get-tags/ (Use the “New Tags”).
  4. Once your Google Spreadsheet is open, click File–> Create new copy–> then rename. Now, in this new sheet, go to the TAGS tab at the top right of your menu bar. Enter the term you wish to scrape from Twitter in the box (it’s all pretty self-explanatory). Click “Run now!” (You’ll need to be logged into Twitter and set up authorization.)
  5. Authorization. Once you have granted authorization to run through your Twitter account, click TAGS–> “Run now!” again. A message will appear indicating that the API is scraping (“working”). Wait patiently 🙂 Don’t go to the archive sheet while it’s working; just chill.
  6. How many unique tweets? When done, check the sheet to see the period of time covered by the scraped data (7 days back) and the number of unique tweets. Then, check out the archive!
  7. Add to Google Fusion. I used to download the archive, save it as a CSV, then upload it to Google Fusion Tables to play with. However, if you simply take the web address of the archive spreadsheet, you can pull it into Fusion super easy peasy.
  8. Word Tree tweet content exploration. Also, use Jason Davies’ Word Tree to do some SUPER simple exploratory NLP. Then, of course, load your text into Python to clean it up and run NLTK for bigrams, trigrams, frequency analysis, etc. to dig deeper. https://www.jasondavies.com/wordtree/?source=&prefix=Aylan%20K

Word tree to explore the frequently used words, and to inductively explore the themes of a hashtag corpus’ content. For example, I looked for “migrant”, “Aylan”, “help”, “family”, and “refugee” (it is sensitive to plural versus singular, so it’s kind of a blunt weapon, just good for exploring pre-NLTK). (HT Jason Davies and co: http://hint.fm/papers/wordtree_final2.pdf)
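If you want to move past the word tree and count bigrams and trigrams yourself, here is a minimal sketch in plain Python. It is only one way to do it, not the exact pipeline behind this post: the `tokenize` and `ngram_counts` helpers and the sample tweets are made up for illustration, and the cleanup (lowercasing, dropping URLs, keeping hashtags and mentions) is an assumption about what you would want.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet, drop URLs, and keep words, #hashtags and @mentions."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[#@]?\w+", text)


def ngram_counts(tweets, n=2):
    """Count n-grams across tweets; an n-gram never spans two tweets."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        # zip over n staggered copies of the token list to form n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical sample tweets, invented for this example (not from the dataset).
sample = [
    "Help the refugee families now #KiyiyaVuranInsanlik",
    "Help the refugee children http://example.com",
]

bigrams = ngram_counts(sample, n=2)
trigrams = ngram_counts(sample, n=3)
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

In practice you would pass in the text column from your TAGS archive instead of `sample`. NLTK's `nltk.bigrams` and `nltk.FreqDist` cover the same ground once you have it installed, and add proper tokenizers and stopword lists on top.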


Explore! Here are the exploratory raw data and charts (via Google Fusion Tables) of the languages of Twitter users who tweeted with the hashtag #KiyiyaVuranInsanlik. Play around: http://bit.ly/1EDuq2L
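If you would rather tally the languages yourself from a CSV export of the archive sheet, a rough sketch follows. It assumes the export has `id_str` and `user_lang` columns (check the header row of your own archive), and the rows below are invented sample data, not the real tweets. Remember that `user_lang` is the account language, not necessarily the language of the tweet text.

```python
import csv
import io
from collections import Counter


def language_counts(csv_file):
    """Tally account languages over unique tweets in an archive export.

    Assumes columns named id_str and user_lang; adjust to your own sheet.
    """
    seen_ids = set()
    counts = Counter()
    for row in csv.DictReader(csv_file):
        tweet_id = row["id_str"]
        if tweet_id in seen_ids:
            continue  # skip duplicate rows so each tweet counts once
        seen_ids.add(tweet_id)
        counts[row["user_lang"] or "unknown"] += 1
    return counts


# Invented rows standing in for an exported archive (note the duplicate id 2).
sample_csv = io.StringIO(
    "id_str,from_user,user_lang\n"
    "1,alice,en\n"
    "2,bob,tr\n"
    "2,bob,tr\n"
    "3,carol,en\n"
)

counts = language_counts(sample_csv)
for lang, n in counts.most_common():
    print(lang, n)
```

For a real file, swap `sample_csv` for `open("archive.csv", newline="", encoding="utf-8")`, where `archive.csv` is whatever you named your download.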


Play with the raw data of #KiyiyaVuranInsanlik in Google Fusion Tables. Click “add chart.”
