The scent of hot braai and sea air has made way for incense and spicy garam masala: Mango Airlines has safely delivered its sea-sprayed, big-data-acronymed-exhausted, conference-energized cargo to Durban, South Africa. I’ve arrived at the home of my co-presenter Kusturie and family. And what a journey it’s been!
The conference is over. But the work has just begun. After a non-stop whirlwind of international collaboration, cultural events, and brilliant librarianship–scholarship that makes me proud to be a part of this socially engaged, politically active, not-afraid-to-shake-a+ profession–I’m now coming to grips with and synthesizing the invaluable tacit cultural knowledge as well as practical applied techniques learned through conversation with international scholars and practitioners.
Those who gave more than $50 to the campaign that funded my plane trip will receive a recreated experience of South Africa (check your mailboxes, and thanks again!):
I’ve curated a Flickr photo album (https://flic.kr/s/aHskhxPnAR) and a Storify (https://storify.com/55dde8b4df8c14634427694b), sharing the experience.
Lekker and Dui, may you get to the mother city someday!
Humanity Washed Ashore… In what languages are global tweeters talking about the Syrian refugees? With this subset of tweets and metadata associated with #KiyiyaVuranInsanlik, you can explore the tweet network as of 9AM September 9, 2015 (the day I’m writing this post). There are 23 languages in this dataset. (Harken, budding data scientists! User account language is NOT necessarily the language of the tweet text.) The most frequent user account languages, ordered by number of tweets, are English (en), French (fr), German (de), Spanish (es), Turkish (tr), and English (UK). We also see tweeters, though far less frequently (perhaps an artifact of a smaller population or a relatively lower Twitter presence), with accounts set to Portuguese (pt), Japanese (ja), Arabic (ar), Dutch (nl), and a few (17 tweets) set to Greek (el) and Italian (it), as well as Polish, Indonesian, Chinese, Finnish, Russian, Norwegian, Azerbaijani (az), Danish, Swedish, Serbian, and Hungarian. Source for language codes: http://www.lingoes.net/en/translator/langcode.htm
Twitter, by nature, is about real-time, by-the-minute trending topics, and analysis of Twitter data is best when it is just as timely. If you want a sense of who’s tweeting, what languages are participating in a particular topic of conversation, and where (IF their geo-coordinate data is not set to private… which is the case when I’m visualizing librarian conference data; they are privacy aware ;)), here are my go-to quick and dirty exploratory tools. They are super easy to use and give me an initial sense of the scope and “feel” of the data.
- What’s your topic? I’m taking data from the outcry in reaction to the photograph of Syrian boy Aylan Kurdi, who drowned on his way from Turkey to Greece, just one soul among those who have died while fleeing in the recent migrant crisis. This Twitter data was gathered today at 9AM (Sept. 3, 2015), and showed 23 unique user account languages among those tweeting with the hashtag #KiyiyaVuranInsanlik.
- Follow your curiosity to shape the questions to ask your data. Before analysis, I always manually go to the head of the tweet stream of the hashtag I’m harvesting, to get a sense of its popularity, the type of media being posted, the users, etc.
- Get TAGS. It’s a Google spreadsheet that uses the Twitter REST API (credit: Martin Hawksey!). https://tags.hawksey.info/get-tags/ (Use the “New Tags”).
- Once your Google Spreadsheet is open, click File–> Create new copy–> then rename. Now, in this new sheet, go to the TAGS tab at the top right of your menu bar. Enter the term you wish to scrape from Twitter in the box (it’s all pretty self-explanatory). Click “Run now!” (You’ll need to be logged into Twitter and set up authorization.)
- Authorization: Once you have granted authorization to run through your Twitter account, click TAGS–> “Run now!” again. A message will appear indicating that the API is scraping (“working”). Wait patiently 🙂 Don’t go to the archive sheet while it’s working–just chill.
- How many unique tweets? When done, check the sheet to see the period of time covered by the scraped data (7 days back) and the number of unique tweets. Then, check out the archive!
- Add to Google Fusion: I used to download the archive, save it as a CSV, then upload it to Google Fusion Tables to play with. However, if you simply take the web address of the archive spreadsheet, you can pull it into Fusion super easy peasy.
- Word Tree tweet content exploration: Also, use Jason Davies’ Word Tree to do some SUPER simple exploratory NLP. Then, of course, load your text into Python to clean it up and run nltk for bigrams, trigrams, frequency analysis, etc. to dig deeper. https://www.jasondavies.com/wordtree/?source=&prefix=Aylan%20K
Use the Word Tree to explore frequently used words and to inductively explore the themes of a hashtag corpus’ content. For example, I looked for “migrant,” “Aylan,” “help,” “family,” and “refugee” (the search is sensitive to plural vs. singular, so it’s kind of a blunt weapon, just good for exploring pre-nltk). (HT Jason Davies and co: http://hint.fm/papers/wordtree_final2.pdf)
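For the "dig deeper" step after the Word Tree, here is a minimal sketch of bigram and trigram counting in plain Python. It uses only the standard library as a stand-in for nltk, and it is not necessarily the cleanup used for this post. The sample tweets, the cleaning rules, and the helper names (`tokenize`, `ngrams`, `ngram_counts`) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample tweets; in practice these would come from the
# tweet text column of the TAGS archive sheet.
SAMPLE_TWEETS = [
    "Help the refugee family #KiyiyaVuranInsanlik",
    "RT @someone: help the refugee children http://example.com/x",
]


def tokenize(text):
    """Lowercase a tweet, drop URLs, @mentions and the RT marker, return word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # strip links
    text = re.sub(r"@\w+", " ", text)          # strip @mentions
    text = re.sub(r"\brt\b", " ", text)        # strip the retweet marker
    return re.findall(r"[#\w']+", text)        # keep words and hashtags


def ngrams(tokens, n):
    """Return consecutive n-token tuples from a list of tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(tweets, n):
    """Count n-grams tweet by tweet, so no n-gram spans two tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(ngrams(tokenize(tweet), n))
    return counts


bigram_counts = ngram_counts(SAMPLE_TWEETS, 2)
trigram_counts = ngram_counts(SAMPLE_TWEETS, 3)
for gram, count in bigram_counts.most_common(3):
    print(" ".join(gram), count)
```

Counting within each tweet keeps the end of one tweet from pairing with the start of the next. If nltk is installed, its own n-gram and frequency helpers can replace the hand-rolled functions here.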
Explore! Here are the exploratory raw data and charts (via Google Fusion Tables) visualizing the languages of Twitter users who tweeted with the hashtag #KiyiyaVuranInsanlik. Play around: http://bit.ly/1EDuq2L
Play with raw data of #KiyiyaVuranInsanlik in Google Fusion tables. Click “add chart.”
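If you would rather tally the user account languages yourself instead of charting them in Fusion, a rough sketch in Python follows. It assumes the archive has been exported as a CSV with a `user_lang` column; check the header row of your own sheet, since the column name here is an assumption. The sample rows are made up for illustration and are not the real dataset.

```python
import csv
import io
from collections import Counter


def count_user_languages(csv_file, column="user_lang"):
    """Count tweets per user account language in an exported archive CSV.

    `column` is an assumed header name; change it to match your sheet.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        lang = (row.get(column) or "").strip().lower()
        if lang:  # skip rows with no language recorded
            counts[lang] += 1
    return counts


# Made-up rows standing in for an exported archive.
SAMPLE_CSV = """id_str,from_user,text,user_lang
1,alice,example tweet one,en
2,bob,example tweet two,tr
3,carol,example tweet three,en
4,dave,example tweet four,
"""

counts = count_user_languages(io.StringIO(SAMPLE_CSV))
for lang, n in counts.most_common():
    print(lang, n)
```

With a real export, swap the `io.StringIO` sample for something like `open("archive.csv", newline="", encoding="utf-8")` (the filename is hypothetical). Remember that this counts the account language, which is not necessarily the language of the tweet text.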