The recent explosion of user-generated content in online social media presents a wealth of new opportunities for data analytics applications. Structured data is just the tip of the iceberg; most of this data is locked up as unstructured text, which is difficult for current algorithms to process. There has been growing interest in adapting natural language processing (NLP) and information extraction (IE) technology to this data, as well as in identifying new opportunities for applications on big, noisy, informal text data. Examples include computational social science, user modeling, personalization, news recommendation, event detection, and more.
The course will involve reading and discussing recent papers from top conferences in the field. Students will propose and complete an open-ended course project; example projects might include anything from extracting a concert calendar from Twitter to automatically generating answers to health questions in online patient forums.
While the course will cover some technical material, emphasis will be on applications and building systems rather than mathematical details. Some prior coursework in Artificial Intelligence or Machine Learning will be very helpful.
Grading will be based on 2 components:
Date | Topic | Reading |
---|---|---|
8/27 | Course Overview | No Reading |
8/29 | Relation Extraction (Alan will present) - Useful videos: 1 2 3 4 | Distant supervision for relation extraction without labeled data, Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky, ACL 2009. Email instructor with your preferred presentation slot date before class. |
9/3 | Relation Extraction | Coupled Semi-Supervised Learning for Information Extraction A. Carlson, J. Betteridge, R.C. Wang, E.R. Hruschka Jr. and T.M. Mitchell. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2010. Email instructor with project groups before class. |
9/5 | Event Extraction | Open Domain Event Extraction from Twitter Alan Ritter, Mausam, Oren Etzioni, Sam Clark, KDD 2012 |
9/10 | Text-Driven Forecasting | Predicting a Scientific Community’s Response to an Article Dani Yogatama, Michael Heilman, Brendan O’Connor, Chris Dyer, EMNLP 2011 |
9/12 | Brainstorm Project Ideas | No Reading |
9/17 | Text-Driven Forecasting | Predicting the Present with Google Trends Hyunyoung Choi, Hal Varian |
9/19 | Computational Social Science | No Country for Old Members: User Lifecycle and Linguistic Change in Online Communities Cristian Danescu-Niculescu-Mizil, Robert West, Dan Jurafsky, Jure Leskovec, Christopher Potts, WWW 2013 |
9/24 | Relation Extraction | Modeling Missing Data in Distant Supervision for Information Extraction Alan Ritter, Luke Zettlemoyer, Mausam, Oren Etzioni, TACL 2013 |
9/26 | Geographical Modeling | Hierarchical Geographical Modeling of User Locations from Social Media Posts Amr Ahmed, Liangjie Hong, Alex Smola, WWW 2013 |
10/1 | Geographical Modeling (Initial Project Proposals Due) | Finding Your Friends and Following Them to Where You Are Adam Sadilek, Henry Kautz, Jeffrey Bigham, WSDM 2012 |
10/3 | NLP in Noisy Text | Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters Olutobi Owoputi, Brendan O’Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider, Noah A. Smith, NAACL 2013 |
10/8 | Summarization | Towards Twitter Context Summarization with User Influence Models Yi Chang, Xuanhui Wang, Qiaozhu Mei, Yan Liu, WSDM 2013 |
10/10 | Text-Driven Forecasting | Success with style: Using writing style to predict the success of novels Vikas Ganjigunte Ashok, Song Feng, Yejin Choi, EMNLP 2013 |
10/15 | NLP in Noisy Text | Learning part-of-speech taggers with inter-annotator agreement loss, Barbara Plank, Dirk Hovy, Anders Søgaard, EACL 2014 |
10/17 | Event Extraction | Major Life Event Extraction from Twitter based on Congratulations/Condolences Speech Acts, Jiwei Li, Alan Ritter, Claire Cardie and Eduard Hovy, EMNLP 2014 |
10/22 | Entity Linking | To Link or Not to Link? A Study on End-to-End Tweet Entity Linking Stephen Guo, Ming-Wei Chang, Emre Kiciman, NAACL 2013 |
10/24 | NLP in Noisy Text | Lexical Normalisation of Short Text Messages: Makn Sens a #twitter Bo Han, Timothy Baldwin, ACL 2011 (note: the paper was changed on 10/20 - if you read the previous paper it's fine to submit a critique for that instead). |
10/29 | NLP in Noisy Text | What to do about bad language on the internet Jacob Eisenstein |
10/31 | Guest Lecture: Micha Elsner | Disentangling chat with local coherence models, ACL 2011 |
11/5 | NLP in Noisy Text | A Dependency Parser for Tweets Lingpeng Kong et al., EMNLP 2014 |
11/7 | Guest Lecture: Wei Xu (UPenn) | A Preliminary Study of Tweet Summarization using Information Extraction |
11/12 | Distributed Representations of Words and Phrases and their Compositionality | |
11/14 | Sentiment | Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs |
11/19 | Guest Lecture: Marie-Catherine de Marneffe | Easy Victories and Uphill Battles in Coreference Resolution Durrett and Klein, EMNLP 2013 |
11/21 | Event Extraction | Event Discovery in Social Media Feeds Benson et al., ACL 2011 |
11/26 | Thanksgiving break | |
11/28 | Columbus Day | |
12/3 | Course Projects | Office hours, feel free to drop by to discuss any questions about projects. Dreese 595. |
12/15 | Course Projects | Final Project Presentations @ 4pm |