CSE 5525: Speech and Language Processing

Fundamentals of natural language processing, automatic speech recognition and speech synthesis; lab projects concentrating on building systems to process written and/or spoken language.

Details
Topics:
Textbook:
Grading

Grading will be based on:

Participation and In-Class Exercises (10%)

You will receive credit for asking and answering questions related to the homework on Piazza, engaging in class discussion and participating in the in-class exercises.

Homeworks (50%)

The homeworks will include both written and programming assignments. Submit homework to the Dropbox folder in Carmen by 11:59pm on the day it is due (unless otherwise instructed). Late homework will be accepted up to 48 hours after the deadline for 50% credit; after 48 hours, it will not be accepted. If you have technical issues with submission, email your homework to the instructor.

Midterm (20%)

There will be an in-class midterm on March 4.

Final Projects (20%)

The final project is an open-ended assignment whose goal is to gain experience applying the techniques presented in class to real-world datasets. Students should work in groups of 3-4. It is a good idea to discuss your planned project with the instructor to get feedback. The final project report should be 4 pages long and is due on April 30. The report should describe the problem you are solving, the data you are using, the technique you are applying, and the baseline you are comparing against.

Resources
  • Piazza (discussion, announcements and restricted resources). https://piazza.com/osu/spring2016/5525/home
  • Carmen (homework submission + grades). https://carmen.osu.edu/d2l/home/11684583

Academic Integrity

Any assignment or exam that you hand in must be your own work (with the exception of group projects). However, talking with others to better understand the material is strongly encouraged. Copying a solution or letting someone copy your solution is cheating. Everything you hand in must be in your own words. Code you hand in must be written by you, with the exception of any code provided as part of the assignment. Any collaboration during an exam is considered cheating. Any student who is caught cheating will be reported to the Committee on Academic Misconduct. Please don't take a chance: if you are having trouble understanding the material, let us know and we will be happy to help.

Homeworks
  • Homework 1 (Due 1/15, hand in paper copy at the beginning of class)
  • Homework 2 [Starter Code] [Data] (Due 2/5, turn in to Carmen before the beginning of class)
  • Homework 3 [Starter Code] (Due 3/23, turn in to Carmen before the beginning of class)
  • Homework 4 (Due 4/25, turn in to Carmen before 11:59pm)
  • Anonymous Feedback

Schedule
Date | Topic | Required Reading | Suggested Reading
1/13 | Course Overview | J+M, 2nd Edition Chapter 1
1/15 | Probability Review and Naive Bayes | MacKay Book 2.1-2.3 (Probability); J+M, 3rd Edition 7.1 (Naive Bayes)
1/20 | More Text Classification | J+M, 3rd Edition 7.2-7.3
1/22 | Logistic Regression | J+M, 3rd Edition 7.4 | Michael Collins' notes on Log-Linear Models
1/27 | More Logistic Regression | J+M, 3rd Edition 4.1-4.3 | CIML Chapter 3 (Perceptron Algorithm)
1/29 | Language Modeling | J+M, 3rd Edition 4.4-4.5 | Michael Collins' notes on Language Models
2/3 | Kneser-Ney Smoothing | J+M, 3rd Edition 4.4-4.5
2/5 | Parts of Speech and Hidden Markov Models | J+M, 3rd Edition 9.1-9.3 and 8.1-8.2 | Michael Collins' notes on Hidden Markov Models
2/10 | The Viterbi Algorithm | J+M, 3rd Edition 8.3-8.4, 9.4
2/12 | Maximum Entropy Markov Models | J+M, 3rd Edition 9.5
2/17 | Maximum Entropy Markov Models | J+M, 3rd Edition 9.5
2/19 | Conditional Random Fields | J+M, 3rd Edition 9.6 | CRF Tutorial (Sutton and McCallum)
2/24 | Parsing | J+M, 2nd Edition 12.1-12.7
2/26 | Parsing | J+M, 2nd Edition 13.1-13.5
3/11 | Brown Clustering | J+M, 3rd Edition 19.1-19.3 | Class-Based n-gram Models of Natural Language (Brown et al. 1993)
3/23 | Relation Extraction | J+M, 3rd Edition 20.2 | Distant supervision for relation extraction without labeled data (Mintz et al. 2009)
4/1 | Machine Translation | J+M, 2nd Edition 25.1-25.3
4/6 | Machine Translation | J+M, 2nd Edition 25.4-25.7
4/8 | Machine Translation
4/13 | Machine Translation
4/15 | Deep Learning in NLP | Yoav Goldberg's Tutorial on Neural Networks in NLP
4/20 | Speech Recognition
4/22 | Speech Recognition
4/28 | Project Presentations 12-1:30pm