CSE 5525: Speech and Language Processing

Details

Time: Wednesday, Friday, 11:10 - 12:30
Place: Dreese Lab 480
Instructor: Alan Ritter (ritter.1492@osu.edu)
Office Hours: Fridays 4:00-5:00pm, Dreese 595
TA: Ashutosh Baheti (ashutosh.baheti95@gmail.com)
Office Hours: Wednesday 1-2pm, DL 190

Textbooks:

There are two excellent NLP textbooks that are freely available online. I will assign readings from both - there is a lot of value in seeing multiple perspectives on the same material. If a concept you encounter seems confusing at first, try reading about it in the other textbook to get a different perspective.

Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd Edition).
Jacob Eisenstein. Natural Language Processing.
(There will be other readings as well)

Grading

Grading will be based on:

Participation (10%)

You will receive credit for asking and answering questions related to the homework on Piazza, engaging in class discussion and participating in the in-class exercises.

Homeworks (50%)

The homeworks will include both written and programming assignments. Homework should be submitted to the Dropbox folder in Carmen by 11:59pm on the day it is due (unless otherwise instructed). Each student will have 3 flexible days to turn in late homework throughout the semester. As an example, you could turn in the first homework 2 days late and the second homework 1 day late without any penalty. After that you will lose 20% for each day the homework is late. If there are any technical issues with submission, please email your homework to the instructor.
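The late policy above can be sketched as a small calculation. This is only an illustrative reading of the policy, not an official grading script: the function name `late_penalty` and its parameters are hypothetical, and it assumes that flexible days are applied automatically before any penalty, that the 20% is taken off the homework's total score, and that the penalty is capped at 100%.

```python
def late_penalty(days_late, flex_days_left=3, penalty_percent_per_day=20):
    """Return (percent_lost, flex_days_remaining) for one homework.

    Assumptions (not stated in the syllabus): flexible days are used up
    first, and the penalty is capped at 100 percent.
    """
    flex_used = min(days_late, flex_days_left)
    penalized_days = days_late - flex_used
    percent_lost = min(100, penalized_days * penalty_percent_per_day)
    return percent_lost, flex_days_left - flex_used


# The example from the policy: 2 days late, then 1 day late, no penalty.
lost_hw1, flex = late_penalty(2)                        # 0 percent lost, 1 flexible day left
lost_hw2, flex = late_penalty(1, flex_days_left=flex)   # 0 percent lost, 0 flexible days left
# A later homework turned in 2 days late now loses 2 * 20 = 40 percent.
lost_hw3, flex = late_penalty(2, flex_days_left=flex)
```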

Midterm (20%)

There will be an in-class midterm (the date is 11/1/2019).

Final Projects (20%)

The final project is an open-ended assignment, with the goal of gaining experience applying the techniques presented in class to real-world datasets. Students should work in groups of 3-4. It is a good idea to discuss your planned project with the instructor to get feedback. The final project report should be 4 pages. The report should describe the problem you are solving, the data you are using, the technique you are applying, and the baseline you compare against.

Resources

Piazza (discussion, announcements, etc.). https://piazza.com/class/jzd3wfpji4b6uj

Carmen (homework submission + grades). https://osu.instructure.com/courses/66042

Academic Integrity

Any assignment or exam that you hand in must be your own work (with the exception of group projects). However, talking with others to better understand the material is strongly encouraged. Copying a solution or letting someone copy your solution is considered cheating. Everything you hand in must be your own words. Code you hand in must be written by you, with the exception of any code provided as part of the assignment. Any collaboration during an exam is considered cheating. Any student who is caught cheating will be reported to the Committee on Academic Misconduct. Please don't take a chance - if you are having trouble understanding the material, let us know and we will be happy to help.

Homework

Homework 1 (Due 8/30, submit report and code to Dropbox on Carmen)

Homework 2 (Due 9/20, submit report and code to Dropbox on Carmen)

Homework 3 (Due 10/23, submit report and code to Dropbox on Carmen)

Anonymous Feedback

http://goo.gl/forms/bc6zU8K0Et

Tentative Schedule:

https://docs.google.com/spreadsheets/d/1KqO9cL4hAlLh9KO7ryuaW1W29KE8CWmc354PQa8pOro/edit?usp=sharing

Schedule

Date	Topic	Required Reading	Suggested Reading
8/21	Course Overview		J+M, 3rd Edition Chapter 1
8/23	Machine Learning (classification)	Eisenstein 2.0-2.5, 4.1, 4.3-4.5; CIML 4.1-4.4, 4.6-4.7
8/28	Machine Learning (cont.)	Eisenstein 2.0-2.5, 4.1, 4.3-4.5; CIML 4.1-4.4, 4.6-4.7	CIML Chapter 5 (Linear Models / SVM)
8/30	Multiclass Learning	J+M Chapter 5
9/3	Neural Networks in NLP	Eisenstein 3.1-3.3, J+M 7.1-7.4	Goldberg 1-4
9/5	Sequence Tagging	Eisenstein 7.0-7.4, J+M Chapter 8
9/11	Viterbi Algorithm	J+M Chapter 8
9/13	Conditional Random Fields	Eisenstein 7.5, 8.3	Manning 2011 “Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?”
9/17	Conditional Random Fields (cont.), NER	Eisenstein 7.5, 8.3
9/19	Word Embeddings	Eisenstein 3.3.4, 14.5, 14.6, J+M 6	Goldberg 5, word2vec, Levy, GloVe, Dropout
9/25	Word Embeddings (cont.) and RNNs	J+M Chapter 9, Goldberg 10, 11
9/27	Recurrent Neural Networks (cont.) + Convolutional Neural Networks, Neural CRFs	Eisenstein 3.4, 7.6	Goldberg 9, Kim, Collobert and Weston, Neural NER
10/7	Machine Translation	Eisenstein 18.1, 18.2
10/15	Encoder-Decoder Networks	Seq2Seq
10/22	Information Extraction	Eisenstein 13, 17
10/24	Neural Machine Translation	Eisenstein 18.3
10/30	Reading Comprehension	E2E Memory Networks, CBT, SQuAD, BiDAF
11/6	Summarization (Presented by Prof. Wei Xu)	Eisenstein 19, MMR, Gillick, Sentence compression, SummaRuNNER, Pointer
11/8	Dialogue	J+M Chapter 24	RNN chatbots, Diversity, Goal-oriented, Latent Intention, QA-as-dialogue
11/12	Unsupervised Learning in NLP	Painless unsup, Bowman VAE, ELMo, BERT
11/19	Unsupervised Learning in NLP (cont.)		A Tutorial on Deep Latent Variable Models of Natural Language
11/21	Ethics/Wrapup
12/12 12:00pm-1:45pm	Final Project Presentations