
About

Table of contents

  1. Textbook(s)
  2. Computing Resources / Colab Pro
  3. Attendance
  4. Prerequisites
  5. Assignments / Grading
    1. Programming Assignments (Projects) - 40%
    2. Written Assignments (Problem Sets) - 20%
    3. Midterm Exam - 15%
    4. Participation - 5%
    5. Final Project - 20%
  6. Academic Integrity

Textbook(s)

There are two excellent NLP textbooks that are freely available online, and readings will be assigned from both. Seeing multiple perspectives on the same material is valuable: if a concept seems confusing at first, try reading about it in the other book.

Computing Resources / Colab Pro

The programming assignments will ask you to implement state-of-the-art natural language processing algorithms using neural networks. For this we will use PyTorch, and you will need access to GPUs. The class will use Google Colab, which provides easy access to GPUs. We highly recommend signing up for Colab Pro once you start working on the parts of the homework that use PyTorch. It costs $10/month, which over the semester is roughly equivalent to the cost of a textbook, and it gives you access to better GPUs, which makes working on the homework assignments a better experience. We have investigated other options for giving students GPU access (Google Cloud credits and PACE/ICE) and found Colab Pro to be the best solution. You are welcome to use other GPU resources to complete the assignments, but we cannot provide support for them. You will need to submit your code and program output as a Jupyter Notebook that runs on Colab Pro.

Attendance

Students are expected to attend lecture and complete the required reading assignments.

Prerequisites

This is an advanced course on Natural Language Processing. Modern NLP is heavily based on Machine Learning. The course involves mathematical problem solving and programming exercises to provide hands-on experience with concepts in NLP. To succeed, you need a strong programming background, in addition to a firm grasp of probability, linear algebra, and multivariable calculus. You should be comfortable working on medium-to-large software projects in Python and be comfortable learning and using new Python libraries, or you should have developed the ability to independently learn a new programming language and environment very quickly.

There will be a math background test (due in the first week) and a warmup programming assignment due shortly afterward. If you find these difficult, you should expect a lot of extra work to catch up; in that case we recommend waiting to take this class in a later semester, once you are better prepared. Please reach out to the course staff to discuss whether you have the right background to succeed in this course, especially if these assignments contain any symbols or concepts that you are unfamiliar with.

(For students on the wait list: we don't have any additional information on whether you will be able to enroll in the course, but if you plan to try to enroll, please complete and submit Problem Set 0, which is due during the first week. Please post a message on Piazza to get the Gradescope access code.)

Assignments / Grading

Graded work will include written and programming assignments. Assignments should be submitted to Gradescope by 11:59pm on the day they are due, unless otherwise noted. If you run into technical issues with submission, email your homework to the instructor and cc the TAs.

Each student will have six flexible days to turn in late homework throughout the semester. Late days will be applied to homework assignments in the order of submission. For example, you could turn in the first homework three days late and the second homework three days late without any penalty. After that, you will lose 20% for each day that any further assignment is handed in late. Late penalties are managed in increments of days and apply to the entire assignment. The six late days are meant for personal emergencies; if you use late days for non-emergencies but later encounter emergencies, you will not be given extra late days. No late days will be allowed for the final course project, due to the tight deadline for final grades required by the university.
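As a rough illustration, the late-day bookkeeping for homework can be sketched in Python. This is not an official course tool: the function name is hypothetical, and the sketch assumes that a partial day past the deadline counts as a full day, that late days are spent in submission order until exhausted (so one assignment can be partly covered), and that each uncovered day removes 20% of the assignment's full credit. It does not apply to the final project, for which no late days are allowed.

```python
import math

TOTAL_LATE_DAYS = 6     # from the syllabus
PENALTY_PER_DAY = 0.20  # from the syllabus: 20% per late day once late days run out


def late_credit_multipliers(hours_late, total_late_days=TOTAL_LATE_DAYS):
    """Hypothetical helper: fraction of credit kept per homework, plus unused late days.

    hours_late: hours past the deadline for each homework, in submission order.
    """
    remaining = total_late_days
    multipliers = []
    for hours in hours_late:
        # Assumption: any partial day past the deadline counts as a full day.
        days = max(0, math.ceil(hours / 24))
        covered = min(days, remaining)  # late days are spent in submission order
        remaining -= covered
        uncovered = days - covered
        # Assumption: 20% of the whole assignment per uncovered day, floored at zero.
        multipliers.append(max(0.0, 1.0 - PENALTY_PER_DAY * uncovered))
    return multipliers, remaining


# The syllabus example (two homeworks, each three days late) uses all six days
# with no penalty; a third homework 30 hours (two days) late then keeps 60%.
print(late_credit_multipliers([72, 72, 30]))  # prints ([1.0, 1.0, 0.6], 0)
```

Rounding partial days up mirrors the statement that penalties are managed in whole-day increments; whether a partly covered assignment is handled this way is an assumption to confirm with the course staff.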

All graded components of the course will be rescaled proportionally into a final numerical grade, which will be mapped to a letter grade according to cutoffs based on the overall class grade distribution. The standard cutoffs are 90/80/70% for A/B/C, but we may curve up (never down), i.e., use lower cutoffs than these. These cutoffs can only be determined after we grade the final project at the end of the semester.
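The rescaling and cutoff rules can be sketched in Python. This is a minimal, unofficial illustration that assumes each component's score is expressed as the fraction of credit earned and uses the weights from the grading sections of this page; the dictionary keys and function names are hypothetical, and the real cutoffs are only set at the end of the semester.

```python
# Minimal sketch of the grading arithmetic; keys are hypothetical labels, not course tooling.
WEIGHTS = {  # percent of the final grade, as listed in this syllabus
    "project0": 5.0, "project1": 10.0, "project2": 10.0, "project3": 15.0,
    "pset0": 5.0, "pset1": 7.5, "pset2": 7.5,
    "midterm": 15.0, "participation": 5.0, "final_project": 20.0,
}
STANDARD_CUTOFFS = (90.0, 80.0, 70.0)  # A / B / C


def final_percentage(scores):
    """Weighted total in percent; scores maps each component to a fraction in [0, 1]."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def letter_grade(percent, cutoffs=STANDARD_CUTOFFS):
    """Map a final percentage to A/B/C. A curve may lower the cutoffs but never raise them."""
    if any(c > s for c, s in zip(cutoffs, STANDARD_CUTOFFS)):
        raise ValueError("cutoffs may only be lowered (curved up), never raised")
    for letter, cutoff in zip("ABC", cutoffs):
        if percent >= cutoff:
            return letter
    # The syllabus does not name grades below C, so none are invented here.
    return "below C"


example = {name: 0.85 for name in WEIGHTS}
print(letter_grade(final_percentage(example)))  # prints B
```

Because the weights sum to 100, the weighted total is already a percentage; curving up corresponds to passing lower values in `cutoffs`.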

Programming Assignments (Projects) - 40%

We plan to assign three programming assignments, in addition to the warmup Project 0, that provide hands-on experience implementing algorithms discussed during lecture. The assignments are in Python and make use of the NumPy and PyTorch libraries. These programming projects will require non-trivial computation; we recommend using Google's Colab platform, which provides access to GPUs. Completing these projects will require waiting for your models to train (anywhere from about 30 minutes to several hours, depending on the efficiency of your implementation), so we strongly recommend starting work on these programming assignments well in advance of the deadline. If you start working on an assignment the day before it is due, it is highly unlikely you will be able to complete it on time.

The portion of your final grade based on the programming assignments will be as follows:

  • Project 0 (5%)
  • Project 1 (10%)
  • Project 2 (10%)
  • Project 3 (15%)

Written Assignments (Problem Sets) - 20%

Written assignments are mostly mathematical. You can scan and upload your solutions to Gradescope. Please write answers clearly, as we won't be able to award credit for answers that are not legible.

The portion of your final grade based on the written assignments is as follows:

  • Problem Set 0 (5%)
  • Problem Set 1 (7.5%)
  • Problem Set 2 (7.5%)

Midterm Exam - 15%

The midterm will cover topics that are selected from the lectures, assigned reading, and homework assignments.

Participation - 5%

You will receive credit for asking and answering thoughtful questions related to the course content on Piazza, engaging in discussion in class and generally for participating in the class. There are many ways to show participation. Asking a question that is marked as a “good question” by an instructor on Piazza, or having an answer that is marked as an “endorsed answer” is one example. Asking insightful questions, and engaging in discussion during class is another example. Please be polite and respectful towards TAs and other students in the class.

Final Project - 20%

The final project is an open-ended assignment, with the goal of gaining experience applying the techniques presented in class to real-world datasets. Students should work in groups of 2-4. It is a good idea to discuss your planned project with the instructor and/or TAs to get feedback. The final project report should be 4 pages. The report should describe the problem you are solving, the data you are using, the technique you propose, and the baseline you compare against.

The grading rubric for the final project is as follows:

  • Clarity (1-5): For the reasonably well-prepared reader, is it clear what was done and why? Is the report well written and well structured?
  • Originality / Innovativeness (1-5): How original is the approach? Does this project break new ground in topic, methodology, or content? How exciting and innovative is the work that it describes?
  • Soundness / Correctness (1-5): First, is the technical approach sound and well chosen? Second, can one trust the claims of the report: are they supported by proper experiments, proofs, or other argumentation?
  • Meaningful Comparison (1-5): Does the author make clear where the problems and methods sit with respect to existing literature? Are any experimental results meaningfully compared with the best prior approaches?
  • Substance (1-5): Does this project have enough substance, or would it benefit from more ideas or results? Note that this question mainly concerns the amount of work; its quality is evaluated in other categories.
  • Overall (1-5)

Academic Integrity

Any assignment or exam that you hand in must be your own work (with the exception of group projects). However, talking with others to better understand the material is strongly encouraged. Copying a solution or letting someone copy your solution is considered cheating. Everything you hand in must be in your own words. Code you hand in must be written by you, with the exception of any code provided as part of the assignment. Any collaboration during an exam is considered cheating. Any student who is caught cheating will be reported to the Office of Student Integrity. Please don't take a chance: if you are having trouble understanding the material, let us know and we will be happy to help.