Course Description

Hands-on introduction to computational methods in empirical linguistic analysis and natural language processing. Topics include language modeling, text classification, linguistic annotation, computational semantics, and machine translation. Students will implement and apply computational models to real linguistic datasets, and conclude the course with a final project.



Schedule

Week 1 (3/30, 3/31): Introduction
Content: what is computational linguistics?, course overview and policies, Quest login, regular expressions
Materials: Before Class, In Class Wednesday, For Reference, Relevant Readings, Slides Part 1 and Part 2 (J+M)

Week 2 (4/5, 4/7): Text Normalization and Edit Distance
Content: tokenization, stemming, lemmatization, edit distance, dynamic programming
Materials: Before Class, Relevant Readings, Slides Part 1 (J+M)
Assignment 1: Edit Distance [due 4/11]

Week 3 (4/12, 4/14): Language Modeling
Content: n-grams, perplexity, maximum likelihood estimation, smoothing
Materials: Before Class, Relevant Readings, Slides Part 1 (J+M)
Assignment 2: Ngram LM [due 4/18]

Week 4 (4/19, 4/21): Classification Foundations
Content: supervised learning, naive bayes, data splits, evaluation metrics
Materials: Before Class, Relevant Readings, Slides Part 1 (J+M)
Assignment 3: Naive Bayes Classification [due 4/29]

Week 5 (4/26, 4/28): Linguistic Structure and Annotation
Content: NLP libraries, POS tagging, parsing, NER, crowdsourcing, annotator agreement, ethical and practical concerns
Materials: Software, Relevant Readings, Slides Part 1
Midterm Self-Evaluation [due 5/2]

Week 6 (5/3, 5/5): Computational Semantics 1
Content: association metrics, word sense disambiguation, semantic resources
Materials: Before Class, Relevant Readings
Assignment 4: Bias Audit [due 5/9]

Week 7 (5/10, 5/12): Computational Semantics 2
Content: vector space semantics and embedding models, similarity metrics
Materials: Before Class, Relevant Readings
Assignment 5: Semantic Similarity [due 5/21]

Week 8 (5/17, 5/19): Surveying the Landscape
Assignment 6: Group Presentations on CL/NLP Topic Areas [due in class 5/24]

Weeks 9-10 (5/24, 5/26, 6/2): State of the Art
Content: sequence models, neural networks, contextual embeddings, application areas
Materials: Before Class, For Reference, Relevant Readings, Slides
Final Project [due 6/10]
Final Self-Evaluation [due 6/10]


* Coronavirus Note *

We remain in a difficult situation in the world. Everyone is distracted and anxious to some degree, we're all facing challenges across the board, and our physical and mental health (and those of our loved ones) are the top priorities. I trust we will all be doing our best to adapt to the circumstances and giving each other a lot of leeway and understanding. Luckily enough, this class is well-suited to a remote format, and I think it can still be a fun and productive time that will hopefully provide a welcome antidote to cabin fever.

In this context, I am particularly willing to be understanding of problems that arise; in exchange I ask for communication on your part. If need be, please just let me know what's going on and I'll do what I can to help.


Materials

All course materials are available for free online. We will refer primarily to these two textbooks:

When readings are assigned, I encourage you to skim them before class for basic understanding rather than detail. When doing the assignments, refer back to the readings for the details.

Some of the course materials are drawn from or inspired by relevant courses at other institutions, which also serve as excellent resources and points of reference, including:


Structure

Lectures, Readings, and Videos

This course will incorporate components of a "partially flipped" course, where occasionally we will do some reading or watching of videos before class, and then spend time in class working together on the assignments. All lectures will be recorded on Zoom and available in Canvas.


Assignments

The coursework will be structured around assignments to be completed individually, roughly one per week (with a few breaks), aimed at giving you hands-on practice with the material from that week.

Each homework will have an autograder on Quest that helps you check the accuracy of your outputs, and Wes and I will read your assignments to provide qualitative feedback. We can be flexible with deadlines if circumstances require it, but it's important to stay on top of these assignments because we will be moving quickly from one topic to another.


Group Work and Peer Evaluation

A few of the Wednesdays throughout the quarter will include time for either in-class group work or peer evaluation of one another's assignments. I strongly encourage you to attend these synchronous class meetings if you are able!


Final Project

This course includes a final project component. I'm very open as to what this could be. Basically it's an opportunity for you to take a self-directed approach to learning more about some topic in this field. Midway through the quarter I'll ask you for ideas on what you might do, and I am always glad to consult on any questions you might have. In terms of structure, here are a few possibilities:

  • Develop and carry out an independent project applying methods from this class
  • Use techniques learned in class to advance your existing research
  • Carry out a detailed linguistic error analysis on the outputs of an NLP system or systems
  • Replicate a paper in the field (Rob will provide some examples of good papers for this)
  • Write up a literature survey on a topic in the field

Regardless, the requirement is to present a writeup in ACL 2020 format as well as your code (if any). At minimum, if your project involves substantial coding I expect a 2-4 page writeup explaining what you did; if your project is only written (e.g. lit survey or error analysis), I expect 6-8 pages.

Group projects of up to three members are allowed; however, I will expect the effort involved to scale roughly linearly with the number of group members. If you work in a group, you must include a paragraph at the end of your writeup explaining who did what.


Evaluation

Not a fan of grades, to be honest. Research has shown that traditional numerical/letter grades decrease intrinsic motivation and joy for learning, can undermine performance, and are potentially riddled with implicit bias. For more reading on this topic:

Therefore, grades go against my central goal for this course: getting you excited about and engaged with the wonderful world of computational linguistics. I am much more interested in helping you get what you want to out of the course through qualitative evaluation for your benefit. This will largely come in the form of written and in-person feedback on your work from Wes and me, as well as peer evaluation from your classmates.

In the interest of maintaining a healthy working relationship with the registrar, however, I will submit final grades at the end of the quarter. Below are the forms of evaluation we'll do and how much they'll contribute to what I end up submitting.


Self-evaluation (50%)

You know at least as well as I do how the course is going for you, so we'll have two self-evaluations, at the middle and end of the class. In the first week I'll send out a survey in which I ask you to explain your goals for the course; then for each self-evaluation I'll ask you to reflect on your process and progress towards those goals, your participation in the course, and ultimately to give yourself a grade and explain your reasoning.

In doing these evaluations, here are the kinds of questions I'll ask you to consider:

  • Did you turn in your assignments on time? (Note: for me, turning in assignments late with a good reason is equivalent to turning them in on time.)
  • Did you attend (or watch later) in-class lectures as much as you could? (Note: again, for me, missing class with a good reason is equivalent to attending.)
  • Did you keep up with readings, videos, and in-class activities?
  • Did you spend any allotted time in breakout rooms on work for this class?
  • Did your assignments run all the way through, and pass any tests?
  • Did you reach out for help when you needed it? (Note: doing this is positive!!!)
  • Did you collaborate with others to contribute to our classroom community (in breakout rooms, by helping on Ed, or outside of class)?
  • Did you challenge yourself, or did you do the minimum?

I hope to simply take your self-evaluation grades at face value, although if your self-evaluation disagrees significantly with my perception (in either direction) I may ask you to meet with me to hash out why our impressions differ.


Effortful Completion (50%)

Wes and I, in turn, will be watching your process, providing structures for learning, and trying to help keep you on track. At the end of the quarter, I'll give you a holistic grade for effortful completion of the assignments, peer code review, participation (e.g. on Ed), and your final project. My evaluation is also very liable to be influenced by your self-evaluation and report of your process and progress.


Inclusion Statement

I am committed to creating an inclusive environment that actively values the diversity of backgrounds, identities, and experiences of everyone in the classroom. I welcome you to talk with me if you have any feedback or if there's anything I can do to better support you. If you'd prefer to contact me anonymously you can do so using the form at the bottom of my faculty webpage.

University-Requested Syllabus Inclusions

This class or portions of this class will be recorded by the instructor for educational purposes. These recordings will be shared only with students enrolled in the course and will be deleted at the end of the Spring Quarter. Your instructor will communicate how you can access the recordings. [RV: On Canvas in the cloud recordings tab in the Zoom section.]

Unauthorized student recording of classroom or other academic activities (including advising sessions or office hours) is prohibited. Unauthorized recording is unethical and may also be a violation of University policy and state law. Students requesting the use of assistive technology as an accommodation should contact AccessibleNU. Unauthorized use of classroom recordings — including distributing or posting them — is also prohibited. Under the University’s Copyright Policy, faculty own the copyright to instructional materials — including those resources created specifically for the purposes of instruction, such as syllabi, lectures and lecture notes, and presentations. Students cannot copy, reproduce, display or distribute these materials. [RV: You're welcome to share slides, assignments, and other materials with anyone.] Students who engage in unauthorized recording, unauthorized use of a recording or unauthorized distribution of instructional materials will be referred to the appropriate University office for follow-up.


Contingency Plan

Given the tricky situation in the world at the moment, the university has asked us to provide contingency plans. If you receive a notification that I am ill and cannot teach, continue work on any assignments that have already been posted on the course website, following the normal weekly schedule. All of the textbook readings we are doing have exercises at the end of each chapter; for weeks where an assignment has not yet been posted, do a reasonable subset of the exercises at the end of the chapters for that week, and submit those you do as a Python file in your user directory on Quest. Zoom meetings at the normal times are pre-scheduled through the end of the quarter, and the links will still work, as will Piazza. I encourage you to coordinate and work with your classmates on Zoom or Piazza.