Course Description

Hands-on introduction to computational methods in empirical linguistic analysis and natural language processing. Topics include language modeling, text classification, linguistic annotation, computational semantics, and machine translation. Students will implement and apply computational models to real linguistic datasets, and conclude the course with a final project.


Learning Objectives

  • Understanding of key issues in several core areas of research in computational linguistics.
  • Ability to implement a set of foundational algorithms for linguistic understanding from scratch.
  • Experience with empirical evaluation of computational models, including error analysis.
  • Practical familiarity with software packages for language processing in Python.


Schedule

Week Dates Content Materials
1 9/22 Introduction
what is computational linguistics?, course overview and policies, Quest login

For Reference

Relevant Readings

Slides [1]

2 9/27
9/29
Regular Expressions, Text Normalization, and Edit Distance

regexes, tokenization, stemming, lemmatization, edit distance, dynamic programming

Before Class

In Class (as time allows)

Relevant Readings

Slides [1, 2]

Assignment 1: Edit Distance [due 10/5]

3 10/4
10/6
Language Modeling

n-grams, perplexity, maximum likelihood estimation, smoothing

Before Class

In Class (for reference)

Relevant Readings

Slides [1, 2]

Assignment 2: Ngram LM [due 10/12]

4 10/11
10/13
Classification Foundations

supervised learning, naive bayes, perceptron, generative vs. discriminative models, data splits, evaluation metrics

Before Class

Relevant Readings

Slides [1]

Assignment 3: Classification [due 10/19]

5 10/18
10/20
Linguistic Structure and Annotation

NLP libraries, POS tagging, parsing, NER, crowdsourcing, annotator agreement, ethical and practical concerns

Software

Relevant Readings

Slides [1]

Midterm Self-Evaluation [due 10/26]

6 10/25
10/27
Computational Semantics 1

association metrics, word sense disambiguation, semantic resources

Before Class

Relevant Readings

Slides [1]

Assignment 4: Bias Audit [due 11/2]

7 11/1
11/3
Computational Semantics 2

vector space semantics and embedding models, similarity metrics

Before Class

Relevant Readings

Slides [1, 2]

Assignment 5: Semantic Similarity [due 11/12]

8 11/8
11/10
Topic Models

unsupervised learning, graphical models basics, latent dirichlet allocation, gibbs sampling, k-means clustering, expectation-maximization algorithms

Before Class

Relevant Readings

Slides [1 (Blei, slides 18-41)]

Assignment 6: Topic Modeling

10 11/15
11/17
State of the Art

sequence models, neural networks, contextual embeddings, application areas

Before Class

In Class (Wednesday 11/17)

  • Final project idea generation, extra office hours (attendance optional)

For Reference

Relevant Readings

Slides [1]

9 11/15
11/17
Applied NLP

more classification models, feature engineering, software libraries, experimental design

Slides [1]

Final Project [due 12/9]

Final Self-Evaluation [due 12/9]


* Coronavirus Note *

We remain in a difficult situation in the world. Everyone is distracted and anxious to some degree, we're all facing challenges across the board, and our physical and mental health (and those of our loved ones) are the top priorities. I trust we will all be doing our best to adapt to the circumstances and giving each other a lot of leeway and understanding.

In this context, I am particularly willing to be understanding of problems that arise; in exchange I ask for communication on your part. If need be, please just let me know what's going on and I'll do what I can to help.


Materials

All course materials are available for free online. We will refer primarily to these two textbooks:

When readings are assigned, I encourage you to skim them before class for basic understanding rather than detail. When doing the assignments, refer back to them for the details.

Some of the course materials are drawn from or inspired by relevant courses at other institutions, which also serve as excellent resources and points of reference, including:


Structure

Lectures, Readings, and Videos

This course will incorporate components of a "partially flipped" course, where occasionally we will do some reading or watching of videos before class, and then spend time in class working together on the assignments. All lectures will be recorded on Zoom and available in Canvas.


Assignments

The coursework will be structured around assignments to be completed individually, roughly one per week (with a few breaks), aimed at giving you hands-on practice with the material from that week.

Each homework will have an autograder on Quest that helps you check the accuracy of your outputs, and Thomas and I will read your assignments to provide qualitative feedback. We can be flexible with deadlines if circumstances arise, but it's important to stay on top of these assignments because we will be moving quickly from one topic to another.


Group Work and Peer Evaluation

A few of the Wednesdays throughout the quarter will include time for either in-class group work or peer evaluation of one another's assignments. I strongly encourage you to attend these synchronous class meetings if you are able!


Final Project

This course includes a final project component. I'm very open as to what this could be. Basically it's an opportunity for you to take a self-directed approach to learning more about some topic in this field. Midway through the quarter I'll ask you for ideas on what you might do, and I am always glad to consult on any questions you might have. In terms of structure, here are a few possibilities:

  • Develop and carry out an independent project applying methods from this class
  • Use techniques learned in class to advance your existing research
  • Carry out a detailed linguistic error analysis on the outputs of an NLP system or systems
  • Replicate a paper in the field (Rob will provide some examples of good papers for this)
  • Write up a literature survey on a topic in the field

Regardless, the requirement is to present a writeup in ACL 2020 format as well as your code (if any). At minimum, if your project involves substantial coding I expect a 2-4 page writeup explaining what you did; if your project is only written (e.g. lit survey or error analysis), I expect 6-8 pages.

Group projects of up to three members are allowed; however, I will expect the effort involved to scale roughly linearly with the number of group members. If you work in a group, you must include a paragraph at the end of your writeup explaining who did what.


Evaluation

Not a fan of grades, to be honest. Research has shown that traditional numerical/letter grades decrease intrinsic motivation and joy for learning, can undermine performance, and are potentially riddled with implicit bias. For more reading on this topic:

Therefore, grades go against my central goal for this course: getting you excited about and engaged with the wonderful world of computational linguistics. I am much more interested in helping you get what you want out of the course through qualitative evaluation for your benefit. This will largely come in the form of written and in-person feedback on your work from Thomas and me, as well as peer evaluation from your classmates.

In the interest of maintaining a healthy working relationship with the registrar, however, I will submit final grades at the end of the quarter. Below are the forms of evaluation we'll do and how much they'll contribute to what I end up submitting.


Self-evaluation (50%)

You know at least as well as I do how the course is going for you, so we'll have two self-evaluations, at the middle and end of the class. In the first week I'll send out a survey in which I ask you to explain your goals for the course; then for each self-evaluation I'll ask you to reflect on your process and progress towards those goals, your participation in the course, and ultimately to give yourself a grade and explain your reasoning.

In doing these evaluations, here are the kinds of questions I'll ask you to consider:

  • Did you turn in your assignments on time? (Note: for me, turning in assignments late with a good reason is equivalent to turning them in on time.)
  • Did you attend (or watch later) in-class lectures as much as you could? (Note: again, for me, missing class with a good reason is equivalent to attending.)
  • Did you keep up with readings, videos, and in-class activities?
  • Did you spend any allotted time in breakout rooms on work for this class?
  • Did your assignments run all the way through, and pass any tests?
  • Did you reach out for help when you needed it? (Note: doing this is positive!!!)
  • Did you collaborate with others to contribute to our classroom community (in breakout rooms, by helping on Ed, or outside of class)?
  • Did you challenge yourself, or did you do the minimum?

I hope to simply take your self-evaluation grades at face value, although if your self-evaluation disagrees significantly with my perception (in either direction) I may ask you to meet with me to hash out why our impressions differ.


Effortful Completion (50%)

Thomas and I, in turn, will be watching your process, providing structures for learning, and trying to help keep you on track. At the end of the quarter, I'll give you a holistic grade for effortful completion of the assignments, peer code review, participation (e.g. on Ed), and your final project. My evaluation is also very liable to be influenced by your self-evaluation and report of your process and progress.


Inclusion Statement

I am committed to creating an inclusive environment that actively values the diversity of backgrounds, identities, and experiences of everyone in the classroom. I welcome you to talk with me if you have any feedback or if there's anything I can do to better support you. If you'd prefer to contact me anonymously you can do so using the form at the bottom of my faculty webpage.

University-Requested Syllabus Inclusions

Academic Integrity Statement

Students in this course are required to comply with the policies found in the booklet, "Academic Integrity at Northwestern University: A Basic Guide". All papers submitted for credit in this course must be submitted electronically unless otherwise instructed by the professor. Your written work may be tested for plagiarized content. For details regarding academic integrity at Northwestern or to download the guide, visit: https://www.northwestern.edu/provost/policies/academic-integrity/index.html


Accessibility Statement

Northwestern University is committed to providing the most accessible learning environment possible for students with disabilities. Should you anticipate or experience disability-related barriers in the academic setting, please contact AccessibleNU to move forward with the university’s established accommodation process (e: accessiblenu@northwestern.edu; p: 847-467-5530). If you already have established accommodations with AccessibleNU, please let me know as soon as possible, preferably within the first two weeks of the term, so we can work together to implement your disability accommodations. Disability information, including academic accommodations, is confidential under the Family Educational Rights and Privacy Act.


COVID-19 Classroom Expectations Statement

Students, faculty, and staff must comply with University expectations regarding appropriate classroom behavior, including those outlined below and in the COVID-19 Code of Conduct. With respect to classroom procedures, this includes:

  • Policies regarding masking and social distancing evolve as the public health situation changes. Students are responsible for understanding and complying with current masking, testing, Symptom Tracking, and social distancing requirements.
  • In some classes, masking and/or social distancing may be required as a result of an Americans with Disabilities Act (ADA) accommodation for the instructor or a student in the class even when not generally required on campus. In such cases, the instructor will notify the class.
  • No food is allowed inside classrooms. Drinks are permitted, but please keep your face covering on and use a straw.
  • Faculty may assign seats in some classes to help facilitate contact tracing in the event that a student tests positive for COVID-19. Students must sit in their assigned seats.

If a student fails to comply with the COVID-19 Code of Conduct or other University expectations related to COVID-19, the instructor may ask the student to leave the class. The instructor is asked to report the incident to the Office of Community Standards for additional follow-up.


COVID-19 Testing Compliance Statement

To protect the health of our community, Northwestern University requires unvaccinated students who are in on-campus programs to be tested for COVID-19 twice per week.

Students who fail to comply with current or future COVID-19 testing protocols will be referred to the Office of Community Standards to face disciplinary action, including escalation up to restriction from campus and suspension.


Exceptions to Class Modality

Class sessions for this course will occur in person. Individual students will not be granted permission to attend remotely except as the result of an Americans with Disabilities Act (ADA) accommodation as determined by AccessibleNU.

Maintaining the health of the community remains our priority. If you are experiencing any symptoms of COVID, do not attend class, and update your Symptom Tracker application right away to connect with Northwestern’s Case Management Team for guidance on next steps. Also contact the instructor as soon as possible to arrange to complete coursework.

Students who experience a personal emergency should contact the instructor as soon as possible to arrange to complete coursework.

Should public health recommendations prevent in-person class from being held on a given day, the instructor or the university will notify students.


Course Recordings

This class or portions of this class will be recorded by the instructor for educational purposes and made available to the class during the quarter. Your instructor will communicate how you can access the recordings. [RV: On Canvas in the Panopto section.] Portions of the course that contain images, questions or commentary/discussion by students will be edited out of any recordings that are saved beyond the current term.


Prohibition of Recording of Class Sessions by Students

Unauthorized student recording of classroom or other academic activities (including advising sessions or office hours) is prohibited. Unauthorized recording is unethical and may also be a violation of University policy and state law. Students requesting the use of assistive technology as an accommodation should contact AccessibleNU. Unauthorized use of classroom recordings – including distributing or posting them – is also prohibited. Under the University’s Copyright Policy, faculty own the copyright to instructional materials – including those resources created specifically for the purposes of instruction, such as syllabi, lectures and lecture notes, and presentations. Students cannot copy, reproduce, display, or distribute these materials. Students who engage in unauthorized recording, unauthorized use of a recording, or unauthorized distribution of instructional materials will be referred to the appropriate University office for follow-up.


Support for Wellness and Mental Health

Northwestern University is committed to supporting the wellness of our students. Student Affairs has multiple resources to support student wellness and mental health. If you are feeling distressed or overwhelmed, please reach out for help. Students can access confidential resources through the Counseling and Psychological Services (CAPS), Religious and Spiritual Life (RSL) and the Center for Awareness, Response and Education (CARE). Additional information on all of the resources mentioned above can be found here:

  • https://www.northwestern.edu/counseling/
  • https://www.northwestern.edu/religious-life/
  • https://www.northwestern.edu/care/