Course Description
Hands-on introduction to computational methods in
empirical linguistic analysis and natural language
processing. Topics include language modeling, text
classification, linguistic annotation, computational
semantics, and machine translation. Students will implement
and apply computational models to real linguistic datasets,
and conclude the course with a final project.
Learning Objectives
- Understanding of key issues in several core areas of research in computational linguistics.
- Ability to implement a set of foundational algorithms for linguistic understanding from scratch.
- Experience with empirical evaluation of computational models, including error analysis.
- Practical familiarity with software packages for language processing in Python.
Schedule
Week | Dates | Content | Materials |
1 | 9/22 | Introduction: what is computational linguistics?, course overview and policies, Quest login | For Reference; Relevant Readings; Slides [1] |
2 | 9/27, 9/29 | Regular Expressions, Text Normalization, and Edit Distance: regexes, tokenization, stemming, lemmatization, edit distance, dynamic programming | Before Class; In Class (as time allows); Relevant Readings; Slides [1, 2]; Assignment 1: Edit Distance [due 10/5] |
3 | 10/4, 10/6 | Language Modeling: n-grams, perplexity, maximum likelihood estimation, smoothing | Before Class; In Class (for reference); Relevant Readings; Slides [1, 2]; Assignment 2: Ngram LM [due 10/12] |
4 | 10/11, 10/13 | Classification Foundations: supervised learning, naive Bayes, perceptron, generative vs. discriminative models, data splits, evaluation metrics | Before Class; Relevant Readings; Slides [1]; Assignment 3: Classification [due 10/19] |
5 | 10/18, 10/20 | Linguistic Structure and Annotation: NLP libraries, POS tagging, parsing, NER, crowdsourcing, annotator agreement, ethical and practical concerns | Software; Relevant Readings; Slides [1]; Midterm Self-Evaluation [due 10/26] |
6 | 10/25, 10/27 | Computational Semantics 1: association metrics, word sense disambiguation, semantic resources | Before Class; Relevant Readings; Slides [1]; Assignment 4: Bias Audit [due 11/2] |
7 | 11/1, 11/3 | Computational Semantics 2: vector space semantics and embedding models, similarity metrics | Before Class; Relevant Readings; Slides [1, 2]; Assignment 5: Semantic Similarity [due 11/12] |
8 | 11/8, 11/10 | Topic Models: unsupervised learning, graphical models basics, latent Dirichlet allocation, Gibbs sampling, k-means clustering, expectation-maximization algorithms | Before Class; Relevant Readings; Slides [1 (Blei, slides 18-41)]; Assignment 6: Topic Modeling |
10 | 11/15, 11/17 | State of the Art: sequence models, neural networks, contextual embeddings, application areas | Before Class; In Class (Wednesday 11/17): final project idea generation, extra OH (attendance optional); For Reference; Relevant Readings; Slides [1] |
9 | 11/15, 11/17 | Applied NLP: more classification models, feature engineering, software libraries, experimental design | Slides [1]; Final Project [due 12/9]; Final Self-Evaluation [due 12/9] |
* Coronavirus Note *
We remain in a difficult situation in the world. Everyone is
distracted and anxious to some degree; we're all facing
challenges across the board, and our physical and mental health
(and those of our loved ones) are the top priorities. I trust we
will all be doing our best to adapt to the circumstances and
giving each other a lot of leeway and understanding.
In
this context, I am particularly willing to be understanding of
problems that arise; in exchange I ask for communication on your
part. If need be, please just let me know what's going on and
I'll do what I can to help.
Materials
All
course materials are available for free online. We will refer
primarily to these two textbooks:
When readings are assigned, I encourage you to skim them before
class for basic understanding rather than detail. When doing the
assignments, refer back to the readings for the details.
Some of the
course materials are drawn from or inspired by relevant courses
at other institutions, which also serve as excellent resources
and points of reference, including:
Structure
Lectures, Readings, and Videos
This course will
incorporate components of a "partially flipped" course,
where occasionally we will do some reading or watching of
videos before class, and then spend time in class working
together on the assignments. All lectures will be recorded
on Zoom and available in Canvas.
Assignments
The coursework will be structured
around assignments to be completed individually, roughly one
per week (with a few breaks), aimed at giving you hands-on
practice with the material from that week.
Each
homework will have an autograder on Quest that helps you
check the accuracy of your outputs, and Thomas and I will
read your assignments to provide qualitative feedback. We
can be flexible with deadlines if circumstances arise, but
it's important to stay on top of these assignments because
we will be moving quickly from one topic to another.
Group Work and Peer Evaluation
A few of the
Wednesdays throughout the quarter will include time for
either in-class group work or peer evaluation of one
another's assignments. I strongly encourage you to attend
these synchronous class meetings if you are able!
Final Project
This course includes a final
project component. I'm very open as to what this could be.
Basically it's an opportunity for you to take a
self-directed approach to learning more about some topic in
this field. Midway through the quarter I'll ask you for
ideas on what you might do, and I am always glad to consult on
any questions you might have. In terms of structure, here are a
few possibilities:
- Develop and carry out
an independent project applying methods from this class
- Use techniques learned in class to advance your existing
research
- Carry out a detailed linguistic error
analysis on the outputs of an NLP system or systems
- Replicate a paper in the field (Rob will provide some
examples of good papers for this)
- Write up a
literature survey on a topic in the field
Regardless, the requirement is to present a writeup in
ACL 2020 format as well as your code (if any). At minimum,
if your project involves substantial coding I expect a 2-4
page writeup explaining what you did; if your project is
only written (e.g. lit survey or error analysis), I expect
6-8 pages.
Group projects of up to three members
are allowed; however, I will expect the effort involved to
scale roughly linearly with the number of group members. If
you work in a group, you must include a paragraph at the end
of your writeup explaining who did what.
Evaluation
Not
a fan of grades, to be honest. Research has shown that
traditional numerical/letter grades decrease
intrinsic motivation and joy for learning, can
undermine performance, and are
potentially riddled with implicit bias. For more reading on
this topic:
Therefore, grades go
against my central goal for this course: getting you excited
about and engaged with the wonderful world of computational
linguistics. I am much more interested in helping you get what
you want out of the course through qualitative evaluation for
your benefit. This will largely come in the form of written and
in-person feedback on your work from Thomas and me, as well as
peer evaluation from your classmates.
In the interest of
maintaining a healthy working relationship with the registrar,
however, I will submit final grades at the end of the quarter.
Below are the forms of evaluation we'll do and how much they'll
contribute to what I end up submitting.
Self-evaluation (50%)
You know at least as
well as I do how the course is going for you, so we'll have
two self-evaluations, at the middle and end of the class. In
the first week I'll send out a survey in which I ask you to
explain your goals for the course; then for each
self-evaluation I'll ask you to reflect on your process and
progress towards those goals, your participation in the
course, and ultimately to give yourself a grade and explain
your reasoning.
In doing these evaluations, here
are the kinds of questions I'll ask you to consider:
- Did you turn in your assignments on time? (Note: for me,
turning in assignments late with a good reason is equivalent
to turning them in on time.)
- Did you attend (or
watch later) in-class lectures as much as you could? (Note:
again, for me, missing class with a good reason is
equivalent to attending.)
- Did you keep up with
readings, videos, and in-class activities?
- Did you
spend any allotted time in breakout rooms on work for this
class?
- Did your assignments run all the way
through, and pass any tests?
- Did you reach out for
help when you needed it? (Note: doing this is
positive!!!)
- Did you collaborate with others to
contribute to our classroom community (in breakout rooms, by
helping on Ed, or outside of class)?
- Did you
challenge yourself, or did you do the minimum?
I hope to simply take your self-evaluation grades at
face value, although if your self-evaluation disagrees
significantly with my perception (in either direction) I may
ask you to meet with me to hash out why our impressions
differ.
Effortful Completion (50%)
Thomas and I, in turn, will be watching your process,
providing structures for learning, and trying to help keep
you on track. At the end of the quarter, I'll give you a
holistic grade for effortful completion of the assignments,
peer code review, participation (e.g. on Ed), and your final
project. My evaluation is also very liable to be influenced
by your self-evaluation and report of your process and
progress.
Inclusion Statement
I am
committed to creating an inclusive environment that actively
values the diversity of backgrounds, identities, and experiences
of everyone in the classroom. I welcome you to talk with me if
you have any feedback or if there's anything I can do to better
support you. If you'd prefer to contact me anonymously you can
do so using the form at the bottom of my faculty webpage.
University-Requested Syllabus Inclusions
Academic Integrity Statement
Students in this
course are required to comply with the policies found in the
booklet, "Academic Integrity at Northwestern University: A
Basic Guide". All papers submitted for credit in this course
must be submitted electronically unless otherwise instructed
by the professor. Your written work may be tested for
plagiarized content. For details regarding academic integrity
at Northwestern or to download the guide, visit:
https://www.northwestern.edu/provost/policies/academic-integrity/index.html
Accessibility Statement
Northwestern
University is committed to providing the most accessible
learning environment possible for students with
disabilities. Should you anticipate or experience
disability-related barriers in the academic setting, please
contact AccessibleNU to move forward with the university’s
established accommodation process (e:
accessiblenu@northwestern.edu; p: 847-467-5530). If you
already have established accommodations with AccessibleNU,
please let me know as soon as possible, preferably within the
first two weeks of the term, so we can work together to
implement your disability accommodations. Disability
information, including academic accommodations, is
confidential under the Family Educational Rights and Privacy
Act.
COVID-19 Classroom Expectations Statement
Students, faculty, and staff must comply with University
expectations regarding appropriate classroom behavior,
including those outlined below and in the COVID-19 Code of
Conduct. With respect to classroom procedures, this includes:
- Policies regarding masking and social distancing
evolve as the public health situation changes. Students are
responsible for understanding and complying with current
masking, testing, Symptom Tracking, and social distancing
requirements.
- In some classes, masking and/or social
distancing may be required as a result of an Americans with
Disabilities Act (ADA) accommodation for the instructor or a
student in the class even when not generally required on
campus. In such cases, the instructor will notify the
class.
- No food is allowed inside classrooms. Drinks
are permitted, but please keep your face covering on and use a
straw.
- Faculty may assign seats in some classes to
help facilitate contact tracing in the event that a student
tests positive for COVID-19. Students must sit in their
assigned seats.
If a student fails to
comply with the COVID-19 Code of Conduct or other University
expectations related to COVID-19, the instructor may ask the
student to leave the class. The instructor is asked to report
the incident to the Office of Community Standards for
additional follow-up.
COVID-19 Testing Compliance Statement
To
protect the health of our community, Northwestern University
requires unvaccinated students who are in on-campus programs
to be tested for COVID-19 twice per week.
Students
who fail to comply with current or future COVID-19 testing
protocols will be referred to the Office of Community
Standards to face disciplinary action, including escalation up
to restriction from campus and suspension.
Exceptions to Class Modality
Class sessions for
this course will occur in person. Individual students will not
be granted permission to attend remotely except as the result of
an Americans with Disabilities Act (ADA) accommodation as
determined by AccessibleNU.
Maintaining the health of the community remains our priority.
If you are experiencing any symptoms of COVID, do not attend class
and update your Symptom Tracker application right away to connect
with Northwestern’s Case Management Team for guidance on next steps.
Also contact the instructor as soon as possible to arrange to
complete coursework.
Students who experience a personal
emergency should contact the instructor as soon as possible to
arrange to complete coursework.
Should public health
recommendations prevent in-person class from being held on a given
day, the instructor or the university will notify students.
Course Recordings
This class or portions of this class will be recorded by
the instructor for educational purposes and made available to the
class during the quarter. Your instructor will communicate
how you can access the recordings. [RV: On Canvas in the
Panopto section.] Portions of the course that contain
images, questions or commentary/discussion by students will
be edited out of any recordings that are saved beyond the
current term.
Prohibition of Recording of Class Sessions by Students
Unauthorized student recording of classroom or other academic activities
(including advising sessions or office hours) is prohibited. Unauthorized
recording is unethical and may also be a violation of University policy and
state law. Students requesting the use of assistive technology as an
accommodation should contact AccessibleNU. Unauthorized use of classroom
recordings – including distributing or posting them – is also prohibited. Under
the University’s Copyright Policy, faculty own the copyright to instructional
materials – including those resources created specifically for the purposes of
instruction, such as syllabi, lectures and lecture notes, and presentations.
Students cannot copy, reproduce, display, or distribute these materials.
Students who engage in unauthorized recording, unauthorized use of a recording,
or unauthorized distribution of instructional materials will be referred to the
appropriate University office for follow-up.
Support for Wellness and Mental Health
Northwestern University is committed to supporting the
wellness of our students. Student Affairs has multiple
resources to support student wellness and mental health. If
you are feeling distressed or overwhelmed, please reach out
for help. Students can access confidential resources through
the Counseling and Psychological Services (CAPS), Religious
and Spiritual Life (RSL), and the Center for Awareness,
Response and Education (CARE). Additional information on all
of the resources mentioned above can be found here:
- https://www.northwestern.edu/counseling/
- https://www.northwestern.edu/religious-life/
- https://www.northwestern.edu/care/