Course Description

This course offers a practical introduction to programming and the analysis of natural language text as quantitative data. It is aimed at linguists, social scientists, and humanists with little-to-no programming background. Students will gain hands-on familiarity with Unix command line tools for text processing, basic programming and web scraping in Python, and algorithmic thinking concepts like abstraction and decomposition. We will study how to clean and organize linguistic datasets, and how to apply methods from computational linguistics including regular expressions, syntactic parsing, and vector representations of meaning. The course will conclude with a final project in which students curate and analyze a new dataset.


Schedule

Week Dates Content Materials
1 4/6
4/8
Intro, Unix, Shell, Environment, Files
course overview, terminal, Quest login, ssh, man, cd, ls, mkdir, mv, cp, echo

Before Class

For Reference

Assignment 1

Slides from in-class lecture

2 4/13
4/15
Command-line Text Processing

less, cat, wget, emacs, I/O redirection, text filters (sed, sort, uniq, tr, head, tail, cut), scripting

Before Class

For Reference

Assignment 2 [solutions]

Slides from in-class lecture

3-4 4/20
4/22
4/27
Basic Python 1

REPL, variables, types, errors, print, control flow, loops, functions, built-ins, range

Before Class

  • Think Python, Chapters 1, 2, 3, 5.1-5.7, 6.1-6.4, 7, 8, 10 (reading, ~50 pages)

For Reference

Assignment 3 [solutions]

Slides from in-class lecture

4-5 4/29
5/4
5/6
Basic Python 2

standard libraries (random, string), sets, dictionaries

Before Class

  • Think Python, Chapters 9, 11, 12, 13, 14 (reading, ~40 pages)

For Reference

Assignment 4 [solutions]

Slides [1, 2] from in-class lecture

6 5/11
5/13
Basic Python 3

jupyter, pickle, defaultdict, Counter, reading complex files

Before Class

For Reference

Assignment 5 [Solutions, and as HTML]

Slides [1, 2] from in-class lecture

7-8 5/18
5/20
5/27
Python for Text 1

json, csv, regular expressions, external libraries (NLTK, spacy), tokenization, lemmatization, POS tagging

Before Class

In Class

For Reference

Assignment 6 [due Friday 5/29]

9 6/1
6/3
Python for Text 2

dependency syntax, n-grams, word vectors, text classification

Before Class

For Reference

Assignment 7 / Final Project
[due Tuesday 6/9, exam day]


* Coronavirus Note *

This is a wild situation. Everyone is distracted and anxious to some degree, we're all facing new challenges across the board, and our physical and mental health (and those of our loved ones) are the top priorities. I trust we will all be doing our best to adapt to the circumstances and giving each other a lot of leeway and understanding. Here's a reasonable approximation of how I'm thinking of this from a friend of a friend.

Luckily, this class is well-suited to a remote format, and I think it can still be a fun and productive time that will hopefully provide a welcome antidote to cabin fever. Please just let me know if you run into any difficulties and I'll do what I can to help.


Materials

All course materials are available for free online. In addition to informative videos from assorted sources, we'll use these free ebooks:

When readings are assigned, I encourage you to skim them before class for basic understanding rather than detail. When doing the assignments, refer back to them for the details of syntax and so on.

Some of the course materials are drawn from or inspired by relevant courses at other institutions, which also serve as excellent resources and points of reference, including:

Lastly, there are many useful websites to know about for more practice with the skills we'll learn in this class. Here are a few:


Structure

Lectures, Readings, and Videos

This course is intended as a "partially flipped" course, where we will do some reading or watching of videos before class, and then spend the majority of time in class working together on the assignments. Each week in the schedule above has material listed as "Before Class" which you'll be expected to watch/read before class that week. I may also give short lectures at the beginning of class which will be recorded as well, with links posted here on the syllabus.


Assignments

The coursework will be structured around weekly assignments aimed at giving you practice with the material from that week. We will work on these in class together, and my goal is that by the end of our Wednesday class you'll have finished a good deal of the week's assignment if not the entire thing.

Assignments will often have a "core" section and an "extra" section - I expect you to complete the core section each week, and the extra section is available if you finish early or want more practice. I'll evaluate your assignments weekly and sometimes ask you to re-submit if large changes are necessary.


Peer Code Review

Reading other people's code is a super important programming skill that not only helps you collaborate with others, but also lets you stand on the shoulders of giants by reading and learning from code out in the world. Therefore, in this course we'll practice this skill by reviewing your classmates' code! Approximately twice throughout the course (when exactly TBD) I'll randomly pair up students and ask you each to review the other's code. You can comment on correctness and coding style, suggest easier ways to do the same thing, or let them know if you learned something from reading their code.


Final Project

This course has a final project component, where you'll use your new skills to do something useful and interesting to you. I'm extremely open as to what this could be. Possible examples include:

  • Scraping a website with interesting language on it and processing it into an organized format (e.g. JSON or CSV)
  • Using an API to obtain an interesting set of language data (e.g. from Twitter or Reddit) and processing it into an organized format
  • Writing a program to perform some useful operation on linguistic data, for instance munging interview transcripts or Praat textgrids
  • Making progress on organizing/cleaning/manipulating language data related to your research

During week 5 I'll ask you to submit a (very short) final project proposal where you tell me what you're planning to do. If you're struggling to come up with something, email me or come to office hours to talk out some possibilities.
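
To give a rough sense of what "processing it into an organized format" could look like in the project examples above, here is a minimal sketch using only the Python standard library. The sample sentence and the function names (count_words, to_json, to_csv) are made up for illustration; a real project would start from your own scraped or collected text and may well be organized differently.

```python
import csv
import io
import json
import re
from collections import Counter


def count_words(text):
    """Lowercase the text, pull out simple word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def to_json(counts):
    """Serialize the counts as a JSON object mapping word -> frequency."""
    return json.dumps(dict(counts), sort_keys=True)


def to_csv(counts):
    """Serialize the counts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "count"])
    # Sort by descending count, then alphabetically, so the output is stable.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([word, n])
    return buffer.getvalue()


# Hypothetical sample text standing in for scraped data.
sample = "The cat sat on the mat. The mat was flat."
counts = count_words(sample)
print(to_json(counts))
print(to_csv(counts))
```

The tokenizer here is deliberately crude (a single regular expression); libraries like NLTK or spaCy, which appear later in the schedule, handle tokenization far more carefully.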


Evaluation

Not a fan of grades, to be honest. Research has shown that traditional numerical/letter grades decrease intrinsic motivation and joy for learning, can undermine performance, and are potentially riddled with implicit bias. For more reading on this topic:

Therefore, grades go against my central goal for this course: getting you excited about and engaged with the wonderful world of programming and text processing. I am much more interested in helping you get what you want out of the course through qualitative evaluation for your benefit. This will largely come in the form of written and in-person feedback on your work from me, as well as peer evaluation from your classmates.

In the interest of maintaining a healthy working relationship with the registrar, however, I will submit final grades at the end of the quarter. Below are the forms of evaluation we'll do and how much they'll contribute to what I end up submitting.

Update: Northwestern has moved to mandatory Pass/No Pass grading for this quarter. Hooray!


Self-evaluation (50%)

You know at least as well as I do how the course is going for you, so we'll have two self-evaluations, at the middle and end of the class. In the first week we'll have an activity in which I ask you to explain your goals for the course; then for each self-evaluation I'll ask you to reflect on your process and progress towards those goals, your participation in the course, and ultimately to give yourself a grade and explain your reasoning.

I hope to simply take your self-evaluation grades at face value, although if your self-evaluation disagrees significantly with my perception (in either direction) I may ask you to meet with me to hash out why our impressions differ.


Effortful Completion (50%)

Learning programming is about leaning into the struggle, embracing bugs and errors, and pushing through the frustration of getting your brain to think in a new and weird way. I am confident that you'll ultimately be able to get through all the assignments, but it'll take some real effort. If you're struggling to figure something out, the steps to take are:

  • Carefully observe and think about the errors you're getting, double-check your code, try again
  • Read the relevant documentation (e.g. man pages or Python documentation)
  • Come to (zoom) office hours or post asking for help on Piazza

If all else fails, write in the assignment a description of what you tried and what isn't working.

Each week will build on the material of the previous weeks, so this course very much relies on you keeping up throughout. At the end of the quarter, I'll give you a holistic grade for effortful completion of the assignments, peer code review, participation (e.g. on Piazza), and your final project. If you participate and put in a solid effort on the work for the course, I anticipate that you'll get full credit.


Inclusion Statement

I am committed to creating an inclusive environment that actively values the diversity of backgrounds, identities, and experiences of everyone in the classroom. I welcome you to talk with me if you have any feedback or if there's anything I can do to better support you. If you'd prefer to contact me anonymously you can do so using the form at the bottom of my faculty webpage.

University-Requested Syllabus Inclusions

This class or portions of this class will be recorded by the instructor for educational purposes. These recordings will be shared only with students enrolled in the course and will be deleted at the end of the Spring Quarter. Your instructor will communicate how you can access the recordings. [RV: On Canvas in the cloud recordings tab in the Zoom section.]

Unauthorized student recording of classroom or other academic activities (including advising sessions or office hours) is prohibited. Unauthorized recording is unethical and may also be a violation of University policy and state law. Students requesting the use of assistive technology as an accommodation should contact AccessibleNU. Unauthorized use of classroom recordings — including distributing or posting them — is also prohibited. Under the University’s Copyright Policy, faculty own the copyright to instructional materials — including those resources created specifically for the purposes of instruction, such as syllabi, lectures and lecture notes, and presentations. Students cannot copy, reproduce, display or distribute these materials. [RV: You're welcome to share slides, assignments, and other materials with anyone.] Students who engage in unauthorized recording, unauthorized use of a recording or unauthorized distribution of instructional materials will be referred to the appropriate University office for follow-up.


Contingency Plan

Given the tricky situation in the world at the moment, the university has asked us to provide contingency plans. If you receive a notification that I am ill and cannot teach, continue work on any assignments that have already been posted on the course website on the normal weekly schedule. All of the textbook readings we are doing have exercises at the end of each chapter; for weeks where an assignment has not yet been posted, do a reasonable subset of the exercises at the end of the chapters for that week, and submit those you do as a Python file in your user directory on Quest. Zoom meetings at the normal times are pre-scheduled through the end of the quarter, and the links will still work, as will Piazza. I encourage you to coordinate and work with your classmates on Zoom or Piazza.