Course Description

This course offers a practical introduction to programming and the analysis of natural language text as quantitative data. It is aimed at linguists, social scientists, and humanists with little to no programming background. Students will gain hands-on familiarity with Unix command line tools for text processing, basic programming and web scraping in Python, and algorithmic thinking concepts like abstraction and decomposition. We will study how to clean and organize linguistic datasets, and how to apply methods from computational linguistics including regular expressions, syntactic parsing, and vector representations of meaning. The course will conclude with a final project in which students curate and analyze a new dataset.


Learning Objectives

  • Practical familiarity with command line text manipulation in Unix and basic programming in Python.
  • Understanding of basic tasks in computational text processing and familiarity with relevant software packages.
  • Introductory understanding of algorithmic thinking and problem decomposition for computational analysis of language-related questions.


Schedule

Week Dates Content Materials
1 1/5 Intro, Unix, Shell, Environment, Files
course overview, terminal, Quest login, ssh, man, cd, ls, mkdir, mv, cp, echo

Outside Class

For Reference

Assignment 1 [due 1/11]

Slides from in-class lecture

2 1/10, 1/12 Command-line Text Processing

less, cat, wget, emacs, I/O redirection, text filters (sed, sort, uniq, tr, head, tail, cut), scripting

Outside Class

For Reference

Assignment 2 [due 1/18]

Slides from in-class lecture

3-4 1/17, 1/19, 1/24 Basic Python 1

REPL, variables, types, errors, print, control flow, loops, functions, built-ins, range

Outside Class

  • Think Python, Chapters 1, 2, 3, 5.1-5.7, 6.1-6.4, 7, 8, 10 (reading, ~50 pages)

For Reference

Assignment 3 [due 1/27]

Slides from in-class lecture

4-5 1/26, 1/31, 2/2 Basic Python 2

standard libraries (random, string), sets, dictionaries

Outside Class

  • Think Python, Chapters 9, 11, 12, 13, 14 (reading, ~40 pages)

For Reference

Assignment 4 [due 2/5]

Slides and Part 2 from in-class lecture

Midterm Self-Evaluation [due 2/8]

6-7 2/7, 2/9, 2/14, 2/16 Basic Python 3

assignment review, jupyter, pickle, defaultdict, Counter, reading complex files

Outside Class

For Reference

Assignment 5 [due 2/20]

Slides and Part 2 from in-class lecture

8-9 2/21, 2/23, 2/28 Python for Text 1

json, csv, regular expressions, external libraries (NLTK, spacy), tokenization, lemmatization, POS tagging

Outside Class

In Class

For Reference

Assignment 6 [due 3/3]

Slides from in-class lecture

9-10 3/2, 3/7 Python for Text 2

dependency syntax, n-grams, word vectors, text classification; research, applications, where to go from here?

Outside Class

For Reference

Assignment 7 - Final Project [due 3/15, exam day]

Slides and Part 2 from in-class lecture


Materials

All course materials are available for free online. In addition to informative videos from assorted sources, we'll use these free ebooks:

When readings are assigned before class, I encourage you to skim for basic understanding rather than detail. When doing the assignments, refer back to the readings for details of the syntax and so on.

Some of the course materials are drawn from or inspired by relevant courses at other institutions, which also serve as excellent resources and points of reference, including:

Lastly, there are many useful websites to know about for more practice with the skills we'll learn in this class. Here are a few:


Structure

Lectures, Readings, and Videos

This course is intended as a "partially flipped" course: you will do some reading or video watching outside class, and we will spend the majority of class time working together on the assignments. Each week in the schedule above has material listed as "Outside Class", which you'll be expected to watch or read outside class that week (generally before class if possible). I will also frequently give short lectures at the beginning of class; these will be recorded, with links posted here on the syllabus.


Assignments

The coursework will be structured around assignments, roughly one per week, aimed at giving you practice with the material from that week. We will work on these in class together, and my goal is that by the end of the last class for a given unit you'll have finished a good deal of the week's assignment if not the entire thing.

Assignments will often have a "core" section and an "extra" section - I expect you to complete the core section each week, and the extra section is available if you finish early or want more practice. We'll evaluate your assignments weekly and sometimes ask you to re-submit if large changes are necessary.


Peer Code Review

Reading other people's code is a super important programming skill that not only helps you collaborate with others, but also lets you stand on the shoulders of giants by reading and learning from code out in the world. Therefore in this course we'll practice this skill by reviewing your classmates' code! Once or twice throughout the course (when exactly TBD) we'll have an activity where I ask you to review each other's code. You can comment on correctness or coding style, suggest easier ways to do the same thing, or let your classmates know if you learned something from reading their code.


Final Project

This course has a final project component, where you'll use your new skills to do something useful and interesting to you. I'm extremely open as to what this could be. Possible examples include:

  • Scraping a website with interesting language on it and processing it into an organized format (e.g. JSON or CSV)
  • Using an API to obtain an interesting set of language data (e.g. from Twitter or Reddit) and processing it into an organized format
  • Writing a program to perform some useful operation on linguistic data, for instance munging interview transcripts or Praat textgrids
  • Making progress on organizing/cleaning/manipulating language data related to your research

In the second half of the quarter I'll ask you to submit a (very short) final project proposal where you tell me what you're planning to do. If you're struggling to come up with something, email me or come to office hours to talk out some possibilities.
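
As a rough illustration of what "processing it into an organized format" could look like for the first two ideas above, here is a minimal sketch. It assumes a small hypothetical set of records standing in for scraped or API data; the `posts` list, its field names, and the helper functions are all made up for illustration. It uses only Python's standard `json` and `csv` modules.

```python
import csv
import io
import json

# Hypothetical records, standing in for text collected by a scraper or an API.
posts = [
    {"author": "user_a", "text": "Colorless green ideas sleep furiously."},
    {"author": "user_b", "text": "Time flies like an arrow."},
]


def add_token_counts(records):
    """Return copies of the records with a whitespace-token count added."""
    return [{**r, "n_tokens": len(r["text"].split())} for r in records]


def to_json(records):
    """Serialize the records as a JSON string."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


organized = add_token_counts(posts)
print(to_json(organized))
print(to_csv(organized))
```

A real project would replace `posts` with data you collected yourself and would likely write the output to files rather than printing it.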


Evaluation

Not a fan of grades, to be honest. Research has shown that traditional numerical/letter grades decrease intrinsic motivation and joy for learning, can undermine performance, and are potentially riddled with implicit bias. For more reading on this topic:

Therefore, grades go against my central goal for this course: getting you excited about and engaged with the wonderful world of programming and text processing. I am much more interested in helping you get what you want out of the course through qualitative evaluation for your benefit. This will largely come in the form of written and in-person feedback on your work from Grace and me, as well as peer evaluation from your classmates.

In the interest of maintaining a healthy working relationship with the registrar, however, I will submit final grades at the end of the quarter. Below are the forms of evaluation we'll do and how much they'll contribute to what I end up submitting.


Self-evaluation (50%)

You know at least as well as I do how the course is going for you, so we'll have two self-evaluations, at the middle and end of the class. In the first week we'll have an activity in which I ask you to explain your goals for the course; then for each self-evaluation I'll ask you to reflect on your process and progress towards those goals, your participation in the course, and ultimately to give yourself a grade and explain your reasoning.

I hope to simply take your self-evaluation grades at face value, although if your self-evaluation disagrees significantly with my perception (in either direction) I may ask you to meet with me to hash out why our impressions differ.

If you'd like a better idea of exactly what this looks like, the text used in prior courses to present the self-evaluation question regarding grades is reproduced below.

Considerations for Grading

Below I will ask you to evaluate your work on a traditional letter grade scale. As you see in the syllabus, this choice will weigh heavily in the grade I ultimately submit for you to the registrar.

I take this very seriously and hope you do as well.

My interpretation of an "A" is something like, "I put in a strong, consistent effort and worked hard to learn - I was engaged throughout, did all the work, and my performance was the best it could have been given my starting point and individual circumstances."

My interpretation of a "B" is something like, "I gave it a solid go, but this class was not my priority this quarter and I could have done better. I did most of the work but sometimes did it cursorily, late, incompletely, or without feeling that I understood. My performance was fine given my starting point and individual circumstances but if I had invested more in this class I might have learned more."

My interpretation of a "C" is something like, "I kind of barely made it through. I had trouble getting things done on time, or didn't sufficiently reach out for help, or avoided the work for this class. I entirely skipped some readings or homework problems, but I managed to still get the idea of the material more or less. In sum, my performance didn't quite meet my own expectations for myself."

I ask you to interpret this holistically and be kind to yourself - you can still get an A if you had a few slip-ups here and there. But I also ask you to be honest with yourself about your performance.

Some questions to consider as you make this determination:

  • Did you turn in your assignments on time? (Note: for me, turning in assignments late with a good reason is equivalent to turning them in on time.)
  • Did you attend (or watch later) in-class lectures as much as you could? (Note: again, for me, missing class with a good reason is equivalent to attending.)
  • Did you keep up with readings, videos, and in-class activities?
  • Did you spend the allotted workshopping time on work for this class?
  • Did your assignments run all the way through?
  • Did you manage to pass all the tests for problems which had them?
  • Did you reach out for help when you needed it? (Note: doing this is positive!!!)
  • Did you collaborate with others to contribute to our classroom community (in breakout rooms, by helping on Piazza, or outside of class)?
  • Did you challenge yourself, or did you do the minimum?

As I've stated on the syllabus and will explain in class, if I disagree with your assessment (in either direction), I reserve the right to reach out to you for clarification or further discussion, and ultimately the right to make a different determination myself if we can't work it out. I think the latter especially is relatively unlikely, and in that event I promise to provide you a substantial justification of my reasoning.


Effortful Completion (50%)

Learning programming is about leaning into the struggle, embracing bugs and errors, and pushing through the frustration of getting your brain to think in a new and weird way. I am confident that you'll ultimately be able to get through all the assignments, but it'll take some real effort. If you're struggling to figure something out, the steps to take are:

  • Carefully read and think about the errors you're getting, double-check your code, try again
  • Read the relevant documentation (e.g. man pages or Python documentation)
  • Come to office hours or post asking for help on Ed

If all else fails, write in the assignment a description of what you tried and what isn't working.
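
To make the first step above a little more concrete, here is a minimal sketch of reading a Python error. The functions `count_words` and `describe_error` are hypothetical, written only for this illustration. The last line of a Python traceback names the error type and gives a short message, which is usually the best place to start reading.

```python
import traceback


def count_words(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())


def describe_error(func, *args):
    """Call func(*args); return None on success, else the last traceback line."""
    try:
        func(*args)
    except Exception:
        # format_exc() returns the same text Python would print to the terminal.
        return traceback.format_exc().strip().splitlines()[-1]
    return None


# Passing None instead of a string is a typical bug; the last traceback line
# reports an AttributeError, because None has no .split() method.
print(describe_error(count_words, None))
```

In everyday work you would simply read the traceback in your terminal; the helper here only isolates the line worth reading first.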

Each week will build on the material of the previous weeks, so this course very much relies on you keeping up throughout. At the end of the quarter, I'll give you a holistic grade for effortful completion of the assignments, peer code review, participation (both in class and on Ed), and your final project. If you participate and put in a solid effort on the work for the course, I anticipate that you'll get full credit.


Inclusion Statement

I am committed to creating an inclusive environment that actively values the diversity of backgrounds, identities, and experiences of everyone in the classroom. I welcome you to talk with me if you have any feedback or if there's anything I can do to better support you. If you'd prefer to contact me anonymously you can do so using the form at the bottom of my faculty webpage.

University-Requested Syllabus Inclusions

This class or portions of this class will be recorded by the instructor for educational purposes. These recordings will be shared only with students enrolled in the course and will be deleted at the end of the Winter Quarter. Your instructor will communicate how you can access the recordings. [RV: On Canvas in the cloud recordings tab in the Zoom section.]

Unauthorized student recording of classroom or other academic activities (including advising sessions or office hours) is prohibited. Unauthorized recording is unethical and may also be a violation of University policy and state law. Students requesting the use of assistive technology as an accommodation should contact AccessibleNU. Unauthorized use of classroom recordings — including distributing or posting them — is also prohibited. Under the University’s Copyright Policy, faculty own the copyright to instructional materials — including those resources created specifically for the purposes of instruction, such as syllabi, lectures and lecture notes, and presentations. Students cannot copy, reproduce, display or distribute these materials. [RV: You're welcome to share slides, assignments, and other materials with anyone.] Students who engage in unauthorized recording, unauthorized use of a recording or unauthorized distribution of instructional materials will be referred to the appropriate University office for follow-up.