Course Description
This course offers a practical introduction to programming and the analysis of natural language text as quantitative data. It is aimed at linguists, social scientists, and humanists with little-to-no programming background. Students will gain hands-on familiarity with Unix command line tools for text processing, basic programming and web scraping in Python, and algorithmic thinking concepts like abstraction and decomposition. We will study how to clean and organize linguistic datasets, and how to apply methods from computational linguistics including regular expressions, syntactic parsing, and vector representations of meaning. The course will conclude with a final project in which students curate and analyze a new dataset.
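As a small taste of what "text as quantitative data" can mean, here is a minimal sketch that turns a sentence into word counts using a regular expression and a Counter. The sample sentence and the function name `count_words` are made up for illustration, and the letters-only tokenization is a deliberate simplification, not the method we'll settle on in class.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter of lowercase word frequencies in `text`.

    Tokenization here is a rough assumption: any run of the letters a-z
    counts as a word, so punctuation and digits are simply dropped.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Made-up example sentence, not course data
sample = "The cat sat on the mat. The mat sat still."
counts = count_words(sample)

# Show the most frequent words first
for word, n in counts.most_common(3):
    print(word, n)
```

Once text is in this form, questions like "which words are most frequent?" become ordinary counting problems.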
Schedule
Week | Dates | Content | Materials
1 | 1/12, 1/14 | Intro, Unix, Shell, Environment, Files: course overview, terminal, Quest login, ssh, man, cd, ls, mkdir, mv, cp, echo | Before Class; For Reference; Assignment 1 [due 1/17]; Slides from in-class lecture
2 | 1/19, 1/21 | Command-line Text Processing: less, cat, wget, emacs, I/O redirection, text filters (sed, sort, uniq, tr, head, tail, cut), scripting | Before Class; For Reference; Assignment 2 [due 1/24] and Solutions; Slides (Part 1, Part 2) from in-class lecture
3-4 | 1/26, 1/28, 2/2 | Basic Python 1: REPL, variables, types, errors, print, control flow, loops, functions, built-ins, range | Before Class: Think Python, Chapters 1, 2, 3, 5.1-5.7, 6.1-6.4, 7, 8, 10 (reading, ~50 pages); For Reference; Assignment 3 [due 2/3] and Solutions; Slides (Part 1) from in-class lecture
4-5 | 2/4, 2/9, 2/11 | Basic Python 2: standard libraries (random, string), sets, dictionaries | Before Class: Think Python, Chapters 9, 11, 12, 13, 14 (reading, ~40 pages); For Reference; Assignment 4 [due 2/14] and Solutions; Slides (Part 1, Part 2) from in-class lecture
6 | 2/16, 2/18, 2/23 | Basic Python 3: jupyter, pickle, defaultdict, Counter, reading complex files | Before Class; For Reference; Assignment 5 [due 2/24] and Solutions (and as HTML); Slides from in-class lecture
7-8 | 2/25, 3/2, 3/4 | Python for Text 1: json, csv, regular expressions, external libraries (NLTK, spacy), tokenization, lemmatization, POS tagging | Before Class; In Class; For Reference; Assignment 6 [due 3/7] and Solutions (and as HTML); Slides (Part 1) from in-class lecture
9 | 3/9, 3/11 | Python for Text 2: dependency syntax, n-grams, word vectors, text classification | Before Class; For Reference; Assignment 7 (and as HTML) [due 3/18, exam day]; Slides [1, 2] from in-class lecture
* Coronavirus Note *
We remain in a difficult situation in the world. Everyone is distracted and anxious to some degree, we're all facing new challenges across the board, and our physical and mental health (and those of our loved ones) are the top priorities. I trust we will all be doing our best to adapt to the circumstances and giving each other a lot of leeway and understanding. Here's a reasonable approximation of how I'm thinking of this from a friend of a friend.
Luckily enough, this class is well-suited to a remote format, and I think it can still be a fun and productive time that will hopefully provide a welcome antidote to cabin fever. Please just let me know if you run into any difficulties and I'll do what I can to help.
Materials
All course materials are available for free online. In addition to informative videos from assorted sources, we'll use these free ebooks:
When readings are assigned, I encourage you to skim them before class for basic understanding rather than detail. When doing the assignments, refer back to them for details of the syntax and so on.
Some of the course materials are drawn from or inspired by relevant courses at other institutions, which also serve as excellent resources and points of reference, including:
Lastly, there are many useful websites to know about for more practice with the skills we'll learn in this class. Here are a few:
Structure
Lectures, Readings, and Videos
This course is intended as a "partially flipped" course, where we will do some reading or watching of videos before class, and then spend the majority of time in class working together on the assignments. Each week in the schedule above has material listed as "Before Class" which you'll be expected to watch/read before class that week. I will also frequently give short lectures at the beginning of class which will be recorded as well, with links posted here on the syllabus.
Assignments
The coursework will be structured around assignments, roughly one per week, aimed at giving you practice with the material from that week. We will work on these in class together, and my goal is that by the end of our Thursday class you'll have finished a good deal of the week's assignment if not the entire thing.
Assignments will often have a "core" section and an "extra" section. I expect you to complete the core section each week, and the extra section is available if you finish early or want more practice. I'll evaluate your assignments weekly and sometimes ask you to re-submit if large changes are necessary.
Peer Code Review
Reading other people's code is a super important programming skill that not only helps you collaborate with others, but also lets you stand on the shoulders of giants by reading and learning from code out in the world.
Therefore, in this course we'll practice this skill by doing some code review of your classmates' work! Once or twice throughout the course (when exactly TBD) we'll have an activity where I ask you to review each other's code. You can comment on correctness or coding style, suggest easier ways to do the same thing, or let your classmates know if you learned something from reading their code.
Final Project
This course has a final project component, where you'll use your new skills to do something useful and interesting to you. I'm extremely open as to what this could be. Possible examples include:
- Scraping a website with interesting language on it and processing it into an organized format (e.g. JSON or CSV)
- Using an API to obtain an interesting set of language data (e.g. from Twitter or Reddit) and processing it into an organized format
- Writing a program to perform some useful operation on linguistic data, for instance munging interview transcripts or Praat textgrids
- Making progress on organizing/cleaning/manipulating language data related to your research
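To make "processing it into an organized format" a bit more concrete, here is a minimal sketch that parses a few raw transcript lines into records and serializes them as JSON and CSV. Everything specific in it is an assumption for illustration: the "SPEAKER: utterance" line format, the sample lines, and the names `parse_line`, `to_json`, and `to_csv` are all hypothetical, and your own data will almost certainly need different parsing.

```python
import csv
import io
import json

# Hypothetical raw lines in an assumed "SPEAKER: utterance" format
raw_lines = [
    "A: so where did you grow up?",
    "B: mostly in Chicago, actually.",
    "A: oh nice!",
]


def parse_line(line):
    """Split one 'SPEAKER: text' line into a record with a word count."""
    speaker, text = line.split(":", 1)  # split on the first colon only
    text = text.strip()
    return {"speaker": speaker.strip(), "text": text, "n_words": len(text.split())}


def to_json(records):
    """Serialize the records as an indented JSON string."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["speaker", "text", "n_words"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [parse_line(line) for line in raw_lines]
print(to_json(records))
print(to_csv(records))
```

The same records feed both formats, so the parsing step is written once. In a real project you would write these strings to files rather than printing them.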
In the second half of the quarter I'll ask you to submit a (very short) final project proposal where you tell me what you're planning to do. If you're struggling to come up with something, email me or come to office hours to talk out some possibilities.
Evaluation
Not a fan of grades, to be honest. Research has shown that traditional numerical/letter grades decrease intrinsic motivation and joy for learning, can undermine performance, and are potentially riddled with implicit bias. For more reading on this topic:
Therefore, grades go against my central goal for this course: getting you excited about and engaged with the wonderful world of programming and text processing. I am much more interested in helping you get what you want out of the course through qualitative evaluation for your benefit. This will largely come in the form of written and in-person feedback on your work from Thomas and me, as well as peer evaluation from your classmates.
In the interest of maintaining a healthy working relationship with the registrar, however, I will submit final grades at the end of the quarter.
Below are the forms of evaluation we'll do and how much they'll contribute to what I end up submitting.
Self-evaluation (50%)
You know at least as well as I do how the course is going for you, so we'll have two self-evaluations, at the middle and end of the class.
In the first week we'll have an activity in which I ask you to explain your goals for the course; then for each self-evaluation I'll ask you to reflect on your process and progress towards those goals, your participation in the course, and ultimately to give yourself a grade and explain your reasoning.
I hope to simply take your self-evaluation grades at face value, although if your self-evaluation disagrees significantly with my perception (in either direction) I may ask you to meet with me to hash out why our impressions differ.
Effortful Completion (50%)
Learning programming is about leaning into the struggle, embracing bugs and errors, and pushing through the frustration of getting your brain to think in a new and weird way. I am confident that you'll ultimately be able to get through all the assignments, but it'll take some real effort.
If you're struggling to figure something out, the steps to take are:
- Carefully read and think about the errors you're getting, double-check your code, try again
- Read the relevant documentation (e.g. man pages or Python documentation)
- Come to office hours or post asking for help on Piazza
If all else fails, write in the assignment a description of what you tried and what isn't working.
Each week will build on the material of the previous weeks, so this course very much relies on you keeping up throughout. At the end of the quarter, I'll give you a holistic grade for effortful completion of the assignments, peer code review, participation (e.g. on Piazza), and your final project. If you participate and put in a solid effort on the work for the course, I anticipate that you'll get full credit.
Inclusion Statement
I am committed to creating an inclusive environment that actively values the diversity of backgrounds, identities, and experiences of everyone in the classroom. I welcome you to talk with me if you have any feedback or if there's anything I can do to better support you. If you'd prefer to contact me anonymously you can do so using the form at the bottom of my faculty webpage.
University-Requested Syllabus Inclusions
This class or portions of this class will be recorded by the instructor for educational purposes. These recordings will be shared only with students enrolled in the course and will be deleted at the end of the Winter Quarter. Your instructor will communicate how you can access the recordings. [RV: On Canvas in the cloud recordings tab in the Zoom section.]
Unauthorized student recording of classroom or other academic activities (including advising sessions or office hours) is prohibited. Unauthorized recording is unethical and may also be a violation of University policy and state law.
Students requesting the use of assistive technology as an accommodation should contact AccessibleNU. Unauthorized use of classroom recordings, including distributing or posting them, is also prohibited. Under the University’s Copyright Policy, faculty own the copyright to instructional materials, including those resources created specifically for the purposes of instruction, such as syllabi, lectures and lecture notes, and presentations. Students cannot copy, reproduce, display or distribute these materials. [RV: You're welcome to share slides, assignments, and other materials with anyone.] Students who engage in unauthorized recording, unauthorized use of a recording or unauthorized distribution of instructional materials will be referred to the appropriate University office for follow-up.
Contingency Plan
Given the tricky situation in the world at the moment, the university has asked us to provide contingency plans. If you receive a notification that I am ill and cannot teach, continue work on any assignments that have already been posted on the course website on the normal weekly schedule. All of the textbook readings we are doing have exercises at the end of each chapter; for weeks where an assignment has not yet been posted, do a reasonable subset of the exercises at the end of the chapters for that week, and submit those you do as a Python file in your user directory on Quest. Zoom meetings at the normal times are pre-scheduled through the end of the quarter, and the links will still work, as will Piazza. I encourage you to coordinate and work with your classmates on Zoom or Piazza.