10 Oct 2016
TL;DR: Take-home midterm, out Friday, due Wednesday.
We are almost to the middle of the semester. It is hard to imagine that.
However, this means that we are also coming up on the promised midterm.
The midterm was supposed to be Monday in class, according to the schedule.
You may guess that my phrasing it that way suggests that I’m rethinking that.
Generally speaking, I don’t think the stuff we’ve been doing really lends
itself that well to an in-class midterm. This is in significant part a
“practical” course, and perhaps even more so here in this first half.
I could provide a written test, though most of what we’ve been doing is
getting Python on the computer to obey our commands. So, it seems more
appropriate for the test to be (at least in part) in the same Python
environment we’ve been working in. But the classes are 50 minutes long,
and one technical mishap and half the time evaporates. So, to
relieve the time pressure, etc., the midterm will be a “take-home.”
It is still a midterm, not a homework, and it will be midterm-sized,
not homework-sized. It should still take about an hour, not several,
and it won’t be on new things but testing your facility with what we
have covered, with possibly some slight extensions.
So, the plan is: I will provide you with the midterm on Friday, and
it will be due by class on Wednesday. Friday will still be a review
day, but Monday will be a regular day when we will start new things,
and Wednesday I intend to give out the next homework.
What have we done?
The topics in the schedule and what has been mentioned in class are
what I will draw the midterm material from.
The sections of the NLTK book below are what I will restrict my
attention to. We also talked a bit about CHILDES, which is not in
the book, though we did not get too far with it. But I still might
ask something about it, so be familiar with basically what it is
and what we did with it.
- Chapter 1: basic corpora, frequency distributions
- Chapter 2: working with corpora, conditional frequency distributions, pronunciation corpus, WordNet
- Chapter 3, sec. 9: formatting output
- Chapter 4, sec. 1-4: basic Python, variables, functions
- Chapter 8, sec. 1-4.3: parsing with grammars
Generally speaking, if we actually talked about something in the classroom
or it was involved in a homework, it has a relatively high likelihood
of coming up in the midterm. If it was in the readings mentioned
above and we didn’t talk about it directly, it still might come up, but
the likelihood is smaller. So, preparing by re-reading those parts of
the book would be sensible. Preferably before Friday so you can ask
questions about anything you’re not clear about.
The midterm will ask you how to do some basic things in Python and
with NLTK, and will set up a couple of problems to consider, describe,
or solve. I will provide the ground rules with the midterm, but this
is not going to be a “closed-book test” of any sort. You’ll want to
know the material basically, but if you forget a detail and know where
to find it, you can look it up. (You can’t ask your roommate/classmate,
but you can look it up.) What I want you to do mostly is
understand how it works and why we’re doing it, and I will try to
focus the questions on that. But, really, in real life after this
course, if you want to do something with NLP, you would look it up.
I want you to be able to find things and look them up efficiently, but
there’s no real point in memorizing things just to take a test on them.
06 Oct 2016
Ok, I’m happy enough with Homework 4 now that
I’ll call it complete. I’m not really sure how long it will take, though
I think it will take you a lot less time to do than it took me to write.
Let me know if you see typos, or find anything unclear or not working.
05 Oct 2016
Because it’s now the middle of the night, I’m posting just the
first part of Homework 4 right now.
I will continue to expand this tomorrow a bit more. However,
if you want to get started on it, the part that is there is
mainly notes on helping make sure that you can get the parser
running on your computer. So, it would make sense to go through
that part first, particularly if you had any trouble getting it
to work in class.
I’ll post again when the homework is actually ready. Right
now it just covers very basic tree parsing and drawing, like
we did in class, and adding in the capability of doing adjectives.
What’s coming up are some further exercises involving complex
sentences (one sentence inside another), locating subjects and
objects by position in the tree, differentiating between
transitive and intransitive verbs, and some initial explorations
of relative clauses (like “the person who wrote the article” or
“the person who I met”).
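For a rough idea of the kind of thing involved, here is a minimal sketch of a grammar with adjectives and a transitive/intransitive split. The rules, the vocabulary, and the helper names parse_sentence and subject_words are all made up for illustration and are not taken from the homework; the NLTK calls used (nltk.CFG.fromstring and nltk.ChartParser) are the standard ones from Chapter 8.

```python
# A toy grammar: the rules and the vocabulary are invented for illustration.
GRAMMAR_TEXT = """
S -> NP VP
NP -> Det Nom
Nom -> Adj Nom | N
VP -> Vi | Vt NP
Det -> 'the' | 'a'
Adj -> 'tall' | 'happy'
N -> 'person' | 'dog' | 'article'
Vi -> 'slept'
Vt -> 'met' | 'wrote'
"""


def parse_sentence(sentence, grammar_text=GRAMMAR_TEXT):
    """Return a list of all parse trees for a whitespace-separated sentence."""
    # Imported here so the grammar text above can be read without NLTK set up.
    import nltk

    grammar = nltk.CFG.fromstring(grammar_text)
    parser = nltk.ChartParser(grammar)
    return list(parser.parse(sentence.split()))


def subject_words(tree):
    """Return the words of the subject, taken to be the first child of S."""
    return tree[0].leaves()
```

With this grammar, parse_sentence("the tall person met a dog") should give back one tree, while parse_sentence("the dog met") should give back none, because Vt requires an object. Every word in the sentence has to be in the grammar, or the parser will complain.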
30 Sep 2016
As usual, 50 minutes arrived pretty fast. I’d hoped to talk a bit about the haiku generator,
but it wasn’t necessary, really, since the thing itself more or less explains everything as it
goes along.
I started with CHILDES, but all we really did was go through the basic characteristics of the corpora
in there and start getting NLTK to recognize them. So, I’ll continue with that next time.
If you want to see what I’d been planning to do, or to replicate what was done in class
(since this is not actually in the NLTK book), here is
the presentation I was following. It was mostly just
sketching what I was going to be typing, not what happened.
I’ll make a more elaborate version for next time, but if you wanted to see where I was headed,
you can.
Also, notes on installing CHILDES:
- On the CHILDES site, you want the XML version of the database.
- You need to put this somewhere NLTK can find it. The canonical place is in a folder
called nltk_data inside the “home” directory. This is not immediately reachable on a
default MacOS/Mac OS X installation, so you may need to go to “Documents” and then
to “Enclosing Folder” from the “Go” menu. The icon should be a little house.
- You may need to create the nltk_data folder if it wasn’t already there. It’s
not there on the lab computers, but it might be there on your own.
- Then, the appropriate place for it is inside nltk_data, in a corpora folder, then a
childes folder, then a data-xml folder, then an Eng-USA-MOR folder (assuming you downloaded
something from the CHILDES Eng-USA-MOR directory, like Brown.zip). That is:
~/nltk_data/corpora/childes/data-xml/Eng-USA-MOR/
- Once the corpus is in place the stuff we did in class should work.
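
If you want to check that the folders ended up in the right place, something like the following sketch may help. The function names childes_root and load_childes are made up for this example, and the file pattern is an assumption about how the unzipped transcripts are laid out; CHILDESCorpusReader itself is the reader that comes with NLTK.

```python
import os


def childes_root(home=None):
    """Build the folder path described above, under the given home directory."""
    if home is None:
        home = os.path.expanduser("~")
    return os.path.join(home, "nltk_data", "corpora", "childes",
                        "data-xml", "Eng-USA-MOR")


def load_childes(root, pattern=r".*\.xml"):
    """Open the XML transcripts under root with NLTK's CHILDES reader."""
    # Imported here so that childes_root() can be used to check the folder
    # even before NLTK itself is working.
    from nltk.corpus.reader import CHILDESCorpusReader
    return CHILDESCorpusReader(root, pattern)


root = childes_root()
if os.path.isdir(root):
    corpus = load_childes(root)
    print(len(corpus.fileids()), "transcripts found in", root)
else:
    print("Nothing at", root, "yet; check the folder names.")
```

If the second message prints, the most likely cause is a folder with a slightly different name or one level of nesting too many or too few.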
Next time, we can try to actually find something substantive out about child
acquisition of English using this.
28 Sep 2016
I’ve changed the due date of homework 3 (haiku generator) to Monday.
Also, I’ve put the presentation slides from today online.
They are linked from the schedule page from the title of today’s topic.
I plan to link future presentation slides in the same way.
It’s just a web page, but it’s smart enough to let you page through it with the arrow keys.
(I made these using Remark, which seems like a pretty “lightweight”
way to make such things, without needing PowerPoint or Keynote or anything installed.)
If it doesn’t work for you immediately, try just refreshing the page, but it probably
should “just work.”
I didn’t get to slides 12-15; maybe I’ll come back and briefly talk about them on Friday.