12 Dec 2016
The final has now been posted.
It’s extensively documented in there. It’s due by the last
day of finals, when the final would have been. You may
feel free to do it during the time the final would have been
scheduled if you wish, but you can of course do it sooner too.
03 Dec 2016
Homework 7 has now been posted.
The link above or in the sidebar (“Semantics”) will get you there.
It’s due next Friday, Dec. 9, and it will be the last one of the
semester.
Much like the other ones, it’s a lot of reading (largely reading that
elaborates on but basically follows what’s happening in chapter 10),
with occasionally interspersed things for you to do. So it might be
considered to be more of a reading assignment that has exercises
in it.
20 Nov 2016
I’ve had a couple of questions about how to approach task 2 on homework 6, so let me
share a little program that has kind of an analogous “memory” (or “history”) property
to what we need there.
There is an example in the book of how to get a part-of-speech analyzer to use the
history of its choices, but the task there was a little bit different.
Here, really all we want to do is add "previous result" to the set of features
that we know about a post, so we want to go through the posts, extract the training
features (which are features like "contains(the)" for the words), and add a feature
named "prev-class" that holds the class of the previous post.
Perhaps it is due to idiosyncrasies of the kind of programming I usually do, but there
is a kind of “idiom” that I use all the time to do this kind of thing, where an element
in a list needs to know something about the previous element. So, let me present that
to you here, as a kind of conceptual model of how it might be done.
Suppose that we have a list of words like this:
basic_words = ['the', 'cat', 'pounced']
Suppose that for whatever reason, we want to convert this into a list like this:
word_list = [{'start': 't', 'word': 'the', 'prev': 'NONE'},
             {'start': 'c', 'word': 'cat', 'prev': 'the'},
             {'start': 'p', 'word': 'pounced', 'prev': 'cat'}]
This is a list of dictionaries. There is a dictionary for each word.
In that dictionary, we have values for ‘start’, ‘word’, and ‘prev’.
The value of ‘start’ is the first letter, the value of ‘word’ is the word,
and the value of ‘prev’ is the preceding word.
>>> print(word_list[0]['word'])
the
>>> print(word_list[1]['prev'])
the
So what we want to do is define a function that will take a list of words
and return a list of these dictionaries, where one of the things we need
to know is what the previous word was.
I would do this as follows:
def words2dicts(words):
    return_list = []    # we are building this list, it starts out empty
    prev_word = "NONE"  # at the beginning, there is no previous word, so use "NONE"
    for w in words:
        # build the feature dictionary
        f = {'word': w, 'start': w[0]}
        # add in the previous word too
        f['prev'] = prev_word
        # add this dictionary to our running list of dictionaries
        return_list.append(f)
        # set prev_word to be the word we just processed
        prev_word = w
    # return the list we built
    return return_list
In words2dicts above, I built f, the feature dictionary, in two steps. The first
step defined the keys 'word' and 'start', and the second step added one more, 'prev'.
This could all have been done in one step, but I thought it would be useful to see
both ways of adding to the dictionary.
The “idiom” I was referring to before is just the practice of using a variable to
remember what we processed the previous time through a loop. So, I set it to
something that means “nothing” before we start the loop, and then once we have
processed a word w, I set that variable to be the word w we just processed. Then,
the next time through the loop, we can use that variable to refer to the word
we did the previous time.
This is probably simpler than precomputing all the tags ahead of time and finding
them in a list. The memory is “fleeting” in a way, because we don’t remember in
any long-term way what words we have processed (except to the extent that the word
itself is recorded in the dictionaries we are adding to the overall list), all we
remember is just the immediately preceding word.
>>> print(basic_words)
['the', 'cat', 'pounced']
>>> dictlist = words2dicts(basic_words)
>>> print(dictlist)
[{'word': 'the', 'prev': 'NONE', 'start': 't'}, {'word': 'cat', 'prev': 'the', 'start': 'c'}, {'word': 'pounced', 'prev': 'cat', 'start': 'p'}]
>>> print(dictlist == word_list)
True
Anyway, this example has the basic contours of the function you’d probably want to
create for task 2 in the homework. There are other ways to do it, but this one
is the one that’s always my first reflex.
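For what it's worth, one of those other ways is to pair each word with its predecessor up front, for instance with zip. Here is a quick sketch of that alternative (the name words2dicts_zip is mine, just for illustration); it builds the same dictionaries as words2dicts:

```python
def words2dicts_zip(words):
    # line up each word with the one before it; "NONE" stands in
    # for the (nonexistent) word before the first one
    prevs = ["NONE"] + list(words[:-1])
    return [{'word': w, 'start': w[0], 'prev': p}
            for p, w in zip(prevs, words)]

print(words2dicts_zip(['the', 'cat', 'pounced']))
```

It comes to the same thing; the loop-with-a-variable version just makes the "remembering" step more visible.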
One other thing to be careful about, specifically about task 2 and Naive Bayes
classifiers: the structure of the training and test set must be a list of pairs.
The pairs are of the form (dictionary1, answer1), and what the classifier is doing
is trying to learn which features of the dictionary correspond to which answers,
so that it can look at a new dictionary and guess the answer. So, what you are
trying to make in task 2 is a list of that form, and the real problem is
essentially how to add "prev-class" as one of the features contained within the
dictionary of features in the first position of each of those pairs.
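To make that shape concrete, here is a small sketch with made-up posts and classes (the names labeled_posts and make_training_pairs are mine, not from the homework). It combines the "previous result" idiom from above with the (dictionary, answer) pair structure:

```python
def make_training_pairs(labeled_posts):
    # labeled_posts is a list of (text, class) pairs;
    # return a list of (feature_dict, class) pairs where each
    # feature dict also records the previous post's class
    pairs = []
    prev_class = "NONE"  # no previous post at the start
    for text, answer in labeled_posts:
        features = {'contains(%s)' % w: True for w in text.split()}
        features['prev-class'] = prev_class
        pairs.append((features, answer))
        prev_class = answer  # remember this class for the next post
    return pairs

labeled_posts = [('hi there', 'Greet'), ('how are you', 'whQuestion')]
print(make_training_pairs(labeled_posts))
```

A list of this shape is what you would then hand to the classifier's train method.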
14 Nov 2016
Homework 6 has now been posted.
The link above or in the sidebar (“Classification”) will get you there.
It’s due next Monday, Nov. 21.
28 Oct 2016
People are still finding homework 5 to be challenging, so I’m going to extend it just
a little bit further, and I’m going to try to spend the last half of today going over
things that are relevant or that people have questions on. So, even if you didn’t ask,
Monday’s now a fine day to hand the homework in.
This may wind up bumping homework 6 out of the schedule completely, since we’re coming
up on the BUCLD weekend when many of you (and I) will be occupied in one way or another.
We’ll see.
But anyway, if you already turned in homework 5 and are happy with it, today can be a
short day if you want. I’ll say a couple of things to close out the tagging chapter,
but then I’ll want to go back to talk a bit about trouble people have run into in
chapter 3 and with those exercises I assigned.