The final is here

The final has now been posted, and the instructions are documented extensively within it. It’s due by the last day of finals, when the final would have been given. You may do it during the time the final would have been scheduled if you wish, but you can of course do it sooner too.

  • homework

Homework 7

Homework 7 has now been posted.

The link above or in the sidebar (“Semantics”) will get you there. It’s due next Friday, Dec. 9, and it will be the last one of the semester.

Much like the other ones, it’s a lot of reading (largely reading that elaborates on, but basically follows, what’s happening in chapter 10), with things for you to do interspersed occasionally. So it might be better thought of as a reading assignment that has exercises in it.

  • homework

Fleeting memory

I’ve had a couple of questions about how to approach task 2 on homework 6, so let me share a little program that has kind of an analogous “memory” (or “history”) property to what we need there.

There is an example in the book of how to get a part-of-speech analyzer to use the history of its choices, but the task there was a little bit different.

Here, really all we want to do is add the “previous result” to the set of features that we know about a post. So we want to go through the posts, extract the training features (features like “contains(the)” for the words), and add a feature named “prev-class” that holds the previous post’s class.

Perhaps it is due to idiosyncrasies of the kind of programming I usually do, but there is a kind of “idiom” that I use all the time to do this kind of thing, where an element in a list needs to know something about the previous element. So, let me present that to you here, as a kind of conceptual model of how it might be done.

Suppose that we have a list of words like this:

basic_words = ['the', 'cat', 'pounced']

Suppose that for whatever reason, we want to convert this into a list like this:

word_list = [{'start': 't', 'word': 'the', 'prev': 'NONE'},
    {'start': 'c', 'word': 'cat', 'prev': 'the'},
    {'start': 'p', 'word': 'pounced', 'prev': 'cat'}]

This is a list of dictionaries. There is a dictionary for each word. In that dictionary, we have values for ‘start’, ‘word’, and ‘prev’. The value of ‘start’ is the first letter, the value of ‘word’ is the word, and the value of ‘prev’ is the preceding word.

print(word_list[0]['word'])
the
print(word_list[1]['prev'])
the

So what we want to do is define a function that will take a list of words and return a list of these dictionaries, where one of the things we need to know is what the previous word was.

I would do this as follows:

def words2dicts(words):
    return_list = []   # we are building this list, it starts out empty
    prev_word = "NONE" # at the beginning, there is no previous word, so use "NONE"
    for w in words:
        # build the feature dictionary
        f = {'word': w, 'start': w[0]}
        # add in the previous word too
        f['prev'] = prev_word
        # add this dictionary to our running list of dictionaries
        return_list.append(f)
        # set prev_word to be the word we just processed
        prev_word = w
    # return the list we built
    return return_list

In words2dicts above, I built f, the feature dictionary, in two steps. The first step defined the keys ‘word’ and ‘start’, and the second step added one more, ‘prev’. This could all have been done in one step, but I thought it would be useful to see both ways of adding to the dictionary.

The “idiom” I was referring to before is just the practice of using a variable to remember what we processed the previous time through a loop. So, I set it to something that means “nothing” before we start the loop, and then once we have processed a word w, I set that variable to be the word w we just processed. Then, the next time through the loop, we can use that variable to refer to the word we did the previous time.

This is probably simpler than precomputing all the previous words ahead of time and looking them up in a list. The memory is “fleeting” in a way, because we don’t remember in any long-term way what words we have processed (except to the extent that the word itself is recorded in the dictionaries we are adding to the overall list); all we remember is the immediately preceding word.
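For comparison, here is what the “precompute ahead of time” alternative could look like: pair each word with its predecessor up front, then build all the dictionaries in one pass. This is just a sketch of that other approach, not something the homework requires.

```python
def words2dicts_zip(words):
    # Line up each word with its predecessor; "NONE" stands in for the
    # (nonexistent) word before the first one.
    prevs = ["NONE"] + words[:-1]
    return [{'word': w, 'start': w[0], 'prev': p}
            for w, p in zip(words, prevs)]

print(words2dicts_zip(['the', 'cat', 'pounced']))
```

This produces the same list of dictionaries as words2dicts above. The trade-off is that it builds the list of previous words in memory first, whereas the loop idiom only ever holds on to one previous value at a time.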

print(basic_words)
['the', 'cat', 'pounced']
dictlist = words2dicts(basic_words)
print(dictlist)
[{'word': 'the', 'prev': 'NONE', 'start': 't'}, {'word': 'cat', 'prev': 'the', 'start': 'c'},
{'word': 'pounced', 'prev': 'cat', 'start': 'p'}]
print(dictlist == word_list)
True

Anyway, this example has the basic contours of the function you’d probably want to create for task 2 in the homework. There are other ways to do it, but this one is the one that’s always my first reflex.

One other thing to be careful about, specifically for task 2 and Naive Bayes classifiers: the training and test sets must each be a list of pairs. The pairs have the form (dictionary1, answer1), and what the classifier is doing is trying to learn which features of the dictionary correspond to which answers, so that it can look at a new dictionary and guess the answer. So, what you are trying to make in task 2 is a list of that form, and the real problem is essentially how to add “prev-class” as one of the features contained within the dictionary of features in the first position of each of those pairs.
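To make that structure concrete, here is a small made-up example of what such a list of pairs might look like once “prev-class” has been added. The particular features and class labels here are invented for illustration; they are not the actual homework data.

```python
# A (feature_dict, answer) list of the kind a Naive Bayes classifier trains on.
# Each pair: the features we know about a post, and that post's class.
train_set = [
    ({'contains(the)': True,  'prev-class': 'NONE'},      'Statement'),
    ({'contains(what)': True, 'prev-class': 'Statement'}, 'Question'),
    ({'contains(yes)': True,  'prev-class': 'Question'},  'Answer'),
]

# Unpacking one pair: the first element is the feature dictionary,
# the second is the answer the classifier should learn to predict.
features, answer = train_set[1]
print(answer)                     # prints: Question
print(features['prev-class'])     # prints: Statement
```

A list in exactly this shape is what you would hand to something like nltk.NaiveBayesClassifier.train; the classifier then treats “prev-class” as just one more feature when it learns which features go with which answers.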

  • homework

Homework 6

Homework 6 has now been posted.

The link above or in the sidebar (“Classification”) will get you there. It’s due next Monday, Nov. 21.

  • homework

H o m e w o o o o r k ... F i i i v e

People are still finding homework 5 to be challenging, so I’m going to extend it just a little bit further, and I’m going to try to spend the last half of today going over things that are relevant or that people have questions on. So, even if you didn’t ask, Monday’s now a fine day to hand the homework in.

This may wind up bumping homework 6 out of the schedule completely, since we’re coming up on the BUCLD weekend when many of you (and I) will be occupied in one way or another. We’ll see.

But anyway, if you already turned in homework 5 and are happy with it, today can be a short day if you want. I’ll say a couple of things to close out the tagging chapter, but then I’ll want to go back to talk a bit about trouble people have run into in chapter 3 and with those exercises I assigned.

  • homework