Hats

CAS LX 390 / GRS LX 690, NLP/CL, Fall 2017
Homework 5, due Tue 10/17

Task 1

Define a function called non_five_odds that will take an integer n as an argument, and return a list of all the odd numbers between 0 and n except 5.

Notes:

“Between” means strictly between: 7 is not between 0 and 7, but 7 is between 0 and 8.
“…integer n” here means that you are not responsible for how non_five_odds(10.5) behaves. The function does not need to reject 10.5 as a non-integer, and it does not matter what it returns in that case.
non_five_odds(10) should return [1, 3, 7, 9].
You can accomplish this however you want; there are several possible approaches.
If you divide one number by another with /, the result is a float (a floating-point number, that is, a number with a decimal point), not an integer, even when the division comes out even. (Some of the possible approaches require dividing.)
It is nearly certain that range() will be useful, but note that the number you provide to range() must be an integer.
You can use int() to convert a float to an integer. It truncates toward 0, so int(29/10) == 2.
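
The notes about division, int(), and range() can be checked directly in the interpreter. Here is a small sketch (the numbers are arbitrary, and this is not a solution to the task):

```python
# Dividing with / gives a float in Python 3, even when the division is exact.
quotient = 10 / 2
print(type(quotient))        # <class 'float'>

# int() drops the fractional part, moving toward 0 in both directions.
print(int(29 / 10))          # 2
print(int(-29 / 10))         # -2

# range() needs integers, so convert the float before calling it.
steps = list(range(int(quotient)))
print(steps)                 # [0, 1, 2, 3, 4]

# Passing the float itself raises a TypeError.
try:
    range(quotient)
    raised = False
except TypeError:
    raised = True
print(raised)                # True
```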

Task 2

Use Python to count how many different words in “Alice’s Adventures in Wonderland” contain the letter sequence “hat”.

Notes / assumptions:

“hat” is one word, no matter how many times it occurs.
“hat” and “Hat” are not different words.
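
Taken together, these two assumptions amount to lowercasing each word and collecting the results in a set before counting. A minimal sketch on a made-up word list (the list, the variable names, and the substring 'at' are all just for illustration; the task itself uses the words of the book):

```python
# A toy word list with repeats and mixed capitalization.
toy_words = ['Cat', 'cat', 'sat', 'The', 'mat', 'Sat', 'dog', 'cat']

# Lowercase each word, keep those containing 'at', and let the set
# collapse the duplicates.
matches = {w.lower() for w in toy_words if 'at' in w.lower()}

print(sorted(matches))   # ['cat', 'mat', 'sat']
print(len(matches))      # 3
```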

Task 3

Use Python to create a list containing any sentences in “Alice’s Adventures in Wonderland” in which the actual word “hat” appears, and provide the resulting list.

Notes:

“what” or “hate” or “hatter” are not the actual word “hat”. Only “hat” is.
The list of sentences should contain text strings rather than lists of words: something like ['I have a hat .', 'You have a hat .'] and not [['I', 'have', 'a', 'hat', '.'],['You', 'have', 'a', 'hat', '.']].
I was a bit surprised by the answer I got. If you find the answer surprising, it might nevertheless still be correct.
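
Two pieces of plain Python are relevant to these notes: how "in" behaves on a list of words versus on a string, and how to turn a list of words back into a single string. A small sketch with made-up sentences (the data and variable names are hypothetical, and this is not the full solution):

```python
# On a list of words, "in" matches whole tokens only.
print('cat' in ['the', 'cats', 'sat'])   # False

# On a string, "in" matches any substring.
print('cat' in 'the cats sat')           # True

# ' '.join() glues a list of words into one string, with spaces between.
toy_sents = [['I', 'have', 'a', 'cat', '.'], ['You', 'have', 'a', 'dog', '.']]
as_strings = [' '.join(sent) for sent in toy_sents]
print(as_strings)   # ['I have a cat .', 'You have a dog .']
```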

Task 4

Use WordNet to provide the senses of the word “hat”, the definition of each, hypernyms of each, and examples of each.

Notes:

You can do this in several steps. That is, first find the different meanings/senses that “hat” can have, then go through the list of meanings and find the definition, then go through the list and find the hypernyms, etc.
By “definition” I mean just the one that is already in the WordNet corpus for each sense.
Not all of the senses have examples within WordNet. If a sense has no example, make up one of your own, based on the definition and hypernyms you found for that sense.
I found some of this surprising too. Who knew there was so much left to be surprised about in the domain of hats?

Task 5

Get the CHILDES data and be sure you can recreate what happened in class.

Compare the N:V ratio over time of Adam and of Sarah to that of Eve. Show a graph, and describe what it tells us.
Compare the N:V ratio of the parents of Adam, Eve, and Sarah to each other. Show a graph, and describe what it tells us.
Plot the N:V ratio over time for Eve against MLUs.

Note: It seems to be necessary to convert MLUs into an actual list for the purpose of using it in a graph.

Below is an abbreviated version of what happened in class. Note that there may be path problems, and graphs may act differently on your machine. Let me know if you run into trouble, and I’ll try to collate information about it here on the blog.

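from nltk.corpus.reader import CHILDESCorpusReader
from matplotlib import pyplot as plt

# Adjust this path to wherever the CHILDES XML data lives on your machine.
brown = CHILDESCorpusReader('nltk_data/corpora/childes/data-xml/Eng-USA-MOR/', 'Brown/.*.xml')

def nvratio(f, speaker=['CHI']):
    # Ratio of noun tokens to verb tokens for the given speaker(s) in file f.
    ws = brown.tagged_words(f, speaker=speaker)
    ns = [w for (w, p) in ws if p and p[0] == 'n']  # tags beginning with 'n'
    vs = [w for (w, p) in ws if p and p[0] == 'v']  # tags beginning with 'v'
    return len(ns) / len(vs)

# Characters 6-8 of each file ID name the child.
eve = [f for f in brown.fileids() if f[6:9] == 'Eve']
mlus = brown.MLU(eve)
mlusl = list(mlus)  # convert to an actual list for plotting
age_months = [brown.age(f, month=True)[0] for f in eve]
eve_rat = [nvratio(f) for f in eve]
plt.plot(age_months, eve_rat)
plt.plot(age_months, mlusl)
plt.show()  # may be needed to display the graph outside a notebook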
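
If path problems keep you from loading the corpus, the counting logic in nvratio can still be tried on hand-made data. The sketch below is only an illustration: toy_nvratio and the tagged words are made up, the tags merely imitate the style of the CHILDES tags, and the guard for a file with no verbs is an addition that the class version does not have.

```python
def toy_nvratio(tagged):
    # Same filtering as nvratio, but on a plain list of (word, tag) pairs.
    nouns = [w for (w, p) in tagged if p and p[0] == 'n']
    verbs = [w for (w, p) in tagged if p and p[0] == 'v']
    if not verbs:
        # Avoid dividing by zero when there are no verbs at all.
        return None
    return len(nouns) / len(verbs)

# Hypothetical tagged words; the empty tag is skipped by the "p and" check.
toy = [('doggie', 'n'), ('go', 'v'), ('Eve', 'n:prop'),
       ('xxx', ''), ('want', 'v'), ('cookie', 'n')]

print(toy_nvratio(toy))                  # 1.5
print(toy_nvratio([('doggie', 'n')]))    # None
```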