Tagging

CAS LX 390 / GRS LX 690 NLP/CL, Fall 2017
Homework 6, due Tue 10/31

These are from the problems at the end of chapter 5 in the NLTK book. The earlier ones are easier than the later ones.

14

Use sorted() and set() to get a sorted list of tags used in the Brown corpus, removing duplicates.
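
As a minimal sketch of the sorted()/set() idiom, the example below runs on a small hand-made list of (word, tag) pairs. The toy data and the variable names are assumptions made for illustration; on the real corpus the list would come from nltk.corpus.brown.tagged_words().

```python
# Toy stand-in (an assumption) for nltk.corpus.brown.tagged_words():
# a sequence of (word, tag) pairs.
tagged_words = [
    ("The", "AT"), ("dog", "NN"), ("barks", "VBZ"),
    ("at", "IN"), ("the", "AT"), ("cats", "NNS"), (".", "."),
]

# set() drops duplicate tags; sorted() returns them as an ordered list.
tags = sorted(set(tag for _, tag in tagged_words))
print(tags)
```

To try this on the Brown Corpus, replace the toy list with brown.tagged_words() after "from nltk.corpus import brown".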

15

Write programs to process the Brown Corpus and find answers to the following questions:

  • Which nouns are more common in their plural form than in their singular form? (Only consider regular plurals, formed with the -s suffix.)
  • Which word has the greatest number of distinct tags? What are they, and what do they represent?
  • List tags in order of decreasing frequency. What do the 20 most frequent tags represent?
  • Which tags are nouns most commonly found after? What do these tags represent?
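
Three of the questions above reduce to counting, so one possible approach is sketched here with collections.Counter on toy data. The sample pairs, the choice to lowercase words, and the rule "a noun is any tag starting with NN" are all simplifying assumptions, not requirements of the assignment.

```python
from collections import Counter, defaultdict

# Toy stand-in (an assumption) for brown.tagged_words().
tagged_words = [
    ("The", "AT"), ("dog", "NN"), ("can", "MD"), ("run", "VB"), (".", "."),
    ("A", "AT"), ("can", "NN"), ("fell", "VBD"), (".", "."),
    ("They", "PPSS"), ("can", "VB"), ("the", "AT"), ("fish", "NN"), (".", "."),
]

# Tags in order of decreasing frequency.
tag_counts = Counter(tag for _, tag in tagged_words)
tags_by_frequency = [tag for tag, _ in tag_counts.most_common()]

# Distinct tags seen for each word (lowercasing is an assumption).
tags_for_word = defaultdict(set)
for word, tag in tagged_words:
    tags_for_word[word.lower()].add(tag)
most_ambiguous = max(tags_for_word, key=lambda w: len(tags_for_word[w]))

# Tags that immediately precede a noun; "noun" here means any tag
# beginning with "NN", which is a simplification.
before_noun = Counter(
    prev_tag
    for (_, prev_tag), (_, tag) in zip(tagged_words, tagged_words[1:])
    if tag.startswith("NN")
)

print(tags_by_frequency)
print(most_ambiguous, sorted(tags_for_word[most_ambiguous]))
print(before_noun.most_common())
```

Pairing the list with itself shifted by one position, as in zip(tagged_words, tagged_words[1:]), is a simple way to look at adjacent tags without extra libraries.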

20

Write code to search the Brown Corpus for particular words and phrases according to tags, to answer the following questions:

  • Produce an alphabetically sorted list of the distinct words tagged as MD.
  • Identify words that can be plural nouns or third person singular verbs (e.g., deals, flies).
  • Identify three-word prepositional phrases of the form IN+DET+NN (e.g., in the lab).
  • What is the ratio of masculine to feminine pronouns?
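
The searches above can be sketched with set operations and a sliding window over each sentence. Everything below runs on toy sentences; the data, the tag choices (MD, NNS, VBZ, IN, AT or DT for determiners, NN), and the pronoun word lists are assumptions for illustration, and the real input would be brown.tagged_sents().

```python
# Toy stand-in (an assumption) for brown.tagged_sents().
tagged_sents = [
    [("He", "PPS"), ("deals", "VBZ"), ("cards", "NNS"),
     ("in", "IN"), ("the", "AT"), ("lab", "NN"), (".", ".")],
    [("She", "PPS"), ("can", "MD"), ("make", "VB"), ("deals", "NNS"), (".", ".")],
    [("He", "PPS"), ("will", "MD"), ("wait", "VB"),
     ("on", "IN"), ("a", "AT"), ("bench", "NN"), (".", ".")],
]
tagged_words = [pair for sent in tagged_sents for pair in sent]

# Alphabetically sorted distinct words tagged MD.
modals = sorted(set(w.lower() for w, t in tagged_words if t == "MD"))

# Words seen both as a plural noun (NNS) and as a 3rd person singular verb (VBZ).
plural_nouns = {w.lower() for w, t in tagged_words if t == "NNS"}
third_singular = {w.lower() for w, t in tagged_words if t == "VBZ"}
both = sorted(plural_nouns & third_singular)

# Three-word phrases IN + determiner + NN, searched within each sentence.
phrases = []
for sent in tagged_sents:
    for (w1, t1), (w2, t2), (w3, t3) in zip(sent, sent[1:], sent[2:]):
        if t1 == "IN" and t2 in ("AT", "DT") and t3 == "NN":
            phrases.append(" ".join((w1, w2, w3)))

# Ratio of masculine to feminine pronouns, matched by word form (an assumption).
masculine = {"he", "him", "his", "himself"}
feminine = {"she", "her", "hers", "herself"}
m = sum(1 for w, _ in tagged_words if w.lower() in masculine)
f = sum(1 for w, _ in tagged_words if w.lower() in feminine)
ratio = m / f if f else float("inf")

print(modals, both, phrases, ratio)
```

Searching sentence by sentence keeps a phrase from spanning a sentence boundary, which a window over the flat word list would allow.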

34

There are 264 distinct words in the Brown Corpus having exactly three possible tags.

  • Print a table with the integers 1 to 10 in one column (one number per row) and, in the second column, the number of distinct words in the corpus that have that many (1-10) distinct tags.
  • For the word with the greatest number of distinct tags, print out sentences from the corpus containing the word, one sentence for each of the possible tags.
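
One way to approach both parts is to build a word-to-tags mapping once and reuse it, as in this sketch. The toy sentences and the decision to lowercase words are assumptions; with different choices (for example, keeping case distinctions) the counts on the real corpus may differ.

```python
from collections import Counter, defaultdict

# Toy stand-in (an assumption) for brown.tagged_sents().
tagged_sents = [
    [("That", "DT"), ("works", "VBZ"), (".", ".")],
    [("I", "PPSS"), ("know", "VB"), ("that", "CS"),
     ("it", "PPS"), ("works", "VBZ"), (".", ".")],
    [("The", "AT"), ("works", "NNS"), ("that", "WPS"), ("sold", "VBD"), (".", ".")],
]

# Distinct tags for each word (lowercasing is an assumption).
tags_for_word = defaultdict(set)
for sent in tagged_sents:
    for word, tag in sent:
        tags_for_word[word.lower()].add(tag)

# How many distinct words have exactly n distinct tags.
words_per_tag_count = Counter(len(tags) for tags in tags_for_word.values())
for n in range(1, 11):
    print(f"{n:>2}  {words_per_tag_count[n]:>6}")

# For the most ambiguous word, keep the first sentence found for each tag.
top_word = max(tags_for_word, key=lambda w: len(tags_for_word[w]))
example_for_tag = {}
for sent in tagged_sents:
    for word, tag in sent:
        if word.lower() == top_word and tag not in example_for_tag:
            example_for_tag[tag] = " ".join(w for w, _ in sent)

for tag, sentence in sorted(example_for_tag.items()):
    print(tag, "->", sentence)
```

A Counter returns 0 for missing keys, so rows of the table with no matching words print as 0 without any special handling.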