Discourse Representation Theory (DRT) examples from chapter 10.

In [1]:
import nltk

In [2]:
read_dexpr = nltk.sem.DrtExpression.fromstring

In [4]:
drs1 = read_dexpr('([x,y],[angus(x), dog(y), own(x,y)])')

In [5]:
print(drs1)

([x,y],[angus(x), dog(y), own(x,y)])


In [6]:
drs1.draw()

In [7]:
print(drs1.fol())

exists x y.(angus(x) & dog(y) & own(x,y))


In [8]:
drs2 = read_dexpr('([x], [walk(x)]) + ([y], [run(y)])')

In [9]:
print(drs2)

(([x],[walk(x)]) + ([y],[run(y)]))


In [10]:
print(drs2.simplify())

([x,y],[walk(x), run(y)])


In [11]:
drs3 = read_dexpr('([], [(([x], [dog(x)]) -> ([y], [ankle(y), bite(x, y)]))])')

In [12]:
print(drs3.fol())

all x.(dog(x) -> exists y.(ankle(y) & bite(x,y)))


In [13]:
drs3.draw()

In [14]:
drs4 = read_dexpr('([x,y], [angus(x), dog(y), own(x,y)])')

In [15]:
drs5  = read_dexpr('([u, z], [PRO(u), irene(z), bite(u,z)])')

In [16]:
drs6 = drs4 + drs5

In [17]:
print(drs6)

(([x,y],[angus(x), dog(y), own(x,y)]) + ([u,z],[PRO(u), irene(z), bite(u,z)]))


In [18]:
print(drs6.simplify())

([u,x,y,z],[angus(x), dog(y), own(x,y), PRO(u), irene(z), bite(u,z)])


In [19]:
print(drs6.simplify().resolve_anaphora())

([u,x,y,z],[angus(x), dog(y), own(x,y), (u = [x,y,z]), irene(z), bite(u,z)])


In [20]:
print((drs6.simplify()).resolve_anaphora())

([u,x,y,z],[angus(x), dog(y), own(x,y), (u = [x,y,z]), irene(z), bite(u,z)])


In [21]:
from nltk import load_parser

In [22]:
parser = load_parser('grammars/book_grammars/drt.fcfg', logic_parser=nltk.sem.drt.DrtParser())

In [23]:
trees = list(parser.parse('Angus owns a dog'.split()))

In [24]:
print(trees[0].label()['SEM'].simplify())

([x,z2],[Angus(x), dog(z2), own(x,z2)])


Discourse coherence checking, also from chapter 10. This requires a theorem prover that is not installed by default, so it will probably not work for you out of the box. I believe I installed Prover9 at some point in the past, which is why it worked for me.
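
Before running the cells below, it may help to check whether a prover is available at all. The following is a minimal sketch using only the standard library. The binary names `prover9` and `mace4` and the helper `find_prover` are assumptions for illustration. NLTK may also look in other places (for example a `PROVER9` environment variable), so "not found on PATH" here does not necessarily mean NLTK cannot find it.

```python
import shutil

def find_prover(names=("prover9", "mace4")):
    """Map each binary name to its full path, or None if it is not on PATH."""
    # The default names are assumed; adjust them to match your installation.
    return {name: shutil.which(name) for name in names}

for name, path in find_prover().items():
    print("{}: {}".format(name, path or "not found on PATH"))
```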

In [25]:
dt = nltk.DiscourseTester(['A student dances', 'Every student is a person'])

In [26]:
dt.readings()


s0 readings:

s0-r0: exists z1.(student(z1) & dance(z1))

s1 readings:

s1-r0: all z1.(student(z1) -> person(z1))


In [27]:
dt.add_sentence('No person dances', consistchk=True)

Inconsistent discourse: d0 ['s0-r0', 's1-r0', 's2-r0']:
    s0-r0: exists z1.(student(z1) & dance(z1))
    s1-r0: all z1.(student(z1) -> person(z1))
    s2-r0: -exists z1.(person(z1) & dance(z1))



In [28]:
dt.retract_sentence('No person dances', verbose=True)

Current sentences are 
s0: A student dances
s1: Every student is a person


In [29]:
dt.add_sentence('A person dances', informchk=True)

Sentence 'A person dances' under reading 'exists x.(person(x) & dance(x))':
Not informative relative to thread 'd0'


Chapter 11: the structure of corpora, plus a couple of specific things about the TIMIT corpus.

In [30]:
phonetic = nltk.corpus.timit.phones('dr1-fvmh0/sa1')

In [32]:
print(phonetic)

['h#', 'sh', 'iy', 'hv', 'ae', 'dcl', 'y', 'ix', 'dcl', 'd', 'aa', 'kcl', 's', 'ux', 'tcl', 'en', 'gcl', 'g', 'r', 'iy', 's', 'iy', 'w', 'aa', 'sh', 'epi', 'w', 'aa', 'dx', 'ax', 'q', 'ao', 'l', 'y', 'ih', 'ax', 'h#']


In [35]:
print(nltk.corpus.timit.word_times('dr1-fvmh0/sa1'))

[('she', 7812, 10610), ('had', 10610, 14496), ('your', 14496, 15791), ('dark', 15791, 20720), ('suit', 20720, 25647), ('in', 25647, 26906), ('greasy', 26906, 32668), ('wash', 32668, 37890), ('water', 38531, 42417), ('all', 43091, 46052), ('year', 46052, 50522)]
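
The start and end values in `word_times` look like sample offsets rather than seconds. Assuming that is right, and using the 16 kHz sampling rate stated in the corpus README, a rough conversion to seconds could look like the sketch below. The helper name `to_seconds` is made up for this note, and the list is copied from the first three entries of the output above.

```python
SAMPLE_RATE_HZ = 16000  # from the corpus README; offsets are assumed to be sample indices

# First three entries copied from the word_times output above.
word_times = [('she', 7812, 10610), ('had', 10610, 14496), ('your', 14496, 15791)]

def to_seconds(times, rate=SAMPLE_RATE_HZ):
    """Convert (word, start, end) sample offsets to (word, start_s, end_s)."""
    return [(word, start / rate, end / rate) for word, start, end in times]

for word, start, end in to_seconds(word_times):
    print("{:<5} {:.3f}s to {:.3f}s".format(word, start, end))
```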


In [36]:
print(nltk.corpus.timit.fileids())

['dr1-fvmh0/sa1.phn', 'dr1-fvmh0/sa1.txt', 'dr1-fvmh0/sa1.wav', 'dr1-fvmh0/sa1.wrd', 'dr1-fvmh0/sa2.phn', 'dr1-fvmh0/sa2.txt', 'dr1-fvmh0/sa2.wav', 'dr1-fvmh0/sa2.wrd', 'dr1-fvmh0/si1466.phn', 'dr1-fvmh0/si1466.txt', 'dr1-fvmh0/si1466.wav', 'dr1-fvmh0/si1466.wrd', 'dr1-fvmh0/si2096.phn', 'dr1-fvmh0/si2096.txt', 'dr1-fvmh0/si2096.wav', 'dr1-fvmh0/si2096.wrd', 'dr1-fvmh0/si836.phn', 'dr1-fvmh0/si836.txt', 'dr1-fvmh0/si836.wav', 'dr1-fvmh0/si836.wrd', 'dr1-fvmh0/sx116.phn', 'dr1-fvmh0/sx116.txt', 'dr1-fvmh0/sx116.wav', 'dr1-fvmh0/sx116.wrd', 'dr1-fvmh0/sx206.phn', 'dr1-fvmh0/sx206.txt', 'dr1-fvmh0/sx206.wav', 'dr1-fvmh0/sx206.wrd', 'dr1-fvmh0/sx26.phn', 'dr1-fvmh0/sx26.txt', 'dr1-fvmh0/sx26.wav', 'dr1-fvmh0/sx26.wrd', 'dr1-fvmh0/sx296.phn', 'dr1-fvmh0/sx296.txt', 'dr1-fvmh0/sx296.wav', 'dr1-fvmh0/sx296.wrd', 'dr1-fvmh0/sx386.phn', 'dr1-fvmh0/sx386.txt', 'dr1-fvmh0/sx386.wav', 'dr1-fvmh0/sx386.wrd', 'dr1-mcpm0/sa1.phn', 'dr1-mcpm0/sa1.txt', 'dr1-mcpm0/sa1.wav', 'dr1-mcpm0/sa1.wrd', 'dr1-mc

In [38]:
print(dir(nltk.corpus.timit))

['_FILE_RE', '_UTTERANCE_RE', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_encoding', '_fileids', '_get_root', '_root', '_speakerinfo', '_tagset', '_unload', '_utterance_fileids', '_utterances', 'abspath', 'abspaths', 'audiodata', 'citation', 'encoding', 'ensure_loaded', 'fileids', 'license', 'open', 'phone_times', 'phone_trees', 'phones', 'play', 'readme', 'root', 'sent_times', 'sentid', 'sents', 'speakers', 'spkrid', 'spkrinfo', 'spkrutteranceids', 'transcription_dict', 'unicode_repr', 'utterance', 'utteranceids', 'wav', 'word_times', 'words']


In [42]:
nltk.corpus.timit.readme()

b'TIMIT CORPUS SAMPLE\n\nThis corpus contains a selection from the TIMIT Acoustic-Phonetic\nContinuous Speech Corpus, consisting of speech files, annotations,\nand associated materials:\n\n* 16 speakers from 8 dialect regions\n* 1 male and 1 female from each dialect region\n* total 130 sentences (10 sentences per speaker; note that some\n  sentences are shared among other speakers, sa1 and sa2\n  are spoken by all speakers.)\n* total 160 sentence recordings (10 recordings per speaker)\n* audio format: wav format, single channel, 16kHz sampling, 16 bit sample, PCM encoding\n\nLICENSE\n\nThis corpus sample is Copyright 1993 Linguistic Data Consortium,\nand is distributed under the terms of the Creative Commons\nAttribution, Non-Commercial, ShareAlike license.  http://creativecommons.org/\n\nTIMIT Corpus Description:\n\nThe TIMIT corpus of read speech is designed to provide speech data for\nacoustic-phonetic studies and for the development and evaluation of\nautomatic speech recognition s

Notes generally related to SHRDLU, chapter 4 on Python structure, and whatever else comes up.

The first part is about what the `global` statement does.

In [52]:
recent_objects = []

def add_recent(obj):
    global recent_objects
    recent_objects = ([obj] + recent_objects)[:5]

In [53]:
recent_objects

[]

In [54]:
add_recent('a')

In [55]:
recent_objects

['a']
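
To see what the `global` statement changes, here is a sketch of the same idea without it. The names `recent` and `add_without_global` are illustrative and do not appear in the cells above. Because the function assigns to `recent`, Python treats it as a local name for the whole function body, so reading it on the right-hand side fails.

```python
recent = []

def add_without_global(obj):
    # Assigning to `recent` makes it local to this function, so the read on
    # the right-hand side happens before the local name has any value.
    recent = ([obj] + recent)[:5]

try:
    add_without_global('a')
except UnboundLocalError:
    outcome = 'UnboundLocalError'
else:
    outcome = 'no error'

print(outcome)
print(recent)  # the module-level list is untouched
```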

In [56]:
def add_recent2(obj, robj):
    # Same five-item cap as add_recent, but returns the new list
    # instead of rebinding a global.
    return ([obj] + robj)[:5]

In [57]:
recent_objects = add_recent2('b', recent_objects)

In [58]:
recent_objects

['b', 'a']

A note about using `enumerate`, and about commenting lines out with `#`.

In [62]:
papers = ['paper1', 'paper2']
authors = ['author1', 'author2']
def collect_measures():
    # for n in range(len(papers)):   (old index-based loop, commented out)
    # go through the positions
    for posn, paper in enumerate(papers):
        author = authors[posn]
        # paper = papers[n]          (not needed: enumerate already supplies paper)
        print("{}: {}".format(author, paper))
    return True

In [63]:
collect_measures()

author1: paper1
author2: paper2


True
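
When the position is only used to index a second list, as with `authors[posn]` above, `zip` is another option. This is a small sketch of that alternative, not a replacement for the cell above. The function name `pair_lines` is made up for this note.

```python
papers = ['paper1', 'paper2']
authors = ['author1', 'author2']

def pair_lines(authors, papers):
    """Pair each author with the paper at the same position."""
    # zip stops at the shorter list, so mismatched lengths are silently truncated.
    return ["{}: {}".format(author, paper) for author, paper in zip(authors, papers)]

for line in pair_lines(authors, papers):
    print(line)
```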

Demonstration of variable scope and local vs global variable access

In [64]:
test_var = 4
def test_function():
    print(test_var)

In [65]:
test_function()

4


In [66]:
test_var

4

In [67]:
test_var = 4
def test_function():
    test_var = 9
    print(test_var)

In [68]:
print(test_var)

4


In [69]:
test_function()

9


In [70]:
print(test_var)

4
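
The cells above show that assigning to a name inside a function creates a local variable and leaves the global alone. A related point is that changing a global object in place does not need `global` at all, since no name is rebound. The sketch below uses made-up names (`items`, `count`, `mutate_and_rebind`) to show the two cases side by side.

```python
items = [1, 2]
count = 4

def mutate_and_rebind():
    items.append(3)  # mutates the global list in place; no rebinding, no `global` needed
    count = 9        # creates a new local name; the global `count` is unchanged
    return count

local_value = mutate_and_rebind()
print(items)        # [1, 2, 3]
print(count)        # 4
print(local_value)  # 9
```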
