Final

CAS LX 390 / NLP/CL Homework 8
GRS LX 690 Fall 2017 due Tue 11/21

SHRDLU

The first part of this we did in class. Mostly up to the point where the world representation was finished. The parsing and actions are left for you to do, but this write-up walks through it in such detail that, though it might take a little while what with all the reading, it should be quite do-able.

NOTE: DO READ THROUGH THIS, THERE ARE A COUPLE OF TASKS TO DO EVEN IN THE PART WE COVERED IN CLASS.

As we saw briefly in class, SHRDLU is Terry Winograd’s 1971 natural language “understander” that took user input, interpreted it in the context of a small “block world” and seemed to be pretty good at conversing about it.

What we’re going to do here is make a “little” version of this. It’s still actually pretty extensive, but there will still be a lot left to do by the time we’re done.

There are few different separable parts of this endeavor, some more related to Python than they are to NLTK, but by the end you should have a cool little program you can interact with in a limited way.

The basic parts of the program are:

  • representation of the objects in the world
  • grammar for syntactic parsing and semantic composition
  • display module to show the current state of the world
  • user input loop
  • interpretation of user input to respond

The goal by the end is going to be for “SHRDLU” to be able to answer questions like “Is the red block on an even square” and perform actions like “put a pyramid on a block”.

Note: There is a lot of reading here, and in most cases I’m giving you the pieces of the program. It should generally be possible to copy and paste from this web page into the Python text editor so it does not all need to be typed. But even if you are copying and pasting it you should be trying to understand what it is doing, why it is written that way, how it works. Despite this being a very long web page, I am not asking you to do that many things, so it shouldn’t take all that long.

I would suggest doing this in Spyder rather than in Jupyter Notebook (as I did in class) because it’s kind of designed to be in a single file that gets fully re-executed every time you test it (rather than just executing it piece by piece, which is the standard mode of using Jupyter Notebook). Either way can work, but the write-up here basically assumes you’re using something like Spyder.

Setting up the world

The setup of the world is essentially the same as what we did in homework 7. We are creating a model of the world, with a domain of individuals, and a valuation function that determines how the objects are arranged in the world and what properties they have.

Rather than having a 3D world, for simplicity and since it doesn’t really matter, our world just has 8 squares in a line on which things can be piled.

So, let’s begin. Start with this (save it in a file like shrdlu.py):

import nltk
print("Setting up the world.")
squares = ['s1', 's2', 's3', 's4', 's5', 's6', 's7', 's8']
dom = {'a', 'b', 'c', 'd', 'e'} | set(squares)
valstr = """
square => {s1, s2, s3, s4, s5, s6, s7, s8}
odd => {s1, s3, s5, s7}
even => {s2, s4, s6, s8}
block => {a, b}
pyramid => {c, e}
table => {d}
thing => {a, b, c, d, e}
red => {a}
blue => {b, e}
green => {c, d}
on => {(a,s1),(b,s2),(d,s4),(c,d)}
"""
val = nltk.sem.Valuation.fromstring(valstr)
val['held'] = {('e',)}
m = nltk.Model(dom, val)
g = nltk.Assignment(dom)

This completely specifies the world now. We have 13 objects, 8 of which are squares that represent the floor (4 of which are odd, 4 of which are even), and 5 of which are shapes of various kinds (block, pyramid, table) with various properies (red, big, etc.). Three of the objects are on the floor, one object is on another one, and one is in the robot hand.

Looking at the world

It’s kind of nice to be able to visualize what’s happening, so we next want to set up a way to display the state of the world. Rather than risk using graphics, we will be satisfied with a text representation.

So, the plan is this: Each of the floor squares will represent a column, and we will stack shapes up from the floor. A little bit above the stacks will be the robot hand and the shape that it is holding.

This means that the first order of business is going to be to figure out what is stacked up in each column. This can be accomplished like this:

stacks = [build_stack(square) for square in squares]

But now of course we need to write the build_stack() function. Which we will, in a bit. Think first about how we’ll do this. We can assume that the current floor square is at the bottom of the stack. And we have information in our model about what things are on what other things, so what we want to do is ask what is on square 1, then what’s on the thing on square 1, and so forth, to build a column from the bottom. So, we need to be able to figure out what is on an object. We can do this by looking at the model to see if something stands in an on relation to the object. (We will assume there is only one—that is, that an object cannot have two things directly on top of it.)

def whats_on(obj):
    # ask: what is on the current support?
    f = nltk.sem.Expression.fromstring("on(x,s)")
    g2 = nltk.Assignment(dom, [('s', obj)])
    try:
        next_obj = list(m.satisfiers(f, 'x', g2))[0]
    except:
        next_obj = None
    return next_obj

This is fairly straightforward. We set up a formula that contains two variables (x and s) and set an assignment to point to the obj we are checking with s. Then we ask the model for a set of the xes that are satisfiers of on(x,s). The reason that we put the check for satisfiers in a try block is that if there are no individuals in the domain that satisfy the formula (that is, if nothing is on this object), then it throws an error. By using try...except we can check for the error and, if there is an error, set next_obj to be None. As for what’s happening in the successful case…

Task. Describe what is happening in the line between try and except. That is, why did I enclose m.satsifiers(...) in list(...)[0]?

So, we now have a function whats_on(obj) that gives us back what object is on obj, or None if nothing is on obj. We’re well on our way to constructing the stacks now, we just need to see what’s on a square, what’s on that, what’s on that, until we’re at the top.

Activity. Try it out. You should be able to compile what you have so far, so run it and then see what’s on square 1, square 3, square 4, and on ‘d’.

def build_stack(square):
    stack = [square]
    while True:
        next_obj = whats_on(stack[-1])
        if next_obj:
            stack.append(next_obj)
        else:
            break
    return stack

So, here, we start by putting the square at the bottom of the stack, and we look for what’s on the last object in the stack repeatedly until there isn’t anything more. If whats_on() returns None it will break out of the while True loop and return the stack we have built.

Activity. Try it out, this should also work. Just give build_stack() a square and it should give you the stack on that square (including the square). Square 4 is the only one that has more than one thing on it.

Now that we have 8 stacks, we need to figure out a way to represent them and draw them from the top down. We’ll do this by fashioning “text graphics” that represent the shapes, with a label that indicates something about the properties the object has. So, before we can construct the whole image of the world, we need to be able to draw an individual block, pyramid, etc.

I have made an arbitrary decision that each shape will be 4 lines tall and 8 characters wide. So, we will define a function draw_shape(obj) that will return an array of four 8-character strings, to be displayed vertically.

Here’s the basic idea in a simple version before we do the whole thing:

def draw_shape(obj):
    s = r"""
/------\
|      |
|{: ^6}|
\------/
""".format('label!')
    return s.split("\n")[1:-1]

To make it easier to work with, it defines the shape as a multi-line string using the r""" delimiter. The r (for “raw text”) is important there, because we are using the \ character, and we do not want it to be interpreted as “escaping” the next character. Notice too that we have to put the shape all the way on the left (not indented) or else it will include a bunch of leading spaces we do not waht to have. So, we draw the shape we want, including a format string for the label in the third line, and then insert the string “label!” in there. The format string says that the inserted string should be 6 characters long (minimally), and padded with spaces on either side in order for it to be centered. Once the multi-line string is defined in s, it is then split on line breaks (\n), and then the first and last lines are discarded.

Activity. Try it out, see what draw_shape('a') gives you.

If you want to see the shape actually drawn:

print("\n".join(draw_shape('a')))

With the proof of concept out of the way, we can now make a more sophisticated draw_shape(obj) function that will check to see what type of object obj is, draw the right shape for the object, and fill in the label based on the other (non-shape) properties.

But, wait, before we can do that, we need to be able to figure out what properties an object has. As far as I know, we cannot directly ask the model this, we need to compute it ourselves. So, let’s detour to create that function first.

def obj_properties(obj):
    return {v for v in val if (obj,) in val[v]}

Task. What are the properties of object a?

One of those properties (block) is relevant to knowing what shape we are going to draw, and the rest of the properties will be used for the label. So, let’s also define a function that takes the properties and separates out the shape property, then builds a label with the rest of the properties by using the first two letters of at most three of them (since we have only 6 characters) to work with. (So, a big red block would be a block carrying a label of “bire” or “rebi”.)

def obj_form(obj):
    properties = obj_properties(obj) - {'held', 'thing'}
    shape = {'block', 'pyramid', 'table', 'square'} & properties
    abbrevs = [prop[:2] for prop in properties - shape]
    label = "".join(abbrevs)[:6]
    return(shape.pop(), label)

So, what’s happening here? We retrieve the object’s properties, and then find which shape property the object has (by intersecting the object properties with a set of shape properties). We want to disregard the “held” property (which is true of the object that is currently in the robot hand), because we don’t want that as part of the label. Same for “thing”, which is a property of all of the shapes. We then take the rest of the properties (properties - shape) and extract the first two letters of the property name into a list, join them together, cut it off at 6 characters (in case it would have been longer). We then return a pair, the first member of which is the shape property, and the second member of which is the label. (The reason for shape.pop() there is that shape is a set containing just one property, the shape property. We’d rather just have that property, so we “pop” out an arbitrary member of the set.)

Task. Try this out. What is the shape and label of object a?

Now, we’re all set to finish up the draw_shape() function so that it can draw more than just a block. There are two special shapes that we want to be able to draw apart from blocks, pyramids, squares, and tables: the robot hand, and an empty space. So, we’re going to set up draw_shape() to respond to a couple of “magic” values. If we send it "HAND" as the object, it will draw the hand, and if we send it None it will draw an empty space. Otherwise, it will look up the object in the model and get its properties and label.

If you type this in (rather than copying and pasting it), be careful in particular about the pyramid and in the line under the robot hand. There are spaces after the right edge, and under the hand, since every line in the multi-line string needs to be 8 characters long.

def draw_shape(obj):
    if obj:
        if obj == "HAND":
            s = r"""
       /
======<=
       \
        
"""
        else:
            (shape, label) = obj_form(obj)
            if shape == 'block':
                s = r"""
/------\
|      |
|{: ^6}|
\------/
""".format(label)
            elif shape == 'pyramid':
                s = r"""
   /\   
  /  \  
 /    \ 
/{:_^6}\
""".format(label)
            elif shape == 'table':
                s = r"""
|------|
|      |
|{: ^6}|
|      |
""".format(label)
            else:
                s = r"""
========
|======|
|{:=^6}|
|======|
""".format(label)
        return s.split("\n")[1:-1]
    else:
        return [' '*8]*4

And now we can finish up the “draw the world” part of this. Arguably not the most relevant part for the computational linguistics course, but it helps to be able to see what’s going on, and it’s good Python practice.

One thing that we will define now (and also use later) is a function that returns the object that the robot is currently holding. It’s a bit clumsy to determine this, so it is clearer if we hide it away in its own function:

def obj_in_hand():
    if len(val['held']) == 0:
        return None
    else:
        return list(val['held'])[0][0]

So, we start off making a list of the stacks (as we outlined a long time ago now), then build a list of rows up from the bottom until we wind up at a row that has nothing in it. Then, we add a short row (2 lines), and a row with the hand and held object (if any) in it. And then we draw it in reverse (from the top down) so that the squares are at the bottom.

def draw(m, g):
    stacks = [build_stack(square) for square in squares]
    # build up from the bottom
    rows = []
    while True:
        current_row = len(rows)
        empty = True
        row = []
        for stack in stacks:
            if current_row < len(stack):
                row.append(draw_shape(stack[current_row]))
                empty = False
            else:
                row.append(draw_shape(None))
        if empty:
            break
        rows.append(row)
    rows.append([[' ',' ']])
    rows.append([draw_shape('HAND'), draw_shape(obj_in_hand())])
    for row in reversed(rows):
        for line in range(len(row[0])):
            print("".join([col[line] for col in row]))

You should be able to see the world now, with draw(m,g).

Task. This is possibly kind of challenging, but to see if you’re following along: Suppose you wanted to put | between each pair of columns. How would you change the draw() function to accomplish that? See below for “before” and “after” displays. It’s actually a pretty small change, once you see where to make it. You can feel free to change it back afterwards.

Current display:

       /   /\   
======<=  /  \  
       \ /    \ 
        /__bl__\
 
 
                           /\                                   
                          /  \                                  
                         /    \                                 
                        /__gr__\                                
/------\/------\        |------|                                
|      ||      |        |      |                                
|  re  ||  bl  |        |  gr  |                                
\------/\------/        |      |                                
================================================================
|======||======||======||======||======||======||======||======|
|==od==||==ev==||==od==||==ev==||==od==||==ev==||==od==||==ev==|
|======||======||======||======||======||======||======||======|

Goal display:

       /|   /\   
======<=|  /  \  
       \| /    \ 
        |/__bl__\
 
 
        |        |        |   /\   |        |        |        |        
        |        |        |  /  \  |        |        |        |        
        |        |        | /    \ |        |        |        |        
        |        |        |/__gr__\|        |        |        |        
/------\|/------\|        ||------||        |        |        |        
|      |||      ||        ||      ||        |        |        |        
|  re  |||  bl  ||        ||  gr  ||        |        |        |        
\------/|\------/|        ||      ||        |        |        |        
========|========|========|========|========|========|========|========
|======|||======|||======|||======|||======|||======|||======|||======|
|==od==|||==ev==|||==od==|||==ev==|||==od==|||==ev==|||==od==|||==ev==|
|======|||======|||======|||======|||======|||======|||======|||======|

Talking about the world

Now that the representation of the world is under control, we can move to the interpretation of English sentences. Unlike when SHRDLU was designed initially, we have NLTK available to make this a bit more straightforward.

To begin, we can add the following on to our file in progress:

print("Setting up the grammar.")
from nltk import grammar

Now we need to construct the grammar. As before, we will do this by defining a big multi-line string and then letting NLTK parse it into a FeatureGrammar from the string. But it is big and somewhat complicated, so we will build it up in parts.

First of all, we might as well automate the simple parts, so for the simple lexical items, we can define sets of words that will be converted into lines in the string we will eventually parse as a grammar.

nouns = {'block', 'pyramid', 'table', 'square', 'thing'}
adjectives = {'red', 'blue', 'green', 'white', 'big', 'small', 'odd', 'even'}
prepositions = {'on'}

Given those sets of lexical items, we can define functions that will construct the lines of the grammar corresponding to them, as below. Nouns and adjectives are simple 1-place predicates, and prepositions are 2-place predicates.

# NP[SEM=<\x.block(x)>] -> 'block'
def npstring(noun):
    return r"NP[SEM=<\x.{n}(x)>] -> '{n}'".format(n=noun)

# Adj[SEM=<\x.big(x)>] -> 'big'
def adjstring(adj):
    return r"Adj[SEM=<\x.{a}(x)>] -> '{a}'".format(a=adj)

# P[SEM=<\X x.X(\y.on(x,y))>] -> 'on'
def pstring(prep):
    return r"P[SEM=<\X x.X(\y.{p}(x,y))>] -> '{p}'".format(p=prep)

That will save us a bunch of typing (and potential typos). We can assemble them as follows:

npdef = "\n".join([npstring(noun) for noun in nouns])
adjdef = "\n".join([adjstring(adj) for adj in adjectives])
pdef = "\n".join([pstring(prep) for prep in prepositions])

Task. What does npdef now contain after running the program so far?

THIS IS BASICALLY AS FAR AS WE GOT IN CLASS, THE REST IS NEW.

The other parts of the grammar are a bit less easy to take shortcuts on. The first part we’ll concentrate on are the noun phrases. This was implicit above, but the grammar we’re using here is one where the subjects and objects are “DP”s (“determiner phrases”). This is standard in theoretical syntax, even if it isn’t standard in the NLTK book. So, something like “a block” has a determiner (“a”) and an NP (“block”) that together form a DP. We can also add any number of adjectives between the Det and NP, so we formalize that by saying that we can take an Adj and an NP and make an NP out of them.

dpdef = r"""
DP[SEM=<?det(?np)>, DEF=?def] -> Det[SEM=?det, DEF=?def] NP[SEM=?np]
NP[SEM=<\x.(?adj(x) & ?np(x))>] -> Adj[SEM=?adj] NP[SEM=?np]
"""

We haven’t defined the Dets yet (that’s next), but they will carry a distinction in definiteness (“a” is indefinite, “the” is definite), and that is why above we say that the DEF feature of a DP is inherited from the DEF feature of the Det forming it.

The semantics of the determiners is basically what we’ve seen before. The only new thing here is that I added in “the”, with a +DEF feature (definite). We will handle that later on when we try to interpret sentences.

dpdef += r"""
Det[SEM=<\P Q.exists x.(P(x) & Q(x))>, -DEF] -> 'a'
Det[SEM=<\P Q.exists x.(P(x) & Q(x))>, -DEF] -> 'an'
Det[SEM=<\P Q.all x.(P(x) -> Q(x))>, -DEF] -> 'every'
Det[SEM=<\P Q.exists x.(P(x) & Q(x))>, +DEF] -> 'the'
"""

Now, here is a place where we are going to take a shortcut. We are really only going to handle one kind of sentence here, which are those where the “verb” is “be PP”. So, like “a red block is on a table”. That means our only verb is “be” and it doesn’t mean anything (all of the meaning is carried by the PP “on the table”). So we will define a VP that has just “is” and a PP in it, and we will define PP in much the same way we defined transitive verbs before.

ppdef = r"""
PP[SEM=<?p(?dp)>] -> P[SEM=?p] DP[SEM=?dp]
"""

vpdef = r"""
VP[SEM=?pp] -> VCOP PP[SEM=?pp]
VCOP -> 'is'
"""

Almost there (for now). We need to finish this off by defining the top of the tree. Again following recent syntax (rather than the older style syntax that the NLTK book follows), we are going to say that trees start at “CP” (rather than “S”). This will be useful later when we add questions in. For now we are going to stick to declarative statements, so we say that the trees start with CP, and the meaning of CP comes from S through a CBAR node.

# declarative S
sdef = r"""
% start CP
CP[SEM=?cbar, CT=?ct] -> CBAR[SEM=?cbar, CT=?ct]
CBAR[SEM=?s, CT=?ct] -> S[SEM=?s, CT=?ct]
S[SEM=<?subj(?vp)>, CT='dec'] -> DP[SEM=?subj] VP[SEM=?vp]
"""

CP has a CT feature, which stands for “clause type.” These are the rules for a declarative sentence. Before we are done we will have ‘ynq’ (yes-no question) and ‘imp’ (imperative) clause types as well.

That will do, so we can add all of those strings defined above together, and then build a FeatureGrammar out of it.

print("Assembling grammar.")
cfgdef = "\n".join([sdef, adjdef, npdef, dpdef, ppdef, pdef, vpdef])
gram = grammar.FeatureGrammar.fromstring(cfgdef)
cp = nltk.FeatureChartParser(gram)

At this point, if you run the program, it should work (that is, it should define everything), and so you can try it out (at the command line):

sent = 'a block is on the table'
parses = list(cp.parse(sent.split()))
print(parses[0])

To get the semantics of just the top of the tree (the truth conditions):

print(parses[0].label()['SEM'])

And the clause type of the sentence can be retrieved by replacing SEM with CT:

print(parses[0].label()['CT'])

So, is it true (in our model)? Is a block on the table? Well, we can ask our model:

sem = parses[0].label()['SEM']
m.satisfy(sem, g)

Task. Try it yourself, but with two sentences that are true in the model. That is, parse the sentence, then ask the model whether it is true.

Talking to the robot

Let’s add some interactivity, now that the world and parser are set up. We can add this to the end of the file. This will print some instructions, then wait for input, and then process the sentence. We will start with a simple version that we can expand a bit later.

print('Type bye to leave, type look to show the scene.')
print()
while True:
    sent = input("> ").lower()
    if sent == 'bye':
        break
    elif sent == 'look':
        draw(m, g)
        continue
    try:
        parses = list(cp.parse(sent.split()))
        treetype = parses[0].label()['CT']
    
        if treetype == 'dec':
            if check_truth(parses):
                print('That is true.')
            else:
                print('That is not true.')
    except:
        print("Sorry, what?")
print("Bye!")

There are two “magic” statements you can say to the robot. One is ‘bye’, which will end it (break exits the while True loop); the other is ‘look’, which will draw the world and then go back and get more input (continue goes back up and starts the while True loop again).

If it gets an error while parsing (for example, if you use a word it does not know), then it will print “Sorry, what?” and get more input.

Note. The way try...except works, it will catch any error at all that occurs in the try block and execute the except block if it hits one. The most obvious kind of error is one where you type a word that the grammar does not know how to parse. However, other errors like typos in the program can also have this effect, and because we are generically dealing with all errors by saying “Sorry, what?”, we lose some of our ability to debug problems. When I was working on it myself, I had the following as the last couple of lines, rather than the two (except: and print(...)) above.:

    except Exception as e:
        print("Sorry, what?")
        print(e)

What this does is puts the text of the actual error in e and then prints out what e is. So, if you use a word it doesn’t know, it might say Grammar does not cover some of the input words: "'cheese'". If you find that your program is saying “Sorry, what?” when you were not expecting it to, you might want to do this as well, so you can see what specific error it ran into.

This won’t quite work yet because if it succeeds in parsing the sentence, it calls a function check_truth(parses) that we have not defined yet. But what we want it to do is just the thing we did by hand at the end of the previous section.

Note: You need to put the check_truth() definition in the file somewhere before the loop that gets and processes input that we just added in the last step. (It’s important to define check_truth() before it is called upon.)

def check_truth(parses):
    treesem = parses[0].label()['SEM']
    return m.satisfy(treesem, g)

Questioning the world

It is a bit unnatural to just say things and have the robot confirm whether it is or is not true. It would be nicer if we could ask it.

All we need for yes-no questions is to fix up the parser so that it understands when a yes-no question is asked. The basic task the program performs once the semantics is established is identical to checking the truth of a statement (except it will say “Yes” instead of “That is true”).

The form of a yes-no question, in this limited grammar is just “is DP PP?”. Specifically, the auxiliary “is” appears at the front instead of between the DP and PP.

# ynq S

sdef += r"""
CBAR[SEM=?s, CT=?ct] -> VCOP SXAUX[SEM=?s, CT=?ct]
SXAUX[SEM=<?subj(?vp)>, CT='ynq'] -> DP[SEM=?subj] VPXAUX[SEM=?vp]
VPXAUX[SEM=?pp] -> PP[SEM=?pp]
"""

Here, we defined a version of CBAR that has the VCOP first, and an “S-without-an-aux” (SXAUX) that has just a DP and a “VP-without-an-aux” (VPXAUX), which itself contains just a PP.

This needs to go after the definition of the declarative S and before the “Assembling grammar.” line (so that it is there when we parse the grammar).

And then we add the following into the input processing loop. You can figure out where it goes.

        elif treetype == 'ynq':
            if check_truth(parses):
                print('Yes.')
            else:
                print('No.')

Activity. Try asking it some questions now. Like “is a red block on an odd square” etc. Note: You should not put a question mark at the end, the grammar will not know what to do with punctuation.

Changing the world

After kind of a slow start, we’ve made a bunch of progress pretty quickly. We now have a world, and we can state and ask things about it, and the robot understands to the extent that it can determine the truth of statements based on the facts of the world.

The last thing we’ll do is add the ability to change the configuration of the world, by adding imperatives to the parser. This is also where we have to add a lot more “smarts” to the system, because although NLTK was able to take care of the heavy lifting in the domain of parsing and evaluation of logical formulae, we need to tell the robot how to actually affect the world.

The first thing we need to do is revise the grammar to have one more clause type, an imperative. An imperative has a silent (implicit) subject, but is otherwise mostly the same as a statement. So, all we really need to do is add one more alternative definition of S:

# imperative S

sdef += r"""
S[SEM=<?vp(hand)>, CT='imp'] -> VP[SEM=?vp]
"""

Again, this needs to go just before the “Assembling grammar” step. As you can see, what it essentially does is looks for a sentence with no subject, and assumes that “hand” (the robot’s actor) is the subject, and sets the clause type to ‘imp’.

This is not quite enough to be satisfying, though, we need to add some verbs. We are going to add just two verbs: take and put. What we want the robot to do if we tell it to take something is to put it in its hand, and if we tell it to put something somewhere, it will do that.

Although we’ll go ahead and fill in the semantics here, we actually are not going to wind up using a lot of the semantics it computes. But, here is a little more to add before “Assembling grammar”:

vpdef += r"""
VP[SEM=<?obj(?v)>] -> V[SEM=?v] DP[SEM=?obj]
V[SEM=<\x y.take(y,x)>] -> 'take'
VP[SEM=<?v(?obj,?pp)>] -> VDT[SEM=?v] DP[SEM=?obj] PP[SEM=?pp]
VDT[SEM=<\Y X x.X(\z.Y(\y.put(x,y,z)))>] -> 'put'
"""

At this point it will already parse “put a red block on an even square” but the robot doesn’t know what to do with that.

And this is where things get a little hairier, because there are a lot of things to take into account. To begin, let’s add something in the input processor that notices when an imperative sentence is used and calls a function (which we have not yet defined) to handle it.

        elif treetype == 'imp':
            process_command(parses)

And now we need to define process_command(). Again, we need to define all of these functions before we get to the input loop, so from here on the definitions need to go above the “Type bye to leave” instructions that start the input loop.

The process_command() function is supposed to look at the imperative tree, determine what verb was used (take or put), and then take the action. We’re going start with take because it’s simpler.

Think first about what the robot is supposed to do if it is asked to take something. If you say “take a block” then there are several possible choices. The robot can just pick one. However, you can’t take something that’s underneath something else, so it needs to check for that. Also, taking something means putting it in its hand, but if it is already holding something, it needs to put that thing down first.

Let’s try a couple of things by hand before we try to automate anything. Run the script as it is and leave with ‘bye’, which should leave the grammar defined so you can use it. Parse the sentence “take a red block” by hand:

parses = list(cp.parse('take a red block'.split()))
print(parses[0].label()['CT']) # should say 'imp'
print(parses[0])
print(parses[0][0])
print(parses[0][0][0])

As you can see, we’re kind of working our way down the tree. We can get the VP like this:

vp = parses[0][0][0][0]
print(vp)
print(vp[0])
print(vp[0][0])

After a whole lot of [0]s, we managed to zoom into the actual verb. Because we have a very limited grammar, this is reliable enough. That is, we can count on being able to figure out what verb we have by looking at vp[0][0][0]. So, in particular, we can check to see if it is “take”.

Likewise, we can get the object (the thing we’re asking the robot to take) with this:

obj = vp[1]
print(obj)

For the object, we actually do need to use the semantics NLTK computed for us. Specifically, we need to figure out what individuals in the model the object can describe (so we know what the robot is choosing between when it takes something).

Here, I wound up doing something a bit tricky to get this into a form that we can use NLTK’s evaluation functions for. The logical formula for the object is:

obj_sem = obj.label()['SEM']
print(obj_sem) # \Q.exists x.(red(x) & block(x) & Q(x))

This is the right form for the subject of a sentence, it takes a predicate (Q) and is true if there is an x such that x is red, a block, and Q holds of it. However, right now we just want to find out what the red blocks are. The simplest way I could think of to get at this is to just set Q to be a kind of identity predicate like this:

idfun = nltk.sem.Expression.fromstring(r'\x.(x=y)')

This is true for any individual that is… y. Whatever y is set to be. So, if we substitute idfun in for Q:

id_obj = nltk.sem.ApplicationExpression(obj_sem, idfun).simplify()
print(id_obj) # exists x.(red(x) & block(x) & (x = y))

We now have an open formula (on y) that we can check for satisfiers on. As I indicated, this is wandering around in the weeds a bit, but the main thing is that if we say:

options = m.satisfiers(id_obj, 'y', g)
print(options) # {'a'}

we can get a list of those individuals that are red blocks.

Having walked through that, let’s just commit that to a function that will do all of that and give us a set of options. Put this somewhere above the input loop.

idfun = nltk.sem.Expression.fromstring(r'\x.(x=y)')

def find_options(obj):
    obj_sem = obj.label()['SEM']
    id_obj = nltk.sem.ApplicationExpression(obj_sem, idfun).simplify()
    options = m.satisfiers(id_obj, 'y', g)
    return options

We still haven’t defined the process_command() function (so the robot can’t yet handle imperatives), but we now know better what it should look like.

def process_command(parses):
    vp = parses[0][0][0][0]
    v = vp[0][0]
    if v == 'take':
        obj_opts = find_options(vp[1])
        if len(obj_opts) == 0:
            print('There is no such object to take.')
        else:
            # do it
            print('out of order.')
    else:
        print("I'm unsure what you are asking me to do.")

Of course, we want to replace the “out of order” message with some actual action. But at this point, it should at least run, and give you an “out of order” message if you try to “take a red block”, and a “there is no such object to take” message if you try to “take a blue table”.

Assuming there are some options, the robot needs to pick one to take. It might already have something in its hand. In fact it might already have the relevant thing, in which case it doesn’t need to do anything. But if it has something different in its hand, it needs to put that down. And it can’t pick something up that’s underneath something else.

Kind of complicated.

Since we have more squares than other objects, the robot can always put down what’s in its hand onto an empty square. So, we can define a function that will locate an empty square:

def empty_square():
    for square in squares:
        if not whats_on(square):
            return square

And we can define a function that will remove from the options any objects that are not visible (because they are hidden under something else).

def just_visible(objects):
    return {obj for obj in objects if not whats_on(obj)}

Since the robot can’t pick up the floor squares, we also want to eliminate those from the options of pick-up-able things. We don’t want them to be invisible/hidden (because we can put things on them), but we can add an extra filter to remove any squares. Since invisible things are also not takable, we will incorporate the visibility filter in just_takable() as well:

def just_takable(objects):
    return {obj for obj in just_visible(objects) if 'square' not in obj_properties(obj)}

Now, to put an object down, the robot will put the object on another object. For right now, it will just put the object on an empty square. This means that we remove the object from the robot’s hand, and we put it in on relation to another object. So, here is a function that makes this happen.

def put_on(target):
    obj_to_place = obj_in_hand()
    val['held'] = {}
    val['on'] |= {(obj_to_place, target)}

Lastly, to pick an object up, we put it in the robot’s hand, and remove the on relation that it was on top of.

def pick_up(obj):
    val['held'] = {(obj,)}
    val['on'] = {(x,y) for (x,y) in val['on'] if x != obj}

Those are all the pieces we need to implement take. There is one more consideration however, thinking ahead. We are going to want to implement put next, and conceptually, take is part of put. That is, in order for the robot to put something somewhere, it first needs to take that thing. So, to allow both take and put to use the “taking” script, we will define a function that implements take. We will give it a set of objects that meet the description, and it will take it from there.

def do_take(obj_opts):
    if len(obj_opts) == 0:
        print('There is no such object to take.')
    else:
        if obj_in_hand() in obj_opts:
            print('I am already holding such a thing.')
            return True
        else:
            takable = just_takable(obj_opts)
            if len(takable) == 0:
                print('I cannot see such a thing to take.')
            else:
                put_on(empty_square())
                # take the first option
                obj_to_take = list(takable)[0]
                pick_up(obj_to_take)
                print('Object taken.')
                return True
    return False

We can then replace the “out of order” message with do_take():

    (...)
    if v == 'take':
        obj_opts = find_options(vp[1])
        if do_take(obj_opts):
            draw(m, g)
    else:
    (...)

Activity. Play around with it a little. It’s kind of fun. Take a block. Take a red block. Maybe it’s not all that fun. But it’s a little bit fun.

The next thing to implement is put, but we already have a lot of those pieces in place just from take. In order to put something somewhere, the robot must take it first and then put it in the target location. Now the target need not be a floor square, so we can make stacks of things.

So, first of all, we just execute take on the object we are supposed to be putting somewhere.

So, if the verb is put, then we need to determine whether we can take the object, and then we need to determine whether the place we want to put it is visible. And then we do it.

    (...)
    elif v == 'put':
        obj_opts = find_options(vp[1])
        if do_take(obj_opts):
            pp = vp[2]
            p = pp[0][0]
            loc_opts = find_options(pp[1])
            if len(loc_opts) == 0:
                print('There is no such place to put anything.')
            else:
                visible = just_visible(loc_opts) - {obj_in_hand()}
                if len(visible) == 0:
                    print("I cannot see such a place to put anything.")
                else:
                    # pick the first option
                    target = list(visible)[0]
                    put_on(target)
                    print('Object placed.')
                    draw(m, g)
    else:
    (...)

Being definite with history

There are a bunch of things that would be neat to add, and are kind of within reach. For example: the real SHRDLU had a planning module, so that if you said you wanted to pick up something that was covered, it would move things out of the way so it could get it. That would be relatively easy to implement. We could also add wh-questions, to handle things like “what is on the blue pyramid” or “how many blocks are on even squares” or “where is the blue pyramid”. Our support for the universal quantifier is a bit incomplete as well; you can ask “is every block on an even square” but you can’t “put every pyramid on a block”. SHRDLU also had the ability to learn things like that a stack of a block and pyramid can be called a “steeple” (and then “steeples” can be referred to from then on), and could absorb facts like “I like red blocks” and then answer questions about them. And an easy one, but our robot only knows singulars, not plurals.

So, though this is kind of sophisticated, there are still a lot of places it can be expanded even just in this blocks world. However, it also feels like we could get pretty close to being able to handle almost anything a person asked the robot about pertaining to the blocks world.

One (nearly final) thing that SHRDLU can do that we’ll approximate here is handling definite noun phrases. At the moment, our robot treats “the pyramid” and “a pyramid” exactly the same (except that it marks “the pyramid” as definite). But if we ask the robot to do something with “the block” it should be unsure which block we meant. Unless we had just been interacting with a block recently, in which case we can assume that it was that block we meant.

So, to handle this, we will keep track of the last five objects we did something with, and if that disambiguates sufficiently for a definite noun phrase, the robot will not complain.

We will keep track of recent objects in a list (in the global space) called recent_objects, and every time we interact with an object we call add_recent() to add the object to the recently addressed objects (rolling off the oldest ones in the process, to keep it to 5).

recent_objects = []

def add_recent(obj):
    global recent_objects
    recent_objects = ([obj] + recent_objects)[:5]

Then, in do_take() we add add_recent(obj_to_take) after it is picked up, and in the “put” section of process_command() we add add_recent(target) after it an object put on the target.

Task. Do that. I was slightly less specific about where these go, but if you’re still following along it shouldn’t be too hard to get those add_recent() lines in the right place.

The place where this will matter is in find_options() where it is finding the possible options for a referring description. We can insert a check for a definite DP before the return line:

def find_options(obj):
    obj_sem = obj.label()['SEM']
    id_obj = nltk.sem.ApplicationExpression(obj_sem, idfun).simplify()
    options = m.satisfiers(id_obj, 'y', g)
    if obj.label()['DEF']:
        if len(options) != 1:
            recent_options = options.intersection(set(recent_objects))
            if len(recent_options) != 1:
                print("I don't know which one you mean.")
                options = {}
            else:
                options = recent_options
    return options

Pyramids are pointy

This one I’m not going to give you explicit guidance on, but your task is:

Task. Modify the program so that if prevents you from putting anything on a pyramid.

That’s it

Task. Give me the final state of the program so I can see it all in context. It should be possible to run it in Python, move things around, interrogate it about properties objects have, as above. You couldn’t have possibly gotten this far without having compiled the program, but I’m just reminding you that I’d like to have it to refer to.