name: title
layout: true
class: center, middle, inverse

---

# Issues in parsing #

---

layout: false

We've been looking at parsing with a "recursive descent" parser by building a little grammar and finding the parses of a string.

```python
nltk.app.rdparser()
```

A recursive descent parser breaks up goals into subgoals. The goal is to find an S. This can be accomplished by finding an NP and a VP.

Finding an NP can be accomplished in one of three ways:

```python
NP -> Det N PP
NP -> Det N
NP -> 'I'
```

To use the first one (`Det N PP`), we start by looking for a `Det`, which can be matched in one of two ways:

```python
Det -> 'the'
Det -> 'a'
```

First one: matched! Now on to N, then to PP. Which fails, ultimately, so backtrack to `Det N`.

---

We could write such a parser, but NLTK has one built in. That's what we were using. This is a "top-down" parser. It can be kind of inefficient, particularly when the vocabulary gets bigger.

Shift-reduce parsing: a "bottom-up" parser. It seems like it would be more efficient: it only considers the words that are actually there, and then works out what it can build from them.

```python
nltk.app.srparser()
```

Fairly clear what it does as it proceeds.

---

Shift-reduce parser: finds something it can build out of what it has, plus the incremental addition of the next word.

But the default sentence "my dog saw a man in the park with a statue" fails! Because it sees that it can build "saw a man" as a VP, and then that can be built into an S, and then there's no more room for the PP. You have to shift in "in" before it incorporates the NP into the VP.

This is also partly due to it not remembering its choices and not having a way to undo them if it gets into trouble.

We should be able to think of ways to improve this. For example, allow backtracking of some kind to make a different choice (generally speaking, failing to reduce something that can be reduced). But it's clearly not an easy problem to solve generally, at least not "cheaply."

---

A "left-corner parser" tries to combine a top-down approach with a connection to the actual words, by using only top-down expansions that could line up with the words.

```python
S -> NP VP
VP -> V NP
NP -> Det N | PropN
PropN -> 'John' | 'Mary'
Det -> 'the'
N -> 'dog'
V -> 'saw'
```

As we expand down, looking for S, then looking for NP VP, we have three options for expanding NP: 'John', 'Mary', and Det N. "Mary" is the first word (if we're going for "Mary saw the dog"). So, whatever we do here, we need to expand S such that the first word is "Mary". How to accomplish this?

---

```python
S -> NP VP
VP -> V NP
NP -> Det N | PropN
PropN -> 'John' | 'Mary'
Det -> 'the'
N -> 'dog'
V -> 'saw'
```

Looking at the grammar, the "left corner" of each non-terminal is:

| LHS category | left corner |
|--------------|-------------|
| S            | NP          |
| NP           | Det, PropN  |
| VP           | V           |

So, to find `S` we need to find `NP VP`. To find `NP` we need to find either `Det` or `PropN`. But "Mary" is not a `Det`, so discard that. So find `PropN`. "Mary" is a `PropN`, so we found the `NP`. The rest is easy (here).

Then: scale it up.

Beyond that, we'll just leave it as a concept without trying to implement it. There is a quite complex discussion of "chart parsing" that follows this in the book, and maybe we can work through that at some later point. Maybe in a homework, where there's time to think it through.

---

## Human sentence processing ##

Tom said that Bill had taken the cleaning out yesterday.

--

cf. One morning, I shot an elephant in my pyjamas.

--

The steak with the sauce that was tasty didn't win the prize.
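---

To connect the ambiguity in the elephant sentence back to NLTK: here is a minimal sketch (a toy grammar in the style of the NLTK book's PP-attachment example, not code from these slides) that finds *both* parses with a chart parser.

```python
import nltk

# A small CFG with an attachment ambiguity: the PP "in my pyjamas"
# can attach to the NP ("an elephant in my pyjamas") or to the VP.
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pyjamas'
V -> 'shot'
P -> 'in'
""")

sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pyjamas']
parser = nltk.ChartParser(grammar)

# Prints two trees, one per attachment of the PP.
for tree in parser.parse(sent):
    print(tree)
```

Note that `VP -> VP PP` is left-recursive, which would send the simple top-down recursive descent parser from earlier into a loop; the chart parser handles it fine.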
---

## Human sentence processing ##

The teacher told the kids the ghost story that she knew would frighten them

--

The teacher told the kids the ghost story had frightened that it wasn't true

--

I

--

told

--

the

--

boy

--

the

--

cat

--

scratched

--

Mary

--

would

--

help

--

her.

--

The cotton clothing is made of grows in Mississippi.

--

I like garden path sentences am delightful.

---

## Python structures ##

Moving on (back) to chapter 4.

Variables, type/token (reference/value), `id()`, conditionals (`if...elif...else`), sequences, mutability, "procedural vs. declarative" loops, index variables (really, I think, a matter of readability and efficiency).

Next time: functions, variable scope issues.
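---

## Python structures ##

A minimal sketch of a couple of the chapter 4 points above (the variable names and examples are just illustrative, not the book's own code):

```python
# Reference vs. value: two names for the same list vs. a copy.
foo = ['Monty', 'Python']
bar = foo         # bar is another name for the same object
baz = foo[:]      # baz is a copy (a different object)
print(id(foo) == id(bar))   # True
print(id(foo) == id(baz))   # False

bar[0] = 'British'
print(foo)        # ['British', 'Python']  (mutated through the alias)
print(baz)        # ['Monty', 'Python']    (the copy is unaffected)

# "Procedural" loop with an explicit index variable...
words = ['colorless', 'green', 'ideas']
lengths = []
for i in range(len(words)):
    lengths.append(len(words[i]))

# ...vs. a more "declarative" version of the same computation.
lengths = [len(w) for w in words]
print(lengths)    # [9, 5, 5]
```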