25 Oct 2016
By the way, the yellow book I was referring to in class on Monday is
Connections and Symbols (1988, eds. Steven Pinker, Jacques Mehler, MIT Press),
which is available through the BU Library here:
Connections and Symbols
You may need to sign in through the library site first, but the full
text is available once you have signed in.
25 Oct 2016
I’ve had a couple of questions about homework 5 and, as has happened
before, it seems it’s a bit harder in spots than anticipated.
So I’ll make it due on Friday this week rather than tomorrow. If you’ve gotten
through it already, great! But, otherwise, it’s worth looking over
before class so you can ask questions, and I’ll see if there are things
I want to bring up as well.
(Also, don’t forget to read the immediately preceding post about the
errors in the book’s example for the first of the assigned
problems! It doesn’t work as “printed.”)
24 Oct 2016
It has been observed that the first homework problem I assigned is made…
difficult… by the fact that the example code does not actually work.
This is in fact still an open bug in the NLTK book,
though there are some details in a related bug report.
However, the main point is that the listing you would base your approach
to problem 9 on, the one that starts with
text = 'That U.S.A. poster-print costs $12.40...',
does not work with the current version of NLTK. The reason is that the
behavior of “capturing groups” has changed: the tokenizer misbehaves when
you use grouping parentheses. So where you have a capturing group like
(...), you instead want a non-capturing group like (?:...). Concretely,
the example from chapter 3 should read:
>>> text = "That U.S.A. poster-print costs $12.40..."
>>> pattern = r'''(?x) # set flag to allow verbose regexps
... (?:[A-Z]\.)+ # abbreviations, e.g. U.S.A.
... | \w+(?:-\w+)* # words with optional internal hyphens
... | \$?\d+(?:\.\d+)?%? # currency and percentages, e.g. $12.40, 82%
... | \.\.\. # ellipsis
... | [][.,;"'?():-_`] # these are separate tokens; includes ], [
... '''
>>> nltk.regexp_tokenize(text, pattern)
['That', 'U.S.A.', 'poster-print', 'costs', '$12.40', '...']
That worked for me, anyway, and should give you a basis to work from
when doing problem 9.
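If you want to see for yourself why the capturing groups matter, here is a small illustration using Python’s re module directly (which nltk.regexp_tokenize is built on). When a pattern contains a capturing group, re.findall returns the contents of the group rather than the whole match, which is exactly the misbehavior described above:

```python
import re

# With a capturing group (-\w+), re.findall returns only what the
# group matched (or '' when the group didn't participate), not the
# whole token -- this is what breaks the book's tokenizer pattern.
broken = re.findall(r'(?:[A-Z]\.)+|\w+(-\w+)*', "poster-print U.S.A.")
print(broken)   # -> ['-print', '']

# With non-capturing groups (?:...), findall returns the full matches.
fixed = re.findall(r'(?:[A-Z]\.)+|\w+(?:-\w+)*', "poster-print U.S.A.")
print(fixed)    # -> ['poster-print', 'U.S.A.']
```

The same substitution of (?:...) for (...) throughout the chapter 3 pattern is what makes the corrected example above work.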
19 Oct 2016
For homework 5, I’ll go back to using a few exercises from the book.
These are due on October 26.
From chapter 3, numbers 9, 10, 25, 29, 38.
For 9 and 38, it would be useful for me to also have the text you were working with.
12 Oct 2016
In class, a question came up about how you can debug your programs.
I talked a little about the things debuggers have in common,
and showed a bit of how to use the debugger in Anaconda/Spyder,
although I don’t have much experience with that debugger
myself (so I immediately ran into the problem of not knowing
what commands were available at the Python debugger prompt).
But apparently when people are taught Python in CS classes, the
site pythontutor.com is recommended
as a troubleshooting resource.
It looks pretty nice. For a short program, you can copy and paste
it in, and then step through your program as it runs, line by line,
to see how variables evolve.
I haven’t really played with it, but I expect that while it will likely
be useful for short functions that do not depend on NLTK, you’ll
need to use a different debugger (like the one in Spyder) once you
start using things that are not built into Python (that is to say,
most things that require you to import <something>
before using them).
Still, it provides quite a nice visualization tool for simple things
and can help you test your logic and watch what is happening to
your variables. I’ll see if I can come up with either some
tutorials or documentation for Spyder’s debugging functions;
I haven’t looked for any at this point.
In the meantime, check out pythontutor.com
if your functions aren’t doing what you think they should be.
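One more option that works no matter what you import is Python’s built-in pdb debugger. Here is a minimal sketch (the function count_long_words is just an invented example for illustration):

```python
import pdb  # Python's standard-library debugger; works with any imports

def count_long_words(words, n):
    """Count the words longer than n characters (invented example)."""
    return len([w for w in words if len(w) > n])

# To step through a call, uncomment the pdb.run() line below; at the
# (Pdb) prompt the basic commands are n(ext), s(tep), p <expression>,
# l(ist source), c(ontinue), and h(elp).
# pdb.run('count_long_words(["colorless", "green", "ideas"], 5)')

print(count_long_words(["colorless", "green", "ideas"], 5))  # -> 1
```

Those same single-letter commands (n, s, p, l, c, h) are also roughly what most debugger prompts, including the one in Spyder, understand.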