Starting with CHILDES
30 Sep 2016
As usual, 50 minutes arrived pretty fast. I’d hoped to talk a bit about the haiku generator, but it wasn’t necessary, really, since the thing itself more or less explains everything as it goes along.
I started with CHILDES, but all we really did was go through the basic characteristics of the corpora in there and start getting NLTK to recognize them. So, I’ll continue with that next time.
If you want to see what I’d been planning to do, or to replicate what was done in class (since this is not actually in the NLTK book), here is the presentation I was following. It was mostly just sketching what I was going to be typing, not what happened.
I’ll make a more elaborate version for next time, but this one shows where I was headed.
Also, notes on installing CHILDES:
- On the CHILDES site, you want the XML version of the database.
- You need to put this somewhere NLTK can find it. The canonical place is in a folder called nltk_data inside the “home” directory. This is not immediately reachable on a default MacOS/Mac OS X installation, so you may need to go to “Documents” and then to “Enclosing Folder” from the “Go” menu. The icon should be a little house.
- You may need to create the nltk_data folder if it wasn’t already there. It’s not there on the lab computers, but it might be there on your own.
- Then, the appropriate place for the corpus is this: inside nltk_data there should be a corpora folder, inside that a childes folder, inside that a data-xml folder, and inside that an Eng-USA-MOR folder (assuming you downloaded something from the CHILDES Eng-USA-MOR directory, like Brown.zip). That is: ~/nltk_data/corpora/childes/data-xml/Eng-USA-MOR/.
- Once the corpus is in place, the stuff we did in class should work.
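If you would rather not click through Finder, the folder layout above can be built from Python, and the same script can check whether NLTK sees the corpus. This is only a rough sketch, not what we typed in class: the helper names (childes_root, ensure_childes_dir, load_childes) are made up for illustration, and it assumes you have unzipped something like Brown.zip inside the Eng-USA-MOR folder. CHILDESCorpusReader is the NLTK reader class for the XML version of CHILDES.

```python
from pathlib import Path


def childes_root(home=None):
    """Return the folder from the notes above, relative to `home`.

    `home` defaults to the user's home directory (the little house icon).
    """
    base = Path(home) if home is not None else Path.home()
    return base / "nltk_data" / "corpora" / "childes" / "data-xml" / "Eng-USA-MOR"


def ensure_childes_dir(home=None):
    """Create the nested folders if they are missing and return the path."""
    root = childes_root(home)
    root.mkdir(parents=True, exist_ok=True)
    return root


def load_childes(root):
    """Build an NLTK reader over every XML file found under `root`."""
    # Imported here so the path helpers work even without NLTK installed.
    from nltk.corpus.reader import CHILDESCorpusReader

    return CHILDESCorpusReader(str(root), r".*\.xml")


if __name__ == "__main__":
    root = childes_root()
    if root.is_dir() and any(root.rglob("*.xml")):
        reader = load_childes(root)
        fileids = reader.fileids()
        print("transcripts found:", len(fileids))
        # Peek at the first few word tokens of the first transcript.
        print(list(reader.words(fileids[0])[:10]))
    else:
        print("No XML files under", root)
        print("Call ensure_childes_dir() and unzip the corpus there.")
```

Calling ensure_childes_dir() once creates the empty folders; after that, unzip the download inside Eng-USA-MOR and run the script again.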
Next time, we can try to actually find out something substantive about child acquisition of English using this.