Starting with CHILDES

30 Sep 2016

As usual, the 50 minutes went by pretty fast. I’d hoped to talk a bit about the haiku generator, but it wasn’t really necessary, since the thing itself more or less explains everything as it goes along.

I started with CHILDES, but all we really did was go through the basic characteristics of the corpora in there and start getting NLTK to recognize them. So I’ll continue with that next time.

If you want to see what I’d been planning to do, or to replicate what was done in class (since this is not actually in the NLTK book), here is the presentation I was following. It mostly sketches what I was planning to type, not what actually happened.

I’ll make a more elaborate version for next time, but this one shows where I was headed.

Also, notes on installing CHILDES:

On the CHILDES site, you want the XML version of the database.
You need to put this somewhere NLTK can find it. The canonical place is in a folder called nltk_data inside the “home” directory. This is not immediately reachable on a default MacOS/Mac OS X installation, so you may need to go to “Documents” and then to “Enclosing Folder” from the “Go” menu. The icon should be a little house.
You may need to create the nltk_data folder if it isn’t already there. It’s not there on the lab computers, but it might be there on your own machine.
Then the corpus goes in a nested set of folders inside nltk_data: a corpora folder, then a childes folder inside that, then a data-xml folder, then an Eng-USA-MOR folder (assuming you downloaded something from the CHILDES Eng-USA-MOR directory, like Brown.zip). That is: ~/nltk_data/corpora/childes/data-xml/Eng-USA-MOR/.
Once the corpus is in place the stuff we did in class should work.
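If you’d rather set up the folders from Python than click through the Finder, here is a minimal sketch of the steps above. The function names (childes_dir, ensure_childes_dir, load_brown) are made up for illustration, and load_brown assumes that NLTK is installed and that Brown.zip unzipped into a folder called Brown inside Eng-USA-MOR, which may not match what you have.

```python
from pathlib import Path


def childes_dir(home=None):
    """Return the folder where the unzipped CHILDES XML is expected to live.

    `home` defaults to the current user's home directory.
    """
    home = Path(home) if home is not None else Path.home()
    return home / "nltk_data" / "corpora" / "childes" / "data-xml" / "Eng-USA-MOR"


def ensure_childes_dir(home=None):
    """Create the folder, plus any missing parents (including nltk_data)."""
    target = childes_dir(home)
    target.mkdir(parents=True, exist_ok=True)
    return target


def load_brown(home=None):
    """Open the Brown corpus with NLTK's CHILDES reader.

    Assumption: Brown.zip was unzipped to .../Eng-USA-MOR/Brown/.
    NLTK is imported here, not at the top, so the path helpers
    above still work on a machine without NLTK.
    """
    from nltk.corpus.reader import CHILDESCorpusReader

    return CHILDESCorpusReader(str(childes_dir(home)), r"Brown/.*\.xml")


if __name__ == "__main__":
    # Show where the corpus files should be placed.
    print(ensure_childes_dir())
```

Once the XML files are in that folder, something like load_brown().fileids() should list them; if it comes back empty, the files are probably one folder too deep or too shallow.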

Next time, we can try to actually find out something substantive about children’s acquisition of English using this.
