Project specs

Suddenly, the end of the semester is upon us.

So this is a bit of guidance about what to do with the project.

I said some of this in the last class, but:

The project here is fairly free-form. In fact, it doesn’t even require that you achieve the result you were aiming for; it’s really more about the journey than the destination.

When trying to decide what kind of project to do, probably the best thing to do first is to figure out what your corpus is going to be. Generally, you want something big enough to give somewhat meaningful results. So, 100 words from a given language is probably too small to say much using our computational techniques, but the text of a novel, or a few novels, would probably be useful. If you pick novels, you’d tend to be headed for a project that in some way characterizes the writing style or choices of the author. That characterization can be used to learn something about the writing of that author (or, alternatively, writing in a genre), to try to determine the authorship or genre of an unknown text, or to generate a new text in the “spirit” of that author or genre.

But I think that probably the best place to start is to think about what corpora you can get hold of and would find interesting to look at. Maybe you can find something related to a project you’re doing for another class, or to a project you’d like to do in a bigger way over the longer term.
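If you want a quick sanity check on whether a candidate corpus is "big enough," something like the following rough sketch might help. It counts word tokens and distinct word types in a string. The function name `corpus_summary` and the regex tokenizer are just illustrative choices of mine, not anything we fixed in class, and the tokenizer is deliberately crude (for a real project you would probably swap in something better).

```python
import re
from collections import Counter

def corpus_summary(text):
    """Return rough size statistics for a text (hypothetical helper)."""
    # Crude tokenizer (an assumption): runs of letters/digits,
    # optionally joined by apostrophes, lowercased.
    tokens = re.findall(r"[^\W_]+(?:'[^\W_]+)*", text.lower())
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),          # total running words
        "types": len(counts),           # distinct word forms
        "top": counts.most_common(3),   # most frequent words
    }

# A toy string stands in for the corpus here; for a real corpus you
# would read the text from a file first.
sample = "The cat sat on the mat. The dog sat too."
summary = corpus_summary(sample)
print(summary)
```

A corpus with only a few hundred tokens will show up immediately in these numbers, which is a hint that you should keep looking.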

So, though this is a short time frame, it’s also a short homework assignment:

TASK. Look for a corpus to analyze for the project, make sure you can get it, and think a bit about what you might like to do with the corpus. Give that to me on Mon Apr 22, and I will be uncharacteristically quick in commenting on that so that you can work on the next steps.

The ultimate product of this project will be something in the 10-15 page range, but I don’t really care about spacing and margins and so forth. I think of those pages as including the code you run, summaries of the results you got, and graphs and tables where appropriate, as well as the prose description of what corpus you are looking at, what you are trying to determine using that corpus, how you went about it, what results you got, why it didn’t work as well as you’d hoped, etc. I don’t think it’ll be too hard to fill up 10 pages with this, but if you wind up saying all you need to in 8 pages, that’s fine. Maybe slow down if you start getting into your 25th page.

It’s officially due on May 2, the last day of classes this semester. I’ll have a bit more to say about that in class, too.
