Course information

NLP&CL, Spring 2019

(CAS LX 394, GRS LX 694)

Meeting time: M 2:30-5:15, CAS 229
Instructor: Paul Hagstrom
Email: hagstrom@bu.edu
Phone: (617) 353-6220
Office: 621 Commonwealth Ave., Rm. 105
Office Hours: M 1-2; TR 2-3 (and by appointment)

Prerequisite: CAS LX250 (Introduction to Linguistics), or consent of instructor.

Course Description: Introduction to computational techniques to explore linguistic models and test empirical claims. Serves as an introduction to programming, algorithms, and data structures, focused on modern applications to NLP. Topics include tagging and classification, parsing models, meaning representation, and information extraction.

Course Synopsis: The quantity of language data available for natural language analysis has greatly increased in recent years, as have computing power and tool development. Large-scale language analysis that addresses theoretical issues in Linguistics is now within reach, and interest in natural language processing for human-computer interfaces in industry is increasing as well. The purpose of this course is to gain facility with some of the powerful language analysis tools available for these kinds of large-scale analyses and to become familiar with the types of problems they are best suited to address. The course introduces basic programming concepts alongside the use of a natural language toolkit that supports characterizing and classifying texts, parsing syntactic structure, extracting and modeling information, processing basic logical relations, and performing basic natural language understanding. By the end of the course, students should have the background and confidence to use these techniques in addressing further linguistic questions that may arise beyond the class. The projects in the class will consist of some basic programming exercises, some mini-projects (generating poetry, determining authorship, analyzing the development of child English), and a more extensive final project that proposes a problem to investigate and a method for studying it, and reports the results and implications in a written paper.

Learning objectives

Students completing this course will:

  • Gain a basic understanding of the types of research done in the field of natural language processing
  • Gain experience approaching and solving these problems using Python and available corpora and libraries

Course Requirements

Readings. There will be readings for each class session. All readings mentioned on the schedule are required, and should be completed by the beginning of class.

Attendance and participation. Regular attendance is required, and participation in classroom discussions is expected.

Homework. There will be homework assignments on roughly a weekly schedule.

Exam. There will be one midterm exam, at about the middle of the term, which will be a take-home project.

Final Project. Students registered for LX394 will have a project topic outlined for them to work with, while students registered for LX694 will be responsible for finding and proposing a suitable topic. A progress report in the form of a written methodology section will be due prior to the finished project. The finished project will take the form of a research paper, 10 pages (LX394) to 20 pages (LX694) in length.

Late assignments. Late assignments will not be accepted without prior arrangement.

Electronic communication

We live in an electronic age. You (unlike me) have always lived in an electronic age. You are expected to be reachable via your BU email address. The course blog is the central point of communication for the course. Announcements, notes on readings, homework errata, and other information will be posted there on a regular basis, and anything posted there will be assumed to have been communicated. Homework assignments can be sent by email (whenever feasible, and unless otherwise indicated) or handed in on paper. It is your responsibility to ensure that electronically submitted material is in a readable format. If there is a question (for example, if you use a special font or an obscure word processor), send it early for verification. Unreadable submissions do not count as having been handed in.

Readings

The textbook for the course is Steven Bird, Ewan Klein, and Edward Loper (2016). Natural Language Processing with Python (Python 3, NLTK 3 version). Other readings may be assigned from time to time.

Grading Scheme

                                   LX394   LX694
Homework (lowest dropped)          50%     40%
Regular attendance, participation  15%     15%
Midterm                            15%     15%
Final project: proposal            n/a     10%
Final project: methodology         10%     5%
Final project: final paper         10%     15%
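
As an unofficial illustration of the arithmetic, the scheme above can be sketched in Python. The weights and the lowest-dropped rule come from the table; reading the first percentage column as LX394 and the second as LX694 is an assumption based on the Final Project description. The function names, the 0-100 score scale, and the equal weighting of individual homework assignments are likewise assumptions made for the example, not stated policy.

```python
# Weights from the grading table. Mapping the first percentage column to
# LX394 and the second to LX694 is an assumption (LX394 has no proposal).
WEIGHTS = {
    "LX394": {"homework": 0.50, "participation": 0.15, "midterm": 0.15,
              "proposal": 0.00, "methodology": 0.10, "paper": 0.10},
    "LX694": {"homework": 0.40, "participation": 0.15, "midterm": 0.15,
              "proposal": 0.10, "methodology": 0.05, "paper": 0.15},
}


def homework_average(scores):
    """Mean of the homework scores after dropping the single lowest one.

    Assumes every assignment counts equally (not stated in the syllabus).
    """
    if len(scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


def course_grade(level, homework, components):
    """Weighted total on a 0-100 scale (hypothetical helper).

    `components` maps the non-homework category names used in WEIGHTS
    to scores; a missing category counts as 0.
    """
    weights = WEIGHTS[level]
    total = weights["homework"] * homework_average(homework)
    for name, weight in weights.items():
        if name != "homework":
            total += weight * components.get(name, 0.0)
    return total


if __name__ == "__main__":
    example = course_grade(
        "LX394",
        homework=[60, 80, 100],
        components={"participation": 100, "midterm": 80,
                    "methodology": 90, "paper": 70},
    )
    print(round(example, 2))
```

Under these assumptions, each column of weights sums to 100%, and dropping the lowest homework score happens before the homework weight is applied.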

CAS/GRS Academic Conduct Code

It is essential that you read and adhere to the CAS Student Academic Conduct Code. Graduate students must also follow the policies of the GRS Academic Conduct Code.

http://www.bu.edu/academics/policies/academic-conduct-code/

http://www.bu.edu/cas/students/graduate/grs-forms-policies-procedures/academic-discipline-procedures/

Hub Learning Outcomes and course-specific objectives

This course can be used to satisfy Quantitative Reasoning II and Toolkit/Research and Information Literacy units for the BU Hub. As a result of having taken this course, students will…

…frame and solve complex problems using quantitative tools, such as analytical, statistical, or computational methods.

…apply quantitative tools in diverse settings to answer discipline-specific questions or to engage societal questions and debates.

…formulate and test an argument by marshaling and analyzing quantitative evidence.

…communicate quantitative information symbolically, visually, numerically, or verbally.

…recognize and articulate the capacity and limitations of quantitative methods and the risks of using them improperly.

…be able to search for, select, and use a range of publicly available and discipline-specific information sources ethically and strategically to address research questions.

…demonstrate understanding of the overall research process and its component parts, and be able to formulate good research questions or hypotheses, gather and analyze information, and critique, interpret, and communicate findings.