Natural Language Processing (NLP) of Liberal Arts College Newspapers in Ohio Over 30 Years

Abstract

Computers have been extremely useful in humanity’s quest for knowledge, performing calculations and other strenuous tasks in seconds. To perform a task, a computer requires a specific set of instructions, or code, telling it what to do. This series of commands is strict: a single syntactic error can result in faulty behavior or no functionality at all. Human language is very different. It can be grammatically incorrect, irregular, or even incomplete, yet another human may still understand the information being exchanged. A significant part of what makes us human is the ability to use and develop our dynamic language. When a computer is able to completely understand and mimic human language, we will likely have something closer to artificial intelligence than anything we’ve seen yet. Development in this area has led to the field of Natural Language Processing, or NLP. On March 31st, 2017, students, professors, and hobbyists gathered at the HackOH5 Student Newspaper Hackathon to analyze 170,000+ pages of student newspapers from five colleges: Kenyon, Denison, Oberlin, Ohio Wesleyan, and the College of Wooster. Spanning over 160 years, the digitized libraries hold years of student coverage organized in a huge dataset of text and images. What can be done with all of this newly digitized information? NLP allows us to analyze, visualize, and contextualize this textual data. This project aims to analyze text published between 1970 and 2000 in the student newspapers of three of these colleges: Kenyon, Denison, and Oberlin. Using NLP, any word used by any school in any issue can be mapped to an n-dimensional semantic space, where the distance between two words represents their semantic closeness. For example, a word like "student" will be closely associated with "professor", "college", "people", "alumni", etc. By investigating specific words that are historically relevant, we can try to understand how each college might have perceived certain events and compare those perceptions across colleges.
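
As a rough illustration of what "semantic closeness" means here, the sketch below builds simple co-occurrence vectors from a tiny corpus and compares words by cosine similarity. It is only a minimal sketch: the abstract does not say which embedding method the project uses, so the co-occurrence approach is an assumption chosen for simplicity. The sentences, the stopword list, the window size, and the function names are likewise invented for this example and are not taken from the newspaper archive.

```python
import math
from collections import Counter, defaultdict

# Toy corpus invented for illustration; not taken from the newspaper archive.
corpus = [
    "the student asked the professor about the college course",
    "the professor advised the student at the college",
    "alumni returned to the college to meet the student body",
    "the team won the football game on saturday",
    "the football team lost the game on the road",
]

# Hypothetical, minimal stopword list for this toy corpus only.
STOPWORDS = {"the", "to", "at", "on", "about"}


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector counting its neighbours within `window`."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(corpus)

# Words that appear in similar contexts end up closer together.
sim_related = cosine_similarity(vectors["student"], vectors["professor"])
sim_unrelated = cosine_similarity(vectors["student"], vectors["football"])

print(f"student ~ professor: {sim_related:.3f}")
print(f"student ~ football:  {sim_unrelated:.3f}")
```

In this toy setting "student" scores higher against "professor" than against "football" because the first pair shares neighbouring words. A learned embedding trained on the full archive would presumably capture the same idea with dense vectors, which is what would make comparisons between colleges possible.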

Similar works