Challenges of Cheap Resource Creation for Morphological Tagging
We describe the challenges of resource creation for a resource-light system for morphological tagging of fusional languages (Feldman and Hana, 2010). The constraints on resources (time, expertise, and money) introduce challenges that are not present in the development of morphological tools and corpora in the usual, resource-intensive way.
Leveraging NLP and Social Network Analytic Techniques to Detect Censored Keywords: System Design and Experiments
Internet regulation in the form of online censorship and Internet shutdowns has been increasing in recent years. This paper presents a natural language processing (NLP) application for performing cross-country probing that conceals the exact location of the originating request. A detailed discussion of the application aims to stimulate further investigation into new methods for measuring and quantifying Internet censorship practices around the world. In addition, results from two experiments involving search engine queries of banned keywords demonstrate that censorship practices vary across different search engines. These results suggest opportunities for developing circumvention technologies that enable open and free access to information.
Automatic Detection of Idiomatic Clauses
We describe several experiments whose goal is to automatically identify idiomatic expressions in written text. We explore two approaches to the task: 1) idiom recognition as outlier detection; and 2) supervised classification of sentences. We apply principal component analysis for outlier detection. Detecting idioms as lexical outliers does not exploit class label information, so in the subsequent experiments we use linear discriminant analysis to obtain a discriminant subspace and then apply a three-nearest-neighbor classifier to measure accuracy. We discuss the pros and cons of each approach. All of these approaches are more general than previous algorithms for idiom detection: they neither rely on target idiom types, lexicons, or large manually annotated corpora, nor limit the search space to a particular type of linguistic construction.
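The two approaches in the abstract above can be sketched as follows. This is a minimal illustration using synthetic sentence feature vectors and scikit-learn, not the paper's actual features or data: the cluster parameters, dimensionality, and thresholds are all illustrative assumptions.

```python
# Hypothetical sketch of the two approaches from the abstract,
# using synthetic feature vectors in place of real sentence features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic data: "literal" sentences cluster tightly; "idiomatic"
# sentences lie away from that cluster (an assumption for illustration).
literal = rng.normal(0.0, 1.0, size=(200, 10))
idiomatic = rng.normal(4.0, 1.0, size=(20, 10))
X = np.vstack([literal, idiomatic])
y = np.array([0] * 200 + [1] * 20)  # 0 = literal, 1 = idiomatic

# Approach 1: PCA-based outlier detection (no class labels used).
# Sentences poorly reconstructed from the top principal components
# of the literal data are flagged as potential idioms.
pca = PCA(n_components=3).fit(literal)
reconstruction = pca.inverse_transform(pca.transform(X))
error = np.linalg.norm(X - reconstruction, axis=1)
flagged = error > np.percentile(error, 90)  # top 10% by error

# Approach 2: supervised classification. Project into an LDA
# discriminant subspace, then classify with 3 nearest neighbors.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
knn = KNeighborsClassifier(n_neighbors=3).fit(lda.transform(X), y)
accuracy = knn.score(lda.transform(X), y)
```

The key contrast the abstract draws is visible here: the PCA route never sees `y`, while the LDA + 3-NN route depends on labeled sentences.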
ARIDA: An Arabic Interlanguage Database and Its Applications: A Pilot Study
This paper describes a pilot study in which we collected a small learner corpus of Arabic, developed a tagset for error-annotation of Arabic learner data, tagged the data for errors, and performed simple Computer-aided Error Analysis (CEA).
Designing a Russian Idiom-Annotated Corpus
This paper describes the development of an idiom-annotated corpus of Russian. The corpus is compiled from freely available online resources and contains texts of different genres. The idiom extraction, the annotation procedure, and a pilot experiment using the new corpus are outlined in the paper. Given the scarcity of publicly available annotated Russian corpora, this corpus is a much-needed resource that can be utilized for literary and linguistic studies and pedagogy, as well as for various Natural Language Processing tasks.
Annotating an Arabic Learner Corpus for Error
This paper describes an ongoing project in which we are collecting a learner corpus of Arabic, developing a tagset for error annotation, and performing Computer-aided Error Analysis (CEA) on the data. We adapted the French Interlanguage Database FRIDA tagset (Granger, 2003a) to the data. We chose FRIDA in order to follow a known standard and to see whether the changes needed to move from a French to an Arabic tagset would give us a measure of the distance between the two languages with respect to learner difficulty. The current collection of texts, which is constantly growing, contains intermediate- and advanced-level student writings. We describe the need for such corpora, the learner data we have collected, and the tagset we have developed. We also describe the error frequency distribution at both proficiency levels and outline the ongoing work.
Apollo: A System for Tracking Internet Censorship
If it remains debatable whether the Internet has surpassed print media in making information accessible to the public, it must nevertheless be conceded that the Internet makes the manipulation and censorship of information easier than it had been on the printed page. In the coming years and in an increasing number of countries, everyday producers and consumers of online information will likely have to cultivate an awareness of censorship. It behooves the online community to learn how to detect and evade interference by governments, regimes, corporations, con artists, and vandals. The contribution of this research is a method and platform for studying Internet censorship detection and evasion. This paper presents the concepts, initial theories, and future work.