Adaptive Sentence Boundary Disambiguation
Labeling of sentence boundaries is a necessary prerequisite for many natural
language processing tasks, including part-of-speech tagging and sentence
alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate
them most systems use brittle, special-purpose regular expression grammars and
exception rules. As an alternative, we have developed an efficient, trainable
algorithm that uses a lexicon with part-of-speech probabilities and a
feed-forward neural network. After training for less than one minute, the
method correctly labels over 98.5% of sentence boundaries in a corpus of over
27,000 sentence-boundary marks. We show the method to be efficient and easily
adaptable to different text genres, including single-case texts. The software from
the work described in this paper is available by contacting [email protected].
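A minimal sketch of the general idea (not the authors' implementation): part-of-speech prior probabilities for the tokens around a candidate period are fed to a small feed-forward classifier. The toy lexicon, feature layout, and the scikit-learn classifier are illustrative assumptions.

    # Sketch: classify candidate sentence boundaries from POS priors of the
    # surrounding tokens using a small feed-forward network (illustrative only).
    from sklearn.neural_network import MLPClassifier

    # Toy lexicon: prior probability that a word is a [noun, verb, abbreviation].
    LEXICON = {
        "Dr.":   [0.05, 0.00, 0.95],
        "Smith": [0.90, 0.05, 0.00],
        "He":    [0.10, 0.00, 0.00],
        "said":  [0.05, 0.95, 0.00],
    }
    DEFAULT = [0.3, 0.3, 0.0]  # prior for unknown words

    def features(prev_word, next_word):
        """Concatenate POS priors of the tokens flanking a '.' candidate."""
        return LEXICON.get(prev_word, DEFAULT) + LEXICON.get(next_word, DEFAULT)

    # Tiny training set: 1 = real sentence boundary, 0 = abbreviation period.
    X = [features("Dr.", "Smith"), features("said", "He")]
    y = [0, 1]

    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    clf.fit(X, y)
    print(clf.predict([features("Dr.", "Smith")]))  # expected: [0], not a boundary

Because the features are just a handful of probability estimates per context, training such a classifier is cheap, which is consistent with the abstract's report of training in under a minute.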
Give Text A Chance: Advocating for Equal Consideration for Language and Visualization
Visualization research tends to de-emphasize consideration of the textual
context in which its images are placed. We argue that visualization research
should consider textual representations as a primary alternative to visual
options, and that, when assessing designs, equal attention should be given to
the construction of the language as to the visualizations.
We also call for a consideration of readability when integrating visualizations
with written text. By addressing these points, visualization research would
become more effective and account more thoroughly for viewers' needs and
responses.
Can Natural Language Processing Become Natural Language Coaching?
How we teach and learn is undergoing a revolution, due to changes in technology and connectivity. Education may be one of the best application areas for advanced NLP techniques, and NLP researchers have much to contribute to this problem, especially in the areas of learning to write, mastery learning, and peer learning. In this paper I consider what happens when we convert natural language processors into natural language coaches.
Improving the Recognizability of Syntactic Relations Using Contextualized Examples
A common task in qualitative data analysis is to characterize the usage of a linguistic entity by issuing queries over syntactic relations between words. Previous interfaces for searching over syntactic structures require programming-style queries. User interface research suggests that it is easier to recognize a pattern than to compose it from scratch; therefore, interfaces for non-experts should show previews of syntactic relations. What these previews should look like is an open question that we explored with a 400-participant Mechanical Turk experiment. We found that syntactic relations are recognized with 34% higher accuracy when contextual examples are shown than with a baseline that names the relations alone. This suggests that user interfaces should display contextual examples of syntactic relations to help users choose between different relations.
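As a rough illustration of the recommendation (not the paper's interface), a preview might pair each relation name with an example sentence in which the related words are marked. The example sentences and the bracket-style highlighting below are assumptions.

    # Sketch: show a syntactic relation in context instead of naming it alone.
    EXAMPLES = {
        "nsubj": ("The festival drew thousands of visitors.", "festival", "drew"),
        "dobj":  ("The festival drew thousands of visitors.", "drew", "thousands"),
        "amod":  ("She gave a detailed answer.", "detailed", "answer"),
    }

    def preview(relation):
        """Render a contextual preview: the example sentence with both words marked."""
        sentence, dependent, head = EXAMPLES[relation]
        for word in (dependent, head):
            sentence = sentence.replace(word, f"[{word}]", 1)
        return f"{relation}: {sentence}"

    for rel in EXAMPLES:
        print(preview(rel))
    # e.g. nsubj: The [festival] [drew] thousands of visitors.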
Full Text and Figure Display Improves Bioscience Literature Search
When reading bioscience journal articles, many researchers focus attention on the figures and their captions. This observation led to the development of the BioText literature search engine [1], a freely available Web-based application that allows biologists to search over the contents of Open Access Journals, and see figures from the articles displayed directly in the search results. This article presents a qualitative assessment of this system in the form of a usability study with 20 biologist participants using and commenting on the system. 19 out of 20 participants expressed a desire to use a bioscience literature search engine that displays articles' figures alongside the full text search results. 15 out of 20 participants said they would use a caption search and figure display interface either frequently or sometimes, while 4 said rarely and 1 was undecided. 10 out of 20 participants said they would use a tool for searching the text of tables and their captions either frequently or sometimes, while 7 said they would use it rarely if at all, 2 said they would never use it, and 1 was undecided. This study found evidence, supporting results of an earlier study, that bioscience literature search systems such as PubMed should show figures from articles alongside search results. It also found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users and information needs, allowing for explicit search within captions for figures and tables is a useful function, but it is not entirely clear how to cleanly integrate this within a more general literature search interface. Such a facility supports Open Access publishing efforts, as it requires access to full text of documents and the lifting of restrictions in order to show figures in the search interface.
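A minimal sketch of the caption-search-plus-figure-display idea, not the BioText implementation; the record structure and field names are illustrative.

    # Sketch: search figure captions and return the matching figure alongside
    # the article, so results can display the image next to the text hit.
    ARTICLES = [
        {"title": "Kinase signaling in yeast",
         "abstract": "We map phosphorylation cascades ...",
         "figures": [{"file": "fig2.png",
                      "caption": "Figure 2. Western blot of kinase activity."}]},
    ]

    def caption_search(query):
        """Return (article title, figure) pairs whose captions mention the query."""
        hits = []
        for article in ARTICLES:
            for fig in article["figures"]:
                if query.lower() in fig["caption"].lower():
                    hits.append((article["title"], fig))
        return hits

    for title, fig in caption_search("western blot"):
        print(title, "->", fig["file"], "|", fig["caption"])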
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for
the automatic segmentation of speech into topically coherent units. We propose
two methods for combining lexical and prosodic information using hidden Markov
models and decision trees. Lexical information is obtained from a speech
recognizer, and prosodic features are extracted automatically from speech
waveforms. We evaluate our approach on the Broadcast News corpus, using the
DARPA-TDT evaluation metrics. Results show that the prosodic model alone is
competitive with word-based segmentation methods. Furthermore, we achieve a
significant reduction in error by combining the prosodic and word-based
knowledge sources.
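The following is a toy illustration of combining a lexical cue with a prosodic cue at candidate topic boundaries; it is not the paper's HMM or decision-tree model, and the pause-duration and word-overlap features, thresholds, and training data are assumptions.

    # Sketch: score candidate boundaries from a prosodic feature (pause length)
    # and a lexical feature (vocabulary overlap across the candidate boundary).
    from sklearn.tree import DecisionTreeClassifier

    def cohesion(left_words, right_words):
        """Word overlap between the text before and after a candidate boundary."""
        left, right = set(left_words), set(right_words)
        return len(left & right) / max(len(left | right), 1)

    # Each candidate boundary: [pause duration in seconds, lexical cohesion].
    X = [
        [1.2, cohesion("the budget vote passed".split(), "turning to sports tonight".split())],
        [0.9, cohesion("storm damage was severe".split(), "in other news the senate".split())],
        [0.1, cohesion("the senate met today".split(), "the senate will vote tomorrow".split())],
        [0.2, cohesion("rain fell across the coast".split(), "the rain is expected to continue".split())],
    ]
    y = [1, 1, 0, 0]   # 1 = topic boundary, 0 = same topic

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    print(tree.predict([[1.0, 0.05], [0.15, 0.5]]))   # expected: [1 0]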
Prerendered User Interfaces for Higher-Assurance Electronic Voting
We propose an electronic voting machine architecture in which the voting user interface is prerendered and published before election day. The prerendered user interface is a verifiable artifact—an electronic sample ballot—enabling public participation in the review, verification, usability testing, and accessibility testing of the ballot. Preparing the user interface outside of the voting machine dramatically reduces the amount and difficulty of software verification required to assure the correctness of the election result. We present a design for a high-assurance touchscreen voting machine that supports a wide range of user interface styles and demonstrate its feasibility by implementing it in less than 300 lines of Python code.
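A minimal sketch of the prerendered-ballot idea, not the paper's voting machine: the interface is a data file whose digest is published before election day, and the runtime software only verifies it, pages through it, and records selections. The ballot structure and helper names below are hypothetical.

    # Sketch: a prerendered ballot is published ahead of time; the machine
    # verifies it and records selections without generating any UI at runtime.
    import hashlib
    import json

    PRERENDERED_BALLOT = {
        "contests": [
            {"title": "County Measure A",
             "options": ["Yes", "No"],
             "screens": ["measure_a.png"]}   # prerendered image shown verbatim
        ]
    }

    def verify(ballot_bytes, published_digest):
        """Check the ballot file against the digest published before election day."""
        return hashlib.sha256(ballot_bytes).hexdigest() == published_digest

    def cast(ballot, choices):
        """Record the voter's selections by option index."""
        return [(contest["title"], contest["options"][i])
                for contest, i in zip(ballot["contests"], choices)]

    ballot_bytes = json.dumps(PRERENDERED_BALLOT, sort_keys=True).encode()
    digest = hashlib.sha256(ballot_bytes).hexdigest()   # published in advance
    assert verify(ballot_bytes, digest)
    print(cast(PRERENDERED_BALLOT, [0]))   # [('County Measure A', 'Yes')]

Because the ballot artifact can be inspected and tested by anyone before the election, the code that must be trusted on election day stays small.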