Adaptive Sentence Boundary Disambiguation
Labeling of sentence boundaries is a necessary prerequisite for many natural
language processing tasks, including part-of-speech tagging and sentence
alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate
them most systems use brittle, special-purpose regular expression grammars and
exception rules. As an alternative, we have developed an efficient, trainable
algorithm that uses a lexicon with part-of-speech probabilities and a
feed-forward neural network. After training for less than one minute, the
method correctly labels over 98.5% of sentence boundaries in a corpus of over
27,000 sentence-boundary marks. We show the method to be efficient and easily
adaptable to different text genres, including single-case texts. The software from
the work described in this paper is available by contacting [email protected].
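A minimal sketch of the general idea (not the authors' implementation): part-of-speech prior probabilities for the tokens around a candidate period are fed to a small feed-forward classifier. The toy lexicon, feature layout, and the scikit-learn classifier are illustrative assumptions.

    # Sketch: classify candidate sentence boundaries from POS priors of the
    # surrounding tokens using a small feed-forward network (illustrative only).
    from sklearn.neural_network import MLPClassifier

    # Toy lexicon: prior probability that a word is a [noun, verb, abbreviation].
    LEXICON = {
        "Dr.":   [0.05, 0.00, 0.95],
        "Smith": [0.90, 0.05, 0.00],
        "He":    [0.10, 0.00, 0.00],
        "said":  [0.05, 0.95, 0.00],
    }
    DEFAULT = [0.3, 0.3, 0.0]  # prior for unknown words

    def features(prev_word, next_word):
        """Concatenate POS priors of the tokens flanking a '.' candidate."""
        return LEXICON.get(prev_word, DEFAULT) + LEXICON.get(next_word, DEFAULT)

    # Tiny training set: 1 = real sentence boundary, 0 = abbreviation period.
    X = [features("Dr.", "Smith"), features("said", "He")]
    y = [0, 1]

    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    clf.fit(X, y)
    print(clf.predict([features("Dr.", "Smith")]))  # expected: [0], not a boundary

Because the features are just a handful of probability estimates per context, training such a classifier is cheap, which is consistent with the abstract's report of training in under a minute.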
Give Text A Chance: Advocating for Equal Consideration for Language and Visualization
Visualization research tends to de-emphasize consideration of the textual
context in which its images are placed. We argue that visualization research
should consider textual representations as a primary alternative to visual
options, and that, when assessing designs, equal attention should be given to
the construction of the language as to the visualizations.
We also call for a consideration of readability when integrating visualizations
with written text. By addressing these points, visualization research would
become more effective and account more thoroughly for viewers' needs and
responses.
Can Natural Language Processing Become Natural Language Coaching?
How we teach and learn is undergoing a revolution, due to changes in technology and connectivity. Education may be one of the best application areas for advanced NLP techniques, and NLP researchers have much to contribute to this problem, especially in the areas of learning to write, mastery learning, and peer learning. In this paper I consider what happens when we convert natural language processors into natural language coaches.
Improving the Recognizability of Syntactic Relations Using Contextualized Examples
A common task in qualitative data analysis is to characterize the usage of a linguistic entity by issuing queries over syntactic relations between words. Previous interfaces for searching over syntactic structures require programming-style queries. User interface research suggests that it is easier to recognize a pattern than to compose it from scratch; therefore, interfaces for non-experts should show previews of syntactic relations. What these previews should look like is an open question that we explored with a 400-participant Mechanical Turk experiment. We found that syntactic relations are recognized with 34% higher accuracy when contextual examples are shown than with a baseline that names the relations alone. This suggests that user interfaces should display contextual examples of syntactic relations to help users choose between different relations.
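As a rough illustration of the recommendation (not the paper's interface), a preview might pair each relation name with an example sentence in which the related words are marked. The example sentences and the bracket-style highlighting below are assumptions.

    # Sketch: show a syntactic relation in context instead of naming it alone.
    EXAMPLES = {
        "nsubj": ("The festival drew thousands of visitors.", "festival", "drew"),
        "dobj":  ("The festival drew thousands of visitors.", "drew", "thousands"),
        "amod":  ("She gave a detailed answer.", "detailed", "answer"),
    }

    def preview(relation):
        """Render a contextual preview: the example sentence with both words marked."""
        sentence, dependent, head = EXAMPLES[relation]
        for word in (dependent, head):
            sentence = sentence.replace(word, f"[{word}]", 1)
        return f"{relation}: {sentence}"

    for rel in EXAMPLES:
        print(preview(rel))
    # e.g. nsubj: The [festival] [drew] thousands of visitors.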
Full Text and Figure Display Improves Bioscience Literature Search
When reading bioscience journal articles, many researchers focus attention on the figures and their captions. This observation led to the development of the BioText literature search engine [1], a freely available Web-based application that allows biologists to search over the contents of Open Access Journals, and see figures from the articles displayed directly in the search results. This article presents a qualitative assessment of this system in the form of a usability study with 20 biologist participants using and commenting on the system. 19 out of 20 participants expressed a desire to use a bioscience literature search engine that displays articles' figures alongside the full text search results. 15 out of 20 participants said they would use a caption search and figure display interface either frequently or sometimes, while 4 said rarely and 1 was undecided. 10 out of 20 participants said they would use a tool for searching the text of tables and their captions either frequently or sometimes, while 7 said they would use it rarely if at all, 2 said they would never use it, and 1 was undecided. This study found evidence, supporting results of an earlier study, that bioscience literature search systems such as PubMed should show figures from articles alongside search results. It also found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users and information needs, allowing for explicit search within captions for figures and tables is a useful function, but it is not entirely clear how to cleanly integrate this within a more general literature search interface. Such a facility supports Open Access publishing efforts, as it requires access to full text of documents and the lifting of restrictions in order to show figures in the search interface.
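A minimal sketch of the caption-search-plus-figure-display idea, not the BioText implementation; the record structure and field names are illustrative.

    # Sketch: search figure captions and return the matching figure alongside
    # the article, so results can display the image next to the text hit.
    ARTICLES = [
        {"title": "Kinase signaling in yeast",
         "abstract": "We map phosphorylation cascades ...",
         "figures": [{"file": "fig2.png",
                      "caption": "Figure 2. Western blot of kinase activity."}]},
    ]

    def caption_search(query):
        """Return (article title, figure) pairs whose captions mention the query."""
        hits = []
        for article in ARTICLES:
            for fig in article["figures"]:
                if query.lower() in fig["caption"].lower():
                    hits.append((article["title"], fig))
        return hits

    for title, fig in caption_search("western blot"):
        print(title, "->", fig["file"], "|", fig["caption"])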
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for
the automatic segmentation of speech into topically coherent units. We propose
two methods for combining lexical and prosodic information using hidden Markov
models and decision trees. Lexical information is obtained from a speech
recognizer, and prosodic features are extracted automatically from speech
waveforms. We evaluate our approach on the Broadcast News corpus, using the
DARPA-TDT evaluation metrics. Results show that the prosodic model alone is
competitive with word-based segmentation methods. Furthermore, we achieve a
significant reduction in error by combining the prosodic and word-based
knowledge sources.
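The following is a toy illustration of combining a lexical cue with a prosodic cue at candidate topic boundaries; it is not the paper's HMM or decision-tree model, and the pause-duration and word-overlap features, thresholds, and training data are assumptions.

    # Sketch: score candidate boundaries from a prosodic feature (pause length)
    # and a lexical feature (vocabulary overlap across the candidate boundary).
    from sklearn.tree import DecisionTreeClassifier

    def cohesion(left_words, right_words):
        """Word overlap between the text before and after a candidate boundary."""
        left, right = set(left_words), set(right_words)
        return len(left & right) / max(len(left | right), 1)

    # Each candidate boundary: [pause duration in seconds, lexical cohesion].
    X = [
        [1.2, cohesion("the budget vote passed".split(), "turning to sports tonight".split())],
        [0.9, cohesion("storm damage was severe".split(), "in other news the senate".split())],
        [0.1, cohesion("the senate met today".split(), "the senate will vote tomorrow".split())],
        [0.2, cohesion("rain fell across the coast".split(), "the rain is expected to continue".split())],
    ]
    y = [1, 1, 0, 0]   # 1 = topic boundary, 0 = same topic

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    print(tree.predict([[1.0, 0.05], [0.15, 0.5]]))   # expected: [1 0]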
Prerendered User Interfaces for Higher-Assurance Electronic Voting
We propose an electronic voting machine architecture in which the voting user interface is prerendered and published before election day. The prerendered user interface is a verifiable artifact—an electronic sample ballot—enabling public participation in the review, verification, usability testing, and accessibility testing of the ballot. Preparing the user interface outside of the voting machine dramatically reduces the amount and difficulty of software verification required to assure the correctness of the election result. We present a design for a high-assurance touchscreen voting machine that supports a wide range of user interface styles and demonstrate its feasibility by implementing it in less than 300 lines of Python code.
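A minimal sketch of the prerendered-ballot idea, not the paper's voting machine: the interface is a data file whose digest is published before election day, and the runtime software only verifies it, pages through it, and records selections. The ballot structure and helper names below are hypothetical.

    # Sketch: a prerendered ballot is published ahead of time; the machine
    # verifies it and records selections without generating any UI at runtime.
    import hashlib
    import json

    PRERENDERED_BALLOT = {
        "contests": [
            {"title": "County Measure A",
             "options": ["Yes", "No"],
             "screens": ["measure_a.png"]}   # prerendered image shown verbatim
        ]
    }

    def verify(ballot_bytes, published_digest):
        """Check the ballot file against the digest published before election day."""
        return hashlib.sha256(ballot_bytes).hexdigest() == published_digest

    def cast(ballot, choices):
        """Record the voter's selections by option index."""
        return [(contest["title"], contest["options"][i])
                for contest, i in zip(ballot["contests"], choices)]

    ballot_bytes = json.dumps(PRERENDERED_BALLOT, sort_keys=True).encode()
    digest = hashlib.sha256(ballot_bytes).hexdigest()   # published in advance
    assert verify(ballot_bytes, digest)
    print(cast(PRERENDERED_BALLOT, [0]))   # [('County Measure A', 'Yes')]

Because the ballot artifact can be inspected and tested by anyone before the election, the code that must be trusted on election day stays small.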