
    THE LINGUIST'S SEARCH ENGINE: GETTING STARTED GUIDE

    The World Wide Web can be viewed as a naturally occurring resource that embodies the rich and dynamic nature of language, a data repository of unparalleled size and diversity. However, current Web search methods are oriented more toward shallow information retrieval techniques than toward the more sophisticated needs of linguists. Using the Web in linguistic research is not easy. It will, however, be getting easier. This report introduces the Linguist's Search Engine, a new linguist-friendly tool that makes it possible to retrieve naturally occurring sentences from the World Wide Web on the basis of lexical content and syntactic structure. Its aim is to help linguists of all stripes in conducting more thoroughly empirical exploration of evidence, with particular attention to variability and the role of context. Technical report numbers: LAMP-TR-108, UMIACS-TR-2003-10.

    Blind men and elephants: What do citation summaries tell us about a research article?

    The old Asian legend about the blind men and the elephant comes to mind when looking at how different authors of scientific papers describe a piece of related prior work. It turns out that different citations to the same paper often focus on different aspects of that paper and that none of them provides a full description of its set of contributions. In this article, we describe our investigation of this phenomenon. We studied citation summaries in the context of research papers in the biomedical domain. A citation summary is the set of citing sentences for a given article and can be used as a surrogate for the actual article in a variety of scenarios. It contains information that was deemed by peers to be important. Our study shows that citation summaries overlap to some extent with the abstracts of the papers, but that they also focus on different aspects of these papers than the abstracts do. In addition, co-cited articles (pairs of articles cited by the same article) tend to be similar. We show results based on a lexical similarity metric called cohesion to justify our claims. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/57540/1/20707_ftp.pd

    Michigan Molecular Interactions (MiMI): putting the jigsaw puzzle together

    Protein interaction data exists in a number of repositories. Each repository has its own data format, molecule identifier and supplementary information. Michigan Molecular Interactions (MiMI) assists scientists searching through this overwhelming amount of protein interaction data. MiMI gathers data from well-known protein interaction databases and deep-merges the information. Utilizing an identity function, molecules that may have different identifiers but represent the same real-world object are merged. Thus, MiMI allows the users to retrieve information from many different databases at once, highlighting complementary and contradictory information. To help scientists judge the usefulness of a piece of data, MiMI tracks the provenance of all data. Finally, a simple yet powerful user interface aids users in their queries, and frees them from the onerous task of knowing the data format or learning a query language. MiMI allows scientists to query all data, whether corroborative or contradictory, and specify which sources to utilize. MiMI is part of the National Center for Integrative Biomedical Informatics (NCIBI) and is publicly available.
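
    The deep-merge idea in the abstract above can be illustrated with a minimal sketch. This is not MiMI's actual implementation: the record layout, the source names (db_a, db_b), the ALIASES table and the identity and deep_merge functions are all hypothetical, chosen only to show how an identity function can group records and how per-value provenance keeps both corroborating and contradictory data visible.

```python
from collections import defaultdict

# Hypothetical alias table: maps repository-specific identifiers to one
# canonical identifier per real-world molecule (an assumption for this sketch).
ALIASES = {"A:123": "mol-1", "B:xyz": "mol-1", "B:q9": "mol-2"}


def identity(record):
    """Return the canonical key for a record (falls back to its own id)."""
    return ALIASES.get(record["id"], record["id"])


def deep_merge(records, identity_fn):
    """Merge records that identity_fn maps to the same key.

    Every attribute value keeps the set of sources that reported it, so
    agreeing and conflicting values both remain visible after the merge.
    """
    merged = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for rec in records:
        key = identity_fn(rec)
        for attr, value in rec["attrs"].items():
            merged[key][attr][value].add(rec["source"])
    # Convert the nested defaultdicts into plain dicts for easier inspection.
    return {
        key: {attr: dict(values) for attr, values in attrs.items()}
        for key, attrs in merged.items()
    }


# Made-up example records from two hypothetical repositories.
records = [
    {"source": "db_a", "id": "A:123",
     "attrs": {"name": "kinase X", "organism": "human"}},
    {"source": "db_b", "id": "B:xyz",
     "attrs": {"name": "kinase X", "organism": "mouse"}},
    {"source": "db_b", "id": "B:q9",
     "attrs": {"name": "protein Y"}},
]

merged = deep_merge(records, identity)
```

    In this sketch, the two records for mol-1 agree on the name (both sources are listed under the same value) and disagree on the organism (each value lists only the source that reported it), which mirrors the abstract's distinction between corroborative and contradictory information.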

    Making database systems usable

    Database researchers have striven to improve the capability of a database in terms of both performance and functionality. We assert that the usability of a database is as important as its capability. In this paper, we study why database systems today are so difficult to use. We identify a set of five pain points and propose a research agenda to address these. In particular, we introduce a presentation data model and recommend direct data manipulation with a schema later approach. We also stress the importance of provenance and of consistency across presentation models.