81 research outputs found

    Natural Language Processing for Information Retrieval and Knowledge Discovery

    Natural Language Processing (NLP) is a powerful technology for the vital tasks of information retrieval (IR) and knowledge discovery (KD), which in turn feed the visualization systems of the present and future and enable knowledge workers to devote more of their time to the vital tasks of analysis and prediction.

    Natural Language Processing

    Natural Language Processing (NLP) is the computerized approach to analyzing text that is based on both a set of theories and a set of technologies. Because NLP is a very active area of research and development, there is no single agreed-upon definition that would satisfy everyone, but there are some aspects that would be part of any knowledgeable person's definition.

    Document Retrieval, Automatic

    Document Retrieval is the computerized process of producing a relevance-ranked list of documents in response to an inquirer's request by comparing the request to an automatically produced index of the documents in the system. Everyone uses such systems today in the form of web-based search engines. While evolving from a fairly small discipline in the 1940s to a large, profitable industry today, the field has maintained a healthy research focus, supported by test collections and large-scale annual comparative tests of systems. A document retrieval system comprises three core modules: a document processor, a query analyzer, and a matching function. There are several theoretical models on which document retrieval systems are based: Boolean, Vector Space, Probabilistic, and Language Model.
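
    The three-module pipeline and the Vector Space model named above can be illustrated with a short sketch. The Python below is illustrative only, under assumed details (the toy texts, raw term-frequency weighting, and function names are not from the abstract): the document processor reduces each text to a term vector, the query analyzer does the same for the request, and the matching function ranks documents by cosine similarity.

        import math
        from collections import Counter

        def to_vector(text):
            """Document processor / query analyzer: term-frequency vector."""
            return Counter(text.lower().split())

        def cosine(q, d):
            """Matching function: cosine similarity of two term vectors."""
            dot = sum(q[t] * d[t] for t in set(q) & set(d))
            norm = (math.sqrt(sum(v * v for v in q.values()))
                    * math.sqrt(sum(v * v for v in d.values())))
            return dot / norm if norm else 0.0

        # A toy collection standing in for the automatically produced index.
        docs = ["natural language processing for retrieval",
                "probabilistic models of document retrieval",
                "agglomerative clustering of documents"]
        index = [to_vector(doc) for doc in docs]
        query = to_vector("document retrieval models")

        # Relevance-ranked list, highest similarity first.
        for i in sorted(range(len(docs)), key=lambda i: cosine(query, index[i]), reverse=True):
            print(f"{cosine(query, index[i]):.3f}  {docs[i]}")

    A production engine would use TF-IDF or a probabilistic score rather than raw counts, but the three roles stay the same.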

    Searching and Search Engines: When is Current Research Going to Lead to Major Progress?

    For many years, users of commercial search engines have been hearing how the latest information and computer science research is going to improve the quality of the engines they rely on to fulfill their daily information needs. These promises have not been fulfilled. While the Internet has dramatically increased the amount of information to which users now have access, the key issue remains unresolved: the results for substantive queries are not improving. However, the past need not predict the future, because sophisticated advances in Natural Language Processing (NLP) have, in fact, produced significant improvements in engines that can offer both ease of access for users and improved quality of retrieved information.

    A Breadth of NLP Applications

    The Center for Natural Language Processing (CNLP) was founded in September 1999 in the School of Information Studies, the “Original Information School”, at Syracuse University. CNLP’s mission is to advance the development of human-like language-understanding software capabilities for government, commercial, and consumer applications. The Center conducts both basic and applied research, building on its recognized capabilities in Natural Language Processing. The Center’s seventeen employees are a mix of doctoral students in information science or computer engineering, software engineers, linguistic analysts, and research engineers.

    Use of Subject Field Codes from a Machine-Readable Dictionary for Automatic Classification of Documents

    We are currently developing a system whose goal is to emulate a human classifier who peruses a large set of documents and sorts them into richly defined classes based solely on the subject content of the documents. To accomplish this task, our system tags each word in a document with the appropriate Subject Field Code (SFC) from a machine-readable dictionary. The within-document SFCs are then summed and normalized, and each document is represented as a vector of the SFCs occurring in that document. These vectors are clustered using Ward's agglomerative clustering algorithm (Ward, 1963) to form classes in a document database. For retrieval, queries are likewise represented as SFC vectors and then matched to the prototype SFC vector of each cluster in the database. Clusters whose prototype SFC vectors exhibit a predetermined criterion of similarity to the query SFC vector are passed on to other system components for more computationally expensive representation and matching.
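
    The pipeline described above lends itself to a compact sketch. The Python below is illustrative only: the toy SFC dictionary, documents, and cluster count are assumptions, while the steps (tag words with SFCs, sum and normalize per document, cluster the vectors with Ward's method, match a query against cluster prototypes) follow the abstract.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        # Toy stand-in for the machine-readable dictionary of Subject Field Codes.
        SFC = {"bank": "FINANCE", "loan": "FINANCE", "court": "LAW",
               "judge": "LAW", "gene": "BIOLOGY", "cell": "BIOLOGY"}
        CODES = sorted(set(SFC.values()))

        def sfc_vector(text):
            """Tag each word with its SFC, then sum and normalize per document."""
            v = np.zeros(len(CODES))
            for word in text.lower().split():
                if word in SFC:
                    v[CODES.index(SFC[word])] += 1
            return v / v.sum() if v.sum() else v

        docs = ["the bank approved the loan", "the judge ruled in court",
                "a loan from the bank", "the gene controls the cell"]
        X = np.array([sfc_vector(d) for d in docs])

        # Ward's agglomerative clustering (Ward, 1963) over the SFC vectors.
        labels = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")

        # Prototype SFC vector per cluster; route the query to the closest one.
        prototypes = {c: X[labels == c].mean(axis=0) for c in set(labels)}
        q = sfc_vector("bank loan")
        best = max(prototypes, key=lambda c: float(np.dot(q, prototypes[c])))
        print("query routed to:", [d for d, l in zip(docs, labels) if l == best])

    In the system itself, only clusters whose prototypes pass the predetermined similarity criterion would be handed to the more expensive representation and matching components.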

    Towards the use of situational information in information retrieval

    This paper is an exploratory study of one approach to incorporating situational information into information retrieval systems, drawing on principles and methods of discourse linguistics. A tenet of discourse linguistics is that texts of a specific type possess a structure above the syntactic level, which follows conventions known to the people using such texts to communicate. In some cases, such as literature describing work done, the structure is closely related to situations, and may therefore be a useful representational vehicle for the present purpose. Abstracts of empirical research papers exhibit a well-defined discourse-level structure, which is revealed by lexical clues. Two methods of detecting the structure automatically are presented: (i) a Bayesian probabilistic analysis; and (ii) a neural network model. Both methods show promise in preliminary implementations. A study of users' oral problem statements indicates that they are not amenable to the same kind of processing. However, from in-depth interviews with users and search intermediaries, the following conclusions are drawn: (i) the notion of a generic research script is meaningful to both users and intermediaries as a high-level description of situation; (ii) a researcher's position in the script is a predictor of the relevance of documents; and (iii) currently, intermediaries can make very little use of situational information. The implications of these findings for system design are discussed, and a system structure is presented to serve as a framework for future experimental work on the factors identified in this paper. The design calls for a dialogue with the user on his or her position in a research script and incorporates features permitting discourse-level components of abstracts to be specified in search strategies.
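
    The first of the two detection methods, the Bayesian probabilistic analysis, can be sketched as naive Bayes over lexical clues. Everything in the Python below is an assumption for illustration (the discourse components, clue words, and counts are invented, not taken from the paper); the point is only that clue likelihoods per component, estimated from a corpus, let a system label each sentence of an abstract.

        import math

        # Invented clue counts per discourse component (in practice these
        # would be estimated from a corpus of abstracts).
        CLUES = {
            "BACKGROUND": {"previous": 4, "known": 3, "problem": 5},
            "METHOD":     {"we": 5, "used": 4, "measured": 3},
            "RESULT":     {"found": 5, "significant": 4, "increased": 3},
        }
        ALL_CLUES = {w for clues in CLUES.values() for w in clues}
        PRIOR = {c: 1.0 / len(CLUES) for c in CLUES}

        def classify(sentence):
            """Assign the discourse component with the highest posterior."""
            words = [w for w in sentence.lower().split() if w in ALL_CLUES]
            def score(comp):
                clues, total = CLUES[comp], sum(CLUES[comp].values())
                logp = math.log(PRIOR[comp])
                for w in words:
                    # Laplace-smoothed clue likelihood.
                    logp += math.log((clues.get(w, 0) + 1) / (total + len(ALL_CLUES)))
                return logp
            return max(CLUES, key=score)

        print(classify("we measured the response in both groups"))          # METHOD
        print(classify("scores increased and the effect was significant"))  # RESULT

    The paper's second method, a neural network model, would presumably learn comparable clue weights from training data rather than counting them directly.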

    Discerning Emotions in Texts

    We present an empirically verified model of discernible emotions, Watson and Tellegen’s Circumplex Theory of Affect from social and personality psychology, and suggest its usefulness in NLP as a potential model for automating an eight-fold categorization of emotions in written English texts. We developed a data collection tool based on the model, collected 287 responses from 110 non-expert informants based on 50 emotional excerpts (min=12, max=348, average=86 words), and analyzed the inter-coder agreement per category and the strength of ratings per sub-category. The respondents achieved an average 70.7% agreement on the most commonly identified emotion categories per text. The categories of high positive affect and pleasantness were most common in our data. Within those categories, the affective terms “enthusiastic”, “active”, “excited”, “pleased”, and “satisfied” had the most consistent ratings of strength of presence in the texts. The textual clues the respondents chose had comparable length and similar key words. Watson and Tellegen’s model appears to be usable as a guide for developing an NLP algorithm for automated identification of emotion in English texts, and the non-expert informants (with college degrees and higher) provided sufficient information for the future creation of a gold standard of clues per category.
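
    The per-text agreement figure reported above can be illustrated under one hedged assumption: that agreement is computed as the share of informants choosing the modal (most common) category for each excerpt. The ratings in the Python below are invented, purely for illustration.

        from collections import Counter

        def modal_agreement(ratings):
            """Fraction of raters who chose the most common category."""
            counts = Counter(ratings)
            return counts.most_common(1)[0][1] / len(ratings)

        # Invented ratings: the category each informant assigned to an excerpt.
        excerpts = {
            "excerpt_1": ["high positive affect"] * 7 + ["pleasantness"] * 3,
            "excerpt_2": ["pleasantness"] * 6 + ["strong engagement"] * 4,
        }
        per_text = {name: modal_agreement(r) for name, r in excerpts.items()}
        print(per_text)                                 # 0.7 and 0.6
        print(sum(per_text.values()) / len(per_text))   # average agreement: 0.65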

    Evaluating diverse electronic consultation programs with a common framework.

    Background: Electronic consultation is an emerging mode of specialty care delivery that allows primary care providers and their patients to obtain specialist expertise without an in-person visit. While studies of individual programs have demonstrated benefits related to timely access to specialty care, electronic consultation programs have not achieved widespread use in the United States. The lack of common evaluation metrics across health systems and concerns related to the generalizability of existing evaluation efforts may be hampering further growth. We sought to identify gaps in knowledge related to the implementation of electronic consultation programs and develop a set of shared evaluation measures to promote further diffusion.
    Methods: Using a case study approach, we apply the Reach, Effectiveness, Adoption, Implementation and Maintenance (RE-AIM) and Quadruple Aim frameworks of evaluation to examine electronic consultation implementation across diverse delivery systems. Data are from 4 early-adopter healthcare delivery systems (San Francisco Health Network, Mayo Clinic, Veterans Administration, Champlain Local Health Integration Network) that represent varied organizational structures, care for different patient populations, and have well-established multi-specialty electronic consultation programs. Data sources include published and unpublished quantitative data from each electronic consultation database and qualitative data from the systems' end-users.
    Results: Organizational drivers of electronic consultation implementation were similar across the systems (challenges with timely and/or efficient access to specialty care), though unique system-level facilitators and barriers influenced reach, adoption, and design. Effectiveness of implementation was consistent, with improved patient access to timely, perceived high-quality specialty expertise and few negative consequences, garnering high satisfaction among end-users. Data about patient-specific clinical outcomes are lacking, as are policies that provide guidance on the legal implications of electronic consultation and ideal remuneration strategies.
    Conclusion: A core set of effectiveness and implementation metrics rooted in the Quadruple Aim may promote data-driven improvements and further diffusion of successful electronic consultation programs.