11 research outputs found

    Acronym-Expansion Disambiguation for Intelligent Processing of Enterprise Information

    Get PDF
    An acronym is an abbreviation formed from several words such that the abbreviation itself is a pronounceable word. Acronyms occur frequently throughout various documents, especially those of a technical nature such as research papers and patents. While acronyms can enhance document readability, their ambiguity across fields has a negative effect on business intelligence. To resolve this problem, we propose a method of acronym-expansion disambiguation for collecting high-quality enterprise information. In experimental evaluations, we demonstrate its effectiveness through objective comparisons.
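    The abstract does not describe the paper's actual algorithm; a common baseline for acronym-expansion disambiguation, sketched below purely for illustration, scores each candidate expansion by how many of its characteristic context words overlap with the words surrounding the acronym occurrence (all names and word sets here are hypothetical):

    ```python
    # Illustrative sketch only; not the paper's method. Picks the candidate
    # expansion whose characteristic words overlap most with the local context.

    def disambiguate(acronym_context, candidates):
        """acronym_context: set of words near the acronym occurrence.
        candidates: dict mapping expansion -> set of words typical of that sense.
        Returns the expansion with the largest word overlap."""
        return max(candidates, key=lambda exp: len(candidates[exp] & acronym_context))

    # Hypothetical example: disambiguating "BI" in an enterprise document.
    context = {"customer", "revenue", "dashboard", "report"}
    candidates = {
        "business intelligence": {"customer", "revenue", "analytics", "dashboard"},
        "bilingual interpretation": {"language", "translator", "speech"},
    }
    print(disambiguate(context, candidates))  # → business intelligence
    ```

    Real systems typically replace the raw overlap count with a weighted similarity (e.g. TF-IDF cosine) and learn the candidate profiles from corpora rather than hand-listing them.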

    NASA Pilot-Engaged Expert Response Using IBM Watson Technology: Prototype Evaluation of Knowledge Retrieval System

    Get PDF
    NASA Langley Research Center and IBM have been investigating the use of IBM Watson technology in aerospace research and development. One application of Watson technology is the Pilot-Engaged Expert Response (PEER) use case. The PEER system is envisioned as an in-cockpit advisor that will act as a source of situationally relevant information for pilots and other flight crew members, assisting in decision making about real-time events and situations that arise in the course of aircraft operations. PEER will make vast stores of knowledge and information available quickly and directly, putting important informational resources where they are needed most. IBM has worked with NASA to develop an architecture and articulate a roadmap for the development of the PEER system. That vision is built around the Watson Discovery Advisor (WDA) software solution, derived from IBM's Jeopardy!-winning automatic question-answering system. PEER uses WDA's sophisticated question-answering capabilities as its core, adding user-interface components and other customizations for the cockpit environment, including communication with flight systems and other external data sources. The development plan for PEER comprises four stages, with the current project constituting the first phase. In this project, a prototype instance of PEER was successfully adapted to the aviation domain, enabling users to ask questions about aviation topics and receive useful and accurate answers. Major tasks accomplished include the development of procedures for domain adaptation through automatic lexicon extraction from domain glossaries; generation of question-answer training data used to train the system; and assessment of the effectiveness of domain adaptation, which showed a dramatic improvement in the ability of the PEER system to answer domain-relevant questions.
    In addition, the vision for the PEER system was pushed forward by the articulation of a plan for the automatic enhancement of question answering with contextual information. This initial phase focused on two main goals: 1) the targeted domain adaptation of the underlying WDA system to the aviation domain; and 2) the design of the software systems needed to leverage flight-contextual data. Domain adaptation of the WDA system proceeds via three main activities: domain data ingestion, lexical customization, and model training. A textual corpus consisting of 1,147 individual documents with more than 7.5 million words of text was ingested into the system, and this served as the basis of all further development. A domain lexicon of over 3,500 aviation-domain terms was semi-automatically generated from domain documents and used to train the system. In addition, a set of over 500 question-answer (QA) pairs relevant to the PEER use case was developed; these were used to train and assess the system. These important first steps established the basis for the PEER system. Steps were also taken towards the integration of the PEER system into the cockpit environment with the development of a functional design for the Contextual Data Augmentation (CDA) subsystem, which brings contextual data to bear to improve system responses. It has three main submodules: the Contextual Data Collection module, the Contextual Data Selection module, and the Contextual QA Augmentation module. These modules form a processing pipeline that addresses the problems associated with automatically integrating information from external resources into the knowledge-retrieval mechanism.
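    The report mentions semi-automatic lexicon extraction from domain glossaries but gives no algorithm. A minimal sketch, assuming a simple "Term: definition" glossary line format (the format and the example entries are assumptions, not NASA's data), might look like:

    ```python
    import re

    # Hedged sketch of glossary-based lexicon extraction, not the PEER pipeline.
    # Assumes each glossary entry is a single "Term: definition" line.

    def extract_lexicon(glossary_text):
        """Return a dict mapping each term to its definition."""
        lexicon = {}
        for line in glossary_text.splitlines():
            m = re.match(r"\s*([A-Za-z][\w /-]+?)\s*:\s*(.+)", line)
            if m:
                lexicon[m.group(1)] = m.group(2).strip()
        return lexicon

    glossary = """
    VFR: Visual Flight Rules, flight permitted in visual meteorological conditions.
    TCAS: Traffic Collision Avoidance System.
    """
    lex = extract_lexicon(glossary)
    print(sorted(lex))  # → ['TCAS', 'VFR']
    ```

    A production lexicon builder would also normalize term variants and handle multi-line definitions; the "semi-automatic" step in the report presumably adds human review of the extracted terms.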

    Automatic pure anchor-based taxonomy generation from the world wide web.

    Get PDF
    This thesis proposes a new method of automatic taxonomy generation using the link structure of Web pages. A taxonomy is a hierarchy of concepts in which each child concept is encompassed by its parent concept. Techniques have previously been developed to extract taxonomies from a traditional text corpus, but this thesis relies exclusively on the links between documents in the corpus rather than the text of the documents themselves. A series of algorithms was designed and implemented to realize the objectives of this thesis. These programs perform comparably to text-based techniques, showing that the link structure of Web pages carries information useful for creating concept taxonomies.
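    The abstract does not spell out the thesis's algorithm; one classic link-only heuristic for inducing parent-child relations is subsumption, sketched below for illustration: concept A is treated as a parent of B when most pages linking to B also link to A, but not the reverse (the threshold and example data are assumptions):

    ```python
    # Illustrative subsumption heuristic over anchor links; not the thesis's
    # exact algorithm. links_to_x is the set of page ids containing an anchor
    # to concept x. The 0.8 threshold is an assumed tuning parameter.

    def subsumes(links_to_a, links_to_b, threshold=0.8):
        """True if A plausibly subsumes (is a parent of) B based on links."""
        if not links_to_a or not links_to_b:
            return False
        common = len(links_to_a & links_to_b)
        p_a_given_b = common / len(links_to_b)  # how often B's linkers also link A
        p_b_given_a = common / len(links_to_a)  # and the reverse direction
        return p_a_given_b >= threshold and p_b_given_a < threshold

    animals = {1, 2, 3, 4, 5}       # pages linking to "animal"
    dogs = {1, 2, 3}                # pages linking to "dog"
    print(subsumes(animals, dogs))  # → True: "animal" is a parent of "dog"
    ```

    Applying this pairwise test over all concept pairs and keeping the transitive reduction yields a hierarchy from link structure alone.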

    Location histogram privacy by sensitive location hiding and target histogram avoidance/resemblance

    Get PDF
    A location histogram records the number of times a user has visited each location as they move in an area of interest, and it is often obtained from the user in the context of applications such as recommendation and advertising. However, a location histogram that leaves a user's computer or device may threaten privacy when it contains visits to locations that the user does not want to disclose (sensitive locations), or when it can be used to profile the user in a way that leads to price discrimination and unsolicited advertising (e.g. as 'wealthy' or 'minority member'). Our work introduces two privacy notions to protect a location histogram from these threats: sensitive location hiding, which aims at concealing all visits to sensitive locations, and target avoidance/resemblance, which aims at concealing the similarity/dissimilarity of the user's histogram to a target histogram that corresponds to an undesired/desired profile. We formulate an optimization problem around each notion: Sensitive Location Hiding (SLH), which seeks to construct a histogram that is as similar as possible to the user's histogram but associates all visits with nonsensitive locations, and Target Avoidance/Resemblance (TA/TR), which seeks to construct a histogram that is as dissimilar/similar as possible to a given target histogram but remains useful for getting a good response from the application that analyzes the histogram. We develop an optimal algorithm for each notion, which operates on a notion-specific search-space graph and finds a shortest or longest path in the graph that corresponds to a solution histogram. In addition, we develop a greedy heuristic for the TA/TR problem, which operates directly on a user's histogram. Our experiments demonstrate that all algorithms are effective at preserving the distribution of locations in a histogram and the quality of location recommendation. They also demonstrate that the heuristic produces near-optimal solutions while being orders of magnitude faster than the optimal algorithm for TA/TR.
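    The abstract states that the TA/TR heuristic operates greedily on the user's histogram but does not give its steps. A toy sketch in that spirit (the move rule, stopping condition, and example data are assumptions, not the paper's algorithm) repeatedly shifts one visit from the location with the largest surplus over the target to the location with the largest deficit, preserving the total visit count:

    ```python
    # Hedged sketch of a greedy target-resemblance heuristic; not the paper's
    # TA/TR algorithm. Each step moves one visit from the largest-surplus
    # location to the largest-deficit location relative to the target.

    def greedy_resemble(hist, target, steps):
        hist = dict(hist)
        for _ in range(steps):
            surplus = max(hist, key=lambda l: hist[l] - target.get(l, 0))
            deficit = min(hist, key=lambda l: hist[l] - target.get(l, 0))
            if hist[surplus] - target.get(surplus, 0) <= 0:
                break  # no location exceeds the target; nothing left to move
            hist[surplus] -= 1
            hist[deficit] += 1
        return hist

    user = {"home": 6, "cafe": 1, "gym": 1}
    target = {"home": 3, "cafe": 3, "gym": 2}
    print(greedy_resemble(user, target, steps=10))
    # → {'home': 3, 'cafe': 3, 'gym': 2}
    ```

    A real TA/TR heuristic would additionally bound the distortion of the user's original histogram so the result stays useful to the analyzing application; this sketch optimizes resemblance only.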

    A Taxonomy Learning Method And Its Application To Characterize a Scientific Web Community

    No full text
    The need to extract and manage domain-specific taxonomies has become increasingly relevant in recent years. A taxonomy is a form of business intelligence used to integrate information, reduce semantic heterogeneity, describe emergent communities and interest groups, and facilitate communication between information systems. We present a semiautomated strategy to extract domain-specific taxonomies from Web documents and its application to model a Network of Excellence in the emerging research field of enterprise interoperability.
