
    Hi, how can I help you?: Automating enterprise IT support help desks

    Question answering is one of the primary challenges of natural language understanding. In realizing such a system, providing complex long answers is a more challenging task than factoid answering, as the former needs context disambiguation. The different methods explored in the literature can be broadly classified into three categories: 1) classification based, 2) knowledge graph based and 3) retrieval based. Individually, none of them addresses the need for an enterprise-wide assistance system in an IT support and maintenance domain. In this domain the variance of answers is large, ranging from factoids to structured operating procedures; the knowledge is spread across heterogeneous data sources like application-specific documentation and ticket management systems, and no single general-purpose technique can scale across such a landscape. To address this, we have built a cognitive platform with capabilities adapted to this domain. Further, we have built a general-purpose question answering system leveraging the platform that can be instantiated for multiple products and technologies in the support domain. The system uses a novel hybrid answering model that orchestrates across a deep learning classifier, a knowledge graph based context disambiguation module and a sophisticated bag-of-words search system. This orchestration performs context switching for a provided question and also does a smooth hand-off of the question to a human expert if none of the automated techniques can provide a confident answer. This system has been deployed across 675 internal enterprise IT support and maintenance projects.

    Comment: To appear in IAAI 201
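    The orchestration the abstract describes can be sketched as a confidence-gated cascade over the three techniques, with a human hand-off as the fallback. All function names, return values and the threshold below are hypothetical stand-ins, not the deployed system's interfaces.

```python
# Minimal sketch of the hybrid orchestration: try each automated technique in
# turn and return the first sufficiently confident answer; otherwise escalate.
CONFIDENCE_THRESHOLD = 0.7  # hypothetical cut-off

def answer_with_classifier(question):
    # Placeholder for the deep-learning classifier: (answer, confidence).
    return ("restart the service", 0.4)

def answer_with_knowledge_graph(question):
    # Placeholder for knowledge-graph-based context disambiguation.
    return ("see runbook section 3", 0.8)

def answer_with_bow_search(question):
    # Placeholder for bag-of-words retrieval over documentation.
    return ("top search hit", 0.5)

def orchestrate(question):
    """Return the first confident automated answer, else hand off."""
    for technique in (answer_with_classifier,
                      answer_with_knowledge_graph,
                      answer_with_bow_search):
        answer, confidence = technique(question)
        if confidence >= CONFIDENCE_THRESHOLD:
            return answer
    return "handing off to a human expert"
```

    The key design point is that each technique reports its own confidence, so the cascade degrades gracefully from automation to a human expert.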

    An Extensible "SCHEMA-LESS" Database Framework for Managing High-Throughput Semi-Structured Documents

    An object-relational database management system is an integrated hybrid cooperative approach that combines the best practices of the relational model, with its SQL queries, and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework, called NETMARK, is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical-address data types for very efficient keyword search of records spanning both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to manage the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured documents in arbitrary hierarchical models, such as XML and HTML.
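    The "schema-less" idea of searching across both context and content can be illustrated by decomposing an arbitrary XML hierarchy into generic (path, text) rows and matching keywords against either column. This is a toy illustration only, not NETMARK's actual Oracle-based implementation.

```python
# Decompose an XML document into generic (context-path, content) rows so any
# hierarchy can be stored and keyword-searched without a per-document schema.
import xml.etree.ElementTree as ET

def decompose(xml_text):
    """Yield (path, text) rows for every element with text content."""
    root = ET.fromstring(xml_text)
    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        if elem.text and elem.text.strip():
            yield (here, elem.text.strip())
        for child in elem:
            yield from walk(child, here)
    return list(walk(root, ""))

def keyword_search(rows, term):
    """Match the term against both context (the path) and content."""
    term = term.lower()
    return [r for r in rows if term in r[0].lower() or term in r[1].lower()]

doc = "<report><title>Engine test</title><result>nominal</result></report>"
rows = decompose(doc)
```

    Because every document reduces to the same two-column shape, new document types need no schema changes, which is the crux of the framework's extensibility.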

    An Architecture for Text Management in Organizations

    Most of the on-line data/information in organizations that is efficiently organized for storage and retrieval is numerical in nature. Along with numerical data/information, organizations also use a substantial amount of text-based data/information. With the advent of e-commerce and Intranets, more and more text-based information is now available on-line. While textual information can be a rich source of information for organizations, there are several issues regarding the efficient storage and retrieval of text-based data/information. This paper examines the issues with text storage and retrieval and proposes a high-level architectural solution to overcome some of them. Many of the features in the proposed architecture are already implemented in various software solutions available today, but in a fragmented fashion. The architecture emphasizes open standards to enable seamless sharing of text-based data/information in a networked environment.

    Churn prediction based on text mining and CRM data analysis

    Within quantitative marketing, churn prediction at the single-customer level has become a major issue. An extensive body of literature shows that, today, churn prediction is mainly based on structured CRM data. However, in recent years, more and more digitized customer text data has become available, originating from emails, surveys or transcripts of phone calls. To date, this data source remains vastly untapped for churn prediction, and corresponding methods are rarely described in the literature. Filling this gap, we present a method for estimating churn probabilities directly from text data, by adopting classical text mining methods and combining them with state-of-the-art statistical prediction modelling. We transform every customer text document into a vector in a high-dimensional word space, after applying text mining pre-processing steps such as removal of stop words, stemming and word selection. The churn probability is then estimated by statistical modelling, using random forest models. We applied these methods to customer text data of a major Swiss telecommunication provider, originating from transcripts of phone calls between customers and call-centre agents. In addition to the analysis of the text data, a similar churn prediction was performed for the same customers, based on structured CRM data. This second approach serves as a benchmark for the text data churn prediction, and is performed by using random forests on the structured CRM data, which contains more than 300 variables. Comparing the churn prediction based on text data to classical churn prediction based on structured CRM data, we found that the prediction based on text data performs as well as the prediction using structured CRM data. Furthermore, we found that by combining structured and text data, the prediction accuracy can be increased by up to 10%. These results show clearly that text data contains valuable information and should be considered for churn estimation.
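    The pre-processing pipeline described above (stop-word removal, stemming, projection into a fixed word space) can be condensed into a few lines. The stop-word list and suffix stemmer below are simplified stand-ins for the standard text-mining tools the paper uses; the resulting vectors would then be fed to a random forest model.

```python
# Turn a customer document into a term-count vector over a fixed vocabulary.
STOP_WORDS = {"the", "a", "an", "to", "is", "my", "i", "and", "am"}

def stem(word):
    # Crude suffix stripping as a stand-in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def vectorize(text, vocabulary):
    """Map a document to term counts over a fixed word space."""
    tokens = [stem(w) for w in text.lower().split() if w not in STOP_WORDS]
    return [tokens.count(term) for term in vocabulary]

vocab = ["cancel", "bill", "slow"]
vector = vectorize("I am canceling because the billing is slow", vocab)
```

    In the paper's setup, each customer's transcript becomes one such vector (in a much larger vocabulary), and the random forest is trained on these vectors against observed churn labels.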

    Enrichment of the Phenotypic and Genotypic Data Warehouse analysis using Question Answering systems to facilitate the decision making process in cereal breeding programs

    Currently there is an overwhelming number of scientific publications in the Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is the cost of keeping them up to date, which makes them become obsolete easily. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant-breeding enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information by combining traditional Database and DW architectures with QA systems. The great advantage of our framework is that decision makers can instantaneously compare internal data with external data from competitors, allowing them to take quick strategic decisions based on richer data.

    This paper has been partially supported by the MESOLAP (TIN2010-14860) and GEODAS-BI (TIN2012-37493-C03-03) projects from the Spanish Ministry of Education and Competitivity. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).
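    The enrichment step can be sketched as merging internally known records with candidates retrieved by a QA system. Every interface below (the warehouse query, the QA call, the gene names) is a hypothetical placeholder standing in for the framework's actual components.

```python
# Complement internal Data Warehouse records with externally discovered
# candidates from a (stubbed) Question Answering system.
def query_warehouse(trait):
    # Placeholder DW query: genes already known internally for a trait.
    return {"drought tolerance": ["Gene-A"]}.get(trait, [])

def ask_external_qa(question):
    # Placeholder QA system over recent external publications.
    return ["Gene-A", "Gene-B"]

def enrich(trait):
    """Merge internal DW genes with new candidates found on the Web."""
    internal = query_warehouse(trait)
    external = ask_external_qa(f"Which genes are related to {trait}?")
    new_candidates = [g for g in external if g not in internal]
    return {"internal": internal, "new_from_web": new_candidates}
```

    The point of the side-by-side result is the one the abstract makes: the decision maker sees at a glance which candidates are genuinely new relative to the internal Database.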

    Ontology Population via NLP Techniques in Risk Management

    In this paper we propose an NLP-based method for Ontology Population from texts and apply it to semi-automatically instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and uses a domain expert's intervention for validation. The proposed approach relies on a set of Instance Recognition Rules based on syntactic structures, and on the predicative power of verbs in the instantiation process. It is not domain dependent since it heavily relies on linguistic knowledge. A description of an experiment performed on a part of the ontology of the PRIMA project (supported by the European Community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from the Environmental Protection Agency. The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves the performance.

    Keywords: Information Extraction, Instance Recognition Rules, Ontology Population, Risk Management, Semantic Analysis
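    A verb-driven Instance Recognition Rule can be illustrated with a single pattern: the predicative verb decides which ontology concepts the subject and object instantiate. The rule, verb and concept names below are illustrative only, not the paper's actual rule set.

```python
# A toy Instance Recognition Rule: "<X> causes <Y>" instantiates X as a
# RiskFactor and Y as a Consequence, driven by the verb "causes".
import re

RULE = re.compile(r"(?P<subj>[\w ]+?) causes (?P<obj>[\w ]+)")

def recognize_instances(sentence):
    """Return (concept, instance) pairs extracted by the verb-based rule."""
    match = RULE.search(sentence)
    if not match:
        return []
    return [("RiskFactor", match.group("subj").strip()),
            ("Consequence", match.group("obj").strip())]
```

    In a semi-automatic setting like the one described, a domain expert would then validate each extracted pair before it is added to the ontology.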