3,748 research outputs found

    Soft matching for question answering

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Arabic Cooperative Answer Generation via Wikipedia Article Infoboxes

    Full text link
    [EN] The typical question-answering system is facing many challenges related to the processing of questions and information resources in the extraction and generation of adequate answers. These challenges increase when the requested answer is cooperative and its language is Arabic. In this paper, we propose an original approach to generate cooperative answers for user-definitional questions designed to be integrated in a question-answering system. This approach is mainly based on the exploitation of the semi-structured Web knowledge which consists in using features derived from Wikipedia article infoboxes to generate cooperative answers. It is globally independent of a particular language, which gives it the ability to be integrated in any definitional question-answering system. We have chosen to integrate and experiment it in a definitional question-answering system dealing with the Arabic language entitled DefArabicQA. The results showed that this system has a significant impact on the approach efficiency regarding the improvement of the quality of the answer.The work of the third author was partially funded by the Spanish Ministry of Economy, Industry and Competitiveness (MINECO) under the SomEMBED research project (TIN2015-71147-C2-1-P) and by the Generalitat Valenciana under the grant ALMAMATER (PrometeoII/2014/030).Trigui, O.; Belguith, L.; Rosso, P. (2017). Arabic Cooperative Answer Generation via Wikipedia Article Infoboxes. Research in Computing Science. 132:129-153. http://hdl.handle.net/10251/103731S12915313

    Multimedia Answering and Retrieval System based on CQA with Media Query Generation

    Get PDF
    The question answering system which has recently received an attention from the various information retrieval systems, machine learning, information extraction and the natural language processing the goal of the QAS is to retrieve the answer to the question than full documents. This question answering system which works on the various modules related only to the question processing, the document processing, and the answer processing. This QAS which doesn’t work properly with the main module which is questioning processing this system fails to categorize properly the questions. So to overcome the QAS the Community question answering (CQA) has gained popularity. As compare to QAS and automated QA sites the CQA sites are more effective. In this drawback available for community question answering system is that it only provides the textual answer. Here in this paper, we propose a scheme that enhances the textual answer with the multimedia data. The outline of Community question answering which mainly consists of three components: the selection of answer medium, the query generation for multimedia search and the selection and presentation of multimedia data. This approach automatically defines which type of media information should be added for the textual answer. Then it automatically collects the data from the web to supplement the answer.by handling an available dataset of QA pairs and adding them to a pool, in this, our approach is to allow a new multimedia question answering (MMQA) approach so as the users can find the answer in multimedia matching the questions pair those in the pool. Therefore, the users can approach MMQA from Web information will answer the questions in different media formats (text, video, and image) as particularly selected by the users

    Global rule induction for information extraction

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Structurally Tractable Uncertain Data

    Full text link
    Many data management applications must deal with data which is uncertain, incomplete, or noisy. However, on existing uncertain data representations, we cannot tractably perform the important query evaluation tasks of determining query possibility, certainty, or probability: these problems are hard on arbitrary uncertain input instances. We thus ask whether we could restrict the structure of uncertain data so as to guarantee the tractability of exact query evaluation. We present our tractability results for tree and tree-like uncertain data, and a vision for probabilistic rule reasoning. We also study uncertainty about order, proposing a suitable representation, and study uncertain data conditioned by additional observations.Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium 201

    Semi-automated Ontology Generation for Biocuration and Semantic Search

    Get PDF
    Background: In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed. Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing. Motivation: The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences. Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods. Results: The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun-phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high quality terms also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, child ancestor relations for up to 54%. No other validated system exists that achieves comparable results. To improve the search for information on alternative methods to animal testing an ontology has been developed that contains 17,151 terms of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands this request using the structure and terminology of the ontology. The machine classification employed in Go3R is capable of distinguishing documents related to alternative methods from those which are not with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because a specific term was contained or due to the automatic classification. The Go3R search engine is available on-line under www.Go3R.org

    Semi-automated Ontology Generation for Biocuration and Semantic Search

    Get PDF
    Background: In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed. Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing. Motivation: The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences. Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods. Results: The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun-phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high quality terms also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, child ancestor relations for up to 54%. No other validated system exists that achieves comparable results. To improve the search for information on alternative methods to animal testing an ontology has been developed that contains 17,151 terms of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands this request using the structure and terminology of the ontology. The machine classification employed in Go3R is capable of distinguishing documents related to alternative methods from those which are not with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because a specific term was contained or due to the automatic classification. The Go3R search engine is available on-line under www.Go3R.org
    corecore