
    Hybrid approach for disease comorbidity and disease gene prediction using heterogeneous dataset

    High-throughput analysis and large-scale integration of biological data have driven leading research in bioinformatics. Recent years have witnessed the development of various methods for disease-associated gene prediction and disease comorbidity prediction. Most existing techniques use network-based or similarity-based approaches for these predictions. Although network-based approaches perform better, they rely on text data from OMIM records and PubMed abstracts. In this work, a novel hybrid algorithm (HDCDGP) is proposed for disease comorbidity prediction and disease-associated gene prediction. A disease comorbidity network and a disease gene network were constructed using data from the gene ontology (GO), the human phenotype ontology (HPO), protein-protein interactions (PPI) and pathway datasets. A modified random walk with restart algorithm was applied to these networks to extract novel disease-gene associations. Experimental results showed that the hybrid approach outperforms existing systems, with an overall accuracy of around 85%.
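The abstract does not detail the paper's modified random walk with restart, but the standard RWR iteration it builds on can be sketched in a few lines. The toy network, seed node and restart probability below are illustrative assumptions, not values from the paper.

```python
def random_walk_with_restart(adj, seed, restart_prob=0.7, tol=1e-9, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * e_seed until convergence.

    adj maps each node to its neighbours (undirected, unweighted); W is the
    column-normalised transition matrix implied by adj. This is the generic
    RWR scheme, not the paper's modified variant.
    """
    nodes = sorted(adj)
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Mass flowing into n: each neighbour m splits its mass over deg(m).
            inflow = sum(p[m] / len(adj[m]) for m in adj[n])
            nxt[n] = (1 - restart_prob) * inflow + restart_prob * (1.0 if n == seed else 0.0)
        if sum(abs(nxt[n] - p[n]) for n in nodes) < tol:
            return nxt
        p = nxt
    return p

# Toy network: node 0 is the seed; 1 and 2 are its direct neighbours,
# 3 is two hops away. The steady-state probabilities rank candidate genes.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
scores = random_walk_with_restart(adj, seed=0)
```

Nodes closer to the seed receive higher visiting probability, which is what makes RWR useful for prioritising candidate disease genes around known ones.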

    CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

    Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually-defined feature spaces. Manual feature engineering is expensive and often sub-optimal. In order to overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI) - a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI's effectiveness. Comment: Accepted at WWW 201
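The core idea of canonicalizing over embeddings, grouping NPs whose learned vectors lie close together, can be illustrated with a greedy threshold clustering. This is a simplification: CESI learns the embeddings jointly with side information and uses hierarchical agglomerative clustering, and the vectors and phrases below are made up for the example.

```python
def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def canonicalize(embeddings, threshold=0.9):
    """Greedy single-pass clustering of noun-phrase embeddings.

    Each NP joins the first cluster whose representative vector is within the
    similarity threshold; otherwise it starts a new cluster. A stand-in for
    the HAC step CESI actually uses.
    """
    clusters, reps = [], []
    for np_name, vec in embeddings.items():
        for i, rep in enumerate(reps):
            if cosine(vec, rep) >= threshold:
                clusters[i].append(np_name)
                break
        else:
            clusters.append([np_name])
            reps.append(vec)
    return clusters

# Invented 2-d "embeddings": the two Obama mentions point one way,
# the two New York mentions another.
embeddings = {
    "Barack Obama":  [1.00, 0.10],
    "Obama":         [0.98, 0.12],
    "NYC":           [0.10, 1.00],
    "New York City": [0.08, 0.99],
}
clusters = canonicalize(embeddings)
```

Each resulting cluster corresponds to one canonical entity, which is exactly the redundancy the abstract says uncanonicalized Open KBs suffer from.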

    A Survey on Identification of Motifs and Ontology in Medical Database

    Motifs and ontologies are used in medical databases for identifying and diagnosing disease. A motif is a network pattern used in the analysis of a disease; it also identifies patterns in the signal. Based on motifs, a disease can be predicted, classified and diagnosed. An ontology is a knowledge-based representation, used as a user interface for diagnosing disease. Ontologies are also used by medical experts to diagnose and analyse a disease more easily. The gene ontology is used to express the genes of a disease.

    Comparative cluster labelling involving external text sources

    Giving clear, straightforward names to the individual result groups of data clustering is most important in making the research usable. This is especially so when clustering is the actual outcome of the analysis and not just a tool for data preparation. In this case, the underlying concept of the cluster itself is what makes the result meaningful and useful. However, a cluster comes alive in the investigator's mind only once it can be defined or described in words. The method introduced in this paper aims to facilitate and partly automate this verbal characterisation process. An external text database is joined to the objects of the clustering, which adds new, previously unused features to the data set. Clusters are then described by labels produced by text-mining analytics, and the validity of the clustering can be characterised by the shape of the final word cloud.
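One common way to turn joined texts into cluster labels is to pick each cluster's most distinctive terms. The sketch below uses plain tf-idf as a stand-in for the paper's (unspecified) text-mining analytics, and the two toy clusters and their attached texts are invented.

```python
import math
from collections import Counter

def cluster_labels(cluster_texts, top_k=2):
    """Label each cluster with its highest tf-idf terms relative to the others.

    cluster_texts: one list of text snippets per cluster, assumed already
    joined to the clustered objects from the external source.
    """
    tfs = [Counter(" ".join(texts).lower().split()) for texts in cluster_texts]
    n = len(tfs)
    df = Counter()  # in how many clusters does each term occur?
    for tf in tfs:
        df.update(tf.keys())
    labels = []
    for tf in tfs:
        # Terms shared by every cluster get idf = log(n/n) = 0 and drop out.
        ranked = sorted(tf, key=lambda t: -tf[t] * math.log(n / df[t]))
        labels.append(ranked[:top_k])
    return labels

# Invented example: a medical cluster and a finance cluster. The shared
# word "risk" is discounted; the cluster-specific words become the labels.
clusters = [["heart attack risk", "heart disease study"],
            ["stock market crash", "market risk model"]]
labels = cluster_labels(clusters)
```

The same scores could feed a word cloud, whose shape (one dominant term versus many weak ones) is what the paper proposes as a validity cue.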

    Proceedings of the First Workshop on Computing News Storylines (CNewsStory 2015)

    This volume contains the proceedings of the 1st Workshop on Computing News Storylines (CNewsStory 2015), held in conjunction with the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015) at the China National Convention Center in Beijing, on July 31st 2015. Narratives are at the heart of information sharing. Ever since people began to share their experiences, they have connected them to form narratives. The study of storytelling and the field of literary theory called narratology have developed complex frameworks and models for various aspects of narrative, such as plot structures, narrative embeddings, characters' perspectives, reader response, point of view, narrative voice, narrative goals, and many others. These notions from narratology have been applied mainly in Artificial Intelligence and in formal semantic approaches to modelling narratives (e.g. the Plot Units developed by Lehnert (1981)). In recent years, computational narratology has established itself as an autonomous field of study and research. Narrative has been the focus of a number of workshops and conferences (the AAAI Symposia, the Interactive Storytelling Conference (ICIDS), Computational Models of Narrative), and reference annotation schemes for narratives have been proposed (NarrativeML by Mani (2013)). The workshop aimed to bring together researchers from different communities working on representing and extracting narrative structures in news, a text genre which is widely used in NLP but which has received little attention with respect to narrative structure, representation and analysis. Advances in NLP technology have now made it feasible to look beyond scenario-driven, atomic extraction of events from single documents and to work towards extracting story structures from multiple documents published over time as news streams.
Policy makers, NGOs, information specialists (such as journalists and librarians) and others are increasingly in need of tools that support them in finding salient stories in large amounts of information, so that they can implement policies more effectively, monitor the actions of "big players" in society and check facts. Their tasks often revolve around reconstructing cases with respect either to specific entities (e.g. persons or organisations) or to events (e.g. hurricane Katrina). Storylines are explanatory schemas that enable us to make better selections of relevant information, as well as projections into the future. They form a valuable potential for exploiting news data in an innovative way. JRC.G.2 - Global security and crisis management

    Visualizing Incongruity: Visual Data Mining Strategies for Modeling Humor in Text

    The goal of this project is to investigate the use of visual data mining to model verbal humor. We explored various means of text visualization to identify key features of garden-path jokes as compared with non-jokes. In a garden-path joke, one interpretation is established in the setup, but new information indicating an alternative interpretation triggers a resolution process leading to a new interpretation. For this project we visualize text in three novel ways, assisted by some web mining to build an informal ontology, which allows us to see the differences between garden-path jokes and non-jokes of similar form. We used the results of the visualizations to build a rule-based model, which was then compared with models from traditional data mining to show the usefulness of visual data mining. Additional experiments addressed other forms of incongruity, including the visualization of 'shilling', the introduction of false reviews into a product review set. The results are very similar to those for garden-path jokes and begin to show that incongruity has a shape. Overall, this project shows that the proposed methodologies and tools offer a new approach to testing and generating hypotheses related to theories of humor, as well as to other phenomena involving opposition, incongruity, and shifts in classification.

    Automatic reconstruction of itineraries from descriptive texts

    This thesis is part of the PERDIDO project, whose objectives are the extraction and reconstruction of itineraries from textual documents. The work was carried out in collaboration between the LIUPPA laboratory of the Université de Pau et des Pays de l'Adour (France), the Advanced Information Systems group (IAAA) of the Universidad de Zaragoza and the COGIT laboratory of the IGN (France). The goal of the thesis is to design an automatic system that can extract movements from travel guides or itinerary descriptions and represent them on a map. We propose an approach for the automatic representation of itineraries described in natural language. Our proposal is divided into two main tasks. The first aims to identify and extract, from texts describing itineraries, information such as spatial entities and expressions of movement or perception. The goal of the second task is the reconstruction of the itinerary. Our proposal combines local information extracted through natural language processing with data drawn from external geographic sources (e.g. gazetteers). The spatial-information annotation stage uses an approach that combines part-of-speech tagging with lexico-syntactic patterns (a cascade of transducers) in order to annotate spatial named entities and expressions of movement and perception. A first contribution to the first task is the disambiguation of toponyms, a problem that remains poorly solved within named entity recognition (NER) and is essential in geographic information retrieval.
We present an unsupervised georeferencing algorithm based on a clustering technique that can both disambiguate the toponyms found in external geographic resources and locate toponyms that are not referenced. We propose a generic graph model for the automatic reconstruction of itineraries, in which each node represents a place and each edge represents a path linking two places. The originality of our model is that, in addition to the usual elements (paths and waypoints), it can represent other elements involved in the description of an itinerary, such as visual landmarks. A minimum spanning tree is computed from a weighted graph in order to obtain the itinerary automatically in the form of a graph. Each edge of the initial graph is weighted by a multi-criteria analysis method that combines qualitative and quantitative criteria. The values of these criteria are determined from information extracted from the text and from external geographic resources. For example, information produced by natural language processing, such as spatial relations describing an orientation (e.g. "head south"), is combined with the geographic coordinates of places found in the resources to determine the value of the "spatial relation" criterion. Furthermore, starting from the definition of the concept of an itinerary and the information used in language to describe one, we have modelled an annotation language for spatial information adapted to the description of movements, following the recommendations of the TEI (Text Encoding and Interchange) consortium.
Finally, the different stages of our approach have been implemented and evaluated on a multilingual corpus of descriptions of trails and excursions (French, Spanish, Italian).
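The itinerary-reconstruction step, computing a minimum spanning tree over a weighted place graph, can be sketched with Kruskal's algorithm. The place names and edge weights below are invented for illustration; in the thesis the weights come from the multi-criteria analysis of textual and geographic evidence.

```python
def kruskal_mst(nodes, edges):
    """Minimum spanning tree via Kruskal's algorithm with union-find.

    edges: list of (weight, place_a, place_b). Returns the chosen edges as
    (place_a, place_b, weight) triples; lower weight = stronger evidence
    that the path belongs to the itinerary.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Find the set representative, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, a, b in sorted(edges):  # cheapest edges first
        ra, rb = find(a), find(b)
        if ra != rb:               # keep the edge only if it joins two components
            parent[ra] = rb
            mst.append((a, b, w))
    return mst

# Hypothetical place graph for a hiking description; weights are made up.
places = ["trailhead", "bridge", "summit", "refuge"]
edges = [(1.0, "trailhead", "bridge"),
         (2.5, "trailhead", "summit"),
         (1.2, "bridge", "summit"),
         (0.8, "summit", "refuge")]
itinerary = kruskal_mst(places, edges)
```

The heavier direct trailhead-summit edge is discarded in favour of the cheaper route through the bridge, which mirrors how the thesis selects the best-supported path among alternatives.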