12 research outputs found

    Designing Semantic Kernels as Implicit Superconcept Expansions

    Get PDF
    Recently, there has been increased interest in exploiting background knowledge in text mining tasks, especially text classification. At the same time, kernel-based learning algorithms such as Support Vector Machines have become a dominant paradigm in the text mining community, not least because of their capability to achieve more accurate learning results by replacing the standard linear (bag-of-words) kernel with customized kernel functions that incorporate additional a priori knowledge. In this paper we propose a new approach to the design of ‘semantic smoothing kernels’ by means of an implicit superconcept expansion using well-known measures of term similarity. The experimental evaluation on two different datasets indicates that our approach consistently improves performance in situations where (i) training data is scarce or (ii) the bag-of-words representation is too sparse to build stable models when using the linear kernel.
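The smoothing idea the abstract describes can be sketched in a few lines: instead of the plain dot product between bag-of-words vectors, documents are compared through a term-similarity matrix S, so that distinct terms sharing a superconcept still contribute to the kernel. This is a minimal illustration, not the paper's exact method; the vocabulary, vectors, and similarity values are invented for the example.

```python
import numpy as np

# Toy bag-of-words vectors over the vocabulary [car, automobile, banana].
d1 = np.array([1.0, 0.0, 0.0])   # document mentions only "car"
d2 = np.array([0.0, 1.0, 0.0])   # document mentions only "automobile"

# Hypothetical term-similarity matrix S: entry (i, j) encodes how strongly
# terms i and j relate through shared superconcepts.
S = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

def linear_kernel(x, y):
    # Standard bag-of-words kernel: counts only shared surface terms.
    return float(x @ y)

def semantic_kernel(x, y, S):
    # Smoothed kernel K(x, y) = x' S S' y: documents are compared in the
    # concept-expanded space rather than the raw term space.
    return float(x @ S @ S.T @ y)

print(linear_kernel(d1, d2))        # 0: no shared surface terms
print(semantic_kernel(d1, d2, S))   # > 0: similarity recovered via concepts
```

With a sparse vocabulary overlap the linear kernel returns zero, while the smoothed kernel recovers the latent similarity between "car" and "automobile" — the situation described in case (ii) of the abstract.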

    Automatic Thesaurus-Based Operations on Queries to Internet Search Engines: Approaches and Evaluations

    Full text link
    This paper discusses the rationale for automatic thesaurus-based operations on queries to Internet search engines. Within the proposed approach, we describe operations for query translation, template-based query expansion, query construction from the path between two concepts, and query weakening. Examples are given of queries generated from a thesaurus for the domain "Automated optical inspection of printed circuit boards", together with the corresponding responses of the Yandex and Google search engines. The results are discussed and recommendations for the practical application of the proposed methods are given. This work was supported by RFBR, grant No. 03-07-90342.

    Information Retrieval on Text using Concept Similarity

    Get PDF
    Retrieving relevant information from the Internet is a huge task because of the sheer amount of information available, and identifying the individual concepts that match a query is time-consuming. Document retrieval has traditionally relied on keyword-based methods, but such searching cannot identify the relationships between associated keywords: if the same concept is described by different keywords, inaccurate and irrelevant results are retrieved. Concept-based retrieval methods are the solution to this scenario. They offer the benefit of exploiting semantic relationships among concepts when finding relevant documents, and irrelevant documents can be eliminated by detecting conceptual mismatches. The main challenge identified is the ambiguity that arises when multiple words express the same concept. Semantic analysis can reveal the conceptual relationships among words in a given document. In this paper, the potential of concept-based information access via semantic analysis is explored with the help of a lexical database called WordNet. The mechanism is applied to selected text documents, extracting the synonyms, hyponyms and hypernyms of each word from WordNet. A ranking is then calculated from the frequency of each word in the input documents, and a hierarchy model is generated according to the ranking.
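The core mechanism — expanding query terms into concept sets and ranking documents by the frequency of matching concepts — can be sketched without a full WordNet installation. The tiny thesaurus below is a stand-in for the WordNet lookups (synonyms/hypernyms) the paper uses; terms and documents are invented for the example.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: each term maps to its concept set
# (synonyms and hypernyms). A real system would query WordNet here.
THESAURUS = {
    "car": {"car", "automobile", "vehicle"},
    "automobile": {"automobile", "car", "vehicle"},
    "fruit": {"fruit", "produce"},
}

def expand(term):
    """Return the concept set for a term (just the term itself if unknown)."""
    return THESAURUS.get(term, {term})

def rank_documents(query, documents):
    """Rank documents by the frequency of concepts matching the expanded query."""
    concepts = set().union(*(expand(t) for t in query.split()))
    scores = []
    for doc_id, text in documents.items():
        freq = Counter(text.split())
        scores.append((sum(freq[c] for c in concepts), doc_id))
    return [doc_id for score, doc_id in sorted(scores, reverse=True)]

docs = {
    "d1": "the automobile was parked outside",
    "d2": "bananas are a popular fruit",
}
print(rank_documents("car", docs))  # d1 first, despite no literal "car"
```

The document containing "automobile" is retrieved for the query "car" — exactly the case the abstract says keyword matching misses.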

    Using Dempster-Shafer’s Evidence Theory for Query Expansion Based on Freebase Knowledge

    Full text link

    Estimating similarity among collaboration contributions

    Full text link

    Utilizing Knowledge Bases In Information Retrieval For Clinical Decision Support And Precision Medicine

    Get PDF
    Accurately answering queries that describe a clinical case and aim at finding articles in a collection of medical literature requires utilizing knowledge bases to capture the many explicit and latent aspects of such queries. Proper representation of these aspects needs knowledge-based query understanding methods that identify the most important query concepts, as well as knowledge-based query reformulation methods that add new concepts to a query. In the tasks of Clinical Decision Support (CDS) and Precision Medicine (PM), the query and collection documents may have a complex structure with different components, such as disease and genetic variants, that must be transformed to enable effective information retrieval. In this work, we propose methods for representing domain-specific queries based on weighted concepts of different types, whether they occur in the query itself or are extracted from knowledge bases and the top retrieved documents. In addition, we propose an optimization framework that unifies query analysis and expansion by jointly determining the importance weights of the query and expansion concepts depending on their type and source. We also propose a probabilistic model to reformulate the query given the genetic information in the query and collection documents. We observe significant improvements in retrieval accuracy for our proposed methods over state-of-the-art baselines on the tasks of clinical decision support and precision medicine.
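The weighted-concept representation described here can be sketched as follows: each query concept carries its own weight and a type, and document scores sum type-weighted contributions of matched concepts. All concept names, types, and weight values below are invented for illustration; in the thesis the type/source weights are learned by the optimization framework rather than fixed by hand.

```python
# Hypothetical query representation: (concept, type) -> concept weight.
# Concepts come either from the query itself or from expansion sources.
QUERY_CONCEPTS = {
    ("lung carcinoma", "disease"): 1.0,   # from the query itself
    ("EGFR", "gene"): 0.8,                # from the query itself
    ("erlotinib", "treatment"): 0.4,      # expansion from a knowledge base
}

# Importance of each concept type; fixed here, learned in the actual work.
TYPE_WEIGHTS = {"disease": 1.0, "gene": 0.9, "treatment": 0.5}

def score(document_terms, query_concepts, type_weights):
    """Sum the type- and concept-weighted contributions of matched concepts."""
    total = 0.0
    for (concept, ctype), cweight in query_concepts.items():
        if concept in document_terms:
            total += cweight * type_weights[ctype]
    return total

doc = {"lung carcinoma", "EGFR"}
print(score(doc, QUERY_CONCEPTS, TYPE_WEIGHTS))  # 1.0*1.0 + 0.8*0.9
```

Separating the per-concept weight from the per-type weight is what lets a single optimization jointly tune how much each source (query, knowledge base, top documents) should contribute.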

    A Two-Level Information Modelling Translation Methodology and Framework to Achieve Semantic Interoperability in Constrained GeoObservational Sensor Systems

    Get PDF
    As geographical observational data capture, storage and sharing technologies such as in situ remote monitoring systems and spatial data infrastructures evolve, the vision of a Digital Earth, first articulated by Al Gore in 1998, is getting ever closer. However, there are still many challenges and open research questions. For example, data quality, provenance and heterogeneity remain issues due to the complexity of geo-spatial data and information representation. Observational data are often inadequately semantically enriched by geo-observational information systems or spatial data infrastructures, and so they often do not fully capture the true meaning of the associated datasets. Furthermore, the data models underpinning these information systems are typically too rigid in their data representation to allow for the ever-changing and evolving nature of geo-spatial domain concepts. This impoverished approach to observational data representation reduces the ability of multi-disciplinary practitioners to share information in an interoperable and computable way. The health domain experiences similar challenges in representing complex and evolving domain information concepts. Within any complex domain (such as Earth system science or health), two categories or levels of domain concepts exist: those that remain stable over a long period of time, and those that are prone to change as the domain knowledge evolves and new discoveries are made. Health informaticians have developed a sophisticated two-level modelling systems design approach for electronic health documentation over many years and, with the use of archetypes, have shown how data, information and knowledge interoperability among heterogeneous systems can be achieved.
This research investigates whether two-level modelling can be translated from the health domain to the geo-spatial domain and applied to observing scenarios to achieve semantic interoperability within and between spatial data infrastructures, beyond what is possible with current state-of-the-art approaches. A detailed review of state-of-the-art SDIs, geo-spatial standards and the two-level modelling methodology was performed. A cross-domain translation methodology was developed, and a proof-of-concept geo-spatial two-level modelling framework was defined and implemented. The Open Geospatial Consortium's (OGC) Observations & Measurements (O&M) standard was re-profiled to aid investigation of the two-level information modelling approach. An evaluation of the method was undertaken using two specific use-case scenarios. Information modelling was performed using the two-level modelling method to show how existing historical ocean observing datasets can be expressed semantically and harmonized using two-level modelling. The flexibility of the approach was also investigated by applying the method to an air quality monitoring scenario using a technologically constrained monitoring sensor system. This work has demonstrated that two-level modelling can be translated to the geo-spatial domain and then further developed for use within a constrained technological sensor system, using traditional wireless sensor networks, semantic web technologies and Internet of Things based technologies. Domain-specific evaluation results show that two-level modelling presents a viable approach to achieving semantic interoperability between constrained geo-observational sensor systems and spatial data infrastructures for ocean observing and city-based air quality observing scenarios. This has been demonstrated through the re-purposing of selected existing geospatial data models and standards.
However, it was found that re-using existing standards requires careful ontological analysis per domain concept, and so caution is recommended in assuming the wider applicability of the approach. While the benefits of adopting a two-level information modelling approach to geospatial information modelling are potentially great, translation to a new domain proved complex, and this complexity was found to be a barrier to adoption, especially in commercially based projects where standards implementation is low on implementation road maps and the perceived benefits of standards adherence are low. Arising from this work, a novel set of base software components, methods and fundamental geo-archetypes has been developed. However, during this work it was not possible to form the rich community of supporters required to fully validate the geo-archetypes; the findings of this work are therefore not exhaustive, and the archetype models produced are only indicative. The findings can be used as a basis to encourage further investigation and uptake of two-level modelling within the Earth system science and geo-spatial domains. Ultimately, this work recommends further development and evaluation of the approach, building on the positive results thus far and on the base software artefacts developed to support it.
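The essence of two-level modelling — a small, stable reference model whose instances are constrained by separately maintained archetypes — can be sketched in data. The field names, constraint values, and the sea-surface-temperature archetype below are invented for illustration; real geo-archetypes would be far richer and community-governed.

```python
# Level 1: a small, stable reference model that any observation instance
# must conform to. Its shape does not change as domain knowledge evolves.
REFERENCE_MODEL_FIELDS = {"phenomenon", "value", "unit", "time"}

# Level 2: a hypothetical archetype for sea-surface-temperature readings.
# It constrains the generic model purely at the data level, so it can be
# revised or replaced without touching the reference-model code.
SST_ARCHETYPE = {
    "phenomenon": lambda v: v == "sea_surface_temperature",
    "unit": lambda v: v in {"degC", "K"},
    "value": lambda v: isinstance(v, (int, float)) and -5 <= v <= 45,
    "time": lambda v: isinstance(v, str),
}

def valid(observation, archetype):
    """Check a generic observation record against an archetype's constraints."""
    if set(observation) != REFERENCE_MODEL_FIELDS:
        return False                      # violates the stable reference model
    return all(check(observation[f]) for f, check in archetype.items())

obs = {"phenomenon": "sea_surface_temperature", "value": 17.2,
       "unit": "degC", "time": "2024-06-01T12:00:00Z"}
print(valid(obs, SST_ARCHETYPE))  # True
```

Because domain volatility lives entirely in the archetype, two systems sharing the reference model can interoperate even when their archetypes evolve at different rates — the property the thesis seeks for constrained sensor systems and SDIs.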

    Contribution to the Definition of Flexible Information Retrieval Models Based on CP-Nets

    Get PDF
    This thesis addresses two main problems in information retrieval (IR): automatic query weighting (i.e., the automatic formalization of user preferences) and semantic document indexing. Our overall contribution is the definition of a theoretical flexible IR model based on CP-Nets (Conditional Preference Networks). In our first contribution, the CP-Net formalism is used both for the graphical representation of flexible queries expressing qualitative preferences and for the flexible evaluation of document relevance. For the user, expressing qualitative preferences is simpler and more intuitive than formulating numerical weights to quantify them; an automated system, however, reasons more easily over such weights. We therefore propose an approach for automatic query weighting that quantifies the corresponding CP-Nets with utility values, yielding a UCP-Net that corresponds to a weighted Boolean query. CP-Nets are also used to represent documents, with a view to the flexible evaluation of the queries thus weighted.
In our second contribution, we propose a conceptual indexing approach based on CP-Nets. The CP-Net formalism is used as an indexing language to represent a document's representative concepts, and the conditional relations between them, in a relatively compact way. The nodes of the CP-Net are the concepts representative of the document's content, identified by projection onto WordNet; the relations between these nodes express the conditional associations that link them, discovered by extending the technique of semantic association rules. Finally, we propose a query evaluation mechanism based on graph matching between the document and query CP-Nets.
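The quantification step — turning a qualitative preference order into the numeric weights of a weighted Boolean query — can be illustrated with a toy scheme. The linearly decreasing utilities below are an invented stand-in for the UCP-Net utility assignment, which is considerably more elaborate (it handles conditional preferences).

```python
# Hypothetical qualitative preference order over query terms,
# from most to least preferred, as a user might state it.
preferences = ["semantics", "indexing", "retrieval"]

# Toy quantification: linearly decreasing utilities that sum to 1.
n = len(preferences)
weights = {t: (n - i) / (n * (n + 1) / 2) for i, t in enumerate(preferences)}

def score(document_terms, weights):
    """Weighted-Boolean evaluation: sum the utilities of matched query terms."""
    return sum(w for t, w in weights.items() if t in document_terms)

# A document matching the top and bottom terms outranks one matching
# only the middle term, reflecting the stated preference order.
print(score({"semantics", "retrieval"}, weights))
print(score({"indexing"}, weights))
```

The user only ever states an ordering; the system derives the ordinal weights it needs to rank documents, which is the division of labour the abstract motivates.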

    A Conceptual Model of Enterprise Knowledge Management

    Get PDF
    To face the challenges of today's market, organizations must be able to manage the knowledge they possess efficiently. Frequently, however, organizational managers can identify neither where the value of the knowledge they possess resides nor how to use it as a competitive advantage. The literature describes a multiplicity of models and initiatives, each focusing on certain elements of Knowledge Management, but none encompasses them all. A Knowledge Management strategy must be based on a thorough understanding of what Knowledge Management entails. This thesis identifies a set of requirements that a conceptual model of organizational Knowledge Management should satisfy in order to serve as a frame of reference for a Knowledge Management implementation and for the development of information technologies, and shows that none of the conceptual Knowledge Management models proposed in the literature satisfies all of these requirements. A concrete problem thus arises: the lack of a unified, more comprehensive conceptual model that satisfies all the requirements identified in this thesis and can serve as a frame of reference for Knowledge Management initiatives and for the development of the information technologies that implement them. To solve this problem, the objective of this thesis is to propose a Conceptual Model for Organizational Knowledge Management that, while fulfilling all the identified requirements, integrates both the technological and the social aspects of this phenomenon.
Grounded in this model as a frame of reference, we propose a Distributed Organizational Memory architecture, implemented as a three-layer system (Onto-DOM), that addresses two problems common to implementations of this kind: the documentation overhead that knowledge elicitation imposes on workers contributing to the repositories, and the decontextualization of knowledge resulting from its conversion between tacit and explicit forms. We also present ontology-based annotation and retrieval strategies that enable automatic semantic processing of the heterogeneous organizational knowledge sources within this Organizational Memory. Fil: Ale, Mariel Alejandra. Universidad Tecnológica Nacional, Facultad Regional Santa Fe, Argentina. Peer reviewed. This thesis is presented in fulfillment of the requirements of the Universidad Tecnológica Nacional for the academic degree of Doctor en Ingeniería, mención Sistemas de Información.