700 research outputs found

    Review of Semantic Importance and Role of using Ontologies in Web Information Retrieval Techniques

    The Web contains an enormous amount of information that many users accumulate, research, and regularly use. The Web is multilingual and growing very fast, and its data is diverse, including unstructured or semi-structured data such as websites, texts, journals, and files. Obtaining relevant data from such a vast and diverse collection is a tedious and challenging task. Simple keyword-based retrieval systems rely heavily on statistics, which leads to a word mismatch problem caused by the unavoidable semantic and contextual variants of a given word. As a result, there is an urgent need to organize this data systematically so that relevant information can be found, quickly analyzed, and used to meet users' needs in the relevant context. Over the years, ontologies have been widely used in the Semantic Web to hold unorganized information in a systematic and structured manner. They have also significantly enhanced the efficiency of various information retrieval approaches. Ontology-based information retrieval systems retrieve files based on the semantic relation between the search request and the searchable information. This paper examines contemporary ontology-based information extraction techniques for texts, interactive media, and multilingual data types. Moreover, the study compares and classifies the most significant developments in search and retrieval techniques, along with their major advantages and disadvantages.

    Relation Discovery from Web Data for Competency Management

    This paper describes a technique for automatically discovering associations between people and expertise from an analysis of very large data sources (including web pages, blogs and emails), using a family of algorithms that perform accurate named-entity recognition, assign different weights to terms according to an analysis of document structure, and assess distances between terms in a document. My contribution is to add a social networking approach called BuddyFinder, which relies on associations within a large enterprise-wide "buddy list" to help delimit the search space and also to provide a form of 'social triangulation' whereby the system can discover documents from your colleagues that contain pertinent information about you. This work has been influential in the information retrieval community generally, as it is the basis of a landmark system that achieved overall first place in every category in the Enterprise Search Track of TREC2006.

    Discovering Relations by Entity Search in Lightweight Semantic Text Graphs

    Entity search is becoming a popular alternative to full-text search. Recently Google released its entity search based on confirmed, human-generated data such as Wikipedia. In spite of these developments, the task of entity discovery, search, or relation search in unstructured text remains a major challenge in the fields of information retrieval and information extraction. This paper tries to address that challenge, focusing specifically on entity relation discovery. This is achieved by processing unstructured text using simple information extraction methods, building lightweight semantic graphs and reusing them for entity relation discovery by applying algorithms from graph theory. User interaction with the semantic graphs is also an important part, since it can significantly improve information extraction results and entity relation search. Entity relations can be discovered by various text mining methods, but the advantage of the presented method lies in the similarity between the lightweight semantics extracted from a text and the information networks available as structured data. Both graph structures have similar properties, so similar relation discovery algorithms can be applied. In addition, we can benefit from the integration of such graph data. We provide both relevance and performance evaluations of the approach and showcase it in several use case applications.
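
The general idea of building a lightweight graph from text and applying a graph algorithm to discover relations could be sketched as follows. This is only an illustrative sketch, not the paper's method: it assumes entities come from a known list (`entities`), links two entities when they occur in the same sentence, and uses breadth-first search as the graph algorithm. The names `build_graph` and `find_relation` are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(sentences, entities):
    """Link every pair of known entities that co-occur in a sentence.

    Assumption: entity spotting is a plain substring match against a
    given list; a real system would use proper information extraction.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        found = sorted(e for e in entities if e in sentence)
        for a, b in combinations(found, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_relation(graph, source, target):
    """Return a shortest chain of entities from source to target, or None."""
    if source == target:
        return [source]
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        # graph.get avoids creating empty entries in the defaultdict
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour in seen:
                continue
            if neighbour == target:
                return path + [neighbour]
            seen.add(neighbour)
            queue.append(path + [neighbour])
    return None


sentences = ["Alice works at Acme.", "Acme is based in Berlin."]
graph = build_graph(sentences, {"Alice", "Acme", "Berlin"})
relation = find_relation(graph, "Alice", "Berlin")
```

Here `relation` is the chain of entities connecting "Alice" to "Berlin" through the shared entity "Acme"; an entity absent from the graph yields `None`.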

    Technology Forecasting Using Data Mining and Semantics: First Annual Report

    The planning and management of research and development is a challenging process, compounded by the large amount of information that is available. The goal of this project is to mine science and technology databases for patterns and trends that facilitate the formation of research strategies. The information sources we exploit are diverse and include academic journals, patents, blogs and news stories. The intended outputs of the project include growth forecasts for various technological sectors (with an emphasis on sustainable energy), an improved understanding of the underlying research landscape, and the identification of influential researchers or research groups. This paper focuses on the development of techniques to both organize and visualize the data in a way that reflects the semantic relationships between keywords. We studied the use of the joint term frequencies of pairs of keywords as a means of characterizing this semantic relationship; this is based on the intuition that terms which frequently appear together are more likely to be closely related. The results reported herein show that: (1) using appropriate tools and methods, exploitable patterns and information can certainly be extracted from publicly available databases, (2) adaptation of the Normalized Google Distance (NGD) formalism can provide measures of keyword distances that facilitate keyword clustering and hierarchical visualization, (3) further adaptation of the NGD formalism can be used to provide an asymmetric measure of keyword distances to allow the automatic creation of a keyword taxonomy, and (4) adaptation of the Latent Semantic Approach (LSA) can be used to identify concepts underlying collections of keywords.
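
As a rough illustration of a keyword distance computed from joint term frequencies, the sketch below implements the standard symmetric Normalized Google Distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). It is not the report's adapted or asymmetric variant. It assumes each document is represented as a set of keywords and that N is the number of documents; the function names `ngd` and `keyword_distances` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Standard NGD from document counts.

    fx, fy: number of documents containing each keyword
    fxy:    number of documents containing both keywords
    n:      total number of documents
    """
    if fxy == 0:
        return math.inf  # keywords never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(lx, ly)
    if denominator == 0:
        return 0.0  # both keywords appear in every document
    return (max(lx, ly) - lxy) / denominator


def keyword_distances(docs):
    """Map each alphabetically ordered keyword pair to its NGD."""
    n = len(docs)
    single, joint = Counter(), Counter()
    for keywords in docs:
        keywords = sorted(set(keywords))
        single.update(keywords)
        joint.update(combinations(keywords, 2))
    return {
        (a, b): ngd(single[a], single[b], joint[(a, b)], n)
        for a, b in combinations(sorted(single), 2)
    }


docs = [{"solar", "wind"}, {"solar", "wind"}, {"solar"}, {"coal"}]
distances = keyword_distances(docs)
```

Pairs that co-occur often, such as ("solar", "wind") here, get a small distance, while pairs that never co-occur get an infinite one. The resulting distance table could then be passed to a clustering routine for hierarchical visualization.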

    A conceptual framework and a risk management approach for interoperability between geospatial datacubes

    Today, we observe wide use of geospatial databases that are implemented in many forms (e.g., transactional centralized systems, distributed databases, multidimensional datacubes). Among those possibilities, the multidimensional datacube is more appropriate to support interactive analysis and to guide the organization's strategic decisions, especially when different epochs and levels of information granularity are involved. However, one may need to use several geospatial multidimensional datacubes, which may be semantically heterogeneous and have different degrees of appropriateness to the context of use. Overcoming the semantic problems related to this heterogeneity and to the difference in appropriateness to the context of use, in a manner that is transparent to users, has been the principal aim of interoperability for the last fifteen years. However, in spite of successful initiatives, today's solutions have evolved in an unsystematic way. Moreover, no solution has been found to address specific semantic problems related to interoperability between geospatial datacubes. In this thesis, we suppose that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. To that end, we first describe interoperability between geospatial datacubes. Then, we define and categorize the semantic heterogeneity problems that may occur during the interoperability process of different geospatial datacubes. In order to resolve semantic heterogeneity between geospatial datacubes, we propose a conceptual framework that is essentially based on human communication. In this framework, software agents representing the geospatial datacubes involved in the interoperability process communicate with one another. Such communication aims at exchanging information about the content of the geospatial datacubes. Then, in order to help agents make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of the production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility.

    DARIAH and the Benelux
