577 research outputs found

    Faceted Thesauri


    BioEve Search: A Novel Framework to Facilitate Interactive Literature Search

    Background. Recent advances in computational and biological methods over the last two decades have remarkably changed the scale of biomedical research, and with them began unprecedented growth in both the production of biomedical data and the amount of published literature discussing it. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also pave the way to discovering hitherto unknown information implicitly conveyed in the texts. Results. We developed a novel framework (named “BioEve”) that seamlessly integrates Faceted Search (Information Retrieval) with an Information Extraction module to provide an interactive search experience for researchers in the life sciences. It enables guided, step-by-step search query refinement by suggesting concepts and entities (such as genes, drugs, and diseases) to quickly filter and modify the search direction, thereby facilitating an enriched paradigm in which users can discover related concepts and keywords while information seeking. Conclusions. The BioEve Search framework makes it easier to enable scalable interactive search over large collections of textual articles and to discover, with ease, knowledge hidden in thousands of biomedical literature articles.
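The guided refinement loop the abstract describes — filter by a chosen entity, then suggest facet values over the remaining hits — can be sketched in a few lines. This is a minimal illustration with invented data and function names, not BioEve's actual implementation:

```python
from collections import Counter

# Hypothetical mini-corpus: each article carries entity facets
# (gene, disease) of the kind an extraction module would produce.
ARTICLES = [
    {"id": 1, "gene": "BRCA1", "disease": "breast cancer"},
    {"id": 2, "gene": "TP53",  "disease": "breast cancer"},
    {"id": 3, "gene": "BRCA1", "disease": "ovarian cancer"},
]

def refine(articles, **selected):
    """Keep only articles matching every facet value chosen so far."""
    return [a for a in articles
            if all(a.get(k) == v for k, v in selected.items())]

def suggest(articles, facet):
    """Count the remaining values of a facet to guide the next step."""
    return Counter(a[facet] for a in articles if facet in a)

hits = refine(ARTICLES, disease="breast cancer")
print([a["id"] for a in hits])   # the filtered result set
print(suggest(hits, "gene"))     # candidate refinements for the user
```

Each user click narrows the result set and re-derives the facet counts, which is what makes the refinement interactive rather than a one-shot query.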

    Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project

    Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Services

    Feature Extraction and Duplicate Detection for Text Mining: A Survey

    Text mining, also known as Intelligent Text Analysis, is an important research area. It is very difficult to focus on the most appropriate information due to the high dimensionality of the data. Feature Extraction is one of the important data reduction techniques for discovering the most important features. Processing massive amounts of data stored in unstructured form is a challenging task, and several pre-processing methods and algorithms are needed to extract useful features from such data. The survey covers different text summarization, classification, and clustering methods for discovering useful features, and also covers discovering query facets — multiple groups of words or phrases that explain and summarize the content covered by a query — thereby reducing the time taken by the user. When dealing with collections of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence, we also review the literature on duplicate detection and data fusion (remove and replace duplicates). The survey presents existing text mining techniques to extract relevant features, detect duplicates, and replace duplicate data, so as to deliver fine-grained knowledge to the user.
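The remove-and-replace step for near-duplicates that the survey reviews is often built on text similarity. The following is one simple way to do it — character shingling plus Jaccard similarity — offered as an illustrative sketch, not the specific method surveyed; all names and the threshold are arbitrary choices:

```python
def shingles(text, k=3):
    """Character k-shingles of a whitespace-normalized, lowercased string."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(len(t) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of the two documents' shingle sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def dedupe(docs, threshold=0.8):
    """Keep the first copy of each near-duplicate group; the kept copy
    'replaces' the removed ones, as in remove-and-replace fusion."""
    kept = []
    for d in docs:
        if all(jaccard(d, k) < threshold for k in kept):
            kept.append(d)
    return kept

docs = ["Text mining is fun.", "Text  mining is fun.", "Feature extraction survey"]
print(dedupe(docs))
```

The pairwise scan is quadratic in the number of documents; real systems replace it with hashing or indexing (e.g. minhash-style signatures) to scale, but the similarity test itself is the same idea.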

    Fuzzy concept analysis for semantic knowledge extraction

    2010 - 2011
    The availability of controlled vocabularies, ontologies, and similar resources is an enabling factor that adds value to knowledge management. Nevertheless, designing, constructing, and maintaining domain ontologies is a labour-intensive and time-consuming task. Knowledge Extraction comprises automatic techniques aimed at identifying and defining the relevant concepts and relations of a domain of interest by analyzing structured (relational databases, XML) and unstructured (text, documents, images) sources. Specifically, the knowledge extraction methodology defined in this research work enables automatic ontology/taxonomy construction from existing resources in order to obtain useful information. For instance, the experimental results take into account data produced with Web 2.0 tools (e.g., RSS feeds, enterprise wikis, corporate blogs), text documents, and so on. The final results of the Knowledge Extraction methodology are taxonomies or ontologies represented in a machine-oriented manner by means of Semantic Web technologies such as RDFS, OWL, and SKOS. The resulting knowledge models have been applied to different goals. On the one hand, the methodology has been applied to extract ontologies and taxonomies and to semantically annotate text. On the other hand, the resulting ontologies and taxonomies are exploited to enhance information retrieval performance, to categorize incoming data, and to provide an easy way to find interesting resources (such as faceted browsing). Specifically, the following objectives have been addressed in this research work: Ontology/Taxonomy Extraction, concerning the automatic extraction of hierarchical conceptualizations (i.e., taxonomies) and of relations expressed by means of typical description logic constructs (i.e., ontologies); Information Retrieval, the definition of a technique to perform concept-based retrieval of information according to user queries; Faceted Browsing, automatically providing faceted browsing capabilities according to the categorization of the extracted contents; and Semantic Annotation, the definition of a text analysis process aimed at automatically annotating the subjects and predicates identified. The experimental results have been obtained in several application domains: e-learning, enterprise human resource management, and clinical decision support systems. Future challenges go in the following directions: investigating approaches to support ontology alignment and merging applied to knowledge management.
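The abstract's pipeline ends with a taxonomy serialized in SKOS. A minimal sketch of that last step — turning an extracted child-to-parent hierarchy into SKOS triples in Turtle — looks as follows; the taxonomy, namespace, and function names are illustrative, not taken from the thesis:

```python
# Toy extracted taxonomy (child -> parent), standing in for concepts
# identified automatically from text sources.
TAXONOMY = {
    "Melanoma": "Cancer",
    "Cancer": "Disease",
}

BASE = "http://example.org/kb#"  # hypothetical namespace

def to_skos_turtle(taxonomy):
    """Serialize a child->parent map as skos:Concept / skos:broader triples."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    concepts = set(taxonomy) | set(taxonomy.values())
    for c in sorted(concepts):
        lines.append(f'<{BASE}{c}> a skos:Concept ; skos:prefLabel "{c}" .')
    for child, parent in sorted(taxonomy.items()):
        lines.append(f"<{BASE}{child}> skos:broader <{BASE}{parent}> .")
    return "\n".join(lines)

print(to_skos_turtle(TAXONOMY))
```

In practice an RDF library (e.g. rdflib) would build and serialize the graph, but the mapping is the same: each taxonomy node becomes a `skos:Concept` and each child–parent edge a `skos:broader` triple, which downstream tools can then consume for faceted browsing or query expansion.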

    Search improvement within the geospatial web in the context of spatial data infrastructures

    The work developed in this doctoral thesis demonstrates that search in the context of Spatial Data Infrastructures can be improved by applying techniques and best practices from other scientific communities, especially the Web and Semantic Web communities (for example, Linked Data). The use of semantic descriptions, together with approaches based on the content published by the geospatial community, can help in searching for information about geographic phenomena and for geospatial resources in general. The work begins with the analysis of an approach to improving the search for geospatial entities from the perspective of traditional geocoding. The composite geocoding architecture proposed in this work ensures improved geocoding results thanks to the use of different geographic information providers. In this approach, the use of structural design patterns and ontologies enables an architecture that is advanced in terms of extensibility, flexibility, and adaptability. In addition, an architecture based on geocoding service selection enables the development of a methodology for georeferencing diverse types of geographic information (for example, addresses or points of interest). Next, two representative applications that require additional semantic characterization of geospatial resources are presented. The approach proposed in this work uses heuristics over published content to sample a set of geospatial resources. The first part is devoted to the idea of abstracting a geographic phenomenon from its spatial definition.
    The research shows that Semantic Web best practices can be reused in the scope of a Spatial Data Infrastructure to describe the geospatial services standardized by the Open Geospatial Consortium by means of geo-identifiers (that is, by means of the entities of a geographic ontology). The second part of this chapter breaks down the architecture and components of a geoprocessing service for the automatic identification of orthoimagery offered through a standard map publication service (that is, services following the OGC Web Map Service specification). As a result of this work, a method has been proposed for identifying which of the maps offered by a Web Map Service are orthoimages. The work then turns to the analysis of issues related to the creation of metadata for Web resources in the context of the geographic domain. This work proposes an architecture for the automatic generation of geographic knowledge from Web resources. It was necessary to develop a method for estimating the geographic coverage of Web pages. The proposed heuristics are based on the content published by geographic information providers. The developed prototype is capable of generating metadata. The generated model contains the minimum recommended set of elements required by a catalogue following the OGC Catalogue Service for the Web specification, the standard recommended by different Spatial Data Infrastructures (for example, the Infrastructure for Spatial Information in the European Community, INSPIRE). In addition, this study determines some characteristics of the current Geospatial Web. First, it offers some characteristics of the market of providers of geographic information Web resources.
    This study reveals some practices of the geospatial community in producing metadata for Web pages, in particular the lack of geographic metadata. All of the above forms the basis for studying how to support non-expert users in searching for Geospatial Web resources. The search engine dedicated to the Geospatial Web proposed in this work is capable of building on an existing search engine, and it supports exploratory search over the geospatial resources discovered on the Web. A precision-and-recall experiment showed that the prototype developed in this work is at least as good as the remote search engine. A study on the usability of the system indicates that even non-experts can perform a search task with satisfactory results.
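The composite geocoding idea — querying several geographic information providers and combining or falling back between them — can be sketched as a simple provider chain. This is an illustrative outline of the pattern only, with stub providers and invented names, not the thesis architecture:

```python
# Stub providers: each maps an address string to (lat, lon) or None,
# mimicking independent geographic information services.
def provider_a(address):
    return {"Calle Mayor 1, Zaragoza": (41.65, -0.88)}.get(address)

def provider_b(address):
    return {"Plaza Mayor, Madrid": (40.42, -3.71)}.get(address)

def composite_geocode(address, providers=(provider_a, provider_b)):
    """Ask each provider in turn; return the first successful result.

    A fuller design would select providers per input type (address vs.
    point of interest) and merge or rank multiple answers.
    """
    for p in providers:
        result = p(address)
        if result is not None:
            return result
    return None
```

Because providers sit behind a uniform callable interface, adding a new source or reordering the selection strategy does not touch the client code — the extensibility property the architecture aims for.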

    A Preliminary SKOS Implementation of the Art and Architecture Thesaurus: Machine-Actionable Controlled Vocabulary for the Semantic Web

    This paper presents an experimental implementation of the Art and Architecture Thesaurus, a knowledge organization system for the visual arts and architecture, within the SKOS (Simple Knowledge Organization System) framework. Such treatment allows for machine-actionability on thesaurus records for automated expansion of search queries and also provides a framework for interoperability across metadata schemas in a linked data environment. SKOS enables more complex semantic processing by utilizing a simple framework for Semantic Web technology within thesauri. The analysis establishes an application profile for AAT, which accommodates the faceted and polyhierarchical structure of the thesaurus, as well as the detailed source referencing contained within AAT’s documentary notes.
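The automated query expansion that machine-actionable thesaurus records enable typically walks the hierarchy from a query term to its transitively narrower concepts. A minimal sketch, using a toy AAT-like fragment (the terms are illustrative, not actual AAT records):

```python
# Toy vocabulary fragment: preferred term -> narrower terms,
# the relation SKOS models as skos:narrower.
NARROWER = {
    "chairs": ["armchairs", "rocking chairs"],
    "armchairs": ["wing chairs"],
}

def expand_query(term, vocab=NARROWER):
    """Expand a query term with all transitively narrower terms."""
    expanded, stack = [term], [term]
    while stack:
        for n in vocab.get(stack.pop(), []):
            if n not in expanded:
                expanded.append(n)
                stack.append(n)
    return expanded
```

A search for "chairs" would then also match records indexed under "armchairs", "rocking chairs", or "wing chairs" — the polyhierarchy simply means a term may appear under more than one broader concept, which the visited-set check above already handles.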

    Design, Development and Implementation of Tools in Drug Discovery

    The main focus of our work is to develop, apply, and assess cheminformatics tools and methods. In particular, we focus on the following three areas: integration of open source tools with application to drug discovery, usability studies to assess the efficacy of these software tools, and, finally, novel techniques for database query. Rapid globalization has sparked a need in the scientific community to interact at an economical and fast pace. This is achieved by developing and sharing open source databases via the World Wide Web. A web-based open source database application has been developed to incorporate freeware from varied sources, and the deployment of the developed database and user interface in a university lab setting is discussed. To help connect end users with these software tools, usability studies are necessary. Such studies communicate the end users’ needs and desires, resulting in user-friendly and more powerful interactive software packages. Usability studies were conducted on the developed student database application and on different drawing packages to determine their effectiveness. Developing new and interactive search engines to query publicly available databases helps researchers work more efficiently. The huge volume of available data and its heterogeneous nature present issues related to querying, integration, and presentation. To aid the retrieval process, an innovative multi-faceted classification system, called ChemFacets, was developed. This system provides dynamic categorization of large result sets retrieved from multiple databases.
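The dynamic categorization of a heterogeneous, multi-database result set can be pictured as grouping records by whatever facet fields they carry. The following is a generic sketch of that idea with invented data and names — not ChemFacets itself:

```python
from collections import defaultdict

# Hypothetical merged result set from multiple chemical databases.
RESULTS = [
    {"name": "aspirin",   "source": "PubChem", "class": "NSAID"},
    {"name": "ibuprofen", "source": "ChEMBL",  "class": "NSAID"},
    {"name": "caffeine",  "source": "PubChem", "class": "stimulant"},
]

def facet(results, *fields):
    """Group a result set into one facet per field, skipping records
    that lack a field (heterogeneous schemas)."""
    facets = {f: defaultdict(list) for f in fields}
    for r in results:
        for f in fields:
            if f in r:
                facets[f][r[f]].append(r["name"])
    return facets

f = facet(RESULTS, "source", "class")
print(dict(f["source"]))
```

Because the facets are recomputed from whichever records the query returned, the categories presented to the user adapt to each result set rather than being fixed in advance — the "dynamic" part of the abstract's claim.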