
    Object-oriented querying of existing relational databases

    In this paper, we present algorithms that allow object-oriented querying of existing relational databases. Our goal is to provide an improved query interface for relational systems, with better query facilities than SQL. This is important because, in real-world applications, relational systems are most commonly used, and their dominance will remain for the near future. To overcome the drawbacks of relational systems, especially the poor query facilities of SQL, we propose a schema transformation and a query translation algorithm. The schema transformation algorithm uses additional semantic information to enhance the relational schema and transform it into a corresponding object-oriented schema. If the additional semantic information can be deduced from an underlying entity-relationship design schema, the schema transformation may be done fully automatically. To query the created object-oriented schema, we use the Structured Object Query Language (SOQL), which provides declarative query facilities on objects. SOQL queries against the created object-oriented schema are much shorter, easier to write and understand, and more intuitive than corresponding SQL queries, leading to enhanced usability and improved querying of the database. The query translation algorithm automatically translates SOQL queries into equivalent SQL queries for the original relational schema.
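    The paper's algorithms are not reproduced here, but a minimal sketch can convey the flavour of such a translation: given foreign-key metadata recovered from the ER design, an object-oriented path expression expands into a SQL join chain. All schema, table, and column names below are illustrative assumptions, not the paper's actual grammar or method.

```python
# Hypothetical sketch: expand an object-oriented path expression into a SQL
# join, using foreign-key metadata assumed to come from an ER design schema.
# All names here are illustrative, not taken from the paper.

# (source class, attribute) -> (target table, left join column, right join column)
RELATIONS = {
    ("Department", "employees"): ("employee", "department.id", "employee.dept_id"),
}
TABLE_OF = {"Department": "department", "Employee": "employee"}

def translate(path: str, select_col: str) -> str:
    """Translate a path like 'Department.employees' plus a selected
    attribute into an equivalent SQL join query."""
    cls, attr = path.split(".")
    target, left_col, right_col = RELATIONS[(cls, attr)]
    return (
        f"SELECT {target}.{select_col} "
        f"FROM {TABLE_OF[cls]} JOIN {target} ON {left_col} = {right_col}"
    )

if __name__ == "__main__":
    # A SOQL-style query over objects becomes a two-table SQL join:
    print(translate("Department.employees", "name"))
    # SELECT employee.name FROM department JOIN employee
    #   ON department.id = employee.dept_id
```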

    Enriched biodiversity data as a resource and service

    Background: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open-source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: the Biodiversity Data Enrichment Hackathon.

    Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e., scientific professionals who gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists. One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain.

    Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints on collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.

    Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review

    Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional knowledge organization systems (KOS), including thesauri, classification schemes, name authorities, and lists of codes and terms produced before the arrival of the ontology wave, have made their journeys to join the Semantic Web mainstream. This paper uses "LOD KOS" as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the communities of value vocabulary constructors and providers, nor to the catalogers and indexers who have a long history of applying the vocabularies to their products. LOD dataset producers and LOD service providers, information architects and interface designers, and researchers in the sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases (experimental or in real applications) and aims to identify the uses of LOD KOS in order to share practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on LOD dataset producers, vocabulary producers, and researchers (as end users of KOS).

    Comment: 31 pages, 12 figures; accepted paper in the International Journal on Digital Libraries.
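    As a concrete illustration of what consuming a LOD KOS can look like (a minimal sketch, not drawn from the paper; the vocabulary file path is a placeholder assumption), a SKOS vocabulary can be loaded and traversed with the rdflib library:

```python
# Minimal sketch: load a SKOS vocabulary and walk its labels and hierarchy.
# "vocabulary.ttl" is a placeholder; any SKOS file in Turtle syntax works.
from rdflib import Graph
from rdflib.namespace import SKOS

g = Graph()
g.parse("vocabulary.ttl", format="turtle")

# Every concept with its preferred label
for concept, label in g.subject_objects(SKOS.prefLabel):
    print(concept, "->", label)

# Broader/narrower hierarchy, the backbone of thesauri and classifications
for narrower, broader in g.subject_objects(SKOS.broader):
    print(narrower, "is narrower than", broader)
```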

    Are crowdsourced datasets suitable for specialized routing services? Case study of OpenStreetMap for routing of people with limited mobility

    Volunteered Geographic Information (VGI) has become increasingly attractive to both amateur users and professionals, and using data generated by the crowd has become a hot topic in several application domains, including transportation. However, there are concerns regarding the quality of such datasets. We analyze the fitness for use of the OpenStreetMap (OSM) database, one of the most popular crowdsourced mapping platforms, for routing and navigation of people with limited mobility. We assess the completeness of OSM data regarding sidewalk information. Relevant sidewalk attributes such as width, incline, and surface texture are considered, and through both extrinsic and intrinsic quality analysis methods we present the results of fitness for use of OSM data for routing services for people with limited mobility. Based on empirical results, we conclude that OSM data of relatively large spatial extents inside all studied cities could be an acceptable region of interest for testing and evaluating wheelchair routing and navigation services, as long as other data quality parameters, such as positional accuracy and logical consistency, are checked and proved acceptable. We present an extended version of the OSMatrix web service and explore how it can be employed to perform spatial and temporal analysis of sidewalk data completeness in OSM. The tool is beneficial for piloting activities, as pilot site planners can query OpenStreetMap and visualize the degree of sidewalk data availability in a region of interest; this allows identifying areas where data are mostly missing and planning data collection events accordingly. Furthermore, empirical results of data completeness for several OSM data indicators and their potential relation to sidewalk data completeness are presented and discussed. The article ends with an outlook on future research in this area.
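    As an illustration of an intrinsic completeness check of the kind described (a hedged sketch, not the article's OSMatrix implementation; the bounding box is a placeholder assumption), one can fetch footways from the public Overpass API and count how many carry the relevant sidewalk attributes:

```python
# Sketch: count footways in a bounding box that carry the sidewalk attributes
# the article deems relevant (width, incline, surface). Bounding box is a
# placeholder; replace with the region of interest.
import requests

OVERPASS = "https://overpass-api.de/api/interpreter"
bbox = "49.40,8.65,49.42,8.70"  # south,west,north,east (illustrative)
query = f"""
[out:json][timeout:25];
way["highway"="footway"]({bbox});
out tags;
"""
ways = requests.post(OVERPASS, data={"data": query}).json()["elements"]

for key in ("width", "incline", "surface"):
    n = sum(1 for w in ways if key in w.get("tags", {}))
    total = max(len(ways), 1)
    print(f"{key}: {n}/{len(ways)} footways tagged ({n / total:.0%})")
```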

    Plasmodium falciparum parasite population structure and gene flow associated with anti-malarial drug resistance in Cambodia

    Background: Western Cambodia is recognized as the epicentre of emergence of Plasmodium falciparum multi-drug resistance. Artemisinin resistance has been observed in this area since 2008–2009, and molecular signatures associated with artemisinin resistance have been characterized in the k13 gene. At present, one of the major threats is the possible worldwide spread of Asian artemisinin-resistant parasites, threatening millions of people and jeopardizing malaria elimination programme efforts. To anticipate the diffusion of artemisinin resistance, identifying the P. falciparum population structure and the gene flow among parasite populations in Cambodia is essential.

    Methods: To this end, a mid-throughput PCR-LDR-FMA approach based on LUMINEX technology was developed to screen for a genetic barcode in 533 blood samples collected in 2010–2011 from 16 health centres in malaria-endemic areas of Cambodia.

    Results: Based on the successful typing of 282 samples, subpopulations were characterized along the borders of the country. Each 11-locus barcode provides evidence supporting an allele distribution gradient related to subpopulations and gene flow. The 11-locus barcode successfully identifies recently emerged parasite subpopulations in western Cambodia that are associated with the C580Y dominant allele for artemisinin resistance in the k13 gene. A subpopulation identified in northern Cambodia was associated with artemisinin resistance (the R539T resistance allele of the k13 gene) and mefloquine resistance.

    Conclusions: The gene flow between these subpopulations might have driven the spread of artemisinin resistance over Cambodia.
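    To illustrate how such barcodes can reveal subpopulations (a toy sketch only; the barcodes below are fabricated placeholders, not data from the study, and the study's actual statistical methods may differ), pairwise Hamming distances between 11-locus barcodes can be clustered hierarchically:

```python
# Toy sketch: cluster 11-locus SNP barcodes by pairwise Hamming distance to
# expose candidate subpopulations. Barcodes are fabricated placeholders.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

barcodes = np.array([
    [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1],  # hypothetical "western" isolate
    [0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0],  # hypothetical "northern" isolate
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
])

dist = pdist(barcodes, metric="hamming")   # fraction of differing loci
tree = linkage(dist, method="average")     # UPGMA-style agglomeration
print(fcluster(tree, t=0.3, criterion="distance"))  # -> [1 1 2 2]
```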

    Impact of the spatial context on human communication activity

    Technology development produces terabytes of data generated by human activity in space and time. This enormous amount of data, often called big data, has become crucial for delivering new insights to decision makers. It contains behavioral information on different types of human activity influenced by many external factors, such as geographic information and weather forecasts. Early recognition and prediction of those human behaviors are of great importance in many societal applications like health-care, risk management and urban planning. In this paper, we identify relevant geographical areas based on their categories of human activity (i.e., working and shopping) derived from geographic information (i.e., OpenStreetMap). We use spectral clustering followed by the k-means clustering algorithm based on a TF/IDF cosine similarity metric. We evaluate the quality of the observed clusters using silhouette coefficients, which are estimated from the similarities of the temporal patterns of mobile communication activity. The area clusters are further used to explain typical or exceptional communication activities. We demonstrate the study using a real dataset containing 1 million Call Detail Records. This type of analysis and its applications are important for analyzing the dependence of human behavior on external factors and for uncovering hidden relationships, unknown correlations, and other useful information that can support decision-making.

    Comment: 12 pages, 11 figures.
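    A rough sketch of the described pipeline follows (the area "documents" are placeholder assumptions; scikit-learn's SpectralClustering runs k-means on the spectral embedding by default, which matches the spectral-then-k-means step):

```python
# Sketch: TF/IDF vectors of area activity categories, cosine-similarity
# spectral clustering, and silhouette scoring. Area contents are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

# Each "document" lists activity categories tagged in one area (illustrative).
areas = ["office bank office", "mall shop shop cafe",
         "office office cafe", "shop mall shop"]

tfidf = TfidfVectorizer().fit_transform(areas)
sim = cosine_similarity(tfidf)  # affinity matrix from TF/IDF cosine similarity

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(sim)
print(labels)
print("silhouette:", silhouette_score(tfidf, labels, metric="cosine"))
```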

    Using Visualization to Support Data Mining of Large Existing Databases

    In this paper, we present ideas on how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process, but also for evaluating query results and, thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to their relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback through the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and can therefore more easily understand the overall result. To support complex queries, we introduce the notion of approximate joins, which allow the user to find data items that only approximately fulfill join conditions. We also present ideas on how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems caused by interfacing to existing database systems and present ideas for solving them using data structures that support a multidimensional search of the database.
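    A toy sketch of the pixel-per-item idea follows (synthetic relevance scores; note that VisDB arranges pixels in a spiral around the display centre, whereas this simplified version uses a plain row-major grid):

```python
# Toy sketch: one pixel per data item, colored by the item's relevance to the
# query, so the analyst sees the result set's size and quality at a glance.
# Scores are synthetic; layout is simplified row-major, not VisDB's spiral.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
relevance = rng.random(10_000)  # one relevance score per data item (synthetic)

side = int(np.sqrt(relevance.size))
grid = np.sort(relevance)[::-1][: side * side].reshape(side, side)

plt.imshow(grid, cmap="viridis")  # color encodes relevance to the query
plt.title("One pixel per data item, ordered by query relevance")
plt.colorbar(label="relevance")
plt.show()
```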

    Constrained tGAP for generalisation between scales: the case of Dutch topographic data

    This article presents the results of integrating large- and medium-scale data into a unified data structure. This structure can be used as a single non-redundant representation of the input data, which can be queried at any arbitrary scale between the source scales. The solution is based on the constrained topological Generalized Area Partition (tGAP), which stores the results of a generalization process applied to the large-scale dataset, controlled by the objects of the medium-scale dataset, which act as constraints on the large-scale objects. The result contains the accurate geometry of the large-scale objects enriched with the generalization knowledge of the medium-scale data, stored as references in the constrained tGAP structure. The advantage of this constrained approach over the original tGAP is the higher quality of the aggregated maps. The idea was implemented with real topographic datasets from the Netherlands for the large-scale (1:1,000) and medium-scale (1:10,000) data. The approach is expected to be equally valid for any categorical map and for other scales as well.
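    A highly simplified sketch of the constrained-merge idea follows (the faces, areas, and adjacency below are illustrative assumptions; the real tGAP operates on a topological partition and stores the full merge hierarchy): the least important face is repeatedly merged into a neighbour, but only when both lie inside the same medium-scale constraint object, so aggregation never crosses a constraint boundary.

```python
# Simplified sketch of constraint-bounded aggregation: merge the smallest
# face into its largest neighbour within the same constraint region.
# Face ids, areas, and adjacency are illustrative placeholders.

faces = {  # face id -> (area, constraint region id)
    "a": (10, "R1"), "b": (4, "R1"), "c": (7, "R2"), "d": (3, "R2"),
}
neighbours = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}

def merge_step(faces, neighbours):
    """Merge the least important (smallest) face into its largest
    neighbour in the same constraint region; stop at boundaries."""
    small = min(faces, key=lambda f: faces[f][0])
    same = [n for n in neighbours[small] if faces[n][1] == faces[small][1]]
    if not same:
        return None  # constraint boundary reached; face must survive
    host = max(same, key=lambda f: faces[f][0])
    faces[host] = (faces[host][0] + faces[small][0], faces[host][1])
    del faces[small]
    for n in neighbours.pop(small):  # re-route adjacency to the host
        neighbours[n].discard(small)
        if n != host:
            neighbours[n].add(host)
            neighbours[host].add(n)
    neighbours[host].discard(host)
    return small, host

while (step := merge_step(faces, neighbours)):
    print(f"merged {step[0]} into {step[1]} -> {faces}")
# Aggregation halts with one face per constraint region: R1 and R2 never mix.
```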