
    Object-oriented querying of existing relational databases

    In this paper, we present algorithms that allow object-oriented querying of existing relational databases. Our goal is to provide a query interface for relational systems with better query facilities than SQL. This is important because relational systems are the most commonly used systems in real-world applications, and their dominance will remain in the near future. To overcome the drawbacks of relational systems, especially the poor query facilities of SQL, we propose a schema transformation algorithm and a query translation algorithm. The schema transformation algorithm uses additional semantic information to enhance the relational schema and transform it into a corresponding object-oriented schema. If the additional semantic information can be deduced from an underlying entity-relationship design schema, the schema transformation can be performed fully automatically. To query the resulting object-oriented schema, we use the Structured Object Query Language (SOQL), which provides declarative query facilities on objects. SOQL queries over the created object-oriented schema are much shorter, easier to write and understand, and more intuitive than the corresponding SQL queries, which improves usability and makes the database easier to query. The query translation algorithm automatically translates SOQL queries into equivalent SQL queries for the original relational schema.
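
To illustrate the kind of rewriting a query translation step performs, the following Python sketch expands a dotted path over an object-oriented schema into an explicit SQL join chain. It is a minimal sketch under stated assumptions, not the paper's algorithm, and it does not use SOQL syntax: the schema, the `Reference` type, the `translate_path` function and the table and column names (`Employee`, `Department`, `dept_id`, `manager_id`) are hypothetical, and every reference attribute is assumed to be backed by a single-column foreign key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A reference attribute in the object schema, backed by a foreign key (assumption)."""
    target_class: str  # class (table) the reference points to
    fk_column: str     # foreign-key column in the source table
    pk_column: str     # primary-key column in the target table


# Hypothetical object schema: class name -> {reference attribute -> Reference}
SCHEMA = {
    "Employee": {"department": Reference("Department", "dept_id", "id")},
    "Department": {"manager": Reference("Employee", "manager_id", "id")},
}


def translate_path(root_class: str, path: str) -> str:
    """Translate a dotted path such as 'department.manager.name' into SQL joins."""
    *refs, attribute = path.split(".")
    alias = "t0"
    from_clause = f"{root_class} {alias}"
    current_class = root_class
    # Each reference hop in the path becomes one JOIN on its foreign key.
    for i, ref_name in enumerate(refs, start=1):
        ref = SCHEMA[current_class][ref_name]
        next_alias = f"t{i}"
        from_clause += (
            f" JOIN {ref.target_class} {next_alias}"
            f" ON {alias}.{ref.fk_column} = {next_alias}.{ref.pk_column}"
        )
        alias, current_class = next_alias, ref.target_class
    # The final path segment is a plain attribute of the last class reached.
    return f"SELECT {alias}.{attribute} FROM {from_clause}"
```

For example, `translate_path("Employee", "department.manager.name")` returns `SELECT t2.name FROM Employee t0 JOIN Department t1 ON t0.dept_id = t1.id JOIN Employee t2 ON t1.manager_id = t2.id`: a one-line path expression expands into two explicit joins, which is the kind of verbosity an object-oriented query interface is meant to hide.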

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges given the variety of application areas and domains that this technology promises to serve. Fundamental decisions in big data systems design typically include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies to optimize solutions to specific real-world problems, big data systems are no exception. The primary facet of storage in any big data system is the storage infrastructure, and NoSQL appears to be the right technology to fulfill its requirements. However, every big data application has different data characteristics, so its data fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models: document-oriented, key-value, graph and wide-column. It also provides a feature analysis of 80 NoSQL solutions, elaborating on the criteria a developer must consider when choosing among them. Big data storage typically needs to communicate with the execution engine and other processing and visualization technologies to form a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of the big data file formats available for Hadoop, which is the foundation of most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.

    Using ontologies to build semantic queries in relational databases

    Today, the Web is the largest existing information repository. However, using its information still requires human involvement; the Semantic Web aims to change this by representing information in a form suitable for machine processing. It provides a common framework that allows data to be shared and reused across applications, enabling more uses than the traditional Web. Most of the information on the Web is stored in relational databases, which the Semantic Web cannot use directly. However, relational databases can be used to construct ontologies, the core of the Semantic Web. This task has attracted the interest of many researchers, who have proposed algorithms (wrappers) and corresponding software that extract structured syntactic information automatically or semi-automatically. In this work, we draw on those efforts: we survey existing solutions and show different approaches to formalizing the logical model of a relational database and transforming that model into OWL, a Semantic Web language. We close the paper by identifying problems that existing database-to-ontology mapping solutions have only lightly touched on, as well as aspects that future approaches need to consider.
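
One common family of database-to-ontology approaches maps tables to OWL classes, plain columns to datatype properties and foreign-key columns to object properties. The Python sketch below emits such an ontology as Turtle text; it is a simplified illustration under stated assumptions, not any specific surveyed tool. The `Table` type, the `to_owl_turtle` function, the `Table_column` property naming scheme and the `http://example.org/onto#` base IRI are hypothetical, and composite keys, join tables for many-to-many relationships and XSD datatype ranges are deliberately ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Minimal relational table description (hypothetical input format)."""
    name: str
    columns: list[str]
    # Foreign-key column -> name of the referenced table.
    foreign_keys: dict[str, str] = field(default_factory=dict)


def to_owl_turtle(tables: list[Table], base: str = "http://example.org/onto#") -> str:
    """Map tables to owl:Class, plain columns to owl:DatatypeProperty and
    foreign-key columns to owl:ObjectProperty, serialized as Turtle."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for table in tables:
        lines.append(f":{table.name} a owl:Class .")
        for column in table.columns:
            prop = f":{table.name}_{column}"  # hypothetical naming scheme
            if column in table.foreign_keys:
                target = table.foreign_keys[column]
                lines.append(
                    f"{prop} a owl:ObjectProperty ; "
                    f"rdfs:domain :{table.name} ; rdfs:range :{target} ."
                )
            else:
                lines.append(f"{prop} a owl:DatatypeProperty ; rdfs:domain :{table.name} .")
    return "\n".join(lines)
```

Mapping an `Employee` table whose `dept_id` column references `Department` yields `:Employee_dept_id a owl:ObjectProperty` with domain `:Employee` and range `:Department`. Rules like these recover only the structure that is explicit in the relational schema; semantics that the schema leaves implicit still need additional input.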

    Rule-based information integration

    In this report, we describe the process of information integration, with particular attention to the language used for integration. We show that integration consists of two phases: the schema mapping phase and the data integration phase. We formally define transformation rules, conversion, evolution and versioning, and we further discuss the integration process from a data point of view.
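
The two phases can be sketched in Python as follows, assuming that schema mapping produces a set of per-source transformation rules (each possibly carrying a conversion) and that data integration applies those rules record by record. The `Rule` type, the `MAPPINGS` table, the `integrate` function and the two example sources are hypothetical; the report's own integration language is not reproduced here, and evolution and versioning are not modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


def identity(value: Any) -> Any:
    """Default conversion: pass the value through unchanged."""
    return value


@dataclass(frozen=True)
class Rule:
    """Transformation rule: source attribute -> target attribute, plus a conversion."""
    source_field: str
    target_field: str
    convert: Callable[[Any], Any] = identity


def fahrenheit_to_celsius(f: float) -> float:
    return round((f - 32) * 5 / 9, 1)


# Phase 1, schema mapping (hypothetical): rules mapping each source onto the global schema.
MAPPINGS: Dict[str, List[Rule]] = {
    "source_a": [Rule("fname", "name"), Rule("temp_f", "temp_c", fahrenheit_to_celsius)],
    "source_b": [Rule("name", "name"), Rule("temp_c", "temp_c")],
}


def integrate(source: str, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Phase 2, data integration: apply the source's rules to every record."""
    rules = MAPPINGS[source]
    return [{r.target_field: r.convert(rec[r.source_field]) for r in rules} for rec in records]
```

For instance, `integrate("source_a", [{"fname": "Oslo", "temp_f": 212.0}])` returns `[{"name": "Oslo", "temp_c": 100.0}]`, so records from both sources end up in the same target shape. Keeping the rules as data rather than code means a mapping can change without touching the integration loop.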

    NoXperanto: Crowdsourced Polyglot Persistence

    This paper proposes NoXperanto, a novel crowdsourcing approach to querying data collections managed in polyglot persistence settings. The main contribution of NoXperanto is its ability to answer complex queries involving different data stores by exploiting queries from expert users (i.e. a crowd of database administrators, data engineers, domain experts, etc.), assuming that these users submit meaningful queries. NoXperanto exploits the results of these meaningful queries to facilitate subsequent query answering. In particular, query results are used to (i) help non-expert users work with the multi-database environment and (ii) improve the performance of the multi-database environment, which not only uses disk and memory resources but also relies heavily on network bandwidth. NoXperanto employs a layer that keeps track of the information produced by the crowd, modeled as a Property Graph and managed in a Graph Database Management System (GDBMS).

    A Molecular Biology Database Digest

    Computational Biology, or Bioinformatics, has been defined as the application of mathematical and Computer Science methods to solving problems in Molecular Biology that require large-scale data, computation, and analysis [18]. As expected, Molecular Biology databases play an essential role in Computational Biology research and development. This paper introduces current Molecular Biology databases, stressing data modeling, data acquisition, data retrieval, and the integration of Molecular Biology data from different sources. It is primarily intended for computer scientists with a limited background in Biology.

    Polyflow: a Polystore-compliant mechanism to provide interoperability to heterogeneous provenance graphs

    Many scientific experiments are modeled as workflows, which usually output massive amounts of data. To guarantee reproducibility, workflows are usually orchestrated by Workflow Management Systems (WfMSs) that capture provenance data. Provenance represents the lineage of a data fragment throughout its transformations by the activities of a workflow, and provenance traces are usually represented as graphs. These graphs allow scientists to analyze and evaluate the results produced by a workflow. However, each WfMS uses a proprietary provenance format and records provenance at different levels of granularity. Therefore, in more complex scenarios where a scientist needs to interpret provenance graphs generated by multiple WfMSs and workflows, a challenge arises. To first understand the research landscape, we conduct a Systematic Literature Mapping, assessing existing solutions under several different lenses. With a clearer understanding of the state of the art, we propose a tool called Polyflow, which is based on the concept of Polystore systems and integrates several databases of heterogeneous origin through a single query interface that adopts ProvONE as the global schema. Polyflow allows scientists to query multiple provenance graphs in an integrated way. Polyflow was evaluated by experts using provenance data collected from real workflows that generate phylogenetic trees. The results suggest that Polyflow is a viable solution for interoperating heterogeneous provenance data generated by different WfMSs, from both a usability and a performance standpoint.
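
The integration idea can be illustrated with a small Python sketch: two adapters translate hypothetical provenance exports from different WfMSs into a shared edge vocabulary, and a lineage query then runs over the union. The relation names `used` and `wasGeneratedBy` come from W3C PROV, which ProvONE extends, but everything else (the export formats, `from_wfms_a`, `from_wfms_b`, `lineage`) is hypothetical and far simpler than ProvONE or Polyflow's actual Polystore architecture. File and activity identifiers are assumed to be globally unique across systems.

```python
from collections import defaultdict

# Relation names borrowed from W3C PROV, which ProvONE extends.
USED = "used"                        # (activity, used, entity)
WAS_GENERATED_BY = "wasGeneratedBy"  # (entity, wasGeneratedBy, activity)


def from_wfms_a(steps):
    """Adapter for a hypothetical format A: one dict per task with inputs/outputs."""
    edges = []
    for step in steps:
        edges += [(step["task"], USED, f) for f in step["inputs"]]
        edges += [(f, WAS_GENERATED_BY, step["task"]) for f in step["outputs"]]
    return edges


def from_wfms_b(rows):
    """Adapter for a hypothetical format B: flat (file, direction, activity) rows."""
    edges = []
    for file, direction, activity in rows:
        if direction == "in":
            edges.append((activity, USED, file))
        else:
            edges.append((file, WAS_GENERATED_BY, activity))
    return edges


def lineage(edges, entity):
    """Return every entity that transitively contributed to `entity`."""
    used = defaultdict(set)
    generated_by = defaultdict(set)
    for subject, relation, obj in edges:
        if relation == USED:
            used[subject].add(obj)
        elif relation == WAS_GENERATED_BY:
            generated_by[subject].add(obj)
    ancestors, stack = set(), [entity]
    while stack:
        current = stack.pop()
        for activity in generated_by[current]:
            for source in used[activity]:
                if source not in ancestors:  # also guards against cycles
                    ancestors.add(source)
                    stack.append(source)
    return ancestors
```

With WfMS A recording an `align` step from `seqs.fasta` to `aln.phy` and WfMS B recording a `build_tree` step from `aln.phy` to `tree.nwk`, `lineage(edges, "tree.nwk")` returns `{"aln.phy", "seqs.fasta"}`, a lineage that neither system could report on its own.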