Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

Author: Alam Mansaf
Ali Syed Arshad
Khan Samiya
Liu Xiufeng
Publication date: 01/01/2019

Big data systems development is full of challenges, given the variety of application areas and domains that this technology promises to serve. Typically, the fundamental design decisions in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies into an optimized solution for a specific real-world problem, big data systems are no exception. As far as the storage aspect of a big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology to fulfill its requirements. However, every big data application has variable data characteristics, and thus the corresponding data fits into a different data model. This paper presents a feature and use case analysis and comparison of the four main data models, namely document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings, and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.

arXiv.org e-Print Archive

Online Research Database In Technology

requirements and use cases

Author: Coskun Gökhan
Heese Ralf
Luczak-Rösch Markus
Oldakowski Radoslaw
Schäfermeier Ralph
Streibel Olga
Publication date: 01/01/2008

In this report, we introduce our initial vision of the Corporate Semantic Web as the next step in the broad field of Semantic Web research. We identify requirements of the corporate environment and gaps between current approaches to tackling the problems facing ontology engineering, semantic collaboration, and semantic search. Each of these pillars will yield innovative methods and tools during the project runtime until 2013. Corporate ontology engineering will facilitate agile ontology engineering to lessen the costs of ontology development and, especially, maintenance. Corporate semantic collaboration focuses on the human-centered aspects of knowledge management in corporate contexts. Corporate semantic search sits at the highest application level of the three research areas and, as such, is representative of applications that work on and with appropriately represented and delivered background knowledge. We propose an initial layout for an integrative architecture of a Corporate Semantic Web built on these three core pillars.

Institutional Repository of the Freie Universität Berlin

Publication venue: Published by Elsevier B.V.

Elsevier - Publisher Connector

Web technologies for environmental big data

Author: Buytaert Wouter
Elkhatib Yehia
Macleod Christopher J.A.
Reusser Dominik
Vitolo Claudia
Publication venue: Elsevier BV
Publication date: 01/01/2014

Recent evolutions in computing science and web technology provide the environmental community with continuously expanding resources for data collection and analysis that pose unprecedented challenges to the design of analysis methods, workflows, and interaction with data sets. In the light of the recent UK Research Council funded Environmental Virtual Observatory pilot project, this paper gives an overview of currently available implementations related to web-based technologies for processing large and heterogeneous datasets and discusses their relevance within the context of environmental data processing, simulation and prediction. We found that the processing of the simple datasets used in the pilot proved to be relatively straightforward using a combination of R, RPy2, PyWPS and PostgreSQL. However, the use of NoSQL databases and more versatile frameworks such as OGC standard based implementations may provide a wider and more flexible set of features that particularly facilitate working with larger volumes and more heterogeneous data sources.

Elsevier - Publisher Connector

Repositorium für Naturwissenschaften und Technik

Lancaster E-Prints

Web technologies for environmental Big Data

Author: Vitolo C
Publication venue: Elsevier BV
Publication date: 31/10/2014

Spiral - Imperial College Digital Repository

The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment

Author: Haendel Melissa A
Payne Philip R O
et al.
Publication venue: Digital Commons@Becker
Publication date: 01/03/2021

OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19

Digital Commons@Becker

The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment.

Author: Amor Benjamin
Austin Christopher P
Bennett Tellen D
Blacketer Clair
Bradford Robert L
Chute Christopher G
Cimino James J
Clark Marshall
Colmenares Evan W
Eichmann David A
Francis Patricia A
Gabriel Davera
Gersing Ken R
Girvin Andrew T
Graves Alexis
Guinney Justin
Haendel Melissa A
Hemadri Raju
Hong Stephanie S
Hripcsak George
Jiao Dazhi
Kibbe Warren A
Klann Jeffrey G
Kostka Kristin
Kurilla Michael G
Lee Adam M
Lehmann Harold P
Lingrey Lora
Manna Amin
Michael Sam G
Miller Robert T
Morris Michele
Murphy Shawn N
Natarajan Karthik
Palchuk Matvey B
Payne Philip R O
Pfaff Emily R
Portilla Lili M
Qureshi Nabeel
Robinson Peter N
Rutter Joni L
Saltz Joel H
Sheikh Usman
Solbrig Harold
Spratt Heidi
Suver Christine
Visweswaran Shyam
Walden Anita
Walters Kellie M
Weber Griffin M
Wilbanks John
Wilcox Adam B
Williams Andrew E
Wu Chunlei
Zhang Xiaohan Tanner
Zhu Richard L
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/03/2021

The Jackson Laboratory: The Mouseion at the JAXlibrary

Towards a unified methodology for supporting the integration of data sources for use in web applications

Author: Nunn Jeremy
Publication date: 01/01/2016

Organisations are making increasing use of web applications and web-based systems as an integral part of providing services. Examples include personalised dynamic user content on a website, social media plug-ins or web-based mapping tools. For these types of applications to be fully functional and of maximum use to the user, they require the integration of data from multiple sources. The focus of this thesis is on improving this integration process for web applications with multiple sources of data. Integration of data from multiple sources is problematic for many reasons. Current integration methods tend to be domain specific and application specific. They are often complex, have compatibility issues with different technologies, lack maturity, are difficult to re-use, and do not accommodate new and emerging models and integration technologies. Technologies to achieve integration, such as brokers and translators, do exist, but because of their domain specificity they cannot be used as a generic solution for achieving the integration outcomes required for successful web application development. Because of these difficulties with integration, and the wide variety of integration approaches, there is a need to assist the developer in selecting the integration approach most appropriate to their needs. This thesis proposes GIWeb, a unified top-down data integration methodology instantiated with a framework that will aid developers in their integration process. It will act as a conceptual structure to support the chosen technical approach. The framework will assist in the integration of data sources to support web application builders. The thesis presents the rationale for the need for the framework based on an examination of the range of applications, associated data sources and the range of potential solutions. The framework is evaluated using four case studies.

Research Repository