45,988 research outputs found
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
The Hierarchic treatment of marine ecological information from spatial networks of benthic platforms
Measuring biodiversity simultaneously in different locations, at different temporal scales, and over wide spatial scales is of strategic importance for the improvement of our understanding of the functioning of marine ecosystems and for the conservation of their biodiversity. Monitoring networks of cabled observatories, along with other docked autonomous systems (e.g., Remotely Operated Vehicles [ROVs], Autonomous Underwater Vehicles [AUVs], and crawlers), are being conceived and established at a spatial scale capable of tracking energy fluxes across benthic and pelagic compartments, as well as across geographic ecotones. At the same time, optoacoustic imaging is sustaining an unprecedented expansion in marine ecological monitoring, enabling the acquisition of new biological and environmental data at an appropriate spatiotemporal scale. At this stage, one of the main problems for an effective application of these technologies is the processing, storage, and treatment of the acquired complex ecological information. Here, we provide a conceptual overview on the technological developments in the multiparametric generation, storage, and automated hierarchic treatment of biological and environmental information required to capture the spatiotemporal complexity of a marine ecosystem. In doing so, we present a pipeline of ecological data acquisition and processing in different steps and prone to automation. We also give an example of population biomass, community richness and biodiversity data computation (as indicators for ecosystem functionality) with an Internet Operated Vehicle (a mobile crawler). Finally, we discuss the software requirements for that automated data processing at the level of cyber-infrastructures with sensor calibration and control, data banking, and ingestion into large data portals.Peer ReviewedPostprint (published version
Decoding Across the Quantum LDPC Code Landscape
We show that belief propagation combined with ordered statistics
post-processing is a general decoder for quantum low density parity check codes
constructed from the hypergraph product. To this end, we run numerical
simulations of the decoder applied to three families of hypergraph product
code: topological codes, fixed-rate random codes and a new class of codes that
we call semi-topological codes. Our new code families share properties of both
topological and random hypergraph product codes, with a construction that
allows for a finely-controlled trade-off between code threshold and stabilizer
locality. Our results indicate thresholds across all three families of
hypergraph product code, and provide evidence of exponential suppression in the
low error regime. For the Toric code, we observe a threshold in the range
. This result improves upon previous quantum decoders based on
belief propagation, and approaches the performance of the minimum weight
perfect matching algorithm. We expect semi-topological codes to have the same
threshold as Toric codes, as they are identical in the bulk, and we present
numerical evidence supporting this observation.Comment: The code for the BP+OSD decoder used in this work can be found on
Github: https://github.com/quantumgizmos/bp_os
Tea: A High-level Language and Runtime System for Automating Statistical Analysis
Though statistical analyses are centered on research questions and
hypotheses, current statistical analysis tools are not. Users must first
translate their hypotheses into specific statistical tests and then perform API
calls with functions and parameters. To do so accurately requires that users
have statistical expertise. To lower this barrier to valid, replicable
statistical analysis, we introduce Tea, a high-level declarative language and
runtime system. In Tea, users express their study design, any parametric
assumptions, and their hypotheses. Tea compiles these high-level specifications
into a constraint satisfaction problem that determines the set of valid
statistical tests, and then executes them to test the hypothesis. We evaluate
Tea using a suite of statistical analyses drawn from popular tutorials. We show
that Tea generally matches the choices of experts while automatically switching
to non-parametric tests when parametric assumptions are not met. We simulate
the effect of mistakes made by non-expert users and show that Tea automatically
avoids both false negatives and false positives that could be produced by the
application of incorrect statistical tests.Comment: 11 page
- …