BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking
Data generation is a key issue in big data benchmarking that aims to generate
application-specific data sets to meet the 4V requirements of big data.
Specifically, big data generators need to generate scalable data (Volume) of
different types (Variety) under controllable generation rates (Velocity) while
keeping the important characteristics of raw data (Veracity). This gives rise
to various new challenges about how we design generators efficiently and
successfully. To date, most existing techniques can only generate limited types
of data and support specific big data systems such as Hadoop. Hence we develop
a tool, called Big Data Generator Suite (BDGS), to efficiently generate
scalable big data while employing data models derived from real data to
preserve data veracity. The effectiveness of BDGS is demonstrated by developing
six data generators covering three representative data types (structured,
semi-structured and unstructured) and three data sources (text, graph, and
table data).
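The abstract names three controls a generator must expose: model fidelity to raw data (Veracity), output size (Volume), and generation rate (Velocity). The sketch below shows how those controls can fit together for one data type, text. It assumes a simple unigram word-frequency model as a stand-in for whatever model BDGS actually derives from real data. The function names `fit_unigram_model`, `generate_documents` and `throttle` are hypothetical and are not part of the BDGS tool.

```python
import random
import time
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical sketch: NOT the BDGS implementation. A unigram model stands in
# for the data model BDGS derives from real data.

def fit_unigram_model(seed_text: str) -> tuple[list[str], list[int]]:
    """Derive word frequencies from a sample of real text (a stand-in for Veracity)."""
    counts = Counter(seed_text.lower().split())
    words = sorted(counts)
    return words, [counts[w] for w in words]

def generate_documents(model: tuple[list[str], list[int]], num_docs: int,
                       words_per_doc: int, seed: int = 0) -> Iterator[str]:
    """Lazily yield synthetic documents; num_docs scales the Volume."""
    words, weights = model
    rng = random.Random(seed)  # seeded so runs are reproducible
    for _ in range(num_docs):
        yield " ".join(rng.choices(words, weights=weights, k=words_per_doc))

def throttle(stream: Iterable, docs_per_second: float) -> Iterator:
    """Emit items no faster than docs_per_second (a controllable Velocity)."""
    interval = 1.0 / docs_per_second
    next_time = time.monotonic()
    for item in stream:
        now = time.monotonic()
        if now < next_time:
            time.sleep(next_time - now)  # wait until the next slot opens
        yield item
        next_time = max(next_time, now) + interval
```

A typical use is `throttle(generate_documents(model, 10_000, 50), docs_per_second=200)`. Keeping generation lazy means Volume can grow without holding the whole data set in memory. It also lets the rate limit sit in a separate wrapper that works for any data type.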
Improving Data Quality by Leveraging Statistical Relational Learning
Digitally collected data suffers from many data quality issues, such as duplicate, incorrect, or incomplete data. A common
approach for counteracting these issues is to formulate a set of data cleaning rules to identify and repair incorrect, duplicate and
missing data. Data cleaning systems must be able to treat data quality rules holistically, to incorporate heterogeneous constraints
within a single routine, and to automate data curation. We propose an approach to data cleaning based on statistical relational
learning (SRL). We argue that one formalism, Markov logic, is a natural fit for modeling data quality rules. Our approach
allows probabilistic joint inference over interleaved data cleaning rules to improve data quality. Furthermore, it
eliminates the need to specify the order of rule execution. We describe how data quality rules expressed as formulas in first-order
logic translate directly into the predictive model of our SRL framework.
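In Markov logic, each rule becomes a weighted formula, and a candidate world x is scored as P(x) ∝ exp(Σᵢ wᵢ nᵢ(x)), where nᵢ(x) counts the satisfied groundings of rule i. The tiny sketch below shows why rule order stops mattering: every ground rule contributes to one joint score. It assumes two hypothetical soft rules, "records with the same zip code share a city" and "the true city matches the observed city". The weights are made up, and `map_repair` is an illustrative name, not the authors' framework. The sketch enumerates all worlds by brute force, which only works for toy inputs. Practical Markov logic systems use approximate inference instead.

```python
import itertools
import math

# Hypothetical sketch: brute-force MAP inference over a tiny ground Markov
# network. The rules and weights are illustrative assumptions.

def map_repair(records, w_fd=2.0, w_obs=1.0):
    """Return the most probable repaired city per record and its probability.

    Hidden variables: the true city of each record.
    Soft rule 1 (weight w_fd): records sharing a zip code share a city.
    Soft rule 2 (weight w_obs): the true city equals the observed city.
    """
    n = len(records)
    domain = sorted({r["city"] for r in records})
    same_zip = [(i, j) for i in range(n) for j in range(i + 1, n)
                if records[i]["zip"] == records[j]["zip"]]

    def score(world):
        # Sum of weights of satisfied ground formulas: sum_i w_i * n_i(world).
        n_fd = sum(world[i] == world[j] for i, j in same_zip)
        n_obs = sum(world[i] == records[i]["city"] for i in range(n))
        return w_fd * n_fd + w_obs * n_obs

    worlds = list(itertools.product(domain, repeat=n))
    z = sum(math.exp(score(w)) for w in worlds)  # partition function Z
    best = max(worlds, key=score)
    return list(best), math.exp(score(best)) / z
```

Take three records with zip `"10001"` and cities `"New York"`, `"New York"` and `"Nwe York"`. The highest-scoring world repairs the typo to `"New York"`. Both rules are weighed jointly, so no execution order between them is ever specified.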
Harvesting Multiqubit Entanglement from Ultrastrong Interactions in Circuit Quantum Electrodynamics
We analyze a multi-qubit circuit QED system in the regime where the qubit-photon coupling dominates over the system’s bare energy scales. Under such conditions, a manifold of low-energy states with a high degree of entanglement emerges. Here we describe a time-dependent protocol for extracting these quantum correlations and converting them into well-defined multi-partite entangled states of non-interacting qubits. Based on a combination of various ultrastrong-coupling effects, the protocol can be operated in a fast and robust manner while still being consistent with experimental constraints on switching times and typical energy scales encountered in superconducting circuits. Therefore, our scheme can serve as a probe for otherwise inaccessible correlations in strongly-coupled circuit QED systems. It also shows how such correlations can potentially be exploited as a resource for entanglement-based applications.
The impact of columnar file formats on SQL‐on‐hadoop engine performance: A study on ORC and Parquet
Concurrent Computing with Shared Replicated Memory
The behavioural theory of concurrent systems states that any concurrent
system can be captured by a behaviourally equivalent concurrent Abstract State
Machine (cASM). While the theory in general assumes shared locations, it
remains valid if different agents can only interact via messages, i.e. sharing
is restricted to mailboxes. There may even be a strict separation between
memory managing agents and other agents that can only access the shared memory
by sending query and update requests to the memory agents. This article is
dedicated to an investigation of replicated data that is maintained by a memory
management subsystem, whereas the replication appears neither in the requests
nor in the corresponding answers. We show how the behaviour of a concurrent
system with such a memory management can be specified using concurrent
communicating ASMs. We provide several refinements of a high-level ground model
addressing different replication policies and internal messaging between data
centres. For all these refinements we analyse their effects on the runs such
that decisions concerning the degree of consistency can be consciously made. Comment: 23 pages
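The central idea here is a memory management subsystem that hides replication from the agents that use it. Those agents send only query and update requests, and no request or answer mentions replicas. The sketch below illustrates that idea. It is plain Python, not a concurrent ASM specification. It uses one assumed replication policy, read/write quorums with version numbers, which is not taken from the article. The class `QuorumMemory` and its `start` parameter, which simulates which data centres a request reaches, are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: one possible replication policy (quorums), not the
# article's ASM refinements. Replica updates stand in for internal messages.

@dataclass
class QuorumMemory:
    """Memory agent that hides replication from requesting agents.

    Clients only send update(key, value) and query(key); which replicas
    (data centres) are touched, and how many, stays internal.
    """
    n_replicas: int = 3
    write_quorum: int = 2   # W: replicas that must apply an update
    read_quorum: int = 2    # R: replicas consulted per query
    replicas: list = field(default_factory=list, init=False)
    clock: int = field(default=0, init=False)

    def __post_init__(self):
        # R + W > N guarantees every read quorum overlaps the latest write quorum.
        if self.read_quorum + self.write_quorum <= self.n_replicas:
            raise ValueError("read_quorum + write_quorum must exceed n_replicas")
        self.replicas = [{} for _ in range(self.n_replicas)]

    def update(self, key, value, start=0):
        """Apply a write at W consecutive replicas, beginning at `start`."""
        self.clock += 1  # single global version counter (a simplification)
        for k in range(self.write_quorum):
            self.replicas[(start + k) % self.n_replicas][key] = (self.clock, value)
        return "ack"

    def query(self, key, start=0):
        """Read R replicas beginning at `start` and return the newest value."""
        answers = [self.replicas[(start + k) % self.n_replicas].get(key)
                   for k in range(self.read_quorum)]
        answers = [a for a in answers if a is not None]
        if not answers:
            return None
        return max(answers, key=lambda a: a[0])[1]
```

Changing `write_quorum` and `read_quorum` trades write cost against read cost. Dropping the R + W > N check would permit stale reads. This is the kind of consistency decision the refinements are meant to make explicit.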
Dynamical Stability and Habitability of Gamma Cephei Binary-Planetary System
It has been suggested that the long-lived residual radial velocity variations
observed in the precision radial velocity measurements of the primary of Gamma
Cephei (HR8974, HD222404, HIP116727) are likely due to a Jupiter-like planet
around this star (Hatzes et al., 2003). In this paper, the orbital dynamics of
this planet are studied, and the possibility of the existence of a
hypothetical Earth-like planet in the habitable zone of its central star is
discussed. Simulations, which have been carried out for different values of the
eccentricity and semimajor axis of the binary, as well as the orbital
inclination of its Jupiter-like planet, expand on previous studies of this
system and indicate that, for the values of the binary eccentricity smaller
than 0.5, and for all values of the orbital inclination of the Jupiter-like
planet ranging from 0 to 40 degrees, the orbit of this planet is stable. For
larger values of the binary eccentricity, the system becomes gradually
unstable. Integrations also indicate that, within this range of orbital
parameters, a hypothetical Earth-like planet can have a long-term stable orbit
only at distances of 0.3 to 0.8 AU from the primary star. The habitable zone of
the primary, at a range of approximately 3.1 to 3.8 AU, is, however, unstable. Comment: 25 pages, 7 figures, 3 tables, submitted for publication
Novel Collective Effects in Integrated Photonics
Superradiance, the enhanced collective emission of energy from a coherent
ensemble of quantum systems, has typically been studied in atomic ensembles. In
this work we study theoretically the enhanced emission of energy from coherent
ensembles of harmonic oscillators. We show that it should be possible to
observe harmonic oscillator superradiance for the first time in waveguide
arrays in integrated photonics. Furthermore, we describe how pairwise
correlations within the ensemble can be measured with this architecture. These
pairwise correlations are an integral part of the phenomenon of superradiance
and have never been observed in experiments to date. Comment: 7 pages, 3 figures
Specific staining of human chromosomes in Chinese hamster x man hybrid cell lines demonstrates interphase chromosome territories
In spite of Carl Rabl's (1885) and Theodor Boveri's (1909) early hypothesis that chromosomes occupy discrete territories or domains within the interphase nucleus, evidence in favor of this hypothesis has so far been limited and indirect in higher plants and animals. The alternative possibility, that the chromatin fiber of a single chromosome might extend throughout the major part of, or even the whole, interphase nucleus, has been considered for many years. In the latter case, chromosomes would exist as discrete chromatin bodies only during mitosis, not during interphase. Both possibilities are compatible with Boveri's well-established paradigm of chromosome individuality. Here we show that an active human X chromosome, contained as the only human chromosome in a Chinese hamster x man hybrid cell line, can be visualized both in metaphase plates and in interphase nuclei after in situ hybridization with either 3H- or biotin-labeled human genomic DNA. We demonstrate that this chromosome is organized as a distinct chromatin body throughout interphase. In addition, evidence for the territorial organization of human chromosomes is also presented for another hybrid cell line containing several autosomes and the human X chromosome. These findings are discussed in the context of our present knowledge of the organization and topography of interphase chromosomes. General applications of a strategy aimed at specific staining of individual chromosomes in experimental and clinical cytogenetics are briefly considered.