VerdictDB: Universalizing Approximate Query Processing
Despite 25 years of research in academia, approximate query processing (AQP)
has had little industrial adoption. One of the major causes of this slow
adoption is the reluctance of traditional vendors to make radical changes to
their legacy codebases, and the preoccupation of newer vendors (e.g.,
SQL-on-Hadoop products) with implementing standard features. Additionally, the
few AQP engines that are available are each tied to a specific platform and
require users to completely abandon their existing databases---an unrealistic
expectation given the infancy of the AQP technology. Therefore, we argue that a
universal solution is needed: a database-agnostic approximation engine that
will widen the reach of this emerging technology across various platforms.
Our proposal, called VerdictDB, uses a middleware architecture that requires
no changes to the backend database, and thus, can work with all off-the-shelf
engines. Operating at the driver level, VerdictDB intercepts analytical queries
issued to the database and rewrites each into a query that, when executed
by any standard relational engine, yields sufficient information for
computing an approximate answer. VerdictDB uses the returned result set to
compute an approximate answer and error estimates, which are then passed on to
the user or application. However, lack of access to the query execution layer
introduces significant challenges in terms of generality, correctness, and
efficiency. This paper shows how VerdictDB overcomes these challenges and
delivers up to 171× speedup (18.45× on average) for a variety of
existing engines, such as Impala, Spark SQL, and Amazon Redshift, while
incurring less than 2.6% relative error. VerdictDB is open-sourced under the
Apache License.
Comment: Extended technical report of the paper that appeared in Proceedings
of the 2018 International Conference on Management of Data, pp. 1461-1476.
ACM, 201
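The driver-level rewriting idea can be illustrated with a deliberately simplified sketch. This is not VerdictDB's actual rewriting logic: the table names, the Bernoulli sampling ratio, and the scaled-sum error estimate below are all hypothetical, chosen only to show how an aggregate query can be answered from a pre-built sample table with an accompanying error estimate.

```python
import random
import sqlite3

SAMPLE_RATIO = 0.1  # fraction of rows kept in the pre-built sample (assumed)

def rewrite_sum(sample_table, column, ratio):
    # Rewrite SUM(column) on the base table into a scaled SUM over the
    # sample, plus the raw moments needed for an error estimate.
    return (f"SELECT SUM({column}) / {ratio}, COUNT(*), "
            f"SUM({column} * {column}) FROM {sample_table}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (price REAL)")
random.seed(0)
rows = [(random.uniform(1.0, 100.0),) for _ in range(10_000)]
conn.executemany("INSERT INTO sales VALUES (?)", rows)

# Pre-built Bernoulli sample: each row is kept with probability SAMPLE_RATIO.
sample = [r for r in rows if random.random() < SAMPLE_RATIO]
conn.execute("CREATE TABLE sales_sample (price REAL)")
conn.executemany("INSERT INTO sales_sample VALUES (?)", sample)

approx, n, sumsq = conn.execute(
    rewrite_sum("sales_sample", "price", SAMPLE_RATIO)).fetchone()
# Variance of the scaled-sum estimator under Bernoulli sampling:
# Var ≈ (1 - p) / p^2 * (sum of squared sampled values).
stderr = ((1 - SAMPLE_RATIO) * sumsq) ** 0.5 / SAMPLE_RATIO
```

On this synthetic table the approximate answer lands within a few standard errors of the exact sum, while scanning only about a tenth of the rows.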
Database Learning: Toward a Database that Becomes Smarter Every Time
In today's databases, previous query answers rarely benefit answering future
queries. For the first time, to the best of our knowledge, we change this
paradigm in an approximate query processing (AQP) context. We make the
following observation: the answer to each query reveals some degree of
knowledge about the answer to another query because their answers stem from the
same underlying distribution that has produced the entire dataset. Exploiting
and refining this knowledge should allow us to answer queries more
analytically, rather than by reading enormous amounts of raw data. Also,
processing more queries should continuously enhance our knowledge of the
underlying distribution, and hence lead to increasingly faster response times
for future queries.
We call this novel idea---learning from past query answers---Database
Learning. We exploit the principle of maximum entropy to produce answers, which
are in expectation guaranteed to be more accurate than existing sample-based
approximations. Empowered by this idea, we build a query engine on top of Spark
SQL, called Verdict. We conduct extensive experiments on real-world query
traces from a large customer of a major database vendor. Our results
demonstrate that Verdict supports 73.7% of these queries, speeding them up by
up to 23.0x for the same accuracy level compared to existing AQP systems.Comment: This manuscript is an extended report of the work published in ACM
SIGMOD conference 201
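The idea of refining knowledge from past answers can be illustrated with a deliberately simplified sketch (not Verdict's actual maximum-entropy machinery). Under a Gaussian model, which is the maximum-entropy distribution for a fixed mean and variance, two unbiased estimates of the same aggregate combine by inverse-variance weighting, and the combined variance is never larger than either input's; the numbers below are hypothetical.

```python
def combine(est_a, var_a, est_b, var_b):
    """Inverse-variance weighting of two unbiased estimates of one value."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    est = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    var = 1.0 / (w_a + w_b)  # shrinks as more answers are folded in
    return est, var

# A past query answered AVG(salary) as 52.0 with variance 4.0; a fresh
# small sample gives 55.0 with variance 9.0 (hypothetical numbers).
est, var = combine(52.0, 4.0, 55.0, 9.0)
```

Each additional answer folded in can only shrink the variance, which mirrors the paper's claim that processing more queries leads to increasingly accurate (and hence faster) answers.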
Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray
The ability to quantitatively measure the expression of all genes in a given tissue or cell with a single assay is an exciting promise of gene-expression profiling technology. An in situ-synthesized 60-mer oligonucleotide microarray designed to detect transcripts from all mouse genes was validated, as well as a set of exogenous RNA controls derived from the yeast genome (made freely available without restriction), which allow quantitative estimation of absolute endogenous transcript abundance
Evaluation of Candidate In-Pile Thermal Conductivity Techniques
Thermophysical properties of materials must be known for the proper design, testing, and application of new fuels and structural materials in nuclear reactors. In the case of nuclear fuels during irradiation, the physical structure and chemical composition change as a function of time and position within the rod. Typically, thermal conductivity and other thermophysical properties evaluated during irradiation in a materials and test reactor are measured out-of-pile in "hot cells." Repeatedly removing samples from a test reactor to make out-of-pile measurements is expensive, has the potential to disturb phenomena of interest, and provides an understanding only of the sample's end state at the time each measurement is made. There are also limited thermophysical property data for advanced fuels. Such data are needed for the development of next-generation reactors and advanced fuels for existing nuclear plants. Having the capacity to characterize fuel and material properties effectively and quickly during irradiation has the potential to improve the fidelity of nuclear fuel data and reduce irradiation testing costs
Homozygosity for a missense mutation in the 67 kDa isoform of glutamate decarboxylase in a family with autosomal recessive spastic cerebral palsy: parallels with Stiff-Person Syndrome and other movement disorders
Background
Cerebral palsy (CP) is a heterogeneous group of neurological disorders of movement and/or posture, with an estimated incidence of 1 in 1000 live births. Non-progressive forms of symmetrical, spastic CP have been identified, which show a Mendelian autosomal recessive pattern of inheritance. We recently described the mapping of a recessive spastic CP locus to a 5 cM chromosomal region located at 2q24-31.1, in rare consanguineous families.
Methods
Here we present data that refine this locus to a 0.5 cM region, flanked by the microsatellite markers D2S2345 and D2S326. The minimal region contains the candidate gene GAD1, which encodes a glutamate decarboxylase isoform (GAD67), involved in conversion of the amino acid and excitatory neurotransmitter glutamate to the inhibitory neurotransmitter γ-aminobutyric acid (GABA).
Results
A novel amino acid missense mutation in GAD67 was detected, which segregated with CP in affected individuals.
Conclusions
This result is interesting because auto-antibodies to GAD67, and to the more widely studied GAD65 homologue encoded by the GAD2 gene, have been described in patients with Stiff-Person Syndrome (SPS), epilepsy, cerebellar ataxia and Batten disease. Given the presence of anti-GAD antibodies in SPS and the recognised excitotoxicity of glutamate in various contexts, further investigation seems merited of the possibility that variation in the GAD1 sequence, potentially affecting glutamate/GABA ratios, may underlie this form of spastic CP
Developing core sets for persons following amputation based on the International Classification of Functioning, Disability and Health as a way to specify functioning
Amputation is a common late-stage sequel of peripheral vascular disease and diabetes, or a sequel of accidental trauma, civil unrest and landmines. The functional impairments affect many facets of life including but not limited to: mobility; activities of daily living; body image and sexuality. Classification, measurement and comparison of the consequences of amputations has been impeded by the limited availability of internationally, multiculturally standardized instruments in the amputee setting. The introduction of the International Classification of Functioning, Disability and Health (ICF) by the World Health Assembly in May 2001 provides a globally accepted framework and classification system to describe, assess and compare function and disability. In order to facilitate the use of the ICF in everyday clinical practice and research, ICF core sets have been developed that focus on specific aspects of function typically associated with a particular disability. The objective of this paper is to outline the development process for the ICF core sets for persons following amputation. The ICF core sets are designed to translate the benefits of the ICF into clinical routine. The ICF core sets will be defined at a consensus conference which will integrate evidence from preparatory studies, namely: (a) a systematic literature review regarding the outcome measures of clinical trials and observational studies, (b) semi-structured patient interviews, (c) international experts participating in an internet-based survey, and (d) cross-sectional, multi-center studies for clinical applicability. To validate the ICF core sets, field-testing will follow. Invitation for participation: the development of ICF core sets is an inclusive and open process. Anyone who wishes to actively participate in this process is invited to do so
Ethnic differences in the distribution of normally formed singleton stillbirths
Summary
The normally formed singleton stillbirth deliveries occurring in Dudley Road Hospital in 1979, 1980 and 1981 were classified according to the primary aetiology. There was a higher than normal stillbirth rate in the Indian group which was almost entirely accounted for by the increased number of stillbirths falling into the 'intrauterine death before labour' group
Approaching an investigation of multi-dimensional inequality through the lenses of variety in models of capitalism
After a synthetic presentation of the state of poverty and inequality in the world and the contradictions incurred by economic theory in this field after decades of globalization and in the midst of a persisting global crisis, in paragraphs 2 and 3 we outline the rationale for our theoretical analysis, underlining two main aspects. First, in paragraph 2 we recall the reasons which make inequality a multidimensional phenomenon, while in paragraph 3 we explore why the models-of-capitalism theory is relevant for studying multidimensional inequality. These paragraphs emphasise that inequality is a multidimensional and cumulative phenomenon and should not be conceived only as the result of the processes of personal and functional distribution of income and wealth, which even by themselves are intrinsically multidimensional. The basic idea is that institutions, the cobweb of relations among them and their interaction with the economic structure define the model of capitalism which characterises a specific country, and this, in turn, affects the level and dynamics of inequality. This approach is consistent with the sociological approach of Rehbein and Souza (2014), based on the analytical framework developed by Pierre Bourdieu. In paragraph 4 we outline the rationale for our empirical analysis, applying the notion of institutional complementarity and examining the relationship between institutional complementarity, models of capitalism and inequality. Besides, refining Amable's analysis (2003), we provide empirical evidence on the relationship between inequality in income distribution and models of capitalism. Additionally, based on cluster analysis, we identify six different models of capitalism in a sample of OECD countries, provide preliminary evidence on the different level of inequality which characterises each model, and suggest that no evidence supports the idea that a single model of capitalism is taking shape in this sphere in the EU.
In paragraph 5 we give some hints about the search for a new interpretation capable of tying together the process of increasing inequality, the notion of symbolic violence and the models-of-capitalism theory. In the last paragraph we draw conclusions useful for carrying forward our research agenda
Critical reflections on the benefits of ICT in education
In both schools and homes, information and communication technologies (ICT) are widely seen as enhancing learning, a hope fuelling their rapid diffusion and adoption throughout developed societies. But they are not yet so embedded in the social practices of everyday life as to be taken for granted, with schools proving slower to change their lesson plans than they were to fit computers into the classroom. This article examines two possible explanations: first, that convincing evidence of improved learning outcomes remains surprisingly elusive; and second, that the debate remains unresolved over whether ICT should be conceived of as supporting the delivery of a traditional vision of pedagogy or a radically different one based on soft skills and new digital literacies. The difficulty in establishing traditional benefits, and the uncertainty over pursuing alternative benefits, raise fundamental questions over whether society really desires a transformed, technologically-mediated relation between teacher and learner