Efficient Scalable Accurate Regression Queries in In-DBMS Analytics

Author: Anagnostopoulos Christos
Triantafillou Peter
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2017

Recent trends aim to incorporate advanced data analytics capabilities within DBMSs. Linear regression queries are fundamental to exploratory analytics and predictive modeling. However, computing their exact answers leaves a lot to be desired in terms of efficiency and scalability. We contribute a novel predictive analytics model and associated regression query processing algorithms, which are efficient, scalable and accurate. We focus on predicting the answers to two key query types that reveal dependencies between the values of different attributes: (i) mean-value queries and (ii) multivariate linear regression queries, both within specific data subspaces defined based on the values of other attributes. Our algorithms achieve many orders of magnitude improvement in query processing efficiency and near-perfect approximations of the underlying relationships among data attributes.
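
For orientation, the two query types can be sketched as exact (non-predictive) computations, i.e. the costly baseline that the abstract says the predictive model is meant to avoid. This is a minimal sketch under stated assumptions, not the authors' method: the subspace is assumed to be a ball given by a hypothetical `center` and `radius` over the input attributes, each row is assumed to be an `(x, y)` pair, and all function names are illustrative.

```python
import math

def select_subspace(rows, center, radius):
    """Return the (x, y) rows whose input vector x lies within radius of center."""
    return [(x, y) for x, y in rows if math.dist(x, center) <= radius]

def mean_value_query(rows, center, radius):
    """Exact mean of the output attribute y over the selected subspace."""
    selected = select_subspace(rows, center, radius)
    if not selected:
        return None
    return sum(y for _, y in selected) / len(selected)

def solve_linear_system(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def regression_query(rows, center, radius):
    """Exact least-squares fit y ~ w0 + w . x over the selected subspace.

    Returns [w0, w1, ..., wd], or None if the subspace is empty.
    """
    selected = select_subspace(rows, center, radius)
    if not selected:
        return None
    design = [[1.0] + list(x) for x, _ in selected]  # intercept column first
    targets = [y for _, y in selected]
    k = len(design[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

Both functions scan every row for each query, which illustrates why exact answers scale poorly with data size.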

Crossref

Warwick Research Archives Portal Repository

Enlighten

Molecular dynamics in arbitrary geometries : parallel evaluation of pair forces

Author: Graham B. Macpherson
Jason M. Reese
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2007

A new algorithm for calculating intermolecular pair forces in molecular dynamics (MD) simulations on a distributed parallel computer is presented. The arbitrary interacting cells algorithm (AICA) is designed to operate on geometrical domains defined by an unstructured, arbitrary polyhedral mesh that has been spatially decomposed into irregular portions for parallelisation. It is intended for nanoscale fluid mechanics simulation by MD in complex geometries, and to provide the MD component of a hybrid MD/continuum simulation. The spatial relationship of the cells of the mesh is calculated at the start of the simulation and only the molecules contained in cells that have part of their surface closer than the cut-off radius of the intermolecular pair potential are required to interact. AICA has been implemented in the open source C++ code OpenFOAM, and its accuracy has been indirectly verified against a published MD code. The same system simulated in serial and in parallel on 12 and 32 processors gives the same results. Performance tests show that there is an optimal number of cells in a mesh for maximum speed of calculating intermolecular forces, and that having a large number of empty cells in the mesh does not add a significant computational overhead.
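
The idea of restricting pair interactions to molecules in nearby cells can be illustrated with a much simpler scheme than AICA. The sketch below is an assumption-laden stand-in, not the published algorithm: it uses a regular cubic grid instead of an arbitrary polyhedral mesh, runs serially, has no periodic boundaries, and assumes a Lennard-Jones potential with hypothetical parameters `epsilon` and `sigma`. The cell edge is assumed to be at least the cut-off radius, so only the 27 surrounding cells need checking.

```python
import itertools

def build_cells(positions, cell_size):
    """Map each occupied cubic cell (integer index triple) to its molecule ids."""
    cells = {}
    for idx, (x, y, z) in enumerate(positions):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        cells.setdefault(key, []).append(idx)
    return cells

def lj_pair_force(rij, r2, epsilon=1.0, sigma=1.0):
    """Lennard-Jones force on molecule i due to j, with rij = ri - rj."""
    s6 = (sigma * sigma / r2) ** 3
    magnitude = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
    return tuple(magnitude * c for c in rij)

def pair_forces(positions, cutoff, cell_size=None):
    """Total force on each molecule from all partners within the cut-off."""
    cell_size = cutoff if cell_size is None else cell_size
    if cell_size < cutoff:
        raise ValueError("cell_size must be at least the cut-off radius")
    cells = build_cells(positions, cell_size)
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            # Unoccupied cells are never stored, so they are skipped here.
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), ())
            for a in members:
                for b in neighbours:
                    if a >= b:  # handle each unordered pair exactly once
                        continue
                    rij = tuple(pa - pb for pa, pb in zip(positions[a], positions[b]))
                    r2 = sum(c * c for c in rij)
                    if r2 > cutoff * cutoff:
                        continue
                    f = lj_pair_force(rij, r2)
                    for k in range(3):
                        forces[a][k] += f[k]  # force on a
                        forces[b][k] -= f[k]  # equal and opposite on b
    return forces
```

Because only occupied cells are stored in the dictionary, empty regions of the grid add no work in this sketch.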

Crossref

University of Strathclyde Institutional Repository

Edinburgh Research Explorer

Accommodating user preferences in the optimization of public transport travel

Author: Hartley JK
Wu Q
Publication venue: 'UK Simulation Society'
Publication date: 01/01/2004

Nottingham Trent Institutional Repository (IRep)

Visualization of Data by Method of Elastic Maps and Its Applications in Genomics, Economics and Sociology

Author: Gorban Prof. Alexander N.
Zinovyev Dr. Andrei Yu.
Publication date: 01/08/2001

A technology for data visualization and data modeling is proposed. It is based on the original idea of an elastic net, together with methods for its construction and application. A short review of relevant methods is given. The proposed methods are illustrated by applying them to real economic, sociological and biological datasets and to some model data distributions. The elastic net is a regular point approximation of a manifold that is embedded in the multidimensional data space and has, in a certain sense, minimal energy. This manifold is an analogue of a principal surface and serves as a non-linear screen onto which multidimensional data are projected. A remarkable feature of the technology is its ability to work with, and to fill, gaps in data tables, where gaps are unknown or unreliable values of some features. This makes it possible to predict plausible values of unknown features from the values of the others, and so provides a way of constructing various prognosis systems and non-linear regressions. The technology can be used by specialists in different fields. Several examples of applying the method are presented at the end of the paper.

CogPrints Cognitive Sciences Eprint Archive

Convex Hull of Points Lying on Lines in o(n log n) Time after Preprocessing

Author: Esther Ezra
Wolfgang Mulzer
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

Motivated by the desire to cope with data imprecision, we study methods for taking advantage of preliminary information about point sets in order to speed up the computation of certain structures associated with them. In particular, we study the following problem: given a set L of n lines in the plane, we wish to preprocess L such that later, upon receiving a set P of n points, each of which lies on a distinct line of L, we can construct the convex hull of P efficiently. We show that in quadratic time and space it is possible to construct a data structure on L that enables us to compute the convex hull of any such point set P in O(n alpha(n) log* n) expected time. If we further assume that the points are "oblivious" with respect to the data structure, the running time improves to O(n alpha(n)). The analysis applies almost verbatim when L is a set of line-segments, and yields similar asymptotic bounds. We present several extensions, including a trade-off between space and query time and an output-sensitive algorithm. We also study the "dual problem" where we show how to efficiently compute the (<= k)-level of n lines in the plane, each of which lies on a distinct point (given in advance). We complement our results by Omega(n log n) lower bounds under the algebraic computation tree model for several related problems, including sorting a set of points (according to, say, their x-order), each of which lies on a given line known in advance. Therefore, the convex hull problem under our setting is easier than sorting, contrary to the "standard" convex hull and sorting problems, in which the two problems require Theta(n log n) steps in the worst case (under the algebraic computation tree model). Comment: 26 pages, 5 figures, 1 appendix; a preliminary version appeared at SoCG 201
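
For contrast with the preprocessing-based bounds above, the "standard" Theta(n log n) convex hull computation can be sketched with Andrew's monotone chain algorithm, a well-known textbook method. This is only the baseline the abstract compares against, not the data structure described in the paper; it uses no advance knowledge of the lines, and the function names are illustrative.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order, collinear points dropped."""
    pts = sorted(set(points))  # the sort dominates: O(n log n)
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop while the last two points and p fail to make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

After the sort, both chain-building loops run in linear time, so sorting is the step that the Omega(n log n) lower bound concerns.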

arXiv.org e-Print Archive

CiteSeerX

Crossref