232,575 research outputs found

The LSST Data Mining Research Agenda

Authors: A. Szalay; Coryn A.L. Bailer-Jones; I. Davidson; J. A. Tyson; J. Becla; K. Borne
Publication venue: 'AIP Publishing'
Publication date: 01/01/2008

We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more. Comment: 5 pages, Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200

arXiv.org e-Print Archive

Crossref

Constraint-based Sequential Pattern Mining with Decision Diagrams

Authors: Cire Andre A.; Hosseininasab Amin; van Hoeve Willem-Jan
Publication date: 14/11/2018

Constrained sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency. Comment: AAAI201

arXiv.org e-Print Archive

University of Toronto Research Repository

Association for the Advancement of Artificial Intelligence: AAAI Publications

Grids and the Virtual Observatory

Author: Williams Roy
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 01/01/2003

We consider several projects from astronomy that benefit from the Grid paradigm and associated technology, many of which involve either massive datasets or the federation of multiple datasets. We cover image computation (mosaicking, multi-wavelength images, and synoptic surveys); database computation (representation through XML, data mining, and visualization); and semantic interoperability (publishing, ontologies, directories, and service descriptions)

Caltech Authors

An Improved Technique for Multi-Dimensional Constrained Gradient Mining

Authors: Elugbadebo O. J.; Folorunso O.; Sodiya A. S.
Publication venue: Federal University of Agriculture, Abeokuta (FUNAAB)
Publication date: 26/02/2013

Multi-dimensional Constrained Gradient Mining, an aspect of data mining, is based on mining constrained frequent gradient pattern pairs whose measures differ significantly in a transactional database. Top-k FP-growth with Gradient Pruning and Top-k FP-growth with No Gradient Pruning were the two algorithms used for Multi-dimensional Constrained Gradient Mining in previous studies. However, these algorithms have shortcomings. The first requires construction of an FP-tree before searching through the database, and the second requires searching the database twice to find frequent pattern pairs. Both consume large amounts of time and memory, which makes mining the database cumbersome. To address this, a new algorithm that combines Top-k FP-growth with Gradient Pruning and Top-k FP-growth with No Gradient Pruning is designed to eliminate these drawbacks. The new algorithm, called Top-k FP-growth with Support Gradient Pruning (SUPGRAP), scans the database only once, searching for the node and all the descendants of the node of every task at each level. The idea is to form projected multidimensional databases and then find the multidimensional patterns within them. The evaluation of the new algorithm shows significant improvement in the time and space required over the existing algorithms.

Federal University of Agriculture, Abeokuta: FUNAAB Journal

Peculiarity Oriented Mining in Multiple Databases

Author: Babu prof.Dr.S. Vikas
Publication venue: Journal of Advances in Electrical Devices
Publication date: 14/08/2018

Multi-database mining has recently been recognized as a significant research area, concerned with finding useful and novel facts that are strongly supported by most of the databases. To discover new, unexpected, and interesting patterns hidden in data, peculiarity-oriented mining and multi-database mining are required. Multi-database mining is the process of analyzing data across multiple databases

MAT Journals

CloudJet4BigData: Streamlining Big Data via an Accelerated Socket Interface

Authors: Dimitrakos Theo; Helian Na; Li Ling; Wang Frank Zhigang; Wu Sining; Yates Rodric
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2014

Big data needs to feed users with fresh processing results and cloud platforms can be used to speed up big data applications. This paper describes a new data communication protocol (CloudJet) for long distance and large volume big data accessing operations to alleviate the large latencies encountered in sharing big data resources in the clouds. It encapsulates a dynamic multi-stream/multi-path engine at the socket level, which conforms to Portable Operating System Interface (POSIX) and thereby can accelerate any POSIX-compatible applications across IP based networks. It was demonstrated that CloudJet accelerates typical big data applications such as very large database (VLDB), data mining, media streaming and office applications by up to tenfold in real-world tests

Crossref

Kent Academic Repository