232,575 research outputs found
The LSST Data Mining Research Agenda
We describe features of the LSST science database that are amenable to
scientific data mining, object classification, outlier identification, anomaly
detection, image quality assurance, and survey science validation. The data
mining research agenda includes: scalability (at petabytes scales) of existing
machine learning and data mining algorithms; development of grid-enabled
parallel data mining algorithms; designing a robust system for brokering
classifications from the LSST event pipeline (which may produce 10,000 or more
event alerts per night); multi-resolution methods for exploration of petascale
databases; indexing of multi-attribute multi-dimensional astronomical databases
(beyond spatial indexing) for rapid querying of petabyte databases; and more.Comment: 5 pages, Presented at the "Classification and Discovery in Large
Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200
Constraint-based Sequential Pattern Mining with Decision Diagrams
Constrained sequential pattern mining aims at identifying frequent patterns
on a sequential database of items while observing constraints defined over the
item attributes. We introduce novel techniques for constraint-based sequential
pattern mining that rely on a multi-valued decision diagram representation of
the database. Specifically, our representation can accommodate multiple item
attributes and various constraint types, including a number of non-monotone
constraints. To evaluate the applicability of our approach, we develop an
MDD-based prefix-projection algorithm and compare its performance against a
typical generate-and-check variant, as well as a state-of-the-art
constraint-based sequential pattern mining algorithm. Results show that our
approach is competitive with or superior to these other methods in terms of
scalability and efficiency.Comment: AAAI201
Grids and the Virtual Observatory
We consider several projects from astronomy that benefit from the Grid paradigm and
associated technology, many of which involve either massive datasets or the federation
of multiple datasets. We cover image computation (mosaicking, multi-wavelength
images, and synoptic surveys); database computation (representation through XML,
data mining, and visualization); and semantic interoperability (publishing, ontologies,
directories, and service descriptions)
An Improved Technique for Multi-Dimensional Constrained Gradient Mining
Multi-dimensional Constrained Gradient Mining, which is an aspect of data mining, is based on mining constrained frequent gradient pattern pairs with significant difference in their measures in transactional database. Top-k Fp-growth with Gradient Pruning and Top-k Fp-growth with No Gradient Pruning were the two algorithms used for Multi-dimensional Constrained Gradient Mining in previous studies. However, these algorithms have their shortcomings. The first requires construction of Fp-tree before searching through the database and the second algorithm requires searching of database twice in finding frequent pattern pairs. These cause the problems of using large amount of time and memory space, which retrogressively make mining of database cumbersome. Based on this anomaly, a new algorithm that combines Top-k Fp-growth with Gradient pruning and Top-k Fp-growth with No Gradient pruning is designed to eliminate these drawbacks. The new algorithm called Top-K Fp-growth with support Gradient pruning (SUPGRAP) employs the method of scanning the database once, by searching for the node and all the descendant of the node of every task at each level. The idea is to form projected Multidimensional Database and then find the Multidimensional patterns within the projected databases. The evaluation of the new algorithm shows significant improvement in terms of time and space required over the existing algorithms.  
Peculiarity Oriented Mining in Multiple Databases
Multi-database mining has been these days diagnosed as a significant, and finding useful and novel facts, that are noticeably supported with the resource of most of databases. As a way to find out new, unexpected, thrilling styles hidden in information, peculiarity orientated mining and multi database mining are required.Multi-database mining is the technique of analyzing the data in multi databases
CloudJet4BigData: Streamlining Big Data via an Accelerated Socket Interface
Big data needs to feed users with fresh processing results and cloud platforms can be used to speed up big data applications. This paper describes a new data communication protocol (CloudJet) for long distance and large volume big data accessing operations to alleviate the large latencies encountered in sharing big data resources in the clouds. It encapsulates a dynamic multi-stream/multi-path engine at the socket level, which conforms to Portable Operating System Interface (POSIX) and thereby can accelerate any POSIX-compatible applications across IP based networks. It was demonstrated that CloudJet accelerates typical big data applications such as very large database (VLDB), data mining, media streaming and office applications by up to tenfold in real-world tests
- …