232,575 research outputs found

    The LSST Data Mining Research Agenda

    Full text link
    We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabytes scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.Comment: 5 pages, Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200

    Constraint-based Sequential Pattern Mining with Decision Diagrams

    Full text link
    Constrained sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency.Comment: AAAI201

    Grids and the Virtual Observatory

    Get PDF
    We consider several projects from astronomy that benefit from the Grid paradigm and associated technology, many of which involve either massive datasets or the federation of multiple datasets. We cover image computation (mosaicking, multi-wavelength images, and synoptic surveys); database computation (representation through XML, data mining, and visualization); and semantic interoperability (publishing, ontologies, directories, and service descriptions)

    An Improved Technique for Multi-Dimensional Constrained Gradient Mining

    Get PDF
    Multi-dimensional Constrained Gradient Mining, which is an aspect of data mining, is based on mining constrained frequent gradient pattern pairs with significant difference in their measures in transactional database. Top-k Fp-growth with Gradient Pruning and Top-k Fp-growth with No Gradient Pruning were the two algorithms used for Multi-dimensional Constrained Gradient Mining in previous studies. However, these algorithms have their shortcomings. The first requires construction of Fp-tree before searching through the database and the second algorithm requires searching of database twice in finding frequent pattern pairs. These cause the problems of using large amount of time and memory space, which retrogressively make mining of database cumbersome.  Based on this anomaly, a new algorithm that combines Top-k Fp-growth with Gradient pruning and Top-k Fp-growth with No Gradient pruning is designed to eliminate these drawbacks. The new algorithm called Top-K Fp-growth with support Gradient pruning (SUPGRAP) employs the method of scanning the database once, by searching for the node and all the descendant of the node of every task at each level. The idea is to form projected Multidimensional Database and then find the Multidimensional patterns within the projected databases. The evaluation of the new algorithm shows significant improvement in terms of time and space required over the existing algorithms.  &nbsp

    Peculiarity Oriented Mining in Multiple Databases

    Get PDF
    Multi-database mining has been these days diagnosed as a significant, and finding useful and novel facts, that are noticeably supported with the resource of most of databases. As a way to find out new, unexpected, thrilling styles hidden in information, peculiarity orientated mining and multi database mining are required.Multi-database mining is the technique of analyzing the data in multi databases

    CloudJet4BigData: Streamlining Big Data via an Accelerated Socket Interface

    Get PDF
    Big data needs to feed users with fresh processing results and cloud platforms can be used to speed up big data applications. This paper describes a new data communication protocol (CloudJet) for long distance and large volume big data accessing operations to alleviate the large latencies encountered in sharing big data resources in the clouds. It encapsulates a dynamic multi-stream/multi-path engine at the socket level, which conforms to Portable Operating System Interface (POSIX) and thereby can accelerate any POSIX-compatible applications across IP based networks. It was demonstrated that CloudJet accelerates typical big data applications such as very large database (VLDB), data mining, media streaming and office applications by up to tenfold in real-world tests
    • …
    corecore