232,487 research outputs found

    DAMEWARE - Data Mining & Exploration Web Application Resource

    Get PDF
    Astronomy is undergoing through a methodological revolution triggered by an unprecedented wealth of complex and accurate data. DAMEWARE (DAta Mining & Exploration Web Application and REsource) is a general purpose, Web-based, Virtual Observatory compliant, distributed data mining framework specialized in massive data sets exploration with machine learning methods. We present the DAMEWARE (DAta Mining & Exploration Web Application REsource) which allows the scientific community to perform data mining and exploratory experiments on massive data sets, by using a simple web browser. DAMEWARE offers several tools which can be seen as working environments where to choose data analysis functionalities such as clustering, classification, regression, feature extraction etc., together with models and algorithms.Comment: User Manual of the DAMEWARE Web Application, 51 page

    A database with enterprise application for mining astronomical data obtained by MOA : a thesis submitted in partial fulfilment of the requirements for the degree of the Master of Information Science in Computer Science, Massey University at Albany, Auckland, New Zealand

    Get PDF
    The MOA (Microlensing Observations in Astrophysics) Project is one of a new generation of modern astronomy endeavours that generates huge volumes of data. These have enormous scientific data mining potential. However, it is common for astronomers to deal with millions and even billions of records. The challenge of how to manage these large data sets is an important case for researchers. A good database management system is vital for the research. With the modern observation equipments used, MOA suffers from the growing volume of the data and a database management solution is needed. This study analyzed the modern technology for database and enterprise application. After analysing the data mining requirements of MOA, a prototype data management system based on MVC pattern was developed. Furthermore, the application supports sharing MOA findings and scientific data on the Internet. It was tested on a 7GB subset of achieved MOA data set. After testing, it was found that the application could query data in an efficient time and support data mining

    Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics

    Full text link
    The workshop "Mining Scientific Papers: Computational Linguistics and Bibliometrics" (CLBib 2015), co-located with the 15th International Society of Scientometrics and Informetrics Conference (ISSI 2015), brought together researchers in Bibliometrics and Computational Linguistics in order to study the ways Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing (NLP). The goals of the workshop were to answer questions like: How can we enhance author network analysis and Bibliometrics using data obtained by text analytics? What insights can NLP provide on the structure of scientific writing, on citation networks, and on in-text citation analysis? This workshop is the first step to foster the reflection on the interdisciplinarity and the benefits that the two disciplines Bibliometrics and Natural Language Processing can drive from it.Comment: 4 pages, Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics at ISSI 201

    The LSST Data Mining Research Agenda

    Full text link
    We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabytes scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.Comment: 5 pages, Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200

    Data Mining and Machine Learning in Astronomy

    Full text link
    We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the tex

    Machine Science in Biomedicine: Practicalities, Pitfalls and Potential

    Full text link
    Machine Science, or Data-driven Research, is a new and interesting scientific methodology that uses advanced computational techniques to identify, retrieve, classify and analyse data in order to generate hypotheses and develop models. In this paper we describe three recent biomedical Machine Science studies, and use these to assess the current state of the art with specific emphasis on data mining, data assessment, costs, limitations, skills and tool support
    corecore