DAMEWARE - Data Mining & Exploration Web Application Resource
Astronomy is undergoing a methodological revolution triggered by an
unprecedented wealth of complex and accurate data. DAMEWARE (DAta Mining &
Exploration Web Application REsource) is a general-purpose, Web-based,
Virtual Observatory compliant, distributed data mining framework specialized in
the exploration of massive data sets with machine learning methods. We present
DAMEWARE, which allows the scientific community to perform data mining and
exploratory experiments on massive data sets using a simple web browser.
DAMEWARE offers several tools that serve as working environments in which to
choose data analysis functionalities, such as clustering, classification,
regression and feature extraction, together with models and algorithms.
Comment: User Manual of the DAMEWARE Web Application, 51 pages
A database with enterprise application for mining astronomical data obtained by MOA: a thesis submitted in partial fulfilment of the requirements for the degree of Master of Information Science in Computer Science, Massey University at Albany, Auckland, New Zealand
The MOA (Microlensing Observations in Astrophysics) Project is one of a new generation of modern astronomy endeavours that generate huge volumes of data, which have enormous scientific data mining potential. However, astronomers commonly have to deal with millions and even billions of records, and managing such large data sets is an important challenge for researchers. A good database management system is vital for the research. With modern observation equipment, MOA faces a growing volume of data, and a database management solution is needed. This study analysed modern database and enterprise application technologies. After analysing the data mining requirements of MOA, a prototype data management system based on the Model-View-Controller (MVC) pattern was developed. The application also supports sharing MOA findings and scientific data on the Internet. It was tested on a 7 GB subset of the archived MOA data set, and testing showed that the application could query data efficiently and support data mining.
Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics
The workshop "Mining Scientific Papers: Computational Linguistics and
Bibliometrics" (CLBib 2015), co-located with the 15th International Society of
Scientometrics and Informetrics Conference (ISSI 2015), brought together
researchers in Bibliometrics and Computational Linguistics to study how
Bibliometrics can benefit from large-scale text analytics and sense mining of
scientific papers, thus exploring the interdisciplinarity of Bibliometrics and
Natural Language Processing (NLP). The workshop aimed to answer questions such
as: How can we enhance author network analysis and Bibliometrics using data
obtained by text analytics? What insights can NLP provide on the structure of
scientific writing, on citation networks, and on in-text citation analysis?
This workshop is a first step toward fostering reflection on this
interdisciplinarity and on the benefits that the two disciplines, Bibliometrics
and Natural Language Processing, can derive from it.
Comment: 4 pages, Workshop on Mining Scientific Papers: Computational
Linguistics and Bibliometrics at ISSI 2015
The LSST Data Mining Research Agenda
We describe features of the LSST science database that are amenable to
scientific data mining, object classification, outlier identification, anomaly
detection, image quality assurance, and survey science validation. The data
mining research agenda includes: scalability (at petabyte scales) of existing
machine learning and data mining algorithms; development of grid-enabled
parallel data mining algorithms; design of a robust system for brokering
classifications from the LSST event pipeline (which may produce 10,000 or more
event alerts per night); multi-resolution methods for exploration of petascale
databases; indexing of multi-attribute, multi-dimensional astronomical databases
(beyond spatial indexing) for rapid querying of petabyte databases; and more.
Comment: 5 pages, Presented at the "Classification and Discovery in Large
Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data and promising great scientific advances. However, if misused, it
can be little more than the black-box application of complex computing
algorithms that may give little physical insight and provide questionable
results. Here, we give an overview of the entire data mining process, from data
collection through to the interpretation of results. We cover common machine
learning algorithms, such as artificial neural networks and support vector
machines; applications from a broad range of astronomy, emphasizing those where
data mining techniques directly resulted in improved science; and important
current and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm and is guided by the
astronomical problem at hand, data mining can be a very powerful tool rather
than a questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the text
Machine Science in Biomedicine: Practicalities, Pitfalls and Potential
Machine Science, or Data-driven Research, is a new scientific methodology
that uses advanced computational techniques to identify, retrieve, classify
and analyse data in order to generate hypotheses and develop models. In this
paper we describe three recent biomedical Machine Science studies and use them
to assess the current state of the art, with specific emphasis on data mining,
data assessment, costs, limitations, skills and tool support
- …