91,192 research outputs found
Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogenous
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed
Data Driven Discovery in Astrophysics
We review some aspects of the current state of data-intensive astronomy, its
methods, and some outstanding data analysis challenges. Astronomy is at the
forefront of "big data" science, with exponentially growing data volumes and
data rates, and an ever-increasing complexity, now entering the Petascale
regime. Telescopes and observatories from both ground and space, covering a
full range of wavelengths, feed the data via processing pipelines into
dedicated archives, where they can be accessed for scientific analysis. Most of
the large archives are connected through the Virtual Observatory framework,
that provides interoperability standards and services, and effectively
constitutes a global data grid of astronomy. Making discoveries in this
overabundance of data requires applications of novel, machine learning tools.
We describe some of the recent examples of such applications.Comment: Keynote talk in the proceedings of ESA-ESRIN Conference: Big Data
from Space 2014, Frascati, Italy, November 12-14, 2014, 8 pages, 2 figure
Some Pattern Recognition Challenges in Data-Intensive Astronomy
We review some of the recent developments and challenges posed by the data
analysis in modern digital sky surveys, which are representative of the
information-rich astronomy in the context of Virtual Observatory. Illustrative
examples include the problems of an automated star-galaxy classification in
complex and heterogeneous panoramic imaging data sets, and an automated,
iterative, dynamical classification of transient events detected in synoptic
sky surveys. These problems offer good opportunities for productive
collaborations between astronomers and applied computer scientists and
statisticians, and are representative of the kind of challenges now present in
all data-intensive fields. We discuss briefly some emergent types of scalable
scientific data analysis systems with a broad applicability.Comment: 8 pages, compressed pdf file, figures downgraded in quality in order
to match the arXiv size limi
- âŠ