Search CORE

3,130 research outputs found

Mining Knowledge in Astrophysical Massive Data Sets

Author: Brescia
D’Abrusco
D’Abrusco
Fabio Pasian
Giuseppe Longo
Massimo Brescia
Smareglia
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010
Field of study

Modern scientific data mainly consist of huge datasets gathered by a very large number of techniques and stored in very diversified and often incompatible data repositories. More in general, in the e-science environment, it is considered as a critical and urgent requirement to integrate services across distributed, heterogeneous, dynamic "virtual organizations" formed by different resources within a single enterprise. In the last decade, Astronomy has become an immensely data rich field due to the evolution of detectors (plates to digital to mosaics), telescopes and space instruments. The Virtual Observatory approach consists into the federation under common standards of all astronomical archives available worldwide, as well as data analysis, data mining and data exploration applications. The main drive behind such effort being that once the infrastructure will be completed, it will allow a new type of multi-wavelength, multi-epoch science which can only be barely imagined. Data Mining, or Knowledge Discovery in Databases, while being the main methodology to extract the scientific information contained in such MDS (Massive Data Sets), poses crucial problems since it has to orchestrate complex problems posed by transparent access to different computing environments, scalability of algorithms, reusability of resources, etc. In the present paper we summarize the present status of the MDS in the Virtual Observatory and what is currently done and planned to bring advanced Data Mining methodologies in the case of the DAME (DAta Mining & Exploration) project.Comment: Pages 845-849 1rs International Conference on Frontiers in Diagnostics Technologie

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Università degli studi di Napoli Federico II

DAME: A distributed data mining and exploration framework within the virtual observatory

Author: Giuseppe Longo
Massimo Brescia
Omar Laurino
Raffaele D’Abrusco
Stefano Cavuoti
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Nowadays, many scientific areas share the same broad requirements of being able to deal with massive and distributed datasets while, when possible, being integrated with services and applications. In order to solve the growing gap between the incremental generation of data and our understanding of it, it is required to know how to access, retrieve, analyze, mine and integrate data from disparate sources. One of the fundamental aspects of any new generation of data mining software tool or package which really wants to become a service for the community is the possibility to use it within complex workflows which each user can fine tune in order to match the specific demands of his scientific goal. These workflows need often to access different resources (data, providers, computing facilities and packages) and require a strict interoperability on (at least) the client side. The project DAME (DAta Mining & Exploration) arises from these requirements by providing a distributed WEB-based data mining infrastructure specialized on Massive Data Sets exploration with Soft Computing methods. Originally designed to deal with astrophysical use cases, where first scientific application examples have demonstrated its effectiveness, the DAME Suite results as a multi-disciplinary platformindependent tool perfectly compliant with modern KDD (Knowledge Discovery in Databases) requirements and Information & Communication Technology trends

Archivio della ricerca - Università degli studi di Napoli Federico II

Virtual Astronomy, Information Technology, and the New Scientific Methodology

Author: Djorgovski S. G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promises to transform the scientific practice, but also poses a number of common technological challenges. The Virtual Observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. Challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability, fast networks, etc., and data mining, data understanding, and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition, visualization in highly hyperdimensional parameter spaces, etc., as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broather impact on the modern society, commerce, information economy, security, etc. There is a powerful emerging synergy between the computationally enabled science and the science-driven computing, which will drive the progress in science, scholarship, and many other venues in the 21st century

Crossref

Caltech Authors

Genetic Algorithm Modeling with GPU Parallel Computing Technology

Author: Brescia Massimo
Cavuoti Stefano
Garofalo Mauro
Longo Giuseppe
Pescapé Antonio
Ventre Giorgio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

We present a multi-purpose genetic algorithm, designed and implemented with GPGPU / CUDA parallel computing technology. The model was derived from a multi-core CPU serial implementation, named GAME, already scientifically successfully tested and validated on astrophysical massive data classification problems, through a web application resource (DAMEWARE), specialized in data mining based on Machine Learning paradigms. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm has provided an exploit of the internal training features of the model, permitting a strong optimization in terms of processing performances and scalability.Comment: 11 pages, 2 figures, refereed proceedings; Neural Nets and Surroundings, Proceedings of 22nd Italian Workshop on Neural Nets, WIRN 2012; Smart Innovation, Systems and Technologies, Vol. 19, Springe

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Università degli studi di Napoli Federico II

Exploration of Parameter Spaces in a Virtual Observatory

Author: Brunner R.
Curkendall D.
Djorgovski S. G.
Granat R.
Jacob J.
Mahabal A.
Stolorz P.
Williams R.
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/2001
Field of study

Like every other field of intellectual endeavor, astronomy is being revolutionised by the advances in information technology. There is an ongoing exponential growth in the volume, quality, and complexity of astronomical data sets, mainly through large digital sky surveys and archives. The Virtual Observatory (VO) concept represents a scientific and technological framework needed to cope with this data flood. Systematic exploration of the observable parameter spaces, covered by large digital sky surveys spanning a range of wavelengths, will be one of the primary modes of research with a VO. This is where the truly new discoveries will be made, and new insights be gained about the already known astronomical objects and phenomena. We review some of the methodological challenges posed by the analysis of large and complex data sets expected in the VO-based research. The challenges are driven both by the size and the complexity of the data sets (billions of data vectors in parameter spaces of tens or hundreds of dimensions), by the heterogeneity of the data and measurement errors, including differences in basic survey parameters for the federated data sets (e.g., in the positional accuracy and resolution, wavelength coverage, time baseline, etc.), various selection effects, as well as the intrinsic clustering properties (functional form, topology) of the data distributions in the parameter spaces of observed attributes. Answering these challenges will require substantial collaborative efforts and partnerships between astronomers, computer scientists, and statisticians.Comment: Invited review, 10 pages, Latex file with 4 eps figures, style files included. To appear in Proc. SPIE, v. 4477 (2001

arXiv.org e-Print Archive

Data Driven Discovery in Astrophysics

Author: Brescia M.
Cavuoti S.
Djorgovski S. G.
Donalek C.
Longo G.
Publication venue
Publication date: 01/01/2014
Field of study

We review some aspects of the current state of data-intensive astronomy, its methods, and some outstanding data analysis challenges. Astronomy is at the forefront of "big data" science, with exponentially growing data volumes and data rates, and an ever-increasing complexity, now entering the Petascale regime. Telescopes and observatories from both ground and space, covering a full range of wavelengths, feed the data via processing pipelines into dedicated archives, where they can be accessed for scientific analysis. Most of the large archives are connected through the Virtual Observatory framework, that provides interoperability standards and services, and effectively constitutes a global data grid of astronomy. Making discoveries in this overabundance of data requires applications of novel, machine learning tools. We describe some of the recent examples of such applications.Comment: Keynote talk in the proceedings of ESA-ESRIN Conference: Big Data from Space 2014, Frascati, Italy, November 12-14, 2014, 8 pages, 2 figure

arXiv.org e-Print Archive

Archivio della ricerca - Università degli studi di Napoli Federico II

Some Pattern Recognition Challenges in Data-Intensive Astronomy

Author: Djorgovski S. G.
Donalek C.
Drake A.
Glikman E.
Graham M.
Mahabal A.
Williams R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

We review some of the recent developments and challenges posed by the data analysis in modern digital sky surveys, which are representative of the information-rich astronomy in the context of Virtual Observatory. Illustrative examples include the problems of an automated star-galaxy classification in complex and heterogeneous panoramic imaging data sets, and an automated, iterative, dynamical classification of transient events detected in synoptic sky surveys. These problems offer good opportunities for productive collaborations between astronomers and applied computer scientists and statisticians, and are representative of the kind of challenges now present in all data-intensive fields. We discuss briefly some emergent types of scalable scientific data analysis systems with a broad applicability.Comment: 8 pages, compressed pdf file, figures downgraded in quality in order to match the arXiv size limi

arXiv.org e-Print Archive

Crossref

Caltech Authors

CERN Document Server