1,985 research outputs found
The LSST Data Mining Research Agenda
We describe features of the LSST science database that are amenable to
scientific data mining, object classification, outlier identification, anomaly
detection, image quality assurance, and survey science validation. The data
mining research agenda includes: scalability (at petabytes scales) of existing
machine learning and data mining algorithms; development of grid-enabled
parallel data mining algorithms; designing a robust system for brokering
classifications from the LSST event pipeline (which may produce 10,000 or more
event alerts per night); multi-resolution methods for exploration of petascale
databases; indexing of multi-attribute multi-dimensional astronomical databases
(beyond spatial indexing) for rapid querying of petabyte databases; and more.Comment: 5 pages, Presented at the "Classification and Discovery in Large
Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200
Gaining insight from large data volumes with ease
Efficient handling of large data-volumes becomes a necessity in today's
world. It is driven by the desire to get more insight from the data and to gain
a better understanding of user trends which can be transformed into economic
incentives (profits, cost-reduction, various optimization of data workflows,
and pipelines). In this paper, we discuss how modern technologies are
transforming well established patterns in HEP communities. The new data insight
can be achieved by embracing Big Data tools for a variety of use-cases, from
analytics and monitoring to training Machine Learning models on a terabyte
scale. We provide concrete examples within context of the CMS experiment where
Big Data tools are already playing or would play a significant role in daily
operations
Data-Intensive Computing in the 21st Century
The deluge of data that future applications must process—in domains ranging from science to business informatics—creates a compelling argument for substantially increased R&D targeted at discovering scalable hardware and software solutions for data-intensive problems
- …