18 research outputs found
How Will Astronomy Archives Survive The Data Tsunami?
The field of astronomy is starting to generate more data than can be managed,
served and processed by current techniques. This paper outlines practices
for developing next-generation tools and techniques for surviving this data
tsunami, including rigorous evaluation of new technologies, partnerships
between astronomers and computer scientists, and training of scientists in
high-end software engineering skills.
Comment: 8 pages, 3 figures; ACM Queue, Vol 9, Number 10, October 2011
(http://queue.acm.org/detail.cfm?id=2047483)
GPU Accelerated Particle Visualization with Splotch
Splotch is a rendering algorithm for exploration and visual discovery in
particle-based datasets coming from astronomical observations or numerical
simulations. The strengths of the approach are production of high quality
imagery and support for very large-scale datasets through an effective mix of
the OpenMP and MPI parallel programming paradigms. This article reports our
experiences in re-designing Splotch for exploiting emerging HPC architectures
nowadays increasingly populated with GPUs. A performance model is introduced
for data transfers, computations and memory access, to guide our re-factoring
of Splotch. A number of parallelization issues are discussed, in particular
relating to race conditions and workload balancing, towards achieving optimal
performances. Our implementation was accomplished by using the CUDA programming
paradigm. Our strategy is founded on novel schemes achieving optimized data
organisation and classification of particles. We deploy a reference simulation
to present performance results on acceleration gains and scalability. We
finally outline our vision for future work developments including possibilities
for further optimisations and exploitation of emerging technologies.Comment: 25 pages, 9 figures. Astronomy and Computing (2014
Data Science as a New Frontier for Design
The purpose of this paper is to contribute to the challenge of transferring
know-how, theories and methods from design research to the design processes in
information science and technologies. More specifically, we consider a domain,
namely data science, that is rapidly becoming a globally invested research and
development axis with strong imperatives for innovation, given the data deluge
we are currently facing. We argue that, in order to meet the data-related
challenges society faces, data-science initiatives should renew traditional
research methodologies, which are still largely based on trial-and-error
processes that depend on the talent and insight of a single researcher (or a
restricted group of researchers). It is our claim that design
theories and methods can provide, at least to some extent, the much-needed
framework. We will use a worldwide data-science challenge organized to study a
technical problem in physics, namely the detection of the Higgs boson, as a use
case to demonstrate some of the ways in which design theory and methods can
help in analyzing and shaping the innovation dynamics of such projects.
Comment: International Conference on Engineering Design, Jul 2015, Milan, Italy.
Research Cloud Data Communities
Big Data, big science, the data deluge: we hear about these topics more and more in our
research pursuits. Then, through media hype, comes cloud computing, the saviour that is
supposedly going to resolve our Big Data issues. However, it is difficult to pinpoint exactly
what researchers can actually do with data and with clouds, how exactly they can solve their
Big Data problems, and how they can get help in using these relatively new tools and infrastructure.
Since the beginning of 2012, the NeCTAR Research Cloud has been running at the University of
Melbourne, attracting over 1,650 users from around the country. This has not only provided an
unprecedented opportunity for researchers to employ clouds in their research, but it has also given us
an opportunity to clearly understand how researchers can more easily solve their Big Data problems.
The cloud is now used daily, from running web servers and blog sites, through to hosting virtual
laboratories that can automatically create hundreds of servers depending on research demand. Of
course, it has also helped us understand that infrastructure isn’t everything. There are many other
skillsets needed to help researchers from the multitude of disciplines use the cloud effectively.
How can we solve Big Data problems on cloud infrastructure? One of the key aspects is
communities based on research platforms: research is built on collaboration, connection and
community, and researchers employ platforms daily, whether as bio-imaging platforms,
computational platforms or cloud platforms (like DropBox).
There are some important features that enabled this to work. Firstly, the barriers to collaboration
are lowered, allowing communities to access infrastructure that can be instantly built to be completely
open, through to completely closed, all managed securely through (nationally) standardised
interfaces. Secondly, it is free and easy to build servers and infrastructure, but it is also cheap to fail,
allowing for experimentation not only at a code-level, but at a server or infrastructure level as well.
Thirdly, this (virtual) infrastructure can be shared with collaborators, moving the practice of
collaboration from sharing papers and code to sharing servers, pre-configured and ready to go. And
finally, the underlying infrastructure is built with Big Data in mind, co-located with major data
storage infrastructure and high-performance computers, and interconnected with high-speed networks
nationally to research instruments.
The research cloud is fundamentally new in that it easily allows communities of researchers, often
connected by common geography (research precincts), discipline or long-term established
collaborations, to build open, collaborative platforms. These open, sharable, and repeatable platforms
encourage coordinated use and development, evolving to common community-oriented methods for
Big Data access and data manipulation.
In this paper we discuss in detail critical ingredients in successfully establishing these communities,
as well as some outcomes as a result of these communities and their collaboration enabling platforms.
We consider astronomy as an exemplar of a research field that has already looked to the cloud as a
solution to the ensuing data tsunami.
Long-term digital preservation: a digital humanities topic?
"We argue that the so-called Digital Humanities fail to meet conventional criteria to be an accredited field of study on a par with Literature, Chemistry, Computer Science, and Civil Engineering, or even a specialized professorial emphasis such as Ancient History or Nuclear Physics. The argument uses long-term digital preservation as an example to argue that Digital Humanities proponents' case for their research agenda does not merit financial support, emphasizing practical aspects over subjective theory." (author's abstract
Data-Intensive architecture for scientific knowledge discovery
This paper presents a data-intensive architecture that demonstrates the ability to support applications from a wide range of application domains, and to support the different types of users involved in defining, designing and executing data-intensive processing tasks. The prototype architecture is introduced, and the pivotal role of DISPEL as a canonical language is explained. The architecture promotes the exploration and exploitation of distributed and heterogeneous data and spans the complete knowledge discovery process, from data preparation, to analysis, to evaluation and reiteration. The architecture evaluation included large-scale applications from astronomy, cosmology, hydrology, functional genetics, image processing and seismology.
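The knowledge-discovery process described above, data streamed through composable preparation, analysis and evaluation stages, can be sketched loosely in Python. DISPEL itself is a dedicated streaming workflow language whose syntax and semantics are not reproduced here; the generator-based stages below are only an analogue of its processing-element idea, and every stage name is hypothetical:

```python
def processing_element(func):
    """Wrap a per-item function as a streaming stage (a loose analogue
    of a DISPEL processing element: it consumes an input stream and
    yields a transformed output stream, one item at a time)."""
    def stage(stream):
        for item in stream:
            yield func(item)
    return stage

# Hypothetical stages of a preparation -> analysis pipeline:
clean = processing_element(lambda s: s.strip())   # data preparation
parse = processing_element(float)                 # type conversion
scale = processing_element(lambda v: v * 2.0)     # stand-in analysis step

raw = [" 1.5 ", "2.0 ", " 3.25"]
stream = scale(parse(clean(iter(raw))))           # stages compose lazily
print(list(stream))                               # [3.0, 4.0, 6.5]
```

Because each stage is lazy, items flow through the pipeline one at a time rather than materialising intermediate datasets, which is the property that lets such an architecture span heterogeneous, distributed data sources.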
Towards the prediction of molecular parameters from astronomical emission lines using Neural Networks
Molecular astronomy is a field that is blooming in the era of large observatories such as the Atacama Large Millimeter/Submillimeter Array (ALMA). With modern, sensitive, and high spectral resolution radio telescopes like ALMA and the Square Kilometre Array, the size of the data cubes is rapidly escalating, generating a need for powerful automatic analysis tools. This work introduces MolPred, a pilot study to perform predictions of molecular parameters such as excitation temperature (Tex) and column density (log(N)) from input spectra by the use of neural networks. We used as test cases the spectra of CO, HCO+, SiO and CH3CN between 80 and 400 GHz. Training spectra were generated with MADCUBA, a state-of-the-art spectral analysis tool. Our algorithm was designed to allow the generation of predictions for multiple molecules in parallel. Using neural networks, we can predict the column density and excitation temperature of these molecules with a mean absolute error of 8.5% for CO, 4.1% for HCO+, 1.5% for SiO and 1.6% for CH3CN. The prediction accuracy depends on the noise level, line saturation, and number of transitions. We performed predictions on real ALMA data. The values predicted by our neural network for this real data differ by 13% from the MADCUBA values on average. Current limitations of our tool include not considering linewidth, source size, multiple velocity components, and line blending.
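The regression task above, a network mapping an input spectrum to (Tex, log(N)), and the percentage-error metric quoted in the abstract can be sketched minimally. This is not MolPred: the layer sizes, random weights (standing in for a trained model), and channel count below are all hypothetical, and only the shape of the computation matches the description:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes: 128 spectral channels in, 2 outputs (Tex, log N).
n_channels, n_hidden, n_out = 128, 32, 2

# Randomly initialised weights stand in for a trained network.
W1 = rng.normal(scale=0.1, size=(n_channels, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
b2 = np.zeros(n_out)

def predict(spectrum):
    """One forward pass: spectrum vector -> (Tex, log N) estimates."""
    hidden = np.maximum(0.0, spectrum @ W1 + b1)  # ReLU hidden layer
    return hidden @ W2 + b2

def mean_abs_pct_error(pred, truth):
    """Metric of the kind quoted in the abstract (e.g. 8.5% for CO)."""
    return 100.0 * np.mean(np.abs((pred - truth) / truth))

spec = rng.random(n_channels)   # stand-in for one observed spectrum
tex, log_n = predict(spec)
print(tex, log_n)
```

Predicting several molecules in parallel, as the abstract describes, would amount to batching spectra into a matrix and running the same forward pass, since the hidden-layer products broadcast over rows.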