77,016 research outputs found
Exploration of Parameter Spaces in a Virtual Observatory
Like every other field of intellectual endeavor, astronomy is being
revolutionised by the advances in information technology. There is an ongoing
exponential growth in the volume, quality, and complexity of astronomical data
sets, mainly through large digital sky surveys and archives. The Virtual
Observatory (VO) concept represents a scientific and technological framework
needed to cope with this data flood. Systematic exploration of the observable
parameter spaces, covered by large digital sky surveys spanning a range of
wavelengths, will be one of the primary modes of research with a VO. This is
where the truly new discoveries will be made, and new insights be gained about
the already known astronomical objects and phenomena. We review some of the
methodological challenges posed by the analysis of large and complex data sets
expected in the VO-based research. The challenges are driven both by the size
and the complexity of the data sets (billions of data vectors in parameter
spaces of tens or hundreds of dimensions), by the heterogeneity of the data and
measurement errors, including differences in basic survey parameters for the
federated data sets (e.g., in the positional accuracy and resolution,
wavelength coverage, time baseline, etc.), various selection effects, as well
as the intrinsic clustering properties (functional form, topology) of the data
distributions in the parameter spaces of observed attributes. Answering these
challenges will require substantial collaborative efforts and partnerships
between astronomers, computer scientists, and statisticians.Comment: Invited review, 10 pages, Latex file with 4 eps figures, style files
included. To appear in Proc. SPIE, v. 4477 (2001
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning and have been utilized by other sciences such
as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other.Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in
Astronomy, special issue "Robotic Astronomy
Towards cross-lingual alerting for bursty epidemic events
Background: Online news reports are increasingly becoming a source for event
based early warning systems that detect natural disasters. Harnessing the
massive volume of information available from multilingual newswire presents as
many challenges as opportunities due to the patterns of reporting complex
spatiotemporal events. Results: In this article we study the problem of
utilising correlated event reports across languages. We track the evolution of
16 disease outbreaks using 5 temporal aberration detection algorithms on
text-mined events classified according to disease and outbreak country. Using
ProMED reports as a silver standard, comparative analysis of news data for 13
languages over a 129 day trial period showed improved sensitivity, F1 and
timeliness across most models using cross-lingual events. We report a detailed
case study analysis for Cholera in Angola 2010 which highlights the challenges
faced in correlating news events with the silver standard. Conclusions: The
results show that automated health surveillance using multilingual text mining
has the potential to turn low value news into high value alerts if informed
choices are used to govern the selection of models and data sources. An
implementation of the C2 alerting algorithm using multilingual news is
available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup
What's unusual in online disease outbreak news?
Background: Accurate and timely detection of public health events of
international concern is necessary to help support risk assessment and response
and save lives. Novel event-based methods that use the World Wide Web as a
signal source offer potential to extend health surveillance into areas where
traditional indicator networks are lacking. In this paper we address the issue
of systematically evaluating online health news to support automatic alerting
using daily disease-country counts text mined from real world data using
BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration
detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance
against expert moderated ProMED-mail postings. Results: We report sensitivity,
specificity, positive predictive value (PPV), negative predictive value (NPV),
mean alerts/100 days and F1, at 95% confidence interval (CI) for 287
ProMED-mail postings on 18 outbreaks across 14 countries over a 366 day period.
Results indicate that W2 had the best F1 with a slight benefit for day of week
effect over C2. In drill down analysis we indicate issues arising from the
granular choice of country-level modeling, sudden drops in reporting due to day
of week effects and reporting bias. Automatic alerting has been implemented in
BioCaster available from http://born.nii.ac.jp. Conclusions: Online health news
alerts have the potential to enhance manual analytical methods by increasing
throughput, timeliness and detection rates. Systematic evaluation of health
news aberrations is necessary to push forward our understanding of the complex
relationship between news report volumes and case numbers and to select the
best performing features and algorithms
- …