Virtual Astronomy, Information Technology, and the New Scientific Methodology
All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promise to transform scientific practice, but also pose a number of common technological challenges. The Virtual Observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. Challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability, fast networks, etc., and data mining, data understanding, and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition, visualization in hyperdimensional parameter spaces, etc., as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broader impact on modern society, commerce, the information economy, security, etc. There is a powerful emerging synergy between computationally enabled science and science-driven computing, which will drive progress in science, scholarship, and many other venues in the 21st century.
Improving speaker turn embedding by crossmodal transfer learning from face embedding
Learning speaker turn embeddings has shown considerable improvement in
situations where conventional speaker modeling approaches fail. However, this
improvement is relatively limited when compared to the gain observed in face
embedding learning, which has been proven very successful for face verification
and clustering tasks. Assuming that faces and voices from the same identities
share some latent properties (like age, gender, ethnicity), we propose three
transfer learning approaches to leverage the knowledge from the face domain
(learned from thousands of images and identities) for tasks in the speaker
domain. These approaches, namely target embedding transfer, relative distance
transfer, and clustering structure transfer, utilize the structure of the
source face embedding space at different granularities to regularize the target
speaker turn embedding space as optimizing terms. Our methods are evaluated on
two public broadcast corpora and yield promising advances over competitive
baselines in verification and audio clustering tasks, especially when dealing
with short speaker utterances. The analysis of the results also gives insight
into characteristics of the embedding spaces and shows their potential
applications.
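The abstract does not state the loss functions, so the following is only a minimal sketch of one plausible reading of target embedding transfer: the speaker turn embedding is pulled toward the face embedding of the same identity through an extra regularization term added to the task loss. The function name, the `weight` parameter, the L2 normalization, and the assumption that both embeddings have the same dimensionality and are paired row by row are all hypothetical and not taken from the paper.

```python
import numpy as np

def target_embedding_transfer_loss(speaker_emb, face_emb, task_loss, weight=0.1):
    """Hypothetical regularized loss (an assumption, not the paper's formula).

    speaker_emb, face_emb: arrays of shape (n, d); row i of each belongs
    to the same identity. Returns task_loss plus a weighted mean squared
    distance between the L2-normalized embeddings.
    """
    s = speaker_emb / np.linalg.norm(speaker_emb, axis=1, keepdims=True)
    f = face_emb / np.linalg.norm(face_emb, axis=1, keepdims=True)
    reg = np.mean(np.sum((s - f) ** 2, axis=1))
    return task_loss + weight * reg

# Toy check: identical embeddings add no penalty, opposite ones add the maximum.
face = np.array([[1.0, 0.0], [0.0, 2.0]])
loss_same = target_embedding_transfer_loss(face.copy(), face, task_loss=1.0)
loss_opposite = target_embedding_transfer_loss(-face, face, task_loss=1.0)
print(loss_same, loss_opposite)
```

In this sketch the penalty is zero when the two embeddings coincide and grows as they diverge, which is the sense in which the face space regularizes the speaker space. The relative distance and clustering structure variants would constrain coarser properties of the space and are not shown here.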
Numerical Simulations of the Dark Universe: State of the Art and the Next Decade
We present a review of the current state of the art of cosmological dark
matter simulations, with particular emphasis on the implications for dark
matter detection efforts and studies of dark energy. This review is intended
both for particle physicists, who may find the cosmological simulation
literature opaque or confusing, and for astrophysicists, who may not be
familiar with the role of simulations for observational and experimental probes
of dark matter and dark energy. Our work is complementary to the contribution
by M. Baldi in this issue, which focuses on the treatment of dark energy and
cosmic acceleration in dedicated N-body simulations. Truly massive dark
matter-only simulations are being conducted at national supercomputing centers,
employing from several billion to over half a trillion particles to simulate
the formation and evolution of cosmologically representative volumes (cosmic
scale) or to zoom in on individual halos (cluster and galactic scale). These
simulations cost millions of core-hours, require tens to hundreds of terabytes
of memory, and use up to petabytes of disk storage. The field is quite
internationally diverse, with top simulations having been run in China, France,
Germany, Korea, Spain, and the USA. Predictions from such simulations touch on
almost every aspect of dark matter and dark energy studies, and we give a
comprehensive overview of this connection. We also discuss the limitations of
the cold and collisionless DM-only approach, and describe in some detail
efforts to include different particle physics as well as baryonic physics in
cosmological galaxy formation simulations, including a discussion of recent
results highlighting how the distribution of dark matter in halos may be
altered. We end with an outlook for the next decade, presenting our view of how
the field can be expected to progress. (abridged)
Comment: 54 pages, 4 figures, 3 tables; invited contribution to the special
issue "The next decade in Dark Matter and Dark Energy" of the new Open Access
journal "Physics of the Dark Universe". Replaced with accepted version.
Approximate Data Mining Using Sketches for Massive Data
With the popularity of the Web and the Internet, massive amounts of data are generated. However, these enormous datasets make it challenging to apply data mining techniques to extract useful information. Dimensionality reduction can improve both efficiency and effectiveness when extracting information from data. In this paper we propose an algorithm that reduces the dimensionality of a dataset such that applying data mining techniques to the reduced dataset gives almost the same results as applying them to the original dataset. A random sketch is used to reduce the dimensions of the dataset.
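The abstract does not specify how the sketch is built, so the following is a minimal sketch assuming a Gaussian random projection, one common way to construct a random sketch. The function names, the target dimension `k`, and the synthetic data are illustrative assumptions, not the paper's algorithm. It shows why mining the reduced data can give nearly the same results: pairwise distances, which many clustering and nearest-neighbour methods depend on, are approximately preserved.

```python
import numpy as np

def random_sketch(X, k, seed=0):
    """Project an (n, d) data matrix to (n, k) with a Gaussian random matrix.

    Scaling by 1/sqrt(k) keeps squared distances unbiased in expectation.
    """
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

def pairwise_dists(A):
    """Euclidean distance between every pair of rows of A."""
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    return np.sqrt(np.maximum(d2, 0.0))

# Synthetic data: 200 points in 1000 dimensions, reduced to 400 dimensions.
X = np.random.default_rng(42).standard_normal((200, 1000))
S = random_sketch(X, k=400)

# Ratio of reduced to original distance for every distinct pair of points.
mask = ~np.eye(len(X), dtype=bool)
ratio = pairwise_dists(S)[mask] / pairwise_dists(X)[mask]
print(S.shape, ratio.mean(), ratio.min(), ratio.max())
```

Ratios close to 1 mean that a distance-based method run on `S` sees almost the same geometry as on `X`, at well under half the dimensionality. A smaller `k` saves more work but widens the spread of the ratios.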
- …