
    Virtual Astronomy, Information Technology, and the New Scientific Methodology

    All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promise to transform scientific practice, but also pose a number of common technological challenges. The Virtual Observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. Challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability, fast networks, etc., and data mining, data understanding, and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition, visualization in hyperdimensional parameter spaces, etc., as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broader impact on modern society, commerce, information economy, security, etc. There is a powerful emerging synergy between computationally enabled science and science-driven computing, which will drive the progress in science, scholarship, and many other venues in the 21st century.

    Improving speaker turn embedding by crossmodal transfer learning from face embedding

    Learning speaker turn embeddings has shown considerable improvement in situations where conventional speaker modeling approaches fail. However, this improvement is relatively limited when compared to the gain observed in face embedding learning, which has been proven very successful for face verification and clustering tasks. Assuming that faces and voices from the same identities share some latent properties (like age, gender, ethnicity), we propose three transfer learning approaches to leverage the knowledge from the face domain (learned from thousands of images and identities) for tasks in the speaker domain. These approaches, namely target embedding transfer, relative distance transfer, and clustering structure transfer, utilize the structure of the source face embedding space at different granularities to regularize the target speaker turn embedding space as optimizing terms. Our methods are evaluated on two public broadcast corpora and yield promising advances over competitive baselines in verification and audio clustering tasks, especially when dealing with short speaker utterances. The analysis of the results also gives insight into characteristics of the embedding spaces and shows their potential applications.

    Numerical Simulations of the Dark Universe: State of the Art and the Next Decade

    We present a review of the current state of the art of cosmological dark matter simulations, with particular emphasis on the implications for dark matter detection efforts and studies of dark energy. This review is intended both for particle physicists, who may find the cosmological simulation literature opaque or confusing, and for astro-physicists, who may not be familiar with the role of simulations for observational and experimental probes of dark matter and dark energy. Our work is complementary to the contribution by M. Baldi in this issue, which focuses on the treatment of dark energy and cosmic acceleration in dedicated N-body simulations. Truly massive dark matter-only simulations are being conducted on national supercomputing centers, employing from several billion to over half a trillion particles to simulate the formation and evolution of cosmologically representative volumes (cosmic scale) or to zoom in on individual halos (cluster and galactic scale). These simulations cost millions of core-hours, require tens to hundreds of terabytes of memory, and use up to petabytes of disk storage. The field is quite internationally diverse, with top simulations having been run in China, France, Germany, Korea, Spain, and the USA. Predictions from such simulations touch on almost every aspect of dark matter and dark energy studies, and we give a comprehensive overview of this connection. We also discuss the limitations of the cold and collisionless DM-only approach, and describe in some detail efforts to include different particle physics as well as baryonic physics in cosmological galaxy formation simulations, including a discussion of recent results highlighting how the distribution of dark matter in halos may be altered. We end with an outlook for the next decade, presenting our view of how the field can be expected to progress. 
(abridged) Comment: 54 pages, 4 figures, 3 tables; invited contribution to the special issue "The next decade in Dark Matter and Dark Energy" of the new Open Access journal "Physics of the Dark Universe". Replaced with accepted version

    Approximate Data Mining Using Sketches for Massive Data

    With the popularity of the Web and the Internet, massive amounts of data are generated. However, these enormous datasets make it challenging to apply data mining techniques to extract useful information. Dimensionality reduction can be used to improve both efficiency and effectiveness when extracting information from data. In this paper we propose an algorithm that reduces the dimensionality of a dataset such that applying data mining techniques to the reduced dataset gives almost the same results as applying them to the original dataset. A random sketch is used to reduce the dimensions of the dataset.
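    The abstract does not say which sketch construction the paper uses. As a rough illustration only, the following Python sketch assumes a random-sign projection (each entry is +1 or -1, scaled by 1/sqrt(k)), one common way to build a random sketch that approximately preserves Euclidean distances. The function names `make_sketch_matrix`, `sketch`, and `euclidean` are hypothetical and not taken from the paper.

```python
import math
import random


def make_sketch_matrix(original_dim, sketch_dim, seed=0):
    """Build a sketch_dim x original_dim matrix of random signs.

    Assumption: a random-sign projection scaled by 1/sqrt(sketch_dim);
    the paper's actual sketch construction may differ.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(sketch_dim)
    return [
        [rng.choice((-scale, scale)) for _ in range(original_dim)]
        for _ in range(sketch_dim)
    ]


def sketch(vector, matrix):
    """Project one data point into the lower-dimensional sketch space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic 1000-dimensional points (illustrative data only).
    a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    b = [rng.gauss(0.0, 1.0) for _ in range(1000)]

    matrix = make_sketch_matrix(original_dim=1000, sketch_dim=200, seed=42)
    a_small, b_small = sketch(a, matrix), sketch(b, matrix)

    print("original distance:", euclidean(a, b))
    print("sketched distance:", euclidean(a_small, b_small))
```

    Distance-based mining steps such as clustering or nearest-neighbour search could then run on the 200-dimensional sketches instead of the original 1000-dimensional points. How closely the results match depends on the sketch size, so the value 200 here is only an arbitrary example.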