Visualisation of cluster dynamics and change detection in ubiquitous data stream mining

Author: Gaber M., Gillick B., Krishnaswamy S., Zaslavsky A.
Publication date: 29/06/2006

Virtual Astronomy, Information Technology, and the New Scientific Methodology

Author: Djorgovski S. G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promises to transform the scientific practice, but also poses a number of common technological challenges. The Virtual Observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. Challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability, fast networks, etc., and data mining, data understanding, and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition, visualization in highly hyperdimensional parameter spaces, etc., as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broader impact on the modern society, commerce, information economy, security, etc. There is a powerful emerging synergy between the computationally enabled science and the science-driven computing, which will drive the progress in science, scholarship, and many other venues in the 21st century.

Caltech Authors

Improving speaker turn embedding by crossmodal transfer learning from face embedding

Author: Le Nam, Odobez Jean-Marc
Publication date: 10/07/2017

Learning speaker turn embeddings has shown considerable improvement in situations where conventional speaker modeling approaches fail. However, this improvement is relatively limited when compared to the gain observed in face embedding learning, which has been proven very successful for face verification and clustering tasks. Assuming that face and voices from the same identities share some latent properties (like age, gender, ethnicity), we propose three transfer learning approaches to leverage the knowledge from the face domain (learned from thousands of images and identities) for tasks in the speaker domain. These approaches, namely target embedding transfer, relative distance transfer, and clustering structure transfer, utilize the structure of the source face embedding space at different granularities to regularize the target speaker turn embedding space as optimizing terms. Our methods are evaluated on two public broadcast corpora and yield promising advances over competitive baselines in verification and audio clustering tasks, especially when dealing with short speaker utterances. The analysis of the results also gives insight into characteristics of the embedding spaces and shows their potential applications

arXiv.org e-Print Archive

Open mobile miner: a toolkit for building situation-aware data mining applications

Author: Gaber M., Gillick B., Haghighi P., Krishnaswamy S., Sinha A., Zaslavsky A.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2013

Numerical Simulations of the Dark Universe: State of the Art and the Next Decade

Author: Angulo Raul, Kuhlen Michael, Vogelsberger Mark
Publication date: 26/10/2012

We present a review of the current state of the art of cosmological dark matter simulations, with particular emphasis on the implications for dark matter detection efforts and studies of dark energy. This review is intended both for particle physicists, who may find the cosmological simulation literature opaque or confusing, and for astro-physicists, who may not be familiar with the role of simulations for observational and experimental probes of dark matter and dark energy. Our work is complementary to the contribution by M. Baldi in this issue, which focuses on the treatment of dark energy and cosmic acceleration in dedicated N-body simulations. Truly massive dark matter-only simulations are being conducted on national supercomputing centers, employing from several billion to over half a trillion particles to simulate the formation and evolution of cosmologically representative volumes (cosmic scale) or to zoom in on individual halos (cluster and galactic scale). These simulations cost millions of core-hours, require tens to hundreds of terabytes of memory, and use up to petabytes of disk storage. The field is quite internationally diverse, with top simulations having been run in China, France, Germany, Korea, Spain, and the USA. Predictions from such simulations touch on almost every aspect of dark matter and dark energy studies, and we give a comprehensive overview of this connection. We also discuss the limitations of the cold and collisionless DM-only approach, and describe in some detail efforts to include different particle physics as well as baryonic physics in cosmological galaxy formation simulations, including a discussion of recent results highlighting how the distribution of dark matter in halos may be altered. We end with an outlook for the next decade, presenting our view of how the field can be expected to progress. 
(abridged) Comment: 54 pages, 4 figures, 3 tables; invited contribution to the special issue "The next decade in Dark Matter and Dark Energy" of the new Open Access journal "Physics of the Dark Universe". Replaced with accepted versio

arXiv.org e-Print Archive

Approximate Data Mining Using Sketches for Massive Data

Author: Agnihotri Swati, Gupta Parul, Saha Suman
Publication venue: The Authors. Published by Elsevier Ltd.
Publication date: 31/12/2013

With the popularity of the Web and Internet, massive data is generated. However, these enormous datasets present a challenge for applying data mining techniques to extract useful information. Dimensionality reduction can be used to improve both efficiency and effectiveness while extracting information from data. In this paper we propose an algorithm to reduce the dimensionality of datasets such that applying data mining techniques to the reduced datasets gives almost the same results as with the original datasets. Random Sketch is used to reduce the dimensions of the dataset.
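
The abstract above does not spell out the algorithm, so the following is only a minimal sketch of one common form of random sketching, a random sign projection, and not the authors' method. The function names (make_sketch_matrix, sketch, euclidean) and the sizes d and k are assumptions chosen for illustration. The idea is that each d-dimensional record is multiplied by a fixed k x d matrix of random +1/-1 entries scaled by 1/sqrt(k), so that distances between records are approximately preserved in the smaller space.

```python
import math
import random


def make_sketch_matrix(d, k, seed=0):
    """Build a k x d matrix of random +1/-1 entries scaled by 1/sqrt(k).

    Hypothetical helper: the scaling keeps squared lengths unchanged in expectation.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.choice((-1.0, 1.0)) * scale for _ in range(d)] for _ in range(k)]


def sketch(vector, matrix):
    """Project a d-dimensional vector to k dimensions (one dot product per row)."""
    return [sum(r * x for r, x in zip(row, vector)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    d, k = 1000, 200  # original and reduced dimensionality (illustrative values)
    rng = random.Random(1)
    u = [rng.gauss(0, 1) for _ in range(d)]
    v = [rng.gauss(0, 1) for _ in range(d)]
    m = make_sketch_matrix(d, k, seed=42)
    # Compare the distance in the original space with the distance between sketches.
    print("original distance:", euclidean(u, v))
    print("sketched distance:", euclidean(sketch(u, m), sketch(v, m)))
```

Because the same matrix is applied to every record, distance-based mining steps such as clustering or nearest-neighbour search can be run on the k-dimensional sketches instead of the original data, at the cost of a small approximation error that shrinks as k grows.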