Communication Theoretic Data Analytics
Widespread use of the Internet and social networks invokes the generation of
big data, which is proving to be useful in a number of applications. To deal
with explosively growing amounts of data, data analytics has emerged as a
critical technology related to computing, signal processing, and information
networking. In this paper, a formalism is considered in which data is modeled
as a generalized social network and communication theory and information theory
are thereby extended to data analytics. First, the creation of an equalizer to
optimize information transfer between two data variables is considered, and
financial data is used to demonstrate the advantages. Then, an information
coupling approach based on information geometry is applied for dimensionality
reduction, with a pattern recognition example to illustrate the effectiveness.
These initial trials suggest the potential of communication theoretic data
analytics for a wide range of applications.
Comment: Published in IEEE Journal on Selected Areas in Communications, Jan. 201
Tensor Analysis and Fusion of Multimodal Brain Images
Current high-throughput data acquisition technologies probe dynamical systems
with different imaging modalities, generating massive data sets at different
spatial and temporal resolutions posing challenging problems in multimodal data
fusion. A case in point is the attempt to parse out the brain structures and
networks that underpin human cognitive processes by analysis of different
neuroimaging modalities (functional MRI, EEG, NIRS etc.). We emphasize that the
multimodal, multi-scale nature of neuroimaging data is well reflected by a
multi-way (tensor) structure where the underlying processes can be summarized
by a relatively small number of components or "atoms". We introduce
Markov-Penrose diagrams - an integration of Bayesian DAG and tensor network
notation in order to analyze these models. These diagrams not only clarify
matrix and tensor EEG and fMRI time/frequency analysis and inverse problems,
but also help understand multimodal fusion via Multiway Partial Least Squares
and Coupled Matrix-Tensor Factorization. We show here, for the first time, that
Granger causal analysis of brain networks is a tensor regression problem, thus
allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI
recordings shows the potential of the methods and suggests their use in other
scientific domains.
Comment: 23 pages, 15 figures, submitted to Proceedings of the IEE
Orthogonal parallel MCMC methods for sampling and optimization
Monte Carlo (MC) methods are widely used for Bayesian inference and
optimization in statistics, signal processing and machine learning. A
well-known class of MC methods is that of Markov Chain Monte Carlo (MCMC)
algorithms. To foster better exploration of the state space, especially in
high-dimensional applications, several schemes employing multiple parallel MCMC
chains have been recently introduced. In this work, we describe a novel
parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where
a set of "vertical" parallel MCMC chains share information using some
"horizontal" MCMC techniques working on the entire population of current
states. More specifically, the vertical chains are led by random-walk
proposals, whereas the horizontal MCMC techniques employ independent proposals,
thus allowing an efficient combination of global exploration and local
approximation. The interaction is contained in these horizontal iterations.
As part of the analysis of different implementations of O-MCMC, novel schemes
that reduce the overall computational cost of parallel multiple try
Metropolis (MTM) chains are also presented. Furthermore, a modified version of
O-MCMC for optimization is provided by considering parallel simulated annealing
(SA) algorithms. Numerical results show the advantages of the proposed sampling
scheme in terms of efficiency in the estimation, as well as robustness in terms
of independence with respect to initial values and the choice of the
parameters.
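The alternation described in the abstract above (random-walk "vertical" updates per chain, then "horizontal" updates with an independent proposal acting on the whole population) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bimodal target, the fixed Gaussian proposal, the tuning values, and all function names are hypothetical, and the horizontal move used here is a simple population-level Metropolis-Hastings replacement step chosen for brevity.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a hypothetical bimodal target (modes at -3, +3)."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_proposal(x, std):
    """Log-density, up to a constant, of the independent proposal N(0, std^2)."""
    return -0.5 * (x / std) ** 2


def vertical_step(x, step, rng):
    """One random-walk Metropolis update of a single chain."""
    y = x + rng.gauss(0.0, step)
    log_ratio = log_target(y) - log_target(x)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return y
    return x


def horizontal_step(pop, std, rng):
    """Population-level MH move: draw z independently of the current states,
    pick one state j with probability proportional to proposal/target, and
    replace it by z with the Metropolis-Hastings acceptance probability."""
    z = rng.gauss(0.0, std)
    log_w = [log_proposal(x, std) - log_target(x) for x in pop + [z]]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]  # rescaled weights, all <= 1
    s_old = sum(w[:-1])  # weight sum over the current population
    if s_old == 0.0:
        return  # acceptance probability is numerically zero
    j = rng.choices(range(len(pop)), weights=w[:-1])[0]
    # Weight sum over the candidate population (state j swapped for z).
    s_new = sum(w[i] for i in range(len(w)) if i != j)
    if s_new <= s_old or rng.random() < s_old / s_new:
        pop[j] = z


def o_mcmc(n_chains=8, n_epochs=500, t_vertical=10, t_horizontal=5,
           rw_step=1.0, prop_std=5.0, seed=0):
    """Alternate vertical and horizontal phases; needs n_chains >= 2."""
    rng = random.Random(seed)
    # Deliberately spread-out starting points in [-10, 10] (an assumption).
    pop = [-10.0 + 20.0 * i / (n_chains - 1) for i in range(n_chains)]
    samples = []
    for _ in range(n_epochs):
        for _ in range(t_vertical):
            pop = [vertical_step(x, rw_step, rng) for x in pop]
        for _ in range(t_horizontal):
            horizontal_step(pop, prop_std, rng)
        samples.extend(pop)
    return samples


if __name__ == "__main__":
    draws = o_mcmc()
    print("fraction of draws in the right-hand mode:",
          sum(d > 0 for d in draws) / len(draws))
```

In this sketch, only the horizontal phase lets chains interact: the replacement index is chosen from the whole population, so poorly placed states are the ones most likely to be swapped for the independent draw.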
Data-driven Soft Sensors in the Process Industry
In the last two decades, Soft Sensors have established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks related to process control. This paper discusses characteristics of process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, such as the chemical industry, bioprocess industry and steel industry. The focus of this work is on data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though not yet completely realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.
From Data Fusion to Knowledge Fusion
The task of data fusion is to identify the true values of data items
(e.g., the true date of birth for Tom Cruise) among multiple observed
values drawn from different sources (e.g., Web sites) of varying (and unknown)
reliability. A recent survey [LDL+12] has provided a detailed comparison of
various fusion methods on Deep Web data. In this paper, we study the
applicability and limitations of different fusion techniques on a more
challenging problem: knowledge fusion. Knowledge fusion identifies true
subject-predicate-object triples extracted by multiple information extractors
from multiple information sources. These extractors perform the tasks of entity
linkage and schema alignment, thus introducing an additional source of noise
that is quite different from that traditionally considered in the data fusion
literature, which only focuses on factual errors in the original sources. We
adapt state-of-the-art data fusion techniques and apply them to a knowledge
base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B
Web pages, which is three orders of magnitude larger than the data sets used in
previous data fusion papers. We show great promise of the data fusion
approaches in solving the knowledge fusion problem, and suggest interesting
research directions through a detailed error analysis of the methods.
Comment: VLDB'201
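To make the data fusion task described in the abstract above concrete, here is a minimal sketch of one very simple approach: an iterative vote in which each source's vote is weighted by its estimated accuracy, and accuracies are re-estimated from agreement with the current winners. This is an illustrative assumption, not one of the methods evaluated in the paper. The function name `fuse`, the starting accuracy, and the toy claims are all hypothetical.

```python
from collections import defaultdict


def fuse(claims, n_iters=10, init_accuracy=0.8):
    """Iterative accuracy-weighted voting.

    claims: list of (source, item, value) tuples.
    Returns (truths, accuracy): the winning value per item and the
    estimated accuracy per source.
    """
    sources = {s for s, _, _ in claims}
    accuracy = {s: init_accuracy for s in sources}
    truths = {}
    for _ in range(n_iters):
        # Step 1: score each candidate value by the summed accuracy of
        # the sources that claim it.
        scores = defaultdict(lambda: defaultdict(float))
        for s, item, value in claims:
            scores[item][value] += accuracy[s]
        # Ties are broken by sorted order of the values, for determinism.
        truths = {item: max(sorted(vals), key=vals.get)
                  for item, vals in scores.items()}
        # Step 2: re-estimate each source's accuracy as the fraction of
        # its claims that match the current winners.
        hits, total = defaultdict(int), defaultdict(int)
        for s, item, value in claims:
            total[s] += 1
            hits[s] += int(truths[item] == value)
        accuracy = {s: hits[s] / total[s] for s in sources}
    return truths, accuracy


# Hypothetical toy data: s1 and s2 agree with each other on a, b and d;
# s3 and s4 outvote s1 on item c but are isolated everywhere else.
claims = [
    ("s1", "a", "x"), ("s2", "a", "x"), ("s3", "a", "y"), ("s4", "a", "z"),
    ("s1", "b", "p"), ("s2", "b", "p"), ("s3", "b", "q"), ("s4", "b", "r"),
    ("s1", "c", "m"), ("s3", "c", "n"), ("s4", "c", "n"),
    ("s1", "d", "u"), ("s2", "d", "u"), ("s3", "d", "v"), ("s4", "d", "w"),
]

if __name__ == "__main__":
    truths, accuracy = fuse(claims)
    print(truths)
    print(accuracy)
```

With a single iteration the scheme reduces to a plain majority vote, which picks "n" for item c in the toy data. Further iterations down-weight s3 and s4 and flip that item to "m". The sketch also shows a known weakness of such self-reinforcing schemes: estimated accuracies can collapse to 0 or 1 on small inputs.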