Signal extraction and knowledge discovery based on statistical modeling
In the coming post-IT era, the problems of signal extraction and knowledge discovery from huge data sets will become very important. For these problems, the use of a good model is crucial, and thus statistical modeling will play an important role. In this paper, we present two basic tools for statistical modeling, namely information criteria for evaluating statistical models and the generic state-space model, which provides a very flexible tool for modeling complex and time-varying systems. As examples of these methods, we show some applications in seismology and macroeconomics.
Communication Theoretic Data Analytics
Widespread use of the Internet and social networks invokes the generation of
big data, which is proving to be useful in a number of applications. To deal
with explosively growing amounts of data, data analytics has emerged as a
critical technology related to computing, signal processing, and information
networking. In this paper, a formalism is considered in which data is modeled
as a generalized social network and communication theory and information theory
are thereby extended to data analytics. First, the creation of an equalizer to
optimize information transfer between two data variables is considered, and
financial data is used to demonstrate the advantages. Then, an information
coupling approach based on information geometry is applied for dimensionality
reduction, with a pattern recognition example to illustrate the effectiveness.
These initial trials suggest the potential of communication theoretic data
analytics for a wide range of applications.
Comment: Published in IEEE Journal on Selected Areas in Communications, Jan. 201
An Overview of the Use of Neural Networks for Data Mining Tasks
In recent years, the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular, biologically inspired intelligent methodologies whose classification, prediction, and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NN using supervised and unsupervised learning for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks.
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.
Comment: 28 pages, 9 figures
Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data
Image data are increasingly encountered and are of growing importance in many
areas of science. Much of these data are quantitative image data, which are
characterized by intensities that represent some measurement of interest in the
scanned images. The data typically consist of multiple images on the same
domain and the goal of the research is to combine the quantitative information
across images to make inference about populations or interventions. In this
paper we present a unified analysis framework for the analysis of quantitative
image data using a Bayesian functional mixed model approach. This framework is
flexible enough to handle complex, irregular images with many local features,
and can model the simultaneous effects of multiple factors on the image
intensities and account for the correlation between images induced by the
design. We introduce a general isomorphic modeling approach to fitting the
functional mixed model, of which the wavelet-based functional mixed model is
one special case. With suitable modeling choices, this approach leads to
efficient calculations and can result in flexible modeling and adaptive
smoothing of the salient features in the data. The proposed method has the
following advantages: it can be run automatically, it produces inferential
plots indicating which regions of the image are associated with each factor, it
simultaneously considers the practical and statistical significance of
findings, and it controls the false discovery rate.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS407 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)