Statistical Approaches for Binary and Categorical Data Modeling
Nowadays, a massive amount of data is generated as technology and services develop at an accelerating pace. The demand for data clustering as a means of gaining knowledge has therefore increased in many sectors, such as medical sciences, risk assessment and product sales. Binary data, in particular, is widely used in applications including market basket data and text document analysis. Since the classic, widely used k-means method is inappropriate for clustering binary data, we propose an improvement of the K-medoids algorithm that uses binary similarity measures instead of the Euclidean distance generally deployed in clustering algorithms. In addition to the K-medoids clustering method, agglomerative hierarchical clustering methods based on Gaussian probability models have recently been shown to be efficient in different applications. However, the emergence of pattern recognition applications whose features are binary or integer-valued demands that research efforts be extended to such data types. We therefore propose a hierarchical clustering framework for clustering categorical data based on Multinomial and Bernoulli mixture models, and we compare two widely used density-based distances, namely the Bhattacharyya distance and the Kullback-Leibler divergence. The merits of our proposed clustering frameworks are demonstrated through extensive experiments on text clustering, binary image categorization and general image categorization.
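As a rough illustration of the first idea, a K-medoids loop in which the usual Euclidean distance is swapped for a binary similarity measure might look as follows. The Jaccard distance is used here as an example measure; the choice of measure, the naive initialisation and all names below are illustrative assumptions, not the authors' implementation.

```python
def jaccard_distance(a, b):
    # a, b: equal-length 0/1 tuples; 0.0 = identical supports, 1.0 = disjoint
    union = sum(x or y for x, y in zip(a, b))
    inter = sum(x and y for x, y in zip(a, b))
    return 0.0 if union == 0 else 1.0 - inter / union

def k_medoids(points, k, n_iter=50):
    medoids = list(points[:k])  # naive initialisation for the sketch
    clusters = []
    for _ in range(n_iter):
        # assignment step: each point joins its nearest medoid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: jaccard_distance(p, medoids[i]))
            clusters[j].append(p)
        # update step: the new medoid minimises total distance within its cluster
        new_medoids = []
        for j, cl in enumerate(clusters):
            if not cl:
                new_medoids.append(medoids[j])
                continue
            new_medoids.append(min(
                cl, key=lambda c: sum(jaccard_distance(c, p) for p in cl)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters
```

Because the medoid is always an actual data point, only a pairwise binary distance is ever needed, which is what makes this substitution straightforward.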
The development of generative and discriminative approaches for classifying different kinds of data has attracted scholars' attention. Considering the strengths and weaknesses of both, several hybrid learning approaches have been developed that combine their desirable properties. Our contribution is to combine Support Vector Machines (SVMs) with the Bernoulli mixture model in order to classify binary data. We propose using the Bernoulli mixture model to generate probabilistic kernels for the SVM based on information divergence. These kernels make intelligent use of unlabeled binary data to achieve good data discrimination. We evaluate the proposed hybrid learning approach by classifying binary and texture images.
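One way such a divergence-based kernel can be sketched is via the symmetrised Kullback-Leibler divergence between vectors of Bernoulli parameters, exponentiated into a similarity. The smoothing constant and all function names below are illustrative assumptions, not the authors' exact construction.

```python
import math

def smooth(x, eps=0.05):
    # map a binary vector to Bernoulli parameters bounded away from {0, 1}
    return [eps if xi == 0 else 1.0 - eps for xi in x]

def sym_kl_bernoulli(p, q):
    # symmetrised KL divergence between Bernoulli parameter vectors:
    # sum_i (p_i - q_i) * (log(p_i / q_i) - log((1 - p_i) / (1 - q_i)))
    return sum((pi - qi) * (math.log(pi / qi) - math.log((1 - pi) / (1 - qi)))
               for pi, qi in zip(p, q))

def divergence_kernel(x, y):
    # similarity in (0, 1]; identical inputs give exactly 1
    return math.exp(-sym_kl_bernoulli(smooth(x), smooth(y)))
```

A Gram matrix built from such a kernel can then be handed to any SVM implementation that accepts precomputed kernels; whether this matches the paper's exact kernel is an assumption.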
The development of artificial neural networks for the analysis of market research and electronic nose data
This thesis details research carried out into the application of unsupervised neural
network and statistical clustering techniques to market research interview survey
analysis. The objective of the research was to develop mathematical mechanisms to
locate and quantify internal clusters within the data sets with definite commonality.
As the data sets being used were binary, this commonality was expressed in terms of
identical question answers. Unsupervised neural network paradigms are investigated,
along with statistical clustering techniques. The theory of clustering in a binary space
is also examined.
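As a minimal sketch of that notion of commonality (an illustrative helper, not taken from the thesis), two respondents' binary answer vectors can be scored by the fraction of identically answered questions:

```python
def commonality(a, b):
    # fraction of questions answered identically (1.0 = perfect commonality)
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)
```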
Attempts to improve the clarity of output of Self-Organising Maps (SOM) consisted
of several stages of investigation, culminating in the conception of the Interrogative
Memory Structure (IMS). IMS proved easy to use, fast in operation and consistently
produced results with the highest degree of commonality when tested against SOM,
Adaptive Resonance Theory (ART1) and FASTCLUS. ART1 performed well when
clusters were measured using general metrics. During the course of the research a
supervised technique, the Vector Memory Array (VMA), was developed. VMA was
tested against Back Propagation (BP) (using data sets provided by the Warwick
electronic nose project) and consistently produced higher classification accuracies.
The main advantage of VMA is its speed of operation: in testing it produced results
in minutes compared to hours for the BP method, giving speed increases in the
region of 100:1.
U.S. stock market interaction network as learned by the Boltzmann Machine
We study the historical dynamics of the joint equilibrium distribution of stock
returns in the U.S. stock market using a Boltzmann distribution model
parametrized by external fields and pairwise couplings. Within the Boltzmann
learning framework for statistical inference, we analyze the historical behavior of
the parameters inferred using exact and approximate learning algorithms. Since
the model and inference methods require the use of binary variables, the effect of
mapping continuous returns to the discrete domain is studied. The presented
analysis shows that binarization preserves the market correlation structure.
Properties of the distributions of external fields and couplings, as well as the
industry sector clustering structure, are studied for different historical dates
and moving window sizes. We find that a heavy positive tail in the
distribution of couplings is responsible for the sparse market clustering
structure. We also show that discrepancies between the model parameters might
be used as a precursor of financial instabilities.
Comment: 15 pages, 17 figures, 1 table
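The binarization step, and the statistics it feeds into Boltzmann learning, can be sketched as follows. This is a toy illustration under our own conventions, not the paper's code.

```python
def binarize(returns):
    # map continuous returns to Ising spins: +1 for a positive return, -1 otherwise
    return [1 if r > 0 else -1 for r in returns]

def moments(spins):
    # empirical <s_i> and <s_i s_j>: the sufficient statistics that Boltzmann
    # learning matches against the model's external fields and pairwise couplings
    t = len(spins[0])
    m = [sum(s) / t for s in spins]
    c = [[sum(a * b for a, b in zip(si, sj)) / t for sj in spins]
         for si in spins]
    return m, c
```

Comparing the matrix of <s_i s_j> before and after binarization is one simple way to check that the mapping preserves the correlation structure.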
The multiplex structure of interbank networks
The interbank market has a natural multiplex network representation. We
employ a unique database of supervisory reports of Italian banks to the Banca
d'Italia that includes all bilateral exposures broken down by maturity and by
the secured and unsecured nature of the contract. We find that layers have
different topological properties and persistence over time. The presence of a
link in a layer is not a good predictor of the presence of the same link in
other layers. Maximum entropy models reveal different unexpected substructures,
such as network motifs, in different layers. Using the total interbank network
or focusing on a specific layer as representative of the other layers provides
a poor representation of interlinkages in the interbank market and could lead
to biased estimation of systemic risk.
Comment: 41 pages, 8 figures, 10 tables
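The claim that a link in one layer is a poor predictor of the same link in another can be quantified, for instance, by the Jaccard overlap of two layers' edge sets. This is an illustrative measure; the paper's own statistics may differ.

```python
def edge_jaccard(layer_a, layer_b):
    # layers given as collections of directed (lender, borrower) pairs;
    # 1.0 means identical edge sets, values near 0 mean little overlap
    a, b = set(layer_a), set(layer_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```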
Epidemics of Liquidity Shortages in Interbank Markets
Financial contagion from liquidity shocks has recently been identified as a
prominent driver of systemic risk in interbank lending markets. Building on the
standard compartment models used in epidemiology, in this work we develop an EDB
(Exposed-Distressed-Bankrupted) model for the dynamics of liquidity-shock
reverberation between banks, and validate it on data from an electronic market
for interbank deposits. We show that the interbank network was highly susceptible
to liquidity contagion at the beginning of the 2007/2008 global financial crisis,
and that the subsequent micro-prudential and liquidity hoarding policies
adopted by banks increased the network's resilience to systemic risk, yet with
the undesired side effect of drying out liquidity from the market. We finally
show that the individual riskiness of a bank is better captured by its network
centrality than by its participation in the market, in line with the currently
debated concept of "too interconnected to fail".
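A deliberately simplified, deterministic sketch of an Exposed-Distressed-Bankrupted cascade on an exposure network is shown below. The paper's actual model is stochastic and calibrated to market data, so every rule here (one-step failure, deterministic shock transmission, the data layout) is an illustrative assumption.

```python
def edb_cascade(neighbors, seeds, steps=10):
    # neighbors: dict mapping each bank to the banks it can distress
    # (e.g. its exposed creditors); seeds: initially distressed banks
    state = {b: "E" for b in neighbors}          # E = Exposed
    for s in seeds:
        state[s] = "D"                           # D = Distressed
    for _ in range(steps):
        nxt = dict(state)
        for b, st in state.items():
            if st == "D":
                nxt[b] = "B"                     # B = Bankrupted next step
                for c in neighbors[b]:           # ...after shocking exposed banks
                    if nxt[c] == "E":
                        nxt[c] = "D"
        if nxt == state:
            break
        state = nxt
    return state
```

Even this toy version reproduces the qualitative point of the abstract: how far a shock spreads depends on the seed bank's position in the network, not merely on its presence in the market.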
Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations
The mesoscopic organization of complex systems, from financial markets to the
brain, is intermediate between the microscopic dynamics of individual units
(stocks or neurons, in the cases mentioned) and the macroscopic dynamics of
the system as a whole. The organization is determined by "communities" of units
whose dynamics, represented by time series of activity, is more strongly
correlated internally than with the rest of the system. Recent studies have
shown that the binary projections of various financial and neural time series
exhibit nontrivial dynamical features that resemble those of the original data.
This implies that a significant piece of information is encoded into the binary
projection (i.e. the sign) of such increments. Here, we explore whether the
binary signatures of multiple time series can replicate the same complex
community organization of the financial market, as the original weighted time
series. We adopt a method that has been specifically designed to detect
communities from cross-correlation matrices of time series data. Our analysis
shows that the simpler binary representation leads to a community structure
that is almost identical with that obtained using the full weighted
representation. These results confirm that binary projections of financial time
series contain significant structural information.
Comment: 15 pages, 7 figures
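The binary projection described above can be sketched in a few lines: each return series is replaced by its sign, and the cross-correlation matrix (the input to the community-detection step) is computed from the projections. The helper below is an assumed illustration, not the paper's method.

```python
import numpy as np

def sign_corr_matrix(returns):
    # returns: (n_series, n_times) array; project each series onto its sign
    # and compute the cross-correlation matrix of the binary projections
    return np.corrcoef(np.sign(returns))
```

Comparing this matrix with `np.corrcoef(returns)` on the original weighted series is the kind of check that underlies the paper's conclusion that the two lead to nearly identical community structures.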
A review of electricity load profile classification methods
With the liberalisation of the electricity market in Indonesia, electricity companies will have the right to set tariff rates independently. Precise knowledge of customers' load profile classifications will therefore become essential for designing a variety of tariff options, in which the rates support efficient revenue generation and encourage optimum take-up of the available electricity supplies by various types of customers. Since the early days of the liberalisation of the Electricity Supply Industries (ESI), considerable efforts have been made to investigate methodologies for forming optimal tariffs based on customer classes derived from various clustering and classification techniques. Clustering techniques are analytical processes used to develop groups (classes) of customers based on their behaviour, to derive representative sets of load profiles, and to help build models of daily load shapes. Classification techniques, by contrast, start by analysing load demand data (LDD) from various customers and then identify the groups into which these customers' LDD fall. In this paper we review some of the popular clustering algorithms and explain the differences between them.
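A minimal sketch of the clustering step described above, under our own assumptions (peak normalisation, plain k-means with naive initialisation; none of this is taken from the reviewed methods verbatim):

```python
def normalise(profile):
    # scale a daily load profile by its peak demand
    peak = max(profile)
    return [v / peak for v in profile]

def kmeans(profiles, k, n_iter=20):
    centroids = [list(p) for p in profiles[:k]]   # naive initialisation
    assign = [0] * len(profiles)
    for _ in range(n_iter):
        # assign each profile to its nearest centroid (squared Euclidean)
        for i, p in enumerate(profiles):
            assign[i] = min(range(k), key=lambda j: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[j])))
        # each centroid becomes the mean profile of its class:
        # a candidate "representative load profile" for tariff design
        for j in range(k):
            members = [p for i, p in enumerate(profiles) if assign[i] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids
```

The returned centroids play the role of the representative load shapes mentioned above, while the assignments define the candidate customer classes.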