
    Statistical Approaches for Binary and Categorical Data Modeling

Nowadays, a massive amount of data is generated as technology and services develop at an accelerating pace, and the demand for data clustering as a means of extracting knowledge has grown in many sectors, including medical science, risk assessment and product sales. Binary data, in particular, is widely used in applications such as market-basket analysis and text-document analysis. Since the classic and widely used k-means method is inappropriate for clustering binary data, we propose an improvement of the K-medoids algorithm that uses binary similarity measures in place of the Euclidean distance generally deployed in clustering algorithms. Beyond K-medoids, agglomerative hierarchical clustering methods based on Gaussian probability models have recently been shown to be efficient in several applications. However, the emergence of pattern recognition applications whose features are binary or integer-valued demands that research efforts be extended to such data types. We propose a hierarchical clustering framework for clustering categorical data based on Multinomial and Bernoulli mixture models, and compare two widely used density-based distances, namely the Bhattacharyya and Kullback-Leibler distances. The merits of our proposed clustering frameworks are demonstrated through extensive experiments on text clustering, binary image categorization and image categorization. Finally, the development of generative/discriminative approaches for classifying different kinds of data has attracted scholars' attention, and several hybrid learning approaches combining the desirable properties of both have been developed. Our contribution is to combine Support Vector Machines (SVMs) with the Bernoulli mixture model to classify binary data: we propose using the Bernoulli mixture model to generate probabilistic kernels for the SVM based on information divergence. These kernels make intelligent use of unlabeled binary data to achieve good data discrimination. We evaluate the proposed hybrid learning approach by classifying binary and texture images.
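To make the first idea concrete, here is a minimal sketch of K-medoids driven by a binary similarity measure rather than Euclidean distance. The abstract does not specify which measure the authors adopt; Jaccard distance, the simple alternating update scheme, and the toy market-basket data are all illustrative assumptions.

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard distance between two binary vectors: 1 - |A∩B| / |A∪B|."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

def k_medoids(X, k, n_iter=100, seed=0):
    """Basic alternating K-medoids on a precomputed binary distance matrix."""
    n = len(X)
    D = np.array([[jaccard_distance(X[i], X[j]) for j in range(n)]
                  for i in range(n)])
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            # pick the member minimising total intra-cluster distance
            costs = D[np.ix_(members, members)].sum(axis=0)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

# toy market-basket style binary data: rows = transactions, cols = items
X = np.array([[1,1,0,0],[1,1,1,0],[0,0,1,1],[0,1,1,1],[1,0,0,0]], dtype=bool)
medoids, labels = k_medoids(X, k=2)
print(medoids, labels)
```

Because medoids are actual data points and all comparisons go through the distance matrix, any binary similarity (Dice, simple matching, etc.) can be swapped in without touching the rest of the algorithm, which is precisely what makes K-medoids a better fit than k-means for binary data.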

    The development of artificial neural networks for the analysis of market research and electronic nose data

This thesis details research into the application of unsupervised neural network and statistical clustering techniques to the analysis of market research interview surveys. The objective of the research was to develop mathematical mechanisms to locate and quantify internal clusters with definite commonality within the data sets. As the data sets were binary, this commonality was expressed in terms of identical question answers. Unsupervised neural network paradigms are investigated, along with statistical clustering techniques, and the theory of clustering in a binary space is also examined. Attempts to improve the clarity of the output of Self-Organising Maps (SOM) progressed through several stages of investigation, culminating in the conception of the Interrogative Memory Structure (IMS). IMS proved easy to use and fast in operation, and consistently produced results with the highest degree of commonality when tested against SOM, Adaptive Resonance Theory (ART1) and FASTCLUS. ART1 performed well when clusters were measured using general metrics. During the course of the research a supervised technique, the Vector Memory Array (VMA), was also developed. VMA was tested against Back Propagation (BP), using data sets provided by the Warwick electronic nose project, and consistently produced higher classification accuracies. The main advantage of VMA is its speed of operation: in testing it produced results in minutes rather than the hours required by the BP method, a speed increase in the region of 100:1.
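The IMS and VMA techniques themselves are not specified in this abstract, but the evaluation criterion, commonality as identical question answers, is easy to pin down. A minimal sketch of one plausible way to score it: for each cluster, count the fraction of questions on which every member gives the same answer. The function name and toy survey data are illustrative assumptions.

```python
import numpy as np

def cluster_commonality(X, labels):
    """For each cluster of binary survey answers, report the fraction of
    questions on which every member gives an identical answer."""
    out = {}
    for c in np.unique(labels):
        members = X[labels == c]
        # a question counts as 'common' if all members answer 1 or all answer 0
        identical = np.logical_or(members.all(axis=0), (~members).all(axis=0))
        out[c] = identical.mean()
    return out

# toy binary survey: 4 respondents x 4 yes/no questions, two clusters
X = np.array([[1,0,1,1],[1,0,1,0],[0,1,0,0],[0,1,1,0]], dtype=bool)
labels = np.array([0, 0, 1, 1])
print(cluster_commonality(X, labels))   # {0: 0.75, 1: 0.75}
```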

    U.S. stock market interaction network as learned by the Boltzmann Machine

We study the historical dynamics of the joint equilibrium distribution of stock returns in the U.S. stock market using a Boltzmann distribution model parametrized by external fields and pairwise couplings. Within the Boltzmann learning framework for statistical inference, we analyze the historical behavior of the parameters inferred using exact and approximate learning algorithms. Since the model and inference methods require binary variables, the effect of mapping continuous returns to the discrete domain is studied; the analysis shows that binarization preserves the market correlation structure. Properties of the distributions of external fields and couplings, as well as the industry-sector clustering structure, are studied for different historical dates and moving-window sizes. We find that a heavy positive tail in the distribution of couplings is responsible for the sparse market clustering structure, and we show that discrepancies between the model parameters might be used as a precursor of financial instabilities.
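The pipeline described here, binarize returns, then infer fields and couplings, can be sketched with one standard approximate inverse-Ising scheme, the naive mean-field (nMF) inversion, in which couplings come from the inverse connected-correlation matrix, J = -C⁻¹, and fields from the self-consistency condition hᵢ = artanh(mᵢ) - Σⱼ Jᵢⱼmⱼ. This is only one of several approximate algorithms the abstract alludes to, and the synthetic returns below are not market data.

```python
import numpy as np

def binarize_returns(returns):
    """Map continuous returns to Ising spins s ∈ {-1, +1} by their sign."""
    return np.where(returns >= 0, 1, -1)

def naive_mean_field_ising(S):
    """Approximate inverse-Ising inference via the naive mean-field scheme:
    couplings from the inverse correlation matrix, fields from the
    mean-field self-consistency condition."""
    m = S.mean(axis=0)                    # magnetisations <s_i>
    C = np.cov(S, rowvar=False)           # connected correlations
    J = -np.linalg.inv(C)                 # nMF couplings
    np.fill_diagonal(J, 0.0)
    h = np.arctanh(np.clip(m, -0.999, 0.999)) - J @ m   # nMF fields
    return h, J

rng = np.random.default_rng(1)
returns = rng.normal(size=(250, 5))       # 250 days x 5 synthetic stocks
S = binarize_returns(returns)
h, J = naive_mean_field_ising(S)
print(J.round(3))
```

On real data, the heavy positive tail the paper reports would show up as a few large positive entries of J linking stocks in the same sector.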

    The multiplex structure of interbank networks

The interbank market has a natural multiplex network representation. We employ a unique database of supervisory reports of Italian banks to the Banca d'Italia that includes all bilateral exposures broken down by maturity and by the secured and unsecured nature of the contract. We find that layers have different topological properties and persistence over time. The presence of a link in a layer is not a good predictor of the presence of the same link in other layers. Maximum entropy models reveal different unexpected substructures, such as network motifs, in different layers. Using the total interbank network or focusing on a specific layer as representative of the other layers provides a poor representation of interlinkages in the interbank market and could lead to biased estimation of systemic risk.
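The claim that a link in one layer poorly predicts the same link in another can be quantified with the Jaccard overlap of the two layers' link sets. This is a minimal sketch on randomly generated layers; the layer names and the 5% link density are illustrative assumptions, not values from the supervisory data.

```python
import numpy as np

def layer_link_overlap(A, B):
    """Jaccard overlap of the link sets of two unweighted directed layers:
    |E_A ∩ E_B| / |E_A ∪ E_B|. Low values mean a link in one layer is a
    poor predictor of the same link in the other."""
    a, b = A.astype(bool), B.astype(bool)
    np.fill_diagonal(a, False)          # ignore self-loops
    np.fill_diagonal(b, False)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

rng = np.random.default_rng(0)
n = 50
secured = rng.random((n, n)) < 0.05     # hypothetical 'secured' layer
unsecured = rng.random((n, n)) < 0.05   # hypothetical 'unsecured' layer
print(layer_link_overlap(secured, unsecured))  # ≈ 0.03 for independent layers
```

For two independent layers of density p the expected overlap is roughly p/(2 - p); overlaps near that baseline in real data would support the paper's finding that layers carry largely independent information.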

    Epidemics of Liquidity Shortages in Interbank Markets

Financial contagion from liquidity shocks has recently been identified as a prominent driver of systemic risk in interbank lending markets. Building on the standard compartment models used in epidemiology, in this work we develop an EDB (Exposed-Distressed-Bankrupted) model for the dynamics of liquidity-shock reverberation between banks, and validate it on data from an electronic market for interbank deposits. We show that the interbank network was highly susceptible to liquidity contagion at the beginning of the 2007/2008 global financial crisis, and that the micro-prudential and liquidity-hoarding policies subsequently adopted by banks increased the network's resilience to systemic risk, albeit with the undesired side effect of drying out liquidity from the market. Finally, we show that the individual riskiness of a bank is better captured by its network centrality than by its participation in the market, in line with the currently debated concept of "too interconnected to fail".
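A compartment model of this kind can be sketched as a discrete-time simulation on a weighted exposure network. The transition rules below (distress probability growing with exposure to distressed neighbours, a fixed bankruptcy rate, and the parameters beta and gamma) are illustrative assumptions in the spirit of SIR-style models, not the paper's calibrated EDB dynamics.

```python
import numpy as np

# States: 0 = Exposed, 1 = Distressed, 2 = Bankrupted.
def simulate_edb(W, beta=0.5, gamma=0.2, steps=20, seed=0):
    """Toy EDB-style contagion on a weighted interbank exposure matrix W,
    where W[i, j] is bank i's exposure to bank j (hypothetical rules)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    state = np.zeros(n, dtype=int)
    state[rng.integers(n)] = 1                  # one initially distressed bank
    history = [state.copy()]
    for _ in range(steps):
        distressed = state == 1
        # liquidity pressure on each bank from its distressed counterparties
        pressure = W[:, distressed].sum(axis=1)
        p_distress = 1.0 - np.exp(-beta * pressure)
        new_distress = (state == 0) & (rng.random(n) < p_distress)
        new_bankrupt = distressed & (rng.random(n) < gamma)
        state[new_distress] = 1
        state[new_bankrupt] = 2
        history.append(state.copy())
    return np.array(history)

rng = np.random.default_rng(42)
W = rng.random((30, 30)) * (rng.random((30, 30)) < 0.1)  # sparse exposures
hist = simulate_edb(W)
print((hist[-1] == 2).sum(), "banks bankrupted")
```

Running such a simulation from each seed bank in turn is one way to see the paper's point about centrality: seeds with high network centrality typically trigger far larger cascades than seeds chosen by market participation alone.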

    Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations

The mesoscopic organization of complex systems, from financial markets to the brain, is intermediate between the microscopic dynamics of individual units (stocks or neurons, in these cases) and the macroscopic dynamics of the system as a whole. This organization is determined by "communities" of units whose dynamics, represented by time series of activity, are more strongly correlated internally than with the rest of the system. Recent studies have shown that the binary projections of various financial and neural time series exhibit nontrivial dynamical features that resemble those of the original data, implying that a significant amount of information is encoded in the binary projection (i.e. the sign) of their increments. Here, we explore whether the binary signatures of multiple time series can replicate the same complex community organization of the financial market as the original weighted time series. We adopt a method specifically designed to detect communities from cross-correlation matrices of time series data. Our analysis shows that the simpler binary representation leads to a community structure that is almost identical to that obtained using the full weighted representation. These results confirm that binary projections of financial time series contain significant structural information.
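The core comparison, communities from weighted series versus their sign projections, can be sketched with a generic correlation-based clustering as a stand-in for the specialised correlation-matrix method the paper adopts. The factor model generating the synthetic "sectors", the distance d = sqrt(2(1 - rho)), and average-linkage clustering are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
T, n = 500, 8
factors = rng.normal(size=(T, 2))            # two hidden 'sector' factors
sector = np.repeat([0, 1], n // 2)
X = factors[:, sector] + rng.normal(size=(T, n))  # synthetic correlated returns

def communities(series, k=2):
    """Cluster time series from their cross-correlation matrix using the
    standard distance d_ij = sqrt(2(1 - rho_ij)) and average linkage."""
    rho = np.corrcoef(series, rowvar=False)
    d = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))
    np.fill_diagonal(d, 0.0)
    Z = linkage(squareform(d, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")

print(communities(X))             # weighted series
print(communities(np.sign(X)))    # binary (sign) projection
```

If the paper's conclusion carries over, the two printed partitions should agree: the sign projection alone recovers the sector structure.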

    A review of electricity load profile classification methods

With the liberalisation of the electricity market in Indonesia, electricity companies will have the right to develop tariff rates independently. Precise knowledge of customers' load profile classifications will therefore become essential for designing a variety of tariff options in which the rates support efficient revenue generation and encourage optimum take-up of the available electricity supplies by the various types of customer. Since the early days of the liberalisation of the Electricity Supply Industries (ESI), considerable effort has been made to investigate methodologies for forming optimal tariffs based on customer classes derived from various clustering and classification techniques. Clustering techniques are analytical processes used to develop groups (classes) of customers based on their behaviour, to derive representative sets of load profiles, and to help build models of daily load shapes. Classification techniques, in contrast, are processes that start by analysing load demand data (LDD) from various customers and then identify the groups into which these customers' LDD fall. In this paper we review some of the popular clustering algorithms and explain the differences between the methods.
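As a minimal sketch of the clustering step described above, the example below applies k-means to normalised daily load profiles so that each cluster centre becomes a representative load shape for a customer class. The two synthetic profile shapes (an evening residential peak and a daytime commercial plateau) and all parameter values are illustrative assumptions, not data from the review.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
hours = np.arange(24)
# synthetic daily load shapes for two hypothetical customer classes
residential = 1.0 + 0.8 * np.exp(-((hours - 19) ** 2) / 8)    # evening peak
commercial = 1.0 + 0.8 * np.exp(-((hours - 13) ** 2) / 18)    # daytime plateau
profiles = np.vstack([
    residential + rng.normal(0, 0.05, (100, 24)),
    commercial + rng.normal(0, 0.05, (100, 24)),
])
# normalise each profile to unit daily energy so clusters reflect shape,
# not consumption volume
profiles /= profiles.sum(axis=1, keepdims=True)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
# each cluster centre is a representative load shape for a customer class
print(km.cluster_centers_.round(3))
```

Normalising before clustering is a common design choice in this literature: it separates "when" a customer consumes from "how much", which is what shape-based tariff design needs.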