303,453 research outputs found
Sphere-sphere intersection for investment portfolio diversification - A new data-driven cluster analysis.
To support investment portfolio diversification with a data-driven approach, this methodological paper proposes a new cluster analysis that compares publicly traded companies, mainly in times of high volatility (e.g. crisis times). The main goal of the proposed method is to give financial investors a less arbitrary way to precisely measure the degree of similarity between equity stocks, unveiling equity market clustering patterns by applying analytic geometry solutions and calculating an overall clustering pattern indicator. Empirical results on synthetic data demonstrate both the conceptual superiority of the proposed method over traditional cluster analyses and its potential practical usefulness for asset allocation, portfolio strategy, and asset pricing, among other related purposes. Finally, the outputs of the proposed cluster analysis are presented through an intuitive and easily understandable mathematical visualization.
• A new method is proposed to calculate risk-similarity and clustering patterns.
• The method unveils clustering patterns through a data-driven process.
• Portfolio diversification can benefit from sphere-sphere intersection calculations
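The abstract does not say how stocks are mapped to spheres, so the following is only a minimal sketch of the underlying geometry. It uses the standard closed-form volume of the lens shared by two spheres. The function names and the idea of normalising the overlap by the smaller sphere's volume are illustrative assumptions, not the paper's indicator.

```python
import math


def sphere_overlap_volume(r1, r2, d):
    """Volume shared by two spheres with radii r1, r2 and centre distance d."""
    if r1 <= 0 or r2 <= 0 or d < 0:
        raise ValueError("radii must be positive and distance non-negative")
    if d >= r1 + r2:
        return 0.0  # spheres are disjoint or only touch
    if d <= abs(r1 - r2):
        # the smaller sphere lies entirely inside the larger one
        return 4.0 / 3.0 * math.pi * min(r1, r2) ** 3
    # standard lens-volume formula for partially overlapping spheres
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2) / (12 * d))


def overlap_similarity(r1, r2, d):
    """Hypothetical similarity score in [0, 1]: overlap over smaller sphere."""
    smaller = 4.0 / 3.0 * math.pi * min(r1, r2) ** 3
    return sphere_overlap_volume(r1, r2, d) / smaller
```

Under this assumed normalisation, a score of 0 means the two spheres do not overlap and a score of 1 means one sphere contains the other.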
A hierarchical Mamdani-type fuzzy modelling approach with new training data selection and multi-objective optimisation mechanisms: A special application for the prediction of mechanical properties of alloy steels
In this paper, a systematic data-driven fuzzy modelling methodology is proposed that allows Mamdani fuzzy models to be constructed with regard to both the accuracy (precision) and the transparency (interpretability) of fuzzy systems. The new methodology employs a fast hierarchical clustering algorithm to generate an initial fuzzy model efficiently; a training data selection mechanism is developed to identify appropriate and efficient data as learning samples; a high-performance Particle Swarm Optimisation (PSO) based multi-objective optimisation mechanism is developed to further improve the fuzzy model in terms of both the structure and the parameters; and a new tolerance analysis method is proposed to derive the confidence bands relating to the final elicited models. The proposed modelling approach is evaluated using two benchmark problems and is shown to outperform other modelling approaches. Furthermore, the proposed approach is successfully applied to complex high-dimensional modelling problems for the manufacturing of alloy steels, using ‘real’ industrial data. These problems concern the prediction of the mechanical properties of alloy steels by correlating them with the heat treatment process conditions as well as the weight percentages of the chemical compositions
An Approach to Web-Scale Named-Entity Disambiguation
We present a multi-pass clustering approach to large-scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster and hence disambiguate entities, and is designed to handle NED on the entire web. We show that on web collections, NED becomes increasingly difficult as the corpus size increases, not only because of the challenge of scaling the NED algorithm, but also because new and surprising facets of entities become visible in the data. This effect limits the potential benefits for data-driven approaches of processing larger data-sets, and suggests that efficient clustering-based disambiguation methods for the web will require extracting more specialized information from documents
Listen to genes : dealing with microarray data in the frequency domain
Background: We present a novel and systematic approach to analyze temporal microarray data. The approach includes
normalization, clustering and network analysis of genes.
Methodology: Genes are normalized using an error model based uniform normalization method aimed at identifying and
estimating the sources of variations. The model minimizes the correlation among error terms across replicates. The
normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger
causality is introduced to reveal interactions between sets of genes. Complex Granger causality along with partial Granger
causality is applied in both time and frequency domains, to selected genes as well as to all genes, to reveal interesting
networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data generated from 31,000
genes observed over 22 time points over 22 days. Three circuits: a circadian gene circuit, an ethylene circuit and a new
global circuit showing a hierarchical structure to determine the initiators of leaf senescence are analyzed in detail.
Conclusions: We use a totally data-driven approach to form biological hypotheses. Clustering using the power-spectrum
analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency
domain using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray
data, such methods can be useful tools in uncovering the hidden biological interactions. We present our method step by
step with the help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of
potential interest to Arabidopsis researchers
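As a rough illustration of comparing time series by their power spectrum rather than by their raw values, the sketch below computes a normalised periodogram with a naive discrete Fourier transform and measures the Euclidean distance between two spectra. The abstract does not specify the spectral estimator or the distance used for clustering, so both choices, and the function names, are assumptions.

```python
import cmath
import math


def normalized_periodogram(series):
    """Power at frequencies 1..n//2 of the mean-centred series, scaled to sum to 1."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    power = []
    for k in range(1, n // 2 + 1):
        # naive DFT coefficient at frequency index k
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power.append(abs(coeff) ** 2 / n)
    total = sum(power)
    return [p / total for p in power] if total > 0 else power


def spectral_distance(a, b):
    """Euclidean distance between the normalised periodograms of two series."""
    pa, pb = normalized_periodogram(a), normalized_periodogram(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```

Because the periodogram discards phase, two expression profiles that oscillate at the same rate but peak at different times have a distance close to zero under this measure, while profiles with different dominant frequencies are far apart. Such a distance could then be passed to any standard clustering routine.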
Differential Performance Debugging with Discriminant Regression Trees
Differential performance debugging is a technique to find performance
problems. It applies in situations where the performance of a program is
(unexpectedly) different for different classes of inputs. The task is to
explain the differences in asymptotic performance among various input classes
in terms of program internals. We propose a data-driven technique based on the
discriminant regression tree (DRT) learning problem, where the goal is to
discriminate among different classes of inputs. We propose a new algorithm for
DRT learning that first clusters the data into functional clusters, capturing
different asymptotic performance classes, and then invokes off-the-shelf
decision tree learning algorithms to explain these clusters. We focus on linear
functional clusters and adapt classical clustering algorithms (K-means and
spectral) to produce them. For the K-means algorithm, we generalize the notion
of the cluster centroid from a point to a linear function. We adapt spectral
clustering by defining a novel kernel function to capture the notion of linear
similarity between two data points. We evaluate our approach on benchmarks
consisting of Java programs where we are interested in debugging performance.
We show that our algorithm significantly outperforms other well-known
regression tree learning algorithms in terms of running time and accuracy of
classification.
Comment: To Appear in AAAI 201
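The idea of generalizing the K-means centroid from a point to a linear function can be sketched as follows: each cluster is represented by a line, points are assigned to the line with the smallest squared vertical residual, and each line is refit by ordinary least squares on its members. The residual measure, the caller-supplied initial lines, and all names here are assumptions for illustration; the abstract does not give the authors' exact formulation.

```python
def fit_line(points):
    """Least-squares (slope, intercept) for (x, y) pairs, or None if x is constant."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def linear_kmeans(points, initial_lines, max_iter=50):
    """K-means variant whose centroids are lines given as (slope, intercept)."""
    lines = list(initial_lines)
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest line by squared vertical residual
        new_labels = [
            min(range(len(lines)),
                key=lambda j: (y - (lines[j][0] * x + lines[j][1])) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the lines will not change
        labels = new_labels
        # update step: refit each line on its current members
        for j in range(len(lines)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            fitted = fit_line(members) if len(members) >= 2 else None
            if fitted is not None:
                lines[j] = fitted
    return lines, labels
```

In a performance-debugging setting, `x` might be an input size and `y` a measured cost, so each recovered line would stand for one linear cost class. As with ordinary K-means, the result depends on the initial lines.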
Robust Optimization using a new Volume-Based Clustering approach
We propose a new data-driven technique for constructing uncertainty sets for robust optimization problems. The technique captures the underlying structure of sparse data through volume-based clustering, resulting in less conservative solutions than most commonly used robust optimization approaches. This can aid management in making informed decisions under uncertainty, allowing a better understanding of the potential outcomes and risks associated with possible decisions. The paper demonstrates how clustering can be performed using any desired geometry and provides a mathematical optimization formulation for generating clusters and constructing the uncertainty set. In order to find an efficient solution to the problem, we explore different approaches since the method may be computationally expensive. This contribution to the field provides a novel data-driven approach to uncertainty set construction for robust optimization that can be applied to real-world scenarios