Sphere-sphere intersection for investment portfolio diversification - A new data-driven cluster analysis.

Author: Haddad Michel Ferreira Cardia
Publication venue: MethodsX
Publication date: 30/01/2020

To support investment portfolio diversification with a data-driven approach, this methodological paper proposes a new cluster analysis that compares publicly traded companies, mainly in times of high volatility (e.g. crisis times). The main goal of the proposed method is to give financial investors a less arbitrary way to measure the degree of similarity between equity stocks, unveiling equity market clustering patterns by applying analytic geometry solutions and calculating an overall clustering pattern indicator. Empirical results on synthetic data demonstrate both the conceptual superiority of the proposed method over traditional cluster analyses and its potential practical usefulness for asset allocation, portfolio strategy, and asset pricing, among other related purposes. Finally, the outputs of the proposed cluster analysis are presented through an intuitive and easily understandable mathematical visualization.

• A new method is proposed to calculate risk-similarity and clustering patterns.
• The method unveils clustering patterns through a data-driven process.
• Portfolio diversification can benefit from sphere-sphere intersection calculations.
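The abstract does not state the geometry it uses, so the following is only a sketch of the standard closed-form volume shared by two spheres, which is the kind of analytic-geometry calculation the title refers to. How the paper maps stocks to centres and radii is not given here; `overlap_ratio` is a hypothetical similarity score introduced purely for illustration, and radii are assumed to be positive.

```python
import math


def sphere_volume(r):
    """Volume of a sphere of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3


def intersection_volume(r1, r2, d):
    """Volume shared by two spheres with radii r1, r2 and centre distance d."""
    if d >= r1 + r2:
        # The spheres are disjoint or touch at a single point.
        return 0.0
    if d <= abs(r1 - r2):
        # The smaller sphere lies entirely inside the larger one.
        return sphere_volume(min(r1, r2))
    # Standard lens formula: the sum of the two spherical caps.
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2) / (12 * d))


def overlap_ratio(r1, r2, d):
    """Hypothetical similarity score (an assumption, not the paper's indicator):
    shared volume as a fraction of the smaller sphere's volume."""
    return intersection_volume(r1, r2, d) / sphere_volume(min(r1, r2))
```

Under this illustrative score, a value of 0 means no overlap and 1 means one sphere is contained in the other.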

Apollo (Cambridge)

A hierarchical Mamdani-type fuzzy modelling approach with new training data selection and multi-objective optimisation mechanisms: A special application for the prediction of mechanical properties of alloy steels

Author: Alcala
Bakshi
Bezdek
Chan
Chen
Chen
Chen
Cococcioni
Cordon
De Castro
Delgado
Dieter
Dorigo
Eberhart
Gacto
Glover
Goldberg
Gomez-Skarmeta
Ishibuchi
Ishibuchi
Jain
Jang
Jin
Jin
Johansen
Kennedy
Kwong
Mahdi Mahfouf
Mamdani
Pickering
Qian Zhang
Rojas
Setnes
Setnes
Sugeno
Takagi
Wang
Wang
Wang
Wang
Yen
Yen
Yoshinari
Zadeh
Zadeh
Zadeh
Zhang
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/03/2011

In this paper, a systematic data-driven fuzzy modelling methodology is proposed, which allows Mamdani fuzzy models to be constructed with regard to both the accuracy (precision) and the transparency (interpretability) of fuzzy systems. The new methodology employs a fast hierarchical clustering algorithm to generate an initial fuzzy model efficiently; a training data selection mechanism is developed to identify appropriate and efficient data as learning samples; a high-performance Particle Swarm Optimisation (PSO) based multi-objective optimisation mechanism is developed to further improve the fuzzy model in terms of both the structure and the parameters; and a new tolerance analysis method is proposed to derive the confidence bands relating to the final elicited models. The proposed modelling approach is evaluated using two benchmark problems and is shown to outperform other modelling approaches. Furthermore, the proposed approach is successfully applied to complex high-dimensional modelling problems for the manufacturing of alloy steels, using ‘real’ industrial data. These problems concern the prediction of the mechanical properties of alloy steels by correlating them with the heat treatment process conditions as well as the weight percentages of the chemical compositions.
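For readers unfamiliar with PSO, here is a minimal sketch of the textbook single-objective variant. It is not the paper's multi-objective mechanism and does not touch fuzzy model structure. The function name `pso_minimise`, the inertia and acceleration constants, and the toy objective used in the usage note are all assumptions chosen for illustration.

```python
import random


def pso_minimise(objective, dim, bounds, n_particles=30, n_iters=200, seed=0):
    """Textbook global-best PSO; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Commonly used inertia/acceleration constants (an assumption here).
    w, c1, c2 = 0.729, 1.49445, 1.49445
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # each particle's best position
    best_val = [objective(p) for p in pos]      # and the value found there
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (best_pos[i][j] - pos[i][j])
                             + c2 * r2 * (g_pos[j] - pos[i][j]))
                # Move, then clamp to the search bounds.
                pos[i][j] = min(hi, max(lo, pos[i][j] + vel[i][j]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

A call such as `pso_minimise(lambda p: sum(x * x for x in p), dim=2, bounds=(-5.0, 5.0))` searches for the minimum of a simple quadratic. A multi-objective variant would instead keep a set of non-dominated solutions trading off accuracy against interpretability.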

Crossref

Kent Academic Repository

An Approach to Web-Scale Named-Entity Disambiguation

Author: C. Whitelaw
I. Bhattacharya
L. Sarmento
M. Halkidi
M. Meilă
P. Pantel
S. Dill
S. Guha
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We present a multi-pass clustering approach to large-scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster and hence disambiguate entities, and is designed to handle NED on the entire web. We show that on web collections, NED becomes increasingly difficult as the corpus size increases, not only because of the challenge of scaling the NED algorithm, but also because new and surprising facets of entities become visible in the data. This effect limits the potential benefits for data-driven approaches of processing larger data-sets, and suggests that efficient clustering-based disambiguation methods for the web will require extracting more specialized information from documents.
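The abstract does not describe the multi-pass algorithm itself. As a rough illustration of clustering mentions by name co-occurrence, the sketch below uses a single greedy pass with Jaccard similarity. The function names, the threshold and the example data are assumptions, not the authors' method.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of co-occurring names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_mentions(contexts, threshold=0.5):
    """Greedy single pass: each mention joins the first cluster whose pooled
    co-occurring names are similar enough, otherwise it starts a new cluster.

    `contexts[i]` is the set of names found alongside mention i of the same
    ambiguous name. Returns a list of clusters, each a list of mention indices.
    """
    clusters = []  # pairs of (member indices, pooled co-occurring names)
    for idx, names in enumerate(contexts):
        for members, pooled in clusters:
            if jaccard(names, pooled) >= threshold:
                members.append(idx)
                pooled |= names  # in-place update grows the cluster's context
                break
        else:
            clusters.append(([idx], set(names)))
    return [members for members, _ in clusters]


# Four hypothetical pages mentioning the same ambiguous surname.
pages = [
    {"Bulls", "NBA", "Pippen"},
    {"NBA", "Bulls"},
    {"Berkeley", "machine learning"},
    {"machine learning", "Berkeley", "statistics"},
]
groups = cluster_mentions(pages)
```

Here the first two pages share most of their co-occurring names and end up in one cluster, and the last two form a second one. A web-scale version would need the scalability and multi-pass refinement that the abstract alludes to.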

Crossref

Repositório Aberto da Universidade do Porto

Listen to genes : dealing with microarray data in the frequency domain

Author: A Claridge-Chang
AN Stepanova
AN Stepanova
B-R Kim
Diego Di Bernardo
Dongyun Yi
H Guo
H Ueda
HG McWatters
IP Androulakis
J Fan
J Fan
J Qian
JCW Locke
JH Wu
Jianfeng Feng
MJ Yanovsky
MR Doyle
N Dojer
P DHaeseleer
PO Lim
PT Spellman
R Balasubramaniyan
R Cristi
Ritesh Krishna
S Kim
S Wichert
Shuixia Guo
SL Harmer
SX Guo
U Alon
Vicky Buchanan-Wollaston
W Pan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 06/04/2009

Background: We present a novel and systematic approach to analyze temporal microarray data. The approach includes normalization, clustering and network analysis of genes. Methodology: Genes are normalized using an error-model-based uniform normalization method aimed at identifying and estimating the sources of variation. The model minimizes the correlation among error terms across replicates. The normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger causality is introduced to reveal interactions between sets of genes. Complex Granger causality, along with partial Granger causality, is applied in both the time and frequency domains, to selected genes as well as to all genes, to reveal the interesting networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data generated from 31,000 genes observed over 22 time points over 22 days. Three circuits are analyzed in detail: a circadian gene circuit, an ethylene circuit, and a new global circuit showing a hierarchical structure to determine the initiators of leaf senescence. Conclusions: We use a totally data-driven approach to form biological hypotheses. Clustering using power-spectrum analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency domains using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray data, such methods can be useful tools in uncovering hidden biological interactions. We show our method in a step-by-step manner with the help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of potential interest to Arabidopsis researchers.
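To illustrate why comparing power spectra can group genes whose expression oscillates at the same frequency but with different phase, here is a small sketch using a plain discrete Fourier transform. The estimator, the normalisation and the Euclidean distance are assumptions. The abstract does not specify which spectral estimate or clustering algorithm the authors use.

```python
import cmath
import math


def power_spectrum(series):
    """Normalised one-sided periodogram of a mean-centred time series."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct O(n^2) DFT coefficient at frequency index k.
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centred))
        spectrum.append(abs(coeff) ** 2 / n)
    total = sum(spectrum)
    # Normalise so that only the shape of the spectrum matters.
    return [p / total for p in spectrum] if total > 0 else spectrum


def spectral_distance(a, b):
    """Euclidean distance between two normalised power spectra."""
    pa, pb = power_spectrum(a), power_spectrum(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


# Toy "expression profiles": same period with a phase shift, and a faster one.
n = 32
slow = [math.sin(2 * math.pi * t / 8) for t in range(n)]
shifted = [math.cos(2 * math.pi * t / 8) for t in range(n)]
fast = [math.sin(2 * math.pi * t / 4) for t in range(n)]
```

With these toy profiles, `spectral_distance(slow, shifted)` is close to zero because a phase shift leaves the power spectrum unchanged, while `spectral_distance(slow, fast)` is large. Any standard clustering method could then be run on such distances.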

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Differential Performance Debugging with Discriminant Regression Trees

Author: Cerny Pavol
Chang Bor-Yuh Evan
Tizpaz-Niari Saeid
Trivedi Ashutosh
Publication date: 28/11/2017

Differential performance debugging is a technique to find performance problems. It applies in situations where the performance of a program is (unexpectedly) different for different classes of inputs. The task is to explain the differences in asymptotic performance among various input classes in terms of program internals. We propose a data-driven technique based on the discriminant regression tree (DRT) learning problem, where the goal is to discriminate among different classes of inputs. We propose a new algorithm for DRT learning that first clusters the data into functional clusters, capturing different asymptotic performance classes, and then invokes off-the-shelf decision tree learning algorithms to explain these clusters. We focus on linear functional clusters and adapt classical clustering algorithms (K-means and spectral) to produce them. For the K-means algorithm, we generalize the notion of the cluster centroid from a point to a linear function. We adapt spectral clustering by defining a novel kernel function to capture the notion of linear similarity between two data points. We evaluate our approach on benchmarks consisting of Java programs where we are interested in debugging performance. We show that our algorithm significantly outperforms other well-known regression tree learning algorithms in terms of running time and accuracy of classification. Comment: To Appear in AAAI 201
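The idea of replacing a point centroid with a linear function can be sketched as follows: each point is assigned to the line that predicts it best, and each line is then refitted by least squares to its assigned points. This is a simplified reading of the abstract, with caller-supplied initial lines, a vertical-residual distance and hypothetical function names. The authors' actual algorithm may differ.

```python
def fit_line(points):
    """Least-squares line through (x, y) points as (slope, intercept).

    Returns None when a line is not determined (fewer than two points,
    or all x values equal)."""
    n = len(points)
    if n < 2:
        return None
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx


def linear_kmeans(points, init_lines, n_iters=20):
    """K-means-style clustering where each 'centroid' is a line y = a*x + b."""
    lines = list(init_lines)
    labels = []
    for _ in range(n_iters):
        # Assignment step: pick the line with the smallest vertical residual.
        new_labels = [min(range(len(lines)),
                          key=lambda k: abs(y - (lines[k][0] * x + lines[k][1])))
                      for x, y in points]
        if new_labels == labels:
            break  # assignments are stable, so the lines will not change
        labels = new_labels
        # Update step: refit each line to the points assigned to it.
        for k in range(len(lines)):
            fitted = fit_line([p for p, lab in zip(points, labels) if lab == k])
            if fitted is not None:
                lines[k] = fitted
    return lines, labels


# Hypothetical measurements: (input size, running time) from two behaviours.
data = [(x, 2 * x) for x in range(5)] + [(x, 20 - x) for x in range(5)]
lines, labels = linear_kmeans(data, init_lines=[(1.0, 0.0), (0.0, 15.0)])
```

Like ordinary K-means, this procedure depends on its initialisation and can settle in a local optimum, so the sketch leaves the starting lines to the caller.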

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Robust Optimization using a new Volume-Based Clustering approach

Author: Basten Rob J.I.
Marandi Ahmadreza
Tan Lijia
Yazdani Alireza
Publication date: 27/04/2023

We propose a new data-driven technique for constructing uncertainty sets for robust optimization problems. The technique captures the underlying structure of sparse data through volume-based clustering, resulting in less conservative solutions than most commonly used robust optimization approaches. This can aid management in making informed decisions under uncertainty, allowing a better understanding of the potential outcomes and risks associated with possible decisions. The paper demonstrates how clustering can be performed using any desired geometry and provides a mathematical optimization formulation for generating clusters and constructing the uncertainty set. Because the method may be computationally expensive, we explore different approaches to solving the problem efficiently. This contribution provides a novel data-driven approach to uncertainty set construction for robust optimization that can be applied to real-world scenarios.
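A small sketch can show why a clustered uncertainty set tends to be less conservative than a single set around all the data. It assumes axis-aligned boxes as the chosen geometry, a linear cost, and clusters that are already given. The paper's optimization formulation for generating the clusters is not reproduced, and all names and data below are hypothetical.

```python
def bounding_box(points):
    """Smallest axis-aligned box containing the points, as (lower, upper)."""
    dims = range(len(points[0]))
    lo = [min(p[i] for p in points) for i in dims]
    hi = [max(p[i] for p in points) for i in dims]
    return lo, hi


def worst_case(cost, box):
    """Maximum of the linear function cost . u over all u in the box."""
    lo, hi = box
    # For a box, each coordinate can be pushed to its worst end independently.
    return sum(max(c * l, c * h) for c, l, h in zip(cost, lo, hi))


def worst_case_over_clusters(cost, clusters):
    """Worst case over a union of per-cluster boxes."""
    return max(worst_case(cost, bounding_box(c)) for c in clusters)


# Two hypothetical groups of observed uncertain parameters.
clusters = [[(0, 4), (1, 5)], [(4, 0), (5, 1)]]
cost = (1, 1)
clustered = worst_case_over_clusters(cost, clusters)
all_points = [p for c in clusters for p in c]
single = worst_case(cost, bounding_box(all_points))
```

In this toy case the single box also covers the empty region between the two groups, so its worst case (`single`) is higher than the worst case over the union of the two smaller boxes (`clustered`).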

Pure OAI Repository