
    Sector Neutral Portfolios: Long Memory Motifs Persistence in Market Structure Dynamics

    We study soft persistence (the existence, in subsequent temporal layers, of motifs from the initial layer) of motif structures in Triangulated Maximally Filtered Graphs (TMFG) generated from time-varying Kendall correlation matrices computed from stock-price log-returns over rolling windows with exponential smoothing. We observe long-memory processes in these structures in the form of power-law decays in the number of persistent motifs. These decays then transition to a plateau regime characterized by a power-law decay with a smaller exponent. We demonstrate that identifying persistent motifs allows for forecasting and applications to portfolio diversification. Balanced portfolios are often constructed from the analysis of historical correlations; however, not all past correlations persist into the future. Sector neutrality has also been a central theme in portfolio diversification and systemic risk. We present an unsupervised technique to identify persistently correlated sets of stocks, which are empirically found to correspond to sectors driven by strong fundamentals. Applications of these findings are tested in two distinct ways on four different markets, resulting in a significant reduction in portfolio volatility. A persistence-based measure for portfolio allocation is proposed and shown to outperform volatility weighting when tested out of sample.
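    To make the soft-persistence idea concrete, here is a minimal Python sketch that builds correlation graphs from rolling-window Kendall matrices and counts how many initial-layer triangles survive in later layers. It is an illustration only: a simple threshold graph stands in for the paper's TMFG filtering, the exponential smoothing is omitted, and all names and parameter values are assumptions.

```python
import numpy as np
from itertools import combinations
from scipy.stats import kendalltau

def kendall_matrix(window):
    """Kendall correlation matrix of a (T, n_assets) block of log-returns."""
    n = window.shape[1]
    C = np.eye(n)
    for i, j in combinations(range(n), 2):
        C[i, j] = C[j, i] = kendalltau(window[:, i], window[:, j])[0]
    return C

def triangles(C, thr):
    """Motifs (here simply triangles) of the graph keeping edges with |corr| > thr."""
    n = C.shape[0]
    edges = {(i, j) for i, j in combinations(range(n), 2) if abs(C[i, j]) > thr}
    return {t for t in combinations(range(n), 3)
            if {(t[0], t[1]), (t[0], t[2]), (t[1], t[2])} <= edges}

def soft_persistence(returns, window, thr):
    """Fraction of layer-0 triangles still present in each later temporal layer."""
    layers = [triangles(kendall_matrix(returns[s:s + window]), thr)
              for s in range(0, len(returns) - window + 1, window)]
    base = layers[0]
    return [len(base & layer) / max(len(base), 1) for layer in layers]

# Toy usage: 12 "stocks" driven by one common factor, so triangles exist
# and partially persist; on real data the paper reports power-law decay.
rng = np.random.default_rng(0)
common = rng.normal(size=(600, 1))
returns = 0.02 * (common + rng.normal(size=(600, 12)))
print(soft_persistence(returns, window=60, thr=0.3))
```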

    Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery

    Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional, high-noise genomic data is prone to overfitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model is intended to predict, which can make the modeling results challenging to interpret. To address these issues, we developed a novel algorithm, the Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the sample number is larger than the transcript number in each iteration of modeling and testing, PPEA reduces the potential risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of the predictive power of individual transcripts in a relatively small number of iterations; (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict; (3) using only the most predictive top-ranked transcripts greatly facilitates the development of multiplex assays, such as qRT-PCR, as biomarkers; and (4) more importantly, we were able to demonstrate that a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from non-adverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the overfitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
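    The two-way bootstrapping idea can be sketched as follows: in each iteration, bootstrap the samples and draw a transcript block small enough that samples outnumber transcripts, fit a simple model, and credit the transcripts used by their out-of-bag contribution. This is not the published PPEA, only a hedged Python approximation of the iteration scheme; the scoring rule, block size, and all names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ppea_like_rank(X, y, n_iter=200, block=10, seed=0):
    """Accumulate a predictive-power score per transcript by repeatedly
    modeling small random transcript blocks on bootstrap samples, so the
    samples always outnumber the transcripts in play (block << n_samples)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    score, counts = np.zeros(p), np.zeros(p)
    for _ in range(n_iter):
        rows = rng.integers(0, n, n)                 # bootstrap the samples
        cols = rng.choice(p, block, replace=False)   # a few transcripts at a time
        if len(set(y[rows])) < 2:
            continue                                 # need both classes to fit
        model = LogisticRegression(max_iter=200).fit(X[rows][:, cols], y[rows])
        oob = np.setdiff1d(np.arange(n), rows)       # out-of-bag test samples
        acc = model.score(X[oob][:, cols], y[oob]) if len(oob) else 0.0
        score[cols] += acc * np.abs(model.coef_[0])  # credit the transcripts used
        counts[cols] += 1
    return score / np.maximum(counts, 1)             # mean credit = rank order

# Toy usage: 40 samples, 500 transcripts, the first 3 truly informative;
# the informative indices should surface among the top-ranked ones.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 500))
y = (X[:, :3].sum(axis=1) > 0).astype(int)
print(np.argsort(ppea_like_rank(X, y))[-5:])
```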

    E-learning and teaching documentation: a modular organizational approach

    This work presents a model for organizing multimedia teaching material available online for a university degree course. Through a modular, hierarchical structure, the model classifies teaching resources, consisting of atomic multimedia components, both with respect to a predefined set of didactic categories and with respect to the fundamental topics of a course. Components referring to the same topic are grouped into a single logical container, called a didactic module, whose structure is defined by a set of metadata stored in a catalogue. The catalogue also describes the structure of the didactic categories and the combination of the different modules into a single course, facilitating both the reuse of atomic components and the scalability of the model.
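    A minimal Python sketch of the data model described here: atomic components tagged with a didactic category and a topic, grouped into didactic modules, with a catalogue that validates categories and combines modules into a course. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AtomicComponent:            # one multimedia resource (slide, video, quiz, ...)
    uri: str
    category: str                 # didactic category from a predefined set
    topic: str                    # fundamental course topic it refers to

@dataclass
class DidacticModule:             # logical container for one topic
    topic: str
    components: list = field(default_factory=list)

@dataclass
class Catalogue:                  # metadata: categories, modules, course assembly
    categories: set
    modules: dict = field(default_factory=dict)

    def add(self, comp: AtomicComponent):
        assert comp.category in self.categories, "unknown didactic category"
        module = self.modules.setdefault(comp.topic, DidacticModule(comp.topic))
        module.components.append(comp)       # reuse: same component, many courses

    def course(self, topics):                # combine modules into a single course
        return [self.modules[t] for t in topics if t in self.modules]

cat = Catalogue(categories={"lecture", "exercise", "reference"})
cat.add(AtomicComponent("video01.mp4", "lecture", "sorting"))
cat.add(AtomicComponent("lab01.pdf", "exercise", "sorting"))
print(cat.course(["sorting"]))
```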

    Stability in biomarker discovery: does ensemble feature selection really help?

    Ensemble feature selection has recently been explored as a promising paradigm to improve the stability, i.e. the robustness with respect to sample variation, of subsets of informative features extracted from high-dimensional domains, including genetics and medicine. Though recent literature discusses a number of cases where ensemble approaches seem capable of providing more stable results, especially in the context of biomarker discovery, there is a lack of systematic studies providing insight into when, and to what extent, an ensemble method is to be preferred to a simple one. Using a well-known benchmark from the genomics domain, this paper presents an empirical study which evaluates ten selection methods, representative of different selection approaches, investigating whether they become significantly more stable when used in an ensemble fashion. The results of our study provide useful indications of the benefits and limitations of the ensemble paradigm in terms of stability.
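    To illustrate the paradigm under evaluation, the following Python sketch wraps a simple univariate selector in a bootstrap-and-vote ensemble and measures stability as the mean pairwise Jaccard similarity between subsets selected on resampled data. The base selector, subset size, and stability index are assumptions, not the ten methods or the benchmark used in the paper.

```python
import numpy as np
from itertools import combinations

def select_top_k(X, y, k):
    """Simple univariate selector: the k features most correlated with y."""
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return set(np.argsort(scores)[-k:])

def ensemble_select(X, y, k, n_boot=30, seed=0):
    """Ensemble version: vote over top-k selections on bootstrap samples."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(X.shape[1])
    for _ in range(n_boot):
        rows = rng.integers(0, len(X), len(X))
        for j in select_top_k(X[rows], y[rows], k):
            votes[j] += 1
    return set(np.argsort(votes)[-k:])

def stability(subsets):
    """Mean pairwise Jaccard similarity between selected feature subsets."""
    return np.mean([len(a & b) / len(a | b) for a, b in combinations(subsets, 2)])

# Stability under sample variation: reselect on resampled data five times
# and compare the simple selector with its ensemble counterpart.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 200))
y = (X[:, :5].sum(axis=1) > 0).astype(int)
simple_runs, ensemble_runs = [], []
for s in range(5):
    rows = rng.integers(0, 80, 80)
    simple_runs.append(select_top_k(X[rows], y[rows], 10))
    ensemble_runs.append(ensemble_select(X[rows], y[rows], 10, seed=s))
print(stability(simple_runs), stability(ensemble_runs))
```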

    Similarity of feature selection methods: An empirical study across data intensive classification tasks

    In the past two decades, the dimensionality of datasets involved in machine learning and data mining applications has increased explosively. Feature selection has therefore become a necessary step to make the analysis more manageable and to extract useful knowledge about a given domain. A large variety of feature selection techniques is available in the literature, and their comparative analysis is a very difficult task. So far, few studies have investigated, from a theoretical and/or experimental point of view, the degree of similarity/dissimilarity among the available techniques, namely the extent to which they tend to produce similar results within specific application contexts. This kind of similarity analysis is of crucial importance when two or more methods are combined in an ensemble fashion: indeed, the ensemble paradigm is beneficial only if the involved methods are capable of giving different and complementary representations of the considered domain. This paper contributes in this direction by proposing an empirical approach to evaluate the degree of consistency among the outputs of different selection algorithms in the context of high-dimensional classification tasks. Leveraging a suitable similarity index, we systematically compared the feature subsets selected by eight popular selection methods, representative of different selection approaches, and derived a similarity trend for feature subsets of increasing size. Through extensive experimentation involving sixteen datasets from three challenging domains (Internet advertisements, text categorization, and micro-array data classification), we obtained useful insight into the pattern of agreement of the considered methods. In particular, our results revealed that multivariate selection approaches systematically produce feature subsets that overlap only to a small extent with those selected by the other methods.
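    A sketch of the kind of comparison performed here: given the rankings produced by different methods, compute a chance-corrected similarity between the subsets each method selects, for subsets of increasing size. The index below is a Kuncheva-style consistency index; the paper's exact index may differ, and the toy rankings are invented.

```python
import numpy as np
from itertools import combinations

def consistency(a, b, p):
    """Chance-corrected similarity between two size-k subsets of p features
    (1 for identical subsets, about 0 for purely random overlap)."""
    k, r = len(a), len(a & b)
    expected = k * k / p
    return (r - expected) / (k - expected)

def similarity_trend(rankings, p, sizes):
    """Mean pairwise similarity among methods for subsets of increasing size."""
    trend = []
    for k in sizes:
        subsets = [set(rank[:k]) for rank in rankings]
        trend.append(np.mean([consistency(a, b, p)
                              for a, b in combinations(subsets, 2)]))
    return trend

# Toy usage: three hypothetical method rankings over 100 features; real
# rankings would come from the selection algorithms under comparison.
rng = np.random.default_rng(3)
rankings = [list(rng.permutation(100)) for _ in range(3)]
print(similarity_trend(rankings, p=100, sizes=[5, 10, 20, 40]))
```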

    Intelligent Bayesian Classifiers in Network Intrusion Detection

    The aim of this paper is to explore the effectiveness of Bayesian classifiers in intrusion detection (ID). Specifically, we provide an experimental study that compares the accuracy of different classification models, showing that the Bayesian classification approach is reasonably effective and efficient in predicting attacks and in exploiting the knowledge required by a computationally intelligent ID process.
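    As a minimal illustration of the approach, the following Python sketch trains a naive Bayes classifier on synthetic connection records; the feature set and data are invented for illustration and do not reproduce the paper's experiments.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Synthetic connection records: [duration, bytes_sent, bytes_received, failed_logins]
rng = np.random.default_rng(4)
normal = rng.normal([10, 500, 800, 0.1], [5, 200, 300, 0.3], size=(500, 4))
attack = rng.normal([2, 5000, 50, 3.0], [1, 1500, 40, 1.0], size=(100, 4))
X = np.vstack([normal, attack])
y = np.array([0] * 500 + [1] * 100)   # 0 = normal traffic, 1 = attack

# Naive Bayes assumes feature independence given the class; even so it is
# often a strong, cheap baseline for this kind of classification task.
print(cross_val_score(GaussianNB(), X, y, cv=5).mean())
```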

    A Filter-Based Evolutionary Approach for Selecting Features in High-Dimensional Micro-array Data

    Evolutionary algorithms have received much attention for extracting knowledge from high-dimensional micro-array data; a suitable definition of the search space of potential solutions is crucial to their success. In this paper, we present an evolutionary approach for selecting informative genes (features) to predict and diagnose cancer. We propose a procedure that combines the results of filter methods, which are commonly used in the field of data mining, to reduce the search space in which a genetic algorithm looks for solutions (i.e. gene subsets) with better classification performance, the quality (fitness) of each solution being evaluated by a classification method. The methodology is quite general, because any classification algorithm could be incorporated, as well as a variety of filter methods. Extensive experiments on a public micro-array dataset are presented, using four popular filter methods and SVM.
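    The combination of filter methods with a genetic algorithm might look like the following Python sketch: the union of the top-ranked features from two filters defines a reduced search space, over which a tiny GA evolves bitmasks whose fitness is the cross-validated accuracy of an SVM. The filter choices, GA operators, and all parameters are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.feature_selection import f_classif, mutual_info_classif

def reduced_space(X, y, k=15):
    """Union of the top-k features from two filters = the GA's search space."""
    top_f = np.argsort(f_classif(X, y)[0])[-k:]
    top_mi = np.argsort(mutual_info_classif(X, y, random_state=0))[-k:]
    return np.union1d(top_f, top_mi)

def ga_select(X, y, pool, pop=20, gens=15, seed=0):
    """Tiny GA over bitmasks of the pool; fitness = CV accuracy of an SVM."""
    rng = np.random.default_rng(seed)
    P = rng.random((pop, len(pool))) < 0.3                 # initial population

    def fitness(mask):
        cols = pool[mask]
        if len(cols) == 0:
            return 0.0
        return cross_val_score(SVC(), X[:, cols], y, cv=3).mean()

    for _ in range(gens):
        scores = np.array([fitness(m) for m in P])
        parents = P[np.argsort(scores)[-(pop // 2):]]      # truncation selection
        cuts = rng.integers(1, len(pool), pop // 2)
        kids = np.array([np.r_[parents[i][:c],
                               parents[(i + 1) % len(parents)][c:]]
                         for i, c in enumerate(cuts)])     # one-point crossover
        kids ^= rng.random(kids.shape) < 0.02              # bit-flip mutation
        P = np.vstack([parents, kids])
    return pool[P[np.argmax([fitness(m) for m in P])]]

# Toy usage: 60 samples, 300 "genes", the first 4 informative.
rng = np.random.default_rng(5)
X = rng.normal(size=(60, 300))
y = (X[:, :4].sum(axis=1) > 0).astype(int)
print(ga_select(X, y, reduced_space(X, y)))   # indices of the selected subset
```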

    A Distributed Trust and Reputation Framework for Scientific Grids

    Acknowledged as important factors for business environments operating as Virtual Organizations (VOs), trust and reputation are also receiving attention in Grids devoted to scientific applications, where the problem of finding suitable models and architectures for the flexible security management of heterogeneous resources arises. Since these resources are highly heterogeneous (from individual users to whole organizations or experiment tools and workflows), this paper presents a trust and reputation framework that integrates a number of information sources to produce a comprehensive evaluation of trust and reputation by clustering resources with similar capabilities of successfully executing a specific job. Here, trust and reputation are treated as Quality of Service (QoS) parameters and are asserted on the operative context of resources, a concept expressing a resource's capability of providing trusted services within collaborative scientific applications. Specifically, the framework exploits distributed brokers that support interaction trust and the creation of VOs from existing scientific organizations. A broker is a distributed software module, launched at some node of the Grid, that makes use of resources and communicates with other brokers to perform specific reputation services. In turn, each broker contributes to maintaining a dynamic and adaptive reputation assessment within the Grid in a collaborative and distributed fashion. The proposed framework is empirically implemented by adopting a SOA approach, and the results show its effectiveness and its possible integration in a scientific Grid.
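    A toy Python sketch of the broker idea: each broker records job outcomes for resources within an operative context, computes a local trust value, and blends it with peer brokers' opinions into a reputation score. The aggregation rule and all names are hypothetical; the paper's framework is considerably richer.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Broker:
    """Broker node: records job outcomes per (resource, operative context)
    and blends its local trust with peer brokers' opinions into a reputation."""
    name: str
    history: dict = field(default_factory=lambda: defaultdict(list))
    peers: list = field(default_factory=list)

    def record(self, resource, context, success):
        self.history[(resource, context)].append(bool(success))

    def local_trust(self, resource, context):
        runs = self.history.get((resource, context), [])
        return sum(runs) / len(runs) if runs else 0.5   # neutral prior

    def reputation(self, resource, context):
        opinions = [self.local_trust(resource, context)]
        opinions += [p.local_trust(resource, context) for p in self.peers]
        return sum(opinions) / len(opinions)            # simple average

b1, b2 = Broker("b1"), Broker("b2")
b1.peers.append(b2)
b1.record("clusterA", "genome-alignment", True)
b2.record("clusterA", "genome-alignment", False)
print(b1.reputation("clusterA", "genome-alignment"))    # 0.5: opinions disagree
```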

    Knowledge Discovery in Gene Expression Data via Evolutionary Algorithms

    Methods currently used for micro-array data classification aim to select a minimal subset of features, namely a predictor, that is necessary to construct a classifier of the best accuracy. Although effective, they fail to address the primary goal of domain experts, who are interested in detecting different groups of biologically relevant markers. In this paper, we present and test a framework which aims to provide different subsets of relevant genes. It applies an initial gene filtering to define a set of feature spaces, each of which is further refined by a genetic algorithm. Experiments show that the overall process results in a number of predictors with high classification accuracy. Compared to state-of-the-art feature selection algorithms, the proposed framework consistently generates better feature subsets and keeps improving the quality of the selected subsets in terms of accuracy and size.
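    One way to realize "different subsets of relevant genes" is sketched below in Python: the filter-ranked genes are partitioned into disjoint feature spaces, and each space is refined independently (here by a greedy search standing in for the paper's genetic algorithm), yielding several distinct predictors. The partitioning scheme, refinement rule, and all parameters are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.feature_selection import f_classif

def multiple_predictors(X, y, n_spaces=3, space_size=20, keep=5):
    """Split the filter-ranked genes into disjoint feature spaces and refine
    each space independently, yielding several distinct predictors."""
    ranked = np.argsort(f_classif(X, y)[0])[::-1]        # best genes first
    predictors = []
    for s in range(n_spaces):
        space = ranked[s * space_size:(s + 1) * space_size]
        chosen, best = [], 0.0
        for g in space:              # greedy refinement, standing in for the GA
            if len(chosen) == keep:
                break
            score = cross_val_score(SVC(), X[:, chosen + [g]], y, cv=3).mean()
            if score >= best:
                chosen, best = chosen + [g], score
        predictors.append(chosen)
    return predictors

# Toy usage: each returned list is a distinct group of candidate marker genes.
rng = np.random.default_rng(6)
X = rng.normal(size=(60, 200))
y = (X[:, :6].sum(axis=1) > 0).astype(int)
for p in multiple_predictors(X, y):
    print(p)
```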