19 research outputs found
A very simple and fast way to access and validate algorithms in reproducible research
The reproducibility of research in bioinformatics refers to the notion that new methodologies/algorithms and scientific claims have to be published together with their data and source code, so that other researchers can verify the findings and build further knowledge upon them. The replication and corroboration of research results are key to the scientific process, and many journals are now discussing the matter and taking concrete steps in this direction. In this journal itself, a very recent opinion note has appeared highlighting the increasing importance of this topic in bioinformatics and computational biology, inviting the community to further discuss the matter. In agreement with that article, we would like to propose here another step in that direction with a tool that allows the automatic generation of a web interface, named web-demo, directly from source code in a very simple and straightforward way. We believe this contribution can help make research not only reproducible but also more easily accessible. A web-demo associated with a published paper can accelerate the validation of an algorithm with real data, spreading its use with just a few clicks.
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Pividori, Milton Damián. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
A Method to Improve the Analysis of Cluster Ensembles
Clustering is fundamental to understand the structure of data. In the past decade the cluster ensemble problem has been introduced, which combines a set of partitions (an ensemble) of the data to obtain a single consensus solution that outperforms all the ensemble members. However, there is disagreement about which are the best ensemble characteristics to obtain a good performance: some authors have suggested that highly different partitions within the ensemble are beneficial for the final performance, whereas others have stated that medium diversity among them is better. While there are several measures to quantify the diversity, a better method to analyze the best ensemble characteristics is necessary. This paper introduces a new ensemble generation strategy and a method to make slight changes in its structure. Experimental results on six datasets suggest that this is an important step towards a more systematic approach to analyze the impact of the ensemble characteristics on the overall consensus performance.
Affiliation: Pividori, Milton Damián. Universidad Tecnológica Nacional. Facultad Regional Santa Fe. Centro de Investigación y Desarrollo de Ingeniería en Sistemas de Información; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina. Universidad Tecnológica Nacional. Facultad Regional Santa Fe. Centro de Investigación y Desarrollo de Ingeniería en Sistemas de Información; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
A Novel Method to Control the Diversity in Cluster Ensembles
Clustering is fundamental to understanding the structure of data. In the past decade the cluster ensemble problem has been introduced, which combines a set of partitions (an ensemble) of the data to obtain a single consensus solution that outperforms all the ensemble members. Although disagreement among ensemble partitions (diversity) has been found to be fundamental for success, the literature has arrived at conflicting conclusions: some authors suggest that high diversity is beneficial for the final performance, whereas others have indicated that medium diversity is better. While there are several options to measure the diversity, there is no method to control it. This paper introduces a new ensemble generation strategy and a method to smoothly change the ensemble diversity. Experimental results on three datasets suggest that this is an important step towards a more systematic approach to analyzing the impact of the ensemble diversity on the overall consensus performance.
Sociedad Argentina de Informática e Investigación Operativa
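The abstract above notes that there are several ways to measure ensemble diversity. As a minimal sketch, assuming diversity is taken as one minus the mean pairwise adjusted Rand index (one common choice, not necessarily the measure used in the paper), it could be computed as follows. The function names `adjusted_rand_index` and `ensemble_diversity` are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same objects."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("labelings must have equal length >= 2")
    # Pairs of objects placed together in both, in a, and in b.
    together_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = together_a * together_b / comb(len(a), 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case: both partitions are trivial, hence identical.
        return 1.0
    return (together_both - expected) / (maximum - expected)


def ensemble_diversity(ensemble):
    """Assumed definition: mean of (1 - ARI) over all pairs of partitions."""
    pairs = list(combinations(ensemble, 2))
    if not pairs:
        raise ValueError("need at least two partitions")
    return sum(1.0 - adjusted_rand_index(p, q) for p, q in pairs) / len(pairs)


# Hypothetical ensemble of three partitions over six objects.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, labels renamed
    [0, 0, 1, 1, 2, 2],
]
print(ensemble_diversity(ensemble))
```

Because the index depends only on which objects are grouped together, renaming cluster labels does not change the result, so the first two partitions in the example contribute zero diversity between them.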
Cluster Ensembles for Big Data Mining Problems
Mining big data involves several problems and new challenges, in addition to the huge volume of information. On the one hand, these data generally come from autonomous and decentralized sources, so their dimensionality is heterogeneous and diverse, and they generally involve privacy issues. On the other hand, algorithms for mining data, such as clustering methods, have particular characteristics that make them useful for different types of data mining problems.
Due to the huge amount of information, the task of choosing a single clustering approach becomes even more difficult. For instance, k-means, a very popular algorithm, always assumes spherical clusters in data; hierarchical approaches can be used when there is interest in finding this type of structure; expectation-maximization iteratively adjusts the parameters of a statistical model to fit the observed data. Moreover, all these methods work properly only with relatively small data sets. Large-volume data often make their application unfeasible, especially when the data come from autonomous sources that are constantly growing and evolving.
In recent years, a new clustering approach has emerged, called consensus clustering or cluster ensembles. Instead of running a single algorithm, this approach first produces a set of data partitions (an ensemble) by employing different clustering techniques on the same original data set. Then, this ensemble is processed by a consensus function, which produces a single consensus partition that outperforms individual solutions in the input ensemble. This approach has been successfully employed for distributed data mining, which makes it very interesting and applicable in the big data context. Although many techniques have been proposed for large data sets, most of them focus on making individual components more efficient, instead of improving the whole consensus approach for the case of big data.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
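The two-stage process described above (an ensemble of partitions, then a consensus function) can be sketched in a few lines. The consensus function below is an assumption chosen for brevity: it builds a co-association matrix (the fraction of partitions in which two objects share a cluster) and links objects whose co-association exceeds a threshold. It is one simple option among many and is not claimed to be the method of the work listed here; the names `co_association`, `consensus_partition`, and `threshold` are hypothetical.

```python
def co_association(ensemble):
    """Fraction of partitions in which each pair of objects is co-clustered."""
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(ensemble)
    return [[c / m for c in row] for row in counts]


def consensus_partition(ensemble, threshold=0.5):
    """Connected components of the graph linking objects whose
    co-association is strictly greater than `threshold`."""
    co = co_association(ensemble)
    n = len(co)
    labels = [-1] * n  # -1 marks objects not yet assigned
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical ensemble: three partitions of six objects that disagree
# only about object 2.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
print(consensus_partition(ensemble))
```

The co-association matrix is quadratic in the number of objects, which illustrates why this kind of consensus function becomes costly at big-data scale.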
Diversity control for improving the analysis of consensus clustering
Consensus clustering has emerged as a powerful technique for obtaining better clustering results, where a set of data partitions (ensemble) are generated, which are then combined to obtain a consolidated solution (consensus partition) that outperforms all of the members of the input set. The diversity of ensemble partitions has been found to be a key aspect for obtaining good results, but the conclusions of previous studies are contradictory. Therefore, ensemble diversity analysis is currently an important issue because there are no methods for smoothly changing the diversity of an ensemble, which makes it very difficult to study the impact of ensemble diversity on consensus results. Indeed, ensembles with similar diversity can have very different properties, thereby producing a consensus function with unpredictable behavior. In this study, we propose a novel method for increasing and decreasing the diversity of data partitions in a smooth manner by adjusting a single parameter, thereby achieving fine-grained control of ensemble diversity. The results obtained using well-known data sets indicate that the proposed method is effective for controlling the dissimilarity among ensemble members to obtain a consensus function with smooth behavior. This method is important for facilitating the analysis of the impact of ensemble diversity in consensus clustering.
Affiliation: Pividori, Milton Damián. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina. Universidad Tecnológica Nacional. Facultad Regional Santa Fe. Centro de Investigación y Desarrollo de Ingeniería en Sistemas de Información; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Polygenic transcriptome risk scores (PTRS) can improve portability of polygenic risk scores across ancestries
Background: Polygenic risk scores (PRS) are valuable for translating the results of genome-wide association studies (GWAS) into clinical practice. To date, most GWAS have been based on individuals of European ancestry, leading to poor performance in populations of non-European ancestry. Results: We introduce the polygenic transcriptome risk score (PTRS), which is based on predicted transcript levels (rather than SNPs), and explore the portability of PTRS across populations using UK Biobank data. Conclusions: We show that PTRS has a significantly higher portability (Wilcoxon p=0.013) in the African-descent samples where the loss of performance is most acute with better performance than PRS when used in combination
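To make the contrast between a SNP-based score and a transcript-based score concrete, here is a minimal sketch. A PRS is conventionally a weighted sum of allele dosages. For the PTRS, the sketch assumes a linear score over predicted transcript levels, with those levels themselves predicted linearly from genotype; the abstract states only that PTRS is based on predicted transcript levels, so this linear form, all weights, and the function names are assumptions made for illustration.

```python
def weighted_sum(weights, features):
    """Linear score: sum of weight * feature."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have equal length")
    return sum(w * x for w, x in zip(weights, features))


def predict_transcript_levels(dosages, per_gene_snp_weights):
    """Assumed linear expression predictor: one weight vector per gene."""
    return [weighted_sum(w, dosages) for w in per_gene_snp_weights]


# Hypothetical individual with allele dosages (0, 1 or 2) at two SNPs.
dosages = [2, 1]

# PRS: SNP effect sizes applied directly to dosages (made-up values).
snp_effects = [0.5, -0.25]
prs = weighted_sum(snp_effects, dosages)

# PTRS: dosages -> predicted transcript levels -> weighted sum over genes.
per_gene_snp_weights = [[0.5, 0.0], [0.25, 0.25]]  # made-up values
gene_effects = [1.0, -2.0]                         # made-up values
transcripts = predict_transcript_levels(dosages, per_gene_snp_weights)
ptrs = weighted_sum(gene_effects, transcripts)

print(prs, ptrs)
```

The only structural difference between the two scores in this sketch is the intermediate layer of predicted transcript levels that the PTRS weights act on.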