Search CORE

14 research outputs found

Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers

Author: Agüero-Chapin Guillermin
Antunes Agostinho
Fernández Alberto
Galpert Deborah
Herrera Triguero Francisco
Molina-Ruiz Reinaldo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Background: The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in a supervised big data approach, they were all, to some extent, alignment-based, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. Results: The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built with only alignment-based similarity measures or combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) within the twilight zone be achieved for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers, while the runners-up were alignment-free features related to amino acid composition. Conclusions: The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes relative to the classification quality achieved with alignment-based similarity measures alone. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.

This work was supported by the following financial sources: Postdoc fellowship (SFRH/BPD/92978/2013) granted to GACh by the Portuguese Fundação para a Ciência e a Tecnologia (FCT). AA was supported by the MarInfo – Integrated Platform for Marine Data Acquisition and Analysis (reference NORTE-01-0145-FEDER-000031), a project supported by the North Portugal Regional Operational Program (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

Repositorio Institucional Universidad de Granada

Repositório Aberto da Universidade do Porto

Fragmentación primaria en la combustión en lecho fluidizado de pellets de serrín (Primary fragmentation in the fluidized bed combustion of sawdust pellets)

Author: de León Benítez Juan
Freaza Nora
Galpert Cañizares Deborah
Gonzalez Suarez Erenio
Matiauda Mario
Rivero Marta
Publication venue: Blanquerna - Universitat Ramon Llull
Publication date: 01/01/2011

Fluidized bed combustion appears to have good prospects among the technological options for generating energy from a fuel, given its flexibility regarding the fuels that can be used and its potential for clean, efficient operation, together with the possibility of scale-up. This article presents the results obtained at pilot-plant scale from operating a fluidized bed reactor for the combustion of sawdust pellets, with a view to applying it to the use of the solid residues from the production of lignocellulosic bioethanol.

Revistes Catalanes amb Accés Obert

Big Data Supervised Pairwise Ortholog Detection in Yeasts

Author: Agüero-Chapin Guillermin
Antunes Agostinho
Cañizares Deborah Galpert
Gallardo Evys Ancede
Herrera Francisco
Río García Sara del
Publication venue: IntechOpen
Publication date: 08/11/2017

Orthologs are genes in different species that evolved from a common ancestor. Ortholog detection is essential to study phylogenies and to predict the function of unknown genes. The scalability of gene (or protein) pairwise comparisons and that of the classification process constitute a challenge due to the ever-increasing number of sequenced genomes. Ortholog detection algorithms based only on sequence similarity tend to fail in classification, specifically in Saccharomycete yeasts with rampant paralogies and gene losses. In this book chapter, a new classification approach is proposed, based on the combination of pairwise similarity measures in a decision system that considers the extreme imbalance between ortholog and non-ortholog pairs. Some new gene pair similarity measures are defined based on protein physicochemical profiles, gene pair membership to conserved regions in related genomes, and protein lengths. The efficiency and scalability of the calculation of these measures are analyzed to propose their implementation for big data. In conclusion, the evaluated supervised algorithms that manage big and imbalanced data showed high effectiveness in Saccharomycete yeast genomes.

IntechOpen

An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

Author: Agostinho Antunes
Deborah Galpert
Evys Ancede-Gallardo
Francisco Herrera
Guillermin Agüero-Chapin
Sara Del Río
Publication venue
Publication date: 03/04/2020

Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparisons between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.

CiteSeerX

Inclusión en Weka de filtros basados en conjuntos aproximados para bases desbalanceadas (Inclusion of filters in Weka based in rough sets for imbalanced bases)

Author: Addel Arnaldo Goya Jorge
Deborah Galpert Cañizares
Felipe Antonio Enríquez Rodríguez
Publication venue: Universidad Pablo de Olavide
Publication date: 01/12/2022

Directory of Open Access Journals

Inclusión en Weka de filtros basados en conjuntos aproximados para bases desbalanceadas (Inclusion of filters in Weka based in rough sets for imbalanced bases)

Author: Addel Arnaldo Goya Jorge
Deborah Galpert Cañizares
Felipe Antonio Enríquez Rodríguez
Publication venue: Universidad Pablo de Olavide
Publication date: 01/12/2015

The class imbalance problem arises in datasets that contain a large amount of data of one type (the majority class), while the amount of data of the opposite type is considerably smaller (the minority class). This paper briefly summarizes rough set theory based on similarity relations, which is used to implement three filters in Weka for handling class imbalance. The results on two datasets are then analyzed to validate the filters, with satisfactory results.

Directory of Open Access Journals

Inclusión en Weka de filtros basados en conjuntos aproximados para bases desbalanceadas (Inclusion of filters in Weka based in rough sets for imbalanced bases)

Author: Enríquez Rodríguez Felipe Antonio
Galpert Cañizares Deborah
Goya Jorge Addel Arnaldo
Publication venue: gecontec.org - Cátedra UNESCO en Gestión de Información en las Organizaciones (La Habana)
Publication date: 01/12/2015

The class imbalance problem arises in datasets that contain a large amount of data of one type (the majority class), while the amount of data of the opposite type is considerably smaller (the minority class). This paper briefly summarizes rough set theory based on similarity relations, which is used to implement three filters in Weka for handling class imbalance. The results on two datasets are then analyzed to validate the filters, with satisfactory results.

Directory of Open Access Journals

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Revistas UPO (Universidad Pablo de Olavide)

An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

Author: Agostinho Antunes
Deborah Galpert
Evys Ancede-Gallardo
Francisco Herrera
Guillermin Agüero-Chapin
Sara del Río
Publication venue: Hindawi Limited
Publication date: 01/01/2015

Crossref

Directory of Open Access Journals

Repositorio Institucional Universidad de Granada

PubMed Central

Algunas aplicaciones de la estructura booleana del código genético (Some applications of the Boolean structure of the genetic code)

Author: Deborah Galpert Cañizares
Eberto Morgado Morales
Gladys Casas Cardoso
Maria del Carmen Chávez
Ricardo Grau Ábalo
Robersy Sánchez Rodríguez
Publication venue: Universidad de Ciencias Informáticas
Publication date: 01/10/2011

The Boolean structures of the genetic code are minimal, highly simplified mathematical models that help us better understand the underlying logic of the genetic code. More specifically, these structures reflect a strong connection between the orderings of the genetic code and the physicochemical properties of the amino acids. In this article we present two applications of this algebraic structure to typical bioinformatics problems. The first is the classification of the mutations of a given protein. The second is a particular case of the secondary structure prediction problem. We also use statistical and artificial intelligence techniques to solve them.

Directory of Open Access Journals

Emerging Computational Approaches for Antimicrobial Peptide Discovery

Author: Agostinho Antunes
Dany Domínguez-Pérez
Deborah Galpert-Cañizares
Gisselle Pérez-Machado
Guillermin Agüero-Chapin
Marta Teijeira
Yovani Marrero-Ponce
Publication venue: MDPI AG
Publication date: 01/07/2022

In the last two decades, many reports have addressed the application of artificial intelligence (AI) in the search and design of antimicrobial peptides (AMPs). AI has been represented by machine learning (ML) algorithms that use sequence-based features for the discovery of new peptidic scaffolds with promising biological activity. From an AI perspective, evolutionary algorithms have also been applied to the rational generation of peptide libraries aimed at the optimization/design of AMPs. However, the literature has scarcely addressed other emerging non-conventional in silico approaches for the search/design of such bioactive peptides. Thus, the first motivation here is to bring up some non-standard peptide features that have been used to build classical ML predictive models. Secondly, it is valuable to highlight emerging ML algorithms and alternative computational tools to predict/design AMPs as well as to explore their chemical space. Another point worthy of mention is the recent application of evolutionary algorithms that actually simulate sequence evolution to both the generation of diversity-oriented peptide libraries and the optimization of hit peptides. Last but not least, we include some new considerations in proteogenomic analyses currently incorporated into the computational workflow for unravelling AMPs in natural sources.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central