    A random forest approach to the detection of epistatic interactions in case-control studies

    Background: Epistatic interactions between multiple genetic variants play key roles in the pathogenesis of complex diseases, yet detecting such interactions remains a great challenge in genome-wide association studies. Although some existing multi-locus approaches have been successful on small-scale case-control data, the "combination explosion" curse prohibits their application to genome-wide analysis. It is therefore indispensable to develop new methods that can reduce the search space for epistatic interactions from an astronomical number of possible combinations of genetic variants to a manageable set of candidates. Results: We studied case-control data from the viewpoint of binary classification. More precisely, we treated single nucleotide polymorphism (SNP) markers as categorical features and adopted the random forest to discriminate cases from controls. On the basis of the Gini importance given by the random forest, we designed a sliding window sequential forward feature selection (SWSFS) algorithm to select a small set of candidate SNPs that minimizes the classification error, and then statistically tested up to three-way interactions among the candidates. We compared this approach with three existing methods on three simulated disease models and showed that our approach is comparable to, and sometimes more powerful than, the other methods. We applied our approach to a genome-wide case-control dataset for Age-related Macular Degeneration (AMD) and successfully identified two SNPs that have been reported to be associated with this disease. Conclusion: Beyond existing purely statistical approaches, we demonstrated the feasibility of incorporating machine learning methods into genome-wide case-control studies.
The Gini importance offers yet another measure of the association between SNPs and complex diseases, thereby complementing existing statistical measures to facilitate the identification of epistatic interactions and the understanding of epistasis in the pathogenesis of complex diseases.
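The selection step described in this abstract can be pictured with a minimal Python sketch. It assumes the SNPs have already been ranked by decreasing Gini importance and that a caller-supplied `error_of` function (hypothetical; in practice something like a random forest's out-of-bag error on the subset) scores each candidate subset. The stopping rule used here, halting once `window` consecutive additions fail to lower the best error, is an assumption for illustration; the published SWSFS rule may differ, and the subsequent interaction tests are not reproduced.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_window_forward_selection(
    ranked: Sequence[str],
    error_of: Callable[[List[str]], float],
    window: int = 5,
) -> Tuple[List[str], float]:
    """Forward selection over features pre-sorted by decreasing importance.

    Prefixes of `ranked` are scored one size at a time. The scan stops once
    `window` consecutive additions fail to lower the best error seen so far,
    and the shortest prefix that reached that best error is returned.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    best_size, best_error = 0, float("inf")
    misses = 0  # consecutive additions without improvement
    for size in range(1, len(ranked) + 1):
        err = error_of(list(ranked[:size]))
        if err < best_error:
            best_size, best_error = size, err
            misses = 0
        else:
            misses += 1
            if misses >= window:
                break
    return list(ranked[:best_size]), best_error


if __name__ == "__main__":
    snps = [f"snp_{i}" for i in range(10)]  # hypothetical IDs, already ranked

    def toy_error(subset: List[str]) -> float:
        # Toy error curve with its minimum at three SNPs (stand-in for an OOB error).
        return 0.2 + 0.05 * abs(len(subset) - 3)

    chosen, err = sliding_window_forward_selection(snps, toy_error, window=3)
    print(chosen, err)  # prints ['snp_0', 'snp_1', 'snp_2'] 0.2
```

With this toy curve the scan keeps improving up to three SNPs, then stops after three further additions that do not help, so only six of the ten subsets are ever scored.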

    Bias in random forest variable importance measures: Illustrations, sources and a solution

    BACKGROUND: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. RESULTS: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on the one hand, and effects induced by bootstrap sampling with replacement on the other hand. CONCLUSION: We propose to employ an alternative implementation of random forests that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories.
The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore, the suggested method can be applied straightforwardly by scientists in bioinformatics research.
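The first mechanism named in this abstract, biased split selection inside individual trees, can be reproduced without building a full forest. The Python sketch below is an illustration, not the authors' simulation: it draws categorical predictors that are independent of a binary outcome and records the best Gini impurity decrease a single CART-style split can reach on them. The sample size, number of repetitions and category counts are arbitrary assumptions. A predictor with more categories offers many more candidate partitions, so it reaches larger decreases by chance alone, which is what inflates its Gini importance. The remedy proposed in the abstract is not implemented here.

```python
import random
from typing import Dict, List, Sequence


def gini(pos: int, n: int) -> float:
    """Gini impurity 2p(1-p) of a node with `pos` cases among `n` samples."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split_decrease(x: Sequence[int], y: Sequence[int]) -> float:
    """Largest Gini decrease over all binary partitions of the categories of x.

    For a two-class outcome, sorting categories by their share of cases and
    scanning the k-1 cut points along that order finds the optimal partition
    (a standard CART result), so all 2**(k-1) - 1 subsets need not be tried.
    """
    n, total_pos = len(y), sum(y)
    stats: Dict[int, List[int]] = {}
    for xi, yi in zip(x, y):
        s = stats.setdefault(xi, [0, 0])  # [count, cases]
        s[0] += 1
        s[1] += yi
    ordered = sorted(stats.values(), key=lambda s: s[1] / s[0])
    parent = gini(total_pos, n)
    best, left_n, left_pos = 0.0, 0, 0
    for count, cases in ordered[:-1]:
        left_n += count
        left_pos += cases
        right_n, right_pos = n - left_n, total_pos - left_pos
        child = (left_n * gini(left_pos, left_n) + right_n * gini(right_pos, right_n)) / n
        best = max(best, parent - child)
    return best


def mean_null_decrease(k: int, n: int = 100, reps: int = 500, seed: int = 0) -> float:
    """Average best decrease for a k-category predictor drawn independently of y."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.randint(0, 1) for _ in range(n)]  # case (1) / control (0), pure noise
        x = [rng.randrange(k) for _ in range(n)]   # uninformative categorical predictor
        total += best_split_decrease(x, y)
    return total / reps


if __name__ == "__main__":
    for k in (2, 5, 20):
        print(f"{k:>2} categories: mean best Gini decrease = {mean_null_decrease(k):.4f}")
```

Every predictor here carries no signal, so any gap between the printed averages comes purely from the number of categories.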

    Is EC class predictable from reaction mechanism?

    We thank the Scottish Universities Life Sciences Alliance (SULSA) and the Scottish Overseas Research Student Awards Scheme of the Scottish Funding Council (SFC) for financial support.
    Background: We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in descriptors, and three approaches that encode only the overall chemical reaction. Both cross-validation and an external test set are used. Results: The three descriptor sets encoding the overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall reaction descriptors but not by mechanistic ones. Conclusions: Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in various unrelated ways. The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information plays in successful classification.
We also note that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from assigning an EC classification from a cheminformatics representation of a reaction.

    Anyone with a Long-Face? Craniofacial Evolutionary Allometry (CREA) in a Family of Short-Faced Mammals, the Felidae

    Among adults of closely related species, a trend in craniofacial evolutionary allometry (CREA) for larger taxa to be long-faced and smaller ones to have paedomorphic aspects, such as proportionally smaller snouts and larger braincases, has been demonstrated in some mammals and two bird lineages. Nevertheless, whether this may represent a ‘rule’ with few exceptions is still an open question. In this context, Felidae is a particularly interesting family to study because, although its members are short-faced, previous research has suggested relative facial elongation in larger living representatives. Using geometric morphometrics, based on two sets of anatomical landmarks, and traditional morphometrics, for comparing relative lengths of the palate and basicranium, we performed a series of standard and comparative allometric regressions in the Felidae and its two subfamilies. All analyses consistently supported the CREA pattern, with only one minor exception in the geometric morphometric analysis of Pantherinae: the genus Neofelis. With their unusually long canines, Neofelis species seem to have a relatively narrow cranium and long face, despite being smaller than other big cats. Nonetheless, our findings overall strengthen the possibility that the CREA pattern might indeed be a ‘rule’ among mammals, raising questions about the processes behind it and suggesting future directions for its study.
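As a simplified illustration of what an allometric regression measures, the Python sketch below fits the classic power law y = a x^b by ordinary least squares on log-transformed values. A slope b > 1 means the trait (for example palate length) grows proportionally faster than overall size, the direction the CREA pattern predicts for the face. This bivariate fit is an assumed stand-in: the study's geometric morphometric shape regressions and its standard and comparative regressions are not reproduced, and the skull measurements in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def allometric_slope(size: Sequence[float], trait: Sequence[float]) -> float:
    """Return b from an OLS fit of log(trait) = log(a) + b * log(size)."""
    if len(size) != len(trait) or len(size) < 2:
        raise ValueError("need two equal-length series with at least two points")
    xs = [math.log(s) for s in size]
    ys = [math.log(t) for t in trait]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical skull lengths (mm); palate lengths generated as 0.4 * skull**1.1.
    skull = [80.0, 120.0, 200.0, 300.0]
    palate = [0.4 * s ** 1.1 for s in skull]
    b = allometric_slope(skull, palate)
    print(f"slope b = {b:.3f}")  # prints "slope b = 1.100"; b > 1 means a relatively longer palate in larger skulls
```

A slope of exactly 1 would indicate isometry, with the palate keeping a constant proportion of skull length across sizes.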

    Anti-tumour necrosis factor discontinuation in inflammatory bowel disease patients in remission: study protocol of a prospective, multicentre, randomized clinical trial

    Background: Patients with inflammatory bowel disease who achieve remission with anti-tumour necrosis factor (anti-TNF) drugs may have treatment withdrawn due to safety concerns and cost considerations, but there is a lack of prospective, controlled data investigating this strategy. The primary study aim is to compare the rates of clinical remission at 1 year in patients who discontinue anti-TNF treatment versus those who continue treatment. Methods: This is an ongoing, prospective, double-blind, multicentre, randomized, placebo-controlled study in patients with Crohn's disease or ulcerative colitis who have achieved clinical remission for ≥6 months with an anti-TNF treatment and an immunosuppressant. Patients are being randomized 1:1 to discontinue anti-TNF therapy or continue therapy. Randomization stratifies patients by the type of inflammatory bowel disease and drug (infliximab versus adalimumab) at study inclusion. The primary endpoint of the study is sustained clinical remission at 1 year. Other endpoints include endoscopic and radiological activity, patient-reported outcomes (quality of life, work productivity), safety and predictive factors for relapse. The required sample size is 194 patients. In addition to the main analysis (discontinuation versus continuation), subanalyses will include stratification by type of inflammatory bowel disease, phenotype and previous treatment. Biological samples will be obtained to identify factors predictive of relapse after treatment withdrawal. Results: Enrolment began in 2016, and the study is expected to end in 2020. Conclusions: This study will contribute prospective, controlled data on outcomes and predictors of relapse in patients with inflammatory bowel disease after withdrawal of anti-TNF agents following achievement of clinical remission. Clinical trial reference number: EudraCT 2015-001410-1
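The stratified 1:1 allocation described in this protocol can be sketched with permuted blocks kept separately for each stratum (disease type and anti-TNF drug). The Python sketch below is hypothetical: the block size of 4, the arm labels and the stratum strings are assumptions, and the abstract does not state that the trial uses permuted blocks. Because every block holds equal numbers of each arm, allocation within a stratum is exactly balanced after each completed block, and the imbalance at any point never exceeds half a block.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical arm labels for the two study groups.
ARMS = ("discontinue_anti_tnf", "continue_anti_tnf")


class StratifiedBlockRandomizer:
    """1:1 permuted-block allocation maintained independently per stratum."""

    def __init__(self, block_size: int = 4, seed: int = 0) -> None:
        if block_size <= 0 or block_size % len(ARMS) != 0:
            raise ValueError("block size must be a positive multiple of the number of arms")
        self.block_size = block_size
        self._rng = random.Random(seed)
        # Remaining, not-yet-used assignments of the current block in each stratum.
        self._pending: Dict[Tuple[str, str], List[str]] = {}

    def assign(self, disease: str, drug: str) -> str:
        """Return the arm for the next patient in the (disease, drug) stratum."""
        queue = self._pending.setdefault((disease, drug), [])
        if not queue:
            # Start a fresh block with equal counts of each arm, in random order.
            block = list(ARMS) * (self.block_size // len(ARMS))
            self._rng.shuffle(block)
            queue.extend(block)
        return queue.pop(0)


if __name__ == "__main__":
    randomizer = StratifiedBlockRandomizer(seed=2016)
    for patient in range(6):
        print(patient, randomizer.assign("crohns_disease", "infliximab"))
```

Keeping one queue per stratum is what makes the stratification work: enrolment in one stratum never consumes a slot from another stratum's block.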

    Gene selection for cancer classification with the help of bees

    Taking the pulse of Earth's tropical forests using networks of highly distributed plots

    Tropical forests are the most diverse and productive ecosystems on Earth. While better understanding of these forests is critical for our collective future, until quite recently efforts to measure and monitor them had been largely disconnected. Networking is essential to discover the answers to questions that transcend borders and the horizons of funding agencies. Here we show how a global community is responding to the challenges of tropical ecosystem research with diverse teams measuring forests tree-by-tree in thousands of long-term plots. We review the major scientific discoveries of this work and show how this process is changing tropical forest science. Our core approach involves linking long-term grassroots initiatives with standardized protocols and data management to generate robust scaled-up results. By connecting tropical researchers and elevating their status, our Social Research Network model recognises the key role of the data originator in scientific discovery. Conceived in 1999 with RAINFOR (South America), our permanent plot networks have been adapted to Africa (AfriTRON) and Southeast Asia (T-FORCES) and widely emulated worldwide. Now these multiple initiatives are integrated via ForestPlots.net cyber-infrastructure, linking colleagues from 54 countries across 24 plot networks. Collectively these are transforming understanding of tropical forests and their biospheric role. Together we have discovered how, where and why forest carbon and biodiversity are responding to climate change, and how they feed back on it. This long-term pan-tropical collaboration has revealed a large long-term carbon sink and its trends, as well as making clear which drivers are most important, which forest processes are affected, where they are changing, what the lags are, and the likely future responses of tropical forests as the climate continues to change. 
By leveraging a remarkably old technology, plot networks are sparking a very modern revolution in tropical forest science. In the future, humanity can benefit greatly by nurturing the grassroots communities now collectively capable of generating unique, long-term understanding of Earth's most precious forests.
Additional co-authors: Susan Laurance, William Laurance, Francoise Yoko Ishida, Andrew Marshall, Catherine Waite, Hannsjoerg Woell, Jean-Francois Bastin, Marijn Bauters, Hans Beeckman, Pascal Boeckx, Jan Bogaert, Charles De Canniere, Thales de Haulleville, Jean-Louis Doucet, Olivier Hardy, Wannes Hubau, Elizabeth Kearsley, Hans Verbeeck, Jason Vleminckx, Steven W. Brewer, Alfredo Alarcón, Alejandro Araujo-Murakami, Eric Arets, Luzmila Arroyo, Ezequiel Chavez, Todd Fredericksen, René Guillén Villaroel, Gloria Gutierrez Sibauty, Timothy Killeen, Juan Carlos Licona, John Lleigue, Casimiro Mendoza, Samaria Murakami, Alexander Parada Gutierrez, Guido Pardo, Marielos Peña-Claros, Lourens Poorter, Marisol Toledo, Jeanneth Villalobos Cayo, Laura Jessica Viscarra, Vincent Vos, Jorge Ahumada, Everton Almeida, Jarcilene Almeida, Edmar Almeida de Oliveira, Wesley Alves da Cruz, Atila Alves de Oliveira, Fabrício Alvim Carvalho, Flávio Amorim Obermuller, Ana Andrade, Fernanda Antunes Carvalho, Simone Aparecida Vieira, Ana Carla Aquino, Luiz Aragão, Ana Claudia Araújo, Marco Antonio Assis, Jose Ataliba Mantelli Aboin Gomes, Fabrício Baccaro, Plínio Barbosa de Camargo, Paulo Barni, Jorcely Barroso, Luis Carlos Bernacci, Kauane Bordin, Marcelo Brilhante de Medeiros, Igor Broggio, José Luís Camargo, Domingos Cardoso, Maria Antonia Carniello, Andre Luis Casarin Rochelle, Carolina Castilho, Antonio Alberto Jorge Farias Castro, Wendeson Castro, Sabina Cerruto Ribeiro, Flávia Costa, Rodrigo Costa de Oliveira, Italo Coutinho, John Cunha, Lola da Costa, Lucia da Costa Ferreira, Richarlly da Costa Silva, Marta da Graça Zacarias Simbine, Vitor de Andrade Kamimura, Haroldo Cavalcante 
de Lima, Lia de Oliveira Melo, Luciano de Queiroz, José Romualdo de Sousa Lima, Mário do Espírito Santo, Tomas Domingues, Nayane Cristina dos Santos Prestes, Steffan Eduardo Silva Carneiro, Fernando Elias, Gabriel Eliseu, Thaise Emilio, Camila Laís Farrapo, Letícia Fernandes, Gustavo Ferreira, Joice Ferreira, Leandro Ferreira, Socorro Ferreira, Marcelo Fragomeni Simon, Maria Aparecida Freitas, Queila S. García, Angelo Gilberto Manzatto, Paulo Graça, Frederico Guilherme, Eduardo Hase, Niro Higuchi, Mariana Iguatemy, Reinaldo Imbrozio Barbosa, Margarita Jaramillo, Carlos Joly, Joice Klipel, Iêda Leão do Amaral, Carolina Levis, Antonio S. Lima, Maurício Lima Dan, Aline Lopes, Herison Madeiros, William E. Magnusson, Rubens Manoel dos Santos, Beatriz Marimon, Ben Hur Marimon Junior, Roberta Marotti Martelletti Grillo, Luiz Martinelli, Simone Matias Reis, Salomão Medeiros, Milton Meira-Junior, Thiago Metzker, Paulo Morandi, Natanael Moreira do Nascimento, Magna Moura, Sandra Cristina Müller, Laszlo Nagy, Henrique Nascimento, Marcelo Nascimento, Adriano Nogueira Lima, Raimunda Oliveira de Araújo, Jhonathan Oliveira Silva, Marcelo Pansonato, Gabriel Pavan Sabino, Karla Maria Pedra de Abreu, Pablo José Francisco Pena Rodrigues, Maria Piedade, Domingos Rodrigues, José Roberto Rodrigues Pinto, Carlos Quesada, Eliana Ramos, Rafael Ramos, Priscyla Rodrigues, Thaiane Rodrigues de Sousa, Rafael Salomão, Flávia Santana, Marcos Scaranello, Rodrigo Scarton Bergamin, Juliana Schietti, Jochen Schöngart, Gustavo Schwartz, Natalino Silva, Marcos Silveira, Cristiana Simão Seixas, Marta Simbine, Ana Claudia Souza, Priscila Souza, Rodolfo Souza, Tereza Sposito, Edson Stefani Junior, Julio Daniel do Vale, Ima Célia Guimarães Vieira, Dora Villela, Marcos Vital, Haron Xaud, Katia Zanini, Charles Eugene Zartman, Nur Khalish Hafizhah Ideris, Faizah binti Hj Metali, Kamariah Abu Salim, Muhd Shahruney Saparudin, Rafizah Mat Serudin, Rahayu Sukmaria Sukri, Serge Begne, George Chuyong, Marie Noel 
Djuikouo, Christelle Gonmadje, Murielle Simo-Droissart, Bonaventure Sonké, Hermann Taedoumg, Lise Zemagho, Sean Thomas, Fidèle Baya, Gustavo Saiz, Javier Silva Espejo, Dexiang Chen, Alan Hamilton, Yide Li, Tushou Luo, Shukui Niu, Han Xu, Zhang Zhou, Esteban Álvarez-Dávila, Juan Carlos Andrés Escobar, Henry Arellano-Peña, Jaime Cabezas Duarte, Jhon Calderón, Lina Maria Corrales Bravo, Borish Cuadrado, Hermes Cuadros, Alvaro Duque, Luisa Fernanda Duque, Sandra Milena Espinosa, Rebeca Franke-Ante, Hernando García, Alejandro Gómez, Roy González-M., Álvaro Idárraga-Piedrahíta, Eliana Jimenez, Rubén Jurado, Wilmar López Oviedo, René López-Camacho, Omar Aurelio Melo Cruz, Irina Mendoza Polo, Edwin Paky, Karen Pérez, Angel Pijachi, Camila Pizano, Adriana Prieto, Laura Ramos, Zorayda Restrepo Correa, James Richardson, Elkin Rodríguez, Gina M. Rodriguez M., Agustín Rudas, Pablo Stevenson, Markéta Chudomelová, Martin Dancak, Radim Hédl, Stanislav Lhota, Martin Svatek, Jacques Mukinzi, Corneille Ewango, Terese Hart, Emmanuel Kasongo Yakusu, Janvier Lisingo, Jean-Remy Makana, Faustin Mbayu, Benjamin Toirambe, John Tshibamba Mukendi, Lars Kvist, Gustav Nebel, Selene Báez, Carlos Céron, Daniel M. 
Griffith, Juan Ernesto Guevara Andino, David Neill, Walter Palacios, Maria Cristina Peñuela-Mora, Gonzalo Rivas-Torres, Gorky Villa, Sheleme Demissie, Tadesse Gole, Techane Gonfa, Kalle Ruokolainen, Michel Baisie, Fabrice Bénédet, Wemo Betian, Vincent Bezard, Damien Bonal, Jerôme Chave, Vincent Droissart, Sylvie Gourlet-Fleury, Annette Hladik, Nicolas Labrière, Pétrus Naisso, Maxime Réjou-Méchain, Plinio Sist, Lilian Blanc, Benoit Burban, Géraldine Derroire, Aurélie Dourdain, Clement Stahl, Natacha Nssi Bengone, Eric Chezeaux, Fidèle Evouna Ondo, Vincent Medjibe, Vianet Mihindou, Lee White, Heike Culmsee, Cristabel Durán Rangel, Viviana Horna, Florian Wittmann, Stephen Adu-Bredu, Kofi Affum-Baffoe, Ernest Foli, Michael Balinga, Anand Roopsind, James Singh, Raquel Thomas, Roderick Zagt, Indu K. Murthy, Kuswata Kartawinata, Edi Mirmanto, Hari Priyadi, Ismayadi Samsoedin, Terry Sunderland, Ishak Yassir, Francesco Rovero, Barbara Vinceti, Bruno Hérault, Shin-Ichiro Aiba, Kanehiro Kitayama, Armandu Daniels, Darlington Tuagben, John T. 
Woods, Muhammad Fitriadi, Alexander Karolus, Kho Lip Khoon, Noreen Majalap, Colin Maycock, Reuben Nilus, Sylvester Tan, Almeida Sitoe, Indiana Coronado G., Lucas Ojo, Rafael de Assis, Axel Dalberg Poulsen, Douglas Sheil, Karen Arévalo Pezo, Hans Buttgenbach Verde, Victor Chama Moscoso, Jimmy Cesar Cordova Oroche, Fernando Cornejo Valverde, Massiel Corrales Medina, Nallaret Davila Cardozo, Jano de Rutte Corzo, Jhon del Aguila Pasquel, Gerardo Flores Llampazo, Luis Freitas, Darcy Galiano Cabrera, Roosevelt García Villacorta, Karina Garcia Cabrera, Diego García Soria, Leticia Gatica Saboya, Julio Miguel Grandez Rios, Gabriel Hidalgo Pizango, Eurídice Honorio Coronado, Isau Huamantupa-Chuquimaco, Walter Huaraca Huasco, Yuri Tomas Huillca Aedo, Jose Luis Marcelo Peña, Abel Monteagudo Mendoza, Vanesa Moreano Rodriguez, Percy Núñez Vargas, Sonia Cesarina Palacios Ramos, Nadir Pallqui Camacho, Antonio Peña Cruz, Freddy Ramirez Arevalo, José Reyna Huaymacari, Carlos Reynel Rodriguez, Marcos Antonio Ríos Paredes, Lily Rodriguez Bayona, Rocio del Pilar Rojas Gonzales, Maria Elena Rojas Peña, Norma Salinas Revilla, Yahn Carlos Soto Shareva, Raul Tupayachi Trujillo, Luis Valenzuela Gamarra, Rodolfo Vasquez Martinez, Jim Vega Arenas, Christian Amani, Suspense Averti Ifo, Yannick Bocko, Patrick Boundja, Romeo Ekoungoulou, Mireille Hockemba, Donatien Nzala, Alusine Fofanah, David Taylor, Guillermo Bañares-de Dios, Luis Cayuela, Íñigo Granzow-de la Cerda, Manuel Macía, Juliana Stropp, Maureen Playfair, Verginia Wortel, Toby Gardner, Robert Muscarella, Hari Priyadi, Ervan Rutishauser, Kuo-Jung Chao, Pantaleo Munishi, Olaf Bánki, Frans Bongers, Rene Boot, Gabriella Fredriksson, Jan Reitsma, Hans ter Steege, Tinde van Andel, Peter van de Meer, Peter van der Hout, Mark van Nieuwstadt, Bert van Ulft, Elmar Veenendaal, Ronald Vernimmen, Pieter Zuidema, Joeri Zwerts, Perpetra Akite, Robert Bitariho, Colin Chapman, Eilu Gerald, Miguel Leal, Patrick Mucunguzi, Miguel Alexiades, Timothy R. 
Baker, Karina Banda, Lindsay Banin, Jos Barlow, Amy Bennett, Erika Berenguer, Nicholas Berry, Neil M. Bird, George A. Blackburn, Francis Brearley, Roel Brienen, David Burslem, Lidiany Carvalho, Percival Cho, Fernanda Coelho, Murray Collins, David Coomes, Aida Cuni-Sanchez, Greta Dargie, Kyle Dexter, Mat Disney, Freddie Draper, Muying Duan, Adriane Esquivel-Muelbert, Robert Ewers, Belen Fadrique, Sophie Fauset, Ted R. Feldpausch, Filipe França, David Galbraith, Martin Gilpin, Emanuel Gloor, John Grace, Keith Hamer, David Harris, Tommaso Jucker, Michelle Kalamandeen, Bente Klitgaard, Aurora Levesley, Simon L. Lewis, Jeremy Lindsell, Gabriela Lopez-Gonzalez, Jon Lovett, Yadvinder Malhi, Toby Marthews, Emma McIntosh, Karina Melgaço, William Milliken, Edward Mitchard, Peter Moonlight, Sam Moore, Alexandra Morel, Julie Peacock, Kelvin Peh, Colin Pendry, R. Toby Pennington, Luciana de Oliveira Pereira, Carlos Peres, Oliver L. Phillips, Georgia Pickavance, Thomas Pugh, Lan Qie, Terhi Riutta, Katherine Roucoux, Casey Ryan, Tiina Sarkinen, Camila Silva Valeria, Dominick Spracklen, Suzanne Stas, Martin Sullivan, Michael Swaine, Joey Talbot, James Taplin, Geertje van der Heijden, Laura Vedovato, Simon Willcock, Mathew Williams, Luciana Alves, Patricia Alvarez Loayza, Gabriel Arellano, Cheryl Asa, Peter Ashton, Gregory Asner, Terry Brncic, Foster Brown, Robyn Burnham, Connie Clark, James Comiskey, Gabriel Damasco, Stuart Davies, Tony Di Fiore, Terry Erwin, William Farfan-Rios, Jefferson Hall, David Kenfack, Thomas Lovejoy, Roberta Martin, Olga Martha Montiel, John Pipoly, Nigel Pitman, John Poulsen, Richard Primack, Miles Silman, Marc Steininger, Varun Swamy, John Terborgh, Duncan Thomas, Peter Umunay, Maria Uriarte, Emilio Vilanova Torre, Ophelia Wang, Kenneth Young, Gerardo A. Aymard C., Lionel Hernández, Rafael Herrera Fernández, Hirma Ramírez-Angulo, Pedro Salcedo, Elio Sanoja, Julio Serrano, Armando Torres-Lezama, Tinh Cong Le, Trai Trong Le, Hieu Dang Tra