
    GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest

    Background: Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information on the selected genes. Methodologically, such a tool would be of greater use if it incorporated state-of-the-art computational approaches and made its source code available. Results: We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Computation is parallelized, making it possible to take advantage of multicore CPUs and clusters of workstations. Output includes bootstrapped estimates of prediction error rate and assessments of the stability of the solutions. Clickable tables link to additional information for each gene (GO terms, PubMed citations, KEGG pathways), and output can be sent to PaLS for examination of PubMed references, GO terms, and KEGG and Reactome pathways characteristic of sets of genes selected for class prediction. The full source code is available, so the software can be extended. The web-based application is available from http://genesrf2.bioinfo.cnio.es. All source code is available from Bioinformatics.org or The Launchpad. The R package is also available from CRAN. Conclusion: varSelRF and GeneSrF implement a validated method for gene selection, including bootstrap estimates of classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology used (a combination of parallelization with a web-based application), they are also of methodological interest to bioinformaticians and biostatisticians.

    Plantmetabolomics.org: mass spectrometry-based Arabidopsis metabolomics—database and tools update

    The PlantMetabolomics (PM) database (http://www.plantmetabolomics.org) contains comprehensive targeted and untargeted mass spectrum metabolomics data for Arabidopsis mutants across a variety of metabolomics platforms. The database allows users to generate hypotheses about the changes in metabolism for mutants with genes of unknown function. Version 2.0 of PlantMetabolomics.org currently contains data for 140 mutant lines along with the morphological data. A web-based data analysis wizard allows researchers to select preprocessing and data-mining procedures to discover differences between mutants. This community resource enables researchers to formulate models of the metabolic network of Arabidopsis and enhances the research community's ability to formulate testable hypotheses concerning gene functions. PM features new web-based tools for data-mining analysis, visualization tools and enhanced cross links to other databases. The database is publicly available. PM aims to provide a hypothesis-building platform for researchers interested in any of the mutant lines or metabolites.

    SignS: a parallelized, open-source, freely available, web-based tool for gene selection and molecular signatures for survival and censored data

    Background: Censored data are increasingly common in many microarray studies that attempt to relate gene expression to patient survival. Several new methods have been proposed in the last two years. Most of these methods, however, are not available to biomedical researchers, leading to many re-implementations from scratch of ad hoc, and suboptimal, approaches with survival data. Results: We have developed SignS (Signatures for Survival data), an open-source, freely available, web-based tool and R package for gene selection, building molecular signatures, and prediction with survival data. SignS implements four methods which, according to existing reviews, perform well and, being of a very different nature, offer complementary approaches. We use parallel computing via MPI, leading to large decreases in user waiting time. Cross-validation is used to assess predictive performance and stability of solutions, the latter an issue of increasing concern given that there are often several solutions with similar predictive performance. Biological interpretation of results is enhanced because genes and signatures in models can be sent to other freely available online tools for examination of PubMed references, GO terms, and KEGG and Reactome pathways of selected genes. Conclusion: SignS is the first web-based tool for survival analysis of expression data, and one of the very few with biomedical researchers as target users. SignS is also one of the few bioinformatics web-based applications to use parallelization extensively, including fault tolerance and crash recovery. Because of its combination of implemented methods, use of parallel computing, code availability, and links to additional databases, SignS is a unique tool, and will be of immediate relevance to biomedical researchers, biostatisticians and bioinformaticians.

    Stable isotope analysis indicates resource partitioning and trophic niche overlap in larvae of four tuna species in the Gulf of Mexico

    In this study we assessed the trophic ecology of bluefin tuna Thunnus thynnus larvae from the Gulf of Mexico, together with the co-occurring larvae of blackfin tuna T. atlanticus, bullet tuna Auxis rochei, and skipjack Katsuwonus pelamis, using both bulk-tissue stable isotope analysis (SIAbulk) and compound-specific analysis of amino acids (CSIAAA). Bulk nitrogen (δ15Nbulk) and carbon (δ13Cbulk) values differed significantly among species, suggesting partitioning of resources due to an adaptive process allowing these tunas to share the ecosystem’s trophic resources during this early life period. K. pelamis had the largest isotopic niche width, likely due to piscivorous feeding at an earlier age compared to the other species, with an isotopic niche overlap of 17.5% with T. thynnus, 15.8% with T. atlanticus, and 31.2% with A. rochei. This trophic overlap suggests a mix of competition and trophic differentiation among these 4 species of tuna larvae. Higher nitrogen isotopic signatures in preflexion versus postflexion larvae of T. thynnus measured using both SIAbulk and CSIAAA indicate maternal isotopic transmission, as well as ‘capital breeder’-like characteristics. In contrast, the nitrogen isotopic ratios of the other 3 species were similar between ontogenetic stages. These observations suggest different breeding strategies within the study area for T. atlanticus, K. pelamis, and A. rochei compared to T. thynnus. No significant differences were observed among the 4 species’ trophic positions (TPs) estimated by CSIAAA, whereas a higher TP was observed for T. thynnus by SIAbulk. These differences in TP estimation may be attributed to discrepancies in baseline estimates.
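The sensitivity of bulk-isotope trophic position to the baseline can be illustrated with a minimal sketch. It assumes the widely used single-baseline formula TP = λ + (δ15N_consumer − δ15N_baseline) / Δ, with a baseline trophic level λ = 2 and an enrichment factor Δ = 3.4‰ per level. The abstract does not state which formula or constants the authors used, and all δ15N values below are hypothetical.

```python
def trophic_position(d15n_consumer, d15n_baseline, baseline_tp=2.0, enrichment=3.4):
    """Bulk-isotope trophic position from a single baseline.

    baseline_tp and enrichment are assumed defaults, not values from the study.
    """
    return baseline_tp + (d15n_consumer - d15n_baseline) / enrichment


# Hypothetical larval delta-15N value (per mil), evaluated against two
# hypothetical baseline estimates to show how the baseline shifts the result.
larva_d15n = 8.4
tp_low_baseline = trophic_position(larva_d15n, d15n_baseline=5.0)   # about 3.0
tp_high_baseline = trophic_position(larva_d15n, d15n_baseline=6.7)  # about 2.5

print(f"TP with baseline 5.0: {tp_low_baseline:.2f}")
print(f"TP with baseline 6.7: {tp_high_baseline:.2f}")
```

With the same consumer value, a 1.7‰ difference in the baseline moves the estimate by half a trophic level under these assumed constants, which is the kind of discrepancy the abstract attributes to baseline estimates.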

    Dietary Fat Patterns and Outcomes in Acute Pancreatitis in Spain

    Background: Evidence from basic and clinical studies suggests that unsaturated fatty acids (UFAs) might be relevant mediators of the development of complications in acute pancreatitis (AP). Objective: The aim of this study was to analyze outcomes in patients with AP from regions in Spain with different patterns of dietary fat intake. Materials and Methods: A retrospective analysis was performed with data from 1,655 patients with AP from a Spanish prospective cohort study and regional nutritional data from a Spanish cross-sectional study. The nutritional data considered in the study cover total lipid consumption, broken down into total saturated fatty acid, UFA and monounsaturated fatty acid (MUFA) consumption, and are derived from regional data, not from the patient prospective cohort. Two multivariable analysis models were used: (1) a model with the Charlson comorbidity index, sex, alcoholic etiology, and recurrent AP; (2) a model that included these variables plus obesity. Results: In multivariable analysis, patients from regions with high UFA intake had a significantly increased frequency of local complications, persistent organ failure (POF), mortality, and moderate-to-severe disease in the model without obesity and a higher frequency of POF in the model with obesity. Patients from regions with high MUFA intake had significantly more local complications and moderate-to-severe disease; this significance remained for moderate-to-severe disease when obesity was added to the model. Conclusions: Differences in dietary fat patterns could be associated with different outcomes in AP, and dietary fat patterns may be a pre-morbid factor that determines the severity of AP. UFAs, and particularly MUFAs, may influence the pathogenesis of the severity of AP.

    Guidelines for the definition of operational management units

    The objective of fisheries management is the sustainable exploitation of fish resources over the extent of their spatial distribution. Under the Common Fisheries Policy (CFP) objectives, the socio-economic viability of the fisheries exploiting the resource is also to be achieved. To reach these aims, managers need to define the management units they are going to work with. For the purposes of the GEPETO project, we define a management unit (MU) as the set of fishing fleets exploiting a common pool of fish resources that overlap strongly in space and share habitats, so that they are typically fished together. In other words, an MU is the set of fishing fleets exploiting a common fish community over its spatial distribution. MUs have to be defined by the fish community, by the spatial range of distribution of the fish community, and by the set of fishing fleets sharing the exploitation of the fish community.

    LARVAL BLUEFIN TUNA (THUNNUS THYNNUS) TROPHODYNAMICS FROM BALEARIC SEA (WM) AND GULF OF MEXICO SPAWNING ECOSYSTEMS BY STABLE ISOTOPE

    The present study uses stable isotopes of nitrogen and carbon (δ15N and δ13C) as trophic indicators for Atlantic bluefin tuna larvae (BFT) (6-10 mm SL) in the highly contrasting environmental conditions of the Gulf of Mexico (GOM) and the Balearic Sea (MED). The study analyzes ontogenetic changes in the food sources and trophic levels (TL) of BFT larvae from each spawning habitat. The results show differences in the ontogenetic dietary shifts observed in the BFT larvae from the GOM and MED, as well as trophodynamic differences in relation to the microzooplanktonic baselines used for estimating trophic enrichment. Significant trophic differences between the GOM and MED larvae were observed in relation to δ15N signatures in favour of the MED larvae, which may have important implications for their early life growth strategy.

    The efficacy of various machine learning models for multi-class classification of RNA-seq expression data

    Late diagnosis and high costs are key factors that negatively impact the care of cancer patients worldwide. Although the availability of biological markers for the diagnosis of cancer type is increasing, the costs and reliability of tests currently present a barrier to their routine use. There is a pressing need for accurate methods that enable early diagnosis and cover a broad range of cancers. The use of machine learning and RNA-seq expression analysis has shown promise in the classification of cancer type. However, research is inconclusive about which types of machine learning model are optimal. The suitability of five algorithms was assessed for the classification of 17 different cancer types. Each algorithm was fine-tuned and trained on the full array of 18,015 genes per sample, for 4,221 samples (75% of the dataset). They were then tested with 1,408 samples (25% of the dataset) for which cancer types were withheld to determine the accuracy of prediction. The results show that ensemble algorithms achieve 100% accuracy in the classification of 14 out of 17 types of cancer. The clustering and classification models, while faster than the ensembles, performed poorly due to the high level of noise in the dataset. When the features were reduced to a list of 20 genes, the ensemble algorithms maintained an accuracy above 95%, as opposed to the clustering and classification models. Comment: 12 pages, 4 figures, 3 tables, conference paper: Computing Conference 2019, published at https://link.springer.com/chapter/10.1007/978-3-030-22871-2_6
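A 75%/25% train/test partition of labelled samples, as described in the abstract, can be sketched in plain Python. The abstract does not say how the split was drawn; splitting each class separately (stratification) is an assumption made here so that every cancer type appears in both sets. The function name `stratified_split` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])   # held-out samples for this class
        train_idx.extend(indices[n_test:])  # remaining samples go to training
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for cancer types (hypothetical, not the study's data).
labels = ["A"] * 40 + ["B"] * 20 + ["C"] * 12
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

The held-out indices would be used only for scoring, with their labels withheld from the model during training, mirroring the evaluation described above.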