
    An evolutionary approach to constraint-regularized learning

    The success of machine learning methods for inducing models from data crucially depends on the proper incorporation of background knowledge about the model to be learned. The idea of constraint-regularized learning is to employ fuzzy set-based modeling techniques in order to express such knowledge in a flexible way, and to formalize it in terms of fuzzy constraints. Thus, background knowledge can be used to appropriately bias the learning process within the regularization framework of inductive inference. After a brief review of this idea, the paper offers an operationalization of constraint-regularized learning. The corresponding framework is based on evolutionary methods for model optimization and employs fuzzy rule bases of the Takagi-Sugeno type as flexible function approximators.
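    The general idea of a regularized objective (data fit plus a penalty for violating soft background knowledge) minimized by an evolutionary method can be illustrated with a minimal sketch. The sketch below is not the paper's implementation: a plain quadratic polynomial stands in for the Takagi-Sugeno rule base, the constraint term is a hand-made fuzzy monotonicity penalty, and a simple (mu + lambda) evolution strategy does the optimization; all names and parameter values are illustrative.

```python
# Minimal sketch, not the paper's implementation: constraint-regularized learning
# written as data loss + lambda * fuzzy constraint penalty, minimized with a
# simple (mu + lambda) evolution strategy. A quadratic polynomial stands in for
# the Takagi-Sugeno rule base; the penalty encodes a soft "should be increasing"
# constraint. All names and parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy increasing function
X = np.linspace(0.0, 1.0, 40)
y = 2.0 * X + 0.3 * rng.normal(size=X.size)

def predict(theta, x):
    """Polynomial surrogate for the flexible function approximator."""
    return theta[0] + theta[1] * x + theta[2] * x ** 2

def fuzzy_penalty(theta, x):
    """Graded (fuzzy) degree to which the monotonicity constraint is violated."""
    slopes = np.diff(predict(theta, x))
    return np.mean(np.clip(-slopes, 0.0, None))

def fitness(theta, lam=5.0):
    """Regularized objective: empirical loss plus weighted constraint violation."""
    data_loss = np.mean((predict(theta, X) - y) ** 2)
    return data_loss + lam * fuzzy_penalty(theta, X)

# (mu + lambda) evolution strategy with Gaussian mutation
mu, n_offspring, sigma = 10, 40, 0.2
pop = rng.normal(size=(mu, 3))
for _ in range(200):
    offspring = np.repeat(pop, n_offspring // mu, axis=0)
    offspring = offspring + sigma * rng.normal(size=offspring.shape)
    union = np.vstack([pop, offspring])
    scores = np.array([fitness(t) for t in union])
    pop = union[np.argsort(scores)[:mu]]

print("best parameters:", pop[0], "fitness:", float(fitness(pop[0])))
```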

    MGIS: managing banana (Musa spp.) genetic resources information and high-throughput genotyping data

    Unraveling the genetic diversity held in genebanks on a large scale is underway, due to advances in next-generation sequencing (NGS)-based technologies that produce high-density genetic markers for a large number of samples at low cost. Genebank users should be in a position to identify and select germplasm from the global genepool based on a combination of passport, genotypic and phenotypic data. To facilitate this, a new generation of information systems is being designed to efficiently handle data and link it with other external resources such as genome or breeding databases. The Musa Germplasm Information System (MGIS), the database for global ex situ-held banana genetic resources, has been developed to address those needs in a user-friendly way. In developing MGIS, we selected a generic database schema (Chado), the robust content management system Drupal for the user interface, and Tripal, a set of Drupal modules that links the Chado schema to Drupal. MGIS allows germplasm collection examination, accession browsing, advanced search functions, and germplasm orders. Additionally, we developed unique graphical interfaces to compare accessions and to explore them based on their taxonomic information. Accession-based data has been enriched with publications, genotyping studies and associated genotyping datasets reporting on germplasm use. Finally, an interoperability layer has been implemented to facilitate the link with complementary databases like the Banana Genome Hub and the MusaBase breeding database. Database URL: https://www.crop-diversity.org/mgis
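    As a rough illustration of what a Chado-backed germplasm database looks like from the client side, the sketch below reads accession records through the generic Chado stock and organism tables. It is not MGIS code: the connection settings are placeholders, and only the standard Chado schema is assumed, not any MGIS-specific extensions.

```python
# Minimal sketch, not MGIS code: reading accession records from a Chado-backed
# database such as the one described here. Only the generic Chado 'stock' and
# 'organism' tables are assumed; connection settings are placeholders.
import psycopg2

conn = psycopg2.connect(dbname="chado", user="reader", password="secret", host="localhost")

QUERY = """
SELECT s.uniquename AS accession, s.name, o.genus, o.species
FROM stock s
JOIN organism o ON o.organism_id = s.organism_id
WHERE o.genus = %s
ORDER BY s.uniquename;
"""

with conn, conn.cursor() as cur:
    cur.execute(QUERY, ("Musa",))
    for accession, name, genus, species in cur.fetchall():
        print(f"{accession}\t{name}\t{genus} {species}")
```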

    The public goods hypothesis for the evolution of life on Earth

    It is becoming increasingly difficult to reconcile the observed extent of horizontal gene transfers with the central metaphor of a great tree uniting all evolving entities on the planet. In this manuscript we describe the Public Goods Hypothesis and show that it is appropriate for describing biological evolution on the planet. According to this hypothesis, nucleotide sequences (genes, promoters, exons, etc.) are simply seen as goods, passed from organism to organism through both vertical and horizontal transfer. Public goods sequences are defined by having the properties of being largely non-excludable (no organism can be effectively prevented from accessing these sequences) and non-rival (while such a sequence is being used by one organism it is also available for use by another organism). The universal nature of genetic systems ensures that such non-excludable sequences exist, and non-excludability explains why we see a myriad of genes in different combinations in sequenced genomes. There are three features of the public goods hypothesis. Firstly, segments of DNA are seen as public goods, available for all organisms to integrate into their genomes. Secondly, we expect the evolution of mechanisms for DNA sharing and of defense mechanisms against DNA intrusion in genomes. Thirdly, we expect not to see a global tree-like pattern. Instead, we expect local tree-like patterns to emerge from the combination of a commonage of genes and vertical inheritance of genomes by cell division. Indeed, while genes are theoretically public goods, in reality some genes are excludable, particularly, though not only, when they have variant genetic codes or behave as coalition or club goods, available for all organisms of a coalition to integrate into their genomes, and non-rival within the club. We view the Tree of Life hypothesis as a regionalized instance of the Public Goods hypothesis, just as classical mechanics and Euclidean geometry are seen as regionalized instances of quantum mechanics and Riemannian geometry, respectively. We argue for this change using an axiomatic approach that shows that the Public Goods hypothesis is a better accommodation of the observed data than the Tree of Life hypothesis.

    Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases

    Recent advances of information technology in biomedical sciences and other applied areas have created numerous large, diverse data sets with a high-dimensional feature space, which provide a tremendous amount of information and new opportunities for improving the quality of human life. Meanwhile, great challenges arise from the continuous arrival of new data, which requires researchers to convert raw data into scientific knowledge in order to benefit from it. Association studies of complex diseases using SNP data have become more and more popular in biomedical research in recent years. In this paper, we present a review of recent statistical advances and challenges for analyzing correlated high-dimensional SNP data in genomic association studies for complex diseases. The review includes both general feature reduction approaches for high-dimensional correlated data and more specific approaches for SNP data, which include unsupervised haplotype mapping, tag SNP selection, and supervised SNP selection using statistical testing/scoring, statistical modeling, and machine learning methods, with an emphasis on how to identify interacting loci. Comment: Published at http://dx.doi.org/10.1214/07-SS026 in Statistics Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical Statistics (http://www.imstat.org).
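    One of the approach families reviewed, supervised SNP selection by statistical scoring, can be illustrated with a short sketch: each SNP is scored against the phenotype with a univariate test and the top-ranked markers are retained. The sketch below uses simulated genotypes and a chi-square score; the data, coding, and cut-offs are made up for the example and are not taken from the paper.

```python
# Illustrative sketch of supervised SNP selection by statistical scoring.
# Each simulated SNP (coded 0/1/2) is scored against a binary phenotype with a
# chi-square test and the top markers are kept. All numbers are made up.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(1)
n_samples, n_snps = 200, 500

X = rng.integers(0, 3, size=(n_samples, n_snps))   # genotype matrix, 0/1/2
signal = X[:, 0] + X[:, 1]                         # two truly associated loci
y = (signal + rng.normal(size=n_samples) > 2).astype(int)

selector = SelectKBest(score_func=chi2, k=20).fit(X, y)
top10 = np.argsort(selector.scores_)[::-1][:10]
print("top-scoring SNP indices:", top10)
```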

    Apex and fuzzy model assessment of environmental benefits of agroforestry buffers for claypan soils

    Contamination of surface waters with pollutants from agricultural land is a major threat to the environment. A field-size watershed study in Northeast Missouri showed that vegetated filter strips containing grass and grass+trees (agroforestry) buffers placed on contours reduced sediment and nutrient loadings by 11-35%. Watershed-scale studies are expensive, while computer-simulated hydrologic models offer efficient and economical tools to examine the environmental benefits of conservation practices. The current study used the Agricultural Policy Environmental eXtender (APEX) model and a fuzzy logic model to predict environmental benefits of buffers and grass waterways in three adjacent watersheds at the Greenley Memorial Research Center. During the second phase of the study, an automated computer technique was developed to optimize parameter sets for the APEX model for runoff, sediment, total phosphorus (TP), and total nitrogen (TN) losses. The APEX model was calibrated and validated satisfactorily for runoff from both pre- and post-buffer watersheds. Sediment, TP, and TN were calibrated only for larger events (>50 mm) during the pre-buffer period. Only TP was calibrated in the post-buffer models. The models simulated 13-25% TP reduction by grass waterways, and 4-5% runoff and 13-45% TP reductions by buffers. The fuzzy model predicted runoff for the study watersheds and for watersheds 30 and 50 times larger in northern Missouri. A stepwise multi-objective, multi-variable parameter optimization technique improved calibration of sediment, TP, and TN after optimization of the runoff parameters. The results of the study show that models can be used to examine environmental benefits provided long-term data are available.
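    The stepwise calibration idea (fit the runoff-related parameters first, then freeze them and fit the sediment or nutrient parameters) can be sketched as follows. This is not the study's code: run_apex() is a hypothetical stand-in for an external APEX run, the Nash-Sutcliffe efficiency objective and the differential-evolution optimizer are assumptions, and all observations and bounds are made up.

```python
# Minimal sketch, not the study's code, of stepwise parameter calibration:
# fit the runoff parameters first, freeze them, then fit the sediment parameter.
# run_apex() is a hypothetical stand-in for an external APEX simulation, the
# Nash-Sutcliffe efficiency objective is an assumption, and all numbers are made up.
import numpy as np
from scipy.optimize import differential_evolution

obs_runoff = np.array([5.0, 12.0, 30.0, 8.0, 55.0])      # e.g. mm per event
obs_sediment = np.array([0.2, 0.9, 2.5, 0.4, 4.8])       # e.g. t/ha per event

def run_apex(runoff_params, sediment_params):
    """Toy stand-in returning (runoff, sediment) series for a parameter set."""
    a, b = runoff_params
    c = sediment_params[0]
    runoff = a * obs_runoff + b
    sediment = c * np.maximum(runoff, 0.0) ** 1.2 / 50.0
    return runoff, sediment

def nse(sim, obs):
    """Nash-Sutcliffe efficiency; 1.0 is a perfect fit."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Step 1: calibrate the runoff parameters against observed runoff only.
step1 = differential_evolution(
    lambda p: -nse(run_apex(p, [1.0])[0], obs_runoff),
    bounds=[(0.5, 1.5), (-5.0, 5.0)], seed=0)
runoff_params = step1.x

# Step 2: with runoff parameters frozen, calibrate the sediment parameter.
step2 = differential_evolution(
    lambda p: -nse(run_apex(runoff_params, p)[1], obs_sediment),
    bounds=[(0.1, 3.0)], seed=0)

print("runoff parameters:", runoff_params, "sediment parameter:", step2.x)
```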

    Overview of the taxonomy of zooxanthellate Scleractinia

    Coral taxonomy has entered a historical phase where nomenclatorial uncertainty is rapidly increasing. The fundamental cause is mandatory adherence to historical monographs that lack essential information of all sorts, and also to type specimens, if they exist at all, that are commonly unrecognizable fragments or are uncharacteristic of the species they are believed to represent. Historical problems, including incorrect subsequent type species designations, also create uncertainty for many well-established genera. The advent of in situ studies in the 1970s revealed these issues; now molecular technology is again changing the taxonomic landscape. The competing methodologies involved must be seen in context if they are to avoid becoming an additional basis for continuing nomenclatorial instability. To prevent this happening, the International Commission on Zoological Nomenclature (ICZN) will need to focus on rules that consolidate well-established nomenclature and allow for the designation of new type specimens that are unambiguous, and which include both skeletal material and soft tissue for molecular study. Taxonomic and biogeographic findings have now become linked, with molecular methodologies providing the capacity to re-visit past taxonomic decisions, and to extend both taxonomy and biogeography into the realm of evolutionary theory. It is proposed that most species will ultimately be seen as operational taxonomic units that are human rather than natural constructs, which in consequence will always have fuzzy morphological, genetic, and distribution boundaries. The pathway ahead calls for the integration of morphological and molecular taxonomies, and for website delivery of information that crosses current discipline boundaries.