178 research outputs found

    A Phylogenetic Analysis of the Genus Fragaria (Strawberry) Using Intron-Containing Sequence from the ADH-1 Gene

    The genus Fragaria encompasses species at ploidy levels ranging from diploid to decaploid. The cultivated strawberry, Fragaria×ananassa, and its two immediate progenitors, F. chiloensis and F. virginiana, are octoploids. To elucidate the ancestries of these octoploid species, we performed a phylogenetic analysis using intron-containing sequences of the nuclear ADH-1 gene from 39 germplasm accessions representing nineteen Fragaria species and one outgroup species, Dasiphora fruticosa. All trees from Maximum Parsimony and Maximum Likelihood analyses showed two major clades, Clade A and Clade B. Each of the sampled octoploids contributed alleles to both major clades. All octoploid-derived alleles in Clade A clustered with alleles of diploid F. vesca, with the exception of one octoploid allele that clustered with the alleles of diploid F. mandshurica. All octoploid-derived alleles in Clade B clustered with the alleles of only one diploid species, F. iinumae. When gaps encoded as binary characters were included in the Maximum Parsimony analysis, tree resolution was improved with the addition of six nodes, and the bootstrap support was generally higher, rising above the 50% threshold for an additional nine branches. These results, coupled with the congruence of the sequence data and the coded gap data, validate and encourage the employment of sequence sets containing gaps for phylogenetic analysis. Our phylogenetic conclusions, based upon sequence data from the ADH-1 gene located on F. vesca linkage group II, complement and generally agree with those obtained from analyses of protein-encoding genes GBSSI-2 and DHAR located on F. vesca linkage groups V and VII, respectively, but differ from a previous study that utilized rDNA sequences and did not detect the ancestral role of F. iinumae.
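The idea of encoding alignment gaps as binary characters can be illustrated with a small sketch. The abstract does not say which gap-coding scheme the authors used, so the version below is a simplified assumption: every distinct gap interval (start, end) in the alignment becomes one presence/absence character. The function name `code_gaps`, the taxon names, and the toy sequences are all hypothetical.

```python
import re


def code_gaps(alignment):
    """Code each distinct gap interval as a binary (presence/absence) character.

    alignment: dict mapping a taxon name to its aligned sequence ('-' marks a gap).
    Returns the sorted list of gap intervals and, per taxon, a string of 0/1 states.
    Simplified scheme (an assumption): a taxon scores 1 only if it has exactly
    that gap interval, otherwise 0.
    """
    gaps_per_taxon = {
        name: {(m.start(), m.end()) for m in re.finditer(r"-+", seq)}
        for name, seq in alignment.items()
    }
    all_gaps = sorted(set().union(*gaps_per_taxon.values()))
    matrix = {
        name: "".join("1" if gap in gaps else "0" for gap in all_gaps)
        for name, gaps in gaps_per_taxon.items()
    }
    return all_gaps, matrix


# Hypothetical toy alignment, not data from the study.
toy_alignment = {
    "taxon_a": "ACGT--ACGT",
    "taxon_b": "ACGT--ACGT",
    "taxon_c": "ACGTTTACGT",
    "taxon_d": "A---TTACGT",
}

gap_positions, gap_matrix = code_gaps(toy_alignment)
for name, states in gap_matrix.items():
    print(name, states)
```

The resulting 0/1 columns could be appended to the nucleotide matrix as extra characters for a parsimony analysis; a full scheme would also need a rule for gaps that overlap a longer gap, which this sketch ignores.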

    Machine learning models towards elucidating the plant intron retention code

    2017 Fall. Includes bibliographical references. Alternative splicing is a process that allows a single gene to encode multiple proteins. Intron retention (IR) is a type of alternative splicing that is mainly prevalent in plants, but has been shown to regulate gene expression in various organisms and is often involved in rare human diseases. Despite its important role, not much research has been done to understand IR. The motivation behind this research work is to better understand IR and how it is regulated by various biological factors. We designed a combination of 137 features, forming an "intron retention code", to reveal the factors that contribute to IR. Using random forest and support vector machine classifiers, we show the usefulness of these features for the task of predicting whether an intron is subject to IR or not. An analysis of the top-ranking features for this task reveals a high level of similarity of the most predictive features across the three plant species, demonstrating the conservation of the factors that determine IR. We also found a high level of similarity to the top features contributing to IR in mammals. The task of predicting the response to drought stress proved more difficult, with lower levels of accuracy and lower levels of similarity across species, suggesting that additional features need to be considered for predicting condition-specific IR.

    A knowledge engineering approach to the recognition of genomic coding regions

    Supported by a research grant from Suranaree University of Technology, fiscal years B.E. 2556-255

    Diversity Arrays Technology (DArT) for Pan-Genomic Evolutionary Studies of Non-Model Organisms

    Background: High-throughput tools for pan-genomic study, especially the DNA microarray platform, have sparked a remarkable increase in data production and enabled a shift in the scale at which biological investigation is possible. The use of microarrays to examine evolutionary relationships and processes, however, is predominantly restricted to model or near-model organisms. Methodology/Principal Findings: This study explores the utility of Diversity Arrays Technology (DArT) in evolutionary studies of non-model organisms. DArT is a hybridization-based genotyping method that uses microarray technology to identify and type DNA polymorphism. Theoretically applicable to any organism (even one for which no prior genetic data are available), DArT has not yet been explored in exclusively wild sample sets, nor extensively examined in a phylogenetic framework. DArT recovered 1349 markers of largely low copy-number loci in two lineages of seed-free land plants: the diploid fern Asplenium viride and the haploid moss Garovaglia elegans. Direct sequencing of 148 of these DArT markers identified 30 putative loci including four routinely sequenced for evolutionary studies in plants. Phylogenetic analyses of DArT genotypes reveal phylogeographic and substrate specificity patterns in A. viride, a lack of phylogeographic pattern in Australian G. elegans, and additive variation in hybrid or mixed samples. Conclusions/Significance: These results enable methodological recommendations including procedures for detecting and analysing DArT markers tailored specifically to evolutionary investigations and practical factors informing the decision to use DArT, and raise evolutionary hypotheses concerning substrate specificity and biogeographic patterns. 
Thus DArT is a demonstrably valuable addition to the set of existing molecular approaches used to infer biological phenomena such as adaptive radiations, population dynamics, hybridization, introgression, ecological differentiation and phylogeography

    Combined optimization algorithms applied to pattern classification

    Accurate classification by minimizing the error on test samples is the main goal in pattern classification. Combinatorial optimization is a well-known method for solving minimization problems; however, only a few examples of classifiers are described in the literature where combinatorial optimization is used in pattern classification. Recently, there has been a growing interest in combining classifiers and improving the consensus of results for a greater accuracy. In the light of the "No Free Lunch Theorems", we analyse the combination of simulated annealing, a powerful combinatorial optimization method that produces high quality results, with the classical perceptron algorithm. This combination is called the LSA machine. Our analysis aims at finding paradigms for problem-dependent parameter settings that ensure high classification results. Our computational experiments on a large number of benchmark problems lead to results that either outperform or are at least competitive with results published in the literature. Apart from parameter settings, our analysis focuses on a difficult problem in computation theory, namely the network complexity problem. The depth vs size problem of neural networks is one of the hardest problems in theoretical computing, with very little progress over the past decades. In order to investigate this problem, we introduce a new recursive learning method for training hidden layers in constant depth circuits. Our findings make contributions to a) the field of Machine Learning, as the proposed method is applicable in training feedforward neural networks, and to b) the field of circuit complexity by proposing an upper bound for the number of hidden units sufficient to achieve a high classification rate. 
One of the major findings of our research is that the size of the network can be bounded by the input size of the problem, with an approximate upper bound of 8 + √(2^n/n) threshold gates being sufficient for a small error rate, where n := log|S_L| and S_L is the training set.
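To make the pairing of simulated annealing with perceptron updates more concrete, here is a minimal sketch. It is not the authors' LSA machine: the logarithmic cooling schedule, the acceptance rule, the function names, and the toy AND data set are all assumptions chosen for illustration. A perceptron-style update on a randomly chosen misclassified sample serves as the proposal move, and the annealing rule decides whether to accept it.

```python
import math
import random


def predict(w, x):
    """Threshold unit: +1 if the weighted sum is positive, otherwise -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def count_errors(w, data):
    """Number of samples (x, y) that the threshold unit misclassifies."""
    return sum(1 for x, y in data if predict(w, x) != y)


def anneal_perceptron(data, steps=2000, t0=1.0, seed=0):
    """Simulated annealing over weights with perceptron updates as proposals."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    cur = count_errors(w, data)
    best_w, best = list(w), cur
    for k in range(1, steps + 1):
        wrong = [(x, y) for x, y in data if predict(w, x) != y]
        if not wrong:
            break  # every sample is classified correctly
        temp = t0 / math.log(k + 1)  # logarithmic cooling (an assumption)
        x, y = rng.choice(wrong)
        cand = [wi + y * xi for wi, xi in zip(w, x)]  # perceptron update
        new = count_errors(cand, data)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            w, cur = cand, new
            if cur < best:
                best_w, best = list(w), cur
    return best_w, best


# Hypothetical toy problem: logical AND with a constant bias input of 1.
and_data = [((1, 0, 0), -1), ((1, 0, 1), -1), ((1, 1, 0), -1), ((1, 1, 1), 1)]
best_w, best_err = anneal_perceptron(and_data)
print(best_w, best_err)
```

Tracking the best weights seen so far means the returned error count never exceeds that of the all-zero starting point, whatever moves the annealing rule accepts along the way.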

    Application of association rule mining to gene data using interestingness measures [Gen verileri üzerinde ilginçlik ölçütleri kullanılarak birliktelik kuralları madenciliğinin uygulanması]

    Aim: Data mining is the process of discovering useful, previously unrevealed information in large-scale data. One of the fields in which data mining is widely used is health. With data mining, the diagnosis and treatment of a disease and the risk factors affecting it can be determined quickly. Association rules are one of the data mining techniques. The aim of this study is to determine patient profiles by obtaining strong association rules with the Apriori algorithm, which is one of the association rule algorithms. Material and Method: The data set used in the study consists of 205 acute myocardial infarction (AMI) patients. The patients also carry genotypes of the FNDC5 polymorphisms (rs3480, rs726344, rs16835198). Support and confidence measures are used to evaluate the rules obtained with the Apriori algorithm. The rules obtained by these measures are correct but not strong. Therefore, interest measures are used in addition to these two basic measures, with the aim of obtaining stronger rules. In this study, the interest measures lift, conviction, certainty factor, cosine, phi (correlation coefficient) and mutual information are applied to reach stronger rules. Results: In this study, 108 rules were obtained. The proposed interest measures were applied to these rules, and as a result 29 of the rules qualified as strong. Conclusion: Stronger rules have been obtained by using interest measures in the clinical decision-making process. The strong rules obtained will facilitate patient profile determination and clinical decision-making for AMI patients.
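Several of the interest measures named in this abstract can be computed directly from the counts behind a single rule A → B. The sketch below uses the common textbook definitions, which are assumed (not confirmed by the abstract) to match those used in the study; the function name and the example counts are hypothetical and unrelated to the AMI data set. Mutual information is omitted for brevity.

```python
import math


def interest_measures(n, n_a, n_b, n_ab):
    """Interest measures for a rule A -> B from transaction counts.

    n: total transactions; n_a, n_b: transactions containing A, B;
    n_ab: transactions containing both A and B.
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    confidence = n_ab / n_a
    lift = p_ab / (p_a * p_b)
    # Conviction is unbounded when the rule never fails (confidence == 1).
    conviction = math.inf if confidence == 1 else (1 - p_b) / (1 - confidence)
    cosine = p_ab / math.sqrt(p_a * p_b)
    phi = (p_ab - p_a * p_b) / math.sqrt(p_a * p_b * (1 - p_a) * (1 - p_b))
    # Certainty factor: change in belief in B given A, scaled to [-1, 1].
    if confidence > p_b:
        certainty = (confidence - p_b) / (1 - p_b)
    elif confidence < p_b:
        certainty = (confidence - p_b) / p_b
    else:
        certainty = 0.0
    return {
        "support": p_ab,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
        "cosine": cosine,
        "phi": phi,
        "certainty_factor": certainty,
    }


# Hypothetical counts: 100 transactions, A in 40, B in 50, both in 30.
measures = interest_measures(n=100, n_a=40, n_b=50, n_ab=30)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

A lift above 1, a positive phi, and a positive certainty factor all indicate that A and B occur together more often than independence would predict, which is the sense in which such measures can filter rules that pass only the support and confidence thresholds.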

    NOVEL APPLICATIONS OF MACHINE LEARNING IN BIOINFORMATICS

    Technological advances in next-generation sequencing and biomedical imaging have led to a rapid increase in biomedical data dimension and acquisition rate, which is challenging the conventional data analysis strategies. Modern machine learning techniques promise to leverage large data sets for finding hidden patterns within them, and for making accurate predictions. This dissertation aims to design novel machine learning-based models to transform biomedical big data into valuable biological insights. The research presented in this dissertation focuses on three bioinformatics domains: splice junction classification, gene regulatory network reconstruction, and lesion detection in mammograms. A critical step in defining gene structures and mRNA transcript variants is to accurately identify splice junctions. In the first work, we built the first deep learning-based splice junction classifier, DeepSplice. It outperforms the state-of-the-art classification tools in terms of both classification accuracy and computational efficiency. To uncover transcription factors governing metabolic reprogramming in non-small-cell lung cancer patients, we developed TFmeta, a machine learning approach to reconstruct relationships between transcription factors and their target genes in the second work. Our approach achieves the best performance on benchmark data sets. In the third work, we designed deep learning-based architectures to perform lesion detection in both 2D and 3D whole mammogram images

    Bioinformatics: a promising field for case-based reasoning

    Case-based reasoning (CBR) has been applied in fields such as medicine, industry, and tutoring systems, but many areas of CBR remain to be explored. Some current research in bioinformatics attempts to use CBR as a tool for classifying DNA genes. Microarrays in particular have been applied increasingly to improve medical decision-making and the diagnosis of diseases such as cancer. This work analyzes the structure of microarrays and the initial concepts needed to understand how DNA structure is studied in bioinformatics. In recent years, CBR has been related to bioinformatics and microarrays. In this report, our interest is to find out how the microarray technique could help in the CBR field, especially in case-based maintenance policies. Postprint (published version