43 research outputs found

    Fast Comparison of Microbial Genomes Using the Chaos Games Representation for Metagenomic Applications

    Genome sequencing technology is generating large sequence databases at such a rate that advances in computer hardware alone are not adequate to handle them: more efficient algorithms are needed. Here an alignment-free method of sequence comparison and visualisation based on the Chaos Games Representation (CGR) and multifractal analysis is explored as an approach to search and filter through a data set of over 1500 microbial genomes. Whereas BLAST takes 25 hours to search this data set with large sequence fragments (e.g. 100 Kb), the method introduced here can reduce this data set by 95% (from 1550 target species to just 50) in about 15 minutes, and it is able to predict the exact species correctly in 67% of cases. The results presented here demonstrate that CGR is worth further investigation as a fast method to perform genome sequence comparison on large data sets, and various ways to further develop the method are discussed.
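The abstract above does not give implementation details, so the following is only a minimal sketch of the general CGR idea, not the authors' method (their multifractal analysis is not reproduced here). The corner layout, the function names `cgr_points`, `cgr_grid` and `grid_distance`, and the Euclidean comparison are all assumptions chosen for illustration.

```python
import math

# Assumed corner layout for the unit square; other layouts are equally valid.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory: each base moves the point halfway to its corner."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # simplification: ambiguous bases (e.g. N) are skipped
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_grid(seq, k):
    """Count CGR points on a 2^k x 2^k grid; each cell corresponds to one k-mer."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # The first k-1 points do not yet encode a full k-mer, so they are dropped.
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


def grid_distance(a, b):
    """Euclidean distance between two grids after normalising counts to frequencies."""
    total_a = sum(map(sum, a)) or 1
    total_b = sum(map(sum, b)) or 1
    return math.sqrt(
        sum(
            (x / total_a - y / total_b) ** 2
            for row_a, row_b in zip(a, b)
            for x, y in zip(row_a, row_b)
        )
    )
```

Because each grid has a fixed size regardless of genome length, comparing two sequences reduces to comparing two small matrices, which is one plausible reason an alignment-free filter of this kind can be fast.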

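    Exploration and comparison of artificial intelligence methods for taxonomic classification in metagenomic analyses (original title: Exploración y comparación de métodos de inteligencia artificial para la clasificación taxonómica en análisis metagenómicos)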

    55 pages. The greatest genetic diversity is found in communities of microorganisms; knowing these species, their functions and their differences plays an important role in solving problems in diverse areas such as health, food and the environment. The traditional method for this kind of research consists of isolating the microorganism from an environmental sample and then studying its genetic makeup; however, fewer than 1% of microorganisms can be isolated and cultured in the laboratory. Thanks to increasingly accessible modern sequencing techniques, metagenomics has emerged as an alternative for studying the other 99%. Metagenomics studies the sequencing of an environmental sample in order to discover which organisms the sequenced fragments belong to. The problem, however, is that the processes needed to identify the types of organisms in the sample demand a great deal of time and computational resources. This work uses different artificial intelligence algorithms to group sequence fragments by similarity into pure sets, that is, sets whose fragments belong to a single organism or to the same taxonomic group of organisms. It also proposes a new algorithm based on applying k-means iteratively, refining the groups according to the distance between them. The results were compared with classical clustering methods, and this last method was found to yield purer groups. This result helps make assembly or comparison processes more efficient and faster, because their initial input is a more condensed and uniform sample, reducing the time and resources consumed in metagenomic projects while allowing them to be carried out in a more focused way. Degree level: undergraduate (Systems and Computing Engineer).
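The abstract only says that k-means is applied iteratively and that groups are refined according to the distance between them; it does not specify the refinement rule. The sketch below is therefore one possible reading, not the thesis's algorithm: it assumes that k-means is rerun with one fewer cluster whenever two centroids lie closer than a threshold. The names `kmeans`, `iterative_kmeans` and `merge_dist` are hypothetical, and points stand in for whatever feature vectors (for example k-mer frequencies) would represent sequence fragments.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def iterative_kmeans(points, k, merge_dist, seed=0):
    """Assumed refinement rule: reduce k while any two centroids are too close."""
    while True:
        labels, centroids = kmeans(points, k, seed=seed)
        closest = min(
            (math.dist(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:]),
            default=math.inf,
        )
        if k == 1 or closest >= merge_dist:
            return labels, centroids
        k -= 1  # two groups are nearly the same, so try one fewer
```

For example, starting from `k=4` on two well-separated blobs with `merge_dist=5` should settle on two groups, since any extra centroids end up close to an existing one and trigger a further reduction.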

    Synthetic Biology

    Synthetic biology gives us new hope because it combines various disciplines, such as genetics, chemistry, biology, and the molecular sciences, and gives rise to a novel interdisciplinary science. We can foresee the creation of a new world of vegetation, animals, and humans with the interdisciplinary system of biological sciences. These articles are contributed by renowned experts in their fields. The field of synthetic biology is growing exponentially and opening up new avenues in multidisciplinary approaches by bringing together theoretical and applied aspects of science.

    Machine and deep learning meet genome-scale metabolic modeling

    Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool to investigate large-scale relationships between genotype, phenotype, and environment. The development and application of these methodological frameworks have occurred independently for the most part, whereas the potential of their integration for biological, biomedical, and biotechnological research is less well known. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We overlap systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach merging experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information in an otherwise biologically agnostic learning process.

    Women in Science 2014

    Women in Science 2014 summarizes research done by Smith College’s Summer Research Fellowship (SURF) Program participants. Ever since its 1967 start, SURF has been a cornerstone of Smith’s science education. In 2014, 150 students participated in SURF (141 hosted on campus and at nearby field sites), supervised by 61 faculty mentor-advisors drawn from the Clark Science Center and connected to its eighteen science, mathematics, and engineering departments and programs and associated centers and units. At summer’s end, SURF participants were asked to summarize their research experiences for this publication.

    Inside the sequence universe: the amazing life of data and the people who look after them

    This thesis provides an ethnographic exploration of two large nucleotide sequence databases, the European Molecular Biology Laboratory Bank, UK and GenBank, US. It describes and analyses their complex bioinformatic environments as well as their material-discursive environments: the objects, narratives and practices that recursively constitute these databases. In doing so, it unravels a rich bioinformational ecology, the “sequence universe”. Here, mosquitoes have mumps, the louse is “huge” and self-styled information plumbers patch up high-throughput data pipelines while data curators battle the indiscriminate coming-to-life caused by metagenomics. Given the intensification of data production, the biosciences have reached a point where concerns have squarely turned to fundamental questions about how to know within and between all that data. This thesis assembles a database imaginary, recovering inventive terms of scholarly engagement with bioinformational databases and data, terms that remain critical without necessarily reverting to a database logic. Science studies and related disciplines, investigating illustrious projects like the UK Biobank, have developed a sustained critique of the perceived conflation of bodies and data. This thesis argues that these accounts forego an engagement with the database sui generis, as a situated arrangement of people, things, routines and spaces. It shows that databases have histories and continue established practices of collecting and curating. At the same time, it maps entanglements of the databases with experiments and discovery and thereby demonstrates the vibrancy of data. Focusing on the question of what happens at these databases, the thesis follows data curators and programmers but also database records and the entities documented by them, such as uncultured bacteria. It contextualises ethnographic findings within the literature on the sociology and philosophy of science and technology while also making references to works of art and literature in order to bring into relief the boundary-defying scope of the issues raised.

    Opportunities and obstacles for deep learning in biology and medicine

    Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

    Summer Research Fellowship Project Descriptions 2022

    A summary of research done by Smith College’s 2021 Summer Research Fellowship (SURF) Program participants. Ever since its 1967 start, SURF has been a cornerstone of Smith’s science education. Participants were supervised by faculty mentor-advisors drawn from the Clark Science Center and connected to its eighteen science, mathematics, and engineering departments and programs and associated centers and units. At summer’s end, SURF participants were asked to summarize their research experiences for this publication.

    Putting ecological theories to the test : individual-based simulations of synthetic microbial community dynamics

    Microbial communities are critical for the proper functioning of every ecosystem on Earth. The ability to understand the structure and functioning of these complex communities is crucial to manage and protect natural communities, as well as to rationally design engineered microbial communities for important applications ranging from medical and pharmaceutical uses to various bioindustrial processes. In recent years, synthetic microbial communities have gained increasing interest from microbiologists due to their reduced complexity and increased controllability, which favours them over more complex natural systems for examining ecological theories. In this thesis, the in silico counterpart of this approach was used to test ecological theories relating to biodiversity and functionality through the use of mathematical models. Models are abstractions of reality which allow for the testing of hypotheses in a controlled way. In this thesis, individual-based models of synthetic microbial communities were developed and used in simulation studies to answer research questions relating to community diversity, stability, productivity and functionality. The models are spatially explicit and track through time the characteristics, interactions and activities of every individual in the community. The modelling framework is flexible and thus also extendable to other avenues of research.