Automated sequence design of nucleic acid hybridization reactions for microRNA detection
MicroRNAs (miRNAs) are found in a variety of biological samples and are important
molecular markers for early diagnostic strategies. This work (TFG) explores a novel
approach based on nested non-enzymatic and enzymatic biochemical processes in vitro.
In particular, it develops an automated sequence design algorithm for nucleic acid
hybridization reactions for microRNA detection.
Goiriz Beltrán, L. (2019). Automated sequence design of nucleic acid hybridization reactions for microRNA detection. http://hdl.handle.net/10251/125058 (TFG)
Evolutionary relationships in Panicoid grasses based on plastome phylogenomics (Panicoideae; Poaceae)
Background: Panicoideae are the second largest subfamily in Poaceae (grass family), with 212 genera and approximately 3316 species. Previous studies have begun to reveal relationships within the subfamily, but largely lack resolution and/or robust support for certain tribal and subtribal groups. This study aims to resolve these relationships, as well as characterize a putative mitochondrial insert in one lineage. Results: 35 newly sequenced Panicoideae plastomes were combined in a phylogenomic study with 37 other species: 15 Panicoideae and 22 from outgroups. A robust Panicoideae topology largely congruent with previous studies was obtained, but with some incongruences with previously reported subtribal relationships. A mitochondrial DNA (mtDNA) to plastid DNA (ptDNA) transfer was discovered in the Paspalum lineage. Conclusions: The phylogenomic analysis returned a topology that largely supports previous studies. Five previously recognized subtribes appear on the topology to be non-monophyletic. Additionally, evidence for mtDNA to ptDNA transfer was identified in both Paspalum fimbriatum and P. dilatatum, which suggests a single rare event that took place in a common progenitor. Finally, the framework from this study can guide larger whole plastome sampling to discern the relationships in Cyperochloeae, Steyermarkochloeae, Gynerieae, and other incertae sedis taxa that are weakly supported or unresolved.
Author affiliations: Burke, Sean V. (Northern Illinois University, United States); Wysocki, William P. (Northern Illinois University, United States); Zuloaga, Fernando Omar (Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Botánica Darwinion; Academia Nacional de Ciencias Exactas, Físicas y Naturales, Instituto de Botánica Darwinion; Argentina); Craine, Joseph M. (Jonah Ventures, United States); Pires, J. Chris (University of Missouri, United States); Edger, Patrick P. (Michigan State University, United States); Mayfield Jones, Dustin (Donald Danforth Plant Science Center, United States); Clark, Lynn G. (Iowa State University, United States); Kelchner, Scot A. (University of Idaho, United States); Duvall, Melvin R. (Northern Illinois University, United States)
Methodology for identifying alternative solutions in a population based data generation approach applied to synthetic biology
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Design is an essential component of sustainable development. Computational modelling has
become a useful technique that facilitates the design of complex systems. Variables that characterise
a complex system are encoded into a computational model using mathematical concepts,
and through simulation these variables are modified, alone or in combination, to observe
the changes in the outcome. This allows researchers to make predictions about the behaviour
of the real system being studied in response to the changes. The ultimate goal of any
design process is to come up with the best design: as resources are limited, to minimize the cost
and resource consumption, and to maximize the performance, profits and efficiency. To optimize
means to find the best solution, the best compromise among several conflicting demands subject
to predefined requirements. Therefore, computational optimization, modelling and simulation
form an integral part of modern design practice.
This thesis defines a data analytics driven methodology which enables the identification of
alternative solutions of computational design by analysing the generational history of the
population-based heuristic search used to generate the templates. While optimisation is focused on
obtaining the optimal solution, this methodology focuses on alternative solutions which are
suboptimal by fitness or solutions with similar fitness but different structures. When the optimal
design solution is less robust, alternative solutions can offer a sufficiently good accuracy and an
achievable resource requirement. The main advantage of the methodology is that it exploits the
exploration process of the solution space during a single run, by focusing also on suboptimal
solutions, which usually get neglected in the search for an optimal one. The history of the
heuristic search is analysed for the emergence of alternative solutions and the evolution of a solution.
By examining how an initial solution converts to an optimal solution, core design patterns are
identified and used to improve the design process. Further, this method limits the
number of runs of the heuristic search as more solution space is covered. The methodology is
generic because it can be used in any instance where a population-based heuristic search is applied
to generate optimal designs. The applicability of the methodology is demonstrated using
three case studies from mathematics (building of a mathematical function for a set target) and
biology (obtaining alternative designs for genomic metabolic models [GEM] and DNA walker
circuits). In each case a different heuristic search method was used: Gene expression programming
(mathematical expressions), genetic algorithms (GEM models) and simulated annealing
(DNA walker circuits). Descriptive analytics, visual analytics and clustering were mainly used to build the data analytics driven approach to identifying alternative solutions. This data analytics
driven methodology is useful in optimising the computational design of complex systems.
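The idea of mining a single run's history for alternative solutions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the toy bit-string objective with two optima, the truncation-plus-mutation search, and the names `run_search`, `alternatives`, `tolerance` and `min_distance` are all hypothetical. The sketch logs every candidate evaluated during one run, then keeps those whose fitness is close to the best but whose structure differs from it.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def fitness(bits):
    # Toy objective (hypothetical) with two distinct optima: match either target.
    targets = ((1, 1, 1, 1, 0, 0, 0, 0), (0, 0, 0, 0, 1, 1, 1, 1))
    return max(len(bits) - hamming(bits, t) for t in targets)

def run_search(generations=30, pop_size=20, length=8, seed=1):
    """Simple population-based search that records every candidate it evaluates."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(length))
                  for _ in range(pop_size)]
    history = {}  # candidate -> (fitness, generation in which it first appeared)
    for gen in range(generations):
        for ind in population:
            history.setdefault(ind, (fitness(ind), gen))
        # Keep the better half and refill with one-bit mutants of the survivors.
        survivors = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        for parent in survivors:
            i = rng.randrange(length)
            children.append(parent[:i] + (1 - parent[i],) + parent[i + 1:])
        population = survivors + children
    for ind in population:  # log the final population as well
        history.setdefault(ind, (fitness(ind), generations))
    return history

def alternatives(history, tolerance=1, min_distance=3):
    """Candidates within `tolerance` of the best fitness but structurally different."""
    best, (best_fit, _) = max(history.items(), key=lambda kv: kv[1][0])
    alts = [(cand, fit, gen) for cand, (fit, gen) in history.items()
            if cand != best
            and best_fit - fit <= tolerance
            and hamming(cand, best) >= min_distance]
    return best, best_fit, sorted(alts, key=lambda t: -t[1])

history = run_search()
best, best_fit, alts = alternatives(history)
```

Because the history stores the generation in which each candidate first appeared, the same log could also be used to trace when alternatives emerged during the run.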
Digital Ecosystems: Ecosystem-Oriented Architectures
We view Digital Ecosystems to be the digital counterparts of biological
ecosystems. Here, we are concerned with the creation of these Digital
Ecosystems, exploiting the self-organising properties of biological ecosystems
to evolve high-level software applications. Therefore, we created the Digital
Ecosystem, a novel optimisation technique inspired by biological ecosystems,
where the optimisation works at two levels: a first optimisation, migration of
agents which are distributed in a decentralised peer-to-peer network, operating
continuously in time; this process feeds a second optimisation based on
evolutionary computing that operates locally on single peers and is aimed at
finding solutions to satisfy locally relevant constraints. The Digital
Ecosystem was then measured experimentally through simulations, with measures
originating from theoretical ecology, evaluating its likeness to biological
ecosystems. This included its responsiveness to requests for applications from
the user base, as a measure of the ecological succession (ecosystem maturity).
Overall, we have advanced the understanding of Digital Ecosystems, creating
Ecosystem-Oriented Architectures where the word ecosystem is more than just a
metaphor.
Comment: 39 pages, 26 figures, journa
MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification
Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods
A new computational framework for the classification and function prediction of long non-coding RNAs
Long non-coding RNAs (lncRNAs) are known to play a significant role in several biological processes. These RNAs possess sequence length greater than 200 base pairs (bp), and so are often misclassified as protein-coding genes. Most Coding Potential Computation (CPC) tools fail to accurately identify, classify and predict the biological functions of lncRNAs in plant genomes, due to previous research being limited to mammalian genomes.
In this thesis, an investigation and extraction of various sequence and codon-bias features for identification of lncRNA sequences has been carried out, to develop a new CPC Framework. For identification of essential features, the framework implements regularisation-based selection. A novel classification algorithm is implemented, which removes the dependency on experimental datasets and provides a coordinate-based solution for sub-classification of lncRNAs. For imputing the lncRNA functions, lncRNA-protein interactions were first determined through co-expression of genes, which were then re-analysed by a sequence similarity-based approach for identification of novel interactions and prediction of lncRNA functions in the genome. The framework integrates a D3-based application for visualisation of lncRNA sequences and their associated functions in the genome.
Standard evaluation metrics such as accuracy, sensitivity, and specificity have been used for benchmarking the performance of the framework against leading CPC tools. Case study analyses were conducted with plant RNA-seq datasets for evaluating the effectiveness of the framework using a cross-validation approach. The tests show the framework can provide significant improvements on existing CPC models for plant genomes: 20-40% greater accuracy. Function prediction analysis demonstrates results are consistent with the experimentally-published findings
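For reference, the three evaluation metrics named above can be computed from confusion-matrix counts as in the short sketch below. The function name and the counts are hypothetical and are not results from the thesis; "positive" is assumed here to mean a transcript called as lncRNA.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total      # fraction of all calls that are correct
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts, for illustration only.
acc, sens, spec = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```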
Investigating modularity and transparency within bioinspired connectionist architectures using genetic and epigenetic models
Machine learning algorithms allow computers to deal with incomplete data in tasks such as speech recognition and object detection. Some machine learning algorithms take inspiration from biological systems due to useful properties such as robustness, allowing algorithms to be flexible and domain agnostic. This comes at a cost, resulting in difficulty when one attempts to understand the reasoning behind decisions. This is problematic when such models are applied in real-world situations where accountability, legality, and maintenance are of concern. Artificial gene regulatory networks (AGRNs) are a type of connectionist architecture inspired by gene regulatory mechanisms. AGRNs are of interest within this thesis due to their ability to solve tasks in chaotic dynamical systems despite their relatively small size. The overarching aim of this work was to investigate the properties of connectionist architectures to improve the transparency of their execution. Initially, the evolutionary process and internal structure of AGRNs were investigated. Following this, the creation of an external control layer used to improve the transparency of execution of an external connectionist architecture was attempted. When investigating the evolutionary process of AGRNs, pathways were found that, when followed, produced more performant networks in a shorter time frame. Evidence that AGRNs are capable of performing well despite internal interference was found when investigating their modularity, where it was also discovered that they do not develop strict modularity consistently. A control layer inspired by epigenetics that selectively deactivates nodes in trained artificial neural networks (ANNs) was developed; the analysis of its behaviour provided an insight into the internal workings of the ANN
Mapping the Genomic Context of Mutagenesis
The accumulation of genomic mutations leads to the formation of cancer. For this reason, many efforts have been undertaken to characterise mutational processes in terms of their genomic imprints. A particularly successful approach is matrix-based mutational signature analysis, which identifies prototypical mutation patterns by applying non-negative matrix factorisation to catalogues of single nucleotide variants and other mutation types. However, mutagenesis is a multifaceted event that is affected by the genomic organisation of DNA and cellular processes such as transcription, replication, and DNA repair. Moreover, since many mutational processes also generate characteristic multi-nucleotide variants, insertions and deletions, and structural variants, it appears valuable to jointly deconvolve broader mutational catalogues to better understand the complex nature of mutagenesis.
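The matrix-based approach mentioned above can be sketched with the standard multiplicative-update rules for non-negative matrix factorisation. This is a generic illustration, not the TensorSignatures algorithm. The toy catalogue, the choice of rank, and the function names are assumptions: rows of `v` stand for mutation types and columns for samples, so `w` holds candidate signatures and `h` their per-sample activities.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise v (n x m) into non-negative w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy catalogue (hypothetical): 4 mutation types x 3 samples.
v = [[3, 0, 6],
     [1, 0, 2],
     [0, 2, 2],
     [0, 2, 2]]
w, h = nmf(v, rank=2)
```

The updates keep every entry of `w` and `h` non-negative, which is what lets the columns of `w` be read as additive mutation patterns.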
In this thesis, I present TensorSignatures, an algorithm to learn mutational signatures jointly across different variant categories as well as their genomic localisation and properties. The analysis of 2,778 primary and 3,824 metastatic cancer genomes of the PCAWG consortium and the HMF cohort shows that practically all signatures operate dynamically in response to various genomic and epigenomic states. The analysis pins differential spectra of UV mutagenesis found in active and inactive chromatin to global genome nucleotide excision repair. TensorSignatures accurately characterises transcription-associated mutagenesis, which is detected in 7 different cancer types. The algorithm also extracts distinct signatures of replication- and double strand break repair-driven mutagenesis by APOBEC3A and 3B with differential numbers and length of mutation clusters. As a fourth example, TensorSignatures reproduces a signature of somatic hypermutation generating highly clustered variants around the transcription start sites of active genes in lymphoid leukaemia, distinct from a more general and less clustered signature of Polη-driven translesion synthesis found in a broad range of cancer types. Finally, I demonstrate TensorSignatures’ utility by applying it to multiple datasets in various collaboration projects.
Taken together, TensorSignatures adds great detail and refines mutational signature analysis by jointly learning mutation patterns and their genomic determinants. This sheds light on the manifold influences that underlie mutagenesis and helps to pinpoint mutagenic influences which cannot easily be distinguished based on the mutation spectra alone. As mutational signature analysis is an essential element of the cancer genome analysis toolkit, TensorSignatures may help make the growing catalogues of mutational signatures more insightful by highlighting mutagenic mechanisms, or hypotheses thereof, to be investigated in greater depth