
    Thermodynamic modeling reveals widespread multivalent binding by RNA-binding proteins

    Motivation: Understanding how proteins recognize their RNA targets is essential to elucidate regulatory processes in the cell. Many RNA-binding proteins (RBPs) form complexes or have multiple domains that allow them to bind to RNA in a multivalent, cooperative manner. They can thereby achieve higher specificity and affinity than proteins with a single RNA-binding domain. However, current approaches to de novo discovery of RNA binding motifs do not take multivalent binding into account. Results: We present Bipartite Motif Finder (BMF), which is based on a thermodynamic model of RBPs with two co-operatively binding RNA-binding domains. We show that bivalent binding is a common strategy among RBPs, yielding higher affinity and sequence specificity. We furthermore illustrate that the spatial geometry between the binding sites can be learned from bound RNA sequences. These discovered bipartite motifs are consistent with previously known motifs and binding behaviors. Our results demonstrate the importance of multivalent binding for RNA-binding proteins and highlight the value of bipartite motif models in representing the multivalency of protein-RNA interactions
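    A minimal sketch of the idea of cooperative two-domain binding, not the BMF model itself: each domain contributes a Boltzmann weight per site, and a doubly bound state is weighted by an extra factor that depends on the gap between the two sites. The names make_energies, site_weights, bound_probability, conc and coop, and all energy values, are hypothetical and chosen only for illustration.

```python
import math

def make_energies(consensus, match=-2.0, mismatch=1.0):
    """Toy per-position binding energies (in kT) favouring a consensus k-mer."""
    return [{b: (match if b == c else mismatch) for b in "ACGU"} for c in consensus]

def site_weights(rna, energies):
    """Boltzmann weight exp(-E) of one domain at every start position."""
    k = len(energies)
    return [
        math.exp(-sum(energies[p][rna[i + p]] for p in range(k)))
        for i in range(len(rna) - k + 1)
    ]

def bound_probability(rna, energies_a, energies_b, conc, coop):
    """Probability that the RNA is bound by a toy two-domain protein.

    conc: free protein concentration (arbitrary units, an assumption).
    coop: function mapping the gap between the two sites to an effective
          local-concentration factor for the second domain (an assumption).
    """
    wa = site_weights(rna, energies_a)
    wb = site_weights(rna, energies_b)
    ka = len(energies_a)
    # States with exactly one domain bound.
    z_single = conc * (sum(wa) + sum(wb))
    # States with both domains bound, domain B downstream of A without overlap.
    z_double = 0.0
    for i, a in enumerate(wa):
        for j, b in enumerate(wb):
            gap = j - (i + ka)
            if gap >= 0:
                z_double += conc * a * b * coop(gap)
    z = 1.0 + z_single + z_double  # 1.0 is the weight of the unbound state
    return (z - 1.0) / z

ea, eb = make_energies("UGU"), make_energies("AUA")
rna = "CCUGUCCAUACC"
p_mono = bound_probability(rna, ea, eb, 0.01, lambda gap: 0.0)
p_bi = bound_probability(rna, ea, eb, 0.01, lambda gap: 50.0 if gap <= 4 else 0.0)
print(p_mono, p_bi)
```

    In this toy setting, allowing the doubly bound state can only add weight to the bound states, so the bivalent probability is higher than the monovalent one for the same sequence and concentration.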

    Functional characterisation of cohesin subunit SMC3 and separase and their roles in the segregation of large and minichromosomes in Trypanosoma brucei

    The genome of the African trypanosome, Trypanosoma brucei, presents an unusual karyotype in which two main classes of chromosomes, large chromosomes and small minichromosomes, need to be faithfully replicated and segregated during the cell cycle. Although the large and minichromosomes are colocalised and segregated by association with the mitotic spindle, minichromosomes exhibit segregation patterns that differ from those observed for large chromosomes. To address whether this difference is reflected at a molecular level, two different proteins that have highly conserved functions in eukaryotic chromosome segregation were characterised in this study. The first protein, SMC3, is a component of the chromosome cohesion apparatus that holds sister chromatids together after their replication until segregation at anaphase. The second protein, separase, is a cysteine protease that resolves sister chromatid cohesion at the onset of anaphase and has, in other organisms, additional functions during mitosis. The T. brucei cohesin subunit, TbSMC3, localised to the nucleus as a chromatin-bound protein from G1 phase until metaphase and dissociated from chromatin during anaphase until the completion of cell division. On the other hand, cytoplasmic localisation of separase with nuclear exclusion was prevalent until the onset of metaphase, when the protein re-localised to the nucleus, thus providing a potential control mechanism to prevent premature cohesin cleavage. Interference with the normal expression of SMC3 and separase by RNA interference resulted in defects in growth rate, cell cycle progression and chromosome segregation. TbSMC3 depletion produced a lethal phenotype and inhibition of cell cycle progression. Similarly, lethality with severe inhibition of cell cycle progression was the main feature of separase depletion.
Using fluorescence in situ hybridisation (FISH), it was shown that SMC3 depletion had no visible effect on the symmetric segregation of the minichromosome population, but interfered with the faithful mitotic segregation of large chromosomes. In contrast, separase depletion blocked the segregation of both large and minichromosomes. In separase-depleted mitotic cells, cohesins remained bound to chromatin, in contrast to the rapid dissociation of cohesins from chromatin in wild-type mitotic cells. The severity of segregation phenotypes after separase depletion was additionally explained by defects in mitotic spindle assembly. In both SMC3- and separase-depleted cells, cytokinesis in the absence of mitosis/karyokinesis was not inhibited in procyclic cells, resulting in the generation of anucleate 'zoid' cells. The lethality imposed on trypanosome cells after depletion of either SMC3 or separase indicates that these proteins can serve as potential drug targets for anti-parasite chemotherapy.

    Front Matter - Soft Computing for Data Mining Applications

    Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, to make the information mining process more robust, or, say, human-like, methods for searching and learning require tolerance towards imprecision, uncertainty and exceptions. They therefore need approximate reasoning capabilities and must be capable of handling partial truth. Properties of the aforementioned kind are typical of soft computing. Soft computing techniques like Genetic

    Machine Learning in clinical biology and medicine: from prediction of multidrug resistant infections in humans to pre-mRNA splicing control in Ciliates

    Machine Learning methods have broadly begun to infiltrate the clinical literature in such a way that the correct use of algorithms and tools can facilitate both diagnosis and therapies. The availability of large quantities of high-quality data could lead to an improved understanding of risk factors in community and healthcare-acquired infections. In the first part of my PhD program, I refined my skills in Machine Learning by developing, and evaluating with a real antibiotic stewardship dataset, a model to predict multidrug-resistant urinary tract infections after patient hospitalization. For this purpose, I created an online platform called DSaaS, specifically designed for healthcare operators to train ML models (supervised learning algorithms). These results are reported in Chapter 2. In the second part of the PhD thesis (Chapter 3) I used my new skills to study genomic variants, in particular the phenomenon of intron splicing. One of the important modes of pre-mRNA post-transcriptional modification is alternative intron splicing, which includes intron retention (unsplicing) and allows the creation of many distinct mature mRNA transcripts from a single gene. An accurate interpretation of genomic variants is the backbone of genomic medicine. Determining, for example, the causative variant in patients with Mendelian disorders facilitates both management and potential downstream treatment of the patient’s condition, as well as providing peace of mind and allowing more effective counselling for the wider family. Recent years have seen a surge in bioinformatics tools designed to predict variant impact on splicing, and these offer an opportunity to circumvent many limitations of RNA-seq based approaches. An increasing number of these tools rely on machine learning approaches that can identify patterns in data and use this knowledge to make predictions about new data.
I optimized a pipeline to extract introns from genomes and transcriptomes and classified them into retained introns (RIs) and constitutively spliced introns (CSIs). I used data from ciliates because of the peculiar organization of their genomes (enriched in coding sequences) and because they are unicellular organisms without cells differentiated into tissues. That made the identification and manipulation of introns easier. In collaboration with my PhD colleague Dr. Leonardo Vito, I analyzed these intronic sequences in order to identify “features” with which to predict and classify them using Machine Learning algorithms. We also developed a platform for manipulating the FASTA, gtf, BED, etc. files produced by the pipeline tools. I named the platform Biounicam (intron extraction tools); it is available at http://46.23.201.244:1880/ui. The major objective of this study was to develop an accurate machine-learning model that can predict whether an intron will be retained or not, to understand the key features involved in the intron retention (IR) mechanism, and to provide insight into the factors that drive IR. Once the model has been developed, the final step of my PhD work will be to expand the platform with different machine learning algorithms to better predict retention and to test new features that drive this phenomenon. These features will hopefully contribute to finding new mechanisms that control intron splicing. The additional papers and patents I published during my PhD program are in Appendices B and C. These works, which range from microbiology to classical statistics, have equipped me with many useful techniques for future work.
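    The abstract does not list the features it uses, so the following is only a hedged sketch of what per-intron feature extraction for a retained-versus-spliced classifier might look like. The function name intron_features and the chosen features (length, GC content, canonical GT...AG boundaries, length divisible by three) are assumptions for illustration, not the thesis pipeline.

```python
def intron_features(seq):
    """Return a dict of simple, hypothetical features for one intron sequence.

    seq is the intron as a DNA string, 5' to 3', without flanking exon bases.
    """
    seq = seq.upper()
    length = len(seq)
    gc = (seq.count("G") + seq.count("C")) / length if length else 0.0
    return {
        "length": length,
        "gc_content": gc,
        # Most spliceosomal introns start with GT and end with AG.
        "canonical_donor": seq.startswith("GT"),
        "canonical_acceptor": seq.endswith("AG"),
        # A retained intron whose length is a multiple of 3 keeps the reading frame.
        "length_multiple_of_3": length > 0 and length % 3 == 0,
    }

example = intron_features("GTAAGTTTTAG")
print(example)
```

    A table of such dictionaries, one row per intron with an RI or CSI label, is the kind of input a supervised classifier would be trained on.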

    Accessing new biomedical applications by combining genetic design and chemical modification of elastin-like recombinamers

    The aim of this thesis is to demonstrate that the versatility of these recombinamers can be increased through chemical and genetic modification in order to tune the degradation, self-assembly and cell interaction of ELRs. The work carried out in this thesis covers the whole process of design, production, purification, characterisation and direct application of the new ELRs. To this end, a wide variety of techniques from genetic engineering, microbiology, physics and chemistry have been used, together with the corresponding cell cultures. A) Recombinant DNA technology allows complete control over ELR design, and thus the insertion of different biofunctional sequences, such as protease-sensitive sequences. By controlling the spatial arrangement of such proteolytic sequences, we aim to demonstrate the specific biodegradation of ELRs. In addition, the capacity for selective biodegradation will be applied to the biofabrication of substrates for zymographic detection. B) Owing to the controllable degradability of ELRs, their use as a selective substrate for the identification of proteolytic enzymes will be studied. Therefore, following on from the study of ELRs in zymographic techniques, we will design a new protease detection method with potential for advanced high-throughput screening systems. C) Cholesterol-modified biomaterials have been shown to exhibit strong intermolecular interactions. We will therefore apply these interactions in an ELR system to generate intermolecular forces that trigger ELR self-assembly.
D) In addition, thanks to the ability of cholesterol groups to interact with lipid membranes, we will study the capacity of cholesterol-rich ELRs to improve their interaction with certain cell types involved in lipid uptake, in order to increase the coating of living cells with ELR proteins. Departamento de Física de la Materia Condensada, Cristalografía y Mineralogía. Doctorado en Química: Química de Síntesis, Catálisis y Materiales Avanzado

    A Machine Learning Classification Framework for Early Prediction of Alzheimer’s Disease

    People today, in addition to their concerns about getting old and watching themselves grow weak and wrinkly, are facing an increasing fear of dementia. There are around 47 million people affected by dementia worldwide, and the cost associated with providing them health and social care support is estimated to reach 2 trillion by 2030, which is almost equivalent to the 18th largest economy in the world. The most common form of dementia, with the highest costs in health and social care, is Alzheimer’s disease, which gradually kills neurons and causes patients to lose loving memories, the ability to recognise family members, childhood memories, and even the ability to follow simple instructions. Alzheimer’s disease is irreversible, unstoppable and has no known cure. Besides being a calamity to affected patients, it is a great financial burden on health providers. Health care providers also face a challenge in diagnosing the disease, as current methods used to diagnose Alzheimer’s disease rely on manual evaluations of a patient’s medical history and mental examinations such as the Mini-Mental State Examination. These diagnostic methods often give a false diagnosis and were designed to identify Alzheimer’s after stage two, when some or all of the symptoms are evident. The problem is that clinicians are unable to stop or control the progress of Alzheimer’s disease because of a lack of knowledge of the patterns that trigger the development of the disease. In this thesis, we investigated Alzheimer’s disease from a computational perspective to uncover different risk factors, and we present a strategic framework called the Early Prediction of Alzheimer’s Disease Framework (EPADf) that would give a future prediction of early-onset Alzheimer’s disease.
Following extensive background research that resulted in the formalisation of the framework concept, prediction approaches, and the concept of ranking the risk factors based on clinical instinct, knowledge and experience using mathematical reasoning, we carried out experiments to gain further insight and investigate the disease further using machine learning models. In this study, we used machine learning models and conducted two classification experiments for early prediction of Alzheimer’s disease, and one ranking experiment to rank its risk factors by importance. Besides these experiments, we also presented two logical approaches to search for patterns in an Alzheimer’s dataset, and a ranking algorithm to rank Alzheimer’s disease risk factors based on clinical evaluation. For the classification experiments we used five different Machine Learning models: Random Forest (RF), Random Oracle Model (ROM), a hybrid model (H2) that combines a Levenberg-Marquardt neural network and Random Forest using Fisher discriminant analysis, Linear Neural Networks (LNN), and Multi-layer Perceptron Model (MLP). These models were applied to de-identified multivariable patient data, provided by the ADNI (Alzheimer’s Disease Neuroimaging Initiative), to illustrate the effective use of data analysis to investigate Alzheimer’s disease biological and behavioural risk factors. We found that the continuous enhancement of patient data and the use of combined machine learning models can provide an early, cost-effective prediction of Alzheimer’s disease, and help in extracting insightful information on the risk factors of the disease. Based on this work and these findings we have developed the strategic framework (EPADf), which is discussed in more depth in this thesis.

    Non-coding RNA networks regulating leaf vegetative desiccation tolerance in the resurrection plant Xerophyta humilis.

    Common to orthodox seeds, desiccation tolerance (DT) is exceedingly rare in the vegetative tissues of modern angiosperms, being limited to a small number of "resurrection plants". While the molecular mechanisms of DT, as well as the transcription factors regulating the seed and vegetative DT programmes, have been identified, very little is known with regard to the role of regulatory noncoding RNAs (ncRNAs). To investigate the presence and roles of possible ncRNA players, RNA-Seq was performed on desiccating Xerophyta humilis leaves and a bioinformatic pipeline was assembled to identify the potential decoy lncRNAs and miRNAs present. Interaction mapping was performed, identifying a number of small regulatory networks, each regulating a small subset of the desiccation transcriptome. Predicted networks were screened for function related to DT and expression consistent with functional regulatory interactions. Of the predicted networks, two appear highly promising as potential regulators of key DT response genes. The results indicate that differentially expressed (DE) desiccation response ncRNAs are present in the vegetative tissues of X. humilis and likely play a key role in the regulation of DT. This suggests that ncRNAs play a more important role in DT than previously thought, and may have facilitated the evolution of vegetative DT through reprogramming of seed DT programmes in vegetative tissues.

    Key body pose detection and movement assessment of fitness performances

    Motion segmentation plays an important role in human motion analysis. Understanding the intrinsic features of human activities represents a challenge for modern science. Current solutions usually involve computationally demanding processing and achieve the best results using expensive, intrusive motion capture devices. In this thesis, research has been carried out to develop a series of methods for affordable and effective human motion assessment in the context of stand-up physical exercises. The objective of the research was to address the need for an autonomous system that could be deployed in nursing homes or elderly people's houses, as well as in the rehabilitation of high-profile sports performers. Firstly, it has to be designed so that instructions on physical exercises, especially in the case of elderly people, can be delivered in an understandable way. Secondly, it has to deal with the problem that some individuals may find it difficult to keep up with the programme due to physical impediments. They may also be discouraged because the activities are not stimulating or the instructions are hard to follow. In this thesis, a series of methods for automatic assessment production, as a combination of worded feedback and motion visualisation, is presented. The methods comprise two major steps. First, a series of key body poses is identified using a model built by a multi-class classifier from a set of frame-wise features extracted from the motion data. Second, motion alignment (or synchronisation) with a reference performance (the tutor) is established in order to produce a second assessment model. A numerical assessment, followed by textual feedback, is delivered to the user along with a 3D skeletal animation to enrich the assessment experience. This animation is produced after the expert's demonstration is transformed to the user's current level of performance, in order to help encourage them to engage with the programme.
The key body pose identification stage follows a two-step approach: first, the principal components of the input motion data are calculated in order to reduce the dimensionality of the input. Then, candidates of key body poses are inferred using multi-class, supervised machine learning techniques from a set of training samples. Finally, cluster analysis is used to refine the result. Key body pose identification is guaranteed to be invariant to the repetitiveness and symmetry of the performance. Results show the effectiveness of the proposed approach by comparing it against Dynamic Time Warping and Hierarchical Aligned Cluster Analysis. The synchronisation sub-system takes advantage of the cyclic nature of the stretches that are part of the stand-up exercises under study in order to remove key body poses identified out of sequence (i.e., false positives). Two approaches are considered for performing cycle analysis: a trivial sequential algorithm and a proposed Genetic Algorithm (GA), with and without prior knowledge of cyclic sequence patterns. These approaches are compared, and the Genetic Algorithm with prior knowledge shows a lower rate of false positives, but also a higher false negative rate. The GAs are also evaluated with randomly generated periodic string sequences. The automatic assessment follows a similar approach to that of key body pose identification. A multi-class, multi-target machine learning classifier is trained with features extracted from the previous motion alignment. The inferred numerical assessment levels (one per identified key body pose and involved body joint) are translated into human-understandable language via a highly customisable context-free grammar. Finally, visual feedback is produced in the form of a synchronised skeletal animation of both the user's performance and the tutor's.
If the user's performance is well below a standard, then an affine offset transformation of the skeletal motion data series to an in-between performance is performed, in order to avoid discouraging the user while still providing a reference for improvement. At the end of this thesis, the limitations of the methods in real circumstances are explored. Issues like gimbal lock in the angular motion data, the lack of accuracy of the motion capture system and the escalation of the training set are discussed. Finally, some conclusions are drawn and future work is discussed.
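Dynamic Time Warping, one of the baselines the abstract compares against, can be sketched as follows. This is the textbook dynamic-programming formulation applied to one-dimensional sequences (for example, a single joint angle over time), not the thesis's own alignment method; the function name dtw_distance and the sample sequences are illustrative assumptions.

```python
def dtw_distance(seq_a, seq_b):
    """Classic O(n*m) Dynamic Time Warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] is the best cumulative cost of aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # seq_b[j-1] also absorbs the previous frame of seq_a
                cost[i][j - 1],      # seq_a[i-1] also absorbs the previous frame of seq_b
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]

tutor = [0, 1, 2, 1, 0]
user = [0, 0, 1, 2, 2, 1, 0]  # same movement, performed more slowly
print(dtw_distance(tutor, user))
```

Because warping lets one frame match several frames of the other sequence, a slower performance of the same movement aligns with zero cost, which is what makes DTW a natural reference point for tutor-versus-user synchronisation.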