
    Workshop Proceedings of the 12th edition of the KONVENS conference

    The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies that such interaction, cooperation and integrated views can produce. This topic lies at the crossroads of different research traditions that deal with natural language as a container of knowledge and with methods to extract and manage linguistically represented knowledge. It is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute's research topics, and it has received even more attention over the last few years.

    Efficient algorithms for simulation and analysis of many-body systems

    This thesis introduces methods to efficiently generate and analyze time series data of many-body systems. While we have a strong focus on biomolecular processes, the presented methods can also be applied more generally. Due to limitations of microscope resolution in both space and time, biomolecular processes are especially hard to observe experimentally. Computer models offer an opportunity to work around these limitations. However, as these models are bound by computational effort, careful selection of the model as well as its efficient implementation play a fundamental role in their successful sampling and/or estimation. Especially at high levels of resolution, computer simulations can produce vast amounts of high-dimensional data, and in general it is not straightforward to visualize such data, let alone to identify the relevant features and processes. To this end, we cover tools for projecting time series data onto important processes, finding features in observable space that are geometrically stable over time, and identifying governing dynamics. We introduce the novel software library deeptime with two main goals: (1) making methods which were developed in different communities (such as molecular dynamics and fluid dynamics) accessible to a broad user base by implementing them in a general-purpose way, and (2) providing an easy-to-install, extensible, and maintainable library by employing a high degree of modularity and introducing as few hard dependencies as possible. We demonstrate and compare the capabilities of the provided methods based on numerical examples. Subsequently, the particle-based reaction-diffusion simulation software package ReaDDy2 is introduced. It can simulate dynamics which are more complicated than what is usually analyzed with the methods available in deeptime. It is a significantly more efficient, feature-rich, flexible, and user-friendly version of its predecessor ReaDDy. As such, it makes it possible, at the simulation model's resolution, to study larger systems and to cover longer timescales. In particular, ReaDDy2 is capable of modeling complex processes featuring particle crowding, space exclusion, association and dissociation events, and dynamic formation and dissolution of particle geometries on a mesoscopic scale. The validity of the ReaDDy2 model is asserted by several numerical studies which are compared to analytically obtained results, simulations from other packages, or literature data. Finally, we present reactive SINDy, a method that can detect reaction networks from concentration curves of chemical species. It extends the SINDy method, which is contained in deeptime, by introducing coupling terms over a system of ordinary differential equations in an ansatz reaction space. As such, it transforms an ordinary linear regression problem into a linear tensor regression. The method employs a sparsity-promoting regularization which leads to especially simple and interpretable models. We show in biologically motivated example systems that the method is indeed capable of detecting the correct underlying reaction dynamics and that the sparsity regularization plays a key role in pruning otherwise spuriously detected reactions.
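The idea of pruning spurious reactions through sparsity can be illustrated with a minimal sketch. It is not the thesis's implementation and does not use the deeptime API: it assumes a toy system with one true reaction (A -> B, rate 2.0), a two-entry ansatz library (A -> B and B -> A), and sequentially thresholded least squares as a stand-in for the sparsity-promoting regularization. All function names, the threshold, and the synthetic data are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(columns, target):
    """Ordinary least squares via the normal equations."""
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(u * t for u, t in zip(ci, target)) for ci in columns]
    return solve_linear(gram, rhs)


def sparse_fit(columns, target, threshold, sweeps=10):
    """Sequentially thresholded least squares: fit, drop small rates, refit."""
    active = list(range(len(columns)))
    coef = [0.0] * len(columns)
    for _ in range(sweeps):
        sol = least_squares([columns[i] for i in active], target) if active else []
        coef = [0.0] * len(columns)
        for i, s in zip(active, sol):
            coef[i] = s
        kept = [i for i in active if abs(coef[i]) >= threshold]
        if kept == active:
            break
        active = kept
    return [c if abs(c) >= threshold else 0.0 for c in coef]


# Synthetic concentration curve for A -> B (assumed rate 2.0), explicit Euler steps.
dt, k_true, steps = 0.01, 2.0, 200
conc_a = [1.0]
for _ in range(steps):
    conc_a.append(conc_a[-1] - dt * k_true * conc_a[-1])
conc_b = [1.0 - a for a in conc_a]

# Forward-difference derivative of [A]; the sine term is a small deterministic
# perturbation standing in for measurement noise.
d_a = [(conc_a[n + 1] - conc_a[n]) / dt + 0.01 * math.sin(37 * n) for n in range(steps)]

# Ansatz for d[A]/dt: k1 * (-[A]) for A -> B plus k2 * (+[B]) for B -> A.
library = [[-a for a in conc_a[:steps]], conc_b[:steps]]
rates = sparse_fit(library, d_a, threshold=0.1)
print(rates)  # [k1, k2]
```

Without the thresholding step the perturbed data yields a small nonzero rate for the reverse reaction; the refit after pruning leaves only the forward reaction in the model.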

    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    Modelling of the Electric Vehicle Charging Infrastructure as Cyber Physical Power Systems: A Review on Components, Standards, Vulnerabilities and Attacks

    The increasing number of electric vehicles (EVs) has led to a growing need to establish EV charging infrastructures (EVCIs) with fast charging capabilities, both to reduce congestion at EV charging stations (EVCSs) and to provide alternative solutions for EV owners without residential charging facilities. EV charging stations are broadly classified based on (i) where the charging equipment is located (on-board and off-board charging stations) and (ii) the type of current and power levels (AC and DC charging stations). DC charging stations are further classified into fast and extreme fast charging stations. This article focuses mainly on several components that model the EVCI as a cyber-physical system (CPS).

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)


    Methods for Epigenetic Analyses from Long-Read Sequencing Data

    Epigenetics, particularly the study of DNA methylation, is a cornerstone field for our understanding of human development and disease. DNA methylation has been included in the "hallmarks of cancer" due to its important function as a biomarker and its contribution to carcinogenesis and cancer cell plasticity. Long-read sequencing technologies, such as the Oxford Nanopore Technologies platform, have advanced the study of structural variations while at the same time allowing direct measurement of DNA methylation on the same reads. With this, new avenues of analysis have opened up, such as long-range allele-specific methylation analysis, methylation analysis on structural variations, or relating nearby epigenetic modalities on the same read to one another. Basecalling and methylation calling of Nanopore reads are computationally expensive tasks which require complex machine learning architectures. Read-level methylation calls require different approaches to data management and analysis than those developed for methylation frequencies measured from short-read technologies or array data. Because they are 2-dimensional, associated with both read and genome, and include methylation caller uncertainties, such calls are much more costly to store than 1-dimensional methylation frequencies. Methods for storage, retrieval, and analysis of such data therefore require careful consideration. Downstream analysis tasks, such as methylation segmentation or differential methylation calling, can potentially benefit from read information and allow uncertainty propagation. These avenues had not been considered in existing tools. In my work, I explored the potential of long-read DNA methylation analysis and tackled some of the challenges of data management and downstream analysis using state-of-the-art software architecture and machine learning methods. I defined a storage standard for reference-anchored and read-assigned DNA methylation calls, including methylation calling uncertainties and read annotations such as haplotype or sample information. This storage container is defined as a schema for the Hierarchical Data Format version 5 (HDF5), includes an index for rapid access to genomic coordinates, and is optimized for parallel computing with even load balancing. It further includes a Python API for creation, modification, and data access, as well as convenience functions for the extraction of important quality statistics via a command line interface. Furthermore, I developed software solutions for the segmentation and differential methylation testing of DNA methylation calls from Nanopore sequencing. This implementation takes advantage of the performance benefits provided by my high-performance storage container. It includes a Bayesian methylome segmentation algorithm which allows for the consensus instance segmentation of multiple sample- and/or haplotype-assigned DNA methylation profiles while considering methylation calling uncertainties. Based on this segmentation, the software can then perform differential methylation testing, and it provides a large number of options for statistical testing and multiple testing correction. I benchmarked all tools on both simulated and publicly available real data and show the performance benefits compared to previously existing and concurrently developed solutions. Next, I applied the methods to a cancer study on a chromothriptic cancer sample from a patient with Sonic Hedgehog Medulloblastoma. Here I report regulatory genomic regions differentially methylated before and after treatment, allele-specific methylation in the tumor, and methylation on chromothriptic structures. Finally, I developed specialized methylation callers for the combined DNA methylation profiling of CpG, GpC, and context-free adenine methylation. These callers can be used to measure chromatin accessibility in a NOMe-seq-like setup, showing the potential of long-read sequencing for the profiling of transcription factor co-binding. In conclusion, this thesis presents and subsequently benchmarks new algorithmic and infrastructural solutions for the analysis of DNA methylation data from long-read sequencing.
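The contrast between 2-dimensional read-level calls and 1-dimensional methylation frequencies can be made concrete with a small sketch. It does not reflect the thesis's storage schema or API: the tuple layout, the use of log-likelihood ratios (LLRs) as the uncertainty measure, the confidence threshold of 2.0, and all names below are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical read-level calls: (read_id, genomic_position, log-likelihood ratio).
# A positive LLR favours "methylated", a negative LLR favours "unmethylated".
calls = [
    ("read1", 100, 3.2), ("read1", 105, -2.8),
    ("read2", 100, 2.5), ("read2", 105, 0.4),
    ("read3", 100, -4.0), ("read3", 105, -3.1),
]


def site_frequencies(read_calls, min_abs_llr=2.0):
    """Collapse read-level calls into per-site methylation frequencies.

    Calls whose |LLR| is below min_abs_llr are treated as uncertain and skipped,
    which is exactly the read-level information a frequency table discards.
    """
    methylated = defaultdict(int)
    confident = defaultdict(int)
    for _read_id, position, llr in read_calls:
        if abs(llr) < min_abs_llr:
            continue
        confident[position] += 1
        if llr > 0:
            methylated[position] += 1
    return {pos: methylated[pos] / confident[pos] for pos in sorted(confident)}


freqs = site_frequencies(calls)
print(freqs)
```

In this toy input, position 100 keeps all three calls while position 105 loses the uncertain call from `read2`; the resulting frequency table no longer records which read carried which call.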

    Clustering and analysis of g quadruplex sequences.

    G quadruplex structures are secondary structures located throughout the genomes of various organisms, with involvement in regulatory functions in transcription, translation, genome stability, epigenetic regulation, and cell division. Although G4 structures are widely acknowledged in vivo, there are currently no search tools for G quadruplexes based on already identified G quadruplexes and on families identified across different genomes based on sequence diversity. Constructing families of G4 sequences and identifying their polymorphisms in diseases and disorders will lead to a better understanding of their functional roles and will further research into the biophysical modeling of interactions with oligonucleotide treatments of disease. The first project aims to develop a framework for clustering G quadruplex (G4) sequences into families based on sequence, structure, and thermodynamic properties. No current search tools exist to filter G4s based on their properties, and the diversity of G4 sequences across the genome is not fully understood. To address this gap, we utilized a combination of clustering and annotation methods to identify 95 families of G4 sequences within the human genome. Profiles for each family were created using hidden Markov models, and their thermodynamic properties, functional annotations, and transcription factor binding motifs were analyzed. The second project aims to investigate the effect of single nucleotide variations (SNVs) on G4 structures in disease contexts. Although the role of G4s in cancer and metabolic disorders is well established, the effect of SNVs on G4s has not been extensively studied. Using the COSMIC and CLINVAR databases, we identified over 37,000 G4 SNVs and analyzed their effects on G4 secondary structures. We found that a significant proportion of SNVs result in G4 loss or gain, and we identified genes enriched for destabilizing SNVs in G4-forming regions. We also analyzed mutational patterns in the G4 structure and found a higher selective pressure on the coding region of the template strand. Our findings provide insights into the effects of SNVs on G4 structures and highlight potential targets for therapeutic intervention in diseases associated with G4 dysregulation.
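What "G4 loss or gain" through an SNV can mean is sketched below under strong simplifying assumptions. The abstract does not say how G4s were detected, so this example substitutes the widely used canonical sequence motif (four runs of at least three guanines separated by loops of 1 to 7 bases), checks the forward strand only, and ignores thermodynamic stability. The pattern, the function name, and the example sequences are illustrative, not the authors' method.

```python
import re

# Assumed canonical motif: four G-tracts (>= 3 G) separated by loops of 1-7 bases.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def snv_effect(sequence, position, alt_base):
    """Classify a single nucleotide variation by its effect on the G4 motif.

    position is 0-based; the result is "loss", "gain", or "unchanged".
    """
    mutated = sequence[:position] + alt_base + sequence[position + 1:]
    before = G4_PATTERN.search(sequence) is not None
    after = G4_PATTERN.search(mutated) is not None
    if before and not after:
        return "loss"
    if after and not before:
        return "gain"
    return "unchanged"


telomeric = "GGGTTAGGGTTAGGGTTAGGG"   # human telomeric repeat, matches the motif
broken = "GGGTTAGGGTTAGAGTTAGGG"      # same sequence with the third G-tract interrupted

tract_hit = snv_effect(telomeric, 1, "A")   # G -> A inside the first G-tract
loop_hit = snv_effect(telomeric, 4, "C")    # T -> C inside the first loop
repair = snv_effect(broken, 13, "G")        # A -> G restores the third G-tract
print(tract_hit, loop_hit, repair)
```

A motif-level check like this only captures whether the pattern is present; a variant reported as "unchanged" may still alter the stability of the folded structure.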