
    HIV-TRACE (Transmission Cluster Engine): A tool for large scale molecular epidemiology of HIV-1 and other rapidly evolving pathogens

    In modern applications of molecular epidemiology, genetic sequence data are routinely used to identify clusters of transmission in rapidly evolving pathogens, most notably HIV-1. Traditional 'shoe-leather' epidemiology infers transmission clusters by tracing chains of partners sharing epidemiological connections (e.g., sexual contact). Here, we present a computational tool for identifying a molecular transmission analog of such clusters: HIV-TRACE (TRAnsmission Cluster Engine). HIV-TRACE implements an approach inspired by traditional epidemiology: it identifies chains of partners whose viral genetic relatedness implies direct or indirect epidemiological connections. Molecular transmission clusters are constructed using codon-aware pairwise alignment to a reference sequence, followed by pairwise genetic distance estimation among all sequences. This approach is computationally tractable and can identify HIV-1 transmission clusters in large surveillance databases comprising tens or hundreds of thousands of sequences in near real time, that is, on the order of minutes to hours. HIV-TRACE is available at www.hivtrace.org and from www.github.com/veg/hivtrace, along with the accompanying result visualization module from www.github.com/veg/hivtrace-viz. Importantly, the approach underlying HIV-TRACE is not limited to the study of HIV-1 and can be applied to study outbreaks and epidemics of other rapidly evolving pathogens.
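    The clustering idea can be sketched as follows: compute a distance for every pair of aligned sequences, link pairs whose distance is at or below a threshold, and report the connected components of the resulting graph as clusters. This is a minimal illustration, not HIV-TRACE's code. The plain p-distance is a stand-in for the evolutionary distance HIV-TRACE actually computes, the 0.015 threshold is an assumed illustrative value, the sequences are assumed to be already aligned to a common reference (uppercase, equal length), and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites, skipping positions where either sequence has a gap or N.

    Stand-in for a proper evolutionary distance; assumes equal-length aligned sequences.
    """
    compared = differing = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue  # ambiguous or missing site: not comparable
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else float("nan")


def transmission_clusters(seqs, threshold=0.015):
    """Link every pair at distance <= threshold and return connected components of size >= 2.

    seqs: dict mapping a sequence ID to its aligned sequence (hypothetical input format).
    """
    ids = list(seqs)
    parent = {i: i for i in ids}  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # All n(n-1)/2 pairwise comparisons; NaN distances never satisfy "<=", so they add no edge.
    for a, b in combinations(ids, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return [sorted(c) for c in clusters.values() if len(c) > 1]
```

    The all-pairs loop is what makes the cost quadratic in the number of sequences; the union-find structure keeps the cluster assembly itself close to linear in the number of linked pairs.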

    A log-ratio biplot approach for exploring genetic relatedness based on identity by state

    The detection of cryptic relatedness in large population-based cohorts is of great importance in genome research. The usual approach for detecting closely related individuals is to plot allele sharing statistics, based on identity-by-state or identity-by-descent, in a two-dimensional scatterplot. This approach ignores that allele sharing data across individuals in reality have a higher dimensionality, and it disregards the compositional nature of the underlying counts of shared genotypes. In this paper we develop biplot methodology based on log-ratio principal component analysis that overcomes these restrictions. This leads to entirely new graphics that are especially useful for exploring relatedness in genetic databases from homogeneous populations. The proposed method can be applied iteratively, acting as a looking glass for more remote relationships that are harder to classify. Datasets from the 1,000 Genomes Project and the Genomes For Life-GCAT Project are used to illustrate the proposed method. The discriminatory power of the log-ratio biplot approach is compared with that of the classical plots in a simulation study. In a non-inbred homogeneous population, the log-ratio principal component approach achieves higher classification rates than the classical graphics across the whole allele frequency spectrum, using only identity by state. In these circumstances, simulations show that with 35,000 independent bi-allelic variants, log-ratio principal component analysis combined with discriminant analysis can correctly classify relationships up to and including the fourth degree. Postprint (published version)
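    The core computation of a log-ratio PCA biplot can be sketched in a few lines, assuming each pair of individuals is summarized by three counts: the number of variants at which the pair shares 0, 1 or 2 alleles identical by state (IBS0, IBS1, IBS2). Rows are centred-log-ratio (clr) transformed, column-centred and decomposed by SVD; the row scores and column loadings are the two sets of biplot markers. The pseudocount used to avoid log(0), the three-part breakdown and the function names are assumptions for illustration; the paper's exact choice of compositional parts and biplot scaling may differ.

```python
import numpy as np


def clr(X):
    """Centred log-ratio transform of each row (one composition per row) of a positive matrix."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)


def logratio_pca(counts, pseudocount=0.5):
    """Log-ratio PCA of IBS count compositions, one row per pair of individuals.

    counts: (n_pairs, n_parts) array, e.g. columns IBS0, IBS1, IBS2 (assumed layout).
    Returns row scores (principal coordinates), column loadings (standard coordinates)
    and the fraction of total log-ratio variance explained by each dimension.
    """
    X = np.asarray(counts, dtype=float) + pseudocount  # assumed zero-handling strategy
    Z = clr(X)
    Z = Z - Z.mean(axis=0)  # column-centre before the SVD
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s          # row markers: pairs of individuals
    loadings = Vt.T         # column markers: the compositional parts
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained
```

    For a three-part composition the last singular value is zero up to rounding, because every clr row sums to zero; compositions with more parts leave more informative dimensions to explore.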

    Thermal Neural Networks: Lumped-Parameter Thermal Modeling With State-Space Machine Learning

    With electric power systems becoming more compact and increasingly powerful, the relevance of thermal stress, especially during overload operation, is expected to keep increasing. Whenever critical temperatures cannot be measured economically with sensors, a thermal model lends itself to estimating those unknown quantities. Thermal models for electric power systems are usually required to be both real-time capable and highly accurate. Moreover, ease of implementation and time to production play an increasingly important role. In this work, the thermal neural network (TNN) is introduced, which unifies both consolidated knowledge in the form of heat-transfer-based lumped-parameter models and data-driven nonlinear function approximation with supervised machine learning. A quasi-linear parameter-varying system is identified solely from empirical data, where relationships between scheduling variables and system matrices are inferred statistically and automatically. At the same time, a TNN has physically interpretable states through its state-space representation, is end-to-end trainable with automatic differentiation (similar to deep learning models), and requires no material, geometry, or expert knowledge for its design. Experiments on an electric motor data set show that a TNN achieves higher temperature estimation accuracies than previous white-, grey- or black-box models, with a mean squared error of 3.18 K² and a worst-case error of 5.84 K at 64 model parameters. Comment: Preprint; fix typos, streamline math notation; 10 pages
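    The state-space view can be illustrated with a forward-Euler rollout of a generic lumped-parameter thermal network, assuming the standard heat balance C_i dT_i/dt = sum_j G_ij (T_j - T_i) + g_i (T_amb - T_i) + P_i. In a TNN, the conductances and power losses would be produced by trained neural networks from scheduling variables; here they are plain hypothetical callables and nothing is trained, so this shows only the physical structure, not the authors' model.

```python
import numpy as np


def simulate_lptn(T0, inv_caps, conductance_fn, ambient_fn, loss_fn, inputs, dt):
    """Forward-Euler rollout of a lumped-parameter thermal network.

    T0             -- initial node temperatures, shape (n,)
    inv_caps       -- inverse thermal capacitances 1/C_i, shape (n,)
    conductance_fn -- (T, u) -> (n, n) symmetric node-to-node conductances, zero diagonal
    ambient_fn     -- u -> (g_amb, T_amb): conductances to ambient (n,) and ambient temperature
    loss_fn        -- (T, u) -> power losses injected at each node, shape (n,)
    inputs         -- sequence of scheduling-variable values u_k (any type the callables accept)
    """
    T = np.asarray(T0, dtype=float).copy()
    inv_caps = np.asarray(inv_caps, dtype=float)
    trajectory = [T.copy()]
    for u in inputs:
        G = conductance_fn(T, u)
        g_amb, T_amb = ambient_fn(u)
        # Net heat flow into node i from all other nodes: sum_j G_ij * (T_j - T_i)
        q_nodes = G @ T - G.sum(axis=1) * T
        q_amb = g_amb * (T_amb - T)
        dT_dt = inv_caps * (q_nodes + q_amb + loss_fn(T, u))
        T = T + dt * dT_dt
        trajectory.append(T.copy())
    return np.array(trajectory)  # shape (len(inputs) + 1, n)
```

    Explicit Euler is used only for brevity; it stays stable only while the time step is small relative to the node time constants C_i / sum(G).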

    Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes

    DOI: 10.1186/1471-2105-14-S16-S6. BMC Bioinformatics 14 (Suppl. 16). BBMI

    Measuring the effects of reaction coordinate and electronic treatments in the QM/MM reaction dynamics of Trypanosoma cruzi trans-sialidase

    The free energy of activation, as defined in transition state theory, is central to calculating reaction rates, distinguishing between mechanistic paths and elucidating the catalytic process. Computational free energies are accessible through the reaction space, which is composed of the conformational and electronic degrees of freedom orthogonal to the reaction coordinate. The overarching aim of this thesis was to address theoretical and methodological challenges facing current methods for calculating reaction free energies in glycoenzyme systems. Tractable calculations must balance chemical accuracy and sampling efficiency, which necessitates simplifying these complex reaction spaces through quantum mechanics/molecular mechanics (QM/MM) partitioning and the use of a semi-empirical electronic method to sample an approximated reaction coordinate. Here I directly and indirectly interrogate both the appropriate levels of sampling and the accuracy of the semi-empirical method required for reliable analysis of glycoenzyme reaction pathways. Free Energies from Adaptive Reaction Coordinates Forces, a method that builds the potential of mean force from multiple iterations of reactive trajectories, was used to construct reaction surfaces and volumes for the glycosylation and deglycosylation reactions comprising the T. cruzi trans-sialidase catalytic itinerary. This enzyme was chosen for the wealth of experimental data available for it, owing to its significance as a potential drug target against Chagas disease. Of equal importance is the identification of an elimination reaction competing with the primary transferase activity. The identification of this side reaction, which is observable only in the absence of the trans-sialidase or sialic acid acceptor, presented the opportunity to study the means by which enzymes selectively bias reactivity in favor of a single reaction path. I therefore set out to explore the molecular details of how T. cruzi trans-sialidase achieves the precision and selectivity characteristic of enzyme catalysis. The chemical nature of the transition state, formally defined as a dividing hypersurface separating the reactant and product regions of phase space, was characterized for the deglycosylation reaction. More than 40 transition state configurations were isolated from reactive trajectories, and the sialic acid substrate conformations were analyzed, as well as the substrate interactions with the nucleophile and the catalytic acid/base. A successful barrier crossing requires that the substrate pass through a family of E₅, ⁴H₅ and ⁶H₅ puckered conformations, all of which interact slightly differently with the enzyme. This work brings new evidence for the prevailing premise that there are several pathways from reactant to product passing through the saddle and that successful product formation is not restricted to the minimum energy path. Increasing the reaction space with a multi-dimensional (3-D) reaction coordinate allowed simultaneous monitoring of the hitherto unexplored competition between a minor elimination reaction and the dominant displacement reaction present in both steps of the catalytic cycle. The dominant displacement reactions display lower barriers in the free energy profiles, greater sampling of favorable reactant stereoelectronic alignments and a greater number of possible transition paths leading to successful barrier-crossing trajectories. The effects of the electronic degrees of freedom in reaction space were then investigated by running density functional theory reactive trajectories on the semi-empirical free energy surface. To carry out these simulations, Free Energies from Adaptive Reaction Coordinates Forces was ported as a Fortran 90 library that interfaces with the NWChem molecular dynamics package. The resulting B3LYP/6-31G/CHARMM crossing trajectory provides a molecular orbital description of the glycosylation reaction. Direct investigation of the underlying potential energy functions for B3LYP/6-31G(d), B3LYP/6-31G and SCC-DFTB/MIO points to the minimal basis set as the primary limitation of using self-consistent charge density functional tight binding (SCC-DFTB) as the quantum mechanical model for enzymatic reactions transforming sialic acid substrates.
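    The link between sampled reaction-coordinate values, the potential of mean force (PMF) and a transition-state-theory rate can be sketched with two textbook relations: W(ξ) = -RT ln p(ξ) + const, and the Eyring equation k = (k_B T / h) exp(-ΔG‡ / RT) with a transmission coefficient of 1. This sketch assumes unbiased, equilibrium samples of ξ; it is not Free Energies from Adaptive Reaction Coordinates Forces, which accumulates the PMF adaptively over iterations of biased reactive trajectories. The function names are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # gas constant, J/(mol K)


def pmf_from_samples(xi, bins, temperature):
    """PMF W(xi) = -RT ln p(xi) in kJ/mol, shifted so its minimum is zero.

    Assumes unbiased equilibrium samples; empty bins are dropped (their W would be infinite).
    """
    counts, edges = np.histogram(xi, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = counts > 0
    p = counts[mask] / counts.sum()
    w = -R * temperature * np.log(p) / 1000.0
    return centers[mask], w - w.min()


def eyring_rate(dg_act_kj_mol, temperature):
    """Transition-state-theory rate constant in s^-1 for an activation free energy in kJ/mol."""
    return K_B * temperature / H * np.exp(-dg_act_kj_mol * 1000.0 / (R * temperature))
```

    Reading ΔG‡ off a PMF as the difference between the barrier-top and reactant-basin values of W, then passing it to the Eyring expression, turns a free energy profile into a rate estimate.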

    Development of capillary based separation techniques for the separation of proteins equilibrated using hexapeptide ligand libraries

    Challenges in research areas such as chemistry, medicine, environmental toxicology and biology require the analysis of complex samples. Fast analysis of these samples using separation techniques with spectrometric or spectroscopic detection is common. Most often, chromatographic separation techniques such as gas and liquid chromatography coupled to mass spectrometry are chosen. These techniques, however, often reach their limits when highly charged analytes are investigated. Here, electromigrative separation techniques, with their orthogonal separation mechanism, are an attractive alternative. A very promising electrophoretic separation technique, and the one primarily used in this work, is capillary electrophoresis (CE). One of the greatest challenges of this technique lies in the separation of biological samples, since analytes such as polyamines, peptides and proteins interact with and adsorb onto the surface of “bare fused silica” capillaries, which impairs reproducibility. Without efficient suppression of these interactions, separation efficiency and run-to-run reproducibility suffer. A good way to suppress these detrimental interactions of analytes with the capillary surface is to modify the surface with dynamic, statically adsorbed or covalently bound capillary coatings. In this work, I present approaches for the reproducible separation of polyamines, peptides and proteins: 1) In Chapter 2, the use of poly(ethylene oxide) as a dynamic coating in SDS-CE enables the size-based separation of proteins up to 100 kDa. Advantages of this technique over classic gel-based SDS-PAGE are separation times of about 20 min and direct quantification via on-line UV detection without the need for prior labeling or subsequent staining. Separation times were reduced to 5 min by short-end injection and modification of the aperture for UV detection.
The presented separation system offers outstanding matrix tolerance: even complex samples such as serum were successfully separated without additional processing. Attempts to increase separation performance and efficiency included adding alkanols to the background electrolyte (BGE), varying the temperature and using enrichment plugs in the capillary. In particular, adding 2-propanol to the BGE improved separation efficiency in the mass range up to 40 kDa. In Chapter 3, I interpret my results in light of an extensive literature review to show that the observed increase in separation efficiency is linked to a change of the separation mechanism from reptation to Ogston sieving. 2) In Chapter 4, I present CE separation with MS hyphenation not only of small polyamines and peptides, but also of large, non-digested proteins. This was achieved with a single capillary coating based on N-acryloylamido ethoxyethanol (AAEE). This highly polar, covalently bound capillary coating offers high reproducibility and stability, the latter enabling operating times of 100 h, even when complex samples such as human serum and polyamines in fish eggs were analyzed. In Chapter 5, a novel, parallelized approach for the synthesis of this capillary coating is presented. SEM measurements of the capillary surface between reaction steps led me to postulate a novel reaction mechanism for the formation of the coated surface. Additionally, I show that pre-conditioning of capillaries with supercritical water can result in higher capillary-to-capillary reproducibility and reduced synthesis time. In Chapter 6, I show that the presented separation techniques are excellent for the separation and detection of proteins equilibrated using hexapeptide ligand libraries (HLL). A novel approach for the consecutive equilibration of small sample volumes, which enables a deep insight into the proteome, is critically discussed.
Challenges intrinsic to the solid-phase extraction of proteins using HLLs are traced back to irreversible binding sites on the HLLs. To tackle this issue, different elution and pre-equilibration protocols are designed and investigated. To re-establish binding conditions for consecutive equilibration, which allows a deeper insight into the proteome, different protocols for processing eluates from HLLs using 10 kDa cut-off filters are presented. Aspects that critically impair yields and recovery rates are identified, and improved protocols are presented. A further project focused on the CE-MS-based pI determination of a poorly soluble cyclic antibiotic peptide (Chapter 7). This peptide could not be detected using AAEE-coated capillaries. The problem was overcome by using uncoated capillaries and BGEs containing small amounts of citric acid, which functions not only as a buffer but also as a dynamic capillary coating. To confirm the determined pI values, a novel, time-saving approach for the sequential injection of amino acid reference substances was developed.
    Questions in chemistry, but also in many other disciplines such as medicine, environmental toxicology and biology, require the analysis of sometimes highly complex mixtures. Complex samples can be analyzed rapidly using separation techniques followed by spectroscopic or spectrometric detection. The most widespread of these are chromatographic methods such as gas and liquid chromatography, frequently coupled with mass spectrometry. When these chromatographic techniques reach their limits, for example with highly charged substances, electromigrative separation techniques are a good choice because they offer an orthogonal separation mechanism. Capillary electrophoresis (CE), the subject of this work, is one of the electromigrative separation techniques.
One major challenge of this technique is the poor reproducibility in the analysis of biomolecules, which can be traced back to the use of “bare fused silica” capillaries and their interaction with the analytes. If this interaction is not efficiently suppressed, separation efficiency within runs and repeatability between runs decrease. Dynamic, statically adsorbed or covalently bound capillary coatings can suppress this interaction efficiently. This work presents several approaches to the reproducible separation of polyamines, peptides and proteins: 1) In Chapter 2, poly(ethylene oxide) is used as a dynamic coating and sieving matrix in SDS-CE to achieve the size-based separation of proteins with masses up to 100 kDa. One advantage over classic gel-based methods such as SDS-PAGE is that samples can be separated within 20 min and quantified by on-column UV detection without prior or subsequent staining or derivatization. A self-designed modification of the separation apparatus and injection from the short end of the capillary reduced this separation time to 5 minutes for rapid screening. The presented separation system has a high matrix tolerance; even samples such as human serum can be injected without sample preparation. Improvements in resolution and separation efficiency were pursued by adding various alcohols to the background electrolyte, varying the separation temperature and establishing enrichment zones in the capillary. In particular, adding 2-propanol to the background electrolyte increased separation efficiency in the mass range up to approx. 40 kDa.
In Chapter 3, the author's own results and an extensive literature review show that this improvement in resolution results from a shift of the separation mechanism from reptation to Ogston sieving. 2) A highly polar covalent capillary coating, N-acryloylamido ethoxyethanol (AAEE), enables the capillary electrophoretic separation of small polyamines and peptides, as well as of large, undigested proteins, with simultaneous mass spectrometric detection. This is presented in Chapter 4. Pleasingly long operating times of approx. 100 h and good repeatability are observed even when analyzing human serum or polyamines in fish eggs. Chapter 5 presents a new, parallelized approach to capillary synthesis as well as a coexisting reaction mechanism postulated on the basis of SEM images. In addition, a pretreatment of the capillaries with supercritical water is proposed which, according to initial experiments, can lead to higher reproducibility of the coating and faster synthesis. In Chapter 6, the presented analytical techniques are used to determine proteins in samples enriched with hexapeptide ligand libraries. An approach to multiple enrichment, intended to allow a deep insight into the proteome, is critically evaluated. Challenges in solid-phase enrichment are traced back to irreversible binding sites. In this context, various elution and pre-equilibration protocols are investigated. For successful reloading, which allows a deeper insight into the proteome, various protocols for processing the eluate using 10 kDa cut-off filters are presented. Critical aspects that impair yields are highlighted and solutions are proposed.
Chapter 7 presents the MS-based pI determination of a poorly soluble cyclic peptide antibiotic. This compound could not be detected with AAEE-coated capillaries. This challenge was overcome by using uncoated capillaries and small amounts of citric acid in the background electrolyte, which acts as a dynamic coating. To validate the determined pI values with amino acid standards, a new, time-saving sequential injection approach was developed.
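    The abstract does not spell out how the pI values in Chapter 7 were extracted. A common approach, assumed here, is to measure the effective electrophoretic mobility at several background-electrolyte pH values and locate the pH at which it crosses zero. The sketch uses the standard CE relation mu_eff = (L_t L_d / V)(1/t_m - 1/t_eof); the function names are hypothetical, and the sequential injection of amino acid reference substances is not modeled.

```python
def effective_mobility(l_total_cm, l_detector_cm, voltage_v, t_analyte_s, t_eof_s):
    """Effective mobility mu_eff = (L_t * L_d / V) * (1/t_m - 1/t_eof) in cm^2 V^-1 s^-1.

    Under normal polarity, cations migrate ahead of the EOF and get a positive value.
    """
    return (l_total_cm * l_detector_cm / voltage_v) * (1.0 / t_analyte_s - 1.0 / t_eof_s)


def pi_from_mobilities(ph_values, mobilities):
    """Estimate pI as the pH where effective mobility changes sign (linear interpolation)."""
    points = sorted(zip(ph_values, mobilities))
    for (ph1, mu1), (ph2, mu2) in zip(points, points[1:]):
        if mu1 == 0:
            return ph1
        if mu1 * mu2 < 0:
            # Zero of the straight line through (ph1, mu1) and (ph2, mu2)
            return ph1 + (ph2 - ph1) * mu1 / (mu1 - mu2)
    if points and points[-1][1] == 0:
        return points[-1][0]
    raise ValueError("no sign change in mobility: pI lies outside the measured pH range")
```

    Below its pI a peptide carries net positive charge and above it net negative charge, so the effective mobility changes sign as the pH rises through the pI; bracketing that crossing with measurements on both sides is what makes the interpolation meaningful.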

    Big Data - Supply Chain Management Framework for Forecasting: Data Preprocessing and Machine Learning Techniques

    This article aims to systematically identify and comparatively analyze state-of-the-art supply chain (SC) forecasting strategies and technologies. It proposes a novel framework that incorporates Big Data Analytics into SC Management (problem identification, data sources, exploratory data analysis, machine-learning model training, hyperparameter tuning, performance evaluation, and optimization) and accounts for the effects of forecasting on the human workforce, inventory, and the overall SC. The article first discusses why data should be collected according to the SC strategy and how to collect them, and then the need for different types of forecasting depending on the period or SC objective. SC KPIs and error-measurement systems are recommended for optimizing the top-performing model. The article illustrates the adverse effects of phantom inventory on forecasting, and how managerial decisions depend on the SC KPIs for determining model performance parameters and for improving operations management, transparency, and planning efficiency. A cyclic connection within the framework feeds post-process KPIs back into preprocessing optimization, optimizing the overall control process (inventory management, workforce determination, cost, production and capacity planning). The contribution of this research lies in the proposed standard SC process framework, the recommended forecasting data analysis, the analysis of forecasting effects on SC performance, the machine-learning optimization approach followed, and the directions it sheds light on for future research.
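    The error-measurement step used to pick a top-performing forecasting model can be sketched as follows. The abstract does not name specific metrics; MAE, RMSE and MAPE are assumed here as common choices, and the candidate model names in any input are hypothetical placeholders.

```python
import math


def forecast_errors(actual, forecast):
    """Return MAE, RMSE and MAPE (in percent) for paired actual/forecast series."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE is undefined for zero actual demand, so those periods are skipped
    pct = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


def pick_model(actual, forecasts, metric="RMSE"):
    """Return the name of the candidate forecast that minimizes the chosen error metric.

    forecasts: dict mapping a model name to its forecast series.
    """
    return min(forecasts, key=lambda name: forecast_errors(actual, forecasts[name])[metric])
```

    RMSE penalizes occasional large misses more heavily than MAE, which matters when stock-outs are costly; MAPE is scale-free but unstable for periods with very low demand, so the choice of metric should follow the SC KPI it is meant to serve.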
