32 research outputs found

    Using the Expectation Maximization Algorithm with Heterogeneous Mixture Components for the Analysis of Spectrometry Data

    Coupling a multi-capillary column (MCC) with an ion mobility (IM) spectrometer (IMS) has opened a multitude of new application areas for gas analysis, especially in a medical context, as volatile organic compounds (VOCs) in exhaled breath can hint at a person's state of health. To obtain a potential diagnosis from a raw MCC/IMS measurement, several computational steps are necessary, which so far have required manual interaction, e.g., human evaluation of discovered peaks. We have recently proposed an automated pipeline for this task that does not require human intervention during the analysis. Nevertheless, there is a need for improved methods for each computational step. In comparison to gas chromatography / mass spectrometry (GC/MS) data, MCC/IMS data is easier and less expensive to obtain, but peaks are more diffuse and there is a higher noise level. MCC/IMS measurements can be described as samples of mixture models (i.e., of convex combinations) of two-dimensional probability distributions. We therefore use the expectation-maximization (EM) algorithm to deconvolute mixtures in order to develop methods that improve data processing in three computational steps: denoising, baseline correction and peak clustering. A common theme of these methods is that mixture components within one model are not homogeneous (e.g., all Gaussian), but of different types. Evaluation shows that the novel methods outperform the existing ones. We provide Python software implementing all three methods and make our evaluation data available at http://www.rahmannlab.de/research/ims
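To make the idea of heterogeneous mixture components concrete, the following is a minimal sketch of EM for a mixture of one Gaussian "signal" component and one uniform "background" component. It is a one-dimensional toy under stated assumptions, not the published method: the function name `em_gauss_uniform`, the choice of component types, the initialization, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import math
import random


def em_gauss_uniform(xs, lo, hi, iters=200):
    """Fit w * Normal(mu, sigma) + (1 - w) * Uniform(lo, hi) by EM.

    Hypothetical helper for illustration; returns (w, mu, sigma).
    """
    n = len(xs)
    # Crude initialization from the pooled sample.
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n) or 1.0
    w = 0.5
    background = 1.0 / (hi - lo)  # the uniform density has no free parameters
    for _ in range(iters):
        # E-step: responsibility of the Gaussian component for each point.
        resp = []
        for x in xs:
            g = w * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            g /= sigma * math.sqrt(2.0 * math.pi)
            b = (1.0 - w) * background
            resp.append(g / (g + b))
        # M-step: weighted updates for the Gaussian; the uniform stays fixed.
        total = max(sum(resp), 1e-12)
        w = min(max(total / n, 1e-9), 1.0 - 1e-9)
        mu = sum(r * x for r, x in zip(resp, xs)) / total
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, xs)) / total
        sigma = max(math.sqrt(var), 1e-6)
    return w, mu, sigma


# Synthetic data (illustrative only): a peak at 5.0 on a flat background.
random.seed(0)
data = [random.gauss(5.0, 0.5) for _ in range(300)]
data += [random.uniform(0.0, 10.0) for _ in range(200)]
w, mu, sigma = em_gauss_uniform(data, 0.0, 10.0)
print(f"weight={w:.2f} mean={mu:.2f} sd={sigma:.2f}")
```

Because each component only needs its own density in the E-step and its own parameter update in the M-step, components of different types can be combined in one model. Here the uniform component simply absorbs background points so that they do not distort the estimate of the peak.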

    Resource-constrained analysis of ion mobility spectrometry data

    Over the past decades, numerous analytical devices, e.g. for mass spectrometry or liquid or gas chromatography, have been engineered to measure different properties of molecules, compounds or even complex structures. One such device, the ion mobility spectrometer, exploits the different mobilities of ionized analytes. The advantages of this device are the low costs of production and maintenance (e.g. a high vacuum as in mass spectrometry is not required), the fast capture (a few milliseconds suffice) and a high resolution down to the parts per billion (ppb) by volume range. An ion mobility spectrometer coupled with a multi-capillary column for pre-separation achieves a resolution that is several orders of magnitude higher. Substantial research has investigated its feasibility for clinical or biotechnology applications, especially clinical diagnosis or live monitoring. Ongoing miniaturization provides devices as small as a mobile phone, allowing mobile applications. In critical places like major train stations or sports stadiums, mobile devices are conceivable for the detection of drugs or explosives. Another application scenario is a mobile device that monitors the breath of patients and can be used at home. For such scenarios it is essential that the data be analyzed directly on the device right after capture. The amount of data, the complexity of the two-dimensional spectra, and the time and device restrictions require analysis software specifically designed for this application. The basis of MCC/IMS analysis is a representation of all high-intensity regions (peaks) in the measurement by a few descriptive parameters per peak instead of the full measurement data, a process that we refer to as peak extraction. The position of a peak indicates the corresponding analyte, and its signal intensity provides information about the concentration of the analyte. These peaks can hint at several features, e.g. diseases in clinical diagnostics.
Previous work has mainly concentrated on extracting the position of each peak's highest signal intensity (its mode). Using statistical distributions, we introduce a function which requires only seven descriptive parameters to approximate the complete shape of a peak. The straightforward nature of this function as well as the intuitive descriptors simplify and accelerate the methods that estimate the descriptor set for every detected peak. Additional post-processing steps, such as comparison with a reference or aligning or clustering a set of measurements, further simplify and add precision to the provided peak model. Given a measurement and the proposed peak model, the peaks have to be detected and the model descriptors have to be estimated automatically. Here, we introduce two methods for this task. The offline peak model estimation automatically reduces one measurement to a set of peak models, but without any restrictions, i.e. the data is completely available during the whole analysis process and can be accessed as often as required. Furthermore, space and time restrictions are not taken into account. The idea is to use the approaches of this method as a basis and redesign them for online analysis. Our second method is referred to as online peak model estimation. This method may store only one or a small number of consecutive ion mobility spectra and must discard the raw data directly after the analysis. Additionally, this analysis has a strict time restriction imposed by the device itself (a new ion mobility spectrum is captured every 100 ms) and should even run in time on current embedded systems such as the Raspberry Pi. This method, too, should provide a list of peak descriptors. For that purpose, we redesigned particular methods to satisfy these restrictions. This method is suitable for application on mobile detection devices.
To find commonalities and differences among a set of measurements for further classification or timeline analysis, it is necessary to find and link peaks produced by the same analyte. We refer to these clusters covering several peaks from different measurements as consensus peaks. Several clustering methods have already been introduced in the literature, but many have the disadvantage of requiring the number of clusters a priori. We introduce an enhanced variant of the classic EM algorithm which dynamically determines the number of clusters. Additionally, we present the main ideas of an efficient implementation to make the clustering method feasible on embedded systems as well. As the EM algorithm works with statistical models, the information obtained in the peak extraction step can be applied efficiently, providing a more precise clustering. In addition, a method is introduced to align either a peak list containing peak descriptors or consensus peak descriptors against a reference list with potentially previously discovered analytes and their parameters. This method also employs statistical models and statistical optimization methods. Since all utilized statistical methods and models are rather expensive in terms of computation time and almost always involve the costly exponential function, we introduce an approximate exponential function as a substitute. This function can compute an exponential value up to 4-6 times faster than the exact functions of standard libraries with only a minimal loss of precision. Exploiting the binary representation of floating point values within a processor makes this acceleration possible. These features are desirable for application on embedded systems. All methods and implementations are evaluated in detail in terms of computation time, accuracy and plausibility
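The abstract does not spell out its approximate exponential function. One well-known technique that exploits the IEEE 754 binary layout in this way is Schraudolph's method, sketched below as an assumption about the general idea rather than as the author's actual implementation. The name `fast_exp` and the constants follow Schraudolph's published scheme for 64-bit doubles. In Python the sketch only illustrates the bit manipulation; any speed advantage would appear in compiled code on the target device.

```python
import math
import struct

# Scale factor 2^20 / ln(2): maps y onto the exponent field of a double,
# which starts at bit 20 of the upper 32-bit word.
_A = 2 ** 20 / math.log(2)
# Exponent bias 1023, shifted into the same position.
_B = 1023 * 2 ** 20
# Correction term that centres the approximation error (Schraudolph's value).
_C = 60801


def fast_exp(y):
    """Approximate exp(y) by writing an integer into a double's upper bits.

    Illustrative only; valid for y roughly in [-700, 700].
    """
    upper = int(_A * y + (_B - _C))
    # Little-endian double: lower 32 bits set to zero, upper 32 bits = upper.
    return struct.unpack("<d", struct.pack("<ii", 0, upper))[0]


for y in (-2.0, 0.0, 1.0, 3.5):
    approx, exact = fast_exp(y), math.exp(y)
    print(f"y={y:5.1f} approx={approx:.4f} exact={exact:.4f}")
```

The integer part of the scaled argument lands in the exponent field and the fractional part in the leading mantissa bits, which amounts to a piecewise linear interpolation of 2^x. With the constants above, the relative error stays within a few percent.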

    A modular computational framework for automated peak extraction from ion mobility spectra

    This paper considers the possibility of simulating airfield signal lights in the visual systems of flight simulators. It presents an attempt to build a database of airfield lights, implemented in the lighting design program DIALux

    Proceedings of the EuBIC Winter School 2019

    The 2019 European Bioinformatics Community (EuBIC) Winter School was held from January 15th to January 18th 2019 in Zakopane, Poland. This year’s meeting was the third of its kind and gathered international researchers in the field of (computational) proteomics to discuss (mainly) challenges in proteomics quantification and data independent acquisition (DIA). Here, we present an overview of the scientific program of the 2019 EuBIC Winter School. Furthermore, we give a brief outlook on the upcoming EuBIC 2020 Developer’s Meeting

    Quantification of bulk lipid species in human platelets and their thrombin-induced release

    Lipids play a central role in platelet physiology. Changes in the lipidome have already been described for basal and activated platelets. However, quantitative lipidomic data of platelet activation, including the released complex lipids, are unavailable. Here we describe an easy-to-use protocol based on flow-injection mass spectrometry for the quantitative analysis of bulk lipid species in basal and activated human platelets and their lipid release after thrombin activation. We provide lipid species concentrations of 12 healthy human donors, including cholesteryl ester (CE), ceramide (Cer), free cholesterol (FC), hexosylceramide (HexCer), lysophosphatidylcholine (LPC), lysophosphatidylethanolamine (LPE), phosphatidylcholine (PC), phosphatidylethanolamine (PE), phosphatidylinositol (PI), phosphatidylserine (PS), sphingomyelin (SM) and triglycerides (TG). The assay exhibited good technical repeatability (CVs < 5% for major lipid species in platelets). Except for CE and TG, the inter-donor variability of the majority of lipid species concentrations in platelets was < 30% CV. Balancing of concentrations revealed the generation of LPC and loss of TG. Changes in lipid species concentrations indicate phospholipase-mediated release of arachidonic acid mainly from PC, PI, and PE but not from PS. Thrombin-induced lipid release was mainly composed of FC, PS, PC, LPC, CE, and TG. The similarity of the released lipidome to that of plasma implies that lipid release may originate from the open-canalicular system (OCS). The repository of lipid species concentrations determined with this standardized platelet release assay contributes to elucidating the physiological role of platelet lipids and provides a basis for investigating the platelet lipidome in patients with hemorrhagic or thrombotic disorders

    Identification of herbal teas and their compounds eliciting antiviral activity against SARS-CoV-2 in vitro

    Background: The SARS-CoV-2/COVID-19 pandemic has inflicted medical and socioeconomic havoc, and despite the current availability of vaccines and broad implementation of vaccination programs, more easily accessible and cost-effective acute treatment options preventing morbidity and mortality are urgently needed. Herbal teas have historically and recurrently been applied as self-medication for prophylaxis, therapy, and symptom alleviation in diverse diseases, including those caused by respiratory viruses, and have provided sources of natural products as a basis for the development of therapeutic agents. To identify affordable, ubiquitously available, and effective treatments, we tested herbs consumed worldwide as herbal teas regarding their antiviral activity against SARS-CoV-2. Results: Aqueous infusions prepared by boiling leaves of the Lamiaceae perilla and sage elicit potent and sustained antiviral activity against SARS-CoV-2 when applied after infection as well as prior to infection of cells. The herbal infusions exerted in vitro antiviral effects comparable to interferon-β and remdesivir but outperformed convalescent sera and interferon-α2 upon short-term treatment early after infection. Based on protein fractionation analyses, we identified caffeic acid, perilla aldehyde, and perillyl alcohol as antiviral compounds. Global mass spectrometry (MS) analyses performed comparatively in two different cell culture infection models revealed changes of the proteome upon treatment with herbal infusions and provided insights into the mode of action. As inferred from the MS data, induction of heme oxygenase 1 (HMOX-1) was confirmed as an effector mechanism by the antiviral activity of the HMOX-1-inducing compounds sulforaphane and fraxetin.
Conclusions: Herbal teas based on perilla and sage exhibit antiviral activity against SARS-CoV-2, including variants of concern such as Alpha, Beta, Delta, and Omicron, and we identified HMOX-1 as a potential therapeutic target. Given that perilla and sage have been suggested as treatment options for various diseases, our dataset may also constitute a valuable resource for future research beyond virology

    Challenges and perspectives for naming lipids in the context of lipidomics

    Introduction: Lipids are key compounds in the study of metabolism and are increasingly studied in biology projects. They form a very broad family that encompasses many compounds, and the name of the same compound may vary depending on the community in which it is studied. Objectives: In addition, their structures are varied and complex, which complicates their analysis. Indeed, the structural resolution does not always allow a complete level of annotation, so the actual compound analysed will vary from study to study and should be clearly stated. For all these reasons, the identification and naming of lipids is complicated and highly variable from one study to another, and it needs to be harmonized. Methods & Results: In this position paper, we present and discuss the different ways to name lipids (with chemoinformatic and semantic identifiers) and their importance for sharing lipidomic results. Conclusion: Homogenising this identification and adopting the same rules is essential to be able to share data within the community and to map data on functional networks

    From raw ion mobility measurements to disease classification: a comparison of analysis processes

    Ion mobility spectrometry (IMS) is a technology for the detection of volatile compounds in the air of exhaled breath that is increasingly used in medical applications. One major goal is to classify patients into disease groups, for example diseased versus healthy, from simple breath samples. Raw IMS measurements are data matrices in which peak regions representing the compounds have to be identified and quantified. A typical analysis process consists of pre-processing and peak detection in single experiments, peak clustering to obtain consensus peaks across several experiments, and classification of samples based on the resulting multivariate peak intensities. Recently several automated algorithms for peak detection and peak clustering have been introduced, in order to overcome the current need for human-based analysis that is slow, subjective and sometimes not reproducible. We present an unbiased comparison of a multitude of combinations of peak processing and multivariate classification algorithms on a disease dataset. The specific combination of the algorithms for the different analysis steps determines the classification accuracy, with the encouraging result that certain fully-automated combinations perform even better than current manual approaches
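As a rough illustration of the peak detection step on a single measurement matrix, the sketch below marks every cell that reaches an intensity threshold and is strictly greater than all of its neighbours. It is a deliberately simplified stand-in, not one of the algorithms compared in the paper: the function name `local_maxima`, the threshold, and the toy matrix are hypothetical.

```python
def local_maxima(matrix, threshold):
    """Return (row, col, intensity) for cells that are at least `threshold`
    and strictly greater than each of their up to 8 neighbours."""
    peaks = []
    rows, cols = len(matrix), len(matrix[0])
    for r in range(rows):
        for c in range(cols):
            value = matrix[r][c]
            if value < threshold:
                continue
            # Collect the neighbours, clipping the window at the borders.
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


# Toy intensity matrix (illustrative only), with two peak regions.
toy = [
    [0, 1, 0, 0, 0],
    [1, 9, 1, 0, 0],
    [0, 1, 0, 2, 7],
    [0, 0, 0, 1, 3],
]
print(local_maxima(toy, threshold=5))
```

The resulting list of peak positions and intensities is the kind of compact per-measurement summary that the later clustering and classification steps operate on. Real measurements would first need denoising and baseline correction, which this sketch omits.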

    On the Computational Complexity of Gossip Protocols

    Gossip protocols deal with a group of communicating agents, each holding a private piece of information, and aim at arriving at a situation in which all the agents know each other's secrets. Distributed epistemic gossip protocols are particularly simple distributed programs that use formulas from an epistemic logic. Recently, the implementability of these distributed protocols was established (which means that the evaluation of these formulas is decidable), and the problems of their partial correctness and termination were shown to be decidable, but their exact computational complexity was left open. We show that for any monotonic type of calls the implementability of a distributed epistemic gossip protocol is a P^{NP}_{||}-complete problem, while the problems of its partial correctness and termination are in coNP^{NP}.

    A detailed comparison of analysis processes for MCC-IMS data in disease classification - automated methods can replace manual peak annotations

    The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology