38 research outputs found
Parallel Sampling-Pipeline for Indefinite Stream of Heterogeneous Graphs using OpenCL for FPGAs
In the field of data science, a huge amount of data, generally represented as graphs, needs to be processed and analyzed. It is of utmost importance that this data be processed swiftly and efficiently to save time and energy. The volume and velocity of data, along with irregular access patterns in graph data structures, pose challenges in terms of analysis and processing. Further, a big chunk of time and energy is spent on analyzing these graphs on large compute clusters and/or data centers. Filtering and refining data using graph sampling techniques is one of the most effective ways to speed up the analysis. Efficient accelerators, such as FPGAs, have proven to significantly lower the energy cost of running an algorithm. To this end, we present the design and implementation of a parallel graph sampling technique for a large number of input graphs streaming into an FPGA. A parallel approach using OpenCL for FPGAs was adopted to come up with a solution that is both time- and energy-efficient. We introduce a novel graph data structure, suitable for streaming graphs on FPGAs, that allows time- and memory-efficient representation of graphs. Our experiments show that our proposed technique is 3x faster and 2x more energy-efficient than a serial CPU version of the algorithm.
ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing
De novo peptide sequencing from mass spectrometry (MS) data is a critical
task in proteomics research. Traditional de novo algorithms have encountered a
bottleneck in accuracy due to the inherent complexity of proteomics data. While
deep learning-based methods have shown progress, they reduce the problem to a
translation task, potentially overlooking critical nuances between spectra and
peptides. In our research, we present ContraNovo, a pioneering algorithm that
leverages contrastive learning to extract the relationship between spectra and
peptides and incorporates the mass information into peptide decoding, aiming to
address these intricacies more efficiently. Through rigorous evaluations on two
benchmark datasets, ContraNovo consistently outshines contemporary
state-of-the-art solutions, underscoring its promising potential in enhancing
de novo peptide sequencing. The source code is available at
https://github.com/BEAM-Labs/ContraNovo.
Comment: This paper has been accepted by AAAI 202
Computational and Systems Biology Advances to Enable Bioagent-Agnostic Signatures
Enumerated threat agent lists have long driven biodefense priorities. The
global SARS-CoV-2 pandemic demonstrated the limitations of searching for known
threat agents as compared to a more agnostic approach. Recent technological
advances are enabling agent-agnostic biodefense, especially through the
integration of multi-modal observations of host-pathogen interactions directed
by a human immunological model. Although well-developed technical assays exist
for many aspects of human-pathogen interaction, the analytic methods and
pipelines to combine and holistically interpret the results of such assays are
immature and require further investments to exploit new technologies. In this
manuscript, we discuss potential immunologically based bioagent-agnostic
approaches and the computational tool gaps the community should prioritize
filling.
Parallel Factor Analysis Enables Quantification and Identification of Highly Convolved Data-Independent-Acquired Protein Spectra
The latest high-throughput mass spectrometry-based technologies can record virtually all molecules from complex biological samples, providing a holistic picture of proteomes in cells and tissues and enabling an evaluation of the overall status of a person's health. However, current best practices are still only scratching the surface of the wealth of available information obtained from the massive proteome datasets, and efficient novel data-driven strategies are needed. Powered by advances in GPU hardware and open-source machine-learning frameworks, we developed a data-driven approach, CANDIA, which disassembles highly complex proteomics data into the elementary molecular signatures of the proteins in biological samples. Our work provides a performant and adaptable solution that complements existing mass spectrometry techniques. As the central mathematical methods are generic, other scientific fields that are dealing with highly convolved datasets will benefit from this work.
Ad hoc learning of peptide fragmentation from mass spectra enables an interpretable detection of phosphorylated and cross-linked peptides
Mass spectrometry-based proteomics provides a holistic snapshot of the entire protein set of living cells on a molecular level. Currently, only a few deep learning approaches exist that involve peptide fragmentation spectra, which represent partial sequence information of proteins. Commonly, these approaches lack the ability to characterize less studied or even unknown patterns in spectra because of their use of explicit domain knowledge. Here, to elevate unrestricted learning from spectra, we introduce ‘ad hoc learning of fragmentation’ (AHLF), a deep learning model that is end-to-end trained on 19.2 million spectra from several phosphoproteomic datasets. AHLF is interpretable, and we show that peak-level feature importance values and pairwise interactions between peaks are in line with corresponding peptide fragments. We demonstrate our approach by detecting post-translational modifications, specifically protein phosphorylation, based on only the fragmentation spectrum without a database search. AHLF increases the area under the receiver operating characteristic curve (AUC) by an average of 9.4% on recent phosphoproteomic data compared with the current state of the art on this task. Furthermore, use of AHLF in rescoring search results increases the number of phosphopeptide identifications by a margin of up to 15.1% at a constant false discovery rate. To show the broad applicability of AHLF, we use transfer learning to also detect cross-linked peptides, as used in protein structure analysis, with an AUC of up to 94%.