10 research outputs found

    Quantum multi-programming for Grover's search

    Quantum multi-programming is a method of utilizing contemporary noisy intermediate-scale quantum computers by executing multiple quantum circuits concurrently. Despite early research on it, prior work has been limited to quantum gates or small quantum algorithms without correlation. In this paper, we propose a quantum multi-programming (QMP) algorithm for Grover's search. Our algorithm decomposes Grover's algorithm by the partial diffusion operator and executes the decomposed circuits in parallel by QMP. We prove that this new algorithm increases the rotation angle of the Grover operator, which in turn increases the success probability. The new algorithm is implemented on IBM quantum computers and compared with the canonical Grover's algorithm and other variations of Grover's algorithm. The empirical tests validate that our new algorithm outperforms the canonical Grover's algorithm as well as its other variations.
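The link between rotation angle and success probability mentioned in the abstract above can be illustrated with the textbook relation for canonical Grover search. This is a minimal sketch of the standard formula only, not the paper's QMP algorithm or its partial diffusion operator; the function names `grover_success_probability` and `optimal_iterations` are hypothetical and chosen for illustration.

```python
import math


def grover_success_probability(n_items: int, n_marked: int, iterations: int) -> float:
    """Textbook success probability of canonical Grover search.

    With theta = arcsin(sqrt(M / N)), each Grover iteration rotates the
    state by 2 * theta, so after k iterations the probability of measuring
    a marked item is sin^2((2k + 1) * theta).
    """
    theta = math.asin(math.sqrt(n_marked / n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2


def optimal_iterations(n_items: int, n_marked: int = 1) -> int:
    """Integer k closest to the ideal (non-integer) value pi / (4 * theta) - 1/2."""
    theta = math.asin(math.sqrt(n_marked / n_items))
    return max(0, round(math.pi / (4 * theta) - 0.5))


if __name__ == "__main__":
    # Illustrative search-space sizes; these are not taken from the paper.
    for n in (4, 8, 16):
        k = optimal_iterations(n)
        p = grover_success_probability(n, 1, k)
        print(f"N={n:2d}  iterations={k}  success probability={p:.4f}")
```

In this picture, anything that changes the effective rotation per iteration changes how close a fixed number of iterations lands to a marked state, which is the quantity the abstract says the proposed algorithm improves.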

    Extracting Protein-Protein Interactions (PPIs) from Biomedical Literature using Attention-based Relational Context Information

    Because protein-protein interactions (PPIs) are crucial to understanding living systems, harvesting these data is essential to probe disease development and discern gene/protein functions and biological processes. Some curated datasets contain PPI data derived from the literature and other sources (e.g., IntAct, BioGrid, DIP, and HPRD). However, they are far from exhaustive, and their maintenance is a labor-intensive process. On the other hand, machine learning methods to automate PPI knowledge extraction from the scientific literature have been limited by a shortage of appropriate annotated data. This work presents a unified, multi-source PPI corpus with vetted interaction definitions augmented by binary interaction type labels and a Transformer-based deep learning method that exploits entities' relational context information for relation representation to improve relation classification performance. The model's performance is evaluated on four widely studied biomedical relation extraction datasets, as well as this work's target PPI datasets, to observe the effectiveness of the representation for relation extraction tasks on varied data. Results show the model outperforms prior state-of-the-art models. The code and data are available at: https://github.com/BNLNLP/PPI-Relation-Extraction
    Comment: 10 pages, 3 figures, 7 tables, 2022 IEEE International Conference on Big Data (Big Data)

    Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge

    Understanding protein interactions and pathway knowledge is crucial for unraveling the complexities of living systems and investigating the underlying mechanisms of biological functions and complex diseases. While existing databases provide curated biological data from literature and other sources, they are often incomplete and their maintenance is labor-intensive, necessitating alternative approaches. In this study, we propose to harness the capabilities of large language models to address these issues by automatically extracting such knowledge from the relevant scientific literature. Toward this goal, we investigate the effectiveness of different large language models in tasks that involve recognizing protein interactions, pathways, and gene regulatory relations. We thoroughly evaluate the performance of various models, highlight the significant findings, and discuss both the future opportunities and the remaining challenges associated with this approach. The code and data are available at: https://github.com/boxorange/BioIE-LLM
    Comment: 10 pages, 3 figures

    Pathway-based analyses of gene expression profiles at low doses of ionizing radiation

    Radiation exposure poses a significant threat to human health. Emerging research indicates that even low-dose radiation, once believed to be safe, may have harmful effects. This perception has spurred a growing interest in investigating the potential risks associated with low-dose radiation exposure across various scenarios. To comprehensively explore the health consequences of low-dose radiation, our study employs a robust statistical framework that examines whether specific groups of genes, belonging to known pathways, exhibit coordinated expression patterns that align with the radiation levels. Notably, our findings reveal the existence of intricate yet consistent signatures that reflect the molecular response to radiation exposure, distinguishing between low-dose and high-dose radiation. Moreover, we leverage a pathway-constrained variational autoencoder to capture the nonlinear interactions within gene expression data. By comparing these two analytical approaches, our study aims to gain valuable insights into the impact of low-dose radiation on gene expression patterns, identify pathways that are differentially affected, and harness the potential of machine learning to uncover hidden activity within biological networks. This comparative analysis contributes to a deeper understanding of the molecular consequences of low-dose radiation exposure.

    Text-Based Phishing Detection Using A Simulation Model

    Phishing is one of the most potentially disruptive actions that can be performed on the Internet. Intellectual property and other pertinent business information could be at risk if a user falls for a phishing attack. The most common way of carrying out a phishing attack is through email. The adversary sends an email with a link to a fraudulent site to lure consumers into divulging their confidential information. While such attacks may be easily identifiable for those well-versed in technology, it may be difficult for the typical Internet user to spot a fraudulent email. The emphasis of this research is to detect phishing attempts within emails. To date, various phishing detection algorithms, mostly based on blacklists, have been reported to produce promising results. Yet phishing crime rates are not likely to decline as cyber-criminals devise new tricks to avoid those phishing filters. Since the early non-text-based approaches do not address the text content of the email that actually deludes users, this paper proposes a text-based phishing detection algorithm. In particular, this research focuses on improving upon a previously published text-based approach. The algorithm in the previous work analyzes the body text of an email to detect whether the message asks the user to take some action, such as clicking on a link that directs the user to a fraudulent website. This work expanded the text analysis portion of that algorithm, which performed poorly in catching phishing emails. The modified algorithm filtered out considerably more malicious emails than the original algorithm did, but its false positive rate (FPR), the rate of text incorrectly identified as phishing, was slightly worse. To address the false positive problem, a statistical approach was adopted, and the method improved the FPR while minimizing the decrease in phishing detection accuracy.
    The studies in this research make use of a simulation model technique to illustrate the algorithms. The simulation model visualizes the overall process of the analysis and yields graphical and statistical results that are used to conduct the experiments. In addition, since the simulation model operates in an environment controlled by the user, it allows the user to easily apply modified concepts in experiments. This simulation feature was utilized to find and eliminate the unnecessary factors in the algorithm, and the resulting optimal performance time was measured.
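The trade-off described in the abstract above, catching more phishing emails at the cost of more false alarms, can be made concrete with the standard confusion-matrix definitions. This is a minimal sketch under stated assumptions: the function names and the counts are hypothetical and are not results from the paper.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of legitimate emails wrongly flagged as phishing: FP / (FP + TN)."""
    legitimate_total = false_positives + true_negatives
    if legitimate_total == 0:
        raise ValueError("no legitimate emails in the evaluation set")
    return false_positives / legitimate_total


def detection_rate(true_positives: int, false_negatives: int) -> float:
    """Fraction of phishing emails correctly flagged: TP / (TP + FN)."""
    phishing_total = true_positives + false_negatives
    if phishing_total == 0:
        raise ValueError("no phishing emails in the evaluation set")
    return true_positives / phishing_total


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print("FPR:", false_positive_rate(false_positives=5, true_negatives=95))
    print("Detection rate:", detection_rate(true_positives=80, false_negatives=20))
```

Reporting both numbers side by side makes it visible when a change to the text analysis raises the detection rate while also raising the FPR.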

    Towards Ontology-Based Phishing Detection

    Detection of phishing emails is a topic that has received a lot of attention from both academia and industry due to the devastating effects that phishing-enabled data breaches have had on private individuals and companies. Notwithstanding enormous efforts to detect phishing attacks, phishing remains a major threat in information security, and the damages from it are not forecasted to disappear in the near future. One of the reasons is the diversity of attacks, especially within spear phishing and whaling. Another reason is that the natural language part of the detectors is usually devoid of semantics. Many of the existing phishing detection techniques make use of keyword matching. However, phishers exploit genuine messages and users’ background information to forge fake baits so as to increase the success rate of deception. Since phishers craft legitimate-looking emails, legitimate emails and phishing emails share many common words in the email body. In addition, phishers often obtain keyword lists used in the matching systems, and they can easily circumvent defense mechanisms that analyze the words of an email. The purpose of this dissertation is to investigate the effectiveness of conceptualization for lexical features, which is hypothesized to reduce vulnerability to variance in superficial characteristics. The proposed approach adds semantics to highly accurate bag-of-words and part-of-speech approaches. This study shows that while the current approach is not as effective as a starting point, it retains its performance as a testing corpus deviates from training, while the performance of the original approach decreases with the amount of deviation.

    Massively parallel hybrid quantum-classical machine learning for kernelized time-series classification

    Supervised time-series classification garners widespread interest because of its applicability across a broad range of domains, including finance, astronomy, biosensors, and many others. In this work, we tackle this problem with hybrid quantum-classical machine learning, deducing pairwise temporal relationships between time-series instances using a time-series Hamiltonian kernel (TSHK). A TSHK is constructed with a sum of inner products generated by quantum states evolved using a parameterized time evolution operator. This sum is then optimally weighted using techniques derived from multiple kernel learning. Because we treat the kernel weighting step as a differentiable convex optimization problem, our method can be regarded as an end-to-end learnable hybrid quantum-classical-convex neural network, or QCC-net, whose output is a data set-generalized kernel function suitable for use in any kernelized machine learning technique such as the support vector machine (SVM). Using our TSHK as input to an SVM, we classify univariate and multivariate time-series using quantum circuit simulators and demonstrate the efficient parallel deployment of the algorithm to 127-qubit superconducting quantum processors using quantum multi-programming.
    Comment: 23 pages, 10 figures, 1 table and 1 code snippet

    Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19

    We present a supercomputer-driven pipeline for in-silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. We also describe preliminary results obtained for 23 systems involving eight protein targets of the proteome of SARS-CoV-2. The MD performed is temperature replica-exchange enhanced sampling, making use of massively parallel supercomputing on the SUMMIT supercomputer at Oak Ridge National Laboratory, with which more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble docked repurposing databases to ten configurations of each of the 23 SARS-CoV-2 systems using AutoDock Vina. We also demonstrate that, using AutoDock-GPU on SUMMIT, it is possible to perform exhaustive docking of one billion compounds in under 24 hours. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and AI methods to cluster MD trajectories and rescore docking poses.