8 research outputs found

Quantum multi-programming for Grover's search

Authors: Korepin Vladimir; Park Gilchan; Yu Kwangmin; Zhang Kun
Publication date: 18/12/2022

Quantum multi-programming is a method of utilizing contemporary noisy intermediate-scale quantum computers by executing multiple quantum circuits concurrently. Despite early research on it, existing work remains limited to quantum gates or small quantum algorithms without correlation. In this paper, we propose a quantum multi-programming (QMP) algorithm for Grover's search. Our algorithm decomposes Grover's algorithm by the partial diffusion operator and executes the decomposed circuits in parallel by QMP. We proved that this new algorithm increases the rotation angle of the Grover operator, which in turn increases the success probability. The new algorithm is implemented on IBM quantum computers and compared with the canonical Grover's algorithm and other variations of Grover's algorithm. The empirical tests validate that our new algorithm outperforms other variations of Grover's algorithm as well as the canonical Grover's algorithm.
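As a rough illustration of why a larger rotation angle raises the success probability, the sketch below evaluates the textbook formula for canonical Grover search: with N items and M marked items, the rotation angle is theta = arcsin(sqrt(M/N)) and the success probability after k iterations is sin^2((2k+1)*theta). It does not implement the QMP algorithm from the paper, and the function names are illustrative assumptions.

```python
import math


def grover_rotation_angle(n_items, n_marked=1):
    """Angle theta with sin(theta) = sqrt(M/N) for canonical Grover search."""
    return math.asin(math.sqrt(n_marked / n_items))


def grover_success_probability(n_items, n_marked, iterations):
    """Textbook success probability sin^2((2k+1)*theta) after k iterations."""
    theta = grover_rotation_angle(n_items, n_marked)
    return math.sin((2 * iterations + 1) * theta) ** 2


def optimal_iterations(n_items, n_marked=1):
    """Usual iteration count floor(pi / (4*theta)) that brings the state near the target."""
    theta = grover_rotation_angle(n_items, n_marked)
    return math.floor(math.pi / (4 * theta))


if __name__ == "__main__":
    # Hypothetical search-space sizes chosen only for illustration.
    for n in (4, 16, 64):
        k = optimal_iterations(n)
        p = grover_success_probability(n, 1, k)
        print(f"N={n}: k={k}, success probability={p:.4f}")
```

For a fixed number of iterations below the optimum, a larger theta moves the state further toward the marked item per iteration, which is the textbook mechanism behind the abstract's claim.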

arXiv.org e-Print Archive

Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge

Authors: Alexander Francis J.; Johnstone Patrick; Luo Xihaier; López-Marrero Vanessa; Park Gilchan; Yoo Shinjae; Yoon Byung-Jun
Publication date: 17/07/2023

Understanding protein interactions and pathway knowledge is crucial for unraveling the complexities of living systems and investigating the underlying mechanisms of biological functions and complex diseases. While existing databases provide curated biological data from literature and other sources, they are often incomplete and their maintenance is labor-intensive, necessitating alternative approaches. In this study, we propose to harness the capabilities of large language models to address these issues by automatically extracting such knowledge from the relevant scientific literature. Toward this goal, we investigate the effectiveness of different large language models in tasks that involve recognizing protein interactions, pathways, and gene regulatory relations. We thoroughly evaluate the performance of various models, highlight the significant findings, and discuss both the future opportunities and the remaining challenges associated with this approach. The code and data are available at: https://github.com/boxorange/BioIE-LLM
Comment: 10 pages, 3 figures

arXiv.org e-Print Archive

Text-Based Phishing Detection Using A Simulation Model

Author: Park Gilchan
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2013

Phishing is one of the most potentially disruptive actions that can be performed on the Internet. Intellectual property and other pertinent business information could potentially be at risk if a user falls for a phishing attack. The most common way of carrying out a phishing attack is through email. The adversary sends an email with a link to a fraudulent site to lure consumers into divulging their confidential information. While such attacks may be easily identifiable for those well-versed in technology, it may be difficult for the typical Internet user to spot a fraudulent email. The emphasis of this research is to detect phishing attempts within emails. To date, various phishing detection algorithms, mostly based on the blacklists, have been reported to produce promising results. Yet, the phishing crime rates are not likely to decline as the cyber-criminals devise new tricks to avoid those phishing filters. Since the early non-text based approaches do not address the text content of the email that actually deludes users, this paper proposes a text-based phishing detection algorithm. In particular, this research focuses on improving upon the previously published text-based approach. The algorithm in the previous work analyzes the body text in an email to detect whether the email message asks the user to do some action such as clicking on the link that directs the user to a fraudulent website. This work expanded the text analysis portion of that algorithm, which performed poorly in catching phishing emails. The modified algorithm generated considerably higher results in filtering out malicious emails than the original algorithm did; but the rate of text incorrectly identified as phishing, which is the FPR, was slightly worse. To address the FP problem, a statistical approach was adopted and the method ameliorated the FPR while minimizing the decrease in the phishing detection accuracy. 
The studies in this research use a simulation model to illustrate the algorithms. The simulation model visualizes the overall process of the analysis and yields graphical and statistical results that are used to conduct the experiments. In addition, since the simulation model operates in an environment controlled by the user, it allows the user to easily apply modified concepts in experiments. This simulation feature was used to find and eliminate unnecessary factors in the algorithm, so that the optimal performance time could be measured.
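The two quantities this abstract trades off, the phishing detection rate and the false positive rate (FPR, the share of legitimate messages incorrectly flagged as phishing), can be computed from confusion-matrix counts. The sketch below is a minimal illustration only: the counts are made-up values, not results from the thesis, and the function names are hypothetical.

```python
def false_positive_rate(false_positives, true_negatives):
    """FPR = FP / (FP + TN): fraction of legitimate emails flagged as phishing."""
    legitimate_total = false_positives + true_negatives
    if legitimate_total == 0:
        raise ValueError("no legitimate emails in the sample")
    return false_positives / legitimate_total


def detection_rate(true_positives, false_negatives):
    """Detection rate = TP / (TP + FN): fraction of phishing emails caught."""
    phishing_total = true_positives + false_negatives
    if phishing_total == 0:
        raise ValueError("no phishing emails in the sample")
    return true_positives / phishing_total


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the calculation.
    print("FPR:", false_positive_rate(false_positives=5, true_negatives=95))
    print("Detection rate:", detection_rate(true_positives=90, false_negatives=10))
```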

Purdue E-Pubs

Towards Ontology-Based Phishing Detection

Author: Park Gilchan
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2018

Detection of phishing emails is a topic that has received a lot of attention from both academia and industry due to the devastating effects that phishing-enabled data breaches have had on private individuals and companies. Notwithstanding enormous efforts to detect phishing attacks, phishing remains a major threat in information security, and the damages from it are not forecast to disappear in the near future. One of the reasons is the diversity of attacks, especially within spear phishing and whaling. Another reason is that the natural language part of the detectors is usually devoid of semantics. Many of the existing phishing detection techniques make use of keyword matching. However, phishers exploit genuine messages and users’ background information to forge counterfeit baits so as to increase the success rate of deception. Since phishers craft legitimate-looking emails, legitimate emails and phishing emails share many common words in the email body. In addition, phishers often obtain keyword lists used in the matching systems, and they can easily circumvent defense mechanisms that analyze the words of an email. The purpose of this dissertation is to investigate the effectiveness of conceptualization for lexical features, which is hypothesized to reduce vulnerability to variance in superficial characteristics. The proposed approach adds semantics to highly accurate bag-of-words and part-of-speech approaches. This study shows that while the current approach is not as effective as a starting point, it retains its performance as a testing corpus deviates from training, while the performance of the original approach decreases with the amount of deviation.

Purdue E-Pubs

Extracting Protein-Protein Interactions (PPIs) from Biomedical Literature using Attention-based Relational Context Information

Authors: Blaby Ian; McCorkle Sean; Park Gilchan; Soto Carlos; Yoo Shinjae
Publication venue: eScholarship, University of California
Publication date: 20/12/2022

Because protein-protein interactions (PPIs) are crucial to understanding living systems, harvesting these data is essential to probe disease development and discern gene/protein functions and biological processes. Some curated datasets contain PPI data derived from the literature and other sources (e.g., IntAct, BioGrid, DIP, and HPRD). However, they are far from exhaustive, and their maintenance is a labor-intensive process. On the other hand, machine learning methods to automate PPI knowledge extraction from the scientific literature have been limited by a shortage of appropriate annotated data. This work presents a unified, multi-source PPI corpus with vetted interaction definitions augmented by binary interaction type labels and a Transformer-based deep learning method that exploits entities' relational context information for relation representation to improve relation classification performance. The model's performance is evaluated on four widely studied biomedical relation extraction datasets, as well as this work's target PPI datasets, to observe the effectiveness of the representation for relation extraction tasks on various data. Results show the model outperforms prior state-of-the-art models. The code and data are available at: https://github.com/BNLNLP/PPI-Relation-Extractio

eScholarship - University of California

Massively parallel hybrid quantum-classical machine learning for kernelized time-series classification

Authors: Baker Jack S.; Ghukasyan Ara; Goktas Oktay; Park Gilchan; Radha Santosh Kumar; Yu Kwangmin
Publication date: 10/05/2023

Supervised time-series classification garners widespread interest because of its applicability across a broad range of domains, including finance, astronomy, biosensors, and many others. In this work, we tackle this problem with hybrid quantum-classical machine learning, deducing pairwise temporal relationships between time-series instances using a time-series Hamiltonian kernel (TSHK). A TSHK is constructed with a sum of inner products generated by quantum states evolved using a parameterized time evolution operator. This sum is then optimally weighted using techniques derived from multiple kernel learning. Because we treat the kernel weighting step as a differentiable convex optimization problem, our method can be regarded as an end-to-end learnable hybrid quantum-classical-convex neural network, or QCC-net, whose output is a data set-generalized kernel function suitable for use in any kernelized machine learning technique such as the support vector machine (SVM). Using our TSHK as input to an SVM, we classify univariate and multivariate time-series using quantum circuit simulators and demonstrate the efficient parallel deployment of the algorithm to 127-qubit superconducting quantum processors using quantum multi-programming.
Comment: 23 pages, 10 figures, 1 table and 1 code snippet
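The weighted-sum step described in this abstract resembles the basic operation of multiple kernel learning: combining several Gram matrices with non-negative weights into a single kernel matrix. The sketch below shows only that combination step, under loud assumptions: the base matrices are small made-up examples rather than quantum-state inner products, the weights are fixed by hand rather than learned by convex optimization, and `combine_kernels` is a hypothetical helper, not code from the paper.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square Gram matrices.

    kernels: list of n-by-n matrices given as nested lists.
    weights: non-negative floats summing to 1 (a convex combination).
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical base kernels for two time-series instances.
    k_a = [[1.0, 0.0], [0.0, 1.0]]
    k_b = [[1.0, 1.0], [1.0, 1.0]]
    print(combine_kernels([k_a, k_b], [0.25, 0.75]))
```

A convex combination of positive semidefinite Gram matrices is itself positive semidefinite, which is why a combined matrix of this kind can be passed to a kernelized classifier such as an SVM that accepts a precomputed kernel.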

arXiv.org e-Print Archive

Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19

We present a supercomputer-driven pipeline for in-silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. We also describe preliminary results obtained for 23 systems involving eight protein targets of the proteome of SARS-CoV-2. The MD performed is temperature replica-exchange enhanced sampling, making use of massively parallel supercomputing on the SUMMIT supercomputer at Oak Ridge National Laboratory, with which more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble docked repurposing databases to ten configurations of each of the 23 SARS-CoV-2 systems using AutoDock Vina. We also demonstrate that, using AutoDock-GPU on SUMMIT, it is possible to perform exhaustive docking of one billion compounds in under 24 hours. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and AI methods to cluster MD trajectories and rescore docking poses.

ChemRxiv