113 research outputs found

Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts

Author: Pan Shimei
Park Youngja
Roy Arpita
Publication date: 05/07/2017

Word embedding is a Natural Language Processing (NLP) technique that automatically maps words from a vocabulary to vectors of real numbers in an embedding space. It has been widely used in recent years to boost the performance of a variety of NLP tasks such as Named Entity Recognition, Syntactic Parsing and Sentiment Analysis. Classic word embedding methods such as Word2Vec and GloVe work well when they are given a large text corpus. When the input texts are sparse as in many specialized domains (e.g., cybersecurity), these methods often fail to produce high-quality vectors. In this paper, we describe a novel method to train domain-specific word embeddings from sparse texts. In addition to domain texts, our method also leverages diverse types of domain knowledge such as domain vocabulary and semantic relations. Specifically, we first propose a general framework to encode diverse types of domain knowledge as text annotations. Then we develop a novel Word Annotation Embedding (WAE) algorithm to incorporate diverse types of text annotations in word embedding. We have evaluated our method on two cybersecurity text corpora: a malware description corpus and a Common Vulnerability and Exposure (CVE) corpus. Our evaluation results have demonstrated the effectiveness of our method in learning domain-specific word embeddings.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Split-NER: Named Entity Recognition via Two Question-Answering-based Classifications

Author: Arora Jatin
Park Youngja
Publication date: 30/10/2023

In this work, we address the NER problem by splitting it into two logical sub-tasks: (1) Span Detection, which simply extracts entity mention spans irrespective of entity type; (2) Span Classification, which classifies the spans into their entity types. Further, we formulate both sub-tasks as question-answering (QA) problems and produce two leaner models which can be optimized separately for each sub-task. Experiments with four cross-domain datasets demonstrate that this two-step approach is both effective and time efficient. Our system, SplitNER, outperforms baselines on OntoNotes5.0, WNUT17 and a cybersecurity dataset and gives on-par performance on BioNLP13CG. In all cases, it achieves a significant reduction in training time compared to its QA baseline counterpart. The effectiveness of our system stems from fine-tuning the BERT model twice, separately for span detection and classification. The source code can be found at https://github.com/c3sr/split-ner

arXiv.org e-Print Archive

Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks

Author: Backes Michael
Biggio Battista
Grosse Kathrin
Lee Taesung
Molloy Ian
Park Youngja
Publication date: 02/11/2021

Backdoor attacks mislead machine-learning models to output an attacker-specified class when presented with a specific trigger at test time. These attacks require poisoning the training data to compromise the learning algorithm, e.g., by injecting poisoning samples containing the trigger into the training set, along with the desired class label. Despite the increasing number of studies on backdoor attacks and defenses, the underlying factors affecting the success of backdoor attacks, along with their impact on the learning algorithm, are not yet well understood. In this work, we aim to shed light on this issue by unveiling that backdoor attacks induce a smoother decision function around the triggered samples, a phenomenon which we refer to as backdoor smoothing. To quantify backdoor smoothing, we define a measure that evaluates the uncertainty associated with the predictions of a classifier around the input samples. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks. We also provide preliminary evidence that backdoor triggers are not the only smoothing-inducing patterns, but that other artificial patterns can also be detected by our approach, paving the way towards understanding the limitations of current defenses and designing novel ones.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Cagliari

Scalable nonparametric multiway data analysis

Author: Shandian Zhe
Xinqi Chu
Youngja Park
Yuan Qi
Zenglin Xu
Publication date: 01/01/2015

Multiway data analysis deals with multiway arrays, i.e., tensors, and the goal is twofold: predicting missing entries by modeling the interactions between array elements and discovering hidden patterns, such as clusters or communities in each mode. Despite the success of existing tensor factorization approaches, they are either unable to capture nonlinear interactions or too computationally expensive to handle massive data. In addition, most of the existing methods lack a principled way to discover latent clusters, which is important for better understanding of the data. To address these issues, we propose a scalable nonparametric tensor decomposition model. It employs a Dirichlet process mixture (DPM) prior to model the latent clusters; it uses local Gaussian processes (GPs) to capture nonlinear relationships and to improve scalability. An efficient online variational Bayes Expectation-Maximization algorithm is proposed to learn the model. Experiments on both synthetic and real-world data show that the proposed model is able to discover latent clusters with higher prediction accuracy than competitive methods. Furthermore, the proposed model obtains significantly better predictive performance than the state-of-the-art large scale tensor decomposition algorithm, GigaTensor, on two large datasets with billions of entries.

CiteSeerX

Self-similarity in NMR spectra: an application in assessing the level of cysteine

Author: Jones Dean P.
Jung Yoon Young
Park Youngja
Vidakovic Brani
Ziegler Thomas R.
Publication venue: Georgia Institute of Technology
Publication date: 15/01/2007

High-resolution NMR spectroscopic data of biosamples are a rich source of information on the metabolic response to physiological variation or pathological events. NMR techniques have many advantages; for example, sample preparation is fast, simple and non-invasive. Statistical analysis of NMR spectra usually focuses on differential expression of large resonance intensities corresponding to abundant metabolites and involves several data preprocessing steps. In this paper we estimate functional components of spectra and test their significance using multiscale techniques. We also explore scaling in NMR spectra and use the systematic variability of scaling descriptors to predict the level of cysteine, an important precursor of glutathione, a control antioxidant in the human body. This is motivated by the high cost (in time and resources) of traditional methods for assessing cysteine level by high performance liquid chromatography (HPLC).

Scholarly Materials And Research @ Georgia Tech

Hepatic Oxidative Stress in Fructose-Induced Fatty Liver Is Not Caused by Sulfur Amino Acid Insufficiency

Author: Go Young-Mi
Jones Dean P.
Kunde Sachin S.
Orr Michael L.
Park Youngja
Roede James R.
Vos Miriam B.
Ziegler Thomas R.
Publication venue: MDPI
Publication date: 01/01/2011

Fructose-sweetened liquid consumption is associated with fatty liver and oxidative stress. In rodent models of fructose-mediated fatty liver, protein consumption is decreased. Additionally, decreased sulfur amino acid intake is known to cause oxidative stress. Studies were designed to test whether oxidative stress in fructose-sweetened liquid-induced fatty liver is caused by decreased ad libitum solid food intake with associated inadequate sulfur amino acid intake. C57BL6 mice were grouped as: control (ad libitum water), fructose (ad libitum 30% fructose-sweetened liquid), glucose (ad libitum 30% glucose-sweetened water) and pair-fed (ad libitum water and sulfur amino acid intake same as the fructose group). Hepatic and plasma thiol-disulfide antioxidant status were analyzed after five weeks. Fructose- and glucose-fed mice developed fatty liver. The mitochondrial antioxidant protein, thioredoxin-2, displayed decreased abundance in the liver of fructose and glucose-fed mice compared to controls. Glutathione/glutathione disulfide redox potential (EhGSSG) and abundance of the cytoplasmic antioxidant protein, peroxiredoxin-2, were similar among groups. We conclude that both fructose and glucose-sweetened liquid consumption results in fatty liver and upregulated thioredoxin-2 expression, consistent with mitochondrial oxidative stress; however, inadequate sulfur amino acid intake was not the cause of this oxidative stress

Multidisciplinary Digital Publishing Institute

CiteSeerX

Directory of Open Access Journals

PubMed Central

Detailed Mitochondrial Phenotyping by High Resolution Metabolomics

Author: AD Baxevanis
AG Marshall
CP Wild
D Nagrath
DA Drechsel
Dean P. Jones
DR Richardson
E Mervaala
Frederick H. Strobel
GU Balcke
H Mitsubuchi
J Vina
JA Maceluch
James R. Roede
JD Storey
JM Johnson
JQ Chen
JR Mercer
JR Roede
KB Wallace
KF Aoki
LS Lamont
M Brown
M He
M Mayr
MK Savage
P Spegel
PM Joyner
Q Xu
QA Soltow
R Guevara
S Ahola-Erkkila
SE Calvo
Shuzhao Li
SK Manna
T Brody
T Yu
Tobias Eckle
WL Miller
Y Benjamini
Y Chen
Y Liu
Y Park
Youngja Park
Publication venue: Public Library of Science
Publication date: 06/03/2012

Mitochondrial phenotype is complex and difficult to define at the level of individual cell types. Newer metabolic profiling methods provide information on dozens of metabolic pathways from a relatively small sample. This pilot study used “top-down” metabolic profiling to determine the spectrum of metabolites present in liver mitochondria. High resolution mass spectral analyses and multivariate statistical tests provided global metabolic information about mitochondria and showed that liver mitochondria possess a significant phenotype based on gender and genotype. The data also show that mitochondria contain a large number of unidentified chemicals

Public Library of Science (PLOS)

Crossref

PubMed Central

FigShare

High-resolution metabolomics to discover potential parasite-specific biomarkers in a Plasmodium falciparum erythrocytic stage culture system

Author: A Sengupta
Bill Liang
CA Smith
Carl Angelo D Medriano
D McHugh
DC Neujahr
Dean P Jones
DP Jones
DP Tschudy
ES Furfine
ES Istvan
Eucaris Torres
HB Bradshaw
I Harris
J Liu
JF Mosha
JM Johnson
K Dettmer
Karan Uppal
KL Olszewski
L Tritten
Laurence Slutsker
M Kanehisa
M LeRoux
P Srivastava
P Xu
PS Ebert
R Teng
S Saen-Oon
SE Babbitt
SK Cribbs
T Yu
TR Sana
V Lakshmanan
W Lu
W Ruangyuttikam
W Trager
WHO
Y Benjamini
Ya Ping Shi
YH Park
Young Ho Jeon
Youngja H Park
Z Yan
Publication venue: Springer Science and Business Media LLC

Crossref