Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives
Machine Learning has been steadily gaining traction for its use in
Anomaly-based Network Intrusion Detection Systems (A-NIDS). Research into this
domain is frequently performed using the KDD CUP 99 dataset as a benchmark.
Several studies question its usability while constructing a contemporary NIDS,
due to the skewed response distribution, non-stationarity, and failure to
incorporate modern attacks. In this paper, we compare the performance of
KDD-99 alternatives when used to train classification models commonly found in
literature: Neural Network, Support Vector Machine, Decision Tree, Random
Forest, Naive Bayes and K-Means. Applying the SMOTE oversampling technique and
random undersampling, we create a balanced version of NSL-KDD and prove that
skewed target classes in KDD-99 and NSL-KDD hamper the efficacy of classifiers
on minority classes (U2R and R2L), leading to possible security risks. We
explore UNSW-NB15, a modern substitute to KDD-99 with greater uniformity of
pattern distribution. We benchmark this dataset before and after SMOTE
oversampling to observe the effect on minority performance. Our results
indicate that classifiers trained on UNSW-NB15 match or better the Weighted
F1-Score of those trained on NSL-KDD and KDD-99 in the binary case, thus
advocating UNSW-NB15 as a modern substitute to these datasets.
Comment: Paper accepted into Proceedings of IEEE International Conference on
Computing, Communication and Security 2018 (ICCCS-2018). Statistics: 8 pages,
7 tables, 3 figures, 34 references
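The rebalancing step described in this abstract (SMOTE oversampling of minority classes combined with random undersampling of the majority class) can be sketched roughly as follows. This is a minimal standard-library illustration of the general idea, not the authors' pipeline: the function names, the toy two-dimensional points, the neighbour count `k` and the target class size are all assumptions made for the example.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class, without replacement."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Toy data (hypothetical): 20 majority points and 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(20)]
minority = [(100.0, 1.0), (101.0, 2.0), (102.0, 1.5)]

target = 8  # assumed per-class size after balancing
synthetic = smote_like_oversample(minority, target - len(minority))
balanced_majority = random_undersample(majority, target)
balanced_minority = minority + synthetic
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind fills in the minority region rather than duplicating rows.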
Identifying Trades Using Technical Analysis and ML/DL Models
The importance of predicting stock market prices cannot be overstated. It is
a pivotal task for investors and financial institutions as it enables them to
make informed investment decisions, manage risks, and ensure the stability of
the financial system. Accurate stock market predictions can help investors
maximize their returns and minimize their losses, while financial institutions
can use this information to develop effective risk management policies.
However, stock market prediction is a challenging task due to the complex
nature of the stock market and the multitude of factors that can affect stock
prices. As a result, advanced technologies such as deep learning are being
increasingly utilized to analyze vast amounts of data and provide valuable
insights into the behavior of the stock market. While deep learning has shown
promise in accurately predicting stock prices, there is still much research to
be done in this area.
Comment: 14 pages, 9 figures, 5 tables
NEmo – news that triggers emotions, an affectively-annotated dataset of gun violence news
Given our society’s increased exposure to multimedia formats on social media platforms, efforts to understand how digital content impacts people’s emotions are burgeoning. As such, we introduce a U.S. gun violence news dataset that contains news headline and image pairings from 840 news articles with 15K high-quality, crowdsourced annotations on emotional responses to the news pairings. We created three experimental conditions for the annotation process: two with a single modality (headline or image only), and one multimodal (headline and image together). In contrast to prior works on affectively-annotated data, our dataset includes annotations on the dominant emotion experienced with the content, the intensity of the selected emotion and an open-ended, written component. By collecting annotations on different modalities of the same news content pairings, we explore the relationship between image and text influence on human emotional response. We offer initial analysis on our dataset, showing the nuanced affective differences that appear due to modality and individual factors such as political leaning and media consumption habits. Our dataset is made publicly available to facilitate future research in affective computing.
http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.267.pdf
Published version
BU-NEmo: news and emotions dataset
BU-NEmo is a multimodal affective dataset of gun violence news content. BU-NEmo extends the Gun Violence Framing Corpus (GVFC) proposed by Liu et al. (2019) and Tourni et al. (2021), which contains pairs of news headlines and lead images and their "frames" (viewpoints) from gun violence-related articles. The extension adds the results of an annotation experiment that evaluates the effect of the news content on the emotions of news consumers.
The data in BU-NEmo are annotated with three types of affective annotations:
(1) The emotion the annotator feels from looking at the content, out of the following 8 classes: Amusement, Awe, Contentment, Excitement, Fear, Sadness, Anger, Disgust.
(2) The intensity of the annotator's emotional response, on a scale from 1-5 (5 being the most intense).
(3) A free-text written response explaining their emotional response, structured as "I feel [emotion] because [reason]."
These annotations were collected in three experimental conditions: only the headline text was presented, only the image, and text and image together. By comparing the annotations across these three conditions, the relationship between news modality, frames, and emotional response can be studied.
IIS-1838193 - National Science Foundation
https://github.com/Tdrinker/NEmo-datase
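One way to picture the three annotation types and three experimental conditions described above is as a small record type. The sketch below is only an illustration: the class name, field names, condition labels, item identifiers and toy values are assumptions, not the dataset's actual schema or data. Only the eight emotion classes and the 1-5 intensity scale come from the description.

```python
from collections import defaultdict
from dataclasses import dataclass

# The eight emotion classes listed in the dataset description.
EMOTIONS = ("Amusement", "Awe", "Contentment", "Excitement",
            "Fear", "Sadness", "Anger", "Disgust")
# Hypothetical labels for the three experimental conditions.
CONDITIONS = ("headline_only", "image_only", "headline_and_image")


@dataclass(frozen=True)
class Annotation:
    item_id: str     # hypothetical identifier of the headline/image pair
    condition: str   # which modality the annotator saw
    emotion: str     # dominant emotion, one of EMOTIONS
    intensity: int   # 1-5, with 5 the most intense
    response: str    # free-text explanation

    def __post_init__(self):
        if self.condition not in CONDITIONS:
            raise ValueError(f"unknown condition: {self.condition}")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion}")
        if not 1 <= self.intensity <= 5:
            raise ValueError(f"intensity out of range: {self.intensity}")


def mean_intensity_by_condition(annotations):
    """Average reported intensity for each experimental condition."""
    grouped = defaultdict(list)
    for a in annotations:
        grouped[a.condition].append(a.intensity)
    return {cond: sum(vals) / len(vals) for cond, vals in grouped.items()}


# Toy records (made up for illustration, not taken from the dataset).
toy = [
    Annotation("item-001", "headline_only", "Sadness", 3, "(free text)"),
    Annotation("item-001", "image_only", "Fear", 4, "(free text)"),
    Annotation("item-001", "headline_and_image", "Anger", 5, "(free text)"),
    Annotation("item-002", "headline_and_image", "Sadness", 4, "(free text)"),
]
means = mean_intensity_by_condition(toy)
```

Grouping by condition in this way mirrors the kind of cross-condition comparison the description mentions.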
Subgroups of patients with young-onset type 2 diabetes in India reveal insulin deficiency as a major driver
Correction: Article Number e3001442 DOI 10.1007/s00125-021-05620-2 Early Access NOV 2021
Aim/hypothesis: Five subgroups were described in European diabetes patients using a data driven machine learning approach on commonly measured variables. We aimed to test the applicability of this phenotyping in Indian individuals with young-onset type 2 diabetes.
Methods: We applied the European-derived centroids to Indian individuals with type 2 diabetes diagnosed before 45 years of age from the WellGen cohort (n = 1612). We also applied de novo k-means clustering to the WellGen cohort to validate the subgroups. We then compared clinical and metabolic-endocrine characteristics and the complication rates between the subgroups. We also compared characteristics of the WellGen subgroups with those of two young European cohorts, ANDIS (n = 962) and DIREVA (n = 420). Subgroups were also assessed in two other Indian cohorts, Ahmedabad (n = 187) and PHENOEINDY-2 (n = 205).
Results: Both Indian and European young-onset type 2 diabetes patients were predominantly classified into severe insulin-deficient (SIDD) and mild obesity-related (MOD) subgroups, while the severe insulin-resistant (SIRD) and mild age-related (MARD) subgroups were rare. In WellGen, SIDD (53%) was more common than MOD (38%), contrary to findings in Europeans (Swedish 26% vs 68%, Finnish 24% vs 71%, respectively). A higher proportion of SIDD compared with MOD was also seen in Ahmedabad (57% vs 33%) and in PHENOEINDY-2 (67% vs 23%). Both in Indians and Europeans, the SIDD subgroup was characterised by insulin deficiency and hyperglycaemia, MOD by obesity, SIRD by severe insulin resistance and MARD by mild metabolic-endocrine disturbances. In WellGen, nephropathy and retinopathy were more prevalent in SIDD compared with MOD while the latter had higher prevalence of neuropathy.
Conclusions/interpretation: Our data identified insulin deficiency as the major driver of type 2 diabetes in young Indians, unlike in young European individuals in whom obesity and insulin resistance predominate. Our results provide useful clues to pathophysiological mechanisms and susceptibility to complications in type 2 diabetes in the young Indian population and suggest a need to review management strategies.
Peer reviewed
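Applying previously derived centroids to a new cohort, as the Methods describe, amounts to assigning each individual to the nearest centroid. The sketch below illustrates that step under loud assumptions: the two unnamed features, the centroid coordinates and the three individuals are toy values invented for the example, and Euclidean distance on standardised features is assumed rather than taken from the abstract. Only the four subgroup labels come from the text.

```python
import math
from collections import Counter

# Toy centroids in a hypothetical two-feature standardised space.
# The coordinates are made up; only the subgroup labels come from the abstract.
TOY_CENTROIDS = {
    "SIDD": (1.0, -1.0),
    "MOD": (-1.0, 1.0),
    "SIRD": (-1.0, -1.0),
    "MARD": (1.0, 1.0),
}


def assign_subgroup(features, centroids=TOY_CENTROIDS):
    """Return the label of the centroid closest to `features` (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def subgroup_shares(cohort, centroids=TOY_CENTROIDS):
    """Fraction of the cohort assigned to each subgroup."""
    counts = Counter(assign_subgroup(person, centroids) for person in cohort)
    return {label: counts[label] / len(cohort) for label in centroids}


# Three made-up individuals, already standardised.
toy_cohort = [(0.9, -0.8), (-1.2, 0.7), (0.8, -1.1)]
labels = [assign_subgroup(person) for person in toy_cohort]
shares = subgroup_shares(toy_cohort)
```

Fixing the centroids in advance, instead of re-clustering, is what lets subgroup proportions be compared across cohorts; de novo k-means on the new cohort then serves as a separate check.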