Search CORE

5 research outputs found

Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives

Author: Divekar Abhishek
Mishra Rudra
Parekh Meet
Savla Vaibhav
Shirole Mahesh
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/11/2018

Machine Learning has been steadily gaining traction for its use in Anomaly-based Network Intrusion Detection Systems (A-NIDS). Research into this domain is frequently performed using the KDD CUP 99 dataset as a benchmark. Several studies question its usability while constructing a contemporary NIDS, due to the skewed response distribution, non-stationarity, and failure to incorporate modern attacks. In this paper, we compare the performance of KDD-99 alternatives when trained using classification models commonly found in literature: Neural Network, Support Vector Machine, Decision Tree, Random Forest, Naive Bayes and K-Means. Applying the SMOTE oversampling technique and random undersampling, we create a balanced version of NSL-KDD and prove that skewed target classes in KDD-99 and NSL-KDD hamper the efficacy of classifiers on minority classes (U2R and R2L), leading to possible security risks. We explore UNSW-NB15, a modern substitute to KDD-99 with greater uniformity of pattern distribution. We benchmark this dataset before and after SMOTE oversampling to observe the effect on minority performance. Our results indicate that classifiers trained on UNSW-NB15 match or better the Weighted F1-Score of those trained on NSL-KDD and KDD-99 in the binary case, thus advocating UNSW-NB15 as a modern substitute to these datasets.
Comment: Paper accepted into Proceedings of IEEE International Conference on Computing, Communication and Security 2018 (ICCCS-2018). Statistics: 8 pages, 7 tables, 3 figures, 34 references
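The abstract's point about skewed classes and the Weighted F1-Score can be illustrated with a small sketch. This is not the authors' code: the function names `weighted_f1` and `random_undersample` and the toy labels are assumptions made for illustration, and SMOTE itself is not implemented here (only random undersampling to the smallest class).

```python
import random
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


def random_undersample(rows, labels, seed=0):
    """Keep as many rows per class as the smallest class has (illustrative)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 8 "normal" records and 2 "U2R" records (made-up values).
y_true = ["normal"] * 8 + ["U2R"] * 2
# A classifier that always predicts the majority class.
y_pred = ["normal"] * 10
score = weighted_f1(y_true, y_pred)

balanced_rows, balanced_labels = random_undersample(list(range(10)), y_true)
```

In this toy case the majority-only classifier never detects a U2R record, yet its weighted F1 stays well above zero because the majority class dominates the weighting. That is one reason to inspect minority-class scores separately, or to rebalance the training data, as the abstract describes.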

arXiv.org e-Print Archive

Crossref

Identifying Trades Using Technical Analysis and ML/DL Models

Author: Chawan Prof. Pramila M.
Deliwala Nirmit
Doshi Mann
Parekh Meet
Shah Aayush
Publication date: 12/04/2023

The importance of predicting stock market prices cannot be overstated. It is a pivotal task for investors and financial institutions as it enables them to make informed investment decisions, manage risks, and ensure the stability of the financial system. Accurate stock market predictions can help investors maximize their returns and minimize their losses, while financial institutions can use this information to develop effective risk management policies. However, stock market prediction is a challenging task due to the complex nature of the stock market and the multitude of factors that can affect stock prices. As a result, advanced technologies such as deep learning are being increasingly utilized to analyze vast amounts of data and provide valuable insights into the behavior of the stock market. While deep learning has shown promise in accurately predicting stock prices, there is still much research to be done in this area.
Comment: 14 pages, 9 figures, 5 tables

arXiv.org e-Print Archive

NEmo – news that triggers emotions, an affectively-annotated dataset of gun violence news

Author: Betke Margrit
Gao Ge
Guo Lei
Paik Sejin
Parekh Meet
Reardon Carley
Wijaya Derry
Zhao Yanling
Publication date: 20/06/2022

Given our society’s increased exposure to multimedia formats on social media platforms, efforts to understand how digital content impacts people’s emotions are burgeoning. As such, we introduce a U.S. gun violence news dataset that contains news headline and image pairings from 840 news articles with 15K high-quality, crowdsourced annotations on emotional responses to the news pairings. We created three experimental conditions for the annotation process: two with a single modality (headline or image only), and one multimodal (headline and image together). In contrast to prior works on affectively-annotated data, our dataset includes annotations on the dominant emotion experienced with the content, the intensity of the selected emotion and an open-ended, written component. By collecting annotations on different modalities of the same news content pairings, we explore the relationship between image and text influence on human emotional response. We offer initial analysis on our dataset, showing the nuanced affective differences that appear due to modality and individual factors such as political leaning and media consumption habits. Our dataset is made publicly available to facilitate future research in affective computing.
http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.267.pdf
Published version

Boston University Institutional Repository (OpenBU)

BU-NEmo: news and emotions dataset

Author: Betke Margrit
Gao Ge
Guo Lei
Paik Sejin
Parekh Meet
Reardon Carley
Wijaya D.
Zhao Yanling
Publication date: 19/01/2023

BU-NEmo is a multimodal affective dataset of gun violence news content. BU-NEmo extends the Gun Violence Framing Corpus (GVFC) proposed by Liu et al. (2019) and Tourni et al. (2021), which contains pairs of news headlines and lead images and their "frames" (viewpoints) from gun violence-related articles. The extension concerns the results of an annotation experiment that evaluates the effect of the news content on the emotions of news consumers. The data in BU-NEmo are annotated with three types of affective annotations: (1) The emotion the annotator feels from looking at the content, out of the following 8 classes: Amusement, Awe, Contentment, Excitement, Fear, Sadness, Anger, Disgust. (2) The intensity of the annotator's emotional response, on a scale from 1-5 (5 being the most intense). (3) A free-text written response explaining their emotional response, structured as "I feel because." These annotations were collected in three experimental conditions: only the headline text was presented, only the image, and text and image together. By comparing the annotations across these three conditions, the relationship between news modality, frames, and emotional response can be studied.
IIS-1838193 - National Science Foundation
https://github.com/Tdrinker/NEmo-datase
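The three annotation types and three experimental conditions described above could be represented roughly as follows. This is only a sketch: the field names, the condition strings, and the `mean_intensity_by_condition` helper are assumptions for illustration and do not reflect the dataset's actual file layout.

```python
from collections import defaultdict
from dataclasses import dataclass

# The 8 emotion classes listed in the dataset description.
EMOTIONS = ("Amusement", "Awe", "Contentment", "Excitement",
            "Fear", "Sadness", "Anger", "Disgust")
# Hypothetical names for the three experimental conditions.
CONDITIONS = ("headline", "image", "headline+image")


@dataclass(frozen=True)
class Annotation:
    item_id: str      # hypothetical identifier of the headline/image pair
    condition: str    # which modality the annotator saw
    emotion: str      # one of the 8 classes
    intensity: int    # 1-5, with 5 the most intense
    response: str     # free-text explanation

    def __post_init__(self):
        if self.condition not in CONDITIONS:
            raise ValueError(f"unknown condition: {self.condition}")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion}")
        if not 1 <= self.intensity <= 5:
            raise ValueError("intensity must be between 1 and 5")


def mean_intensity_by_condition(annotations):
    """Average intensity per experimental condition."""
    buckets = defaultdict(list)
    for ann in annotations:
        buckets[ann.condition].append(ann.intensity)
    return {cond: sum(vals) / len(vals) for cond, vals in buckets.items()}


# Made-up example records, not taken from the dataset.
sample = [
    Annotation("item-1", "headline", "Sadness", 2, "example text"),
    Annotation("item-1", "headline", "Fear", 4, "example text"),
    Annotation("item-1", "image", "Anger", 5, "example text"),
]
means = mean_intensity_by_condition(sample)
```

Grouping by condition in this way is one simple route to the cross-condition comparison the description mentions.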

Boston University Institutional Repository (OpenBU)

Subgroups of patients with young-onset type 2 diabetes in India reveal insulin deficiency as a major driver

Author: Ahlqvist Emma
Asplund Olof
Bhat Dattatrey
Datta Anupam
Groop Leif
Kakati Sanjeeb
Karajamaki Annemari
Kunte Pooja
Parekh Malay
Phatak Sanat
Prasad Rashmi B.
Saboo Banshi
Shah Meet
Shukla Sharvari R.
Tuomi Tiinamaija
Wagh Rucha
Yajnik Chittaranjan S.
Publication date: 01/01/2022

Correction: Article Number e3001442. DOI 10.1007/s00125-021-05620-2. Early Access NOV 2021.
Aim/hypothesis Five subgroups were described in European diabetes patients using a data driven machine learning approach on commonly measured variables. We aimed to test the applicability of this phenotyping in Indian individuals with young-onset type 2 diabetes.
Methods We applied the European-derived centroids to Indian individuals with type 2 diabetes diagnosed before 45 years of age from the WellGen cohort (n = 1612). We also applied de novo k-means clustering to the WellGen cohort to validate the subgroups. We then compared clinical and metabolic-endocrine characteristics and the complication rates between the subgroups. We also compared characteristics of the WellGen subgroups with those of two young European cohorts, ANDIS (n = 962) and DIREVA (n = 420). Subgroups were also assessed in two other Indian cohorts, Ahmedabad (n = 187) and PHENOEINDY-2 (n = 205).
Results Both Indian and European young-onset type 2 diabetes patients were predominantly classified into severe insulin-deficient (SIDD) and mild obesity-related (MOD) subgroups, while the severe insulin-resistant (SIRD) and mild age-related (MARD) subgroups were rare. In WellGen, SIDD (53%) was more common than MOD (38%), contrary to findings in Europeans (Swedish 26% vs 68%, Finnish 24% vs 71%, respectively). A higher proportion of SIDD compared with MOD was also seen in Ahmedabad (57% vs 33%) and in PHENOEINDY-2 (67% vs 23%). Both in Indians and Europeans, the SIDD subgroup was characterised by insulin deficiency and hyperglycaemia, MOD by obesity, SIRD by severe insulin resistance and MARD by mild metabolic-endocrine disturbances. In WellGen, nephropathy and retinopathy were more prevalent in SIDD compared with MOD while the latter had higher prevalence of neuropathy.
Conclusions/interpretation Our data identified insulin deficiency as the major driver of type 2 diabetes in young Indians, unlike in young European individuals in whom obesity and insulin resistance predominate. Our results provide useful clues to pathophysiological mechanisms and susceptibility to complications in type 2 diabetes in the young Indian population and suggest a need to review management strategies.
Peer reviewed
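Applying previously derived centroids to a new cohort, as the Methods describe, amounts to assigning each individual to the nearest centroid. The sketch below assumes Euclidean distance on standardized features. The two-dimensional centroid coordinates are invented placeholders, not the published European centroids, and the function names are hypothetical.

```python
import math
from collections import Counter

# Placeholder centroids in a made-up 2-feature standardized space.
# The subgroup labels come from the abstract; the numbers do not.
CENTROIDS = {
    "SIDD": (1.0, -0.5),
    "MOD": (-0.5, 1.0),
    "SIRD": (-1.0, -1.0),
    "MARD": (0.0, 0.0),
}


def assign_subgroup(features, centroids=CENTROIDS):
    """Return the label of the centroid closest to `features`."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


def subgroup_shares(cohort, centroids=CENTROIDS):
    """Fraction of the cohort assigned to each subgroup."""
    counts = Counter(assign_subgroup(row, centroids) for row in cohort)
    return {name: counts[name] / len(cohort) for name in centroids}


# Made-up standardized feature vectors for four individuals.
cohort = [(0.9, -0.4), (1.1, -0.6), (-0.4, 0.9), (0.1, 0.1)]
labels = [assign_subgroup(row) for row in cohort]
shares = subgroup_shares(cohort)
```

De novo k-means clustering, which the study uses for validation, would instead estimate the centroids from the cohort itself. Comparing the two sets of assignments is one way to check whether the imported subgroups fit the new population.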

Lund University Publications

PubMed Central

Helsingin yliopiston digitaalinen arkisto