100,000 prize jackpot. Call now!: Identifying the pertinent features of SMS spam

Author: Henry Tan
Micah Sherr
Nazli Goharian
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2012

Mobile SMS spam is on the rise and is a prevalent problem. While recent work has shown that simple machine learning techniques can distinguish between ham and spam with high accuracy, this paper explores the individual contributions of various textual features in the classification process. Our results reveal the surprising finding that simple is better: using the largest spam corpus of which we are aware, we find that using simple textual features is sufficient to provide accuracy that is nearly identical to that achieved by the best known techniques, while achieving a twofold speedup.

CiteSeerX

A Survey of Email Spam Filtering Methods

Author: Sharma Madhvi
Sharma Sumit
Publication venue: Control Theory and Informatics
Publication date: 30/08/2018

E-mail is one of the most secure media for online communication and for transferring data or messages over the web. As its popularity has grown, the volume of unsolicited messages has also increased rapidly. Different approaches exist to filter this data by automatically detecting and removing these unwanted messages. Email spam filtering techniques include knowledge-based techniques, clustering techniques, learning-based techniques, heuristic processes, and so on. This paper surveys existing email spam filtering systems that use machine learning techniques (MLT) such as Naive Bayes, SVM, K-Nearest Neighbor, Bayes Additive Regression, KNN Tree, and rules. We present the classification, evaluation, and comparison of different email spam filtering systems.
Keywords: e-mail spam, spam filtering methods, machine learning technique, classification, SVM, AN

International Institute for Science, Technology and Education (IISTE): E-Journals

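Handling Continuous Features with Expectation Maximization Clustering-Based Feature Discretization for Email Spam Classification Using the ID3 Algorithm [Penanganan Fitur Kontinyu dengan Feature Discretization Berbasis Expectation Maximization Clustering untuk Klasifikasi Spam Email Menggunakan Algoritma ID3]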

Author: Safuan S. (Safuan)
Supriyanto C. (Catur)
Wahono R. S. (Romi)
Publication venue: IlmuKomputer.com
Publication date: 01/01/2015

Use of the internet is growing rapidly, one example being the sending of electronic mail, or email. Email spam has recently become a widely discussed topic. Email spam is unsolicited and unwanted email from strangers that is sent in bulk to mailing lists, usually with a commercial purpose. Spam reduces employee productivity because employees must spend time deleting spam messages. To address this problem, an email filter is needed that detects spam so that it does not appear in the inbox. Many researchers have tried to build email filters using a variety of methods, but none has achieved maximal accuracy. This study performs classification with the Iterative Dichotomiser 3 (ID3) decision tree algorithm, because ID3 is the most widely used decision tree algorithm and is known for its high classification speed, strong learning ability, and easy construction. However, ID3 cannot handle continuous features, so the classification process cannot be carried out. In this study, feature discretization based on Expectation Maximization (EM) clustering is used to convert continuous features into discrete features so that email spam classification can be performed. The experimental results show that ID3 can classify email spam with an accuracy of 91.96% when 90% of the data is used for training. This is an increase of 28.05% compared with ID3 classification using binning.
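
The discretization step described in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian mixture fitted with EM, with the index of the most probable cluster used as the discrete value handed to a decision tree. The function names `fit_gmm_1d` and `discretize`, the number of clusters, the quantile-based initialisation, and the sample values are all illustrative assumptions and are not taken from the paper.

```python
import math


def fit_gmm_1d(values, k=2, iterations=50):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    data = sorted(values)
    n = len(data)
    # Deterministic initialisation (assumption): means at evenly spaced quantiles.
    means = [data[int((i + 0.5) * n / k)] for i in range(k)]
    spread = (data[-1] - data[0]) or 1.0
    variances = [(spread / k) ** 2] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,  # variance floor so a component cannot collapse to a point
            )
    return weights, means, variances


def discretize(x, model):
    """Map a continuous value to the index of its most probable cluster."""
    weights, means, variances = model
    scores = [
        math.log(w) - 0.5 * math.log(v) - ((x - m) ** 2) / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    return scores.index(max(scores))


# Hypothetical continuous feature with two obvious groups of values.
feature = [0.1, 0.2, 0.15, 0.05, 0.12, 5.0, 5.2, 4.9, 5.1, 5.05]
model = fit_gmm_1d(feature, k=2)
bins = [discretize(x, model) for x in feature]
```

Unlike fixed-width binning, the bin boundaries here follow the density of the data, which is the general motivation for clustering-based discretization.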

Neliti

Active Multi-Field Learning for Spam Filtering

Author: Liu Wuying
Wang Lin
Xie Nan
Yi Mianzhu
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 11/02/2015

Ubiquitous spam messages cause a serious waste of time and resources. This paper addresses the practical spam filtering problem and proposes a universal approach to fight various kinds of spam messages. The proposed active multi-field learning approach is based on two observations: 1) obtaining a label is costly for a real-world spam filter, which suggests an active learning idea; and 2) different messages often have a similar multi-field text structure, which suggests a multi-field learning idea. The multi-field learning framework combines the results predicted by multiple field classifiers using a novel compound weight, and each field classifier calculates the arithmetic average of multiple conditional probabilities predicted from feature strings according to a string-frequency index data structure. By comparing the current variance of the field classifying results with the historical variance, the active learner evaluates the classifying confidence and regards a more uncertain message as a more informative sample for which to request a label. The experimental results show that the proposed approach can achieve state-of-the-art performance with greatly reduced label requirements in both email spam filtering and short text spam filtering. The performance of our active multi-field learning, measured by the standard (1-ROCA)% metric, even exceeds the full-feedback performance of some advanced individual classifying algorithms.
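
The mechanism described in this abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the string-frequency index is modelled as two plain dictionaries of counts, the paper's compound weight is replaced by an ordinary weighted average, and "more uncertain" is assumed to mean that the field scores disagree more (have higher variance) than they have historically. All names and numbers are hypothetical.

```python
from statistics import mean, pvariance


def field_score(features, spam_counts, ham_counts):
    """Arithmetic mean of P(spam | feature) over the feature strings found in the index."""
    probs = []
    for f in features:
        s, h = spam_counts.get(f, 0), ham_counts.get(f, 0)
        if s + h:
            probs.append(s / (s + h))
    return mean(probs) if probs else 0.5  # 0.5 = no evidence (assumption)


def combine(scores, weights):
    """Weighted average of per-field scores (stand-in for the paper's compound weight)."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def should_request_label(scores, historical_variance):
    """Ask for a label when the fields disagree more than they have in the past."""
    return pvariance(scores) > historical_variance


# Hypothetical string-frequency index: feature string -> number of messages seen.
spam_counts = {"free": 8, "prize": 9, "meeting": 1}
ham_counts = {"free": 2, "prize": 1, "meeting": 9}

subject = field_score(["free", "prize"], spam_counts, ham_counts)
body = field_score(["meeting"], spam_counts, ham_counts)
combined = combine([subject, body], [1.0, 1.0])
ask = should_request_label([subject, body], historical_variance=0.05)
```

In this toy message the subject looks like spam and the body looks like ham, so the field scores disagree strongly and the sketch would request a label.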

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

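Automated Classification of CEO Turnover Reasons in News for Predicting Stock Volatility after CEO Turnover [대표이사 변경 이후 주가 변동성 예측을 위한 뉴스의 대표이사 변경 사유 분류 자동화]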

Author: 함영석
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2023

Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, 2023. 2. 조성준.

A CEO turnover event significantly influences a company. The role of the CEO at a firm is to manage overall operations, and thus a change of CEO could affect not only the firm's strategic direction but also consumer perception, investment decisions, and eventually the share price. Shareholders and investors therefore keep an eye on CEO changes, and especially on the reason why the CEO has changed. The cause of a CEO turnover can be inferred from detailed information about the firm, such as firm performance and stock price prior to the event. However, financial news related to CEO turnover specifically describes the motivation for the turnover. In this paper, machine learning techniques such as the TF-IDF method and a fine-tuned DistilBERT language model are used to classify turnover causes from financial news related to CEO turnover. The main contribution of this paper is to automate the manual labeling process to help shareholders and investors capture investment opportunities in a timely manner. A contextualized embedding of news articles obtained from the language model is then further used as an additional feature for predicting the post-event stock volatility of a firm.

Korean abstract (translated): CEO turnover is one of the events that occur at a company and has a large impact on that company. The CEO is responsible for the firm's overall management strategy, so a change of CEO affects not only management strategy but also consumer perception and investment strategy, and this is reflected in the firm's stock price. For this reason, CEO turnover is an event that investors watch closely, and they pay particular attention to the reason for the change. The reason can be roughly inferred from stock price movements and firm performance before the event. However, news related to the CEO turnover describes the reason more directly. The CEO may step down voluntarily because of age or a resulting illness, or may be forced out for a specific reason, and in other cases information about the successor can also be found. This thesis proposes a model that classifies the reason for CEO turnover from news using natural language processing. Its significance lies in automating the existing manual labeling process. A TF-IDF model, which uses term frequency and inverse document frequency, serves as the benchmark for the reason classification model, and news embeddings are extracted while fine-tuning a pretrained transformer-based language model on the task of classifying turnover reasons. By classifying the reason for a CEO turnover, the model gives investors a signal that stock volatility will increase after the change depending on the reason, helping them adjust their investment strategy quickly. In addition, the contextual vector embeddings obtained from the language model are used to build a model that predicts the firm's post-event stock volatility, which tests the usefulness of the reason classification model.

Contents:
1. Introduction
1.1 Background
1.2 Problem Description
1.3 Research Motivation and Contribution
1.4 Organization of the Thesis
2. Literature Review
2.1 CEO Turnover and Volatility
2.2 Machine Learning for Text Classification
2.3 Pretrained Language Model for Text Classification
3. Proposed Method
3.1 Overall Architecture
3.2 Machine Learning Text Classification
3.3 Fine-Tuning DistilBERT for Text Classification
3.4 Regression Model for Stock Volatility Prediction
4. Experiments and Results
4.1 Data
4.1.1 Label Engineering & Imbalance Dataset
4.2 Evaluation
4.3 Results
5. Conclusion
Bibliography
Abstract in Korean
Acknowledgements
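
The TF-IDF weighting that this thesis uses as its benchmark representation can be illustrated with a short sketch. It uses the textbook form, term frequency multiplied by log(N / document frequency); libraries often apply smoothed variants, and the abstract does not say which variant the thesis uses. The function name `tf_idf` and the example headlines are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical headlines, not data from the thesis.
news = [
    "ceo resigns for health reasons".split(),
    "board dismisses ceo after poor results".split(),
    "ceo retires and successor named".split(),
]
vectors = tf_idf(news)
```

A term such as "ceo" that appears in every headline receives zero weight, while terms such as "resigns" or "dismisses" that distinguish one turnover reason from another receive positive weight, which is why this representation is a common baseline for text classifiers.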

SNU Open Repository and Archive

That ain’t you: Blocking spearphishing through behavioral modelling

Author: Stringhini G
Thonnard O
Publication venue: Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA)
Publication date: 01/01/2015

One of the ways in which attackers steal sensitive information from corporations is by sending spearphishing emails. A typical spearphishing email appears to be sent by one of the victim’s coworkers or business partners, but has instead been crafted by the attacker. A particularly insidious type of spearphishing email is one that not only claims to be written by a certain person, but is also sent from that person’s email account, which has been compromised. Spearphishing emails are very dangerous for companies, because they can be the starting point for a more sophisticated attack or cause intellectual property theft, and can lead to high financial losses. Currently, there are no effective systems to protect users against such threats. Existing systems leverage adaptations of anti-spam techniques. However, these techniques are often inadequate for detecting spearphishing attacks, because spearphishing has very different characteristics from spam and even from traditional phishing. To fight the spearphishing threat, we propose a change of focus in the techniques used for detecting malicious emails: instead of looking for features that are indicative of attack emails, we look for emails that claim to have been written by a certain person within a company, but were actually authored by an attacker. We do this by modelling the email-sending behavior of users over time, and comparing any subsequent email sent by their accounts against this model. Our approach can block advanced email attacks that traditional protection systems are unable to detect, and is an important step towards detecting advanced spearphishing attacks.
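
The idea of building a per-user model and comparing later emails against it can be sketched as a simple frequency profile. The abstract does not list the behavioral features the authors use, so the two features here (sending hour and recipient), the class name `SenderProfile`, and the scoring rule are purely illustrative assumptions rather than the paper's method.

```python
from collections import Counter


class SenderProfile:
    """Running profile of one account's sending habits (illustrative features only)."""

    def __init__(self):
        self.hours = Counter()
        self.recipients = Counter()
        self.total = 0

    def update(self, hour, recipient):
        """Record one email known to have been sent by the legitimate user."""
        self.hours[hour] += 1
        self.recipients[recipient] += 1
        self.total += 1

    def anomaly_score(self, hour, recipient):
        """Average rarity of the email's attributes: 0 = typical, 1 = never seen before."""
        if self.total == 0:
            return 1.0  # no history yet, so nothing can be considered typical
        hour_rarity = 1 - self.hours[hour] / self.total
        recipient_rarity = 1 - self.recipients[recipient] / self.total
        return (hour_rarity + recipient_rarity) / 2


# Hypothetical history: ten emails sent at 09:00 to the same colleague.
profile = SenderProfile()
for _ in range(10):
    profile.update(hour=9, recipient="alice@example.com")

typical = profile.anomaly_score(hour=9, recipient="alice@example.com")
unusual = profile.anomaly_score(hour=3, recipient="unknown@example.net")
```

An email whose score exceeds a chosen threshold would be flagged as possibly not written by the account owner. A realistic system would need far richer features and a principled way of setting that threshold.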

UCL Discovery