48 research outputs found

Toward practical and private online services

Author: Gupta Trinabh
Publication venue
Publication date: 31/01/2018
Field of study

Today's common online services (social networks, media streaming, messaging, email, etc.) bring convenience. However, these services are susceptible to privacy leaks. Email snooping by rogue employees, email server hacks, and accidental disclosures of user ratings for movies are some sources of private information leakage. This dissertation investigates the following question: Can we build systems that (a) provide strong privacy guarantees to the users, (b) are consistent with existing commercial and policy regimes, and (c) are affordable? Satisfying all three requirements simultaneously is challenging, as providing strong privacy guarantees usually necessitates either sacrificing functionality, incurring high resource costs, or both. Indeed, there are powerful cryptographic protocols, namely private information retrieval (PIR) and secure two-party computation (2PC), that provide strong guarantees but are orders of magnitude more expensive than their non-private counterparts. This dissertation takes these protocols as a starting point and then substantially reduces their costs by tailoring them using application-specific properties. It presents two systems, Popcorn and Pretzel, built on this design ethos. Popcorn is a Netflix-like media delivery system that provably hides, even from the content distributor (for example, Netflix), which movie a user is watching. Popcorn tailors PIR protocols to the media domain. It amortizes the server-side overhead of PIR by batching requests from the large number of concurrent users retrieving content at any given time, and it forms large batches without introducing playback delays by leveraging the properties of media streaming. Popcorn is consistent with the prevailing commercial regime (copyrights, etc.), and its per-request dollar cost is 3.87 times that of a non-private system.
The other system described in this dissertation, Pretzel, is an email system that encrypts emails end-to-end between senders and intended recipients, but allows the email service provider to perform content-based spam filtering and targeted advertising. Pretzel refines a 2PC protocol. It reduces the resource consumption of the protocol by replacing the underlying encryption scheme with a more efficient one, applying a packing technique to conserve invocations of the encryption algorithm, and pruning the inputs to the protocol. Pretzel's costs, versus a legacy non-private implementation, are estimated to be up to 5.4 times for the email provider, with additional but modest client-side requirements. Popcorn and Pretzel have fundamental connections. For instance, the cryptographic protocols in both systems securely compute vector-matrix products. However, we observe that differences in the vector and matrix dimensions lead to different system designs. Ultimately, both systems represent a potentially appealing compromise: sacrifice some functionality to build in strong privacy properties at affordable costs.
Computer Science
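The abstract above notes that the underlying protocols compute vector-matrix products. As a rough illustration only, the sketch below shows a textbook two-server, XOR-based PIR in Python, where each server's answer is a vector-matrix product over GF(2). It is not Popcorn's actual protocol; the function names, the toy library, and the assumption of two non-colluding servers holding equal-length records are all illustrative.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(db, query_bits):
    """Server side: XOR together every record whose query bit is 1.

    This is a vector-matrix product over GF(2): the query is a bit
    vector and the database is a matrix with one record per row.
    """
    acc = bytes(len(db[0]))  # all-zero accumulator
    for record, bit in zip(db, query_bits):
        if bit:
            acc = xor_bytes(acc, record)
    return acc

def make_queries(n, index):
    """Client side: a random bit vector, and a copy with bit `index` flipped.

    Each vector on its own is uniformly random, so a single server
    learns nothing about `index` (assuming the servers do not collude).
    """
    q_a = [secrets.randbelow(2) for _ in range(n)]
    q_b = list(q_a)
    q_b[index] ^= 1
    return q_a, q_b

def retrieve(db, index):
    """Simulate both servers locally and recombine their answers."""
    q_a, q_b = make_queries(len(db), index)
    # Records selected by both queries cancel out; only db[index] remains.
    return xor_bytes(server_answer(db, q_a), server_answer(db, q_b))

# Hypothetical toy library of equal-length records.
library = [b"movie-A!", b"movie-B!", b"movie-C!"]
wanted = retrieve(library, 1)
```

Each server touches every record for every query, which is the server-side overhead the abstract says Popcorn amortizes by batching requests.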

Texas ScholarWorks

SMS Spam Detection in a Real-World Platform using Machine Learning

Author: Rodriguez Villanueva Cesar Adolfo
Publication venue: Helsingfors universitet
Publication date: 01/01/2019
Field of study

Spam detection techniques have made our lives easier by unclogging our inboxes and keeping unsafe messages from being opened. With the automation of text messaging solutions and the increase in telecommunication companies and message providers, the volume of text messages has been on the rise. With this growth came malicious traffic over which users had little control. In this thesis, we present an implementation of a spam detection system in a real-world text messaging platform. Using well-established machine learning algorithms, we perform an in-depth analysis of the performance of the models using two different datasets: one publicly available (N=5,574) and the other gathered from actual traffic of the platform (N=1,477). Making use of the empirical results, we outline the models and hyperparameters which can be used in the platform and the scenarios in which they produce optimal performance. The results indicate that our dataset poses a great challenge for accurate classification, most likely due to the small sample size and class imbalance, along with nuances in the dataset. Nevertheless, some models were found to have good all-around performance, and they can be trained and used in the platform.

Helsingin yliopiston digitaalinen arkisto

Artificial Intelligence Approaches for Filtering of Spams

Author: Matula Tomáš
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2014
Field of study

This thesis focuses on e-mail classification and describes the basic ways of spam filtering. It then analyzes and applies Bayesian spam classifiers and artificial immune systems. Furthermore, existing applications and evaluation metrics are described. The aim of this thesis is to design and implement an algorithm for spam filtering. Finally, the results are compared with selected known methods.

Digital library of Brno University of Technology

National Repository of Grey Literature

Sentiment analysis in context: Investigating the use of BERT and other techniques for ChatBot improvement

Author: INNOCENTE SIMONE
Publication venue
Publication date: 25/07/2023
Field of study

In an increasingly digitized world, where large amounts of data are generated daily, efficient analysis of that data has become increasingly pressing. Natural Language Processing (NLP) offers a solution by exploiting the power of artificial intelligence to process texts, to understand their content and to perform specific tasks. The thesis is based on an internship at Pat Srl, a company devoted to creating solutions to support digital innovation, process automation, and service quality with the ultimate goal of improving leadership and customer satisfaction. The primary objective of this thesis is to develop a sentiment analysis model in order to improve the customer experience for clients using the ChatBot system created by the company itself. This task has gained significant attention in recent years as it can be applied to different fields, including social media monitoring, market research, brand monitoring or customer experience and feedback analysis. Following a careful analysis of the available data, a comprehensive evaluation of various models was conducted. Notably, BERT, a large language model that has provided promising results in several NLP tasks, stood out among them. Different approaches utilizing the BERT models were explored, such as the fine-tuning modality or the architectural structure. Moreover, some preprocessing steps of the data were emphasized and studied, due to the particular nature of the sentiment analysis task. During the course of the internship, the dataset underwent revisions aimed at mitigating the problem of inaccurate predictions. Additionally, techniques for data balancing were tested and evaluated, enhancing the overall quality of the analysis. Another important aspect of this project involved the deployment of the model. In a business environment, it is essential to carefully consider and balance resources before transitioning to production. The model distribution was carried out using specific tools, such as Docker and Kubernetes. These specialized technologies played a pivotal role in ensuring efficient and seamless deployment.

Padua Thesis and Dissertation Archive

Applied Machine Learning for Cybersecurity in Spam Filtering and Malware Detection

Author: Sokolov Mark
Publication venue: East Carolina University
Publication date: 18/12/2020
Field of study

Machine learning is one of the fastest-growing fields and its application to cybersecurity is increasing. In order to protect people from malicious attacks, several machine learning algorithms have been used to predict such attacks. This research emphasizes two vulnerable areas of cybersecurity that could be easily exploited. First, we show that spam filtering is a well-known problem that has been addressed by many authors, yet it still has vulnerabilities. Second, with the increase of malware threats in our world, a lot of companies use AutoAI to help protect their systems. Nonetheless, AutoAI is not perfect, and data scientists can still design better models. In this thesis I show that although there are efficient mechanisms to prevent malicious attacks, there are still vulnerabilities that could be easily exploited. In the visual spoofing experiment, we show that using a classifier trained on Latin-alphabet data to classify a message containing a combination of Latin and Cyrillic letters leads to much lower classification accuracy. In the malware prediction experiment, our model was able to predict malware attacks on Microsoft computers and achieved higher accuracy than any well-known AutoAI.
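To make the visual spoofing idea in the abstract above concrete, the following Python sketch shows why a look-alike Cyrillic letter changes a token and how mixed-script words could be flagged. The detection heuristic and the function names are assumptions made for illustration; the thesis does not state that it uses this check.

```python
import unicodedata

def scripts_in(word):
    """Return the script names (e.g. "LATIN", "CYRILLIC") of a word's letters.

    The script is read from the first word of each character's Unicode
    name, which is a simple heuristic rather than a full script lookup.
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])
    return scripts

def mixed_script_words(text):
    """List whitespace-separated words that mix Latin and Cyrillic letters."""
    return [w for w in text.split() if {"LATIN", "CYRILLIC"} <= scripts_in(w)]

# "\u0435" is CYRILLIC SMALL LETTER IE, which renders like a Latin "e".
spoofed = "win fr\u0435e cash"
flagged = mixed_script_words(spoofed)
```

Because the spoofed word is a different string from "free", a bag-of-words vocabulary built only from Latin-alphabet training data would treat it as an unseen token, which is consistent with the accuracy drop the abstract reports.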

ScholarShip

A Context-Dependent Supervised Learning Approach to Sentiment Detection in Large Textual Databases

Author: Gindl Stefan
Scharl Arno
Weichselbraun Albert
Publication venue: Brazilian Computer Society Special Interest Group on Databases
Publication date: 01/01/2010
Field of study

Sentiment detection automatically identifies emotions in textual data. The increasing amount of emotive documents available in corporate databases and on the World Wide Web calls for automated methods to process this important source of knowledge. Sentiment detection draws attention from researchers and practitioners alike, for example to enrich business intelligence applications or to measure the impact of customer reviews on purchasing decisions. Most sentiment detection approaches do not consider language ambiguity, despite the fact that one and the same sentiment term might differ in polarity depending on the context in which a statement is made. To address this shortcoming, this paper introduces a novel method that uses Naïve Bayes to identify ambiguous terms. A contextualized sentiment lexicon stores the polarity of these terms, together with a set of co-occurring context terms. A formal evaluation of the assigned polarities confirms that considering the usage context of ambiguous terms improves the accuracy of high-throughput sentiment detection methods. Such methods are a prerequisite for using sentiment as a metadata element in storage and distributed file-level intelligence applications, as well as in enterprise portals that provide a semantic repository of an organization's information assets.

CiteSeerX

webLyzard technology gmbh

Performance of Gaussian Naïve Bayes for classification with dependencies from Archemedian copula

Author: Winston Hugh E.
Publication venue
Publication date: 01/05/2022
Field of study

Master's Project (M.S.) University of Alaska Fairbanks, 2022
Naive Bayes is an application of Bayes' theorem in which the likelihood function is factored into marginals by making the assumption that the variables are independent. Naive Bayes is typically used for classification problems in which the goal is to find the class with the largest probability given the data on hand. When the data on hand are continuous real numbers, we can further assume they are class-conditionally normally distributed, which is a particular version of Naive Bayes called Gaussian Naive Bayes. This paper explores when Gaussian Naive Bayes classification works well and when it does not. Typically, when assumptions are not valid, valid conclusions cannot be drawn. However, Naive Bayes is known to be robust even when the independence assumption is not met. We show using simulations that the binary classification accuracy of Naive Bayes is much more sensitive to differences in the class-conditional marginal distributions than to the correlation between predictors. Additionally, we show that Naive Bayes completely fails when predictors are generated using a Gumbel copula, and we compare results with a general Bayes classifier and the K-Nearest Neighbors classifier.
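The Gaussian Naive Bayes rule described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's simulation code: the function names, the small variance floor, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

def fit(X, y):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small floor (an assumption) keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, row):
    """Pick the class maximizing log prior + sum of per-feature log densities.

    Summing per-feature terms is the independence assumption: the
    class-conditional likelihood is factored into its marginals.
    """
    def score(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(log_normal_pdf(x, m, v)
                               for x, m, v in zip(row, means, variances))
    return max(model, key=score)

# Hypothetical toy data: two well-separated classes with two features each.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = [0, 0, 0, 1, 1, 1]
model = fit(X, y)
label = predict(model, [1.1, 2.0])
```

The classifier only ever sees the marginal mean and variance of each feature per class, so any dependence between predictors, such as that induced by a copula, is invisible to it.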

ScholarWorks@UA

Machine Learning Algorithms for Smart Data Analysis in Internet of Things Environment: Taxonomies and Research Trends

Author: Anabi Hilary Kelechi
Khalid Yahya
Mohammed H. Alsharif
Shehzad Ashraf Chaudhry
Publication venue
Publication date: 01/01/2020
Field of study

Machine learning techniques will contribute towards making Internet of Things (IoT) symmetric applications among the most significant sources of new data in the future. In this context, network systems are endowed with the capacity to access varieties of experimental symmetric data across a plethora of network devices, study the data, obtain knowledge, and make informed decisions based on the dataset at their disposal. This study is limited to supervised and unsupervised machine learning (ML) techniques, regarded as the bedrock of IoT smart data analysis. It reviews and discusses substantial issues related to supervised and unsupervised machine learning techniques, highlights the advantages and limitations of each algorithm, and discusses research trends and recommendations for further study.

Covenant University Repository

Hate Speech Detection for Banjarese Languages on Instagram Using Machine Learning Methods

Author: Abdi Muhammad Nur
Abidin Ahmad Zainul
Alkaff Muhammad
Amalia Raisa
Fachrurrazi Muhammad
Miqdad Muhammad Afrizal
Publication venue: LPPM Universitas Bumigora
Publication date: 07/07/2023
Field of study

Hate speech refers to verbal expression or communication that aims to provoke or discriminate against individuals. The Ministry of Communication and Information of Indonesia encountered and dealt with 3,640 cases of hate speech transmitted through digital channels between 2018 and 2021. Particularly in South Kalimantan, hate speech in the local language, Banjarese, has become increasingly prevalent in recent years. Surprisingly, there is a lack of research on using machine learning to detect hate speech in the Banjarese language, specifically on Instagram. Therefore, this study aimed to address this gap by constructing a dataset of Banjarese language hate speech and comparing various feature extraction and machine learning models to detect Banjarese language hate speech effectively. The feature extraction methods used were Word N-Gram, Term Frequency-Inverse Document Frequency (TF-IDF), a combination of Word N-Gram and TF-IDF, Word2Vec, and GloVe, while the machine learning methods used were Support Vector Machine (SVM), Naïve Bayes, and Decision Tree. The results of this study revealed that the combination of TF-IDF for feature extraction and SVM as the model achieves exceptional performance. The average Recall, Precision, Accuracy, and F1-Score exceeded 90%, demonstrating the model's ability to identify Banjarese hate speech accurately.
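As a rough illustration of the Word N-Gram and TF-IDF feature extraction named in the abstract above, the Python sketch below computes one common TF-IDF variant (raw term count times log(N/df)). The study does not state which weighting it used, so the formula, the function names, and the placeholder documents are assumptions, and the SVM classification step is omitted.

```python
import math
from collections import Counter

def word_ngrams(text, n=1):
    """Split text on whitespace and return its word n-grams as strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n=1):
    """Return one {term: weight} dict per document.

    weight = (count of term in the document) * log(N / df), where N is the
    number of documents and df is how many documents contain the term.
    """
    counts_per_doc = [Counter(word_ngrams(doc, n)) for doc in docs]
    # Iterating a Counter yields each distinct term once per document.
    doc_freq = Counter(term for counts in counts_per_doc for term in counts)
    total = len(docs)
    return [
        {term: count * math.log(total / doc_freq[term])
         for term, count in counts.items()}
        for counts in counts_per_doc
    ]

# Placeholder English documents, not data from the study.
docs = ["spam spam offer", "hello offer", "hello friend"]
vectors = tfidf(docs)
bigram_vectors = tfidf(docs, n=2)
```

Terms that occur in every document receive weight zero under this variant, so the resulting vectors emphasize words that distinguish one document from the rest, which is the property a downstream classifier such as an SVM would rely on.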

Open Journal System (OJS) Universitas Bumigora