31 research outputs found

    Click Fraud Detection in Online and In-app Advertisements: A Learning Based Approach

    Click fraud is the fraudulent act of clicking on pay-per-click advertisements to increase a site's revenue, to drain revenue from the advertiser, or to inflate the popularity of content on social media platforms. In-app advertisements on mobile platforms are among the most common targets for click fraud, which makes companies hesitant to advertise their products. Fraudulent clicks are supposed to be caught by ad providers as part of their service to advertisers, which is commonly done using machine learning methods. However: (1) there is a lack of research in current literature addressing and evaluating the different techniques of click fraud detection and prevention, (2) threat models composed of active learning systems (smart attackers) can mislead the training process of the fraud detection model by polluting the training data, (3) current deep learning models have significant computational overhead, (4) training data is often imbalanced, and balancing it still results in noisy data that can train the classifier incorrectly, and (5) datasets with high dimensionality cause increased computational overhead and decreased classifier correctness; while existing feature selection techniques address this issue, they have their own performance limitations. By extending the state-of-the-art techniques in the field of machine learning, this dissertation provides the following solutions: (i) To address (1) and (2), we propose a hybrid deep-learning-based model which consists of an artificial neural network, auto-encoder and semi-supervised generative adversarial network. (ii) As a solution for (3), we present Cascaded Forest and Extreme Gradient Boosting with less hyperparameter tuning. (iii) To overcome (4), we propose a row-wise data reduction method, KSMOTE, which filters out noisy data samples both in the raw data and the synthetically generated samples. (iv) For (5), we propose different column-reduction methods such as multi-time-scale Time Series analysis for fraud forecasting, using binary labeled imbalanced datasets and hybrid filter-wrapper feature selection approaches.
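
The abstract does not describe how KSMOTE works, so the following is only a minimal sketch of the generic SMOTE idea it builds on: synthetic minority samples are created by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote_like` and its parameters are hypothetical, and the noise-filtering step that the abstract attributes to KSMOTE is not reproduced here.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (generic SMOTE idea,
    not the KSMOTE method from the dissertation)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy minority class: the four corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority_points, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority points, noisy minority samples propagate into the synthetic data, which is the problem item (4) of the abstract refers to.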

    Semi-Automatic Classification of Cementitious Materials using Scanning Electron Microscope Images

    Segmentation and classification are prolific research topics in the image processing community, and are increasingly used to analyse cementitious materials in images acquired with Scanning Electron Microscopes (SEM). Indeed, there is a need to detect and quantify the materials present in a cement paste in order to follow the chemical reactions occurring in the material even days after solidification. In this paper, we propose a new approach for segmentation and classification of cementitious materials based on denoising the data with the Block Matching 3D (BM3D) algorithm, Binary Partition Tree (BPT) segmentation, Support Vector Machine (SVM) classification, and interaction with the user. The BPT provides a hierarchical representation of the spatial regions of the data, allowing a segmentation to be selected among the admissible partitions of the image. SVMs are used to obtain a classification map of the image. This approach combines state-of-the-art image processing tools with user interaction to allow a better segmentation to be performed, or to help the classifier discriminate the classes better. We show that the proposed approach outperforms a previous method on synthetic data and on several real datasets from cement samples, both qualitatively through visual examination and quantitatively by comparing experimental results with theoretical ones.

    Cyber Security and Critical Infrastructures 2nd Volume

    The second volume of the book contains the manuscripts that were accepted for publication in the MDPI Special Topic "Cyber Security and Critical Infrastructure" after a rigorous peer-review process. Authors from academia, government and industry contributed their innovative solutions, consistent with the interdisciplinary nature of cybersecurity. The book contains 16 articles: an editorial that explains the current challenges, innovative solutions and real-world experiences involving critical infrastructure, and 15 original papers that present state-of-the-art innovative solutions to attacks on critical systems.

    Novel pattern recognition methods for classification and detection in remote sensing and power generation applications


    Finding the online cry for help: automatic text classification for suicide prevention

    Successful prevention of suicide, a serious public health concern worldwide, hinges on the adequate detection of suicide risk. While online platforms are increasingly used for expressing suicidal thoughts, manually monitoring for such signals of distress is practically infeasible, given the information overload suicide prevention workers are confronted with. In this thesis, the automatic detection of suicide-related messages is studied. It presents the first classification-based approach to online suicidality detection, and focuses on Dutch user-generated content. In order to evaluate the viability of such a machine learning approach, we developed a gold standard corpus, consisting of message board and blog posts. These were manually labeled according to a newly developed annotation scheme, grounded in suicide prevention practice. The scheme provides for the annotation of a post's relevance to suicide, and the subject and severity of a suicide threat, if any. This allowed us to derive two tasks: the detection of suicide-related posts, and of severe, high-risk content. In a series of experiments, we sought to determine how well these tasks can be carried out automatically, and which information sources and techniques contribute to classification performance. The experimental results show that both types of messages can be detected with high precision. Therefore, the amount of noise generated by the system is minimal, even on very large datasets, making it usable in a real-world prevention setting. Recall is high for the relevance task, but at around 60%, it is considerably lower for severity. This is mainly attributable to implicit references to suicide, which often go undetected. We found a variety of information sources to be informative for both tasks, including token and character ngram bags-of-words, features based on LSA topic models, polarity lexicons and named entity recognition, and suicide-related terms extracted from a background corpus. 
To improve classification performance, the models were optimized using feature selection, hyperparameter optimization, or a combination of both. A distributed genetic algorithm approach proved successful in finding good solutions for this complex search problem, and resulted in more robust models. Experiments with cascaded classification of the severity task did not reveal performance benefits over direct classification (in terms of F1-score), but its structure allows the use of slower, memory-based learning algorithms that considerably improved recall. At the end of this thesis, we address a problem typical of user-generated content: noise in the form of misspellings, phonetic transcriptions and other deviations from the linguistic norm. We developed an automatic text normalization system, using a cascaded statistical machine translation approach, and applied it to normalize the data for the suicidality detection tasks. Subsequent experiments revealed that, compared to the original data, normalized data resulted in fewer and more informative features, and improved classification performance. This extrinsic evaluation demonstrates the utility of automatic normalization for suicidality detection, and more generally, text classification on user-generated content.
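
The abstract names token and character n-gram bags-of-words among its information sources. As a rough illustration of what such features look like, the sketch below counts both kinds of n-grams for a single text. The helper names (`token_ngrams`, `char_ngrams`, `bag_of_ngrams`), the whitespace tokenisation, and the chosen n-gram sizes are assumptions made for this example, not details taken from the thesis.

```python
from collections import Counter

def token_ngrams(text, n):
    """Return word n-grams using simple lowercase whitespace tokenisation."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n):
    """Return overlapping character n-grams, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def bag_of_ngrams(text, token_sizes=(1, 2), char_sizes=(3,)):
    """Count token and character n-grams; keys are tagged by feature type."""
    bag = Counter()
    for n in token_sizes:
        bag.update(("tok", gram) for gram in token_ngrams(text, n))
    for n in char_sizes:
        bag.update(("chr", gram) for gram in char_ngrams(text, n))
    return bag

features = bag_of_ngrams("help me help")
```

Character n-grams still overlap between a word and its misspelled variants, which is one reason they are commonly used on noisy user-generated text.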

    Automated CTC Classification, Enumeration and Pheno Typing: Where Math meets Biology


    Clinical decision support system for early detection and diagnosis of dementia

    Dementia is a syndrome caused by a chronic or progressive disease of the brain, which affects memory, orientation, thinking, calculation, learning ability and language. Until recently, early diagnosis of dementia was not a high priority, since the related diseases were considered untreatable and irreversible. However, more effective treatments are becoming available, which can slow the progress of dementia if they are used in the early stages of the disease. Therefore, early diagnosis is becoming more important. The Clock Drawing Test (CDT) and Mini Mental State Examination (MMSE) are well-known cognitive assessment tests. A known obstacle to the wider usage of the CDT assessments is the scoring and interpretation of the results. This thesis introduces a novel diagnostic Clinical Decision Support System (CDSS) based on CDT which can help in the diagnosis of three stages of dementia. It also introduces the advanced methods developed for the interpretation and analysis of CDTs. The data used in this research consist of 604 clock drawings produced by dementia patients and healthy individuals. A comprehensive catalogue of 47 visual features within CDT drawings is proposed to enhance the sensitivity of the CDT in diagnosing the early stages of dementia. These features are selected following a comprehensive analysis of the available data and the most common CDT scoring systems reported in the medical literature. These features are used to build a new digitised dataset necessary for training and validating the proposed CDSS. In this thesis, a novel feature selection method is proposed for the study of CDT feature significance and to define the most important features in diagnosing dementia. A new framework is also introduced to analyse the temporal changes in the CDT features corresponding to the progress of dementia over time, and to define the first onset symptoms.
The proposed CDSS is designed to differentiate between four cognitive function statuses: (i) normal; (ii) mild cognitive impairment or mild dementia; (iii) moderate or severe dementia; and (iv) functional. This represents a new application of the CDT, as it was previously used only to detect the positive dementia cases. Diagnosing mild cognitive impairment or early stage dementia using CDT as a standalone tool is a very challenging task. To address this, a novel cascade classifier is proposed, which benefits from combining CDT and MMSE to enhance the overall performance of the system. The proposed CDSS diagnoses the CDT drawings and places them into one of three cognitive statuses (normal or functional, mild cognitive impairment or mild dementia, and moderate or severe dementia) with an accuracy of 78.34 %. Moreover, the proposed CDSS can distinguish between the normal and the abnormal cases with an accuracy of 89.54 %. The achieved results are good and outperform most CDT scoring systems in discriminating between normal and abnormal cases as reported in existing literature. Moreover, the system shows a good performance in diagnosing the CDT drawings into one of the three cognitive statuses, even comparing well with the performance of dementia specialists. The research has been granted ethical approval from the South East Wales Research Ethics Committee to employ anonymised copies of clock drawings and copies of Mini Mental State Examination made by patients during their examination by the memory team in Llandough Hospital, Cardiff.
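
The abstract does not specify how the cascade classifier is built, so the following is only a structural sketch of a two-stage cascade under assumed interfaces: a first stage separates normal from abnormal cases, and only abnormal cases reach a second stage that grades severity. The function `cascade_classify`, the feature keys `cdt_score` and `mmse_score`, and the toy threshold rules are all hypothetical placeholders with no clinical meaning; in practice each stage would be a trained model.

```python
from typing import Callable, Dict

Sample = Dict[str, float]

def cascade_classify(
    sample: Sample,
    is_abnormal: Callable[[Sample], bool],
    grade_severity: Callable[[Sample], str],
) -> str:
    """Two-stage cascade: stage one screens, stage two grades abnormal cases."""
    if not is_abnormal(sample):
        return "normal or functional"
    return grade_severity(sample)

# Placeholder stages on made-up scores normalised to [0, 1]; not clinical rules.
def toy_stage_one(sample: Sample) -> bool:
    return sample["cdt_score"] < 0.5

def toy_stage_two(sample: Sample) -> str:
    if sample["mmse_score"] < 0.5:
        return "moderate or severe dementia"
    return "mild cognitive impairment or mild dementia"

label_a = cascade_classify({"cdt_score": 0.9, "mmse_score": 0.9}, toy_stage_one, toy_stage_two)
label_b = cascade_classify({"cdt_score": 0.3, "mmse_score": 0.8}, toy_stage_one, toy_stage_two)
label_c = cascade_classify({"cdt_score": 0.3, "mmse_score": 0.2}, toy_stage_one, toy_stage_two)
```

Passing the stages in as callables keeps the cascade logic separate from the models, so a second stage could draw on an additional information source without changing the first.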