    Artificial and Natural Topic Detection in Online Social Networks

    Online Social Networks (OSNs), such as Twitter, offer attractive means of social interaction and communication, but they also raise privacy and security issues. OSNs provide information valuable for marketing and competitive analysis, based on users' posts and opinions stored in a huge volume of data covering many themes, topics, and subjects. To mine the topics discussed on an OSN, we present a novel application of the Louvain method to topic modeling, based on community detection in graphs by modularity. The proposed approach found topics in five different datasets composed of textual content from Twitter and YouTube. Another important contribution concerns texts posted by spammers: a particular behavior observed in the structure of the graph communities (density and degree) indicates a topic's strength and allows the topic to be classified as natural or artificial, the latter being created by spammers on OSNs.
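The Louvain method repeatedly moves nodes between communities, and then aggregates those communities, so as to maximize modularity. The following minimal Python sketch is not the authors' implementation. It computes only the modularity score that Louvain optimizes, together with the community density that the abstract links to topic strength, on a hypothetical word co-occurrence graph. The full node-moving and aggregation phases of Louvain are omitted. The graph, the word nodes, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where m is the
    total number of edges, L_c the number of edges inside c, and d_c the sum
    of the degrees of the nodes in c. Every node must belong to a community.
    """
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Map each node to the index of its community.
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    internal = defaultdict(int)
    for u, v in edges:
        if label[u] == label[v]:
            internal[label[u]] += 1
    q = 0.0
    for i, comm in enumerate(communities):
        d_c = sum(degree[n] for n in comm)
        q += internal[i] / m - (d_c / (2 * m)) ** 2
    return q


def density(edges, community):
    """Fraction of the possible node pairs inside `community` that are linked."""
    nodes = set(community)
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return 2 * inside / (n * (n - 1))


# Hypothetical co-occurrence graph: two tight word groups joined by one edge.
edges = [("goal", "match"), ("match", "team"), ("goal", "team"),
         ("sale", "promo"), ("promo", "offer"), ("sale", "offer"),
         ("team", "sale")]
topics = [{"goal", "match", "team"}, {"sale", "promo", "offer"}]
print(round(modularity(edges, topics), 3))  # 0.357
print(density(edges, topics[0]))            # 1.0
```

A partition that separates the two word groups scores higher than lumping every word into one community, which scores exactly 0. Louvain's search exploits this signal.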

    Recognition on Online Social Network by user's writing style

    Compromising legitimate accounts is the most popular way of disseminating fraudulent content in Online Social Networks (OSNs). To address this issue, we propose an approach for recognizing compromised Twitter accounts based on Authorship Verification. Our solution detects accounts that have been compromised by analysing their users' writing styles: when an account's content does not match its user's writing style, we conclude that the account has been compromised, analogously to Authorship Verification. Our approach follows the profile-based paradigm and uses N-grams as its kernel. A threshold is then determined to represent the boundary of an account's writing style. Experiments were performed using two subsampled datasets from Twitter. Experimental results showed that the developed model is well suited to recognizing compromised OSN accounts, as it recognizes user styles with over 95% accuracy on both datasets.
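The pipeline outlined in the abstract has three parts: a per-account N-gram profile, a similarity score against that profile, and a decision threshold. It can be sketched roughly as follows. This is a hypothetical Python illustration, not the paper's method. Character trigrams, cosine similarity, and the fixed 0.3 threshold are all assumptions; the paper finds its own threshold per account.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Counts of overlapping character n-grams in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_profile(posts, n=3):
    """Profile-based paradigm: merge all known posts of a user into one profile."""
    profile = Counter()
    for post in posts:
        profile.update(char_ngrams(post, n))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[g] for g, count in a.items())  # missing keys count as 0
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def looks_compromised(profile, new_post, threshold=0.3, n=3):
    """Flag a post whose style falls outside the account's boundary.

    `threshold` is a hypothetical placeholder, not a value from the paper.
    """
    return cosine(profile, char_ngrams(new_post, n)) < threshold


history = [
    "just finished my morning run, feeling great",
    "morning coffee and a long run by the river",
]
profile = build_profile(history)
spam = "CLICK HERE 2 WIN FREE $$$ >>> bit.ly/xyz"
print(round(cosine(profile, char_ngrams("great run this morning by the river")), 2))
print(looks_compromised(profile, spam))
```

Pooling all posts into one profile, rather than comparing against each post separately, is what "profile-based" refers to. It smooths out the shortness of individual tweets.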

    Tailoring Machine Learning for Process Mining

    Machine learning models are routinely integrated into process mining pipelines to carry out tasks such as data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on ad-hoc assumptions about the corresponding data distributions, which do not necessarily match the non-parametric distributions typically observed in process data. Moreover, the learning procedure these models follow ignores the constraints that concurrency imposes on process data. Data encoding is a key element in smoothing the mismatch between these assumptions and the data, but its potential is poorly exploited. In this paper, we argue that a deeper insight into the issues raised by training machine learning models with process data is crucial to ground a sound integration of process mining and machine learning. Our analysis of these issues aims to lay the foundation for a methodology that correctly aligns machine learning with process mining requirements, and to stimulate research in this direction. Comment: 16 pages
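The following hypothetical Python example, not taken from the paper, illustrates how the choice of encoding interacts with concurrency. It encodes two traces that differ only in the order of two concurrent activities. The activity names and both encoders are illustrative assumptions. Neither encoding is presented as the paper's proposal.

```python
from collections import Counter

# Hypothetical event log: activities "b" and "c" run concurrently, so both
# interleavings are valid executions of the same process behavior.
trace_1 = ["a", "b", "c", "d"]
trace_2 = ["a", "c", "b", "d"]


def frequency_encoding(trace, activities):
    """Bag-of-activities encoding: occurrences of each activity (order lost)."""
    counts = Counter(trace)
    return [counts[act] for act in activities]


def index_encoding(trace):
    """Index-based encoding: the activity at each position (order kept)."""
    return list(trace)


activities = sorted(set(trace_1) | set(trace_2))
# The frequency encoding maps both interleavings to the same vector ...
print(frequency_encoding(trace_1, activities), frequency_encoding(trace_2, activities))
# ... while the index encoding presents them to a learner as different inputs.
print(index_encoding(trace_1) == index_encoding(trace_2))
```

The frequency encoding hides the interleaving but also discards every other ordering relation. The index encoding keeps order but forces a model to learn, from data alone, that the two traces are equivalent.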

    Comparing Concept Drift Detection with Process Mining Software

    Organisations have seen a rise in the volume of recorded data corresponding to business processes. Handling process data is a meaningful way to extract relevant information from business processes that has an impact on the company's value. Nonetheless, business processes are subject to change during their execution, which adds complexity to their analysis. This paper evaluates currently available process mining tools and software that handle concept drift, i.e., changes over time in the statistical properties of the events occurring in a process. We provide an in-depth analysis of these tools, comparing their differences, advantages, and disadvantages by testing them against a log taken from a Process Control System. By highlighting the trade-offs between these tools, the paper helps stakeholders choose the best option for their use case.
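Concept drift is described above as "changes over time in the statistical properties of the events". The following Python sketch makes that concrete. It compares activity-frequency distributions in adjacent sliding windows of traces and flags a drift when their total variation distance exceeds a threshold. This is a hypothetical baseline for illustration only, not the method of any tool evaluated in the paper. The window size, the threshold, and the toy log are assumed values.

```python
from collections import Counter


def activity_distribution(traces):
    """Relative frequency of each activity over a list of traces."""
    counts = Counter(act for trace in traces for act in trace)
    total = sum(counts.values())
    return {act: c / total for act, c in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical, 1 for disjoint distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def detect_drifts(log, window=3, threshold=0.3):
    """Trace indices where the `window` traces after the index differ from the
    `window` traces before it by more than `threshold` (hypothetical values)."""
    drifts = []
    for i in range(window, len(log) - window + 1):
        before = activity_distribution(log[i - window:i])
        after = activity_distribution(log[i:i + window])
        if total_variation(before, after) > threshold:
            drifts.append(i)
    return drifts


# Toy log: activity "c" is replaced by "d" from trace 6 onwards.
log = [["a", "b", "c"]] * 6 + [["a", "b", "d"]] * 6
print(detect_drifts(log))  # [6]
```

Lowering the threshold also flags the windows that partially overlap the change point, which is one of the trade-offs (sensitivity versus localization) that drift detectors have to tune.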

    Detecting and mitigating adversarial examples in regression tasks: A photovoltaic power generation forecasting case study

    With data collected by Internet of Things sensors, deep learning (DL) models can forecast the generation capacity of photovoltaic (PV) power plants. This functionality is especially relevant for PV power operators and users, as PV plants exhibit irregular behavior related to environmental conditions. However, DL models are vulnerable to adversarial examples, which may lead to increased prediction error and wrong operational decisions. This work proposes a new scheme to detect adversarial examples and mitigate their impact on DL forecasting models. The approach is based on one-class classifiers and on features extracted from the data input to the forecasting models. Tests were performed using data collected from a real-world PV power plant, along with adversarial samples generated by the Fast Gradient Sign Method under multiple attack patterns and magnitudes. One-class Support Vector Machine and Local Outlier Factor were evaluated as detectors of attacks on Long Short-Term Memory and Temporal Convolutional Network forecasting models. According to the results, the proposed scheme detected adversarial samples with an average F1-score close to 90%. Moreover, the detection and mitigation approach strongly reduced the increase in prediction error caused by adversarial samples.
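The attack used in the experiments, the Fast Gradient Sign Method (FGSM), shifts each input feature by a fixed magnitude ε in the direction of the sign of the loss gradient. The Python sketch below illustrates this under a loud simplification: a linear regressor stands in for the LSTM and TCN forecasters so that the input gradient of the squared error is analytic. The weights, inputs, and ε are hypothetical, and the one-class detectors (One-class SVM, LOF) are not reproduced here.

```python
def predict(w, b, x):
    """Stand-in linear forecaster: y_hat = w . x + b (not the paper's DL models)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    """Sign of v as -1, 0, or 1."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y_true, eps):
    """Fast Gradient Sign Method against the squared error (y_hat - y)^2.

    For a linear model the gradient w.r.t. the input is 2 * (y_hat - y) * w,
    so every feature moves by eps in the direction that increases the loss.
    """
    residual = predict(w, b, x) - y_true
    grad = [2 * residual * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


w, b = [0.5, -1.0], 0.0          # hypothetical model parameters
x, y_true = [1.0, 2.0], 0.0      # hypothetical input window and target
x_adv = fgsm(w, b, x, y_true, eps=0.1)
print(abs(predict(w, b, x) - y_true), abs(predict(w, b, x_adv) - y_true))
```

Because every feature moves by exactly ε, FGSM perturbations are small per feature yet systematically aligned with the loss. That systematic alignment is one reason why features computed over the model's inputs can expose them to a detector trained only on clean data.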

    Discovering Attackers Past Behavior to Generate Online Hyper-Alerts

    To support information security, organizations deploy Intrusion Detection Systems (IDSs) that monitor information systems and networks, generating an alert for every suspicious behavior. However, the huge number of alerts an IDS triggers and their low-level representation make alert analysis a challenging task. In this paper, we propose a new approach based on hierarchical clustering that supports intrusion alert analysis in two main steps. First, it correlates historical alerts to identify the strategies attackers have used most often. Then, it associates upcoming alerts in real time according to the strategies discovered in the first step. The experiments were performed using a real dataset from the University of Maryland. The results showed that the proposed approach could properly identify attack strategy patterns from historical alerts and organize upcoming alerts into a smaller number of meaningful hyper-alerts.
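The two steps can be sketched roughly in Python: an offline agglomerative (hierarchical) clustering of historical alerts, followed by online assignment of each new alert to its nearest cluster (hyper-alert). This is a hypothetical sketch, not the paper's algorithm. The alert attributes (`sig`, `port`, `src`), the Hamming-style distance, single linkage, and the 0.5 cut-off are all assumptions.

```python
def alert_distance(a, b):
    """Fraction of attributes on which two alerts disagree (Hamming-style)."""
    keys = set(a) | set(b)
    return sum(a.get(k) != b.get(k) for k in keys) / len(keys)


def cluster_distance(c1, c2):
    """Single linkage: distance between the closest pair of alerts."""
    return min(alert_distance(a, b) for a in c1 for b in c2)


def agglomerate(alerts, max_distance=0.5):
    """Offline step: bottom-up merging until the closest pair of clusters is
    farther apart than `max_distance` (hypothetical cut-off)."""
    clusters = [[a] for a in alerts]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def assign_online(alert, clusters, max_distance=0.5):
    """Online step: index of the nearest historical cluster, or None if the
    alert matches no known strategy."""
    best = min(range(len(clusters)), key=lambda i: cluster_distance([alert], clusters[i]))
    return best if cluster_distance([alert], clusters[best]) <= max_distance else None


history = [
    {"sig": "port-scan", "port": 22, "src": "10.0.0.5"},
    {"sig": "ssh-bruteforce", "port": 22, "src": "10.0.0.5"},
    {"sig": "sql-injection", "port": 80, "src": "10.0.0.9"},
    {"sig": "xss", "port": 80, "src": "10.0.0.9"},
]
clusters = agglomerate(history)
new_ssh = {"sig": "ssh-bruteforce", "port": 22, "src": "10.0.0.7"}
new_dns = {"sig": "dns-tunnel", "port": 53, "src": "10.0.0.1"}
print(len(clusters), assign_online(new_ssh, clusters), assign_online(new_dns, clusters))
```

The quadratic pairwise search is fine for a sketch but is the part a real deployment would replace, since only the online step has to run at alert rate.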

    Robust computer vision system for marbling meat segmentation

    In this study, we developed a robust, automatic computer vision system for segmenting marbling in meat. Our approach can segment intramuscular fat in various marbled meat samples using images acquired with devices of different quality in an uncontrolled environment with both external ambient light and artificial light. Professionals can therefore apply the method without specialized knowledge of sample treatment or equipment and without disrupting normal procedures, which makes the solution robust. The proposed approach for marbling segmentation is based on data clustering and dynamic thresholding. Experiments were performed using two datasets comprising 82 images of 41 longissimus dorsi muscles acquired by different sampling devices. The experimental results showed that the computer vision system performed well, with over 98% accuracy and a low number of false positives, regardless of the acquisition device employed.
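The abstract does not detail the dynamic-thresholding step, so the Python sketch below uses Otsu's method as a swapped-in stand-in. Otsu's method is a standard image-adaptive threshold whose criterion is equivalent to splitting gray levels into two clusters with minimal within-class variance, which loosely mirrors the clustering-plus-thresholding idea. The sketch works on a flat list of 8-bit grayscale values and assumes that fat appears brighter than lean muscle. The pixel values are made up.

```python
def otsu_threshold(pixels):
    """Otsu's method: the gray level t (0-255) maximizing a quantity proportional
    to the between-class variance of {p <= t} and {p > t}."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    weight_bg, sum_bg = 0, 0  # pixel count and level sum for levels <= t
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def segment_fat(pixels):
    """Label pixels brighter than the image's own threshold as fat (assumption)."""
    t = otsu_threshold(pixels)
    return [p > t for p in pixels]


pixels = [10, 11, 12, 200, 205, 210]  # hypothetical lean (dark) and fat (bright) values
print(otsu_threshold(pixels), segment_fat(pixels))
```

Because the threshold is recomputed from each image's own histogram, a global brightness shift caused by a different camera or lighting moves the threshold with it. This is the property that makes such a step attractive for uncontrolled acquisition.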

    Recognition of Compromised Accounts on Twitter

    In this work, we propose an approach for recognizing compromised Twitter accounts based on Authorship Verification. Our solution detects accounts that have been compromised by analysing their users' writing styles: when an account's content does not match its user's writing style, we conclude that the account has been compromised, analogously to Authorship Verification. Our approach follows the profile-based paradigm and uses N-grams as its kernel. A threshold is then determined to represent the boundary of an account's writing style. Experiments were performed using a subsampled dataset from Twitter. Experimental results showed that the developed model is well suited to recognizing compromised Online Social Network accounts, as it recognizes user styles with over 95% accuracy.