6,029 research outputs found
Exposing and explaining fake news on-the-fly
Social media platforms enable the rapid dissemination and consumption of information. However, users instantly consume such content regardless of the reliability of the shared data. Consequently, the latter crowdsourcing model is exposed to manipulation. This work contributes with an explainable and online classification method to recognize fake news in real-time. The proposed method combines both unsupervised and supervised Machine Learning approaches with online created lexica. The profiling is built using creator-, content- and context-based features using Natural Language Processing techniques. The explainable classification mechanism displays in a dashboard the features selected for classification and the prediction confidence. The performance of the proposed solution has been validated with real data sets from Twitter and the results attain 80% accuracy and macro F-measure. This proposal is the first to jointly provide data stream processing, profiling, classification and explainability. Ultimately, the proposed early detection, isolation and explanation of fake news contribute to increase the quality and trustworthiness of social media contentsXunta de Galicia | Ref. ED481B-2021-118Xunta de Galicia | Ref. ED481B-2022-093Fundação para a Ciência e a Tecnologia | Ref. UIDB/50014/2020Universidade de Vigo/CISU
Recommended from our members
Towards real-time feature tracking technique using adaptive micro-clusters
Data streams are unbounded, sequential data instances that are generated with high velocity. Classifying sequential data instances is a very challenging problem in machine learning with applications in network intrusion detection, financial markets and sensor networks. Data stream classification is concerned with the automatic labelling of unseen instances from the stream in real-time. For this the classifier needs to adapt to concept drifts and can only have a single pass through the data if the stream is fast. This research paper presents our work on a real-time pre-processing technique, in particular a feature tracking technique that takes concept drift into consideration. The feature tracking technique is designed to improve Data Stream Mining (DSM) classification algorithms by enabling real-time feature selection. The technique is based on adaptive summaries of the data and class distributions, known as Micro-Clusters. Currently the technique is able to detect concept drift and identifies which features have been involved
Process-Oriented Stream Classification Pipeline:A Literature Review
Featured Application: Nowadays, many applications and disciplines work on the basis of stream data. Common examples are the IoT sector (e.g., sensor data analysis), or video, image, and text analysis applications (e.g., in social media analytics or astronomy). With our work, we gather different approaches and terminology, and give a broad overview over the topic. Our main target groups are practitioners and newcomers to the field of data stream classification. Due to the rise of continuous data-generating applications, analyzing data streams has gained increasing attention over the past decades. A core research area in stream data is stream classification, which categorizes or detects data points within an evolving stream of observations. Areas of stream classification are diverse—ranging, e.g., from monitoring sensor data to analyzing a wide range of (social) media applications. Research in stream classification is related to developing methods that adapt to the changing and potentially volatile data stream. It focuses on individual aspects of the stream classification pipeline, e.g., designing suitable algorithm architectures, an efficient train and test procedure, or detecting so-called concept drifts. As a result of the many different research questions and strands, the field is challenging to grasp, especially for beginners. This survey explores, summarizes, and categorizes work within the domain of stream classification and identifies core research threads over the past few years. It is structured based on the stream classification process to facilitate coordination within this complex topic, including common application scenarios and benchmarking data sets. Thus, both newcomers to the field and experts who want to widen their scope can gain (additional) insight into this research area and find starting points and pointers to more in-depth literature on specific issues and research directions in the field.</p
On-device modeling of user's social context and familiar places from smartphone-embedded sensor data
Context modeling and recognition are crucial for adaptive mobile and
ubiquitous computing. Context-awareness in mobile environments relies on prompt
reactions to context changes. However, current solutions focus on limited
context information processed on centralized architectures, risking privacy
leakage and lacking personalization. On-device context modeling and recognition
are emerging research trends, addressing these concerns. Social interactions
and visited locations play significant roles in characterizing daily life
scenarios. This paper proposes an unsupervised and lightweight approach to
model the user's social context and locations directly on the mobile device.
Leveraging the ego-network model, the system extracts high-level, semantic-rich
context features from smartphone-embedded sensor data. For the social context,
the approach utilizes data on physical and cyber social interactions among
users and their devices. Regarding location, it prioritizes modeling the
familiarity degree of specific locations over raw location data, such as GPS
coordinates and proximity devices. The effectiveness of the proposed approach
is demonstrated through three sets of experiments, employing five real-world
datasets. These experiments evaluate the structure of social and location ego
networks, provide a semantic evaluation of the proposed models, and assess
mobile computing performance. Finally, the relevance of the extracted features
is showcased by the improved performance of three machine learning models in
recognizing daily-life situations. Compared to using only features related to
physical context, the proposed approach achieves a 3% improvement in AUROC, 9%
in Precision, and 5% in Recall
Machine Learning for Financial Prediction Under Regime Change Using Technical Analysis: A Systematic Review
Recent crises, recessions and bubbles have stressed the non-stationary nature and the presence of drastic structural changes in the financial domain. The most recent literature suggests the use of conventional machine learning and statistical approaches in this context. Unfortunately, several of these techniques are unable or slow to adapt to changes in the price-generation process. This study aims to survey the relevant literature on Machine Learning for financial prediction under regime change employing a systematic approach.
It reviews key papers with a special emphasis on technical analysis. The study discusses the growing number of contributions that are bridging the gap between two separate communities, one focused on data stream learning and the other on economic research. However, it also makes apparent that we are still in an early stage. The range of machine learning algorithms that have been tested in this domain is very wide, but the results of the study do not suggest that currently there is a specific technique that is clearly dominant
Denial of Service in Web-Domains: Building Defenses Against Next-Generation Attack Behavior
The existing state-of-the-art in the field of application layer Distributed Denial of Service (DDoS) protection is generally designed, and thus effective, only for static web domains. To the best of our knowledge, our work is the first that studies the problem of application layer DDoS defense in web domains of dynamic content and organization, and for next-generation bot behaviour. In the first part of this thesis, we focus on the following research tasks: 1) we identify the main weaknesses of the existing application-layer anti-DDoS solutions as proposed in research literature and in the industry, 2) we obtain a comprehensive picture of the current-day as well as the next-generation application-layer attack behaviour and 3) we propose novel techniques, based on a multidisciplinary approach that combines offline machine learning algorithms and statistical analysis, for detection of suspicious web visitors in static web domains. Then, in the second part of the thesis, we propose and evaluate a novel anti-DDoS system that detects a broad range of application-layer DDoS attacks, both in static and dynamic web domains, through the use of advanced techniques of data mining. The key advantage of our system relative to other systems that resort to the use of challenge-response tests (such as CAPTCHAs) in combating malicious bots is that our system minimizes the number of these tests that are presented to valid human visitors while succeeding in preventing most malicious attackers from accessing the web site. The results of the experimental evaluation of the proposed system demonstrate effective detection of current and future variants of application layer DDoS attacks
Recommended from our members
Data stream mining of event and complex event streams: a survey of existing and future technologies and applications in big data
A survey on machine learning for recurring concept drifting data streams
The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationary as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the relevant literature to deal with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent works on passive or active mechanisms to adapt or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and non-supervised approaches such as online ensembles, meta-learning and model-based clustering that can be used to deal with seasonalities in a data stream are covered. The aim is to point out new research trends and give future research directions on the usage of machine learning techniques for data streams which can help in the event of shifts and recurrences in continuous learning scenarios in near real-time
Exploiting Emergence of New Topics via Anamoly Detection: A Survey
Detecting and generating new concepts has attracted much attention in data mining era, nowadays. The emergence of new topics in news data is a big challenge. The problem can be extended as “finding breaking news”. Years ago the emergence of new stories were detected and followed up by domain experts. But manually reading stories and concluding the misbehaviors is a critical and time consuming task. Further mapping these misbehaviors to various stories needs excellent knowledge about the news and old concepts. So automatically modeling breaking news has much interest in data mining. The anomalies in news published in newspapers are the basic clues for concluding the emergence of a new story(s). The anomalies are the keywords or phrases which doesn’t match the whole concept of the news. These anomalies then processed and mapped to the stories where these keywords and phrases doesn’t behave as anomalies. After mapping these anomalies one can conclude that these mapped topic by anomaly linking can generate a new concept which eventually can be modeled as emerging story. We survey some techniques which can be used to efficiently model the new concept. News Classification, Anomaly Detection, Concept Detection and Generation are some of those techniques which collectively can be the basics of modeling breaking news. We further discussed some data sources which can process and used as input stories or news for modeling emergence of new stories
- …