21,092 research outputs found
Graph ensemble boosting for imbalanced noisy graph stream classification
© 2014 IEEE. Many applications involve stream data with structural dependency, graph representations, and continuously increasing volumes. For these applications, it is very common that their class distributions are imbalanced with minority (or positive) samples being only a small portion of the population, which imposes significant challenges for learning models to accurately identify minority samples. This problem is further complicated with the presence of noise, because they are similar to minority samples and any treatment for the class imbalance may falsely focus on the noise and result in deterioration of accuracy. In this paper, we propose a classification model to tackle imbalanced graph streams with noise. Our method, graph ensemble boosting, employs an ensemble-based framework to partition graph stream into chunks each containing a number of noisy graphs with imbalanced class distributions. For each individual chunk, we propose a boosting algorithm to combine discriminative subgraph pattern selection and model learning as a unified framework for graph classification. To tackle concept drifting in graph streams, an instance level weighting mechanism is used to dynamically adjust the instance weight, through which the boosting framework can emphasize on difficult graph samples. The classifiers built from different graph chunks form an ensemble for graph stream classification. Experiments on real-life imbalanced graph streams demonstrate clear benefits of our boosting design for handling imbalanced noisy graph stream
Extracting News Events from Microblogs
Twitter stream has become a large source of information for many people, but
the magnitude of tweets and the noisy nature of its content have made
harvesting the knowledge from Twitter a challenging task for researchers for a
long time. Aiming at overcoming some of the main challenges of extracting the
hidden information from tweet streams, this work proposes a new approach for
real-time detection of news events from the Twitter stream. We divide our
approach into three steps. The first step is to use a neural network or deep
learning to detect news-relevant tweets from the stream. The second step is to
apply a novel streaming data clustering algorithm to the detected news tweets
to form news events. The third and final step is to rank the detected events
based on the size of the event clusters and growth speed of the tweet
frequencies. We evaluate the proposed system on a large, publicly available
corpus of annotated news events from Twitter. As part of the evaluation, we
compare our approach with a related state-of-the-art solution. Overall, our
experiments and user-based evaluation show that our approach on detecting
current (real) news events delivers a state-of-the-art performance
- …