A GRAPH-BASED APPROACH TO MODEL MANAGEMENT
A graph-based framework for model management system design is proposed in this paper. The framework applies graph theory to the development of a knowledge-based model management system, which has the capability of integrating existing models in the model base to support ad hoc decision making. In other words, models in the model base are not only stand-alone models but also building blocks for creating integrated models. This guarantees effective utilization of developed models and promises future development of an automated modeling system. In the framework, nodes and edges are used to represent sets of data attributes and sets of functions for converting a set of data from one format to another, respectively. A basic model is defined as a combination of two nodes, one input node and one output node, and an edge connecting the two nodes. A model graph, which is composed of basic models, is a graph representing all possible alternatives for producing the requested information. Each path in a model graph is a model for producing the information. If the path includes more than one basic model, it represents an integrated model. Based on the graphical representation, an inference mechanism for model integration and strategies for model selection are presented.
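As a rough illustration of the idea that each path in a model graph is a model, the sketch below enumerates all simple paths between an input node and an output node. It is not the paper's inference mechanism; the function name `enumerate_models` and the three basic models in the sample model base are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def enumerate_models(basic_models, source, target):
    """Return every simple path of basic models leading from source to target.

    Each basic model is a tuple (name, input_node, output_node), i.e. one
    edge connecting an input node to an output node.
    """
    outgoing = defaultdict(list)
    for name, input_node, output_node in basic_models:
        outgoing[input_node].append((name, output_node))

    paths = []

    def extend(node, visited, path):
        if node == target:
            paths.append(list(path))
            return
        for name, next_node in outgoing[node]:
            if next_node in visited:
                continue  # skip cycles so that every path is simple
            visited.add(next_node)
            path.append(name)
            extend(next_node, visited, path)
            path.pop()
            visited.remove(next_node)

    extend(source, {source}, [])
    return paths


# Hypothetical model base: node names are sets of data attributes.
basic_models = [
    ("forecast_demand", "sales_history", "demand"),
    ("plan_production", "demand", "production_plan"),
    ("direct_plan", "sales_history", "production_plan"),
]

models = enumerate_models(basic_models, "sales_history", "production_plan")
for model in models:
    kind = "integrated" if len(model) > 1 else "basic"
    print(kind, model)
```

A path with more than one edge corresponds to an integrated model; a model selection strategy would then choose among the returned alternatives.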
Adaptive image retrieval using a graph model for semantic feature integration
The variety of features available to represent multimedia data constitutes a rich pool of information. However, the plethora of data poses a challenge in terms of feature selection and integration for effective retrieval. Moreover, to further improve effectiveness, the
retrieval model should ideally incorporate context-dependent feature representations to allow for retrieval on a higher semantic level. In this paper we present a retrieval model and learning framework for the purpose of interactive information retrieval. We describe
how semantic relations between multimedia objects based on user interaction can be learnt and then integrated with visual and textual features into a unified framework. The framework models both feature similarities and semantic relations in a single graph. Querying in this model is implemented using the theory of random walks. In addition, we present ideas to implement short-term learning from relevance feedback. Systematic experimental results validate the effectiveness of the proposed approach for image retrieval. However, the model is not restricted to the image domain and could easily be employed for retrieving multimedia data (and even a combination of different domains, e.g., images, audio and text documents).
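The abstract does not say which random-walk formulation is used for querying. The sketch below assumes one common variant, a random walk with restart on a weighted graph, purely to show how a single graph can rank objects relative to a query node. The function name, the restart probability of 0.15, and the three image nodes with their edge weights are all assumptions made for illustration.

```python
def random_walk_with_restart(weights, query, restart=0.15, iterations=100):
    """Score every node by its visiting probability for a walk restarting at query.

    weights maps each node to a dict {neighbour: edge weight}. Every node must
    have at least one outgoing edge, and every neighbour must be a key of
    weights.
    """
    nodes = sorted(weights)

    # Row-normalise edge weights into transition probabilities.
    transition = {}
    for u in nodes:
        total = sum(weights[u].values())
        transition[u] = {v: w / total for v, w in weights[u].items()}

    scores = {u: 1.0 if u == query else 0.0 for u in nodes}
    for _ in range(iterations):
        # With probability `restart` the walker jumps back to the query node.
        updated = {u: restart if u == query else 0.0 for u in nodes}
        for u in nodes:
            for v, p in transition[u].items():
                updated[v] += (1.0 - restart) * scores[u] * p
        scores = updated
    return scores


# Hypothetical graph: edge weights could mix feature similarity and
# semantic relations learnt from user interaction.
weights = {
    "img_a": {"img_b": 0.9, "img_c": 0.1},
    "img_b": {"img_a": 0.9, "img_c": 0.2},
    "img_c": {"img_a": 0.1, "img_b": 0.2},
}

scores = random_walk_with_restart(weights, "img_a")
ranked = sorted((n for n in scores if n != "img_a"), key=scores.get, reverse=True)
print(ranked)
```

Nodes that are strongly connected to the query, directly or through intermediate nodes, receive higher scores and are ranked first.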
When Social Influence Meets Item Inference
Research issues and data mining techniques for product recommendation and
viral marketing have been widely studied. Existing works on seed selection in
social networks do not take into account the effect of product recommendations
in e-commerce stores. In this paper, we investigate the seed selection problem
for viral marketing that considers both effects of social influence and item
inference (for product recommendation). We develop a new model, Social Item
Graph (SIG), that captures both effects in the form of hyperedges. Accordingly, we
formulate a seed selection problem, called Social Item Maximization Problem
(SIMP), and prove the hardness of SIMP. We design an efficient algorithm with
a performance guarantee, called Hyperedge-Aware Greedy (HAG), for SIMP and
develop a new index structure, called SIG-index, to accelerate the computation
of the diffusion process in HAG. Moreover, to construct realistic SIG models for
SIMP, we develop a statistical inference based framework to learn the weights
of hyperedges from data. Finally, we perform a comprehensive evaluation of our
proposals against various baselines. Experimental results validate our ideas and
demonstrate the effectiveness and efficiency of the proposed model and
algorithms over baselines. Comment: 12 pages
Towards a framework for designing full model selection and optimization systems
People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now have to face the new problem of how to choose a combination of data processing tools and algorithms for a given dataset. This problem is usually termed the Full Model Selection (FMS) problem. Extending our previous work [10], in this paper we introduce a framework for designing FMS algorithms. Under this framework, we propose a novel algorithm combining both genetic algorithms (GA) and particle swarm optimization (PSO) named GPS (which stands for GA-PSO-FMS), in which a GA is used for searching the optimal structure for a data mining solution, and PSO is used for searching optimal parameters for a particular structure instance. Given a classification dataset, GPS outputs an FMS solution as a directed acyclic graph consisting of diverse data mining operators that are available to the problem. Experimental results demonstrate the benefit of the algorithm. We also present, with detailed analysis, two model-tree-based variants for speeding up the GPS algorithm.
Graph ensemble boosting for imbalanced noisy graph stream classification
© 2014 IEEE. Many applications involve stream data with structural dependency, graph representations, and continuously increasing volumes. For these applications, it is very common that their class distributions are imbalanced, with minority (or positive) samples being only a small portion of the population, which imposes significant challenges for learning models to accurately identify minority samples. This problem is further complicated by the presence of noise, because noisy samples are similar to minority samples and any treatment for the class imbalance may falsely focus on the noise and result in deterioration of accuracy. In this paper, we propose a classification model to tackle imbalanced graph streams with noise. Our method, graph ensemble boosting, employs an ensemble-based framework to partition the graph stream into chunks, each containing a number of noisy graphs with imbalanced class distributions. For each individual chunk, we propose a boosting algorithm to combine discriminative subgraph pattern selection and model learning as a unified framework for graph classification. To tackle concept drift in graph streams, an instance-level weighting mechanism is used to dynamically adjust the instance weight, through which the boosting framework can emphasize difficult graph samples. The classifiers built from different graph chunks form an ensemble for graph stream classification. Experiments on real-life imbalanced graph streams demonstrate clear benefits of our boosting design for handling imbalanced noisy graph streams.
STREAM-EVOLVING BOT DETECTION FRAMEWORK USING GRAPH-BASED AND FEATURE-BASED APPROACHES FOR IDENTIFYING SOCIAL BOTS ON TWITTER
This dissertation focuses on the problem of evolving social bots in online social networks, particularly Twitter. Such accounts spread misinformation and inflate social network content to mislead the masses. The main objective of this dissertation is to propose a stream-based evolving bot detection framework (SEBD), which was constructed using both graph- and feature-based models. It was built using Python, a real-time streaming engine (Apache Kafka version 3.2), and our pretrained model (bot multi-view graph attention network (Bot-MGAT)). The feature-based model was used to identify predictive features for bot detection and evaluate the SEBD predictions. The graph-based model was used to facilitate multiview graph attention networks (GATs) with fellowship links to build our framework for predicting account labels from streams. A probably approximately correct learning framework was applied to confirm the accuracy and confidence levels of SEBD. The results showed that the SEBD can effectively identify bots from streams and that profile features are sufficient for detecting social bots. The pretrained Bot-MGAT model uses fellowship links to reveal hidden information that can aid in identifying bot accounts. The significant contributions of this study are the development of a stream-based bot detection framework for detecting social bots based on a given hashtag and the proposal of a hybrid approach for feature selection to identify predictive features for identifying bot accounts. Our findings indicate that Twitter has a higher percentage of active bots than humans in hashtags. The results indicated that stream-based detection is more effective than offline detection, achieving an accuracy score of 96.9%. Finally, semi-supervised learning (SSL) can solve the issue of labeled data in bot detection tasks.
On the Feature Discovery for App Usage Prediction in Smartphones
As the number of mobile Apps grows, they are becoming closely
integrated into daily life. In this paper, we develop a framework to predict
mobile Apps that are most likely to be used given the current device status
of a smartphone. Such an App usage prediction framework is a crucial
prerequisite for fast App launching, intelligent user experience, and power
management of smartphones. By analyzing real App usage log data, we discover
two kinds of features: The Explicit Feature (EF) from sensing readings of
built-in sensors, and the Implicit Feature (IF) from App usage relations. The
IF feature is derived by constructing the proposed App Usage Graph (abbreviated
as AUG) that models App usage transitions. In light of AUG, we are able to
discover usage relations among Apps. Since users may have different usage
behaviors on their smartphones, we further propose a personalized feature
selection algorithm. We explore minimum description length (MDL) from the
training data and select those features which need less length to describe the
training data. The personalized feature selection can successfully reduce the
log size and the prediction time. Finally, we adopt the kNN classification
model to predict App usage. Note that we only need to keep the features
selected by the proposed personalized feature selection algorithm,
which in turn reduces the prediction time and avoids the curse of
dimensionality when using the kNN classifier. We conduct a comprehensive
experimental study based on a real mobile App usage dataset. The results
demonstrate the effectiveness of the proposed framework and show the predictive
capability for App usage prediction. Comment: 10 pages, 17 figures, ICDM 2013 short paper
On the Optimal Recovery of Graph Signals
Learning a smooth graph signal from partially observed data is a well-studied
task in graph-based machine learning. We consider this task from the
perspective of optimal recovery, a mathematical framework for learning a
function from observational data that adopts a worst-case perspective tied to
model assumptions on the function to be learned. Earlier work in the optimal
recovery literature has shown that minimizing a regularized objective produces
optimal solutions for a general class of problems, but did not fully identify
the regularization parameter. Our main contribution provides a way to compute
regularization parameters that are optimal or near-optimal (depending on the
setting), specifically for graph signal processing problems. Our results offer
a new interpretation for classical optimization techniques in graph-based
learning and also come with new insights for hyperparameter selection. We
illustrate the potential of our methods in numerical experiments on several
semi-synthetic graph signal processing datasets. Comment: This paper has been accepted by the 14th International Conference on
Sampling Theory and Applications (SampTA 2023)
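The abstract does not state the regularized objective. The sketch below assumes the standard Laplacian-regularized least squares form, minimizing the squared error on observed nodes plus `tau` times the graph smoothness term x^T L x, and solves it with Gauss-Seidel sweeps. The function name, the path graph, and the choice `tau=1.0` are illustrative assumptions; in particular, `tau` here is an arbitrary value, not the optimal parameter the paper derives.

```python
def recover_signal(adjacency, observed, tau=1.0, sweeps=500):
    """Estimate a smooth graph signal from partial observations.

    Minimises  sum_{i observed} (x_i - y_i)^2  +  tau * x^T L x,
    where L is the graph Laplacian, using Gauss-Seidel updates.
    adjacency maps each node to {neighbour: weight}; the graph is assumed
    connected, with at least one observed node.
    """
    x = {node: 0.0 for node in adjacency}
    for _ in range(sweeps):
        for node, neighbours in adjacency.items():
            mask = 1.0 if node in observed else 0.0
            degree = sum(neighbours.values())
            pull = sum(w * x[nbr] for nbr, w in neighbours.items())
            # Setting the gradient with respect to x[node] to zero gives:
            x[node] = (mask * observed.get(node, 0.0) + tau * pull) / (mask + tau * degree)
    return x


# Hypothetical example: a path graph 0-1-2-3-4 observed only at its endpoints.
adjacency = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0},
    4: {3: 1.0},
}
observed = {0: 0.0, 4: 4.0}

signal = recover_signal(adjacency, observed, tau=1.0)
print([round(signal[i], 3) for i in range(5)])
```

Unobserved nodes are filled in by interpolation along the graph, while larger values of `tau` pull the observed nodes further from their measurements toward a smoother signal, which is why the choice of regularization parameter matters.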