On Classification with Bags, Groups and Sets
Many classification problems can be difficult to formulate directly in terms
of the traditional supervised setting, where both training and test samples are
individual feature vectors. There are cases in which samples are better
described by sets of feature vectors, in which labels are only available for
sets rather than individual samples, or in which individual labels are
available but are not independent. To better deal with such problems, several
extensions of supervised learning have been proposed, where training
and/or test objects are sets of feature vectors. However, having been proposed
rather independently of each other, their mutual similarities and differences
have hitherto not been mapped out. In this work, we provide an overview of such
learning scenarios, propose a taxonomy to illustrate the relationships between
them, and discuss directions for further research in these areas.
MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark
The development of social media user stance detection and bot detection
methods relies heavily on large-scale and high-quality benchmarks. However, in
addition to low annotation quality, existing benchmarks generally have
incomplete user relationships, suppressing graph-based account detection
research. To address these issues, we propose a Multi-Relational Graph-Based
Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based
benchmark for account detection. To our knowledge, MGTAB was built from the
largest original dataset in the field, with over 1.55 million users and 130
million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of
relationships, ensuring high-quality annotation and diversified relations. In
MGTAB, we extracted the 20 user property features with the greatest information
gain and user tweet features as the user features. In addition, we performed a
thorough evaluation of MGTAB and other public datasets. Our experiments found
that graph-based approaches are generally more effective than feature-based
approaches and perform better when introducing multiple relations. By analyzing
experiment results, we identify effective approaches for account detection and
provide potential future research directions in this field. Our benchmark and
standardized evaluation procedures are freely available at:
https://github.com/GraphDetec/MGTAB.
Comment: 14 pages, 7 figures
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and makes it possible to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas is described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight into how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking, and promising avenues for research.