Applying Machine Learning to Catalogue Matching in Astrophysics
We present the results of applying automated machine learning techniques to
the problem of matching different object catalogues in astrophysics. In this
study we take two partially matched catalogues where one of the two catalogues
has a large positional uncertainty. The two catalogues we used here were taken
from the HI Parkes All Sky Survey (HIPASS) and the SuperCOSMOS optical survey.
Previous work had matched 44% (1887 objects) of HIPASS to the SuperCOSMOS
catalogue.
A supervised learning algorithm was then applied to construct a model of the
matched portion of our catalogue. Validation of the model shows that we
achieved good classification performance (99.12% correct).
Applying this model to the unmatched portion of the catalogue found 1209 new
matches. This increases the catalogue size from 1887 matched objects to 3096.
The combination of these procedures yields a catalogue that is 72% matched.
Comment: 8 Pages, 5 Figures
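The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 44% figure is rounded, so the catalogue size it infers is approximate and is not a number stated in the abstract; the variable names are illustrative only.

```python
# Counts taken from the abstract above.
previously_matched = 1887   # objects matched by previous work
new_matches = 1209          # additional matches found by the model
previous_fraction = 0.44    # rounded matched fraction quoted in the abstract

# Total matched objects after applying the model.
total_matched = previously_matched + new_matches

# Assumption: infer the approximate catalogue size from the rounded 44%.
approx_catalogue_size = previously_matched / previous_fraction

# Matched fraction after adding the new matches.
matched_fraction = total_matched / approx_catalogue_size

print(total_matched)
print(round(matched_fraction * 100))
```

Under that assumption, the total and the rounded percentage agree with the 3096 objects and 72% reported in the abstract.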
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and makes it possible to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas is described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight into how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking, and promising avenues for research.
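As a rough illustration of the bag formulation described in this abstract, the sketch below represents each bag as a list of instances and assigns a single label per bag. It assumes the standard MIL assumption (a bag is positive if at least one of its instances is positive), which the abstract does not state, and it uses a hypothetical `instance_score` function in place of a trained model; it is not one of the methods compared in the survey.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]   # one feature vector
Bag = List[Instance]         # a set of instances sharing a single label


def instance_score(x: Instance) -> float:
    """Hypothetical instance-level scorer: the mean of the features."""
    return sum(x) / len(x)


def predict_bag(
    bag: Bag,
    scorer: Callable[[Instance], float] = instance_score,
    threshold: float = 0.5,
) -> int:
    """Label a bag under the standard MIL assumption (assumed here):
    the bag is positive if any instance scores above the threshold."""
    return int(max(scorer(x) for x in bag) > threshold)


bags = [
    [[0.1, 0.2], [0.9, 0.8]],   # contains one high-scoring instance
    [[0.1, 0.2], [0.3, 0.1]],   # contains no high-scoring instance
]
labels = [predict_bag(b) for b in bags]
print(labels)
```

Taking the maximum over instance scores is only one way to pool instance evidence into a bag label; other pooling rules would correspond to different assumptions about how instance labels relate to the bag label.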