Applying Machine Learning to Catalogue Matching in Astrophysics

Author: D. J. Rohde
M. J. Drinkwater
M. R. Gallagher
M. T. Doyle
T. Downs
Publication venue: 'Wiley'
Publication date: 01/04/2005

We present the results of applying automated machine learning techniques to the problem of matching different object catalogues in astrophysics. In this study we take two partially matched catalogues, one of which has a large positional uncertainty. The two catalogues used here were taken from the HI Parkes All Sky Survey (HIPASS) and the SuperCOSMOS optical survey. Previous work had matched 44% (1887 objects) of HIPASS to the SuperCOSMOS catalogue. A supervised learning algorithm was then applied to construct a model of the matched portion of our catalogue. Validation of the model shows good classification performance (99.12% correct). Applying this model to the unmatched portion of the catalogue found 1209 new matches, raising the number of matched objects from 1887 to 3096. The combination of these procedures yields a catalogue that is 72% matched.
Comment: 8 pages, 5 figures
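The workflow described above (learn from the matched portion, then classify the unmatched portion) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' method: the two features (a normalized positional separation and an assumed optical plausibility score), the helper names `train_perceptron` and `is_match`, and the toy data are all hypothetical, and a plain perceptron stands in for whichever supervised classifier the study actually used.

```python
from typing import List, Sequence, Tuple

# Hypothetical features for one HIPASS-SuperCOSMOS candidate pair:
#   x[0] = angular separation divided by an assumed search radius (0..1)
#   x[1] = an assumed "optical plausibility" score (0..1)
# Neither feature comes from the paper; both only illustrate the workflow.


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def train_perceptron(X: Sequence[Sequence[float]], y: Sequence[int],
                     max_epochs: int = 1000) -> Tuple[List[float], float]:
    """Fit a linear match/non-match classifier with the perceptron rule.

    Stops after the first full pass with no training mistakes; for linearly
    separable data the perceptron convergence theorem guarantees this happens.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            predicted = 1 if dot(w, xi) + b > 0 else 0
            if predicted != yi:
                # Move the boundary towards missed matches, away from false ones.
                step = 1.0 if yi == 1 else -1.0
                w = [wj + step * xj for wj, xj in zip(w, xi)]
                b += step
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def is_match(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    """Classify one candidate pair with the trained linear model."""
    return dot(w, x) + b > 0


# Synthetic stand-in for the already matched portion: confirmed matches (1)
# are close on the sky with high scores; rejected pairs (0) are far apart
# with low scores.
matched = [[s, q] for s in (0.0, 0.1, 0.2) for q in (0.6, 0.8, 1.0)]
rejected = [[s, q] for s in (0.5, 0.75, 1.0) for q in (0.0, 0.2, 0.4)]
X = matched + rejected
y = [1] * len(matched) + [0] * len(rejected)

w, b = train_perceptron(X, y)

# Apply the model to candidates from the "unmatched" portion.
candidates = [[0.05, 0.9], [0.9, 0.1]]
print([is_match(w, b, c) for c in candidates])  # [True, False]
```

Because the perceptron stops at the first separating line it finds rather than the widest-margin one, a margin-based or probabilistic classifier would likely be a more robust choice for real candidate pairs, whose classes are unlikely to separate this cleanly.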

arXiv.org e-Print Archive

Crossref

CERN Document Server

Multiple Instance Learning: A Survey of Problem Characteristics and Applications

Author: Marc-André Carbonneau
Veronika Cheplygina
Ghyslain Gagnon
Eric Granger
Publication venue: 'Elsevier BV'
Publication date: 10/12/2016

Multiple instance learning (MIL) is a form of weakly supervised learning in which training instances are arranged in sets, called bags, and a single label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and makes it possible to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics that define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described; as a result, variations in the performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight into how the problem characteristics affect MIL algorithms, recommendations for future benchmarking, and promising avenues for research.
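The bag formulation and the ambiguity of instance labels can be illustrated with a small sketch. It assumes the standard MIL assumption (a bag is positive if it contains at least one positive instance), which is common but not the only possible choice; the bag names, feature values, and the scorer `instance_score` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Toy bags: each bag holds several 1-D instance features but carries ONE label.
# All values are synthetic and purely illustrative.
bags: Dict[str, List[float]] = {
    "bag_a": [0.1, 0.2, 0.9],  # one high-scoring "witness" instance
    "bag_b": [0.1, 0.3, 0.2],  # no witness
    "bag_c": [0.8, 0.7],
}
bag_labels: Dict[str, int] = {"bag_a": 1, "bag_b": 0, "bag_c": 1}


def instance_score(x: float) -> float:
    """Hypothetical instance scorer; a real MIL method would learn this."""
    return x


def predict_bag(instances: List[float], threshold: float = 0.5) -> int:
    """Standard MIL assumption: a bag is positive iff at least one instance
    is positive, i.e. iff its maximum instance score exceeds the threshold."""
    return int(max(instance_score(x) for x in instances) > threshold)


def known_instance_labels(instances: List[float], bag_label: int) -> List[Optional[int]]:
    """Under the standard assumption, every instance in a negative bag is
    negative, while instance labels inside a positive bag remain unknown."""
    return [0 if bag_label == 0 else None for _ in instances]


predictions = {name: predict_bag(inst) for name, inst in bags.items()}
print(predictions)  # {'bag_a': 1, 'bag_b': 0, 'bag_c': 1}
print(known_instance_labels(bags["bag_a"], bag_labels["bag_a"]))  # [None, None, None]
print(known_instance_labels(bags["bag_b"], bag_labels["bag_b"]))  # [0, 0, 0]
```

The max over instance scores is what makes positive-bag instances ambiguous: the bag label only says that some instance is positive, not which one.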

arXiv.org e-Print Archive