Phylogenetic approaches are finding more and more applications outside the
field of biology. Astrophysics is no exception since an overwhelming amount of
multivariate data has appeared in the last twenty years or so. In particular,
the diversification of galaxies throughout the evolution of the Universe quite
naturally calls for phylogenetic approaches. We have demonstrated that Maximum
Parsimony yields useful astrophysical results, and we now proceed toward the
analysis of large datasets of galaxies. In this talk I present how we address
the major difficulties on the way to this goal: the choice of the parameters,
their discretization, and the analysis of a large number of objects with an
unsupervised NP-hard classification technique like cladistics.

1. Introduction

How do galaxies form, and when? How did galaxies evolve and transform
themselves to create the diversity we observe? What are the progenitors of
present-day galaxies? To answer these big questions, observations throughout
the Universe and physical modelling are obvious tools. But between
these, there is a key process, without which it would be impossible to extract
some digestible information from the complexity of these systems. This is
classification.

One century ago, galaxies were discovered by Hubble. From
images obtained in the visible range of wavelengths, he synthesised his
observations through the usual process: classification. With only one parameter
(the shape) that is qualitative and determined by eye, he found four
categories: ellipticals, spirals, barred spirals and irregulars. This is the
famous Hubble classification. He later hypothesised relationships between these
classes, building the Hubble Tuning Fork. The Hubble classification has been
refined, notably by de Vaucouleurs, and is still used as the only global
classification of galaxies. Even though the physical relationships proposed by
Hubble are no longer retained, the Hubble Tuning Fork is nearly always used
to represent the classification of galaxy diversity under its new name, the
Hubble sequence (e.g. Delgado-Serrano, 2012). Its success is impressive and can
be understood by its simplicity, even its beauty, and by the many correlations
found between the morphology of galaxies and their other properties. And one
must admit that there is no alternative up to now, even though both the Hubble
classification and diagram have been recognised to be unsatisfactory. Among the
most obvious flaws of this classification, one must mention its monovariate,
qualitative, subjective and old-fashioned nature, as well as the difficulty of
characterising the morphology of distant galaxies.

The first two significant
multivariate studies were by Watanabe et al. (1985) and Whitmore (1984). Since
2005, the number of studies attempting to go beyond the Hubble
classification has increased substantially. Why, despite this, are the Hubble
classification and its sequence still alive, and why has no alternative yet
emerged (Sandage, 2005)? My feeling is that the results of the multivariate
analyses are not easily integrated into a century-old practice of modelling
the observations. In addition, extragalactic objects like galaxies, stellar
clusters or stars do evolve. Astronomy now provides data on very distant
objects, raising the question of the relationships between those and our
present-day nearby galaxies. Clearly, this is a phylogenetic problem.

Astrocladistics aims at exploring the use of phylogenetic tools in
astrophysics (Fraix-Burnet et al., 2006a,b). We have shown that Maximum
Parsimony (or cladistics) can be applied in astrophysics and provides a new
tool for exploring the data (Fraix-Burnet et al., 2009, 2012; Cardone &
Fraix-Burnet, 2013). As far as the classification of galaxies is concerned, a
larger number of objects must now be analysed. In this paper, I

Comment: Proceedings of the 60th World Statistics Congress of the
International Statistical Institute, ISI2015, Jul 2015, Rio de Janeiro,
Brazi