    Action recognition using Randomised Ferns

    This paper presents a generic method for recognising and localising human actions in video based solely on the distribution of interest points. The use of local interest points has shown promising results in both object and action recognition. While previous methods classify actions based on the appearance and/or motion of these points, we hypothesise that the distribution of interest points alone contains the majority of the discriminatory information. Motivated by its recent success in rapidly detecting 2D interest points, we employ the semi-naive Bayesian classification method of Randomised Ferns. Given a set of interest points within the boundaries of an action, the generic classifier learns the spatial and temporal distributions of those interest points. This is done efficiently by comparing sums of responses of interest points detected within randomly positioned spatio-temporal blocks within the action boundaries. We present results on the largest and most popular human action dataset using a number of interest point detectors, and demonstrate that the distribution of interest points alone can perform as well as approaches that rely upon the appearance of the interest points.
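
The block-comparison idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `block_sum`, `random_block`, `Fern` and `classify`, the unit-cube coordinate scaling, the Laplace-smoothed counts and the uniform class prior are all assumptions made for illustration. Each binary test compares the summed interest-point responses of two random spatio-temporal blocks, and the test outcomes of a fern are packed into a leaf index whose per-class frequencies are combined across ferns.

```python
import math
import random


def block_sum(points, block):
    """Sum the responses of interest points that fall inside a block.

    points: iterable of (x, y, t, response), coordinates scaled to [0, 1)
            relative to the action boundaries (an assumed convention).
    block:  (x0, y0, t0, x1, y1, t1), an axis-aligned spatio-temporal box.
    """
    x0, y0, t0, x1, y1, t1 = block
    return sum(r for x, y, t, r in points
               if x0 <= x < x1 and y0 <= y < y1 and t0 <= t < t1)


def random_block(rng):
    """Draw a random box inside the unit action volume."""
    lo = [rng.random() for _ in range(3)]
    hi = [rng.uniform(v, 1.0) for v in lo]
    return (*lo, *hi)


class Fern:
    """One fern: a fixed list of binary tests plus per-class leaf counts."""

    def __init__(self, n_tests, n_classes, rng):
        # Each test compares the response sums of two random blocks.
        self.tests = [(random_block(rng), random_block(rng))
                      for _ in range(n_tests)]
        # Counts start at 1 (Laplace smoothing) so no leaf has zero probability.
        self.counts = [[1] * (2 ** n_tests) for _ in range(n_classes)]

    def leaf(self, points):
        """Pack the outcomes of all tests into one integer leaf index."""
        index = 0
        for a, b in self.tests:
            index = (index << 1) | int(block_sum(points, a) > block_sum(points, b))
        return index

    def train(self, points, label):
        self.counts[label][self.leaf(points)] += 1

    def log_likelihoods(self, points):
        leaf = self.leaf(points)
        return [math.log(row[leaf] / sum(row)) for row in self.counts]


def classify(ferns, points):
    """Semi-naive Bayes: ferns are treated as independent, so logs add up.

    A uniform class prior is assumed here.
    """
    n_classes = len(ferns[0].counts)
    scores = [0.0] * n_classes
    for fern in ferns:
        for k, value in enumerate(fern.log_likelihoods(points)):
            scores[k] += value
    return max(range(n_classes), key=lambda k: scores[k])
```

In use, one would build a list of `Fern` objects with a shared `random.Random` instance, call `train` on each fern for every labelled action volume, and then call `classify` on the interest points of an unseen volume.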

    rFerns: An Implementation of the Random Ferns Method for General-Purpose Machine Learning

    In this paper I present an extended implementation of the Random ferns algorithm, contained in the R package rFerns. It differs from the original in its ability to consume categorical and numerical attributes instead of only binary ones. Also, instead of using a simple attribute subspace ensemble, it employs bagging and thus produces an error approximation and a variable importance measure modelled after the Random forest algorithm. I also present benchmark results which show that, although the accuracy of Random ferns is mostly lower than that achieved by Random forest, its speed and the good quality of the importance measure it provides make rFerns a reasonable choice for specific applications.
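
The two extensions named in this abstract, tests on non-binary attributes and bagging, can be sketched as follows. This is a hedged illustration in Python, not the rFerns API: the function names `make_test`, `build_fern`, `leaf_index` and `bootstrap` are hypothetical, and the concrete test rules (a random subset of levels for categorical attributes, a threshold halfway between two randomly drawn values for numerical ones) are assumptions chosen for the example.

```python
import random


def make_test(column, rng):
    """Build one random binary test for an attribute column."""
    if isinstance(column[0], str):
        # Categorical attribute: membership in a random subset of levels.
        levels = sorted(set(column))
        subset = {level for level in levels if rng.random() < 0.5}
        return lambda value: value in subset
    # Numerical attribute: threshold halfway between two random observed
    # values (an illustrative choice).
    threshold = (rng.choice(column) + rng.choice(column)) / 2
    return lambda value: value < threshold


def build_fern(table, depth, rng):
    """Pick `depth` random attributes and a random test on each.

    table: dict mapping attribute name -> list of values, one per object.
    """
    names = sorted(table)
    chosen = [rng.choice(names) for _ in range(depth)]
    return [(name, make_test(table[name], rng)) for name in chosen]


def leaf_index(fern, table, row):
    """Combine the test outcomes for object `row` into a leaf index."""
    index = 0
    for name, test in fern:
        index = (index << 1) | int(test(table[name][row]))
    return index


def bootstrap(n_objects, rng):
    """Bagging: sample objects with replacement; the rest are out-of-bag."""
    in_bag = [rng.randrange(n_objects) for _ in range(n_objects)]
    out_of_bag = sorted(set(range(n_objects)) - set(in_bag))
    return in_bag, out_of_bag
```

Under this scheme each fern would be trained on its own `in_bag` objects, while its `out_of_bag` objects, which it never saw, supply the predictions from which an error approximation and an importance measure can be derived.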

    Análisis de la relación entre creatividad, atención y rendimiento escolar en niños y niñas de más de 9 años en Colombia (Analysis of the relationship between creativity, attention and school performance in boys and girls over 9 years of age in Colombia)

    Abstract: This research paper was conducted within the framework of neuropsychology applied to education. Objective: To analyse the relationship between creativity, visual and auditory attention, and school performance. Method: This is a quantitative, non-experimental, correlational study; the sample consisted of 85 schoolchildren, boys and girls, aged 9 years or older and enrolled in fourth to sixth grade. Attention was assessed with the auditory and visual attention subtests of the attention domain of the ENI battery, creativity with the CREA test, and academic performance was taken from the report cards supplied by the educational institution. Results: No statistically significant relationship was found between attention and academic performance, nor between creativity and academic performance. These findings differ from those of León (2008), who proposes attentional processes as predictors of academic performance, and from those of Corbalán, Martínez, Donolo, Alonso, Tejerina and Limiñana (2003), who state that creative intelligence influences information processing and learning and, as a consequence, performance; however, these results may be due to different variables associated with school performance. Conclusions: This study may be an important route towards recognising the value of attention in creative processes. It also suggests carrying out broader studies focused on the relationship between creativity and performance in order to clarify the type of relationship between them.

    Robust and efficient approach to feature selection with machine learning

    Most statistical analyses or modelling studies must deal with the discrepancy between the measured aspects of the analysed phenomena and their true nature. Hence, they are often preceded by a step that alters the data representation into a form better suited to the methods that follow. This thesis deals with feature selection, a narrow yet important subset of representation-altering methodologies. Feature selection is applied to an information system, i.e., data existing in a tabular form, as a group of objects characterised by values of some set of attributes (also called features or variables), and is defined as a process of finding a strict subset of them which fulfils some criterion. There are two essential classes of feature selection methods: minimal optimal, which aim to find the smallest subset of features that optimises the accuracy of certain modelling methods, and all relevant, which aim to find the entire set of features potentially usable for modelling. The first class is the one mostly used in practice, as it adheres to a well-known optimisation problem and has a direct connection to the final model performance. However, I argue that there exists a wide and significant class of applications in which only all relevant approaches may yield usable results, while minimal optimal methods are not only ineffective but can even lead to wrong conclusions. Moreover, the all relevant class substantially overlaps with the set of actual research problems in which feature selection is an important result on its own, sometimes even more important than the resulting black-box model. 
In particular, this applies to p>>n problems, i.e., those for which the number of attributes is large and substantially exceeds the number of objects; for instance, such data is produced by high-throughput biological experiments, which currently serve as the most powerful tool of molecular biology and a foundation of the emerging individualised medicine. In the main part of the thesis I present Boruta, a heuristic, all relevant feature selection method. It is based on the concept of shadows, attributes that are irrelevant by design (created by randomly permuting the values of the original features) and incorporated into the information system as a reference for the relevance of the original features in the context of the whole structure of the analysed data. The variable importance itself is assessed using the Random Forest method, a popular ensemble classifier. As the computational performance of the Boruta method turns out to be unsatisfactory for some important applications, the following chapters of the thesis are devoted to Random Ferns, an ensemble classifier with a structure similar to Random Forest but of substantially higher computational efficiency. In the thesis, I propose a substantial generalisation of this method, capable of training on generic data and calculating feature importance scores. Finally, I assess both the Boruta method and its Random Ferns-based derivative on a series of p>>n problems of biological origin. In particular, I focus on the stability of feature selection; I propose a novel methodology based on bootstrap and self-consistency. 
The results I obtain empirically confirm the validity of the aforementioned effects characteristic of minimal optimal selection, as well as the efficiency of the proposed heuristics for all relevant selection. The thesis is completed with a study of the applicability of Random Ferns in musical information retrieval (recognising musical instruments in recordings), showing the usefulness of this method in other contexts and proposing its generalisation to multi-label classification problems.
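
The shadow concept described in this abstract can be sketched in a few lines. This is a simplified illustration, not the Boruta algorithm itself: the thesis assesses importance with Random Forest, whereas the sketch swaps in the absolute Pearson correlation with the decision as a stand-in importance score so that it stays self-contained. The names `add_shadows`, `abs_correlation` and `count_hits`, and the `shadow_` prefix, are hypothetical. How hit counts are turned into a final accept or reject decision is not covered by the abstract and is left out.

```python
import random


def add_shadows(table, rng):
    """Extend the table with one permuted 'shadow' copy of every attribute.

    table: dict mapping attribute name -> list of values, one per object.
    Attribute names are assumed not to start with 'shadow_'.
    """
    extended = dict(table)
    for name, values in table.items():
        shadow = list(values)
        rng.shuffle(shadow)  # permutation breaks any link with the decision
        extended["shadow_" + name] = shadow
    return extended


def abs_correlation(xs, ys):
    """Stand-in importance score (NOT the Random Forest importance)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def count_hits(table, decision, n_rounds, rng):
    """Count how often each attribute beats the best shadow attribute."""
    hits = {name: 0 for name in table}
    for _ in range(n_rounds):
        extended = add_shadows(table, rng)
        importance = {name: abs_correlation(column, decision)
                      for name, column in extended.items()}
        best_shadow = max(score for name, score in importance.items()
                          if name.startswith("shadow_"))
        for name in table:
            if importance[name] > best_shadow:
                hits[name] += 1
    return hits
```

Because the shadows are re-drawn in every round, an attribute that only occasionally beats the best shadow collects few hits, while a genuinely relevant one beats it consistently.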