    Correcting the Hub Occurrence Prediction Bias in Many Dimensions

    Data reduction is a common pre-processing step for k-nearest neighbor (kNN) classification. Existing prototype selection methods implement different criteria for selecting the relevant points to use in classification, and each criterion constitutes a selection bias. This study examines the nature of the instance selection bias in intrinsically high-dimensional data. In high-dimensional feature spaces, hubs are known to emerge as centers of influence in kNN classification. These points dominate most kNN sets and are often detrimental to classification performance. Our experiments reveal that different instance selection strategies bias the predictions of the behavior of hub-points in high-dimensional data in different ways. We propose to introduce an intermediate un-biasing step when training the neighbor occurrence models, and we demonstrate promising improvements in various hubness-aware classification methods on a wide selection of high-dimensional synthetic and real-world datasets.
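
    The notion of a hub can be made concrete by counting how often each point appears in the kNN lists of the other points (its k-occurrence count). The sketch below is illustrative only and is not the method of the study: the function names `k_occurrences` and `skewness`, the Euclidean distance, and the use of skewness as a hubness indicator are assumptions chosen for the example.

```python
import math


def k_occurrences(points, k):
    """Count how often each point appears in the kNN lists of the other points."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment of the counts (assumed hubness indicator)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5
```

    Points with a count far above k behave as hubs in this sense, and a strongly right-skewed count distribution suggests that a few points dominate the kNN sets.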

    Machine learning techniques for music information retrieval

    Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2015. The advent of digital music has changed the rules of music consumption, distribution and sales. With it has emerged the need to effectively search and manage vast music collections. Music information retrieval is an interdisciplinary field of research that focuses on the development of new techniques with that aim in mind. This dissertation addresses a specific aspect of this field: methods that automatically extract musical information based exclusively on the audio signal. We propose a method for automatic music-based classification, label inference, and music similarity estimation. Our method consists of representing the audio with a finite set of symbols and then modeling the symbols' time evolution. The symbols are obtained via vector quantization, in which a single codebook is used to quantize the audio descriptors. The symbols' time evolution is modeled via a first-order Markov process. Based on systematic evaluations we carried out on publicly available datasets, we show that our method achieves performance on par with most techniques found in the literature. We also present and discuss the problems that appear when computers try to classify or annotate songs using the audio as the only source of information. In our method, the separation of the quantization process from the creation and training of classification models helped us in that analysis. It enabled us to examine how instantaneous sound attributes (henceforth features) are distributed in terms of musical genre, and how designing codebooks specially tailored for these distributions affects the performance of our system and of other classification systems commonly used for this task. On this issue, we show that there is no apparent benefit in seeking a thorough representation of the feature space.
This is somewhat unexpected, since it goes against the assumption, implicit in many genre recognition methods, that features carry equally relevant information loads and somehow capture the specificities of musical facets. Label inference is the task of automatically annotating songs with semantic words; this task is also known as autotagging. In this context, we illustrate the importance of a number of issues that, in our view, are often overlooked. We show that current techniques are fragile in the sense that small alterations in the set of labels may lead to dramatically different results. Furthermore, through a series of experiments, we show that autotagging systems fail to learn tag models capable of generalizing to datasets of different origins. We also show that the performance achieved with these techniques is not sufficient to take advantage of the correlations between tags. Fundação para a Ciência e a Tecnologia (FCT)
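
    The two stages described in this abstract, vector quantization against a single codebook followed by a first-order Markov model over the resulting symbols, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the codebook is given rather than learned, the distance is Euclidean, and the names `quantize`, `transition_matrix`, `log_likelihood` and the additive `smoothing` parameter are hypothetical.

```python
import math


def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda c: math.dist(frame, codebook[c]))
        for frame in frames
    ]


def transition_matrix(symbols, n_symbols, smoothing=1.0):
    """Estimate first-order Markov transition probabilities from a symbol sequence.

    Additive smoothing (an assumption of this sketch) keeps every probability
    positive so that log-likelihoods stay finite.
    """
    counts = [[smoothing] * n_symbols for _ in range(n_symbols)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(symbols, trans):
    """Log-probability of the observed transitions under a transition matrix."""
    return sum(math.log(trans[a][b]) for a, b in zip(symbols, symbols[1:]))
```

    One plausible way to use such models for classification is to train one transition matrix per class and assign a new symbol sequence to the class whose matrix gives the highest log-likelihood; whether the thesis uses exactly this decision rule is not stated in the abstract.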

    Aproksimativni algoritmi za generisanje k-NN grafa (Approximation Algorithms for k-NN Graph Construction)

    Nearest neighbor graphs model proximity relationships between objects. They are widely used in many areas, primarily in machine learning, but also in information retrieval, biology, computer graphics, geographic information systems, etc. The focus of this thesis is the k-nearest neighbor graph (k-NNG), a special class of nearest neighbor graphs. Each node of a k-NNG is connected by directed edges to its k nearest neighbors. A brute-force method for constructing a k-NNG entails O(n^2) distance calculations. This thesis addresses the problem of more efficient k-NNG construction, achieved by approximation algorithms. The main challenge for an approximation algorithm for k-NNG construction is to decrease the number of distance calculations while maximizing the accuracy of the approximation. NN-Descent is one such approximation algorithm for k-NNG construction, and it reports excellent results in many cases. However, it does not perform well on high-dimensional data. The first part of this thesis summarizes the problem and explains the reasons for this behavior. The second part introduces five new NN-Descent variants that aim to improve NN-Descent on high-dimensional data. The performance of the proposed algorithms is evaluated with an experimental analysis. Finally, the third part of this thesis is dedicated to k-NNG update algorithms. In real-world scenarios, data often change over time. If data change after k-NNG construction, the graph needs to be updated accordingly. Therefore, in this part of the thesis, two approximation algorithms for k-NNG updates are proposed. They are validated with extensive experiments on time series data.
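
    The brute-force baseline mentioned in this abstract can be sketched as below, together with a simple recall measure for judging an approximate graph against the exact one. This is an illustrative sketch rather than code from the thesis: the names `brute_force_knn_graph` and `recall`, the Euclidean distance, and recall as the accuracy measure are assumptions. By exploiting symmetry, the sketch computes n(n-1)/2 distances, which is still quadratic in n.

```python
import math


def brute_force_knn_graph(points, k):
    """Build the exact k-NN graph and report how many distances were computed."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    n_dist = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Each unordered pair is evaluated once and stored symmetrically.
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
            n_dist += 1
    graph = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    return graph, n_dist


def recall(approx, exact):
    """Fraction of exact k-NN edges that an approximate graph recovers."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return hits / total
```

    An approximation algorithm is then judged on two axes at once: how far below the brute-force distance count it stays, and how close its recall against the exact graph is to 1.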

    Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning

    Zero-shot learning (ZSL) makes object recognition in images possible in the absence of visual training data for a part of the classes in a dataset. When the number of classes is large, classes are usually represented by semantic class prototypes learned automatically from unannotated text collections. This typically leads to much lower performance than with manually designed semantic prototypes such as attributes. While most ZSL works focus on the visual aspect and reuse standard semantic prototypes learned from generic text collections, we focus on the problem of semantic class prototype design for large-scale ZSL. More specifically, we investigate the use of noisy textual metadata associated with photos as text collections, as we hypothesize that they are likely to provide more plausible semantic embeddings for visual classes if exploited appropriately. We thus make use of a source-based voting strategy to improve the robustness of semantic prototypes. Evaluation on the large-scale ImageNet dataset shows a significant improvement in ZSL performance over two strong baselines, and over the usual semantic embeddings used in previous works. We show that this improvement is obtained for several embedding methods, leading to state-of-the-art results when one uses automatically created visual and text features.

    Improving k-nn search and subspace clustering based on local intrinsic dimensionality

    In several novel applications such as multimedia and recommender systems, data is often represented as object feature vectors in high-dimensional spaces. High-dimensional data is always a challenge for state-of-the-art algorithms because of the so-called curse of dimensionality. As the dimensionality increases, the discriminative ability of similarity measures diminishes to the point where many data analysis algorithms that depend on them, such as similarity search and clustering, lose their effectiveness. One way to handle this challenge is to select the most important features, which is essential for providing compact object representations as well as improving the overall search and clustering performance. Having compact feature vectors can further reduce the storage space and the computational complexity of search and learning tasks. Support-Weighted Intrinsic Dimensionality (support-weighted ID) is a promising new feature selection criterion that estimates the contribution of each feature to the overall intrinsic dimensionality. Support-weighted ID identifies relevant features locally for each object, and penalizes those features that have locally lower discriminative power as well as higher density. In effect, support-weighted ID measures the ability of each feature to locally discriminate between objects in the dataset. Based on support-weighted ID, this dissertation makes three main research contributions. First, this dissertation proposes NNWID-Descent, a similarity graph construction method that uses the support-weighted ID criterion to identify and retain relevant features locally for each object and enhance the overall graph quality.
Second, with the aim of improving the accuracy and performance of cluster analysis, this dissertation introduces k-LIDoids, a subspace clustering algorithm that extends the utility of support-weighted ID within a clustering framework in order to gradually select the subset of informative and important features per cluster. k-LIDoids constructs clusters while also finding a low-dimensional subspace for each cluster. Finally, using the compact object and cluster representations from NNWID-Descent and k-LIDoids, this dissertation defines LID-Fingerprint, a new binary fingerprinting and multi-level indexing framework for high-dimensional data. LID-Fingerprint can be used to hide information as a defense against passive adversaries, as well as to provide efficient and secure similarity search and retrieval for data stored in the cloud. When compared to other state-of-the-art algorithms, the good practical performance provides evidence for the effectiveness of the proposed algorithms for data in high-dimensional spaces.
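
    For readers unfamiliar with local intrinsic dimensionality, the sketch below shows a commonly used maximum-likelihood estimator of LID computed from the distances between a query and its k nearest neighbors. It is background illustration only: it is not the support-weighted ID criterion of the dissertation, whose definition is not given in the abstract, and the function name `lid_mle` is hypothetical.

```python
import math


def lid_mle(distances):
    """Maximum-likelihood LID estimate from positive k-NN distances of one query.

    Computes -1 / mean(ln(r_i / r_k)), where r_k is the largest of the
    k neighbor distances.
    """
    r = sorted(distances)
    r_k = r[-1]
    mean_log = sum(math.log(d / r_k) for d in r) / len(r)
    if mean_log == 0:
        # All neighbors are equidistant, so the estimate diverges.
        return math.inf
    return -1.0 / mean_log
```

    Intuitively, when neighbor distances crowd close to the largest one, the log-ratios are near zero and the estimate is high, which matches the loss of distance contrast that the abstract attributes to high dimensionality.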

    Multimedia Forensics

    This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment to journalism to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape, powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students with a holistic view of the field.

    Statistical distribution of common audio features: encounters in a heavy-tailed universe

    In the last few years, some Music Information Retrieval (MIR) researchers have spotted important drawbacks in applying standard algorithms that are successful on monophonic audio to polyphonic music classification and similarity assessment. Notably, these so-called “Bag-of-Frames” (BoF) algorithms share a common set of assumptions: that the numerical descriptions extracted from short-time audio excerpts (or frames) are enough to capture relevant information for the task at hand, that these frame-based audio descriptors are time independent, and that descriptor frames are well described by Gaussian statistics. Thus, if we want to improve current BoF algorithms we could: i) improve current audio descriptors, ii) include temporal information within algorithms working with polyphonic music, and iii) study and characterize the real statistical properties of these frame-based audio descriptors. From a literature review, we have found that many works focus on the first two improvements but, surprisingly, there is a lack of research on the third one. Therefore, in this thesis we analyze and characterize the statistical distribution of common audio descriptors of timbre, tonal and loudness information. Contrary to what is usually assumed, our work shows that the studied descriptors follow heavy-tailed distributions and thus do not belong to a Gaussian universe. This new knowledge led us to propose new algorithms that show improvements over the BoF approach in current MIR tasks such as genre classification, instrument detection, and automatic tagging of music. Furthermore, we also address new MIR tasks such as measuring the temporal evolution of Western popular music.
Finally, we highlight some promising paths for future audio-content MIR research that will inhabit a heavy-tailed universe.
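
    A quick way to see what departing from a Gaussian universe means in practice is to compute the excess kurtosis of a descriptor's values, which is close to zero for Gaussian data and positive when the tails are heavier. The sketch below is a simple diagnostic chosen for illustration; the abstract does not say which statistical tools the thesis uses, and the name `excess_kurtosis` is hypothetical.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a Gaussian distribution)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        raise ValueError("excess kurtosis is undefined for constant data")
    fourth = sum((v - mean) ** 4 for v in values) / n
    return fourth / var ** 2 - 3.0
```

    A clearly positive value on real descriptor frames would be a first hint that Gaussian summaries (mean and covariance) understate the weight of extreme values; a proper analysis would go further and fit and compare candidate distributions.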