31,004 research outputs found

    Aproksimativni algoritmi za generisanje k-NN grafa

    Get PDF
    Nearest neighbor graphs are modeling proximity relationships between objects. They are widely used in many areas, primarily in machine learning, but also in information retrieval, biology, computer graphics,geographic information systems, etc. The focus of this thesis are knearest neighbor graphs (k-NNG), a special class of nearest neighbor graphs. Each node of k-NNG is connected with directed edges to its k nearest neighbors.A brute-force method for constructing k-NNG entails O(n 2 ) distance calculations. This thesis addresses the problem of more efficient k-NNG construction, achieved by approximation algorithms. The main challenge of an approximation algorithm for k-NNG construction is to decrease the number of distance calculations, while maximizing the approximation’s accuracy.NN-Descent is one such approximation algorithm for k-NNG construction, which reports excellent results in many cases. However, it does not perform well on high-dimensional data. The first part of this thesis summarizes the problem, and gives explanations for such a behavior. The second part introduces five new NN-Descent variants that aim to improve NN-Descent on high-dimensional data. The performance of the  proposed algorithms is evaluated with an experimental analysis.Finally, the third part of this thesis is dedicated to k-NNG update algorithms. Namely, in the real world scenarios data often change over time. If data change after k-NNG construction, the graph needs to be updated accordingly. Therefore, in this part of the thesis, two approximation algorithms for k-NNG updates are proposed. They are validated with extensive experiments on time series data.Graf najbližih suseda modeluje veze između objekata koji su međusobno bliski. Ovi grafovi se koriste u mnogim disciplinama, pre svega u mašinskom učenju, a potom i u pretraživanju informacija, biologiji, računarskoj grafici, geografskim informacionim sistemima, itd. Fokus ove teze je graf k najbližih suseda (k-NN graf), koji predstavlja posebnu klasu grafova najbližih suseda. Svaki čvor k-NN grafa je povezan usmerenim granama sa njegovih k najbližih suseda.Metod grube sile za generisanje k-NN grafova podrazumeva O(n 2 ) računanja razdaljina između dve tačke. Ova teza se bavi  problemom efikasnijeg generisanja k-NN grafova, korišćenjem aproksimativnih  algoritama.Glavni cilj aprokismativnih algoritama za generisanje k-NN grafova jeste smanjivanje ukupnog broja računanja razdaljina između dve tačke, uz održavanje visoke tačnosti krajnje aproksimacije.NN-Descent je jedan takav aproksimativni algoritam za generisanje k-NN grafova. Iako se pokazao kao veoma dobar u većini slučajeva, ovaj algoritam ne daje dobre rezultate nad visokodimenzionalnim podacima. Unutar prvog dela teze, detaljno je opisana suština problema i objašnjeni su razlozi za njegovo nastajaneU drugom delu predstavljeno je pet različitih modifikacija NN-Descent algoritma, koje za cilj imaju njegovo poboljšavanje pri radu nad visokodimenzionalnim podacima. Evaluacija ovih algoritama je data kroz eksperimentalnu analizu.Treći deo teze se bavi algoritmima za ažuriranje k-NN grafova. Naime,podaci se vrlo često menjaju  vremenom. Ukoliko se izmene podaci nad kojima je prethodno generisan k-NN graf, potrebno je graf ažurirati u skladu sa izmenama. U okviru ovog dela teze predložena su dva aproksimativna algoritma za ažuriranje k-NN grafova. Ovi algoritmi su evaluirani opširnim eksperimentima nad vremenskim serijama

    Aproksimativni algoritmi za generisanje k-NN grafa

    Get PDF
    Nearest neighbor graphs are modeling proximity relationships between objects. They are widely used in many areas, primarily in machine learning, but also in information retrieval, biology, computer graphics,geographic information systems, etc. The focus of this thesis are knearest neighbor graphs (k-NNG), a special class of nearest neighbor graphs. Each node of k-NNG is connected with directed edges to its k nearest neighbors.A brute-force method for constructing k-NNG entails O(n 2 ) distance calculations. This thesis addresses the problem of more efficient k-NNG construction, achieved by approximation algorithms. The main challenge of an approximation algorithm for k-NNG construction is to decrease the number of distance calculations, while maximizing the approximation’s accuracy.NN-Descent is one such approximation algorithm for k-NNG construction, which reports excellent results in many cases. However, it does not perform well on high-dimensional data. The first part of this thesis summarizes the problem, and gives explanations for such a behavior. The second part introduces five new NN-Descent variants that aim to improve NN-Descent on high-dimensional data. The performance of the  proposed algorithms is evaluated with an experimental analysis.Finally, the third part of this thesis is dedicated to k-NNG update algorithms. Namely, in the real world scenarios data often change over time. If data change after k-NNG construction, the graph needs to be updated accordingly. Therefore, in this part of the thesis, two approximation algorithms for k-NNG updates are proposed. They are validated with extensive experiments on time series data.Graf najbližih suseda modeluje veze između objekata koji su međusobno bliski. Ovi grafovi se koriste u mnogim disciplinama, pre svega u mašinskom učenju, a potom i u pretraživanju informacija, biologiji, računarskoj grafici, geografskim informacionim sistemima, itd. Fokus ove teze je graf k najbližih suseda (k-NN graf), koji predstavlja posebnu klasu grafova najbližih suseda. Svaki čvor k-NN grafa je povezan usmerenim granama sa njegovih k najbližih suseda.Metod grube sile za generisanje k-NN grafova podrazumeva O(n 2 ) računanja razdaljina između dve tačke. Ova teza se bavi  problemom efikasnijeg generisanja k-NN grafova, korišćenjem aproksimativnih  algoritama.Glavni cilj aprokismativnih algoritama za generisanje k-NN grafova jeste smanjivanje ukupnog broja računanja razdaljina između dve tačke, uz održavanje visoke tačnosti krajnje aproksimacije.NN-Descent je jedan takav aproksimativni algoritam za generisanje k-NN grafova. Iako se pokazao kao veoma dobar u većini slučajeva, ovaj algoritam ne daje dobre rezultate nad visokodimenzionalnim podacima. Unutar prvog dela teze, detaljno je opisana suština problema i objašnjeni su razlozi za njegovo nastajaneU drugom delu predstavljeno je pet različitih modifikacija NN-Descent algoritma, koje za cilj imaju njegovo poboljšavanje pri radu nad visokodimenzionalnim podacima. Evaluacija ovih algoritama je data kroz eksperimentalnu analizu.Treći deo teze se bavi algoritmima za ažuriranje k-NN grafova. Naime,podaci se vrlo često menjaju  vremenom. Ukoliko se izmene podaci nad kojima je prethodno generisan k-NN graf, potrebno je graf ažurirati u skladu sa izmenama. U okviru ovog dela teze predložena su dva aproksimativna algoritma za ažuriranje k-NN grafova. Ovi algoritmi su evaluirani opširnim eksperimentima nad vremenskim serijama

    Solution Path Clustering with Adaptive Concave Penalty

    Full text link
    Fast accumulation of large amounts of complex data has created a need for more sophisticated statistical methodologies to discover interesting patterns and better extract information from these data. The large scale of the data often results in challenging high-dimensional estimation problems where only a minority of the data shows specific grouping patterns. To address these emerging challenges, we develop a new clustering methodology that introduces the idea of a regularization path into unsupervised learning. A regularization path for a clustering problem is created by varying the degree of sparsity constraint that is imposed on the differences between objects via the minimax concave penalty with adaptive tuning parameters. Instead of providing a single solution represented by a cluster assignment for each object, the method produces a short sequence of solutions that determines not only the cluster assignment but also a corresponding number of clusters for each solution. The optimization of the penalized loss function is carried out through an MM algorithm with block coordinate descent. The advantages of this clustering algorithm compared to other existing methods are as follows: it does not require the input of the number of clusters; it is capable of simultaneously separating irrelevant or noisy observations that show no grouping pattern, which can greatly improve data interpretation; it is a general methodology that can be applied to many clustering problems. We test this method on various simulated datasets and on gene expression data, where it shows better or competitive performance compared against several clustering methods.Comment: 36 page
    corecore