
    Zero-shot classification of salmon lice images by siamese neural networks

    Revised version: some errors corrected. Deep learning models, such as neural networks and their variations, have proven exceptionally useful. However, competitive performance requires large amounts of training data, which is especially true for classification. For the scarce salmon lice image datasets used in this thesis, the issue can be addressed by recasting the question "which class does this image belong to?" as one of image similarity, i.e. "is image i similar to image j?". Siamese neural networks are therefore employed to distinguish images rather than to classify them explicitly, which has the effect of producing more data points for training. The thesis derives exactly how many training data points this yields (specifically, the triplet cardinality). Furthermore, the thesis extensively compares the performance measures F1-score and TAR@FAR(p) for siamese neural networks, and finds that they differ in prediction strictness and in which elements of the confusion matrix they focus on. Specifically, TAR@FAR is designed to be stricter because a bound p can be set on the permitted proportion of false accepts, whereas F1-score also considers false rejects. The thesis is also the first work to cover cylindrical convolution in siamese neural networks, and shows that it does help address the problem of rotated images. Additionally, cylindrical convolution seemingly solves the problem of inconsistent data distribution. In conclusion, the best model at predicting image similarity on the synthetic dataset was Siamese_LeNet5_var with cylindrical convolutions. On this dataset augmented 100 times, it achieved a testing F1-score of 72.5 ± 2.6% and a testing TAR of 72.8 ± 3.0% (mean ± std). For the real dataset, testing performance could not be calculated due to dataset scarcity. Regardless, the model that performed best on the validation dataset was also Siamese_LeNet5_var with cylindrical convolutions. On this dataset augmented 100 times, it achieved a median validation F1-score of 60.9% and a median TAR@FAR(0.01) of 46.7%.
    Master's thesis in applied and computational mathematics (MAB399, MAMN-MA)
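The contrast between the two measures can be made concrete with a small sketch. It is not the thesis's implementation: the function names, the toy distances, and the convention that a pair is accepted as "similar" when its embedding distance is at or below a threshold are all assumptions made for illustration. TAR@FAR(p) picks the loosest threshold whose false accept rate stays within p and reports the true accept rate there, while F1 at a fixed threshold also penalizes false rejects.

```python
def tar_at_far(genuine, impostor, p):
    """True accept rate at the loosest threshold whose false accept rate is <= p.

    genuine:  distances of pairs that truly are similar (hypothetical input)
    impostor: distances of pairs that truly are dissimilar (hypothetical input)
    A pair is accepted when its distance <= threshold (assumed convention).
    """
    best_tar = 0.0
    # Try every observed distance as a threshold, from strictest to loosest.
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(d <= t for d in impostor) / len(impostor)
        if far > p:
            break  # FAR only grows as the threshold loosens
        best_tar = sum(d <= t for d in genuine) / len(genuine)
    return best_tar


def f1_at_threshold(genuine, impostor, t):
    """F1-score of the accept decision at threshold t."""
    tp = sum(d <= t for d in genuine)   # similar pairs accepted
    fn = len(genuine) - tp              # similar pairs rejected
    fp = sum(d <= t for d in impostor)  # dissimilar pairs accepted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up distances, for illustration only.
genuine = [0.1, 0.2, 0.3, 0.6]
impostor = [0.25, 0.7, 0.8, 0.9]

tar_strict = tar_at_far(genuine, impostor, p=0.0)
tar_loose = tar_at_far(genuine, impostor, p=0.25)
f1_mid = f1_at_threshold(genuine, impostor, t=0.3)
```

With p = 0 no false accept is tolerated, so the threshold must stay below the closest impostor pair and only part of the genuine pairs are accepted; relaxing p admits a looser threshold and a higher TAR.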

    How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need?

    In numerous application contexts, data are too rich and too complex to be represented by numerical vectors. A general approach to extending machine learning and data mining techniques to such data is to rely on a dissimilarity or on a kernel that measures how different or similar two objects are. This approach has been used to define several variants of the Self Organizing Map (SOM). This paper reviews those variants using a common set of notations in order to outline the differences and similarities between them. It discusses the advantages and drawbacks of the variants, as well as the actual relevance of the dissimilarity/kernel SOM for practical applications.

    Identifying Real Estate Opportunities using Machine Learning

    The real estate market is exposed to many fluctuations in prices because of existing correlations with many variables, some of which cannot be controlled or might even be unknown. Housing prices can increase rapidly (or in some cases, also drop very fast), yet the numerous online listings where houses are sold or rented are not likely to be updated that often. In some cases, individuals interested in selling a house (or apartment) might include it in some online listing and forget to update the price. In other cases, individuals might deliberately set a price below the market price in order to sell the home faster, for various reasons. In this paper, we aim to develop a machine learning application that identifies opportunities in the real estate market in real time, i.e., houses that are listed with a price substantially below the market price. This program can be useful for investors interested in the housing market. We have focused on a use case considering real estate assets located in the Salamanca district in Madrid (Spain) and listed on the most relevant Spanish online site for home sales and rentals. The application is formally implemented as a regression problem that tries to estimate the market price of a house given features retrieved from public online listings. To build this application, we have performed a feature engineering stage in order to discover relevant features that allow for attaining a high predictive performance. Several machine learning algorithms have been tested, including regression trees, k-nearest neighbors, support vector machines and neural networks, identifying the advantages and handicaps of each of them. Comment: 24 pages, 13 figures, 5 tables
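The idea of treating opportunity detection as a regression problem can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the single feature (floor area), the prices, the 20% margin, and the function names are all hypothetical, and a plain k-nearest-neighbors average stands in for whichever model the authors found best. A listing is flagged when its asking price falls more than a chosen margin below the estimated market price.

```python
def knn_predict(train, query, k=3):
    """Mean price of the k training listings closest to `query` in feature space."""
    by_distance = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )
    nearest = by_distance[:k]
    return sum(price for _, price in nearest) / len(nearest)


def find_opportunities(train, listings, margin=0.2, k=3):
    """Return (features, asking, estimate) for listings priced more than
    `margin` below the estimated market price."""
    flagged = []
    for features, asking in listings:
        estimate = knn_predict(train, features, k)
        if asking < (1 - margin) * estimate:
            flagged.append((features, asking, estimate))
    return flagged


# Made-up data: features are (floor area in square metres,), prices in euros.
train = [
    ((50,), 200_000), ((55,), 220_000), ((60,), 240_000),
    ((100,), 400_000), ((105,), 420_000), ((110,), 440_000),
]
listings = [((52,), 215_000), ((102,), 300_000)]

flagged = find_opportunities(train, listings)
```

With several features of different scales, the features would need to be normalized before computing distances; that step is omitted here for brevity.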