1,046 research outputs found

    Discriminatively Trained Latent Ordinal Model for Video Classification

    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phases for "smile", running and jumping for "highjump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal aspect of videos. We obtain consistent improvements over relevant competitive baselines on four challenging, publicly available video-based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method. Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150

    Semi-WTC: A Practical Semi-supervised Framework for Attack Categorization through Weight-Task Consistency

    Supervised learning has been widely used for attack categorization, but it requires high-quality data and labels. In practice, the data is often imbalanced and sufficient annotations are difficult to obtain. Moreover, supervised models face real-world deployment issues, such as defending against unseen artificial attacks. To tackle these challenges, we propose a semi-supervised fine-grained attack categorization framework consisting of an encoder and a two-branch structure; the framework can be generalized to different supervised models. A multilayer perceptron with a residual connection is used as the encoder to extract features and reduce complexity. The Recurrent Prototype Module (RPM) is proposed to train the encoder effectively in a semi-supervised manner. To alleviate the data imbalance problem, we introduce Weight-Task Consistency (WTC) into the iterative process of RPM by assigning larger weights in the loss function to classes with fewer samples. In addition, to cope with new attacks in real-world deployment, we propose an Active Adaption Resampling (AAR) method, which can better discover the distribution of unseen sample data and adapt the parameters of the encoder. Experimental results show that our model outperforms state-of-the-art semi-supervised attack detection methods, with a 3% improvement in classification accuracy and a 90% reduction in training time. Comment: Tech report
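
The abstract above does not give the WTC weighting formula, so the following is only a minimal sketch of the general idea of assigning larger loss weights to rarer classes. It uses generic inverse-frequency weighting, not the authors' method; the function names and the toy labels ("benign", "dos") are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs[i] maps each class name to its predicted probability for sample i.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy imbalanced label set: 8 majority-class samples, 2 minority-class samples.
labels = ["benign"] * 8 + ["dos"] * 2
weights = inverse_frequency_weights(labels)

# A model that always predicts 50/50 gives a loss of log(2) under any weighting.
uniform_probs = [{"benign": 0.5, "dos": 0.5} for _ in labels]
loss = weighted_cross_entropy(uniform_probs, labels, weights)
```

With this scheme the minority class receives four times the weight of the majority class, so errors on rare attack types contribute more to the loss.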

    The iNaturalist Species Classification and Detection Dataset

    Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real-world conditions, we present the iNaturalist species classification and detection dataset, consisting of 859,000 images from over 5,000 different species of plants and animals. It features visually similar species, captured in a wide variety of situations, from all over the world. Images were collected with different camera types, have varying image quality, feature a large class imbalance, and have been verified by multiple citizen scientists. We discuss the collection of the dataset and present extensive baseline experiments using state-of-the-art computer vision classification and detection models. Results show that current non-ensemble-based methods achieve only 67% top-one classification accuracy, illustrating the difficulty of the dataset. Specifically, we observe poor results for classes with small numbers of training examples, suggesting that more attention is needed in low-shot learning. Comment: CVPR 201

    Multiclass insect counting through deep learning-based density maps estimation

    The use of digital technologies and artificial intelligence techniques to automate some visual assessment processes in agriculture is now a reality. Image-based and, more recently, deep learning-based systems are being used in several applications. The main challenge for these applications is to perform correctly in real field conditions on images that are usually acquired with mobile devices and thus offer limited quality. Pest control is a problem that must be tackled in the field. Pest management strategies rely on identifying the level of infestation, which has so far been established through a counting task done manually by the field researcher. Existing models were not able to count appropriately because of the small size of the insects, and last year we presented a density-map-based algorithm that outperformed state-of-the-art methods for a single insect type. In this paper, we extend that previous work to a multiclass and multi-stadia approach. Concretely, the proposed algorithm has been tested in two use cases: on the one hand, it counts five different types of adult individuals over multiple crop leaves; on the other hand, it identifies four different stages of immatures over 2-cm leaf disks. In these leaf disks, some of the species are in different stadia, some of them micron-sized and difficult to identify even for the non-expert user. The proposed method achieves good results in both cases. The model for counting adult insects on a leaf achieves an RMSE ranging from 0.89 to 4.47, an MAE ranging from 0.40 to 2.15, and an R2 ranging from 0.86 to 0.91 for 4 different species in their adult phase (BEMITA, FRANOC, MYZUPE and APHIGO) that may appear together on the same leaf. In addition, for FRANOC, two nymph stadia and adults are considered. The model developed for counting BEMITA immatures in 2-cm disks obtains R2 values up to 0.98 for big nymphs.
This solution was packaged in a Docker container and can be accessed from mobile devices through an app via a REST service. It has been tested in the wild under real conditions in different locations worldwide and over 14 different crops. The authors would like to thank all the field researchers who generated the dataset, carried out the annotation process, performed the validation in the wild, and, in general, supported the work at Tecnalia and BASF, especially Javier Romero, Carlos Javier Jiménez, Amaia Ortiz, Aitor Alvarez and Jone Echazarra
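
For readers unfamiliar with the three counting metrics reported above, here is a minimal sketch of how RMSE, MAE and R2 can be computed from per-image true and predicted counts. The toy counts are invented for illustration, and R2 is computed here as the coefficient of determination; the abstract does not say whether the authors used this definition or the squared correlation coefficient.

```python
import math


def count_metrics(true_counts, pred_counts):
    """Return (RMSE, MAE, R2) for paired true and predicted insect counts."""
    n = len(true_counts)
    errors = [p - t for t, p in zip(true_counts, pred_counts)]

    # Root mean squared error penalises large miscounts more heavily.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Mean absolute error is the average miscount per image.
    mae = sum(abs(e) for e in errors) / n

    # Coefficient of determination: 1 - residual variance / total variance.
    mean_true = sum(true_counts) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in true_counts)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical counts for four leaf images.
true_counts = [2, 4, 6, 8]
pred_counts = [3, 4, 5, 8]
rmse, mae, r2 = count_metrics(true_counts, pred_counts)
```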

    Deep learning methods for knowledge base population

    Knowledge bases store structured information about entities or concepts of the world and can be used in various applications, such as information retrieval or question answering. A major drawback of existing knowledge bases is their incompleteness. In this thesis, we explore deep learning methods for automatically populating them from text, addressing the following tasks: slot filling, uncertainty detection and type-aware relation extraction. Slot filling aims at extracting information about entities from a large text corpus. The Text Analysis Conference yearly provides new evaluation data in the context of an international shared task. We develop a modular system to address this challenge. It was one of the top-ranked systems in the shared task evaluations in 2015. For its slot filler classification module, we propose contextCNN, a convolutional neural network based on context splitting. It improves the performance of the slot filling system by 5.0% micro and 2.9% macro F1. To train our binary and multiclass classification models, we create a dataset using distant supervision and reduce the number of noisy labels with a self-training strategy. For model optimization and evaluation, we automatically extract a labeled benchmark for slot filler classification from the manual shared task assessments from 2012-2014. We show that results on this benchmark are correlated with slot filling pipeline results with a Pearson's correlation coefficient of 0.89 (0.82) on data from 2013 (2014). The combination of patterns, support vector machines and contextCNN achieves the best results on the benchmark with a micro (macro) F1 of 51% (53%) on test. Finally, we analyze the results of the slot filling pipeline and the impact of its components. For knowledge base population, it is essential to assess the factuality of the statements extracted from text. 
From the sentence "Obama was rumored to be born in Kenya", a system should not conclude that Kenya is the place of birth of Obama. Therefore, we address uncertainty detection in the second part of this thesis. We investigate attention-based models and make a first attempt to systematize the attention design space. Moreover, we propose novel attention variants: external attention, which incorporates an external knowledge source; k-max average attention, which only considers the vectors with the k maximum attention weights; and sequence-preserving attention, which maintains order information. Our convolutional neural network with external k-max average attention sets the new state of the art on a Wikipedia benchmark dataset with an F1 score of 68%. To the best of our knowledge, we are the first to integrate an uncertainty detection component into a slot filling pipeline. It improves precision by 1.4% and micro F1 by 0.4%. In the last part of the thesis, we investigate type-aware relation extraction with neural networks. We compare different models for joint entity and relation classification: pipeline models, jointly trained models and globally normalized models based on structured prediction. First, we show that using entity class prediction scores instead of binary decisions helps relation classification. Second, joint training clearly outperforms pipeline models on a large-scale distantly supervised dataset with fine-grained entity classes. It improves the area under the precision-recall curve from 0.53 to 0.66. Third, we propose a model with a structured prediction output layer, which globally normalizes the score of a triple consisting of the classes of two entities and the relation between them. It improves relation extraction results by 4.4% F1 on a manually labeled benchmark dataset. Our analysis shows that the model learns correct correlations between entity and relation classes.
Finally, we are the first to use neural networks for joint entity and relation classification in a slot filling pipeline. The jointly trained model achieves the best micro F1 score with a score of 22% while the neural structured prediction model performs best in terms of macro F1 with a score of 25%
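
The abstract describes k-max average attention only as considering "the vectors with the k maximum attention weights". The sketch below is one possible reading of that sentence, not the thesis implementation: it keeps the k highest-weighted vectors and returns their weighted average with the selected weights renormalized. The renormalization step, the function name and the toy inputs are all assumptions.

```python
def k_max_average_attention(vectors, weights, k):
    """Weighted average of the k vectors with the largest attention weights.

    vectors: list of equal-length lists of floats (one per token)
    weights: list of non-negative attention weights, one per vector
    """
    # Indices of the k largest attention weights.
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    # Renormalize over the selected weights only (an assumption of this sketch).
    total = sum(weights[i] for i in top)
    dim = len(vectors[0])
    return [sum(weights[i] * vectors[i][d] for i in top) / total for d in range(dim)]


# Toy example: three 2-dimensional token vectors; with k=2 the
# lowest-weighted vector (index 1) is ignored.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.6, 0.1, 0.3]
context = k_max_average_attention(vectors, weights, k=2)
```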

    Collaborative-demographic hybrid for financial: product recommendation

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Due to the increased availability of mature data mining and analysis technologies supporting CRM processes, several financial institutions are striving to leverage customer data and integrate insights regarding customer behaviour, needs, and preferences into their marketing approach. As decision support systems assisting marketing and commercial efforts, Recommender Systems applied to the financial domain have been gaining increased attention. This thesis studies a Collaborative-Demographic Hybrid Recommendation System, applied to the financial services sector, based on real data provided by a Portuguese private commercial bank. This work establishes a framework to support account managers’ advice on which financial product is most suitable for each of the bank’s corporate clients. The recommendation problem is further developed by conducting a performance comparison of multi-output regression and multiclass classification prediction approaches. Experimental results indicate that multiclass architectures are better suited for the prediction task, outperforming alternative multi-output regression models on the evaluation metrics considered. Moreover, multiclass Feed-Forward Neural Networks, combined with Recursive Feature Elimination, are identified as the top-performing algorithm, yielding a 10-fold cross-validated F1 Measure of 83.16% and corresponding Precision and Recall values of 84.34% and 85.29%, respectively. Overall, this study provides important contributions for positioning the bank’s commercial efforts around customers’ future requirements. By allowing for a better understanding of customers’ needs and preferences, the proposed Recommender allows for more personalized and targeted marketing contacts, leading to higher conversion rates, corporate profitability, and customer satisfaction and loyalty