
    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and makes it possible to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight into how the problem characteristics affect MIL algorithms, recommendations for future benchmarking, and promising avenues for research.
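
To make the bag-level labeling concrete, the following is a minimal sketch under the commonly used "standard" MIL assumption, which the abstract itself does not state: a bag is positive if at least one of its instances is positive. The function names `bag_label` and `bag_score` are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Bag label under the standard MIL assumption (an assumption here):
    1 if any instance in the bag is positive, otherwise 0."""
    return int(any(label == 1 for label in instance_labels))


def bag_score(instance_scores: Sequence[float]) -> float:
    """Max-pooling aggregation: the bag score is the highest instance score.
    This is one common choice; other MIL problem types call for other rules."""
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_scores)


# Hypothetical bags: only the bag label would be available during training.
positive_bag = [0, 0, 1]
negative_bag = [0, 0, 0]
labels = [bag_label(positive_bag), bag_label(negative_bag)]
```

Other problem types covered by the survey relax this assumption, so the rule above is only one possible relationship between instance labels and bag labels.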

    A framework for emotion and sentiment predicting supported in ensembles

    Humans can comprehend each other’s emotions through subtle body movements or facial expressions, and they use those expressions to adjust how they deliver messages when communicating with one another. Machines, user interfaces, and robots need to acquire this ability in order to change the interaction from the traditional “human-computer interaction” to a “human-machine cooperation”, where the machine provides the “right” information and functionality, at the “right” time, and in the “right” way. This dissertation presents a framework for emotion classification based on facial, speech, and text emotion prediction sources, supported by an ensemble of open-source code retrieved from off-the-shelf methods. The main contribution is integrating outputs from different sources and methods into a single prediction, consistent with the emotions presented by the system’s user. For each source, an initial aggregation of primary classifiers was implemented. For facial emotion classification, the aggregation achieved an accuracy above 73% on both the FER2013 and RAF-DB datasets. For speech emotion classification, four datasets were used (RAVDESS, TESS, CREMA-D, and SAVEE), and the aggregation of primary classifiers achieved an accuracy above 86% on a combination of three of them. The text emotion aggregation of primary classifiers was tested on one dataset, EMOTIONLINES, and achieved an accuracy above 53%. Finally, integrating all the methods in a single framework allowed us to develop an emotion multi-source aggregator (EMsA), which aggregates the results of the primary emotion classifications from different sources, such as face, speech, and text. We describe the EMsA and its results on the RAVDESS dataset, where it achieved 81.99% accuracy using a combination of faces and speech.
Finally, we present an initial approach for sentiment classification.
Humans are prepared to understand each other's emotions through subtle body movements or facial expressions; that is, the way these movements and expressions are conveyed changes how messages are delivered when humans communicate with one another. Machines, user interfaces, or robots need to harness this ability in order to change the interaction from the traditional “human-computer interaction” to a “human-machine cooperation”, where the machine provides the “right” information and functionality, at the “right” time, and in the “right” way. This dissertation presents a framework (an ensemble of models) for emotion classification based on multiple sources, namely facial, speech, and text emotion prediction. The base classifiers rely on open-source code associated with methods available in the literature (primary classifiers). The main contribution is integrating different sources and different methods (the primary classifiers) into a single prediction consistent with the emotions presented by the system's user. In this context, it should be noted that the state-of-the-art review of the different ways of classifying human emotions identified body emotion recognition (excluding the face). However, no published open-source code was found for primary classifiers that could be used within the scope of this dissertation. For speech and text emotion recognition, it was also difficult to find primary classifiers meeting the necessary requirements, especially for text, because many models exist but with numerous emotions that differ from the 6 basic emotions considered (sadness, fear, surprise, disgust, anger, and joy). For text, it was also found that more models predict sentiment than emotions.
Separately for each source, i.e., for each component analysed (face, speech, and text), a Python framework was developed that implements a primary aggregator with n primary classifiers (in this dissertation, n = 3). To run the tests and obtain the results of each primary aggregator, a specific dataset is used and its data are sent to the aggregator: an image for the facial aggregator, an audio clip for the speech aggregator, and a sentence for the text aggregator. Each dataset used was split into training, validation, and test files. When the framework finishes processing the received data, it produces the corresponding results, namely: the file name or input identifier, the results of the first, second, and third primary classifiers, and the dataset ground truth. The primary classifiers' results are then sent to the final classifier of that primary aggregator, for which four classifiers were tested: (a) voting, which for n = 3 compares the emotion predicted by each primary classifier: if 2 primary classifiers predict the same emotion, that emotion is the voting result, and if all classifiers give different results, no result is chosen; (b) Random Forest; (c) AdaBoost; and (d) MLP (multilayer perceptron). Once the framework for each primary aggregator was complete, a super-aggregator was developed on the same principle as the primary aggregators but, instead of aggregating the results of only 3 primary classifiers, it receives n × 3 primary classifier results (n for face, n for speech, and n for text).
Regarding the results of the aggregators for each source (face, speech, and text), facial emotion classification achieved a classification accuracy above 73% on the FER2013 and RAF-DB datasets. For speech emotion classification, four datasets were used, namely RAVDESS, TESS, CREMA-D, and SAVEE, and the best accuracy obtained was above 86% when using a combination of 3 of the 4 datasets. For text emotion classification, the EMOTIONLINES dataset was used, and the best result obtained was 53% (accuracy). Integrating all the primary classifiers into a single framework made it possible to develop the multi-source aggregator (emotion multi-source aggregator, EMsA), in which the final emotion classification is extracted, as already mentioned, from the aggregation of the primary emotion classifiers from different sources. For EMsA, results are presented on the RAVDESS dataset, where an accuracy of 81.99% was achieved when EMsA used a combination of faces and speech. It was not possible to test EMsA on a dataset recognised in the literature that simultaneously contains text, face, and speech information. Finally, an initial approach for sentiment classification was presented.
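
The voting rule described for n = 3 primary classifiers (two agreeing classifiers decide the result; three different predictions yield no result) could be sketched as follows. This is an illustrative sketch, not the dissertation's code: the function name `majority_vote` and the generalisation to other values of n (a tie for the top count returns no result) are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(predictions: Sequence[str]) -> Optional[str]:
    """Return the emotion predicted by more classifiers than any other.

    For three classifiers this matches the rule in the abstract: if two
    agree, their emotion wins; if all three differ, None is returned.
    Treating any tie for first place as "no result" is an assumption.
    """
    ranked = Counter(predictions).most_common()
    if not ranked:
        return None
    # A tie between the two most frequent labels means no clear winner.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical outputs of three primary classifiers for one input.
agreed = majority_vote(["joy", "sadness", "joy"])
undecided = majority_vote(["joy", "sadness", "fear"])
```

Returning `None` instead of picking an arbitrary label keeps the "no result is chosen" case explicit, so a caller can decide how to handle undecided inputs.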

    Large-Scale Content Based Face Image Retrieval using Attribute Enhanced Sparse Codewords

    Content-based image retrieval (CBIR) has become one of the most active research areas in the past few years. Many indexing techniques are based on global feature distributions. However, these global distributions have limited discriminating power because they cannot capture local image information. Photos with people are a major focus of users' attention. Consequently, with the number of pictures growing exponentially, large-scale content-based face image retrieval is an enabling technology for many emerging applications. The main objective is to use automatically detected human attributes, which carry semantic cues about face images, to improve content-based face retrieval by constructing semantic codewords for efficient large-scale face retrieval. By leveraging human attributes in a scalable and systematic framework, we propose two orthogonal methods, named attribute-enhanced sparse coding and attribute-embedded inverted indexing, to improve face retrieval. We compare the proposed method with three other methods, namely LBP, ATTR, and SC. The results show that the proposed methods achieve a relative improvement in Mean Average Precision (MAP) compared with the existing methods. DOI: 10.17762/ijritcc2321-8169.15084

    LIMEtree: Interactively Customisable Explanations Based on Local Surrogate Multi-output Regression Trees

    Systems based on artificial intelligence and machine learning models should be transparent, in the sense of being capable of explaining their decisions to gain humans' approval and trust. While there are a number of explainability techniques that can be used to this end, many of them are only capable of outputting a single one-size-fits-all explanation that simply cannot address all of the explainees' diverse needs. In this work we introduce a model-agnostic and post-hoc local explainability technique for black-box predictions called LIMEtree, which employs surrogate multi-output regression trees. We validate our algorithm on a deep neural network trained for object detection in images and compare it against Local Interpretable Model-agnostic Explanations (LIME). Our method comes with local fidelity guarantees and can produce a range of diverse explanation types, including contrastive and counterfactual explanations praised in the literature. Some of these explanations can be interactively personalised to create bespoke, meaningful and actionable insights into the model's behaviour. While other methods may give an illusion of customisability by wrapping, otherwise static, explanations in an interactive interface, our explanations are truly interactive, in the sense of allowing the user to "interrogate" a black-box model. LIMEtree can therefore produce consistent explanations on which an interactive exploratory process can be built

    Experiments in expression recognition

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 39-41). Despite the significant effort devoted to methods for expression recognition, suitable training and test databases designed explicitly for expression research have been largely neglected. Additionally, possible techniques for expression recognition within a Man-Machine-Interface (MMI) domain are numerous, but it remains unclear which methods are most effective for expression recognition. In response, this thesis describes the means by which an appropriate expression database has been generated and then enumerates the results of five different recognition methods as applied to that database. An analysis of the results of these experiments is given, and conclusions for future research based upon these results are put forth. By James P. Skelley. M.Eng

    Cursor control by point-of-regard estimation for a computer with integrated webcam

    This work forms part of the project Eye-Communicate funded by the Malta Council for Science and Technology through the National Research & Innovation Programme (2012) under Research Grant No. R&I-2012-057. The problem of eye-gaze tracking by video-oculography has been receiving extensive interest throughout the years owing to the wide range of applications associated with this technology. Nonetheless, the emergence of a new paradigm referred to as pervasive eye-gaze tracking introduces new challenges that go beyond the typical conditions for which classical video-based eye-gaze tracking methods have been developed. In this paper, we propose to deal with the problem of point-of-regard estimation from low-quality images acquired by an integrated camera inside a notebook computer. The proposed method detects the iris region from low-resolution eye region images by its intensity values rather than the shape, ensuring that this region can also be detected at different angles of rotation and under partial occlusion by the eyelids. Following the calculation of the point-of-regard from the estimated iris center coordinates, a number of Kalman filters improve upon the noisy point-of-regard estimates to smooth the trajectory of the mouse cursor on the monitor screen. Quantitative results obtained from a validation procedure reveal a low mean error that is within the footprint of the average on-screen icon. Peer-reviewed.
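
As a rough illustration of how a Kalman filter can smooth noisy point-of-regard estimates, the sketch below applies a scalar filter with a random-walk state model to one screen coordinate; the same function could be run separately on the x and y coordinates. The state model, the function name `kalman_smooth`, and the variance values are assumptions made for illustration, since the abstract does not specify the filter design.

```python
from typing import Iterable, List


def kalman_smooth(
    measurements: Iterable[float],
    process_var: float = 1.0,       # assumed variance of true cursor motion per step
    measurement_var: float = 25.0,  # assumed variance of the noisy gaze estimate
) -> List[float]:
    """Scalar Kalman filter with a random-walk (constant-position) model."""
    estimates: List[float] = []
    x = 0.0   # state estimate (one screen coordinate, in pixels)
    p = 0.0   # variance of the state estimate
    initialised = False
    for z in measurements:
        if not initialised:
            # Start from the first measurement with its own uncertainty.
            x, p = float(z), measurement_var
            initialised = True
        else:
            p += process_var                 # predict: uncertainty grows
            gain = p / (p + measurement_var) # weight given to the new measurement
            x += gain * (z - x)              # update the estimate
            p *= 1.0 - gain                  # update its variance
        estimates.append(x)
    return estimates


# Hypothetical noisy x-coordinates of the estimated point-of-regard.
noisy_x = [400.0, 412.0, 395.0, 407.0, 401.0]
smooth_x = kalman_smooth(noisy_x)
```

A larger `measurement_var` relative to `process_var` makes the cursor trajectory smoother but slower to follow genuine gaze shifts, which is the usual trade-off when tuning such a filter.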

    Image-based Social Sensing: Combining AI and the Crowd to Mine Policy-Adherence Indicators from Twitter

    Social media provides a trove of information that, if aggregated and analysed appropriately, can provide important statistical indicators to policy makers. In some situations these indicators are not available through other mechanisms. For example, given the ongoing COVID-19 outbreak, it is essential for governments to have access to reliable data on policy adherence with regard to mask wearing, social distancing, and other hard-to-measure quantities. In this paper we investigate whether it is possible to obtain such data by aggregating information from images posted to social media. The paper presents VisualCit, a pipeline for image-based social sensing combining recent advances in image recognition technology with geocoding and crowdsourcing techniques. Our aim is to discover in which countries, and to what extent, people are following COVID-19 related policy directives. We compared the results with the indicators produced within the CovidDataHub behavior tracker initiative. Preliminary results show that social media images can produce reliable indicators for policy makers. Comment: 10 pages, 9 figures, to be published in Proceedings of ICSE Software Engineering in Society, May 202

    Revisiting Data Complexity Metrics Based on Morphology for Overlap and Imbalance: Snapshot, New Overlap Number of Balls Metrics and Singular Problems Prospect

    Data Science and Machine Learning have become fundamental assets for companies and research institutions alike. As one of their fields, supervised classification allows for class prediction of new samples, learning from given training data. However, some properties can cause datasets to be problematic to classify. In order to evaluate a dataset a priori, data complexity metrics have been used extensively. They provide information regarding different intrinsic characteristics of the data, which serves to evaluate classifier compatibility and to choose a course of action that improves performance. However, most complexity metrics focus on just one characteristic of the data, which can be insufficient to properly evaluate the dataset with respect to the classifiers' performance. In fact, class overlap, a very detrimental feature for the classification process (especially when imbalance among class labels is also present), is hard to assess. This research work focuses on revisiting complexity metrics based on data morphology. In accordance with their nature, the premise is that they provide both good estimates of class overlap and strong correlations with classification performance. For that purpose, a novel family of metrics has been developed. Being based on ball coverage by classes, they are named Overlap Number of Balls. Finally, some prospects for the adaptation of this family of metrics to singular (more complex) problems are discussed. Comment: 23 pages, 9 figures, preprint