10 research outputs found

    Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks

    Automatic analysis of video is one of the most complex problems in the fields of computer vision and machine learning. A significant part of this research deals with human activity recognition (HAR), since humans, and the activities they perform, generate most of the video semantics. Video-based HAR has applications in various domains, but one of the most important and challenging is HAR in sports videos. The major issues include high inter- and intra-class variation, large class imbalance, the presence of both group actions and single-player actions, and recognizing simultaneous actions, i.e., the multi-label learning problem. Keeping these challenges in mind, along with the recent success of CNNs in solving various computer vision problems, in this work we implement a 3D-CNN-based deep HAR system for multi-label, class-imbalanced action recognition in hockey videos. We test our system in two different scenarios, an ensemble of k binary networks vs. a single k-output network, on a publicly available dataset, and compare our results with the system originally designed for that dataset. Experimental results show that the proposed approach performs better than the existing solution. Comment: Accepted to IEEE/ACIS SNPD 2018, 6 pages, 3 figures.
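The difference between the two compared scenarios can be sketched minimally: a single k-output network emits one sigmoid score per action label, and each label is thresholded independently rather than competing under a softmax. The label count and logit values below are hypothetical, not from the paper:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_predict(logits, threshold=0.5):
    """Return the indices of active labels from k independent sigmoid outputs.

    Multi-label: several labels may fire at once, unlike softmax classification.
    """
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]

# Hypothetical logits for k = 4 hockey actions produced by a k-output network
logits = [2.1, -1.3, 0.4, -3.0]
print(multilabel_predict(logits))  # [0, 2]: actions 0 and 2 occur simultaneously
```

An ensemble of k binary networks would produce the same kind of output, but each logit would come from a separately trained network.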

    HANDLING MISSING ATTRIBUTE VALUES IN DECISION TABLES USING VALUED TOLERANCE APPROACH

    Rule induction is one of the key areas of data mining, as it is applied to a large amount of real-life data. In such data, however, the information is often incompletely specified. To induce rules from these incomplete data, more powerful algorithms are necessary. This research work focuses on a probabilistic approach based on the valued tolerance relation. The thesis is divided into two parts. The first part describes the implementation of the valued tolerance relation; the induced rules are then evaluated based on the error rate due to incorrectly classified and unclassified examples. The second part compares the rules induced by the previously implemented MLEM2 algorithm with the rules induced by the valued-tolerance-based approach implemented as part of this research. The error rates of the MLEM2 algorithm and the valued-tolerance-based approach are thus compared and the results documented.
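As a rough illustration of the valued tolerance relation (one common probabilistic formulation, not necessarily the exact variant implemented in the thesis), each pair of cases receives an indiscernibility degree in [0, 1], with a missing value treated as uniformly distributed over the attribute's domain:

```python
def valued_tolerance(x, y, domains, missing="?"):
    """Degree to which cases x and y are indiscernible: a product of
    per-attribute match probabilities. Assumes all values in an attribute
    domain are equally likely when a value is missing."""
    degree = 1.0
    for a, dom in enumerate(domains):
        vx, vy = x[a], y[a]
        if vx == missing or vy == missing:
            degree *= 1.0 / len(dom)   # a missing value may equal any domain value
        elif vx != vy:
            return 0.0                 # both known and different: incompatible
        # both known and equal: match probability 1, degree unchanged
    return degree

# Two attributes with domains of size 2; the second value of x is missing
print(valued_tolerance(("low", "?"), ("low", "yes"),
                       [["low", "high"], ["yes", "no"]]))  # 0.5
```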

    A comparison of sixteen classification strategies of rule induction from incomplete data using the MLEM2 algorithm

    In data mining, rule induction is the process of extracting formal rules from decision tables, where the latter are tabulated observations that typically consist of a few attributes, i.e., independent variables, and a decision, i.e., a dependent variable. Each tuple in the table is considered a case, and a table may contain n cases, one per observation. The efficiency of rule induction depends on how many cases are successfully characterized by the generated set of rules, i.e., the ruleset. There are different rule induction algorithms, such as LEM1, LEM2, and MLEM2. Real-world datasets are imperfect, inconsistent, and incomplete. MLEM2 is an efficient algorithm for such data, but the quality of rule induction largely depends on the chosen classification strategy. We compare 16 classification strategies of rule induction using MLEM2 on incomplete data. To this end, we implemented MLEM2 to induce rulesets based on the chosen type of approximation, i.e., singleton, subset, or concept, and the value of alpha used for calculating probabilistic approximations. A program called rule checker computes the error rate for the specified classification strategy. To reduce anomalies, we used ten-fold cross-validation to measure the error rate for each strategy. Error rates for the above strategies were calculated for different datasets, compared, and presented.
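The alpha-parameterized probabilistic approximation mentioned above can be sketched as follows: a block is included in the approximation when the conditional probability of the concept given that block reaches alpha. This is a minimal illustration with made-up sets, not the MLEM2 implementation itself:

```python
def probabilistic_approximation(concept, blocks, alpha):
    """Union of the blocks B with Pr(concept | B) >= alpha.

    With alpha = 1 this reduces to the lower approximation; with alpha
    just above 0 it reduces to the upper approximation.
    """
    approx = set()
    for block in blocks:
        if len(block & concept) / len(block) >= alpha:
            approx |= block
    return approx

concept = {1, 2, 3}                       # cases belonging to the concept
blocks = [{1, 2}, {2, 3, 4}, {4, 5}]      # e.g. characteristic sets of cases
print(sorted(probabilistic_approximation(concept, blocks, 1.0)))  # [1, 2]
print(sorted(probabilistic_approximation(concept, blocks, 0.5)))  # [1, 2, 3, 4]
```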

    The N-Grams Based Text Similarity Detection Approach Using Self-Organizing Maps and Similarity Measures

    In this paper, a word-level n-gram-based approach is proposed to find similarity between texts. The approach combines two separate and independent techniques: the self-organizing map (SOM) and text similarity measures. The SOM is unique in that the results of data clustering, as well as dimensionality reduction, are presented in visual form. Four measures were evaluated: cosine, Dice, extended Jaccard, and overlap. First, the texts must be converted to a numerical representation. For that purpose, each text is split into word-level n-grams, from which a bag of n-grams is created. The n-gram frequencies are calculated and the frequency matrix of the dataset is formed. Various filters are used when creating the bag of n-grams: stemming algorithms, number and punctuation removers, stop-word lists, etc. All experimental investigation was carried out on a corpus of plagiarized short answers. This article belongs to the Special Issue Advances in Deep Learning.
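The pipeline described above (split text into word-level n-grams, build a bag of n-grams, compare frequency vectors) can be sketched as follows; cosine is shown as one of the four measures, and the sample sentences are made up:

```python
from collections import Counter
from math import sqrt

def ngram_bag(text, n=2):
    """Bag (frequency map) of word-level n-grams for one text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def cosine(bag_a, bag_b):
    """Cosine similarity between two n-gram frequency vectors."""
    dot = sum(bag_a[g] * bag_b[g] for g in bag_a)
    norm_a = sqrt(sum(v * v for v in bag_a.values()))
    norm_b = sqrt(sum(v * v for v in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

a = ngram_bag("the quick brown fox jumps")
b = ngram_bag("the quick brown dog jumps")
print(round(cosine(a, b), 3))  # 0.5: two of four bigrams are shared
```

In the paper, filters such as stemming and stop-word removal would be applied to the word list before the bags are built.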

    EdiFlow: data-intensive interactive workflows for visual analytics

    Visual analytics aims at combining interactive data visualization with data analysis tasks. Given the explosion in the volume and complexity of scientific data, e.g., data associated with biological or physical processes or social networks, visual analytics is called to play an important role in scientific data management. Most visual analytics platforms, however, are memory-based and therefore limited in the volume of data they can handle. Moreover, each new algorithm (e.g., for clustering) must be integrated into the platform by hand. Finally, such platforms lack the capability to define and deploy well-structured processes in which users with different roles interact in a coordinated way, sharing the same data and possibly the same visualizations. We have designed and implemented EdiFlow, a workflow platform for visual analytics applications. EdiFlow uses a simple structured process model and is backed by a persistent database storing both process information and process instance data. EdiFlow processes provide the usual process features (roles, structured control) and may integrate visual analytics tasks as activities. We present its architecture, its deployment on a sample application, and the main technical challenges involved.

    Detecting Fraudulent Payment Activity Using Machine-Learning Models

    Thesis: 95 p., 25 fig., 7 tabl., 2 app., 14 references. Object of research: methods and models of machine learning. Subject of research: classification methods and models for predicting fraudulent payment transactions. Purpose of the work: to develop an effective machine-learning model that automatically predicts the probability that a particular user's payment transaction is fraudulent. Models used: logistic regression, decision trees, random forest, XGBoost, and SVM were used in the software implementation. The relevance of the work stems from the fact that, in today's digital world, fraudulent payment activity is a growing threat to the economy and to the financial security of public and private institutions. Criminals constantly look for new ways to cheat and to enrich themselves illegally at the expense of payment systems. Results obtained: a model for detecting fraudulent payment transactions was built that can predict, with acceptable accuracy, whether a specific transaction is fraudulent. For further research, it is suggested to improve the accuracy of the obtained model, to improve and enrich the data used by the model, and to apply new methods and approaches such as deep neural networks and anomaly-detection methods.

    Top-K Nodes Identification in Big Networks Based on Topology and Activity Analysis

    Graphs and networks are among the most researched topics, with applications ranging from theoretical to practical fields such as social media, genetics, and education. In many competitive environments, the most productive activities may be interacting with high-profile people, reading a much-cited article, or researching a wide range of fields, such as the study of highly connected proteins. This thesis proposes two methods for identifying the top-K nodes in a network: a centrality-based method and an activity-based method. The first method is based on the topological structure of the network and uses Katz centrality, a path-based ranking measure that captures both the local and the global influence of a node. It starts by filtering the top-K nodes from a pool of network data using Katz centrality; by filtering out unnecessary nodes based on their centrality values, one can focus on the most important nodes. The proposed method was applied to various network data, and the results showed how different parameter values lead to different numbers of top-K nodes. The second method incorporates the theory of heat diffusion: each node in the network can act as a source of heat, and the amount of heat diffused or received by a node depends on the number of activities it performs. There are two types of activities, interactive and non-interactive: likes, comments, and shares are interactive, whereas posting a status, tweet, or picture is non-interactive. We applied the proposed methods to Instagram network data and compared the results with similar algorithms. The experimental results showed that our activity-based approach is faster and more accurate than the existing methods. Images referenced in this thesis are included in the supplementary files.
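A minimal sketch of the Katz centrality used by the first method: a node's score accumulates contributions from incoming paths of every length, damped by a factor alpha per hop. The graph and parameter values below are illustrative only:

```python
def katz_centrality(adj, alpha=0.1, beta=1.0, iters=100):
    """Katz centrality by fixed-point iteration of x_i = beta + alpha * sum_j A_ji x_j.

    alpha must be smaller than 1 / lambda_max of the adjacency matrix
    for the series over path lengths to converge.
    """
    n = len(adj)
    x = [beta] * n
    for _ in range(iters):
        x = [beta + alpha * sum(adj[j][i] * x[j] for j in range(n))
             for i in range(n)]
    return x

# Small directed example: edges 0 -> 1, 0 -> 2, 1 -> 2
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
scores = katz_centrality(adj)
top_k = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:2]
print(top_k)  # [2, 1]: node 2 receives the most damped incoming paths
```

Filtering would then keep only the K highest-scoring nodes, as in the thesis.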

    Multi-objective Optimization for Imbalanced Data Classification

    The goal of this project is to improve pattern classification performed with an artificial neural network, particularly for imbalanced problems, from a multi-objective optimization perspective using evolutionary algorithms. To ease reading of the report, a brief summary of each chapter follows:
    · Chapter 1. Introduction. The current section, which briefly introduces the project, describes its purpose, and outlines the contents of this report.
    · Chapter 2. Working Context. Describes the setting in which the project was developed, detailing the techniques and tools used. It is divided into two parts, covering the multilayer perceptron and evolutionary algorithms for multi-objective optimization.
    · Chapter 3. System Description. Details the implementation of the system and the functionality it offers. First, the purpose of the system is described formally. Next, the elements involved in the evolutionary algorithm for the designed application are defined. Finally, two variants implemented on top of the initial algorithm are specified.
    · Chapter 4. Experiments. Presents all experiments performed with the developed system on four domains, with a subsection per domain showing the results obtained for each of the three implemented methods (versions) and an overall analysis of the results for that domain.
    · Chapter 5. Conclusions and Future Work. Presents the conclusions drawn from the work, together with possible future lines of extension of this project.
    · Chapter 6. Bibliography. Lists all documents used in preparing this project, as well as those of special interest.
    Ingeniería en Informática

    Vehicle-to-vehicle communication: design, performance, and disruption mitigation in real-world environment

    This thesis investigates the performance of 802.11p-based V2V communication in real-life scenarios and explores potential practical applications, such as broadcasting GNSS correction data to improve the positioning accuracy of nearby vehicles, and enhancing communication robustness by pre-emptively predicting potential disruptions with the assistance of machine learning (ML) models. A custom V2V on-board unit (OBU) hardware platform was developed, and real-world multi-vehicle outdoor experiments were planned and carried out. The collected data was examined and used to train a number of ML models, whose performance was compared. The experiments revealed that the custom OBU was fully functional, and signal quality and communication range were observed to be affected by real-world imperfections. The GNSS correction data broadcasting was shown to notably increase the positioning accuracy of nearby vehicles, and the ML models trained on Key Performance Indicators (KPIs) demonstrated excellent prediction accuracy, allowing pre-emptive actions to be taken to reduce the downtime caused by communication disruptions.
