17 research outputs found

    Support vector machines to detect physiological patterns for EEG and EMG-based human-computer interaction: a review

    Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness, and the wide availability of free dedicated toolboxes. Frequently, however, the literature reports insufficient detail about the SVM implementation and/or parameter selection, making it impossible to reproduce the analyses and results of a study. In order to perform an optimized classification and report the results properly, a comprehensive critical overview of SVM applications is necessary. The aim of this paper is to provide a review of the use of SVMs in the detection of brain and muscle patterns for HCI, focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant implementations from the literature. Furthermore, details of the reviewed papers are listed in tables, and statistics on SVM usage in the literature are presented. The suitability of SVMs for HCI is discussed and critical comparisons with other classifiers are reported.
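    As a minimal illustration of the implementation details the review asks authors to report (kernel type, regularization constant C, kernel parameter gamma, and cross-validation scheme), the sketch below assumes a precomputed matrix X of EEG/EMG features and a label vector y; the pipeline and parameter grid are illustrative choices, not taken from any reviewed study.

        # Sketch only: RBF-kernel SVM on precomputed EEG/EMG feature vectors,
        # with the hyperparameters and cross-validation scheme made explicit.
        from sklearn.model_selection import GridSearchCV, StratifiedKFold
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        def fit_svm(X, y):
            # Standardize each feature, then fit an RBF-kernel SVM.
            pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
            # Illustrative grid; the finally selected values of C and gamma are
            # exactly what a reproducible study should report.
            grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.01, 0.1, 1]}
            search = GridSearchCV(pipe, grid, cv=StratifiedKFold(n_splits=5))
            search.fit(X, y)
            return search.best_estimator_, search.best_params_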

    Development of a document classification method by using geodesic distance to calculate similarity of documents

    Currently, the Internet gives people the opportunity to access human knowledge quickly and conveniently through various channels such as web pages, social networks, digital libraries, and portals. However, because information is exchanged and updated so rapidly, the volume of stored information (in the form of digital documents) is growing quickly, and we face challenges in representing, storing, sorting, and classifying documents. In this paper, we present a new approach to text classification based on semi-supervised machine learning and the Support Vector Machine (SVM). The novelty of the study is that, instead of measuring the distance between vectors with the Euclidean distance, we use the geodesic distance. To do this, each text is first expressed as an n-dimensional vector; each vector is a point in the n-dimensional space, the geodesic distance from each point to its nearby points is computed, and the points are connected into a graph. Classification is then based on the shortest path between vertices on the graph, computed through a kernel function. We conducted experiments on Reuters articles covering 5 topics: Business, Markets, World, Politics, and Technology. To evaluate the proposed method, we compared the traditional SVM based on Euclidean distance with the proposed method based on geodesic distance on the same data set. The results show that the correct classification rate is better than that of the traditional Euclidean-distance SVM (by 3.2% on average).
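    A minimal sketch of the approach described above, assuming TF-IDF document vectors, a k-nearest-neighbour graph, and an RBF-style kernel built from graph shortest-path (geodesic) distances; the function name, k, and gamma are illustrative and not taken from the paper, and the resulting matrix is not guaranteed to be positive definite.

        # Sketch only: geodesic (graph shortest-path) distances between documents,
        # turned into a precomputed kernel for an SVM.
        import numpy as np
        from scipy.sparse.csgraph import shortest_path
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.neighbors import kneighbors_graph
        from sklearn.svm import SVC

        def geodesic_kernel(docs, k=10, gamma=1.0):
            # Express each text as an n-dimensional vector (TF-IDF here).
            X = TfidfVectorizer().fit_transform(docs)
            # Connect each point to its k nearest neighbours, edges weighted by Euclidean distance.
            graph = kneighbors_graph(X, n_neighbors=k, mode="distance")
            # Geodesic distance = shortest path between vertices on the graph.
            D = shortest_path(graph, directed=False)
            D[np.isinf(D)] = D[np.isfinite(D)].max()  # bridge disconnected components
            return np.exp(-gamma * D ** 2)            # distances -> kernel matrix

        # Usage sketch: K = geodesic_kernel(train_docs)
        #               SVC(kernel="precomputed").fit(K, train_labels)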

    BERT self-learning approach with limited labels for document classification of a Brazilian Army’s administrative documentary set

    Dissertation (Professional Master's in Applied Computing), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, Brasília, 2022. Funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
    The remarkable acceleration in the speed of document production and, consequently, in the volume of unstructured data stored at Brazilian Army facilities, specifically in the form of administrative documents, together with the Commands' need for situational awareness and compliance with current archival legislation, requires processes capable of classifying documents. In this sense, Natural Language Processing (NLP) stands as an important asset in the pursuit of document classification objectives, proving to be an adequate means for research that aims to classify documents under the current reality of document production, in which a considerable number of document samples are unlabeled. Given that the most powerful NLP models are based on supervised learning techniques, which require a considerable number of labeled samples, the challenge remains to find a model capable of classifying a partially labeled data set from a Military Organization (OM) according to the Requirements Model for Computerized Document Management Systems (e-ARQ Brasil) while reaching human-level performance. This research aimed to extend the BERT model by replacing the supervised fine-tuning stage with a self-learning method, measuring the resulting performance for specific percentages of labeled data, initially ranging from 3% to 30% of the total samples. The results indicate that the proposed method is applicable to the Brazilian Army's document databases. In the case study in question, its performance was compatible with the existing needs: the method classified documents at a level equivalent to human capacity and outperformed the reference experiments, with larger gains as the number of available labeled samples decreases.
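    A minimal sketch of the self-learning (pseudo-labelling) loop described above; fit_classifier stands in for fine-tuning a BERT classification head on the current labelled set and is a hypothetical placeholder, as are the confidence threshold and the number of rounds.

        # Sketch only: iteratively promote high-confidence predictions on the
        # unlabelled pool to pseudo-labels and retrain the classifier.
        def self_train(X_lab, y_lab, X_unlab, fit_classifier, threshold=0.9, rounds=5):
            X, y = list(X_lab), list(y_lab)
            pool = list(X_unlab)
            clf = fit_classifier(X, y)                    # e.g. fine-tune a BERT head
            for _ in range(rounds):
                if not pool:
                    break
                proba = clf.predict_proba(pool)           # class probabilities per document
                conf, pred = proba.max(axis=1), proba.argmax(axis=1)
                keep = conf >= threshold                  # accept only confident predictions
                X += [d for d, k in zip(pool, keep) if k]
                y += [int(p) for p, k in zip(pred, keep) if k]
                pool = [d for d, k in zip(pool, keep) if not k]
                clf = fit_classifier(X, y)                # retrain on the enlarged set
            return clf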

    A Survey of Using Machine Learning in IoT Security and the Challenges Faced by Researchers

    The Internet of Things (IoT) has become more popular over the last 15 years as it has significantly improved and gained control in multiple fields. We are nowadays surrounded by billions of IoT devices that are directly integrated into our lives; some are at the center of our homes, while others control sensitive data in domains such as military facilities, healthcare, and data centers. This popularity drives factories and companies to compete in producing and developing many types of these devices without caring about how secure they are. At the same time, IoT is regarded as an insecure environment that is attractive to cyber thieves. Machine Learning (ML) and Deep Learning (DL) have also gained importance over the last 15 years and have achieved success in the field of network security. IoT shares many security requirements with traditional networks, but its characteristics, specific security features, and environmental limitations introduce differences such as low energy resources, limited computational capability, and small memory. These limitations inspire researchers to look for lightweight security mechanisms that strike a balance between performance and security. This survey provides a comprehensive discussion of the use of machine learning and deep learning in IoT devices within the last five years. It also lists the challenges faced by each model and algorithm, shows some current solutions along with future directions and suggestions, and focuses on research that takes the limitations of the IoT environment into consideration.

    Adaptive Online Learning

    The research that constitutes this thesis was driven by two related goals. The first was to develop new efficient online learning algorithms and to study their properties and theoretical guarantees. The second was to study real-world data and find algorithms appropriate for particular real-world problems. This thesis studies online prediction with few assumptions about the nature of the data. This is important for real-world applications of machine learning, as complex assumptions about the data are rarely justified. We consider two frameworks: conformal prediction, which is based on the randomness assumption, and prediction with expert advice, where no assumptions about the data are made at all. Conformal predictors are set predictors: a set of possible labels is issued by Learner at each trial. After the prediction is made, the real label is revealed and Learner's prediction is evaluated. In the case of classification the label space is finite, so Learner makes an error if the true label is not in the set it produced. Conformal prediction was originally developed for the supervised learning task and was proved to be valid in the sense of making errors with a prespecified probability. We study possible ways of extending this approach to the semi-supervised case and build a valid algorithm for this task. We also apply the conformal prediction technique to the problem of diagnosing tuberculosis in cattle. Whereas conformal prediction relies on just the randomness assumption, prediction with expert advice drops this one as well. One may wonder whether it is possible to make good predictions under these circumstances. However, Learner is provided with the predictions of a certain class of experts (or prediction strategies) and may base his prediction on them. The goal then is to perform not much worse than the best strategy in the class. This is achieved by carefully mixing (aggregating) the predictions of the base experts. Often, however, the nature of the data changes over time, so that there is a region where one expert is good, followed by a region where another is good, and so on. This leads to algorithms which we call adaptive: they take this structure of the data into account. We explore the possibilities offered by the framework of specialist experts to build adaptive algorithms. This line of thought allows us to provide an intuitive explanation for the mysterious Mixing Past Posteriors algorithm and to build a new algorithm with sharp bounds for Online Multitask Learning. EThOS - Electronic Theses Online Service, United Kingdom.
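    A minimal sketch of mixing (aggregating) expert predictions with exponentially decaying weights, one standard way to perform not much worse than the best expert; the learning rate eta and the square loss are illustrative choices, not the specific algorithms developed in the thesis.

        # Sketch only: exponentially weighted aggregation of expert predictions.
        import numpy as np

        def exp_weights(expert_preds, outcomes, eta=2.0):
            # expert_preds: (T, N) predictions of N experts over T trials;
            # outcomes: (T,) true values. Returns Learner's aggregated predictions.
            T, N = expert_preds.shape
            log_w = np.zeros(N)                     # log-weights, uniform at the start
            learner = np.empty(T)
            for t in range(T):
                w = np.exp(log_w - log_w.max())     # normalise the weights stably
                w /= w.sum()
                learner[t] = w @ expert_preds[t]    # mix the experts' predictions
                loss = (expert_preds[t] - outcomes[t]) ** 2
                log_w -= eta * loss                 # downweight experts that lose
            return learner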