
    Learning from Imperfect Data: Incremental Learning and Few-shot Learning

    In recent years, artificial intelligence (AI) has achieved great success in many fields, e.g., computer vision, speech recognition, recommendation engines, and natural language processing. Although impressive advances have been made, AI algorithms still suffer from an important limitation: they rely on large-scale datasets. In contrast, human beings naturally possess the ability to learn novel knowledge from real-world, imperfect data such as a small number of samples or a non-static, continual data stream. Attaining such an ability is particularly appealing. Specifically, an ideal AI system with human-level intelligence should work with the following imperfect data scenarios. 1) The training data distribution changes while learning. In many real scenarios, data are streaming, might disappear after a given period of time, or cannot be stored at all due to storage constraints or privacy issues. As a consequence, old knowledge is overwritten, a phenomenon called catastrophic forgetting. 2) The annotations of the training data are sparse. There are also many scenarios where we do not have access to the specific large-scale data of interest due to privacy and security reasons. As a consequence, deep models overfit the training data distribution and are very likely to make wrong decisions when they encounter rare cases. Therefore, the goal of this thesis is to tackle these challenges and develop AI algorithms that can be trained with imperfect data. To achieve this goal, we study three topics: 1) learning with continual data without forgetting (i.e., incremental learning); 2) learning with limited data without overfitting (i.e., few-shot learning); 3) learning with imperfect data in real-world applications (e.g., incremental object detection). Our key idea is learning to learn/optimize.
Specifically, we use advanced learning and optimization techniques to design data-driven methods that dynamically adapt the key elements of AI algorithms, e.g., selection of data, memory allocation, network architecture, essential hyperparameters, and control of knowledge transfer. We believe that the adaptive and dynamic design of system elements will significantly improve the capability of deep learning systems under limited data or continual streams, compared to systems with fixed, non-optimized elements. More specifically, we first study how to overcome the catastrophic forgetting problem by learning to optimize exemplar data, allocate memory, aggregate neural networks, and optimize key hyperparameters. Then, we study how to improve the generalization ability of the model and tackle the overfitting problem by learning to transfer knowledge and ensemble deep models. Finally, we study how to apply incremental learning techniques to recent top-performing transformer-based architectures for a more challenging and realistic vision task, incremental object detection.
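The "learning to optimize exemplar data" idea behind such rehearsal-based incremental learners can be illustrated with a minimal sketch. This is not the thesis's actual method; the herding-style selection rule below (greedily keeping samples whose running mean tracks the class mean) is a common baseline in this literature and is used here purely as an assumption for illustration:

```python
import numpy as np

def select_exemplars(features, m):
    """Greedy herding-style selection: pick m feature vectors whose
    running mean stays as close as possible to the full class mean."""
    class_mean = features.mean(axis=0)
    selected = []
    chosen = np.zeros(len(features), dtype=bool)
    running_sum = np.zeros_like(class_mean)
    for k in range(1, m + 1):
        # mean of the already-selected set plus each candidate
        candidates = (running_sum + features) / k
        dists = np.linalg.norm(candidates - class_mean, axis=1)
        dists[chosen] = np.inf  # never pick the same sample twice
        idx = int(np.argmin(dists))
        chosen[idx] = True
        selected.append(idx)
        running_sum += features[idx]
    return selected

# toy usage: 100 samples of a 2-D feature, keep a memory of 10 exemplars
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 2))
ex = select_exemplars(feats, 10)
```

On a new incremental task, a rehearsal learner would train on the new data mixed with these stored exemplars, which is one way the old class distributions survive the update.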

    Exploiting Cross Domain Relationships for Target Recognition

    Cross domain recognition extracts knowledge from one domain to recognize samples from another domain of interest. The key to solving problems under this umbrella is to find the latent connections between different domains. In this dissertation, three different cross domain recognition problems are studied by exploiting the relationships between different domains explicitly, according to the specific real problems. First, the problem of cross view action recognition is studied. The same action might look quite different when observed from different viewpoints. Thus, how to use the training samples from a given camera view and perform recognition in another, new view is the key point. In this work, reconstructable paths between different views are built to mirror labeled actions from one source view into another target view for learning an adaptable classifier. The path learning takes advantage of joint dictionary learning techniques while exploiting hidden information in the seemingly useless samples, making the recognition performance robust and effective. Second, the problem of person re-identification is studied, which tries to match pedestrian images in non-overlapping camera views based on appearance features. In this work, we propose to learn a random kernel forest to discriminatively assign a specific distance metric to each pair of local patches from the two images being matched. The forest is composed of multiple decision trees, which are designed to partition the overall space of local patch-pairs into subspaces, where a simple but effective local metric kernel can be defined to minimize the distance of true matches. Third, the problem of multi-event detection and recognition in smart grid is studied. The signal of a multi-event might not be a straightforward combination of single-event signals because of the correlation among devices.
In this work, a concept of "root-pattern" is proposed that can be extracted from a collection of single-event signals, but is also transferable to analyse the constituent components of multi-cascading-event signals based on an over-complete dictionary, which is designed according to the "root-patterns" with temporal information subtly embedded. The correctness and effectiveness of the proposed approaches have been evaluated by extensive experiments.
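The dictionary idea in the third problem can be sketched in a few lines. Everything below is illustrative, not the dissertation's design: the two "root patterns", the shift grid, and the plain least-squares solve (standing in for whatever sparse-coding solver the work actually uses) are assumptions. The point is only that a multi-event signal decomposes over a dictionary of temporally shifted single-event signatures:

```python
import numpy as np

T = 64
t = np.arange(T)

# two illustrative "root patterns" (single-event signatures)
p1 = np.exp(-0.5 * ((t - 8) / 2.0) ** 2)       # a smooth bump
p2 = np.where((t >= 4) & (t < 12), 1.0, 0.0)   # a step pulse

def shifted(p, s):
    """Delay a pattern by s samples, zero-padding at the start."""
    out = np.zeros_like(p)
    out[s:] = p[: len(p) - s]
    return out

# dictionary: each root pattern placed at several onset times
shifts = range(0, T - 16, 4)
atoms = [shifted(p, s) for p in (p1, p2) for s in shifts]
D = np.stack(atoms, axis=1)                    # shape (T, n_atoms)

# a synthetic multi-event signal: pattern 1 at onset 0 plus pattern 2 at onset 20
signal = shifted(p1, 0) + 0.7 * shifted(p2, 20)

# least-squares decomposition onto the dictionary
coef, *_ = np.linalg.lstsq(D, signal, rcond=None)
top = np.argsort(np.abs(coef))[-2:]            # the two dominant atoms
```

The dominant coefficients recover which single-event signatures fired and roughly when, which is the kind of constituent-component analysis the abstract describes.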

    A Systematic Mapping Study on Approaches for AI-Supported Security Risk Assessment

    Effective assessment of cyber risks in an increasingly dynamic threat landscape must be supported by artificial intelligence techniques, due to their ability to dynamically scale and adapt. This article provides the state of the art of AI-supported security risk assessment approaches in the form of a systematic mapping study. The overall goal is to obtain an overview of security risk assessment approaches that use AI techniques to identify, estimate, and/or evaluate cyber risks. We carried out the systematic mapping study following standard processes and identified in total 33 relevant primary studies that we included in our mapping study. The results of our study show that, on average, the number of papers about AI-supported security risk assessment has been increasing since 2010, with a growth rate of 133% between 2010 and 2020. The risk assessment approaches reported have mainly been used to assess cyber risks related to intrusion detection, malware detection, and industrial systems. The approaches focus mostly on identifying and/or estimating security risks, and primarily make use of Bayesian networks and neural networks as supporting AI methods/techniques.

    Semi-supervised learning and fairness-aware learning under class imbalance

    With the advent of Web 2.0 and rapid technological advances, there is a plethora of data in every field; however, more data does not necessarily imply more information, rather the quality of data (veracity aspect) plays a key role. Data quality is a major issue, since machine learning algorithms are based solely on historical data to derive novel hypotheses. Data may contain noise, outliers, missing values and/or missing class labels, and skewed data distributions. The latter case, the so-called class-imbalance problem, is quite old and still dramatically affects machine learning algorithms. Class-imbalance causes classification models to learn effectively one particular class (majority) while ignoring other classes (minority). In addition to this issue, machine learning models that are applied in domains of high societal impact have become biased towards groups of people or individuals who are not well represented within the data. Direct and indirect discriminatory behavior is prohibited by international laws; thus, there is an urgent need to mitigate discriminatory outcomes from machine learning algorithms. In this thesis, we address the aforementioned issues and propose methods that tackle class imbalance and mitigate discriminatory outcomes in machine learning algorithms. As part of this thesis, we make the following contributions: • Tackling class-imbalance in semi-supervised learning – The class-imbalance problem is very often encountered in classification. There is a variety of methods that tackle this problem; however, there is a lack of methods that deal with class-imbalance in semi-supervised learning. We address this problem by employing data augmentation in the semi-supervised learning process in order to equalize class distributions. We show that semi-supervised learning coupled with data augmentation methods can overcome class-imbalance propagation and significantly outperform the standard semi-supervised annotation process.
• Mitigating unfairness in supervised models – Fairness in supervised learning has received a lot of attention in recent years. A growing body of pre-, in- and post-processing approaches has been proposed to mitigate algorithmic bias; however, these methods consider error rate as the performance measure of the machine learning algorithm, which causes high error rates on the under-represented class. To deal with this problem, we propose approaches that operate in the pre-, in- and post-processing layers while accounting for all classes. Our proposed methods outperform state-of-the-art methods in terms of performance while being able to mitigate unfair outcomes.
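The first contribution's core move, equalizing class distributions before (or during) semi-supervised self-labeling, can be sketched minimally. Random duplication below is a deliberate simplification standing in for the data augmentation methods the thesis actually studies; the function name and toy data are invented for illustration:

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class reaches the
    majority-class count (random duplication stands in for real
    augmentation such as synthetic-sample generation)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

# toy imbalanced labeled set: 4 majority samples, 1 minority sample
X = ["a1", "a2", "a3", "a4", "b1"]
y = [0, 0, 0, 0, 1]
Xb, yb = oversample(X, y)
```

A self-training loop would apply this balancing each round before fitting, so the pseudo-labels it produces do not inherit and amplify the original imbalance.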

    AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges

    Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big data generated by IT Operations processes, particularly in cloud infrastructures, to provide actionable insights with the primary goal of maximizing availability. There is a wide variety of problems to address, and multiple use-cases, where AI capabilities can be leveraged to enhance operational efficiency. Here we provide a review of the AIOps vision, trends, challenges, and opportunities, specifically focusing on the underlying AI techniques. We discuss in depth the key types of data emitted by IT Operations activities, the scale of and challenges in analyzing them, and where they can be helpful. We categorize the key AIOps tasks as incident detection, failure prediction, root cause analysis, and automated actions. We discuss the problem formulation for each task, and then present a taxonomy of techniques to solve these problems. We also identify relatively underexplored topics, especially those that could significantly benefit from advances in the AI literature. Finally, we provide insights into the trends in this field and the key investment opportunities.
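To make the incident-detection task category concrete, here is a minimal sketch of one of the simplest techniques in that family: flagging points in a metric stream whose z-score against a trailing window is extreme. The metric name, window size, and threshold are illustrative assumptions, not drawn from the survey:

```python
import statistics

def zscore_incidents(series, window=10, threshold=3.0):
    """Flag indices whose z-score against the trailing window exceeds
    the threshold -- a minimal stand-in for statistical
    incident-detection techniques on operational metrics."""
    incidents = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mu = statistics.fmean(hist)
        sigma = statistics.pstdev(hist)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            incidents.append(i)
    return incidents

# a steady latency metric (ms) with one spike at index 15
latency = [100, 101, 99, 100, 102, 98, 100, 101, 99, 100,
           100, 101, 100, 99, 101, 400, 100, 99, 101, 100]
spikes = zscore_incidents(latency)
```

Production AIOps systems replace this with learned detectors over logs, traces, and metrics, but the problem formulation (a stream in, anomalous timestamps out) is the same.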

    Deep Reinforcement Learning for Dialogue Systems with Dynamic User Goals

    Dialogue systems have recently become widely used across the world. Some of the functionality offered includes application user interfacing, social conversation, data interaction, and task completion. Most recently, dialogue systems have been developed to autonomously and intelligently interact with users to complete complex tasks in diverse operational spaces. This kind of dialogue system can interact with users to complete tasks such as making a phone call, ordering items online, searching the internet for an answer, and more. These systems are typically created by training a machine learning model with example conversational data. One of the existing problems with training these systems is that they require large amounts of realistic user data, which can be challenging to collect and label in large quantities. Our research focuses on modifications to user simulators that change their mind mid-episode, with the goal of training more robust dialogue agents. We do this by taking an existing dialogue system, modifying its user simulator, and observing quantitative and qualitative effects against a set of goals. With these results we demonstrate the benefits, drawbacks, and tangential effects of using various rules and algorithms while recreating goal-changing behavior.
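The "change their mind mid-episode" idea can be illustrated with a toy user simulator. The class, goal names, and the uniform switch rule below are all invented for illustration; the thesis evaluates specific rules and algorithms that this sketch does not reproduce:

```python
import random

class DynamicGoalUser:
    """Toy user simulator that may switch its goal mid-episode,
    mimicking the goal-changing behavior studied in the thesis
    (goals and the switch rule here are illustrative)."""

    def __init__(self, goals, switch_prob=0.2, seed=0):
        self.goals = goals
        self.switch_prob = switch_prob
        self.rng = random.Random(seed)
        self.goal = self.rng.choice(goals)
        self.switches = 0

    def step(self):
        """Advance one dialogue turn; occasionally adopt a new goal."""
        if self.rng.random() < self.switch_prob:
            self.goal = self.rng.choice(
                [g for g in self.goals if g != self.goal])
            self.switches += 1
        return self.goal

user = DynamicGoalUser(["book_flight", "order_food", "play_music"])
trajectory = [user.step() for _ in range(20)]
```

An RL dialogue agent trained against such a simulator must detect that the goal it was pursuing is stale and re-plan, which is exactly the robustness the abstract is after.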

    Modern Data Mining for Software Engineer, A Machine Learning PaaS Review

    Using data mining methods to produce information from data has been proven to be valuable for individuals and society. The evolution of technology has made it possible to use complicated data mining methods in different applications and systems to achieve these valuable results. However, there are challenges in data-driven projects which can affect people either directly or indirectly. Vast amounts of data are collected and processed frequently to enable the functionality of many modern applications. Cloud-based platforms have been developed to aid in the development and maintenance of data-driven projects. The field of Information Technology (IT) and data-driven projects have become complex, and they require additional attention compared to standard software development. In this thesis, a literature review is conducted to study existing industry methods and practices, to define the terms used, and to describe the relevant data mining process models. We analyze the industry to find the factors impacting the evolution of tools and platforms, and the roles of project members. Furthermore, a hands-on review is done on typical machine learning Platforms-as-a-Service (PaaS) with an example case, and heuristics are created to aid in choosing a machine learning platform. The results of this thesis provide knowledge and understanding for software developers and project managers who are part of these data-driven projects without in-depth knowledge of data science. In this study, we found that it is necessary to have a valid process model or methodology, precise roles, and versatile tools or platforms when developing data-driven applications. Each of these elements affects the other elements in some way. We noticed that traditional data mining process models are insufficient in modern agile software development. Nevertheless, they can provide valuable insights and understanding about how to handle data in the correct way.
Cloud-based platforms aid these data-driven projects by enabling the development of complicated machine learning projects without the expertise of either a data scientist or a software developer. The platforms are versatile and easy to use. However, developing functionalities and predictive models that the developer does not understand can be seen as bad practice and may cause harm in the future.

    Identification of video applications in protected channels with machine learning

    As encrypted traffic becomes the standard and traffic obfuscation techniques become more accessible and common, companies are struggling to enforce their network usage policies and ensure optimal operational network performance. Users are more technologically knowledgeable and are able to circumvent web content filtering tools through the use of protected tunnels such as VPNs. Consequently, techniques such as DPI, which were already considered outdated due to their impracticality, become even more ineffective. Furthermore, the continuous regulations being established by governments and international unions regarding citizen privacy rights make network monitoring increasingly challenging. This work presents a scalable and easily deployable network-based framework for application identification in a corporate environment, focusing on video applications. This framework should be effective regardless of the environment and network setup, with the objective of being a useful tool in the network monitoring process. The proposed framework offers a compromise between allowing network supervision and assuring workers' privacy. The evaluation results indicate that we can identify web services running over a protected channel with an accuracy of 95%, using low-level packet information that does not jeopardize sensitive worker data.
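The key property claimed above, classifying encrypted flows from payload-free metadata, can be sketched with invented features and a nearest-centroid rule. The feature choice (packet-size statistics) and the toy flows below are illustrative assumptions, not the framework's actual feature set or classifier:

```python
import statistics

def flow_features(packet_sizes):
    """Payload-free features of a packet-size sequence -- the kind of
    low-level metadata available even when the channel is encrypted."""
    return (statistics.fmean(packet_sizes),
            statistics.pstdev(packet_sizes),
            max(packet_sizes))

def nearest_centroid(train, flow):
    """Label a flow by the closest class centroid in feature space."""
    feats = flow_features(flow)
    best, best_d = None, float("inf")
    for label, flows in train.items():
        points = [flow_features(f) for f in flows]
        centroid = tuple(statistics.fmean(p[i] for p in points)
                         for i in range(3))
        d = sum((a - b) ** 2 for a, b in zip(feats, centroid))
        if d < best_d:
            best, best_d = label, d
    return best

# toy training flows: video streams send large, steady packets;
# chat traffic sends small, bursty ones
train = {
    "video": [[1400, 1400, 1350, 1400], [1380, 1400, 1400, 1300]],
    "chat":  [[80, 120, 60, 90], [100, 70, 110, 80]],
}
label = nearest_centroid(train, [1390, 1400, 1360, 1400])
```

Because only sizes (and, in a real system, timings) are used, the classifier never touches the encrypted payload, which is how such a framework can coexist with worker privacy.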

    Drugnet Ireland.


    From Drugnet Europe. No 92, October–December 2015.
