
    Image classification over unknown and anomalous domains

    A longstanding goal in computer vision research is to develop methods that are simultaneously applicable to a broad range of prediction problems. In contrast to this, models often perform best when they are specialized to some task or data type. This thesis investigates the challenges of learning models that generalize well over multiple unknown or anomalous modes and domains in data, and presents new solutions for learning robustly in this setting. Initial investigations focus on normalization for distributions that contain multiple sources (e.g. images in different styles like cartoons or photos). Experiments demonstrate the extent to which existing modules, batch normalization in particular, struggle with such heterogeneous data, and a new solution is proposed that can better handle data from multiple visual modes, using differing sample statistics for each. While ideas to counter the overspecialization of models have been formulated in sub-disciplines of transfer learning, e.g. multi-domain and multi-task learning, these usually rely on the existence of meta information, such as task or domain labels. Relaxing this assumption gives rise to a new transfer learning setting, called latent domain learning in this thesis, in which training and inference are carried out over data from multiple visual domains, without domain-level annotations. Customized solutions are required for this, as the performance of standard models degrades: a new data augmentation technique that interpolates between latent domains in an unsupervised way is presented, alongside a dedicated module that sparsely accounts for hidden domains in data, without requiring domain labels to do so. In addition, the thesis studies the problem of classifying previously unseen or anomalous modes in data, a fundamental problem in one-class learning, and anomaly detection in particular. 
While recent ideas have focused on developing self-supervised solutions for the one-class setting, this thesis formulates new methods based on transfer learning. Extensive experimental evidence demonstrates that a transfer-based perspective benefits new problems that have recently been proposed in the anomaly detection literature, in particular challenging semantic detection tasks.

    The making of the NEAM Tsunami Hazard Model 2018 (NEAMTHM18)

    The NEAM Tsunami Hazard Model 2018 (NEAMTHM18) is a probabilistic hazard model for tsunamis generated by earthquakes. It covers the coastlines of the North-eastern Atlantic, the Mediterranean, and connected seas (NEAM). NEAMTHM18 was designed as a three-phase project. The first two phases were dedicated to the model development and hazard calculations, following a formalized decision-making process based on a multiple-expert protocol. The third phase was dedicated to documentation and dissemination. The hazard assessment workflow was structured in Steps and Levels. There are four Steps: Step-1) probabilistic earthquake model; Step-2) tsunami generation and modeling in deep water; Step-3) shoaling and inundation; Step-4) hazard aggregation and uncertainty quantification. Each Step includes a different number of Levels. Level-0 always describes the input data; the other Levels describe the intermediate results needed to proceed from one Step to another. Alternative datasets and models were considered in the implementation. The epistemic hazard uncertainty was quantified through an ensemble modeling technique accounting for alternative models' weights and yielding a distribution of hazard curves represented by the mean and various percentiles. Hazard curves were calculated at 2,343 Points of Interest (POI) distributed at an average spacing of ∼20 km. Precalculated probability maps for five maximum inundation heights (MIH) and hazard intensity maps for five average return periods (ARP) were produced from hazard curves. In the entire NEAM Region, MIHs of several meters are rare but not impossible. Considering a 2% probability of exceedance in 50 years (ARP≈2,475 years), the POIs with MIH >5 m are fewer than 1% and are all in the Mediterranean on Libya, Egypt, Cyprus, and Greece coasts. In the North-East Atlantic, POIs with MIH >3 m are on the coasts of Mauritania and Gulf of Cadiz. Overall, 30% of the POIs have MIH >1 m. 
NEAMTHM18 results and documentation are available through the TSUMAPS-NEAM project website (http://www.tsumaps-neam.eu/), featuring an interactive web mapper. Although NEAMTHM18 cannot substitute for in-depth analyses at local scales, it is a first step towards local and more detailed hazard and risk assessments and contributes to designing evacuation maps for tsunami early warning.
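The correspondence quoted above between a 2% probability of exceedance in 50 years and an average return period of about 2,475 years can be checked with a short calculation. The sketch below assumes the usual Poisson (time-independent) occurrence model, which the abstract does not state explicitly; the function names are illustrative and are not taken from the NEAMTHM18 documentation.

```python
import math


def average_return_period(prob_exceedance: float, exposure_years: float) -> float:
    """Average return period (years) for a given probability of exceedance.

    Assumes Poisson occurrences (an assumption, not stated in the abstract):
    P = 1 - exp(-T / ARP), hence ARP = -T / ln(1 - P).
    """
    if not 0.0 < prob_exceedance < 1.0:
        raise ValueError("prob_exceedance must lie strictly between 0 and 1")
    return -exposure_years / math.log(1.0 - prob_exceedance)


def prob_of_exceedance(arp_years: float, exposure_years: float) -> float:
    """Inverse relation: probability of at least one exceedance in the window."""
    return 1.0 - math.exp(-exposure_years / arp_years)


arp = average_return_period(0.02, 50.0)
print(f"ARP for 2% in 50 years: {arp:.0f} years")
```

Under this assumption the result rounds to 2,475 years, matching the figure given in the abstract.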

    Robustness against adversarial attacks on deep neural networks

    While deep neural networks have been successfully applied in several different domains, they exhibit vulnerabilities to artificially crafted perturbations in data. Moreover, these perturbations have been shown to be transferable, in that the same perturbations can be transferred between different models. In response to this problem, many robust learning approaches have emerged. Adversarial training is regarded as a mainstream approach to enhance the robustness of deep neural networks with respect to norm-constrained perturbations. However, adversarial training requires the network to be trained on a large number of perturbed examples (e.g., over 100,000 for the MNIST dataset) before robustness can be considerably enhanced. This is problematic due to the large computational cost of obtaining attacks. Developing computationally effective approaches while retaining robustness against norm-constrained perturbations remains a challenge in the literature. In this research we present two novel robust training algorithms based on Monte-Carlo Tree Search (MCTS) [1] to enhance robustness under norm-constrained perturbations [2, 3]. The first algorithm searches for potential candidates with the Scale Invariant Feature Transform method and makes decisions with the Monte-Carlo Tree Search method [2]. The second algorithm adopts the Decision Tree Search (DTS) method to accelerate the search process while maintaining efficiency [3]. Our overarching objective is to provide computationally effective approaches that can be deployed to train deep neural networks that are robust against perturbations in data. We illustrate the robustness achieved with these algorithms by studying resistance to adversarial examples obtained in the context of the MNIST and CIFAR10 datasets. For MNIST, the results showed an average saving in training effort of 21.1% when compared to Projected Gradient Descent (PGD) and 28.3% when compared to the Fast Gradient Sign Method (FGSM).
For CIFAR10, we obtained an average improvement in efficiency of 9.8% compared to PGD and 13.8% compared to FGSM. The results suggest that the two methods introduced here are not only robust to norm-constrained perturbations but also efficient during training. With regard to the transferability of defences, our experiments [4] reveal that across different network architectures, across a variety of attack methods from white-box to black-box, and across various datasets including MNIST and CIFAR10, our algorithms outperform other state-of-the-art methods, e.g., PGD and FGSM. Furthermore, the derived attacks and robust models obtained with our framework are reusable in the sense that the same norm-constrained perturbations can facilitate robust training across different networks. Lastly, we investigate the robustness of intra-technique and cross-technique transferability and its relation to different impact factors, from adversarial strength to network capacity. The results suggest that known attacks on the resulting models are less transferable than attacks on models trained with other state-of-the-art attack algorithms. Our results suggest that exploiting these tree search frameworks can result in significant improvements in the robustness of deep neural networks while saving computational cost on robust training. This paves the way for several future directions, both algorithmic and theoretical, as well as numerous applications to establish the robustness of deep neural networks with increasing trust and safety.

    Machine learning for managing structured and semi-structured data

    As the digitalization of private, commercial, and public sectors advances rapidly, an increasing amount of data is becoming available. In order to gain insights or knowledge from these enormous amounts of raw data, a deep analysis is essential. The immense volume requires highly automated processes with minimal manual interaction. In recent years, machine learning methods have taken on a central role in this task. In addition to the individual data points, their interrelationships often play a decisive role, e.g. whether two patients are related to each other or whether they are treated by the same physician. Hence, relational learning is an important branch of research, which studies how to harness this explicitly available structural information between different data points. Recently, graph neural networks have gained importance. These can be considered an extension of convolutional neural networks from regular grids to general (irregular) graphs. Knowledge graphs play an essential role in representing facts about entities in a machine-readable way. While great efforts are made to store as many facts as possible in these graphs, they often remain incomplete, i.e., true facts are missing. Manual verification and expansion of the graphs is becoming increasingly difficult due to the large volume of data and must therefore be assisted or substituted by automated procedures which predict missing facts. The field of knowledge graph completion can be roughly divided into two categories: Link Prediction and Entity Alignment. In Link Prediction, machine learning models are trained to predict unknown facts between entities based on the known facts. Entity Alignment aims at identifying shared entities between graphs in order to link several such knowledge graphs based on some provided seed alignment pairs. In this thesis, we present important advances in the field of knowledge graph completion. 
For Entity Alignment, we show how novel active learning techniques can reduce the number of required seed alignments while maintaining performance. We also discuss the power of textual features and show that graph-neural-network-based methods have difficulties with noisy alignment data. For Link Prediction, we demonstrate how to improve the prediction for entities unknown at training time by exploiting additional metadata on individual statements, which is often available in modern graphs. Supported by results from a large-scale experimental study, we present an analysis of the effect of individual components of machine learning models, e.g., the interaction function or loss criterion, on the task of link prediction. We also introduce a software library that simplifies the implementation and study of such components and makes them accessible to a wide research community, ranging from relational learning researchers to applied fields, such as life sciences. Finally, we propose a novel metric for evaluating ranking results, as used for both completion tasks. It allows for easier interpretation and comparison, especially in cases with different numbers of ranking candidates, as encountered in the de facto standard evaluation protocols for both tasks.

    Freelance subtitlers in a subtitle production network in the OTT industry in Thailand: a longitudinal study

    The present study sets out to investigate a subtitle production network in the over-the-top (OTT) industry in Thailand through the perspective of freelance subtitlers. A qualitative longitudinal research design was adopted to gain insights into (1) the way the work practices of freelance subtitlers are influenced by both human and non-human actors in the network, (2) the evolution of the network, and (3) how the freelance subtitlers’ perception of quality is influenced by changes occurring in the network. Eleven subtitlers were interviewed every six months over a period of two years, yielding over 60 hours of interview data. The data analysis was informed by selected concepts from Actor-Network Theory (ANT) (Law 1992, 2009; Latour 1996, 2005; Mol 2010), and complemented by the three-dimensional quality model proposed by Abdallah (2016, 2017). Reflexive thematic analysis (Braun and Clarke 2019a, 2020b) was used to generate themes and sub-themes which address the research questions and tell compelling stories about the actor-network. It was found that from July 2017 to September 2019, the subtitle production network, which was sustained by complex interrelationships between actors, underwent a number of changes. The changes affected the work practices of freelance subtitlers more negatively than positively, demonstrating their precarious position in an industry that has widely adopted the vendor model (Moorkens 2017). Moreover, as perceived by the research participants, under increasingly undesirable working conditions, it became more challenging to maintain a quality process and to produce quality subtitles. Finally, translation technology and tools, including machine translation, were found to be key non-human actors that catalyse the changes in the network under study.

    AIUCD 2022 - Proceedings

    The eleventh edition of the National Conference of AIUCD (Associazione di Informatica Umanistica) is titled Culture digitali. Intersezioni: filosofia, arti, media (Digital Cultures. Intersections: Philosophy, Arts, Media). The title explicitly calls for a methodological and theoretical reflection on the interrelation between digital technologies, information sciences, philosophical disciplines, the world of the arts, and cultural studies.

    Graphical scaffolding for the learning of data wrangling APIs

    In order for students across the sciences to avail themselves of modern data streams, they must first know how to wrangle data: how to reshape ill-organised, tabular data into another format, and how to do this programmatically, in languages such as Python and R. Despite the cross-departmental demand and the ubiquity of data wrangling in analytical workflows, research on how to optimise its instruction has been minimal. Although data wrangling as a programming domain presents distinctive challenges (characterised by on-the-fly syntax lookup and code example integration), it also presents opportunities. One such opportunity is that tabular data structures are easily visualised. To leverage the inherent visualisability of data wrangling, this dissertation evaluates three types of graphics that could be employed as scaffolding for novices: subgoal graphics, thumbnail graphics, and parameter graphics. Using a specially built e-learning platform, this dissertation documents a multi-institutional, randomised, and controlled experiment that investigates the pedagogical effects of these. Our results indicate that the graphics are well-received, that subgoal graphics boost the completion rate, and that thumbnail graphics improve navigability within a command menu. We also obtained several non-significant results, and indications that parameter graphics are counter-productive. We discuss these findings in the context of general scaffolding dilemmas, and how they fit into a wider research programme on data wrangling instruction.
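As a rough illustration of what "reshaping tabular data into another format" can mean in practice, the sketch below converts a small wide table into long format using only the Python standard library. The table, the column names and the helper `wide_to_long` are hypothetical and are not taken from the dissertation or its e-learning platform; libraries such as pandas offer a comparable operation (`melt`).

```python
def wide_to_long(rows, id_column, var_name="variable", value_name="value"):
    """Reshape a wide table (list of dicts) into long format.

    Every column other than `id_column` becomes one output row that holds
    the original column name under `var_name` and its cell under `value_name`.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on each output row instead
            long_rows.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return long_rows


# Hypothetical example data: one row per site, one column per year.
wide = [
    {"site": "A", "2019": 4.1, "2020": 3.8},
    {"site": "B", "2019": 5.0, "2020": 5.3},
]

tidy = wide_to_long(wide, id_column="site", var_name="year", value_name="reading")
for record in tidy:
    print(record)
```

Each of the two input rows yields two output rows, one per year column, so the long table has four records.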

    The expansion of Brachypodium rupestre in western Pyrenees: fungal endophytes and loss of ecosystem services

    This thesis comprises two lines of research, reported in three scientific papers: • Chapter 1) The first line of research develops a way to quantify economically the loss of the provisioning ecosystem service linked to the expansion of B. rupestre, with the aim of raising awareness among stakeholders and promoting environmental actions for the conservation of mountain grasslands, which are environmentally and economically valuable ecosystems. For this purpose, we adapted the substitution method of economic valuation and carried out floristic surveys in high-diversity and low-diversity grasslands. The data collected allowed us to estimate pastoral values, which were converted into the amount of forage available to the animals after a detailed calculation of the grazeable area (200.05 ha after excluding steep slopes). Using livestock censuses and detailed information on the energy consumption of each type of animal grazing the valley's summer pastures (sheep, cows and horses), we determined the feed rations needed for one summer grazing season (50007 rations) and their monetary cost (21146€; 107€/ha). In this way, we estimated the cost of the loss of provisioning value caused by the expansion of B. rupestre, which in this case is linked to the loss of floristic richness and diversity in the plant communities that make it up. • Chapters 2 and 3) The second line of research encompasses two papers that characterize the previously unknown fungal endophytic community of B. rupestre plants growing in communities of high and low floristic diversity. The study of microorganisms in their natural environment is a relatively new discipline, since research has traditionally focused on diseases of humans, animals or plants. However, the technological and scientific advances of recent decades are revealing their very high diversity and the complex ecological networks in which they are involved. Research on fungal endophytes, which grow inside almost all plants without causing disease, is demonstrating the numerous adaptive advantages they can confer on their hosts under various kinds of stress (salinity, herbivory, pathogens, etc.).
New-application grants for the Training of Research Staff (Formación de Personal Investigador) at the Universidad Pública de Navarra, 2017 call. 2021 doctoral mobility grants for an international stay at the University of Camerino (Italy). Interreg Sudoe Programme, European Regional Development Fund, Open2preserve Project 108 (SOE2/P5/E0804). Libera 2017, SEO and Ecoembes, sponsorship of natural areas: the degradation of the pastoral ecosystems of the Selva de Irati, invasive behaviour of endemic species under global change conditions. Libera 2018, SEO and Ecoembes, sponsorship of natural areas: the degradation of pastoral ecosystems, ZEC Roncesvalles-Selva de Irati. Research project in schools (Gobierno de Navarra, 2018): training future women researchers, introducing ecological and environmental science to the school world. Doctoral Programme in Environmental Agrobiology (RD 99/2011).

    The new age of fear: an analysis of crisis framing by right-wing populist parties in Greece and France

    From the 2009 Eurozone economic downturn, to the 2015 mass movement of forcibly displaced migrants and the current COVID-19 pandemic, crises have seemingly become a ‘new normal’ feature of European politics. During this decade, rolling crises generated a wave of public discontent that damaged the legitimacy of national governments and the European Union and heralded a renaissance of populism. The central message of populist parties, which helped them rise in popularity or enter parliament for the first time, is simple but very effective: democratic representation has been undermined by national and global elites. This has provoked a wealth of studies seeking to explain the rise or breakthrough of populist fringe parties, but without adequate consideration of how crises transform not only the demand for populist arguments but also their supply, which has received scarce attention. This thesis seeks to address this imbalance by synthesising insights from the crisis framing literature, which facilitates an understanding and operationalisation of populism as a style of discourse. To assess how far-right parties employ this discourse, and the implications of this for their electoral prospects, a comparative case-study design is employed, exploring the discourse of two parties, the National Rally (NR) in France and Golden Dawn (GD) in Greece. Their ideologically similar profiles but differing electoral performance allow for a more nuanced analysis of their respective framing strategies. The thesis examines the discourse of the two parties' MPs on a month-by-month basis over a four-year period, 2012-2015 for GD and 2012-2013 and 2016-2017 for NR, using the NVivo software. Their respective discourses are quantified and broken down into four key areas associated with Foreign Policy, the Economy, the Political System and Society, analysing the content, frequency and salience of key crisis frames.
Discourse analysis of excerpts adds a qualitative element to the analysis that showcases the substantial differences between the two case studies. The analysis demonstrates that references to ‘the people’ and anti-elitism were the centrepieces of each party’s discourse, with strong nativist and nationalist elements. The two parties were extremely similar in the diagnostic stage of their framing and in the way in which they attribute blame for the crises. However, their discursive strategies diverge regarding their proposed solutions to the crises. Golden Dawn remained a single-issue party in terms of discourse, since it never presented a comprehensive plan for ending the crises. As a result, Golden Dawn’s discourse remained one-dimensional throughout its brief period of success, being centred solely on attributing blame and attacking its political opponents and the European Union. On the other hand, National Rally’s framing was more elaborate and ambitious both in terms of the variety of issues raised and, especially, the proposed solutions it advocated. This, it is argued, contributed to the evolution of NR into a mainstream competitor that is no longer dependent on a niche part of the electoral market, while the inability of GD to develop equally successful crisis frames offers a unique understanding as to why the party failed electorally and was unable to enter Parliament in the 2019 elections. The overall analysis produces a rich framework that maps out the key elements of populist crisis discourse by far-right parties, which has implications for electoral politics and for our understanding of populism more broadly.