333 research outputs found

    Measuring the mixing of contextual information in the transformer

    The Transformer architecture aggregates input information through the self-attention mechanism, but there is no clear understanding of how this information is mixed across the entire model. Additionally, recent works have demonstrated that attention weights alone are not enough to describe the flow of information. In this paper, we consider the whole attention block (multi-head attention, residual connection, and layer normalization) and define a metric to measure token-to-token interactions within each layer. Then, we aggregate layer-wise interpretations to provide input attribution scores for model predictions. Experimentally, we show that our method, ALTI (Aggregation of Layer-wise Token-to-token Interactions), provides more faithful explanations and greater robustness than gradient-based methods. Javier Ferrando and Gerard I. Gállego are supported by the Spanish Ministerio de Ciencia e Innovación through the project PID2019-107579RB-I00 / AEI / 10.13039/501100011033. Peer Reviewed. Postprint (published version).
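As a rough illustration of the aggregation step described in this abstract, the sketch below combines per-layer token-to-token contribution matrices into input attribution scores. It assumes that each layer is summarized by a non-negative matrix whose rows are normalized to sum to 1, and that layers are combined by matrix multiplication. The abstract does not state the aggregation rule, so that choice, and the names `normalize_rows` and `aggregate_layers`, are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def normalize_rows(m: Matrix) -> Matrix:
    """Scale each row to sum to 1 so entries read as contribution shares."""
    out = []
    for row in m:
        total = sum(row)
        if total <= 0:
            raise ValueError("each row needs a positive total contribution")
        out.append([v / total for v in row])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for small dense matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def aggregate_layers(layer_contributions: List[Matrix]) -> Matrix:
    """Combine layer-wise matrices, first layer first (assumed rule: product).

    result[i][j] is the share of input token j in the last layer's
    representation of token i.
    """
    total = normalize_rows(layer_contributions[0])
    for contributions in layer_contributions[1:]:
        total = matmul(normalize_rows(contributions), total)
    return total


# Hypothetical two-layer, two-token example (made-up numbers).
layer_1 = [[1.0, 0.0], [1.0, 1.0]]
layer_2 = [[1.0, 1.0], [0.0, 2.0]]
attributions = aggregate_layers([layer_1, layer_2])
```

Because every per-layer matrix is row-normalized, each row of the aggregated result also sums to 1, so the scores can be read as shares of the input tokens.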

    Explaining how transformers use context to build predictions

    Language Generation Models produce words based on the previous context. Although existing methods offer input attributions as explanations for a model's prediction, it is still unclear how prior words affect the model's decision throughout the layers. In this work, we leverage recent advances in explainability of the Transformer and present a procedure to analyze models for language generation. Using contrastive examples, we compare the alignment of our explanations with evidence of the linguistic phenomena, and show that our method consistently aligns better than gradient-based and perturbation-based baselines. Then, we investigate the role of MLPs inside the Transformer and show that they learn features that help the model predict words that are grammatically acceptable. Lastly, we apply our method to Neural Machine Translation models, and demonstrate that they generate human-like source-target alignments for building predictions. Javier Ferrando, Gerard I. Gállego and Ioannis Tsiamas are supported by the Spanish Ministerio de Ciencia e Innovación through the project PID2019-107579RB-I00 / AEI / 10.13039/501100011033. Peer Reviewed. Preprint.

    Explaining how transformers use context to build predictions

    Language Generation Models produce words based on the previous context. Although existing methods offer input attributions as explanations for a model's prediction, it is still unclear how prior words affect the model's decision throughout the layers. In this work, we leverage recent advances in explainability of the Transformer and present a procedure to analyze models for language generation. Using contrastive examples, we compare the alignment of our explanations with evidence of the linguistic phenomena, and show that our method consistently aligns better than gradient-based and perturbation-based baselines. Then, we investigate the role of MLPs inside the Transformer and show that they learn features that help the model predict words that are grammatically acceptable. Lastly, we apply our method to Neural Machine Translation models, and demonstrate that they generate human-like source-target alignments for building predictions. Javier Ferrando, Gerard I. Gállego and Ioannis Tsiamas are supported by the Spanish Ministerio de Ciencia e Innovación through the project PID2019-107579RB-I00 / AEI / 10.13039/501100011033. Peer Reviewed. Postprint (published version).

    Deep Learning-Based Fault Detection and Isolation in Solar Plants for Highly Dynamic Days

    ICCAD'22: 2022 6th International Conference on Control, Automation and Diagnosis, Lisbon, Portugal, July 13-15, 2022. Solar plants are exposed to numerous agents that degrade and damage their components. Due to their large size and constant operation, it is not easy to access them constantly to analyze possible failures on-site. It is, therefore, necessary to use techniques that automatically detect faults. In addition, it is crucial to detect the fault and know its location to deal with it as quickly and effectively as possible. This work applies a fault detection and isolation method to parabolic trough collector plants. A characteristic of solar plants is that they are highly dependent on the sun and the existence of clouds throughout the day, so it is not easy to achieve methods that work well when disturbances are too variable and difficult to predict. This work proposes dynamic artificial neural networks (ANNs) that take into account past information and are not so sensitive to the variations of the plant at each moment. With this, three types of failures are distinguished: failures in the optical efficiency of the mirrors, flow rate, and thermal losses in the pipes. Different ANNs have been proposed and compared with a simple feedforward ANN, obtaining an accuracy of 73.35%. European Research Council 10.13039/50110000078

    A deep learning-based strategy for fault detection and isolation in parabolic-trough collectors

    Solar plants are prone to faults in some of their components, as they are vulnerable to the action of external agents (wind, rain, dust, birds …) and internal defects. However, it is necessary to ensure satisfactory operation when these factors affect the plant. Fault detection and diagnosis methods are essential to detecting and locating the faults, maintaining efficiency and safety in the plant. This work proposes a methodology for detecting and isolating faults in parabolic-trough plants. It is based on a three-layer methodology composed of a neural network to obtain a preliminary detection and classification between three types of fault, a second stage analyzing the flow rate dynamics, and a third stage defocusing the first collector to analyze thermal losses. The methodology has been applied by simulation to a model of the ACUREX plant, which was located at the Plataforma Solar de Almería. The confusion matrices have been obtained, with accuracies over 80% when using the three layers in a hierarchical structure. By forcing all three layers, the accuracies exceed 90%. European Union - Horizon 2020 No 789 05

    On the locality of attention in direct speech translation

    Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the self-attention mechanism complexity scales quadratically with the sequence length, creating an obstacle for tasks involving long sequences, like in the speech domain. In this paper, we discuss the usefulness of self-attention for Direct Speech Translation. First, we analyze the layer-wise token contributions in the self-attention of the encoder, unveiling local diagonal patterns. To prove that some attention weights are avoidable, we propose to substitute the standard self-attention with a local efficient one, setting the amount of context used based on the results of the analysis. With this approach, our model matches the baseline performance, and improves the efficiency by skipping the computation of those weights that standard attention discards. This work was partially funded by the project ADAVOICE, PID2019-107579RB-I00 / AEI / 10.13039/501100011033, and the UPC INIREC scholarship nº3522. Peer Reviewed. Postprint (published version).
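To make the idea of local self-attention in this abstract concrete, here is a minimal sketch of attention weights restricted to a fixed window around each position. It is a generic sliding-window formulation, not the paper's implementation: the function name `local_attention_weights`, the symmetric `window` parameter, and the use of a precomputed score matrix are all assumptions. An efficient version would compute only the scores inside the band; this one reads them from a full matrix for simplicity.

```python
import math
from typing import List


def local_attention_weights(scores: List[List[float]], window: int) -> List[List[float]]:
    """Softmax each row over positions j with |i - j| <= window.

    Positions outside the window get weight 0 and their scores are never read.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        local = scores[i][lo:hi]
        peak = max(local)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in local]
        norm = sum(exps)
        row = [0.0] * n
        for offset, e in enumerate(exps):
            row[lo + offset] = e / norm
        weights.append(row)
    return weights


# With uniform scores, each row spreads its weight evenly over its window.
uniform = [[0.0] * 4 for _ in range(4)]
banded = local_attention_weights(uniform, window=1)
```

With a window of width `w`, each row touches at most `2 * w + 1` scores, so the cost grows linearly with sequence length instead of quadratically.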

    Trip prediction methodology for medium-sized cities with GIS-T: maximum disaggregation at minimum cost

    This paper describes the design of a traffic assignment model that predicts flows for each segment of an urban network with a higher resolution than a traditional four-stage model, retaining the origins and destinations of travel. The research objectives are to determine the traffic intensity in specific areas of the network, and then to identify the origins and destinations of travel to predict changes in urban mobility. To achieve these objectives, relational databases and the geographic information system for transport environment (GIS-T) are used, together with data from household and intercept interviews, to identify mobility patterns in the medium-sized city of Mérida, Spain. These application programs can detect changes in the mobility patterns and can locate problem areas. The results obtained show a high degree of fit between the predictions and the actual observations of the trips. In addition, the disaggregation levels in each midpoint section of the network, combined with population data adjustment using population pyramids, avoid bias in the travel samples. Work sponsored by: Gobierno de Extremadura and European Regional Development. Grant GR 10024. Peer reviewed.

    Travel predictions in medium-sized cities with a GIS environment for transport: maximum disaggregation at minimum cost

    This paper describes the design of a traffic assignment model that predicts flows for each segment of an urban network with a higher resolution than a traditional four-stage model, retaining the origins and destinations of travel. The research objectives are to determine the traffic intensity in specific areas of the network, and then to identify the origins and destinations of travel to predict changes in urban mobility. To achieve these objectives, relational databases and the geographic information system for transport environment (GIS-T) are used, together with data from household and intercept interviews, to identify mobility patterns in the medium-sized city of Mérida, Spain. These application programs can detect changes in the mobility patterns and can locate problem areas. The results obtained show a high degree of fit between the predictions and the actual observations of the trips. In addition, the disaggregation levels in each midpoint section of the network, combined with population data adjustment using population pyramids, avoid bias in the travel samples.

    Hepatoid adenocarcinoma of the stomach – a different histology for not so different gastric adenocarcinoma: a case report

    Hepatoid adenocarcinoma is an extrahepatic tumor characterized by morphological similarities to hepatocellular carcinoma. Hepatoid adenocarcinoma of the stomach is a cancer with an extremely poor prognosis, and few cases have been reported. Here, we describe a 75-year-old Spanish man referred to our hospital with a history of abdominal pain, general fatigue, anorexia and sickness. Initial study revealed anemia, and computed tomography scan and abdominal ultrasonography showed multiple metastases to the liver with hepatocellular carcinoma characteristics in a liver with no cirrhotic change. Further study included a serum level of alpha-fetoprotein (AFP), which was markedly elevated, and a conclusive esophagogastroduodenoscopy describing an elevated tumor growing through the cardia and gastroesophageal junction with foci of necrosis and hemorrhage. Gastric biopsies of the tumor revealed poorly differentiated adenocarcinoma with hepatoid differentiation. After a diagnosis of AFP-producing hepatoid adenocarcinoma of the stomach with multiple liver metastases was made, palliative total gastrectomy, without liver resection, was performed. The patient recovered well after surgery and entered a palliative systemic chemotherapy protocol. Although this illness is recognized as having a poor prognosis, the patient remains alive 8 months after the operation. Accurate diagnosis of hepatoid adenocarcinoma of the stomach is important, and the disease should be suspected under certain circumstances. We describe this rare case of hepatoid adenocarcinoma of the stomach, and review the literature concerning the clinicopathological aspects.

    GAP, an aequorin-based fluorescent indicator for imaging Ca2+ in organelles

    Genetically encoded calcium indicators allow monitoring of subcellular Ca2+ signals inside organelles. Most genetically encoded calcium indicators are fusions of endogenous calcium-binding proteins whose functionality in vivo may be perturbed by competition with cellular partners. We describe here a novel family of fluorescent Ca2+ sensors based on the fusion of two Aequorea victoria proteins, GFP and apo-aequorin (GAP). GAP exhibited a unique combination of features: dual-excitation ratiometric imaging, high dynamic range, good signal-to-noise ratio, insensitivity to pH and Mg2+, tunable Ca2+ affinity, uncomplicated calibration, and targetability to five distinct organelles. Moreover, transgenic mice for endoplasmic reticulum-targeted GAP exhibited a robust long-term expression that correlated well with its reproducible performance in various neural tissues. This biosensor fills a gap in the current repertoire of Ca2+ indicators for organelles and becomes a valuable tool for in vivo Ca2+ imaging applications.