511 research outputs found

    Using Nature to Protect Nature: How Environmental Arts Sheds Light on Environmental Issues


    Sublinear algorithms for Earth Mover's Distance

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (p. 14-15). We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in [0, Δ]^d, with sample complexities independent of domain size, permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other parameters, such as the diameter of the domain space, which may be significantly smaller. We also prove lower bounds showing our testers to be optimal in their dependence on these parameters. Additionally, we consider whether natural classes of distributions exist for which there are algorithms with better dependence on the dimension, and show that for highly clusterable data, this is indeed the case. Lastly, we consider a variant of the EMD, defined over tree metrics instead of the usual L₁ metric, and give optimal algorithms. By Khanh Do Ba. S.M.
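    As a minimal illustration of estimating EMD from samples alone, the sketch below covers only the one-dimensional case with equal sample counts, where the EMD between the two empirical distributions equals the mean absolute difference of the sorted samples. It is not the thesis's sublinear tester, and the function name `emd_1d_from_samples` is hypothetical.

```python
def emd_1d_from_samples(xs, ys):
    """EMD between two empirical distributions on the real line.

    Assumption: both samples have the same size. In that case an optimal
    transport plan matches the i-th smallest point of xs with the i-th
    smallest point of ys, each carrying mass 1/len(xs).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


if __name__ == "__main__":
    # Every point of the second sample is the first sample shifted by 5.
    print(emd_1d_from_samples([0.0, 1.0, 3.0], [5.0, 6.0, 8.0]))
```

    Sorting makes this an O(n log n) computation on the samples themselves; the thesis instead targets sample complexities that do not depend on the domain size.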

    Discriminative training of continuous-space models in machine translation

    Over the past few years, neural network (NN) architectures have been successfully applied to many Natural Language Processing (NLP) applications, such as Automatic Speech Recognition (ASR) and Statistical Machine Translation (SMT). For the language modeling task, these models consider linguistic units (i.e., words and phrases) through their projections into a continuous (multi-dimensional) space, and the estimated distribution is a function of these projections. Also known as continuous-space models (CSMs), their distinguishing feature is this use of a continuous representation, which can be seen as an attempt to address the sparsity issue of conventional discrete models. In the context of SMT, these techniques have been applied to neural network-based language models (NNLMs) included in SMT systems, and to continuous-space translation models (CSTMs). These models have led to significant and consistent gains in SMT performance, but are also very expensive in training and inference, especially for systems involving large vocabularies. To overcome this issue, the Structured Output Layer (SOUL) and Noise Contrastive Estimation (NCE) have been proposed; the former modifies the standard structure of the output layer over vocabulary words, while the latter approximates maximum-likelihood estimation (MLE) by a sampling method. All these approaches share the same estimation criterion, the MLE; however, using this procedure results in an inconsistency between the objective function defined for parameter estimation and the way the models are used in the SMT application. The work presented in this dissertation aims to design new performance-oriented and global training procedures for CSMs to overcome these issues.
    The main contributions lie in the investigation and evaluation of efficient training methods for (large-vocabulary) CSMs which aim: (a) to reduce the total training cost, and (b) to improve the efficiency of these models when used within the SMT application. On the one hand, the training and inference cost can be reduced using the SOUL structure or the NCE algorithm, or the number of iterations can be reduced through faster convergence. This thesis provides an empirical analysis of these solutions on different large-scale SMT tasks. On the other hand, we propose a discriminative training framework which optimizes the performance of the whole system containing the CSM as a component model. The experimental results show that this framework is effective for both training and adapting CSMs within SMT systems, opening promising research perspectives.

    Molecular simulations of the bulk-heterojunction morphology in organic solar cells

    In this Thesis, we aim to elucidate clear connections between the chemical functionality and molecular morphologies of a number of high-performing or benchmark π-conjugated materials used in organic solar cells (OSCs). We proceed to link these structural features to electronic properties that are important to solar cell performance. This Thesis is organized into three themes, in each of which we investigate a particular component of the chemical functionality of a specific π-conjugated material and its effects on thin-film molecular packing: (i) Fluorine substitution in a polymer donor (Chapter 3) and hole-transport molecular crystal (Chapter 6); (ii) Electron-withdrawing group and alkyl group substitution in a nonfullerene acceptor (Chapter 4); (iii) Modification of the core π-conjugated motif in a nonfullerene acceptor (Chapter 5). The results from studying these specific systems showcase the utility of computer simulations, which when used in tandem with experiment can build a molecular understanding of the bulk-heterojunction (BHJ) morphology for OSC applications. While the parameter space of the materials studied in this Thesis remains limited, it does provide a rigorous starting point for developing a more comprehensive understanding of the structure-morphology-performance relationships in π-conjugated systems, which are necessary to systematically improve performance. Ph.D.

    Wait-Free and Obstruction-Free Snapshot

    The snapshot problem was first proposed over a decade ago and has since been well-studied in the distributed algorithms community. The challenge is to design a data structure consisting of m components, shared by up to n concurrent processes, that supports two operations. The first, Update(i,v), atomically writes v to the i-th component. The second, Scan(), returns an atomic snapshot of all m components. We consider two termination properties: wait-freedom, which requires a process to always terminate in a bounded number of its own steps, and the weaker obstruction-freedom, which requires such termination only for processes that eventually execute uninterrupted. First, we present a simple, time and space optimal, obstruction-free solution to the single-writer, multi-scanner version of the snapshot problem (wherein concurrent Updates never occur on the same component). Second, we assume hardware support for compare&swap (CAS) to give a time-optimal, wait-free solution to the multi-writer, single-scanner snapshot problem (wherein concurrent Scans never occur). This algorithm uses only O(mn) space and has optimal CAS, write and remote-reference complexities. Additionally, it can be augmented to implement a general snapshot object with the same time and space bounds, thus improving the space complexity of O(mn^2) of the only previously known time-optimal solution.
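    To make the Update/Scan interface concrete, here is a minimal sketch of a single-writer snapshot object built on the classic double-collect technique, in which a scan repeats until two consecutive collects agree. It is not the paper's algorithm: the class name `DoubleCollectSnapshot` and the per-component sequence numbers are assumptions made for illustration, and the sketch relies on reads and writes of individual list slots being atomic in CPython.

```python
class DoubleCollectSnapshot:
    """m components; component i is written only by its single owner."""

    def __init__(self, m, initial=None):
        # Each register holds (sequence number, value), replaced as one tuple.
        self._regs = [(0, initial) for _ in range(m)]

    def update(self, i, v):
        # Single-writer assumption: only one process ever updates component i,
        # so reading and then bumping its sequence number cannot race.
        seq, _ = self._regs[i]
        self._regs[i] = (seq + 1, v)

    def _collect(self):
        # Read the registers one at a time, as a real scanner would.
        return [self._regs[i] for i in range(len(self._regs))]

    def scan(self):
        # Obstruction-free: returns once two collects run with no update in
        # between; under constant contention it may retry indefinitely.
        previous = self._collect()
        while True:
            current = self._collect()
            if current == previous:
                return [value for _, value in current]
            previous = current


if __name__ == "__main__":
    snap = DoubleCollectSnapshot(3, initial=0)
    snap.update(1, 7)
    snap.update(2, 4)
    print(snap.scan())
```

    The sequence numbers are what let a scanner distinguish a component that was rewritten with the same value from one that was never touched; a wait-free solution needs additional machinery beyond this retry loop.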

    Lower Bounds for Sparse Recovery

    We consider the following k-sparse recovery problem: design an m x n matrix A, such that for any signal x, given Ax we can efficiently recover x' satisfying ||x-x'||_1 <= C min_{k-sparse x"} ||x-x"||_1. It is known that there exist matrices A with this property that have only O(k log (n/k)) rows. In this paper we show that this bound is tight. Our bound holds even for the more general /randomized/ version of the problem, where A is a random variable and the recovery algorithm is required to work for any fixed x with constant probability (over A). Comment: 11 pages. Appeared at SODA 201
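    The right-hand side of the recovery guarantee can be computed directly: the best k-sparse approximation in the L1 norm keeps the k largest-magnitude entries of x, so the minimum error is the sum of the remaining magnitudes. The sketch below checks the guarantee for a candidate recovery; the helper names `best_k_sparse_l1_error` and `satisfies_guarantee` are hypothetical, and no measurement matrix A or recovery algorithm is modelled.

```python
def best_k_sparse_l1_error(x, k):
    """L1 error of the best k-sparse approximation of x.

    Keeping the k largest-magnitude entries is optimal, so the error is
    the total magnitude of everything outside the top k.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(magnitudes[k:])


def satisfies_guarantee(x, x_rec, k, C):
    """Check ||x - x_rec||_1 <= C * (best k-sparse L1 error of x)."""
    if len(x) != len(x_rec):
        raise ValueError("x and x_rec must have the same length")
    error = sum(abs(a - b) for a, b in zip(x, x_rec))
    return error <= C * best_k_sparse_l1_error(x, k)


if __name__ == "__main__":
    signal = [5, -1, 0, 2, -7]
    print(best_k_sparse_l1_error(signal, 2))
    print(satisfies_guarantee(signal, [5, 0, 0, 0, -7], k=2, C=1))
```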

    Impacts of pollution discharges from Dinh Vu industrial zone on water quality in the Hai Phong coastal area

    Hydrodynamic and water quality models (the Delft3D model) were established based on measured data and the estimated pollution discharges from the Dinh Vu industrial zone to the Nam Trieu estuary. Across seven separate simulation scenarios, the results show that when wastewater increases but the pollution discharge (volume and concentration) is controlled, the impact of pollution is limited to a small area around the discharge point, and the influence on water quality elsewhere in the Nam Trieu estuary is quite small. Meanwhile, in the case of an environmental incident, a strongly increased pollution load would significantly raise pollutant concentrations in this area, with parameters such as NH4, COD, and BOD almost exceeding the limits in the National Technical Regulation on surface water quality (QCVN 10-MT:2015/BTNMT). Dissolved oxygen in the water would also decrease significantly. The spatial influence extends from the discharge point to the Nam Trieu estuary, into the Cam and Bach Dang rivers, and to the Cat Hai coastal area.

    A three-year action research project at a Vietnamese university: Students as co-generators of lesson content

    In the context of the 4th Industrial Revolution, with unlimited technological advancement and innovation, how can educators innovate their teaching and support their students' learning, so that students can acquire the required skills and achieve the learning outcomes set for each course they take? To answer this question, the authors reviewed literature on the BYOD trend, active learning strategies, the flipped classroom and learner-generated content as the theoretical basis for their study. Action research was conducted at a Vietnamese university with the participation of English-major students in Theory of English Translation and Interpreting classes from 3 different intakes. The findings include students' positive perception of the content-generation practices. Some achievements and challenges in the teaching and learning process are also reported. This paper also recommends further studies so that the practice can be used to best effect.