
    Noisy-parallel and comparable corpora filtering methodology for the extraction of bi-lingual equivalent data at sentence level

    Text alignment and text quality are critical to the accuracy of Machine Translation (MT) systems, many NLP tools, and any other text-processing task requiring bilingual data. This research proposes a language-independent bi-sentence filtering approach, based on Polish (not a position-sensitive language) to English experiments. The cleaning approach was developed on the TED Talks corpus and initially tested on the Wikipedia comparable corpus, but it can be used for any text domain or language pair. The proposed approach implements various heuristics for sentence comparison, some of which leverage synonyms and semantic and structural analysis of text as additional information. Minimization of data loss was ensured. An improvement in the MT system score with text processed using the tool is discussed.
    Comment: arXiv admin note: text overlap with arXiv:1509.09093, arXiv:1509.0888
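    The kind of sentence-pair heuristics the abstract describes can be sketched as a simple length-ratio check combined with a lexical-overlap score. The thresholds, the toy PL→EN dictionary, and the function names below are illustrative assumptions, not the paper's actual filters.

```python
# Hypothetical bi-sentence filter: length-ratio gate + dictionary-overlap score.
# Thresholds and the toy lexicon are invented for illustration.

def length_ratio_ok(src: str, tgt: str, max_ratio: float = 2.0) -> bool:
    """Reject pairs whose token counts differ by more than max_ratio."""
    ls, lt = len(src.split()), len(tgt.split())
    if min(ls, lt) == 0:
        return False
    return max(ls, lt) / min(ls, lt) <= max_ratio

def overlap_score(src: str, tgt: str, lexicon: dict) -> float:
    """Fraction of source tokens whose dictionary translation appears in the target."""
    src_tokens = src.lower().split()
    tgt_tokens = set(tgt.lower().split())
    if not src_tokens:
        return 0.0
    hits = sum(1 for w in src_tokens if lexicon.get(w) in tgt_tokens)
    return hits / len(src_tokens)

def keep_pair(src, tgt, lexicon, min_overlap=0.3):
    """Accept a candidate pair only if both heuristics pass."""
    return length_ratio_ok(src, tgt) and overlap_score(src, tgt, lexicon) >= min_overlap

lexicon = {"kot": "cat", "pies": "dog", "dom": "house"}  # toy PL->EN entries
print(keep_pair("kot i pies", "the cat and the dog", lexicon))  # True
```

    A production filter would combine many more signals (synonyms, structural analysis), but the accept/reject composition shown here is the general shape of such pipelines.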

    Some experimental results relevant to the optimization of configuration and planning problems.

    This communication deals with mass customization and the association of the product configuration task with the planning of its production process, while trying to minimize cost and cycle time. We consider a two-step approach that first allows a first product configuration and a first process plan to be achieved interactively with the customer (based on non-negotiable requirements), and then optimizes both of them (with the remaining negotiable requirements). This communication concerns the second, optimization step. Our goal is to evaluate a recent evolutionary algorithm (EA). As both problems are considered constraint satisfaction problems, the optimization problem is constrained; the considered EA was therefore selected and adapted to fit the problem. The experiments compare the EA with a conventional branch and bound, according to problem size and constraint density. The hypervolume metric is used for comparison.
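    The hypervolume metric mentioned above has a simple closed form in the bi-objective case (here: cost and cycle time, both minimized). The sketch below assumes a mutually non-dominated front sorted by the first objective; the reference point and the example front are invented for illustration.

```python
# Illustrative 2-D hypervolume for a minimization problem: the area dominated
# by the front, bounded by a reference point. Example values are invented.

def hypervolume_2d(front, ref):
    """front: list of (f1, f2) mutually non-dominated points (both minimized),
    all dominated by `ref`. Returns the dominated area w.r.t. `ref`."""
    pts = sorted(front)          # f1 ascending => f2 descending on a proper front
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (ref[0] - x) * (prev_y - y)   # horizontal slab for this point
        prev_y = y
    return hv

# Front approximating the cost / cycle-time trade-off, reference point (4, 4):
print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], (4, 4)))  # 6.0
```

    A larger hypervolume indicates a front that is both closer to the ideal point and better spread, which is why it is a common single-number comparison between an EA and an exact method such as branch and bound.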

    Improving Accuracy and Scalability of Personal Recommendation Based on Bipartite Network Projection

    The bipartite network projection method has recently been employed for personal recommendation. It constructs a bipartite network between users and items. Treating user taste for items as a resource in the network, we allocate the resource via links between user nodes and item nodes. However, the taste model employed by existing algorithms cannot differentiate the “dislike” and “unrated” cases implied by user ratings. Moreover, the distribution of resource is based solely on node degrees, ignoring the different transfer rates of the links. To enhance performance, this paper devises a negative-aware and rating-integrated algorithm on top of the baseline algorithm. It enriches the current user taste model to encompass “like,” “dislike,” and “unrated” information from users. Furthermore, in the resource distribution stage, we propose to initialize the resource allocation according to user ratings, which also determines the resource transfer rates on the links afterwards. Additionally, we present a scalable implementation in the MapReduce framework by parallelizing the algorithm. Extensive experiments conducted on real data validate the effectiveness and efficiency of the proposed algorithms.
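    The rating-initialized resource spreading can be sketched as follows. The two-step item→user→item spreading is the standard projection idea; the rating scale, the like/dislike cutoff, and the toy data are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of bipartite-projection resource spreading with
# rating-based initialization. The "dislike" cutoff and data are invented.

def recommend(ratings, target, like_threshold=3):
    """ratings: {user: {item: rating}}. Returns the target's unrated items
    ranked by received resource. Items the target rated below like_threshold
    ("dislike") start with zero resource; unrated items only receive it."""
    # Step 0: initialize item resource from the target's ratings.
    resource = {i: (r if r >= like_threshold else 0.0)
                for i, r in ratings[target].items()}
    # Step 1: each item splits its resource equally among the users who rated it.
    item_degree = {}
    for user_items in ratings.values():
        for i in user_items:
            item_degree[i] = item_degree.get(i, 0) + 1
    user_res = {u: sum(resource.get(i, 0.0) / item_degree[i] for i in items)
                for u, items in ratings.items()}
    # Step 2: each user splits the collected resource equally among their items.
    final = {}
    for u, items in ratings.items():
        share = user_res[u] / len(items)
        for i in items:
            final[i] = final.get(i, 0.0) + share
    seen = set(ratings[target])
    return sorted((i for i in final if i not in seen), key=lambda i: -final[i])

ratings = {"alice": {"a": 5, "b": 1}, "bob": {"a": 4, "c": 5}}
print(recommend(ratings, "alice"))  # ['c']
```

    The MapReduce version described in the abstract parallelizes exactly these two spreading steps, since each is a per-node aggregation over incident links.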

    Trustworthiness of Context-Aware Urban Pollution Data in Mobile Crowd Sensing

    Urban pollution is usually monitored via fixed stations that provide detailed and reliable information, thanks to equipment quality and effective measuring protocols, but these sampled data are gathered from very limited areas and through discontinuous monitoring campaigns. Currently, the spread of mobile devices has fostered the development of new approaches, like Mobile Crowd Sensing (MCS), increasing the chances of using smartphones as suitable sensors in the urban monitoring scenario, since MCS can potentially contribute massive, ubiquitous data at relatively low cost. However, MCS is useless (or even counterproductive) if the contributed data are not trustworthy, e.g. due to wrong data-collection procedures by non-expert practitioners. Contextualizing monitored data with data coming from phone-embedded sensors and from time/space proximity can improve data trustworthiness. This work focuses on the development of an algorithm that exploits context awareness to improve the reliability of MCS-collected data. It has been validated against real use cases for noise pollution and promises to improve the trustworthiness of end-user-generated data.
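    The context-weighting idea can be illustrated with a toy trust score that combines a few context checks (sensor placement, GPS fix quality, agreement with nearby readings). The checks, thresholds, and weights below are invented for the sketch and are not the paper's actual algorithm.

```python
# Purely illustrative trust score in [0, 1] for one noise sample.
# All context checks and weights are assumptions made for this example.

def trust_score(sample, nearby_values, weights=(0.4, 0.3, 0.3)):
    """sample: {'in_pocket': bool, 'gps_accuracy_m': float, 'value': float (dB)}.
    nearby_values: readings from devices close in time and space."""
    w_place, w_gps, w_peer = weights
    placement = 0.0 if sample["in_pocket"] else 1.0       # covered mic -> untrusted
    gps = 1.0 if sample["gps_accuracy_m"] <= 20 else 0.0  # coarse fix -> untrusted
    if nearby_values:
        mean = sum(nearby_values) / len(nearby_values)
        peer = 1.0 if abs(sample["value"] - mean) <= 10 else 0.0  # within 10 dB
    else:
        peer = 0.5  # no peers available: neutral evidence
    return w_place * placement + w_gps * gps + w_peer * peer

sample = {"in_pocket": False, "gps_accuracy_m": 5.0, "value": 60.0}
print(trust_score(sample, [58.0, 62.0]))  # close to 1.0
```

    Downstream aggregation could then weight each contributed reading by its score, so that poorly contextualized samples barely influence the pollution map.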

    Skeleton generation for ETL systems from Coloured Petri Nets

    Coloured Petri Nets are a graphical language with a well-defined semantics that allows the design, specification, simulation, and validation of systems in which characteristics such as communication, concurrency, and synchronization play a central role in the processes to be modelled. At the application level, Coloured Petri Nets are used in a wide variety of areas, such as the specification of communication protocols, control systems, hardware systems, and software systems. Due to these characteristics, Coloured Petri Nets have also been adopted for modelling ETL (Extract-Transform-Load) systems. Meta-tasks such as Change Data Capture or Surrogate Key Pipelining, frequently found in conventional ETL systems, have been modelled and validated using Coloured Petri Nets. All of this supports, quite effectively, the main purpose of this dissertation: to develop and implement a system for generating skeletons for ETL systems from the corresponding Coloured Petri Net.
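    The skeleton-generation idea can be sketched by mapping each transition of a (much simplified) net description to a stub function. The net encoding and the shape of the emitted code are assumptions made for illustration, not the dissertation's actual generator.

```python
# Hypothetical sketch: each CPN transition (name, input places, output places)
# becomes a stub in the generated ETL skeleton. Encoding is an assumption.

def generate_etl_skeleton(transitions):
    """transitions: list of (name, input_places, output_places) tuples.
    Returns Python source for one stub function per transition."""
    lines = ["# Auto-generated ETL skeleton"]
    for name, inputs, outputs in transitions:
        ins = ", ".join(inputs)
        outs = ", ".join(outputs)
        lines.append("")
        lines.append(f"def {name}({ins}):")
        lines.append(f'    """Consumes: {ins}; produces: {outs}."""')
        lines.append("    raise NotImplementedError  # fill in transformation logic")
    return "\n".join(lines)

net = [("change_data_capture", ["source_rows"], ["changed_rows"]),
       ("surrogate_key_pipeline", ["changed_rows"], ["keyed_rows"])]
print(generate_etl_skeleton(net))
```

    The appeal of the approach is that the net has already been simulated and validated, so the generated stubs inherit a data flow that is known to be consistent.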

    Healthy snacks consumption and the Theory of Planned Behaviour. The role of anticipated regret

    Two empirical studies explored the role of anticipated regret (AR) within the Theory of Planned Behavior (TPB) framework (Ajzen, 1991), applied to the case of healthy snack consumption. AR captures affective reactions and can be defined as an unpleasant emotion experienced when people realize or imagine that the present situation would be better had they made a different decision. In this research, AR refers to the expected negative feelings for not having consumed healthy snacks (i.e., inaction regret). The aims were: a) to test whether AR improves the predictive power of the TPB; b) to analyze whether it acts as a moderator within the TPB model relationships. Two longitudinal studies were conducted. The target behaviors were consumption of fruit and vegetables as snacks (Study 1) and consumption of fruit as snacks (Study 2). At time 1, the questionnaire included measures of intention and its antecedents, according to the TPB; both the affective and evaluative components of attitude were assessed. At time 2, self-reported consumption behaviors were surveyed. Two convenience samples of Italian adults were recruited. In hierarchical regressions, the TPB variables were added at the first step, AR at the second step, and the interactions at the last step. Results showed that AR significantly improved the ability of the TPB to predict both intentions and behaviors, even after controlling for intention. In both studies, AR moderated the effect of affective attitude on intention: affective attitude was significant only for people low in AR.

    Interactive configuration and constraints: knowledge, filtering, and extensions

    The value of our research work is rooted in the following observations: (1) the life cycle of products, systems, services, and processes is tending to get shorter; (2) new designs and updates of products on the market are becoming more and more frequent, leading to increasingly short design cycles; (3) technologies are constantly changing, requiring permanent, ongoing acquisition of knowledge; (4) the diversity of products offered on the market is growing all the time, ranging from customizable or configurable products to made-to-measure or designed-to-order ones.
    These trends, and the mass of information and knowledge that must be processed as a result, place heavy demands on designers, requiring ever more attentiveness and increasingly intense cognitive effort. The result is an increased risk that the product does not fully meet the customer's needs, that it is difficult to implement or manufacture, or that it is prohibitively expensive. The aim of our work is thus to help the design process reduce these risks and errors by delivering software tools and methodological environments that serve to capitalize on and exploit general, contextual, academic, expert, or business knowledge.
    Our work on various complex industrial cases has led us to take into consideration two kinds of knowledge, involving on the one hand the "product domain" and on the other "product diversity", each leading to different industrial issues. The first kind encompasses scientific and technical aspects, but also the specific rules governing the business in question; this knowledge is required to define the product itself, and involves issues that can be addressed by aiding product/system/service design. The second kind relates to the diversity of the products, and involves issues of customization or configuration of the product/system/service.
    Our aim is to help in what might be called "routine" design, where knowledge of different kinds and various types exists due to the recurrent nature of the activity. We consider that aid in design or configuration can be formalized, completely or partially, as a constraint satisfaction problem (CSP). In this context, we focus more specifically on interactive decision support, by introducing the principles of filtering or constraint propagation. Our objective is thus to accompany designers in constructing the solutions that best answer their problems, by progressively removing from the solution space those solutions that are no longer consistent with the decisions taken, evaluating the solutions as they are built, and/or optimizing them. The diversity of knowledge formalized as a CSP, together with the interaction with the user, allows us to assemble and adapt filtering algorithms in a generic constraint-propagation engine, integrated in our CoFiADe software solution.
    In addition, this CSP-based formalism is complemented by: ontologies, to structure knowledge and facilitate its reuse throughout the development cycle; analogy-based approaches, taking advantage of contextual knowledge encapsulated in cases, so as to recommend value choices to the user; and evolutionary approaches, to optimize the multi-criteria search for solutions.
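    The interactive filtering principle can be sketched with a small arc-consistency-style propagation over binary constraints: as soon as the user fixes a choice, values with no remaining support disappear from the other domains. The variables, domains, and constraints below are a toy example, not CoFiADe itself.

```python
# Minimal constraint-propagation sketch: prune values without support until
# a fixpoint, so the user only ever sees consistent remaining choices.
# The configuration example (engine/gearbox) is invented for illustration.

def revise(domains, x, y, allowed):
    """Remove values of x with no supporting value of y in `allowed`
    (a set of admissible (x_value, y_value) pairs). True if pruned."""
    pruned = False
    for vx in list(domains[x]):
        if not any((vx, vy) in allowed for vy in domains[y]):
            domains[x].discard(vx)
            pruned = True
    return pruned

def propagate(domains, constraints):
    """constraints: {(x, y): allowed_pairs}. Revise both directions to a fixpoint."""
    changed = True
    while changed:
        changed = False
        for (x, y), allowed in constraints.items():
            sym = {(b, a) for a, b in allowed}
            if revise(domains, x, y, allowed) or revise(domains, y, x, sym):
                changed = True
    return domains

domains = {"engine": {"diesel", "electric"}, "gearbox": {"manual", "auto"}}
constraints = {("engine", "gearbox"): {("diesel", "manual"),
                                       ("diesel", "auto"),
                                       ("electric", "auto")}}
domains["engine"] = {"electric"}   # the user interactively fixes a choice
propagate(domains, constraints)
print(domains["gearbox"])  # {'auto'}
```

    Real configuration engines handle n-ary constraints, numeric domains, and explanation of pruned values, but this support-checking loop is the core of interactive filtering.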

    Bayesian Network Approximation from Local Structures

    This work focuses on the problem of Bayesian network structure learning. Two main areas of this field are discussed here.
    The first area is theoretical. We consider some aspects of the hardness of Bayesian network structure learning. In particular, we prove that the problem of finding a Bayesian network structure with a minimal number of edges encoding the joint probability distribution of a given dataset is NP-hard. This result can be considered a view on the NP-hardness of Bayesian network structure learning significantly different from the standard one. The most notable results in this area so far focus mainly on a specific characterization of the problem, where the aim is to find a Bayesian network structure maximizing some given probabilistic criterion. These criteria arise from quite advanced considerations in statistics, and their interpretation may not be intuitive, especially for people unfamiliar with the Bayesian network domain. In contrast, the criterion proposed here, for which NP-hardness is proved, does not require any advanced knowledge and is easy to understand.
    The second area is related to concrete algorithms. We focus on one of the most interesting branches in the history of Bayesian network structure learning methods, one that has led to very significant solutions: the branch of local learning methods, whose main aim is first to gather information describing local properties of the constructed network, and then to use this information appropriately to construct the whole network structure. The algorithm at the root of this branch is focused on an important local characterization of Bayesian networks, the so-called Markov blankets. The Markov blanket of a given attribute consists of those other attributes which, in the probabilistic sense, correspond to the maximal-in-strength and minimal-in-size set of its causes. This first algorithm in the branch considered here is based on one important observation: subject to appropriate assumptions, it is possible to determine the optimal Bayesian network structure by examining relations between attributes only within the Markov blankets. For datasets derived from appropriately sparse distributions, where the Markov blanket of each attribute has a size bounded by some common constant, this procedure leads to a Bayesian network structure learning approach that scales well in time.
    The local learning branch has mainly evolved in the direction of reducing the gathered local information into even smaller and more reliably learned patterns. This reduction arose from parallel progress in the field of Markov blanket approximation.
    The main result of this dissertation is the proposal of a Bayesian network structure learning procedure that can be placed in the branch of local learning methods and that, in fact, forks the branch at its root. The fundamental idea is to aggregate the local knowledge learned over the Markov blankets not in the form of dependencies derived within these blankets, as happens in the root method, but in the form of local Bayesian networks. The user can thereby strongly influence the character of this local knowledge, by choosing the Bayesian network structure learning method used to learn the local structures, as appropriate to his or her needs. The approach of merging local structures into a global one is justified theoretically and evaluated empirically, showing its ability to enhance even very advanced Bayesian network structure learning algorithms when they are applied locally in the proposed scheme.
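    The Markov-blanket notion the abstract relies on is easy to make concrete when the DAG is known: the blanket of a node is its parents, its children, and its children's other parents. The helper and the toy network below illustrate only this definition, not the dissertation's learning procedure.

```python
# Illustrative Markov-blanket extraction from a known DAG, represented as
# {node: set of parent nodes}. The example network is a toy assumption.

def markov_blanket(parents, x):
    """Return the Markov blanket of x: parents(x) | children(x) | co-parents."""
    blanket = set(parents.get(x, set()))                     # parents of x
    children = {n for n, ps in parents.items() if x in ps}   # children of x
    blanket |= children
    for c in children:                                       # children's other parents
        blanket |= parents[c]
    blanket.discard(x)
    return blanket

# Toy DAG:  A -> C <- B,  C -> D
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
print(markov_blanket(dag, "C"))  # {'A', 'B', 'D'} in some order
```

    In structure learning the blanket must instead be estimated from data via conditional-independence tests; the sparsity assumption in the abstract (blanket size bounded by a constant) is what keeps that estimation, and the subsequent local-network merging, tractable.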