26 research outputs found

    Deep reinforcement learning in robotics and dialog systems


    Detection and Evaluation of Clusters within Sequential Data

    Motivated by theoretical advancements in dimensionality reduction techniques, we use a recent model, called Block Markov Chains, to conduct a practical study of clustering in real-world sequential data. Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees and can be deployed in sparse data regimes. Despite these favorable theoretical properties, a thorough evaluation of these algorithms in realistic settings has been lacking. We address this issue and investigate the suitability of these clustering algorithms in exploratory data analysis of real-world sequential data. In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets. In order to evaluate the determined clusters, and the associated Block Markov Chain model, we further develop a set of evaluation tools. These tools include benchmarking, spectral noise analysis and statistical model selection tools. An efficient implementation of the clustering algorithm and the new evaluation tools is made available together with this paper. Practical challenges associated with real-world data are encountered and discussed. It is ultimately found that the Block Markov Chain model assumption, together with the tools developed here, can indeed produce meaningful insights in exploratory data analyses despite the complexity and sparsity of real-world data. (Comment: 37 pages, 12 figures)
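    The clustering algorithms evaluated here are spectral in nature. As a rough, generic illustration of that idea (a minimal sketch, not the authors' released implementation; all names are illustrative), one can count transitions along the observed trajectory, take a low-rank SVD of the count matrix, and cluster the resulting state embeddings:

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def spectral_cluster_states(trajectory, n_states, n_clusters):
    """Cluster the states of a Markov chain observed as a single trajectory:
    count transitions, take a low-rank SVD, k-means the state embeddings."""
    # Empirical transition counts: counts[i, j] = #{t : x_t = i, x_{t+1} = j}
    counts = np.zeros((n_states, n_states))
    for a, b in zip(trajectory[:-1], trajectory[1:]):
        counts[a, b] += 1
    # A rank-K truncated SVD captures the block structure of the chain.
    u, s, _ = svds(counts, k=n_clusters)
    # Cluster states by their low-dimensional embeddings.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(u * s)

# Toy usage: a trajectory over 6 states, assumed to hide 2 blocks.
labels = spectral_cluster_states(np.random.randint(0, 6, 5000), 6, 2)
```

    The actual Block Markov Chain algorithms add trimming and refinement steps for sparse regimes that this sketch omits.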

    Investigation into the flow fields around bluff bodies and artificial heart valves


    In silico Methods for Drug Design and Discovery

    Computer-aided drug design (CADD) methodologies play an ever-increasing role in drug discovery and are critical to the cost-effective identification of promising drug candidates. These computational methods help limit the use of animal models in pharmacological research, aid the rational design of novel and safe drug candidates, and support the repositioning of marketed drugs, assisting medicinal chemists and pharmacologists along the drug discovery trajectory. Within this field of research, we launched a Research Topic in Frontiers in Chemistry in March 2019 entitled "In silico Methods for Drug Design and Discovery," which involved two sections of the journal: Medicinal and Pharmaceutical Chemistry and Theoretical and Computational Chemistry. For the reasons mentioned, this Research Topic attracted the attention of scientists and received a large number of submitted manuscripts. Among them, 27 Original Research articles, five Review articles, and two Perspective articles have been published within the Research Topic. The Original Research articles cover most of the topics in CADD, reporting advanced in silico methods in drug discovery, while the Review articles offer a view of some computer-driven techniques applied to drug research. Finally, the Perspective articles provide a vision of specific computational approaches with an outlook on the modern era of CADD.

    Structured Prediction on Dirty Datasets

    Many errors cannot be detected or repaired without taking into account the underlying structure and dependencies in the dataset. One way of modeling the structure of the data is with graphical models, which combine probability theory and graph theory to address one of the key objectives in designing and fitting probabilistic models: capturing dependencies among relevant random variables. Representing structure helps us understand the side effects of errors and reveals the correct interrelationships between data points. Hence, a principled representation of structure in prediction and cleaning tasks on dirty data is essential for the quality of downstream analytical results. Existing structured prediction research considers limited structures and configurations, with little attention to the performance limitations and to how well the problem can be solved in more general settings where the structure is complex and rich. In this dissertation, I present the following thesis: by leveraging the underlying dependency and structure in machine learning models, we can effectively detect and clean errors via pragmatic structured prediction techniques. To highlight the main contributions: I investigate prediction algorithms and systems on dirty data with more realistic structures and dependencies, to help deploy this type of learning in more pragmatic settings. Specifically, we introduce a few-shot learning framework for error detection that uses structure-based features of the data, such as denial constraint violations and Bayesian networks as co-occurrence features. I study the problem of recovering the latent ground-truth labeling of a structured instance. I then consider the problem of mining integrity constraints from data, specifically using sampling methods to extract approximate denial constraints. Finally, I introduce an ML framework that uses solitary and structured data features to solve the problem of record fusion.
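    To make the denial-constraint feature concrete, here is a minimal sketch for a functional-dependency-style constraint (rows that agree on one attribute must agree on another). The function name and toy data are hypothetical illustrations, not the dissertation's actual framework:

```python
import pandas as pd

def dc_violation_feature(df: pd.DataFrame, lhs: str, rhs: str) -> pd.Series:
    """Flag rows that participate in a violation of the denial constraint
    'rows agreeing on lhs must agree on rhs' (an FD-style constraint)."""
    # Count distinct rhs values within each lhs group; more than one is a conflict.
    n_rhs = df.groupby(lhs)[rhs].transform("nunique")
    return (n_rhs > 1).astype(int)

# Toy example: two rows share a zip code but disagree on the city.
df = pd.DataFrame({"zip": ["10001", "10001", "90210"],
                   "city": ["New York", "Newark", "Beverly Hills"]})
df["dc_violation"] = dc_violation_feature(df, "zip", "city")
print(df)  # rows 0 and 1 are flagged as candidate errors
```

    In an error-detection pipeline, such per-row violation indicators would be one column among several structure-based features fed to the classifier.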

    Learning discrete word embeddings to achieve better interpretability and processing efficiency

    The ubiquitous use of word embeddings in Natural Language Processing is proof of their usefulness and adaptability to a multitude of tasks. However, their continuous nature is prohibitive in terms of computation, storage and interpretation. In this work, we propose a method of learning discrete word embeddings directly. The model is an adaptation of a novel database searching method using state-of-the-art natural language processing techniques like Transformers and LSTMs. On top of obtaining embeddings that require a fraction of the resources to store and process, our experiments strongly suggest that our representations learn basic units of meaning in latent space akin to lexical morphemes. We call these units sememes, i.e., semantic morphemes. We demonstrate that our model has great generalization potential and outputs representations showing strong semantic and conceptual relations between related words.
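    The abstract does not spell out the discretization mechanism, so the following is only a generic sketch of learning discrete codes end-to-end, using the standard Gumbel-softmax relaxation over several small codebooks (a common technique for discrete embeddings, not necessarily the thesis's method; the module and parameter names are invented for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteEmbedding(nn.Module):
    """Represent each word as M discrete codes (one per codebook of K entries),
    trained end-to-end with the Gumbel-softmax relaxation."""

    def __init__(self, vocab_size: int, n_codebooks: int = 8,
                 codebook_size: int = 32, dim: int = 64):
        super().__init__()
        # Learnable per-word logits over each codebook's K entries.
        self.logits = nn.Parameter(torch.randn(vocab_size, n_codebooks, codebook_size))
        # Continuous code vectors; a word embedding is the sum of its M codes.
        self.codebooks = nn.Parameter(torch.randn(n_codebooks, codebook_size, dim))

    def forward(self, word_ids: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.logits[word_ids]                         # (B, M, K)
        # hard=True: one-hot (discrete) forward pass, soft gradients backward.
        onehot = F.gumbel_softmax(logits, tau=tau, hard=True)  # (B, M, K)
        return torch.einsum("bmk,mkd->bd", onehot, self.codebooks)

emb = DiscreteEmbedding(vocab_size=10000)
vectors = emb(torch.tensor([3, 17, 42]))  # (3, 64)
```

    Storing M small integers per word instead of a dense float vector is what yields the storage savings the abstract describes.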

    Business Risk in Changing Dynamics of Global Village 2

    The monograph is based on the presentations and discussions at the II International Conference "Business Risk in Changing Dynamics of Global Village (BRCDGV 2019)", November 7-8, 2019, in Ternopil, Ukraine. The aim of this international scientific conference is to provide a platform for professional debate among experts from around the globe, in order to identify and analyze risks and opportunities in today's global business, and specifically in Ukraine. The conference provides a framework for researchers, business elites and decision makers to strengthen business ties and minimize risk, toward a better world and a better Ukraine.

    The conference is designed to bring together experts from around the globe, from different sectors of practice that are affected by globalization and that are watching the changes in Europe as well as in Ukraine. It is an excellent platform for interaction and communication between academics, corporate representatives, policy makers, representatives of organizations and communities, and individuals who are part of this globalized world. The 1st edition of this conference was held at the University of Applied Sciences in Nysa, Poland (2017); the 2nd edition took place at Ternopil Ivan Puluj National Technical University, Ukraine (2019); the 3rd edition will be organized at Patna University, India (2020) in cooperation with the Indo-European Education Foundation (IEEF, Poland) and its partner universities from Poland, India, Europe and other parts of the world.

    Under the modern conditions of globalization, economic activity is undergoing changes. Innovative technologies, new forms of business, and the dynamic changes taking place in the world today make it necessary to minimize risks in order to maximize benefits. Cooperation between experts from different fields with the aim of ensuring sustainable growth (policymakers, scientists, university representatives and business elites) is essential nowadays. To bring them together and discuss the main issues of today's global world, this conference took place in Ternopil, Ukraine. As Ukraine is now passing through a dynamic period of changes, recommendations coming out of such discussions can be very beneficial for building a stronger society and meeting the risks that globalization brings.

    This monograph provides a useful review of economic, financial and policy issues in the context of globalization processes and has proven extremely popular with practitioners and industry advisors. This edition responds to the continued high demand and interest from experts in different areas who work on diminishing business risks and wish to keep abreast of current thinking on the subject. According to many experts, managing risks is currently one of the most relevant business technologies and, at the same time, a complex process that requires grounded knowledge in the research field and practical experience. The popularity of business risk management is due to objective reasons such as the dynamics of society, the interconnections and interdependence between different players in society, and the increasing role of human capital in a country's sustainable development.

    Toward coherent accounting of uncertainty in hydrometeorological modeling

    A proper consideration of the different sources of uncertainty is a key point in hydrometeorological forecasting. Ensemble forecasts are an attractive alternative to traditional deterministic forecasts because they provide information about the likelihood of the outcomes, and ensembles can be generated wherever a source of uncertainty arises in the hydrometeorological modeling chain. The global objective of this thesis is to identify a system that is able to handle the three main sources of uncertainty in modeling, i.e. the hydrological model structure, the model initial conditions and the meteorological forcing, in order to provide accurate, reliable and economically valuable forecasts. The different uncertainties should be quantified and reduced in a coherent way; that is, they should be addressed explicitly with a cohesive approach that handles each of them adequately, completely and without redundancy in the action of the different tools that compose the system. This motivated several sub-objectives. The first focuses on the multimodel approach, to identify its benefits in an operational framework. Secondly, the implementation and features of the Ensemble Kalman Filter (EnKF) are put under scrutiny to identify an optimal implementation. The next step unites the knowledge from the first two goals by merging their strengths and adding meteorological ensembles, to build a framework that issues accurate and reliable forecasts. This system is expected to handle the main sources of uncertainty in a coherent way while providing a framework for studying the contribution of the different tools and their interactions. Finally, the focus is set on forecast economic value, relating the different systems that have been built to economic value and forecast quality.
    It is found that the combination of the EnKF, the multimodel approach and ensemble forcing issues forecasts that are accurate and nearly reliable. The combination of the three tools outperforms any of them used separately, and the uncertainties considered are accounted for thanks to their complementary actions. The 20 structurally dissimilar models that compose the multimodel ensemble are able to minimize the uncertainty related to the model structure, thanks to the particular roles the models play within the ensemble. This approach can even outperform a more complex semi-distributed model used operationally. The optimal EnKF configuration for dealing with initial condition uncertainty may be complex to reach, because of the sometimes counter-intuitive specification of hyper-parameters and of the state variables to update, and because performance varies greatly from one hydrological model to another. Nonetheless, the filter is a powerful tool for reducing initial condition uncertainty and contributes largely to the spread of the predictive ensemble. It needs, however, to be supported by the multimodel approach and ensemble meteorological forcing to maintain adequate ensemble dispersion at longer lead times. Finally, it is shown that systems exhibiting better accuracy and reliability generally have higher economic value, even if this relation is loosely defined. The different uncertainties inherent to the forecasting process may not be eliminated entirely, but by explicitly accounting for them with dedicated and suitable tools, an accurate, reliable and economically valuable predictive ensemble can be issued.
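    For reference, the analysis step at the heart of EnKF-based updating of initial conditions is standard. Below is a minimal sketch of the stochastic (perturbed-observation) variant, with no claim that it matches the exact configuration studied in the thesis; the variable choices are illustrative:

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Stochastic EnKF analysis step with perturbed observations.

    X : (n, N) ensemble of model states (e.g. storage levels), one column per member
    y : (m,)   observation vector (e.g. observed streamflow)
    H : (m, n) linear observation operator
    R : (m, m) observation-error covariance
    """
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    Pf = A @ A.T / (N - 1)                           # forecast error covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Kalman gain
    # Perturb the observation for each member so the analysis spread is correct.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X + K @ (Y - H @ X)                       # analysis ensemble

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))              # 4 state variables, 50 members
H = np.array([[1.0, 0.0, 0.0, 0.0]])      # observe the first state only
Xa = enkf_update(X, np.array([0.5]), H, np.array([[0.1]]), rng)
```

    The hyper-parameters discussed in the thesis (which state variables to update, how to specify error covariances) enter through the choices of X, H and R here, which is why their specification can be so consequential.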