
    Hypothesis exploration with visualization of variance

    Get PDF
    Background: The Consortium for Neuropsychiatric Phenomics (CNP) at UCLA was an investigation into the biological bases of traits such as memory and response-inhibition phenotypes, exploring whether they are linked to syndromes including ADHD, bipolar disorder, and schizophrenia. One aim of the consortium was to move from traditional categorical approaches to psychiatric syndromes towards more quantitative approaches based on large-scale analysis of the space of human variation. It represented an application of phenomics, the wide-scale, systematic study of phenotypes, to neuropsychiatry research.
    Results: This paper reports on a system for exploring hypotheses in data obtained from the LA2K, LA3C, and LA5C studies of the CNP. ViVA is a system for exploratory data analysis using novel mathematical models and methods for visualization of variance. One of these methods, VISOVA, combines visualization and analysis of variance, with the exploratory flavor associated with ANOVA in biomedical hypothesis generation. It permits visual identification of phenotype profiles, i.e. patterns of values across phenotypes, that characterize groups. Visualization enables screening and refinement of hypotheses about the variance structure of sets of phenotypes.
    Conclusions: The ViVA system was designed for the exploration of neuropsychiatric hypotheses by interdisciplinary teams. Automated visualization in ViVA supports 'natural selection' on a pool of hypotheses and permits deeper understanding of the statistical architecture of the data. A large-scale perspective of this kind could lead to better neuropsychiatric diagnostics.
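
    The kind of screening this supports can be pictured with a minimal one-way ANOVA sweep over phenotypes. This is an illustrative sketch, not ViVA's actual implementation; the file name and the 'group' column are hypothetical.

    import pandas as pd
    from scipy.stats import f_oneway

    # Hypothetical phenotype table: one row per subject, a 'group' column
    # (e.g. ADHD, bipolar, schizophrenia, control) plus numeric phenotype scores.
    df = pd.read_csv("cnp_phenotypes.csv")
    phenotypes = [c for c in df.columns if c != "group"]

    # One-way ANOVA per phenotype: flag measures whose means differ across
    # diagnostic groups, as a starting point for visual exploration of
    # group phenotype profiles.
    for p in phenotypes:
        samples = [g[p].dropna() for _, g in df.groupby("group")]
        stat, pval = f_oneway(*samples)
        print(f"{p}: F={stat:.2f}, p={pval:.4g}")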

    Bayesian model predictive control: Efficient model exploration and regret bounds using posterior sampling

    Full text link
    Tight performance specifications in combination with operational constraints make model predictive control (MPC) the method of choice in various industries. As the performance of an MPC controller depends on a sufficiently accurate objective and prediction model of the process, a significant effort in the MPC design procedure is dedicated to modeling and identification. Driven by the increasing amount of available system data and advances in the field of machine learning, data-driven MPC techniques have been developed to facilitate the MPC controller design. While these methods are able to leverage available data, they typically do not provide principled mechanisms to automatically trade off exploitation of available data and exploration to improve and update the objective and prediction model. To this end, we present a learning-based MPC formulation using posterior sampling techniques, which provides finite-time regret bounds on the learning performance while being simple to implement using off-the-shelf MPC software and algorithms. The performance analysis of the method is based on posterior sampling theory, and its practical efficiency is illustrated using a numerical example of a highly nonlinear dynamical car-trailer system.
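
    The posterior-sampling recipe is simple to state: at the start of each learning episode, draw one model from the current posterior and let the MPC controller treat that sample as the truth. The sketch below assumes linear dynamics learned by Bayesian linear regression; solve_mpc and plant are hypothetical placeholders, not part of the paper's code.

    import numpy as np

    def sample_model(Phi, Y, prior_var=10.0, noise_var=0.1):
        """Draw one dynamics model theta ~ posterior, for x' = theta.T @ phi(x, u) + w."""
        d = Phi.shape[1]
        S = np.linalg.inv(Phi.T @ Phi / noise_var + np.eye(d) / prior_var)
        mean = S @ Phi.T @ Y / noise_var
        # One posterior draw per output dimension (columns of Y).
        return np.column_stack(
            [np.random.multivariate_normal(mean[:, j], S) for j in range(Y.shape[1])]
        )

    # Thompson-sampling MPC loop (schematic):
    # theta = sample_model(Phi, Y)       # resample once per episode
    # for t in range(T):
    #     u = solve_mpc(x, theta)        # hypothetical off-the-shelf MPC call
    #     x = plant(x, u)                # apply input; log (phi(x, u), x') into Phi, Y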

    The Millennium Run Observatory: First Light

    Full text link
    Simulations of galaxy evolution aim to capture our current understanding as well as to make predictions for testing by future experiments. Simulations and observations are often compared in an indirect fashion: physical quantities are estimated from the data and compared to models. However, many applications can benefit from a more direct approach, where the observing process is also simulated and the models are seen fully from the observer's perspective. To facilitate this, we have developed the Millennium Run Observatory (MRObs), a theoretical virtual observatory which uses virtual telescopes to `observe' semi-analytic galaxy formation models based on the suite of Millennium Run dark matter simulations. The MRObs produces data that can be processed and analyzed using the standard software packages developed for real observations. At present, we produce images in forty filters from the rest-frame UV to IR for two stellar population synthesis models, three different models of IGM absorption, and two cosmologies (WMAP1/7). Galaxy distributions for a large number of mock lightcones can be `observed' using models of major ground- and space-based telescopes. The data include lightcone catalogues linked to structural properties of galaxies, pre-observation model images, mock telescope images, and Source Extractor products that can all be traced back to the higher level dark matter, semi-analytic galaxy, and lightcone catalogues available in the Millennium database. Here, we describe our methods and announce a first public release of simulated surveys (e.g., SDSS, CFHT-LS, GOODS, GOODS/ERS, CANDELS, and HUDF). The MRObs browser, an online tool, further facilitates exploration of the simulated data. We demonstrate the benefits of a direct approach through a number of example applications (galaxy number counts in CANDELS, clusters, morphologies, and dropout selections).
    Comment: MNRAS, in press. Millennium Run Observatory data products, online tools, and more available through http://galformod.mpa-garching.mpg.de/mrobs
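
    The core of the direct approach is forward modelling of the observing process itself. The toy sketch below is not the MRObs pipeline; it only illustrates the principle, assuming a Gaussian PSF and Poisson photon noise.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def mock_observe(model_image, psf_sigma=2.0, sky=5.0, seed=0):
        """Toy forward model: blur an idealized model image with a Gaussian
        PSF, add a sky background, and draw Poisson photon noise."""
        rng = np.random.default_rng(seed)
        blurred = gaussian_filter(model_image, sigma=psf_sigma)
        counts = rng.poisson(blurred + sky).astype(float)
        return counts - sky  # sky-subtracted mock image

    # Example: a single Gaussian 'galaxy' on a 128x128 frame, ready to be
    # run through standard source-extraction software like real data.
    y, x = np.mgrid[:128, :128]
    galaxy = 200.0 * np.exp(-((x - 64) ** 2 + (y - 64) ** 2) / (2 * 4.0 ** 2))
    img = mock_observe(galaxy)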

    Language for Specific Purposes and Corpus-based Pedagogy

    Get PDF
    This chapter describes how corpus-based pedagogies are used for teaching and learning language for specific purposes (LSP). Corpus linguistics (CL) refers to the study of large quantities of authentic language using computer-assisted methods; these methods form the basis for computer-assisted language learning (CALL) applications that use corpora for reference, exploration, and interactive learning. The use of corpora as reference resources to create LSP materials is described. Direct student uses of corpora are illustrated by three approaches to data-driven learning (DDL), in which students engage in hands-on explorations of texts. A combination of indirect and direct corpus applications is shown in an illustration of interactive CALL technologies, including an example of an inclusive corpus-based tool for genre-based writing pedagogy. The chapter concludes with prospects for future developments in LSP.

    Minimizing User Effort in Large Scale Example-driven Data Exploration

    Get PDF
    Data Exploration is a key ingredient in a widely diverse set of discovery-oriented applications, including scientific computing, financial analysis, and evidence-based medicine. It refers to a series of exploratory tasks that aim to extract useful pieces of knowledge from data, and its challenge is to do so without requiring the user to specify with precision what information is being searched for. The goal of helping users effortlessly construct exploratory queries that effectively reveal interesting data objects has led to the development of a variety of intelligent semi-automatic approaches. Among such approaches, Example-driven Exploration is rapidly becoming an attractive choice for exploratory query formulation, since it attempts to minimize the amount of prior knowledge required from the user to form an accurate exploratory query. In particular, this dissertation focuses on interactive Example-driven Exploration, which steers the user towards discovering all data objects relevant to their exploration based on their feedback on a small set of examples. Interactive Example-driven Exploration is especially beneficial for non-expert users, as it enables them to circumvent query languages by assigning relevancy to examples as a proxy for the intended exploratory analysis. However, existing interactive Example-driven Exploration systems fall short of supporting the need to perform complex explorations over large, unstructured, high-dimensional data. To overcome these challenges, we have developed new methods of data reduction, example selection, data indexing, and result refinement that support practical, interactive data exploration. The novelty of our approach is anchored on leveraging active learning and query optimization techniques that strike a balance between maximizing accuracy and minimizing user effort in providing feedback, while enabling interactive performance for exploration tasks with arbitrary, large-sized datasets. Furthermore, it extends the exploration beyond structured data by supporting a variety of high-dimensional unstructured data, and it enables the refinement of results when the exploration task is associated with too many relevant data objects that could be overwhelming to the user. To affirm the effectiveness of our proposed models, techniques, and algorithms, we implemented multiple prototype systems and evaluated them using real datasets. Some of them were also used in domain-specific analytics tools.
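
    At the heart of such systems is an active-learning loop: fit a relevance model to the examples labeled so far, then ask the user about the objects the model is least certain about. Below is a minimal uncertainty-sampling sketch using scikit-learn, assuming binary relevant/irrelevant feedback; the classifier choice is illustrative, not the dissertation's exact method.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def next_examples(X_labeled, y_labeled, X_pool, k=5):
        """Return indices of the k pool objects the current relevance
        model is least sure about (closest to the decision boundary)."""
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X_labeled, y_labeled)
        proba = clf.predict_proba(X_pool)[:, 1]
        uncertainty = -np.abs(proba - 0.5)   # highest near proba = 0.5
        return np.argsort(uncertainty)[-k:]

    # Each round: present the selected objects, collect relevant/irrelevant
    # labels, append them to the labeled set, and refit until the inferred
    # exploratory query stabilizes.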

    Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems

    Full text link
    Recently, self-learning methods based on user satisfaction metrics and contextual bandits have shown promising results for enabling consistent improvements in conversational AI systems. However, directly targeting such metrics with off-policy bandit learning objectives often increases the risk of making abrupt policy changes that break the current user experience. In this study, we introduce a scalable framework for supporting fine-grained exploration targets for individual domains via user-defined constraints. For example, we may want to ensure fewer policy deviations in business-critical domains such as shopping, while allocating more exploration budget to domains such as music. Furthermore, we present a novel meta-gradient learning approach that is scalable and practical for addressing this problem. The proposed method adjusts constraint-violation penalty terms adaptively through a meta objective that encourages balanced constraint satisfaction across domains. We conduct extensive experiments using data from a real-world conversational AI system on a set of realistic constraint benchmarks. Based on the experimental results, we demonstrate that the proposed approach achieves the best balance between policy value and constraint satisfaction rate.
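
    One simple way to adapt the penalty terms, in the spirit of (though not identical to) the paper's meta-gradient method, is a per-domain dual-ascent step: increase a domain's penalty weight when its policy-deviation budget is exceeded, and shrink it when there is slack. All names and numbers below are illustrative.

    import numpy as np

    def update_penalties(lam, deviation, budget, lr=0.05):
        """One dual-ascent step: grow lambda where deviation exceeds the
        domain's budget, shrink it where there is slack."""
        violation = deviation - budget       # > 0 means constraint violated
        return np.clip(lam + lr * violation, 0.0, None)

    # The penalized off-policy objective for domain d is then roughly
    #   J_d(pi) - lam[d] * (deviation_d(pi) - budget[d]).
    lam = np.zeros(3)                        # e.g. shopping, music, weather
    lam = update_penalties(lam,
                           deviation=np.array([0.08, 0.02, 0.05]),
                           budget=np.array([0.03, 0.10, 0.05]))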

    Distributed expertise: Qualitative study of a British network of multidisciplinary teams supporting parents of children with chronic kidney disease

    Get PDF
    © 2014 The Authors. Background: Long-term childhood conditions are often managed by hospital-based multidisciplinary teams (MDTs) of professionals with discipline-specific expertise of a condition, in partnership with parents. However, little evidence exists on professional-parent interactions in this context. An exploration of professionals' accounts of the way they individually and collectively teach parents to manage their child's clinical care at home is, therefore, important for meeting parents' needs, informing policy and educating novice professionals. Using chronic kidney disease as an exemplar, this paper reports on one aspect of a study of interactions between professionals and parents in a network of 12 children's kidney units in Britain. Methods: We conducted semi-structured, qualitative interviews with a convenience sample of 112 professionals (clinical psychologists, dietitians, doctors, nurses, pharmacists, play-workers, therapists and social workers), exploring accounts of their parent-educative activity. We analysed data using the framework approach and the concept of distributed expertise. Results: Four themes emerged that related to the way expertise was distributed within and across teams: (i) recognizing each other's expertise, (ii) sharing expertise within the MDT, (iii) language interpretation, and (iv) acting as brokers. Two different professional identifications were also seen to co-exist within MDTs, with participants using the term 'we' both as the intra-professional 'we' (relating to the professional identity) when describing expertise within a disciplinary group (for example: 'As dietitians we aim to give tailored advice to optimize children's growth'), and as the inter-professional 'we' (a 'team identification') when discussing expertise within the team (for example: 'We work as a team and make sure we're all happy with every aspect of their training before they go home'). Conclusions: This study highlights the dual identifications implicit in 'being professional' in this context (to the team and to one's profession), as well as the unique role that each member of a team contributes to children's care. Our methodology and results have the potential to be transferred to teams managing other conditions.

    VANTED: A system for advanced data analysis and visualization in the context of biological networks

    Get PDF
    BACKGROUND: Recent advances with high-throughput methods in life-science research have increased the need for automated data analysis and visual exploration techniques. Sophisticated bioinformatics tools are essential to deduce biologically meaningful interpretations from the large amount of experimental data, and help to understand biological processes. RESULTS: We present VANTED, a tool for the visualization and analysis of networks with related experimental data. Data from large-scale biochemical experiments is uploaded into the software via a Microsoft Excel-based form. It can then be mapped onto a network that is either drawn with the tool itself, downloaded from the KEGG Pathway database, or imported using standard network exchange formats. Transcript, enzyme, and metabolite data can be presented in the context of their underlying networks, e.g. metabolic pathways or classification hierarchies. Visualization and navigation methods support the visual exploration of the data-enriched networks. Statistical methods allow analysis and comparison of multiple data sets, such as different developmental stages or genetically different lines. Correlation networks can be automatically generated from the data, and substances can be clustered according to similar behavior over time. As examples, metabolite profiling and enzyme activity data sets have been visualized in different metabolic maps, correlation networks have been generated, and similar time patterns detected. Some relationships between different metabolites were discovered that are in close accordance with the literature. CONCLUSION: VANTED greatly helps researchers in the analysis and interpretation of biochemical data, and is thus a useful tool for modern biological research. VANTED, as a Java Web Start application including a user guide and example data sets, is available free of charge at
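
    The correlation-network step can be pictured with a short sketch: connect substances whose measured profiles correlate strongly over time. This mirrors the idea rather than VANTED's Java implementation; the threshold is illustrative.

    import numpy as np
    import networkx as nx

    def correlation_network(profiles, names, threshold=0.9):
        """Build a graph linking substances with strongly correlated
        time profiles. profiles: (n_substances, n_timepoints) array."""
        corr = np.corrcoef(profiles)
        g = nx.Graph()
        g.add_nodes_from(names)
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                if abs(corr[i, j]) >= threshold:
                    g.add_edge(names[i], names[j], weight=float(corr[i, j]))
        return g

    # Substances with similar behavior over time fall into the same
    # connected component, matching the clustering described above.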

    Convergence of a Reinforcement Learning Algorithm in Continuous Domains

    Get PDF
    In the field of Reinforcement Learning, Markov Decision Processes with a finite number of states and actions have been well studied, and there exist algorithms capable of producing a sequence of policies that converge to an optimal policy with probability one. Convergence guarantees for problems with continuous states also exist. Until recently, however, no online algorithm for continuous states and continuous actions had been proven to produce optimal policies. This dissertation contains the results of research into reinforcement learning algorithms for problems in which both the state and action spaces are continuous. The problems to be solved are introduced formally as Markov Decision Processes. Also introduced is a value-function solution method known as Q-learning. The primary result of this dissertation is the presentation of a Q-learning-type algorithm adapted for continuous states and actions, and the proof that it asymptotically learns an optimal policy with probability one. While the algorithm is intended to advance the theory of continuous-domain reinforcement learning, an example is given to show that, with appropriate exploration policies, it can produce satisfactory solutions to non-trivial benchmark problems. Kernel-regression-based algorithms have excellent theoretical properties, but have high computational cost and do not adapt well to high-dimensional problems. A class of batch-mode, regression-tree-based algorithms is introduced. These algorithms are modular in the sense that different methods for partitioning, performing local regression, and choosing representative actions can be chosen. Experiments demonstrate superior performance over kernel methods. Batch algorithms possess superior computational efficiency, but pay the price of not being able to use past observations to inform exploration. A data structure useful for limited learning during the exploration phase is introduced. It is then demonstrated that this limited learning can outperform batch algorithms that use totally random action exploration.
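
    For reference, the finite-MDP baseline that the dissertation extends is the classical Q-learning update paired with an explicit exploration policy. The sketch below shows only the tabular case, not the continuous-domain algorithm that is the dissertation's contribution.

    import numpy as np

    def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        """One Q-learning step: move Q(s, a) toward the TD target."""
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])

    def epsilon_greedy(Q, s, epsilon, rng):
        """Exploration policy: random action with probability epsilon,
        greedy action otherwise."""
        if rng.random() < epsilon:
            return int(rng.integers(Q.shape[1]))
        return int(np.argmax(Q[s]))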

    Two axes re-ordering methods in parallel coordinates plots

    Full text link
    © 2015 Elsevier Ltd. Visualization and interaction of multidimensional data are challenges in visual data analytics, which requires optimized solutions to integrate the display, exploration, and analytical reasoning of data into one visual pipeline for human-centered data analysis and interpretation. Although it is one of the most popular techniques for the visualization and analysis of multidimensional data, parallel coordinates suffers, like other visualization methods, from visual clutter and computational complexity, both of which grow as the volume of data to be visualized increases. One straightforward way to address these problems is to change the ordering of the axes to minimize visual clutter. However, optimizing the ordering of the axes is an NP-complete problem. In this paper, two axis re-ordering methods for parallel coordinates visualization are proposed: (1) a contribution-based method and (2) a similarity-based method. The contribution-based re-ordering method is based mainly on the singular value decomposition (SVD) algorithm. It not only provides users with a mathematical basis for selecting the first, most salient axis, but also helps visualize the detailed structure of the data according to the contribution of each data dimension. This approach greatly reduces computational complexity in comparison with other re-ordering methods. The similarity-based re-ordering method is based on a combination of the nonlinear correlation coefficient (NCC) and SVD algorithms. With this approach, axes are re-ordered according to the degree of similarity among them. It is more rational, exact, and systematic than other re-ordering methods, including those based on Pearson's correlation coefficient (PCC). The paper also proposes a measure of the contribution rate of each dimension to reveal properties hidden in the dataset. Finally, the rationale and effectiveness of these approaches are demonstrated through case studies. For example, the patterns of Smurf and Neptune attacks hidden in the KDD 1999 dataset are visualized in parallel coordinates using the contribution-based re-ordering method, and the NCC re-ordering method enlarges the mean crossing angles and reduces the number of polylines between neighboring axes.
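
    The contribution-based idea can be sketched in a few lines: rank dimensions by how much of the data's variance structure they carry, as measured through the SVD. This mirrors the spirit of the method, not its exact formulation in the paper.

    import numpy as np

    def contribution_order(X):
        """Order the columns (axes) of X by their contribution to the
        dominant variance structure; rows are records."""
        Xc = X - X.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        # Weight each dimension's squared loadings by the squared singular
        # values: a larger score means a larger share of the variance.
        contribution = (s[:, None] ** 2 * Vt ** 2).sum(axis=0)
        return np.argsort(contribution)[::-1]

    # Render the parallel coordinates plot with axes in this order:
    # X_reordered = X[:, contribution_order(X)]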