Hypothesis exploration with visualization of variance.
Background: The Consortium for Neuropsychiatric Phenomics (CNP) at UCLA was an investigation into the biological bases of traits such as memory and response inhibition, and into whether these phenotypes are linked to syndromes including ADHD, bipolar disorder, and schizophrenia. One aim of the consortium was to move from traditional categorical approaches to psychiatric syndromes towards more quantitative approaches based on large-scale analysis of the space of human variation. It represented an application of phenomics, the wide-scale, systematic study of phenotypes, to neuropsychiatry research. Results: This paper reports on a system for exploration of hypotheses in data obtained from the LA2K, LA3C, and LA5C studies in CNP. ViVA is a system for exploratory data analysis using novel mathematical models and methods for visualization of variance. An example of these methods is VISOVA, a combination of visualization and analysis of variance, with the flavor of exploration associated with ANOVA in biomedical hypothesis generation. It permits visual identification of phenotype profiles (patterns of values across phenotypes) that characterize groups. Visualization enables screening and refinement of hypotheses about the variance structure of sets of phenotypes. Conclusions: The ViVA system was designed for exploration of neuropsychiatric hypotheses by interdisciplinary teams. Automated visualization in ViVA supports 'natural selection' on a pool of hypotheses and permits deeper understanding of the statistical architecture of the data. A large-scale perspective of this kind could lead to better neuropsychiatric diagnostics.
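The profile-and-variance idea behind a VISOVA-style display can be sketched numerically: group mean profiles are what the plot shows, and a one-way ANOVA F statistic per phenotype compares between-group to within-group variance. The group names, phenotype labels, and effect sizes below are invented for illustration; this is not the ViVA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical phenotype measurements for three diagnostic groups (40 subjects
# each, 3 phenotypes); the group means are made up for the sketch
groups = {
    "ADHD":    rng.normal([0.2, -0.5, 0.1], 0.3, size=(40, 3)),
    "Bipolar": rng.normal([0.0, 0.3, -0.2], 0.3, size=(40, 3)),
    "Control": rng.normal([0.0, 0.0, 0.0], 0.3, size=(40, 3)),
}
phenotypes = ["memory", "response_inhibition", "attention"]

# Group phenotype profiles: mean value of each phenotype per group
profiles = {g: x.mean(axis=0) for g, x in groups.items()}

def f_statistic(col):
    """One-way ANOVA F statistic: between- vs within-group variance."""
    samples = [x[:, col] for x in groups.values()]
    grand = np.concatenate(samples).mean()
    n, k = sum(len(s) for s in samples), len(samples)
    ss_between = sum(len(s) * (s.mean() - grand) ** 2 for s in samples)
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

F = [f_statistic(c) for c in range(3)]
```

A VISOVA-style screen would then rank the phenotypes by F to suggest which profiles most strongly separate the groups.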
Bayesian model predictive control: Efficient model exploration and regret bounds using posterior sampling
Tight performance specifications in combination with operational constraints
make model predictive control (MPC) the method of choice in various industries.
As the performance of an MPC controller depends on a sufficiently accurate
objective and prediction model of the process, a significant effort in the MPC
design procedure is dedicated to modeling and identification. Driven by the
increasing amount of available system data and advances in the field of machine
learning, data-driven MPC techniques have been developed to facilitate the MPC
controller design. While these methods are able to leverage available data,
they typically do not provide principled mechanisms to automatically trade off
exploitation of available data and exploration to improve and update the
objective and prediction model. To this end, we present a learning-based MPC
formulation using posterior sampling techniques, which provides finite-time
regret bounds on the learning performance while being simple to implement using
off-the-shelf MPC software and algorithms. The performance analysis of the
method is based on posterior sampling theory and its practical efficiency is
illustrated using a numerical example of a highly nonlinear dynamical
car-trailer system.
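The posterior-sampling idea can be sketched on a hypothetical scalar system (this is an illustration of the general technique, not the paper's formulation): keep a Gaussian posterior over unknown linear dynamics, draw one model per step, and plan against the sampled model as if it were true.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar system, unknown to the controller: x' = a*x + b*u + noise
a_true, b_true, noise_var = 0.9, 0.5, 0.01

# Gaussian posterior over theta = [a, b] via Bayesian linear regression,
# stored as a precision matrix and a precision-weighted mean vector
prec = np.eye(2) / 10.0            # prior: theta ~ N(0, 10 * I)
q = np.zeros(2)

x = 1.0
for _ in range(100):
    cov = np.linalg.inv(prec)
    mean = cov @ q
    # Posterior sampling: draw ONE plausible model and control as if it were true
    a_hat, b_hat = rng.multivariate_normal(mean, cov)
    # One-step MPC surrogate: u minimizing predicted cost x'^2 + r*u^2
    # (clipped to a sane range, since early samples can be wild)
    r = 0.1
    u = float(np.clip(-a_hat * b_hat * x / (b_hat ** 2 + r), -5.0, 5.0))
    # Apply the input to the real system and observe the transition
    x_next = a_true * x + b_true * u + rng.normal(0.0, np.sqrt(noise_var))
    # Conjugate Bayesian update with regression feature phi = [x, u]
    phi = np.array([x, u])
    prec += np.outer(phi, phi) / noise_var
    q += phi * x_next / noise_var
    x = x_next

posterior_mean = np.linalg.inv(prec) @ q
```

The appeal of the scheme is that exploration comes for free: sampling from the posterior occasionally tries optimistic models, and the Bayesian update shrinks the posterior as data accumulates.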
The Millennium Run Observatory: First Light
Simulations of galaxy evolution aim to capture our current understanding as
well as to make predictions for testing by future experiments. Simulations and
observations are often compared in an indirect fashion: physical quantities are
estimated from the data and compared to models. However, many applications can
benefit from a more direct approach, where the observing process is also
simulated and the models are seen fully from the observer's perspective. To
facilitate this, we have developed the Millennium Run Observatory (MRObs), a
theoretical virtual observatory which uses virtual telescopes to `observe'
semi-analytic galaxy formation models based on the suite of Millennium Run dark
matter simulations. The MRObs produces data that can be processed and analyzed
using the standard software packages developed for real observations. At
present, we produce images in forty filters from the rest-frame UV to IR for
two stellar population synthesis models, three different models of IGM
absorption, and two cosmologies (WMAP1/7). Galaxy distributions for a large
number of mock lightcones can be `observed' using models of major ground- and
space-based telescopes. The data include lightcone catalogues linked to
structural properties of galaxies, pre-observation model images, mock telescope
images, and Source Extractor products that can all be traced back to the higher
level dark matter, semi-analytic galaxy, and lightcone catalogues available in
the Millennium database. Here, we describe our methods and announce a first
public release of simulated surveys (e.g., SDSS, CFHT-LS, GOODS, GOODS/ERS,
CANDELS, and HUDF). The MRObs browser, an online tool, further facilitates
exploration of the simulated data. We demonstrate the benefits of a direct
approach through a number of example applications (galaxy number counts in
CANDELS, clusters, morphologies, and dropout selections). Comment: MNRAS, in press. Millennium Run Observatory data products, online tools, and more are available through http://galformod.mpa-garching.mpg.de/mrobs
Language for Specific Purposes and Corpus-based Pedagogy
This chapter describes how corpus-based pedagogies are used for teaching and learning language for specific purposes (LSP). Corpus linguistics (CL) refers to the study of large quantities of authentic language using computer-assisted methods, which form the basis for computer-assisted language learning (CALL) that uses corpora for reference, exploration, and interactive learning. The use of corpora as reference resources to create LSP materials is described. Direct student uses of corpora are illustrated by three approaches to data-driven learning (DDL), in which students engage in hands-on explorations of texts. A combination of indirect and direct corpus applications is shown in an illustration of interactive CALL technologies, including an example of an inclusive corpus-based tool for genre-based writing pedagogy. The chapter concludes with prospects for future developments in LSP.
Minimizing User Effort in Large Scale Example-driven Data Exploration
Data Exploration is a key ingredient in a widely diverse set of discovery-oriented applications, including scientific computing, financial analysis, and evidence-based medicine. It refers to a series of exploratory tasks that aim to extract useful pieces of knowledge from data, and its challenge is to do so without requiring the user to specify with precision what information is being searched for. The goal of assisting users in effortlessly constructing exploratory queries that effectively reveal interesting data objects has led to the development of a variety of intelligent semi-automatic approaches. Among these, Example-driven Exploration is rapidly becoming an attractive choice for exploratory query formulation, since it attempts to minimize the amount of prior knowledge required from the user to form an accurate exploratory query.
In particular, this dissertation focuses on interactive Example-driven Exploration, which steers the user towards discovering all data objects relevant to the user's exploration based on their feedback on a small set of examples. Interactive Example-driven Exploration is especially beneficial for non-expert users, as it enables them to circumvent query languages by assigning relevancy to examples as a proxy for the intended exploratory analysis. However, existing interactive Example-driven Exploration systems fall short of supporting the need to perform complex explorations over large, unstructured high-dimensional data. To overcome these challenges, we have developed new methods of data reduction, example selection, data indexing, and result refinement that support practical, interactive data exploration.
The novelty of our approach is anchored on leveraging active learning and query optimization techniques that strike a balance between maximizing accuracy and minimizing user effort in providing feedback, while enabling interactive performance for exploration tasks with arbitrary, large-sized datasets. Furthermore, it extends exploration beyond structured data by supporting a variety of high-dimensional unstructured data, and enables the refinement of results when the exploration task is associated with so many relevant data objects that they could overwhelm the user. To affirm the effectiveness of our proposed models, techniques, and algorithms, we implemented multiple prototype systems and evaluated them using real datasets. Some of them were also used in domain-specific analytics tools.
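The active-learning loop at the core of such example-driven systems can be sketched as uncertainty sampling: fit a classifier to the labeled examples, then ask the user about the example the model is least sure of. Everything here (the synthetic relevance rule, the plain gradient-descent logistic regression) is an illustrative stand-in, not one of the dissertation's systems.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical exploration task: "relevant" objects are those with x0 > 0.5
X = rng.uniform(0.0, 1.0, size=(500, 2))
y = (X[:, 0] > 0.5).astype(float)

def fit_logreg(Xl, yl, iters=300, lr=0.5):
    """Plain gradient-descent logistic regression (bias folded in)."""
    Xb = np.hstack([Xl, np.ones((len(Xl), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - yl) / len(yl)
    return w

# Seed with two labeled examples, one from each extreme of the first dimension
labeled = [int(np.argmin(X[:, 0])), int(np.argmax(X[:, 0]))]

for _ in range(20):
    w = fit_logreg(X[labeled], y[labeled])
    p = 1.0 / (1.0 + np.exp(-(np.hstack([X, np.ones((500, 1))]) @ w)))
    # Uncertainty sampling: query the example closest to the decision boundary
    unlabeled = [i for i in range(500) if i not in labeled]
    pick = min(unlabeled, key=lambda i: abs(p[i] - 0.5))
    labeled.append(pick)      # the user's feedback, simulated by ground truth

accuracy = float(np.mean((p > 0.5) == y))
```

The point of the design is that user effort scales with the number of queries, not the dataset size: after roughly twenty labels, the model classifies all 500 objects.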
Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems
Recently, self-learning methods based on user satisfaction metrics and
contextual bandits have shown promising results to enable consistent
improvements in conversational AI systems. However, directly targeting such
metrics by off-policy bandit learning objectives often increases the risk of
making abrupt policy changes that break the current user experience. In this
study, we introduce a scalable framework for supporting fine-grained
exploration targets for individual domains via user-defined constraints. For
example, we may want to ensure fewer policy deviations in business-critical
domains such as shopping, while allocating more exploration budget to domains
such as music. Furthermore, we present a novel meta-gradient learning approach
that is scalable and practical to address this problem. The proposed method
adjusts constraint violation penalty terms adaptively through a meta objective
that encourages balanced constraint satisfaction across domains. We conduct
extensive experiments using data from a real-world conversational AI on a set
of realistic constraint benchmarks. Based on the experimental results, we
demonstrate that the proposed approach is capable of achieving the best balance
between the policy value and the constraint satisfaction rate.
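The adaptive-penalty idea can be illustrated with a toy, dual-ascent-style loop (a simplified stand-in for the paper's meta-gradient method): each domain's penalty weight is raised while its deviation rate exceeds the user-defined budget and lowered otherwise. The `deviation_rate` response curve is invented; a real system would retrain the policy and measure it.

```python
# User-defined constraints: allowed policy-deviation rate per domain,
# stricter for business-critical shopping than for music
budget = {"shopping": 0.05, "music": 0.20}
lam = {d: 1.0 for d in budget}          # per-domain penalty weights

def deviation_rate(domain, penalty):
    # Hypothetical response curve: stronger penalties -> fewer deviations.
    return 0.30 / (1.0 + penalty)

lr = 1.0
for _ in range(1000):
    for d in budget:
        violation = deviation_rate(d, lam[d]) - budget[d]
        # Raise the penalty while over budget, relax it while under
        lam[d] = max(0.0, lam[d] + lr * violation)
```

At the fixed point each domain's deviation rate sits at its budget, with the stricter domain carrying the larger penalty, which is the balanced constraint satisfaction the abstract describes.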
Distributed expertise: Qualitative study of a British network of multidisciplinary teams supporting parents of children with chronic kidney disease
© 2014 The Authors. Background: Long-term childhood conditions are often managed by hospital-based multidisciplinary teams (MDTs) of professionals with discipline-specific expertise of a condition, in partnership with parents. However, little evidence exists on professional-parent interactions in this context. An exploration of professionals' accounts of the way they individually and collectively teach parents to manage their child's clinical care at home is, therefore, important for meeting parents' needs, informing policy and educating novice professionals. Using chronic kidney disease as an exemplar, this paper reports on one aspect of a study of interactions between professionals and parents in a network of 12 children's kidney units in Britain. Methods: We conducted semi-structured, qualitative interviews with a convenience sample of 112 professionals (clinical psychologists, dietitians, doctors, nurses, pharmacists, play-workers, therapists and social workers), exploring accounts of their parent-educative activity. We analysed data using framework analysis and the concept of distributed expertise. Results: Four themes emerged that related to the way expertise was distributed within and across teams: (i) recognizing each other's expertise, (ii) sharing expertise within the MDT, (iii) language interpretation, and (iv) acting as brokers. Two different professional identifications were also seen to co-exist within MDTs, with participants using the term 'we' both as the intra-professional 'we' (relating to the professional identity) when describing expertise within a disciplinary group (for example: 'As dietitians we aim to give tailored advice to optimize children's growth'), and the inter-professional 'we' (a 'team-identification') when discussing expertise within the team (for example: 'We work as a team and make sure we're all happy with every aspect of their training before they go home').
Conclusions: This study highlights the dual identifications implicit in 'being professional' in this context (to the team and to one's profession), as well as the unique role that each member of a team contributes to children's care. Our methodology and results have the potential to be transferred to teams managing other conditions.
VANTED: A system for advanced data analysis and visualization in the context of biological networks
BACKGROUND: Recent advances with high-throughput methods in life-science research have increased the need for automated data analysis and visual exploration techniques. Sophisticated bioinformatics tools are essential to deduce biologically meaningful interpretations from the large amount of experimental data, and help to understand biological processes. RESULTS: We present VANTED, a tool for the visualization and analysis of networks with related experimental data. Data from large-scale biochemical experiments are uploaded into the software via a Microsoft Excel-based form. They can then be mapped onto a network that is either drawn with the tool itself, downloaded from the KEGG Pathway database, or imported using standard network exchange formats. Transcript, enzyme, and metabolite data can be presented in the context of their underlying networks, e.g. metabolic pathways or classification hierarchies. Visualization and navigation methods support the visual exploration of the data-enriched networks. Statistical methods allow analysis and comparison of multiple data sets, such as different developmental stages or genetically different lines. Correlation networks can be automatically generated from the data, and substances can be clustered according to similar behavior over time. As examples, metabolite profiling and enzyme activity data sets have been visualized in different metabolic maps, correlation networks have been generated, and similar time patterns detected. Some relationships between different metabolites were discovered which are in close accordance with the literature. CONCLUSION: VANTED greatly helps researchers in the analysis and interpretation of biochemical data, and thus is a useful tool for modern biological research. VANTED, as a Java Web Start application including a user guide and example data sets, is available free of charge at
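A correlation network of the kind VANTED generates can be sketched in a few lines: correlate the time profiles of all substance pairs and connect those above a threshold. The substance names, profiles, and the 0.9 cutoff below are invented for illustration, not VANTED's internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented time-series profiles for five substances over 8 time points
t = np.arange(8)
profiles = {
    "sucrose":  np.sin(t / 2) + rng.normal(0.0, 0.05, 8),
    "glucose":  np.sin(t / 2) + rng.normal(0.0, 0.05, 8),
    "fructose": -np.sin(t / 2) + rng.normal(0.0, 0.05, 8),
    "starch":   rng.normal(0.0, 1.0, 8),
    "citrate":  np.cos(t / 2) + rng.normal(0.0, 0.05, 8),
}

names = list(profiles)
M = np.array([profiles[n] for n in names])
corr = np.corrcoef(M)               # pairwise correlation of time profiles

# Correlation network: connect substances whose profiles are strongly
# (anti-)correlated; the threshold is an arbitrary illustrative choice
threshold = 0.9
edges = [(names[i], names[j], round(float(corr[i, j]), 2))
         for i in range(len(names))
         for j in range(i + 1, len(names))
         if abs(corr[i, j]) >= threshold]
```

Substances sharing a time pattern end up in the same connected component, which is the basis for clustering them "according to similar behavior over time".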
Convergence of a Reinforcement Learning Algorithm in Continuous Domains
In the field of Reinforcement Learning, Markov Decision Processes with a finite number of states and actions have been well studied, and there exist algorithms capable of producing a sequence of policies which converge to an optimal policy with probability one. Convergence guarantees for problems with continuous states also exist. Until recently, no online algorithm for continuous states and continuous actions has been proven to produce optimal policies. This Dissertation contains the results of research into reinforcement learning algorithms for problems in which both the state and action spaces are continuous. The problems to be solved are introduced formally as Markov Decision Processes. Also introduced is a value-function solution method known as Q-learning. The primary result of this Dissertation is the presentation of a Q-learning type algorithm adapted for continuous states and actions, and the proof that it asymptotically learns an optimal policy with probability one. While the algorithm is intended to advance the theory of continuous domain reinforcement learning, an example is given to show that with appropriate exploration policies, it can produce satisfactory solutions to non-trivial benchmark problems. Kernel regression based algorithms have excellent theoretical properties, but have high computational cost and do not adapt well to high-dimensional problems. A class of batch-mode regression tree-based algorithms is introduced. These algorithms are modular in the sense that different methods for partitioning, performing local regression, and choosing representative actions can be chosen. Experiments demonstrate superior performance over kernel methods. Batch algorithms possess superior computational efficiency, but pay the price of not being able to use past observations to inform exploration. A data structure useful for limited learning during the exploration phase is introduced. 
It is then demonstrated that this limited learning can outperform batch algorithms using totally random action exploration.
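The batch-mode, partition-based idea can be sketched as fitted Q iteration on an invented one-dimensional problem: a fixed uniform grid over the state-action space stands in for the dissertation's adaptive regression trees, and a coarse set of representative actions replaces continuous maximization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented continuous problem: state x in [-1, 1], action a in [-0.3, 0.3],
# dynamics x' = clip(x + a), reward -|x'| (drive the state to the origin)
def step(x, a):
    x_next = np.clip(x + a, -1.0, 1.0)
    return x_next, -np.abs(x_next)

# A batch of random transitions, as in batch-mode reinforcement learning
N = 5000
X = rng.uniform(-1.0, 1.0, N)
A = rng.uniform(-0.3, 0.3, N)
Xn, R = step(X, A)

SB, AB = 20, 7                        # state bins and representative actions
actions = np.linspace(-0.3, 0.3, AB)
s_idx = np.clip(((X + 1.0) / 2.0 * SB).astype(int), 0, SB - 1)
a_idx = np.clip(((A + 0.3) / 0.6 * AB).astype(int), 0, AB - 1)
sn_idx = np.clip(((Xn + 1.0) / 2.0 * SB).astype(int), 0, SB - 1)

# Fitted Q iteration with a piecewise-constant regressor on the fixed grid
Q = np.zeros((SB, AB))
gamma = 0.9
for _ in range(50):
    targets = R + gamma * Q[sn_idx].max(axis=1)
    Q_new = np.zeros_like(Q)
    for s in range(SB):
        for a in range(AB):
            cell = (s_idx == s) & (a_idx == a)
            if cell.any():
                Q_new[s, a] = targets[cell].mean()  # local regression = cell mean
    Q = Q_new

def greedy_action(x):
    s = min(int((x + 1.0) / 2.0 * SB), SB - 1)
    return actions[int(np.argmax(Q[s]))]
```

The modular structure the dissertation describes corresponds to the three swappable pieces here: the partition (grid vs tree), the local regression (cell mean), and the choice of representative actions.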
Two axes re-ordering methods in parallel coordinates plots
© 2015 Elsevier Ltd. Visualization and interaction of multidimensional data are challenges in visual data analytics, which requires optimized solutions to integrate the display, exploration and analytical reasoning of data into one visual pipeline for human-centered data analysis and interpretation. Although parallel coordinates is considered one of the most popular techniques for visualization and analysis of multidimensional data, it suffers from visual clutter as well as computational complexity, like other visualization methods in which clutter grows with the volume of data to be visualized. One straightforward way to address these problems is to change the ordering of the axes so as to minimize visual clutter. However, optimizing the ordering of axes is an NP-complete problem. In this paper, two axes re-ordering methods are proposed for parallel coordinates visualization: (1) a contribution-based method and (2) a similarity-based method. The contribution-based re-ordering method is mainly based on the singular value decomposition (SVD) algorithm. It not only provides users with a mathematical basis for selecting the first remarkable axis, but also helps to visualize the detailed structure of the data according to the contribution of each data dimension. This approach greatly reduces the computational complexity in comparison with other re-ordering methods. The similarity-based re-ordering method is based on a combination of the nonlinear correlation coefficient (NCC) and SVD algorithms. With this approach, axes are re-ordered in line with the degree of similarity among them. It is more rational, exact and systematic than other re-ordering methods, including those based on Pearson's correlation coefficient (PCC).
Meanwhile, the paper also proposes a measurement of the contribution rate of each dimension to reveal properties hidden in the dataset. Finally, the rationale and effectiveness of these approaches are demonstrated through case studies. For example, the patterns of Smurf and Neptune attacks hidden in the KDD 1999 dataset are visualized in parallel coordinates using the contribution-based re-ordering method, and the NCC re-ordering method enlarges the mean crossing angles and reduces the number of polylines between neighboring axes.
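The contribution-based ordering can be sketched with an SVD: standardize the data, take the leading right singular vector, and order axes by the magnitude of each dimension's loading on it. The dataset below is invented, and this scoring is a simplified reading of the SVD idea, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-dimensional dataset: two dimensions share a dominant pattern,
# two are pure noise
n = 300
pattern = rng.normal(0.0, 3.0, n)
data = np.column_stack([
    pattern + rng.normal(0.0, 0.1, n),        # dim 0: strong pattern
    rng.normal(0.0, 1.0, n),                  # dim 1: noise
    0.5 * pattern + rng.normal(0.0, 0.5, n),  # dim 2: weaker pattern
    rng.normal(0.0, 0.2, n),                  # dim 3: noise
])

# Standardize, take the SVD, and score each dimension by its loading on the
# leading right singular vector
Z = (data - data.mean(axis=0)) / data.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
contribution = np.abs(Vt[0])
order = np.argsort(-contribution)   # axis order for the parallel coordinates plot
```

Placing the high-contribution axes first puts the most structured dimensions adjacent, which is what reduces clutter between neighboring axes.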