Hypothesis exploration with visualization of variance.
Background: The Consortium for Neuropsychiatric Phenomics (CNP) at UCLA was an investigation into the biological bases of traits such as memory and response inhibition, and into whether these phenotypes are linked to syndromes including ADHD, bipolar disorder, and schizophrenia. One aim of the consortium was to move from traditional categorical approaches to psychiatric syndromes towards more quantitative approaches based on large-scale analysis of the space of human variation. It represented an application of phenomics, the wide-scale, systematic study of phenotypes, to neuropsychiatry research. Results: This paper reports on a system for exploration of hypotheses in data obtained from the LA2K, LA3C, and LA5C studies in CNP. ViVA is a system for exploratory data analysis using novel mathematical models and methods for visualization of variance. An example of these methods is VISOVA, a combination of visualization and analysis of variance, with the flavor of exploration associated with ANOVA in biomedical hypothesis generation. It permits visual identification of phenotype profiles (patterns of values across phenotypes) that characterize groups. Visualization enables screening and refinement of hypotheses about the variance structure of sets of phenotypes. Conclusions: The ViVA system was designed for exploration of neuropsychiatric hypotheses by interdisciplinary teams. Automated visualization in ViVA supports 'natural selection' on a pool of hypotheses and permits deeper understanding of the statistical architecture of the data. A large-scale perspective of this kind could lead to better neuropsychiatric diagnostics.
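The profile-and-variance idea behind a VISOVA-style display can be sketched numerically: group mean profiles are what the plot shows, and a one-way ANOVA F statistic per phenotype compares between-group to within-group variance. The group names, phenotype labels, and effect sizes below are invented for illustration; this is not the ViVA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical phenotype measurements for three diagnostic groups (40 subjects
# each, 3 phenotypes); the group means are made up for the sketch
groups = {
    "ADHD":    rng.normal([0.2, -0.5, 0.1], 0.3, size=(40, 3)),
    "Bipolar": rng.normal([0.0, 0.3, -0.2], 0.3, size=(40, 3)),
    "Control": rng.normal([0.0, 0.0, 0.0], 0.3, size=(40, 3)),
}
phenotypes = ["memory", "response_inhibition", "attention"]

# Group phenotype profiles: mean value of each phenotype per group
profiles = {g: x.mean(axis=0) for g, x in groups.items()}

def f_statistic(col):
    """One-way ANOVA F statistic: between- vs within-group variance."""
    samples = [x[:, col] for x in groups.values()]
    grand = np.concatenate(samples).mean()
    n, k = sum(len(s) for s in samples), len(samples)
    ss_between = sum(len(s) * (s.mean() - grand) ** 2 for s in samples)
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

F = [f_statistic(c) for c in range(3)]
```

A VISOVA-style screen would then rank the phenotypes by F to suggest which profiles most strongly separate the groups.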
Bayesian model predictive control: Efficient model exploration and regret bounds using posterior sampling
Tight performance specifications in combination with operational constraints
make model predictive control (MPC) the method of choice in various industries.
As the performance of an MPC controller depends on a sufficiently accurate
objective and prediction model of the process, a significant effort in the MPC
design procedure is dedicated to modeling and identification. Driven by the
increasing amount of available system data and advances in the field of machine
learning, data-driven MPC techniques have been developed to facilitate the MPC
controller design. While these methods are able to leverage available data,
they typically do not provide principled mechanisms to automatically trade off
exploitation of available data and exploration to improve and update the
objective and prediction model. To this end, we present a learning-based MPC
formulation using posterior sampling techniques, which provides finite-time
regret bounds on the learning performance while being simple to implement using
off-the-shelf MPC software and algorithms. The performance analysis of the
method is based on posterior sampling theory and its practical efficiency is
illustrated using a numerical example of a highly nonlinear dynamical
car-trailer system.
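The posterior-sampling idea can be sketched on a hypothetical scalar system (this is an illustration of the general technique, not the paper's formulation): keep a Gaussian posterior over unknown linear dynamics, draw one model per step, and plan against the sampled model as if it were true.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar system, unknown to the controller: x' = a*x + b*u + noise
a_true, b_true, noise_var = 0.9, 0.5, 0.01

# Gaussian posterior over theta = [a, b] via Bayesian linear regression,
# stored as a precision matrix and a precision-weighted mean vector
prec = np.eye(2) / 10.0            # prior: theta ~ N(0, 10 * I)
q = np.zeros(2)

x = 1.0
for _ in range(100):
    cov = np.linalg.inv(prec)
    mean = cov @ q
    # Posterior sampling: draw ONE plausible model and control as if it were true
    a_hat, b_hat = rng.multivariate_normal(mean, cov)
    # One-step MPC surrogate: u minimizing predicted cost x'^2 + r*u^2
    # (clipped to a sane range, since early samples can be wild)
    r = 0.1
    u = float(np.clip(-a_hat * b_hat * x / (b_hat ** 2 + r), -5.0, 5.0))
    # Apply the input to the real system and observe the transition
    x_next = a_true * x + b_true * u + rng.normal(0.0, np.sqrt(noise_var))
    # Conjugate Bayesian update with regression feature phi = [x, u]
    phi = np.array([x, u])
    prec += np.outer(phi, phi) / noise_var
    q += phi * x_next / noise_var
    x = x_next

posterior_mean = np.linalg.inv(prec) @ q
```

The appeal of the scheme is that exploration comes for free: sampling from the posterior occasionally tries optimistic models, and the Bayesian update shrinks the posterior as data accumulates.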
The Millennium Run Observatory: First Light
Simulations of galaxy evolution aim to capture our current understanding as
well as to make predictions for testing by future experiments. Simulations and
observations are often compared in an indirect fashion: physical quantities are
estimated from the data and compared to models. However, many applications can
benefit from a more direct approach, where the observing process is also
simulated and the models are seen fully from the observer's perspective. To
facilitate this, we have developed the Millennium Run Observatory (MRObs), a
theoretical virtual observatory which uses virtual telescopes to `observe'
semi-analytic galaxy formation models based on the suite of Millennium Run dark
matter simulations. The MRObs produces data that can be processed and analyzed
using the standard software packages developed for real observations. At
present, we produce images in forty filters from the rest-frame UV to IR for
two stellar population synthesis models, three different models of IGM
absorption, and two cosmologies (WMAP1/7). Galaxy distributions for a large
number of mock lightcones can be `observed' using models of major ground- and
space-based telescopes. The data include lightcone catalogues linked to
structural properties of galaxies, pre-observation model images, mock telescope
images, and Source Extractor products that can all be traced back to the higher
level dark matter, semi-analytic galaxy, and lightcone catalogues available in
the Millennium database. Here, we describe our methods and announce a first
public release of simulated surveys (e.g., SDSS, CFHT-LS, GOODS, GOODS/ERS,
CANDELS, and HUDF). The MRObs browser, an online tool, further facilitates
exploration of the simulated data. We demonstrate the benefits of a direct
approach through a number of example applications (galaxy number counts in
CANDELS, clusters, morphologies, and dropout selections). Comment: MNRAS, in press. Millennium Run Observatory data products, online tools, and more are available through http://galformod.mpa-garching.mpg.de/mrobs
Language for Specific Purposes and Corpus-based Pedagogy
This chapter describes how corpus-based pedagogies are used for teaching and learning language for specific purposes (LSP). Corpus linguistics (CL) refers to the study of large quantities of authentic language using computer-assisted methods, which form the basis for computer-assisted language learning (CALL) that uses corpora for reference, exploration, and interactive learning. The use of corpora as reference resources to create LSP materials is described. Direct student uses of corpora are illustrated by three approaches to data-driven learning (DDL), in which students engage in hands-on explorations of texts. A combination of indirect and direct corpus applications is shown in an illustration of interactive CALL technologies, including an example of an inclusive corpus-based tool for genre-based writing pedagogy. The chapter concludes with prospects for future developments in LSP.
Minimizing User Effort in Large Scale Example-driven Data Exploration
Data Exploration is a key ingredient in a widely diverse set of discovery-oriented applications, including scientific computing, financial analysis, and evidence-based medicine. It refers to a series of exploratory tasks that aim to extract useful pieces of knowledge from data, and its challenge is to do so without requiring the user to specify with precision what information is being searched for. The goal of assisting users in effortlessly constructing exploratory queries that effectively reveal interesting data objects has led to the development of a variety of intelligent semi-automatic approaches. Among these, Example-driven Exploration is rapidly becoming an attractive choice for exploratory query formulation, since it attempts to minimize the amount of prior knowledge required from the user to form an accurate exploratory query.
In particular, this dissertation focuses on interactive Example-driven Exploration, which steers the user towards discovering all data objects relevant to the user's exploration based on their feedback on a small set of examples. Interactive Example-driven Exploration is especially beneficial for non-expert users, as it enables them to circumvent query languages by assigning relevancy to examples as a proxy for the intended exploratory analysis. However, existing interactive Example-driven Exploration systems fall short of supporting the need to perform complex explorations over large, unstructured high-dimensional data. To overcome these challenges, we have developed new methods of data reduction, example selection, data indexing, and result refinement that support practical, interactive data exploration.
The novelty of our approach is anchored on leveraging active learning and query optimization techniques that strike a balance between maximizing accuracy and minimizing user effort in providing feedback, while enabling interactive performance for exploration tasks with arbitrary, large-sized datasets. Furthermore, it extends exploration beyond structured data by supporting a variety of high-dimensional unstructured data, and enables the refinement of results when the exploration task is associated with so many relevant data objects that they could overwhelm the user. To affirm the effectiveness of our proposed models, techniques, and algorithms, we implemented multiple prototype systems and evaluated them using real datasets. Some of them were also used in domain-specific analytics tools.
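The active-learning loop at the core of such example-driven systems can be sketched as uncertainty sampling: fit a classifier to the labeled examples, then ask the user about the example the model is least sure of. Everything here (the synthetic relevance rule, the plain gradient-descent logistic regression) is an illustrative stand-in, not one of the dissertation's systems.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical exploration task: "relevant" objects are those with x0 > 0.5
X = rng.uniform(0.0, 1.0, size=(500, 2))
y = (X[:, 0] > 0.5).astype(float)

def fit_logreg(Xl, yl, iters=300, lr=0.5):
    """Plain gradient-descent logistic regression (bias folded in)."""
    Xb = np.hstack([Xl, np.ones((len(Xl), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - yl) / len(yl)
    return w

# Seed with two labeled examples, one from each extreme of the first dimension
labeled = [int(np.argmin(X[:, 0])), int(np.argmax(X[:, 0]))]

for _ in range(20):
    w = fit_logreg(X[labeled], y[labeled])
    p = 1.0 / (1.0 + np.exp(-(np.hstack([X, np.ones((500, 1))]) @ w)))
    # Uncertainty sampling: query the example closest to the decision boundary
    unlabeled = [i for i in range(500) if i not in labeled]
    pick = min(unlabeled, key=lambda i: abs(p[i] - 0.5))
    labeled.append(pick)      # the user's feedback, simulated by ground truth

accuracy = float(np.mean((p > 0.5) == y))
```

The point of the design is that user effort scales with the number of queries, not the dataset size: after roughly twenty labels, the model classifies all 500 objects.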
Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems
Recently, self-learning methods based on user satisfaction metrics and
contextual bandits have shown promising results to enable consistent
improvements in conversational AI systems. However, directly targeting such
metrics by off-policy bandit learning objectives often increases the risk of
making abrupt policy changes that break the current user experience. In this
study, we introduce a scalable framework for supporting fine-grained
exploration targets for individual domains via user-defined constraints. For
example, we may want to ensure fewer policy deviations in business-critical
domains such as shopping, while allocating more exploration budget to domains
such as music. Furthermore, we present a novel meta-gradient learning approach
that is scalable and practical to address this problem. The proposed method
adjusts constraint violation penalty terms adaptively through a meta objective
that encourages balanced constraint satisfaction across domains. We conduct
extensive experiments using data from a real-world conversational AI on a set
of realistic constraint benchmarks. Based on the experimental results, we
demonstrate that the proposed approach is capable of achieving the best balance
between the policy value and the constraint satisfaction rate.
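The adaptive-penalty idea can be illustrated with a toy, dual-ascent-style loop (a simplified stand-in for the paper's meta-gradient method): each domain's penalty weight is raised while its deviation rate exceeds the user-defined budget and lowered otherwise. The `deviation_rate` response curve is invented; a real system would retrain the policy and measure it.

```python
# User-defined constraints: allowed policy-deviation rate per domain,
# stricter for business-critical shopping than for music
budget = {"shopping": 0.05, "music": 0.20}
lam = {d: 1.0 for d in budget}          # per-domain penalty weights

def deviation_rate(domain, penalty):
    # Hypothetical response curve: stronger penalties -> fewer deviations.
    return 0.30 / (1.0 + penalty)

lr = 1.0
for _ in range(1000):
    for d in budget:
        violation = deviation_rate(d, lam[d]) - budget[d]
        # Raise the penalty while over budget, relax it while under
        lam[d] = max(0.0, lam[d] + lr * violation)
```

At the fixed point each domain's deviation rate sits at its budget, with the stricter domain carrying the larger penalty, which is the balanced constraint satisfaction the abstract describes.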
Distributed expertise: Qualitative study of a British network of multidisciplinary teams supporting parents of children with chronic kidney disease
© 2014 The Authors. Background: Long-term childhood conditions are often managed by hospital-based multidisciplinary teams (MDTs) of professionals with discipline-specific expertise of a condition, in partnership with parents. However, little evidence exists on professional-parent interactions in this context. An exploration of professionals' accounts of the way they individually and collectively teach parents to manage their child's clinical care at home is, therefore, important for meeting parents' needs, informing policy and educating novice professionals. Using chronic kidney disease as an exemplar, this paper reports on one aspect of a study of interactions between professionals and parents in a network of 12 children's kidney units in Britain. Methods: We conducted semi-structured, qualitative interviews with a convenience sample of 112 professionals (clinical psychologists, dietitians, doctors, nurses, pharmacists, play-workers, therapists and social workers), exploring accounts of their parent-educative activity. We analysed data using framework analysis and the concept of distributed expertise. Results: Four themes emerged that related to the way expertise was distributed within and across teams: (i) recognizing each other's expertise, (ii) sharing expertise within the MDT, (iii) language interpretation, and (iv) acting as brokers. Two different professional identifications were also seen to co-exist within MDTs, with participants using the term 'we' both as the intra-professional 'we' (relating to the professional identity) when describing expertise within a disciplinary group (for example: 'As dietitians we aim to give tailored advice to optimize children's growth'), and the inter-professional 'we' (a 'team-identification') when discussing expertise within the team (for example: 'We work as a team and make sure we're all happy with every aspect of their training before they go home').
Conclusions: This study highlights the dual identifications implicit in 'being professional' in this context (to the team and to one's profession), as well as the unique role that each member of a team contributes to children's care. Our methodology and results have the potential to be transferred to teams managing other conditions.
VANTED: A system for advanced data analysis and visualization in the context of biological networks
BACKGROUND: Recent advances with high-throughput methods in life-science research have increased the need for automated data analysis and visual exploration techniques. Sophisticated bioinformatics tools are essential to deduce biologically meaningful interpretations from the large amount of experimental data, and help to understand biological processes. RESULTS: We present VANTED, a tool for the visualization and analysis of networks with related experimental data. Data from large-scale biochemical experiments are uploaded into the software via a Microsoft Excel-based form. They can then be mapped onto a network that is either drawn with the tool itself, downloaded from the KEGG Pathway database, or imported using standard network exchange formats. Transcript, enzyme, and metabolite data can be presented in the context of their underlying networks, e.g. metabolic pathways or classification hierarchies. Visualization and navigation methods support the visual exploration of the data-enriched networks. Statistical methods allow analysis and comparison of multiple data sets, such as different developmental stages or genetically different lines. Correlation networks can be automatically generated from the data, and substances can be clustered according to similar behavior over time. As examples, metabolite profiling and enzyme activity data sets have been visualized in different metabolic maps, correlation networks have been generated, and similar time patterns detected. Some relationships between different metabolites were discovered which are in close accordance with the literature. CONCLUSION: VANTED greatly helps researchers in the analysis and interpretation of biochemical data, and thus is a useful tool for modern biological research. VANTED, as a Java Web Start application including a user guide and example data sets, is available free of charge at
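A correlation network of the kind VANTED generates can be sketched in a few lines: correlate the time profiles of all substance pairs and connect those above a threshold. The substance names, profiles, and the 0.9 cutoff below are invented for illustration, not VANTED's internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented time-series profiles for five substances over 8 time points
t = np.arange(8)
profiles = {
    "sucrose":  np.sin(t / 2) + rng.normal(0.0, 0.05, 8),
    "glucose":  np.sin(t / 2) + rng.normal(0.0, 0.05, 8),
    "fructose": -np.sin(t / 2) + rng.normal(0.0, 0.05, 8),
    "starch":   rng.normal(0.0, 1.0, 8),
    "citrate":  np.cos(t / 2) + rng.normal(0.0, 0.05, 8),
}

names = list(profiles)
M = np.array([profiles[n] for n in names])
corr = np.corrcoef(M)               # pairwise correlation of time profiles

# Correlation network: connect substances whose profiles are strongly
# (anti-)correlated; the threshold is an arbitrary illustrative choice
threshold = 0.9
edges = [(names[i], names[j], round(float(corr[i, j]), 2))
         for i in range(len(names))
         for j in range(i + 1, len(names))
         if abs(corr[i, j]) >= threshold]
```

Substances sharing a time pattern end up in the same connected component, which is the basis for clustering them "according to similar behavior over time".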
Convergence of a Reinforcement Learning Algorithm in Continuous Domains
In the field of Reinforcement Learning, Markov Decision Processes with a finite number of states and actions have been well studied, and there exist algorithms capable of producing a sequence of policies which converge to an optimal policy with probability one. Convergence guarantees for problems with continuous states also exist. Until recently, no online algorithm for continuous states and continuous actions has been proven to produce optimal policies. This Dissertation contains the results of research into reinforcement learning algorithms for problems in which both the state and action spaces are continuous. The problems to be solved are introduced formally as Markov Decision Processes. Also introduced is a value-function solution method known as Q-learning. The primary result of this Dissertation is the presentation of a Q-learning type algorithm adapted for continuous states and actions, and the proof that it asymptotically learns an optimal policy with probability one. While the algorithm is intended to advance the theory of continuous domain reinforcement learning, an example is given to show that with appropriate exploration policies, it can produce satisfactory solutions to non-trivial benchmark problems. Kernel regression based algorithms have excellent theoretical properties, but have high computational cost and do not adapt well to high-dimensional problems. A class of batch-mode regression tree-based algorithms is introduced. These algorithms are modular in the sense that different methods for partitioning, performing local regression, and choosing representative actions can be chosen. Experiments demonstrate superior performance over kernel methods. Batch algorithms possess superior computational efficiency, but pay the price of not being able to use past observations to inform exploration. A data structure useful for limited learning during the exploration phase is introduced. 
It is then demonstrated that this limited learning can outperform batch algorithms using totally random action exploration.
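The batch-mode, partition-based idea can be sketched as fitted Q iteration on an invented one-dimensional problem: a fixed uniform grid over the state-action space stands in for the dissertation's adaptive regression trees, and a coarse set of representative actions replaces continuous maximization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented continuous problem: state x in [-1, 1], action a in [-0.3, 0.3],
# dynamics x' = clip(x + a), reward -|x'| (drive the state to the origin)
def step(x, a):
    x_next = np.clip(x + a, -1.0, 1.0)
    return x_next, -np.abs(x_next)

# A batch of random transitions, as in batch-mode reinforcement learning
N = 5000
X = rng.uniform(-1.0, 1.0, N)
A = rng.uniform(-0.3, 0.3, N)
Xn, R = step(X, A)

SB, AB = 20, 7                        # state bins and representative actions
actions = np.linspace(-0.3, 0.3, AB)
s_idx = np.clip(((X + 1.0) / 2.0 * SB).astype(int), 0, SB - 1)
a_idx = np.clip(((A + 0.3) / 0.6 * AB).astype(int), 0, AB - 1)
sn_idx = np.clip(((Xn + 1.0) / 2.0 * SB).astype(int), 0, SB - 1)

# Fitted Q iteration with a piecewise-constant regressor on the fixed grid
Q = np.zeros((SB, AB))
gamma = 0.9
for _ in range(50):
    targets = R + gamma * Q[sn_idx].max(axis=1)
    Q_new = np.zeros_like(Q)
    for s in range(SB):
        for a in range(AB):
            cell = (s_idx == s) & (a_idx == a)
            if cell.any():
                Q_new[s, a] = targets[cell].mean()  # local regression = cell mean
    Q = Q_new

def greedy_action(x):
    s = min(int((x + 1.0) / 2.0 * SB), SB - 1)
    return actions[int(np.argmax(Q[s]))]
```

The modular structure the dissertation describes corresponds to the three swappable pieces here: the partition (grid vs tree), the local regression (cell mean), and the choice of representative actions.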
Two axes re-ordering methods in parallel coordinates plots
© 2015 Elsevier Ltd. Visualization and interaction of multidimensional data are challenges in visual data analytics, which requires optimized solutions to integrate the display, exploration and analytical reasoning of data into one visual pipeline for human-centered data analysis and interpretation. Although parallel coordinates is considered one of the most popular techniques for visualization and analysis of multidimensional data, it suffers from visual clutter as well as computational complexity, like other visualization methods in which clutter grows with the volume of data to be visualized. One straightforward way to address these problems is to change the ordering of the axes so as to minimize visual clutter. However, optimizing the ordering of axes is an NP-complete problem. In this paper, two axes re-ordering methods are proposed for parallel coordinates visualization: (1) a contribution-based method and (2) a similarity-based method. The contribution-based re-ordering method is mainly based on the singular value decomposition (SVD) algorithm. It not only provides users with a mathematical basis for selecting the first remarkable axis, but also helps to visualize the detailed structure of the data according to the contribution of each data dimension. This approach greatly reduces the computational complexity in comparison with other re-ordering methods. The similarity-based re-ordering method is based on a combination of the nonlinear correlation coefficient (NCC) and SVD algorithms. With this approach, axes are re-ordered in line with the degree of similarity among them. It is more rational, exact and systematic than other re-ordering methods, including those based on Pearson's correlation coefficient (PCC).
Meanwhile, the paper also proposes a measurement of the contribution rate of each dimension to reveal properties hidden in the dataset. Finally, the rationale and effectiveness of these approaches are demonstrated through case studies. For example, the patterns of Smurf and Neptune attacks hidden in the KDD 1999 dataset are visualized in parallel coordinates using the contribution-based re-ordering method, and the NCC re-ordering method enlarges the mean crossing angles and reduces the number of polylines between neighboring axes.
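The contribution-based ordering can be sketched with an SVD: standardize the data, take the leading right singular vector, and order axes by the magnitude of each dimension's loading on it. The dataset below is invented, and this scoring is a simplified reading of the SVD idea, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-dimensional dataset: two dimensions share a dominant pattern,
# two are pure noise
n = 300
pattern = rng.normal(0.0, 3.0, n)
data = np.column_stack([
    pattern + rng.normal(0.0, 0.1, n),        # dim 0: strong pattern
    rng.normal(0.0, 1.0, n),                  # dim 1: noise
    0.5 * pattern + rng.normal(0.0, 0.5, n),  # dim 2: weaker pattern
    rng.normal(0.0, 0.2, n),                  # dim 3: noise
])

# Standardize, take the SVD, and score each dimension by its loading on the
# leading right singular vector
Z = (data - data.mean(axis=0)) / data.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
contribution = np.abs(Vt[0])
order = np.argsort(-contribution)   # axis order for the parallel coordinates plot
```

Placing the high-contribution axes first puts the most structured dimensions adjacent, which is what reduces clutter between neighboring axes.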