7,572 research outputs found
Advances in ranking and selection: variance estimation and constraints
In this thesis, we first show that the performance of ranking and selection (R&S) procedures in steady-state simulations depends highly on the quality of the variance estimates that are used. We study the performance of R&S procedures using three variance estimators --- overlapping area, overlapping Cramer--von Mises, and overlapping modified jackknifed Durbin--Watson estimators --- that show better long-run performance than other estimators previously used in conjunction with R&S procedures for steady-state simulations. We devote additional study to the development of the new overlapping modified jackknifed Durbin--Watson estimator and demonstrate some of its useful properties.
Next, we consider the problem of finding the best simulated system under a primary performance measure, while also satisfying stochastic constraints on secondary performance measures, known as constrained ranking and selection. We first present a new framework that allows certain systems to become dormant, halting sampling for those systems as the procedure continues. We also develop general procedures for constrained R&S that guarantee a nominal probability of correct selection, under any number of constraints and correlation across systems. In addition, we address new topics critical to efficiency of the these procedures, namely the allocation of error between feasibility check and selection, the use of common random numbers, and the cost of switching between simulated
systems.Ph.D.Committee Co-chairs: Sigrun Andradottir, Dave Goldsman and Seong-Hee Kim; Committee Members:Shabbir Ahmed and Brani Vidakovi
Bayesian stochastic blockmodeling
This chapter provides a self-contained introduction to the use of Bayesian
inference to extract large-scale modular structures from network data, based on
the stochastic blockmodel (SBM), as well as its degree-corrected and
overlapping generalizations. We focus on nonparametric formulations that allow
their inference in a manner that prevents overfitting, and enables model
selection. We discuss aspects of the choice of priors, in particular how to
avoid underfitting via increased Bayesian hierarchies, and we contrast the task
of sampling network partitions from the posterior distribution with finding the
single point estimate that maximizes it, while describing efficient algorithms
to perform either one. We also show how inferring the SBM can be used to
predict missing and spurious links, and shed light on the fundamental
limitations of the detectability of modular structures in networks.Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool
at https://graph-tool.skewed.de . See also the HOWTO at
https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems
Two emerging hardware trends will dominate the database system technology in
the near future: increasing main memory capacities of several TB per server and
massively parallel multi-core processing. Many algorithmic and control
techniques in current database technology were devised for disk-based systems
where I/O dominated the performance. In this work we take a new look at the
well-known sort-merge join which, so far, has not been in the focus of research
in scalable massively parallel multi-core data processing as it was deemed
inferior to hash joins. We devise a suite of new massively parallel sort-merge
(MPSM) join algorithms that are based on partial partition-based sorting.
Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a
hard to parallelize final merge step to create one complete sort order. Rather
they work on the independently created runs in parallel. This way our MPSM
algorithms are NUMA-affine as all the sorting is carried out on local memory
partitions. An extensive experimental evaluation on a modern 32-core machine
with one TB of main memory proves the competitive performance of MPSM on large
main memory databases with billions of objects. It scales (almost) linearly in
the number of employed cores and clearly outperforms competing hash join
proposals - in particular it outperforms the "cutting-edge" Vectorwise parallel
query engine by a factor of four.Comment: VLDB201
Quantitative Trait Loci Involved in Sex Determination and Body Growth in the Gilthead Sea Bream (Sparus aurata L.) through Targeted Genome Scan
Among vertebrates, teleost fish exhibit a considerably wide range of sex determination patterns that may be influenced by extrinsic parameters. However even for model fish species like the zebrafish Danio rerio the precise mechanisms involved in primary sex determination have not been studied extensively. The zebrafish, a gonochoristic species, is lacking discernible sex chromosomes and the sex of juvenile fish is difficult to determine. Sequential protandrous hermaphrodite species provide distinct determination of the gender and allow studying the sex determination process by looking at the mechanism of sex reversal. This is the first attempt to understand the genetic basis of phenotypic variation for sex determination and body weight in a sequential protandrous hermaphrodite species, the gilthead sea bream (Sparus aurata). This work demonstrates a fast and efficient strategy for Quantitative Trait Loci (QTL) detection in the gilthead sea bream, a non-model but target hermaphrodite fish species. Therefore a comparative mapping approach was performed to query syntenies against two other Perciformes, the European sea bass (Dicentrarchus labrax), a gonochoristic species and the Asian sea bass (Lates calcarifer) a protandrous hermaphrodite. In this manner two significant QTLs, one QTL affecting both body weight and sex and one QTL affecting sex, were detected on the same linkage group. The co-segregation of the two QTLs provides a genomic base to the observed genetic correlation between these two traits in sea bream as well as in other teleosts. The identification of QTLs linked to sex reversal and growth, will contribute significantly to a better understanding of the complex nature of sex determination in S. aurata where most individuals reverse to the female sex at the age of two years through development and maturation of the ovarian portion of the gonad and regression of the testicular area. [Genomic sequences reported in this manuscript have been submitted to GenBank under accession numbers HQ021443–HQ021749.
Efficiently Finding Approximately-Optimal Queries for Improving Policies and Guaranteeing Safety
When a computational agent (called the “robot”) takes actions on behalf of a human user, it may be uncertain about the human’s preferences. The human may initially specify her preferences incompletely or inaccurately. In this case, the robot’s performance may be unsatisfactory or even cause negative side effects to the environment. There are approaches in the literature that may solve this problem. For example, the human can provide some demonstrations which clarify the robot’s uncertainty. The human may give real-time feedback to the robot’s behavior, or monitor the robot and stop the robot when it may perform anything dangerous. However, these methods typically require much of the human’s attention. Alternatively, the robot may estimate the human’s true preferences using the specified preferences, but this is error-prone and requires making assumptions on how the human specifies her preferences.
In this thesis, I consider a querying approach. Before taking any actions, the robot has a chance to query the human about her preferences. For example, the robot may query the human about which trajectory in a set of trajectories she likes the most, or whether the human cares about some side effects to the domain. After the human responds to the query, the robot expects to improve its performance and/or guarantee that its behavior is considered safe by the human.
If we do not impose any constraint on the number of queries the robot can pose, the robot may keep posing queries until it is absolutely certain about the human’s preferences. This may consume too much of the human’s cognitive load. The information obtained in the responses to some of the queries may only marginally improve the robot’s performance, which is not worth the human’s attention at all. So in the problems considered in this thesis, I constrain the number of queries that the robot can pose, or associate each query with a cost. The research question is how to efficiently find the most useful query under such constraints.
Finding a provably optimal query can be challenging since it is usually a combinatorial optimization problem. In this thesis, I contribute to providing efficient query selection algorithms under uncertainty. I first formulate the robot’s uncertainty as reward uncertainty and safety-constraint uncertainty. Under only reward uncertainty, I provide a query selection algorithm that finds approximately-optimal k-response queries. Under only safety-constraint uncertainty, I provide a query selection algorithm that finds an optimal k-element query to improve a known safe policy, and an algorithm that uses a set-cover-based query selection strategy to find an initial safe policy. Under both types of uncertainty simultaneously, I provide a batch-query-based querying method that empirically outperforms other baseline querying methods.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163125/1/shunzh_1.pd
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table
ConfidentCare: A Clinical Decision Support System for Personalized Breast Cancer Screening
Breast cancer screening policies attempt to achieve timely diagnosis by the
regular screening of apparently healthy women. Various clinical decisions are
needed to manage the screening process; those include: selecting the screening
tests for a woman to take, interpreting the test outcomes, and deciding whether
or not a woman should be referred to a diagnostic test. Such decisions are
currently guided by clinical practice guidelines (CPGs), which represent a
one-size-fits-all approach that are designed to work well on average for a
population, without guaranteeing that it will work well uniformly over that
population. Since the risks and benefits of screening are functions of each
patients features, personalized screening policies that are tailored to the
features of individuals are needed in order to ensure that the right tests are
recommended to the right woman. In order to address this issue, we present
ConfidentCare: a computer-aided clinical decision support system that learns a
personalized screening policy from the electronic health record (EHR) data.
ConfidentCare operates by recognizing clusters of similar patients, and
learning the best screening policy to adopt for each cluster. A cluster of
patients is a set of patients with similar features (e.g. age, breast density,
family history, etc.), and the screening policy is a set of guidelines on what
actions to recommend for a woman given her features and screening test scores.
ConfidentCare algorithm ensures that the policy adopted for every cluster of
patients satisfies a predefined accuracy requirement with a high level of
confidence. We show that our algorithm outperforms the current CPGs in terms of
cost-efficiency and false positive rates
- …