4,697 research outputs found
Statistical significance of variables driving systematic variation
There are a number of well-established methods such as principal components
analysis (PCA) for automatically capturing systematic variation due to latent
variables in large-scale genomic data. PCA and related methods may directly
provide a quantitative characterization of a complex biological variable that
is otherwise difficult to precisely define or model. An unsolved problem in
this context is how to systematically identify the genomic variables that are
drivers of systematic variation captured by PCA. Principal components (and
other estimates of systematic variation) are directly constructed from the
genomic variables themselves, making measures of statistical significance
artificially inflated when using conventional methods due to over-fitting. We
introduce a new approach called the jackstraw that allows one to accurately
identify genomic variables that are statistically significantly associated with
any subset or linear combination of principal components (PCs). The proposed
method can greatly simplify complex significance testing problems encountered
in genomics and can be utilized to identify the genomic variables significantly
associated with latent variables. Using simulation, we demonstrate that our
method attains accurate measures of statistical significance over a range of
relevant scenarios. We consider yeast cell-cycle gene expression data, and show
that the proposed method can be used to straightforwardly identify
statistically significant genes that are cell-cycle regulated. We also analyze
gene expression data from post-trauma patients, allowing the gene expression
data to provide a molecularly-driven phenotype. We find a greater enrichment
for inflammatory-related gene sets compared to using a clinically defined
phenotype. The proposed method provides a useful bridge between large-scale
quantifications of systematic variation and gene-level significance analyses.
Comment: 35 pages, 1 table, 6 main figures, 7 supplementary figures
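The resampling idea behind this kind of test can be sketched in a few lines. The abstract does not spell out the algorithm, so the following is only a simplified illustration under our own assumptions: it tests association with the first PC only, uses squared correlation as the association statistic, and builds the null by permuting a few rows at a time and recomputing the PC. All function names, parameter names (`n_synthetic`, `n_rounds`) and the simulated data are hypothetical and are not taken from the paper or from any jackstraw software.

```python
import numpy as np

def top_pc(data):
    """First right singular vector of the row-centred matrix (variables x samples)."""
    centred = data - data.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]

def squared_corr(rows, pc):
    """Squared Pearson correlation between each row and the PC."""
    r = rows - rows.mean(axis=1, keepdims=True)
    p = pc - pc.mean()
    return (r @ p) ** 2 / ((r ** 2).sum(axis=1) * (p ** 2).sum())

def jackstraw_pvalues(data, n_synthetic=5, n_rounds=200, seed=0):
    """Empirical p-values for association of each row with the first PC."""
    rng = np.random.default_rng(seed)
    m, _ = data.shape
    observed = squared_corr(data, top_pc(data))
    null = []
    for _ in range(n_rounds):
        # Permute a small number of rows so the PC is barely disturbed,
        # then recompute the PC on the perturbed matrix.
        idx = rng.choice(m, size=n_synthetic, replace=False)
        perturbed = data.copy()
        for i in idx:
            perturbed[i] = rng.permutation(data[i])
        # Statistics of the permuted rows form the null distribution.
        null.append(squared_corr(perturbed[idx], top_pc(perturbed)))
    null = np.sort(np.concatenate(null))
    exceed = null.size - np.searchsorted(null, observed, side="left")
    return (exceed + 1) / (null.size + 1)

# Simulated data (an assumption for illustration): 200 variables, 20 samples,
# with the first 20 variables driven by one latent variable.
rng = np.random.default_rng(1)
latent = rng.normal(size=20)
data = rng.normal(size=(200, 20))
data[:20] += 3.0 * latent
pvals = jackstraw_pvalues(data)
print(pvals[:20].mean(), pvals[20:].mean())
```

Because the permuted rows take part in the recomputed PC, the null statistics carry the same over-fitting as the observed ones, which is the point of the construction.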
Price Discovery in Canadian Government Bond Futures and Spot Markets
In this paper we look at the relative information content of cash and futures prices for Canadian Government bonds. We follow the information-share approaches introduced by Hasbrouck (1995) and Harris et al. (1995), applying the techniques in Gonzalo-Granger (1995), to evaluate the relative contributions of trading in the cash and futures markets to the price discovery process. Both approaches estimate a vector error correction model that permits the separation of long-run price movements from short-run market microstructure effects. We also follow Yan and Zivot (2004), who introduce size measures of a market's adjustment to a new equilibrium during the price discovery process. We find that, on an average day, just over 70% of price discovery occurs on the futures market, where bid-ask spreads are lower and trading activity is higher. The size of the responses to shocks and the time taken to adjust to a new equilibrium are found to be significantly larger for the cash market.
Financial markets; Market structure and pricing
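As a rough illustration of how a two-market share can be read off error-correction loadings, the sketch below computes Gonzalo-Granger style component shares. It assumes a bivariate model in which each price change loads on a single error-correction term with coefficients `alpha_1` and `alpha_2`. The function name and the numerical loadings are made up for illustration and are not estimates from the paper.

```python
def component_shares(alpha_1, alpha_2):
    """Shares proportional to the vector orthogonal to (alpha_1, alpha_2),
    normalised to sum to one. The market that adjusts less gets the larger share."""
    spread = alpha_2 - alpha_1
    if spread == 0:
        raise ValueError("loadings must differ")
    return alpha_2 / spread, -alpha_1 / spread

# Hypothetical loadings: market 1 adjusts weakly, market 2 adjusts strongly.
share_1, share_2 = component_shares(alpha_1=-0.03, alpha_2=0.07)
print(share_1, share_2)
```

A market whose price barely reacts to the disequilibrium term is the one the other market follows, so it receives most of the share.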
Testing and Learning on Distributions with Symmetric Noise Invariance
Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD),
the resulting distance between distributions, are useful tools for fully
nonparametric two-sample testing and learning on distributions. However, it is
rare that all possible differences between samples are of interest --
discovered differences can be due to different types of measurement noise, data
collection artefacts or other irrelevant sources of variability. We propose
distances between distributions which encode invariance to additive symmetric
noise, aimed at testing whether the assumed true underlying processes differ.
Moreover, we construct invariant features of distributions, leading to learning
algorithms robust to the impairment of the input distributions with symmetric
additive noise.
Comment: 22 pages
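For orientation, the ordinary MMD that the abstract starts from can be estimated directly from two samples. The sketch below is the standard biased (V-statistic) estimate with a Gaussian kernel on one-dimensional data. It is not the noise-invariant distance proposed in the paper, and the bandwidth, sample sizes and function names are our own assumptions.

```python
import math
import random

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two real numbers."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two 1-D samples."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

rng = random.Random(0)
same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(1.5, 1.0) for _ in range(200)]

print(mmd_squared(same_a, same_b))   # two samples from the same distribution
print(mmd_squared(same_a, shifted))  # samples from different distributions
```

The second value should be clearly larger than the first. Plain MMD of this kind also reacts to differences in additive noise, which is the behaviour the proposed distances are designed to remove.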
Jaccard/Tanimoto similarity test and estimation methods
Binary data are used in a broad area of biological sciences. Using binary
presence-absence data, we can evaluate species co-occurrences that help
elucidate relationships among organisms and environments. To summarize
similarity between occurrences of species, we routinely use the
Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their
union. It is natural, then, to identify statistically significant
Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of
species. However, statistical hypothesis testing using this similarity
coefficient has seldom been used or studied.
We introduce a hypothesis test for similarity for biological presence-absence
data, using the Jaccard/Tanimoto coefficient. Several key improvements are
presented, including unbiased estimation of the expectation and centered
Jaccard/Tanimoto coefficients that account for occurrence probabilities. We
derived the exact and asymptotic solutions and developed the bootstrap and
measurement concentration algorithms to compute statistical significance of
binary similarity. Comprehensive simulation studies demonstrate that our
proposed methods produce accurate p-values and false discovery rates. The
proposed estimation methods are orders of magnitude faster than the exact
solution. The proposed methods are implemented in an open source R package
called jaccard (https://cran.r-project.org/package=jaccard).
We introduce a suite of statistical methods for the Jaccard/Tanimoto
similarity coefficient that enable straightforward incorporation of
probabilistic measures in the analysis of species co-occurrences. Due to their
generality, the proposed methods and implementations are applicable to a wide
range of binary data arising from genomics, biochemistry, and other areas of
science.
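To make the coefficient and the idea of testing it concrete, here is a minimal sketch. It computes the Jaccard/Tanimoto coefficient of two presence-absence vectors and attaches a plain permutation p-value. The permutation test is swapped in for simplicity: it is not one of the exact, asymptotic, bootstrap or measurement concentration methods described above, and it does not use the jaccard R package. Function names and the example vectors are hypothetical.

```python
import random

def jaccard(x, y):
    """Ratio of intersection to union for two equal-length 0/1 vectors."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """One-sided p-value: how often a shuffled y is at least as similar to x."""
    rng = random.Random(seed)
    observed = jaccard(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the number of occurrences of y fixed
        if jaccard(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up presence-absence data for two species across ten sites.
species_a = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
species_b = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
print(jaccard(species_a, species_b))
print(permutation_pvalue(species_a, species_b))
```

Shuffling one vector preserves each species' occurrence count, so the null distribution reflects how much overlap would arise from those counts alone.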
Simulation Design Approach for the Selection of Alternative Commercial Passenger Aircraft Seating Configurations
Loading strategies for commercial passenger aircraft have been a subject of recent study among air transportation research analysts. A fundamental assumption in the majority of these studies is the fixed configuration of passenger seats. Previous studies have focused on analyzing different strategies in an effort to reduce passenger loading time. This study takes a more proactive approach to the passenger loading process by starting with the design of the aircraft seating layout itself. Simulation analysis results indicate that alternative designs can result in loading time reductions of approximately 9–44%.
Using Capillary Electrophoresis to Quantify Competitive Binding of Adsorbates to Silver Nanoparticles
Silver nanoparticles (AgNPs) are increasingly used commercially and medically due to their antimicrobial and antibacterial properties. With increased use comes increased release of AgNPs into the environment, and once released, AgNPs can form coronas with molecules ranging from biomolecules to proteins to natural organic matter (NOM). The molecules in the corona adsorb to the surface of the AgNPs, drastically altering their innate properties such as cytotoxicity and binding behavior. In this study, we characterize and quantify model AgNP-adsorbate systems by obtaining their relevant reaction parameters through three affordable analytical techniques: dynamic light scattering (DLS), UV-vis spectroscopy, and capillary electrophoresis (CE). Citrate-stabilized AgNPs with hydrodynamic diameters of 10 nm, 20 nm and 40 nm were used in this work. Kₐ values of AgNPs reacting with a model protein, bovine serum albumin (BSA), and a model NOM, Suwannee River humic acid (SRHA), were individually quantified using UV-vis spectroscopy. Nonequilibrium capillary electrophoresis of equilibrium mixtures (NECEEM) was also employed to obtain binding and rate constants pertaining to the individual reactions, and these were compared with the values acquired through UV-vis spectroscopy. The AgNP size was shown to have an inverse relationship with reactivity, with smaller AgNPs having higher Kₐ values. There was also remarkable agreement between the two quantitative analyses, validating the use of the novel NECEEM technique for other NP corona complexes. DLS was used to characterize the initial nanoparticles as well as those with a formed corona, and circular dichroism (CD) spectroscopy was used to monitor protein conformational changes upon adsorption of BSA to AgNPs and interaction with SRHA. Subsequently, in a field dominated by single-adsorbate studies of AgNP coronas, we strove to take this study a step further and investigate multiple-adsorbate systems.
Thus, a new CE-based pull-down assay was developed and optimized for quantitative analysis of the relative reactivity of multiple adsorbates interacting with AgNPs. Using this new technique, SRHA was found to decrease the amount of BSA adsorbed to AgNPs in solution across all sizes. Smaller AgNPs seemed to favor BSA adsorption over SRHA, but as the size of the AgNP increased, the affinity seemed to shift to favoring the adsorption of SRHA.
Intelligent Agents for Retrieving Chinese Web Financial News
As the popularity of the World Wide Web increases, many newspapers expand their services by providing news information on the Web in order to be competitive and increase their benefits. The Web provides real-time dissemination of financial news to investors. However, most investors find it difficult to search for the financial information of interest in the huge Web information space. Most commercial search engines are not user-friendly and do not provide any tailor-made intelligent agents to search for relevant Web documents on behalf of users. Users have to exert a lot of effort to submit an appropriate query to obtain the information they want. Intelligent agents that learn user preferences and monitor the postings of Web information providers are desired. In this paper, we present an intelligent agent that utilizes user profiles and user feedback to search for Chinese Web financial news articles on behalf of users. A Chinese indexing component is developed to index the continuously fetched Chinese financial news articles. User profiles capture the basic knowledge of user preferences based on the sources of news articles, the regions of the news reported, the categories of related industries, the listed companies, and user-specified keywords. User feedback captures the semantics of the user-rated news articles. The search engine ranks the top 20 news articles that users are most interested in based on these inputs. Experiments were conducted to measure the performance of the agents based on the inputs from user profiles and user feedback.