4,697 research outputs found

    Statistical significance of variables driving systematic variation

    There are a number of well-established methods such as principal components analysis (PCA) for automatically capturing systematic variation due to latent variables in large-scale genomic data. PCA and related methods may directly provide a quantitative characterization of a complex biological variable that is otherwise difficult to precisely define or model. An unsolved problem in this context is how to systematically identify the genomic variables that are drivers of systematic variation captured by PCA. Principal components (and other estimates of systematic variation) are directly constructed from the genomic variables themselves, making measures of statistical significance artificially inflated when using conventional methods due to over-fitting. We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of principal components (PCs). The proposed method can greatly simplify complex significance testing problems encountered in genomics and can be utilized to identify the genomic variables significantly associated with latent variables. Using simulation, we demonstrate that our method attains accurate measures of statistical significance over a range of relevant scenarios. We consider yeast cell-cycle gene expression data, and show that the proposed method can be used to straightforwardly identify statistically significant genes that are cell-cycle regulated. We also analyze gene expression data from post-trauma patients, allowing the gene expression data to provide a molecularly-driven phenotype. We find a greater enrichment for inflammatory-related gene sets compared to using a clinically defined phenotype. The proposed method provides a useful bridge between large-scale quantifications of systematic variation and gene-level significance analyses.
    Comment: 35 pages, 1 table, 6 main figures, 7 supplementary figures

    Price Discovery in Canadian Government Bond Futures and Spot Markets

    In this paper we look at the relative information content of cash and futures prices for Canadian Government bonds. We follow the information-share approaches introduced by Hasbrouck (1995) and Harris et al. (1995), applying the techniques in Gonzalo and Granger (1995), to evaluate the relative contributions of trading in the cash and futures markets to the price discovery process. Both approaches estimate a vector error correction model that permits the separation of long-run price movements from short-run market microstructure effects. We also follow Yan and Zivot (2004), who introduce size measures of a market's adjustment to a new equilibrium during the price discovery process. We find that, on an average day, just over 70% of price discovery occurs on the futures market, where bid-ask spreads are lower and trading activity is higher. The size of the responses to shocks and the time taken to adjust to a new equilibrium are found to be significantly larger for the cash market.
    Financial markets; Market structure and pricing

    Testing and Learning on Distributions with Symmetric Noise Invariance

    Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rare that all possible differences between samples are of interest; discovered differences can be due to different types of measurement noise, data collection artefacts or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the impairment of the input distributions with symmetric additive noise.
    Comment: 22 pages
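For readers unfamiliar with the MMD mentioned in this abstract, the following is a minimal sketch of the standard (biased, V-statistic) estimate of the squared MMD between two one-dimensional samples. It is not the noise-invariant distance the paper proposes. The Gaussian kernel, the `bandwidth` parameter, and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars; bandwidth is an assumed default."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples xs and ys."""

    def mean_kernel(us, vs):
        # Average kernel value over all pairs (u, v).
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical samples: one near 0, one near 5.
near_zero = [0.0, 0.1, -0.1]
near_five = [5.0, 5.1, 4.9]
print(mmd_squared(near_zero, near_zero))  # identical samples give (numerically) zero
print(mmd_squared(near_zero, near_five))  # well-separated samples give a large value
```

A two-sample test would compare this statistic against a null distribution, for example one obtained by permuting the pooled samples; that step is omitted here.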

    Jaccard/Tanimoto similarity test and estimation methods

    Binary data are used in a broad area of biological sciences. Using binary presence-absence data, we can evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has seldom been used or studied. We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard/Tanimoto coefficient. Several key improvements are presented, including unbiased estimation of expectation and centered Jaccard/Tanimoto coefficients that account for occurrence probabilities. We derived the exact and asymptotic solutions and developed the bootstrap and measurement concentration algorithms to compute statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution. The proposed methods are implemented in an open source R package called jaccard (https://cran.r-project.org/package=jaccard). We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient that enable straightforward incorporation of probabilistic measures in analysis for species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.
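The coefficient described in this abstract (intersection over union of two presence-absence vectors) can be sketched in a few lines. This is only the plain similarity coefficient, not the hypothesis test or the centered estimator from the paper, and it does not reproduce the API of the jaccard R package. The function name, the example vectors, and the choice to return 0 when both vectors are all zeros are assumptions made for illustration.

```python
def jaccard(x, y):
    """Jaccard/Tanimoto coefficient of two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(x, y) if a and b)    # sites where both occur
    either = sum(1 for a, b in zip(x, y) if a or b)   # sites where at least one occurs
    if either == 0:
        return 0.0  # assumption: define the coefficient as 0 when neither occurs anywhere
    return both / either


# Hypothetical presence-absence records for two species across six sites.
species_a = [1, 1, 0, 1, 0, 0]
species_b = [1, 0, 0, 1, 1, 0]
print(jaccard(species_a, species_b))  # 2 shared sites / 4 occupied sites = 0.5
```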

    Simulation Design Approach for the Selection of Alternative Commercial Passenger Aircraft Seating Configurations

    Loading strategies for commercial passenger aircraft have been a subject of recent study among air transportation research analysts. A fundamental assumption in the majority of these studies is the fixed configuration of passenger seats. Previous studies have focused on analyzing different strategies in an effort to reduce passenger loading time. This study takes a more proactive approach to the passenger loading process by starting with the design of the aircraft seating layout itself. Simulation analysis results indicate that alternative designs can result in loading time reductions of approximately 9–44%.

    Using Capillary Electrophoresis to Quantify Competitive Binding of Adsorbates to Silver Nanoparticles

    Silver nanoparticles (AgNPs) are increasingly used commercially and medically due to their antimicrobial and antibacterial properties. With increased use comes increased release of AgNPs into the environment, and once released, AgNPs can form coronas with molecules ranging from biomolecules to proteins to natural organic matter (NOM). The molecules in the corona adsorb to the surface of the AgNPs, drastically altering their innate properties such as cytotoxicity and binding behavior. In this study, we characterize and quantify model AgNP-adsorbate systems by obtaining their relevant reaction parameters through three affordable analytical techniques: dynamic light scattering (DLS), UV-vis spectroscopy, and capillary electrophoresis (CE). Citrate-stabilized AgNPs with hydrodynamic diameters of 10 nm, 20 nm and 40 nm were used in this work. Kₐ values of AgNPs reacting with a model protein, bovine serum albumin (BSA), and a model NOM, Suwannee River humic acid (SRHA), were individually quantified using UV-vis spectroscopy. Nonequilibrium capillary electrophoresis of equilibrium mixtures (NECEEM) was also employed to obtain binding and rate constants pertaining to the individual reactions, and these were compared with the values acquired through UV-vis spectroscopy. The AgNP size was shown to have an inverse relationship with reactivity, with smaller AgNPs having higher Kₐ values. There was also remarkable agreement between the two quantitative analyses, supporting the use of the novel NECEEM technique for other NP corona complexes. DLS was used to characterize the initial nanoparticles as well as those with a formed corona, and circular dichroism (CD) spectroscopy was used to monitor protein conformational changes upon adsorption of BSA to AgNPs and interaction with SRHA. Subsequently, in a field dominated by single adsorbate studies of the AgNP coronas, we strived to take this study a step further and investigate multiple adsorbate systems.
    Thus, a new CE-based pull-down assay was developed and optimized for quantitative analysis of the relative reactivity of multiple adsorbates interacting with AgNPs. Using this new technique, SRHA was found to decrease the amount of BSA adsorbed to AgNPs in solution across all sizes. Smaller sized AgNPs seemed to favor BSA adsorption over SRHA, but as the size of the AgNP increased, the affinity seemed to shift to favoring the adsorption of SRHA.

    Intelligent Agents for Retrieving Chinese Web Financial News

    As the popularity of the World Wide Web increases, many newspapers expand their services by providing news information on the Web in order to be competitive and increase benefit. The Web provides real-time dissemination of financial news to investors. However, most investors find it difficult to search for the financial information of interest in the huge Web information space. Most commercial search engines are not user-friendly and do not provide any tailor-made intelligent agents to search for relevant Web documents on behalf of users. Users have to exert a lot of effort to submit an appropriate query to obtain the information they want. Intelligent agents that learn user preferences and monitor the postings of Web information providers are desired. In this paper, we present an intelligent agent that utilizes user profiles and user feedback to search for Chinese Web financial news articles on behalf of users. A Chinese indexing component is developed to index the continuously fetched Chinese financial news articles. User profiles capture the basic knowledge of user preferences based on the sources of news articles, the regions of the news reported, categories of industries related, the listed companies, and user-specified keywords. User feedback captures the semantics of the user-rated news articles. The search engine will rank the top 20 news articles that users are most interested in based on these inputs. Experiments were conducted to measure the performance of the agents based on the inputs from user profile and user feedback.