44 research outputs found

    New Spiked-In Probe Sets for the Affymetrix HGU-133A Latin Square Experiment

    The Affymetrix HGU-133A spike-in data set has been used to determine the sensitivity and specificity of various methods for the analysis of microarray data. We show that 22 additional probe sets detect spike-in RNAs and should be treated as spike-in probe sets. We assign each proposed spiked-in probe set to a concentration group within the Latin Square design, and examine the effects of the additional spiked-in probe sets on assessing the accuracy of analysis methods currently in use. We show that several popular preprocessing methods are more sensitive and specific when the new spike-ins are used to determine false positive and false negative rates.

    Measuring Sources of Bias in Diving Competitions

    We examine bias in diving scores from high school meets. Bias in sports where judges assign subjective scores has usually been examined in terms of nationalistic bias, where judges award favorable scores to athletes from their own home countries. In this work, bias is defined as the difference between the judge's scores for a given meet and the ability of the diver. As a measure of the ability of a diver, we calculate a mean score for all dives and all meets in which the diver has participated. We call this mean score a diver's competency score. We use the difference between judges' scores within a given meet and the diver's competency to define a discrepancy: the difference between a judge's estimation of a diver's ability and their true ability. We examine this discrepancy with respect to age, gender, direction of dive, position of dive, round of competition, and dive difficulty. Comment: 11 pages, 6 figures, presented at JSM 2023 Toronto.
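
The competency score and discrepancy described in this abstract can be illustrated with a minimal sketch. The record layout (`diver`, `meet`, `judge`, `score`), the function names, and the sign convention (judge score minus competency) are assumptions made for illustration; the paper's actual data format and any weighting it applies are not given here.

```python
from collections import defaultdict
from statistics import mean


def competency_scores(records):
    """Mean score per diver over all dives and all meets (hypothetical layout)."""
    by_diver = defaultdict(list)
    for r in records:
        by_diver[r["diver"]].append(r["score"])
    return {diver: mean(scores) for diver, scores in by_diver.items()}


def discrepancies(records):
    """Attach 'discrepancy' = judge score - diver competency to each record.

    The sign convention is an assumption; the abstract only says "difference".
    """
    comp = competency_scores(records)
    return [{**r, "discrepancy": r["score"] - comp[r["diver"]]} for r in records]


# Made-up example data, not taken from the study.
records = [
    {"diver": "A", "meet": 1, "judge": "J1", "score": 6.0},
    {"diver": "A", "meet": 1, "judge": "J2", "score": 7.0},
    {"diver": "A", "meet": 2, "judge": "J1", "score": 8.0},
    {"diver": "B", "meet": 1, "judge": "J1", "score": 5.0},
    {"diver": "B", "meet": 2, "judge": "J2", "score": 4.0},
]

competency = competency_scores(records)
result = discrepancies(records)
```

Grouping the resulting discrepancies by a covariate such as round or dive difficulty would then be the starting point for the comparisons the abstract lists.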

    Impact of COVID-19 on Recruitment of High School Athletes to DI Track and Field

    Due to COVID-19, in the spring of 2020 the NCAA gave scholarship athletes an extra year of eligibility but did not increase the number of scholarships a school could issue. This potentially increased competition for scholarships, as coaches could choose between retaining athletes and recruiting new ones. Furthermore, the Spring 2020 track and field season for high school seniors ended early, limiting high school athletes' chances to post their best scores and interrupting interaction between students and colleges. This research looks specifically at the impact of COVID-19, and the resulting NCAA policy changes, on the recruitment to DI of high school athletes who excelled at the 200-meter track and field events. The study looks at both the supply side, by examining recruitment trends and athlete scores, and the demand side, by looking at DI track and field grade composition.

    The Impact of the COVID-19 Pandemic on Faculty Productivity and Gender Inequalities in STEM Disciplines

    Women and minorities within STEM disciplines have historically encountered obstacles in academic advancement, a situation compounded by the COVID-19 pandemic through additional responsibilities such as caregiving. This study examines the pandemic's influence on traditional academic productivity metrics (specifically publication and submission frequency, citation volume, and leadership in scholarly entities) by employing Natural Language Processing to extract and analyze data from key journals within various scientific domains. The research indicates a notable downturn in publication activity during 2021, potentially attributable to pandemic-induced disruptions, with a compensatory surge observed in 2022. Although a gradual move towards gender parity in academic authorship was observed, progress toward substantive equality faces future challenges, including policy shifts and societal factors. This investigation not only illuminates the nuanced disparities in academic publishing but also aims to guide institutional strategies towards genuinely equitable promotion and tenure policies and practices, ensuring that the academic merit of all scholars, regardless of gender or minority status, is acknowledged and rewarded.

    Predictions Generated from a Simulation Engine for Gene Expression Micro-arrays for use in Research Laboratories

    In this paper we introduce the technical components, the biology, and the data science involved in the use of microarray technology in biological and clinical research. We discuss how the laborious experimental protocols that laboratories use to obtain these data could benefit from simulations of the data. We discuss the approach used in the simulation engine from [7]. We use this simulation engine to generate a prediction tool in Power BI, a Microsoft business intelligence tool for analytics and data visualization [22]. This tool could be used in any laboratory using micro-arrays to improve experimental design by comparing predicted signal intensity with observed signal intensity. Signal intensity in micro-arrays is a proxy for the level of gene expression in cells. We suggest further development avenues for the prediction tool.

    A distribution-free convolution model for background correction of oligonucleotide microarray data

    Introduction: Affymetrix GeneChip® high-density oligonucleotide arrays are widely used in biological and medical research because of production reproducibility, which facilitates the comparison of results between experiment runs. In order to obtain high-level classification and cluster analysis that can be trusted, it is important to perform various pre-processing steps on the probe-level data to control for variability in sample processing and array hybridization. Many proposed preprocessing methods are parametric, in that they assume that the background noise generated by microarray data is a random sample from a statistical distribution, typically a normal distribution. The quality of the final results depends on the validity of such assumptions. Results: We propose a Distribution Free Convolution Model (DFCM) to circumvent observed deficiencies in meeting and validating distribution assumptions of parametric methods. Knowledge of array structure and the biological function of the probes indicate that the intensities of mismatched (MM) probes that correspond to the smallest perfect match (PM) intensities can be used to estimate the background noise. Specifically, we obtain the smallest q2 percent of the MM intensities that are associated with the lowest q1 percent PM intensities, and use these intensities to estimate background. Conclusion: Using the Affymetrix Latin Square spike-in experiments, we show that the background noise generated by microarray experiments typically is not well modeled by a single overall normal distribution. We further show that the signal is not exponentially distributed, as is also commonly assumed. Therefore, DFCM has better sensitivity and specificity, as measured by ROC curves and area under the curve (AUC), than MAS 5.0, RMA, RMA with no background correction (RMA-noBG), GCRMA, PLIER, and dChip (MBEI) for preprocessing of Affymetrix microarray data.
These results hold for two spike-in data sets and one real data set that were analyzed. Comparisons with other methods on two spike-in data sets and one real data set show that our nonparametric methods are a superior alternative for background correction of Affymetrix data.
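
The probe-selection step in the Results portion of this abstract (take the lowest q1 percent of PM intensities, then the smallest q2 percent of their paired MM intensities) can be sketched as follows. This is an illustration only, not the DFCM implementation: the function names, the rounding rule for the percent cutoffs, and the use of a plain mean as the background estimate are all assumptions, since the abstract does not specify them.

```python
import math
from statistics import mean


def smallest_percent(values, percent):
    """Return the smallest `percent` percent of values (at least one value).

    Rounding down to a whole number of probes is an assumption.
    """
    k = max(1, math.floor(len(values) * percent / 100))
    return sorted(values)[:k]


def background_candidates(pm, mm, q1, q2):
    """Select MM intensities to use for background estimation.

    pm, mm: equal-length lists of paired PM / MM probe intensities.
    q1: percent of lowest PM intensities to keep.
    q2: percent of smallest MM intensities to keep among those pairs.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must be paired, equal-length lists")
    k1 = max(1, math.floor(len(pm) * q1 / 100))
    # Indices of the probe pairs with the lowest PM intensities.
    lowest_pm_idx = sorted(range(len(pm)), key=lambda i: pm[i])[:k1]
    paired_mm = [mm[i] for i in lowest_pm_idx]
    return smallest_percent(paired_mm, q2)


# Made-up intensities for ten probe pairs.
pm = [100, 50, 900, 60, 70, 800, 55, 65, 1000, 75]
mm = [90, 40, 300, 45, 66, 250, 52, 48, 400, 70]

selected = background_candidates(pm, mm, q1=50, q2=40)
# Using the mean as the background level is an illustrative assumption.
background_estimate = mean(selected)
```

With these toy values, the five lowest PM probes contribute MM intensities 40, 52, 45, 48, and 66, and the smallest 40 percent of those are the two values used for the estimate.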

    A gene selection method for GeneChip array data with small sample sizes

    Background: In microarray experiments with small sample sizes, it is a challenge to estimate p-values accurately and decide cutoff p-values for gene selection appropriately. Although permutation-based methods have proved to have greater sensitivity and specificity than the regular t-test, their p-values are highly discrete due to the limited number of permutations available in very small sample sizes. Furthermore, estimated permutation-based p-values for true nulls are highly correlated and not uniformly distributed between zero and one, making it difficult to use current false discovery rate (FDR)-controlling methods. Results: We propose a model-based information sharing method (MBIS) that, after an appropriate data transformation, utilizes information shared among genes. We use a normal distribution to model the mean differences of true nulls across two experimental conditions. The parameters of the model are then estimated using all data in hand. Based on this model, p-values, which are uniformly distributed from true nulls, are calculated. Then, since FDR-controlling methods are generally not well suited to microarray data with very small sample sizes, we select genes for a given cutoff p-value and then estimate the false discovery rate. Conclusion: Simulation studies and analysis using real microarray data show that the proposed method, MBIS, is more powerful and reliable than current methods. It has wide application to a variety of situations.
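
The general workflow this abstract outlines (fit a normal null model to per-gene mean differences, compute p-values from it, select genes at a cutoff, then estimate the FDR) can be sketched roughly as below. This is not the MBIS method itself: the abstract does not say how the parameters are estimated or how the FDR is computed, so the median/MAD fit and the simple `cutoff * genes / selected` FDR estimate are swapped-in assumptions, and all function names are hypothetical.

```python
from statistics import NormalDist, median


def null_pvalues(diffs):
    """Two-sided p-values for mean differences under a fitted normal null.

    Assumption: the null centre and spread are estimated robustly from all
    genes with the median and the scaled median absolute deviation (MAD).
    """
    center = median(diffs)
    mad = median(abs(d - center) for d in diffs)
    scale = 1.4826 * mad  # makes the MAD consistent with a normal sigma
    if scale == 0:
        raise ValueError("cannot fit a null distribution with zero spread")
    null = NormalDist(center, scale)
    return [2 * min(null.cdf(d), 1 - null.cdf(d)) for d in diffs]


def select_and_estimate_fdr(pvalues, cutoff):
    """Select genes with p <= cutoff and estimate the FDR of that selection.

    Assumption: expected false positives are approximated by cutoff times the
    number of genes (i.e. treating nearly all genes as true nulls).
    """
    selected = [i for i, p in enumerate(pvalues) if p <= cutoff]
    if not selected:
        return selected, 0.0
    expected_false = cutoff * len(pvalues)
    return selected, min(1.0, expected_false / len(selected))


# Made-up mean differences for ten genes; only the last one is extreme.
diffs = [-0.2, -0.1, 0.0, 0.1, 0.2, -0.15, 0.05, 0.15, -0.05, 3.0]

pvals = null_pvalues(diffs)
selected_genes, fdr_estimate = select_and_estimate_fdr(pvals, cutoff=0.01)
```

Choosing the cutoff first and estimating the FDR afterwards mirrors the order described in the abstract, as opposed to fixing a target FDR and deriving the cutoff from it.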

    The Ontology for Biomedical Investigations

    The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl