118 research outputs found

    Assessing the reproducibility of discriminant function analyses

    Data are the foundation of empirical research, yet all too often the datasets underlying published papers are unavailable, incorrect, or poorly curated. This is a serious issue, because future researchers are then unable to validate published results or reuse data to explore new ideas and hypotheses. Even if data files are securely stored and accessible, they must also be accompanied by accurate labels and identifiers. To assess how often problems with metadata or data curation affect the reproducibility of published results, we attempted to reproduce Discriminant Function Analyses (DFAs) from the field of organismal biology. DFA is a commonly used statistical analysis that has changed little since its inception almost eight decades ago, and therefore provides an opportunity to test reproducibility among datasets of varying ages. Out of 100 papers we initially surveyed, fourteen were excluded because they did not present the common types of quantitative result from their DFA or gave insufficient details of their DFA. Of the remaining 86 datasets, there were 15 cases for which we were unable to confidently relate the dataset we received to the one used in the published analysis. The reasons included incomprehensible or absent variable labels, the DFA being performed on an unspecified subset of the data, and the dataset we received being incomplete. We focused on reproducing three common summary statistics from DFAs: the percent variance explained, the percentage correctly assigned, and the largest discriminant function coefficient. The reproducibility of the first two was fairly high (20 of 26, and 44 of 60 datasets, respectively), whereas our success rate with the discriminant function coefficients was lower (15 of 26 datasets). When considering all three summary statistics, we were able to completely reproduce 46 (65%) of 71 datasets. While our results show that a majority of studies are reproducible, they highlight the fact that many studies still are not the carefully curated research that the scientific community and the public expect.

    Mandated data archiving greatly improves access to research data

    The data underlying scientific papers should be accessible to researchers both now and in the future, but how best can we ensure that these data are available? Here we examine the effectiveness of four approaches to data archiving: no stated archiving policy, recommending (but not requiring) archiving, and two versions of mandating data deposition at acceptance. We control for differences between data types by trying to obtain data from papers that use a single, widespread population genetic analysis, STRUCTURE. At one extreme, we found that mandated data archiving policies that require the inclusion of a data availability statement in the manuscript improve the odds of finding the data online almost a thousand-fold compared to having no policy. However, archiving rates at journals with less stringent policies were only very slightly higher than those with no policy at all. We also assessed the effectiveness of asking for data directly from authors and obtained over half of the requested datasets, albeit with a delay of about 8 days and some disagreement with authors. Given the long-term benefits of data accessibility to the academic community, we believe that journal-based mandatory data archiving policies and mandatory data availability statements should be more widely adopted.

    Complex genetic patterns in closely related colonizing invasive species

    Anthropogenic activities frequently result in both rapidly changing environments and translocation of species from their native ranges (i.e., biological invasions). Empirical studies suggest that many factors associated with these changes can lead to complex genetic patterns, particularly among invasive populations. However, genetic complexities and factors responsible for them remain uncharacterized in many cases. Here, we explore these issues in the vase tunicate Ciona intestinalis (Ascidiacea: Enterogona: Cionidae), a model species complex, of which spA and spB are rapidly spreading worldwide. We intensively sampled 26 sites (N = 873) from both coasts of North America, and performed phylogenetic and population genetics analyses based on one mitochondrial fragment (cytochrome c oxidase subunit 3–NADH dehydrogenase subunit I, COX3-ND1) and eight nuclear microsatellites. Our analyses revealed extremely complex genetic patterns in both species on both coasts. We detected a contrasting pattern based on the mitochondrial marker: two major genetic groups in C. intestinalis spA on the west coast versus no significant geographic structure in C. intestinalis spB on the east coast. For both species, geographically distant populations often showed high microsatellite-based genetic affinities whereas neighboring ones often did not. In addition, mitochondrial and nuclear markers provided largely inconsistent genetic patterns. Multiple factors, including random genetic drift associated with demographic changes, rapid selection due to strong local adaptation, and varying propensity for human-mediated propagule dispersal, could be responsible for the observed genetic complexities.

    The Allen Telescope Array Pi GHz Sky Survey I. Survey Description and Static Catalog Results for the Bootes Field

    The Pi GHz Sky Survey (PiGSS) is a key project of the Allen Telescope Array. PiGSS is a 3.1 GHz survey of radio continuum emission in the extragalactic sky with an emphasis on synoptic observations that measure the static and time-variable properties of the sky. During the 2.5-year campaign, PiGSS will twice observe ~250,000 radio sources in the 10,000 deg^2 region of the sky with b > 30 deg to an rms sensitivity of ~1 mJy. Additionally, sub-regions of the sky will be observed multiple times to characterize variability on time scales of days to years. We present here observations of a 10 deg^2 region in the Bootes constellation overlapping the NOAO Deep Wide Field Survey field. The PiGSS image was constructed from 75 daily observations distributed over a 4-month period and has an rms flux density between 200 and 250 microJy. This represents a deeper image by a factor of 4 to 8 than we will achieve over the entire 10,000 deg^2. We provide flux densities, source sizes, and spectral indices for the 425 sources detected in the image. We identify ~100 new flat spectrum radio sources; we project that when completed PiGSS will identify 10^4 flat spectrum sources. We identify one source that is a possible transient radio source. This survey provides new limits on faint radio transients and variables with characteristic durations of months. Comment: Accepted for publication in ApJ; revision submitted with extraneous figure removed.

    Chromosome-scale genome assembly of the brown anole (Anolis sagrei), an emerging model species

    Rapid technological improvements are democratizing access to high-quality, chromosome-scale genome assemblies. Such assemblies are no longer the domain of only the most highly studied model organisms; non-traditional and emerging model species can now be genome-enabled using a combination of sequencing technologies and assembly software. Consequently, old ideas built on sparse sampling across the tree of life have recently been amended in the face of genomic data drawn from a growing number of high-quality reference genomes. Arguably the most valuable are those long-studied species for which much is already known about their biology; what many term emerging model species. Here, we report a highly complete chromosome-scale genome assembly for the brown anole, Anolis sagrei, a lizard species widely studied across a variety of disciplines and for which a high-quality reference genome was long overdue. This assembly exceeds the vast majority of existing reptile and snake genomes in contiguity (N50 = 253.6 Mb) and annotation completeness. Through the analysis of this genome and population resequence data, we examine the history of repetitive element accumulation, identify the X chromosome, and propose a hypothesis for the evolutionary history of fusions between autosomes and the X that led to the sex chromosomes of A. sagrei.

    The Allen Telescope Array Twenty-centimeter Survey - A 690-Square-Degree, 12-Epoch Radio Dataset - I: Catalog and Long-Duration Transient Statistics

    We present the Allen Telescope Array Twenty-centimeter Survey (ATATS), a multi-epoch (12 visits), 690 square degree radio image and catalog at 1.4 GHz. The survey is designed to detect rare, very bright transients as well as to verify the capabilities of the ATA to form large mosaics. The combined image using data from all 12 ATATS epochs has RMS noise sigma = 3.94 mJy/beam and dynamic range 180, with a circular beam of 150 arcsec FWHM. It contains 4408 sources to a limiting sensitivity of S = 20 mJy/beam. We compare the catalog generated from this 12-epoch combined image to the NRAO VLA Sky Survey (NVSS), a legacy survey at the same frequency, and find that we can measure source positions to better than ~20 arcsec. For sources above the ATATS completeness limit, the median flux density is 97% of the median value for matched NVSS sources, indicative of an accurate overall flux calibration. We examine how source confusion, arising from the differing resolution of ATATS and NVSS, affects our ability to compare flux densities. We detect no transients at flux densities greater than 40 mJy in comparison with NVSS, and place a 2-sigma upper limit on the transient rate for such sources of 0.004 per square degree. These results suggest that the > 1 Jy transients reported by Matsumura et al. (2009) may not be true transients, but rather variable sources at their flux density threshold. Comment: 41 pages, 19 figures, ApJ accepted; corrected minor typo in Table