4 research outputs found

    Landscape Genomics of White-Footed Mice (Peromyscus leucopus) along an Urban-to-Rural Gradient in the New York City Metropolitan Area

    Urbanization can change an area’s habitat in ways that pose novel selection pressures on native species, and previous work has shown evidence for divergent selection in white-footed mouse populations in New York City (NYC) parks compared to nearby rural populations. This study aims to 1) identify potential candidate genes exhibiting signatures of selection with increasing levels of urbanization, and 2) compare these results with previous findings that NYC populations of P. leucopus experience directional selection for metabolic processes and immune function. I approached these aims using a SNP dataset derived from exomes of 95 P. leucopus specimens sampled from sites in and around NYC. Outlier detection combined methods that rely on population-genetic measures (such as FST) with genotype-environment analyses that incorporate environmental factors (such as degree of urbanization). I ran Gene Ontology enrichment tests on the resulting outliers to see which biological functions are overrepresented among them. I found overrepresentation of genes related to metabolic function as well as ciliary function, particularly with regard to spermatogenesis, which corroborates previous findings in this system. I additionally found multiple unconventional myosins and other proteins that imply possible selection on genes related to hearing function.

    MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads

    Background: PacBio high fidelity (HiFi) sequencing reads are both long (15–20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and contiguous genomes. In eukaryotes the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. Results: MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence FASTA file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. Conclusions: MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub (https://github.com/marcelauliano/MitoHiFi). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).

    Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy

    Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long-reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).

    The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update

    Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which enables complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD) has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating general-purpose graphics processing unit (GPGPU) access for cutting-edge methods, and licensed tool support. Engagement with global research consortia is being increased by developing more workflows in Galaxy and by resourcing the public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size and accessibility, through learning paths and direct integration with Galaxy tools that feature in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping engage users and developers, reminding them of their role in sustainability, by displaying estimated CO2 emissions generated by each Galaxy job.