30 research outputs found

Orienting Ordered Scaffolds: Complexity and Algorithms

Author: Aganezov Sergey
Alekseyev Max A.
Alexeev Nikita
Avdeyev Pavel
Rong Yongwu
Publication date: 25/11/2019

Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose order and/or orientation (i.e., strand) in the genome are unknown. There exist various scaffold assembly methods, which attempt to determine the order and orientation of scaffolds along the genome chromosomes. Some of these methods (e.g., based on FISH physical mapping, chromatin conformation capture, etc.) can infer the order of scaffolds, but not necessarily their orientation. This leads to a special case of the scaffold orientation problem (i.e., deducing the orientation of each scaffold) with a known order of the scaffolds. We address the problem of orienting ordered scaffolds (OOS) as an optimization problem based on given weighted orientations of scaffolds and their pairs (e.g., coming from paired-end sequencing reads, long reads, or homologous relations). We formalize this problem using the notion of a scaffold graph (i.e., a graph where vertices correspond to the assembled contigs or scaffolds and edges represent connections between them). We prove that this problem is NP-hard, and present a polynomial-time algorithm for solving its special case, where the orientation of each scaffold is imposed relative to at most two other scaffolds. We further develop an FPT algorithm for the general case of the OOS problem.

arXiv.org e-Print Archive

Endometrial receptivity in women of reproductive age with "thin" and "absolutely thin" endometrium

Author: Ksenia E. Gogichashvili
Natalia V. Aganezova
Sergey S. Aganezov
Publication venue: 'MediaMedica'
Publication date: 01/01/2023

Aim. To evaluate the expression of steroid receptors (estrogen [ER] and progesterone [PR]) in the endometrium during the implantation window in females with a history of fertility disorders and "thin" or "absolutely thin" endometrium versus healthy females. Materials and methods. A prospective comparative study was conducted. The study group (n=42) included patients with "thin" endometrium (M-echo between 5 and 7 mm at cycle days 11–13 according to ultrasound); the comparison group (n=10) included females with "absolutely thin" endometrium (5 mm according to ultrasound in the pre-ovulatory days). Females in both groups had a history of unexplained infertility and miscarriage. The control group included 16 healthy fertile females. A Pipelle biopsy of the uterine mucosa was performed on days 6–8 after ovulation, and a peripheral blood sample was obtained to measure the concentration of sex steroids (estradiol [E2] and progesterone [P]). Endometrial samples were examined by histological and immunohistochemical methods (ER, PR expression). Results. All study participants had an ovulatory cycle of P16.1 nmol/L (days 6–8 after ovulation) and normal estrogen levels (E2, pmol/L). E2/P was similar in all cohorts (p>0.05 for all measures). ER and PR expression in the endometrium similar to that in healthy females was detected in 20% of patients in the study and comparison groups (M-echo = 4.83.1 mm): 21% (9/42) and 20% (2/10), respectively. ER and PR expression in the endometrial glands and ER expression in the endometrial stroma were significantly different (p<0.05) from healthy females in 79% (41/52) of patients with "thin" endometrium and 80% (8/10) of patients with "absolutely thin" endometrium. No differences in ER or PR expression in the endometrium were found among females with hypoplastic endometrium (p>0.05). Conclusion.
The M-echo value does not accurately determine endometrial hormonal-receptor abnormalities: 20% of the study participants with hypoplastic endometrium had ER and PR expression comparable to that in healthy females. No differences were found in the expression of endometrial estrogen and progesterone receptors between females with "thin" and "absolutely thin" endometrium.

Directory of Open Access Journals

Jasmine: Population-scale structural variant comparison and analysis

Author: Aganezov Sergey
Kirsche Melanie
Ni Bohan
Prabhu Gautam
Schatz Michael
Sherman Rachel
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 28/05/2021

The increasing availability of long reads is revolutionizing studies of structural variants (SVs). However, because SVs vary across individuals and are discovered through imprecise read technologies and methods, they can be difficult to compare. Addressing this, we present Jasmine (https://github.com/mkirsche/Jasmine), a fast and accurate method for SV refinement, comparison, and population analysis. Using an SV proximity graph, Jasmine outperforms five widely used comparison methods, including reducing the rate of Mendelian discordance in trio datasets by more than five-fold, and reveals a set of high-confidence de novo SVs confirmed by multiple long-read technologies. We also present a harmonized callset of 205,192 SVs from 31 samples of diverse ancestry sequenced with long reads. We genotype these SVs in 444 short-read samples from the 1000 Genomes Project with both DNA and RNA sequencing data and assess their widespread impact on gene expression, including within several medically relevant genes.

Cold Spring Harbor Laboratory Institutional Repository

On pairwise distances and median score of three genomes under DCJ

Author: A Bergeron
A Caprara
A Goeffon
AW Xu
E Tannier
MA Alekseyev
Max A Alekseyev
R Lenne
S Yancopoulos
Sergey Aganezov
V Rajan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/10/2012

In comparative genomics, the rearrangement distance between two genomes (equal to the minimal number of genome rearrangements required to transform them into a single genome) is often used for measuring their evolutionary remoteness. The generalization of this measure to three genomes is known as the median score (while a resulting genome is called a median genome). In contrast to the rearrangement distance between two genomes, which can be computed in linear time, computing the median score for three genomes is NP-hard. This inspires a quest for simpler and faster approximations for the median score, the most natural of which appears to be the halved sum of pairwise distances, which in fact represents a lower bound for the median score. In this work, we study the relationship and interplay of pairwise distances between three genomes and their median score under the model of Double-Cut-and-Join (DCJ) rearrangements. Most remarkably, we show that while a rearrangement may change the sum of pairwise distances by at most 2 (and thus change the lower bound by at most 1), even the most "powerful" rearrangements in this respect, those that increase the lower bound by 1 (by moving one genome farther away from each of the other two genomes), which we call strong, do not necessarily affect the median score. This observation implies that the two measures are not as well correlated as one's intuition may suggest. We further prove that the median score attains the lower bound exactly on the triples of genomes that can be obtained from a single genome with strong rearrangements. While the sum of pairwise distances with the factor 2/3 represents an upper bound for the median score, its tightness remains unclear. Nonetheless, we show that the difference between the median score and its lower bound is not bounded by a constant. Comment: Proceedings of the 10th Annual RECOMB Satellite Workshop on Comparative Genomics (RECOMB-CG), 2012 (to appear).
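
The two bounds mentioned in this abstract can be written out explicitly. The notation here is an assumption introduced for illustration and does not come from the listing: $d(\cdot,\cdot)$ is the DCJ distance, $A$, $B$, $C$ are the three genomes, and $\mathrm{ms}(A,B,C)$ is the median score, i.e. the minimum over genomes $M$ of $d(A,M)+d(B,M)+d(C,M)$.

```latex
% Lower bound: apply the triangle inequality to each pair through a median genome M
% and add the three inequalities, which gives 2 * ms(A,B,C) >= d(A,B) + d(B,C) + d(A,C).
% Upper bound: taking M = A, M = B, or M = C gives three candidate scores whose
% average is (2/3) * (d(A,B) + d(B,C) + d(A,C)); the smallest is at most the average.
\[
  \frac{d(A,B) + d(B,C) + d(A,C)}{2}
  \;\le\; \mathrm{ms}(A,B,C)
  \;\le\; \frac{2}{3}\,\bigl(d(A,B) + d(B,C) + d(A,C)\bigr)
\]
```

For instance, with pairwise distances 4, 6 and 8 the bounds place the median score between 9 and 12.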

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

SVCollector: Optimized sample selection for cost-efficient long-read population sequencing

Author: Aganezov Sergey
Lemmon Zachary
Lippman Zachary
McCoy Rajiv
Ranallo-Benavidez Rhyker
Salerno William
Schatz Michael
Sedlazeck Fritz
Soyk Sebastian
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 06/08/2020

An increasingly important scenario in population genetics is when a large cohort has been genotyped using a low-resolution approach (e.g. microarrays, exome capture, short-read WGS), from which a few individuals are selected for resequencing using a more comprehensive approach, especially long-read sequencing. The subset of individuals selected should ensure that the captured genetic diversity is fully representative and includes variants across all subpopulations. For example, human variation has historically been focused on individuals with European ancestry, but this represents a small fraction of the overall diversity. To address this goal, SVCollector (https://github.com/fritzsedlazeck/SVCollector) identifies the optimal subset of individuals for resequencing. SVCollector analyzes a population-level VCF file from a low-resolution genotyping study. It then computes a ranked list of samples that maximizes the total number of variants present from a subset of a given size. To solve this optimization problem, SVCollector implements a fast greedy heuristic and an exact algorithm using integer linear programming. We apply SVCollector on simulated data, 2504 human genomes from the 1000 Genomes Project, and 3024 genomes from the 3K Rice Genomes Project and show the rankings it computes are more representative than widely used naive strategies. Notably, we show that when selecting an optimal subset of 100 samples in these two cohorts, SVCollector identifies individuals from every subpopulation while naive methods yield an unbalanced selection. Finally, we show the number of variants present in cohorts of different sizes selected using this approach follows a power-law distribution that is naturally related to the population genetic concept of the allele frequency spectrum, allowing us to estimate the diversity present with increasing numbers of samples.
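
The greedy ranking described in this abstract can be illustrated with a minimal sketch. This is not SVCollector's actual implementation: the function name `greedy_rank`, the dict-of-sets input (sample name mapped to the set of variant IDs it carries) and the alphabetical tie-breaking are all assumptions made for illustration, and VCF parsing is left out entirely.

```python
from typing import Dict, List, Set, Tuple


def greedy_rank(samples: Dict[str, Set[str]], k: int) -> List[Tuple[str, int]]:
    """Rank up to k samples greedily.

    At each step, pick the sample that adds the most variants not yet
    covered by earlier picks. Returns (sample, newly_covered_count) pairs.
    Hypothetical helper for illustration only.
    """
    covered: Set[str] = set()
    remaining = dict(samples)  # shallow copy so the caller's dict is untouched
    ranking: List[Tuple[str, int]] = []
    while remaining and len(ranking) < k:
        # Iterate in sorted order so ties resolve to the alphabetically first name.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = len(remaining[best] - covered)
        ranking.append((best, gain))
        covered |= remaining.pop(best)
    return ranking


if __name__ == "__main__":
    # Toy cohort with made-up sample and variant identifiers.
    cohort = {
        "s1": {"v1", "v2", "v3"},
        "s2": {"v3", "v4"},
        "s3": {"v1", "v2"},
        "s4": {"v5"},
    }
    print(greedy_rank(cohort, 3))
```

A greedy choice like this is a standard heuristic for maximum-coverage problems; the abstract notes that SVCollector also offers an exact integer linear programming formulation, which this sketch does not attempt.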

Cold Spring Harbor Laboratory Institutional Repository

Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing

Author: Aganezov Sergey
Alonge Michael
Jenike Katie
Kirsche Melanie
Lebeigle Ludivine
Lippman Zachary B
Ou Shujun
Schatz Michael C
Soyk Sebastian
Wang Xingang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2022

Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a new rapid-cycling genotype that we developed to accelerate functional genomics and genome editing in tomato. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

Serveur académique lausannois

PubMed Central

Multi-tissue integrative analysis of personal epigenomes

Author: Adrian Jessika
Aganezov Sergey
Balderrama-Gutierrez Gabriela
Banskota Samridhi
Bernstein Bradley
Berthel Ana
Borsari Beatrice
Cameron Christopher
Chang Justin
Chee Sora
Chen Zhanlin
Cherry Michael
Chhetri Surya
Choudhary Jyoti
Corona Guillermo
Danyko Cassidy
Davis Carrie
Dobin Alexander
Drenkow Jorg
Epstein Charles
Farid Daniel
Farrell Nina
Gabdank Idan
Galeev Timur
Gao Jiahao
Gaskell Elizabeth
Gerstein Mark
Gillis Jesse
Gingeras Thomas
Gofin Yoel
Gorkin David
Gu Mengting
Guigo Roderic
Gursoy Gamze
Hecht Vivian
Hitz Benjamin
Issner Robbyn
Kirsche Melanie
Kong Xiangmeng
Lam Bonita
Levine Morgan
Li Bian
Li Shantao
Li Tianxiao
Li Xiqi
Lin Khine
Liu Jason
Luo Ruibang
Mackiewicz Mark
Martins Gabriel
Mendenhall Eric
Milosavljevic Aleksandar
Moore Jill
Mortazavi Ali
Mudge Jonathan
Myers Richard
Navarro Fabio
Nelson Nicholas
Noble William
Nusbaum Chad
Popov Ioann
Pratt Henry
Qiu Yunjiang
Ramakrishnan Srividya
Raymond Joe
Ren Bing
Rozowsky Joel
Salichos Leonidas
Scavelli Alexandra
Schatz Michael
Schreiber Jacob
Sedlazeck Fritz
See Lei
Sherman Rachel
Shi Minyi
Shi Xu
Shoresh Noam
Sloan Cricket
Snyder Michael
Strattan Seth
Sun Maxwell
Tan Zhen
Tanaka Forrest
Vlasova Anna
Wang Jun
Weng Zhiping
Werner Jonathan
Williams Brian
Wold Barbara
Wright James
Xiong Kun
Xu Jinrui
Xu Min
Yan Chengfei
Yang Yucheng
Yu Keyang
Yu Lu
Zaleski Christopher
Zhang Jing
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 26/04/2021

Evaluating the impact of genetic variants on transcriptional regulation is a central goal in biological science that has been constrained by reliance on a single reference genome. To address this, we constructed phased, diploid genomes for four cadaveric donors (using long-read sequencing) and systematically charted noncoding regulatory elements and transcriptional activity across more than 25 tissues from these donors. Integrative analysis revealed over a million variants with allele-specific activity; coordinated, locus-scale allelic imbalances; and structural variants impacting proximal chromatin structure. We relate the personal genome analysis to the ENCODE encyclopedia, annotating allele- and tissue-specific elements that are strongly enriched for variants impacting expression and disease phenotypes. These experimental and statistical approaches, and the corresponding EN-TEx resource, provide a framework for personalized functional genomics.

Cold Spring Harbor Laboratory Institutional Repository

Caltech Authors

A complete reference genome improves analysis of human genetic variation

Author: Aganezov Sergey
Publication date: 29/06/2023

Ezid

Reconstruction of clone- and haplotype-specific cancer genome karyotypes from bulk tumor samples

Author: Aganezov Sergey
Raphael Benjamin J
Publication date: 04/09/2020

Many cancer genomes are extensively rearranged with aberrant chromosomal karyotypes. Deriving these karyotypes from high-throughput DNA sequencing of bulk tumor samples is complicated because most tumors are a heterogeneous mixture of normal cells and subpopulations of cancer cells, or clones, that harbor distinct somatic mutations. We introduce a new algorithm, Reconstructing Cancer Karyotypes (RCK), to reconstruct haplotype-specific karyotypes of one or more rearranged cancer genomes from DNA sequencing data from a bulk tumor sample. RCK leverages evolutionary constraints on the somatic mutational process in cancer to reduce ambiguity in the deconvolution of admixed sequencing data into multiple haplotype-specific cancer karyotypes. RCK models mixtures containing an arbitrary number of derived genomes and allows the incorporation of information from both short-read and long-read DNA sequencing technologies. We compare RCK to existing approaches on 17 primary and metastatic prostate cancer samples. We find that RCK infers cancer karyotypes that better explain the DNA sequencing data and conform to a reasonable evolutionary model. RCK's reconstructions of clone- and haplotype-specific karyotypes will aid further studies of the role of intra-tumor heterogeneity in cancer development and response to treatment. RCK is freely available as open-source software.

Princeton University Open Access Repository