Search CORE

8 research outputs found

EMMA: Adding Sequences into a Constraint Alignment with High Accuracy and Scalability (Abstract)

Author: Liu Baqiao
Shen Chengze
Warnow Tandy
Williams Kelly P.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Publication date: 01/01/2023

Multiple sequence alignment (MSA) is a crucial precursor to many downstream biological analyses, such as phylogeny estimation [Morrison, 2006], RNA structure prediction [Shapiro et al., 2007], and protein structure prediction [Jumper et al., 2021]. Obtaining an accurate MSA can be challenging, especially when the dataset is large (i.e., more than 1000 sequences). A key technique for large-scale MSA estimation is to add sequences into an existing alignment. For example, biological knowledge can be used to form a reference alignment on a subset of the sequences, and the remaining sequences can then be added to the reference alignment. A second case arises when new sequences or genomes are added to databases, creating the opportunity to add the new sequences for each gene into a growing alignment. A third case is de novo multiple sequence alignment, where a subset of the sequences is selected and aligned, and the remaining sequences are then added into this "backbone alignment" [Nguyen et al., 2015; Park et al., 2023; Shen et al., 2022; Liu and Warnow, 2023; Park and Warnow, 2023; Yamada et al., 2016]. Thus, adding sequences into existing alignments is a natural problem with multiple applications in biological sequence analysis. A few methods have been developed for this task, with MAFFT--add [Katoh and Frith, 2012] perhaps the best known. In addition, several multiple sequence alignment methods that operate in two steps (first extract and align the backbone sequences, then add the remaining sequences into this backbone alignment) also provide utilities for adding sequences into a user-provided alignment. We present EMMA, a new approach for adding "query" sequences into an existing "constraint" alignment.
By construction, EMMA never changes the constraint alignment, except through the introduction of additional sites to represent homologies between the query sequences. EMMA uses a divide-and-conquer technique combined with MAFFT--add (using its most accurate setting, MAFFT-linsi--add) to add sequences into a user-provided alignment. We evaluate EMMA by comparing it to MAFFT-linsi--add, MAFFT--add (the default setting), and WITCH-ng-add. We include a range of biological and simulated datasets (nucleotides and proteins) ranging in size from 1000 to almost 200,000 sequences and evaluate alignment accuracy and scalability. MAFFT-linsi--add was the slowest and least scalable method; in this study it could only run on datasets with at most 1000 sequences, but on those datasets it had excellent accuracy (often the best). We also see that EMMA has better recall than WITCH-ng-add and MAFFT--add on large datasets, especially when the backbone alignment is small or clade-based.

Dagstuhl Research Online Publication Server
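The abstract states that EMMA never changes the constraint alignment except by introducing additional sites. As an illustrative sketch (not EMMA's code), that property can be checked directly: restrict the extended alignment to the constraint sequences, drop the columns that become all gaps, and compare the result with the original constraint alignment. The function names, the use of `-` as the gap character, and the representation of an alignment as a dict from sequence name to gapped string are assumptions made for this example.

```python
GAP = "-"  # assumed gap character

def restrict_and_compact(alignment, names):
    """Keep only the rows in `names`, then drop columns that are gaps in every kept row."""
    rows = [alignment[n] for n in names]
    width = len(rows[0])
    keep = [i for i in range(width) if any(row[i] != GAP for row in rows)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}

def constraint_preserved(constraint, extended):
    """True if `extended` induces exactly `constraint` on the constraint sequences."""
    names = list(constraint)
    if any(n not in extended for n in names):
        return False
    return restrict_and_compact(extended, names) == restrict_and_compact(constraint, names)

constraint = {"a": "AC-GT", "b": "ACTGT"}
# Query "q" was added; one new site (column 1) is all gaps in the constraint rows.
extended = {"a": "A-C-GT", "b": "A-CTGT", "q": "AGCTG-"}
print(constraint_preserved(constraint, extended))  # True
```

Compacting both sides means that a constraint alignment which already contains all-gap columns is still compared fairly.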

Contrastive Masked Autoencoders are Stronger Vision Learners

Author: Cheng Ming-Ming
Feng Jiashi
Fu Dongmei
Hou Qibin
Huang Zhicheng
Jin Xiaojie
Lu Chengze
Shen Xiaohui
Publication date: 28/01/2024

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representations shows that there is still considerable room for building a stronger vision learner. Towards this goal, we propose Contrastive Masked Autoencoders (CMAE), a new self-supervised pre-training method for learning more comprehensive and capable vision representations. By carefully unifying contrastive learning (CL) and masked image modeling (MIM) through novel designs, CMAE leverages their respective advantages and learns representations with both strong instance discriminability and local perceptibility. Specifically, CMAE consists of two branches: the online branch is an asymmetric encoder-decoder, and the momentum branch is a momentum-updated encoder. During training, the online encoder reconstructs original images from latent representations of masked images to learn holistic features. The momentum encoder, fed with the full images, enhances feature discriminability via contrastive learning with its online counterpart. To make CL compatible with MIM, CMAE introduces two new components: pixel shifting, for generating plausible positive views, and a feature decoder, for complementing the features of contrastive pairs. Thanks to these designs, CMAE improves representation quality and transfer performance over its MIM counterpart. CMAE achieves state-of-the-art performance on highly competitive benchmarks for image classification, semantic segmentation, and object detection. Notably, CMAE-Base achieves 85.3% top-1 accuracy on ImageNet and 52.5% mIoU on ADE20k, surpassing the previous best results by 0.7% and 1.8%, respectively. The source code is publicly available at https://github.com/ZhichengHuang/CMAE.
Comment: Accepted by TPAM

arXiv.org e-Print Archive
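The CMAE abstract describes the momentum branch as a momentum-updated encoder but does not state the update rule. A common choice in momentum-encoder methods such as MoCo and BYOL is an exponential moving average of the online weights, θ_m ← m·θ_m + (1 − m)·θ_o. The sketch below assumes that rule, treats the parameters as a flat list of floats, and uses an illustrative default m = 0.996; it is not taken from the CMAE code.

```python
def momentum_update(momentum_params, online_params, m=0.996):
    """Exponential moving average: theta_m <- m * theta_m + (1 - m) * theta_o.

    Assumed MoCo/BYOL-style rule; m=0.996 is an illustrative default, not CMAE's setting.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum coefficient m must lie in [0, 1]")
    if len(momentum_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [m * pm + (1.0 - m) * po for pm, po in zip(momentum_params, online_params)]

momentum = [1.0, 0.0]
online = [0.0, 1.0]
momentum = momentum_update(momentum, online, m=0.5)
print(momentum)  # [0.5, 0.5]
```

With m close to 1 the momentum weights drift slowly toward the online weights; in this sketch they change only through this averaging step, never through a gradient update.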

BATCH-SCAMPP: Scaling Phylogenetic Placement Methods to Place Many Sequences (Abstract)

Author: Shen Chengze
Warnow Tandy
Wedell Eleanor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server

EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment

Author: Baqiao Liu
Chengze Shen
Kelly P. Williams
Tandy Warnow
Publication venue: BMC
Publication date: 01/12/2023

Abstract
Background: Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, and computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity.
Results: We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale with high accuracy to large datasets (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA.
Conclusions: EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.

Directory of Open Access Journals

Large scale sequence alignment via efficient inference in generative models

Author: Arash Gholami Davoodi
Chengze Shen
Guillaume Marçais
Hosein Mohimani
Mihir Mongia
Publication venue: Springer Science and Business Media LLC
Publication date: 01/05/2023

Abstract
Finding alignments between millions of reads and genome sequences is crucial in computational biology. Because the standard alignment algorithm has a large computational cost, heuristics have been developed to speed up this task. Though orders of magnitude faster, these methods lack theoretical guarantees and often have low sensitivity, especially when reads have many insertions, deletions, and mismatches relative to the genome. Here we develop a theoretically principled and efficient algorithm that has high sensitivity across a wide range of insertion, deletion, and mutation rates. We frame sequence alignment as an inference problem in a probabilistic model. Given a reference database of reads and a query read, we find the match that maximizes the log-likelihood ratio of the reference read and query read being generated jointly by a probabilistic model versus by independent models. The brute-force solution to this problem computes joint and independent probabilities for each query-reference pair, so its complexity grows linearly with database size. We introduce a bucketing strategy in which reads with a higher log-likelihood ratio are mapped to the same bucket with high probability. Experimental results show that our method is more accurate than state-of-the-art approaches in aligning long reads from Pacific Biosciences sequencers to genome sequences.

Directory of Open Access Journals
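The brute-force baseline described in this abstract scores every reference-query pair with a log-likelihood ratio and keeps the best one. The sketch below is a deliberately simplified instance, not the paper's model: it assumes equal-length reads, no insertions or deletions, uniform base frequencies, and a hypothetical per-base substitution rate `error_rate` (e), so the joint model gives P(x, y) = (1 − e)/4 when x = y and e/12 otherwise, while the independent model gives P(x)P(y) = 1/16. It does not implement the bucketing strategy.

```python
import math

def log_likelihood_ratio(ref, query, error_rate=0.1):
    """Sum over positions of log[P_joint(x, y) / (P(x) P(y))] for a substitution-only model.

    Joint model: P(x, y) = (1 - e)/4 if x == y else e/12; independent model: 1/16.
    """
    if len(ref) != len(query):
        raise ValueError("this simplified sketch requires equal-length reads")
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    match_score = math.log(4.0 * (1.0 - error_rate))   # log(((1 - e)/4) / (1/16))
    mismatch_score = math.log(4.0 * error_rate / 3.0)  # log((e/12) / (1/16))
    return sum(match_score if a == b else mismatch_score for a, b in zip(ref, query))

def best_match(database, query, error_rate=0.1):
    """Brute force: score the query against every reference, so cost grows linearly with database size."""
    return max(database, key=lambda name: log_likelihood_ratio(database[name], query, error_rate))

database = {"r1": "ACGTACGT", "r2": "TTTTGGGG"}
print(best_match(database, "ACGTACGA"))  # r1
```

Each match contributes a positive score and each mismatch a negative one (for e below 3/4), so the read sharing the most positions with the query wins.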

Image_2_StRAB4 gene is required for filamentous growth, conidial development, and pathogenicity in Setosphaeria turcica.pdf

Author: Chengze Wang (11483452)
Fanli Zeng (791183)
Hang Zhu (303107)
Jingao Dong (3843526)
Jingzhe Jia (17748981)
Pan Li (335711)
Shang Feng (17748984)
Shen Shen (3509270)
Xinpeng Han (6269522)
Yanhui Wang (108090)
Zhimin Hao (3843529)
Publication date: 08/01/2024

Setosphaeria turcica, the fungal pathogen responsible for northern corn leaf blight in maize, forms specialized infectious structures called appressoria that are critical for fungal penetration of maize epidermal cells. The Rab family of proteins plays a crucial role in the growth, development, and pathogenesis of many eukaryotic species. Rab4, in particular, is a key regulator of endocytosis and vesicle trafficking and is essential for filamentous growth and successful infection in other fungal pathogens. In this study, we silenced StRAB4 in S. turcica to better understand the function of Rab4 in this plant pathogen. Phenotypically, the mutants exhibited a reduced growth rate, a significant decline in conidia production, and abnormal conidial morphology. These phenotypes indicate that StRab4 plays an instrumental role in regulating mycelial growth and conidial development in S. turcica. Further investigations revealed that StRab4 is a positive regulator of cell wall integrity and melanin secretion. Functional enrichment analysis of differentially expressed genes highlighted primary enrichments in peroxisome pathways, oxidoreductase and catalytic activities, membrane components, and cell wall organization processes. Collectively, our findings emphasize the significant role of StRab4 in S. turcica infection and pathogenicity in maize and provide valuable insights into fungal behavior and disease mechanisms.

FigShare

Genesis of the Weiquan Ag-Polymetallic Deposit in East Tianshan, China: Evidence from Zircon U-Pb Geochronology and C-H-O-S-Pb Isotope Systematics

Author: Albarède
Ault
Bixiang
Black
Charvet
Chen
Chen
Chengze
Clayton
Coleman
Coplen
Deng
Deng
Deng
Deng
Doe
Donoghue
Faure
Fifarek
Gao
Gleeson
Gwalani
Han
Han
Han
Hoefs
Hoefs
Hoskin
Hou
Huang
Huang
Imai
Ishihara
Ishihara
Jahn
Jahn
Jiahao
Jiajun
Jianhua
Jianming
jing
Jingbin
Jingwen
Jinyi
Kamvong
Kezhang
Kezhang
Lei
Leng
Li
Li
Liangshu
Liu
Liu
Liu
Longsheng
Ludwig
Mao
Mao
Mao
Marks
Munoz
Newton
O'Neil
Ohmoto
Ohmoto
Pirajno
Pirajno
Pirajno
Polya
Qin
Robinson
Ruishi
Ruishi
Ruxiong
Rye
Santosh
Schidlowski
Shen
Shen
Sheppard
Song
Su
Su
Sun
Sun
Sun
Tang
Tang
Taofa
Taylor
Taylor
Tianfeng
Ting
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wiedenbeck
Wilkinson
Windley
Wu
Wu
Xiang
Xiao
Xiao
Xiao
Xiao
Xiao
Xiao
Xinghua
Xinkun
Yang
Yanjing
Yanjing
Yanjing
Yinhong
Zartman
Zeng
Zengjie
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhicheng
Zhou
Zhou
Zhou
Şengör
Publication venue: Wiley

Crossref

Chandra

Author: Chengze Liu
Ciotti L.
Desjardins T. D.
Eric W. Peng
Ferrarese L.
Ferrarese L.
Ferrarese L.
Fragos T.
Gallo E.
Gallo E.
Gonzalez A. H.
Hornschemeier A. E.
Hou M.
Jeltema T. E.
Jeltema T. E.
Jordán A.
Jordán A.
Kim D.-W.
Kim D.-W.
Kim S.
Lehmer B. D.
Li Z.
Li Z.
Liu C.
Liu C.
Luo B.
Mei S.
Meicun Hou
Merloni A.
Merritt D.
Mihos J. C.
Mihos J. C.
Mihos J. C.
Miller B.
Miller B.
Mineo S.
Morishita T.
Pandya V.
Park T.
Peacock M. B.
Phillipps S.
Sarazin C. L.
Schnittman J. D.
Shen Y.
Strader J.
Totsuji H.
Tzanavaris P.
Tzanavaris P.
Volonteri M.
Wang Q. D.
Zhiyuan Li
Publication venue: American Astronomical Society

Crossref