Phylogenetic Reconstruction Analysis on Gene Order and Copy Number Variation
Genome rearrangement is one of the main evolutionary mechanisms at the genomic level. Phylogenetic analysis based on rearrangements has played a crucial role in biological research over the past decades, especially with the increasing availability of fully sequenced genomes. In general, phylogenetic analysis aims to solve two problems: the Small Parsimony Problem (SPP) and the Big Parsimony Problem (BPP). Maximum parsimony is a popular approach to both SPP and BPP, and it relies on iteratively solving an NP-hard problem, the median problem. As a result, current median solvers and the phylogenetic inference methods built on them all face serious scalability problems and cannot be applied to datasets of large and distant genomes. In this thesis, we propose a new median solver for gene order data that combines double-cut-and-join (DCJ) sorting with the Simulated Annealing algorithm (SA-Median). Based on this median solver, we built a new phylogenetic inference method that solves both SPP and BPP. Our experimental results show that the new median solver performs very well on simulated datasets and that the phylogenetic inference tool built on it outperforms other existing methods.
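Every DCJ median solver, whatever its search strategy, has to score a candidate median by its total DCJ distance to the three input genomes. The sketch below shows that objective only. It is not SA-Median itself, and it includes no simulated annealing. It assumes each gene occurs exactly once in every genome and that all genomes share identical gene content. It computes the distance with the standard adjacency-graph formula d = N − (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths. The function names and the genome representation are hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: a genome is a list of chromosomes, each given as
# (signed_genes, is_circular). Gene +g is read tail->head, gene -g head->tail.


def _extremities(gene):
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def _adjacencies(genome):
    """List adjacencies (2 extremities) and telomeres (1 extremity)."""
    verts = []
    for genes, circular in genome:
        ext = [x for gene in genes for x in _extremities(gene)]
        # Consecutive genes meet at ext[1]-ext[2], ext[3]-ext[4], ...
        for i in range(1, len(ext) - 1, 2):
            verts.append((ext[i], ext[i + 1]))
        if circular:
            verts.append((ext[-1], ext[0]))  # closes the circle
        else:
            verts.append((ext[0],))   # left telomere
            verts.append((ext[-1],))  # right telomere
    return verts


def dcj_distance(a, b):
    """DCJ distance N - (C + I/2), read off the adjacency graph of a and b."""
    where_a = {x: i for i, v in enumerate(_adjacencies(a)) for x in v}
    where_b = {x: j for j, v in enumerate(_adjacencies(b)) for x in v}
    if where_a.keys() != where_b.keys():
        raise ValueError("genomes must share identical gene content")
    # One edge per shared extremity; parallel edges are kept as duplicates.
    graph = defaultdict(list)
    for x in where_a:
        u, w = ("A", where_a[x]), ("B", where_b[x])
        graph[u].append(w)
        graph[w].append(u)
    seen, cycles, odd_paths = set(), 0, 0
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, n_vertices, degree_sum = [start], 0, 0
        while stack:  # walk one connected component
            node = stack.pop()
            n_vertices += 1
            degree_sum += len(graph[node])
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        n_edges = degree_sum // 2
        if n_edges == n_vertices:      # all vertices have degree 2: a cycle
            cycles += 1
        elif n_edges % 2 == 1:         # a path with an odd number of edges
            odd_paths += 1
    n_genes = len(where_a) // 2
    return n_genes - cycles - odd_paths // 2  # odd_paths is always even


def median_score(candidate, genomes):
    """Median-problem objective: total DCJ distance to the given genomes."""
    return sum(dcj_distance(candidate, g) for g in genomes)
```

For example, a single inversion of gene 2 in a linear chromosome `[1, 2, 3]` gives a distance of 1. A median search such as simulated annealing would try to minimise `median_score` over candidate genomes.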
Cancer is known for its heterogeneity and is regarded as an evolutionary process driven by somatic mutations and clonal expansions. This evolutionary process can be modeled by a phylogenetic tree, and phylogenetic analysis of multiple subclones of cancer cells can facilitate the study of how tumor variants progress. Copy-number aberrations, in the form of segmental amplifications and deletions, occur frequently in many types of tumors. In this thesis, we developed a distance-based method for reconstructing phylogenies from the copy-number profiles of cancer cells. We demonstrate the importance of correcting the edit (minimum) distance to an estimate of the actual number of events. Experimental results show that our approaches are accurate and scalable both in estimating the actual number of evolutionary events between copy-number profiles and in reconstructing phylogenies.
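The abstract does not say which tree-building algorithm consumes the corrected distances. Neighbor joining is a standard distance-based method, and the sketch below uses it purely as a stand-in. It accepts any symmetric distance matrix, for example pairwise distances between copy-number profiles after a correction step that is not shown here. The internal node names `n0`, `n1`, ... are hypothetical.

```python
def neighbor_joining(dist):
    """Neighbor joining on a symmetric dict-of-dicts distance matrix.

    Returns (internal_node, child, branch_length) edges of an unrooted tree.
    """
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    nodes = list(d)
    edges, next_id = [], 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"n{next_id}"
        next_id += 1
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(u, a, la), (u, b, d[a][b] - la)]
        # Distances from the new internal node to every remaining node.
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges
```

On additive distances, neighbor joining recovers the generating tree exactly. The separate correction from edit distance toward the actual number of events matters because minimum distances underestimate long branches.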
High-throughput sequencing of tumor samples has revealed various degrees of genetic heterogeneity between primary tumors and their distant subpopulations. The clonal theory of cancer evolution holds that tumor cells descend from a common origin cell. This origin cell carries an advantageous mutation that causes a clonal expansion, producing a large population of descendant cells. To further investigate cancer progression, phylogenetic analysis of tumor cells is imperative. In this thesis, we developed a novel approach for phylogeny inference that handles both Next-Generation Sequencing and Long-Read Sequencing data. Experimental results show that the proposed method infers the entire phylogenetic progression very accurately on both Next-Generation Sequencing and Long-Read Sequencing data.
In this thesis, we focused on phylogenetic analysis of both gene order data and copy-number variations. Our work can be divided into three parts. First, we developed a new median solver for the median problem and for phylogeny inference under the DCJ model, and applied it to both simulated data and real yeast data. Second, we explored a new approach to infer the phylogeny of copy-number profiles over a wide range of parameters (e.g., different numbers of leaf genomes, different numbers of positions in the genome, and different tree diameters). Third, we focused on phylogeny inference from high-throughput sequencing data and proposed a novel approach to investigate and phylogenetically analyze the entire expansion process of cancer cells on both Next-Generation Sequencing and Long-Read Sequencing data.
Fe-assisted epitaxial growth of 4-inch single-crystal transition-metal dichalcogenides on c-plane sapphire without miscut angle
Epitaxial growth and controllable doping of wafer-scale single-crystal transition-metal dichalcogenides (TMDCs) are two central tasks for extending Moore's law beyond silicon. However, despite considerable efforts, addressing both issues simultaneously under two-dimensional (2D) confinement has yet to be achieved. Here we design an ingenious epitaxial strategy to synthesize record-breaking 4-inch single-crystal Fe-doped TMDC monolayers on industry-compatible c-plane sapphire without miscut angle. In-depth characterizations and theoretical calculations reveal that introducing Fe significantly decreases the formation energy of parallel steps on sapphire surfaces and promotes the edge nucleation of unidirectional TMDC domains (>99%). Ultrahigh electron mobility (~86 cm² V⁻¹ s⁻¹) and a remarkable on/off current ratio (~10⁸) are observed in 4-inch single-crystal Fe-MoS2 monolayers, owing to ultralow contact resistance and perfect Ohmic contact with metal electrodes. This work represents a substantial leap in bridging the synthesis and doping of wafer-scale single-crystal 2D semiconductors without the need for substrate miscut, which should promote further device downscaling and the extension of Moore's law.
Comment: 17 pages, 5 figures
Reconstructing Yeasts Phylogenies and Ancestors from Whole Genome Data
Phylogenetic studies aim to discover evolutionary relationships and histories. These studies are based on similarities in morphological characters and molecular sequences. Currently, widely accepted phylogenetic approaches are based on multiple sequence alignments. They analyze shared gene datasets and concatenate or coalesce the results into a final phylogeny with maximum support. However, these approaches still have limitations and often produce results that conflict with each other. Reconstructing ancestral genomes helps us understand the mechanisms and consequences of evolution. Most existing genome-level phylogeny and ancestor reconstruction methods can only process simplified real genome datasets or simulated datasets with identical genome content, unique genome markers, and limited types of evolutionary events. Here, we provide an alternative way to resolve phylogenetic problems based on analyses of real genome data. We use phylogenetic signals from all types of genome-level evolutionary events and overcome the conflicts that arise in traditional phylogenetic approaches. Further, we build an automated computational pipeline to reconstruct phylogenies and ancestral genomes for two high-resolution real yeast genome datasets. Comparisons with recent studies and publications show that our reconstructed phylogenies and ancestors are very accurate and robust. Finally, we identify and analyze the conserved syntenic blocks among the reconstructed ancestral genomes and present-day yeast species.
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is difficult to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres. This protocol produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell. This allows direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence-per-cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
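The core idea of microsphere calibration can be sketched as follows. A serial dilution gives points of known particle count against blank-subtracted OD. You fit a conversion factor, check each point's fold residual, and then convert sample ODs to particle-equivalent counts.

This is a simplified sketch under stated assumptions, not the study's fitting procedure:

- It fits a plain least-squares line through the origin.
- It assumes points outside the instrument's linear range have already been removed.
- The function names and toy numbers are hypothetical and not data from the study.

```python
def fit_particles_per_od(particles, od, blank_od):
    """Least-squares slope k for particles ~ k * (od - blank_od), through the origin."""
    net = [o - blank_od for o in od]
    return sum(p * x for p, x in zip(particles, net)) / sum(x * x for x in net)


def fold_residuals(particles, od, blank_od, k):
    """Fold error of each calibration point: max(pred/obs, obs/pred), always >= 1."""
    out = []
    for p, o in zip(particles, od):
        pred = k * (o - blank_od)
        out.append(max(pred / p, p / pred))
    return out


def estimated_count(sample_od, blank_od, k):
    """Convert a blank-subtracted sample OD into particle-equivalent counts."""
    return k * (sample_od - blank_od)


# Hypothetical two-fold serial dilution (toy numbers, not data from the study).
particles = [8e8 / 2 ** i for i in range(4)]
od = [0.85, 0.45, 0.25, 0.15]
blank = 0.05
k = fit_particles_per_od(particles, od, blank)
residuals = fold_residuals(particles, od, blank, k)
share_within_1_2 = sum(r < 1.2 for r in residuals) / len(residuals)
```

The share of residuals under 1.2-fold is the kind of quality-control figure the abstract reports.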
Bi-stream CNN Down Syndrome screening model based on genotyping array
Background: Human Down syndrome (DS) is usually caused by genomic micro-duplications and dosage imbalances of human chromosome 21. It is associated with many genomic and phenotypic abnormalities. DS occurs in about 1 per 1,000 births worldwide, a very high rate, yet researchers have not found any effective method to cure it. Currently, the most efficient means of DS prevention are screening and early detection.
Methods: In this study, we used deep learning techniques to analyze a set of Illumina genotyping array data. We built a bi-stream convolutional neural network (CNN) model to screen for and predict the occurrence of DS. First, we built the input images by converting the intensities of each SNP site into chromosome SNP maps. Next, we proposed a bi-stream CNN architecture with nine layers and two branch models. We merged the two CNN branches into one model at the fourth convolutional layer and output the prediction in the last layer.
Results: Our bi-stream CNN model achieved an average accuracy of 99.3% with very low false-positive and false-negative rates, which is necessary for further applications in disease prediction and medical practice. We further visualized the feature maps and learned filters from intermediate convolutional layers, which revealed genomic patterns and correlated SNP variations in human DS genomes. We also compared our method with other CNN and traditional machine-learning models, and analyzed the characteristics and strengths of our bi-stream CNN model.
Conclusions: Our bi-stream model used two CNN branches to learn local genome features and regional patterns among adjacent genes and SNP sites from two chromosomes simultaneously. It achieved the best performance on all evaluation metrics when compared with two single-stream CNN models and three traditional machine-learning algorithms. The visualized feature maps also provide opportunities to study the genomic markers and pathway components associated with human DS, offering insights for the development of gene therapy and genomic medicine.
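The abstract says SNP intensities are converted into chromosome SNP maps and fed to two branches, one per chromosome. It does not specify the layout. The sketch below assumes one simple possibility: intensities are taken in chromosome order, laid out row-major into a fixed-width grid, and the last row is zero-padded. The function names and the `width` parameter are hypothetical, and the CNN itself is not shown.

```python
def snp_map(intensities, width, pad=0.0):
    """Lay out per-SNP intensities (in chromosome order) as a 2-D grid.

    Rows are filled left to right; the last row is padded with `pad`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(intensities), width):
        row = list(intensities[start:start + width])
        row += [pad] * (width - len(row))
        rows.append(row)
    return rows


def bi_stream_inputs(chrom_a, chrom_b, width):
    """One map per chromosome, one for each branch of a two-branch model."""
    return snp_map(chrom_a, width), snp_map(chrom_b, width)
```

Keeping chromosome order in the grid means a convolution filter sees neighbouring SNP sites together, which matches the abstract's stated aim of learning patterns among adjacent genes and SNP sites.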
A computational approach for examining the roots and spreading patterns of fake news: Evolution tree analysis
To improve the flow of quality information and combat fake news on social media, it is essential to identify the origins and evolution patterns of false information. However, scholarship dedicated to this area is lacking. Using a recent development in the field of computational network science (i.e., evolution tree analysis), this study examined this issue in the context of the 2016 US presidential election. By retrieving 307,738 tweets about 30 fake and 30 real news stories, we examined the root content, producers of original source, and evolution patterns. The findings revealed that root tweets about fake news were mostly generated by accounts from ordinary users, but they often included a link to non-credible news websites. Additionally, we observed significant differences between real and fake news stories in terms of evolution patterns. In our evolution tree analysis, tweets about real news showed wider breadth and shorter depth than tweets about fake news. The results also indicated that tweets about real news spread widely and quickly, but tweets about fake news underwent a greater number of modifications in content over the spreading process.
• The evolution tree analysis was performed on tweets about real and fake news stories.
• Most root tweets about fake news were generated by ordinary users.
• Root tweets about fake news often included a link to non-credible news websites.
• Tweets about fake news undergo frequent modifications in the original content over the spreading process.
• Tweets about real news spread widely without modifications in the original content.
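The breadth and depth comparison can be made concrete with a small sketch. The study's exact definitions may differ; here depth is assumed to be the number of edges on the longest root-to-node path, and breadth the largest number of tweets at any single level. The input is a hypothetical map from each tweet id to the tweet it was derived from, with `None` marking the root tweet.

```python
from collections import defaultdict


def tree_depth_and_breadth(parent):
    """Depth = most edges on a root-to-node path; breadth = largest level size."""
    children = defaultdict(list)
    roots = []
    for node, par in parent.items():
        if par is None:
            roots.append(node)
        else:
            children[par].append(node)
    if len(roots) != 1:
        raise ValueError("expected exactly one root tweet")
    level, depth, breadth = [roots[0]], 0, 1
    while True:  # breadth-first, one level at a time
        nxt = [c for n in level for c in children[n]]
        if not nxt:
            break
        depth += 1
        breadth = max(breadth, len(nxt))
        level = nxt
    return depth, breadth
```

Under these definitions, a star-shaped cascade (many direct copies of the root) has large breadth and depth 1, the pattern reported for real news. A chain of successive modifications is narrow and deep.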
Robust multiferroic in interfacial modulation synthesized wafer-scale one-unit-cell of chromium sulfide
Multiferroic materials offer a promising avenue for manipulating digital information by leveraging the cross-coupling between ferroelectric and ferromagnetic orders. Ferroelectricity has been uncovered through ion displacement or interlayer sliding, but the design and wafer-scale synthesis of one-unit-cell multiferroic materials have yet to be realized. Here we develop an interface-modulated strategy to grow 1-inch one-unit-cell non-layered chromium sulfide with unidirectional orientation on industry-compatible c-plane sapphire. The interfacial interaction between the chromium sulfide and the substrate induces intralayer sliding of self-intercalated chromium atoms and breaks space-inversion symmetry. As a result, robust room-temperature ferroelectricity (retained for more than one month) with ultrahigh remanent polarization emerges in one-unit-cell chromium sulfide. In addition, long-range ferromagnetic order is observed with a Curie temperature approaching 200 K, almost twice that of its bulk counterpart. Magnetoelectric coupling is also confirmed, which makes 1-inch one-unit-cell chromium sulfide the largest and thinnest multiferroic material.