955 research outputs found
Clustering of Leukemia Patients via Gene Expression Data Analysis
This thesis attempts to cluster leukemia patients described by gene expression data and to discover the few most discriminating genes responsible for the clustering. A combined approach of Principal Direction Divisive Partitioning (PDDP) and bisect K-means is applied to the selected leukemia dataset, and both unsupervised and supervised methods are considered in order to obtain the best results. As shown by the experimental results and the predefined reference, the combination of PDDP and bisect K-means successfully clusters the leukemia patients and efficiently discovers some significant genes that can serve as discriminators for the clustering. The combined approach works well for automatically clustering leukemia patients based solely on gene expression information, and it has great potential for solving similar problems. The few genes discovered may provide very important information for the diagnosis of leukemia.
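As an illustration of the PDDP step named in the abstract above, the following is a minimal Python sketch of a single principal-direction split: center the data, estimate the leading principal direction, and divide the samples by the sign of their projection onto it. It is not the thesis's code. The function names, the power-iteration solver, the feature-ranking helper, and the toy data are all assumptions made for illustration.

```python
import math
import random

def principal_direction(rows, iters=200, seed=0):
    """Center the rows and estimate their leading principal direction.

    Uses plain power iteration on C^T C (C = centered data), which is an
    illustrative choice; a production version would likely use an SVD routine.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)  # random start avoids an unlucky orthogonal guess
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all rows identical: no direction to find
            break
        v = [x / norm for x in w]
    return centered, v

def pddp_split(rows):
    """Split sample indices by the sign of their projection on the direction."""
    centered, v = principal_direction(rows)
    left, right = [], []
    for i, c in enumerate(centered):
        score = sum(c[j] * v[j] for j in range(len(v)))
        (left if score <= 0 else right).append(i)
    return left, right, v

def top_features(v, k):
    """Hypothetical helper: indices of the k largest-magnitude components."""
    return sorted(range(len(v)), key=lambda j: -abs(v[j]))[:k]

# Toy data (made up): six samples, three features; two obvious groups.
samples = [[0, 0, 1], [1, 0, 0], [0, 1, 0],
           [10, 10, 1], [11, 10, 0], [10, 11, 0]]
left, right, direction = pddp_split(samples)
print(left, right, top_features(direction, 2))
```

Which group ends up as `left` depends on the sign of the estimated direction, which is arbitrary. Ranking features by the magnitude of their component in the direction is one simple way to see which features drive a split; whether the thesis ranks genes this way is not stated in the abstract.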
Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison
Sequence analysis and structure analysis are two of the fundamental areas of bioinformatics research. This dissertation discusses, specifically, protein structure related problems including protein structure alignment and query, and genome sequence related problems including haplotype reconstruction and genome rearrangement. It first presents an algorithm for pairwise protein structure alignment that is tested with structures from the Protein Data Bank (PDB). In many cases it outperforms two other well-known algorithms, DaliLite and CE. The preliminary algorithm is a graph-theory based approach, which uses the concept of "stars" to reduce the complexity of clique-finding algorithms. The algorithm is then improved by introducing "double-center stars" in the graph and applying a self-learning strategy. The updated algorithm is tested with a much larger set of protein structures and shown to be an improvement in accuracy, especially in cases of weak similarity. A protein structure query algorithm is designed to search for similar structures in the PDB, using the improved alignment algorithm. It is compared with SSM and shows better performance with lower maximum and average Q-score for missing proteins. An interesting problem dealing with the calculation of the diameter of a 3-D sequence of points arose, and its connection to sublinear time computation is discussed. The diameter calculation of a 3-D sequence is approximated by a series of sublinear time deterministic, zero-error and bounded-error randomized algorithms, and we have obtained a series of separations about the power of sublinear time computations. This dissertation also discusses two genome sequence related problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices with incomplete and inconsistent errors. The experiments with simulated data show both high accuracy and speed, conforming to the theoretically provable efficiency and accuracy of the algorithm.
Finally, a genome rearrangement problem is studied. The concept of non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity to within a factor of n^(1-ε) is proven to be NP-hard. Interestingly, polynomial time algorithms are presented for several practical cases.
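To make the diameter problem mentioned in the abstract above concrete, here is a small Python sketch of two textbook baselines: the exact quadratic-time computation and a one-pass lower bound that is within a factor of 2 by the triangle inequality. Neither is sublinear, and neither is taken from the dissertation. The function names and sample points are assumptions for illustration only.

```python
import math

def exact_diameter(points):
    """Largest pairwise Euclidean distance, checking all O(n^2) pairs."""
    return max(
        (math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        default=0.0,
    )

def farthest_from_first(points):
    """Distance r from the first point to its farthest point.

    By the triangle inequality, r <= diameter <= 2 * r, so one linear pass
    gives a factor-2 estimate.
    """
    if not points:
        return 0.0
    return max(math.dist(points[0], q) for q in points)

# Made-up 3-D points for demonstration.
pts = [(0, 0, 0), (1, 0, 0), (0, 3, 4), (-1, 0, 0)]
d = exact_diameter(pts)
r = farthest_from_first(pts)
print(d, r, r <= d <= 2 * r)
```

The gap between the single-pass bound and the exact answer is the kind of accuracy-versus-time trade-off that approximation algorithms for this problem aim to control.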
Influence of Gate Separation on IGZO Thin Film Transistor Behavior
Metal oxides are attracting great interest in the electronics field as promising active-layer candidates for various uses, including wearable sensors, flexible displays, and LED displays. Current manufacturing relies on cleanroom processes, which can be time-consuming and costly. Consequently, a repeatable and reliable process for fabricating stable, large-scale TFTs is needed to meet manufacturing and consumer needs. Metal oxides have proven their value for next-generation displays through their high-performance electrical characteristics, abundance, and straightforward fabrication. In particular, the Indium-Gallium-Zinc-Oxide (IGZO) system has demonstrated stability as well as high electrical performance. Since then, IGZO TFTs have prompted extensive research in the solution-processing field. Because conventional methods are limited by sample size and processing time, solution processing has opened a gateway to more flexible, even large-scale, fabrication with far fewer steps and less processing time. The major drawbacks of solution processing are its instability, uncertainty, and weaker device performance compared to devices fabricated in a cleanroom environment. In this work, several methods were investigated to improve device performance, including direct light patterning and UV and ozone treatment of the sample surface. A gallium-rich solution-processed IGZO TFT with a 2:2:1 molar ratio was made with the direct light patterning method and compared to a conventionally made IGZO TFT. It is shown that direct light patterning could drastically enhance device stability and performance. Other factors such as cluster size, interface treatment, and etchant composition could greatly affect the outcome as well.
Efficient protein alignment algorithm for protein search
© 2010 Lu et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License
Benign Adversarial Attack: Tricking Models for Goodness
In spite of the successful application in many fields, machine learning
models today suffer from notorious problems like vulnerability to adversarial
examples. Beyond falling into the cat-and-mouse game between adversarial attack
and defense, this paper provides an alternative perspective on adversarial
examples and explores whether we can exploit them in benign
applications. We first attribute adversarial examples to the human-model
disparity in employing non-semantic features. While largely ignored in
classical machine learning mechanisms, non-semantic features have three
interesting characteristics: they are (1) exclusive to the model, (2) critical to
inference, and (3) utilizable as features. Inspired by this, we present the brave
new idea of benign adversarial attack, which exploits adversarial examples for
good in three directions: (1) an adversarial Turing test, (2) rejecting
malicious model application, and (3) adversarial data augmentation. Each
direction is presented with its motivation, a justification analysis, and
prototype applications to showcase its potential.
Comment: ACM MM2022 Brave New Idea
Two-Dimensional Numerical Modelling of a Moored Floating Body under Sloping Seabed Conditions
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Masked autoencoding has shown excellent performance on self-supervised video
representation learning. Temporal redundancy has led to a high masking ratio
and customized masking strategy in VideoMAE. In this paper, we aim to further
improve the performance of video masked autoencoding by introducing a motion
guided masking strategy. Our key insight is that motion is a general and unique
prior in video, which should be taken into account during masked pre-training.
Our motion guided masking explicitly incorporates motion information to build
a temporally consistent masking volume. Based on this masking volume, we can track
the unmasked tokens in time and sample a set of temporally consistent cubes from
videos. These temporally aligned unmasked tokens further relieve the
information leakage issue in time and encourage MGMAE to learn more useful
structure information. We implement our MGMAE with an online efficient optical
flow estimator and backward masking map warping strategy. We perform
experiments on the datasets of Something-Something V2 and Kinetics-400,
demonstrating the superior performance of our MGMAE to the original VideoMAE.
In addition, we provide a visualization analysis to illustrate that our MGMAE
can sample temporally consistent cubes in a motion-adaptive manner for more
effective video pre-training.
Comment: ICCV 2023 camera-ready version