
    Imaging of the Space-time Structure of a Vortex Generator in Supersonic Flow

    Abstract: The fine space-time structure of the flow around a vortex generator (VG) in supersonic flow is studied with the nanoparticle-based planar laser scattering (NPLS) method in a quiet supersonic wind tunnel. The fine coherent structures in the symmetry plane of the flow field around the VG are imaged with NPLS. The spatial structure and temporal evolution of the vortical structures are analyzed; they show periodic evolution and geometric similarity, and they move rapidly while changing slowly. Because the NPLS system yields flow images at high temporal and spatial resolution, the positions of large-scale structures can be extracted precisely from these images, and their positions and velocities can be evaluated with edge-detection and correlation algorithms. The shocklet structures induced by the vortices are also imaged, and the generation and development of these shocklets are discussed.
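    The edge-detection and correlation step can be illustrated with a minimal sketch. It assumes two intensity profiles of equal length, sampled along the streamwise direction from consecutive NPLS frames, plus an illustrative pixel size and inter-frame time. The function names, the central-difference edge detector, and all numbers are hypothetical stand-ins, not the paper's actual algorithms or data.

```python
def edge_position(profile):
    """Index of the steepest intensity change, using a central-difference gradient.

    A hypothetical 1-D stand-in for an image edge detector; ties go to the
    first (most upstream) maximum.
    """
    grads = [abs(profile[i + 1] - profile[i - 1]) / 2 for i in range(1, len(profile) - 1)]
    return 1 + max(range(len(grads)), key=grads.__getitem__)


def correlation_shift(profile_a, profile_b, max_shift):
    """Integer shift s (|s| <= max_shift) maximising sum_i a[i] * b[i + s].

    Both profiles (assumed equal length) are mean-subtracted first; a positive
    s means the pattern in profile_a reappears further downstream in profile_b.
    """
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    a = [v - mean_a for v in profile_a]
    b = [v - mean_b for v in profile_b]
    best_shift, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        # Only indices where both profiles overlap contribute to the score.
        score = sum(a[i] * b[i + s] for i in range(n) if 0 <= i + s < n)
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


def convection_velocity(shift_px, pixel_size_m, dt_s):
    """Convert a pixel displacement between two frames into a velocity in m/s."""
    return shift_px * pixel_size_m / dt_s


# Hypothetical example: a bright structure moves 5 pixels between frames.
frame1 = [0.0] * 60
frame2 = [0.0] * 60
for offset, value in enumerate([1.0, 2.0, 3.0, 2.0, 1.0]):
    frame1[18 + offset] = value
    frame2[23 + offset] = value

shift = correlation_shift(frame1, frame2, max_shift=10)   # → 5
edge_shift = edge_position(frame2) - edge_position(frame1)  # → 5
u = convection_velocity(shift, pixel_size_m=60e-6, dt_s=0.5e-6)
```

    On real images, the same idea is usually applied in 2-D over interrogation windows; the 1-D version only shows why a sharp, high-resolution image makes the correlation peak and the edge location well defined.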

    Data Mining In Epigenetic Modification and Gene Expression

    University of Technology Sydney. Faculty of Engineering and Information Technology. This thesis employs data mining techniques to discover domain knowledge in epigenetic modification and gene expression profiles. Computational methods are developed for three research questions: how to accurately predict DNA N⁴-methylcytosine sites, how to precisely identify mRNA N⁶-methyladenosine sites, and how to identify gene expression profile markers for lung cancer. The proposed methods aim to improve the performance of computational methods by constructing efficient feature spaces, optimizing machine learning schemes, solving the data imbalance issue, and employing novel statistical analysis approaches, so as to provide researchers with efficient computational tools. DNA N⁴-methylcytosine (4mC) is a critical epigenetic modification that plays various roles in the restriction-modification system. Because experimental laboratory detection is costly, computational methods for identifying 4mC in DNA sequences have been explored in recent years. However, state-of-the-art methods have limited performance because of a lack of effective sequence features and the ad hoc choice of learning algorithms. Chapter 3 proposes a new method with a novel sequence feature space and machine learning scheme. For sequence encoding, five essential sequence features are integrated into a 292-dimensional feature space that represents both global and local sequence characteristics. A feature selection scheme is then built in which the feature importance scores produced while training an XGBoost model serve as the selection criterion. Finally, an SVM-based prediction model is trained on the selected features and optimized by 10-fold cross-validation. In the results, the impact of feature selection on model performance is evaluated with an independent test.
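    The select-then-classify pipeline can be sketched in plain Python. This is a minimal sketch under stated assumptions: the importance scores are taken as given (for example, exported from a trained XGBoost model) rather than computed here; a linear SVM trained with the Pegasos sub-gradient method stands in for whatever SVM kernel and solver the thesis actually used; and all function names and data are hypothetical.

```python
import random


def select_features(importances, k):
    """Indices (in original order) of the k features with the highest importance."""
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:k])


def project(rows, keep):
    """Keep only the selected feature columns of each sample."""
    return [[row[j] for j in keep] for row in rows]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0, class_weight=None):
    """Linear SVM trained with Pegasos-style stochastic sub-gradient descent.

    Minimises lam/2 * ||w||^2 + mean weighted hinge loss. Labels must be +1/-1;
    a constant 1.0 feature is appended so the last weight acts as a bias.
    class_weight, if given, must map both labels to a misclassification cost.
    """
    rng = random.Random(seed)
    cost = class_weight or {1: 1.0, -1: 1.0}
    Xb = [row + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regulariser
            if margin < 1.0:  # hinge loss is active for this sample
                step = eta * cost[y[i]] * y[i]
                w = [wj + step * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, row):
    return 1 if sum(wj * xj for wj, xj in zip(w, row + [1.0])) >= 0 else -1


def k_fold_accuracy(X, y, k=10, **svm_kwargs):
    """Accuracy pooled over k folds; fold f holds samples f, f + k, f + 2k, ..."""
    correct = 0
    for f in range(k):
        test_idx = set(range(f, len(X), k))
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        w = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx], **svm_kwargs)
        correct += sum(predict(w, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical toy data: only feature 0 separates the two classes.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[2.0 * label + rng.uniform(-0.5, 0.5), rng.uniform(-1, 1), rng.uniform(-1, 1)] for label in y]
importances = [0.80, 0.12, 0.08]  # hypothetical scores, e.g. from an XGBoost model
keep = select_features(importances, k=1)  # → [0]
acc = k_fold_accuracy(project(X, keep), y, k=10)
```

    The `class_weight` argument is one simple way to make the hinge loss cost-sensitive, which is the idea Chapter 4 applies to imbalanced m⁶A data. The folds here are formed by striding through the samples rather than by stratified sampling.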
The proposed method outperforms three state-of-the-art predictors in both the independent test and 10-fold cross-validation, and two case studies demonstrate its effectiveness in practical situations. N⁶-methyladenosine (m⁶A) is widely involved in mRNA metabolism and embryogenesis, and multiple computational predictors of human mRNA m⁶A sites have been developed. However, existing methods have two main drawbacks: first, they learn inadequately from imbalanced training data; second, their sequence text features do not represent m⁶A sequence characteristics well. Chapter 4 proposes cost-sensitive learning to address the data imbalance. This approach learns from the entire imbalanced dataset without randomly selecting a subset of negative samples. For sequence representation, site location, entropy features, and specific single nucleotide polymorphism (SNP) positions are introduced as new features, which improve performance significantly. Compared with existing predictors, the method achieves better correctness and robustness in both independent tests and case studies. The results suggest that imbalance learning is a promising way to improve m⁶A prediction. Early diagnosis of lung cancer has long been a challenging problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution. Chapter 5 presents a novel approach to identify marker genes and define the boundaries of gene expression profiles for human lung cancer. By calculating the kernel maximum mean discrepancy (MMD), the method evaluates the expression differences among normal, normal adjacent to tumor (NAT), and tumor samples. For marker genes, the expression level boundaries between groups are defined using information entropy theory.
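The kernel MMD criterion can be illustrated with a short sketch. It assumes a Gaussian kernel with a hand-picked bandwidth `sigma`, uses the biased (V-statistic) estimate of the squared MMD, and compares two groups at a time; the gene names and expression values are hypothetical, and the thesis may use a different kernel, bandwidth, or estimator.

```python
import math


def gaussian_kernel(a, b, sigma):
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)] for 1-D samples."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def rank_genes_by_mmd(group_a, group_b, sigma=1.0):
    """Gene names sorted from largest to smallest expression discrepancy.

    group_a and group_b map a gene name to its list of expression values.
    """
    scores = {g: mmd_squared(group_a[g], group_b[g], sigma) for g in group_a}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression values (e.g. log scale) for normal and tumor samples.
normal = {"GENE_A": [5.1, 4.9, 5.0, 5.2], "GENE_B": [3.0, 3.2, 2.9, 3.1]}
tumor = {"GENE_A": [5.0, 5.1, 4.8, 5.2], "GENE_B": [6.1, 5.8, 6.3, 6.0]}
print(rank_genes_by_mmd(normal, tumor))  # → ['GENE_B', 'GENE_A']
```

Unlike a t-test or fold change, which look at means only, the kernel MMD compares whole distributions, so a gene whose spread changes between groups can also score highly. With three groups (normal, NAT, tumor), the same function could be applied to each pair.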
Compared with two conventional methods, the t-test and fold change, the genes selected by MMD values perform better under all metrics in 10-fold cross-validation. GO and KEGG enrichment analyses further validate the discovered marker genes in functional pathways. Finally, the ten most meaningful genes are chosen as lung cancer markers, and their expression profile boundaries are calculated. The proposed method identifies marker genes more accurately than conventional differential expression analysis (DEA) methods and provides a reliable way to define gene expression level boundaries.
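One common way to turn information entropy into an expression-level boundary is to pick the cut point that minimises the size-weighted class entropy of the two resulting groups, as in entropy-based discretisation. The sketch below assumes that criterion and a single boundary between two sample groups; it illustrates the general idea and is not necessarily the exact entropy formulation used in the thesis. The values and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def entropy_boundary(values, labels):
    """Cut point between consecutive sorted values minimising the weighted class entropy."""
    pairs = sorted(zip(values, labels))
    best_cut, best_h = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary can separate identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best_cut, best_h = (pairs[i - 1][0] + pairs[i][0]) / 2, h
    return best_cut


# Hypothetical expression values of one marker gene in two groups.
values = [1.0, 1.5, 0.5, 3.0, 3.5, 4.0]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
print(entropy_boundary(values, labels))  # → 2.25
```

With more than two groups, the same search could be repeated on adjacent pairs of groups to obtain one boundary per pair.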