Kernel-based distance metric learning for microarray data classification
BACKGROUND: The most fundamental task using gene expression data in clinical oncology is to classify tissue samples according to their gene expression levels. Compared with traditional pattern classification problems, gene expression-based classification is typically characterized by high dimensionality and small sample size, which make the task quite challenging. RESULTS: In this paper, we present a modified K-nearest-neighbor (KNN) scheme for cancer classification using microarray data, based on learning an adaptive distance metric in the data space. The distance metric, derived from a data-dependent kernel optimization procedure, can substantially increase the class separability of the data and, consequently, lead to a significant improvement in the performance of the KNN classifier. Extensive experiments show that the performance of the proposed kernel-based KNN scheme is competitive with that of sophisticated classifiers such as support vector machines (SVMs) and uncorrelated linear discriminant analysis (ULDA) in classifying gene expression data. CONCLUSION: A novel distance metric is developed and incorporated into the KNN scheme for cancer classification. This metric can substantially increase the class separability of the data in the feature space and, hence, lead to a significant improvement in the performance of the KNN classifier.
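The idea of replacing the Euclidean metric in KNN with a kernel-induced distance can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian base kernel k, a data-dependent (conformally scaled) kernel k~(x, y) = q(x) q(y) k(x, y) with q(x) = alpha_0 + sum_j alpha_j k1(x, a_j), and hypothetical anchor points a_j and coefficients alpha that are simply given rather than optimized. The squared feature-space distance is d(x, y)^2 = k~(x, x) - 2 k~(x, y) + k~(y, y). The function names `gaussian_kernel`, `q_factor`, and `kernel_knn_predict` are illustrative.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def q_factor(X, anchors, alpha, gamma1):
    """Hypothetical factor q(x) = alpha_0 + sum_j alpha_j * k1(x, a_j)."""
    return alpha[0] + gaussian_kernel(X, anchors, gamma1) @ alpha[1:]

def kernel_knn_predict(X_train, y_train, X_test, k, gamma, anchors, alpha, gamma1):
    """Majority-vote KNN under the distance induced by k~(x, y) = q(x) q(y) k(x, y)."""
    q_tr = q_factor(X_train, anchors, alpha, gamma1)
    q_te = q_factor(X_test, anchors, alpha, gamma1)
    K_cross = q_te[:, None] * q_tr[None, :] * gaussian_kernel(X_test, X_train, gamma)
    # d^2 = k~(x,x) - 2 k~(x,y) + k~(y,y); for a Gaussian k, k(x,x) = 1, so k~(x,x) = q(x)^2.
    d2 = q_te[:, None] ** 2 - 2.0 * K_cross + q_tr[None, :] ** 2
    neighbors = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest training samples
    preds = []
    for row in neighbors:
        labels, counts = np.unique(y_train[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Toy usage with hypothetical coefficients (not optimized here).
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.0], [1.05, 1.0]])
alpha = np.r_[1.0, 0.1 * np.ones(len(X_train))]
pred = kernel_knn_predict(X_train, y_train, X_test, k=3, gamma=1.0,
                          anchors=X_train, alpha=alpha, gamma1=1.0)
print(pred)
```

With alpha_0 = 1 and all other alpha_j = 0, q is identically 1 and d^2 = 2 - 2k(x, y), which is monotone in the Euclidean distance, so the scheme then ranks neighbors exactly as plain Euclidean KNN does; the learned coefficients are what make the metric adaptive.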
The Validity of CET-6 among Chinese Students Studying Overseas
This paper examines the validity of the College English Test Band 6 (CET-6) for Chinese students living overseas, asking whether CET-6 scores truly reflect students' English language ability and whether they can serve as proof of English language proficiency. We conducted a quantitative survey of 50 participants at Universiti Putra Malaysia (UPM). Analysis of the collected data revealed several current issues with the CET-6 assessment standards, and suggestions are given to improve the validity of CET-6.
A ship detector applying Principal Component Analysis to the polarimetric Notch Filter
Ship detection using polarimetric synthetic aperture radar (PolSAR) data has attracted considerable attention in recent years. Polarimetry provides information about the scattering mechanisms of targets, which helps discriminate between ships and sea clutter. This enhancement is particularly valuable when detecting smaller vessels in rough sea states. This work builds on a ship detector called the Geometrical Perturbation-Polarimetric Notch Filter (GP-PNF) and aims to improve its performance, especially when fewer polarimetric channels are available (e.g., dual-polarimetric data). The idea is to design a new polarimetric feature vector containing additional features known to help separate ships from sea clutter. Principal Component Analysis (PCA) is then used to reduce the dimensionality of the new feature space. Experiments on four real Sentinel-1 datasets are carried out to demonstrate the validity of the proposed method and to compare it against other ship detectors. The experimental results show that the proposed algorithm not only reduces false alarms significantly but also enhances the target-to-clutter ratio (TCR), so that it can detect weaker ships more effectively.
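The PCA step that compresses the enlarged feature space can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the polarimetric features have already been computed per pixel and stacked into an (n_pixels, n_features) array; the function name `pca_reduce`, the choice to standardize features, and the synthetic data are assumptions rather than details from the paper, and the GP-PNF detector itself is not shown.

```python
import numpy as np

def pca_reduce(features, n_components, standardize=True):
    """Project per-pixel feature vectors onto their leading principal components.

    features: array of shape (n_pixels, n_features).
    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = features - features.mean(axis=0)
    if standardize:
        # Features such as powers and ratios have different units; scale to unit variance.
        X = X / X.std(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = X @ eigvecs[:, :n_components]
    return scores, eigvals[:n_components] / eigvals.sum()

# Synthetic stand-in for a per-pixel feature stack (hypothetical data).
rng = np.random.default_rng(0)
t = rng.normal(size=500)
features = np.column_stack([t, 2.0 * t + 0.01 * rng.normal(size=500), rng.normal(size=500)])
scores, ratio = pca_reduce(features, n_components=2)
print(scores.shape, ratio)
```

In the synthetic example the first two columns are nearly collinear, so after standardization the first component captures roughly two thirds of the variance; redundant features collapse onto fewer components, which is the point of applying PCA after enlarging the feature vector.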
OpenSARUrban: A Sentinel-1 SAR Image Dataset for Urban Interpretation
The Sentinel-1 mission provides freely accessible synthetic aperture radar (SAR) images at a specific resolution, an opportunity for urban interpretation that is of paramount importance for Earth observation. In parallel, with the rapid development of advanced technologies, especially deep learning, there is an urgent need for a large-scale SAR dataset to drive urban interpretation. This paper presents OpenSARUrban, a Sentinel-1 dataset dedicated to urban interpretation from SAR images, and describes its well-defined hierarchical annotation scheme, data collection, well-established procedures for dataset construction and organization, properties, visualizations, and applications. In particular, OpenSARUrban provides 33,358 image patches of SAR urban scenes covering 21 major cities in China, with 10 different categories, 4 kinds of formats, and 2 kinds of polarization modes, and it has 5 essential properties: large scale, diversity, specificity, reliability, and sustainability. These properties make several goals achievable for OpenSARUrban. The first is to support urban target characterization. The second is to help develop applicable and advanced algorithms for Sentinel-1 urban target classification. The dataset is visualized from a manifold perspective to give an intuitive understanding. Besides a detailed description and visualization of the dataset, we present results of some benchmark algorithms, demonstrating that this dataset is practical and challenging. Notably, improving classification performance on the whole dataset and dealing with data imbalance are especially challenging.
Long-term Variations of CO2 Trapped in Different Mechanisms in Deep Saline Formations: A Case Study of the Songliao Basin, China
The geological storage of CO2 in deep saline formations is increasingly seen as a viable strategy for reducing the release of greenhouse gases to the atmosphere. There are numerous sedimentary basins in China in which a number of suitable CO2 geological reservoirs are potentially available. To identify the multiphase processes, geochemical changes, mineral alteration, and CO2 trapping mechanisms after CO2 injection, reactive geochemical transport simulations were performed with a simple 2D model, using the mineralogical composition and water chemistry of a deep saline formation in the Songliao Basin. Results indicate that the different storage forms of CO2 vary with time. During the injection period, a large amount of CO2 remains as a free supercritical phase (gas trapping), and the amount dissolved in the formation water (solubility trapping) gradually increases. Later, gas trapping decreases, solubility trapping increases significantly due to migration and diffusion of the CO2 plume, and the amount trapped by carbonate minerals increases gradually with time. The residual CO2 gas keeps dissolving into groundwater and precipitating carbonate minerals. For the Songliao Basin sandstone, variations in the reaction rate and abundance of chlorite and in plagioclase composition significantly affect the estimates of mineral alteration and of CO2 storage in the different trapping mechanisms. The effects of vertical permeability and residual gas saturation on overall storage are smaller than those of the geochemical factors; however, they can affect the spatial distribution of the injected CO2 in the formations. The CO2 mineral trapping capacity could be on the order of ten kilograms per cubic meter of medium for the Songliao Basin sandstone, and may be higher depending on the composition of primary aluminosilicate minerals, especially their Ca, Mg, and Fe content.
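One way to arrive at a mineral trapping capacity in kilograms per cubic meter of medium is to convert a simulated change in carbonate-mineral volume fraction into moles of mineral, and then into CO2 mass. A minimal sketch of that arithmetic, assuming calcite and siderite as the carbonate phases, approximate literature molar volumes (about 36.93 and 29.38 cm^3/mol), one mole of CO2 per mole of mineral, and hypothetical volume-fraction changes; the function name `mineral_trapped_co2` is illustrative and the numbers are not results from the study.

```python
# Approximate molar volumes in cm^3/mol (assumed literature values).
MOLAR_VOLUME_CM3 = {"calcite": 36.934, "siderite": 29.378}
CO2_MOLAR_MASS_KG = 0.04401  # kg of CO2 per mol

def mineral_trapped_co2(delta_volume_fraction):
    """Return kg of CO2 fixed per m^3 of medium.

    delta_volume_fraction maps a carbonate mineral name to its change in
    volume fraction (m^3 mineral per m^3 medium). Calcite (CaCO3) and
    siderite (FeCO3) each contain one mole of CO2 per mole of mineral.
    """
    total_kg = 0.0
    for mineral, dvf in delta_volume_fraction.items():
        molar_volume_m3 = MOLAR_VOLUME_CM3[mineral] * 1e-6  # cm^3 -> m^3
        moles_per_m3 = dvf / molar_volume_m3
        total_kg += moles_per_m3 * CO2_MOLAR_MASS_KG
    return total_kg

# Hypothetical volume-fraction changes, for illustration only.
example = {"calcite": 0.005, "siderite": 0.003}
print(f"{mineral_trapped_co2(example):.1f} kg CO2 per m^3 of medium")
```

With these hypothetical changes (0.5 % calcite and 0.3 % siderite by volume) the result is roughly 10.5 kg of CO2 per cubic meter, which shows how a carbonate volume-fraction change below one percent already reaches the order of magnitude stated in the abstract.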
The large area detector onboard the eXTP mission
The Large Area Detector (LAD) is the high-throughput, spectral-timing instrument onboard the eXTP mission, a flagship mission of the Chinese Academy of Sciences and the China National Space Administration, with a large European participation coordinated by Italy and Spain. The eXTP mission is currently in its phase B study, with a target launch at the end of 2027. The eXTP scientific payload includes four instruments (SFA, PFA, LAD and WFM) offering unprecedented simultaneous wide-band X-ray timing and polarimetry sensitivity. The LAD instrument is based on the design originally proposed for the LOFT mission. It envisages a deployed effective area of 3.2 m² in the 2-30 keV energy range, achieved through two technologies: large-area Silicon Drift Detectors, offering a spectral resolution of up to 200 eV FWHM at 6 keV, and capillary plate collimators, limiting the field of view to about 1 degree. In this paper, we provide an overview of the LAD instrument design, its current status of development, and its anticipated performance.
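The quoted LAD spectral resolution can be put in context with two standard conversions: the resolving power E/dE and, assuming a Gaussian line response (an assumption, not stated in the abstract), the equivalent standard deviation via FWHM = 2 sqrt(2 ln 2) sigma, about 2.355 sigma. A minimal sketch; `fwhm_to_sigma` is an illustrative helper name.

```python
import math

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

energy_ev = 6000.0  # line energy (6 keV)
fwhm_ev = 200.0     # best-case LAD resolution quoted in the abstract

resolving_power = energy_ev / fwhm_ev  # E / dE
sigma_ev = fwhm_to_sigma(fwhm_ev)      # assumes a Gaussian line shape

print(f"E/dE = {resolving_power:.0f}, sigma = {sigma_ev:.1f} eV, "
      f"dE/E = {100 * fwhm_ev / energy_ev:.1f} %")
```

So 200 eV FWHM at 6 keV corresponds to E/dE = 30, or about 3.3 % fractional resolution, and to sigma of roughly 85 eV under the Gaussian assumption.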
Optimized Kernel Machine for Cancer Classification Using Gene Expression Data
Cancer classification using gene expression data has been shown to be very useful for cancer diagnosis and prediction. However, the very high dimensionality and relatively small sample size of gene expression data make classification quite challenging. In this paper, we present a new approach, based on optimizing the kernel function, to improve the performance of classifiers on gene expression data. Aiming to increase the class separability of the data, we use a more flexible kernel function model, the data-dependent kernel, as the objective kernel to be optimized. The experimental results show that using the optimized kernel usually results in a substantial improvement for the K-nearest-neighbor (KNN) and support vector machine (SVM) classifiers in classifying gene expression data.
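One way to optimize a data-dependent kernel for class separability is to maximize the criterion J = tr(S_b) / tr(S_w) computed in the kernel feature space. The sketch below assumes the conformal form k~(x, y) = q(x) q(y) k(x, y) with q(x) = alpha_0 + sum_j alpha_j k1(x, a_j), Gaussian kernels for k and k1, and hypothetical anchor points a_j; none of these choices is confirmed by the abstract, and this is not presented as the authors' procedure. Under that form, with the kernel matrix K sorted into class blocks K_cc of size n_c, tr(S_b) = 1^T B 1 with B = blockdiag(K_cc / n_c) - K / n and tr(S_w) = 1^T W 1 with W = diag(K) - blockdiag(K_cc / n_c); the scaled criterion becomes a ratio of quadratic forms in alpha, whose maximizer is the leading generalized eigenvector, computed here with a Cholesky reduction in NumPy. All function names are illustrative.

```python
import numpy as np

def rbf(X, Y, gamma):
    """Gaussian kernel matrix exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def scatter_matrices(K, y):
    """Return B, W with tr(S_b) = 1^T B 1 and tr(S_w) = 1^T W 1 in the feature space of K."""
    n = len(y)
    D = np.zeros_like(K)  # block-diagonal part: K_cc / n_c for each class c
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        D[np.ix_(idx, idx)] = K[np.ix_(idx, idx)] / len(idx)
    return D - K / n, np.diag(np.diag(K)) - D

def separability(K, y):
    """Class-separability criterion J = tr(S_b) / tr(S_w)."""
    B, W = scatter_matrices(K, y)
    one = np.ones(len(y))
    return (one @ B @ one) / (one @ W @ one)

def optimize_alpha(X, y, anchors, gamma=0.1, gamma1=0.1, reg=1e-6):
    """Maximize J for the scaled kernel Q K Q, Q = diag(q), q = K1 @ alpha."""
    K = rbf(X, X, gamma)
    B, W = scatter_matrices(K, y)
    K1 = np.hstack([np.ones((len(X), 1)), rbf(X, anchors, gamma1)])
    M = K1.T @ B @ K1                               # numerator: alpha^T M alpha
    N = K1.T @ W @ K1 + reg * np.eye(K1.shape[1])   # denominator, regularized
    # Solve M v = lam N v via N = L L^T: (L^-1 M L^-T) u = lam u, v = L^-T u.
    L_inv = np.linalg.inv(np.linalg.cholesky(N))
    S = L_inv @ M @ L_inv.T
    vals, vecs = np.linalg.eigh(0.5 * (S + S.T))    # eigenvalues in ascending order
    alpha = L_inv.T @ vecs[:, -1]                   # eigenvector of the largest one
    q = K1 @ alpha
    return alpha, q[:, None] * q[None, :] * K       # coefficients, scaled kernel

# Usage on synthetic two-class data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(0.8, 1.0, (20, 5))])
y = np.repeat([0, 1], 20)
alpha, K_opt = optimize_alpha(X, y, anchors=X[::10])
K_base = rbf(X, X, 0.1)
print(separability(K_base, y), separability(K_opt, y))
```

Setting alpha_0 = 1 and all other alpha_j = 0 gives q identically 1 and recovers the base kernel, so the optimized criterion cannot fall below the base kernel's value apart from the small regularization term `reg`. The scaled kernel matrix can then be passed to any kernel classifier, such as KNN with a kernel-induced distance or an SVM with a precomputed kernel.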