
    Unsupervised feature analysis for high dimensional big data

    In practice, we often encounter scenarios in which label information is unavailable, due either to the high cost of manual labeling or to users' unwillingness to label. When label information is not available, traditional supervised learning cannot be applied directly, so we need unsupervised methods that work well without supervision. Feature analysis has proven effective and important for many applications. It is a broad research field whose topics include, but are not limited to, feature selection, feature extraction, feature construction, and feature composition (e.g., in topic discovery, the learned topics can be viewed as compound features). In many real systems, it is necessary and important to perform feature analysis to determine which individual or compound features should be used in posterior learning tasks. The effectiveness of traditional feature analysis often relies on labels of the training examples. However, in the era of big data, label information is often unavailable, and feature analysis in the unsupervised scenario is more challenging. Two important research topics in unsupervised feature analysis are unsupervised feature selection and unsupervised feature composition (e.g., discovering topics as compound features), which naturally creates two lines of research. Combined with the single-view or multi-view nature of the data, this yields a table with four cells. Except for single-view feature composition (or topic discovery), where much work already exists (e.g., PLSA, LDA, and NMF), the other three cells correspond to new research topics with little existing work. For single-view unsupervised feature analysis, we propose two unsupervised feature selection methods. For multi-view unsupervised feature analysis, we focus on text-image web news data and propose a multi-view unsupervised feature selection method and a text-image topic model. Specifically, for single-view unsupervised feature selection, we propose a new method called Robust Unsupervised Feature Selection (RUFS), in which pseudo cluster labels are learned via local learning regularized robust NMF while feature selection is performed simultaneously by robust joint l_{2,1}-norm minimization. Outliers are handled effectively, and redundant or noisy features are effectively reduced. We also design a (projected) limited-memory BFGS based linear-time iterative algorithm to solve the optimization problem efficiently. We further study how the choice of norms for the data fitting and feature selection terms affects the ultimate unsupervised feature selection performance. Specifically, we propose to use a joint adaptive loss and l_2/l_0 minimization for data fitting and feature selection, and we mathematically explain the desirable properties of this formulation over recent unsupervised feature selection models. We solve the optimization problem with an efficient iterative algorithm whose computational complexity and memory cost are linear in both sample size and feature size. For multi-view unsupervised feature selection, we propose a more effective approach for high-dimensional text-image web news data. We propose to use raw text features in label learning to avoid information loss.
    We propose a new multi-view unsupervised feature selection method in which image local learning regularized orthogonal nonnegative matrix factorization is used to learn pseudo labels, while robust joint l_{2,1}-norm minimization is performed simultaneously to select discriminative features, so that cross-view consensus on pseudo labels is preserved as much as possible. For multi-view topic discovery, we study how to systematically mine topics from high-dimensional text-image web news data. This application problem is important because almost all news articles have an associated picture. Unlike traditional topic modeling, which considers text alone, the new task aims to discover heterogeneous topics from web news comprising multiple data types. We propose to tackle the problem with a regularized nonnegative constrained l_{2,1}-norm minimization framework and present a new iterative algorithm to solve the optimization problem. The proposed single-view feature selection methods can be applied to almost all single-view data. The proposed multi-view methods are designed for text-image web news data, but the idea generalizes naturally to any multi-view data. Practitioners can run the proposed methods to select features for posterior learning tasks, and can also run our multi-view topic model to analyze and visualize topics in text-image web news corpora to help interpret the data.
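    As a concrete illustration of the l_{2,1}-norm machinery shared by these feature selection methods, the following minimal NumPy sketch (a simplified, hypothetical example under assumed matrix shapes, not the authors' RUFS implementation) shows how the l_{2,1} norm of a feature weight matrix penalizes whole rows and how row norms can be used to rank features.

```python
import numpy as np

def l21_norm(W):
    """l_{2,1} norm: the sum of the Euclidean norms of the rows of W.
    Penalizing this term drives entire rows of W toward zero, which
    corresponds to discarding the associated features."""
    return np.sum(np.linalg.norm(W, axis=1))

def rank_features_by_row_norm(W, top_k):
    """Score each feature by the l_2 norm of its row in W and return
    the indices of the top_k highest-scoring features."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:top_k]

# Toy example: 6 features mapped onto 3 pseudo-label dimensions.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 3))
W[[1, 4], :] = 0.0  # rows that an l_{2,1} penalty might have zeroed out
print("l_{2,1} norm of W:", l21_norm(W))
print("Top 3 features:", rank_features_by_row_norm(W, top_k=3))
```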

    An efficient algorithm for multiuser sum-rate maximization of large-scale active RIS-aided MIMO system

    Active reconfigurable intelligent surface (RIS) is a new RIS architecture that can both reflect and amplify communication signals, providing enhanced performance gain compared to conventional passive RIS systems, which can only reflect signals. On the other hand, the design problem of active RIS-aided systems is more challenging than that of passive RIS-aided systems, and efficient algorithms for it are less studied. In this paper, we consider the sum rate maximization problem in the multiuser massive multiple-input single-output (MISO) downlink with the aid of a large-scale active RIS. Existing approaches for handling this problem usually resort to general optimization solvers and can be computationally prohibitive. We propose an efficient block successive upper bound minimization (BSUM) method, in which each step has a (semi-)closed-form update, so the proposed algorithm has an attractive low per-iteration complexity. Simulations show that our proposed algorithm requires much less computation than existing approaches; in particular, when the MIMO and/or RIS sizes are large, it can be orders of magnitude faster.
    Comment: ICASSP 202
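    For readers unfamiliar with BSUM, the toy sketch below (a generic illustration with made-up variables and objective, not the paper's actual update rules) shows the structure of such a scheme: the variables are split into blocks, and each block is updated in turn by minimizing a surrogate that upper-bounds the objective; here the surrogate is simply the objective restricted to one block, so each update is available in closed form.

```python
def f(x1, x2):
    """Toy smooth objective with a coupling term between the two blocks."""
    return (x1 - x2) ** 2 + (x1 - 1.0) ** 2 + (x2 + 1.0) ** 2

x1, x2 = 0.0, 0.0
for _ in range(50):
    # Block 1: minimize f over x1 with x2 fixed (closed form from df/dx1 = 0).
    x1 = (x2 + 1.0) / 2.0
    # Block 2: minimize f over x2 with x1 fixed (closed form from df/dx2 = 0).
    x2 = (x1 - 1.0) / 2.0

print(f"x1 = {x1:.4f}, x2 = {x2:.4f}, f = {f(x1, x2):.4f}")  # converges to (1/3, -1/3)
```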

    Soft Scattering Evaporation of Dark Matter Subhalos by Inner Galactic Gases

    The large gap between a galactic dark matter subhalo's velocity and its own gravitational binding velocity creates a situation in which dark matter soft-scattering on baryons can evaporate the subhalo, provided that kinetic energy transfer via low momentum exchange is efficient. Because of their low binding velocity, small subhalos can evaporate before the dark matter thermalizes with baryons. If dark matter acquires an electromagnetic dipole moment, the survival of low-mass subhalos requires stringent limits on the photon-mediated soft scattering. We calculate the subhalo evaporation rate via soft collisions with ionized gas and accelerated cosmic rays, and show that the stability of subhalos lighter than 10^{-5} M_⊙ in the gaseous inner Galactic region is sensitive to dark matter's effective electric and magnetic dipole moments below current direct detection limits.
    Comment: 8 pages, 4 figures

    Developmental expression of BK channels in chick cochlear hair cells

    Background. Cochlear hair cells are high-frequency sensory receptors. At the onset of hearing, hair cells acquire fast, calcium-activated potassium (BK) currents, turning immature spiking cells into functional receptors. In non-mammalian vertebrates, the number and kinetics of BK channels are varied systematically along the frequency axis of the cochlea, giving rise to an intrinsic electrical tuning mechanism. The processes that control the appearance and heterogeneity of hair cell BK currents remain unclear. Results. Quantitative PCR results showed a non-monotonic increase in BK α subunit expression throughout embryonic development of the chick auditory organ (i.e., basilar papilla). Expression peaked near embryonic day (E) 19, with six times the transcript level of E11 sensory epithelia. The steady increase in gene expression from E11 to E19 could not explain the sudden acquisition of currents at E18-19, implicating post-transcriptional mechanisms. Protein expression also preceded function but progressed in a sequence from diffuse cytoplasmic staining at early ages to punctate membrane-bound clusters at E18. Electrophysiology data confirmed a continued refinement of BK trafficking from E18 to E20, indicating a translocation of BK clusters from supranuclear to subnuclear domains over this critical developmental age. Conclusions. Gene products encoding BK α subunits are detected up to 8 days before the acquisition of anti-BK clusters and functional BK currents. Therefore, post-transcriptional mechanisms seem to play a key role in the delayed emergence of calcium-sensitive currents. We suggest that regulation of translation and trafficking of functional α subunits, near voltage-gated calcium channels, leads to functional BK currents at the onset of hearing.

    Couler: Unified Machine Learning Workflow Optimization in Cloud

    Machine Learning (ML) has become ubiquitous, fueling data-driven applications across various organizations. Contrary to the traditional perception of ML in research, ML workflows can be complex, resource-intensive, and time-consuming. Expanding an ML workflow to encompass a wider range of data infrastructure and data types may lead to larger workloads and increased deployment costs. Currently, numerous workflow engines are available (with over ten being widely recognized), and this variety poses a challenge for end-users in terms of mastering different engine APIs. While prior efforts have primarily focused on optimizing ML Operations (MLOps) for a specific workflow engine, current methods largely overlook workflow optimization across different engines. In this work, we design and implement Couler, a system for unified ML workflow optimization in the cloud. Our main insight lies in the ability to generate an ML workflow from natural language (NL) descriptions. We integrate Large Language Models (LLMs) into workflow generation and provide a unified programming interface for various workflow engines, alleviating the need to understand the APIs of each engine. Moreover, Couler improves workflow computation efficiency by introducing automated caching at multiple stages, enabling auto-parallelization of large workflows and automatic hyperparameter tuning. These enhancements minimize redundant computational costs and improve fault tolerance during deep learning workflow training. Couler is extensively deployed in real-world production scenarios at Ant Group, handling approximately 22k workflows daily, and has improved CPU/memory utilization by more than 15% and the workflow completion rate by around 17%.
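    As a generic illustration of the step-level caching idea (not Couler's actual API or implementation; the decorator, cache directory, and step function below are hypothetical), the following sketch stores a workflow step's output on disk keyed by a hash of its inputs, so that re-running an unchanged step is skipped.

```python
import hashlib
import json
import os
import pickle

CACHE_DIR = ".workflow_cache"  # hypothetical on-disk cache location

def cached_step(func):
    """Memoize a workflow step on disk, keyed by its name and arguments."""
    def wrapper(*args, **kwargs):
        os.makedirs(CACHE_DIR, exist_ok=True)
        key_src = json.dumps([func.__name__, args, kwargs], sort_keys=True, default=str)
        key = hashlib.sha256(key_src.encode()).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):            # cache hit: skip recomputation
            with open(path, "rb") as fh:
                return pickle.load(fh)
        result = func(*args, **kwargs)      # cache miss: run the step
        with open(path, "wb") as fh:
            pickle.dump(result, fh)
        return result
    return wrapper

@cached_step
def preprocess(dataset_name, sample_rate):
    # Stand-in for an expensive data preparation step.
    return {"dataset": dataset_name, "rate": sample_rate}

print(preprocess("toy_corpus", 0.1))  # first call computes and stores the result
print(preprocess("toy_corpus", 0.1))  # second call is served from the cache
```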

    Astragalus Granule Prevents Ca2+

    Background. Astragalus has been broadly used to treat heart failure (HF) and arrhythmias in East Asia for thousands of years. Astragalus granule (AG), extracted from Astragalus, shows a beneficial effect in the treatment of HF in clinical research. We hypothesized that administration of AG prevents the remodeling of the L-type Ca2+ current (ICa-L) in HF mice through downregulation of Ca2+/calmodulin-dependent protein kinase II (CaMKII). Methods. HF was induced in mice by thoracic aortic constriction (TAC). After 4 weeks of AG treatment, cardiac function and QT interval were evaluated. Single cardiac ventricular myocytes were then isolated, and whole-cell patch clamp was used to record action potentials (AP) and ICa-L. The expression of the L-type calcium channel alpha 1C subunit (Cav1.2), CaMKII, and phosphorylated protein kinase A (p-PKA) was examined by western blot. Results. The failing heart manifested distinct electrical remodeling, including prolonged repolarization time and altered ICa-L kinetics. AG treatment attenuated this electrical remodeling, as evidenced by AG-related shortening of repolarization time, decreased peak ICa-L, accelerated ICa-L inactivation, and positive frequency-dependent ICa-L facilitation. In addition, AG treatment suppressed the overexpression of CaMKII, but not p-PKA, in the failing heart. Conclusion. AG treatment protected the failing heart against electrical remodeling and ICa-L remodeling by downregulating CaMKII.

    An established protocol for generating transgenic wheat for wheat functional genomics via particle bombardment

    Wheat is one of the most important food crops in the world and is considered one of the top targets in crop biotechnology. With high-quality reference genomes of wheat and its relatives and the recent burst of genomic resources in Triticeae, demand for gene functional studies and genetic improvement in wheat has been rapidly increasing, requiring that the production of transgenic wheat become a routine technique. Although established more than 20 years ago, particle bombardment-mediated wheat transformation has not yet become routine, with only a handful of labs proficient in the technique. This could be due, at least in part, to its low transformation efficiency and technical difficulties. Here, we describe the current version of this method after adaptation and optimization. We report a detailed protocol for producing transgenic wheat by particle gun, covering several critical steps: the selection of appropriate explants (i.e., immature scutella), the preparation of DNA-coated gold particles, and several established strategies of tissue culture. More importantly, drawing on over 20 years of experience with wheat transformation in our lab, we share many technical details and recommendations, and emphasize that the particle bombardment-mediated approach has fewer limitations in genotype dependency and vector construction than Agrobacterium-mediated methods. The particle bombardment-mediated method has been successful for over 30 wheat genotypes, from tetraploid durum wheat to hexaploid common wheat, and from modern elite varieties to landraces. In conclusion, particle bombardment-mediated wheat transformation has demonstrated its potential and wide applicability, and the full protocol, experience, and successful reports in many wheat genotypes described here will further its impact, making it a routine and robust technique in crop research labs worldwide.

    Multimodal Data and Multiscale Kernel-Based Multistream CNN for Fine Classification of a Complex Surface-Mined Area

    Fine land cover classification (FLCC) of complex landscapes is a popular and challenging task in the remote sensing community. In complex surface-mined areas (CSMAs), researchers have conducted FLCC using traditional machine learning methods and deep learning algorithms. However, convolutional neural network (CNN) algorithms that may be useful for FLCC of CSMAs have not been fully investigated. This study proposes a multimodal remote sensing data and multiscale kernel-based multistream CNN (3M-CNN) model. Experiments based on two ZiYuan-3 (ZY-3) satellite images acquired at different times and in different seasons were conducted in Wuhan, China. The 3M-CNN model has three main features: (1) multimodal data-based multistream CNNs, i.e., using ZY-3 imagery-derived true color, false color, and digital elevation model data to form three CNNs; (2) multisize neighborhoods, i.e., using different neighborhood sizes of the optical and topographic data as inputs; and (3) multiscale convolution flows revised from an inception module for the optical and topographic data. Results showed that the proposed 3M-CNN model achieved excellent overall accuracies on the two images and outperformed comparative models; in particular, it yielded noticeably better visual performance. In general, the proposed process is beneficial for the FLCC of complex landscape areas.
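    To make the architecture concrete, the following minimal PyTorch sketch (a simplification under assumptions, not the authors' exact 3M-CNN: the channel counts, patch sizes, and class count of 9 are placeholders) shows three parallel streams for true color, false color, and DEM inputs, each using an inception-style block with 1x1, 3x3, and 5x5 kernels, fused before a linear classifier.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Inception-style block: parallel 1x1, 3x3, and 5x5 convolutions whose
    outputs are concatenated along the channel dimension."""
    def __init__(self, in_ch, out_ch_per_branch=16):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch_per_branch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, out_ch_per_branch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, out_ch_per_branch, kernel_size=5, padding=2)

    def forward(self, x):
        return torch.relu(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))

class MultiStreamCNN(nn.Module):
    """Three parallel streams (true color, false color, DEM) fused before a classifier."""
    def __init__(self, num_classes=9):
        super().__init__()
        # Per-stream input channels: 3 (true color), 3 (false color), 1 (DEM).
        self.streams = nn.ModuleList([
            nn.Sequential(MultiScaleBlock(c), nn.AdaptiveAvgPool2d(1), nn.Flatten())
            for c in (3, 3, 1)
        ])
        self.classifier = nn.Linear(3 * 48, num_classes)  # 48 channels per stream

    def forward(self, true_color, false_color, dem):
        feats = [s(x) for s, x in zip(self.streams, (true_color, false_color, dem))]
        return self.classifier(torch.cat(feats, dim=1))

# Toy forward pass with assumed patch sizes; adaptive pooling lets the streams
# accept different neighborhood sizes for the optical and topographic inputs.
model = MultiStreamCNN(num_classes=9)
logits = model(torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32), torch.randn(2, 1, 16, 16))
print(logits.shape)  # torch.Size([2, 9])
```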

    Prior Knowledge-Based Deep Convolutional Neural Networks for Fine Classification of Land Covers in Surface Mining Landscapes

    Land cover classification is critical for urban sustainability applications. Although deep convolutional neural networks (DCNNs) have been widely utilized, they have rarely been used for land cover classification of complex landscapes. This study proposed prior knowledge-based pretrained DCNNs (i.e., VGG and Xception) for fine land cover classification of complex surface mining landscapes. ZiYuan-3 data collected over an area of Wuhan City, China, in 2012 and 2020 were used. The ZiYuan-3 imagery consisted of four-band multispectral imagery and digital terrain model data. Based on prior knowledge, inputs of true and false color images were used initially. Then, a combination of the first and second principal components of the four bands and the digital terrain model data (PD) was examined. In addition, the combination of the red and near-infrared bands with the digital terrain model data (43D) was evaluated (i.e., VGG-43D and Xcep-43D). The results indicate that: (1) the 43D input performed better than the others; (2) VGG-43D achieved the best overall accuracy values; and (3) although the use of PD did not produce the best models, it still provides a strategy for integrating DCNNs with multi-band and multimodal data. These findings are valuable for future applications of DCNNs to fine land cover classification in complex landscapes.
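    A minimal sketch of the band-composition idea follows (an illustration under assumptions, not the authors' pipeline: the normalization, the 9-class head, and the random arrays standing in for ZiYuan-3 bands are placeholders): the red and near-infrared bands and the digital terrain model are stacked into a 3-channel input so that a standard pretrained 3-channel network such as VGG16 can be fine-tuned on it.

```python
import numpy as np
import torch
import torchvision.models as models

def normalize_band(band):
    """Scale a single band to [0, 1]; a simple stand-in for the preprocessing step."""
    band = band.astype(np.float32)
    return (band - band.min()) / (band.max() - band.min() + 1e-8)

def make_43d_input(red, nir, dtm):
    """Stack red, near-infrared, and digital terrain model data into a 3-channel
    tensor so that a standard 3-channel CNN can consume it."""
    stacked = np.stack([normalize_band(b) for b in (red, nir, dtm)], axis=0)
    return torch.from_numpy(stacked).unsqueeze(0)  # shape (1, 3, H, W)

# Toy example with random arrays standing in for the image bands and the DTM.
red, nir, dtm = (np.random.rand(224, 224) for _ in range(3))
x = make_43d_input(red, nir, dtm)

vgg = models.vgg16(weights=None)              # use weights="IMAGENET1K_V1" for pretraining
vgg.classifier[6] = torch.nn.Linear(4096, 9)  # replace the head for an assumed 9 classes
print(vgg(x).shape)                           # torch.Size([1, 9])
```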