    Reuse of imputed data in microarray analysis increases imputation efficiency

    BACKGROUND: The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but their efficiency was low and the validity of the imputed values had not been fully checked. RESULTS: We developed a new cluster-based imputation method called the sequential K-nearest neighbor (SKNN) method. It imputes missing values sequentially, starting from the gene with the fewest missing values, and reuses the imputed values in later imputations. Despite this reuse, the new method substantially improves on the conventional KNN-based method and on methods based on maximum likelihood estimation in both accuracy and computational cost. SKNN outperformed the other imputation methods in particular on data with high missing rates and large numbers of experiments. Applying Expectation Maximization (EM) to the SKNN method improved accuracy further, but increased computation time in proportion to the number of iterations. The Multiple Imputation (MI) method, which is well known but had not previously been applied to microarray data, showed accuracy similar to that of SKNN, with a slightly higher dependency on the type of data set. CONCLUSIONS: Sequential reuse of imputed data in KNN-based imputation greatly increases imputation efficiency. The SKNN method should be practically useful for salvaging microarray experiments that contain many missing entries, and it generates reliable imputed values that can be used for further cluster-based analysis of microarray data.
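
    The abstract describes the algorithm only at a high level; the following minimal sketch (our own, with illustrative names, not taken from the paper) shows the sequential-reuse idea for a genes-by-experiments matrix with NaN marking missing entries, using inverse-distance-weighted KNN averaging as in conventional KNN imputation (the weighting scheme is an assumption):

        import numpy as np

        def sknn_impute(X, k=10):
            # Rows are genes, columns are experiments; missing entries are NaN.
            X = X.astype(float).copy()
            complete = ~np.isnan(X).any(axis=1)
            donors = list(np.where(complete)[0])          # initial donor pool
            # Impute genes in order of increasing number of missing values.
            order = sorted(np.where(~complete)[0],
                           key=lambda i: int(np.isnan(X[i]).sum()))
            for i in order:
                miss = np.isnan(X[i])
                pool = X[donors]
                # Distance measured over the columns observed in gene i.
                d = np.sqrt(((pool[:, ~miss] - X[i, ~miss]) ** 2).sum(axis=1))
                nn = np.argsort(d)[:k]
                w = 1.0 / (d[nn] + 1e-12)                 # inverse-distance weights
                X[i, miss] = (w[:, None] * pool[nn][:, miss]).sum(axis=0) / w.sum()
                donors.append(i)                          # reuse the imputed gene
            return X

    The growing donor pool is the sequential reuse the authors credit for the gains: each KNN search runs only against rows that are already complete, and every newly imputed gene immediately becomes a candidate neighbor for the genes imputed after it.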

    Serrating Nozzle Surfaces for Complete Transfer of Droplets

    A method of ensuring the complete transfer of liquid droplets from nozzles in microfluidic devices to nearby surfaces involves a relatively simple geometric modification of the nozzle surfaces. The method is especially applicable to nozzles in print heads and similar devices required to dispense liquid droplets having precise volumes. Examples of such devices include heads for soft printing of ink on paper and heads for depositing droplets of deoxyribonucleic acid (DNA) or protein solutions on glass plates to form microarrays of spots for analysis. The main purpose served by the present method is to ensure that droplets transferred from a nozzle have consistent volume, as needed to ensure accuracy in microarray analysis or a consistent appearance of printed text and images. In soft printing, droplets having consistent volume are generated inside a print head, but in the absence of the present method, the consistency is lost in printing because after each printing action (in which a drop is ejected from a nozzle), a small residual volume of liquid remains attached to the nozzle. By providing for complete transfer of droplets (and thus eliminating residual liquid attached to the nozzle), the method ensures the consistency of the volume of transferred droplets. An additional benefit of eliminating residue is the prevention of cross-contamination among different liquids printed through the same nozzle, a major consideration in DNA microarray analysis. The method also accelerates the printing process by minimizing the need to clean the print head to prevent cross-contamination. Soft printing involves a hydrophobic nozzle surface and a hydrophilic print surface. When the two surfaces are brought into proximity such that a droplet in the nozzle makes contact with the print surface, a substantial portion of the droplet is transferred to the print surface. Then, as the nozzle and the print surface are pulled apart, the droplet is pulled apart and most of it remains on the print surface. The basic principle of the present method is to reduce the liquid-solid surface energy of the nozzle to a level sufficiently far below the intrinsic solid-liquid surface energy of the nozzle material that the droplet is not pulled apart and, instead, the entire droplet volume is transferred to the print surface. In this method, the liquid-solid surface energy is reduced by introducing artificial surface roughness in the form of micromachined serrations on the inner nozzle surface (see figure). The method was tested in experiments on soft printing of DNA solutions and of deionized water through 0.5-mm-diameter nozzles, of which some were not serrated, some were partially serrated, and some were fully serrated. In the nozzles without serrations, transfer was incomplete; that is, residual liquid remained in the nozzles after printing. However, in every nozzle in which at least half of the inner surface was serrated, complete transfer of droplets to the print surface was achieved.
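
    The abstract does not state the governing wetting relation, but one standard way (assumed here, not taken from the text) to see why roughness weakens droplet adhesion on a hydrophobic surface is the Wenzel relation, where θ is the intrinsic contact angle, θ* the apparent contact angle on the rough surface, and r ≥ 1 the ratio of actual to projected surface area:

        cos θ* = r cos θ,   r ≥ 1

    For a hydrophobic nozzle (θ > 90°, so cos θ < 0), serrations increase r and drive cos θ* further negative, raising the apparent contact angle and lowering the effective liquid-solid surface energy; the same qualitative effect holds if the droplet instead rides on the serration tips in the Cassie-Baxter regime.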

    Distributed Machine Learning via Sufficient Factor Broadcasting

    Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems involving millions of samples and tens of thousands of classes, the parameter matrix can grow very large, resulting in high parameter-synchronization costs that greatly slow down distributed learning. To address this issue, we propose a Sufficient Factor Broadcasting (SFB) computation model for efficient distributed learning of a large family of matrix-parametrized models, which share the following property: the parameter update computed on each data sample is a rank-1 matrix, i.e., the outer product of two "sufficient factors" (SFs). By broadcasting the SFs among worker machines and reconstructing the update matrices locally at each worker, SFB improves communication efficiency (communication costs are linear in the parameter matrix's dimensions, rather than quadratic) without affecting computational correctness. We present a theoretical convergence analysis of SFB and empirically corroborate its efficiency on four different matrix-parametrized ML models.
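
    As a concrete illustration of the rank-1 property (a minimal sketch with our own function names, not the authors' implementation), consider multiclass logistic regression with a J x D parameter matrix W. The stochastic gradient for one sample (x, y) factorizes into two sufficient factors of sizes J and D, so a worker can broadcast O(J + D) numbers instead of the O(J * D) update matrix:

        import numpy as np

        def sufficient_factors(W, x, y_idx):
            # The per-sample gradient of the multiclass logistic loss is rank-1:
            # grad = outer(u, v) with u = softmax(W @ x) - onehot(y) and v = x.
            logits = W @ x
            p = np.exp(logits - logits.max())   # numerically stable softmax
            p /= p.sum()
            u = p
            u[y_idx] -= 1.0
            return u, x                         # J + D floats, not J * D

        def apply_broadcast(W, u, v, lr=0.1):
            # Each worker reconstructs the rank-1 update locally from the
            # broadcast factors and applies it to its own copy of W.
            W -= lr * np.outer(u, v)
            return W

    In SFB every worker broadcasts its factors to all peers and reconstructs their updates locally, which is where the linear (rather than quadratic) communication cost claimed in the abstract comes from.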