Search CORE

4,652 research outputs found

A novel computational framework for fast, distributed computing and knowledge integration for microarray gene expression data analysis

Author: Sethi Prerna
Publication venue: Louisiana Tech Digital Commons
Publication date: 01/04/2006
Field of study

The healthcare burden and suffering due to life-threatening diseases such as cancer would be significantly reduced by the design and refinement of computational interpretation of micro-molecular data collected by bioinformaticians. Rapid technological advancements in the field of microarray analysis, an important component in the design of in-silico molecular medicine methods, have generated enormous amounts of such data, a trend that has been increasing exponentially over the last few years. However, the analysis and handling of these data has become one of the major bottlenecks in the utilization of the technology. The rate of collection of these data has far surpassed our ability to analyze the data for novel, non-trivial, and important knowledge. The high-performance computing platform, and algorithms that utilize its embedded computing capacity, has emerged as a leading technology that can handle such data-intensive knowledge discovery applications. In this dissertation, we present a novel framework to achieve fast, robust, and accurate (biologically-significant) multi-class classification of gene expression data using distributed knowledge discovery and integration computational routines, specifically for cancer genomics applications. The research presents a unique computational paradigm for the rapid, accurate, and efficient selection of relevant marker genes, while providing parametric controls to ensure flexibility of its application. The proposed paradigm consists of the following key computational steps: (a) preprocess, normalize the gene expression data; (b) discretize the data for knowledge mining application; (c) partition the data using two proposed methods: partitioning with overlapped windows and adaptive selection; (d) perform knowledge discovery on the partitioned data-spaces for association rule discovery; (e) integrate association rules from partitioned data and knowledge spaces on distributed processor nodes using a novel knowledge integration algorithm; and (f) post-analysis and functional elucidation of the discovered gene rule sets. The framework is implemented on a shared-memory multiprocessor supercomputing environment, and several experimental results are demonstrated to evaluate the algorithms. We conclude with a functional interpretation of the computational discovery routines for enhanced biological physiological discovery from cancer genomics datasets, while suggesting some directions for future research

Louisiana Tech Digital Commons

Autonomous clustering using rough set theory

Author: A. K. Jain
A. K. Jain
A. Skowron
B. J. F. Manly
B. S. Everitt
C. L. Bean
C. L. Bean
Chandra Kambhampati
Charlotte Bean
D. Dubois
E. W. Forgey
F. H. C. Marriott
F. Höppner
G. H. Ball
J. A. Hartigan
J. B. MacQueen
J. C. Bezdek
J. C. Dunn
J. H. Ward
J. Komorowski
J. S. R. Jang
M. R. Anderberg
M. S. Aldenderfer
M. S. Kamel
P. Sneath
R. C. Jancey
R. R. Sokal
R. R. Yegar
S. Sharma
S. Z. Selim
T. Okuzaki
T. Sorensen
Z. Pawlak
Z. Pawlak
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

This paper proposes a clustering technique that minimises the need for subjective human intervention and is based on elements of rough set theory. The proposed algorithm is unified in its approach to clustering and makes use of both local and global data properties to obtain clustering solutions. It handles single-type and mixed attribute data sets with ease and results from three data sets of single and mixed attribute types are used to illustrate the technique and establish its efficiency

Repository@Hull - Worktribe

Crossref

Warwick Research Archives Portal Repository

Methods for fast and reliable clustering

Author: Kärkkäinen Ismo
Publication venue: University of Joensuu
Publication date
Field of study

UEF Electronic Publications

An intelligent, data-centric scheme for allocating queries to query processors

Author: Καρανίκα Άννα Τ.
Σούλα Μανταλένα Ν.
Publication venue
Publication date: 01/01/2019
Field of study

University of Thessaly Institutional Repository

A versatile programming model for dynamic task scheduling on cluster computers

Author: Jin Dejiang
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2005
Field of study

This dissertation studies the development of application programs for parallel and distributed computer systems, especially PC clusters. A methodology is proposed to increase the efficiency of code development, the productivity of programmers and enhance performance of executing the developed programs on PC clusters while facilitating improvement of scalability and code portability of these programs. A new programming model, named the Super-Programming Model (SPM), is created. Programs are developed assuming an instruction set architecture comprised of SuperInstructions (SIs). SPM models the target system as a large Virtual Machine (VM); VM contains functional units which are underlain with sub-computer systems and SIs are implemented with codes. When these functional units execute SIs, their codes will run on member computers to perform the corresponding operations. This approach resembles the process of designing instruction sets for microprocessors but the VM employs much coarser instructions and data structures. SIs use Super-Data Blocks (SDBs) as their operands. Each SI is assigned to a single member computer and is indivisible (i.e., its implementation is not interrupted for I/O). SIs have predictable execution times because SDB sizes are limited by predefined thresholds. These qualities of SIs help dynamic load balancing. Employing software to implement instructions makes this approach more flexible. The developed programs fit to architectures of cluster systems better. SPM provides mechanisms, such as dynamic load balancing, to assure the efficient execution of programs. The vast majority of current programming models lack such mechanisms for distributed environments that suffer from long communication latencies. Since SPM employs coarse-grain tasks, the overall management overhead is small. SDB access can often overlap the execution of other SIs; a cache system further decreases average memory latencies. Since all SDBs are virtual entities, with the runtime system support, they can be accessed in parallel and efficiently minimizes additional constraints to parallelism from underlying computer systems. In this research, a reference implementation of VM has been developed. A performance estimation model is developed that takes these features into account. Finally, the definition of scalability for parallel/distributed processing is refined to represent a multi-dimensional entity. Sample cases are analyzed

Digital Commons @ New Jersey Institute of Technology (NJIT)