265 research outputs found
Recommended from our members
Parallel computing in information retrieval - An updated review
The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular we stress the importance of the motivation in using parallel computing for Text Retrieval. We analyse parallel IR systems using a classification due to Rasmussen [1] and describe some parallel IR systems. We give a description of the retrieval models used in parallel Information Processing.. We describe areas of research which we believe are needed
Extending functional databases for use in text-intensive applications
This thesis continues research exploring the benefits of using functional
databases based around the functional data model for advanced database
applications-particularly those supporting investigative systems. This is a
growing generic application domain covering areas such as criminal and military
intelligence, which are characterised by significant data complexity, large data
sets and the need for high performance, interactive use. An experimental
functional database language was developed to provide the requisite semantic
richness. However, heavy use in a practical context has shown that language
extensions and implementation improvements are required-especially in the
crucial areas of string matching and graph traversal. In addition, an
implementation on multiprocessor, parallel architectures is essential to meet the
performance needs arising from existing and projected database sizes in the
chosen application area. [Continues.
A novel computational framework for fast, distributed computing and knowledge integration for microarray gene expression data analysis
The healthcare burden and suffering due to life-threatening diseases such as cancer would be significantly reduced by the design and refinement of computational interpretation of micro-molecular data collected by bioinformaticians. Rapid technological advancements in the field of microarray analysis, an important component in the design of in-silico molecular medicine methods, have generated enormous amounts of such data, a trend that has been increasing exponentially over the last few years. However, the analysis and handling of these data has become one of the major bottlenecks in the utilization of the technology. The rate of collection of these data has far surpassed our ability to analyze the data for novel, non-trivial, and important knowledge. The high-performance computing platform, and algorithms that utilize its embedded computing capacity, has emerged as a leading technology that can handle such data-intensive knowledge discovery applications.
In this dissertation, we present a novel framework to achieve fast, robust, and accurate (biologically-significant) multi-class classification of gene expression data using distributed knowledge discovery and integration computational routines, specifically for cancer genomics applications. The research presents a unique computational paradigm for the rapid, accurate, and efficient selection of relevant marker genes, while providing parametric controls to ensure flexibility of its application.
The proposed paradigm consists of the following key computational steps: (a) preprocess, normalize the gene expression data; (b) discretize the data for knowledge mining application; (c) partition the data using two proposed methods: partitioning with overlapped windows and adaptive selection; (d) perform knowledge discovery on the partitioned data-spaces for association rule discovery; (e) integrate association rules from partitioned data and knowledge spaces on distributed processor nodes using a novel knowledge integration algorithm; and (f) post-analysis and functional elucidation of the discovered gene rule sets. The framework is implemented on a shared-memory multiprocessor supercomputing environment, and several experimental results are demonstrated to evaluate the algorithms. We conclude with a functional interpretation of the computational discovery routines for enhanced biological physiological discovery from cancer genomics datasets, while suggesting some directions for future research
ICL Technical Journal 4(4): CAFS-ISP
The special issue of the ICL Technical Journal on CAFS-ISP. This closely followed the award to ICL of the Queen's Award for Technology in April, 1985. The contents include the history of the hardware and software, its status and future, perspectives from leading developers and users, and a list of related patents
An overview of decision table literature 1982-1995.
This report gives an overview of the literature on decision tables over the past 15 years. As much as possible, for each reference, an author supplied abstract, a number of keywords and a classification are provided. In some cases own comments are added. The purpose of these comments is to show where, how and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country or origin (not necessarily country of publication) and the language of the document. After a description of the scope of the interview, classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstract, classification and comments.
Recommended from our members
Distributed Inverted Files and Performance: A Study of Parallelism and Data Distribution Methods in IR
The study investigates the performance of parallel information retrieval (IR) algorithms on different data distribution methods for Inverted files to identify which is the best for the requirements of specific IR tasks. We define a data distribution method as a way of distributing Inverted file data to local disks on a parallel machine. A data distribution method may be on-the-fly (with one copy of the index held), replication (all nodes have all of the index) or partitioning (data for index is split amongst nodes). Partitioning of inverted file data can be done in many ways but we consider only two: by term (Termld) and by document (Dodd). Termld partitioning is a type of partitioning which distributes unique word data to a single partition, while D odd partitioning distributes unique document data to a single partition. We consider the issue of improving the performance of standard IR algorithms on these data distribution methods by looking at sequential job service not concurrent job service, e.g. we consider the issue of sequential query service not concurrent query service. This methodology rules out some distribution methods for some tasks studied. We consider the following main tasks of IR: indexing, search, passage retrieval, inverted file update and query optimisation for routing /filtering. We produce a synthetic performance model for each of these tasks for the purposes of comparison. We have two subsidiary aims; one was to demonstrate portability of our implemented data structures and algorithms on different parallel machines. Secondly, we also study the possibility of increased retrieval effectiveness by examining a larger section of the search space for both passage retrieval and routing/filtering. We consider the implications of concurrency in updates on Inverted files. Our theoretical and empirical results show that in most cases the D odd partitioning method is the best data distribution method apart from routing/filtering where replication was found to be superior
- …