5,423 research outputs found
A versatile programming model for dynamic task scheduling on cluster computers
This dissertation studies the development of application programs for parallel and distributed computer systems, especially PC clusters. A methodology is proposed to increase the efficiency of code development, the productivity of programmers and enhance performance of executing the developed programs on PC clusters while facilitating improvement of scalability and code portability of these programs. A new programming model, named the Super-Programming Model (SPM), is created. Programs are developed assuming an instruction set architecture comprised of SuperInstructions (SIs). SPM models the target system as a large Virtual Machine (VM); VM contains functional units which are underlain with sub-computer systems and SIs are implemented with codes. When these functional units execute SIs, their codes will run on member computers to perform the corresponding operations.
This approach resembles the process of designing instruction sets for microprocessors but the VM employs much coarser instructions and data structures. SIs use Super-Data Blocks (SDBs) as their operands. Each SI is assigned to a single member computer and is indivisible (i.e., its implementation is not interrupted for I/O). SIs have predictable execution times because SDB sizes are limited by predefined thresholds. These qualities of SIs help dynamic load balancing. Employing software to implement instructions makes this approach more flexible. The developed programs fit to architectures of cluster systems better. SPM provides mechanisms, such as dynamic load balancing, to assure the efficient execution of programs. The vast majority of current programming models lack such mechanisms for distributed environments that suffer from long communication latencies. Since SPM employs coarse-grain tasks, the overall management overhead is small. SDB access can often overlap the execution of other SIs; a cache system further decreases average memory latencies. Since all SDBs are virtual entities, with the runtime system support, they can be accessed in parallel and efficiently minimizes additional constraints to parallelism from underlying computer systems.
In this research, a reference implementation of VM has been developed. A performance estimation model is developed that takes these features into account. Finally, the definition of scalability for parallel/distributed processing is refined to represent a multi-dimensional entity. Sample cases are analyzed
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
Data Mining-based Fragmentation of XML Data Warehouses
With the multiplication of XML data sources, many XML data warehouse models
have been proposed to handle data heterogeneity and complexity in a way
relational data warehouses fail to achieve. However, XML-native database
systems currently suffer from limited performances, both in terms of manageable
data volume and response time. Fragmentation helps address both these issues.
Derived horizontal fragmentation is typically used in relational data
warehouses and can definitely be adapted to the XML context. However, the
number of fragments produced by classical algorithms is difficult to control.
In this paper, we propose the use of a k-means-based fragmentation approach
that allows to master the number of fragments through its parameter. We
experimentally compare its efficiency to classical derived horizontal
fragmentation algorithms adapted to XML data warehouses and show its
superiority
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of its main development, and study the main current
systems that implement them
New Fundamental Technologies in Data Mining
The progress of data mining technology and large public popularity establish a need for a comprehensive text on the subject. The series of books entitled by "Data Mining" address the need by presenting in-depth description of novel mining algorithms and many useful applications. In addition to understanding each section deeply, the two books present useful hints and strategies to solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence will lead to significant development in the field of data mining
Study on intrusion detecton using average matching degree space based on class association rule mining
制度:新 ; 報告番号:甲3767号 ; 学位の種類:博士(工学) ; 授与年月日:2013/1/28 ; 早大学位記番号:新6140Waseda Universit
- …