13 research outputs found

    Modelling parallel database management systems for performance prediction

    Abstract unavailable; please refer to the PDF.

    Process algebra approach to parallel DBMS performance modelling

    Abstract unavailable; please refer to the PDF.

    A bottom-up process management environment dedicated to process actors

    Companies increasingly adopt process management environments, which offer promising perspectives for more flexible and efficient process execution. Traditional process management environments embody a top-down approach in which process modeling is performed by process designers and process enactment is performed by process actors. Because of this separation, there is often a gap between process models and their real enactment. As a consequence, operational use of top-down process environments has stayed low, especially in the systems and software industry, because they are not directly relevant to process actors' needs. To make process environments easier for process actors to use, this thesis presents a user-centric, bottom-up approach that integrates process actors into the process management life cycle by allowing them both to model and to enact their real processes. First, a bottom-up approach based on the artifact-centric modeling paradigm is proposed. It relies on a description of the artifacts produced and consumed during the execution of an activity, which lets each process actor describe the process fragment containing the activities carried out by their role. The global process is thus decomposed into several fragments belonging to different roles. Each fragment can be modeled independently of the others and added progressively to the process model, so process modeling becomes less complex and more piecemeal. Moreover, a process fragment models only the structural aspect of a role's activities without anticipating their behavior, so the process model is less prescriptive. Second, a data-driven process engine was developed to enact activities coming from different process fragments. The engine does not require predefined work-sequence relations among these activities to synchronize them; instead, it deduces their dependencies from the artifacts they exchange at enactment time. These dependencies are stored and updated in a graph structure named the Process Dependency Graph (PDG), which reflects the current state of process execution at any moment. Third, the process environment is extended to handle the unforeseen changes that inevitably occur during process enactment. The result is a Change-Aware Process Environment that allows process actors to report emergent changes, analyze their possible impacts and notify the people affected. In summary, the bottom-up approach splits a process into several fragments that are modeled and enacted separately by process actors. The data-driven process engine, which uses the availability of working artifacts to synchronize activities, can enact process fragments independently, and can even enact a partially modeled process in which some fragments are missing. The global process emerges progressively at enactment time from the execution of the process fragments. With its simpler modeling and more flexible enactment, this approach integrates process actors better into the process management life cycle, and hence makes process management systems more attractive and useful to them.
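The idea of deducing dependencies from exchanged artifacts, rather than from predefined ordering relations, can be illustrated with a minimal Python sketch. It is not the thesis's engine or its PDG implementation: the data model (a dict with `consumes` and `produces` sets per activity), the function names and the sample activities are all hypothetical.

```python
from collections import defaultdict


def build_dependency_graph(activities):
    """Derive activity dependencies from the artifacts they exchange.

    `activities` maps an activity name to a dict with "consumes" and
    "produces" sets of artifact names (a hypothetical data model).
    Returns a dict mapping each activity to the set of activities
    that produce at least one artifact it consumes.
    """
    producers = defaultdict(set)
    for name, spec in activities.items():
        for artifact in spec["produces"]:
            producers[artifact].add(name)

    depends_on = {name: set() for name in activities}
    for name, spec in activities.items():
        for artifact in spec["consumes"]:
            # An artifact with no known producer (a missing fragment)
            # simply contributes no dependency edge.
            depends_on[name] |= producers[artifact] - {name}
    return depends_on


def ready_activities(activities, available_artifacts):
    """Names of activities whose consumed artifacts are all available."""
    return sorted(
        name
        for name, spec in activities.items()
        if spec["consumes"] <= available_artifacts
    )


# Hypothetical fragments contributed by different roles.
activities = {
    "write_spec": {"consumes": set(), "produces": {"spec"}},
    "design": {"consumes": {"spec"}, "produces": {"design_doc"}},
    "implement": {"consumes": {"design_doc"}, "produces": {"code"}},
    "test": {"consumes": {"code", "spec"}, "produces": {"test_report"}},
}
graph = build_dependency_graph(activities)
```

No ordering between activities is declared anywhere in this sketch: `graph["test"]` is derived purely from the fact that `test` consumes artifacts that `implement` and `write_spec` produce, and `ready_activities` decides what can run from artifact availability alone.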

    Técnicas de particionamiento multidimensional basadas en índices multiatributo en bases de datos paralelas

    The increasingly demanding requirements of modern database applications such as GIS, CAD and CASE call for new solutions to the problem of managing large volumes of information. The processing power of affordable parallel computers has attracted the attention of many researchers and practitioners, who see parallel database systems as an efficient answer to the demands of these new applications. Specifically, parallelism is an attractive way to address the traditional bottleneck of input/output operations. To minimize query response time, parallel database systems partition (decluster) data across a set of storage devices, which enables parallel access to the data and allows several processors to take part concurrently in executing a query. Usually, relations are partitioned on a single attribute, and tuples are sent to different devices depending on their value for the partitioning attribute. This way of fragmenting data works well when the query predicate includes the partitioning attribute. When it does not, however, the query must be directed to every processing node that manages a fragment of the relation or relations involved. This harms both the response time of the query and the throughput of the system. This thesis proposes multidimensional partitioning models based on multiple attributes. In short, the proposed technique partitions the tuple space along multiple dimensions and then sends the resulting fragments to a given number of disks in the system. The tuple space is partitioned in a balanced way by means of a new multi-attribute indexing mechanism called the Q-tree. The thesis presents the ideas that led to the Q-tree, defines its structures and manipulation algorithms in detail, introduces several partitioning strategies based on this structure, and reports performance results for the different proposals, based on the implementation work carried out during the thesis.
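The contrast between single-attribute and multi-attribute partitioning can be made concrete with a small Python sketch. It uses a fixed grid with hand-picked split points, not the Q-tree described in the thesis; the attribute names, split points and function names are hypothetical.

```python
import bisect
from itertools import product


def cell_index(value, split_points):
    """Index of the interval that `value` falls into, given sorted split points."""
    return bisect.bisect_right(split_points, value)


def fragments_for_query(predicate, splits):
    """Return the fragments (grid cells) an exact-match query must visit.

    `splits` maps each partitioning attribute to its sorted split points.
    `predicate` maps the attributes constrained by the query to a value.
    Attributes the query does not constrain force a scan along that axis.
    """
    axes = []
    for attr, points in splits.items():
        if attr in predicate:
            axes.append([cell_index(predicate[attr], points)])
        else:
            axes.append(range(len(points) + 1))
    return list(product(*axes))


# Four fragments in both schemes (hypothetical split points).
single = {"age": [25, 40, 60]}            # partitioned on age only
multi = {"age": [40], "salary": [30000]}  # 2 x 2 grid on age and salary

on_salary_single = fragments_for_query({"salary": 50000}, single)
on_salary_multi = fragments_for_query({"salary": 50000}, multi)
on_age_multi = fragments_for_query({"age": 30}, multi)
```

With the same number of fragments, a query on `salary` must visit all four fragments under the single-attribute scheme but only two under the grid, at the cost of a query on `age` now touching two fragments instead of one. How the fragments are balanced and mapped to disks is the part the Q-tree addresses and is not modeled here.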

    Studies related to the process of program development

    The submitted work consists of a collection of publications arising from research carried out at Rhodes University (1970-1980) and at Heriot-Watt University (1980-1992). The theme of this research is the process of program development, i.e. the process of creating a computer program to solve some particular problem. The papers presented cover a number of different topics which relate to this process, viz. (a) Programming methodology: aspects of structured programming. (b) Properties of programming languages. (c) Formal specification of programming languages. (d) Compiler techniques. (e) Declarative programming languages. (f) Program development aids. (g) Automatic program generation. (h) Databases. (i) Algorithms and applications.

    Data quality and data cleaning in database applications

    Today, data plays an important role in people's daily activities. With the help of database applications such as decision support systems and customer relationship management (CRM) systems, useful information or knowledge can be derived from large quantities of data. However, investigations show that many such applications fail to work successfully. There are many reasons for such failures, such as poor system infrastructure design or poor query performance, but nothing is more certain to yield failure than a lack of concern for data quality. High-quality data is a key to today's business success. The quality of any large real-world data set depends on a number of factors, among which the source of the data is often the crucial one. It is now recognized that an inordinate proportion of the data in most data sources is dirty. A database application with a high proportion of dirty data is not reliable for data mining or for deriving business intelligence, and decisions made on the basis of such business intelligence are equally unreliable. To ensure high data quality, enterprises need processes, methodologies and resources to monitor and analyze the quality of their data, as well as methodologies for preventing, detecting and repairing dirty data. This thesis focuses on improving data quality in database applications with the help of current data cleaning methods. It provides a systematic and comparative description of the research issues related to improving data quality, and addresses a number of research issues related to data cleaning. In the first part of the thesis, the literature on data cleaning and data quality is reviewed and discussed. Building on this review, a rule-based taxonomy of dirty data is proposed in the second part. The taxonomy not only summarizes most dirty data types but also forms the basis of the proposed method for solving the Dirty Data Selection (DDS) problem during the data cleaning process. This in turn informs the design of the DDS process in the data cleaning framework described in the third part of the thesis. The framework retains the most appealing characteristics of existing data cleaning approaches and improves the efficiency and effectiveness of data cleaning, as well as the degree of automation of the process. Finally, a set of approximate string matching algorithms is studied experimentally. Approximate string matching is an important part of many data cleaning approaches and has been studied for many years. The experimental work in the thesis confirms that there is no clear best technique. It shows that characteristics of the data, such as the size of the dataset, its error rate, the type of strings it contains and even the type of typo in a string, have a significant effect on the performance of the selected techniques. These characteristics also affect the choice of suitable threshold values for the selected matching algorithms. The experimental results inform the design of the 'algorithm selection mechanism' in the data cleaning framework, which enhances the performance of data cleaning systems in database applications. EThOS - Electronic Theses Online Service, United Kingdom.
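Threshold-based approximate string matching can be sketched in a few lines of Python. The abstract does not name the algorithms the thesis evaluates, so the sketch assumes plain Levenshtein edit distance normalized by the longer string's length; the function names and the default threshold are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(a, b, threshold=0.8):
    """Treat two strings as duplicates when their similarity reaches the threshold."""
    return similarity(a, b) >= threshold
```

For example, `is_match("Jon", "John", threshold=0.7)` is true while `is_match("Jon", "John", threshold=0.8)` is false: a single typo costs a short string a quarter of its similarity, a small instance of the abstract's point that suitable thresholds depend on the data.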
