10,671 research outputs found

    Automating the multidimensional design of data warehouses

    Get PDF
    Les experiències prèvies en l'àmbit dels magatzems de dades (o data warehouse), mostren que l'esquema multidimensional del data warehouse ha de ser fruit d'un enfocament híbrid; això és, una proposta que consideri tant els requeriments d'usuari com les fonts de dades durant el procés de disseny.Com a qualsevol altre sistema, els requeriments són necessaris per garantir que el sistema desenvolupat satisfà les necessitats de l'usuari. A més, essent aquest un procés de reenginyeria, les fonts de dades s'han de tenir en compte per: (i) garantir que el magatzem de dades resultant pot ésser poblat amb dades de l'organització, i, a més, (ii) descobrir capacitats d'anàlisis no evidents o no conegudes per l'usuari.Actualment, a la literatura s'han presentat diversos mètodes per donar suport al procés de modelatge del magatzem de dades. No obstant això, les propostes basades en un anàlisi dels requeriments assumeixen que aquestos són exhaustius, i no consideren que pot haver-hi informació rellevant amagada a les fonts de dades. Contràriament, les propostes basades en un anàlisi exhaustiu de les fonts de dades maximitzen aquest enfocament, i proposen tot el coneixement multidimensional que es pot derivar des de les fonts de dades i, conseqüentment, generen massa resultats. En aquest escenari, l'automatització del disseny del magatzem de dades és essencial per evitar que tot el pes de la tasca recaigui en el dissenyador (d'aquesta forma, no hem de confiar únicament en la seva habilitat i coneixement per aplicar el mètode de disseny elegit). A més, l'automatització de la tasca allibera al dissenyador del sempre complex i costós anàlisi de les fonts de dades (que pot arribar a ser inviable per grans fonts de dades).Avui dia, els mètodes automatitzables analitzen en detall les fonts de dades i passen per alt els requeriments. En canvi, els mètodes basats en l'anàlisi dels requeriments no consideren l'automatització del procés, ja que treballen amb requeriments expressats en llenguatges d'alt nivell que un ordenador no pot manegar. Aquesta mateixa situació es dona en els mètodes híbrids actual, que proposen un enfocament seqüencial, on l'anàlisi de les dades es complementa amb l'anàlisi dels requeriments, ja que totes dues tasques pateixen els mateixos problemes que els enfocament purs.En aquesta tesi proposem dos mètodes per donar suport a la tasca de modelatge del magatzem de dades: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Totes dues consideren els requeriments i les fonts de dades per portar a terme la tasca de modelatge i a més, van ser pensades per superar les limitacions dels enfocaments actuals.1. MDBE segueix un enfocament clàssic, en el que els requeriments d'usuari són coneguts d'avantmà. Aquest mètode es beneficia del coneixement capturat a les fonts de dades, però guia el procés des dels requeriments i, conseqüentment, és capaç de treballar sobre fonts de dades semànticament pobres. És a dir, explotant el fet que amb uns requeriments de qualitat, podem superar els inconvenients de disposar de fonts de dades que no capturen apropiadament el nostre domini de treball.2. A diferència d'MDBE, AMDO assumeix un escenari on es disposa de fonts de dades semànticament riques. Per aquest motiu, dirigeix el procés de modelatge des de les fonts de dades, i empra els requeriments per donar forma i adaptar els resultats generats a les necessitats de l'usuari. En aquest context, a diferència de l'anterior, unes fonts de dades semànticament riques esmorteeixen el fet de no tenir clars els requeriments d'usuari d'avantmà.Cal notar que els nostres mètodes estableixen un marc de treball combinat que es pot emprar per decidir, donat un escenari concret, quin enfocament és més adient. Per exemple, no es pot seguir el mateix enfocament en un escenari on els requeriments són ben coneguts d'avantmà i en un escenari on aquestos encara no estan clars (un cas recorrent d'aquesta situació és quan l'usuari no té clares les capacitats d'anàlisi del seu propi sistema). De fet, disposar d'uns bons requeriments d'avantmà esmorteeix la necessitat de disposar de fonts de dades semànticament riques, mentre que a l'inversa, si disposem de fonts de dades que capturen adequadament el nostre domini de treball, els requeriments no són necessaris d'avantmà. Per aquests motius, en aquesta tesi aportem un marc de treball combinat que cobreix tots els possibles escenaris que podem trobar durant la tasca de modelatge del magatzem de dades.Previous experiences in the data warehouse field have shown that the data warehouse multidimensional conceptual schema must be derived from a hybrid approach: i.e., by considering both the end-user requirements and the data sources, as first-class citizens. Like in any other system, requirements guarantee that the system devised meets the end-user necessities. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse must be populated from data available within the organization, and (ii) to allow the end-user discover unknown additional analysis capabilities.Currently, several methods for supporting the data warehouse modeling task have been provided. However, they suffer from some significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore, do not consider the data sources to contain alternative interesting evidences of analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, the design task automation is essential in this scenario, as it removes the dependency on an expert's ability to properly apply the method chosen, and the need to analyze the data sources, which is a tedious and timeconsuming task (which can be unfeasible when working with large databases). In this sense, current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook the process automation, since they tend to work with requirements at a high level of abstraction. Indeed, this scenario is repeated regarding data-driven and requirement-driven stages within current hybrid approaches, which suffer from the same drawbacks than pure data-driven or requirement-driven approaches.In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations from which current approaches suffer. Importantly, our approaches consider opposite initial assumptions, but both consider the end-user requirements and the data sources as first-class citizens.1. MDBE follows a classical approach, in which the end-user requirements are well-known beforehand. This approach benefits from the knowledge captured in the data sources, but guides the design task according to requirements and consequently, it is able to work and handle semantically poorer data sources. In other words, providing high-quality end-user requirements, we can guide the process from the knowledge they contain, and overcome the fact of disposing of bad quality (from a semantical point of view) data sources.2. AMDO, as counterpart, assumes a scenario in which the data sources available are semantically richer. Thus, the approach proposed is guided by a thorough analysis of the data sources, which is properly adapted to shape the output result according to the end-user requirements. In this context, disposing of high-quality data sources, we can overcome the fact of lacking of expressive end-user requirements.Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs provided in each scenario, which is the best approach to follow. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well-known, and in a scenario in which the end-user requirements are not evident or cannot be easily elicited (e.g., this may happen when the users are not aware of the analysis capabilities of their own sources). Interestingly, the need to dispose of requirements beforehand is smoothed by the fact of having semantically rich data sources. In lack of that, requirements gain relevance to extract the multidimensional knowledge from the sources.So that, we claim to provide two approaches whose combination turns up to be exhaustive with regard to the scenarios discussed in the literaturePostprint (published version

    Learning k-Nearest Neighbors Classifier from Distributed Data

    Get PDF
    Most learning algorithms assume that all the relevant data are available on a single computer site. In the emerging networked environments learning tasks are encountering situations in which the relevant data exists in a number of geographically distributed databases that are connected by communication networks. These databases cannot be moved to other network sites due to security, size, privacy, or data-ownership considerations. In this paper we show how a k-nearest classifier algorithm can be adapted for distributed data situations. The objective of our algorithms is to achieve the learning objectives for any data distribution encountered across the network by exchanging local summaries among the participating nodes

    Effect of nano black rice husk ash on the chemical and physical properties of porous concrete pavement

    Get PDF
    Black rice husk is a waste from this agriculture industry. It has been found that majority inorganic element in rice husk is silica. In this study, the effect of Nano from black rice husk ash (BRHA) on the chemical and physical properties of concrete pavement was investigated. The BRHA produced from uncontrolled burning at rice factory was taken. It was then been ground using laboratory mill with steel balls and steel rods. Four different grinding grades of BRHA were examined. A rice husk ash dosage of 10% by weight of binder was used throughout the experiments. The chemical and physical properties of the Nano BRHA mixtures were evaluated using fineness test, X-ray Fluorescence spectrometer (XRF) and X-ray diffraction (XRD). In addition, the compressive strength test was used to evaluate the performance of porous concrete pavement. Generally, the results show that the optimum grinding time was 63 hours. The result also indicated that the use of Nano black rice husk ash ground for 63hours produced concrete with good strengt

    Representing and reasoning with qualitative preferences for compositional systems

    Get PDF
    Many applications call for techniques for representing and reasoning about preferences, i.e., relative desirability over a set of alternatives. Preferences over the alternatives are typically derived from preferences with respect to the various attributes of the alternatives (e.g., a student\u27s preference for one course over another may be influenced by his preference for the topic, the time of the day when the course is offered, etc.). Such preferences are often qualitative and conditional. When the alternatives are expressed as tuples of valuations of the relevant attributes, preferences between alternatives can often be expressed in the form of (a) preferences over the values of each attribute, and (b) relative importance of certain attributes over others. An important problem in reasoning with multi-attribute qualitative preferences is dominance testing, i.e., to find if one alternative (assignment to all attributes) is preferred over another. This problem is hard (PSPACE-complete) in general for well known qualitative conditional preference languages such as TCP-nets. We provide two practical approaches to dominance testing. First, we study a restricted unconditional preference language, and provide a dominance relation that can be computed in polynomial time by evaluating the satisfiability of an appropriately constructed logic formula. Second, we show how to reduce dominance testing for TCP-nets to reachability analysis in an induced preference graph. We provide an encoding of TCP-nets in the form of a Kripke structure for CTL. We show how to compute dominance using NuSMV, a model checker for CTL. We address the problem of identifying a preferred outcome in a setting where the outcomes or alternatives to be compared are composite in nature (i.e., collections of components that satisfy certain functional requirements). We define a dominance relation that allows us to compare collections of objects in terms of preferences over attributes of the objects that make up the collection, and show that the dominance relation is a strict partial order under certain conditions. We provide algorithms that use this dominance relation to identify only (sound), all (complete), or at least one (weakly complete) of the most preferred collections. We establish some key properties of the dominance relation and analyze the quality of solutions produced by the algorithms. We present results of simulation experiments aimed at comparing the algorithms, and report interesting conjectures and results that were derived from our analysis. Finally, we show how the above formalism and algorithms can be used in preference-based service composition, substitution, and adaptation

    Semantics-aware planning methodology for automatic web service composition

    Get PDF
    Service-Oriented Computing (SOC) has been a major research topic in the past years. It is based on the idea of composing distributed applications even in heterogeneous environments by discovering and invoking network-available Web Services to accomplish some complex tasks when no existing service can satisfy the user request. Service-Oriented Architecture (SOA) is a key design principle to facilitate building of these autonomous, platform-independent Web Services. However, in distributed environments, the use of services without considering their underlying semantics, either functional semantics or quality guarantees can negatively affect a composition process by raising intermittent failures or leading to slow performance. More recently, Artificial Intelligence (AI) Planning technologies have been exploited to facilitate the automated composition. But most of the AI planning based algorithms do not scale well when the number of Web Services increases, and there is no guarantee that a solution for a composition problem will be found even if it exists. AI Planning Graph tries to address various limitations in traditional AI planning by providing a unique search space in a directed layered graph. However, the existing AI Planning Graph algorithm only focuses on finding complete solutions without taking account of other services which are not achieving the goals. It will result in the failure of creating such a graph in the case that many services are available, despite most of them being irrelevant to the goals. This dissertation puts forward a concept of building a more intelligent planning mechanism which should be a combination of semantics-aware service selection and a goal-directed planning algorithm. Based on this concept, a new planning system so-called Semantics Enhanced web service Mining (SEwsMining) has been developed. Semantic-aware service selection is achieved by calculating on-demand multi-attributes semantics similarity based on semantic annotations (QWSMO-Lite). The planning algorithm is a substantial revision of the AI GraphPlan algorithm. To reduce the size of planning graph, a bi-directional planning strategy has been developed

    Extraction of Belief Knowledge from a Relational Database for Quantitative Bayesian Network Inference

    Get PDF
    The problem of extracting knowledge from a relational database for probabilistic reasoning is still unsolved. On the basis of a three-phase learning framework, we propose the integration of a Bayesian network (BN) with the functional dependency (FD) discovery technique. Association rule analysis is employed to discover FDs and expert knowledge encoded within a BN; that is, key relationships between attributes are emphasized. Moreover, the BN can be updated by using an expert-driven annotation process wherein redundant nodes and edges are removed. Experimental results show the effectiveness and efficiency of the proposed approach
    • …
    corecore