5 research outputs found

    Mineração de dados em sistemas OLAP [Data mining in OLAP systems]

    Master's dissertation in Informatics Engineering. The many benefits that data warehouses provide for storing and processing information have led to substantial growth in the data warehousing market and in the number of organizations adopting these systems. The data model of these structures lets users perform many different operations: running complex queries, selecting the most interesting subsets of data, aggregating and comparing values, and viewing the data in different ways. However, this complexity brings computation and materialization costs. Pre-computing the entire data cube from a data warehouse gives fast responses to analytical queries, but it requires an enormous amount of storage space for all the materialized views. Data mining techniques, in particular algorithms for mining association rules, make it possible to discover frequent itemsets in the data and, consequently, to define a set of exploration or usage preferences (OLAP preferences). The study of OLAP preferences presented in this dissertation aims to identify the data that users access most often, so that a consensus can be reached on which parts of a cube do not need to be materialized because they are not used in analysis. Materializing only the parts that matter for analysis keeps query response times acceptable while significantly reducing the amount of memory used.
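The idea in the abstract above, mining frequent itemsets from what users actually query and materializing only those parts of the cube, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the query log, the dimension names, the 0.5 support threshold and both function names are hypothetical, and the brute-force counting stands in for the association rule mining algorithms the dissertation refers to; it is not the author's method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical query log: each entry is the set of dimensions one OLAP query grouped by.
QUERY_LOG = [
    {"product", "time"},
    {"product", "time", "store"},
    {"product", "time"},
    {"store"},
    {"product", "store"},
    {"product", "time"},
]


def frequent_itemsets(log, min_support):
    """Return {itemset: support} for every dimension set with support >= min_support.

    Support is the fraction of logged queries that contain the itemset.
    Subsets are enumerated by brute force, which is only viable for a
    handful of dimensions.
    """
    counts = Counter()
    for query in log:
        dims = sorted(query)
        for size in range(1, len(dims) + 1):
            for subset in combinations(dims, size):
                counts[frozenset(subset)] += 1
    n = len(log)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def views_to_materialize(log, min_support):
    """List the frequently accessed cuboids, most detailed (largest) first."""
    frequent = frequent_itemsets(log, min_support)
    return sorted(frequent, key=lambda s: (-len(s), sorted(s)))


if __name__ == "__main__":
    for cuboid in views_to_materialize(QUERY_LOG, min_support=0.5):
        print(sorted(cuboid))
```

With this toy log, the cuboids combining `store` with another dimension fall below the threshold and would be left unmaterialized, which is the kind of storage saving the abstract describes.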

    Poslovna inteligencija u funkciji autorizovanog modela sistema za učenje na daljinu [Business intelligence in the function of an authorized model of a distance learning system]

    The idea that all the knowledge of the world can be stored in one place is thousands of years old. It sounds like a fantasy, but it is a reality that has already happened, in Alexandria. It was Alexander the Great who had the vision of this powerful idea. One can say that Alexandria was a city built from a dream. It is also the city where Alexander the Great was buried, the city where Cleopatra seduced Mark Antony and Caesar, and the city that was home to one of the seven wonders of the ancient world. However, Alexander the Great's unconditional ambition for Alexandria to become the most powerful city in the world was pursued by capturing all the knowledge of the world within the walls of the Library of Alexandria [Hughes, 2010]. This only confirms that knowledge is power. We may have only slightly refined Alexander the Great's idea and arrived at the concept of Business Intelligence, which rests on data warehousing and the discovery of knowledge in that data, while the modern Library of Alexandria is our Web. In a time characterized by unpredictable change, especially in the field of information and communication technologies, understanding the importance of knowledge matters a great deal. For a successful and developed world, knowledge is the only permanent value. Those who possess it have a good chance of success. Those who do not have nothing to hope for. Like everything of value, knowledge has a price: effort, time and money. It must be built up daily and according to certain rules, but above all one must go out to meet it. Accordingly, this thesis considers the application of business intelligence and the definition of an analytical model of an authorized distance learning system, in order to meet the need for data analysis within the DLS platform. All data used in the OLAP (On-line Analytical Processing) and EDM (Educational Data Mining) analyses was collected partly through the authorized distance learning system (i.e., a dynamic interactive DLS Web application, called the DLS platform) and partly from paper documents (such as class registers and student registry books). The business intelligence methodology translated the existing transactional DLS database, partly through the ETL (Extract, Transform and Load) process, into an analytical DLS database; that is, it defined the data warehouse (DW) model of the authorized distance learning system. This model enabled the OLAP and EDM analyses, with the goal of improving the teaching process and achieving the best possible acquisition of knowledge by students of a secondary technical school in Serbia, the main users of the DLS platform. The following analyses were conducted: analysis of educational success, analysis of completed learning resources, analysis of student grades and subject grades, and analysis of the evaluation of Distance Learning (DL) education. The results obtained by applying the concept of business intelligence indicate that, with timely intervention, it is possible to acquire the knowledge necessary to make correct decisions and thereby carry out a series of actions that would improve student success.

    Semantic metadata for supporting exploratory OLAP

    Cotutelle between Universitat Politècnica de Catalunya and Aalborg Universitet. On-Line Analytical Processing (OLAP) is an approach widely used for data analysis. OLAP is based on the multidimensional (MD) data model, in which factual data are related to their analytical perspectives, called dimensions, and together they form an n-dimensional data space referred to as a data cube. MD data are typically stored in a data warehouse, which integrates data from in-house data sources, and are then analyzed by means of OLAP operations; for example, sales data can be (dis)aggregated along the location dimension. As OLAP proved to be quite intuitive, it became broadly accepted by non-technical and business users. However, as users still encountered difficulties in their analysis, different approaches have focused on providing user assistance. These approaches collect situational metadata about users and their actions and provide suggestions and recommendations that can help users in their analysis. Although metadata are extensively exploited and evidently needed, little attention has been paid to them in this context. Furthermore, emerging trends call for expanding the use of OLAP to external data sources and heterogeneous settings. This leads to the Exploratory OLAP approach, which argues in particular for the use of Semantic Web (SW) technologies to facilitate the description and integration of external sources. With data becoming publicly available on the (Semantic) Web, the number and diversity of non-technical users are also increasing significantly, so the metadata that support their analysis become even more relevant. This PhD thesis focuses on metadata for supporting Exploratory OLAP. The study explores the kinds of metadata artifacts used for user assistance and how they are exploited to provide that assistance. Based on these findings, the study then aims to provide theoretical and practical means, such as models, algorithms, and tools, to address the gaps and challenges identified. First, based on a survey of existing user assistance approaches related to OLAP, the thesis proposes the analytical metadata (AM) framework. The framework includes the definition of the assistance process, the AM artifacts classified in a taxonomy, and the organization of the artifacts together with the related types of processing that support user assistance. Second, the thesis proposes a semantic metamodel for AM. The Resource Description Framework (RDF) is used to represent the AM artifacts in a flexible and reusable manner, while the metamodeling abstraction level is used to overcome the heterogeneity of (meta)data models in the Exploratory OLAP context. Third, focusing on the schema as a fundamental metadata artifact for enabling OLAP, the thesis addresses some important challenges in constructing an MD schema on the SW using RDF. It provides the algorithms, method, and tool to construct an MD schema over statistical linked open data sets, with a particular focus on enabling even non-technical users to perform this task. Fourth, the thesis deals with queries as the second most relevant artifact for user assistance. In the spirit of Exploratory OLAP, it proposes an RDF-based model for OLAP queries, created by instantiating the previously proposed metamodel. This model supports the sharing and reuse of queries across the SW and facilitates the preparation of metadata for assistance purposes. Finally, the results of this thesis provide metadata foundations for supporting Exploratory OLAP and advocate for greater attention to the modeling and use of semantics related to metadata.
    Postprint (published version)
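The OLAP operation that this abstract uses as its example, (dis)aggregating sales data along the location dimension, can be illustrated with a minimal sketch. The city-to-country hierarchy, the fact rows and the function names `roll_up` and `drill_down` are hypothetical and chosen only for illustration; the thesis itself works with RDF-based schemas and queries, which this plain-Python example does not attempt to reproduce.

```python
from collections import defaultdict

# Hypothetical location hierarchy: city -> country.
CITY_TO_COUNTRY = {"Barcelona": "Spain", "Madrid": "Spain", "Aalborg": "Denmark"}

# Hypothetical fact rows: (city, product, sales amount).
SALES = [
    ("Barcelona", "laptop", 120.0),
    ("Madrid", "laptop", 80.0),
    ("Aalborg", "laptop", 50.0),
    ("Barcelona", "phone", 30.0),
]


def roll_up(facts, hierarchy):
    """Aggregate sales from the city level up to the country level."""
    totals = defaultdict(float)
    for city, product, amount in facts:
        totals[(hierarchy[city], product)] += amount
    return dict(totals)


def drill_down(facts, hierarchy, country):
    """Disaggregate one country back into its per-city fact rows."""
    return [row for row in facts if hierarchy[row[0]] == country]


if __name__ == "__main__":
    print(roll_up(SALES, CITY_TO_COUNTRY))
    print(drill_down(SALES, CITY_TO_COUNTRY, "Spain"))
```

Rolling up replaces each city with its parent in the hierarchy before summing, and drilling down simply returns to the finer-grained rows; the facts themselves are unchanged and only the level of the location dimension differs.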