    Proceedings TLAD 2012: 10th International Workshop on the Teaching, Learning and Assessment of Databases

    This is the tenth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2012). TLAD 2012 is held on 9th July at the University of Hertfordshire and hopes to be just as successful as its predecessors. The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year the workshop aims to continue the tradition of bringing together database teachers and researchers to share good learning, teaching and assessment practice and experience, and to further the growing community among database academics. As well as attracting academics and teachers from the UK community, the workshop has also been successful in attracting academics from the wider international community, who have served on the programme committee and attended and presented papers. Thanks to the healthy number of high-quality submissions this year, the workshop will present eight peer-reviewed papers: six full papers and two short papers. These papers cover a number of themes, including the teaching of data mining and data warehousing, SQL and NoSQL, databases at school, and database curricula themselves. The final paper gives a timely ten-year review of the TLAD workshops, and it is expected that these papers will lead to a stimulating closing discussion that continues beyond the workshop. We also look forward to a keynote presentation by Karen Fraser, who has contributed to many TLAD workshops as the HEA organizer. Titled “An Effective Higher Education Academy”, the keynote will discuss the Academy’s plans for the future and outline how participants can get involved.

    Pervasive data science applied to the society of services

    Integrated master's dissertation in Information Systems Engineering and Management. With the technological progress of the last few years, and now with the actual implementation of the Internet of Things concept, an enormous amount of data can be observed being collected each minute. This raises a problem: how can we process such an amount of data and extract relevant knowledge from it in useful time? The issue is not easy to solve, because one often needs to deal not just with huge volumes but also with many different kinds of data, which makes the problem even more complex. Today, and in an increasing way, huge quantities of the most varied types of data are produced. These data alone do not add value to the organizations that collect them, but when subjected to data analytics processes they can be converted into crucial sources of information for the core business. The focus of this project is therefore to explore this problem and give it a modular solution, adaptable to different realities, built on recent technologies and allowing users to access information wherever and whenever they wish. In the first phase of this dissertation, bibliographic research and a review of the collected literature were carried out, in order to understand which solutions already exist and which questions remain open. A solution composed of four layers was then developed: the data are submitted to a treatment process (comprising eleven treatment functions that populate the previously designed multidimensional data model), and an OLAP layer was built that handles unstructured as well as structured data. In the end, it is possible to consult a set of four dashboards (available on a web application) based on more than twenty basic queries, with filtering supported by a dynamic query. As a proof of concept, the case study used the company IOTech, which provided the data needed for this dissertation and on whose data five Key Performance Indicators were defined. Two methodologies were applied during the project: Design Science Research for the research component and SCRUM for the practical component.
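
    The treatment layer described above lends itself to a small illustration. The sketch below is a minimal, hypothetical Python version of the idea of chaining treatment functions to clean raw IoT records before loading a star-schema fact table; the function names, fields and record format are invented for illustration, not taken from the dissertation (which includes eleven such functions).

```python
# Hypothetical sketch of a "treatment layer": a chain of small treatment
# functions that clean a raw IoT record before it is loaded into a
# star-schema fact table. All names and fields here are illustrative.
from datetime import datetime, timezone

def drop_incomplete(record):
    """Reject records missing mandatory fields (returns None to reject)."""
    if record.get("sensor_id") and record.get("value") is not None:
        return record
    return None

def normalize_timestamp(record):
    """Convert epoch seconds to an ISO-8601 UTC timestamp."""
    record["ts"] = datetime.fromtimestamp(record["ts"], tz=timezone.utc).isoformat()
    return record

def cast_value(record):
    """Coerce the measured value to a float."""
    record["value"] = float(record["value"])
    return record

TREATMENTS = [drop_incomplete, normalize_timestamp, cast_value]  # the thesis uses eleven

def treat(record):
    """Run a record through every treatment function; None means rejected."""
    for fn in TREATMENTS:
        record = fn(record)
        if record is None:
            return None
    return record

raw = {"sensor_id": "s-42", "ts": 1633036800, "value": "21.5"}
print(treat(raw))  # a cleaned row, ready for the fact table
```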

    Implementing Multidimensional Data Warehouses into NoSQL

    Not only SQL (NoSQL) databases are becoming increasingly popular and have some interesting strengths, such as scalability and flexibility. In this paper, we investigate the use of NoSQL systems for implementing OLAP (On-Line Analytical Processing) systems. More precisely, we are interested in instantiating OLAP systems (from the conceptual level to the logical level) and in instantiating an aggregation lattice (optimization). We define a set of rules to map star schemas into two NoSQL models: column-oriented and document-oriented. The experimental part is carried out using the reference TPC benchmark. Our experiments show that our rules can effectively instantiate such systems (star schema and lattice). We also analyze differences between the two NoSQL systems considered. In our experiments, HBase (column-oriented) proves faster than MongoDB (document-oriented) in terms of loading time.
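
    As a rough illustration of what such mapping rules produce, the sketch below maps one star-schema fact and its dimensions into a document-oriented (MongoDB-like) document and a column-oriented (HBase-like) row. The schema and values are invented, and the rules shown are simplifications, not the paper's actual rule set.

```python
# Illustrative mapping of one star-schema fact into the two NoSQL models.
# The schema, keys and values are invented for the example.
fact = {"quantity": 3, "revenue": 74.97}                      # measures
dims = {
    "date":     {"day": 9, "month": 7, "year": 2012},
    "customer": {"id": "c-17", "city": "Lyon"},
}

# Document-oriented (MongoDB-like): one document embeds the measures and
# each dimension as a nested sub-document.
document = {"_id": "sale-001", **fact, **dims}

# Column-oriented (HBase-like): one row key, a column family per dimension
# plus one for the measures; cells are family:qualifier -> value.
row = {f"measures:{k}": v for k, v in fact.items()}
for dim, attrs in dims.items():
    for attr, value in attrs.items():
        row[f"{dim}:{attr}"] = value

print(document)
print("sale-001", row)
```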

    Evaluating cloud database migration options using workload models

    A key challenge in porting enterprise software systems to the cloud is the migration of their database. Choosing a cloud provider and service option (e.g., a database-as-a-service or a manually configured set of virtual machines) typically requires the estimation of the cost and migration duration for each considered option. Many organisations also require this information for budgeting and planning purposes. Existing cloud migration research focuses on the software components, and therefore does not address this need. We introduce a two-stage approach which accurately estimates the migration cost, migration duration and cloud running costs of relational databases. The first stage of our approach obtains workload and structure models of the database to be migrated from database logs and the database schema. The second stage performs a discrete-event simulation using these models to obtain the cost and duration estimates. We implemented software tools that automate both stages of our approach. An extensive evaluation compares the estimates from our approach against results from real-world cloud database migrations.
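
    To make the two-stage idea concrete, here is a minimal, hypothetical sketch of the second stage: a discrete-event simulation that turns a structure model (table sizes) and a few assumed provider parameters (throughput, transfer price) into duration and cost estimates. All numbers and names are invented; this is not the authors' tool.

```python
# Minimal discrete-event simulation of a database migration: parallel
# streams transfer tables largest-first; we estimate duration and cost.
import heapq

tables_gb = {"orders": 120.0, "customers": 8.0, "line_items": 45.0}  # structure model (assumed)
streams = 2                  # assumed parallel migration streams
gb_per_min = 1.5             # assumed sustained throughput per stream
usd_per_gb = 0.09            # assumed provider transfer price

# Each stream is an event (time it becomes free, stream id); pop the
# earliest-free stream and hand it the next largest table.
free_at = [(0.0, s) for s in range(streams)]
heapq.heapify(free_at)
for name, size in sorted(tables_gb.items(), key=lambda kv: -kv[1]):
    t, s = heapq.heappop(free_at)
    heapq.heappush(free_at, (t + size / gb_per_min, s))

duration_min = max(t for t, _ in free_at)
cost_usd = sum(tables_gb.values()) * usd_per_gb
print(f"~{duration_min:.0f} min to migrate, ~${cost_usd:.2f} in transfer fees")
```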

    Relational Database Design and Multi-Objective Database Queries for Position Navigation and Timing Data

    Performing flight tests is a natural part of researching cutting-edge sensors and filters for sensor integration. Unfortunately, tests are expensive and typically take many months of planning. A sensible goal would be to make previously collected data readily available to researchers for future development. The Air Force Institute of Technology (AFIT) has hundreds of data logs potentially available to aid in facilitating further research in the area of navigation. A database would provide a common location where older and newer data sets are available. Such a database must be able to store the sensor data, metadata about the sensors, and affiliated metadata of interest. This thesis proposes a standard approach for sensor and metadata schemas and three different design approaches that organize this data in relational databases. Queries proposed by members of the Autonomy and Navigation Technology (ANT) Center at AFIT are the foundation of the experiments used for testing. These tests fall into two categories: downloading data, and queries that return a list of missions. Test databases of 100 and 1000 missions are created for the three design approaches to simulate AFIT's present and future volume of data logs. After testing, this thesis recommends one specific approach to the ANT Center as its database solution. To enable more complex queries, a genetic algorithm and a hill-climbing algorithm are developed as solutions to queries in the combined knapsack/set-covering problem domains. These algorithms are tested against the two test databases for the recommended database approach. Each algorithm returned solutions in under two minutes and may be a valuable tool for researchers when the database becomes operational.
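
    As an illustration of the kind of search such queries need, the sketch below is a simple hill climber (not AFIT's implementation) for a knapsack-style query: select missions that maximize a relevance score while keeping the total download size under a budget. The mission data and scoring are invented.

```python
# Hill climber for a knapsack-style mission-selection query.
# Mission data (relevance score, size in GB) are hypothetical.
import random

missions = [(8, 40), (5, 22), (9, 55), (3, 10), (6, 30), (4, 18), (7, 35)]
BUDGET_GB = 100

def score(selected):
    """Total relevance if within the download budget, else infeasible."""
    size = sum(missions[i][1] for i in selected)
    return sum(missions[i][0] for i in selected) if size <= BUDGET_GB else -1

random.seed(0)
current = set()
for _ in range(2000):
    flip = random.randrange(len(missions))   # toggle one mission in or out
    candidate = current ^ {flip}
    if score(candidate) >= score(current):   # accept non-worsening moves
        current = candidate

print(sorted(current), "score:", score(current))
```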

    Adaptive Big Data Pipeline

    Over the past three decades, data has evolved from being a simple software by-product to one of companies' most important assets, used to understand their customers and foresee trends. Deep learning has demonstrated that big volumes of clean data generally provide more flexibility and accuracy when modeling a phenomenon. However, handling ever-increasing data volumes entails new challenges: the lack of expertise to select the appropriate big data tools for the processing pipelines, and the speed at which engineers can reliably take such pipelines into production by leveraging the cloud. We introduce a system called Adaptive Big Data Pipelines: a platform that automates data pipeline creation. It provides an interface to capture the data sources, transformations, destinations and execution schedule. The system builds up the cloud infrastructure, schedules and fine-tunes the transformations, and creates the data lineage graph. The system has been tested on data sets of 50 gigabytes, processing them in just a few minutes without user intervention.
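
    The interface described (sources, transformations, destinations and a schedule) suggests a declarative specification. The sketch below shows what such a pipeline declaration might look like; every field name and value is invented for illustration and is not the system's actual format.

```python
# A hypothetical declarative pipeline spec: the platform would provision
# infrastructure, schedule runs, and derive the lineage graph from this.
pipeline = {
    "name": "daily-clickstream",
    "sources": [{"type": "object_store", "uri": "s3://raw-bucket/clicks/"}],
    "transformations": [
        {"op": "dedupe", "keys": ["event_id"]},
        {"op": "filter", "predicate": "country == 'MX'"},
        {"op": "aggregate", "group_by": ["user_id"], "metrics": {"clicks": "count"}},
    ],
    "destinations": [{"type": "warehouse", "table": "analytics.user_clicks"}],
    "schedule": "0 2 * * *",  # cron: run daily at 02:00
}
```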