44 research outputs found

    Augmenting data warehousing architectures with hadoop

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management. As the volume of available data increases exponentially, traditional data warehouses struggle to transform this data into actionable knowledge. Data strategies that include the creation and maintenance of data warehouses have a lot to gain by incorporating technologies from the Big Data spectrum. Hadoop, as a transformation tool, can add a theoretically unbounded dimension of data processing, feeding transformed information into traditional data warehouses that will ultimately retain their value as central components of organizations’ decision support systems. This study explores the potential of Hadoop as a data transformation tool in a traditional data warehouse environment. Hadoop’s execution model, which is oriented toward distributed parallel processing, offers great capabilities when the amount of data to be processed requires the infrastructure to expand. Horizontal scalability, a key aspect of a Hadoop cluster, allows processing power to grow proportionally as the volume of data increases. Using Hive on Tez in a Hadoop cluster, this study transforms television viewing events, extracted from Ericsson’s Mediaroom Internet Protocol Television infrastructure, into pertinent audience metrics such as Rating, Reach and Share. These measurements are then made available in a traditional data warehouse, supported by a traditional Relational Database Management System, where they are presented through a set of reports. The main contribution of this research is a proposed augmented data warehouse architecture in which the traditional ETL layer is replaced by a Hadoop cluster running Hive on Tez, which performs the heaviest transformations that convert raw data into actionable information. By classifying the SQL statements responsible for the data transformation processes, we were able to establish that Hadoop and its distributed processing model deliver outstanding performance in the analytical layer, namely in the aggregation of large data sets. Ultimately, we demonstrate empirically the performance gains that can be extracted from Hadoop, in comparison to an RDBMS, regarding speed, storage usage and scalability potential, and suggest how this can be used to evolve data warehouses into the age of Big Data.
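
The abstract above names Rating, Reach and Share without defining them. As a rough illustration of the kind of aggregation involved, the Python sketch below computes the three metrics under commonly used definitions, which are assumptions here and not taken from the dissertation: Reach as the fraction of the audience universe that watched a channel at all, Rating as the average audience expressed as a fraction of the universe, and Share as a channel's fraction of all viewing time. The event layout and the function name `audience_metrics` are hypothetical.

```python
from collections import defaultdict


def audience_metrics(events, universe, period_seconds):
    """Aggregate viewing events into per-channel metrics.

    events: iterable of (viewer_id, channel, seconds_watched) tuples that all
            fall inside one reporting period (hypothetical layout).
    universe: total number of potential viewers (assumed known).
    period_seconds: length of the reporting period in seconds.
    """
    seconds = defaultdict(int)   # total viewing time per channel
    viewers = defaultdict(set)   # distinct viewers per channel

    for viewer_id, channel, watched in events:
        seconds[channel] += watched
        viewers[channel].add(viewer_id)

    total_seconds = sum(seconds.values())
    return {
        channel: {
            # Assumed definition: distinct viewers / universe.
            "reach": len(viewers[channel]) / universe,
            # Assumed definition: average audience / universe.
            "rating": seconds[channel] / (universe * period_seconds),
            # Assumed definition: channel viewing time / all viewing time.
            "share": seconds[channel] / total_seconds if total_seconds else 0.0,
        }
        for channel in seconds
    }


if __name__ == "__main__":
    sample = [("a", "news", 1800), ("b", "news", 3600), ("a", "sport", 1800)]
    for channel, values in audience_metrics(sample, universe=4, period_seconds=3600).items():
        print(channel, values)
```

In the architecture described above, this grouping and summing would presumably be expressed as SQL aggregations executed by Hive on Tez; the in-memory version only shows the shape of the computation.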

    Estratégias eficientes para identificação de falhas utilizando o diagnóstico baseado em comparações

    Advisor: Prof. Dr. Elias Procópio Duarte Jr. Thesis (doctorate) - Universidade Federal do Paraná, Setor de Ciências Exatas, Curso de Pós-Graduação em Informática. Defense: Curitiba, 12/04/2013. Bibliography: leaves 126-148. Abstract: Comparison-based diagnosis is a practical approach to detect faults in hardware, software, and network-based systems. Diagnosis is based on the comparison of task outputs returned by pairs of system units in order to determine whether those units are faulty or fault-free. If the comparison results in a mismatch, then one or both units are faulty. System diagnosis is based on the complete set of all comparison results. This work introduces a novel diagnosis algorithm to identify faults in t-diagnosable systems of arbitrary topology under the MM* model. The complexity of the proposed algorithm is O(t²AN) in the worst case for systems with N units, where t denotes the maximum number of faulty units allowed and A corresponds to the maximum degree of a unit in the system. This complexity is significantly lower than those of previously published algorithms. Besides the algorithm specification and correctness proofs, exhaustive simulation results are presented, showing the typical performance of the algorithm for different systems. Moreover, this work also presents two different strategies to detect and fight content pollution, that is, unauthorized alterations, in P2P live streaming transmissions: the first strategy is centralized and performs the diagnosis of content pollution in the network, and the second strategy is a completely distributed solution to combat the propagation of the pollution. Both strategies employ comparison-based diagnosis in order to detect any modification in the data transmitted. The solutions were also implemented in Fireflies, a scalable and fault-tolerant overlay network protocol, and a large number of simulation experiments were conducted. Results show that both strategies are feasible solutions to identify and fight content pollution in live streaming sessions and that they add low overhead in terms of network bandwidth usage. In particular, the solution proposed to combat content pollution was able to significantly reduce the pollution over the system in diverse network configurations; in many cases the solution nearly eliminated the pollution during the transmission.
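
To make the comparison rule in the abstract above concrete, the following Python sketch simulates the outcomes that fault-free comparators would report when each unit compares every pair of its neighbours. It is a toy illustration only and is not the algorithm proposed in the thesis: the function names, the ring topology and the choice to skip faulty comparators (whose outcomes are unreliable, and which a real diagnoser cannot identify in advance) are all assumptions made for the example.

```python
from itertools import combinations


def comparison_syndrome(neighbours, faulty):
    """Simulate comparison outcomes reported by fault-free comparators.

    neighbours: dict mapping each unit to the set of its adjacent units.
    faulty: set of units assumed faulty for the simulation.
    Returns {(comparator, a, b): outcome}, where outcome 1 means a mismatch.
    """
    results = {}
    for comparator, adjacent in neighbours.items():
        if comparator in faulty:
            continue  # simplification: ignore unreliable comparators entirely
        for a, b in combinations(sorted(adjacent), 2):
            # A mismatch means that one or both compared units are faulty.
            results[(comparator, a, b)] = int(a in faulty or b in faulty)
    return results


def cleared_units(results):
    """Units seen in a matching comparison made by a fault-free comparator."""
    cleared = set()
    for (_, a, b), mismatch in results.items():
        if not mismatch:
            cleared.update((a, b))
    return cleared


if __name__ == "__main__":
    ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    outcomes = comparison_syndrome(ring, faulty={2})
    print(outcomes)
    print(cleared_units(outcomes))
```

In this example the two mismatches only narrow the fault down to the pair (0, 2); a complete diagnosis has to combine all comparison results, which is the problem the thesis addresses.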

    Infraestrutura para análise de tráfego e comportamento de condutores

    Master's in Computer and Telematics Engineering. The work in this dissertation can be seen as a decision support system for traffic. It was motivated by smart cities projects, in which transportation is a major area. As in-vehicle technology evolves, it is possible to gather more and more information about vehicles in real scenarios, which allows a more detailed analysis of traffic and drivers’ behavior. A survey of related work in this area showed that many of the analyses performed did not take context into consideration, and some studies proposed integrating factors that influence driving as future work. In this dissertation the concepts from the related work are integrated, along with heterogeneous data sources containing context information. A study of different database paradigms was also carried out, covering the most relevant NoSQL paradigms, their use cases and their most used implementations. This dissertation proposes the design and implementation of a framework for analyzing traffic and drivers’ behavior based on trajectory data gathered from moving vehicles. For the proof of concept, two case studies were performed with data extracted from two distinct datasets of vehicle trajectories. A set of tools was developed to extract, transform and load data into the data marts developed. Visualization tools were used to perform a visual analysis, with charts for aggregate measures and GIS software for the geospatial details. The framework was designed to be adaptable to different application scenarios involving moving vehicles, from public transportation management to behavior-based insurance. The results achieved allow the study of traffic and drivers’ behavior in order to obtain knowledge in this area and possibly improve traffic management or the driving experience.

    Network-on-Chip

    The limitations of bus-based interconnections in scalability, latency, bandwidth, and power consumption when supporting a very large number of on-chip resources result in a communication bottleneck. These challenges can be efficiently addressed with the implementation of a network-on-chip (NoC) system. This book gives a detailed analysis of various on-chip communication architectures and covers different areas of NoCs such as potentials, architecture, technical challenges, optimization, design explorations, and research directions. In addition, it discusses current and future trends that could make an impactful and meaningful contribution to the research and design of on-chip communications and NoC systems.

    The Murray Ledger and Times, December 20, 2001

    Modeling and Optimization of Next-Generation Wireless Access Networks

    The ultimate goal of next-generation access networks is to provide all network users, whether fixed or mobile, indoor or outdoor, with high data rate connectivity while ensuring a high quality of service. To realize this ambitious goal, delay, jitter, error rate and packet loss should be minimized, which can only be achieved by integrating different technologies, including passive optical networks, 4th generation wireless networks, and femtocells, among others. This thesis focuses on the medium access control and physical layers of future networks. The first part of the thesis discusses techniques to improve the end-to-end quality of service in hybrid optical-wireless networks. In these hybrid networks, users are connected to a wireless base station that relays their data to the core network through an optical connection. Hence, by integrating the wireless and optical parts of these networks, a smart scheduler can predict the incoming traffic to the optical network. The prediction data is then used to propose a traffic-aware dynamic bandwidth assignment algorithm for reducing the end-to-end delay. The second part of the thesis addresses the challenging problem of interference management in a two-tier macrocell/femtocell network. A high-quality, high-speed connection for indoor users is ensured only if the network has a high signal-to-noise ratio, a requirement that can be fulfilled by using femtocells in cellular networks. However, since femtocells generate harmful interference to nearby macrocell users, careful analysis and realistic models should be developed to manage the introduced interference. Thus, a realistic model for femtocell interference outside suburban houses is proposed, and several performance measures, e.g., signal to interference and noise ratio and outage probability, are derived mathematically for further analysis. 
The quality of service of cellular networks can be degraded by several factors. For example, in industrial environments, simultaneous fading and strong impulsive noise significantly deteriorate the error rate performance. In the third part of the thesis, a technique to improve the bit error rate of orthogonal frequency division multiplexing systems in industrial environments is presented. Orthogonal frequency division multiplexing is the most widely used technology in next-generation networks and is very susceptible to impulsive noise, especially in fading channels. Mathematical analysis proves that the proposed method can effectively mitigate the degradation caused by impulsive noise and significantly improve signal to interference and noise ratio and bit error rate, even in frequency-selective fading channels.

    Scene Understanding For Real Time Processing Of Queries Over Big Data Streaming Video

    With heightened security concerns across the globe and the increasing need to monitor, preserve and protect infrastructure and public spaces to ensure proper operation, quality assurance and safety, numerous video cameras have been deployed. Accordingly, they also need to be monitored effectively and efficiently. However, relying on human operators to constantly monitor all the video streams is neither scalable nor cost-effective. Human operators can be subjective, become fatigued, or even exhibit bias, and it is difficult to maintain high levels of vigilance when capturing, searching and recognizing events that occur infrequently or in isolation. These limitations are addressed in the Live Video Database Management System (LVDBMS), a framework for managing and processing live motion imagery data. It enables rapid development of video surveillance software in much the same way that traditional database applications are developed today. Video stream processing applications and ad hoc queries developed this way are able to reuse existing advanced image processing techniques. This results in lower software development and maintenance costs. Furthermore, the LVDBMS can be intensively tested to ensure consistent quality across all associated video database applications. Its intrinsic privacy framework facilitates a formalized approach to the specification and enforcement of verifiable privacy policies. This is an important step towards enabling a general privacy certification for video surveillance systems by leveraging a standardized privacy specification language. With the potential to impact many important fields ranging from security and assembly line monitoring to wildlife studies and the environment, the broader impact of this work is clear. The privacy framework protects the general public from abusive use of surveillance technology; success in addressing the trust issue will enable many new surveillance-related applications. 
Although this research focuses on video surveillance, the proposed framework has the potential to support many video-based analytical applications.

    Unmet goals of tracking: within-track heterogeneity of students' expectations for

    Educational systems are often characterized by some form(s) of ability grouping, like tracking. Although the implementation of these practices varies substantially, the aim is always to improve teaching efficiency by creating groups of students that are homogeneous in terms of capabilities and performance as well as expected pathways. If students’ expected pathways (university, graduate school, or working) are in line with the goals of tracking, one might presume that these expectations are rather homogeneous within tracks and heterogeneous between tracks. In Flanders (the northern region of Belgium), the educational system consists of four tracks. Many students start out in the most prestigious, academic track. If they fail to gain the necessary credentials, they move to the less esteemed technical and vocational tracks. Therefore, the educational system has been called a 'cascade system'. We presume that this cascade system creates homogeneous expectations in the academic track, but heterogeneous expectations in the technical and vocational tracks. We use data from the International Study of City Youth (ISCY), gathered during the 2013-2014 school year from 2354 tenth-grade pupils across 30 secondary schools in the city of Ghent, Flanders. Preliminary results suggest that the technical and vocational tracks show more heterogeneity in students’ expectations than the academic track. If tracking does not fulfill the desired goals in some tracks, tracking practices should be questioned, as tracking occurs along social and ethnic lines, causing social inequality.

    The Review Wed, September 10, 1986

    Protocole de routage à chemins multiples pour des réseaux ad hoc

    Ad hoc networks consist of a collection of wireless mobile nodes that dynamically exchange data without relying on any fixed base station or wired backbone network. They are by definition self-organized. Frequent topological changes make multi-hop routing a crucial issue for these networks. In this PhD thesis, we propose a multipath routing protocol named Multipath Optimized Link State Routing (MP-OLSR). It is a multipath extension of OLSR and can be regarded as a hybrid routing scheme because it combines the proactive nature of topology sensing with the reactive nature of multipath computation, which is performed on demand. Auxiliary functions such as route recovery and loop detection are introduced to improve the performance of the network. The use of the queue length of intermediate nodes as a link quality criterion is studied, and the compatibility between single-path and multipath routing is discussed to facilitate the deployment of the protocol. Simulations based on the NS2 and Qualnet software are performed in different scenarios. A testbed is also set up on the campus of Polytech’Nantes. The results from the simulator and the testbed reveal that MP-OLSR is particularly suitable for mobile, large and dense networks with heavy network load, thanks to its ability to distribute the traffic over different paths and to its effective auxiliary functions. The H.264/SVC video service is applied to ad hoc networks with MP-OLSR. By exploiting the scalable characteristic of H.264/SVC, we propose to use Priority Forward Error Correction (PFEC) coding, an unequal-protection scheme based on the Finite Radon Transform (FRT), to improve the received video quality. An evaluation framework called SVCEval is built to simulate SVC video transmission over different kinds of networks in Qualnet. This second study highlights the value of unequal-protection coding combined with multipath routing for improving quality of experience over self-organized networks.