
    Data Integration over NoSQL Stores Using Access Path Based Mappings

    Due to the large amount of data generated by user interactions on the Web, some companies are currently innovating in the domain of data management by designing their own systems. Many of them are referred to as NoSQL databases, standing for 'Not only SQL'. With their wide adoption, new needs will emerge, and data integration will certainly be one of them. In this paper, we adapt a framework designed for the integration of relational data to a broader context where both NoSQL and relational databases can be integrated. One important extension is the efficient answering of queries expressed over these data sources. The highly denormalized nature of NoSQL databases results in varying performance costs across the possible translations of a query. Thus a data integration framework targeting NoSQL databases needs to generate an optimized translation for a given query. Our contributions are (i) an access path based mapping solution that exploits the design choices of each data source, (ii) a preference mechanism to handle conflicts between sources, and (iii) a query language that bridges the gap between the SQL query expressed by the user and the query language of the data sources. We also present a prototype implementation, in which the target schema is represented as a set of relations, enabling the integration of two of the most popular NoSQL database models, namely document and column-family stores.
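    The access-path idea described in this abstract can be illustrated with a minimal sketch. All names, paths, and the cost model below are assumptions for illustration, not the paper's actual mapping language: each target-schema attribute maps to one or more candidate access paths in the underlying stores, each with an estimated retrieval cost, and query translation picks the cheapest candidate.

```python
# Hypothetical sketch of access-path-based mappings (illustrative only):
# a relational attribute can be reached through several store-specific
# paths; the translator selects the lowest-cost one.

from dataclasses import dataclass

@dataclass
class AccessPath:
    store: str    # e.g. "document" or "column-family"
    path: str     # store-specific locator for the attribute
    cost: float   # estimated retrieval cost of this path

# Mapping from target-schema attributes to candidate access paths.
MAPPINGS = {
    "user.name": [
        AccessPath("document", "users.profile.name", cost=1.0),
        AccessPath("column-family", "users:info:name", cost=2.5),
    ],
    "user.email": [
        AccessPath("column-family", "users:info:email", cost=1.2),
    ],
}

def cheapest_path(attribute: str) -> AccessPath:
    """Resolve an attribute to its lowest-cost access path."""
    return min(MAPPINGS[attribute], key=lambda p: p.cost)

best = cheapest_path("user.name")
```

    Here the denormalized document store happens to offer the cheaper path for `user.name`, so the translation would target it rather than the column-family store.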

    Analyzing performance of Apache Tez and MapReduce with Hadoop multinode cluster on Amazon cloud


    Bilateral Ovarian Endometriomas Presenting as Nonprogress of Labor: First Case Report in the Literature. Is Concomitant Surgical Excision during Cesarean Section Advisable?

    Objective. To report the first case of bilateral ovarian endometriomas, leading to nonprogress of labour, successfully excised during cesarean section. Design. Case report. Setting. Department of Obstetrics & Gynecology of Dr. RPGMC Tanda, Kangra, India. Patients. A primigravida in labour at term gestation. Interventions. Surgical management. Main Outcome Measures. Description and treatment of a pregnant woman with bilateral ovarian endometriomas during cesarean section. Results. Successful excision of ovarian endometriomas and reconstruction of the ovaries during cesarean section. Conclusion. Management of incidentally detected endometriomas during cesarean section should be individualized, taking into account the symptoms, size, bilaterality, and adhesion to adjacent organs.

    Big Data Analysis

    The value of big data is predicated on the ability to detect trends and patterns and, more generally, to make sense of large volumes of data that are often composed of a heterogeneous mix of formats, structures, and semantics. Big data analysis is the component of the big data value chain that focuses on transforming raw acquired data into a coherent, usable resource suitable for analysis. Drawing on a range of interviews with key stakeholders in small and large companies and in academia, this chapter outlines key insights, the state of the art, emerging trends, future requirements, and sectorial case studies for data analysis.

    Auditoría de marca basada en las variables de marketing mix y brand equity. Caso: IKARUS

    This research analyzes the corporate brand Ikarus through a brand audit based on the marketing mix variables and David Aaker's brand equity dimensions. The study is relevant given the competitive market Peruvian ventures currently face, which compels such companies to pursue new actions aimed at quality, differentiation, segmentation, and pricing policies that support their growth. The research is conducted as a case study of a Peruvian venture in the textile and apparel sector, specifically urban clothing. It seeks to understand the actions taken by the company and the perspective of the brand's customers, and to offer recommendations for improvement. The Ikarus corporate brand manufactures and sells urban clothing whose added value lies in its unique designs. Operating in such a competitive context, an evaluation is needed that allows the company to identify both its strengths and its areas for improvement, so that it can make strategic decisions about its brands, aiming at growth in the sector. The analysis uses a mixed-methods approach, drawing on tools such as focus groups, point-of-sale observations, in-depth interviews with internal and external collaborators, and surveys of the brand's customers, all of which provided useful information for the study. The research also posed two main hypotheses: that the Ikarus brand's actions have supported its growth, and that the brand's equity is positive. Based on the marketing mix analysis of the brand's actions and the analysis of the brand equity dimensions from the customer's perspective, both hypotheses are confirmed. The Ikarus brand has a set of actions that generate growth, as well as a high level across the different brand equity dimensions: perceived quality/leadership measures, loyalty, awareness, and association/differentiation measures.

    Just-In-Time Data Distribution for Analytical Query Processing

    Distributed processing commonly requires data to be spread across machines using a priori static or hash-based allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes under a need-to-know policy and is reused, where possible, in subsequent queries. A bidding mechanism among the workers yields a schedule with the most efficient reuse of previously shipped data, minimizing data transfer costs. Just-in-time data shipment allows our system to exploit locally available idle resources to boost overall performance. The system is maintenance-free, and allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.
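    The bidding mechanism the abstract mentions can be sketched as follows. The cost model and worker/table names are assumptions, not the paper's actual protocol: each worker bids the transfer cost of the data it is still missing for a query, and the master assigns the query to the lowest bidder, maximizing reuse of already-shipped data.

```python
# Illustrative sketch of need-to-know shipping with a bidding scheduler
# (names and cost model are assumed): the cheapest bid wins the query.

def bid(worker_cache: set, needed: set, sizes: dict) -> int:
    """A worker's bid: total size of the tables it would still need shipped."""
    missing = needed - worker_cache
    return sum(sizes[table] for table in missing)

def schedule(workers: dict, needed: set, sizes: dict) -> str:
    """Assign the query to the worker with the lowest transfer cost."""
    return min(workers, key=lambda w: bid(workers[w], needed, sizes))

sizes = {"orders": 400, "customers": 100, "items": 250}
workers = {
    "w1": {"orders"},                # already holds the large table
    "w2": {"customers", "items"},
}
winner = schedule(workers, {"orders", "customers"}, sizes)
```

    In this toy run, `w1` only needs `customers` shipped (100 units) while `w2` would need `orders` (400 units), so the query goes to `w1` and the previously shipped large table is reused.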

    Hadoop-BAM: directly manipulating next generation sequencing data in the cloud

    Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can operate directly on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article, we demonstrate the use of Hadoop-BAM by building a coverage-summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability and that moving data in and out of Hadoop between analysis steps should be avoided.

    Challenges in managing real-time data in health information system (HIS)

    © Springer International Publishing Switzerland 2016. In this paper, we discuss the challenges in handling real-time medical big data collection and storage in a health information system (HIS). Based on these challenges, we propose a model for real-time analysis of medical big data and exemplify the approach through Spark Streaming and Apache Kafka applied to the processing of a health big data stream. Apache Kafka works very well for transporting data among different systems, such as relational databases, Apache Hadoop, and non-relational databases. However, Apache Kafka cannot analyze the stream itself, whereas the Spark Streaming framework can perform operations on the stream. We identify the challenges in current real-time systems and propose a solution for coping with medical big data streams.
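    The Kafka-to-Spark-Streaming division of labor described above (Kafka transports, Spark Streaming analyzes) is sketched here in plain Python with no Kafka or Spark dependency. The readings, batch size, and alert threshold are assumptions: a queue stands in for a Kafka topic of vital-sign readings, and fixed-size micro-batches mimic Spark Streaming intervals.

```python
# Dependency-free sketch of micro-batch stream analysis (assumed data):
# drain a simulated topic in fixed-size batches and raise an alert when
# a batch's average heart rate exceeds a threshold.

from collections import deque

stream = deque([72, 75, 71, 140, 74, 73])  # simulated heart-rate topic

def micro_batches(source, batch_size):
    """Yield fixed-size micro-batches, like Spark Streaming intervals."""
    batch = []
    while source:
        batch.append(source.popleft())
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush a final partial batch

ALERT_THRESHOLD = 90  # assumed clinical threshold for illustration
alerts = []
for batch in micro_batches(stream, 3):
    avg = sum(batch) / len(batch)
    if avg > ALERT_THRESHOLD:
        alerts.append(avg)
```

    In a real deployment, the deque would be replaced by a Kafka consumer and the loop body by a Spark Streaming transformation; the per-window computation itself stays the same shape.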

    Supply chain hybrid simulation: From Big Data to distributions and approaches comparison

    The uncertainty and variability of Supply Chains pave the way for simulation to be employed to mitigate such risks. Given the amounts of data generated by the systems used to manage relevant Supply Chain processes, it is widely recognized that Big Data technologies may bring benefits to Supply Chain simulation models. Nevertheless, a simulation model should also consider statistical distributions, which allow it to be used for purposes such as testing risk scenarios or prediction. However, when Supply Chains are complex and of huge scale, performing distribution fitting may not be feasible, which often results in users focusing on subsets of problems or selecting samples of elements, such as suppliers or materials. This paper proposes a hybrid simulation model that runs on data stored in a Big Data Warehouse, on statistical distributions, or on a combination of both. The results show that the former approach brings benefits to the simulations and is essential when setting up the model to run on statistical distributions. Furthermore, this paper also compares the two approaches, emphasizing the pros and cons of each as well as their differences in computational requirements, establishing a milestone for future research in this domain. This work has been supported by national funds through FCT (Fundação para a Ciência e Tecnologia) within the Project Scope UID/CEC/00319/2019 and by the doctoral scholarship PDE/BDE/114566/2016, funded by FCT, the Portuguese Ministry of Science, Technology and Higher Education, through national funds, and co-financed by the European Social Fund (ESF) through the Operational Programme for Human Capital (POCH).
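    The hybrid idea of combining warehouse data with fitted distributions can be sketched minimally. The suppliers, lead times, and distribution parameters below are assumptions, not the paper's model: a simulated parameter is drawn from recorded history when it exists in the warehouse, and from a fitted statistical distribution otherwise.

```python
# Minimal sketch of hybrid sampling (assumed data and parameters):
# replay Big Data history where available, fall back to a fitted
# distribution for elements without enough recorded data.

import random

random.seed(42)  # deterministic for the example

history = {"supplier_a": [4, 5, 5, 6, 4]}            # lead times (days) from the warehouse
fitted = {"supplier_b": lambda: random.gauss(7, 1)}  # fitted normal distribution

def sample_lead_time(supplier: str) -> float:
    """Hybrid sampling: empirical replay first, fitted distribution second."""
    if supplier in history:
        return random.choice(history[supplier])
    return fitted[supplier]()

a = sample_lead_time("supplier_a")  # drawn from recorded history
b = sample_lead_time("supplier_b")  # drawn from the fitted distribution
```

    This mirrors the trade-off the abstract compares: the empirical branch needs the warehouse but no fitting effort, while the distribution branch scales to elements whose history is too sparse or too large to fit exhaustively.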

    On the use of simulation as a Big Data semantic validator for supply chain management

    Simulation stands out as an appropriate method for the Supply Chain Management (SCM) field. Nevertheless, to produce accurate simulations of Supply Chains (SCs), several business processes must be considered. Thus, when using real data in these simulation models, Big Data concepts and technologies become necessary, as the involved data sources generate data at increasing volume, velocity, and variety, in what is known as a Big Data context. While developing such a solution, several data issues were found, with simulation proving more efficient than traditional data profiling techniques at identifying them. Thus, this paper proposes the use of simulation as a semantic validator of the data, puts forward a classification for such issues, and quantifies their impact on the volume of data used in the final solution. The paper concludes that, while SC simulations using Big Data concepts and technologies are within the grasp of organizations, their data models still require considerable improvement in order to produce faithful mimics of their SCs. In fact, it was also found that simulation can help identify and bypass some of these issues. This work has been supported by FCT (Fundação para a Ciência e Tecnologia) within the Project Scope UID/CEC/00319/2019 and by the doctoral scholarship PDE/BDE/114566/2016, funded by FCT, the Portuguese Ministry of Science, Technology and Higher Education, through national funds, and co-financed by the European Social Fund (ESF) through the Operational Programme for Human Capital (POCH).
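    The contrast with column-by-column profiling can be made concrete with a small sketch. The record layout and issue labels are assumptions, not the paper's classification: a record counts as valid only if the simulated process can actually execute it, which catches cross-table problems (such as a dangling supplier reference) that per-column profiling does not see.

```python
# Hedged sketch of simulation as a semantic validator (assumed schema):
# an order is flagged when the simulation step cannot process it.

suppliers = {"S1", "S2"}  # master data the simulation runs against

orders = [
    {"id": 1, "supplier": "S1", "qty": 10},
    {"id": 2, "supplier": "S9", "qty": 5},   # references a nonexistent supplier
    {"id": 3, "supplier": "S2", "qty": -3},  # quantity impossible to simulate
]

def simulate_order(order):
    """Return an issue label if the simulation cannot execute the order."""
    if order["supplier"] not in suppliers:
        return "dangling-reference"
    if order["qty"] <= 0:
        return "impossible-value"
    return None

# Collect only the orders the simulation rejects, with their issue labels.
issues = {o["id"]: r for o in orders if (r := simulate_order(o))}
```

    A per-column profiler would happily accept order 2, since "S9" is a well-formed supplier code in isolation; only attempting to run the process against the supplier master data exposes the semantic defect.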