    Data Management for Data Science - Towards Embedded Analytics

    The rise of Data Science has caused an influx of new users in need of data management solutions. However, instead of utilizing existing RDBMS solutions they are opting to use a stack of independent solutions for data storage and processing glued together by scripting languages. This is not because they do not need the functionality that an integrated RDBMS provides, but rather because existing RDBMS implementations do not cater to their use case. To solve these issues, we propose a new class of data management systems: embedded analytical systems. These systems are tightly integrated with analytical tools, and provide fast and efficient access to the data stored within them. In this work, we describe the unique challenges and opportunities w.r.t. workloads, resilience and cooperation that are faced by this new class of systems and the steps we have taken towards addressing them in the DuckDB system.

    Inside a Data Science Team: Data Crafting in Generating Strategic Value from Analytics

    Current research agrees that the value of data lies in analytics that generate valuable insights for strategic purposes. However, little is known about how these insights are derived by data scientists. This research reports on the work of an embedded data science team at an organization striving to use people analytics to improve its strategic human resource management. We find that to create strategically valuable analytics, data scientists engage in data crafting, an approach to data science work that relies on broadcasting the potential value of data science towards the organization, cultivating a shared vision of value within the team, and creating value-adding data products with organizational customers. To do so, the team requires appropriate positioning and autonomy within the organization. Our findings have implications on understanding the role of data science teams and organizational data with respect to strategy, and practical insights for realizing strategic value from analytics

    Database integrated analytics using R : initial experiences with SQL-Server + R

    © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Most data scientists nowadays use functional or semi-functional languages like SQL, Scala or R to treat data obtained directly from databases. Such a process requires fetching data, processing it, then storing it again, and tends to be done outside the DB, in often complex data-flows. Recently, database service providers have decided to integrate “R-as-a-Service” in their DB solutions. The analytics engine is called directly from the SQL query tree, and results are returned as part of the same query. Here we show a first taste of such technology by testing the portability of our ALOJA-ML analytics framework, coded in R, to Microsoft SQL-Server 2016, one of the SQL+R solutions released recently. In this work we discuss some data-flow schemes for porting a local DB + analytics engine architecture towards Big Data, focusing especially on the new DB Integrated Analytics approach, and commenting on the first experiences in usability and performance obtained from such new services and capabilities. Peer Reviewed. Postprint (author's final draft)

    Big data analytics: Computational intelligence techniques and application areas

    Big Data has significant impact in developing functional smart cities and supporting modern societies. In this paper, we investigate the importance of Big Data in modern life and economy, and discuss challenges arising from Big Data utilization. Different computational intelligence techniques have been considered as tools for Big Data analytics. We also explore the powerful combination of Big Data and Computational Intelligence (CI) and identify a number of areas, where novel applications in real world smart city problems can be developed by utilizing these powerful tools and techniques. We present a case study for intelligent transportation in the context of a smart city, and a novel data modelling methodology based on a biologically inspired universal generative modelling approach called Hierarchical Spatial-Temporal State Machine (HSTSM). We further discuss various implications of policy, protection, valuation and commercialization related to Big Data, its applications and deployment

    Big Data and the Internet of Things

    Advances in sensing and computing capabilities are making it possible to embed increasing computing power in small devices. This has enabled the sensing devices not just to passively capture data at very high resolution but also to take sophisticated actions in response. Combined with advances in communication, this is resulting in an ecosystem of highly interconnected devices referred to as the Internet of Things - IoT. In conjunction, the advances in machine learning have allowed building models on these ever-increasing amounts of data. Consequently, devices all the way from heavy assets such as aircraft engines to wearables such as health monitors can all now not only generate massive amounts of data but can draw back on aggregate analytics to "improve" their performance over time. Big data analytics has been identified as a key enabler for the IoT. In this chapter, we discuss various avenues of the IoT where big data analytics either is already making a significant impact or is on the cusp of doing so. We also discuss social implications and areas of concern. Comment: 33 pages. draft of upcoming book chapter in Japkowicz and Stefanowski (eds.) Big Data Analysis: New algorithms for a new society, Springer Series on Studies in Big Data, to appea

    BigExcel: A Web-Based Framework for Exploring Big Data in Social Sciences

    This paper argues that there are three fundamental challenges that need to be overcome in order to foster the adoption of big data technologies in non-computer science related disciplines: addressing issues of accessibility of such technologies for non-computer scientists, supporting the ad hoc exploration of large data sets with minimal effort and the availability of lightweight web-based frameworks for quick and easy analytics. In this paper, we address the above three challenges through the development of 'BigExcel', a three tier web-based framework for exploring big data to facilitate the management of user interactions with large data sets, the construction of queries to explore the data set and the management of the infrastructure. The feasibility of BigExcel is demonstrated through two Yahoo Sandbox datasets. The first dataset is the Yahoo Buzz Score data set we use for quantitatively predicting trending technologies and the second is the Yahoo n-gram corpus we use for qualitatively inferring the coverage of important events. A demonstration of the BigExcel framework and source code is available at http://bigdata.cs.st-andrews.ac.uk/projects/bigexcel-exploring-big-data-for-social-sciences/. Comment: 8 page

    Views from the coalface: chemo-sensors, sensor networks and the semantic sensor web

    Currently millions of sensors are being deployed in sensor networks across the world. These networks generate vast quantities of heterogeneous data across various levels of spatial and temporal granularity. Sensors range from single-point in situ sensors to remote satellite sensors which can cover the globe. The semantic sensor web in principle should allow for the unification of the web with the real world. In this position paper, we discuss the major challenges to this unification from the perspective of sensor developers (especially chemo-sensors) and of integrating sensor data in real-world deployments. These challenges include: (1) identifying the quality of the data; (2) heterogeneity of data sources and data transport methods; (3) integrating data streams from different sources and modalities (esp. contextual information), and (4) pushing intelligence to the sensor level.

    Mapping domain characteristics influencing Analytics initiatives: The example of Supply Chain Analytics

    Purpose: Analytics research is increasingly divided by the domains Analytics is applied to. Literature offers little understanding of whether aspects such as success factors, barriers and management of Analytics must be investigated domain-specifically, while the execution of Analytics initiatives is similar across domains and similar issues occur. This article investigates characteristics of the execution of Analytics initiatives that are distinct in domains and can guide future research collaboration and focus. The research was conducted on the example of Logistics and Supply Chain Management and the respective domain-specific Analytics subfield of Supply Chain Analytics. The field of Logistics and Supply Chain Management has been recognized as an early adopter of Analytics but has retracted to a midfield position when compared with other domains. Design/methodology/approach: This research uses Grounded Theory based on 12 semi-structured interviews, creating a map of domain characteristics based on the paradigm scheme of Strauss and Corbin. Findings: A total of 34 characteristics of Analytics initiatives that distinguish domains in the execution of initiatives were identified, which are mapped and explained. As a blueprint for further research, the domain-specifics of Logistics and Supply Chain Management are presented and discussed. Originality/value: The results of this research stimulate cross-domain research on Analytics issues and prompt research on the identified characteristics with broader understanding of the impact on Analytics initiatives. They also describe the status quo of Analytics. Further, results help managers control the environment of initiatives and design more successful initiatives. DFG, 414044773, Open Access Publizieren 2019 - 2020 / Technische Universität Berli