28,096 research outputs found

    Teaching Data Science

    Get PDF
    We describe an introductory data science course, entitled Introduction to Data Science, offered at the University of Illinois at Urbana-Champaign. The course introduced general programming concepts by using the Python programming language with an emphasis on data preparation, processing, and presentation. The course had no prerequisites, and students were not expected to have any programming experience. This introductory course was designed to cover a wide range of topics, from the nature of data, to storage, to visualization, to probability and statistical analysis, to cloud and high performance computing, without becoming overly focused on any one subject. We conclude this article with a discussion of lessons learned and our plans to develop new data science courses.Comment: 10 pages, 4 figures, International Conference on Computational Science (ICCS 2016

    Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution

    Full text link
    Cloud-based data analysis is nowadays common practice because of the lower system management overhead as well as the pay-as-you-go pricing model. The pricing model, however, is not always suitable for query processing as heavy use results in high costs. For example, in query-as-a-service systems, where users are charged per processed byte, collections of queries accessing the same data frequently can become expensive. The problem is compounded by the limited options for the user to optimize query execution when using declarative interfaces such as SQL. In this paper, we show how, without modifying existing systems and without the involvement of the cloud provider, it is possible to significantly reduce the overhead, and hence the cost, of query-as-a-service systems. Our approach is based on query rewriting so that multiple concurrent queries are combined into a single query. Our experiments show the aggregated amount of work done by the shared execution is smaller than in a query-at-a-time approach. Since queries are charged per byte processed, the cost of executing a group of queries is often the same as executing a single one of them. As an example, we demonstrate how the shared execution of the TPC-H benchmark is up to 100x and 16x cheaper in Amazon Athena and Google BigQuery than using a query-at-a-time approach while achieving a higher throughput

    Big Data Privacy Context: Literature Effects On Secure Informational Assets

    Get PDF
    This article's objective is the identification of research opportunities in the current big data privacy domain, evaluating literature effects on secure informational assets. Until now, no study has analyzed such relation. Its results can foster science, technologies and businesses. To achieve these objectives, a big data privacy Systematic Literature Review (SLR) is performed on the main scientific peer reviewed journals in Scopus database. Bibliometrics and text mining analysis complement the SLR. This study provides support to big data privacy researchers on: most and least researched themes, research novelty, most cited works and authors, themes evolution through time and many others. In addition, TOPSIS and VIKOR ranks were developed to evaluate literature effects versus informational assets indicators. Secure Internet Servers (SIS) was chosen as decision criteria. Results show that big data privacy literature is strongly focused on computational aspects. However, individuals, societies, organizations and governments face a technological change that has just started to be investigated, with growing concerns on law and regulation aspects. TOPSIS and VIKOR Ranks differed in several positions and the only consistent country between literature and SIS adoption is the United States. Countries in the lowest ranking positions represent future research opportunities.Comment: 21 pages, 9 figure

    Careering through the Web: the potential of Web 2.0 and 3.0 technologies for career development and career support services

    Get PDF
    This paper examines the environment that the web provides for career exploration. Career practitioners have long seen value in engaging in technology and the opportunities offered by the internet, and this interest continues. However, this paper suggests that the online environment for career exploration is far broader than that provided by public-sector careers services. In addition to these services, there is a wide range of other players including private-sector career consultants, employers, recruitment companies and learning providers who are all contributing to a potentially rich career exploration environment.UKCE

    The future of Earth observation in hydrology

    Get PDF
    In just the past 5 years, the field of Earth observation has progressed beyond the offerings of conventional space-agency-based platforms to include a plethora of sensing opportunities afforded by CubeSats, unmanned aerial vehicles (UAVs), and smartphone technologies that are being embraced by both for-profit companies and individual researchers. Over the previous decades, space agency efforts have brought forth well-known and immensely useful satellites such as the Landsat series and the Gravity Research and Climate Experiment (GRACE) system, with costs typically of the order of 1 billion dollars per satellite and with concept-to-launch timelines of the order of 2 decades (for new missions). More recently, the proliferation of smart-phones has helped to miniaturize sensors and energy requirements, facilitating advances in the use of CubeSats that can be launched by the dozens, while providing ultra-high (3-5 m) resolution sensing of the Earth on a daily basis. Start-up companies that did not exist a decade ago now operate more satellites in orbit than any space agency, and at costs that are a mere fraction of traditional satellite missions. With these advances come new space-borne measurements, such as real-time high-definition video for tracking air pollution, storm-cell development, flood propagation, precipitation monitoring, or even for constructing digital surfaces using structure-from-motion techniques. Closer to the surface, measurements from small unmanned drones and tethered balloons have mapped snow depths, floods, and estimated evaporation at sub-metre resolutions, pushing back on spatio-temporal constraints and delivering new process insights. At ground level, precipitation has been measured using signal attenuation between antennae mounted on cell phone towers, while the proliferation of mobile devices has enabled citizen scientists to catalogue photos of environmental conditions, estimate daily average temperatures from battery state, and sense other hydrologically important variables such as channel depths using commercially available wireless devices. Global internet access is being pursued via high-altitude balloons, solar planes, and hundreds of planned satellite launches, providing a means to exploit the "internet of things" as an entirely new measurement domain. Such global access will enable real-time collection of data from billions of smartphones or from remote research platforms. This future will produce petabytes of data that can only be accessed via cloud storage and will require new analytical approaches to interpret. The extent to which today's hydrologic models can usefully ingest such massive data volumes is unclear. Nor is it clear whether this deluge of data will be usefully exploited, either because the measurements are superfluous, inconsistent, not accurate enough, or simply because we lack the capacity to process and analyse them. What is apparent is that the tools and techniques afforded by this array of novel and game-changing sensing platforms present our community with a unique opportunity to develop new insights that advance fundamental aspects of the hydrological sciences. To accomplish this will require more than just an application of the technology: in some cases, it will demand a radical rethink on how we utilize and exploit these new observing systems
    corecore