
    Social media analytics: a survey of techniques, tools and platforms

    This paper is written for (social science) researchers seeking to analyze the wealth of social media data now available. It presents a comprehensive review of software tools for social networking media, wikis, Really Simple Syndication (RSS) feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity owing to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and news services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis, and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirements of an experimental computational environment for social media research and presents, as an illustration, the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics either in their research or business. The data retrieval techniques presented in this paper are valid at the time of writing (June 2014), but they are subject to change since social media data scraping APIs are evolving rapidly.
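    As a minimal sketch of the kind of code fragment this survey discusses, the Python snippet below scores sentiment on already-scraped tweets using a tiny word-list lexicon. The lexicon, the scoring rule, and the tweets.json input file (one JSON object per line with a "text" field) are illustrative assumptions, not the paper's own code; real studies would use an established sentiment resource or a trained classifier.

```python
import json

# Hypothetical word-list lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def score(text):
    """Return a crude sentiment score: (#positive words) - (#negative words)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# tweets.json is an assumed local dump of previously scraped tweets,
# avoiding any dependency on the (fast-changing) platform APIs.
with open("tweets.json") as f:
    for line in f:
        tweet = json.loads(line)
        print(score(tweet["text"]), tweet["text"][:60])
```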

    Profiling relational data: a survey

    Profiling data to determine metadata about a given dataset is an important and frequent activity for IT professionals and researchers, and is necessary for a variety of use cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.
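    As a hedged sketch of the simpler single-column statistics the survey classifies (null counts, distinct values, data types, frequent value patterns), the snippet below profiles a table with pandas. The CSV path and the digit/letter pattern abstraction are assumptions for illustration, not the survey's own tooling.

```python
import re
import pandas as pd

# Placeholder input path; any relational table exported to CSV works.
df = pd.read_csv("dataset.csv")

# Single-column statistics of the kind the survey classifies as simple:
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),       # inferred data type per column
    "nulls": df.isna().sum(),             # number of null values
    "distinct": df.nunique(dropna=True),  # number of distinct values
})
print(profile)

# Most frequent patterns of a column's values, approximated by mapping
# digits to 'd' and letters to 'a' before counting occurrences.
patterns = df.iloc[:, 0].astype(str).map(
    lambda s: re.sub(r"[A-Za-z]", "a", re.sub(r"\d", "d", s)))
print(patterns.value_counts().head())
```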

    A Visual Approach to Construction Cost Estimating

    Construction cost estimating is considered one of the most important and critical phases of a construction project. Preparing reliable and accurate estimates to help decision makers is the most challenging task that estimators face. An estimate is necessary not only for proposal preparation but also for several project management functions. Despite its importance, estimating remains a very time-consuming process. The most inefficient part of construction cost estimating is determining the amount of resources needed for the construction of a project, also known as quantity takeoff. Quantity takeoff is a long and error-prone process performed manually by estimators; missing or duplicated work items are among the errors that can occur during it. Parametric CAD software has recently attracted widespread attention in the Architectural, Engineering, and Construction (AEC) industry. It supports the development and use of computer-generated models to simulate the planning, design, construction and operation of a facility, helping architects, engineers, and contractors visualize what is to be built in a simulated environment and identify potential design, construction or operational problems. A model created with parametric CAD software will significantly increase construction cost estimator productivity by substantially reducing the manual work necessary for performing quantity takeoffs. This study presents a methodology that uses parametric CAD software and visualization technologies to streamline the estimating process. Although this methodology won't fully automate estimating, it helps in the following areas: (1) providing a navigable 3D model of the project, (2) simplifying the quantity takeoff process, and (3) eliminating manual calculations and manual searches for data. The study uses visualization technologies to navigate a 3D CAD model, giving the estimator a tool to better understand the location of, and relationships between, elements in a model. The quantity takeoff process is simplified by using property and geometry information extracted from the 3D CAD model. The study also uses database technology to store labor, equipment, and material cost data, which eliminates manual calculations and enables an estimator to search for stored data. A case study is presented to illustrate the process and capabilities of the developed system.
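    A minimal sketch of the database-backed costing step described above: quantities extracted from a 3D model's element properties are matched against a stored unit-cost table. The schema, table name, item names, and quantities are illustrative assumptions, not the system the study actually built.

```python
import sqlite3

# Hypothetical unit-cost table for labor/equipment/material rates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit_cost (item TEXT PRIMARY KEY, unit TEXT, cost REAL)")
conn.executemany("INSERT INTO unit_cost VALUES (?, ?, ?)", [
    ("concrete_wall", "m3", 310.0),
    ("steel_beam", "m", 95.5),
])

# Quantities as they might be extracted from the parametric CAD model
# during quantity takeoff (assumed values).
takeoff = {"concrete_wall": 42.0, "steel_beam": 120.0}

total = 0.0
for item, qty in takeoff.items():
    (rate,) = conn.execute(
        "SELECT cost FROM unit_cost WHERE item = ?", (item,)).fetchone()
    line = qty * rate
    total += line
    print(f"{item}: {qty} x {rate} = {line:.2f}")
print(f"Estimated total: {total:.2f}")
```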

    ETL and analysis of IoT data using OpenTSDB, Kafka, and Spark

    Master's thesis in Computer Science

    The Internet of Things (IoT) is becoming increasingly prevalent in today's society. Innovations in storage and processing methodologies enable large amounts of data to be processed in a scalable manner and insights to be generated in near real time. Data from the IoT are typically time-series data, but they may also have a strong spatial correlation. In addition, much time-series data is produced in industries that still place it in relational databases poorly suited to the task. Many open-source time-series databases exist today with compelling features for storage, analytic representation, and visualization. Finding an efficient method to migrate data into a time-series database is the first objective of the thesis. In recent decades, machine learning has become one of the backbones of data innovation. With the constantly expanding amounts of information available, there is good reason to expect that smart data analysis will become more pervasive as an essential element of innovative progress. Methods for modeling time-series data in machine learning, and for migrating time-series data from a database to a big-data machine learning framework such as Apache Spark, are also explored in this thesis.
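    As a hedged sketch of the migration step this abstract describes, the snippet below reads rows from a relational table and posts them in a batch to OpenTSDB's HTTP /api/put endpoint. The source database, table schema, metric name, and OpenTSDB host are assumptions for illustration; the thesis's actual pipeline, routed through Kafka and analyzed with Spark, is more elaborate.

```python
import sqlite3
import requests

# Assumed source: a relational table readings(ts, sensor, value)
# holding time-series data that belongs in a time-series store.
src = sqlite3.connect("sensors.db")

batch = [
    {
        "metric": "iot.sensor.value",  # assumed metric name
        "timestamp": int(ts),          # Unix epoch seconds
        "value": value,
        "tags": {"sensor": sensor},    # tags distinguish individual series
    }
    for ts, sensor, value in src.execute("SELECT ts, sensor, value FROM readings")
]

# OpenTSDB accepts batched JSON datapoints on its /api/put endpoint
# (port 4242 is the default; host is an assumption).
resp = requests.post("http://localhost:4242/api/put", json=batch)
resp.raise_for_status()
```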