1,628 research outputs found

    Streaming Infrastructure and Natural Language Modeling with Application to Streaming Big Data

    Get PDF
    Streaming data are produced in great velocity and diverse variety. The vision of this research is to build an end-to-end system that handles the collection, curation and analysis of streaming data. The streaming data used in this thesis contain both numeric type data and text type data. First, in the field of data collection, we design and evaluate a data delivery framework that handles the real-time nature of streaming data. In this component, we use streaming data in automotive domain since it is suitable for testing and evaluating our data delivery system. Secondly, in the field of data curation, we use a language model to analyze two online automotive forums as an example for streaming text data curation. Last but not least, we present our approach for automated query expansion on Twitter data as an example of streaming social media data analysis. This thesis provides a holistic view of the end-to-end system we have designed, built and analyzed. To study the streaming data in automotive domain, a complex and massive amount of data is being collected from on-board sensors of operational connected vehicles (CVs), infrastructure data sources such as roadway sensors and traffic signals, mobile data sources such as cell phones, social media sources such as Twitter, and news and weather data services. Unfortunately, these data create a bottleneck at data centers for processing and retrievals of collected data, and require the deployment of additional message transfer infrastructure between data producers and consumers to support diverse CV applications. The first part of this dissertation, we present a strategy for creating an efficient and low-latency distributed message delivery system for CV systems using a distributed message delivery platform. This strategy enables large-scale ingestion, curation, and transformation of unstructured data (roadway traffic-related and roadway non-traffic-related data) into labeled and customized topics for a large number of subscribers or consumers, such as CVs, mobile devices, and data centers. We evaluate the performance of this strategy by developing a prototype infrastructure using Apache Kafka, an open source message delivery system, and compared its performance with the latency requirements of CV applications. We present experimental results of the message delivery infrastructure on two different distributed computing testbeds at Clemson University. Experiments were performed to measure the latency of the message delivery system for a variety of testing scenarios. These experiments reveal that measured latencies are less than the U.S. Department of Transportation recommended latency requirements for CV applications, which provides evidence that the system is capable for managing CV related data distribution tasks. Human-generated streaming data are large in volume and noisy in content. Direct acquisition of the full scope of human-generated data is often ineffective. In our research, we try to find an alternative resource to study such data. Common Crawl is a massive multi-petabyte dataset hosted by Amazon. It contains archived HTML web page data from 2008 to date. Common Crawl has been widely used for text mining purposes. Using data extracted from Common Crawl has several advantages over a direct crawl of web data, among which is removing the likelihood of a user\u27s home IP address becoming blacklisted for accessing a given web site too frequently. However, Common Crawl is a data sample, and so questions arise about the quality of Common Crawl as a representative sample of the original data. We perform systematic tests on the similarity of topics estimated from Common Crawl compared to topics estimated from the full data of online forums. Our target is online discussions from a user forum for car enthusiasts, but our research strategy can be applied to other domains and samples to evaluate the representativeness of topic models. We show that topic proportions estimated from Common Crawl are not significantly different than those estimated on the full data. We also show that topics are similar in terms of their word compositions, and not worse than topic similarity estimated under true random sampling, which we simulate through a series of experiments. Our research will be of interest to analysts who wish to use Common Crawl to study topics of interest in user forum data, and analysts applying topic models to other data samples. Twitter data is another example of high-velocity streaming data. We use it as an example to study the query expansion application in streaming social media data analysis. Query expansion is a problem concerned with gathering more relevant documents from a given set that cover a certain topic. Here in this thesis we outline a number of tools for a query expansion system that will allow its user to gather more relevant documents (in this case, tweets from the Twitter social media system), while discriminating from irrelevant documents. These tools include a method for triggering a given query expansion using a Jaccard similarity threshold between keywords, and a query expansion method using archived news reports to create a vector space of novel keywords. As the nature of streaming data, Twitter stream contains emerging events that are constantly changing and therefore not predictable using static queries. Since keywords used in static query method often mismatch the words used in topics around emerging events. To solve this problem, our proposed approach of automated query expansion detects the emerging events in the first place. Then we combine both local analysis and global analysis methods to generate queries for capturing the emerging topics. Experiment results show that by combining the global analysis and local analysis method, our approach can capture the semantic information in the emerging events with high efficiency

    Wireless communication, identification and sensing technologies enabling integrated logistics: a study in the harbor environment

    Get PDF
    In the last decade, integrated logistics has become an important challenge in the development of wireless communication, identification and sensing technology, due to the growing complexity of logistics processes and the increasing demand for adapting systems to new requirements. The advancement of wireless technology provides a wide range of options for the maritime container terminals. Electronic devices employed in container terminals reduce the manual effort, facilitating timely information flow and enhancing control and quality of service and decision made. In this paper, we examine the technology that can be used to support integration in harbor's logistics. In the literature, most systems have been developed to address specific needs of particular harbors, but a systematic study is missing. The purpose is to provide an overview to the reader about which technology of integrated logistics can be implemented and what remains to be addressed in the future

    Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges

    Full text link
    [EN] If last decade viewed computational services as a utility then surely this decade has transformed computation into a commodity. Computation is now progressively integrated into the physical networks in a seamless way that enables cyber-physical systems (CPS) and the Internet of Things (IoT) meet their latency requirements. Similar to the concept of ¿platform as a service¿ or ¿software as a service¿, both cloudlets and fog computing have found their own use cases. Edge devices (that we call end or user devices for disambiguation) play the role of personal computers, dedicated to a user and to a set of correlated applications. In this new scenario, the boundaries between the network node, the sensor, and the actuator are blurring, driven primarily by the computation power of IoT nodes like single board computers and the smartphones. The bigger data generated in this type of networks needs clever, scalable, and possibly decentralized computing solutions that can scale independently as required. Any node can be seen as part of a graph, with the capacity to serve as a computing or network router node, or both. Complex applications can possibly be distributed over this graph or network of nodes to improve the overall performance like the amount of data processed over time. In this paper, we identify this new computing paradigm that we call Social Dispersed Computing, analyzing key themes in it that includes a new outlook on its relation to agent based applications. We architect this new paradigm by providing supportive application examples that include next generation electrical energy distribution networks, next generation mobility services for transportation, and applications for distributed analysis and identification of non-recurring traffic congestion in cities. The paper analyzes the existing computing paradigms (e.g., cloud, fog, edge, mobile edge, social, etc.), solving the ambiguity of their definitions; and analyzes and discusses the relevant foundational software technologies, the remaining challenges, and research opportunities.Garcia Valls, MS.; Dubey, A.; Botti, V. (2018). Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges. Journal of Systems Architecture. 91:83-102. https://doi.org/10.1016/j.sysarc.2018.05.007S831029

    Situation inference and context recognition for intelligent mobile sensing applications

    Get PDF
    The usage of smart devices is an integral element in our daily life. With the richness of data streaming from sensors embedded in these smart devices, the applications of ubiquitous computing are limitless for future intelligent systems. Situation inference is a non-trivial issue in the domain of ubiquitous computing research due to the challenges of mobile sensing in unrestricted environments. There are various advantages to having robust and intelligent situation inference from data streamed by mobile sensors. For instance, we would be able to gain a deeper understanding of human behaviours in certain situations via a mobile sensing paradigm. It can then be used to recommend resources or actions for enhanced cognitive augmentation, such as improved productivity and better human decision making. Sensor data can be streamed continuously from heterogeneous sources with different frequencies in a pervasive sensing environment (e.g., smart home). It is difficult and time-consuming to build a model that is capable of recognising multiple activities. These activities can be performed simultaneously with different granularities. We investigate the separability aspect of multiple activities in time-series data and develop OPTWIN as a technique to determine the optimal time window size to be used in a segmentation process. As a result, this novel technique reduces need for sensitivity analysis, which is an inherently time consuming task. To achieve an effective outcome, OPTWIN leverages multi-objective optimisation by minimising the impurity (the number of overlapped windows of human activity labels on one label space over time series data) while maximising class separability. The next issue is to effectively model and recognise multiple activities based on the user's contexts. Hence, an intelligent system should address the problem of multi-activity and context recognition prior to the situation inference process in mobile sensing applications. The performance of simultaneous recognition of human activities and contexts can be easily affected by the choices of modelling approaches to build an intelligent model. We investigate the associations of these activities and contexts at multiple levels of mobile sensing perspectives to reveal the dependency property in multi-context recognition problem. We design a Mobile Context Recognition System, which incorporates a Context-based Activity Recognition (CBAR) modelling approach to produce effective outcome from both multi-stage and multi-target inference processes to recognise human activities and their contexts simultaneously. Upon our empirical evaluation on real-world datasets, the CBAR modelling approach has significantly improved the overall accuracy of simultaneous inference on transportation mode and human activity of mobile users. The accuracy of activity and context recognition can also be influenced progressively by how reliable user annotations are. Essentially, reliable user annotation is required for activity and context recognition. These annotations are usually acquired during data capture in the world. We research the needs of reducing user burden effectively during mobile sensor data collection, through experience sampling of these annotations in-the-wild. To this end, we design CoAct-nnotate --- a technique that aims to improve the sampling of human activities and contexts by providing accurate annotation prediction and facilitates interactive user feedback acquisition for ubiquitous sensing. CoAct-nnotate incorporates a novel multi-view multi-instance learning mechanism to perform more accurate annotation prediction. It also includes a progressive learning process (i.e., model retraining based on co-training and active learning) to improve its predictive performance over time. Moving beyond context recognition of mobile users, human activities can be related to essential tasks that the users perform in daily life. Conversely, the boundaries between the types of tasks are inherently difficult to establish, as they can be defined differently from the individuals' perspectives. Consequently, we investigate the implication of contextual signals for user tasks in mobile sensing applications. To define the boundary of tasks and hence recognise them, we incorporate such situation inference process (i.e., task recognition) into the proposed Intelligent Task Recognition (ITR) framework to learn users' Cyber-Physical-Social activities from their mobile sensing data. By recognising the engaged tasks accurately at a given time via mobile sensing, an intelligent system can then offer proactive supports to its user to progress and complete their tasks. Finally, for robust and effective learning of mobile sensing data from heterogeneous sources (e.g., Internet-of-Things in a mobile crowdsensing scenario), we investigate the utility of sensor data in provisioning their storage and design QDaS --- an application agnostic framework for quality-driven data summarisation. This allows an effective data summarisation by performing density-based clustering on multivariate time series data from a selected source (i.e., data provider). Thus, the source selection process is determined by the measure of data quality. Nevertheless, this framework allows intelligent systems to retain comparable predictive results by its effective learning on the compact representations of mobile sensing data, while having a higher space saving ratio. This thesis contains novel contributions in terms of the techniques that can be employed for mobile situation inference and context recognition, especially in the domain of ubiquitous computing and intelligent assistive technologies. This research implements and extends the capabilities of machine learning techniques to solve real-world problems on multi-context recognition, mobile data summarisation and situation inference from mobile sensing. We firmly believe that the contributions in this research will help the future study to move forward in building more intelligent systems and applications

    IoT-Enabled Smart Cities: A Review of Concepts, Frameworks and Key Technologies

    Get PDF
    In recent years, smart cities have been significantly developed and have greatly expanded their potential. In fact, novel advancements to the Internet of things (IoT) have paved the way for new possibilities, representing a set of key enabling technologies for smart cities and allowing the production and automation of innovative services and advanced applications for the different city stakeholders. This paper presents a review of the research literature on IoT-enabled smart cities, with the aim of highlighting the main trends and open challenges of adopting IoT technologies for the development of sustainable and efficient smart cities. This work first provides a survey on the key technologies proposed in the literature for the implementation of IoT frameworks, and then a review of the main smart city approaches and frameworks, based on classification into eight domains, which extends the traditional six domain classification that is typically adopted in most of the related works

    A survey on smart grid communication infrastructures: Motivations, requirements and challenges

    Get PDF
    A communication infrastructure is an essential part to the success of the emerging smart grid. A scalable and pervasive communication infrastructure is crucial in both construction and operation of a smart grid. In this paper, we present the background and motivation of communication infrastructures in smart grid systems. We also summarize major requirements that smart grid communications must meet. From the experience of several industrial trials on smart grid with communication infrastructures, we expect that the traditional carbon fuel based power plants can cooperate with emerging distributed renewable energy such as wind, solar, etc, to reduce the carbon fuel consumption and consequent green house gas such as carbon dioxide emission. The consumers can minimize their expense on energy by adjusting their intelligent home appliance operations to avoid the peak hours and utilize the renewable energy instead. We further explore the challenges for a communication infrastructure as the part of a complex smart grid system. Since a smart grid system might have over millions of consumers and devices, the demand of its reliability and security is extremely critical. Through a communication infrastructure, a smart grid can improve power reliability and quality to eliminate electricity blackout. Security is a challenging issue since the on-going smart grid systems facing increasing vulnerabilities as more and more automation, remote monitoring/controlling and supervision entities are interconnected. © 1998-2012 IEEE

    SNAP : A Software-Defined & Named-Data Oriented Publish-Subscribe Framework for Emerging Wireless Application Systems

    Get PDF
    The evolution of Cyber-Physical Systems (CPSs) has given rise to an emergent class of CPSs defined by ad-hoc wireless connectivity, mobility, and resource constraints in computation, memory, communications, and battery power. These systems are expected to fulfill essential roles in critical infrastructure sectors. Vehicular Ad-Hoc Network (VANET) and a swarm of Unmanned Aerial Vehicles (UAV swarm) are examples of such systems. The significant utility of these systems, coupled with their economic viability, is a crucial indicator of their anticipated growth in the future. Typically, the tasks assigned to these systems have strict Quality-of-Service (QoS) requirements and require sensing, perception, and analysis of a substantial amount of data. To fulfill these QoS requirements, the system requires network connectivity, data dissemination, and data analysis methods that can operate well within a system\u27s limitations. Traditional Internet protocols and methods for network connectivity and data dissemination are typically designed for well-engineering cyber systems and do not comprehensively support this new breed of emerging systems. The imminent growth of these CPSs presents an opportunity to develop broadly applicable methods that can meet the stated system requirements for a diverse range of systems and integrate these systems with the Internet. These methods could potentially be standardized to achieve interoperability among various systems of the future. This work presents a solution that can fulfill the communication and data dissemination requirements of a broad class of emergent CPSs. The two main contributions of this work are the Application System (APPSYS) system abstraction, and a complementary communications framework called the Software-Defined NAmed-data enabled Publish-Subscribe (SNAP) communication framework. An APPSYS is a new breed of Internet application representing the mobile and resource-constrained CPSs supporting data-intensive and QoS-sensitive safety-critical tasks, referred to as the APPSYS\u27s mission. The functioning of the APPSYS is closely aligned with the needs of the mission. The standard APPSYS architecture is distributed and partitions the system into multiple clusters where each cluster is a hierarchical sub-network. The SNAP communication framework within the APPSYS utilized principles of Information-Centric Networking (ICN) through the publish-subscribe communication paradigm. It further extends the role of brokers within the publish-subscribe paradigm to create a distributed software-defined control plane. The SNAP framework leverages the APPSYS design characteristics to provide flexible and robust communication and dynamic and distributed control-plane decision-making that successfully allows the APPSYS to meet the communication requirements of data-oriented and QoS-sensitive missions. In this work, we present the design, implementation, and performance evaluation of an APPSYS through an exemplar UAV swarm APPSYS. We evaluate the benefits offered by the APPSYS design and the SNAP communication framework in meeting the dynamically changed requirements of a data-intensive and QoS-sensitive Coordinated Search and Tracking (CSAT) mission operating in a UAV swarm APPSYS on the battlefield. Results from the performance evaluation demonstrate that the UAV swarm APPSYS successfully monitors and mitigates network impairment impacting a mission\u27s QoS to support the mission\u27s QoS requirements

    Proceedings, MSVSCC 2012

    Get PDF
    Proceedings of the 6th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 19, 2012 at VMASC in Suffolk, Virginia
    corecore