When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume but also
noisy and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed.
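One of the surveyed techniques, data stream processing over noisy, continuous IoT readings, can be illustrated with a minimal sliding-window aggregate. This is an invented sketch for illustration (the class name and the plausible-range bounds are assumptions, not from the survey):

```python
from collections import deque

class SlidingWindowAverage:
    """Moving average over the last `size` sensor readings, discarding
    readings outside a plausible range as a simple noise filter."""

    def __init__(self, size, lo=-40.0, hi=85.0):
        self.size = size
        self.lo, self.hi = lo, hi
        self.window = deque()
        self.total = 0.0

    def push(self, reading):
        if not (self.lo <= reading <= self.hi):
            return self.average()          # drop out-of-range (noisy) reading
        self.window.append(reading)
        self.total += reading
        if len(self.window) > self.size:   # evict oldest reading
            self.total -= self.window.popleft()
        return self.average()

    def average(self):
        return self.total / len(self.window) if self.window else None

stream = [21.5, 22.0, 999.0, 22.5, 23.0]   # 999.0 simulates a sensor glitch
win = SlidingWindowAverage(size=3)
for r in stream:
    avg = win.push(r)
```

The running total avoids rescanning the window on every reading, which is the point of stream processing: one pass, bounded memory.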
Carbon Capture; Transport and Storage in Europe: A Problematic Energy Bridge to Nowhere?
This paper is a follow-up to the SECURE project, financed by the European Commission to study “Security of Energy Considering its Uncertainties, Risks and Economic Implications”. It addresses the perspectives of, and the obstacles to, a roll-out of carbon capture, transport, and storage (CCTS), as stipulated in some of the scenarios. Our main hypothesis is that, given the substantial technical and institutional uncertainties, the lack of a clear political commitment, and the available alternatives of low-carbon technologies, CCTS is unlikely to play an important role in the future energy mix; it is even less likely to be an “energy bridge” into a low-carbon energy future.
Scaling up Labeling, Mining, and Inferencing on Event Extraction
Numerous important events happen every day and are reported by different media sources with varying narrative styles, across different knowledge domains and languages. Detecting the real-world events reported in online articles and posts is one of the main tasks in event extraction. Other tasks include identifying event triggers and trigger types, identifying event arguments and argument types, clustering and tracking similar events across different texts, event prediction, and event evolution. As one of the most important research themes in natural language processing and understanding, event extraction has wide applications in diverse domains and has been intensively researched for decades. This work targets scaling up the end-to-end event extraction task in three ways. First, we scale up the event labeling process to different languages and domains. We designed and implemented four approaches to accurately and efficiently produce multilingual labels for events. Using these approaches, we completed Arabic actor and verb dictionaries with coverage equivalent to English in less than two years of work, compared to the two decades required for English dictionary development. Second, we scale up event extraction by using document topic information in a topic-aware deep learning framework. We propose a domain-aware event extraction method that uses topic-name embeddings to enrich the sentences' contextual representations, together with a multi-task setup of event extraction and topic classification. With the topic-aware model we developed, we improved F1 by 1.8% on all event types and by 13.34% on few-shot event types. Third, we scale up event extraction by designing containerized and efficient pipelines that researchers can comfortably adopt. The pipeline has a container-based architecture that adapts to the available systems and load to process text.
With the Kalman-filter-based batch size optimization, we achieved a 20.33% improvement in processing time compared to a static batch size. Using the pipeline we developed, we were able to publish the largest machine-coded political event dataset, covering 1979 to 2016 (2 TB, 300 million documents).
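The Kalman-filter batch sizing mentioned above can be sketched as a one-dimensional filter that tracks per-document processing time and picks the next batch size to hit a latency target. This is an illustrative reconstruction, not the authors' formulation: the class name, the latency target, and the noise parameters `q` and `r` are all assumptions:

```python
class KalmanBatchSizer:
    """1-D Kalman filter tracking per-document processing time; the next
    batch size is chosen so a batch finishes within `target_seconds`."""

    def __init__(self, target_seconds=1.0, q=1e-4, r=1e-2):
        self.target = target_seconds
        self.x = 0.01           # initial estimate: 10 ms per document
        self.p = 1.0            # estimate variance
        self.q, self.r = q, r   # process / measurement noise variances

    def update(self, batch_size, batch_seconds):
        z = batch_seconds / batch_size     # observed time per document
        self.p += self.q                   # predict step: variance grows
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct toward the observation
        self.p *= (1 - k)
        return max(1, int(self.target / self.x))   # next batch size

sizer = KalmanBatchSizer(target_seconds=2.0)
size = 100
for seconds in [1.2, 1.1, 1.15]:           # simulated batch timings
    size = sizer.update(size, seconds)
```

Because the filter smooths noisy timings instead of reacting to each batch directly, the batch size adapts to sustained load changes without oscillating on one slow batch.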
Giving RSEs a Larger Stage through the Better Scientific Software Fellowship
The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to
foster and promote practices, processes, and tools to improve developer
productivity and software sustainability of scientific codes. BSSwF's vision is
to grow the community with practitioners, leaders, mentors, and consultants to
increase the visibility of scientific software production and sustainability.
Over the last five years, many fellowship recipients and honorable mentions
have identified as research software engineers (RSEs). This paper provides case
studies from several of the program's participants to illustrate some of the
diverse ways BSSwF has benefited both the RSE and scientific communities. In an
environment where the contributions of RSEs are too often undervalued, we
believe that programs such as BSSwF can be a valuable means to recognize and
encourage community members to step outside of their regular commitments and
expand on their work, collaborations and ideas for a larger audience.
Comment: submitted to Computing in Science & Engineering (CiSE), Special Issue on the Future of Research Software Engineers in the U.S.
Robust Algorithms for Clustering with Applications to Data Integration
A growing number of data-driven applications are used for decision-making with far-reaching consequences and significant societal impact. Entity resolution, community detection and taxonomy construction are some of the building blocks of these applications, and for these methods, clustering is the fundamental underlying concept. Therefore, the importance of accurate, robust and scalable clustering methods cannot be overstated. We tackle the various facets of clustering with a multi-pronged approach described below.
1. While identification of clusters that refer to different entities is challenging for automated strategies, it is relatively easy for humans. We study the robustness of clustering methods that leverage supervision through an oracle, i.e., an abstraction of crowdsourcing. Additionally, we focus on scalability to handle web-scale datasets.
2. In community detection applications, a common setback in evaluating the quality of clustering techniques is the lack of ground-truth data. We propose a generative model that considers dependent edge formation and devise techniques for efficient cluster recovery.
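The oracle abstraction in item 1 can be sketched as pairwise same-entity queries combined with union-find, so that pairs already known to be in the same cluster are never re-asked. This is a minimal illustration, not the thesis's algorithm; `oracle_cluster` and the toy string-matching oracle are invented for the example:

```python
from itertools import combinations

def oracle_cluster(records, oracle):
    """Cluster records by querying an oracle (e.g. a crowd worker):
    oracle(a, b) -> True if a and b refer to the same entity."""
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        ra, rb = find(a), find(b)
        if ra != rb and oracle(a, b):       # skip pairs already merged
            parent[rb] = ra                 # merge the two clusters

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)
    return list(clusters.values())

# toy oracle: two records denote the same entity iff they match case-insensitively
records = ["IBM", "ibm", "Apple", "APPLE", "Google"]
same_entity = lambda a, b: a.lower() == b.lower()
groups = oracle_cluster(records, same_entity)
```

In a real crowdsourcing setting each oracle call costs money and time, which is why the `ra != rb` short-circuit (transitivity of "same entity") matters for query complexity.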
Introducing distributed dynamic data-intensive (D3) science: Understanding applications and infrastructure
A common feature across many science and engineering applications is the
amount and diversity of data and computation that must be integrated to yield
insights. Data sets are growing larger and becoming distributed; and their
location, availability and properties are often time-dependent. Collectively,
these characteristics give rise to dynamic distributed data-intensive
applications. While "static" data applications have received significant
attention, the characteristics, requirements, and software systems for the
analysis of large volumes of dynamic, distributed data, and data-intensive
applications have received relatively less attention. This paper surveys
several representative dynamic distributed data-intensive application
scenarios, provides a common conceptual framework to understand them, and
examines the infrastructure used in support of these applications.
Comment: 38 pages, 2 figures
Risky Business: The Economic Risks of Climate Change in the United States
The American economy could face significant and widespread disruptions from climate change unless U.S. businesses and policymakers take immediate action to reduce climate risk. This report summarizes findings of an independent assessment of the impact of climate change at the county, state, and regional level, and shows that communities, industries, and properties across the U.S. face profound risks from climate change. The findings also show that the most severe risks can still be avoided through early investments in resilience, and through immediate action to reduce the pollution that causes global warming. The Risky Business report shows that two of the primary impacts of climate change -- extreme heat and sea level rise -- will disproportionately affect certain regions of the U.S., and pose highly variable risks across the nation. In the U.S. Gulf Coast, Northeast, and Southeast, for example, sea level rise and increased damage from storm surge are likely to lead to an additional $3.5 billion in property losses each year by 2030, with escalating costs in future decades. In interior states in the Midwest and Southwest, extreme heat will threaten human health, reduce labor productivity and strain electricity grids. Conversely, in northern latitudes such as North Dakota and Montana, winter temperatures will likely rise, reducing frost events and cold-related deaths, and lengthening the growing season for some crops. The report is a product of The Risky Business Project, a joint, non-partisan initiative of former Treasury Secretary Henry M. Paulson, Jr.; Michael R. Bloomberg, Mayor of New York City from 2002 to 2013; and Thomas P. Steyer, former Senior Managing Member of Farallon Capital Management. They were joined by members of a high-level "Risk Committee" who helped scope the research and reviewed the research findings.
The Analysis of Big Data on Cities and Regions - Some Computational and Statistical Challenges
Big Data on cities and regions bring new opportunities and challenges to data analysts and city planners. On the one hand, they hold great promise for combining increasingly detailed data on each citizen with critical infrastructures to plan, govern and manage cities and regions, improve their sustainability, optimize processes and maximize the provision of public and private services. On the other hand, the massive sample size and high dimensionality of Big Data, together with their geo-temporal character, introduce unique computational and statistical challenges. This chapter provides an overview of the salient characteristics of Big Data and how these features affect the paradigm change in data management and analysis, as well as in the computing environment.
Series: Working Papers in Regional Science
Analysis domain model for shared virtual environments
The field of shared virtual environments, which also encompasses online games and social 3D environments, has a system landscape consisting of multiple solutions that share great functional overlap. However, there is little system interoperability between the different solutions. A shared virtual environment has an associated problem domain that is highly complex, raising difficult challenges for the development process, starting with the architectural design of the underlying system. This paper has two main contributions. The first contribution is a broad domain analysis of shared virtual environments, which enables developers to have a better understanding of the whole rather than the part(s). The second contribution is a reference domain model for discussing and describing solutions - the Analysis Domain Model.