1,859 research outputs found

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Socio-economic data mining has a great potential in terms of gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. They can be imagined as laboratories devoted to the gathering and processing of enormous volumes of data on both natural systems such as the Earth and its ecosystem, as well as on human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise by individually customized services, which however should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self-organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. Self-interests of individuals, companies or institutions have limits, where the public interest is affected, and public interest is not a sufficient justification to violate human rights of individuals. Privacy is a high good, as confidentiality is, and damaging it would have serious side effects for society. Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c

    Combating Fake News on Social Media: A Framework, Review, and Future Opportunities

    Social media platforms let users share vast amounts of information in split seconds. However, some false information, generally referred to as “fake news”, is also widely spread. This can have major negative impacts on individuals and societies. Unfortunately, people are often unable to distinguish fake news from truth. Therefore, there is an urgent need to find effective mechanisms to fight fake news on social media. To this end, this paper adapts the Straub Model of Security Action Cycle to the context of combating fake news on social media. It uses the adapted framework to classify the vast literature on fake news into action cycle phases (i.e., deterrence, prevention, detection, and mitigation/remedy). Based on a systematic and inter-disciplinary review of the relevant literature, we analyze the status and challenges in each stage of combating fake news, and then introduce future research directions. These efforts allow the development of a holistic view of the research frontier on fighting fake news online. We conclude that this is a multidisciplinary issue and that, as such, a collaborative effort from different fields is needed to address it effectively.

    Innovative big data integration and analysis techniques for urban hazard management

    PhD Thesis. Modern early warning systems (EWS) require sophisticated knowledge of natural hazards, the urban context and underlying risk factors to enable dynamic and timely decision making (e.g., hazard detection, hazard preparedness). Landslides are a common form of natural hazard with a global impact and are closely linked to a variety of other hazards. EWS for landslide prediction and detection rely on scientific methods and models which require input from time-series data, such as earth observation (EO) and ancillary data. Such data sets are produced by a variety of remote sensing satellites and Internet of Things sensors which are deployed in landslide-prone areas. In addition, social media-based time-series data has played a significant role in modern disaster management. The emergence of social media has made it possible for the general public to contribute to the monitoring of natural hazards by reporting incidents related to hazard events. As a result, the integration and analysis of potential time-series data sources in EWS applications have become a challenge due to the complexity and high variety of data sources. Moreover, sophisticated domain knowledge of natural hazards and risk management is also required to enable dynamic and timely decision making about serious hazards. In this thesis, a comprehensive set of algorithmic techniques for managing high varieties of time-series data from heterogeneous data sources is investigated. A novel ontology, namely Landslip Ontology, is proposed to provide a knowledge base that establishes the relationship between landslide hazard and EO and ancillary data sources to support data integration for EWS applications. Moreover, an ontology-based data integration and analytics system that includes a human in the loop of hazard information acquisition from social media is proposed to establish a deeper and more accurate situational awareness of hazard events. Finally, the system is extended to enable an interaction between natural hazard EWS and electrical grid EWS to contribute to electrical grid network monitoring and support decision-making for electrical grid infrastructure management.

    Cluster Analysis of Twitter Data: A Review of Algorithms

    Twitter, a microblogging online social network (OSN), has quickly gained prominence as it provides people with the opportunity to communicate and share posts and topics. Tremendous value lies in automatically analysing and reasoning about such data in order to derive meaningful insights, which carries potential opportunities for businesses, users, and consumers. However, the sheer volume, noise, and dynamism of Twitter impose challenges that hinder the efficacy of observing clusters with high intra-cluster (i.e. minimum variance) and low inter-cluster similarities. This review focuses on research that has used various clustering algorithms to analyse Twitter data streams and identify hidden patterns in tweets where text is highly unstructured. This paper performs a comparative analysis of unsupervised learning approaches in order to determine whether empirical findings support the enhancement of decision support and pattern recognition applications. A review of the literature identified 13 studies that implemented different clustering methods. A comparison covering clustering methods, algorithms, number of clusters, dataset size, distance measure, clustering features, evaluation methods, and results was conducted. The conclusion reports that the use of unsupervised learning in mining social media data has several weaknesses. Success criteria and future directions for research and practice are discussed.
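
    The clustering goal named in the abstract above (high intra-cluster and low inter-cluster similarity) can be illustrated with a minimal sketch. The example below is not taken from the review: the bag-of-words representation, the cosine measure, the helper names and the toy tweets are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Turn a tweet into a term-frequency vector (hypothetical, whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity(pairs):
    """Average cosine similarity over a list of vector pairs (0.0 if empty)."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def cluster_quality(clusters):
    """Return (mean intra-cluster similarity, mean inter-cluster similarity)."""
    vecs = [[bag_of_words(t) for t in cluster] for cluster in clusters]
    # Pairs of tweets that share a cluster.
    intra = [p for c in vecs for p in combinations(c, 2)]
    # Pairs of tweets drawn from two different clusters.
    inter = [(a, b) for c1, c2 in combinations(vecs, 2) for a in c1 for b in c2]
    return mean_similarity(intra), mean_similarity(inter)


# Toy clustering (made-up tweets, assumed for illustration only).
clusters = [
    ["river flood warning", "flood warning issued"],
    ["football match tonight", "tonight football final"],
]
intra, inter = cluster_quality(clusters)
print(f"intra={intra:.3f} inter={inter:.3f}")
```

    A good clustering in this sense yields an intra-cluster value well above the inter-cluster value; real tweet streams would need proper tokenization and weighting before such a comparison is meaningful.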

    Development of a National-Scale Big Data Analytics Pipeline to Study the Potential Impacts of Flooding on Critical Infrastructures and Communities

    With the rapid development of the Internet of Things (IoT) and Big data infrastructure, crowdsourcing techniques have emerged to facilitate data processing and problem solving, particularly for flood emergency purposes. A Flood Analytics Information System (FAIS) has been developed as a Python Web application to gather Big data from multiple servers and analyze flooding impacts during historical and real-time events. The application is designed to integrate crowd intelligence, machine learning (ML), and natural language processing of tweets to provide flood warnings, with the aim of improving situational awareness for flood risk management and decision making. FAIS allows the user to submit search requests to the United States Geological Survey (USGS) as well as Twitter through a series of queries, which are used to modify the request URL sent to the data sources. This national-scale prototype combines flood peak rates and river level information with geotagged tweets to identify a dynamic set of locations at risk of flooding. The list of prioritized areas can be updated every 15 minutes as the crowdsourced data and environmental information and conditions change. In addition, FAIS uses the Google Vision API (application programming interface) and image processing algorithms to detect objects (flood, road, vehicle, river, etc.) in time-lapse digital images and build valuable metadata into an image catalog. The application performs Flood Frequency Analysis (FFA) and computes design flow values corresponding to specific return periods that can help engineers in designing safe structures and in protection against economic losses due to maintenance of civil infrastructure. FAIS was successfully tested in real time during the Hurricane Dorian flooding event across the Carolinas, where the storm caused extensive damage and disruption to critical infrastructure and the environment. The prototype was also verified against historical events such as the Hurricane Matthew and Hurricane Florence flooding in the Lower PeeDee Basin in the Carolinas.
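
    The abstract above mentions computing design flow values for specific return periods but does not say which Flood Frequency Analysis method FAIS uses. As a rough sketch only, the example below assumes the empirical Weibull plotting position, T = (n + 1) / m for the m-th largest of n annual peaks, with linear interpolation between the empirical points. The function names and the peak-flow values are hypothetical.

```python
def empirical_return_periods(annual_peaks):
    """Pair each annual peak flow with its Weibull return period T = (n + 1) / rank."""
    ranked = sorted(annual_peaks, reverse=True)  # rank 1 = largest peak
    n = len(ranked)
    return [(flow, (n + 1) / rank) for rank, flow in enumerate(ranked, start=1)]


def design_flow(annual_peaks, return_period):
    """Interpolate a design flow for a return period inside the observed range.

    Assumption: linear interpolation between empirical points. This sketch
    does not fit a distribution, so it cannot extrapolate beyond the record.
    """
    points = sorted(empirical_return_periods(annual_peaks), key=lambda p: p[1])
    if not points[0][1] <= return_period <= points[-1][1]:
        raise ValueError("return period outside the empirical range")
    for (q_lo, t_lo), (q_hi, t_hi) in zip(points, points[1:]):
        if t_lo <= return_period <= t_hi:
            frac = (return_period - t_lo) / (t_hi - t_lo)
            return q_lo + frac * (q_hi - q_lo)
    return points[-1][0]  # only reached when there is a single observation


# Made-up annual peak flows (units arbitrary), assumed for illustration only.
peaks = [120, 300, 180, 250, 90, 210, 150, 400, 160]
print(design_flow(peaks, 5))    # second-largest peak of nine has T = 10 / 2 = 5
print(design_flow(peaks, 7.5))  # halfway between T = 5 and T = 10
```

    Engineering practice typically fits a probability distribution to the annual peaks so that rarer events can be estimated; the empirical version here only shows how a return period maps onto ranked observations.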

    An investigation into the role of crowdsourcing in generating information for flood risk management

    Flooding is a major global hazard whose management relies on an accurate understanding of its risks. Crowdsourcing represents a major opportunity for supporting flood risk management as members of the public are highly capable of producing useful flood information. This thesis explores a wide range of issues related to flood crowdsourcing using an interdisciplinary approach. Through an examination of 31 different projects a flood crowdsourcing typology was developed. This identified five key types of flood crowdsourcing: i) Incident Reporting, ii) Media Engagement, iii) Collaborative Mapping, iv) Online Volunteering and v) Passive VGI. These represent a wide range of initiatives with radically different aims, objectives, datasets and relationships with volunteers. Online Volunteering was explored in greater detail using Tomnod as a case study. This is a micro-tasking platform in which volunteers analyse satellite imagery to support disaster response. Volunteer motivations for participating on Tomnod were found to be largely altruistic. Demographics of participants were significant, with retirement, disability or long-term health problems identified as major drivers for participation. Many participants emphasised that effective communication between volunteers and the site owner is strongly linked to their appreciation of the platform. In addition, feedback on the quality and impact of their contributions was found to be crucial in maintaining interest. An examination of their contributions showed that volunteers could identify, with a higher degree of accuracy, many features in satellite imagery which supervised image classification struggled to identify. This was more pronounced in poorer quality imagery, where image classification had a very low accuracy. However, supervised classification was found to be far more systematic and succeeded in identifying impacts in many regions which were missed by volunteers.
    The efficacy of using crowdsourcing for flood risk management was explored further through the iterative development of a Collaborative Mapping web-platform called Floodcrowd. Through interviews and focus groups, stakeholders from the public and private sector expressed an interest in crowdsourcing as a tool for supporting flood risk management. The types of crowdsourced data that stakeholders are particularly interested in differ between organisations. Yet, they typically include flood depths, photos, timeframes of events and historical background information. Through engagement activities, many citizens were found to be able and motivated to share such observations. Yet, motivations were strongly affected by the level of attention their contributions receive from authorities. This presents many opportunities as well as challenges for ensuring that the future of flood crowdsourcing improves flood risk management and does not damage stakeholder relationships with participants.

    Reinventing the Social Scientist and Humanist in the Era of Big Data

    This book explores the big data evolution by interrogating the notion that big data is a disruptive innovation that appears to be challenging existing epistemologies in the humanities and social sciences. Exploring various (controversial) facets of big data such as ethics, data power, and data justice, the book attempts to clarify the trajectory of the epistemology of (big) data-driven science in the humanities and social sciences.