1,934 research outputs found
Recommended from our members
Social Measurement and Causal Inference with Text
The digital age has dramatically increased access to large-scale collections of digitized text documents. These corpora include, for example, digital traces from social media, decades of archived news reports, and transcripts of spoken interactions in political, legal, and economic spheres. For social scientists, this new widespread data availability has potential for improved quantitative analysis of relationships between language use and human thought, actions, and societal structure. However, the large-scale nature of these collections means that traditional manual approaches to analyzing content are extremely costly and do not scale. Furthermore, incorporating unstructured text data into quantitative analysis is difficult due to texts’ high-dimensional nature and linguistic complexity.
This thesis blends (a) the computational strengths of natural language processing (NLP) and machine learning to automate and scale-up quantitative text analysis with (b) two themes central to social scientific studies but often under-addressed in NLP: measurement—creating quantifiable summaries of empirical phenomena—and causal inference—estimating the effects of interventions. First, we address measuring class prevalence in document collections; we contribute a generative probabilistic modeling approach to prevalence estimation and show empirically that our model is more robust to shifts in class priors between training and inference. Second, we examine cross- document entity-event measurement; we contribute an empirical pipeline and a novel latent disjunction model to identify the names of civilians killed by police from our corpus of web-scraped news reports. Third, we gather and categorize applications that use text to reduce confounding from causal estimates and contribute a list of open problems as well as guidance about data processing and evaluation decisions in this area. Finally, we contribute a new causal research design to estimate the natural indirect and direct effects of social group signals (e.g. race or gender) on conversational outcomes with separate aspects of language as causal mediators; this chapter is motivated by a theoretical case study of U.S. Supreme Court oral arguments and the effect of an advocate’s gender on interruptions from justices. We conclude by discussing the relationship between measurement and causal inference with text and future work at this intersection
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
Inferring Complex Activities for Context-aware Systems within Smart Environments
The rising ageing population worldwide and the prevalence of age-related conditions such as physical fragility, mental impairments and chronic diseases have significantly impacted the quality of life and caused a shortage of health and care services. Over-stretched healthcare providers are leading to a paradigm shift in public healthcare provisioning. Thus, Ambient Assisted Living (AAL) using Smart Homes (SH) technologies has been rigorously investigated to help address the aforementioned problems.
Human Activity Recognition (HAR) is a critical component in AAL systems which enables applications such as just-in-time assistance, behaviour analysis, anomalies detection and emergency notifications. This thesis is aimed at investigating challenges faced in accurately recognising Activities of Daily Living (ADLs) performed by single or multiple inhabitants within smart environments. Specifically, this thesis explores five complementary research challenges in HAR. The first study contributes to knowledge by developing a semantic-enabled data segmentation approach with user-preferences. The second study takes the segmented set of sensor data to investigate and recognise human ADLs at multi-granular action level; coarse- and fine-grained action level. At the coarse-grained actions level, semantic relationships between the sensor, object and ADLs are deduced, whereas, at fine-grained action level, object usage at the satisfactory threshold with the evidence fused from multimodal sensor data is leveraged to verify the intended actions. Moreover, due to imprecise/vague interpretations of multimodal sensors and data fusion challenges, fuzzy set theory and fuzzy web ontology language (fuzzy-OWL) are leveraged. The third study focuses on incorporating uncertainties caused in HAR due to factors such as technological failure, object malfunction, and human errors. Hence, existing studies uncertainty theories and approaches are analysed and based on the findings, probabilistic ontology (PR-OWL) based HAR approach is proposed. The fourth study extends the first three studies to distinguish activities conducted by more than one inhabitant in a shared smart environment with the use of discriminative sensor-based techniques and time-series pattern analysis. The final study investigates in a suitable system architecture with a real-time smart environment tailored to AAL system and proposes microservices architecture with sensor-based off-the-shelf and bespoke sensing methods.
The initial semantic-enabled data segmentation study was evaluated with 100% and 97.8% accuracy to segment sensor events under single and mixed activities scenarios. However, the average classification time taken to segment each sensor events have suffered from 3971ms and 62183ms for single and mixed activities scenarios, respectively. The second study to detect fine-grained-level user actions was evaluated with 30 and 153 fuzzy rules to detect two fine-grained movements with a pre-collected dataset from the real-time smart environment. The result of the second study indicate good average accuracy of 83.33% and 100% but with the high average duration of 24648ms and 105318ms, and posing further challenges for the scalability of fusion rule creations. The third study was evaluated by incorporating PR-OWL ontology with ADL ontologies and Semantic-Sensor-Network (SSN) ontology to define four types of uncertainties presented in the kitchen-based activity. The fourth study illustrated a case study to extended single-user AR to multi-user AR by combining RFID tags and fingerprint sensors discriminative sensors to identify and associate user actions with the aid of time-series analysis. The last study responds to the computations and performance requirements for the four studies by analysing and proposing microservices-based system architecture for AAL system. A future research investigation towards adopting fog/edge computing paradigms from cloud computing is discussed for higher availability, reduced network traffic/energy, cost, and creating a decentralised system.
As a result of the five studies, this thesis develops a knowledge-driven framework to estimate and recognise multi-user activities at fine-grained level user actions. This framework integrates three complementary ontologies to conceptualise factual, fuzzy and uncertainties in the environment/ADLs, time-series analysis and discriminative sensing environment. Moreover, a distributed software architecture, multimodal sensor-based hardware prototypes, and other supportive utility tools such as simulator and synthetic ADL data generator for the experimentation were developed to support the evaluation of the proposed approaches. The distributed system is platform-independent and currently supported by an Android mobile application and web-browser based client interfaces for retrieving information such as live sensor events and HAR results
- …