A Framework for Comparing Groups of Documents
We present a general framework for comparing multiple groups of documents. A
bipartite graph model is proposed where document groups are represented as one
node set and the comparison criteria are represented as the other node set.
Using this model, we present basic algorithms to extract insights into
similarities and differences among the document groups. Finally, we demonstrate
the versatility of our framework through an analysis of NSF funding programs
for basic research.
Comment: 6 pages; 2015 Conference on Empirical Methods in Natural Language
Processing (EMNLP '15)
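The bipartite model described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the criteria are plain keywords, the edge weight is the fraction of a group's documents containing the keyword, and the function names are hypothetical.

```python
from collections import defaultdict

def build_bipartite(group_docs, criteria):
    """Return {group: {criterion: weight}} for a groups-vs-criteria graph.

    Assumed weighting: fraction of the group's documents that contain
    the criterion as a whitespace-delimited token.
    """
    edges = defaultdict(dict)
    for group, docs in group_docs.items():
        for criterion in criteria:
            hits = sum(1 for doc in docs if criterion in doc.lower().split())
            if hits:
                edges[group][criterion] = hits / len(docs)
    return dict(edges)

def shared_criteria(edges, a, b):
    """Criteria linked to both groups (a similarity signal)."""
    return sorted(set(edges.get(a, {})) & set(edges.get(b, {})))

def distinctive_criteria(edges, a, b):
    """Criteria linked to group a but not to group b (a difference signal)."""
    return sorted(set(edges.get(a, {})) - set(edges.get(b, {})))

# Toy data, invented for this example.
groups = {
    "program_a": ["graph sampling methods", "network search algorithms"],
    "program_b": ["protein folding search", "cell network models"],
}
criteria = ["network", "search", "graph", "protein"]
edges = build_bipartite(groups, criteria)

print(shared_criteria(edges, "program_a", "program_b"))
print(distinctive_criteria(edges, "program_a", "program_b"))
```

Keeping groups and criteria as separate node sets means similarities and differences reduce to set operations on each group's neighbors.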
Topic Similarity Networks: Visual Analytics for Large Document Sets
We investigate ways in which to improve the interpretability of LDA topic
models by better analyzing and visualizing their outputs. We focus on examining
what we refer to as topic similarity networks: graphs in which nodes represent
latent topics in text collections and links represent similarity among topics.
We describe efficient and effective approaches to both building and labeling
such networks. Visualizations of topic models based on these networks are shown
to be a powerful means of exploring, characterizing, and summarizing large
collections of unstructured text documents. They help to "tease out"
non-obvious connections among different sets of documents and provide insights
into how topics form larger themes. We demonstrate the efficacy and
practicality of these approaches through two case studies: 1) NSF grants for
basic research spanning a 14 year period and 2) the entire English portion of
Wikipedia.
Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData
2014)
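A topic similarity network of the kind described in this abstract can be sketched as follows. The abstract does not state which similarity measure or linking rule is used, so the cosine similarity and the fixed threshold below are assumptions, and the topic-word weights are invented toy values.

```python
import math
from itertools import combinations

def cosine(p, q):
    """Cosine similarity between two sparse topic-word weight dicts."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def topic_similarity_network(topics, threshold=0.5):
    """Link every pair of topics whose similarity meets the threshold.

    Returns {(topic_a, topic_b): similarity} with topic_a < topic_b.
    """
    links = {}
    for a, b in combinations(sorted(topics), 2):
        sim = cosine(topics[a], topics[b])
        if sim >= threshold:
            links[(a, b)] = sim
    return links

# Toy topic-word distributions, invented for this example.
topics = {
    "t0": {"graph": 0.5, "network": 0.4, "node": 0.1},
    "t1": {"network": 0.5, "graph": 0.3, "search": 0.2},
    "t2": {"protein": 0.6, "cell": 0.4},
}
links = topic_similarity_network(topics)
print(links)
```

Here only t0 and t1 share vocabulary, so they form the single link; t2 stays isolated, which is how unrelated themes separate in such a network.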
CausalNLP: A Practical Toolkit for Causal Inference with Text
The vast majority of existing methods and systems for causal inference assume
that all variables under consideration are categorical or numerical (e.g.,
gender, price, blood pressure, enrollment). In this paper, we present
CausalNLP, a toolkit for inferring causality from observational data that
includes text in addition to traditional numerical and categorical variables.
CausalNLP uses meta-learners for treatment effect estimation and
supports using raw text and its linguistic properties as both a treatment and a
"controlled-for" variable (e.g., confounder). The library is open-source and
available at: https://github.com/amaiya/causalnlp.
Comment: 7 pages
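To make the meta-learner idea concrete, here is a minimal T-learner sketch in plain Python. It does not use or reproduce the CausalNLP API. The base learner (a per-stratum mean), the function names, and the toy data are all assumptions for illustration; the covariate stands in for a text-derived property such as sentiment.

```python
from collections import defaultdict

def fit_mean_model(rows):
    """Trivial base learner: mean outcome for each covariate value."""
    totals = defaultdict(lambda: [0.0, 0])
    for x, y in rows:
        totals[x][0] += y
        totals[x][1] += 1
    return {x: total / count for x, (total, count) in totals.items()}

def t_learner_ate(data):
    """Estimate the average treatment effect with a T-learner.

    data: list of (covariate, treated, outcome) tuples.
    A T-learner fits one outcome model on treated units and another on
    control units, then averages the difference of their predictions.
    """
    treated_model = fit_mean_model((x, y) for x, t, y in data if t)
    control_model = fit_mean_model((x, y) for x, t, y in data if not t)
    effects = [
        treated_model[x] - control_model[x]
        for x, _, _ in data
        if x in treated_model and x in control_model
    ]
    return sum(effects) / len(effects)

# Toy observations: (text-derived sentiment, treated?, outcome).
data = [
    ("pos", 1, 10.0), ("pos", 1, 12.0), ("pos", 0, 8.0),
    ("neg", 1, 5.0), ("neg", 0, 2.0), ("neg", 0, 4.0),
]
ate = t_learner_ate(data)
print(ate)
```

Conditioning on the sentiment stratum is what lets the text property act as a controlled-for variable: effects are computed within each stratum before averaging.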
Mining Measured Information from Text
We present an approach to extract measured information from text (e.g., a
1370 degrees C melting point, a BMI greater than 29.9 kg/m^2). Such
extractions are critically important across a wide range of domains,
especially those involving search and exploration of scientific and technical
documents. We first propose a rule-based entity extractor to mine measured
quantities (i.e., a numeric value paired with a measurement unit), which
supports a vast and comprehensive set of both common and obscure measurement
units. Our method is highly robust and can correctly recover valid measured
quantities even when significant errors are introduced through the process of
converting document formats like PDF to plain text. Next, we describe an
approach to extracting the properties being measured (e.g., the property "pixel
pitch" in the phrase "a pixel pitch as high as 352 μm"). Finally, we
present MQSearch: the realization of a search engine with full support for
measured information.
Comment: 4 pages; 38th International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR '15)
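A rule-based extractor for measured quantities (a numeric value paired with a unit) can be sketched with a regular expression. This is only an illustration of the general idea: the four-entry unit list and the pattern are assumptions, far smaller than the comprehensive unit set the abstract describes, and there is no handling of extraction noise.

```python
import re

# Tiny illustrative unit list (an assumption, not the authors' inventory).
UNITS = ["kg/m^2", "degrees C", "mm", "kg"]

# Try longer units first so "kg/m^2" is not cut short at "kg".
UNIT_PATTERN = "|".join(
    re.escape(unit) for unit in sorted(UNITS, key=len, reverse=True)
)
QUANTITY = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>" + UNIT_PATTERN + r")(?![A-Za-z])"
)

def extract_quantities(text):
    """Return (value, unit) pairs for each measured quantity in text."""
    return [
        (float(match.group("value")), match.group("unit"))
        for match in QUANTITY.finditer(text)
    ]

sample = "a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2"
print(extract_quantities(sample))
# → [(1370.0, 'degrees C'), (29.9, 'kg/m^2')]
```

The trailing negative lookahead keeps a unit from matching as the prefix of a longer word, for example "kg" inside an unrelated token.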
Kinetics and kinematics of diabetic foot in type 2 diabetes mellitus with and without peripheral neuropathy: a systematic review and meta-analysis.
Patients with diabetes mellitus are at increased risk of developing diabetic foot with peripheral neuropathy and vascular and musculoskeletal complications. These problems carry a relatively high risk of infection, gangrene, and amputation. In addition, altered plantar pressure distribution is an important etiopathogenic risk factor for the development of foot ulcers. The purpose of this systematic review is to understand the biomechanical changes involved, through studies of foot kinematics and kinetics in type 2 diabetes mellitus. Scientific articles were identified using electronic databases including Science Direct, CINAHL, Springer Link, Medline, Web of Science, and PubMed. The selection of articles to include in the systematic review was narrowed after reading the full text, focusing on studies that used experimental designs relating to the biomechanics of diabetic foot. The meta-analysis of gait velocity (neuropathy = 128 and non-diabetes = 131) showed a significantly lower gait velocity in neuropathy participants compared to non-diabetes age-matched participants at a high effect level (-0.09, 95% CI -0.13 to 0.05; p < 0.0001). Regarding knee-joint flexion range, there was a significant difference between the neuropathy and non-diabetes groups (4.75, 95% CI -7.53 to 1.97; p = 0.0008). The systematic review found significant differences in kinematic and kinetic variables among diabetic individuals with neuropathy, diabetic individuals without neuropathy, and non-diabetic individuals. The review also found that the sample sizes used in some studies were too small to contribute reliably to the meta-analysis, so further studies with larger sample sizes are required.
Sampling and Inference in Complex Networks.
Expansion and search in networks
Borrowing from concepts in expander graphs, we study the expansion properties of real-world, complex networks (e.g. social networks, unstructured peer-to-peer or P2P networks) and the extent to which these properties can be exploited to understand and address the problem of decentralized search. We first produce samples that concisely capture the overall expansion properties of an entire network, which we collectively refer to as the expansion signature. Using these signatures, we find a correspondence between the magnitude of maximum expansion and the extent to which a network can be efficiently searched. We further find evidence that standard graph-theoretic measures, such as average path length, fail to fully explain the level of “searchability” or ease of information diffusion and dissemination in a network. Finally, we demonstrate that this high expansion can be leveraged to facilitate decentralized search in networks and show that an expansion-based search strategy outperforms typical search methods.
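The notion of expansion can be sketched in a few lines. The vertex expansion of a node set S is taken here as |N(S) \ S| / |S|, the standard expander-graph quantity. The greedy sampler and the name `expansion_profile` are hypothetical stand-ins: the abstract does not define how its samples or its expansion signature are computed, so this is only one plausible reading.

```python
def expansion(graph, nodes):
    """Vertex expansion of a non-empty node set: |N(S) - S| / |S|."""
    nodes = set(nodes)
    neighbors = set().union(*(graph[v] for v in nodes)) - nodes
    return len(neighbors) / len(nodes)

def expansion_profile(graph, start, size):
    """Grow a sample greedily and record its expansion at each step.

    Assumed strategy: repeatedly add the frontier node that contributes
    the most neighbors not yet in the sample or its frontier.
    """
    sample = [start]
    profile = [expansion(graph, sample)]
    while len(sample) < size:
        in_sample = set(sample)
        frontier = set().union(*(graph[v] for v in sample)) - in_sample
        if not frontier:
            break  # the sample's connected component is exhausted
        covered = in_sample | frontier
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(frontier), key=lambda v: len(set(graph[v]) - covered))
        sample.append(best)
        profile.append(expansion(graph, sample))
    return sample, profile

# Toy undirected graph as an adjacency dict, invented for this example.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e", "f"},
    "e": {"d"},
    "f": {"d"},
}
sample, profile = expansion_profile(graph, "a", 3)
print(sample, profile)
# → ['a', 'b', 'd'] [2.0, 1.0, 1.0]
```

A search strategy built on the same idea would forward a query to the neighbor that exposes the most unseen nodes, rather than to a random neighbor.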