206 research outputs found
Graph analytics and subset selection problems in machine learning
In this dissertation we examine two topics relevant to modern machine learning research: 1) Subgraph counting and 2) High-dimensional subset selection. The former can be used to construct features for performing graph analytics. The latter has applications in sparse modeling such as feature selection, sparse regression, and interpretable machine learning. Since these problems become intractable for large datasets, we design efficient approximation algorithms for both tasks with data-dependent performance guarantees.
In the first part of the dissertation, we study the problem of approximating all three and four node induced subgraphs in a large graph. These counts are called the 3 and 4-profile, respectively, and describe a graph's connectivity properties. This problem generalizes graphlet counting and has found applications ranging from bioinformatics to spam detection. Our algorithms use the novel concept of graph profile sparsifiers: sparse graphs that can be used to approximate the full profile counts for a given large graph. We obtain novel concentration results showing that graphs can be substantially sparsified and still retain good approximation quality for the global graph profile. We also study the problem of counting local and ego profiles centered at each vertex of the graph. These quantities embed every vertex into a low-dimensional space that characterizes the local geometry of its neighborhood. We introduce the concept of edge pivots and show that all local 3 and 4-profiles can be computed as vertex programs using compressed two-hop information. Our algorithms are local, distributed message-passing schemes and compute all graph profiles in parallel. We empirically evaluate these algorithms with a distributed GraphLab implementation, and show improvements over previous state-of-the-art in experiments scaling up to 640 cores on Amazon EC2.
In the second part we shift to the problem of subset selection: for example, selecting a few features from a large feature set. Motivated by the need for interpretable, nonlinear regression models for high-dimensional data, we draw a novel connection between this and submodular maximization. We extend an earlier concept of weak submodularity from the setting of sparse linear regression to a broad class of objective functions, including generalized linear model likelihoods. We then show that three greedy algorithms (Oblivious, Forward Stepwise, and Orthogonal Matching Pursuit) perform within a constant factor of the best possible subset. Our methods do not require any statistical modeling assumptions and allow direct control over the number of obtained features. This contrasts with other work that uses regularization parameters to control sparsity only implicitly. Our proof technique connecting convex analysis and submodular set function theory may be of independent interest for other statistical learning applications that have combinatorial structure.
In the third part, we consider the problem of explaining the predictions of a given black-box classifier. For example, why does a deep neural network assign an image to a particular class? We cast interpretability of black-box classifiers as a subset selection problem and propose to solve it with an efficient streaming algorithm. We provide a constant factor approximation guarantee for this algorithm in the case of a random stream order and a weakly submodular objective function. This is the first such theoretical guarantee for this general class of functions, and we also show that no such algorithm exists for a worst case stream order. Our algorithm obtains similar explanations of Inception V3 predictions 10 times faster than the state-of-the-art LIME framework. Electrical and Computer Engineerin
Implementation and Evaluation of iSCSI over RDMA
iSCSI is an emerging storage network technology that allows block-level access to storage devices, such as disk drives, over a computer network. Since iSCSI runs over the ubiquitous TCP/IP protocol, it has many advantages over its more proprietary alternatives. Due to the recent movement toward 10 gigabit Ethernet, storage vendors are interested in seeing the benefits this large increase in network bandwidth could bring to iSCSI. In order to make full use of the bandwidth provided by a 10 gigabit Ethernet link, specialized Remote Direct Memory Access hardware is being developed to offload processing and reduce the data-copy overhead found in a standard TCP/IP network stack. This paper focuses on the development of an iSCSI implementation that is capa
Application of High-field NMR Spectroscopy for Differentiating Cathinones for Forensic Identification
Synthetic cathinone family compounds or designer drugs are the major naturally-occurring psychostimulant and hallucinogenic designer drugs that are used illegally in the United States and several other countries for their cocaine, methylenedioxymethamphetamine (MDMA), and amphetamine-like effects. Since 2009, forensic labs have identified synthetic cathinones in an increasing percentage of cases. One of the problems crime labs face when analyzing submitted drug evidence is that the samples are often mixtures and can contain one or more of several cutting agents. In this work, we demonstrate the utility of high-field 1H-NMR as a screening tool to detect cathinones in the presence of adulterants or “cutting agents”. We collected 1H- and 13C-NMR spectra of three structurally distinct cathinones: alpha-piperidinobutiophenone, alpha-pyrrolidinopentiothiophenone, and pentylone. The spectra were collected with the pure cathinones and in the presence of a cutting agent, commercial powdered sugar (sucrose), and in two solvents. Without knowing the mixture components, it is impossible to select a solvent that will (ideally) only dissolve the drug of interest for interpretation. High-field NMR can be used to provide a spectral assignment and structure determination of a sample of an unknown cathinone and spectral signatures for screening, even when the cutting agent is also very soluble, as observed when the solvent was D2O. The NMR spectra provide evidence that rapidly acquired 1H spectra can be used to strongly indicate the identity of cathinones in a sample if they are present in a library
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
While Large Language Models (LLMs) are increasingly being used in real-world
applications, they remain vulnerable to prompt injection attacks: malicious
third party prompts that subvert the intent of the system designer. To help
researchers study this problem, we present a dataset of over 126,000 prompt
injection attacks and 46,000 prompt-based "defenses" against prompt injection,
all created by players of an online game called Tensor Trust. To the best of
our knowledge, this is currently the largest dataset of human-generated
adversarial examples for instruction-following LLMs. The attacks in our dataset
have a lot of easily interpretable structure, and shed light on the weaknesses
of LLMs. We also use the dataset to create a benchmark for resistance to two
types of prompt injection, which we refer to as prompt extraction and prompt
hijacking. Our benchmark results show that many models are vulnerable to the
attack strategies in the Tensor Trust dataset. Furthermore, we show that some
attack strategies from the dataset generalize to deployed LLM-based
applications, even though they have a very different set of constraints to the
game. We release all data and source code at https://tensortrust.ai/pape
Point of Contact: Investigating change in perception through a serious game for COVID-19 preventive measures
COVID-19 exposed the need to identify newer tools to understand perception of information, behavioral conformance to instructions and model the effects of individual motivation and decisions on the success of measures being put in place. We approach this challenge through the lens of serious games. Serious games are designed to instruct and inform within the confines of their magic circle. We built a multiplayer serious game, Point of Contact (PoC), to investigate effects of a serious game on perception and behavior. We conducted a study with 23 participants to gauge perceptions of COVID-19 preventive measures and quantify the change after playing PoC. The results show a significant positive change to participants’ perceptions towards COVID-19 preventive measures, shifting perceptions towards following guidelines more strictly due to a greater awareness of how the virus spreads. We discuss these implications and the value of a serious game like PoC towards pandemic risk modelling at a microcosm level
Assessment of Small Unmanned Aircraft Systems for Pavement Inspections
692M15-20-T-00034 Pavement inspections play an integral role in ensuring airport safety. The FAA Airport Technology Research and Development (ATR) branch performed research to assess the integration of small Unmanned Aircraft Systems (sUAS) into an airport's Pavement Management Program (PMP). To conduct sUAS-based pavement inspections, the research team tested across five different airports between 2020 and 2022. The objective was to provide a repeatable set of processes and procedures for data collection, analysis, and reporting for sUAS-based pavement inspections. This report presents sUAS data collection parameters, data processing techniques, and data analysis, as well as workflows associated with each inspection. A summary of distresses identifiable via sUAS is also provided
Children’s moderate-to-vigorous physical activity on weekdays versus weekend days: A multi-country analysis
PURPOSE: The Structured Days Hypothesis (SDH) posits that children's behaviors associated with obesity - such as physical activity - are more favorable on days that contain more 'structure' (i.e., a pre-planned, segmented, and adult-supervised environment) such as school weekdays, compared to days with less structure, such as weekend days. The purpose of this study was to compare children's moderate-to-vigorous physical activity (MVPA) levels on weekdays versus weekend days using a large, multi-country, accelerometer-measured physical activity dataset. METHODS: Data were received from the International Children's Accelerometer Database (ICAD) in July 2019. Analyses included only days meeting the ICAD inclusion criteria for a valid day of wear, only non-intervention data (e.g., baseline intervention data), only children with at least 1 weekday and 1 weekend day, and only ICAD studies with data collected exclusively during school months. Mixed effects models accounting for the nested nature of the data (i.e., days within children) assessed MVPA minutes per day (min/day MVPA) differences between weekdays and weekend days by region/country, adjusted for age, sex, and total wear time. Separate meta-analytical models explored differences by age and country/region for sex and child weight-status. RESULTS/FINDINGS: Valid data from 15 studies representing 5794 children (61% female, 10.7 ± 2.1 yrs., 24% with overweight/obesity) and 35,263 days of valid accelerometer data from 5 distinct countries/regions were used. Boys and girls accumulated 12.6 min/day (95% CI: 9.0, 16.2) and 9.4 min/day (95% CI: 7.2, 11.6) more MVPA on weekdays versus weekend days, respectively. Children from mainland Europe had the largest differences (17.1 min/day more MVPA on weekdays versus weekend days, 95% CI: 15.3, 19.0) compared to the other countries/regions. Children who were classified as overweight/obese or normal weight/underweight accumulated 9.5 min/day (95% CI: 6.9, 12.2) and 10.9 min/day (95% CI: 8.3, 13.5) of additional MVPA on weekdays versus weekend days, respectively. CONCLUSIONS: Children from multiple countries/regions accumulated significantly more MVPA on weekdays versus weekend days during school months. This finding aligns with the SDH and warrants future intervention studies to prioritize less-structured days, such as weekend days, and to consider giving all children access to additional opportunities to be active
A Mixed Blessing: Market-Mediated Religious Authority in Neopaganism
This research explores how marketplace dynamics affect religious authority in the context of Neopagan religion. Drawing on an interpretivist study of Wiccan practitioners in Italy, we reveal that engagement with the market may cause considerable, ongoing tensions, based on the inherent contradictions that are perceived to exist between spirituality and commercial gain. As a result, market success is a mixed blessing that can increase religious authority and influence, but is just as likely to decrease authority and credibility. Using an extended case study method, we propose a theoretical framework that depicts the links between our informants’ situated experiences and the macro-level factors affecting religious authority as it interacts with market-mediated dynamics at the global level. Overall, our study extends previous work in macromarketing that has looked at religious authority in the marketplace and how the processes of globalization are affecting religion
BIOSAFETY. Safeguarding gene drive experiments in the laboratory.
Multiple stringent confinement strategies should be used whenever possible. This is the author accepted manuscript. The final version is available from AAAS via http://dx.doi.org/10.1126/science.aac793