Hybrid Data Race Detection for Multicore Software
Multithreaded programs are prone to concurrency errors such as deadlocks, race conditions and atomicity violations. These errors are notoriously difficult to detect due to the non-deterministic nature of concurrent software running on multicore hardware. Data races result from concurrent access to shared data by multiple threads and can cause unexpected program behavior. The main dynamic data race detection techniques in the literature are the happens-before and lockset algorithms, which suffer from high execution time and memory overhead, miss many data races, or produce a high number of false alarms. Our goal is to improve the performance of dynamic data race detection while also improving its accuracy by generating fewer false alarms. We develop a tool that implements a hybrid data race detection algorithm combining the happens-before and lockset algorithms. Rather than focusing on individual memory accesses by each thread, we focus on sequences of memory accesses by each thread, called segments. This allows us to improve the performance of data race detection. We implement several optimizations in our hybrid data race detector and compare our technique with traditional happens-before and lockset detectors. The experiments are performed with C/C++ multithreaded benchmarks from the PARSEC suite that use the Pthreads library, and with large applications such as the Apache web server. Our experiments showed that our hybrid detector is 15% faster than the happens-before detector and reports 50% fewer potential data races than the lockset detector. Ultimately, a hybrid data race detector can improve the performance and accuracy of data race detection, enhancing its usability in practice.
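To make the hybrid idea concrete, here is a minimal sketch of a segment-level check that reports a potential race only when both the happens-before test and the lockset test fail to rule it out. It is not the authors' tool: the `Segment` fields, the vector-clock representation, and the function names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical record: a run of accesses by one thread between sync events."""
    thread: int
    clock: dict          # vector clock: thread id -> logical time (assumed representation)
    locks: frozenset     # locks held throughout the segment
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def happens_before(a, b):
    """True if vector clock a is <= b in every component and < in at least one."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and some_less


def potential_races(s1, s2):
    """Return the addresses on which two segments may race."""
    if s1.thread == s2.thread:
        return set()                      # same thread: program order applies
    if happens_before(s1.clock, s2.clock) or happens_before(s2.clock, s1.clock):
        return set()                      # ordered by synchronization
    if s1.locks & s2.locks:
        return set()                      # protected by a common lock
    # Conflicting accesses: same address, at least one of them a write.
    return (s1.writes & (s2.reads | s2.writes)) | (s2.writes & s1.reads)


writer = Segment(0, {0: 1}, frozenset(), writes={"x"})
reader = Segment(1, {1: 1}, frozenset(), reads={"x", "y"})
print(potential_races(writer, reader))
```

Comparing whole segments instead of individual accesses means one clock comparison and one lockset intersection cover many memory operations, which is the kind of saving the abstract attributes to working at segment granularity.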
Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US
The United States spends more than $1B each year on initiatives such as the
American Community Survey (ACS), a labor-intensive door-to-door study that
measures statistics relating to race, gender, education, occupation,
unemployment, and other demographic factors. Although a comprehensive source of
data, the lag between demographic changes and their appearance in the ACS can
exceed half a decade. As digital imagery becomes ubiquitous and machine vision
techniques improve, automated data analysis may provide a cheaper and faster
alternative. Here, we present a method that determines socioeconomic trends
from 50 million images of street scenes, gathered in 200 American cities by
Google Street View cars. Using deep learning-based computer vision techniques,
we determined the make, model, and year of all motor vehicles encountered in
particular neighborhoods. Data from this census of motor vehicles, which
enumerated 22M automobiles in total (8% of all automobiles in the US), was used
to accurately estimate income, race, education, and voting patterns, with
single-precinct resolution. (The average US precinct contains approximately
1000 people.) The resulting associations are surprisingly simple and powerful.
For instance, if the number of sedans encountered during a 15-minute drive
through a city is higher than the number of pickup trucks, the city is likely
to vote for a Democrat during the next Presidential election (88% chance);
otherwise, it is likely to vote Republican (82%). Our results suggest that
automated systems for monitoring demographic trends may effectively complement
labor-intensive approaches, with the potential to detect trends with fine
spatial resolution, in close to real time. Comment: 41 pages including supplementary material. Under review at PNA
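The sedan-versus-pickup rule quoted in the abstract can be written as a one-line decision function. This is only an illustrative sketch: the function name and the input counts are hypothetical, and the two probabilities are the figures stated in the abstract, not values computed here.

```python
def predict_vote(sedans, pickups):
    """Apply the abstract's rule to vehicle counts from a 15-minute drive.

    Returns the predicted party and the chance the abstract reports for it.
    """
    if sedans > pickups:
        return ("Democrat", 0.88)
    return ("Republican", 0.82)


# Hypothetical counts for two cities.
print(predict_vote(120, 80))
print(predict_vote(50, 80))
```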
Adversarial Removal of Demographic Attributes from Text Data
Recent advances in Representation Learning and Adversarial Training seem to
succeed in removing unwanted features from the learned representation. We show
that demographic information of authors is encoded in -- and can be recovered
from -- the intermediate representations learned by text-based neural
classifiers. The implication is that decisions of classifiers trained on
textual data are not agnostic to -- and likely condition on -- demographic
attributes. When attempting to remove such demographic information using
adversarial training, we find that while the adversarial component achieves
chance-level development-set accuracy during training, a post-hoc classifier,
trained on the encoded sentences from the first part, still manages to reach
substantially higher classification accuracies on the same data. This behavior
is consistent across several tasks, demographic properties and datasets. We
explore several techniques to improve the effectiveness of the adversarial
component. Our main conclusion is a cautionary one: do not rely on
adversarial training to achieve representations invariant to sensitive features.
Analytic lymph node number establishes staging accuracy by occult tumor burden in colorectal cancer.
BACKGROUND AND OBJECTIVES: Recurrence in lymph node-negative (pN0) colorectal cancer suggests the presence of undetected occult metastases. Occult tumor burden in nodes estimated by GUCY2C RT-qPCR predicts risk of disease recurrence. This study explored the impact of the number of nodes analyzed by RT-qPCR (analytic) on the prognostic utility of occult tumor burden.
METHODS: Lymph nodes (range: 2-159) from 282 prospectively enrolled pN0 colorectal cancer patients, followed for a median of 24 months (range: 2-63), were analyzed by GUCY2C RT-qPCR. Prognostic risk categorization defined using occult tumor burden was the primary outcome measure. Associations between prognostic variables and risk category were defined by multivariable polytomous and semi-parametric polytomous logistic regression.
RESULTS: Occult tumor burden stratified this pN0 cohort into categories of low (60%; recurrence rate (RR) = 2.3% [95% CI 0.1-4.5%]), intermediate (31%; RR = 33.3% [23.7-44.1%]), and high (9%; RR = 68.0% [46.5-85.1%], P < 0.001) risk of recurrence. Beyond race and T stage, the number of analytic nodes was an independent marker of risk category (P < 0.001). When >12 nodes were analyzed, occult tumor burden almost completely resolved prognostic risk classification of pN0 patients.
CONCLUSIONS: The prognostic utility of occult tumor burden assessed by GUCY2C RT-qPCR is dependent on the number of analytic lymph nodes.
Co-training for Demographic Classification Using Deep Learning from Label Proportions
Deep learning algorithms have recently produced state-of-the-art accuracy in
many classification tasks, but this success is typically dependent on access to
many annotated training examples. For domains without such data, an attractive
alternative is to train models with light, or distant supervision. In this
paper, we introduce a deep neural network for the Learning from Label
Proportion (LLP) setting, in which the training data consist of bags of
unlabeled instances with associated label distributions for each bag. We
introduce a new regularization layer, Batch Averager, that can be appended to
the last layer of any deep neural network to convert it from supervised
learning to LLP. This layer can be implemented readily with existing deep
learning packages. To further support domains in which the data consist of two
conditionally independent feature views (e.g. image and text), we propose a
co-training algorithm that iteratively generates pseudo bags and refits the
deep LLP model to improve classification accuracy. We demonstrate our models on
demographic attribute classification (gender and race/ethnicity), which has
many applications in social media analysis, public health, and marketing. We
conduct experiments to predict demographics of Twitter users based on their
tweets and profile image, without requiring any user-level annotations for
training. We find that the deep LLP approach outperforms baselines for both
text and image features separately. Additionally, we find that the co-training
algorithm improves image and text classification by 4% and 8% absolute F1,
respectively. Finally, an ensemble of text and image classifiers further
improves the absolute F1 measure by 4% on average.
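As a rough illustration of learning from label proportions, the sketch below averages per-instance class probabilities over one bag and scores the result against the bag's known label distribution. The abstract does not state the loss, so the KL divergence used here is an assumption, and the function names are hypothetical rather than the authors' Batch Averager implementation.

```python
import math


def batch_average(probs):
    """Average per-instance class probabilities over one bag."""
    n = len(probs)
    num_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n for c in range(num_classes)]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); the choice of KL as the bag loss is an assumption."""
    return sum(
        t * math.log((t + eps) / (q + eps))
        for t, q in zip(target, predicted)
        if t > 0
    )


# Hypothetical bag: three unlabeled instances, two classes.
instance_probs = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
bag_proportions = [0.5, 0.5]   # known label distribution for the bag

bag_prediction = batch_average(instance_probs)
loss = kl_divergence(bag_proportions, bag_prediction)
print(bag_prediction, loss)
```

Because the loss depends only on the bag-level average, no instance in the bag needs its own label, which matches the setting described in the abstract.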