Timed Fault Tree Models of the China Yongwen Railway Accident
Safety is an essential requirement for railway transportation. Many methods have been developed to predict, prevent and mitigate accidents in this context, each with its own purpose and limitations. This paper presents a new analysis technique: timed fault tree analysis. This method extends traditional fault tree analysis with temporal events and fault characteristics. Timed Fault Trees (TFTs) can determine which faults need to be eliminated urgently, and they can also provide a safe time window in which to repair them. They can also be used to determine the time required for railway maintenance, and thereby improve maintenance efficiency and reduce risks. In this paper, we present the features and functionality of a railway transportation system based on timed fault tree models. We demonstrate the applicability of our framework via a case study of the China Yongwen line railway accident.
Building a Sentiment Corpus of Tweets in Brazilian Portuguese
The large amount of data available in social media, forums and websites
motivates research in several areas of Natural Language Processing, such as
sentiment analysis. The popularity of the area due to its subjective and
semantic characteristics motivates research on novel methods and approaches for
classification. Hence, there is a high demand for datasets on different domains
and different languages. This paper introduces TweetSentBR, a sentiment corpus
for Brazilian Portuguese with 15,000 manually annotated sentences in the TV show
domain. The sentences were labeled in three classes (positive, neutral and
negative) by seven annotators, following literature guidelines for ensuring
the reliability of the annotation. We also ran baseline experiments on polarity
classification using three machine learning methods, reaching an F-Measure of
80.99% and an accuracy of 82.06% in binary classification, and an F-Measure of
59.85% and an accuracy of 64.62% in three-class classification.
Comment: Accepted for publication in 11th International Conference on Language
Resources and Evaluation (LREC 2018
Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning with Confidence
Knowledge graphs (KGs), which provide essential relational information
between entities, have been widely utilized in various knowledge-driven
applications. Since human knowledge is vast, grows explosively and changes
frequently, knowledge construction and updating inevitably involve automatic
mechanisms with less human supervision, which usually introduce plenty of noise
and conflicts into KGs. However, most conventional knowledge representation
learning methods assume that all triple facts in existing KGs share the same
significance and contain no noise. To address this problem, we propose a novel
confidence-aware knowledge representation learning framework (CKRL), which
detects possible noise in KGs while simultaneously learning knowledge
representations with confidence.
Specifically, we introduce the triple confidence to conventional
translation-based methods for knowledge representation learning. To make triple
confidence more flexible and universal, we only utilize the internal structural
information in KGs, and propose three kinds of triple confidences considering
both local and global structural information. In experiments, we evaluate our
models on knowledge graph noise detection, knowledge graph completion and
triple classification. Experimental results demonstrate that our
confidence-aware models achieve significant and consistent improvements on all
tasks, which confirms the capability of CKRL to model confidence with
structural information in both KG noise detection and knowledge representation
learning.
Comment: 8 page
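The idea of attaching a triple confidence to a translation-based method can be sketched as follows. This is a minimal illustration, not the CKRL implementation: it assumes a TransE-style energy |h + r - t| and a margin loss scaled by a per-triple confidence in [0, 1]. The function names, the margin value and the toy vectors are all hypothetical.

```python
def energy(h, r, t):
    """Assumed TransE-style energy: L1 distance between h + r and t."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def confidence_weighted_loss(pos, neg, confidence, margin=1.0):
    """Margin ranking loss for one (positive, negative) triple pair.

    pos and neg are (h, r, t) tuples of equal-length vectors. confidence is
    a hypothetical per-triple weight in [0, 1]: a low value shrinks the
    triple's contribution, so a suspected noisy fact influences training less.
    """
    hinge = max(0.0, margin + energy(*pos) - energy(*neg))
    return confidence * hinge


# Toy 2-d embeddings (made up for illustration).
h, r = [1.0, 0.0], [0.0, 1.0]
pos = (h, r, [1.0, 1.5])  # energy 0.5
neg = (h, r, [2.0, 1.0])  # energy 1.0

loss_trusted = confidence_weighted_loss(pos, neg, confidence=1.0)
loss_doubted = confidence_weighted_loss(pos, neg, confidence=0.2)
```

With the same pair of triples, lowering the confidence from 1.0 to 0.2 scales the loss by the same factor, which is the sense in which a doubted triple is down-weighted in this sketch.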
GOGGLES: Automatic Image Labeling with Affinity Coding
Generating large labeled training data is becoming the biggest bottleneck in
building and deploying supervised machine learning models. Recently, the data
programming paradigm has been proposed to reduce the human cost in labeling
training data. However, data programming relies on designing labeling functions,
which still requires significant domain expertise. Also, it is prohibitively
difficult to write labeling functions for image datasets as it is hard to
express domain knowledge using raw features for images (pixels).
We propose affinity coding, a new domain-agnostic paradigm for automated
training data labeling. The core premise of affinity coding is that the
affinity scores of instance pairs belonging to the same class on average should
be higher than those of pairs belonging to different classes, according to some
affinity functions. We build the GOGGLES system that implements affinity coding
for labeling image datasets by designing a novel set of reusable affinity
functions for images, and propose a novel hierarchical generative model for
class inference using a small development set.
We compare GOGGLES with existing data programming systems on 5 image labeling
tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a
minimum of 71% to a maximum of 98% without requiring any extensive human
annotation. In terms of end-to-end performance, GOGGLES outperforms the
state-of-the-art data programming system Snuba by 21% and a state-of-the-art
few-shot learning technique by 5%, and is only 7% away from the fully
supervised upper bound.
Comment: Published at 2020 ACM SIGMOD International Conference on Management
of Dat
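The core premise of affinity coding stated in the abstract (same-class pairs should have higher affinity on average than different-class pairs) can be checked on toy data as in the sketch below. It is not the GOGGLES system: cosine similarity stands in for an affinity function as an assumption, the feature vectors and labels are made up, and the class-inference step is not shown.

```python
from itertools import combinations
from math import sqrt


def cosine_affinity(a, b):
    """Hypothetical affinity function: cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_affinities(features, labels, affinity=cosine_affinity):
    """Return (mean same-class affinity, mean different-class affinity).

    Assumes at least one same-class pair and one different-class pair.
    """
    same, diff = [], []
    for i, j in combinations(range(len(features)), 2):
        score = affinity(features[i], features[j])
        (same if labels[i] == labels[j] else diff).append(score)
    return sum(same) / len(same), sum(diff) / len(diff)


# Toy feature vectors for four instances in two classes (illustrative only).
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = [0, 0, 1, 1]

same_mean, diff_mean = mean_affinities(features, labels)
premise_holds = same_mean > diff_mean
```

Here the labels are known so that the premise can be verified directly. In the setting the abstract describes, the labels are what must be inferred from the affinity scores.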