
    Enhancing Sensitivity Classification with Semantic Features using Word Embeddings

    Government documents must be reviewed to identify any sensitive information they may contain before they can be released to the public. However, traditional paper-based sensitivity review processes are not practical for reviewing born-digital documents, so there is a timely need for automatic sensitivity classification techniques to assist the digital sensitivity review process. Sensitivity is typically a product of the relations between combinations of terms, such as who said what about whom, which makes automatic sensitivity classification a difficult task. Vector representations of terms, such as word embeddings, have been shown to be effective at encoding latent term features that preserve semantic relations between terms, which can also benefit sensitivity classification. In this work, we present a thorough evaluation of the effectiveness of semantic word embedding features, along with term and grammatical features, for sensitivity classification. On a test collection of government documents containing real sensitivities, we show that extending text classification with semantic features and additional term n-grams results in significant improvements in classification effectiveness, correctly classifying 9.99% more sensitive documents than the text classification baseline.
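
    To make the feature setup concrete, the following is a minimal sketch, assuming pre-trained word vectors (a token-to-vector mapping) and scikit-learn; the paper's exact features, classifier, and preprocessing are not reproduced here.

        # Sketch: combine term n-gram features with averaged word-embedding
        # features for document classification. `vectors` (token -> 300-d
        # numpy array) and `docs`/`labels` are assumed inputs, not the
        # paper's data or pipeline.
        import numpy as np
        from scipy.sparse import csr_matrix, hstack
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression

        def embed(doc, vectors, dim=300):
            # Average the embeddings of in-vocabulary tokens.
            toks = [vectors[t] for t in doc.lower().split() if t in vectors]
            return np.mean(toks, axis=0) if toks else np.zeros(dim)

        def build_features(docs, vectors, ngrams=(1, 2)):
            tfidf = TfidfVectorizer(ngram_range=ngrams)
            term_feats = tfidf.fit_transform(docs)              # term n-grams
            sem_feats = csr_matrix(np.array([embed(d, vectors) for d in docs]))
            return hstack([term_feats, sem_feats])              # combined view

        # X = build_features(docs, vectors)
        # clf = LogisticRegression(max_iter=1000).fit(X, labels)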

    Combining support vector machines and segmentation algorithms for efficient anomaly detection: a petroleum industry application

    Proceedings of the International Joint Conference SOCO’14-CISIS’14-ICEUTE’14, Bilbao, Spain, June 25th–27th, 2014. Anomaly detection is the problem of finding patterns in data that do not conform to expected behavior; when patterns are numerically distant from the rest of the sample, they are flagged as outliers. Anomaly detection has recently attracted the attention of the research community for real-world applications, and the petroleum industry is one context where these problems arise. Correctly detecting such unusual information empowers the decision maker to act on the system in order to avoid, correct, or react to the situations associated with it. In that setting, heavy extraction machines for pumping and generation operations, such as turbomachines, are each intensively monitored by hundreds of sensors that send high-frequency measurements for damage prevention. To deal with this volume of data and with the lack of labeled examples, in this paper we propose combining a fast, high-quality segmentation algorithm with a one-class support vector machine approach for efficient anomaly detection in turbomachines. We present empirical studies comparing our approach to other methods on benchmark problems and on a real-life application to oil platform turbomachinery anomaly detection. This work was partially funded by CNPq BJT Project 407851/2012-7 and CNPq PVE Project 314017/2013-
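
    The general pattern is easy to sketch, assuming a simple fixed-width windowing as a stand-in for the paper's segmentation algorithm and scikit-learn's one-class SVM; this is illustrative, not the authors' implementation.

        # Sketch: segment a sensor stream, summarise each segment with simple
        # statistics, and flag anomalous segments with a one-class SVM. The
        # fixed-width windows below stand in for the paper's segmentation
        # algorithm, and the signal is synthetic.
        import numpy as np
        from sklearn.svm import OneClassSVM

        def segments(signal, width=200):
            return [signal[i:i + width]
                    for i in range(0, len(signal) - width + 1, width)]

        def features(seg):
            return [np.mean(seg), np.std(seg), np.min(seg), np.max(seg)]

        signal = np.random.randn(10_000)      # stand-in for turbomachine sensor data
        X = np.array([features(s) for s in segments(signal)])

        ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(X)
        flags = ocsvm.predict(X)              # -1 marks segments deemed anomalous
        print(f"{np.sum(flags == -1)} of {len(flags)} segments flagged")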

    The first Automatic Translation Memory Cleaning Shared Task

    This is an accepted manuscript of an article published by Springer in Machine Translation on 21/01/2017, available online: https://doi.org/10.1007/s10590-016-9183-x. The accepted version of the publication may differ from the final published version. This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. The shared task is aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow-up to the shared task, we also conducted two surveys, one targeting the teams participating in the shared task and the other targeting professional translators. While the researcher-oriented survey aimed at gathering participants' opinions of the shared task, the translator-oriented survey aimed to better understand what constitutes a good TM unit and to inform decisions to be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the collected surveys.
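
    For a flavor of what automatic TM cleaning involves, here is a toy heuristic filter, not one of the submitted systems: it flags translation units with an implausible source/target length ratio or an untranslated copy of the source. Real systems use much richer signals (alignment scores, MT-based metrics), and the thresholds here are invented.

        # Toy heuristics for flagging suspect translation-memory units;
        # the rules and threshold are illustrative only.
        def suspect(source: str, target: str, max_len_ratio: float = 2.0) -> bool:
            s_len, t_len = len(source.split()), len(target.split())
            if s_len == 0 or t_len == 0:
                return True                        # one side is empty
            if source.strip() == target.strip():
                return True                        # likely left untranslated
            ratio = max(s_len, t_len) / min(s_len, t_len)
            return ratio > max_len_ratio           # implausible length mismatch

        tm = [("the red house", "la casa roja"),
              ("click OK to continue", "click OK to continue")]
        clean = [(s, t) for (s, t) in tm if not suspect(s, t)]
        print(clean)                               # keeps only the first unit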

    Self Hyper-parameter Tuning for Stream Recommendation Algorithms

    E-commerce platforms explore the interaction between users and digital content – user-generated streams of events – to build and maintain dynamic user preference models which are used to make meaningful recommendations. However, the accuracy of these incremental models is critically affected by the choice of hyper-parameters. So far, the incremental recommendation algorithms used to process data streams have relied on human expertise for hyper-parameter tuning. In this work we apply our Self Hyper-Parameter Tuning (SPT) algorithm to incremental recommendation algorithms. SPT adapts the Nelder-Mead optimisation algorithm to perform hyper-parameter tuning. First, it creates three models with random hyper-parameter values and then, at dynamically sized intervals, assesses and applies the Nelder-Mead operators to update their hyper-parameters until the models converge. The main contribution of this work is the adaptation of the SPT method to incremental matrix factorisation recommendation algorithms. The proposed method was evaluated with well-known recommendation data sets. The results show that SPT systematically improves data stream recommendations.
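
    A minimal sketch of the simplex-style update described above, assuming two hyper-parameters (hence three candidate models) and showing only the reflection and contraction operators; the published SPT method also handles expansion, shrinking, and dynamically sized evaluation intervals, and `toy_score` merely stands in for prequential evaluation on recent stream data.

        # Sketch: three candidate hyper-parameter settings are scored
        # periodically, and the worst is reflected through the centroid of
        # the better two (the core Nelder-Mead move).
        import numpy as np

        def spt_step(simplex, score):
            best, mid, worst = sorted(simplex, key=score, reverse=True)
            centroid = (best + mid) / 2.0
            reflected = centroid + (centroid - worst)      # reflection
            if score(reflected) > score(worst):
                return [best, mid, reflected]
            return [best, mid, (worst + centroid) / 2.0]   # contraction

        def toy_score(h):                # stand-in for prequential accuracy
            return -np.sum((h - np.array([0.05, 0.01])) ** 2)

        rng = np.random.default_rng(0)
        # Two hyper-parameters, e.g. (learning rate, regularisation).
        simplex = [rng.uniform(0.001, 0.1, size=2) for _ in range(3)]
        for _ in range(50):
            simplex = spt_step(simplex, toy_score)
        print("best hyper-parameters ~", max(simplex, key=toy_score))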

    The effects of aging and dual task performance on language production

    This is an electronic version of an article published in Kemper, S., Schmalzried, R., Herman, R., Leedahl, S., & Mohankumar, D. (2009). The effects of aging and dual task performance on language production. Aging, Neuropsychology, and Cognition, 16, 241-259. PM#2674132. Aging, Neuropsychology, and Cognition is available online at www.taylorandfrancis.com. A digital pursuit rotor task was used to measure dual task costs of language production by young and older adults. After training on the pursuit rotor, participants were asked to track the moving target while providing a language sample. When simultaneously engaged, young adults experienced greater dual task costs to tracking, fluency, and grammatical complexity than older adults. Older adults were able to preserve their tracking performance by speaking more slowly. Individual differences in working memory, processing speed, and Stroop interference affected vulnerability to dual task costs. These results demonstrate the utility of using a digital pursuit rotor to study the effects of aging and dual task demands on language production and confirm prior findings that young and older adults use different strategies to accommodate dual task demands.

    Recurrent Connections Aid Occluded Object Recognition by Discounting Occluders

    Recurrent connections in the visual cortex are thought to aid object recognition when part of the stimulus is occluded. Here we investigate if and how recurrent connections in artificial neural networks similarly aid object recognition. We systematically test and compare architectures comprising bottom-up (B), lateral (L), and top-down (T) connections. Performance is evaluated on a novel stereoscopic occluded object recognition dataset. The task consists of recognizing one target digit occluded by multiple occluder digits in a pseudo-3D environment. We find that recurrent models perform significantly better than their feedforward counterparts, which were matched in parametric complexity. Furthermore, we analyze how the network's representation of the stimuli evolves over time due to recurrent connections. We show that the recurrent connections tend to move the network's representation of an occluded digit towards its un-occluded version. Our results suggest that both the brain and artificial neural networks can exploit recurrent connectivity to aid occluded object recognition. Comment: 13 pages, 5 figures, accepted at the 28th International Conference on Artificial Neural Networks, published in Springer Lecture Notes in Computer Science vol 1172
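
    As a rough illustration of the lateral-recurrence idea, a PyTorch sketch of a hypothetical "BL"-style unit might unroll a lateral convolution over a few timesteps; this mirrors the spirit of the B/L/T comparison, not the paper's exact architecture or dataset.

        # Sketch (PyTorch): a bottom-up convolution plus a lateral recurrent
        # convolution, unrolled over a few timesteps.
        import torch
        import torch.nn as nn

        class BLNet(nn.Module):
            def __init__(self, timesteps=4, channels=32, n_classes=10):
                super().__init__()
                self.timesteps = timesteps
                self.bottom_up = nn.Conv2d(1, channels, 3, padding=1)       # B
                self.lateral = nn.Conv2d(channels, channels, 3, padding=1)  # L
                self.readout = nn.Linear(channels, n_classes)

            def forward(self, x):
                b = self.bottom_up(x)          # the same image drives every step
                h = torch.relu(b)
                for _ in range(self.timesteps - 1):
                    h = torch.relu(b + self.lateral(h))    # recurrent refinement
                return self.readout(h.mean(dim=(2, 3)))    # global average pool

        logits = BLNet()(torch.randn(8, 1, 32, 32))        # a batch of occluded digits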

    Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?

    Most vision papers have to include some evaluation work in order to demonstrate that the proposed algorithm is an improvement on existing ones. Generally, these evaluation results are presented in tabular or graphical form. Neither is ideal, because there is no indication as to whether any performance differences are statistically significant. Moreover, the size and nature of the dataset used for evaluation will obviously have a bearing on the results, yet neither factor is usually discussed. This paper evaluates the effectiveness of commonly used performance characterization metrics for image feature detection and description for matching problems and explores the use of statistical tests such as McNemar’s test and ANOVA as better alternatives.
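
    As an example of the kind of test advocated here, an exact McNemar's test on paired per-image outcomes of two matchers can be computed with SciPy; the outcome vectors below are invented for illustration.

        # Exact McNemar's test on paired per-image outcomes of two matching
        # algorithms.
        from scipy.stats import binomtest

        def mcnemar_exact(correct_a, correct_b):
            # Only discordant pairs matter: b = A right/B wrong,
            # c = A wrong/B right.
            b = sum(a and not o for a, o in zip(correct_a, correct_b))
            c = sum(o and not a for a, o in zip(correct_a, correct_b))
            return binomtest(b, n=b + c, p=0.5).pvalue if b + c else 1.0

        alg_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]     # 1 = correct match on image i
        alg_b = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
        print("p =", mcnemar_exact(alg_a, alg_b))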

    Computer-assisted assessment of the Human Epidermal Growth Factor Receptor 2 immunohistochemical assay in imaged histologic sections using a membrane isolation algorithm and quantitative analysis of positive controls

    Background: Breast cancers that overexpress the human epidermal growth factor receptor 2 (HER2) are eligible for effective biologically targeted therapies, such as trastuzumab. However, accurately determining HER2 overexpression, especially in immunohistochemically equivocal cases, remains a challenge. Manual analysis of HER2 expression depends on the assessment of membrane staining as well as comparisons with positive controls. In spite of the strides that have been made to standardize the assessment process, intra- and inter-observer discrepancies in scoring are not uncommon. In this manuscript we describe a pathologist-assisted, computer-based continuous scoring approach for increasing the precision and reproducibility of assessing imaged breast tissue specimens.
    Methods: Computer-assisted analysis of HER2 IHC is compared with manual scoring and fluorescence in situ hybridization results on a test set of 99 digitally imaged breast cancer cases enriched with equivocally scored (2+) cases. Image features are generated based on the staining profile of the positive control tissue and pixels delineated by a newly developed Membrane Isolation Algorithm. Evaluation of results was performed using Receiver Operating Characteristic (ROC) analysis.
    Results: A computer-aided diagnostic approach has been developed using a membrane isolation algorithm and quantitative use of positive immunostaining controls. By incorporating internal positive controls into feature analysis, a greater Area Under the Curve (AUC) in ROC analysis was achieved than with feature analysis without positive controls. Evaluation of HER2 immunostaining that utilized membrane pixels, controls, and percent area stained showed significantly greater AUC than manual scoring, and a significantly lower false positive rate when used to evaluate immunohistochemically equivocal cases.
    Conclusion: By incorporating both a membrane isolation algorithm and analysis of known positive controls, a computer-assisted diagnostic algorithm was developed that can reproducibly score HER2 status in IHC-stained clinical breast cancer specimens. For equivocally scored cases, this approach performed better than standard manual evaluation as assessed by ROC analysis on our test samples. Finally, image-analysis techniques have the potential to improve HER2 scoring in the immunohistochemically equivocal range.
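
    A small synthetic sketch of the ROC comparison at the heart of this evaluation: the same staining feature is scored raw and after normalization by a positive-control intensity. All names and data are simulated stand-ins; the paper's Membrane Isolation Algorithm is not reproduced.

        # Synthetic ROC comparison: normalising a staining feature by a
        # positive-control intensity removes per-slide staining variation,
        # which is the intuition behind control-aware features. No real
        # HER2 measurements are used.
        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(1)
        fish = rng.integers(0, 2, 200)                    # ground truth (FISH)
        gain = rng.normal(1.0, 0.3, 200).clip(0.3)        # per-slide staining gain
        stain = gain * (0.5 + 0.5 * fish + rng.normal(0, 0.1, 200))  # raw feature
        control = gain * (1.0 + rng.normal(0, 0.05, 200)) # control shares the gain
        normalized = stain / control                      # control-adjusted feature

        print("AUC raw:       ", round(roc_auc_score(fish, stain), 3))
        print("AUC normalized:", round(roc_auc_score(fish, normalized), 3))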