
    Analysis of Security Measures for Sequences

    Stream ciphers are private key cryptosystems used for security in communication and data transmission systems. Because they are used to encrypt streams of data, stream ciphers must rely on primitives that are easy to implement and fast to operate. LFSRs and the more recently invented FCSRs are two such primitives, which give rise to certain security measures for the cryptographic strength of sequences; following convention, we refer to these as complexity measures henceforth. The linear (resp. N-adic) complexity of a sequence is the length of the shortest LFSR (resp. FCSR) that can generate the sequence. Due to the availability of shift register synthesis algorithms, sequences used for cryptographic purposes should have high values for these complexity measures. It is also essential that the complexity of these sequences does not decrease when a few symbols are changed. The k-error complexity of a sequence is the smallest complexity obtainable by altering k or fewer symbols in the given sequence. For a sequence to be considered cryptographically ‘strong’ it should have both high complexity and high error complexity values. An important problem regarding sequence complexity measures is to determine good bounds on a specific complexity measure for a given sequence. In this thesis we derive new nontrivial lower bounds on the k-operation complexity of periodic sequences in both the linear and N-adic cases, where the operations considered are combinations of insertions, deletions, and substitutions. We show that our bounds are tight and also derive several auxiliary results based on them. A second problem on sequence complexity measures, useful in the design and analysis of stream ciphers, is to determine the number of sequences with a given fixed (error) complexity value. In this thesis we address this problem for the k-error linear complexity of 2^n-periodic binary sequences. More specifically:
    1. We characterize 2^n-periodic binary sequences with fixed 2- or 3-error linear complexity and obtain the counting function for the number of such sequences with fixed k-error linear complexity for k = 2 or 3.
    2. We obtain partial results on the number of 2^n-periodic binary sequences with fixed k-error linear complexity when k is the minimum number of changes required to lower the linear complexity.
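The two definitions above can be made concrete with a minimal sketch (not taken from the thesis): linear complexity via the standard Berlekamp–Massey algorithm over GF(2), and k-error linear complexity by brute force over all error patterns of weight at most k in one period. The function names and the brute-force strategy are illustrative assumptions; the thesis derives bounds analytically rather than by search.

```python
from itertools import combinations

def linear_complexity(s):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR generating s."""
    n = len(s)
    c, b = [0] * n, [0] * n
    c[0] = b[0] = 1
    L, m = 0, -1
    for i in range(n):
        d = s[i]  # discrepancy between s[i] and the LFSR's prediction
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            for j in range(n - (i - m)):
                c[i - m + j] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

def k_error_linear_complexity(period, k):
    """Smallest linear complexity reachable by changing at most k bits of one
    period. Brute force, so only feasible for short periods."""
    best = linear_complexity(period * 2)  # two periods determine a period-N sequence
    for w in range(1, k + 1):
        for pos in combinations(range(len(period)), w):
            t = period[:]
            for p in pos:
                t[p] ^= 1
            best = min(best, linear_complexity(t * 2))
    return best
```

For example, the 4-periodic sequence 0001 has linear complexity 4, but flipping its single 1 yields the all-zero sequence, so its 1-error linear complexity drops to 0, illustrating why high error complexity matters in addition to high complexity.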

    End-to-End n-ary Relation Extraction for Combination Drug Therapies

    Combination drug therapies are treatment regimens that involve two or more drugs, administered most commonly to patients with cancer, HIV, malaria, or tuberculosis. Currently there are over 350K articles in PubMed that use the "combination drug therapy" MeSH heading, with at least 10K articles published per year over the past two decades. Extracting combination therapies from scientific literature inherently constitutes an n-ary relation extraction problem. Unlike the general n-ary setting where n is fixed (e.g., drug-gene-mutation relations where n = 3), extracting combination therapies is a special setting where n ≥ 2 is dynamic, depending on each instance. Recently, Tiktinsky et al. (NAACL 2022) introduced a first-of-its-kind dataset, CombDrugExt, for extracting such therapies from literature. Here, we use a sequence-to-sequence style end-to-end extraction method to achieve an F1-score of 66.7% on the CombDrugExt test set for positive (or effective) combinations. This is an absolute ≈5% F1-score improvement even over the prior best relation classification score with spotted drug entities (hence, not end-to-end). Thus our effort introduces a state-of-the-art first model for end-to-end extraction that is already superior to the best prior non-end-to-end model for this task. Our model seamlessly extracts all drug entities and relations in a single pass and is highly suitable for dynamic n-ary extraction scenarios.
    Comment: Accepted to appear in IEEE ICHI 2023. Code: https://github.com/bionlproc/end-to-end-CombDrugEx
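A sequence-to-sequence formulation needs the dynamic-arity relations serialized into a flat target string that the decoder can generate and a parser can invert. The delimiter scheme and function names below are illustrative assumptions, not the paper's exact target format:

```python
def linearize(relations):
    """Serialize dynamic-arity drug combinations into a seq2seq target string.
    Each item: (label, [drug_1, ..., drug_n]) with n >= 2."""
    return " | ".join(label + ": " + ", ".join(drugs) for label, drugs in relations)

def delinearize(text):
    """Parse a generated target string back into labeled drug tuples."""
    out = []
    for chunk in text.split(" | "):
        label, drugs = chunk.split(": ", 1)
        out.append((label, [d.strip() for d in drugs.split(",")]))
    return out
```

Because the drug list inside each relation has no fixed length, the same scheme handles n = 2 and n = 5 combinations without any change to the model, which is the appeal of end-to-end generation for this task.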

    Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization

    Information retrieval (IR) for precision medicine (PM) often involves looking for multiple pieces of evidence that characterize a patient case. This typically includes at least the name of a condition and a genetic variation that applies to the patient. Other factors such as demographic attributes, comorbidities, and social determinants may also be pertinent. As such, the retrieval problem is often formulated as ad hoc search but with multiple facets (e.g., disease, mutation) that may need to be incorporated. In this paper, we present a document reranking approach that combines neural query-document matching and text summarization toward such retrieval scenarios. Our architecture builds on the basic BERT model with three specific components for reranking: (a) document-query matching, (b) keyword extraction, and (c) facet-conditioned abstractive summarization. The outcomes of (b) and (c) are used to essentially transform a candidate document into a concise summary that can be compared with the query at hand to compute a relevance score. Component (a) directly generates a matching score of a candidate document for a query. The full architecture benefits from the complementary potential of document-query matching and the novel document transformation approach based on summarization along PM facets. Evaluations using NIST's TREC-PM track datasets (2017–2019) show that our model achieves state-of-the-art performance. To foster reproducibility, our code is made available here: https://github.com/bionlproc/text-summ-for-doc-retrieval
    Comment: Accepted to EMNLP 2020 Findings as Long Paper (11 pages, 4 figures)
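The reranking idea of fusing a direct matching score with a query-vs-summary comparison can be sketched abstractly. This is a toy stand-in, assuming a precomputed match score per candidate and a simple bag-of-words cosine in place of the paper's BERT-based components; `alpha` is a hypothetical mixing weight:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two whitespace-tokenized bags of words."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query, candidates, alpha=0.5):
    """candidates: list of (doc_id, match_score, facet_summary).
    Final relevance mixes the direct matching score (component a) with the
    similarity of the query to the document's generated summary (b + c)."""
    scored = [(doc_id, alpha * m + (1 - alpha) * cosine(query, summ))
              for doc_id, m, summ in candidates]
    return sorted(scored, key=lambda x: -x[1])
```

A candidate whose facet summary covers the disease and mutation in the query can outrank one with a higher raw matching score, which is the complementarity the abstract describes.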

    Ordinal Convolutional Neural Networks for Predicting RDoC Positive Valence Psychiatric Symptom Severity Scores

    Background—The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. Objective—Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. Methods—We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance-specific predictions. Results—Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE)-based normalized score (100 · (1 − MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance-specific prediction interpretation by identifying words that led to a particular decision.
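The MMAE-based score can be computed with a short sketch. The macro averaging is per true class, so rare severity levels weigh as much as common ones; labels are assumed mapped to the ordinal scale 0–3 (absent to severe), and the exact normalization used by the shared task may differ in detail:

```python
def mmae(y_true, y_pred):
    """Macro mean absolute error: MAE computed within each true class,
    then averaged over classes."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        errs = [abs(t - p) for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(sum(errs) / len(errs))
    return sum(per_class) / len(per_class)

def normalized_score(y_true, y_pred):
    """The shared task's headline number: 100 * (1 - MMAE)."""
    return 100 * (1 - mmae(y_true, y_pred))
```

Unlike plain accuracy, this metric penalizes predicting "absent" for a "severe" case three times as heavily as predicting "moderate", which is exactly the distance-sensitivity the ordinal loss function targets.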

    Predicting Mental Conditions Based on History of Present Illness in Psychiatric Notes with Deep Neural Networks

    Background—Applications of natural language processing to mental health notes are not common given the sensitive nature of the associated narratives. The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) changed this scenario by providing the first set of neuropsychiatric notes to participants. This study summarizes our efforts and results in proposing a novel data use case for this dataset as part of the third track in this shared task. Objective—We explore the feasibility and effectiveness of predicting a set of common mental conditions a patient has based on the short textual description of the patient's history of present illness typically occurring at the beginning of a psychiatric initial evaluation note. Materials and Methods—We clean and process the 1000 records made available through the N-GRID clinical NLP task into a key-value dictionary and build a dataset of 986 examples for which there is a narrative for history of present illness as well as Yes/No responses with regard to the presence of specific mental conditions. We propose two independent deep neural network models: one based on convolutional neural networks (CNN) and another based on recurrent neural networks with hierarchical attention (ReHAN), the latter of which allows for interpretation of model decisions. We conduct experiments to compare these methods to each other and to baselines based on linear models and named entity recognition (NER). Results—Our CNN model with optimized thresholding of output probability estimates achieves the best overall mean micro-F1 score of 63.144% for 11 common mental conditions with statistically significant gains (p < 0.05) over all other models. The ReHAN model with interpretable attention mechanism scored a 61.904% mean micro-F1 score. Both models' improvements over baseline models (support vector machines and NER) are statistically significant. The ReHAN model additionally aids in interpretation of the results by surfacing important words and sentences that lead to a particular prediction for each instance.
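The "optimized thresholding of output probability estimates" step can be sketched as a grid search over a global probability cutoff that maximizes micro-F1 on held-out data. The grid values and the single-global-threshold assumption are illustrative; the paper may tune per-label thresholds:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (instance, label) cells of a multi-label task."""
    tp = fp = fn = 0
    for row_t, row_p in zip(y_true, y_pred):
        for t, p in zip(row_t, row_p):
            tp += t * p
            fp += (1 - t) * p
            fn += t * (1 - p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def tune_threshold(y_true, probs, grid=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Pick the probability cutoff that maximizes micro-F1 on a dev set."""
    def binarize(th):
        return [[1 if p >= th else 0 for p in row] for row in probs]
    return max(grid, key=lambda th: micro_f1(y_true, binarize(th)))
```

Lowering the cutoff below the default 0.5 trades precision for recall on rare conditions, which is typically where the micro-F1 gains from thresholding come from.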

    Adversarial Discriminative Domain Adaptation for Extracting Protein-Protein Interactions from Text

    Relation extraction is the process of extracting structured information from unstructured text. Recently, neural networks (NNs) have produced state-of-the-art results in extracting protein-protein interactions (PPIs) from text. While multiple corpora have been created to extract PPIs from text, most methods have shown poor cross-corpora generalization. In other words, models trained on one dataset perform poorly on other datasets for the same task. In the case of PPI, the F1 has been shown to vary by as much as 30% between different datasets. In this work, we utilize adversarial discriminative domain adaptation (ADDA) to improve the generalization between the source and target corpora. Specifically, we introduce a method of unsupervised domain adaptation, where we assume we have no labeled data in the target dataset.
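The two adversarial objectives at the heart of ADDA can be written down on toy features: a domain discriminator learns to separate source from target encodings, while the target encoder is trained with inverted labels to fool it. The linear discriminator and pure-Python losses below are a minimal sketch of those objectives, not the paper's PPI models:

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def _logit(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def discriminator_loss(w, src_feats, tgt_feats):
    """Binary cross-entropy for the domain discriminator:
    label source features 1 and target features 0."""
    loss = 0.0
    for f in src_feats:
        loss -= math.log(_sigmoid(_logit(w, f)))
    for f in tgt_feats:
        loss -= math.log(1.0 - _sigmoid(_logit(w, f)))
    return loss / (len(src_feats) + len(tgt_feats))

def target_encoder_loss(w, tgt_feats):
    """Inverted-label objective for the target encoder: make the (frozen)
    discriminator call target features 'source' (label 1)."""
    loss = 0.0
    for f in tgt_feats:
        loss -= math.log(_sigmoid(_logit(w, f)))
    return loss / len(tgt_feats)
```

Training alternates between minimizing the first loss in the discriminator's parameters and the second in the target encoder's parameters, so no target labels are ever needed, which is what makes the adaptation unsupervised.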