8 research outputs found

    Domain specific BERT representation for Named Entity Recognition of lab protocol

    Supervised models trained to predict properties from representations have achieved high accuracy on a variety of tasks. For instance, the BERT family works exceptionally well on downstream tasks ranging from NER tagging to many other linguistic tasks. However, the vocabulary of the medical domain contains many tokens used only in that field, such as the names of diseases, devices, organisms, and medicines, which makes it difficult for a traditional BERT model to create contextualized embeddings. In this paper, we describe a system for named entity tagging based on BioBERT. Experimental results show that our model gives substantial improvements over the baseline, finishing fourth runner-up in terms of F1 score and first runner-up in terms of Recall, just 2.21 F1 points behind the best system. Comment: EMNLP 2020 Workshop; 5 pages
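    A minimal sketch (not the authors' exact system) of how a BioBERT checkpoint can be used for token-level tagging with the Hugging Face transformers library; the checkpoint name and the protocol-style label set below are illustrative assumptions.

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # Assumed publicly available BioBERT checkpoint and a hypothetical tag set.
    model_name = "dmis-lab/biobert-base-cased-v1.1"
    labels = ["O", "B-Reagent", "I-Reagent", "B-Action", "I-Action"]

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

    sentence = "Centrifuge the lysate at 10000 g for 5 minutes."
    inputs = tokenizer(sentence, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, sequence_length, num_labels)

    # The classification head is untrained here; predictions become meaningful
    # only after fine-tuning on annotated lab-protocol sentences.
    predictions = logits.argmax(dim=-1).squeeze(0).tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze(0))
    for token, tag_id in zip(tokens, predictions):
        print(f"{token}\t{labels[tag_id]}")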

    A Review on Homa Farming – A Vedic Touch to Modern Agriculture

    Homa farming is a Vedic discipline that denotes a method of annihilating harmful environmental conditions and refining the atmosphere through the action of fire, using a copper pyramid. Agnihotra is the essential fire in Homa farming. It religiously connects living beings on this earth to energy from space. It is practiced when all hope is gone and has proved beneficial in increasing crop yield, reducing microbial pathogenicity, decontaminating soil and water, and combating pest and disease infestation. Homa farming is a comprehensive method of healing agriculture and can be used in conjunction with any good organic farming system, as it is extremely inexpensive and can be performed by anybody, though it requires discipline and consistency. Over time this knowledge has been lost as farming has become more modernized with the invention of new technologies such as GIS, GPS, satellite imaging, and moisture sensors. Farmers are adopting new methods and practices of farming and rely completely on chemicals such as pesticides, rodenticides, fertilizers, and herbicides to enhance production. It then becomes difficult for them to believe in traditional, Vedic agriculture, as it is purely organic in nature and depends only on the healing effects of Agnihotra. Nowadays this knowledge is being revived by many scientists to give individuals guidance on how to address the polluted conditions of the planet. Many scientists have demonstrated the scientific validation of Homa farming methodologies and have conducted experiments to prove its beneficial effects. A scientist named Abhang Pathade has conducted many experiments to show that this technique actually works and can be very effective against major problems such as environmental pollution, disease and pest attack on plants, low crop yield, and soil infertility. DOI: 10.47856/ijaast.2022.v09i05.00

    LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression

    Low-rank decomposition of a matrix, splitting a large matrix into a product of two smaller matrices, offers a means of compression that reduces the parameters of a model without sparsification, and hence delivers more speedup on modern hardware. Moreover, unlike quantization, the compressed linear layers remain fully differentiable with all parameters trainable, while still being able to leverage existing highly efficient kernels over floating-point matrices. We study the potential to compress Large Language Models (LLMs) for monolingual code generation via Low Rank Decomposition (LoRD) and observe that the ranks of the linear layers in these models can be reduced by up to 39.58% with less than a 1% increase in perplexity. We then use LoRD to compress StarCoder 16B to 13.2B parameters with no drop, and to 12.3B with minimal drop, in HumanEval Pass@1 score, in less than 10 minutes on a single A100. The compressed models speed up inference by up to 22.35% with just a single line of change in code over Hugging Face's implementation with the PyTorch backend. LoRD models remain compatible with state-of-the-art near-lossless quantization methods such as SpQR, which allows leveraging the further compression gains of quantization. Lastly, QLoRA over a LoRD model further reduces memory requirements by as much as 21.2% over vanilla QLoRA while offering similar gains from parameter-efficient fine-tuning. Our work shows LoRD to be a promising new paradigm for LLM compression. Comment: 9 pages
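    A minimal sketch of the core idea, replacing one linear layer with a product of two smaller ones via truncated SVD in PyTorch; the layer sizes and rank are assumed for illustration, and this is not the authors' exact compression procedure.

    import torch
    import torch.nn as nn

    def low_rank_decompose(linear: nn.Linear, rank: int) -> nn.Sequential:
        # Approximate W (out x in) as B @ A, with A (rank x in) and B (out x rank).
        W = linear.weight.data
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        sqrt_S = torch.diag(S[:rank].sqrt())
        A = sqrt_S @ Vh[:rank, :]          # (rank, in_features)
        B = U[:, :rank] @ sqrt_S           # (out_features, rank)

        first = nn.Linear(linear.in_features, rank, bias=False)
        second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
        first.weight.data = A
        second.weight.data = B
        if linear.bias is not None:
            second.bias.data = linear.bias.data
        return nn.Sequential(first, second)

    # Parameters drop from out*in to rank*(in + out) whenever rank < in*out / (in + out).
    layer = nn.Linear(4096, 4096)
    compressed = low_rank_decompose(layer, rank=1024)
    x = torch.randn(2, 4096)
    print(torch.dist(layer(x), compressed(x)))  # residual error from truncating the SVD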

    Efficient Encoders for Streaming Sequence Tagging

    A naive application of state-of-the-art bidirectional encoders to streaming sequence tagging would require encoding every token from scratch each time a new token arrives in an incremental streaming input (such as transcribed speech). The lack of reusability of previous computation leads to a higher number of floating-point operations (FLOPs) and more unnecessary label flips. Increased FLOPs in turn lead to higher wall-clock time, and increased label flipping leads to poorer streaming performance. In this work, we present the Hybrid Encoder with Adaptive Restart (HEAR), which addresses these issues by maintaining the performance of bidirectional encoders on offline (complete) inputs while improving performance on streaming (incomplete) inputs. HEAR uses a hybrid unidirectional-bidirectional encoder architecture to perform sequence tagging, along with an Adaptive Restart Module (ARM) that selectively guides the restart of the bidirectional portion of the encoder. Across four sequence tagging tasks, HEAR offers FLOP savings of up to 71.1% in streaming settings and also outperforms bidirectional encoders on streaming predictions by up to +10% streaming exact match. Comment: EACL 202
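    A minimal control-flow sketch of the hybrid encode-then-selectively-restart idea; the encoders and the restart policy below are stand-in stubs (in the paper the Adaptive Restart Module is learned), so the names and criteria here are assumptions for illustration only.

    from typing import List

    def unidirectional_step(cache: List[str], token: str) -> List[str]:
        # Stub: cheap incremental left-to-right update that reuses prior computation.
        return cache + [f"uni({token})"]

    def bidirectional_encode(tokens: List[str]) -> List[str]:
        # Stub: expensive full re-encoding of the prefix seen so far.
        return [f"bi({t})" for t in tokens]

    def should_restart(step: int) -> bool:
        # Stub policy; HEAR instead learns when a bidirectional restart is worthwhile.
        return step % 4 == 0

    def stream_tag(stream: List[str]) -> None:
        tokens: List[str] = []
        cache: List[str] = []
        for step, token in enumerate(stream, start=1):
            tokens.append(token)
            cache = unidirectional_step(cache, token)
            if should_restart(step):
                encodings = bidirectional_encode(tokens)   # selective restart
            else:
                encodings = cache                          # reuse streaming cache
            print(step, encodings[-1])                     # representation for the newest token

    stream_tag("the patient was given aspirin daily".split())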

    INDEPROP: Information-Preserving De-propagandization of News Articles (Student Abstract)

    We propose INDEPROP, a novel Natural Language Processing (NLP) application for combating online disinformation by mitigating propaganda in news articles. INDEPROP (Information-Preserving De-propagandization) involves fine-grained propaganda detection and its removal while maintaining document-level coherence and grammatical correctness and, most importantly, preserving the news articles' information content. We curate the first large-scale dataset of its kind, consisting of around 1M tokens. We also propose a set of automatic evaluation metrics for this task and observe their high correlation with human judgment. Furthermore, we show that fine-tuning existing propaganda detection systems on our dataset considerably improves their generalization to the test set.

    Multifocal primary amyloidosis of the bladder presenting with gross hematuria: A case report and review of literature

    Amyloidosis is defined as extracellular deposition of amyloid, a fibrillary protein, in one or more body sites. It can involve the genito-urinary tract primarily or secondarily, but isolated primary bladder amyloidosis is an extremely rare presentation. We herein report a rare case of a 48-year-old male patient who presented with symptoms mimicking carcinoma of the urinary bladder, especially painless haematuria. Transurethral resection of the mass was done in one sitting. Histopathological examination revealed primary bladder amyloidosis. In follow-up, the patient had improvement in symptoms and no recurrence. We also briefly review the literature on primary bladder amyloidosis.

    Causal Direction of Data Collection Matters: Implications of Causal and Anticausal Learning for NLP

    The principle of independent causal mechanisms (ICM) states that the generative processes of real-world data consist of independent modules which do not influence or inform each other. While this idea has led to fruitful developments in the field of causal inference, it is not widely known in the NLP community. In this work, we argue that the causal direction of the data collection process bears nontrivial implications that can explain a number of published NLP findings, such as differences in semi-supervised learning (SSL) and domain adaptation (DA) performance across different settings. We categorize common NLP tasks according to their causal direction and empirically assay the validity of the ICM principle for text data using minimum description length. We conduct an extensive meta-analysis of over 100 published SSL and 30 DA studies, and find that the results are consistent with our expectations based on causal insights. This work presents the first attempt to analyze the ICM principle in NLP, and provides constructive suggestions for future modeling choices.