8 research outputs found
Domain specific BERT representation for Named Entity Recognition of lab protocol
Supervised models trained to predict properties from representations have
been achieving high accuracy on a variety of tasks. For instance, the BERT
family works exceptionally well on downstream tasks ranging from NER tagging
to a range of other linguistic tasks. However, the vocabulary used in the
medical field contains many tokens specific to the medical domain, such as
the names of diseases, devices, organisms, and medicines, which makes it
difficult for the traditional BERT model to create contextualized
embeddings. In this paper, we illustrate a system for named entity tagging
based on BioBERT. Experimental results show that our model gives substantial
improvements over the baseline: it stood fourth runner-up in terms of F1
score and first runner-up in terms of recall, with just 2.21 F1 points
behind the best model. Comment: EMNLP 2020 Workshop; 5 pages
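A central preprocessing step when fine-tuning a BERT-style model for NER is aligning word-level BIO entity labels to subword tokens. The sketch below is purely illustrative (it is not the paper's code): the toy tokenizer, tag names, and example sentence are all hypothetical stand-ins for a real WordPiece tokenizer and a lab-protocol tag set.

```python
# Illustrative sketch: expanding word-level BIO labels to subword level,
# as required when fine-tuning a BERT-style model for NER.

def split_into_subwords(word):
    # Toy stand-in for a WordPiece tokenizer: split long words in half,
    # marking continuation pieces with the "##" prefix BERT uses.
    if len(word) <= 4:
        return [word]
    return [word[:4], "##" + word[4:]]

def align_labels(words, labels):
    """Expand word-level BIO labels to subword level.

    The first subword keeps the word's label; continuation subwords of a
    B-<type> word receive I-<type>, so the entity span stays contiguous.
    """
    subwords, sub_labels = [], []
    for word, label in zip(words, labels):
        pieces = split_into_subwords(word)
        subwords.extend(pieces)
        sub_labels.append(label)
        cont = "I" + label[1:] if label.startswith("B") else label
        sub_labels.extend([cont] * (len(pieces) - 1))
    return subwords, sub_labels

words = ["Incubate", "with", "trypsin"]
labels = ["O", "O", "B-Reagent"]
subwords, sub_labels = align_labels(words, labels)
print(subwords)    # ['Incu', '##bate', 'with', 'tryp', '##sin']
print(sub_labels)  # ['O', 'O', 'O', 'B-Reagent', 'I-Reagent']
```

In practice libraries expose the subword-to-word mapping directly, but the label-expansion logic is the same.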
A Review on Homa Farming – A Vedic Touch to Modern Agriculture
Homa farming is a Vedic discipline that denotes the method of annihilating harmful environmental conditions and refining the atmosphere through the action of flame, outfitted with a copper pyramid. Agnihotra is the essential flame in Homa farming. It religiously connects living beings on this earth to harness energy from space. It is practiced when all hope is gone and has proved beneficial in increasing crop yield, reducing microbial pathogenicity, decontaminating soil and water, and combating pest and disease infestation. Homa farming is a comprehensive method of agricultural healing and can be used in conjunction with any good organic farming system; it is extremely inexpensive and can be performed by anybody, but it requires discipline and consistency. Over time this knowledge has been lost as farming has become more modernized with the invention of new technologies such as GIS, GPS, satellite imaging, and moisture sensors. Farmers are adopting new methods and practices of farming and rely completely on chemicals such as pesticides, rodenticides, fertilizers, and herbicides to enhance their production. It then becomes difficult for them to believe in the traditional, Vedic type of agriculture, as it is purely organic in nature and depends only on the healing effects of Agnihotra. Nowadays this knowledge is being revived by many scientists to give individuals guidance on how to address the polluted condition of the planet. Many scientists have demonstrated the scientific validation of Homa farming methodologies and have conducted experiments to prove its beneficial effects. A scientist named Abhang Pathade has conducted many experiments showing that this technique works and can be effective against major problems such as environmental pollution, disease and pest attacks on plants, low crop yield, and soil infertility.
DOI: 10.47856/ijaast.2022.v09i05.00
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Low rank decomposition of a matrix, i.e., splitting a large matrix into a
product of two smaller matrices, offers a means of compression that reduces
the parameters of a model without sparsification, and hence delivers more
speedup on modern hardware. Moreover, unlike quantization, the compressed
linear layers remain fully differentiable with all parameters trainable,
while still leveraging the existing highly efficient kernels over floating
point matrices. We study the potential to compress Large Language Models
(LLMs) for monolingual code generation via Low Rank Decomposition (LoRD) and
observe that the ranks of the linear layers in these models can be reduced
by up to 39.58% with less than a 1% increase in perplexity. We then use LoRD
to compress StarCoder 16B to 13.2B parameters with no drop, and to 12.3B
with minimal drop, in HumanEval Pass@1 score, in less than 10 minutes on a
single A100. The compressed models speed up inference by up to 22.35% with
just a single line of code changed over Hugging Face's implementation with
the PyTorch backend. LoRD models remain compatible with state-of-the-art
near-lossless quantization methods such as SpQR, which allows leveraging
further compression gains from quantization. Lastly, QLoRA over a LoRD model
further reduces memory requirements by as much as 21.2% over vanilla QLoRA
while offering similar gains from parameter-efficient fine-tuning. Our work
shows Low Rank Decomposition (LoRD) to be a promising new
paradigm for LLM compression. Comment: 9 pages
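The parameter savings from low rank decomposition follow from simple arithmetic: a dense m x n layer has m*n parameters, while its rank-r factorization has m*r + r*n, which is smaller whenever r < m*n/(m+n). A minimal sketch (the layer dimensions below are hypothetical, not StarCoder's actual shapes):

```python
def lord_params(m, n, r):
    """Parameter counts before and after factoring an m x n weight
    into an (m x r) @ (r x n) product, as in low rank decomposition."""
    dense = m * n
    factored = m * r + r * n
    return dense, factored

# Hypothetical 6144 x 24576 MLP layer with its rank cut from the full
# 6144 down to 3712 (roughly the up-to-39.58% rank reduction the
# abstract reports).
dense, factored = lord_params(6144, 24576, 3712)
print(f"dense={dense:,} factored={factored:,} "
      f"saving={(1 - factored / dense):.1%}")
```

Note that for a square layer the break-even rank is n/2, so wide rectangular layers (such as MLP projections) benefit most from a given rank reduction.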
Efficient Encoders for Streaming Sequence Tagging
A naive application of state-of-the-art bidirectional encoders to streaming
sequence tagging would require encoding every token from scratch each time a
new token arrives in an incremental streaming input (such as transcribed
speech). The lack of re-use of previous computation leads to a higher number
of Floating Point Operations (FLOPs) and more unnecessary label flips.
Increased FLOPs consequently lead to higher wall-clock time, and increased
label flipping leads to poorer streaming performance. In this work, we
present a Hybrid Encoder with Adaptive Restart (HEAR) that addresses these
issues, maintaining the performance of bidirectional encoders on offline (or
complete) inputs while improving performance on streaming (or incomplete)
inputs. HEAR has a hybrid unidirectional-bidirectional encoder architecture
to perform sequence tagging, along with an Adaptive Restart Module (ARM) to
selectively guide the restart of the bidirectional portion of the encoder.
Across four sequence tagging tasks, HEAR offers FLOP savings in streaming
settings of up to 71.1% and also outperforms bidirectional encoders on
streaming predictions by up to +10% streaming exact match.Comment: EACL 202
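The FLOP argument can be made concrete with a back-of-the-envelope count. If encoding one token costs one unit, re-running a bidirectional encoder from scratch at every step of a T-token stream costs 1 + 2 + ... + T units, while a unidirectional encoder that reuses its cache costs one unit per new token plus the cost of occasional full restarts. The sketch below uses a fixed restart interval purely for illustration; HEAR's ARM learns when to restart, so this is not the paper's method.

```python
def naive_bidirectional_flops(T):
    # Re-encode the whole prefix from scratch at every new token:
    # 1 + 2 + ... + T token encodings.
    return T * (T + 1) // 2

def hybrid_flops(T, restart_every):
    # Unidirectional encoding costs 1 unit per new token; a full
    # bidirectional re-encode of the current prefix fires at a fixed
    # interval (an illustrative stand-in for a learned restart module).
    total = 0
    for t in range(1, T + 1):
        total += 1                      # incremental unidirectional step
        if t % restart_every == 0:
            total += t                  # bidirectional re-encode of prefix
    return total

T = 100
naive = naive_bidirectional_flops(T)
hybrid = hybrid_flops(T, restart_every=10)
print(naive, hybrid, f"savings={(1 - hybrid / naive):.1%}")
```

Even this crude scheme cuts the quadratic re-encoding cost substantially; the trade-off is that fewer restarts mean stale bidirectional context, which is why restarting selectively matters.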
INDEPROP: Information-Preserving De-propagandization of News Articles (Student Abstract)
We propose INDEPROP, a novel Natural Language Processing (NLP) application for combating online disinformation by mitigating propaganda in news articles. INDEPROP (Information-Preserving De-propagandization) involves fine-grained propaganda detection and its removal while maintaining document-level coherence, grammatical correctness and, most importantly, the news articles' information content. We curate the first large-scale dataset of its kind, consisting of around 1M tokens. We also propose a set of automatic evaluation metrics for this task and observe their high correlation with human judgment. Furthermore, we show that fine-tuning existing propaganda detection systems on our dataset considerably improves their generalization to the test set.
Multifocal primary amyloidosis of the bladder presenting with gross hematuria: A case report and review of literature
Amyloidosis is defined as the extracellular deposition of amyloid, a fibrillary protein, in one or more body sites. It can involve the genito-urinary tract, primarily or secondarily, but isolated primary bladder amyloidosis is an extremely rare presentation. We herein report a rare case of a 48-year-old male patient who presented with symptoms mimicking carcinoma of the urinary bladder, especially painless haematuria. Transurethral resection of the mass was done in one sitting. The histopathological examination revealed primary bladder amyloidosis. In the follow-up, the patient had improvement in symptoms and no recurrence. We also briefly review the literature on primary bladder amyloidosis.
Causal Direction of Data Collection Matters: Implications of Causal and Anticausal Learning for NLP
The principle of independent causal mechanisms (ICM) states that the generative processes of real-world data consist of independent modules which do not influence or inform each other. While this idea has led to fruitful developments in the field of causal inference, it is not widely known in the NLP community. In this work, we argue that the causal direction of the data collection process bears nontrivial implications that can explain a number of published NLP findings, such as differences in semi-supervised learning (SSL) and domain adaptation (DA) performance across different settings. We categorize common NLP tasks according to their causal direction and empirically assay the validity of the ICM principle for text data using minimum description length. We conduct an extensive meta-analysis of over 100 published SSL and 30 DA studies, and find that the results are consistent with our expectations based on causal insights. This work presents the first attempt to analyze the ICM principle in NLP, and provides constructive suggestions for future modeling choices.