28 research outputs found
Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images
Identification of regions affected by floods is a crucial piece of
information required for better planning and management of post-disaster relief
and rescue efforts. Traditionally, remote sensing images are analyzed to
identify the extent of damage caused by flooding. The data acquired from
sensors onboard earth observation satellites are analyzed to detect the flooded
regions, a process that can be hampered by low spatial and temporal resolution. However,
in recent years, the images acquired from Unmanned Aerial Vehicles (UAVs) have
also been utilized to assess post-disaster damage. Indeed, a UAV based platform
can be rapidly deployed with a customized flight plan and minimum dependence on
the ground infrastructure. This work proposes two approaches for identifying
flooded regions in UAV aerial images. The first approach utilizes texture-based
unsupervised segmentation to detect flooded areas, while the second uses an
artificial neural network on the texture features to classify images as flooded
or non-flooded. Unlike existing works, where the models are trained and
tested on images of the same geographical regions, this work studies the
performance of the proposed model in identifying flooded regions across
geographical regions. An F1-score of 0.89 is obtained using the proposed
segmentation-based approach, which is higher than that of existing classifiers. The
robustness of the proposed approach demonstrates that it can be utilized to
identify flooded areas in any region with minimal or no user intervention.
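As a rough illustration of the metric reported above, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from confusion counts; the function name and the counts are hypothetical and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2*TP / (2*TP + FP + FN), or 0.0 when it is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts, for illustration only (not the paper's data).
example = f1_score(tp=8, fp=1, fn=1)
print(round(example, 2))
```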
An automated essay evaluation system using natural language processing and sentiment analysis
An automated essay evaluation system is a machine-based approach that leverages a long short-term memory (LSTM) model to award grades to essays written in English. Natural language processing (NLP) is used to extract feature representations from the essays. The LSTM network learns from the extracted features and generates parameters for testing and validation. The main objectives of the research include proposing and training an LSTM model using a dataset of manually graded essays with scores. Sentiment analysis is performed to determine the sentiment of the essay as positive, negative or neutral. The Twitter sample dataset is used to build a sentiment classifier that analyzes the sentiment based on the student's approach towards a topic. Additionally, each essay is checked for syntactical errors and subjected to a plagiarism check to assess the novelty of the essay. The overall grade is calculated based on the quality of the essay, the number of syntactic errors, the percentage of plagiarism found and the sentiment of the essay. The corrected essay is provided as feedback to the students. This essay grading model achieves an average quadratic weighted kappa (QWK) score of 0.911, with 99.4% accuracy for the sentiment analysis classifier.
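For readers unfamiliar with the metric, quadratic weighted kappa measures agreement between two sets of ordinal grades, penalizing disagreements by the squared distance between grades. The following is a minimal sketch of the standard definition, assuming integer grades in the range 0 to num_grades - 1; the function name and sample grades are hypothetical and do not reproduce the paper's evaluation code.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_grades):
    """QWK for two equal-length lists of integer grades in 0..num_grades-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty grade lists of equal length")
    if num_grades < 2:
        raise ValueError("need at least two possible grades")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # observed (i, j) pair counts
    hist_a = Counter(rater_a)                  # grade histogram, rater A
    hist_b = Counter(rater_b)                  # grade histogram, rater B
    max_dist = (num_grades - 1) ** 2
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(num_grades):
        for j in range(num_grades):
            weight = (i - j) ** 2 / max_dist
            num += weight * observed[(i, j)]
            den += weight * hist_a[i] * hist_b[j] / n
    # den is zero only when both raters gave one identical grade throughout.
    return 1.0 - num / den if den else 1.0


# Hypothetical grades, for illustration only.
human = [0, 1, 2, 2, 1]
model = [0, 1, 2, 2, 1]
print(quadratic_weighted_kappa(human, model, num_grades=3))
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate agreement worse than chance.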
PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature
Scientific information extraction (SciIE), which aims to automatically
extract information from scientific literature, is becoming more important than
ever. However, there are no existing SciIE datasets for polymer materials,
which are an important class of materials used ubiquitously in our daily lives.
To bridge this gap, we introduce POLYIE, a new SciIE dataset for polymer
materials. POLYIE is curated from 146 full-length polymer scholarly articles,
which are annotated with different named entities (i.e., materials, properties,
values, conditions) as well as their N-ary relations by domain experts. POLYIE
presents several unique challenges due to diverse lexical formats of entities,
ambiguity between entities, and variable-length relations. We evaluate
state-of-the-art named entity extraction and relation extraction models on
POLYIE, analyze their strengths and weaknesses, and highlight some difficult
cases for these models. To the best of our knowledge, POLYIE is the first SciIE
benchmark for polymer materials, and we hope it will lead to more research
efforts from the community on this challenging task. Our code and data are
available at: https://github.com/jerry3027/PolyIE. Comment: Work in progress
A general-purpose material property data extraction pipeline from large polymer corpora using Natural Language Processing
The ever-increasing number of materials science articles makes it hard to
infer chemistry-structure-property relations from published literature. We used
natural language processing (NLP) methods to automatically extract material
property data from the abstracts of polymer literature. As a component of our
pipeline, we trained MaterialsBERT, a language model, using 2.4 million
materials science abstracts, which outperforms other baseline models in three
out of five named entity recognition datasets when used as the encoder for
text. Using this pipeline, we obtained ~300,000 material property records from
~130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse
range of applications such as fuel cells, supercapacitors, and polymer solar
cells to recover non-trivial insights. The data extracted through our pipeline
is made available through a web platform at https://polymerscholar.org which
can be used to locate material property data recorded in abstracts
conveniently. This work demonstrates the feasibility of an automatic pipeline
that starts from published literature and ends with a complete set of extracted
material property information.
Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities
As for other forms of AI, speech recognition has recently been examined with
respect to performance disparities across different user cohorts. One approach
to achieve fairness in speech recognition is to (1) identify speaker cohorts
that suffer from subpar performance and (2) apply fairness mitigation measures
targeting the cohorts discovered. In this paper, we report on initial findings
with both discovery and mitigation of performance disparities using data from a
product-scale AI assistant speech recognition system. We compare cohort
discovery based on geographic and demographic information to a more scalable
method that groups speakers without human labels, using speaker embedding
technology. For fairness mitigation, we find that oversampling of
underrepresented cohorts, as well as modeling speaker cohort membership by
additional input variables, reduces the gap between top- and bottom-performing
cohorts, without deteriorating overall recognition accuracy. Comment: Proc. Interspeech 202
Draft genome sequence of Sclerospora graminicola, the pearl millet downy mildew pathogen: Genome sequence of pearl millet downy mildew pathogen
Sclerospora graminicola is one of the most important biotic production constraints of pearl millet worldwide. We report a de novo whole genome assembly and analysis of pathotype 1. The draft genome assembly contained 299,901,251 bp with 65,404 genes. Pearl millet [Pennisetum glaucum (L.) R. Br.] is an important crop of the semi-arid and arid regions of the world. It is capable of growing in harsh and marginal environments, with the highest degree of tolerance to drought and heat among cereals (1). Downy mildew, caused by Sclerospora graminicola (Sacc.) Schroet., is the most devastating disease of pearl millet, particularly on genetically uniform hybrids. Estimated annual grain yield loss due to downy mildew is approximately 10–80% (2-7). Pathotype 1 has been reported to be a highly virulent pathotype of Sclerospora graminicola in India (8). We report a de novo whole genome assembly and analysis of Sclerospora graminicola pathotype 1 from India. A susceptible pearl millet genotype, Tift 23D2B1P1-P5, was used for obtaining single-zoospore isolates from the original oosporic sample. The library for whole genome sequencing was prepared according to the instructions of the NEB Ultra DNA library kit for Illumina (New England Biolabs, USA). The libraries were normalised, pooled and sequenced on the Illumina HiSeq 2500 (Illumina Inc., San Diego, CA, USA) platform at 2 × 100 bp length. Mate pair (MP) libraries were prepared using the Nextera mate pair library preparation kit (Illumina Inc., USA). 1 µg of genomic DNA was subjected to tagmentation, followed by strand displacement. Size selection of the tagmented/strand-displaced DNA was carried out using AmpureXP beads. The libraries were validated on an Agilent Bioanalyser using a DNA HS chip. The libraries were normalised, pooled and sequenced on the Illumina MiSeq (Illumina Inc., USA) platform at 2 × 300 bp length.
Whole genome sequencing yielded 7.38 Gb with 73,889,924 paired-end reads from the paired-end library and 1.15 Gb with 3,851,788 reads from the mate pair library, generated on the Illumina HiSeq 2500 and Illumina MiSeq, respectively. The sequences were assembled using several assemblers: ABySS, MaSuRCA, Velvet, SOAPdenovo2, and ALLPATHS-LG. The assembly generated by the MaSuRCA (9) algorithm was found to be superior to those of the other algorithms and was hence used for scaffolding with SSPACE. The assembled draft genome sequence of S. graminicola pathotype 1 was 299,901,251 bp long, with a GC content of 47.2%, and consisted of 26,786 scaffolds with an N50 of 17,909 bp and a longest scaffold of 238,843 bp. The overall coverage was 40X. The draft genome sequence was used for gene prediction with AUGUSTUS. The completeness of the assembly was investigated using CEGMA, which revealed 92.74% of proteins completely present and 95.56% partially present, while the BUSCO fungal dataset indicated 64.9% complete, 12.4% fragmented, and 22.7% missing out of 290 BUSCO groups. A total of 52,285 predicted genes were annotated using BLASTX, and 38,120 genes had a significant BLASTX match. Repetitive element analysis of the assembly revealed 8,196 simple repeats, 1,058 low complexity repeats and 5,562 dinucleotide to hexanucleotide microsatellite repeats.
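The N50 statistic quoted above is the scaffold length at which the cumulative sum of scaffold lengths, taken from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that calculation follows; the function name and the scaffold lengths are hypothetical and are not drawn from the reported assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running sum to >= 50%.
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical scaffold lengths, for illustration only.
scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(scaffolds))
```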
Comparison of Small Gut and Whole Gut Microbiota of First-Degree Relatives With Adult Celiac Disease Patients and Controls
Recent studies on celiac disease (CeD) have reported alterations in the gut microbiome. Whether this alteration in the microbial community is the cause or effect of the disease is not well understood, especially in adult onset of disease. The first-degree relatives (FDRs) of CeD patients may provide an opportunity to study the gut microbiome in a pre-disease state, as FDRs are genetically susceptible to CeD. By using 16S rRNA gene sequencing, we observed that ecosystem-level diversity measures were not significantly different between the disease condition (CeD), pre-disease (FDR) and control subjects. However, differences were observed at the level of amplicon sequence variant (ASV), suggesting alterations in specific ASVs between the pre-disease and diseased conditions. Duodenal biopsies showed greater differences in ASVs than fecal samples, indicating larger disruption of the microbiota at the disease site. The duodenal microbiota of FDRs was characterized by significant abundance of ASVs belonging to the Parvimonas, Granulicatella, Gemella, Bifidobacterium, Anaerostipes, and Actinomyces genera. The duodenal microbiota of CeD was characterized by higher abundance of ASVs from the genera Megasphaera and Helicobacter compared to the FDR microbiota. The CeD and FDR fecal microbiota had reduced abundance of ASVs classified as Akkermansia and Dorea when compared to the control group microbiota. In addition, the predicted functional metagenome showed reduced ability of gluten degradation by CeD fecal microbiota in comparison to FDRs and controls. The findings of the present study demonstrate differences in ASVs and predict a reduced ability of CeD fecal microbiota to degrade gluten compared to the FDR fecal microbiota. Further research is required to investigate the strain-level and active functional profiles of FDR and CeD microbiota to better understand the role of the gut microbiome in the pathophysiology of CeD.
Association Between Antiretroviral Treatment Regimen and Tuberculosis Preventative Treatment Completion for HIV-Positive Patients in Botswana
Tuberculosis (TB) is a major global health concern and is responsible for significant morbidity and mortality, especially among people living with HIV (PLHIV). TB preventative therapy using isoniazid (IPT) for latent TB in PLHIV is a commonly recommended, although often underutilized, treatment to decrease progression to active disease, as well as reduce the possibility of onward disease transmission. This study investigates factors associated with IPT course completion in a large cohort of PLHIV in Botswana, focusing on the antiretroviral (ARV) therapy a patient is receiving. 57,359 PLHIV were evaluated for IPT, 40,379 (70.4%) patients initiated IPT, and 38,293 (94.8%) of these completed the course of therapy. Logistic regression modeling was used to evaluate the association between ARV regimen, as well as other independent variables of age, gender, pregnancy status, and daily pill burden, with the dependent outcomes of IPT completion, IPT initiation, side effects, and death. We found that certain ARV regimens were associated with the likelihood of IPT completion; when compared to the reference ARV of TDF/FTC/EFV, TDF/3TC/DTG was found to be associated with an increased likelihood of treatment completion (OR = 1.24; 95% CI = 1.08, 1.43), while AZT/3TC_EFV (OR = 0.75, 95% CI = 0.62, 0.90), AZT/3TC_NVP (OR = 0.82, 95% CI = 0.68, 1.00), and TDF/FTC_LPV/R (OR = 0.70, 95% CI = 0.51, 0.98) were found to be associated with a decreased likelihood of treatment completion. Part of this relationship is possibly secondary to the daily pill burden of those ARV regimens, as well as side effects that may occur with concomitant IPT. Additionally, ARV regimen, age, and gender, were found to be associated with initiation of IPT, suggesting that targeted educational interventions may be needed in specific groups to increase participation in IPT programs. 
These findings should be taken into consideration by clinicians and managers intending to increase the operational effectiveness of IPT programs.
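To make the reported odds ratios and confidence intervals easier to interpret, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. This is a simpler technique than the study's logistic regression, which adjusts for other variables, so it would not reproduce the published estimates. The function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: comparison regimen, completed      b: comparison regimen, not completed
    c: reference regimen, completed       d: reference regimen, not completed
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not the study's data).
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_value:.2f}; 95% CI = {ci_low:.2f}, {ci_high:.2f}")
```

An interval that excludes 1 suggests an association at roughly the 5% level; an interval whose bound touches 1.00, as with AZT/3TC_NVP above, is borderline.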
Managing a front-line field hospital in Libya: Description of case mix and lessons learned for future humanitarian emergencies
Between June and August 2011, International Medical Corps deployed a field hospital near the front line of the fighting between government troops and opposition fighters in Western Libya. The field hospital cared for over 1300 combatants and non-combatants from both sides of the conflict during that period, the vast majority of them presenting with war-related injuries. Over 60% of battle-related injuries were due to shrapnel wounds and blast injuries from exploding small mortars, with smaller percentages due to battle-related motor vehicle accidents, gunshot wounds, burns, and other causes. The most pertinent lessons learned from our experience were the importance of dedicating significant resources to logistics and supply chain management, the rewards garnered from building strong ties with the local community early in the deployment of the field hospital, and the need to pay careful attention to basic principles of humanitarian ethics.
A review of histopathological and immunohistochemical parameters in diagnosis of metastatic renal cell carcinoma with a case of gingival metastasis
The oral cavity constitutes a site of low prevalence for metastasis of malignant tumors. However, oral metastasis of a renal origin is relatively more common and represents 2% of all cancer deaths. Renal cancer may metastasize to any part of the body, with a 15% risk of metastasis to the head and neck regions, and poses one of the greatest diagnostic challenges in medical sciences. Approximately 25% of patients have metastatic disease at initial assessment, which is often what prompts the diagnosis in the first place. Here we present a review of the literature on renal cell carcinoma along with a case of gingival metastasis.