LFTK: Handcrafted Features in Computational Linguistics

Author: Lee Bruce W.
Lee Jason Hyung-Jong
Publication date: 01/06/2023

Past research has identified a rich set of handcrafted linguistic features that can potentially assist various tasks. However, their extensive number makes it difficult to effectively select and utilize existing handcrafted features. Coupled with the problem of inconsistent implementation across research works, there has been no categorization scheme or generally accepted feature names. This creates unwanted confusion. Also, most existing handcrafted feature extraction libraries are not open-source or not actively maintained. As a result, a researcher often has to build such an extraction system from the ground up. We collect and categorize more than 220 popular handcrafted features grounded on past literature. Then, we conduct a correlation analysis study on several task-specific datasets and report the potential use cases of each feature. Lastly, we devise a multilingual handcrafted linguistic feature extraction system in a systematically expandable manner. We open-source our system for public access to a rich set of pre-implemented handcrafted features. Our system is coined LFTK and is the largest of its kind. Find it at github.com/brucewlee/lftk. Comment: BEA @ ACL 202

arXiv.org e-Print Archive

Guttate leukoderma and acrokeratosis verruciformis of Hopf: a rare combination in Darier disease

Author: Grossman Shoshana K
Hsu Sylvia
Lee Jason B
Sun Christina W
Valdes-Rodriguez Rodrigo
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

A distinct Darier phenotype presenting with confetti-like hypopigmented macules was first described in 1965. Designated as "guttate leukoderma," this skin finding is a rarely reported presentation of Darier disease. It has been theorized that the mutation in ATP2A2 causes defective E-cadherin, which in turn disrupts the adhesion of melanocytes to keratinocytes, thus leading to impaired dendrite formation, hindered melanin transfer, and ultimately to melanocyte apoptosis. Herein, we contribute a case of a 56-year-old woman who presented with the rarely described guttate leukoderma of Darier disease and acrokeratosis verruciformis of Hopf.

eScholarship - University of California

A Side-by-side Comparison of Transformers for English Implicit Discourse Relation Classification

Author: Lee Bruce W.
Lee Jason Hyung-Jong
Yang BongSeok
Publication date: 07/07/2023

Though discourse parsing can help multiple NLP fields, there has been no wide language model search done on implicit discourse relation classification. This hinders researchers from fully utilizing publicly available models in discourse analysis. This work is a straightforward, fine-tuned discourse performance comparison of seven pre-trained language models. We use PDTB-3, a popular discourse relation annotated dataset. Through our model search, we raise SOTA to 0.671 ACC and obtain novel observations. Some are contrary to what has been reported before (Shi and Demberg, 2019b), that sentence-level pre-training objectives (NSP, SBO, SOP) generally fail to produce the best performing model for implicit discourse relation classification. Counterintuitively, similar-sized PLMs with MLM and full attention led to better performance. Comment: TrustNLP @ ACL 202

arXiv.org e-Print Archive

Data augmentation and semi-supervised learning for deep neural networks-based text classifier

Author: Devlin Jacob
Lee Dong-Hyun
Liu Yinhan
Sorower Mohammad S
Wei Jason W
Wu Yuxiang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

User feedback is essential for understanding user needs. In this paper, we use free-text obtained from a survey on sleep-related issues to build a deep neural networks-based text classifier. However, to train the deep neural networks model, a lot of labelled data is needed. To reduce manual data labelling, we propose a method which is a combination of data augmentation and pseudo-labelling: data augmentation is applied to labelled data to increase the size of the initial train set and then the trained model is used to annotate unlabelled data with pseudo-labels. The result shows that the model with the data augmentation achieves macro-averaged f1 score of 65.2% while using 4,300 training data, whereas the model without data augmentation achieves macro-averaged f1 score of 68.2% with around 14,000 training data. Furthermore, with the combination of pseudo-labelling, the model achieves macro-averaged f1 score of 62.7% with only using 1,400 training data with labels. In other words, with the proposed method we can reduce the amount of labelled data for training while achieving relatively good performance
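The combination of data augmentation and pseudo-labelling described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy word-count classifier stands in for the paper's deep neural network, the word-deletion augmentation and the 0.8 confidence threshold are hypothetical choices, and the example texts are invented rather than taken from the survey data.

```python
from collections import Counter

def train(labelled):
    """Toy bag-of-words model: word counts per label (stand-in for a neural net)."""
    counts = {}
    for text, label in labelled:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def predict(model, text):
    """Return (label, confidence); confidence is the winning label's share of word hits."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total

def augment(text):
    """Hypothetical augmentation: every variant of the text with one word deleted."""
    words = text.split()
    if len(words) < 2:
        return []
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]

def pseudo_label(labelled, unlabelled, threshold=0.8):
    """Augment the labelled set, train, then keep only confident pseudo-labels."""
    augmented = labelled + [(a, y) for t, y in labelled for a in augment(t)]
    model = train(augmented)
    confident = []
    for text in unlabelled:
        label, conf = predict(model, text)
        if label is not None and conf >= threshold:
            confident.append((text, label))
    # Retrain on the augmented labelled data plus the confident pseudo-labels.
    return train(augmented + confident), confident

labelled = [("cannot fall asleep at night", "insomnia"),
            ("loud snoring wakes partner", "snoring")]
unlabelled = ["snoring every night", "asleep late", "no comment"]
final_model, confident = pseudo_label(labelled, unlabelled)
```

In this sketch, ambiguous texts (words shared across labels) and texts with no known words are left unlabelled, which is the usual reason for thresholding pseudo-labels on confidence.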

Crossref

Ghent University Academic Bibliography

Sporting Faith: Exploring Displays of Faith as Part of Christian Higher Education Athletic Program Identity

Author: Lee Jason W.
McRee Laci
Tolbert Dawn
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/01/2022

Contemporary higher education is a marketplace where institutions aggressively market themselves to student consumers who “shop” for school options (Tolbert, 2014). This study examines the marketing of faith-based higher education institutions’ athletic programs to determine how faith-related missions are revealed on institutional websites. The institutions analyzed in this study consisted of 112 of the 141 member institutions of the Council for Christian Colleges & Universities (CCCU) that compete in sanctioned intercollegiate athletics programs (e.g., NCAA, NAIA, NCCAA). The study attempted to quantify how strongly a university’s athletic program portrays the faith dimension of the school’s identity through the visual marketing tool of the athletic department’s website to determine whether that measure is indicative of external perception. For this study, institutional websites were examined to measure the strength of faith identity presented on the sites using a content analysis of the university tagline, university mission statement, and athletic department mission statement. Faith expression was lacking in 53% of taglines and 33% of athletic department mission statements. Study results reflect that CCCU member institutions should streamline the faith expression of the university mission statement into the message conveyed in the tagline and the athletic department mission statement.

University of Tennessee, Knoxville: Trace

Associations among Human-Associated Fecal Contamination, Microcystis aeruginosa, and Microcystin at Lake Erie Beaches

Author: Chang Soo Lee
Cheonghoon Lee
Jason W. Marion
Jiyoung Lee
Melissa Cheung
Publication venue: Encompass
Publication date: 01/09/2015

Lake Erie beaches exhibit impaired water quality due to fecal contamination and cyanobacterial blooms, though few studies address potential relationships between these two public health hazards. Using quantitative polymerase chain reaction (qPCR), Microcystis aeruginosa was monitored in conjunction with a human-associated fecal marker (Bacteroides fragilis group; g-Bfra), microcystin, and water quality parameters at two beaches to evaluate their potential associations. During the summer of 2010, water samples were collected 32 times from both Euclid and Villa Angela beaches. The phycocyanin intergenic spacer (PC-IGS) and the microcystin-producing (mcyA) gene in M. aeruginosa were quantified with qPCR. PC-IGS and mcyA were detected in 50.0% and 39.1% of samples, respectively, and showed increased occurrences after mid-August. Correlation and regression analyses showed that water temperature was negatively correlated with M. aeruginosa markers and microcystin. The densities of mcyA and the g-Bfra were predicted by nitrate, implicating fecal contamination as contributing to the growth of M. aeruginosa by nitrate loading. Microcystin was correlated with mcyA (r = 0.413, p < 0.01), suggesting toxin-producing M. aeruginosa populations may significantly contribute to microcystin production. Additionally, microcystin was correlated with total phosphorus (r = 0.628, p < 0.001), which was higher at Euclid (p < 0.05), possibly contributing to higher microcystin concentrations at Euclid

Crossref

Eastern Kentucky University

Directory of Open Access Journals

PubMed Central

K2-231 b: A sub-Neptune exoplanet transiting a solar twin in Ruprecht 147

Author: Curtis Jason Lee
Fulton Benjamin J.
Henze Christopher E.
Howard Andrew W.
Huber Daniel
Isaacson Howard
Kraus Adam L.
Mann Andrew W.
Rizzuto Aaron C.
Torres Guillermo
Vanderburg Andrew
Wright Jason T.
Publication venue: 'American Astronomical Society'
Publication date: 01/01/2018

We identify a sub-Neptune exoplanet ($R_p = 2.5 \pm 0.2\ R_\oplus$) transiting a solar twin in the Ruprecht 147 star cluster (3 Gyr, 300 pc, [Fe/H] = +0.1 dex). The ~81 day light curve for EPIC 219800881 (V = 12.71) from K2 Campaign 7 shows six transits with a period of 13.84 days, a depth of ~0.06%, and a duration of ~4 hours. Based on our analysis of high-resolution MIKE spectra, broadband optical and NIR photometry, the cluster parallax and interstellar reddening, and isochrone models from PARSEC, Dartmouth, and MIST, we estimate the following properties for the host star: $M_\star = 1.01 \pm 0.03\ M_\odot$, $R_\star = 0.95 \pm 0.03\ R_\odot$, and $T_{\rm eff} = 5695 \pm 50$ K. This star appears to be single, based on our modeling of the photometry, the low radial velocity variability measured over nearly ten years, and Keck/NIRC2 adaptive optics imaging and aperture-masking interferometry. Applying a probabilistic mass-radius relation, we estimate that the mass of this planet is $M_p = 7^{+5}_{-3}\ M_\oplus$, which would cause a RV semi-amplitude of $K = 2 \pm 1$ m s$^{-1}$ that may be measurable with existing precise RV facilities. After statistically validating this planet with BLENDER, we now designate it K2-231 b, making it the second sub-stellar object to be discovered in Ruprecht 147 and the first planet; it joins the small but growing ranks of 23 other planets found in open clusters. Comment: 24 pages, 7 figures, light curve included as separate fil
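The transit depth and RV semi-amplitude quoted in this abstract can be cross-checked from the stated radii, masses, and period. The sketch below is not the authors' analysis: it assumes a circular, edge-on orbit (sin i = 1), ignores limb darkening, and uses rounded SI constants. The function names are hypothetical.

```python
import math

# Rounded physical constants in SI units (assumed values, not from the abstract).
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30    # solar mass, kg
M_EARTH = 5.972e24  # Earth mass, kg
R_SUN = 6.957e8     # solar radius, m
R_EARTH = 6.371e6   # Earth radius, m
DAY = 86400.0       # seconds per day

def transit_depth(r_planet_earth, r_star_sun):
    """Fractional transit depth (R_p / R_star)^2, ignoring limb darkening."""
    return (r_planet_earth * R_EARTH / (r_star_sun * R_SUN)) ** 2

def rv_semi_amplitude(m_planet_earth, m_star_sun, period_days):
    """RV semi-amplitude in m/s for a circular, edge-on orbit:
    K = (2*pi*G / P)^(1/3) * M_p / (M_star + M_p)^(2/3)."""
    m_p = m_planet_earth * M_EARTH
    m_s = m_star_sun * M_SUN
    period = period_days * DAY
    return (2 * math.pi * G / period) ** (1 / 3) * m_p / (m_s + m_p) ** (2 / 3)

# Central values quoted in the abstract.
depth = transit_depth(2.5, 0.95)
k = rv_semi_amplitude(7.0, 1.01, 13.84)
print(f"depth = {depth * 100:.3f} %, K = {k:.2f} m/s")
```

With the central values, the depth comes out near 0.06% and K near 2 m/s, consistent with the figures in the abstract. The quoted uncertainties are not propagated here.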

arXiv.org e-Print Archive

Carolina Digital Repository

Caltech Authors