
    LFTK: Handcrafted Features in Computational Linguistics

    Past research has identified a rich set of handcrafted linguistic features that can potentially assist various tasks. However, their sheer number makes it difficult to select and use existing handcrafted features effectively. On top of inconsistent implementations across research works, there has been no categorization scheme or generally accepted set of feature names, which creates confusion. Also, most existing handcrafted feature extraction libraries are not open-source or not actively maintained. As a result, a researcher often has to build such an extraction system from the ground up. We collect and categorize more than 220 popular handcrafted features grounded in past literature. Then, we conduct a correlation analysis on several task-specific datasets and report the potential use cases of each feature. Lastly, we devise a multilingual handcrafted linguistic feature extraction system in a systematically expandable manner. We open-source our system for public access to a rich set of pre-implemented handcrafted features. Our system is named LFTK and is the largest of its kind. Find it at github.com/brucewlee/lftk. Comment: BEA @ ACL 202

    A Side-by-side Comparison of Transformers for English Implicit Discourse Relation Classification

    Though discourse parsing can help multiple NLP fields, there has been no wide language model search on implicit discourse relation classification. This hinders researchers from fully utilizing publicly available models in discourse analysis. This work is a straightforward comparison of the fine-tuned discourse performance of seven pre-trained language models. We use PDTB-3, a popular discourse relation annotated dataset. Through our model search, we raise SOTA to 0.671 ACC and obtain novel observations. Some are contrary to what has been reported before (Shi and Demberg, 2019b): sentence-level pre-training objectives (NSP, SBO, SOP) generally fail to produce the best-performing model for implicit discourse relation classification. Counterintuitively, similar-sized PLMs with MLM and full attention led to better performance. Comment: TrustNLP @ ACL 202

    Data augmentation and semi-supervised learning for deep neural networks-based text classifier

    User feedback is essential for understanding user needs. In this paper, we use free text obtained from a survey on sleep-related issues to build a deep neural network-based text classifier. However, training a deep neural network model requires a large amount of labelled data. To reduce manual data labelling, we propose a method that combines data augmentation and pseudo-labelling: data augmentation is applied to labelled data to increase the size of the initial training set, and the trained model is then used to annotate unlabelled data with pseudo-labels. The results show that the model with data augmentation achieves a macro-averaged F1 score of 65.2% using 4,300 training samples, whereas the model without data augmentation achieves a macro-averaged F1 score of 68.2% with around 14,000 training samples. Furthermore, when combined with pseudo-labelling, the model achieves a macro-averaged F1 score of 62.7% using only 1,400 labelled training samples. In other words, the proposed method reduces the amount of labelled data needed for training while achieving relatively good performance.

    Sporting Faith: Exploring Displays of Faith as Part of Christian Higher Education Athletic Program Identity

    Contemporary higher education is a marketplace in which institutions aggressively market themselves to student consumers who “shop” for school options (Tolbert, 2014). This study examines the marketing of faith-based higher education institutions’ athletic programs to determine how faith-related missions are revealed on institutional websites. The institutions analyzed in this study were 112 of the 141 members of the Council for Christian Colleges & Universities (CCCU) that compete in sanctioned intercollegiate athletics programs (e.g., NCAA, NAIA, NCCAA). The study attempted to quantify how strongly a university’s athletic program portrays the faith dimension of the school’s identity through the visual marketing tool of the athletic department’s website, to determine whether that measure is indicative of external perception. Institutional websites were examined to measure the strength of the faith identity presented on the sites using a content analysis of the university tagline, university mission statement, and athletic department mission statement. Faith expression was lacking in 53% of taglines and 33% of athletic department mission statements. The results suggest that CCCU member institutions should carry the faith expression of the university mission statement through to the message conveyed in the tagline and the athletic department mission statement.

    Associations among Human-Associated Fecal Contamination, Microcystis aeruginosa, and Microcystin at Lake Erie Beaches

    Lake Erie beaches exhibit impaired water quality due to fecal contamination and cyanobacterial blooms, though few studies address potential relationships between these two public health hazards. Using quantitative polymerase chain reaction (qPCR), Microcystis aeruginosa was monitored in conjunction with a human-associated fecal marker (Bacteroides fragilis group; g-Bfra), microcystin, and water quality parameters at two beaches to evaluate their potential associations. During the summer of 2010, water samples were collected 32 times from both Euclid and Villa Angela beaches. The phycocyanin intergenic spacer (PC-IGS) and the microcystin-producing (mcyA) gene in M. aeruginosa were quantified with qPCR. PC-IGS and mcyA were detected in 50.0% and 39.1% of samples, respectively, and showed increased occurrences after mid-August. Correlation and regression analyses showed that water temperature was negatively correlated with M. aeruginosa markers and microcystin. The densities of mcyA and the g-Bfra were predicted by nitrate, implicating fecal contamination as contributing to the growth of M. aeruginosa by nitrate loading. Microcystin was correlated with mcyA (r = 0.413, p < 0.01), suggesting toxin-producing M. aeruginosa populations may significantly contribute to microcystin production. Additionally, microcystin was correlated with total phosphorus (r = 0.628, p < 0.001), which was higher at Euclid (p < 0.05), possibly contributing to higher microcystin concentrations at Euclid.

    K2-231 b: A sub-Neptune exoplanet transiting a solar twin in Ruprecht 147

    We identify a sub-Neptune exoplanet ($R_p = 2.5 \pm 0.2\,R_\oplus$) transiting a solar twin in the Ruprecht 147 star cluster (3 Gyr, 300 pc, [Fe/H] = +0.1 dex). The ~81 day light curve for EPIC 219800881 (V = 12.71) from K2 Campaign 7 shows six transits with a period of 13.84 days, a depth of ~0.06%, and a duration of ~4 hours. Based on our analysis of high-resolution MIKE spectra, broadband optical and NIR photometry, the cluster parallax and interstellar reddening, and isochrone models from PARSEC, Dartmouth, and MIST, we estimate the following properties for the host star: $M_\star = 1.01 \pm 0.03\,M_\odot$, $R_\star = 0.95 \pm 0.03\,R_\odot$, and $T_{\rm eff} = 5695 \pm 50$ K. This star appears to be single, based on our modeling of the photometry, the low radial velocity variability measured over nearly ten years, and Keck/NIRC2 adaptive optics imaging and aperture-masking interferometry. Applying a probabilistic mass-radius relation, we estimate that the mass of this planet is $M_p = 7^{+5}_{-3}\,M_\oplus$, which would cause a RV semi-amplitude of $K = 2 \pm 1$ m s$^{-1}$ that may be measurable with existing precise RV facilities. After statistically validating this planet with BLENDER, we now designate it K2-231 b, making it the second sub-stellar object to be discovered in Ruprecht 147 and the first planet; it joins the small but growing ranks of 23 other planets found in open clusters. Comment: 24 pages, 7 figures, light curve included as separate file
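As a rough consistency check on the quoted RV semi-amplitude, the standard Keplerian relation K = (2πG/P)^(1/3) · M_p sin i / (M_⋆ + M_p)^(2/3) / √(1 − e²) can be evaluated with the abstract's period, stellar mass, and planet mass. The sketch below assumes a circular orbit (e = 0) and sin i ≈ 1 (reasonable for a transiting planet); neither value is stated in the abstract, and the physical constants are rounded textbook values.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.98847e30   # solar mass, kg
M_EARTH = 5.972e24   # Earth mass, kg
DAY = 86400.0        # seconds per day

def rv_semi_amplitude(m_planet_earth, m_star_sun, period_days, ecc=0.0, sin_i=1.0):
    """Keplerian RV semi-amplitude in m/s.

    ecc=0 and sin_i=1 are assumptions for this sketch, not values from the abstract.
    """
    period = period_days * DAY
    m_p = m_planet_earth * M_EARTH
    m_s = m_star_sun * M_SUN
    return (
        (2.0 * math.pi * G / period) ** (1.0 / 3.0)
        * m_p * sin_i
        / (m_s + m_p) ** (2.0 / 3.0)
        / math.sqrt(1.0 - ecc ** 2)
    )

# Values from the abstract: P = 13.84 d, M_star = 1.01 M_sun, M_p = 7 (+5, -3) M_earth.
k_mid = rv_semi_amplitude(7.0, 1.01, 13.84)
k_low = rv_semi_amplitude(4.0, 1.01, 13.84)    # lower mass bound, 7 - 3
k_high = rv_semi_amplitude(12.0, 1.01, 13.84)  # upper mass bound, 7 + 5
print(f"K = {k_mid:.2f} m/s (range {k_low:.2f} to {k_high:.2f} m/s)")
```

Under these assumptions the central value comes out a little below 2 m/s, with the mass bounds mapping to roughly 1 and 3 m/s, in line with the K = 2 ± 1 m/s quoted above.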