    Random Text Perturbations Work, but not Always

    We present three large-scale experiments on a binary text matching classification task in both Chinese and English to evaluate the effectiveness and generalizability of random text perturbations as a data augmentation approach for NLP. We find that the augmentation can have either negative or positive effects on the test set performance of three neural classification models, depending on whether the models are trained on enough original training examples. This holds regardless of whether the five random text editing operations used to augment text are applied together or separately. Our study strongly suggests that the effectiveness of random text perturbations is task-specific and not generally positive.
    Comment: 7 pages; 8 tables; 3 figures
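    To make the "random text editing operations" concrete, here is a minimal Python sketch of the five token-level edits named in the companion REDA paper (SR, RS, RI, RD, RM). It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the synonym dictionary is a toy stand-in for a real thesaurus, and Random Mix is assumed here to chain two randomly chosen operations.

```python
import random

# Hypothetical helpers illustrating the five token-level edits; not the authors' code.

def synonym_replacement(tokens, n, synonyms, rng):
    """SR: replace up to n tokens that have an entry in the synonym dict."""
    out = list(tokens)
    idxs = [i for i, t in enumerate(out) if t in synonyms]
    rng.shuffle(idxs)
    for i in idxs[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out

def random_swap(tokens, n, rng):
    """RS: swap the tokens at two random positions, n times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_insertion(tokens, n, synonyms, rng):
    """RI: insert a synonym of a random token at a random position, n times."""
    out = list(tokens)
    for _ in range(n):
        sources = [t for t in out if t in synonyms]
        if not sources:
            break  # nothing in the sentence has a known synonym
        new_tok = rng.choice(synonyms[rng.choice(sources)])
        out.insert(rng.randint(0, len(out)), new_tok)
    return out

def random_deletion(tokens, p, rng):
    """RD: drop each token with probability p, always keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def random_mix(tokens, synonyms, rng):
    """RM, ASSUMED here to chain two randomly chosen operations from above."""
    ops = [
        lambda t: synonym_replacement(t, 1, synonyms, rng),
        lambda t: random_swap(t, 1, rng),
        lambda t: random_insertion(t, 1, synonyms, rng),
        lambda t: random_deletion(t, 0.1, rng),
    ]
    out = list(tokens)
    for op in rng.sample(ops, 2):
        out = op(out)
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "how do i learn python quickly".split()
    syns = {"learn": ["study"], "quickly": ["fast", "rapidly"]}  # toy dictionary
    for name, aug in [
        ("SR", synonym_replacement(sentence, 1, syns, rng)),
        ("RS", random_swap(sentence, 1, rng)),
        ("RI", random_insertion(sentence, 1, syns, rng)),
        ("RD", random_deletion(sentence, 0.2, rng)),
        ("RM", random_mix(sentence, syns, rng)),
    ]:
        print(name, " ".join(aug))
```

    Every function copies its input, so one original sentence can be augmented several times independently; a seeded `random.Random` keeps runs reproducible.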

    Probabilistic Linguistic Knowledge and Token-level Text Augmentation

    This paper investigates the effectiveness of token-level text augmentation and the role of probabilistic linguistic knowledge within a linguistically motivated evaluation context. Two text augmentation programs, REDA and REDA$_{NG}$, were developed, both implementing five token-level text editing operations: Synonym Replacement (SR), Random Swap (RS), Random Insertion (RI), Random Deletion (RD), and Random Mix (RM). REDA$_{NG}$ leverages pretrained $n$-gram language models to select the most likely augmented texts from REDA's output. Comprehensive and fine-grained experiments were conducted on a binary question matching classification task in both Chinese and English. The results strongly refute the general effectiveness of the five token-level text augmentation techniques under investigation, whether applied together or separately, and across the various common classification model types used, including transformers. Furthermore, the role of probabilistic linguistic knowledge is found to be minimal.
    Comment: 20 pages; 3 figures; 8 tables
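    The REDA$_{NG}$ selection step (keeping the augmented texts an $n$-gram model finds most likely) can be sketched as below. This is a hedged illustration only: the abstract says REDA$_{NG}$ uses pretrained $n$-gram models, whereas here a toy add-one smoothed bigram model trained on three sentences stands in, and `BigramLM` and `select_most_likely` are hypothetical names.

```python
import math
from collections import Counter

class BigramLM:
    """Toy add-one (Laplace) smoothed bigram LM; a stand-in for a pretrained n-gram model."""

    def __init__(self, sentences):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for sent in sentences:
            toks = ["<s>"] + list(sent) + ["</s>"]
            self.context_counts.update(toks[:-1])
            self.bigram_counts.update(zip(toks[:-1], toks[1:]))
        # Possible next tokens: every training word plus the end marker.
        self.vocab_size = len({w for s in sentences for w in s}) + 1

    def logprob(self, sent):
        """Sum of log P(w_i | w_{i-1}) over the padded sentence."""
        toks = ["<s>"] + list(sent) + ["</s>"]
        return sum(
            math.log((self.bigram_counts[(a, b)] + 1)
                     / (self.context_counts[a] + self.vocab_size))
            for a, b in zip(toks[:-1], toks[1:])
        )

def select_most_likely(candidates, lm, k=1):
    """Keep the k candidates with the highest LM log-probability."""
    return sorted(candidates, key=lm.logprob, reverse=True)[:k]

if __name__ == "__main__":
    corpus = [
        "how do i learn python".split(),
        "how do i learn java".split(),
        "how can i learn python".split(),
    ]
    lm = BigramLM(corpus)
    candidates = ["how do i learn python".split(), "learn do how i python".split()]
    print(select_most_likely(candidates, lm))
```

    Raw log-probability sums favour shorter candidates, which matters when RI or RD change the length; dividing by the number of predicted tokens is one common way to compare candidates of different lengths.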

    Self Contrastive Learning for Session-based Recommendation

    Session-based recommendation, which aims to predict a user's next item of interest from an existing sequence of item interactions, has seen growing use of Contrastive Learning (CL) to improve user and item representations. However, these contrastive objectives (1) serve a role similar to that of the cross-entropy loss while ignoring optimisation of the item representation space, and (2) commonly require complicated modelling, including complex positive/negative sample construction and extra data augmentation. In this work, we introduce Self-Contrastive Learning (SCL), which simplifies the application of CL and enhances the performance of state-of-the-art CL-based recommendation techniques. Specifically, SCL is formulated as an objective function that directly promotes a uniform distribution among item representations and efficiently replaces all the existing contrastive objective components of state-of-the-art models. Unlike previous work, SCL eliminates the need for any positive/negative sample construction or data augmentation, which makes the item representation space more interpretable and makes SCL easy to extend to existing recommender systems. Through experiments on three benchmark datasets, we demonstrate that SCL consistently improves the performance of state-of-the-art models with statistical significance. Notably, SCL improves the performance of the two best-performing models by 8.2% and 9.5% in P@10 (Precision) and by 9.9% and 11.2% in MRR@10 (Mean Reciprocal Rank), on average across benchmarks. Additionally, our analysis explains the improvement in terms of the alignment and uniformity of representations and shows that SCL is effective at a low computational cost.
    Comment: Technical Report
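    An objective that "directly promotes a uniform distribution among item representations" can be illustrated with the widely used hypersphere uniformity loss of Wang and Isola (2020): the log of the mean Gaussian potential $\exp(-t\lVert z_i - z_j\rVert^2)$ over pairs of L2-normalised embeddings. Whether SCL uses exactly this form (including the temperature $t = 2$) is an assumption; the sketch below only shows why minimising such a term spreads embeddings apart, and `uniformity_loss` is a hypothetical name.

```python
import itertools
import math

def l2_normalize(v):
    """Project a non-zero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def uniformity_loss(embeddings, t=2.0):
    """log mean_{i<j} exp(-t * ||z_i - z_j||^2) over normalised embeddings.

    Lower values mean the embeddings are spread more uniformly on the sphere.
    Requires at least two non-zero embeddings.
    """
    z = [l2_normalize(e) for e in embeddings]
    potentials = [
        math.exp(-t * sum((a - b) ** 2 for a, b in zip(u, v)))
        for u, v in itertools.combinations(z, 2)
    ]
    return math.log(sum(potentials) / len(potentials))

if __name__ == "__main__":
    spread = [[1, 0], [0, 1], [-1, 0], [0, -1]]            # evenly spaced items
    clustered = [[1, 0], [1, 0.1], [1, -0.1], [0.9, 0.05]]  # collapsed items
    print(uniformity_loss(spread), uniformity_loss(clustered))
```

    Because every potential is at most 1, the loss is never positive; it approaches 0 when item embeddings collapse to one direction and becomes strongly negative when they are spread out, which is the behaviour a uniformity-promoting objective rewards.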

    Strongly lensed repeating Fast Radio Bursts as precision probes of the universe

    Fast radio bursts (FRBs), bright transients with millisecond durations at $\sim$GHz frequencies and typical redshifts probably $>0.8$, are likely to be gravitationally lensed by intervening galaxies. Because the ratio between the typical galaxy-lensing time delay, $\sim\mathcal{O}$(10 days), and the burst width, $\sim\mathcal{O}$(ms), is as large as $\sim10^9$, the time delay between the images of a strongly lensed FRB can be measured to extremely high precision; we therefore propose strongly lensed FRBs as precision probes of the universe. We show that, within the flat $\Lambda$CDM model, the Hubble constant $H_0$ can be constrained with $\sim0.91\%$ uncertainty from 10 such systems, which may be observed with the Square Kilometre Array (SKA) in $<30$ years. More importantly, the cosmic curvature can be constrained model-independently to a precision of $\sim0.076$. This constraint can directly test the validity of the cosmological principle and break the intractable degeneracy between cosmic curvature and dark energy.
    Comment: 8 pages, 6 figures
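    The $\sim10^9$ ratio quoted above follows from simple arithmetic, shown on the first line below. The second line is the standard time-delay cosmography relation, included as background rather than as this paper's own analysis: $\Delta\phi$ is the Fermat potential difference between images, $z_l$ the lens redshift, and $D_l$, $D_s$, $D_{ls}$ the angular diameter distances to the lens, to the source, and between them. Since $D_{\Delta t} = c\,\Delta t/\Delta\phi$ scales as $H_0^{-1}$, a fractional error on $\Delta t$ maps one-to-one onto $H_0$ at fixed $\Delta\phi$ and redshifts.

```latex
\begin{align*}
\frac{\Delta t_{\mathrm{lens}}}{w_{\mathrm{burst}}}
  &\sim \frac{10\ \mathrm{d}}{1\ \mathrm{ms}}
   = \frac{8.64\times10^{5}\ \mathrm{s}}{10^{-3}\ \mathrm{s}}
   \approx 8.6\times10^{8} \sim 10^{9}, \\
\Delta t &= \frac{D_{\Delta t}}{c}\,\Delta\phi,
  \qquad D_{\Delta t} \equiv (1+z_l)\,\frac{D_l\,D_s}{D_{ls}} \propto H_0^{-1}.
\end{align*}
```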

    Novel enzyme-fermentation process for bioconversion of restaurant food waste into isomaltooligosaccharide- and L-lactic acid-enriched animal feed

    Introduction: Given its valuable organic fraction, restaurant food waste (RFW) has attracted increasing attention as an alternative substrate for animal feed production. In this work, a new enzyme-fermentation process (EFP) for converting RFW into synbiotic animal feed was developed, and its economic and environmental benefits were evaluated.
    Methods: The process began with enzymatic hydrolysis of RFW to convert starch into isomaltooligosaccharides (IMOs) via simultaneous saccharification and transglycosylation (SST). The hydrolysate was then fermented with engineered Pichia pastoris GSL to form L-lactic acid (L-LA) from the free glucose and to biologically enhance the nutritional value.
    Results and discussion: The EFP yielded the highest IMO levels, 17.10–38.00 g/L. The process also achieved the maximum L-LA concentration (20.75–27.16 g/L), with a conversion efficiency of 0.64–0.78 g/g. In addition, 5.0–8.5 g/L of yeast biomass was generated. Economic estimates indicated that the cost of RFW-derived animal feed produced through the EFP was about $0.16/kg, a substantial cost reduction (≥ 70%) compared with traditional feeds. Complete conversion of RFW into animal feed with no residual waste highlights the significant environmental benefits of the present technology and its compatibility with the zero-waste concept.
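    Two of the reported figures can be put in context with a short worked calculation. Assuming the 0.64–0.78 g/g efficiency is grams of L-LA per gram of glucose consumed (the abstract does not state the basis), homolactic stoichiometry gives a theoretical maximum of 1.00 g/g, so the process reaches roughly 64–78% of theoretical. Likewise, a reduction of at least 70% to $0.16/kg implies, under the assumption that both costs are per kilogram of feed, that the traditional feeds compared cost at least about $0.53/kg.

```latex
\begin{align*}
\mathrm{C_6H_{12}O_6} &\rightarrow 2\,\mathrm{C_3H_6O_3},
  \qquad Y_{\max} = \frac{2 \times 90.08\ \mathrm{g\,mol^{-1}}}{180.16\ \mathrm{g\,mol^{-1}}} = 1.00\ \mathrm{g/g}, \\
\frac{Y}{Y_{\max}} &= \frac{0.64\text{--}0.78\ \mathrm{g/g}}{1.00\ \mathrm{g/g}} = 64\text{--}78\%, \\
c_{\mathrm{traditional}} &\ge \frac{\$0.16/\mathrm{kg}}{1 - 0.70} \approx \$0.53/\mathrm{kg}.
\end{align*}
```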

    Developing an online academic writing tutorial for non-native English speaking international graduate students in diverse programs of studies

    This presentation introduces a research/practice project that aims to help non-native English-speaking international graduate students improve their academic literature review writing through a series of extracurricular online tutorials. The tutorials are delivered via Moodle (an open-source learning management system) and supported by H5P interactive content. To support learners’ academic discourse socialization, we have added interactive elements such as peer review, collaborative writing, and instructor feedback to the main writing tasks in the tutorials. The participants come from two Canadian universities and represent six different first languages and four disciplines. We report preliminary results on the following research questions: what challenges the participants face in academic writing, and what learning experiences they gain from the tutorial (their genre awareness, reflections on collaborative writing practices, the tutorial materials, etc.).

    Interfacial Properties of Bilayer and Trilayer Graphene on Metal Substrates

    One popular approach to preparing graphene is to grow it on transition metal substrates via chemical vapor deposition. Using density functional theory with dispersion correction, we systematically investigate, for the first time, the interfacial properties of bilayer (BLG) and trilayer graphene (TLG) on metal substrates. Three categories of interfacial structures are revealed. The adsorption of B(T)LG on Al, Ag, Cu, Au, and Pt substrates is weak physisorption, but a band gap can still be opened. The adsorption of B(T)LG on Ti, Ni, and Co substrates is strong chemisorption, and a stacking-insensitive band gap is opened for the two uncontacted layers of TLG. The adsorption of B(T)LG on the Pd substrate is weaker chemisorption, with a band gap opened for the uncontacted layers. Because graphene/metal contacts are unavoidable in devices, this fundamental study is also relevant to B(T)LG device research.
    Comment: 1 table, 8 figures

    LucidDraw: Efficiently visualizing complex biochemical networks within MATLAB

    Background: Biochemical networks play an essential role in systems biology. Rapidly growing network data and diverse research activities call for convenient visualization tools that help users intuitively perceive the abstract structure of networks and gain insight into their functional implications. Various kinds of network visualization software exist, but they are usually inadequate for visual analysis of complex biological networks, mainly for two reasons: 1) most existing drawing methods suitable for biochemical networks have high computational loads and can hardly achieve near real-time visualization; 2) available network visualization tools are designed to work within particular network modeling platforms, so they are inconvenient for general analyses because they lack a broad range of readily accessible numerical utilities.
    Results: We present LucidDraw, a visual analysis tool with the following features: (a) speed: typical biological networks with several hundred nodes can be drawn in a few seconds using a new layout algorithm; (b) ease of use: working within MATLAB makes it convenient to manipulate and analyze network data with a broad spectrum of sophisticated numerical functions; (c) flexibility: users can control layout styles and the incorporation of other available information about functional modules with little effort, and the output drawings can be modified interactively.
    Conclusions: Equipped with the new grid layout algorithm proposed here, LucidDraw serves as an auxiliary network analysis tool capable of visualizing complex biological networks in near real time with controllable layout styles and drawing details. The algorithm's framework makes it easy to incorporate extra biological information, if available, to influence the output layouts through predefined node grouping features.
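    The abstract names a "grid layout algorithm" without describing it, so the following is only a generic illustration of the idea, not LucidDraw's method (which runs in MATLAB). This hypothetical Python sketch places each node on a distinct integer grid cell, visiting high-degree nodes first and putting each node in the free cell closest to the centroid of its already-placed neighbours, so that connected nodes end up near one another.

```python
import math

def greedy_grid_layout(adjacency, order=None):
    """Hypothetical greedy grid layout (illustration only, not LucidDraw's algorithm).

    adjacency: dict mapping node -> list of neighbour nodes (undirected graph).
    Returns a dict mapping node -> (row, col) on a square integer grid.
    """
    if order is None:
        # Place high-degree nodes (hubs) first so they claim central cells.
        order = sorted(adjacency, key=lambda n: -len(adjacency[n]))
    side = math.ceil(math.sqrt(len(adjacency))) + 2  # square grid with some slack
    free = {(r, c) for r in range(side) for c in range(side)}
    pos = {}
    for node in order:
        placed = [pos[m] for m in adjacency[node] if m in pos]
        if placed:
            # Aim for the centroid of neighbours that already have a cell.
            tr = sum(p[0] for p in placed) / len(placed)
            tc = sum(p[1] for p in placed) / len(placed)
        else:
            tr = tc = (side - 1) / 2  # no placed neighbours: aim for the centre
        # Nearest free cell by squared distance; ties broken by (row, col).
        best = min(free, key=lambda cell: ((cell[0] - tr) ** 2 + (cell[1] - tc) ** 2, cell))
        pos[node] = best
        free.remove(best)
    return pos

if __name__ == "__main__":
    toy = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
    print(greedy_grid_layout(toy))
```

    Each placement scans the free cells, so this sketch costs roughly O(n²) and is meant for clarity rather than the near real-time performance the abstract reports; grouping by functional module could be added by restricting each group to its own block of cells.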