Random Text Perturbations Work, but not Always
We present three large-scale experiments on a binary text matching
classification task in both Chinese and English to evaluate the effectiveness
and generalizability of random text perturbations as a data augmentation
approach for NLP. It is found that the augmentation can bring both negative and
positive effects to the test set performance of three neural classification
models, depending on whether the models train on enough original training
examples. This remains true no matter whether five random text editing
operations, used to augment text, are applied together or separately. Our study
strongly suggests that the effectiveness of random text
perturbations is task-specific and not generally positive. Comment: 7 pages; 8 tables; 3 figures
Probabilistic Linguistic Knowledge and Token-level Text Augmentation
This paper investigates the effectiveness of token-level text augmentation
and the role of probabilistic linguistic knowledge within a
linguistically-motivated evaluation context. Two text augmentation programs,
REDA and REDA$_{NG}$, were developed, both implementing five token-level text
editing operations: Synonym Replacement (SR), Random Swap (RS), Random
Insertion (RI), Random Deletion (RD), and Random Mix (RM). REDA$_{NG}$
leverages pretrained $n$-gram language models to select the most likely
augmented texts from REDA's output. Comprehensive and fine-grained experiments
were conducted on a binary question matching classification task in both
Chinese and English. The results strongly refute the general effectiveness of
the five token-level text augmentation techniques under investigation, whether
applied together or separately, and irrespective of various common
classification model types used, including transformers. Furthermore, the role
of probabilistic linguistic knowledge is found to be minimal. Comment: 20 pages; 3 figures; 8 tables
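The abstract names the token-level editing operations but does not show them. The following is a minimal sketch of four of them (SR, RS, RI, RD), assuming whitespace-tokenized input and a toy synonym table. The function names, the `SYNONYMS` table, and the parameters `n` and `p` are hypothetical choices for illustration, not the REDA implementation.

```python
import random

# Toy synonym table: an assumption for illustration, not from the paper.
SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "cheerful"]}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_swap(tokens, n, rng):
    """Swap the tokens at two distinct random positions, n times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_insertion(tokens, n, rng):
    """Insert a synonym of a random known token at a random position, n times."""
    out = list(tokens)
    for _ in range(n):
        known = [t for t in out if t in SYNONYMS]
        if not known:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randrange(len(out) + 1), synonym)
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the quick dog is happy".split()
    print(synonym_replacement(sentence, 1, rng))
    print(random_swap(sentence, 1, rng))
    print(random_insertion(sentence, 1, rng))
    print(random_deletion(sentence, 0.2, rng))
```

Passing an explicit `random.Random` instance keeps the perturbations reproducible, which matters when comparing augmented and non-augmented training runs.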
Self Contrastive Learning for Session-based Recommendation
Session-based recommendation, which aims to predict the next item of interest
to a user from an existing sequence of item interactions, has seen growing
use of Contrastive Learning (CL) to improve user and item
representations. However, these contrastive objectives: (1) serve a similar
role as the cross-entropy loss while ignoring the item representation space
optimisation; and (2) commonly require complicated modelling, including complex
positive/negative sample constructions and extra data augmentation. In this
work, we introduce Self-Contrastive Learning (SCL), which simplifies the
application of CL and enhances the performance of state-of-the-art CL-based
recommendation techniques. Specifically, SCL is formulated as an objective
function that directly promotes a uniform distribution among item
representations and efficiently replaces all the existing contrastive objective
components of state-of-the-art models. Unlike previous works, SCL eliminates
the need for any positive/negative sample construction or data augmentation,
leading to enhanced interpretability of the item representation space and
facilitating its extensibility to existing recommender systems. Through
experiments on three benchmark datasets, we demonstrate that SCL consistently
improves the performance of state-of-the-art models with statistical
significance. Notably, our experiments show that SCL improves the performance
of two best-performing models by 8.2% and 9.5% in P@10 (Precision) and 9.9% and
11.2% in MRR@10 (Mean Reciprocal Rank) on average across different benchmarks.
Additionally, our analysis elucidates the improvement in terms of alignment and
uniformity of representations, as well as the effectiveness of SCL with a low
computational cost. Comment: Technical Report
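The abstract describes SCL only at a high level, as an objective that promotes a uniform distribution among item representations without positive/negative sample construction. The sketch below shows one objective with that property: each normalized item embedding is contrasted against all item embeddings, so the loss falls as items spread apart. This is an assumption for illustration, not the paper's formulation; `l2_normalize`, `self_contrastive_loss`, and the `temperature` parameter are hypothetical names.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def self_contrastive_loss(items, temperature=0.1):
    """Mean negative log-probability that each item matches itself.

    Similarities are cosine similarities divided by the temperature, and the
    softmax runs over all items, so every other item acts as a negative.
    """
    normalized = [l2_normalize(v) for v in items]
    total = 0.0
    for i, zi in enumerate(normalized):
        logits = [
            sum(a * b for a, b in zip(zi, zj)) / temperature
            for zj in normalized
        ]
        # Numerically stable log-sum-exp over all items.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(normalized)


if __name__ == "__main__":
    clustered = [[1.0, 0.0], [1.0, 0.01], [1.0, -0.01]]
    spread = [[1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866]]
    print(self_contrastive_loss(clustered))
    print(self_contrastive_loss(spread))
```

In this sketch, embeddings that are nearly identical give a loss close to the logarithm of the number of items, while well-separated embeddings give a loss close to zero.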
Strongly lensed repeating Fast Radio Bursts as precision probes of the universe
Fast Radio bursts (FRBs), bright transients with millisecond durations at
GHz and typical redshifts probably , are likely to be
gravitationally lensed by intervening galaxies. Since the time delay between
images of a strongly lensed FRB can be measured to extremely high precision
because of the large ratio between the typical galaxy-lensing delay
time (10 days) and the width of bursts (ms),
we propose strongly lensed FRBs as precision probes of the universe. We show
that, within the flat ΛCDM model, the Hubble constant can be
constrained with a uncertainty from 10 such systems probably
observed with the Square Kilometer Array (SKA) in 30 years. More
importantly, the cosmic curvature can be model-independently constrained to a
precision of . This constraint can directly test the validity of the
cosmological principle and break the intractable degeneracy between the cosmic
curvature and dark energy. Comment: 8 pages, 6 figures
Novel enzyme-fermentation process for bioconversion of restaurant food waste into isomaltooligosaccharide-and L-lactic acid-enriched animal feed
Introduction: Given its valuable organic fraction, restaurant food waste (RFW) has attracted growing attention as an alternative substrate for animal feed production. In this work, a new enzyme-fermentation process (EFP) for diverting RFW into synbiotic animal feed was developed, and its economic and environmental benefits were evaluated. Methods: The process began with enzymatic hydrolysis of RFW, intended to convert starch into isomaltooligosaccharides (IMOs) via simultaneous saccharification and transglycosylation (SST). The hydrolysate was then fermented with engineered Pichia pastoris GSL to form L-lactic acid (L-LA) from the free glucose and to biologically enhance the nutritional value. Results and discussion: The EFP yielded the highest IMO levels, ranging from 17.10 to 38.00 g/L. The process also achieved the maximum L-LA concentration (20.75–27.16 g/L), with a conversion efficiency of 0.64–0.78 g/g. Additionally, 5.0–8.5 g/L of yeast biomass was generated. Economic estimates indicated that the cost of RFW-derived animal feed produced through EFP was about $0.16/kg, a substantial cost reduction (≥ 70%) compared to traditional feeds. Complete conversion of RFW into animal feed, with no residual waste, highlights the significant environmental benefits of the technology and its compatibility with the zero-waste concept.
Developing an online academic writing tutorial for non-native English speaking international graduate students in diverse programs of studies
This presentation introduces a research/practice project that aims to help non-native English speaking international graduate students improve academic literature review writing through a series of extracurricular online tutorials. The presentation introduces the tutorial, which is delivered via Moodle (an open-source learning management system) and supported by h5p interactive content. To support learners’ academic discourse socialization process, we have added interactive elements such as peer review, collaborative writing, and instructor feedback to the main writing tasks in the tutorials. The participants come from two Canadian universities and represent six different first languages and four disciplines. We report the preliminary results regarding the following research questions: what the participants’ challenges in academic writing are; and what kind of learning experiences they are getting from the tutorial (their genre awareness, reflection on collaborative writing practices, tutorial materials, etc.).
Interfacial Properties of Bilayer and Trilayer Graphene on Metal Substrates
One popular approach to preparing graphene is to grow it on transition metal
substrates via chemical vapor deposition. By using the density functional
theory with dispersion correction, we systematically investigate for the first
time the interfacial properties of bilayer (BLG) and trilayer graphene (TLG) on
metal substrates. Three categories of interfacial structures are revealed. The
adsorption of B(T)LG on Al, Ag, Cu, Au, and Pt substrates is a weak
physisorption, but a band gap can be opened. The adsorption of B(T)LG on Ti,
Ni, and Co substrates is a strong chemisorption, and a stacking-insensitive
band gap is opened for the two uncontacted layers of TLG. The adsorption of
B(T)LG on Pd substrate is a weaker chemisorption, with a band gap opened for
the uncontacted layers. This fundamental study also informs B(T)LG device
research, because graphene/metal contact is inevitable. Comment: 1 table, 8 figures
LucidDraw: Efficiently visualizing complex biochemical networks within MATLAB
Background: Biochemical networks play an essential role in systems biology. Rapidly growing network data and versatile research activities call for convenient visualization tools that aid intuitive perception of the abstract structures of networks and give insight into their functional implications. There are various kinds of network visualization software, but they are usually not adequate for visual analysis of complex biological networks, mainly for two reasons: 1) most existing drawing methods suitable for biochemical networks have high computation loads and can hardly achieve near real-time visualization; 2) available network visualization tools are designed to work within certain network modeling platforms, so they are not convenient for general analyses because they lack a broader range of readily accessible numerical utilities. Results: We present LucidDraw as a visual analysis tool, which features (a) speed: typical biological networks with several hundreds of nodes can be drawn in a few seconds through a new layout algorithm; (b) ease of use: working within MATLAB makes it convenient to manipulate and analyze the network data using a broad spectrum of sophisticated numerical functions; (c) flexibility: layout styles and incorporation of other available information about functional modules can be controlled by users with little effort, and the output drawings are interactively modifiable. Conclusions: Equipped with a new grid layout algorithm proposed here, LucidDraw serves as an auxiliary network analysis tool capable of visualizing complex biological networks in near real-time with controllable layout styles and drawing details. The framework of the algorithm enables easy incorporation of extra biological information, if available, to influence the output layouts with predefined node grouping features.