Token Alignment via Character Matching for Subword Completion
Generative models, widely used in many applications, often struggle with prompts that end in partial tokens. This struggle stems from tokenization: partial tokens fall out of distribution during inference, leading to incorrect or nonsensical outputs. This paper examines a technique that alleviates this tokenization artifact in text completion while maintaining performance in regular, non-subword cases. The method, termed token alignment, backtracks to the last complete tokens and ensures that the model's generation aligns with the prompt. This approach shows marked improvement across many partial-token scenarios, including nuanced cases such as space prefixes and partial indentation, with only a minor increase in run time. The technique and analysis detailed in this paper contribute to the continued advancement of generative models in handling partial inputs, with relevance to applications such as code completion and text autocompletion.
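The backtrack-and-align idea can be illustrated with a toy greedy decoder. This is a minimal sketch, not the paper's implementation: it assumes a fixed number of backtracked tokens (`backtrack`), a hypothetical eight-entry vocabulary, and a hypothetical bigram-style scorer standing in for a real model's logits. The characters of the removed tokens become a constraint, and at each step only tokens that are either a prefix of the still-unmatched text or cover all of it are allowed.

```python
from typing import Callable, List, Sequence

def allowed_tokens(vocab: Sequence[str], remaining: str) -> List[int]:
    """Indices of tokens consistent with the still-unmatched prompt text.

    A token is allowed if it covers all of `remaining` (it starts with it)
    or if it is itself a prefix of `remaining`.
    """
    if not remaining:
        return list(range(len(vocab)))
    return [i for i, tok in enumerate(vocab)
            if tok and (tok.startswith(remaining) or remaining.startswith(tok))]

def aligned_generate(prompt_ids: List[int], vocab: Sequence[str],
                     score_fn: Callable[[List[int]], List[float]],
                     backtrack: int = 1, max_new: int = 4) -> List[int]:
    """Greedy decoding with token alignment (toy sketch)."""
    cut = len(prompt_ids) - backtrack
    out = list(prompt_ids[:cut])
    # Characters of the removed tokens become a constraint on generation.
    remaining = "".join(vocab[i] for i in prompt_ids[cut:])
    for _ in range(max_new):
        candidates = allowed_tokens(vocab, remaining)  # empty list -> max() raises
        scores = score_fn(out)
        best = max(candidates, key=lambda i: scores[i])
        out.append(best)
        tok = vocab[best]
        # Consume matched characters; a token covering the rest ends the constraint.
        remaining = remaining[len(tok):] if remaining.startswith(tok) else ""
    return out

# Hypothetical toy vocabulary and bigram-style scorer standing in for a real model.
VOCAB = ["def", " ", "pr", "print", "(", ")", "p", "x"]
PREFS = {" ": {"print": 2.0, "x": 1.0}, "print": {"(": 3.0},
         "(": {"x": 2.0}, "x": {")": 2.0}}

def toy_scores(ids: List[int]) -> List[float]:
    table = PREFS.get(VOCAB[ids[-1]], {}) if ids else {}
    return [table.get(tok, 0.0) for tok in VOCAB]
```

With the prompt "def pr" tokenized as `[0, 1, 2]`, `aligned_generate(..., backtrack=1)` backs off the partial token "pr" and completes to "def print(x)". With `backtrack=0` the toy scorer has no continuation for "pr" and produces "def prdef", a small stand-in for the out-of-distribution behavior the abstract describes.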
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents covering a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server, located at https://relishdb.ict.griffith.edu.au, is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title- and title/abstract-based search engines for relevant articles in biomedical research.
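One of the three baselines named above, Okapi BM25, ranks candidate documents against a seed text. The sketch below shows the standard BM25 scoring formula; the parameters `k1=1.5` and `b=0.75`, the "+1" inside the IDF (a common Lucene-style variant that keeps IDF non-negative), and the lowercase alphanumeric tokenizer are assumptions for illustration, not the settings used to build the benchmark.

```python
import math
import re
from collections import Counter
from typing import List

def tokenize(text: str) -> List[str]:
    """Very simple tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores
```

For example, scoring the seed "tokenization of partial tokens in code completion" against three short titles ranks "code completion with partial tokens" first; a title sharing only the word "in" gets a small positive score, and a title with no shared terms scores zero.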
Numerical Simulation and Hydrodynamic Performance Predicting of 2 Two-Dimensional Hydrofoils in Tandem Configuration
In this study, we investigated the performance of NACA 0012 hydrofoils aligned in tandem using a parametric method and neural networks. We used a 2D viscous numerical model (STAR-CCM+) to simulate the hydrofoil system. To validate the numerical model, we modeled a single NACA 0012 configuration and compared it with experimental results; the results agree with the published experimental data. Two NACA 0012 hydrofoils in tandem configuration were then studied over 788 combinations of the following parameters: spacing between the two hydrofoils, angle of attack (AOA) of the upstream hydrofoil and AOA of the downstream hydrofoil. The effects of these three parameters on the hydrodynamic coefficients, namely the lift coefficient (CL), drag coefficient (CD) and lift-drag ratio (LDR), are consistent with the behavior of the system. Establishing a control system for the hydrofoil craft requires timely analysis of the hydrodynamic system, but simulating a large number of combinations of the three parameters is time-consuming and limited by computational resources. To provide a broader and faster way to predict the hydrodynamic performance of two hydrofoils in tandem configuration, an optimal artificial neural network (ANN) was trained on the large set of parameter combinations generated from the numerical simulations. Regression analysis of the ANN output was performed, and the results are consistent with the numerical simulations, with a correlation coefficient greater than 99.99%. An optimal spacing of 6.6c is suggested, at which the system has the lowest CD while obtaining the highest CL and LDR. The ANN formula is then presented, providing a reliable method for predicting the performance of hydrofoils in tandem configuration.
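Two quantities in this abstract reduce to short formulas: the lift-drag ratio is LDR = CL / CD, and the regression check compares ANN predictions with CFD values through a correlation coefficient. The sketch below assumes that coefficient is Pearson's r; the function names and sample values are hypothetical and not taken from the study.

```python
import math
from typing import Sequence

def lift_drag_ratio(cl: float, cd: float) -> float:
    """LDR = CL / CD."""
    return cl / cd

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between, e.g., CFD targets and ANN predictions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A reported value "greater than 99.99%" corresponds to r > 0.9999 under this assumption.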
Mediastinal high-grade vasculogenic mesenchymal tumour with seminoma: a case report and literature review
Germ cell tumours with somatic-type solid malignancy (GCT-STM) are a rare disease of the mediastinum. Recently, a cohort of vasculogenic mesenchymal tumour (VMT)-nonseminoma cases with different prognoses was recognized and reported. Here, we report a case of mediastinal high-grade VMT with a seminoma. A 16-year-old male presented with fever, chest tightness and fatigue. Chest CT showed a 7.5 cm × 5.3 cm solid mass in the right anterior mediastinum. The serum levels of alpha-fetoprotein (AFP), beta-human chorionic gonadotropin (β-HCG) and carcinoembryonic antigen (CEA) were within the normal range. Tumorectomy was performed. The tumour was irregular and unencapsulated. The cut surface was greyish white and greyish brown with medium consistency, with foci of bleeding and necrosis. Microscopic histology showed prominent vascular proliferation lined by mildly atypical endothelial cells, set in a cellular stroma with significant cytologic atypia. The vessels ranged from crevice-like or antler-like thin-walled vessels to thick-walled vessels. Beyond the tumour area, inside the remnant thymus tissue, there were small clusters of polygonal tumour cells with clear cytoplasm, distinct cell membranes, and round to polygonal nuclei with prominent nucleoli; these cells were positive for Oct4, PLAP, SALL4 and CD117. The patient did not receive any treatment before or after surgery, and his condition was stable without progression after 14 months of follow-up. Here, we add a new entity of GCT-STM of the mediastinum composed of VMT and seminoma. A better understanding of the pathological features of GCT-VMT could help pathologists improve their awareness of these rare diseases.
Study on Kinetics of Carbonization Reaction of Hardened Cement Paste Powder Based on Carbonization Degree
Hardened cement paste (HCP) powder, which has lost its hydration cementing property, can be regenerated and cemented into a test block with a practical strength of almost 60 MPa via CO2 carbonization using appropriate means. This study established a kinetic model of CO2 curing of an HCP powder test block based on the degree of carbonization to characterize the carbonization reaction kinetics of the test block. The model was modified to account for the evident temperature differences in the reaction kettle during the early, middle, and late stages of the carbonization process. The proposed model can be used to formulate and control the carbonization and cementation processes of HCP powder and can also be applied to describe the reaction kinetics of other similar systems.
Mechanism of Speed Loss Reduction and Propulsion Efficiency Improvement of ONR Tumblehome with Active-Controlled Stern Flaps in Resonance Waves
The stern flap is a practical hull appendage that enhances ship navigation performance and saves energy. Existing studies mainly focus on fixed stern flaps rather than actively controlled ones, so the effect and mechanism of active control merit further exploration. By integrating a PID controller into the stern flap, this paper proposes a free-running CFD model of the ONRT (Office of Naval Research Tumblehome) ship coupled with an actively controlled stern flap to investigate its hydrodynamic performance in resonance waves. The free-running performance in calm water and regular waves is simulated numerically and validated against experimental and reference results. Then, the effects of different PID coefficients and stern flap control strategies on traveling speed, attitudes, and propulsion performance under the resonance wave condition are investigated, and the underlying mechanism is explored. The results show that adopting a fixed flap controller and a PID controller reduces the original speed loss by 4.2% and 6.9%, respectively, and increases the average propulsive efficiency of the propeller by 1.0% and 1.4%, respectively. Further analysis reveals that the global effect of the suppressed motion attitudes due to installation of the fixed flap contributes effectively to resistance reduction. However, the local effect of the stern flap increases resistance due to its interaction with the propeller and stern. The PID-controlled stern flap exhibits average attitudes similar to those of the fixed one, meaning the resistance reduction from the global effect is preserved, while the active stern flap further improves the stern flow field and weakens the resistance increment from the local effect, thereby increasing traveling speed and improving propulsion efficiency.
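The active flap is driven by a PID controller. The abstract does not give the feedback signal, gains, time step or flap-angle limits, so the sketch below is a generic discrete PID under stated assumptions: the measurement is taken to be a motion signal such as pitch angle with a zero setpoint, the output is clamped to hypothetical flap-angle limits, and all names and values are illustrative rather than the paper's.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PID:
    """Discrete PID controller with output clamping (no anti-windup)."""
    kp: float
    ki: float
    kd: float
    dt: float
    out_min: float = float("-inf")   # e.g. hypothetical minimum flap angle
    out_max: float = float("inf")    # e.g. hypothetical maximum flap angle
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt  # rectangular integration
        # No derivative on the first call, since there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(u, self.out_min), self.out_max)
```

In a coupled CFD loop, `update` would be called once per control step with the latest motion sample, and its return value applied as the flap deflection for the next step. A real controller would usually add integral anti-windup when the output saturates; it is omitted here to keep the three terms easy to read.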
First-Break Picking of Large-Offset Seismic Data Based on CNNs with Weighted Data
Deep reflection seismic data are usually accompanied by large-offset data, and the accurate and rapid identification of first arrivals in seismic records plays an important role in eliminating the effects of topography and other factors that grow with increasing offset. In this paper, we propose a method based on convolutional neural networks (CNNs) that can accurately identify the first arrivals of large-offset seismic data. A time window based on linear dynamic correction was established to convert the raw seismic data into rectangular images, reducing the amount of invalid sample data and improving training efficiency. To enhance prediction of far-offset first arrivals, we propose increasing the weight of far-offset data in the training dataset, thereby improving first-arrival accuracy. The manually picked first arrivals are used as labels and input to the CNNs for training, and the output is the first arrivals over the full offset range. Travel-time tomography velocity models are built and compared using first arrivals obtained through manual picking, automatic picking with industrial software, and CNN prediction. The results show that applying CNNs to large-offset seismic datasets can help researchers obtain first arrivals at different offsets, that including far-offset weights effectively improves the modeling depth of the tomographic inversion, and that the results are highly accurate.
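The two preprocessing ideas, a moveout-based time window that turns every trace into a fixed-length column and extra weight on far offsets, can be sketched as follows. This is an illustration under assumptions, not the authors' code: the reduction velocity `v_red_mps`, the lead time `lead_s`, the window length, and the linear weight ramp (`far_fraction`, `far_gain`) are all hypothetical choices. The weight could be applied per trace in a weighted loss or used to oversample far-offset traces; the abstract states only that far-offset data receive more weight.

```python
from typing import List, Sequence, Tuple

def lmo_window(trace: Sequence[float], offset_m: float, dt_s: float,
               v_red_mps: float, lead_s: float, n_samples: int) -> Tuple[int, List[float]]:
    """Cut a fixed-length window starting `lead_s` before the linear-moveout time |x| / v.

    Returns the start sample index and the window; zero-pads past the trace end.
    """
    t_start = abs(offset_m) / v_red_mps - lead_s
    start = max(0, int(round(t_start / dt_s)))
    window = [float(a) for a in trace[start:start + n_samples]]
    window += [0.0] * (n_samples - len(window))
    return start, window

def pick_time_s(start: int, window_index: int, dt_s: float) -> float:
    """Map a sample index predicted inside the window back to absolute time."""
    return (start + window_index) * dt_s

def offset_weight(offset_m: float, max_offset_m: float,
                  far_fraction: float = 0.5, far_gain: float = 2.0) -> float:
    """Hypothetical weight: 1.0 up to far_fraction of max offset, then a linear ramp to far_gain."""
    x = abs(offset_m) / max_offset_m
    if x <= far_fraction:
        return 1.0
    return 1.0 + (far_gain - 1.0) * (x - far_fraction) / (1.0 - far_fraction)
```

Stacking the windows of all traces side by side gives the rectangular image fed to the network, and `pick_time_s` converts a pick made inside a window back to the original time axis.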
Research on ecological restoration zoning from the hydro-ecological perspective: A case study of Yanqing District, Beijing
To prevent the degradation and destruction of ecosystems, ecological restoration has become a worldwide issue. In recent years, China has been comprehensively carrying out national spatial ecological restoration planning, within which ecological restoration zoning can provide strong support for implementing ecological restoration strategies. We propose a theoretical framework and technical route for ecological restoration zoning from the hydro-ecological perspective, and suggest forming a regional ecological restoration zoning system that takes the watershed as the basic unit and the dynamic evolution of hydro-ecology as its core. We conduct an empirical study in Yanqing District, Beijing. The results show that: (i) From 2000 to 2022, the hydro-ecological comprehensive index of Yanqing District shows an upward trend. In terms of spatial pattern, it roughly rises in the northeastern and southern mountainous areas and declines in the central-western and southwestern basins. (ii) Six types of ecological restoration zones are formed with the watershed as the basic unit. Zone Ⅰ accounts for the largest area (26.87%), and Zone Ⅵ accounts for the smallest proportion (7.14%). (iii) The proportion of areas with good or superior hydro-ecology in the Ecological Policy Zone of Yanqing District increased by 15.89% from 2000 to 2022, which is significantly higher than the average level, confirming the effectiveness of ecological policies such as nature reserves. This study provides a paradigm for ecological restoration zoning from the hydro-ecological perspective and supports regional ecological restoration and sustainable development in China and globally.