103 research outputs found

    Optimal Algorithms for Crawling a Hidden Database in the Web

    A hidden database refers to a dataset that an organization makes accessible on the web by allowing users to issue queries through a search interface. In other words, data are not acquired from such a source by following static hyperlinks. Instead, data are obtained by querying the interface and reading the dynamically generated result page. This, together with other facts, such as that the interface may answer a query only partially, has prevented hidden databases from being crawled effectively by existing search engines. This paper remedies the problem by giving algorithms to extract all the tuples from a hidden database. Our algorithms are provably efficient; namely, they accomplish the task by performing only a small number of queries, even in the worst case. We also establish theoretical results indicating that these algorithms are asymptotically optimal, i.e., it is impossible to improve their efficiency by more than a constant factor. The derivation of our upper and lower bound results reveals significant insight into the characteristics of the underlying problem. Extensive experiments confirm that the proposed techniques work very well on all the real datasets examined. Comment: VLDB201

    Ownership Concentration, Financial Leverage and Inefficient Investment-evidence from Chinese A-share Market

    This paper analyzes data from the Chinese A-share market over the two years from 2014 to 2015. Based on 2297 listed firms, we use theoretical and empirical analysis to explore and validate the relationship between ownership concentration, financial leverage, and a company's inefficient investment behavior. The results show that, in the Chinese A-share market, financial leverage can effectively inhibit a company's inefficient investment behavior, and that the concentration of equity also effectively inhibits it.

    Evaluating Large Language Models on Controlled Generation Tasks

    While recent studies have looked into the abilities of large language models on various benchmark tasks, including question generation, reading comprehension, and multilingual tasks, few studies have looked into the controllability of large language models on generation tasks. We present an extensive analysis of various benchmarks, including a sentence planning benchmark with different granularities. After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the ability of smaller models. We conclude that **large language models struggle at meeting fine-grained hard constraints**. Comment: EMNLP 202

    Global analysis of N6-methyladenosine functions and its disease association using deep learning and network-based methods

    N6-methyladenosine (m6A) is the most abundant methylation, existing in >25% of human mRNAs. Exciting recent discoveries indicate the close involvement of m6A in regulating many different aspects of mRNA metabolism and in diseases like cancer. However, our current knowledge about how m6A levels are controlled, and whether and how regulation of the m6A levels of a specific gene can play a role in cancer and other diseases, is mostly elusive. We propose in this paper a computational scheme for predicting m6A-regulated genes and m6A-associated diseases, which includes Deep-m6A, the first model for detecting condition-specific m6A sites from MeRIP-Seq data at single-base resolution using deep learning, and Hot-m6A, a new network-based pipeline that prioritizes functionally significant m6A genes and their associated diseases using Protein-Protein Interaction (PPI) and gene-disease heterogeneous networks. We applied Deep-m6A and this pipeline to 75 MeRIP-Seq human samples, which produced a compact set of 709 functionally significant m6A-regulated genes and nine functionally enriched subnetworks. The functional enrichment analysis of these genes and networks reveals that m6A targets key genes of many critical biological processes, including transcription, cell organization and transport, and cell proliferation, as well as cancer-related pathways such as the Wnt pathway. The m6A-associated disease analysis prioritized five significantly associated diseases, including leukemia and renal cell carcinoma. These results demonstrate the power of our proposed computational scheme and provide new leads for understanding m6A regulatory functions and its roles in diseases.

    Comparison of a solvent mixture assisted dilute acid and alkali pretreatment in sugar production from hybrid Pennisetum

    The effects of an acetone-butanol-ethanol (ABE) mixture on dilute H2SO4 and NaOH pretreatment for enzymatic saccharification of hybrid Pennisetum (HP) were investigated. The results showed that ABE assisted the removal of xylan and lignin during H2SO4 and NaOH pretreatment, respectively. The glucose yield of HP increased from 33.6% to 52.9% with the assistance of a relatively higher concentration of ABE mixture (ABE4) during H2SO4 pretreatment, and during NaOH pretreatment, a lower concentration of ABE (ABE2) increased the glucose yield from 64.6% to 80.2%. The hydrolysis yield increases were related to the compositional change and surface characteristics of the pretreated materials. As observed by X-ray photoelectron spectroscopy, ABE4 resulted in a greater lignin content on the surface of materials than that produced by ABE2 during NaOH pretreatment, which possibly increased the non-productive adsorption of cellulase, thus decreasing the hydrolysis yield. The results suggested that an ABE mixture could be used as an auxiliary agent for further increasing the digestibility of acid- and alkali-pretreated lignocellulosic materials. However, the digestibility differed depending on the concentration of ABE during acid and alkali pretreatments.

    Population pharmacokinetics of Amisulpride in Chinese patients with schizophrenia with external validation: the impact of renal function

    Introduction: Amisulpride is primarily eliminated via the kidneys. Given the clear influence of renal clearance on plasma concentration, we aimed to explicitly examine the impact of renal function on amisulpride pharmacokinetics (PK) via population PK modelling and Monte Carlo simulations. Method: Plasma concentrations from 921 patients (776 in development and 145 in validation) were utilized. Results: Amisulpride PK could be described by a one-compartment model with linear elimination, where estimated glomerular filtration rate (eGFR) had a significant influence on clearance. All PK parameters (estimate, RSE%) were precisely estimated: apparent volume of distribution (645 L, 18%), apparent clearance (60.5 L/h, 2%), absorption rate constant (0.106 h⁻¹, 12%) and coefficient of renal function on clearance (0.817, 10%). No other significant covariate was found. The predictive performance of the model was externally validated. Covariate analysis showed an inverse relationship between eGFR and exposure, where subjects with eGFR = 30 mL/min/1.73 m² had a more than 2-fold increase in AUC, trough and peak concentration. Simulation results further illustrated that, given a dose of 800 mg, plasma concentrations of all patients with renal impairment would exceed 640 ng/mL. Discussion: Our work demonstrated the importance of renal function in amisulpride dose adjustment and provided a quantitative framework to guide individualized dosing for Chinese patients with schizophrenia.

    Multi-site, Multi-domain Airway Tree Modeling (ATM'22): A Public Benchmark for Pulmonary Airway Segmentation

    Open international challenges are becoming the de facto standard for assessing computer vision and image analysis algorithms. In recent years, new methods have extended the reach of pulmonary airway segmentation closer to the limit of image resolution. Since the EXACT'09 pulmonary airway segmentation challenge, limited effort has been directed to the quantitative comparison of newly emerged algorithms, which are driven by the maturity of deep learning based approaches and the clinical drive for resolving finer details of distal airways for early intervention in pulmonary diseases. Thus far, public annotated datasets are extremely limited, hindering the development of data-driven methods and detailed performance evaluation of new algorithms. To provide a benchmark for the medical imaging community, we organized the Multi-site, Multi-domain Airway Tree Modeling (ATM'22) challenge, which was held as an official challenge event during the MICCAI 2022 conference. ATM'22 provides large-scale CT scans with detailed pulmonary airway annotation, including 500 CT scans (300 for training, 50 for validation, and 150 for testing). The dataset was collected from different sites, and it further included a portion of noisy COVID-19 CTs with ground-glass opacity and consolidation. Twenty-three teams participated in the entire phase of the challenge, and the algorithms of the top ten teams are reviewed in this paper. Quantitative and qualitative results revealed that deep learning models embedded with topological continuity enhancement achieved superior performance in general. The ATM'22 challenge has an open-call design; the training data and the gold standard evaluation are available upon successful registration via its homepage. Comment: 32 pages, 16 figures. Homepage: https://atm22.grand-challenge.org/. Submitte

    Carbon dots-based dual-emission ratiometric fluorescence sensor for dopamine detection

    The detection of dopamine (DA) is significant for disease surveillance and prevention. However, the development of precise and simple detection techniques is still at a preliminary stage, owing to high tester requirements, time-consuming processes, and low accuracy. In this work, we present a novel dual-emission ratiometric fluorescence sensing system based on a hybrid of carbon dots (CDs) and 7-amino-4-methylcoumarin (AMC) to quickly monitor the DA concentration. Linked via amide bonds, the CDs and AMC offered dual emissions with peaks located at 455 and 505 nm, respectively, under a single excitation wavelength of 300 nm. Because the fluorescence of the CDs and AMC in the nanohybrid system can be quenched by DA, the concentration of DA could be quantitatively detected by monitoring the change in the ratio of the fluorescence intensities. More importantly, the CDs-AMC-based dual-emission ratiometric fluorescence sensing system demonstrated a remarkable linear relationship in the range of 0–33.6 μM for the detection of DA, and a low detection limit of 5.67 nM. Additionally, this sensor was successfully applied to the detection of DA in real samples. Therefore, the ratiometric fluorescence sensing system may find potential applications in biomedical dopamine detection.

    Potential of Core-Collapse Supernova Neutrino Detection at JUNO

    JUNO is an underground neutrino observatory under construction in Jiangmen, China. It uses 20 kton of liquid scintillator as its target, which enables it to detect supernova burst neutrinos with large statistics for the next galactic core-collapse supernova (CCSN), as well as pre-supernova neutrinos from nearby CCSN progenitors. All flavors of supernova burst neutrinos can be detected by JUNO via several interaction channels, including inverse beta decay, elastic scattering on electrons and protons, interactions on C12 nuclei, etc. This retains the possibility for JUNO to reconstruct the energy spectra of supernova burst neutrinos of all flavors. Real-time monitoring systems based on FPGA and DAQ are under development in JUNO, which allow prompt alerts and trigger-less data acquisition of CCSN events. The alert performance of both monitoring systems has been thoroughly studied using simulations. Moreover, once a CCSN is tagged, the system can give fast characterizations, such as directionality and light curve.

    Detection of the Diffuse Supernova Neutrino Background with JUNO

    As an underground multi-purpose neutrino detector with 20 kton of liquid scintillator, the Jiangmen Underground Neutrino Observatory (JUNO) is competitive with and complementary to the water-Cherenkov detectors in the search for the diffuse supernova neutrino background (DSNB). Typical supernova models predict 2-4 events per year within the optimal observation window in the JUNO detector. The dominant background is from the neutral-current (NC) interaction of atmospheric neutrinos with 12C nuclei, which surpasses the DSNB by more than one order of magnitude. We evaluated the systematic uncertainty of the NC background from the spread of a variety of data-driven models and further developed a method to determine the NC background within 15% with in situ measurements after ten years of running. In addition, the NC-like backgrounds can be effectively suppressed by the intrinsic pulse-shape discrimination (PSD) capabilities of liquid scintillators. In this talk, I will present in detail the improvements on NC background uncertainty evaluation, PSD discriminator development, and finally, the potential of DSNB sensitivity in JUNO.