    Boilerplate Removal using a Neural Sequence Labeling Model

    The extraction of main content from web pages is an important task for numerous applications, ranging from usability features, such as reader views for news articles in web browsers, to information retrieval and natural language processing. Existing approaches fall short because they rely on large numbers of hand-crafted features for classification. This results in models that are tailored to a specific distribution of web pages, e.g. from a certain time frame, but generalize poorly. We propose a neural sequence labeling model that does not rely on any hand-crafted features but takes only the HTML tags and words that appear in a web page as input. This allows us to present a browser extension that uses our model to highlight the content of arbitrary web pages directly within the browser. In addition, we create a new, more current dataset to show that our model is able to adapt to changes in the structure of web pages and to outperform the state-of-the-art model. Comment: WWW20 Demo pape

    Measuring gravitational waves from binary black hole coalescences: II. the waves' information and its extraction, with and without templates

    We discuss the extraction of information from detected binary black hole (BBH) coalescence gravitational waves, focusing on the merger phase that occurs after the gradual inspiral and before the ringdown. Our results are: (1) If numerical relativity simulations have not produced template merger waveforms before BBH detections by LIGO/VIRGO, one can band-pass filter the merger waves. For BBHs smaller than about 40 solar masses detected via their inspiral waves, the band-pass filtering signal-to-noise ratio indicates that the merger waves should typically be just barely visible in the noise for initial and advanced LIGO interferometers. (2) We derive an optimized (maximum likelihood) method for extracting a best-fit merger waveform from the noisy detector output; one "perpendicularly projects" this output onto a function space (specified using wavelets) that incorporates our prior knowledge of the waveforms. An extension of the method allows one to extract the BBH's two independent waveforms from the outputs of several interferometers. (3) If numerical relativists produce codes for generating merger templates but running the codes is too expensive to allow an extensive survey of the merger parameter space, then a coarse survey of this parameter space, to determine the ranges of the several key parameters and to explore several qualitative issues which we describe, would be useful for data analysis purposes. (4) A complete set of templates could be used to test the nonlinear dynamics of general relativity and to measure some of the binary parameters. We estimate the number of bits of information obtainable from the merger waves (about 10 to 60 for LIGO/VIRGO, up to 200 for LISA), estimate the information loss due to template numerical errors or sparseness in the template grid, and infer approximate requirements on template accuracy and spacing. Comment: 33 pages, RevTeX 3.1 macros, no figures, submitted to Phys Rev
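The "perpendicular projection" in result (2) can be illustrated with a minimal sketch. It assumes white Gaussian noise, so that the noise-weighted inner product reduces to the ordinary dot product and the maximum-likelihood estimate within a linear function space is the orthogonal projection of the data onto that space. The function names and the tiny Haar-like basis are hypothetical stand-ins for illustration only, not the wavelet space or the code used in the paper.

```python
import math


def inner(u, v):
    """Plain dot product; stands in for the noise-weighted inner
    product under the white-noise assumption stated above."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(basis, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal vectors spanning the same
    subspace as `basis` (linearly dependent vectors are dropped)."""
    ortho = []
    for b in basis:
        w = list(b)
        for q in ortho:
            c = inner(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(inner(w, w))
        if norm > tol:
            ortho.append([wi / norm for wi in w])
    return ortho


def project(data, basis):
    """Orthogonal projection of `data` onto span(basis)."""
    estimate = [0.0] * len(data)
    for q in orthonormalize(basis):
        c = inner(data, q)
        estimate = [e + c * qi for e, qi in zip(estimate, q)]
    return estimate


# Hypothetical example: four samples, two Haar-like basis vectors.
haar_like_basis = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, -1.0, -1.0]]
detector_output = [3.0, 1.0, -1.0, -3.0]
best_fit = project(detector_output, haar_like_basis)
residual = [d - e for d, e in zip(detector_output, best_fit)]
```

The residual is orthogonal to every basis vector, which is what makes the projection "perpendicular". For colored detector noise, `inner` would need to be replaced by an inner product weighted by the inverse noise spectrum.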

    DNA crosslinking and biological activity of a hairpin polyamide–chlorambucil conjugate

    A prototype of a novel class of DNA alkylating agents, which combines the DNA crosslinking moiety chlorambucil (Chl) with a sequence-selective hairpin pyrrole-imidazole polyamide ImPy-beta-ImPy-gamma-ImPy-beta-Dp (polyamide 1), was evaluated for its ability to damage DNA and induce biological responses. Polyamide 1-Chl conjugate (1-Chl) alkylates and interstrand crosslinks DNA in cell-free systems. The alkylation occurs predominantly at the 5'-AGCTGCA-3' sequence, which represents the polyamide binding site. Conjugate-induced lesions were first detected on DNA treated for 1 h with 0.1 µM 1-Chl, indicating that the conjugate is at least 100-fold more potent than Chl. Prolonged incubation allowed for DNA damage detection even at a concentration of 0.01 µM. Treatment with 1-Chl decreased DNA template activity in simian virus 40 (SV40) in vitro replication assays. 1-Chl inhibited mammalian cell growth, genomic DNA replication and cell cycle progression, and arrested cells in the G2/M phase. Moreover, cellular effects were observed at 1-Chl concentrations similar to those needed for DNA damage in cell-free systems. Neither of the parent compounds, unconjugated Chl or polyamide 1, demonstrated any cellular activity in the same concentration range. The conjugate molecule 1-Chl possesses the sequence-selectivity of a polyamide and the enhanced DNA reactivity of Chl.

    Automatic detection of change in address blocks for reply forms processing

    In this paper, an automatic method to detect the presence of on-line erasures/scribbles/corrections/over-writing in the address block of various types of subscription and utility payment forms is presented. The proposed approach employs bottom-up segmentation of the address block. Heuristic rules based on structural features are used to automate the detection process. The algorithm is applied to a large dataset of 5,780 real-world document forms scanned at a resolution of 200 dots per inch. The proposed algorithm performs well, with an average processing time of 108 milliseconds per document and a detection accuracy of 98.96%.

    MicroRNA-like RNAs from the same miRNA precursors play a role in cassava chilling responses

    MicroRNAs (miRNAs) are known to play important roles in various cellular processes and stress responses. MiRNAs can be identified by analyzing reads from high-throughput deep sequencing. Reads that align to miRNA precursors but are not canonical miRNAs were initially considered sequencing noise and excluded from further analysis. Here we report a small-RNA species of phased and half-phased miRNA-like RNAs, different from canonical miRNAs, from cassava miRNA precursors detected under four distinct chilling conditions. They can form abundant multiple small RNAs arranged along precursors in a tandem and phased or half-phased fashion. Some of these miRNA-like RNAs were experimentally confirmed by re-amplification and re-sequencing, and have a qRT-PCR detection ratio similar to that of their cognate canonical miRNAs. The target genes of those phased and half-phased miRNA-like RNAs function in the processes of cell growth and metabolism and play roles in protein kinase activity. Half-phased miR171d.3 was confirmed to have cleavage activity on its target gene P-glycoprotein 11, a broad substrate efflux pump across cellular membranes, which is thought to provide protection for tropical cassava during sharp temperature decreases. Our results showed that the RNAs from miRNA precursors are miRNA-like small RNAs that are viable negative gene regulators and may have potential functions in cassava chilling responses.