    Space-efficient Feature Maps for String Alignment Kernels

    String kernels are attractive tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracy in string classification when combined with SVMs in various applications. However, alignment kernels have a crucial drawback: they scale poorly because their computational complexity is quadratic in the number of input strings, which limits large-scale applications in practice. We address this limitation by presenting the first approximation for string alignment kernels, which we call space-efficient feature maps for edit distance with moves (SFMEDM), by leveraging a metric embedding named edit sensitive parsing (ESP) and feature maps (FMs) of random Fourier features (RFFs) for large-scale string analyses. The original FMs for RFFs consume a huge amount of memory proportional to the dimension d of input vectors and the dimension D of output vectors, which prohibits their use in large-scale applications. We present novel space-efficient feature maps (SFMs) of RFFs that reduce the space from O(dD) for the original FMs to O(d) for SFMs, with a theoretical guarantee with respect to concentration bounds. We experimentally test SFMEDM on its ability to learn SVMs for large-scale string classification with various massive string data, and we demonstrate the superior performance of SFMEDM with respect to prediction accuracy, scalability and computational efficiency. Comment: Full version for ICDM'19 pape
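
    To make the O(dD) memory cost mentioned in the abstract concrete, here is a minimal sketch of a dense random Fourier feature map. It assumes a Gaussian kernel with bandwidth sigma, which is a common textbook choice and not necessarily the kernel used in the paper, and it shows only the original dense FM, not the space-efficient SFM variant. The names make_rff_map and gaussian_kernel are illustrative.

```python
import math
import random


def make_rff_map(d, D, sigma=1.0, seed=0):
    """Build a dense random Fourier feature map from R^d to R^D.

    Assumption: the target kernel is Gaussian with bandwidth sigma.
    """
    rng = random.Random(seed)
    # Dense projection matrix: D rows of d Gaussian weights.
    # Storing it is what costs O(dD) memory.
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(D)]
    # One random phase per output dimension.
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    scale = math.sqrt(2.0 / D)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_j)
            for w, b_j in zip(W, b)
        ]

    return feature_map


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel, used here only as a reference value."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


fm = make_rff_map(d=3, D=2000, sigma=1.0, seed=0)
x = [0.1, 0.4, -0.2]
y = [0.3, 0.0, 0.5]

# The inner product of the feature vectors approximates the kernel value.
approx = sum(a * c for a, c in zip(fm(x), fm(y)))
exact = gaussian_kernel(x, y)
print(f"approx={approx:.3f} exact={exact:.3f}")
```

    Because the inner product of two feature vectors approximates the kernel, a linear SVM trained on the mapped vectors can stand in for a kernel SVM. The approximation improves as D grows, which is exactly when the dense matrix W becomes expensive to store.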

    Evolving rules for document classification

    We describe a novel method for using Genetic Programming to create compact classification rules based on combinations of N-Grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that, because the induced rules are meaningful to a human analyst, they may have a number of other uses beyond classification and provide a basis for text mining applications.
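
    As a rough illustration of how a rule built from character N-grams can be scored by precision and recall, the sketch below evaluates one hand-written rule on toy data. The rule, the documents, the labels, and the use of F1 as a single fitness value are all assumptions made for illustration; none of them come from the paper or from the Reuters 21578 dataset, and no genetic search is performed here.

```python
def contains(ngram):
    """Terminal: true when the character n-gram occurs in the document."""
    return lambda doc: ngram in doc.lower()


def any_of(*rules):
    """Function node: logical OR over sub-rules."""
    return lambda doc: any(rule(doc) for rule in rules)


def all_of(*rules):
    """Function node: logical AND over sub-rules."""
    return lambda doc: all(rule(doc) for rule in rules)


def negate(rule):
    """Function node: logical NOT of a sub-rule."""
    return lambda doc: not rule(doc)


def precision_recall(rule, docs, labels):
    """Score a rule against labeled training documents."""
    predictions = [rule(doc) for doc in docs]
    tp = sum(p and t for p, t in zip(predictions, labels))
    fp = sum(p and not t for p, t in zip(predictions, labels))
    fn = sum(t and not p for p, t in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy training set (hypothetical): True marks documents in the target category.
docs = [
    "Wheat exports rose sharply",
    "Grain and wheat prices fell",
    "Oil prices climbed",
    "Bank raises interest rates",
    "Corn harvest delayed",
]
labels = [True, True, False, False, True]

# Hypothetical candidate rule of the kind a genetic program might evolve.
rule = any_of(contains("whea"), all_of(contains("grai"), negate(contains("oil"))))

precision, recall = precision_recall(rule, docs, labels)
# F1 is one possible way to fold precision and recall into a fitness value.
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(precision, recall, f1)
```

    The rule stays readable as a nested expression over N-grams, which is the property the abstract highlights when it says the induced rules are meaningful to a human analyst.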

    Transductive Learning with String Kernels for Cross-Domain Text Classification

    For many text classification tasks, there is a major problem posed by the lack of labeled data in a target domain. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification or automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches to further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification. Comment: Accepted at ICONIP 2018. arXiv admin note: substantial text overlap with arXiv:1808.0840

    Evolving text classification rules with genetic programming

    We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications.

    Associations between the LEP -2548G/A Promoter and Baseline Weight and between LEPR Gln223Arg and Lys656Asn Variants and Change in BMI z Scores in Arab Children and Adolescents Treated with Risperidone

    Data on baseline (antipsychotic-naïve) age, weight, and height, and change in these over three subsequent follow-up time points up to 313.6 days (CI 303.5-323.7), were collected from 181 risperidone-treated children and adolescents (mean age 12.58 years, SD 4.99, range 2.17-17.7) attending a pediatric neurology clinic in Saudi Arabia. Owing to differences in genotypic distributions in subsamples, results are reported from the white Arabs (N=144). Age and gender-normed BMI-standardised z scores (BMI z) were calculated (lmsgrowth program). Linear regression was performed for baseline weight and BMI z, while change in BMI z was assessed using random effects ordered logistic regression. The following SNPs were analyzed: rs7799039 in the LEP promoter, rs1805094 (previously rs8179183), rs1137100 and rs1137101 in the LEPR, and rs1414334 in HTR2C. We found a nominally significant association between rs7799039 and baseline weight, adjusting for height, age, gender and diagnosis (A/G, P=0.035, β=-3.62, compared to G/G). rs1137101 (G/G, P=0.018, OR=4.13 compared to A/A) and rs1805094 C-allele carriers (P=0.019, OR=0.51) showed nominally significant associations with change in BMI z categories. Our data support and replicate previous relevant associations for these variants, including with weight gain on risperidone, whilst being the first to report such associations in those of Arab ethnicity.

    Macroalgae Decrease Growth and Alter Microbial Community Structure of the Reef-Building Coral, Porites astreoides

    This is the publisher’s final pdf. The published article is copyrighted by the Public Library of Science and can be found at: http://www.plosone.org/home.action. With the continued and unprecedented decline of coral reefs worldwide, evaluating the factors that contribute to coral demise is of critical importance. As coral cover declines, macroalgae are becoming more common on tropical reefs. Interactions between these macroalgae and corals may alter the coral microbiome, which is thought to play an important role in colony health and survival. Together, such changes in benthic macroalgae and in the coral microbiome may result in a feedback mechanism that contributes to additional coral cover loss. To determine if macroalgae alter the coral microbiome, we conducted a field-based experiment in which the coral Porites astreoides was placed in competition with five species of macroalgae. Macroalgal contact increased variance in the coral-associated microbial community, and two algal species significantly altered microbial community composition. All macroalgae caused the disappearance of a γ-proteobacterium previously hypothesized to be an important mutualist of P. astreoides. Macroalgal contact also triggered: 1) increases or 2) decreases in microbial taxa already present in corals, 3) establishment of new taxa to the coral microbiome, and 4) vectoring and growth of microbial taxa from the macroalgae to the coral. Furthermore, macroalgal competition decreased coral growth rates by an average of 36.8%. Overall, this study found that competition between corals and certain species of macroalgae leads to an altered coral microbiome, providing a potential mechanism by which macroalgae-coral interactions reduce coral health and lead to coral loss on impacted reefs.

    'Countries in the Air': Travel and Geomodernism in Louis MacNeice's BBC Features

    In the middle stretch of his twenty-two-year BBC career, the poet and producer Louis MacNeice earned a reputation as one of the ‘undisputed masters of creative sound broadcasting’, a reputation derived, in part, from a huge range of radio features that were founded upon his journeys abroad. Through close examination of some of his most significant overseas soundscapes – including Portrait of Rome (1947) and Portrait of Delhi (1948) – this article will consider the role and function of travel in shaping MacNeice’s engagement with the radio feature as a modernist form at a particular transcultural moment when Britain moved through the end of the Second World War and the eventual disintegration of its empire.

    Inferring Human Colonization History Using a Copying Model

    Genome-wide scans of genetic variation can potentially provide detailed information on how modern humans colonized the world but require new methods of analysis. We introduce a statistical approach that uses Single Nucleotide Polymorphism (SNP) data to identify sharing of chromosomal segments between populations and uses the pattern of sharing to reconstruct a detailed colonization scenario. We apply our model to the SNP data for the 53 populations of the Human Genome Diversity Project described in Conrad et al. (Nature Genetics 38, 1251-60, 2006). Our results are consistent with the consensus view of a single “Out-of-Africa” bottleneck and serial dilution of diversity during global colonization, including a prominent East Asian bottleneck. They also suggest novel details, including: (1) the most northerly East Asian population in the sample (Yakut) has received a significant genetic contribution from the ancestors of the most northerly European one (Orcadian); (2) Native South Americans have received ancestry from a source closely related to modern North-East Asians (Mongolians and Oroquen) that is distinct from the sources for native North Americans, implying multiple waves of migration into the Americas. A detailed depiction of the peopling of the world is available in animated form.