
    HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories

    GitHub has become an important platform for code sharing and scientific exchange. With the massive number of repositories available, there is a pressing need for topic-based search. Even though the topic label functionality has been introduced, the majority of GitHub repositories do not have any labels, impeding the utility of search and topic-based analysis. This work targets the automatic repository classification problem as keyword-driven hierarchical classification. Specifically, users only need to provide a label hierarchy with keywords as supervision. This setting is flexible, adapts to the users' needs, accounts for the different granularity of topic labels, and requires minimal human effort. We identify three key challenges of this problem, namely (1) the presence of multi-modal signals; (2) supervision scarcity and bias; (3) supervision format mismatch. In recognition of these challenges, we propose the HiGitClass framework, comprising three modules: heterogeneous information network embedding; keyword enrichment; topic modeling and pseudo document generation. Experimental results on two GitHub repository collections confirm that HiGitClass is superior to existing weakly-supervised and dataless hierarchical classification methods, especially in its ability to integrate both structured and unstructured data for repository classification. Comment: 10 pages; Accepted to ICDM 2019; Some typos fixed

    Statistical Algorithms for Ontology-based Annotation of Scientific Literature

    Background: Ontologies encode relationships within a domain in robust data structures that can be used to annotate data objects, including scientific papers, in ways that ease tasks such as search and meta-analysis. However, the annotation process requires significant time and effort when performed by humans. Text mining algorithms can facilitate this process, but they render an analysis mainly based upon keyword, synonym and semantic matching. They do not leverage information embedded in an ontology's structure. Methods: We present a probabilistic framework that facilitates the automatic annotation of literature by indirectly modeling the restrictions among the different classes in the ontology. Our research focuses on annotating human functional neuroimaging literature within the Cognitive Paradigm Ontology (CogPO). We use an approach that combines the stochastic simplicity of naïve Bayes with the formal transparency of decision trees. Our data structure is easily modifiable to reflect changing domain knowledge. Results: We compare our results across naïve Bayes, Bayesian Decision Trees, and Constrained Decision Tree classifiers that keep a human expert in the loop, in terms of the quality measure of the F1-micro score. Conclusions: Unlike traditional text mining algorithms, our framework can model the knowledge encoded by the dependencies in an ontology, albeit indirectly. We successfully exploit the fact that CogPO has explicitly stated restrictions, and implicit dependencies in the form of patterns in the expert-curated annotations.
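
    The micro-averaged F1 score named in the abstract above pools true positives, false positives, and false negatives over all documents and labels before computing precision and recall. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name and the example label names are hypothetical.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over parallel lists of per-document label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for two abstracts (not real CogPO terms).
truth = [{"faces", "button_press"}, {"words"}]
predicted = [{"faces"}, {"words", "tones"}]
score = micro_f1(truth, predicted)
```

    Because counts are pooled before averaging, frequent labels weigh more heavily in this score than rare ones.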

    Automated Annotation of Functional Imaging Experiments via Multi-Label Classification

    Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert's annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO, or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text.
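
    Two terms in the abstract above can be illustrated briefly. A binary relevance transformation splits a multi-label problem into one independent yes/no target per label, and exact match counts an abstract as correct only when its whole predicted label set equals the expert's. The sketch below shows both under those standard definitions; it is not the paper's pipeline, and the function names and label names are hypothetical.

```python
def binary_relevance_targets(label_sets):
    """Split multi-label targets into one 0/1 target vector per label."""
    labels = sorted(set().union(*label_sets))
    return {label: [int(label in s) for s in label_sets] for label in labels}


def exact_match_ratio(true_sets, predicted_sets):
    """Fraction of documents whose predicted label set matches exactly."""
    if len(true_sets) != len(predicted_sets):
        raise ValueError("inputs must have the same length")
    if not true_sets:
        return 0.0
    hits = sum(set(t) == set(p) for t, p in zip(true_sets, predicted_sets))
    return hits / len(true_sets)


# Hypothetical annotations for two abstracts (not real CogPO terms).
truth = [{"faces", "button_press"}, {"words"}]
predicted = [{"faces"}, {"words"}]

targets = binary_relevance_targets(truth)
ratio = exact_match_ratio(truth, predicted)
```

    Each vector in `targets` could then be used to train a separate binary classifier such as naive Bayes. Exact match is a strict measure: the first abstract above scores zero even though one of its two labels was predicted correctly.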

    Higher-order multipole amplitude measurement in $\psi(2S)\to\gamma\chi_{c2}$

    Using $106\times10^6$ $\psi(2S)$ events collected with the BESIII detector at the BEPCII storage ring, the higher-order multipole amplitudes in the radiative transition $\psi(2S)\to\gamma\chi_{c2}\to\gamma\pi\pi/\gamma KK$ are measured. A fit to the $\chi_{c2}$ production and decay angular distributions yields $M2=0.046\pm0.010\pm0.013$ and $E3=0.015\pm0.008\pm0.018$, where the first errors are statistical and the second systematic. Here $M2$ denotes the normalized magnetic quadrupole amplitude and $E3$ the normalized electric octupole amplitude. This measurement shows evidence for the existence of the $M2$ signal with $4.4\sigma$ statistical significance and is consistent with the charm quark having no anomalous magnetic moment. Comment: 14 pages, 4 figures

    Observation of a near-threshold enhancement in the p pbar mass spectrum from radiative J/psi-->gamma p pbar decays

    We observe a narrow enhancement near 2mp in the invariant mass spectrum of ppbar pairs from radiative J/psi-->gamma ppbar decays. The enhancement can be fit with either an S- or P-wave Breit-Wigner function. In the case of the S-wave fit, the peak mass is below the 2mp threshold and the full width is less than 30 MeV. These mass and width values are not consistent with the properties of any known meson resonance. Comment: 5 pages, 4 figures, submitted to Phys. Rev. Lett.

    Search for the decay $J/\psi\to\gamma + \mathrm{invisible}$

    We search for $J/\psi$ radiative decays into a weakly interacting neutral particle, namely an invisible particle, using the $J/\psi$ produced through the process $\psi(3686)\to\pi^+\pi^-J/\psi$ in a data sample of $(448.1\pm2.9)\times 10^6$ $\psi(3686)$ decays collected by the BESIII detector at BEPCII. No significant signal is observed. Using a modified frequentist method, upper limits on the branching fractions are set under different assumptions of invisible particle masses up to 1.2 $\mathrm{GeV}/c^2$. The upper limit corresponding to an invisible particle with zero mass is $7.0\times 10^{-7}$ at the 90% confidence level.

    First Observation of the Decays chi_{cJ} -> pi^0 pi^0 pi^0 pi^0

    We present a study of the P-wave spin-triplet charmonium chi_{cJ} decays (J=0,1,2) into pi^0 pi^0 pi^0 pi^0. The analysis is based on 106 million psi' decays recorded with the BESIII detector at the BEPCII electron-positron collider. The decay into the pi^0 pi^0 pi^0 pi^0 hadronic final state is observed for the first time. We measure the branching fractions B(chi_{c0} -> pi^0 pi^0 pi^0 pi^0)=(3.34 +- 0.06 +- 0.44)*10^{-3}, B(chi_{c1} -> pi^0 pi^0 pi^0 pi^0)=(0.57 +- 0.03 +- 0.08)*10^{-3}, and B(chi_{c2} -> pi^0 pi^0 pi^0 pi^0)=(1.21 +- 0.05 +- 0.16)*10^{-3}, where the uncertainties are statistical and systematic, respectively. Comment: 7 pages, 3 figures

    Observation of $\chi_{c1}$ decays into vector meson pairs $\phi\phi$, $\omega\omega$, and $\omega\phi$

    Decays of $\chi_{c1}$ to vector meson pairs $\phi\phi$, $\omega\omega$ and $\omega\phi$ are observed for the first time using $(106\pm4)\times 10^6$ $\psi'$ events accumulated at the BESIII detector at the BEPCII $e^+e^-$ collider. The branching fractions are measured to be $(4.4\pm 0.3\pm 0.5)\times 10^{-4}$, $(6.0\pm 0.3\pm 0.7)\times 10^{-4}$, and $(2.2\pm 0.6\pm 0.2)\times 10^{-5}$, for $\chi_{c1}\to \phi\phi$, $\omega\omega$, and $\omega\phi$, respectively. The observation of $\chi_{c1}$ decays into a pair of vector mesons $\phi\phi$, $\omega\omega$ and $\omega\phi$ indicates that the hadron helicity selection rule is significantly violated in $\chi_{cJ}$ decays. In addition, the measurement of $\chi_{cJ}\to \omega\phi$ gives the rate of doubly OZI-suppressed decay. Branching fractions for $\chi_{c0}$ and $\chi_{c2}$ decays into other vector meson pairs are also measured with improved precision. Comment: 4 pages, 2 figures