1,011 research outputs found
HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories
GitHub has become an important platform for code sharing and scientific
exchange. With the massive number of repositories available, there is a
pressing need for topic-based search. Even though the topic label functionality
has been introduced, the majority of GitHub repositories do not have any
labels, impeding the utility of search and topic-based analysis. This work
targets the automatic repository classification problem as keyword-driven
hierarchical classification. Specifically, users only need to provide a label
hierarchy with keywords as supervision. This setting is flexible,
adaptive to the users' needs, accounts for the different granularity of topic
labels and requires minimal human effort. We identify three key challenges of
this problem, namely (1) the presence of multi-modal signals; (2) supervision
scarcity and bias; (3) supervision format mismatch. In recognition of these
challenges, we propose the HiGitClass framework, comprising three modules:
heterogeneous information network embedding; keyword enrichment; topic modeling
and pseudo document generation. Experimental results on two GitHub repository
collections confirm that HiGitClass is superior to existing weakly-supervised
and dataless hierarchical classification methods, especially in its ability to
integrate both structured and unstructured data for repository classification.
Comment: 10 pages; Accepted to ICDM 2019; Some typos fixed
Statistical Algorithms for Ontology-based Annotation of Scientific Literature
Background: Ontologies encode relationships within a domain in robust data structures that can be used to annotate data objects, including scientific papers, in ways that ease tasks such as search and meta-analysis. However, the annotation process requires significant time and effort when performed by humans. Text mining algorithms can facilitate this process, but they render an analysis mainly based upon keyword, synonym and semantic matching. They do not leverage information embedded in an ontology’s structure. Methods: We present a probabilistic framework that facilitates the automatic annotation of literature by indirectly modeling the restrictions among the different classes in the ontology. Our research focuses on annotating human functional neuroimaging literature within the Cognitive Paradigm Ontology (CogPO). We use an approach that combines the stochastic simplicity of naïve Bayes with the formal transparency of decision trees. Our data structure is easily modifiable to reflect changing domain knowledge. Results: We compare our results across naïve Bayes, Bayesian Decision Trees, and Constrained Decision Tree classifiers that keep a human expert in the loop, in terms of the quality measure of the F1-micro score. Conclusions: Unlike traditional text mining algorithms, our framework can model the knowledge encoded by the dependencies in an ontology, albeit indirectly. We successfully exploit the fact that CogPO has explicitly stated restrictions, and implicit dependencies in the form of patterns in the expert-curated annotations.
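The F1-micro score named in the Results can be illustrated with a minimal sketch. It assumes each document carries a set of labels and that true positives, false positives, and false negatives are pooled over all documents before F1 is computed. The label names are hypothetical placeholders, not CogPO terms, and this is not the authors' evaluation code.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over parallel lists of per-document label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not annotated
        fn += len(truth - pred)   # labels annotated but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical annotations for three abstracts.
truth = [{"visual", "button_press"}, {"auditory"}, {"visual"}]
pred = [{"visual"}, {"auditory", "button_press"}, {"visual"}]
score = micro_f1(truth, pred)  # tp=3, fp=1, fn=1
```

Because counts are pooled before averaging, frequent labels weigh more heavily in this score than rare ones.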
Automated Annotation of Functional Imaging Experiments via Multi-Label Classification
Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert’s annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text.
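The combination of naive Bayes with a binary relevance transformation, scored by exact match, can be sketched as follows. This is a minimal illustration and not the authors' pipeline, which used off-the-shelf software. It assumes documents are already tokenized, trains one independent yes/no multinomial naive Bayes model per label with Laplace smoothing, and uses hypothetical tokens and label names.

```python
import math
from collections import Counter


class BinaryNB:
    """Multinomial naive Bayes for a single yes/no label (Laplace smoothing)."""

    def fit(self, docs, targets):
        self.vocab = {w for d in docs for w in d}
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(targets)
        for doc, target in zip(docs, targets):
            self.counts[target].update(doc)
        self.total = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def _log_score(self, doc, c):
        # Smoothed log prior plus smoothed log likelihood of known tokens.
        n = sum(self.n_docs.values())
        score = math.log((self.n_docs[c] + 1) / (n + 2))
        v = len(self.vocab)
        for w in doc:
            if w in self.vocab:
                score += math.log((self.counts[c][w] + 1) / (self.total[c] + v))
        return score

    def predict(self, doc):
        return self._log_score(doc, True) > self._log_score(doc, False)


def fit_binary_relevance(docs, label_sets):
    """Binary relevance: one independent binary classifier per label."""
    labels = sorted({l for s in label_sets for l in s})
    return {l: BinaryNB().fit(docs, [l in s for s in label_sets]) for l in labels}


def predict_labels(models, doc):
    return {l for l, m in models.items() if m.predict(doc)}


def exact_match(truth, pred):
    """Fraction of documents whose predicted label set equals the true set."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


# Hypothetical toy corpus: tokenized abstracts and their label sets.
docs = [["faces", "viewed", "button"], ["tones", "heard"],
        ["faces", "viewed"], ["tones", "heard", "button"]]
label_sets = [{"visual", "manual"}, {"auditory"},
              {"visual"}, {"auditory", "manual"}]
models = fit_binary_relevance(docs, label_sets)
predicted = predict_labels(models, ["faces", "viewed"])
```

Exact match is a strict measure: a prediction that misses or adds a single label counts as wrong for that document, which is one reason the reported range can be wide.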
Higher-order multipole amplitude measurement in
Using events collected with the BESIII detector at
the BEPCII storage ring, the higher-order multipole amplitudes in the radiative
transition are measured.
A fit to the production and decay angular distributions yields
and , where the first
errors are statistical and the second systematic. Here denotes the
normalized magnetic quadrupole amplitude and the normalized electric
octupole amplitude. This measurement shows evidence for the existence of the
signal with statistical significance and is consistent with
the charm quark having no anomalous magnetic moment.
Comment: 14 pages, 4 figures
Observation of a near-threshold enhancement in the p pbar mass spectrum from radiative J/psi-->gamma p pbar decays
We observe a narrow enhancement near 2mp in the invariant mass spectrum of
ppbar pairs from radiative J/psi-->gamma ppbar decays. The enhancement can be
fit with either an S- or P-wave Breit-Wigner function. In the case of the S-wave
fit, the peak mass is below the 2mp threshold and the full width is less than
30 MeV. These mass and width values are not consistent with the properties of
any known meson resonance.
Comment: 5 pages, 4 figures, submitted to Phys. Rev. Lett.
Search for the decay
We search for radiative decays into a weakly interacting neutral
particle, namely an invisible particle, using the produced through the
process in a data sample of
decays collected by the BESIII detector
at BEPCII. No significant signal is observed. Using a modified frequentist
method, upper limits on the branching fractions are set under different
assumptions of invisible particle masses up to 1.2 . The upper limit corresponding to an invisible particle with zero mass
is 7.0 at the 90% confidence level.
First Observation of the Decays chi_{cJ} -> pi^0 pi^0 pi^0 pi^0
We present a study of the P-wave spin-triplet charmonium chi_{cJ} decays
(J=0,1,2) into pi^0 pi^0 pi^0 pi^0. The analysis is based on 106 million
psi' decays recorded with the BESIII detector at the BEPCII electron
positron collider. The decay into the pi^0 pi^0 pi^0 pi^0 hadronic final state
is observed for the first time. We measure the branching fractions B(chi_{c0}
-> pi^0 pi^0 pi^0 pi^0)=(3.34 +- 0.06 +- 0.44)*10^{-3}, B(chi_{c1} -> pi^0 pi^0
pi^0 pi^0)=(0.57 +- 0.03 +- 0.08)*10^{-3}, and B(chi_{c2} -> pi^0 pi^0 pi^0
pi^0)=(1.21 +- 0.05 +- 0.16)*10^{-3}, where the uncertainties are statistical
and systematic, respectively.
Comment: 7 pages, 3 figures
Observation of decays into vector meson pairs , , and
Decays of to vector meson pairs , and
are observed for the first time using
psi' events accumulated at the BESIII detector at the BEPCII
collider. The branching fractions are measured to be , , and , for , , and ,
respectively. The observation of decays into a pair of vector
mesons , and indicates that the hadron
helicity selection rule is significantly violated in decays. In
addition, the measurement of gives the rate of doubly
OZI-suppressed decay. Branching fractions for and
decays into other vector meson pairs are also measured with improved precision.
Comment: 4 pages, 2 figures