Predicting Good Configurations for GitHub and Stack Overflow Topic Models
Software repositories contain large amounts of textual data, ranging from
source code comments and issue descriptions to questions, answers, and comments
on Stack Overflow. To make sense of this textual data, topic modelling is
frequently used as a text-mining tool for the discovery of hidden semantic
structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used
topic model that aims to explain the structure of a corpus by grouping texts.
LDA requires multiple parameters to work well, and there are only rough and
sometimes conflicting guidelines available on how these parameters should be
set. In this paper, we contribute (i) a broad study of parameters to arrive at
good local optima for GitHub and Stack Overflow text corpora, (ii) an
a-posteriori characterisation of text corpora related to eight programming
languages, and (iii) an analysis of corpus feature importance via per-corpus
LDA configuration. We find that (1) popular rules of thumb for topic modelling
parameter configuration are not applicable to the corpora used in our
experiments, (2) corpora sampled from GitHub and Stack Overflow have different
characteristics and require different configurations to achieve good model fit,
and (3) we can predict good configurations for unseen corpora reliably. These
findings support researchers and practitioners in efficiently determining
suitable configurations for topic modelling when analysing textual data
contained in software repositories.
Comment: to appear as full paper at MSR 2019, the 16th International
Conference on Mining Software Repositories
Fracture toughness testing data: A technology survey
Technical abstracts for about 90 significant documents relating to fracture toughness testing for various structural materials including information on plane strain and the developing areas of mixed mode and plane stress test conditions are presented. An overview of the state-of-the-art represented in the documents that have been abstracted is included. The abstracts in the report are mostly for publications in the period April 1962 through April 1974. The purpose of this report is to provide, in quick reference form, a dependable source for current information in the subject field.
Graph Theory and Networks in Biology
In this paper, we present a survey of the use of graph theoretical techniques
in Biology. In particular, we discuss recent work on identifying and modelling
the structure of bio-molecular networks, as well as the application of
centrality measures to interaction networks and research on the hierarchical
structure of such networks and network motifs. Work on the link between
structural network properties and dynamics is also described, with emphasis on
synchronization and disease propagation.
Comment: 52 pages, 5 figures, Survey Paper
Soil, grain and water chemistry and human selenium imbalances in Enshi district, Hubei Province, China
Many elements which are essential to human and other animal health in small doses can
be toxic if ingested in excess. Selenium (Se), a naturally occurring metalloid element, is
found in all natural materials on earth including rocks, soils, waters, air, plant and
animal tissues. Since the early 1930s, it has been recognised that Se toxicity causes
hoof disorders and hair loss in livestock. Se was also identified as an essential trace
element to humans and other animals in the late 1950s. It forms a vital constituent of
the biologically important enzyme glutathione peroxidase which acts as an anti-oxidant
preventing cell degeneration. Se deficiency has been implicated in the aetiology of
several diseases including cancer, muscular dystrophy, muscular sclerosis and cystic
fibrosis. Se can be assimilated in humans through several pathways including food,
drinking water and inhalation of Se-bearing particles from the atmosphere. In the
majority of situations, food is the most important source of Se, as levels in water are
very low. The narrow range between deficiency levels (<40 µg per day) and toxic
levels in susceptible people (>900 µg per day) makes it necessary to carefully control
the amount of Se in the diet.
In China, Se deficiency has been linked to an endemic degenerative heart disease
known as Keshan Disease (KD) and an endemic osteoarthropathy which causes
deformity of affected joints, known as Kaschin-Beck Disease. These diseases occur in
a geographic belt stretching from Heilongjiang Province in north-east China to Yunnan
Province in the south-west. In the period between 1959 and 1970, peak KD incidence
rates exceeded 40 per 100 000 (approximately 8500 cases per annum) with 1400 - 3000
deaths recorded each year. Incidence rates have since fallen to less than 5 per 100 000
with approximately 1000 new cases reported annually (Levander, 1986). Se toxicity
(selenosis), resulting in hair and nail loss and disorders of the nervous system in the
human population, has also been recorded in Enshi District, Hubei Province and in
Ziyang County, Shaanxi Province. China possesses one of the best epidemiological
databases in the world on Se-related diseases, which has been used in conjunction with
geochemical data to demonstrate a significant geochemical control on human Se
exposure. However, the precise geographical areas at risk and the geochemical controls
on selenium availability have yet to be established.
NDE: An effective approach to improved reliability and safety. A technology survey
Technical abstracts are presented for about 100 significant documents relating to nondestructive testing of aircraft structures or related structural testing and the reliability of the more commonly used evaluation methods. Particular attention is directed toward acoustic emission; liquid penetrant; magnetic particle; ultrasonics; eddy current; and radiography. The introduction of the report includes an overview of the state-of-the-art represented in the documents that have been abstracted.
AKARI Infrared Camera Survey of the Large Magellanic Cloud. I. Point Source Catalog
We present a near- to mid-infrared point source catalog of 5 photometric
bands at 3.2, 7, 11, 15 and 24 um for a 10 deg2 area of the Large Magellanic
Cloud (LMC) obtained with the Infrared Camera (IRC) onboard the AKARI
satellite. To cover the survey area the observations were carried out at 3
separate seasons from 2006 May to June, 2006 October to December, and 2007
March to July.
The 10-sigma limiting magnitudes of the present survey are 17.9, 13.8, 12.4,
9.9, and 8.6 mag at 3.2, 7, 11, 15 and 24 um, respectively. The photometric
accuracy is estimated to be about 0.1 mag at 3.2 um and 0.06--0.07 mag in the
other bands. The position accuracy is 0.3" at 3.2, 7 and 11 um and 1.0" at 15
and 24 um. The sensitivities at 3.2, 7, and 24 um are roughly comparable to
those of the Spitzer SAGE LMC point source catalog, while the AKARI catalog
provides the data at 11 and 15 um, covering the mid-infrared spectral range
contiguously. Two types of catalog are provided: a Catalog and an Archive. The
Archive contains all the detected sources, while the Catalog only includes the
sources that have a counterpart in the Spitzer SAGE point source catalog. The
Archive contains about 650,000, 140,000, 97,000, 43,000, and 52,000 sources at
3.2, 7, 11, 15, and 24 um, respectively. Based on the catalog, we discuss the
luminosity functions at each band, the color-color diagram, and the
color-magnitude diagram using the 3.2, 7, and 11 um band data. Stars without
circumstellar envelopes, dusty C-rich and O-rich stars, young stellar objects,
and background galaxies are located at distinct regions in the diagrams,
suggesting that the present catalog is useful for the classification of objects
towards the LMC.
Comment: 59 pages, 12 figures, accepted for the Astronomical Journal
Galaxy And Mass Assembly (GAMA): end of survey report and data release 2
The Galaxy And Mass Assembly (GAMA) survey is one of the largest contemporary spectroscopic surveys of low redshift galaxies. Covering an area of ~286 deg² (split among five survey regions) down to a limiting magnitude of r < 19.8 mag, we have collected spectra and reliable redshifts for 238 000 objects using the AAOmega spectrograph on the Anglo-Australian Telescope. In addition, we have assembled imaging data from a number of independent surveys in order to generate photometry spanning the wavelength range 1 nm–1 m. Here, we report on the recently completed spectroscopic survey and present a series of diagnostics to assess its final state and the quality of the redshift data. We also describe a number of survey aspects and procedures, or updates thereof, including changes to the input catalogue, redshifting and re-redshifting, and the derivation of ultraviolet, optical and near-infrared photometry. Finally, we present the second public release of GAMA data. In this release, we provide input catalogue and targeting information, spectra, redshifts, ultraviolet, optical and near-infrared photometry, single-component Sérsic fits, stellar masses, Hα-derived star formation rates, environment information, and group properties for all galaxies with r < 19.0 mag in two of our survey regions, and for all galaxies with r < 19.4 mag in a third region (72 225 objects in total). The data base serving these data is available at http://www.gama-survey.org/