266 research outputs found
Recent Developments in Document Clustering
This report aims to give a brief overview of the current state of document clustering research and present recent developments in a well-organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed along with a table summarizing important properties, and open problems as well as directions for future research are discussed
Clustering for Data Reduction: A Divide and Conquer Approach
We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where we perform an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our "divide and conquer" approach actually performs more accurately on datasets which are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items
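The divide-and-conquer structure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the divide step here is a plain 2-means bisection (no spectral cuts, no balancing), and the per-cluster prototype is a simple medoid standing in for affinity propagation. The function names, the `max_size` threshold, and the toy points are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def bisect(points, iters=20):
    """Split points in two with 2-means, seeded by two far-apart points."""
    c0 = max(points, key=lambda p: sq_dist(p, points[0]))
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    for _ in range(iters):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        c0, c1 = mean(left), mean(right)
    return left, right

def divide(points, max_size):
    """Divide step: bisect recursively until each piece has <= max_size points."""
    if len(points) <= max_size:
        return [points]
    left, right = bisect(points)
    if not left or not right:  # cannot split further
        return [points]
    return divide(left, max_size) + divide(right, max_size)

def medoid(cluster):
    """Stand-in for affinity propagation: the point closest to all others."""
    return min(cluster, key=lambda p: sum(sq_dist(p, q) for q in cluster))

def prototypes(points, max_size):
    """Conquer step: one prototype per cluster from the divide step."""
    return [medoid(c) for c in divide(points, max_size)]

# Toy data (made up): two well-separated groups of three points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(prototypes(data, max_size=3))  # [(10, 10), (0, 0)]
```

Because each prototype search only sees the points inside one small cluster, the quadratic cost of comparing all pairs is paid per cluster rather than over the whole dataset, which is the source of the speed-up the abstract reports.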
Modelling the hepatitis B vaccination programme in prisons
A vaccination programme offering hepatitis B (HBV) vaccine at reception into prison has been introduced into selected prisons in England and Wales. Over the coming years, it is anticipated that this vaccination programme will be extended. A model has been developed to assess the potential impact of the programme on the vaccination coverage of prisoners, ex-prisoners, and injecting drug users (IDUs). Under a range of coverage scenarios, the model predicts the change over time in the vaccination status of new entrants to prison, current prisoners and IDUs in the community. The model predicts that in 2012, at baseline, 57% of the IDU population will be vaccinated, with up to 72% vaccinated depending on the vaccination scenario implemented. These results are sensitive to the size of the IDU population in England and Wales and the average time served by an IDU during each prison visit. IDUs who do not receive HBV vaccine in the community are at increased risk from HBV infection. The HBV vaccination programme in prisons is an effective way of vaccinating this hard-to-reach population, although vaccination coverage on prison reception must be increased to achieve this
The Spitzer Survey of Interstellar Clouds in the Gould Belt. III. A Multi-Wavelength View of Corona Australis
We present Spitzer Space Telescope IRAC and MIPS observations of a 0.85 deg^2
field including the Corona Australis (CrA) star-forming region. At a distance
of 130 pc, CrA is one of the closest regions known to be actively forming
stars, particularly within its embedded association, the Coronet. Using the
Spitzer data, we identify 51 young stellar objects (YSOs) in CrA which include
sources in the well-studied Coronet cluster as well as distributed throughout
the molecular cloud. Twelve of the YSOs discussed are new candidates, one of
which is located in the Coronet. Known YSOs retrieved from the literature are
also added to the list, and a total of 116 candidate YSOs in CrA are compiled.
Based on these YSO candidates, the star formation rate is computed to be 12 M_o
Myr^-1, similar to that of the Lupus clouds. A clustering analysis was also
performed, finding that the main cluster core, consisting of 68 members, is
elongated (having an aspect ratio of 2.36), with a circular radius of 0.59 pc
and mean surface density of 150 pc^-2.
In addition, we analyze outflows and jets in CrA by means of new CO and H_2
data. We present 1.3 mm interferometric continuum observations made with the
Submillimeter Array (SMA) covering R CrA, IRS 5, IRS 7, and IRAS 18595-3712
(IRAS 32). We also present multi-epoch H_2 maps and detect jets and outflows,
study their proper motions, and identify exciting sources. The Spitzer and
ISAAC/VLT observations of IRAS 32 show a bipolar precessing jet, which drives a
CO (2-1) outflow detected in the SMA observations. There is also clear evidence
for a parsec-scale precessing outflow, E-W oriented, and originating in the SMA
2 region, likely driven by SMA 2 or IRS 7A.
Comment: Accepted for publication in ApJS. 112 pages, 42 figures (quality
reduced), 13 tables. Full resolution version can be found at
http://www.cfa.harvard.edu/~dpeterson/CrA/CrA_highres.pd
Estimating global and North American methane emissions with high spatial resolution using GOSAT satellite data
We use 2009-2011 space-borne methane observations from the Greenhouse Gases Observing SATellite (GOSAT) to estimate global and North American methane emissions with 4° x 5° and up to 50 km x 50 km spatial resolution, respectively. GEOS-Chem and GOSAT data are first evaluated with atmospheric methane observations from surface and tower networks (NOAA/ESRL, TCCON) and aircraft (NOAA/ESRL, HIPPO), using the GEOS-Chem chemical transport model as a platform to facilitate comparison of GOSAT with in situ data. This identifies a high-latitude bias between the GOSAT data and GEOS-Chem that we correct via quadratic regression. Our global adjoint-based inversion yields a total methane source of 539 Tg a^-1 with some important regional corrections to the EDGARv4.2 inventory used as a prior. Results serve as dynamic boundary conditions for an analytical inversion of North American methane emissions using radial basis functions to achieve high resolution of large sources and provide error characterization. We infer a US anthropogenic methane source of 40.2-42.7 Tg a^-1, as compared to 24.9-27.0 Tg a^-1 in the EDGAR and EPA bottom-up inventories, and 30.0-44.5 Tg a^-1 in recent inverse studies. Our estimate is supported by independent surface and aircraft data and by previous inverse studies for California. We find that the emissions are highest in the southern-central US, the Central Valley of California, and Florida wetlands; large isolated point sources such as the US Four Corners also contribute. Using prior information on source locations, we attribute 29-44% of US anthropogenic methane emissions to livestock, 22-31% to oil/gas, 20% to landfills/wastewater, and 11-15% to coal. Wetlands contribute an additional 9.0-10.1 Tg a^-1
Investigation of the Role of Mitochondrial DNA in Multiple Sclerosis Susceptibility
Several lines of evidence suggest that mitochondrial genetic factors may influence susceptibility to multiple sclerosis. To explore this hypothesis further, we re-sequenced the mitochondrial genome (mtDNA) from 159 patients with multiple sclerosis and completed a haplogroup analysis including a further 835 patients and 1,506 controls. A trend towards over-representation of super-haplogroup U was the only evidence for association with mtDNA that we identified in these samples. In a parallel analysis of nuclear encoded mitochondrial genes, we also found a trend towards association with the complex I gene, NDUFS2. These results add to the evidence suggesting that variation in mtDNA and nuclear encoded mitochondrial genes may contribute to disease susceptibility in multiple sclerosis
Daily survey participation and positive changes in mental health symptom scores among Royal Canadian Mounted Police Cadets
Introduction: Royal Canadian Mounted Police (RCMP) officers self-report high levels of mental health disorder symptoms, such as alcohol use disorder, generalized anxiety disorder, major depressive disorder, panic disorder, and posttraumatic stress disorder. Participation in regular mental health monitoring has been associated with improved mental health disorder symptom reporting and may provide an accessible tool to support RCMP mental health. The current study assessed relationships between self-reported mental health disorder symptoms and the completion of daily surveys (i.e., daily mental health disorder symptom monitoring) by RCMP cadets during the Cadet Training Program (CTP).
Methods: Participants were RCMP cadets (n = 394; 76.1% men) in the Standard Training Program who completed the 26-week CTP and daily self-monitoring surveys, as well as full mental health assessments at pre-training (i.e., starting the CTP) and pre-deployment (i.e., ~2 weeks prior to deployment to the field). Symptoms of alcohol use disorder, generalized anxiety disorder, major depressive disorder, panic disorder, and posttraumatic stress disorder were assessed. Changes in mental health disorder symptom reporting from pre-training to pre-deployment were calculated. Spearman’s rank correlations were estimated for number of daily surveys completed and change in mental health disorder symptom scores between pre-training and pre-deployment.
Results: There were statistically significant inverse relationships between number of daily surveys completed and number of mental health disorder symptoms reported; specifically, cadets who completed more daily surveys during CTP reported fewer symptoms of alcohol use disorder, generalized anxiety disorder, major depressive disorder, panic disorder, and posttraumatic stress disorder.
Conclusion: An inverse correlation between number of daily surveys completed and mental health disorder symptom scores indicated that participation in daily mental health monitoring was associated with improvements in self-reported mental health disorder symptoms between pre-training and pre-deployment. Regular self-monitoring of mental health disorder symptoms may help to mitigate mental health challenges among RCMP cadets and officers
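For readers unfamiliar with the statistic named in the Methods, the following is a minimal sketch of a Spearman rank correlation: both variables are converted to ranks (ties share the average rank) and a Pearson correlation is taken over the ranks. The function names and the five data points are invented for illustration; they are not the study's data, and the study's actual software is not stated in the abstract.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical example: surveys completed vs. change in a symptom score.
surveys_completed = [10, 40, 75, 120, 160]
score_change = [3, 1, 0, -2, -4]
rho = spearman(surveys_completed, score_change)
print(round(rho, 3))
```

In this made-up example the score change falls every time the survey count rises, so rho comes out at essentially -1, the strongest possible inverse relationship. A correlation of this kind describes association only and does not by itself show that completing surveys caused the change.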
The Ninth Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the SDSS-III Baryon Oscillation Spectroscopic Survey
The Sloan Digital Sky Survey III (SDSS-III) presents the first spectroscopic
data from the Baryon Oscillation Spectroscopic Survey (BOSS). This ninth data
release (DR9) of the SDSS project includes 535,995 new galaxy spectra (median
z=0.52), 102,100 new quasar spectra (median z=2.32), and 90,897 new stellar
spectra, along with the data presented in previous data releases. These spectra
were obtained with the new BOSS spectrograph and were taken between 2009
December and 2011 July. In addition, the stellar parameters pipeline, which
determines radial velocities, surface temperatures, surface gravities, and
metallicities of stars, has been updated and refined with improvements in
temperature estimates for stars with T_eff<5000 K and in metallicity estimates
for stars with [Fe/H]>-0.5. DR9 includes new stellar parameters for all stars
presented in DR8, including stars from SDSS-I and II, as well as those observed
as part of the SDSS-III Sloan Extension for Galactic Understanding and
Exploration-2 (SEGUE-2).
The astrometry error introduced in the DR8 imaging catalogs has been
corrected in the DR9 data products. The next data release for SDSS-III will be
in Summer 2013, which will present the first data from the Apache Point
Observatory Galactic Evolution Experiment (APOGEE) along with another year of
data from BOSS, followed by the final SDSS-III data release in December 2014.
Comment: 9 figures; 2 tables. Submitted to ApJS. DR9 is available at
http://www.sdss3.org/dr
The Fourteenth Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the extended Baryon Oscillation Spectroscopic Survey and from the second phase of the Apache Point Observatory Galactic Evolution Experiment
The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) has been in
operation since July 2014. This paper describes the second data release from
this phase, and the fourteenth from SDSS overall (making this Data Release
Fourteen, or DR14). This release makes public data taken by SDSS-IV in its first
two years of operation (July 2014-2016). Like all previous SDSS releases, DR14
is cumulative, including the most recent reductions and calibrations of all
data taken by SDSS since the first phase began operations in 2000. New in DR14
is the first public release of data from the extended Baryon Oscillation
Spectroscopic Survey (eBOSS); the first data from the second phase of the
Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE-2),
including stellar parameter estimates from an innovative data-driven machine
learning algorithm known as "The Cannon"; and almost twice as many data cubes
from the Mapping Nearby Galaxies at APO (MaNGA) survey as were in the previous
release (N = 2812 in total). This paper describes the location and format of
the publicly available data from SDSS-IV surveys. We provide references to the
important technical papers describing how these data have been taken (both
targeting and observation details) and processed for scientific use. The SDSS
website (www.sdss.org) has been updated for this release, and provides links to
data downloads, as well as tutorials and examples of data use. SDSS-IV is
planning to continue to collect astronomical data until 2020, and will be
followed by SDSS-V.
Comment: SDSS-IV collaboration alphabetical author data release paper. DR14
happened on 31st July 2017. 19 pages, 5 figures. Accepted by ApJS on 28th Nov
2017 (this is the "post-print" and "post-proofs" version; minor corrections
only from v1, and most of errors found in proofs corrected