266 research outputs found

    Recent Developments in Document Clustering

    This report gives a brief overview of the current state of document clustering research and presents recent developments in a well-organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed, along with a table summarizing their important properties, and open problems and directions for future research are discussed.

    Clustering for Data Reduction: A Divide and Conquer Approach

    We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where it runs an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our "divide and conquer" approach is actually more accurate on datasets that are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items.
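    The two-stage idea in this abstract (split the data into size-limited clusters first, then pick prototypes inside each cluster) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the spectral cuts are omitted, and a simple medoid selection stands in for affinity propagation. The names `bisect`, `divide`, `medoid`, `prototypes`, and the `max_size` parameter are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def bisect(points: List[Point], iters: int = 10) -> Tuple[List[Point], List[Point]]:
    """Split one cluster in two with 2-means, seeded by a far-apart pair."""
    seed_a = max(points, key=lambda p: math.dist(p, points[0]))
    seed_b = max(points, key=lambda p: math.dist(p, seed_a))
    ca, cb = seed_a, seed_b
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if math.dist(p, ca) <= math.dist(p, cb)]
        right = [p for p in points if math.dist(p, ca) > math.dist(p, cb)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        ca, cb = centroid(left), centroid(right)
    return left, right


def divide(points: List[Point], max_size: int) -> List[List[Point]]:
    """Recursively bisect until every cluster has at most max_size points."""
    if len(points) <= max_size:
        return [points]
    left, right = bisect(points)
    if not left or not right:
        return [points]  # cannot be split further
    return divide(left, max_size) + divide(right, max_size)


def medoid(points: List[Point]) -> Point:
    """Point with the smallest total distance to the others.

    Assumption: a stand-in for affinity propagation, not the same algorithm.
    """
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def prototypes(points: List[Point], max_size: int) -> List[Point]:
    """Divide first, then pick one prototype per cluster."""
    return [medoid(cluster) for cluster in divide(points, max_size)]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(prototypes(data, max_size=3))
```

    Because each prototype search only sees one cluster, its cost depends on the cluster size rather than on the size of the whole dataset, which is the source of the speed-up the abstract describes.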

    Modelling the hepatitis B vaccination programme in prisons

    A vaccination programme offering hepatitis B (HBV) vaccine at reception into prison has been introduced into selected prisons in England and Wales. Over the coming years it is anticipated that this vaccination programme will be extended. A model has been developed to assess the potential impact of the programme on the vaccination coverage of prisoners, ex-prisoners, and injecting drug users (IDUs). Under a range of coverage scenarios, the model predicts the change over time in the vaccination status of new entrants to prison, current prisoners, and IDUs in the community. The model predicts that, at baseline in 2012, 57% of the IDU population will be vaccinated, with up to 72% being vaccinated depending on the vaccination scenario implemented. These results are sensitive to the size of the IDU population in England and Wales and the average time served by an IDU during each prison visit. IDUs who do not receive HBV vaccine in the community are at increased risk of HBV infection. The HBV vaccination programme in prisons is an effective way of vaccinating this hard-to-reach population, although vaccination coverage on prison reception must be increased to achieve this.

    The Spitzer Survey of Interstellar Clouds in the Gould Belt. III. A Multi-Wavelength View of Corona Australis

    We present Spitzer Space Telescope IRAC and MIPS observations of a 0.85 deg^2 field including the Corona Australis (CrA) star-forming region. At a distance of 130 pc, CrA is one of the closest regions known to be actively forming stars, particularly within its embedded association, the Coronet. Using the Spitzer data, we identify 51 young stellar objects (YSOs) in CrA which include sources in the well-studied Coronet cluster as well as distributed throughout the molecular cloud. Twelve of the YSOs discussed are new candidates, one of which is located in the Coronet. Known YSOs retrieved from the literature are also added to the list, and a total of 116 candidate YSOs in CrA are compiled. Based on these YSO candidates, the star formation rate is computed to be 12 M_o Myr^-1, similar to that of the Lupus clouds. A clustering analysis was also performed, finding that the main cluster core, consisting of 68 members, is elongated (having an aspect ratio of 2.36), with a circular radius of 0.59 pc and mean surface density of 150 pc^-2. In addition, we analyze outflows and jets in CrA by means of new CO and H_2 data. We present 1.3 mm interferometric continuum observations made with the Submillimeter Array (SMA) covering R CrA, IRS 5, IRS 7, and IRAS 18595-3712 (IRAS 32). We also present multi-epoch H_2 maps and detect jets and outflows, study their proper motions, and identify exciting sources. The Spitzer and ISAAC/VLT observations of IRAS 32 show a bipolar precessing jet, which drives a CO (2-1) outflow detected in the SMA observations. There is also clear evidence for a parsec-scale precessing outflow, E-W oriented, and originating in the SMA 2 region, likely driven by SMA 2 or IRS 7A. Comment: Accepted for publication in ApJS. 112 pages, 42 figures (quality reduced), 13 tables. Full resolution version can be found at http://www.cfa.harvard.edu/~dpeterson/CrA/CrA_highres.pd

    Estimating global and North American methane emissions with high spatial resolution using GOSAT satellite data

    We use 2009-2011 space-borne methane observations from the Greenhouse Gases Observing SATellite (GOSAT) to estimate global and North American methane emissions with 4° x 5° and up to 50 km x 50 km spatial resolution, respectively. GEOS-Chem and GOSAT data are first evaluated with atmospheric methane observations from surface and tower networks (NOAA/ESRL, TCCON) and aircraft (NOAA/ESRL, HIPPO), using the GEOS-Chem chemical transport model as a platform to facilitate comparison of GOSAT with in situ data. This identifies a high-latitude bias between the GOSAT data and GEOS-Chem that we correct via quadratic regression. Our global adjoint-based inversion yields a total methane source of 539 Tg a−1 with some important regional corrections to the EDGARv4.2 inventory used as a prior. Results serve as dynamic boundary conditions for an analytical inversion of North American methane emissions using radial basis functions to achieve high resolution of large sources and provide error characterization. We infer a US anthropogenic methane source of 40.2-42.7 Tg a−1, as compared to 24.9-27.0 Tg a−1 in the EDGAR and EPA bottom-up inventories, and 30.0-44.5 Tg a−1 in recent inverse studies. Our estimate is supported by independent surface and aircraft data and by previous inverse studies for California. We find that the emissions are highest in the southern-central US, the Central Valley of California, and Florida wetlands; large isolated point sources such as the US Four Corners also contribute. Using prior information on source locations, we attribute 29-44% of US anthropogenic methane emissions to livestock, 22-31% to oil/gas, 20% to landfills/wastewater, and 11-15% to coal. Wetlands contribute an additional 9.0-10.1 Tg a−1.

    Investigation of the Role of Mitochondrial DNA in Multiple Sclerosis Susceptibility

    Several lines of evidence suggest that mitochondrial genetic factors may influence susceptibility to multiple sclerosis. To explore this hypothesis further, we re-sequenced the mitochondrial genome (mtDNA) from 159 patients with multiple sclerosis and completed a haplogroup analysis including a further 835 patients and 1,506 controls. A trend towards over-representation of super-haplogroup U was the only evidence for association with mtDNA that we identified in these samples. In a parallel analysis of nuclear-encoded mitochondrial genes, we also found a trend towards association with the complex I gene NDUFS2. These results add to the evidence suggesting that variation in mtDNA and nuclear-encoded mitochondrial genes may contribute to disease susceptibility in multiple sclerosis.

    Daily survey participation and positive changes in mental health symptom scores among Royal Canadian Mounted Police Cadets

    Introduction: Royal Canadian Mounted Police (RCMP) officers self-report high levels of mental health disorder symptoms, such as alcohol use disorder, generalized anxiety disorder, major depressive disorder, panic disorder, and posttraumatic stress disorder. Participation in regular mental health monitoring has been associated with improved mental health disorder symptom reporting and may provide an accessible tool to support RCMP mental health. The current study assessed relationships between self-reported mental health disorder symptoms and the completion of daily surveys (i.e., daily mental health disorder symptom monitoring) by RCMP cadets during the Cadet Training Program (CTP). Methods: Participants were RCMP cadets (n = 394; 76.1% men) in the Standard Training Program who completed the 26-week CTP and daily self-monitoring surveys, as well as full mental health assessments at pre-training (i.e., starting the CTP) and pre-deployment (i.e., ~2 weeks prior to deployment to the field). Symptoms of alcohol use disorder, generalized anxiety disorder, major depressive disorder, panic disorder, and posttraumatic stress disorder were assessed. Changes in mental health disorder symptom reporting from pre-training to pre-deployment were calculated. Spearman’s rank correlations were estimated for number of daily surveys completed and change in mental health disorder symptom scores between pre-training and pre-deployment. Results: There were statistically significant inverse relationships between number of daily surveys completed and number of mental health disorder symptoms reported; specifically, cadets who completed more daily surveys during CTP reported fewer symptoms of alcohol use disorder, generalized anxiety disorder, major depressive disorder, panic disorder, and posttraumatic stress disorder. Conclusion: An inverse correlation between number of daily surveys completed and mental health disorder symptom scores indicated that participation in daily mental health monitoring was associated with improvements in self-reported mental health disorder symptoms between pre-training and pre-deployment. Regular self-monitoring of mental health disorder symptoms may help to mitigate mental health challenges among RCMP cadets and officers.
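    The statistic named in the methods of this abstract, Spearman's rank correlation, can be sketched in a few lines of standard-library Python: rank both variables (tied values share the mean of their positions) and take the Pearson correlation of the ranks. The function names and the sample numbers below are hypothetical and purely illustrative; they are not data from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical values: surveys completed vs. change in a symptom score.
    surveys_completed = [10, 40, 75, 120, 150]
    score_change = [2, 1, -1, -3, -4]
    print(spearman_rho(surveys_completed, score_change))
```

    A negative rho corresponds to the inverse relationship the abstract reports: more completed surveys go with larger decreases in symptom scores. The sketch computes only the coefficient, not a significance test.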

    The Ninth Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the SDSS-III Baryon Oscillation Spectroscopic Survey

    The Sloan Digital Sky Survey III (SDSS-III) presents the first spectroscopic data from the Baryon Oscillation Spectroscopic Survey (BOSS). This ninth data release (DR9) of the SDSS project includes 535,995 new galaxy spectra (median z=0.52), 102,100 new quasar spectra (median z=2.32), and 90,897 new stellar spectra, along with the data presented in previous data releases. These spectra were obtained with the new BOSS spectrograph and were taken between 2009 December and 2011 July. In addition, the stellar parameters pipeline, which determines radial velocities, surface temperatures, surface gravities, and metallicities of stars, has been updated and refined with improvements in temperature estimates for stars with T_eff<5000 K and in metallicity estimates for stars with [Fe/H]>-0.5. DR9 includes new stellar parameters for all stars presented in DR8, including stars from SDSS-I and II, as well as those observed as part of the SDSS-III Sloan Extension for Galactic Understanding and Exploration-2 (SEGUE-2). The astrometry error introduced in the DR8 imaging catalogs has been corrected in the DR9 data products. The next data release for SDSS-III will be in Summer 2013, which will present the first data from the Apache Point Observatory Galactic Evolution Experiment (APOGEE) along with another year of data from BOSS, followed by the final SDSS-III data release in December 2014. Comment: 9 figures; 2 tables. Submitted to ApJS. DR9 is available at http://www.sdss3.org/dr

    The Fourteenth Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the extended Baryon Oscillation Spectroscopic Survey and from the second phase of the Apache Point Observatory Galactic Evolution Experiment

    The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) has been in operation since July 2014. This paper describes the second data release from this phase, and the fourteenth from SDSS overall (making this Data Release Fourteen, or DR14). This release makes public data taken by SDSS-IV in its first two years of operation (July 2014-2016). Like all previous SDSS releases, DR14 is cumulative, including the most recent reductions and calibrations of all data taken by SDSS since the first phase began operations in 2000. New in DR14 is the first public release of data from the extended Baryon Oscillation Spectroscopic Survey (eBOSS); the first data from the second phase of the Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE-2), including stellar parameter estimates from an innovative data-driven machine learning algorithm known as "The Cannon"; and almost twice as many data cubes from the Mapping Nearby Galaxies at APO (MaNGA) survey as were in the previous release (N = 2812 in total). This paper describes the location and format of the publicly available data from SDSS-IV surveys. We provide references to the important technical papers describing how these data have been taken (both targeting and observation details) and processed for scientific use. The SDSS website (www.sdss.org) has been updated for this release, and provides links to data downloads, as well as tutorials and examples of data use. SDSS-IV is planning to continue to collect astronomical data until 2020, and will be followed by SDSS-V. Comment: SDSS-IV collaboration alphabetical author data release paper. DR14 happened on 31st July 2017. 19 pages, 5 figures. Accepted by ApJS on 28th Nov 2017 (this is the "post-print" and "post-proofs" version; minor corrections only from v1, and most of the errors found in proofs corrected)