    Software Newsroom – an approach to automation of news search and editing

    We have developed tools and applied methods for automated identification of potential news from textual data for an automated news search system called Software Newsroom. The purpose of the tools is to analyze data collected from the internet and to identify information that has a high probability of containing new information. The identified information is summarized in order to help in understanding the semantic contents of the data, and to assist the news editing process. It has been demonstrated that words with a certain set of syntactic and semantic properties are effective when building topic models for English. We demonstrate that words with the same properties in Finnish are useful as well. Extracting such words requires knowledge about the special characteristics of the Finnish language, which are taken into account in our analysis. Two different methodological approaches have been applied for the news search. One of the methods is based on topic analysis and it applies Multinomial Principal Component Analysis (MPCA) for topic model creation and data profiling. The second method is based on word association analysis and applies the log-likelihood ratio (LLR). For the topic mining, we have created English and Finnish language corpora from Wikipedia and Finnish corpora from several Finnish news archives, and we have used bag-of-words representations of these corpora as training data for the topic model. We have performed topic analysis experiments with both the training data itself and with arbitrary text parsed from internet sources. The results suggest that the effectiveness of news search strongly depends on the quality of the training data and its linguistic analysis. In the association analysis, we use a combined methodology for detecting novel word associations in the text. To detect novel associations, we use a background corpus from which we extract common word associations. 
In parallel, we collect the statistics of word co-occurrences from the documents of interest and search for associations with a larger likelihood in these documents than in the background. We have demonstrated the applicability of these methods for Software Newsroom. The results indicate that the background-foreground model has significant potential in news search. The experiments also indicate great promise in employing background-foreground word associations for other applications. A combined application of the two methods is planned, as well as application of the methods to social media using a pre-translator of social media language. Peer reviewed.
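
The log-likelihood ratio mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard G² statistic on a 2x2 co-occurrence table with sentence-level co-occurrence counts; the abstract does not specify the exact formulation or how foreground and background scores are combined, so the function names `pair_table` and `llr_2x2` and the counting scheme are hypothetical.

```python
import math


def pair_table(sentences, a, b):
    """Count sentence-level co-occurrences of words a and b.

    `sentences` is an iterable of token collections (assumed input format).
    Returns (k11, k12, k21, k22): both words, only a, only b, neither.
    """
    k11 = k12 = k21 = k22 = 0
    for tokens in sentences:
        has_a, has_b = a in tokens, b in tokens
        if has_a and has_b:
            k11 += 1
        elif has_a:
            k12 += 1
        elif has_b:
            k21 += 1
        else:
            k22 += 1
    return k11, k12, k21, k22


def llr_2x2(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    Returns 0 when the two words occur independently; larger values
    indicate a stronger departure from independence.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = (
        (k11, rows[0], cols[0]),
        (k12, rows[0], cols[1]),
        (k21, rows[1], cols[0]),
        (k22, rows[1], cols[1]),
    )
    g2 = 0.0
    for observed, row, col in cells:
        if observed > 0:  # empty cells contribute nothing to the sum
            expected = row * col / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2
```

One way to use this in a background-foreground setting would be to compute the score for the same word pair in both corpora and flag pairs that score high in the documents of interest but low in the background; that comparison rule is an assumption, not the method reported in the paper.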

    COrE (Cosmic Origins Explorer) A White Paper

    COrE (Cosmic Origins Explorer) is a fourth-generation full-sky, microwave-band satellite recently proposed to ESA within Cosmic Vision 2015-2025. COrE will provide maps of the microwave sky in polarization and temperature in 15 frequency bands, ranging from 45 GHz to 795 GHz, with an angular resolution ranging from 23 arcmin (45 GHz) to 1.3 arcmin (795 GHz) and sensitivities roughly 10 to 30 times better than PLANCK (depending on the frequency channel). The COrE mission will lead to breakthrough science in a wide range of areas, ranging from primordial cosmology to galactic and extragalactic science. COrE is designed to detect the primordial gravitational waves generated during the epoch of cosmic inflation at more than $3\sigma$ for $r=(T/S)\geq 10^{-3}$. It will also measure the CMB gravitational lensing deflection power spectrum to the cosmic variance limit on all linear scales, allowing us to probe absolute neutrino masses better than laboratory experiments and down to plausible values suggested by the neutrino oscillation data. COrE will also search for primordial non-Gaussianity with significant improvements over Planck in its ability to constrain the shape (and amplitude) of non-Gaussianity. In the areas of galactic and extragalactic science, in its highest frequency channels COrE will provide maps of the galactic polarized dust emission allowing us to map the galactic magnetic field in areas of diffuse emission not otherwise accessible to probe the initial conditions for star formation. COrE will also map the galactic synchrotron emission thirty times better than PLANCK. This White Paper reviews the COrE science program, our simulations on foreground subtraction, and the proposed instrumental configuration. Comment: 90 pages, LaTeX, 15 figures (revised 28 April 2011, references added, minor errors corrected).

    Science Impacts of the SPHEREx All-Sky Optical to Near-Infrared Spectral Survey: Report of a Community Workshop Examining Extragalactic, Galactic, Stellar and Planetary Science

    SPHEREx is a proposed SMEX mission selected for Phase A. SPHEREx will carry out the first all-sky spectral survey and provide for every 6.2" pixel a spectrum between 0.75 and 4.18 $\mu$m [with $R\sim41.4$] and between 4.18 and 5.00 $\mu$m [with $R\sim135$]. The SPHEREx team has proposed three specific science investigations to be carried out with this unique data set: cosmic inflation, interstellar and circumstellar ices, and the extra-galactic background light. It is readily apparent, however, that many other questions in astrophysics and planetary sciences could be addressed with the SPHEREx data. The SPHEREx team convened a community workshop in February 2016, with the intent of enlisting the aid of a larger group of scientists in defining these questions. This paper summarizes the rich and varied menu of investigations that was laid out. It includes studies of the composition of main belt and Trojan/Greek asteroids; mapping the zodiacal light with unprecedented spatial and spectral resolution; identifying and studying very low-metallicity stars; improving stellar parameters in order to better characterize transiting exoplanets; studying aliphatic and aromatic carbon-bearing molecules in the interstellar medium; mapping star formation rates in nearby galaxies; determining the redshift of clusters of galaxies; identifying high redshift quasars over the full sky; and providing a NIR spectrum for most eROSITA X-ray sources. All of these investigations, and others not listed here, can be carried out with the nominal all-sky spectra to be produced by SPHEREx. In addition, the workshop defined enhanced data products and user tools which would facilitate some of these scientific studies. 
Finally, the workshop noted the high degree of synergy between SPHEREx and a number of other current or forthcoming programs, including JWST, WFIRST, Euclid, GAIA, K2/Kepler, TESS, eROSITA and LSST. Comment: Report of the First SPHEREx Community Workshop, http://spherex.caltech.edu/Workshop.html , 84 pages, 28 figures.

    CMB-S4 Science Book, First Edition

    This book lays out the scientific goals to be addressed by the next-generation ground-based cosmic microwave background experiment, CMB-S4, envisioned to consist of dedicated telescopes at the South Pole, the high Chilean Atacama plateau and possibly a northern hemisphere site, all equipped with new superconducting cameras. CMB-S4 will dramatically advance cosmological studies by crossing critical thresholds in the search for the B-mode polarization signature of primordial gravitational waves, in the determination of the number and masses of the neutrinos, in the search for evidence of new light relics, in constraining the nature of dark energy, and in testing general relativity on large scales.

    Grids and the Virtual Observatory

    We consider several projects from astronomy that benefit from the Grid paradigm and associated technology, many of which involve either massive datasets or the federation of multiple datasets. We cover image computation (mosaicking, multi-wavelength images, and synoptic surveys); database computation (representation through XML, data mining, and visualization); and semantic interoperability (publishing, ontologies, directories, and service descriptions).