
    Software Newsroom – an approach to automation of news search and editing

    We have developed tools and applied methods for automated identification of potential news from textual data for an automated news search system called Software Newsroom. The purpose of the tools is to analyze data collected from the internet and to identify information that has a high probability of containing new information. The identified information is summarized in order to help in understanding the semantic content of the data, and to assist the news editing process. It has been demonstrated that words with a certain set of syntactic and semantic properties are effective when building topic models for English. We demonstrate that words with the same properties in Finnish are useful as well. Extracting such words requires knowledge about the special characteristics of the Finnish language, which are taken into account in our analysis. Two different methodological approaches have been applied for the news search. The first method is based on topic analysis and applies Multinomial Principal Component Analysis (MPCA) for topic model creation and data profiling. The second method is based on word association analysis and applies the log-likelihood ratio (LLR). For the topic mining, we have created English and Finnish language corpora from Wikipedia and Finnish corpora from several Finnish news archives, and we have used bag-of-words representations of these corpora as training data for the topic model. We have performed topic analysis experiments with both the training data itself and with arbitrary text parsed from internet sources. The results suggest that the effectiveness of news search strongly depends on the quality of the training data and its linguistic analysis. In the association analysis, we use a combined methodology for detecting novel word associations in the text. To detect novel associations, we use a background corpus from which we extract common word associations.
In parallel, we collect the statistics of word co-occurrences from the documents of interest and search for associations with larger likelihood in these documents than in the background. We have demonstrated the applicability of these methods for Software Newsroom. The results indicate that the background-foreground model has significant potential in news search. The experiments also indicate great promise in employing background-foreground word associations for other applications. A combined application of the two methods is planned, as well as the application of the methods to social media using a pre-translator of social media language.
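
The background-foreground idea in this abstract can be illustrated with a small sketch. The abstract does not spell out the exact scoring formula, so the code below is only an assumption: it uses the standard log-likelihood ratio (G^2) for a 2x2 contingency table and compares how often a word pair co-occurs in foreground documents against a background corpus. The function names (`llr_2x2`, `novel_pair_score`) and the document-level counting are hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def _xlogx(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def llr_2x2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts.

    Uses G^2 = 2 * (sum k ln k + N ln N - sum row ln row - sum col ln col).
    """
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    total = _xlogx(k11 + k12 + k21 + k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (cells + total - rows - cols))


def novel_pair_score(fg_pair, fg_total, bg_pair, bg_total):
    """Score a word pair that may be over-represented in the foreground.

    fg_pair / bg_pair: number of foreground / background documents in which
    the two words co-occur (hypothetical counting unit).
    fg_total / bg_total: total number of documents in each collection.
    Returns the LLR if the pair is relatively more frequent in the
    foreground, otherwise 0.0.
    """
    score = llr_2x2(fg_pair, fg_total - fg_pair, bg_pair, bg_total - bg_pair)
    more_frequent_in_fg = fg_pair / fg_total > bg_pair / bg_total
    return score if more_frequent_in_fg else 0.0
```

Under these assumptions, candidate pairs would be ranked by `novel_pair_score`, and the highest-scoring pairs treated as potentially novel associations; pairs that are equally or more common in the background receive a score of zero.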

    The first SEPServer event catalogue 68-MeV solar proton events observed at 1 AU in 1996-2010

    SEPServer is a three-year collaborative project funded by the Seventh Framework Programme (FP7-SPACE) of the European Union. The objective of the project is to provide access to state-of-the-art observations and analysis tools for the scientific community on solar energetic particle (SEP) events and related electromagnetic (EM) emissions. The project will eventually lead to better understanding of the particle acceleration and transport processes at the Sun and in the inner heliosphere. These processes lead to SEP events that form one of the key elements of space weather. In this paper we present the first results from the systematic analysis work performed on the following datasets: SOHO/ERNE, SOHO/EPHIN, ACE/EPAM, Wind/WAVES and GOES X-rays. A catalogue of SEP events at 1 AU, with complete coverage over solar cycle 23, based on high-energy (~68-MeV) protons from SOHO/ERNE and electron recordings of the events by SOHO/EPHIN and ACE/EPAM are presented. A total of 115 energetic particle events have been identified and analysed using velocity dispersion analysis (VDA) for protons and time-shifting analysis (TSA) for electrons and protons in order to infer the SEP release times at the Sun. EM observations during the times of the SEP event onset have been gathered and compared to the release time estimates of particles. Data from those events that occurred during the European day-time, i.e., those that also have observations from ground-based observatories included in SEPServer, are listed and a preliminary analysis of their associations is presented. We find that VDA results for protons can be a useful tool for the analysis of proton release times, but if the derived proton path length is outside the range 1 AU < s ≲ 3 AU, the result of the analysis may be compromised, as indicated by the anti-correlation of the derived path length and release time delay from the associated X-ray flare.
The average path length derived from VDA is about 1.9 times the nominal length of the spiral magnetic field line. This implies that the path length of first-arriving MeV to deka-MeV protons is affected by interplanetary scattering. TSA of near-relativistic electrons results in a release time that shows significant scatter with respect to the EM emissions but with a trend of being delayed more with increasing distance between the flare and the nominal footpoint of the Earth-connected field line.
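
As a rough illustration of the two timing techniques named in this abstract, the sketch below assumes the textbook form of velocity dispersion analysis: the onset time observed at energy E is modelled as t_onset = t_release + s / v(E), so a straight-line fit of onset time against 1/beta gives the release time as the intercept and the path length s as the slope. Time-shifting analysis is sketched as subtracting the travel time along an assumed path length from a single onset time. The function names, the default path length of 1.2 AU, and the plain least-squares fit are illustrative assumptions, not the SEPServer analysis code.

```python
import math

PROTON_REST_MEV = 938.272      # proton rest energy in MeV
LIGHT_TIME_PER_AU_S = 499.005  # light travel time over 1 AU in seconds


def beta_from_kinetic_energy(e_mev, rest_mev=PROTON_REST_MEV):
    """Relativistic speed v/c of a particle with kinetic energy e_mev."""
    gamma = 1.0 + e_mev / rest_mev
    return math.sqrt(1.0 - 1.0 / gamma ** 2)


def vda_fit(energies_mev, onset_times_s):
    """Least-squares fit of onset time versus 1/beta.

    Returns (release_time_s, path_length_au): the intercept is the
    inferred release time and the slope, divided by the light travel
    time per AU, is the apparent path length.
    """
    x = [1.0 / beta_from_kinetic_energy(e) for e in energies_mev]
    y = list(onset_times_s)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope / LIGHT_TIME_PER_AU_S


def tsa_release_time(energy_mev, onset_time_s, path_length_au=1.2,
                     rest_mev=PROTON_REST_MEV):
    """Shift one onset time back by the travel time along an assumed path.

    The default path length of 1.2 AU is an illustrative assumption.
    """
    beta = beta_from_kinetic_energy(energy_mev, rest_mev)
    return onset_time_s - path_length_au * LIGHT_TIME_PER_AU_S / beta


def path_length_plausible(path_length_au):
    """Check the fitted path length against the approximate range 1 AU < s <~ 3 AU."""
    return 1.0 < path_length_au <= 3.0
```

In this sketch, a fitted path length that fails `path_length_plausible` would flag the corresponding release time as potentially unreliable, in line with the caveat stated in the abstract. For electrons, the same functions could be called with the electron rest energy passed as `rest_mev`.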
