4 research outputs found

    Mining streams of short text for analysis of world-wide event evolutions

    Streams of short text, such as news titles, enable us to learn effectively and efficiently about real-world events that occur anywhere and at any time. Short text messages, which are accompanied by timestamps and generally describe events briefly using only a few words, differ from longer text documents such as web pages, news stories, blogs, technical papers and books. For example, few words repeat within the same news title, so term frequency (TF) is less important in a short text corpus than in a corpus of longer texts. Analysis of short text therefore faces new challenges. In addition, detecting and tracking events through short text analysis requires reliably identifying events from consistent topic clusters; however, existing methods such as Latent Dirichlet Allocation (LDA) generate different topic results for the same corpus on different executions. In this paper, we provide a Finding Topic Clusters using Co-occurring Terms (FTCCT) algorithm to automatically generate topics from a short text corpus, and develop an Event Evolution Mining (EEM) algorithm to discover hot events and their evolutions (i.e., how the popularity of events changes over time). In FTCCT, a term (i.e., a single word or a multi-word phrase) belongs to only one topic in a corpus. Experiments on news titles from 157 countries over 4 months (July to October 2013) demonstrate that our FTCCT-based method (combining FTCCT and EEM) achieves far higher quality of event content and description words than the LDA-based method (combining LDA and EEM) for analysis of streams of short text. Our method also visualizes the evolutions of the hot events. The discovered world-wide event evolutions reveal some interesting correlations among world-wide events; for example, successive extreme weather phenomena occurred in different locations: a typhoon in Hong Kong and the Philippines followed a hurricane and storm flood in Mexico in September 2013.
© 2014 Springer Science+Business Media New York

    Particle size distribution and air pollution patterns in three urban environments in Xi’an, China

    Three urban environments (an office, an apartment and a restaurant) were selected for an inter-comparison of indoor and outdoor air quality, covering CO2 concentration, particulate matter (PM) concentration and particle size range. In this investigation, the CO2 level in the apartment (623 ppm) was the highest among the indoor environments, and indoor levels were always higher than outdoor levels. The PM10 (333 µg/m³), PM2.5 (213 µg/m³) and PM1 (148 µg/m³) concentrations in the office were 10–50% higher than in the restaurant and apartment, and all three indoor PM10 levels exceeded the China standard of 150 µg/m³. Particles in the ranges 0.3–0.4 µm, 0.4–0.5 µm and 0.5–0.65 µm made the largest contribution to particle mass in indoor air, and fine-particle number concentrations were much higher than outdoor levels. Outdoor air pollution is mainly affected by heavy traffic, while indoor air pollution has various sources. In particular, the office environment was mainly affected by outdoor sources such as soil dust and traffic emissions; apartment particles were mainly caused by human activities; and restaurant indoor air quality was affected by multiple sources, among which cooking-generated fine particles and human steam were the main factors.

    Characteristics and applications of size-segregated biomass burning tracers in China's Pearl River Delta region

    Biomass burning activities in China are ubiquitous and the resulting smoke emissions may pose considerable threats to human health and the environment. In the present study, size-segregated biomass burning tracers, including anhydrosugars (levoglucosan (LG) and mannosan (MN)) and non-sea-salt potassium (nss-K⁺), were determined at an urban and a suburban site in the Pearl River Delta (PRD) region. The size distributions of biomass burning tracers were generally characterized by a unimodal pattern peaking in the particle size range of 0.44–1.0 µm, except for MN during the wet season, for which a bimodal pattern (one in fine and one in coarse mode) was observed. These observed biomass burning tracers in the PRD region shifted towards larger particle sizes compared to the typical size distributions of fresh biomass smoke particles. Elevated biomass burning tracers were observed during the dry season when biomass burning activities were intensive and meteorological conditions favored the transport of biomass smoke particles from the rural areas in the PRD and neighboring areas to the sampling sites. The fine mode biomass burning tracers significantly correlated with each other, confirming their common sources. Rather high ΔLG/ΔMN ratios were observed at both sites, indicating limited influence from softwood combustion. High Δnss-K⁺/ΔLG ratios further suggested that biomass burning aerosols in the PRD were predominantly associated with burning of crop residues. Using a simplified receptor-oriented approach with an emission factor of 0.075 (LG/TC) obtained from several chamber studies, average contributions of biomass burning emissions to total carbon in fine particles were estimated to be 23% and 16% at the urban and suburban site, respectively, during the dry season. In contrast, the relative contributions to total carbon were lower than 8% at both sites during the wet season.
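
The simplified receptor-oriented estimate mentioned in the abstract above can be illustrated with a short sketch. It assumes the common form of this approach, in which the ambient LG/TC ratio is divided by the source LG/TC emission factor (0.075 in the abstract); the abstract does not spell out the exact formula, so the function name, argument names and input concentrations below are hypothetical and serve only to show the arithmetic.

```python
def biomass_burning_share(lg, tc, source_lg_to_tc=0.075):
    """Estimate the fraction of total carbon (TC) attributable to biomass burning.

    Assumed form (not confirmed by the abstract):
        share = (ambient LG / ambient TC) / (source LG/TC emission factor)

    lg and tc must be in the same concentration units (e.g. ug/m3).
    """
    if tc <= 0:
        raise ValueError("total carbon concentration must be positive")
    if source_lg_to_tc <= 0:
        raise ValueError("emission factor must be positive")
    return (lg / tc) / source_lg_to_tc


# Hypothetical ambient concentrations, chosen only to illustrate the calculation;
# they are not measurements from the study.
share = biomass_burning_share(lg=0.30, tc=20.0)
print(f"Estimated biomass burning contribution to TC: {share:.0%}")
```

Because the result scales inversely with the emission factor, an estimate of this kind is only as reliable as the source LG/TC ratio chosen for the fuels actually burned.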