Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages
Principal component analysis (PCA) and related techniques have been
successfully employed in natural language processing. Text mining applications
in the age of online social media (OSM) face new challenges due to
properties specific to these use cases (e.g. spelling issues specific to texts
posted by users, the presence of spammers and bots, service announcements,
etc.). In this paper, we employ a Robust PCA technique to separate typical
outliers and highly localized topics from the low-dimensional structure present
in language use in online social networks. Our focus is on identifying
geospatial features among the messages posted by the users of the Twitter
microblogging service. Using a dataset which consists of over 200 million
geolocated tweets collected over the course of a year, we investigate whether
the information present in word usage frequencies can be used to identify
regional features of language use and topics of interest. Using the PCA pursuit
method, we are able to identify important low-dimensional features, which
constitute smoothly varying functions of the geographic location.
Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States
Recently, numerous approaches have emerged in the social sciences to exploit
the opportunities made possible by the vast amounts of data generated by online
social networks (OSNs). Having access to information about users on such a
scale opens up a range of possibilities, all without the limitations associated
with often slow and expensive paper-based polls. A question that remains to be
satisfactorily addressed, however, is how demography is represented in OSN
content. Here, we study language use in the US using a corpus of text compiled
from over half a billion geo-tagged messages from the online microblogging
platform Twitter. Our intention is to reveal the most important spatial
patterns in language use in an unsupervised manner and relate them to
demographics. Our approach is based on Latent Semantic Analysis (LSA) augmented
with the Robust Principal Component Analysis (RPCA) methodology. We find
spatially correlated patterns that can be interpreted based on the words
associated with them. The main language features can be related to slang use,
urbanization, travel, religion and ethnicity, the patterns of which are shown
to correlate plausibly with traditional census data. Our findings thus validate
the concept of demography being represented in OSN language use and show that
the traits observed are inherently present in the word frequencies without any
previous assumptions about the dataset. Thus, they could form the basis of
further research focusing on the evaluation of demographic data estimation from
other big data sources, or on the dynamical processes that result in the
patterns found here.
Magyar Fészek+ = Hungarian Nest: New Types of Energy Spaces in Sustainable Architecture
This work develops ideas explored in the Solar Decathlon competition, examining how to improve on the Hungarian cube-like house type and expand the use of vernacular elements to create a low-cost passive housing typology. Here the external and intermediate spaces have been included in the generation of a successful microclimatic experiment. Zoning a home from private to public has proven to offer environmental solutions for energy-positive homes.
Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh
We present a case study about the spatial indexing and regional
classification of billions of geographic coordinates from geo-tagged social
network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft
SQL Server. Due to the lack of certain features of the HTM library, we use it
in conjunction with the GIS functions of SQL Server to significantly increase
the efficiency of pre-filtering of spatial filter and join queries. For
example, we implemented a new algorithm to compute the HTM tessellation of
complex geographic regions and precomputed the intersections of HTM triangles
and geographic regions for faster false-positive filtering. With full control
over the index structure, HTM-based pre-filtering of simple containment
searches outperforms SQL Server spatial indices by a factor of ten and
HTM-based spatial joins run about a hundred times faster.
Comment: appears in Proceedings of the 26th International Conference on
Scientific and Statistical Database Management (2014)
Application of Carbohydrates with Methylene or Vinyl Groups in Heck–Mizoroki Cross-Coupling Reactions with O-Heterocycles
Structurally novel carbohydrate–O-heterocycle derivatives linked by various unsaturated carbon bridges were synthesized by palladium-catalyzed cross-coupling reactions.
The status of the southern birch mouse in the Hernád Valley
Only two occurrence sites of the southern birch mouse (Sicista subtilis trizona, Frivaldszky 1865) are currently known in Hungary. One is the population described in 2006 from the Borsodi-Mezőség area; the other is the hitherto undeservedly neglected Hernád Valley site. At the latter site the species has so far been recovered only from owl pellets; a live specimen has probably never been seen there. In 2014 we collected and analysed owl pellets, and also examined older samples. Our analyses yielded the remains of a single birch mouse specimen, from a 2008 collection at Aszaló. After a thorough survey of the Hernád Valley, we carried out pitfall trapping in potential habitats. To delimit the former range, we used aerial photographs taken in the 1960s, on which possible former habitats can be distinguished. Unfortunately, our trapping failed to detect the species in the Hernád Valley. Frequent burning of the area probably caused the species to disappear from it. However, further trapping is needed before we can state with certainty that the birch mouse has become extinct in the Hernád Valley.