36 research outputs found

    The Rome Poems of István Vas : [abstract]

    Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages

    Principal component analysis (PCA) and related techniques have been successfully employed in natural language processing. Text mining applications in the age of the online social media (OSM) face new challenges due to properties specific to these use cases (e.g. spelling issues specific to texts posted by users, the presence of spammers and bots, service announcements, etc.). In this paper, we employ a Robust PCA technique to separate typical outliers and highly localized topics from the low-dimensional structure present in language use in online social networks. Our focus is on identifying geospatial features among the messages posted by the users of the Twitter microblogging service. Using a dataset which consists of over 200 million geolocated tweets collected over the course of a year, we investigate whether the information present in word usage frequencies can be used to identify regional features of language use and topics of interest. Using the PCA pursuit method, we are able to identify important low-dimensional features, which constitute smoothly varying functions of the geographic location

    Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States

    Recently, numerous approaches have emerged in the social sciences to exploit the opportunities made possible by the vast amounts of data generated by online social networks (OSNs). Having access to information about users on such a scale opens up a range of possibilities, all without the limitations associated with often slow and expensive paper-based polls. A question that remains to be satisfactorily addressed, however, is how demography is represented in OSN content. Here, we study language use in the US using a corpus of text compiled from over half a billion geo-tagged messages from the online microblogging platform Twitter. Our intention is to reveal the most important spatial patterns in language use in an unsupervised manner and relate them to demographics. Our approach is based on Latent Semantic Analysis (LSA) augmented with the Robust Principal Component Analysis (RPCA) methodology. We find spatially correlated patterns that can be interpreted based on the words associated with them. The main language features can be related to slang use, urbanization, travel, religion and ethnicity, the patterns of which are shown to correlate plausibly with traditional census data. Our findings thus validate the concept of demography being represented in OSN language use and show that the traits observed are inherently present in the word frequencies without any previous assumptions about the dataset. Thus, they could form the basis of further research focusing on the evaluation of demographic data estimation from other big data sources, or on the dynamical processes that result in the patterns found here

    Magyar Fészek+ = Hungarian Nest : New Types of Energy Spaces in Sustainable Architecture

    Building on ideas explored in the Solar Decathlon competition, this work examines how to improve on the Hungarian cube-like house type, with the hope of expanding the use of vernacular elements to create a low-cost passive housing typology. Here the external and intermediate spaces have been included in the generation of a successful microclimatic experiment. Zoning a home from private to public has proven to offer solutions to the environmental impact of energy-positive homes

    Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh

    We present a case study about the spatial indexing and regional classification of billions of geographic coordinates from geo-tagged social network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft SQL Server. Due to the lack of certain features of the HTM library, we use it in conjunction with the GIS functions of SQL Server to significantly increase the efficiency of pre-filtering of spatial filter and join queries. For example, we implemented a new algorithm to compute the HTM tessellation of complex geographic regions and precomputed the intersections of HTM triangles and geographic regions for faster false-positive filtering. With full control over the index structure, HTM-based pre-filtering of simple containment searches outperforms SQL Server spatial indices by a factor of ten and HTM-based spatial joins run about a hundred times faster. Comment: appears in Proceedings of the 26th International Conference on Scientific and Statistical Database Management (2014)

    Application of Carbohydrates with Methylene or Vinyl Groups in Heck–Mizoroki Cross-Coupling Reactions with O-Heterocycles

    Structurally novel carbohydrate–O-heterocycle derivatives linked by various unsaturated carbon bridges were synthesized by palladium-catalyzed cross-coupling reactions

    The status of the southern birch mouse in the Hernád Valley

    Only two localities of the southern birch mouse (Sicista subtilis trizona, Frivaldszky 1865) are currently known in Hungary. One is the population described in 2006 from the Borsodi-Mezőség area; the other, so far undeservedly neglected, is in the Hernád Valley. At the latter site the species has so far been recovered only from owl pellets; a live specimen has probably never been seen there. In 2014 we collected and analysed owl pellets, and in the process also examined older samples. Our analyses yielded the remains of a single birch mouse specimen, from a 2008 collection at Aszaló. After a thorough survey of the Hernád Valley, we carried out pitfall trapping in the potential habitats. To delimit the former range, we used aerial photographs taken in the 1960s, on which the possible former habitats can be distinguished. Unfortunately, our trapping did not detect the species in the Hernád Valley. Frequent burning of the area probably caused the species to disappear from it. However, further trapping is needed before we can state with certainty that the birch mouse has become extinct in the Hernád Valley