1,568 research outputs found

    Analyzing the Language of Food on Social Media

    We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and home geographical location of authors. For all tasks, our language-based models significantly outperform the majority-class baselines. Performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have the most predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics. Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced heatmaps, semantics-preserving wordclouds and temporal histograms, allow us to discover more complex, global patterns mirrored in the language of food. Comment: An extended abstract of this paper will appear in IEEE Big Data 201
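    The comparison against a majority-class baseline mentioned in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy posts, the region labels, and the choice of a multinomial naive Bayes classifier are not taken from the paper, whose actual models and features are not described in this listing.

```python
from collections import Counter
import math


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in test_labels if y == majority) / len(test_labels)


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model on whitespace-tokenized text."""
    word_counts = {}                 # label -> Counter of token frequencies
    label_counts = Counter(labels)   # label -> number of training documents
    vocab = set()
    for text, y in zip(docs, labels):
        tokens = text.lower().split()
        word_counts.setdefault(y, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for y, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[y].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:         # tokens unseen in training are skipped
                score += math.log((word_counts[y][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical toy data: food-related posts labeled with an invented region.
TRAIN_DOCS = ["grits and sweet tea", "sweet tea with biscuits",
              "bagel with lox", "grits for breakfast"]
TRAIN_LABELS = ["south", "south", "northeast", "south"]
TEST_DOCS = ["bagel and lox", "sweet tea"]
TEST_LABELS = ["northeast", "south"]

model = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS)
predictions = [predict(model, doc) for doc in TEST_DOCS]
model_accuracy = sum(p == y for p, y in zip(predictions, TEST_LABELS)) / len(TEST_LABELS)
baseline_accuracy = majority_baseline_accuracy(TRAIN_LABELS, TEST_LABELS)
```

    On this toy data the baseline always answers with the most common training label, so any gain from the text model comes from the words themselves. A real evaluation would need far more data and held-out splits.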

    Developing an Open Database to Support Forensic Investigation of Disasters in South East Asia: FORINSEA v1.0

    This article describes the development of a bespoke database, FORINSEA1.0, created to address the need for systematic curation of the information required for the descriptive phase of the FORIN approach, and its application to two study areas in the South East Asia region. FORINSEA1.0 allows researchers, for the first time, to explore and make use of subnational, geocoded data on major disasters triggered by natural hazards (flooding, earthquake, landslide and meteorological hazards) from 1945 to 2020 in the hydrological catchment of the Red River in Vietnam and the Marikina Basin in the Philippines. FORINSEA1.0 also contains subnational information on relevant socio-economic policies and the development of key infrastructure, providing the basis for the descriptive FORIN analysis. While the catchment approach is potentially transferable to other regions, this Data Report does not show how these records might be applied or integrated to support a FORIN investigation of a specific disaster or event, nor does it provide basic ground rules for setting up similar systems in other countries.

    Language Archive Records: Interoperability Of Referencing Practices And Metadata Models

    With the rise of the digital language archive and the plethora of referenceable content, a critical question arises: “How easy is it for authors to use existing tools to cite the content they are referencing?” This is especially important as people use archived materials as evidence within published language descriptions. Archived resource metadata is well discussed in language documentation circles; however, bibliographic metadata and its accessibility are less discussed. Discoverability metadata, a subset of archived resource metadata, serves aggregators like OLAC by declaring a resource exists. In contrast, bibliographic metadata functions within documents by declaring where to find a resource that is known to exist. In this thesis I look at the interaction between Zotero, an open source reference manager, five different archives (PARADISEC, Pangloss, SIL Language & Culture Archives, ELAR, and Kaipuleohone), and three methods of importing metadata from them into Zotero (DOI import, HTML embedded metadata, and file based import). I report on collection and audio artifact metadata provided by the archive to the author via Zotero’s interfaces: what’s included, what’s missing, and what’s misaligned. Understanding the processes by which authors collect metadata for the purpose of citation and referencing, what metadata they need, and if it is being provided, facilitates the design of useful interfaces to archives which elevate the value of archives to all groups who interact with them. I propose that interaction design is an additional factor to those presented by Chang (2010) in her well received checklist for evaluating language archives. Interaction design, the technical field concerned with designing how people interact with objects and services, is the design process by which archives manage the interactions they have with those they serve. 
I specifically argue that interaction design adds value to an archive’s brand, as perceived by the network of archive users, when it facilitates the interaction with bibliographic metadata about artifacts within holdings. This added value speaks to the sustainability of an archive within its sphere of influence. It is increasingly important in the career development of scholars to meet metric-based assessments of their influence in scholarly discussions. Reference counts, including those pointing to the evidentiary record housed in archives, play a significant role in establishing quantitative baseline metrics for scholars

    Understanding Social Media through Large Volume Measurements

    The amount of user-generated web content has grown drastically in the past 15 years and many social media services are exceedingly popular nowadays. In this thesis we study social media content creation and consumption through large volume measurements of three prominent social media services, namely Twitter, YouTube, and Wikipedia. Common to the services is that they have millions of users, they are free to use, and the users of the services can both create and consume content. The motivation behind this thesis is to examine how users create and consume social media content, investigate why social media services are as popular as they are, what drives people to contribute on them, and see if it is possible to model the conduct of the users. We study how various aspects of social media content, such as its creation and consumption or its popularity, can be measured, characterized, and linked to real world occurrences. We have gathered more than 20 million tweets, metadata of more than 10 million YouTube videos and a complete six-year page view history of 19 different Wikipedia language editions. We show, for example, daily and hourly patterns for the content creation and consumption, content popularity distributions, characteristics of popular content, and user statistics. We also compare social media with traditional news services and show the interaction between social media, news, and stock prices. In addition, we combine natural language processing with social media analysis, and discover interesting correlations between news and social media content. Moreover, we discuss the importance of correct measurement methods and show the effects of different sampling methods using YouTube measurements as an example.
The popularity of social media and the amount of content created by its users have grown enormously over the past 15 years, and services such as Facebook, Instagram, Twitter, YouTube, and Wikipedia are extremely popular. This dissertation examines the creation and consumption patterns of social media content through large-volume measurement data. The dissertation contains measurement data from Twitter, YouTube, and Wikipedia. Common to these three services is, among other things, that they have millions of users, they are free to use, and users can both create and consume content. The measurement data comprises more than 20 million tweets, metadata on more than ten million YouTube videos, and complete article page-view data covering six years for 19 Wikipedia language editions. The aim of the research is to examine how users create and consume content and to find related patterns that can be exploited in information sharing, replication, and storage. The research thus seeks to determine why social media services are as popular as they are, what drives users to contribute content to them, and whether use of the services can be modeled and predicted. The dissertation also compares the creation and consumption patterns of social media and traditional news services. In addition, it shows how social media content, news, and stock prices interact with one another. The dissertation also discusses the choice and use of the correct measurement method and shows the effects of different measurement methods on the results using YouTube measurement data.

    Spartan Daily, April 10, 1991

    Volume 96, Issue 45 https://scholarworks.sjsu.edu/spartandaily/8113/thumbnail.jp

    Perceptions of Low SES, High Academic Achievement Vietnamese Middle Grades Students of Factors that Have Contributed to Their School Achievement

    This study examines the perceptions of low socioeconomic status (SES), high academic achievement Vietnamese middle grades students in the Vietnamese community with respect to the roles that their parents and communities play in supporting academic achievement. Previous research has established the positive relationships between parent involvement and student achievement, and between high SES and student achievement. However, this study explores the perceptions of high-achieving middle grades students with low SES. Through focus group discussions and interviews, this study examines student achievement within the theoretical framework of social capital.

    Revolutionary War and the Development of International Humanitarian Law

    Making Endless War is built on the premise that any attempt to understand how the content and function of the laws of war changed in the second half of the twentieth century should consider two major armed conflicts, fought on opposite edges of Asia, and the legal pathways that link them together across time and space. The Vietnam and Arab-Israeli conflicts have been particularly significant in the shaping and attempted remaking of international law from 1945 right through to the present day. This carefully curated collection of essays by lawyers, historians, philosophers, sociologists, and political geographers of war explores the significance of these two conflicts, including their impact on the politics and culture of the world's most powerful nation, the United States of America. The volume foregrounds attempts to develop legal rationales for the continued waging of war after 1945 by moving beyond explaining the end of war as a legal institution, and toward understanding the attempted institutionalization of endless war

    Flexible RDF data extraction from Wiktionary - Leveraging the power of community build linguistic wikis

    We present a declarative approach implemented in a comprehensive open-source framework (based on DBpedia) to extract lexical-semantic resources (an ontology about language use) from Wiktionary. The data currently includes language, part of speech, senses, definitions, synonyms, taxonomies (hyponyms, hyperonyms, synonyms, antonyms) and translations for each lexical word. The main focus is on flexibility with respect to the loose schema and configurability towards differing language editions of Wiktionary. This is achieved by a declarative mediator/wrapper approach. The goal is to allow the addition of languages just by configuration, without the need for programming, thus enabling the swift and resource-conserving adaptation of wrappers by domain experts. The extracted data is as fine-grained as the source data in Wiktionary and additionally follows the lemon model. It enables use cases like disambiguation or machine translation. By offering a linked data service, we hope to extend DBpedia’s central role in the LOD infrastructure to the world of Open Linguistics.
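    The idea of adding a language edition "just by configuration" can be sketched as follows. This is a toy illustration under stated assumptions, not the framework's actual configuration format or API: the `CONFIG` layout, the `extract_triples` function, the property names, and the sample pages are all hypothetical, and the German heading names are simplified stand-ins (real Wiktionary editions often mark sections with templates rather than plain headings).

```python
import re

# Hypothetical per-language configuration: heading text -> output property.
# Supporting another edition means adding an entry here, not writing new code.
CONFIG = {
    "en": {
        "pos": {"Noun": "noun", "Adjective": "adjective"},
        "relations": {"Synonyms": "synonym", "Antonyms": "antonym"},
    },
    "de": {  # simplified stand-in headings, for illustration only
        "pos": {"Substantiv": "noun", "Adjektiv": "adjective"},
        "relations": {"Synonyme": "synonym", "Gegenwörter": "antonym"},
    },
}

HEADING = re.compile(r"^=+\s*(.*?)\s*=+$")   # e.g. "===Noun==="
LINK = re.compile(r"\[\[([^\]|]+)")          # target of a [[wiki link]]


def extract_triples(word, wikitext, lang):
    """Walk the page line by line and emit (subject, property, value) triples."""
    rules = CONFIG[lang]
    triples = []
    current_relation = None
    for raw_line in wikitext.splitlines():
        line = raw_line.strip()
        match = HEADING.match(line)
        if match:
            title = match.group(1)
            if title in rules["pos"]:
                triples.append((word, "partOfSpeech", rules["pos"][title]))
                current_relation = None
            else:
                # Unknown headings reset the relation to None and are ignored.
                current_relation = rules["relations"].get(title)
            continue
        if current_relation and line.startswith("*"):
            for target in LINK.findall(line):
                triples.append((word, current_relation, target))
    return triples


# Hypothetical, heavily simplified page sources.
EN_SAMPLE = """==English==
===Adjective===
# Of great size.
====Synonyms====
* [[large]], [[huge]]
====Antonyms====
* [[small]]"""

DE_SAMPLE = """== groß ==
=== Adjektiv ===
==== Synonyme ====
* [[riesig]]"""

en_triples = extract_triples("big", EN_SAMPLE, "en")
de_triples = extract_triples("groß", DE_SAMPLE, "de")
```

    The same `extract_triples` function handles both samples; only the configuration entry differs. A real extractor would also need to cope with templates, nested senses, and an RDF serialization following the lemon model, none of which is attempted here.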

    The Anchor, Volume 85.14: February 2, 1973

    The Anchor began in 1887 and was first issued weekly in 1914. Covering national and campus news alike, Hope College’s student-run newspaper has grown over the years to encompass over two dozen editors, reporters, and staff. For much of The Anchor’s history, the latest issue was distributed across campus each Wednesday throughout the academic school year (with few exceptions). As of Fall 2019 The Anchor has moved to monthly print issues and a more frequently updated website. Occasionally, the volume and/or issue numbering is irregular.