
31 research outputs found

A Technical Approach and Distributed Model for Validation of Digital Objects

Author: Justin Littman
Publication venue: CNRI Acct
Publication date: 01/01/2006

This article describes the current technical approach for digital object validation used by the National Digital Newspaper Program (NDNP), a partnership between the Library of Congress (LC) and the National Endowment for the Humanities for the digitization of historical newspapers. The article also describes the scheme for distributing validation across the participating institutions that will be creating and submitting digital objects to NDNP. The approaches and schemes are now being tested for the first development phase of NDNP, but if successful, they could be generalized to other similar projects.

Secretaría de Estado de Cultura

Where to get Twitter data for academic research poster

Author: Justin Littman
Publication venue: Center for Open Science
Publication date: 16/02/2018

OSF Preprints

Hurricanes Harvey and Irma Tweet ids

Author: Justin Littman
Publication venue: Harvard Dataverse

This dataset contains the tweet ids of 35,596,281 tweets related to Hurricanes Irma and Harvey. They were collected during these events from the Twitter API using Social Feed Manager. These tweet ids are broken up into 2 collections, each collected using the POST statuses/filter method of the Twitter Stream API. The collections are Hurricane Irma (irma_filter_tweet_ids.txt) and Hurricane Harvey (harvey_filter_tweet_ids.txt). There is a README.txt file for each collection containing additional documentation on how it was collected. The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.
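Part of the hydration step can be prepared offline. The following is a minimal Python sketch, assuming the id files hold one tweet id per line: it groups ids into batches of at most 100 (the per-request limit of GET statuses/lookup) and builds the comma-separated `id` parameter for each call. The helper names are hypothetical; the sketch makes no network requests and leaves authentication and rate limiting to a tool such as Twarc or Hydrator.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# GET statuses/lookup accepts at most 100 tweet ids per request.
LOOKUP_BATCH_SIZE = 100


def read_tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield tweet ids from lines, assuming one id per line; blank lines are skipped."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def lookup_batches(ids: Iterable[str], size: int = LOOKUP_BATCH_SIZE) -> Iterator[List[str]]:
    """Group ids into lists of at most `size`, one list per lookup request."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def lookup_params(batch: List[str]) -> dict:
    """Build query parameters for one statuses/lookup call (comma-separated ids)."""
    return {"id": ",".join(batch)}


if __name__ == "__main__":
    # Hypothetical stand-in for irma_filter_tweet_ids.txt: 250 fake ids.
    sample = [f"{900000000000000000 + n}\n" for n in range(250)]
    batches = list(lookup_batches(read_tweet_ids(sample)))
    print([len(b) for b in batches])  # [100, 100, 50]
```

Ids are kept as strings rather than integers so they pass through unchanged into the request parameter.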

Harvard Dataverse Network

Winter Olympics 2018 Tweet Ids

Author: Justin Littman
Publication venue: Harvard Dataverse

This dataset contains the tweet ids of 13,816,206 tweets related to the 2018 Winter Olympics held in Pyeongchang, South Korea. They were collected between January 31, 2018 and February 27, 2018 from the Twitter filter stream API (POST statuses/filter) using Social Feed Manager. The filter tracked "#olympics, #pyeongchang2018, #winterolympics, #평창동계올림픽". There is a README.txt file containing additional documentation on how it was collected. The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.

Harvard Dataverse Network

Immigration and Travel Ban Tweet Ids

Author: Justin Littman
Publication venue: Harvard Dataverse

This dataset contains the tweet ids of 16,875,766 tweets related to the immigration and travel ban executive order announced by the Trump Administration in January 2017. They were collected between January 30, 2017 and April 20, 2017 from the Twitter filter stream API using Social Feed Manager. The terms used for the filter were: #MuslimBan, #NoBanNoWall, #NoMuslimBan, #JFKTerminal4, #RefugeesWelcome, muslim ban, immigrant ban, immigration ban, travel ban, immigration order, #ImmigrationBan, #TravelBan. There is a README.txt file containing additional documentation on how it was collected. The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.

Harvard Dataverse Network

115th U.S. Congress Tweet Ids

Author: Justin Littman
Publication venue: Harvard Dataverse

This dataset contains the tweet ids of 2,041,399 tweets from the Twitter accounts of members of the 115th U.S. Congress. They were collected between January 27, 2017 and January 2, 2019 from the Twitter API using Social Feed Manager; some tweets may predate this period. These tweet ids are broken up into 2 collections, each collected from the GET statuses/user_timeline method of the Twitter REST API on a weekly schedule. The collections are Senators (senators.txt) and Representatives (representatives.txt). Each collection has a README.txt file containing additional documentation on how it was collected and an accounts.csv file listing the Twitter accounts that were collected. The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. We intend to update this dataset periodically. Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.

Harvard Dataverse Network

Charlottesville Tweet Ids

Author: Justin Littman
Publication venue: Harvard Dataverse

This dataset contains the tweet ids of 7,665,497 tweets related to events in Charlottesville, Virginia in August 2017. They were collected from the Twitter API using Social Feed Manager. These tweet ids are broken up into 2 collections: Twitter search (charlottesville-search.txt), a search performed using the query "#charlottesville OR #standwithcharlottesville OR #defendCville OR #HeatherHeyer OR #UnityCville"; and Twitter filter (charlottesville-filter.txt), a filter stream using the filter "#charlottesville, #standwithcharlottesville, #defendCville, #HeatherHeyer, #UnityCville". There is a README.txt file for each collection containing additional documentation on how it was collected. The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.
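Because the search and filter collections track the same hashtags, the same tweet id may appear in both files. A minimal Python sketch, assuming one id per line, that merges the two collections and drops repeats while preserving first-seen order; `merge_unique_ids` is a hypothetical helper, not part of Social Feed Manager or Twarc.

```python
from typing import Iterable, Iterator


def merge_unique_ids(*collections: Iterable[str]) -> Iterator[str]:
    """Yield each tweet id once, in the order it is first seen across collections."""
    seen = set()
    for collection in collections:
        for line in collection:
            tweet_id = line.strip()  # assumes one id per line
            if tweet_id and tweet_id not in seen:
                seen.add(tweet_id)
                yield tweet_id


if __name__ == "__main__":
    # In-memory stand-ins for charlottesville-search.txt and charlottesville-filter.txt.
    search_ids = ["1\n", "2\n"]
    filter_ids = ["2\n", "3\n"]
    print(list(merge_unique_ids(search_ids, filter_ids)))  # ['1', '2', '3']
```

To run it on the real files, pass open file handles for charlottesville-search.txt and charlottesville-filter.txt. The set of seen ids is held in memory, which keeps the sketch simple but grows with the number of unique ids.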

Harvard Dataverse Network

Ireland 8th Tweet Ids

Author: Justin Littman
Publication venue: Harvard Dataverse

This dataset contains the tweet ids of 2,279,396 tweets related to the referendum to repeal the 8th amendment to the Irish constitution on May 25, 2018. They were collected between April 13, 2018 and June 4, 2018 from the Twitter filter stream API using Social Feed Manager. The final set of terms used for the filter was: #8thref, #HomeToVote, #JoinTheRebellion, #trustwomen, #repealthe8th, #Together4Yes, #TogetherForYes, #voteyes, #time4choice, #knowyourrepealers, #mybodymychoice, #savethe8th, #loveboth, #LoveBothVoteNO, #VoteNotoAbortion, #StandUpForLife, #lifecanvass, #ProtectThe8th, #8thamendment, #WhoNeedsYourYes, #Men4Yes, #Register4Yes, #roadtorepeal, #repealfacts, #healthcarenotairfare, #repeal, #trustwomen, #ItsTime, #whyimvotingyes, #deaftogetherforyes, #doctorsforyes, #repeal4betterbirth, #TogetherForNo, #men4no, #whoneedsyourno, #RallyforLife, #VoteNotoAbortion, #bemyyes, #academicsforyes, #hometovoteno, #hometocanvass, #abortionreferendum, #savita, #repealshield, #farmersforyes, #lawyersforchoice, #lawyersforyes, #StudentsForChoice, #archivingthe8th, #RepealedThe8th, #wemadehistory, #NowForNI, #WeTrustWomen. Note that the terms changed during the course of data collection; additional documentation is included in README.txt. The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.

Harvard Dataverse Network

Healthcare Tweet Ids

Author: Justin Littman
Publication venue: Harvard Dataverse

This dataset contains the tweet ids of approximately 132,907,659 tweets related to the announcement of the American Health Care Act (AHCA). They were collected between March 9, 2017 and April 13, 2018 from the Twitter API using Social Feed Manager. These tweet ids are broken up into 2 collections, each collected either from the GET statuses/search method of the Twitter REST API (retrieved on a weekly schedule) or from the POST statuses/filter method of the Twitter Stream API. The collections are Healthcare filter (Twitter filter: healthcare-filter_ids.txt.[00-13]) and Healthcare search (Twitter search: healthcare-search_ids.txt). There is a README.txt file for each collection containing additional documentation on how it was collected, including the keywords used in each collection. The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.
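The filter collection is split across numbered part files. A minimal Python sketch, assuming that healthcare-filter_ids.txt.[00-13] denotes 14 files with two-digit suffixes 00 through 13 and one id per line, that expands the range into file paths and streams the ids in part order; `part_paths` and `iter_part_ids` are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List


def part_paths(directory: Path, base: str = "healthcare-filter_ids.txt",
               first: int = 0, last: int = 13) -> List[Path]:
    """Expand a [first-last] suffix range into zero-padded part file paths."""
    return [directory / f"{base}.{n:02d}" for n in range(first, last + 1)]


def iter_part_ids(paths: List[Path]) -> Iterator[str]:
    """Stream tweet ids from each part file in order, skipping blank lines."""
    for path in paths:
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                tweet_id = line.strip()
                if tweet_id:
                    yield tweet_id


if __name__ == "__main__":
    # Demo with two tiny stand-in parts in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        demo = part_paths(Path(tmp), first=0, last=1)
        demo[0].write_text("1\n2\n", encoding="utf-8")
        demo[1].write_text("3\n", encoding="utf-8")
        print(list(iter_part_ids(demo)))  # ['1', '2', '3']
```

Reading line by line keeps memory use flat, which matters for a collection of this size; the ids can then be fed to a hydration tool in the same order as the parts.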

Harvard Dataverse Network