A Technical Approach and Distributed Model for Validation of Digital Objects
This article describes the current technical approach for digital object validation used by the National Digital Newspaper Program (NDNP), a partnership between the Library of Congress (LC) and the National Endowment for the Humanities for the digitization of historical newspapers. The article also describes the scheme for distributing validation across the participating institutions that will be creating and submitting digital objects to NDNP. The approaches and schemes are now being tested for the first development phase of NDNP, but if successful, they could be generalized to other similar project
Hurricanes Harvey and Irma Tweet ids
This dataset contains the tweet ids of 35,596,281 tweets related to Hurricanes Irma and Harvey. They were collected during these events from the Twitter API using Social Feed Manager.
These tweet ids are broken up into 2 collections. Each collection was collected using the POST statuses/filter method of the Twitter Stream API. The collections are:
Hurricane Irma: irma_filter_tweet_ids.txt
Hurricane Harvey: harvey_filter_tweet_ids.txt
There is a README.txt file for each collection containing additional documentation on how it was collected.
The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets.
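As a rough illustration of the preparation step before hydrating, the sketch below groups tweet ids into batches. It assumes the limit of 100 ids per GET statuses/lookup request documented for the Twitter v1.1 API. The function name batch_ids is hypothetical, and no network calls are made.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_ids(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield lists of at most `size` tweet ids, skipping blank lines.

    `size` defaults to 100, the assumed per-request limit of
    GET statuses/lookup.
    """
    cleaned = (line.strip() for line in ids)
    non_empty = (tweet_id for tweet_id in cleaned if tweet_id)
    while True:
        batch = list(islice(non_empty, size))
        if not batch:
            return
        yield batch
```

An open file such as irma_filter_tweet_ids.txt can be passed directly as `ids`, since iterating over a file yields one line per tweet id. In practice, Twarc and Hydrator handle batching and rate limits themselves, so a helper like this is only needed when calling the API by hand.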
Per Twitter's Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not.
Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.
Winter Olympics 2018 Tweet Ids
This dataset contains the tweet ids of 13,816,206 tweets related to the 2018 Winter Olympics held in Pyeongchang, South Korea. They were collected between January 31, 2018 and February 27, 2018 from the Twitter filter stream API (POST statuses/filter) using Social Feed Manager. The filter tracked "#olympics, #pyeongchang2018, #winterolympics, #평창동계올림픽".
There is a README.txt file containing additional documentation on how it was collected.
The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets.
Per Twitter's Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not.
Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.
Immigration and Travel Ban Tweet Ids
This dataset contains the tweet ids of 16,875,766 tweets related to the immigration and travel ban executive order announced by the Trump Administration in January 2017. They were collected between January 30, 2017 and April 20, 2017 from the Twitter filter stream API using Social Feed Manager. The terms used for the filter were: #MuslimBan, #NoBanNoWall, #NoMuslimBan, #JFKTerminal4, #RefugeesWelcome, muslim ban, immigrant ban, immigration ban, travel ban, immigration order, #ImmigrationBan, #TravelBan.
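To make the filter terms more concrete, the sketch below approximates how a comma-separated track list is interpreted by the v1.1 filter stream: commas act as OR between phrases, and a multi-word phrase matches when all of its words appear in the tweet, in any order and ignoring case. The tokenization here is a deliberate simplification of Twitter's own matching, the names matches_track and TOKEN are hypothetical, and only a few of the terms listed above are used.

```python
import re
from typing import Iterable

# Simplified tokenizer: words, optionally prefixed with '#'.
# Twitter's real matching rules are more involved; this is an assumption.
TOKEN = re.compile(r"#?\w+")


def matches_track(text: str, phrases: Iterable[str]) -> bool:
    """Return True if any phrase has all of its terms present in `text`."""
    tokens = set(TOKEN.findall(text.lower()))
    for phrase in phrases:
        terms = set(TOKEN.findall(phrase.lower()))
        if terms and terms <= tokens:
            return True
    return False


# A few of the terms from the dataset description.
terms = ["#MuslimBan", "travel ban", "immigration order"]

# The track parameter is the phrases joined with commas.
track = ",".join(terms)
```

Under this approximation, a tweet containing both "travel" and "ban" anywhere in its text matches the phrase "travel ban", while a tweet containing only "travel" does not.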
There is a README.txt file containing additional documentation on how it was collected.
The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets.
Per Twitter's Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not.
Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.
115th U.S. Congress Tweet Ids
This dataset contains the tweet ids of 2,041,399 tweets from the Twitter accounts of members of the 115th U.S. Congress. They were collected between January 27, 2017 and January 2, 2019 from the Twitter API using Social Feed Manager. Some tweets may come before this time period.
These tweet ids are broken up into 2 collections. Each collection was collected from the GET statuses/user_timeline method of the Twitter REST API (retrieved on a weekly schedule). The collections are:
Senators: senators.txt
Representatives: representatives.txt
There is a README.txt file for each collection containing additional documentation on how it was collected. There is also an accounts.csv file for each collection listing the Twitter accounts that were collected.
The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets.
Per Twitter's Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not.
We intend to update this dataset periodically.
Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.
Charlottesville Tweet Ids
This dataset contains the tweet ids of 7,665,497 tweets related to events in Charlottesville, Virginia in August 2017. They were collected from the Twitter API using Social Feed Manager.
These tweet ids are broken up into 2 collections. The collections are:
Twitter search (charlottesville-search.txt): Search performed using the query "#charlottesville OR #standwithcharlottesville OR #defendCville OR #HeatherHeyer OR #UnityCville"
Twitter filter (charlottesville-filter.txt): Filter stream using the filter "#charlottesville, #standwithcharlottesville, #defendCville, #HeatherHeyer, #UnityCville"
There is a README.txt file for each collection containing additional documentation on how it was collected.
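Because the search and filter collections use the same hashtags, the same tweet id may plausibly appear in both files; the dataset description does not say whether it does. The sketch below shows one way to combine several id files into a single de-duplicated set. The function name merge_tweet_ids is hypothetical.

```python
from typing import Iterable, Set


def merge_tweet_ids(*collections: Iterable[str]) -> Set[str]:
    """Combine several iterables of id lines into one set of unique ids.

    Blank lines are skipped and surrounding whitespace is stripped.
    """
    merged: Set[str] = set()
    for lines in collections:
        for line in lines:
            tweet_id = line.strip()
            if tweet_id:
                merged.add(tweet_id)
    return merged
```

Open file handles for charlottesville-search.txt and charlottesville-filter.txt can be passed as the two arguments. Holding several million ids in a set fits in memory on most machines, but much larger collections may call for sorting and merging on disk instead.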
The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets.
Per Twitter's Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not.
Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.
Ireland 8th Tweet Ids
This dataset contains the tweet ids of 2,279,396 tweets related to the referendum to repeal the 8th amendment to the Irish constitution on May 25, 2018. They were collected between April 13, 2018 and June 4, 2018 from the Twitter filter stream API using Social Feed Manager. The final set of terms that were used for the filter are: #8thref, #HomeToVote, #JoinTheRebellion, #trustwomen, #repealthe8th, #Together4Yes, #TogetherForYes, #voteyes, #time4choice, #knowyourrepealers, #mybodymychoice, #savethe8th, #loveboth, #LoveBothVoteNO, #VoteNotoAbortion, #StandUpForLife, #lifecanvass, #ProtectThe8th, #8thamendment, #WhoNeedsYourYes, #Men4Yes, #Register4Yes, #roadtorepeal, #repealfacts, #healthcarenotairfare, #repeal, #trustwomen, #ItsTime, #whyimvotingyes, #deaftogetherforyes, #doctorsforyes, #repeal4betterbirth, #TogetherForNo, #men4no, #whoneedsyourno, #RallyforLife, #VoteNotoAbortion, #bemyyes, #academicsforyes, #hometovoteno, #hometocanvass, #abortionreferendum, #savita, #repealshield, #farmersforyes, #lawyersforchoice, #lawyersforyes, #StudentsForChoice, #archivingthe8th, #RepealedThe8th, #wemadehistory, #NowForNI, #WeTrustWomen.
Note that the terms changed during the course of data collection. There is additional documentation included in README.txt.
The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets.
Per Twitter's Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not.
Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.
Healthcare Tweet Ids
This dataset contains the tweet ids of approximately 132,907,659 tweets related to the announcement of the American Health Care Act (AHCA). They were collected between March 9, 2017 and April 13, 2018 from the Twitter API using Social Feed Manager.
These tweet ids are broken up into 2 collections. Each collection was collected either from the GET statuses/search method of the Twitter REST API (retrieved on a weekly schedule) or the POST statuses/filter method of the Twitter Stream API. The collections are:
Healthcare filter (Twitter filter): healthcare-filter_ids.txt.[00-13]
Healthcare search (Twitter search): healthcare-search_ids.txt
There is a README.txt file for each collection containing additional documentation on how it was collected, including the keywords used in each collection.
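The notation healthcare-filter_ids.txt.[00-13] suggests that the filter collection is split across numbered parts. Assuming the suffixes are two-digit numbers running from 00 to 13 inclusive (an interpretation of the notation, not something the description states explicitly), the part names can be generated as in the sketch below. The function name split_file_names is hypothetical.

```python
from typing import List


def split_file_names(base: str, first: int, last: int, width: int = 2) -> List[str]:
    """Return `base` followed by zero-padded numeric suffixes first..last."""
    return [f"{base}.{n:0{width}d}" for n in range(first, last + 1)]


# Assumed reading of "healthcare-filter_ids.txt.[00-13]": 14 parts, 00 to 13.
names = split_file_names("healthcare-filter_ids.txt", 0, 13)
```

Iterating over `names` in order and reading each file line by line would then cover the whole filter collection without loading it all at once.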
The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets.
Per Twitter's Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not.
Questions about this dataset can be sent to [email protected]. George Washington University researchers should contact us for access to the tweets.