Facilitating Issue Categorization & Analysis in Rulemaking
One task common to all notice-and-comment rulemaking is identifying substantive claims and arguments made in the comments by stakeholders and other members of the public. Extracting and summarizing this material may be helpful to internal decisionmaking; to produce the legally required public explanation of the final rule, it is essential. When comments are lengthy or numerous, natural language processing and machine learning techniques can help the rulewriter work more quickly and comprehensively. Even when a smaller volume of comment material is received, the ability to annotate relevant portions and store information about them in a way that permits retrieval and generation of reports can be useful to the agency, especially over time. We describe a prototype application for these purposes. The Workspace for Issue Categorization and Analysis (WICA) allows the rulewriter to create a list of relevant substantive categories and assign them to marked portions of comment text. She can then retrieve all instances of a given issue within the comment pool. Preliminary results of experiments that apply text categorization and active learning methods to comment sets suggest that these techniques can facilitate the marking and category assignment process in lengthy or numerous comment sets. WICA will incorporate these techniques. Other possible applications of WICA within the rulemaking process are discussed.
A Study in Rule-Specific Issue Categorization for e-Rulemaking
We address the e-rulemaking problem of categorizing public comments according to the issues that they address. In contrast to previous text categorization research in e-rulemaking [5, 6], and in an attempt to more closely duplicate the comment analysis process in federal agencies, we employ a set of rule-specific categories, each of which corresponds to a significant issue raised in the comments. We describe the creation of a corpus to support this text categorization task and report interannotator agreement results for a group of six annotators. We outline those features of the task and of the e-rulemaking context that engender both a non-traditional text categorization corpus and a correspondingly difficult machine learning problem. Finally, we investigate the application of standard and hierarchical text categorization techniques to the e-rulemaking data sets and find that automatic categorization methods show promise as a means of reducing the manual labor required to analyze large comment sets: the automatic annotation methods approach the performance of human annotators for both flat and hierarchical issue categorization.
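The abstract above reports interannotator agreement but does not say which statistic was used. As an illustration only, the sketch below computes Cohen's kappa for one pair of annotators, a common chance-corrected agreement measure. The function name `cohen_kappa` and the issue labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (illustrative only)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical issue labels assigned to four comment sentences.
annotator_1 = ["cost", "cost", "safety", "safety"]
annotator_2 = ["cost", "safety", "safety", "safety"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

With more than two annotators, one option is to average the pairwise values; whether the study did this is not stated in the abstract.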
Prevention of sexual transmission of Ebola in Liberia through a national semen testing and counselling programme for survivors: an analysis of Ebola virus RNA results and behavioural data
Background: Ebola virus has been detected in semen of Ebola virus disease survivors after recovery. Liberia’s Men’s Health Screening Program (MHSP) offers Ebola virus disease survivors semen testing for Ebola virus. We present preliminary results and behavioural outcomes from the first national semen testing programme for Ebola virus.
Methods: The MHSP operates out of three locations in Liberia: Redemption Hospital in Montserrado County, Phebe Hospital in Bong County, and Tellewoyan Hospital in Lofa County. Men aged 15 years and older who had an Ebola treatment unit discharge certificate are eligible for inclusion. Participants’ semen samples were tested for Ebola virus RNA by real-time RT-PCR and participants received counselling on safe sexual practices. Participants graduated after receiving two consecutive negative semen tests. Counsellors collected information on sociodemographics and sexual behaviours using questionnaires administered at enrolment, follow up, and graduation visits. Because the programme is ongoing, data analysis was restricted to data obtained from July 7, 2015, to May 6, 2016.
Findings: As of May 6, 2016, 466 Ebola virus disease survivors had enrolled in the programme; real-time RT-PCR results were available from 429 participants. 38 participants (9%) produced at least one semen specimen that tested positive for Ebola virus RNA. Of these, 24 (63%) provided semen specimens that tested positive 12 months or longer after Ebola virus disease recovery. The longest interval between discharge from an Ebola treatment unit and collection of a positive semen sample was 565 days. Among participants who enrolled and provided specimens more than 90 days since their Ebola treatment unit discharge, men older than 40 years were more likely to have a semen sample test positive than were men aged 40 years or younger (p=0·0004). 84 (74%) of 113 participants who reported not using a condom at enrolment reported using condoms at their first follow-up visit (p<0·0001). 176 (46%) of 385 participants who reported being sexually active at enrolment reported abstinence at their follow-up visit (p<0·0001).
Interpretation: Duration of detection of Ebola virus RNA by real-time RT-PCR varies by individual and might be associated with age. By combining behavioural counselling and laboratory testing, the Men’s Health Screening Program helps male Ebola virus disease survivors understand their individual risk and take appropriate measures to protect their sexual partners.
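The percentages quoted in the findings can be checked directly from the counts given in the abstract. The short sketch below redoes that arithmetic; the dictionary keys and the helper name `percent` are illustrative, and all counts are those stated above.

```python
def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number, as reported in the abstract."""
    return round(100 * numerator / denominator)


# (count, denominator) pairs taken from the findings paragraph.
findings = {
    "positive_among_tested": (38, 429),
    "positive_at_12_months_or_longer": (24, 38),
    "condom_use_at_follow_up": (84, 113),
    "abstinence_at_follow_up": (176, 385),
}

percentages = {name: percent(k, n) for name, (k, n) in findings.items()}

# Enrolled participants without an available RT-PCR result.
without_results = 466 - 429
```

The recomputed values match the reported 9%, 63%, 74%, and 46%, and 37 of the 466 enrolled participants had no RT-PCR result available at the analysis cut-off.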
Automated Classification of Congressional Legislation
For social science researchers, content analysis and classification of United States Congressional legislative activities have been time-consuming and costly. The Library of Congress THOMAS system provides detailed information about bills and laws, but its classification system, the Legislative Indexing Vocabulary (LIV), is geared toward information retrieval rather than the pattern or historical trend recognition that social scientists value. The same event (a bill) may be coded with many subjects at the same time, with little indication of its primary emphasis. In addition, because the LIV system has not been applied to other activities, it cannot be used to compare (for example) legislative issue attention to executive, media, or public issue attention. This paper presents the Congressional Bills Project’s (www.congressionalbills.org) automated classification system. This system applies a topic spotting classification algorithm to the task of coding legislative activities into one of 226 subtopic areas. The algorithm uses a traditional bag-of-words document representation, an extensive set of human coded examples, and an exhaustive topic coding system developed for use by the Congressional Bills Project and the Policy Agendas Project (www.policyagendas.org). Experimental results demonstrate that the automated system is about as effective as human assessors, but with significant time and cost savings. The paper concludes by discussing challenges to moving the system into operational use.
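The abstract names a bag-of-words representation trained on human-coded examples but does not identify the learning algorithm. As a stand-in, the sketch below uses a multinomial naive Bayes classifier with add-one smoothing. The bill titles, topic names, and function names are invented for illustration and are not drawn from the Congressional Bills Project.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Bag of words: lowercase whitespace tokens, word order discarded.
    return text.lower().split()


def train(examples):
    """Collect per-topic word counts from (text, topic) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, topic in examples:
        tokens = tokenize(text)
        word_counts[topic].update(tokens)
        doc_counts[topic] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the topic with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_topic, best_score = None, -math.inf
    for topic in doc_counts:
        score = math.log(doc_counts[topic] / total_docs)  # topic prior
        total_words = sum(word_counts[topic].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[topic][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical human-coded examples with two made-up topics.
coded_bills = [
    ("a bill to amend the clean air act", "environment"),
    ("a bill to regulate water pollution", "environment"),
    ("a bill to fund highway construction", "transportation"),
    ("a bill to improve airport safety", "transportation"),
]
model = train(coded_bills)
predicted = classify(model, "water pollution act")
```

A real system coding into 226 subtopics would need far more training data per topic and a held-out evaluation set; this sketch only shows the shape of the approach.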
Active Learning for e-Rulemaking: Public Comment Categorization
We address the e-rulemaking problem of reducing the manual labor required to analyze public comment sets. In current and previous work, for example, text categorization techniques have been used to speed up the comment analysis phase of e-rulemaking by classifying sentences automatically according to the rule-specific issues [2] or general topics that they address [7, 8]. Manually annotated data, however, is still required to train the supervised inductive learning algorithms that perform the categorization. This paper, therefore, investigates the application of active learning methods for public comment categorization: we develop two new, general-purpose active learning techniques to selectively sample from the available training data for human labeling when building the sentence-level classifiers employed in public comment categorization. Using an e-rulemaking corpus developed for our purposes [2], we compare our methods to the well-known query by committee (QBC) active learning algorithm [5] and to a baseline that randomly selects instances for labeling in each round of active learning. We show that our methods outperform, by a statistically significant margin, both the random selection active learner and the query by committee (QBC) variation, requiring many fewer training examples to reach the same levels of accuracy on a held-out test set. This provides promising evidence that automated text categorization methods might be used effectively to support public comment analysis.
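The query by committee baseline mentioned above can be illustrated with a minimal selection step: each committee member labels every unlabeled sentence, and the sentence on which the members disagree most is sent to a human for labeling. The sketch below measures disagreement with vote entropy, which is one common choice and an assumption here; the abstract does not say which disagreement measure was used. The keyword-rule committee members and the example sentences are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes; 0 means full agreement."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def select_query(committee, unlabeled):
    """Pick the unlabeled instance the committee disagrees on most."""
    return max(
        unlabeled,
        key=lambda sentence: vote_entropy([member(sentence) for member in committee]),
    )


# Toy committee: three keyword rules standing in for trained classifiers.
def member_1(sentence):
    return "cost" if "cost" in sentence else "other"


def member_2(sentence):
    return "cost" if "expensive" in sentence or "cost" in sentence else "other"


def member_3(sentence):
    return "cost" if "burden" in sentence else "other"


pool = [
    "this rule protects wildlife",
    "the cost is too high",
    "compliance is a burden and a cost",
]
query = select_query([member_1, member_2, member_3], pool)
```

In a full active learning loop, the selected sentence would be labeled by a person, added to the training set, and the committee retrained before the next round.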