426 research outputs found

    Next steps in near-duplicate detection for eRulemaking

    Performance and Comparative Analysis of the Two Contrary Approaches for Detecting Near Duplicate Web Documents in Web Crawling

    Recent years have witnessed the rapid growth of the World Wide Web (WWW). Information is accessible at one's fingertips anytime, anywhere through the massive web repository. The performance and reliability of search engines thus face serious problems due to the enormous amount of web data. The sheer volume of web documents has led to search results that are less relevant to the user. In addition, the presence of duplicate and near-duplicate web documents creates additional overhead for search engines, critically affecting their performance. The demand for integrating data from heterogeneous sources leads to the problem of near-duplicate web pages. The detection of near-duplicate documents within a collection has recently become an area of great interest. In this research, we present an efficient approach for detecting near-duplicate web pages in web crawling that uses keywords and a distance measure. G.S. Manku et al.'s fingerprint-based approach, proposed in 2007, is considered one of the "state-of-the-art" algorithms for finding near-duplicate web pages. We implemented both approaches and conducted an extensive comparative study between our similarity-score-based approach and G.S. Manku et al.'s fingerprint-based approach. We analyzed the results in terms of time complexity, space complexity, memory usage, and the confusion matrix parameters. Taking these performance factors into account, the comparison shows our approach to be the better (less complex) of the two on the factors considered. DOI: http://dx.doi.org/10.11591/ijece.v2i6.1746
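For readers unfamiliar with fingerprint-based near-duplicate detection, the following is a minimal sketch of a simhash-style fingerprint compared by Hamming distance. It is not the implementation evaluated in the abstract above. The tokenizer, the use of MD5 as the per-token hash, the term-frequency weights, and the function names are all assumptions made for illustration. The 64-bit width and threshold `k=3` are defaults chosen here, not values taken from the abstract.

```python
import hashlib
import re
from collections import Counter

FINGERPRINT_BITS = 64  # assumed fingerprint width for this sketch


def simhash(text: str) -> int:
    """Return a 64-bit simhash-style fingerprint of the text (illustrative)."""
    # Assumption: lowercase word tokens weighted by term frequency.
    weights = Counter(re.findall(r"\w+", text.lower()))
    vector = [0] * FINGERPRINT_BITS
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # first 64 bits
        for i in range(FINGERPRINT_BITS):
            # Each token votes +weight or -weight on every bit position.
            if (token_hash >> i) & 1:
                vector[i] += weight
            else:
                vector[i] -= weight
    fingerprint = 0
    for i in range(FINGERPRINT_BITS):
        if vector[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(doc_a: str, doc_b: str, k: int = 3) -> bool:
    """Treat two documents as near-duplicates if their fingerprints differ in at most k bits."""
    return hamming_distance(simhash(doc_a), simhash(doc_b)) <= k
```

Two documents with the same bag of words produce identical fingerprints in this sketch, so `is_near_duplicate("Hello World", "hello, world!")` holds. How well a given threshold separates true near-duplicates from unrelated pages depends on the corpus and would need to be measured.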

    Achieving the Potential: The Future of Federal e-Rulemaking: A Report to Congress and the President

    Federal regulations are among the most important and widely used tools for implementing the laws of the land – affecting the food we eat, the air we breathe, the safety of consumer products, the quality of the workplace, the soundness of our financial institutions, the smooth operation of our businesses, and much more. Despite the central role of rulemaking in executing public policy, both regulated entities (especially small businesses) and the general public find it extremely difficult to follow the regulatory process; actively participating in it is even harder. E-rulemaking is the use of technology (particularly, computers and the World Wide Web) to: (i) help develop proposed rules; (ii) make rulemaking materials broadly available online, along with tools for searching, analyzing, explaining and managing the information they contain; and (iii) enable more effective and diverse public participation. E-rulemaking has transformative potential to increase the comprehensibility, transparency and accountability of the regulatory process. Specifically, e-rulemaking – effectively implemented – can open the rulemaking process to a broader range of participants, offer easier access to rulemaking and implementation materials, facilitate dialogue among interested parties about policy and enforcement, enhance regulatory coordination, and help produce better decisions that lead to more effective, accepted and enforceable rules. If realized, this vision would greatly strengthen civic participation and our democratic form of government.

    Facilitating Issue Categorization & Analysis in Rulemaking

    One task common to all notice-and-comment rulemaking is identifying substantive claims and arguments made in the comments by stakeholders and other members of the public. Extracting and summarizing this material may be helpful to internal decisionmaking; to produce the legally required public explanation of the final rule, it is essential. When comments are lengthy or numerous, natural language processing and machine learning techniques can help the rulewriter work more quickly and comprehensively. Even when a smaller volume of comment material is received, the ability to annotate relevant portions and store information about them in a way that permits retrieval and generation of reports can be useful to the agency, especially over time. We describe a prototype application for these purposes. The Workspace for Issue Categorization and Analysis (WICA) allows the rulewriter to create a list of relevant substantive categories and assign them to marked portions of comment text. She can then retrieve all instances of a given issue within the comment pool. Preliminary results of experiments that apply text categorization and active learning methods to comment sets suggest that these techniques can facilitate the marking and category assignment process in lengthy or numerous comment sets. WICA will incorporate these techniques. Other possible applications of WICA within the rulemaking process are discussed.
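The workflow the abstract describes (define categories, assign them to marked spans of comment text, then retrieve every instance of an issue) can be pictured with a small data structure. The sketch below is purely hypothetical and does not reflect how WICA is actually built. The class names, fields, and methods are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A marked portion of one comment, tagged with an issue category (hypothetical)."""
    comment_id: str
    start: int  # character offset where the marked span begins
    end: int    # character offset just past the end of the span
    category: str


class IssueWorkspace:
    """Hypothetical store of category assignments over a pool of comments."""

    def __init__(self, categories):
        self.categories = set(categories)
        self._by_category = defaultdict(list)

    def annotate(self, comment_id, start, end, category):
        """Assign a known category to the span [start, end) of a comment."""
        if category not in self.categories:
            raise ValueError(f"unknown category: {category!r}")
        if not 0 <= start < end:
            raise ValueError("span must satisfy 0 <= start < end")
        annotation = Annotation(comment_id, start, end, category)
        self._by_category[category].append(annotation)
        return annotation

    def instances(self, category):
        """Return all annotations for a category across the comment pool."""
        return list(self._by_category.get(category, []))
```

For example, after `ws = IssueWorkspace(["cost", "safety"])` and `ws.annotate("c-1", 0, 40, "cost")`, calling `ws.instances("cost")` returns that single annotation. A classifier of the kind the abstract mentions could propose the category argument, leaving the rulewriter to confirm or correct it.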

    Automated classification of congressional legislation

    Rulemaking 2.0

    In response to President Obama's Memorandum on Transparency and Open Government, federal agencies are on the verge of a new generation in online rulemaking. However, unless we recognize the several barriers to making rulemaking a more broadly participatory process, and purposefully adapt Web 2.0 technologies and methods to lower those barriers, Rulemaking 2.0 is likely to disappoint agencies and open-government advocates alike. This article describes the design, operation, and initial results of Regulation Room, a pilot public rulemaking participation platform created by a cross-disciplinary group of Cornell researchers in collaboration with the Department of Transportation. Regulation Room uses selected live rulemakings to experiment with human and computer support for public comment. The ultimate project goal is to provide guidance on design, technological, and human intervention strategies, grounded in theory and tested in practice, for effective Rulemaking 2.0 systems. Early results give some cause for optimism about the open-government potential of Web 2.0-supported rulemaking. But significant challenges remain. Broader, better public participation is hampered by 1) ignorance of the rulemaking process; 2) unawareness that rulemakings of interest are going on; and 3) information overload from the length and complexity of rulemaking materials. No existing, commonly used Web services or applications are good analogies for what a Rulemaking 2.0 system must do to lower these barriers. To be effective, the system must not only provide the right mix of technology, content, and human assistance to support users in the unfamiliar environment of complex government policymaking; it must also spur them to revise their expectations about how they engage information on the Web and also, perhaps, about what is required for civic participation.