264,261 research outputs found

    A Brief History of Web Crawlers

    Web crawlers visit internet applications, collect data, and learn about new web pages from the pages they visit. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the growing complexity of web applications have made crawling very challenging. Throughout the history of web crawling, many research and industrial groups have addressed the different issues and challenges that web crawlers face, and different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem. Capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling to the present. We introduce criteria to evaluate the relative performance of web crawlers. Based on these criteria, we plot the evolution of web crawlers and compare their performance.

    Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating and Analyzing Large-Scale e-Commerce Data

    Widespread e-commerce activity on the Internet has led to new opportunities to collect vast amounts of micro-level market and nonmarket data. In this paper we share our experiences in collecting, validating, storing and analyzing large Internet-based data sets in the areas of online auctions, music file sharing and online retailer pricing. We demonstrate how such data can advance knowledge by facilitating sharper and more extensive tests of existing theories and by offering observational underpinnings for the development of new theories. Just as experimental economics pushed the frontiers of economic thought by enabling the testing of numerous theories of economic behavior in the environment of a controlled laboratory, we believe that observing, often over extended periods of time, real-world agents participating in market and nonmarket activity on the Internet can lead us to develop and test a variety of new theories. Internet data gathering is not controlled experimentation: we cannot randomly assign participants to treatments or determine event orderings. It does, however, offer potentially large data sets with repeated observation of individual choices and actions. In addition, automated data collection holds promise for greatly reduced cost per observation. Our methods rely on technological advances in automated data collection agents. Significant challenges remain in developing appropriate sampling techniques, integrating data from heterogeneous sources in a variety of formats, constructing generalizable processes and understanding legal constraints. Despite these challenges, the early evidence from those who have harvested and analyzed large amounts of e-commerce data points toward a significant leap in our ability to understand the functioning of electronic commerce. Comment: Published at http://dx.doi.org/10.1214/088342306000000231 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Effect of personality type on internet anxiety in Kerman Dental School students (2015-2016)

    Introduction and objective: In recent years the internet has become one of the most popular global media because of its unique qualities, such as easy accessibility, convenience of use, user anonymity and low cost. This study examines the effect of personality type on internet anxiety among students of the dental faculty at the Medical University of Kerman. Methodology: This cross-sectional study was conducted on 235 dental students selected through census sampling. The data collection tools consisted of a standard internet anxiety questionnaire (20 items), a personality type questionnaire (25 items), demographic characteristics (age, sex, entrance year) and eight internet-related questions. The collected data were entered into a computer and analyzed with SPSS statistics software version 18, using linear regression and t-tests at a significance level of 5%. Findings: Of the 235 participating students, 141 (66.0%) were female and the rest were male; their mean age was 23.85 ± 5.36. The mean internet anxiety score was 54.01 ± 8.39. By anxiety intensity, 57 students (24.3%) were in the normal range, 176 (74.9%) were in the mild anxiety range and 2 (0.8%) had severe anxiety. There was a significant correlation between the mean internet anxiety score and both year of education and hours spent using the internet (p=0.028, p=0.017). There was also a significant correlation between personality type and internet anxiety (p=0.016). Conclusion: Based on this study, internet anxiety was lower than moderate in dental school students, and type A students, who are characterized as fast and quick, nervous and hot-tempered, anxious, impatient and competitive, with a biased lifestyle, had more anxiety. Keywords: internet, dental student, Kerman, personality type, anxiety

    Unveiling Internet Network Quality: A Wireshark Analysis at SMKN 2 Rejang Lebong

    Quality of Service (QoS) analysis is crucial for ensuring high-quality network performance. This study, conducted at SMKN 2 Rejang Lebong, uses Wireshark version 2.0.4 to analyze QoS parameters such as throughput, delay, jitter, and packet loss. The research employs a quantitative methodology, collecting data through questionnaires and applying descriptive statistics using SPSS version 20. The findings reveal that the QoS index for the Office Building and the Teachers' Room falls within the 'Satisfactory' category, with values of 3.06 and 3.26, respectively. However, the LAB Room scores a lower QoS index of 2.45, categorized as 'Unsatisfactory.' Overall, measured against TIPHON standards, the internet network quality at SMKN 2 Rejang Lebong is classified as 'Less Satisfactory' with an average score of 2.92. Keywords: Quality of Service (QoS), Internet Network Analysis, SMKN 2 Rejang Lebong

    Use tax collections

    The article reports on a study which investigated the level of compliance with U.S. state use tax laws and the techniques employed by the states in order to enforce use tax. Most states utilize either of two forms of tax reporting and collection. These are: the introduction of a separate use tax form/return; or the use of a separate line on the state income tax return. It was observed that utilizing a separate line item on the state income tax return might cause a rise in the number of taxpayers

    European Information System for Organic Markets (EISFOM QLK5-2002-02400): WP 2: “Data collection and processing systems (DCPS) for the conventional markets” and WP 3: “Data collection and processing systems for organic markets” = Deliverable D2

    European markets for organic products are developing fast. In Europe, as in other parts of the world, more and more farm land is being converted to organic production. In order to adjust production and consumption levels, detailed market information is needed, especially where decisions with a long-term impact need to be taken, for example on converting specific land or livestock enterprises requiring high levels of investment in glasshouses, housing, processing facilities etc. Since public subsidies (regional / national / European) are heavily involved in these investments, valid, accurate and up-to-date information is essential not only for farmers and growers, but also for policy-makers, consultants, the processing industry etc. EU research projects such as OFCAP (FAIR3-CT96-1794) and OMIaRD (QLK5-2000-01124) have shown that regional or national data gathering takes place in many countries, but often only very basic data are reported, such as certified organic holdings, land areas and livestock numbers. Important market data, e.g. the amount of production, consumption, international trade or producer and consumer prices, do not exist in most European countries. In some European countries there are only rough estimates of the levels of production and consumption. There is no standardization and data are seldom comparable. Furthermore, detailed information on specific commodities is missing. Hence, investment decisions are taken under conditions of great uncertainty. Likewise, if politicians want to support organic agriculture, they do not know whether it would be better to support production or consumption or to address problems in the marketing channel. The EU concerted action EISfOM (QLK5-2002-02400) (European Information System for Organic Markets) is attempting to take the first steps in solving these problems.
The aim of this concerted action is to build up a framework for reporting valid and reliable data for relevant production and market sectors of the European organic sector in order to meet the needs of policy-makers, farmers, processors, wholesalers and other actors involved in organic markets. In order to reach this aim, this action was split into several workpackages. This report describes the approach and results of workpackages 2 and 3. In this first chapter the objective and general approach of these work packages are described. Chapters 2 and 3 provide an overview of international statistics and data collection systems within the food supply chain at the public and the private level. Chapter 4 describes national statistics and data collection systems within the food supply chain. In Chapter 5, an analysis and appraisal is made of the results with regard to organic data collection and processing systems (DCPSs) and their integration into existing common DCPSs. Chapter 6 draws several general conclusions. Two substantial annexes complete the report, one with the country reports on the situation of data collection and processing in all investigated countries and the other with the first and the second stage questionnaires covering the different data collection levels

    Internet Use in Teacher Preparation Programs: The Relationship between Pedagogy and Practice in the Pennsylvania State System of Higher Education

    The overall purpose of the study was to examine the relationship between Pennsylvania State System of Higher Education (PASSHE) teacher educators' pedagogical beliefs and their use of telecollaborative Internet activities in practice. The goal of this examination was to address the U.S. Department of Education's Office of Educational Research and Improvement (April 2002) call for collecting data about how digital content is being used and to make recommendations for action. The study collected data, via a web-based survey, about pedagogical beliefs and practices of PASSHE teacher educators. The analysis of descriptive statistics, rankings, Spearman rho correlations, and ANOVA calculations revealed a gap between constructivist pedagogical beliefs and actual instructional practice. Using a typology of constructivist telecollaborative activities, the study pinpointed areas of Internet-specific Pedagogical Knowledge and Technological Knowledge to be developed in PASSHE teacher educators. Recommendations were made for PASSHE programs to collaboratively create telecollaborative inquiry and communication activities, provide professional development in the use of telecollaborative activities, and support integration into teacher preparation programs

    Empirical evidence on copyright earnings
