20,067 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, and this offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains. Comment: Knowledge-based System

    How Unique and Traceable are Usernames?

    Suppose you find the same username on different online services: what is the probability that these usernames refer to the same physical person? This work addresses what appears to be a fairly simple question, which has many implications for anonymity and privacy on the Internet. One possible way of estimating this probability would be to look at the public information associated with the two accounts and try to match them. However, for most services, this information is chosen by the users themselves and is often very heterogeneous, possibly false and difficult to collect. Furthermore, several websites do not disclose any additional public information about users apart from their usernames (e.g., discussion forums or blog comments); nonetheless, they might contain sensitive information about users. This paper explores the possibility of linking user profiles only by looking at their usernames. The intuition is that the probability that two usernames refer to the same physical person strongly depends on the "entropy" of the username string itself. Our experiments, based on crawls of real web services, show that a significant portion of the users' profiles can be linked using their usernames. To the best of our knowledge, this is the first time that usernames are considered as a source of information when profiling users on the Internet.
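The intuition about username "entropy" can be illustrated with a small sketch. The abstract does not say how the authors estimate it, so the unigram character model, the add-one smoothing, the `alphabet_size` value, and the toy `sample` list below are all assumptions made purely for illustration. The sketch scores a username by its information content in bits: strings built from rare characters score higher and are therefore less likely to be shared by two different people.

```python
import math
from collections import Counter


def train_char_model(usernames, alphabet_size=64):
    """Return a function giving a smoothed probability for each character.

    Assumption: characters are independent (unigram model) and the
    alphabet holds `alphabet_size` symbols; neither comes from the paper.
    """
    counts = Counter(ch for name in usernames for ch in name)
    total = sum(counts.values())

    def prob(ch):
        # Add-one smoothing so unseen characters keep a nonzero probability.
        return (counts[ch] + 1) / (total + alphabet_size)

    return prob


def surprisal_bits(username, prob):
    """Information content of a username: minus the sum of log2 P(character)."""
    return -sum(math.log2(prob(ch)) for ch in username)


# Hypothetical toy sample standing in for a crawl of real usernames.
sample = ["john", "anna", "johnny", "ann", "joan"]
prob = train_char_model(sample)

common = surprisal_bits("john", prob)
rare = surprisal_bits("zq7_xk", prob)
print(f"john: {common:.1f} bits, zq7_xk: {rare:.1f} bits")
```

Under this toy model the unusual string carries more bits than the common one, which matches the stated intuition that high-entropy usernames are more identifying. Note that the score also grows with length, so a real estimator would need a richer language model and far more data.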

    Three essays on problem-solving in collaborative open productions

    The term “open production” is frequently used to describe production systems that rely on volunteer participants who are willing to participate, produce, and bear private costs in order to provide a public good. Examples of open production are becoming increasingly common in many industries. What makes these productions possible? How may they be sustained in a world of organizations in which the evolutionary products of economic selection are elaborate hierarchical forms of organization? One way to address these questions is to look at how open productions solve problems that are common to all production organizations such as, for example, problems in the division of labor, allocation of tasks, collaboration, coordination, and maintaining balance between inducement and contributions. Under the conditions of extreme decentralization that are the defining feature of open productions, this approach implies a detailed observation of individual problem-solving practices. This is the approach I develop in my dissertation. Unlike much of the prior literature on open productions, I deemphasize motivational elements, status-seeking motives, and allocation of property rights issues. I focus instead on actual work practices as revealed by the day-by-day problem-solving activities that qualify open production projects as production organizations despite the absence of formal contractual arrangements to regulate principal-agent relations. What my work adds to the extensive, informative, and well-developed discipline-based explanations that are currently available is a focus on the emergence of micro-organizational mechanisms through which problem assignment (Chapter 2), problem resolution (Chapter 3), and sustained participation (Chapter 4) are obtained in open productions.
    In my essays, I draw from organizational sociology and the behavioral theory of the firm to specify models that relate individual problem-solving activities to structured patterns of action through emergent work practices. In the models that I specify and test, I emphasize processes of attention allocation (Chapter 2), repeated collaboration and group diversity (Chapter 3) and identity construction (Chapter 4) as central to our understanding of the dynamics of problem-solving in organizations. One element of novelty in my study is that my research design makes these work practices directly observable at a level of detail, completeness, and precision that was inaccessible in the past. To illustrate the empirical value of the view that I develop, I examine problem-solving activities – i.e., bug fixing and code production – within two Free/Open Source Software (F/OSS) projects during their entire life span. Readers of my work will know more about how organizational micro-mechanisms emerge in open productions.

    Network of excellence in internet science: D13.2.1 Internet science – going forward: internet science roadmap (preliminary version)

    No full text

    The Future of the Curriculum: School knowledge in the digital age

    Digital media and learning has become a critical area for educational research in the twenty-first century. Yet little research has been carried out on the practical and conceptual implications for the school curriculum in the digital age. This report asks a very simple question: what might be the future of the curriculum in the digital age? It examines a series of twenty-first century curriculum innovations in order to show how various ideas about the future curriculum are now being styled into school practice, and it seeks to understand the emerging issues raised by meshing the curriculum and digital media together. It explores a range of contemporary social, political, economic, and cultural issues facing the future of the curriculum and examines the production of ideas about the practical organization and planning of a future curriculum. What kinds of visions for the curriculum of the future are being imagined, invented, and promoted? How is the curriculum of the future being made thinkable, intelligible and practicable as a problem requiring reformatory intervention?

    Deep Learning Data and Indexes in a Database

    A database is used to store and retrieve data, which makes it a critical component of any software application. Databases require configuration for efficiency; however, there are tens of configuration parameters, and configuring a database manually is a challenging task. Furthermore, a database must be reconfigured on a regular basis to keep up with newer data and workload. The goal of this thesis is to use the query workload history to autonomously configure the database and improve its performance. We carry out the proposed work in four stages: (i) we develop an index recommender using deep reinforcement learning for a standalone database, and evaluate the effectiveness of our algorithm by comparing it with several state-of-the-art approaches; (ii) we build a real-time index recommender that can dynamically create and remove indexes for better performance in response to sudden changes in the query workload; (iii) we develop a database advisor framework that can learn latent patterns from a workload and is able to enhance a query, recommend interesting queries, and summarize a workload; (iv) we develop LinkSocial, a fast, scalable, and accurate framework to gain deeper insights from heterogeneous data.

    The social, cosmopolitanism and beyond

    First, this article will outline the metaphysics of ‘the social’ that implicitly and explicitly connects the work of classical and contemporary cosmopolitan sociologists as different as Durkheim, Weber, Beck and Luhmann. In a second step, I will show that the cosmopolitan outlook of classical sociology is driven by exclusive differences. In understanding human affairs, both classical sociology and contemporary cosmopolitan sociology reflect a very modernist outlook of epistemological, conceptual, methodological and disciplinary rigour that separates the cultural sphere from the natural objects of concern. I will suggest that classical sociology – in order to be cosmopolitan – is forced (1) to exclude non-social and non-human objects as part of its conceptual and methodological rigour, and (2) consequently and methodologically to rule out the non-social and the non-human. Cosmopolitan sociology imagines ‘the social’ as a global, universal explanatory device to conceive and describe the non-social and non-human. In a third and final step the article draws upon the work of the French sociologist Gabriel Tarde and offers a possible alternative to the modernist social and cultural other-logics of social sciences. It argues for an inclusive conception of ‘the social’ that gives the non-social and non-human a cosmopolitan voice as well.