
    A New Estimator of Intrinsic Dimension Based on the Multipoint Morisita Index

    The size of datasets has been increasing rapidly, both in the number of variables and in the number of events. As a result, the empty space phenomenon and the curse of dimensionality complicate the extraction of useful information. In general, however, data lie on non-linear manifolds of much lower dimension than the spaces in which they are embedded. In many pattern recognition tasks, learning these manifolds is a key issue, and it requires knowledge of their true intrinsic dimension. This paper introduces a new estimator of intrinsic dimension based on the multipoint Morisita index. It is applied to both synthetic and real datasets of varying complexity, and comparisons with other existing estimators are carried out. The proposed estimator turns out to be fairly robust to sample size and noise, unaffected by edge effects, able to handle large datasets, and computationally efficient.
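    The abstract does not reproduce the estimator's formulas, so the following is only a minimal sketch assembled from the standard definition of the multipoint Morisita index, not the paper's reference implementation. Data are rescaled to the unit hypercube, the m-point index I(m, delta) is computed on grids of shrinking cell size delta, and the intrinsic dimension is read off the log-log slope via the assumed relation M_m = E + slope/(m - 1), where E is the embedding dimension; the function names, grid sizes, and sanity check are illustrative choices.

        import numpy as np

        def morisita_index(points, cells_per_axis, m=2):
            """m-point Morisita index I(m, delta) for points scaled to [0, 1]^E,
            with cell size delta = 1 / cells_per_axis."""
            N, E = points.shape
            # Bin each point into a grid cell and count occupancies n_i.
            idx = np.minimum((points * cells_per_axis).astype(int), cells_per_axis - 1)
            _, counts = np.unique(idx, axis=0, return_counts=True)
            Q = cells_per_axis ** E  # total number of cells
            # Falling factorials n_i (n_i - 1) ... (n_i - m + 1); zero whenever n_i < m.
            num = sum(float(np.prod([c - k for k in range(m)])) for c in counts)
            den = float(np.prod([N - k for k in range(m)]))
            return Q ** (m - 1) * num / den

        def morisita_id(points, grid_sizes=(2, 4, 8, 16, 32), m=2):
            """Slope-based intrinsic dimension estimate (assumed form of the estimator)."""
            lo, hi = points.min(axis=0), points.max(axis=0)
            points = (points - lo) / (hi - lo)  # rescale to the unit hypercube
            E = points.shape[1]
            deltas = 1.0 / np.asarray(grid_sizes)
            I = [morisita_index(points, g, m) for g in grid_sizes]
            slope, _ = np.polyfit(np.log(deltas), np.log(I), 1)  # I ~ delta^slope
            return E + slope / (m - 1)

        # Sanity check: a one-dimensional curve embedded in 3-D should give ~1.
        t = np.random.default_rng(0).uniform(0, 1, 5000)
        print(morisita_id(np.column_stack([t, np.sin(4 * t), np.cos(4 * t)])))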

    Addressing Digital Divide through Digital Literacy Training Programs: A Systematic Literature Review

    Digital literacy training programs (DLTPs) are influential in developing digital skills that help build a more inclusive and participatory ecosystem. This study presents a review of 86 studies of DLTPs for marginalised populations in developed and developing countries. It aims to understand (a) the profile of DLTPs, (b) the digital competences incorporated in the training curriculum, and (c) the tangible outcomes of Internet use post-training. The review indicated that developed countries focus more on developing digital literacy among elderly populations, whereas in developing countries the focus still lies on people with low skills and education levels. The training curricula focus mainly on developing information-seeking and communication competences, besides the basic operation of digital devices. Most of the studies reported gains in personal-level outcomes related to health, leisure, and self-actualisation after training. This study can help policymakers, practitioners, and educational researchers improve the scope and quality of educational programs and contribute to people's digital empowerment and well-being.

    A Brief History of Web Crawlers

    Web crawlers visit internet applications, collect data, and learn about new web pages from the pages they visit. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the complexity added to web applications have made crawling very challenging. Throughout the history of web crawling, many researchers and industrial groups have addressed the issues and challenges that web crawlers face, and different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem, and automatically capturing the model of a modern web application and extracting data from it is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling up to the present. We introduce criteria to evaluate the relative performance of web crawlers and, based on these criteria, plot the evolution of web crawlers and compare their performance.
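    None of the surveyed techniques is spelled out in the abstract, but the visit, collect, and discover loop it describes is easy to sketch. Below is a minimal breadth-first crawler using only the Python standard library; it is an illustration, not any of the surveyed systems, and real crawlers add robots.txt compliance, politeness delays, content deduplication, and JavaScript execution for modern web applications.

        from collections import deque
        from html.parser import HTMLParser
        from urllib.parse import urljoin, urlparse
        from urllib.request import urlopen

        class LinkExtractor(HTMLParser):
            """Collect the href targets of all anchor tags on a page."""
            def __init__(self):
                super().__init__()
                self.links = []

            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    self.links += [v for k, v in attrs if k == "href" and v]

        def crawl(seed, max_pages=50):
            """Breadth-first crawl: visit a page, extract links, enqueue unseen ones."""
            frontier, seen, pages = deque([seed]), {seed}, {}
            while frontier and len(pages) < max_pages:
                url = frontier.popleft()
                try:
                    html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
                except Exception:
                    continue  # unreachable, non-HTML, or malformed page
                pages[url] = html
                extractor = LinkExtractor()
                extractor.feed(html)
                for href in extractor.links:
                    absolute = urljoin(url, href)  # resolve relative links
                    if urlparse(absolute).scheme in ("http", "https") and absolute not in seen:
                        seen.add(absolute)
                        frontier.append(absolute)
            return pages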

    Knowledge extraction from minutes of Portuguese municipalities meetings

    A very relevant problem in e-government is that a great amount of knowledge is held in unstructured natural language documents. If that knowledge were stored in a computer-processable representation, it would be more easily accessed. In this paper we present the architecture, modules, and initial results of a prototype under development for extracting information from government documents. The prototype stores the information using a formal representation of a set of concepts and the relationships between those concepts: an ontology. The system was tested on minutes of Portuguese Municipal Board meetings. Initial results are presented for an important and frequent topic of the minutes: the subsidies granted by municipalities.
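    The prototype's modules and extraction rules are not detailed in the abstract (and the source minutes are in Portuguese), so the fragment below is a purely hypothetical illustration of the stated idea: pull a subsidy grant out of a sentence and store it as subject-predicate-object triples, the relational core of an ontology. The pattern, predicate names, and example sentence are invented for the sketch.

        import re

        # A toy ontology store: subject-predicate-object triples.
        triples = set()

        # Hypothetical extraction rule for an English stand-in sentence.
        PATTERN = re.compile(
            r"grants a subsidy of (?P<amount>[\d.,]+) euros to (?P<beneficiary>.+)$")

        def extract(municipality, sentence):
            match = PATTERN.search(sentence)
            if match:
                grant = f"grant/{len(triples)}"  # fresh identifier for this event
                triples.add((grant, "rdf:type", "Subsidy"))
                triples.add((grant, "grantedBy", municipality))
                triples.add((grant, "beneficiary", match.group("beneficiary")))
                triples.add((grant, "amountEUR", match.group("amount")))

        extract("SomeMunicipality",
                "The Board grants a subsidy of 5,000 euros to the local sports club")
        print(triples)  # four triples describing the extracted grant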

    Privacy Tradeoffs in Predictive Analytics

    Online services routinely mine user data to predict user preferences, make recommendations, and place targeted ads. Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data. Can a privacy-conscious user benefit from personalization while simultaneously protecting her private attributes? We study this question in the context of a rating prediction service based on matrix factorization. We construct a protocol of interactions between the service and users that has remarkable optimality properties: it is privacy-preserving, in that no inference algorithm can infer a user's private attribute with probability better than random guessing; it has maximal accuracy, in that no other privacy-preserving protocol improves rating prediction; and, finally, it involves minimal disclosure, as the prediction accuracy strictly decreases when the service reveals less information. We extensively evaluate our protocol on several rating datasets, demonstrating that it successfully blocks the inference of gender, age, and political affiliation while incurring less than a 5% decrease in rating prediction accuracy. (Extended version of a paper appearing in SIGMETRICS.)
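    The protocol itself is the paper's contribution and is not reproduced here, but the underlying service, rating prediction by matrix factorization, is standard and can be sketched. The following gradient-descent factorization (latent dimension, learning rate, and regularization are illustrative) is the baseline on top of which such a protocol controls what users disclose.

        import numpy as np

        def factorize(R, mask, k=10, steps=500, lr=0.01, reg=0.1, seed=0):
            """Factor R ~ U @ V.T by gradient descent on the observed entries.
            R: (users x items) rating matrix; mask: 1 where a rating is observed."""
            rng = np.random.default_rng(seed)
            U = 0.1 * rng.standard_normal((R.shape[0], k))
            V = 0.1 * rng.standard_normal((R.shape[1], k))
            for _ in range(steps):
                E = mask * (R - U @ V.T)       # error on observed ratings only
                U += lr * (E @ V - reg * U)    # step along the negative gradient
                V += lr * (E.T @ U - reg * V)
            return U, V

        # Predicted ratings for every (user, item) pair, observed or not:
        # U, V = factorize(R, mask); predictions = U @ V.T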

    A Comparison Study of Second-Order Screening Designs and Their Extension

    Recent literature has proposed employing a single experimental design capable of performing both factor screening and response surface estimation when conducting sequential experiments is unrealistic due to time, budget, or other constraints. Military systems, particularly aerodynamic systems, are complex, and it is not unusual for them to exhibit nonlinear response behavior. Developmental testing may be tasked to characterize the nonlinear behavior of such systems while being restricted in how much testing can be accomplished. Second-order screening designs provide a means, within a single designed experiment, to focus test resources effectively on the factors driving system performance. Sponsored by the Office of the Secretary of Defense (OSD) in support of the Science of Test initiative, this research characterizes and adds to the area of second-order screening designs, particularly as applied to defense testing. Existing design methods are empirically tested and examined for robustness, and the leading design method, one that is very run-efficient, is extended to overcome its limitations when screening for non-linear effects. A case study and screening design guidance for defense testers are also provided.
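    For reference, "second-order" means the design must support a full quadratic response-surface model: intercept, main effects, two-factor interactions, and pure quadratic terms. The sketch below is standard response-surface machinery, not the paper's extended design method; it builds the model matrix for k factors and notes the rank condition a screening design must satisfy.

        import numpy as np
        from itertools import combinations

        def second_order_model_matrix(X):
            """Columns: intercept | main effects | two-factor interactions | squares."""
            n, k = X.shape
            cols = [np.ones(n)]
            cols += [X[:, i] for i in range(k)]                                # x_i
            cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]  # x_i x_j
            cols += [X[:, i] ** 2 for i in range(k)]                           # x_i^2
            return np.column_stack(cols)

        # Least-squares fit of the quadratic surface to responses y from design X:
        # beta, *_ = np.linalg.lstsq(second_order_model_matrix(X), y, rcond=None)
        # Estimating all terms requires a full-column-rank matrix, i.e. at least
        # 1 + 2k + k(k-1)/2 suitably placed runs, which is why run-efficient
        # second-order screening designs are attractive under tight test budgets.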

    Grid Data Management: Open Problems and New Issues

    Initially developed for the scientific community, Grid computing is now gaining much interest in important areas such as enterprise information systems. This makes data management critical, since the techniques involved must scale up while addressing the autonomy, dynamicity, and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques. Then we specify the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. This implies revisiting distributed database techniques in major ways, in particular by using P2P techniques.
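    The abstract stays at the level of requirements, so as one concrete (and assumed; the paper does not prescribe it) example of the P2P techniques it points to, here is a minimal consistent-hashing ring, the data placement scheme behind many P2P and DHT systems. It accommodates the dynamicity of sources because a node joining or leaving remaps only a small fraction of the keys.

        import bisect
        import hashlib

        class HashRing:
            """Nodes and keys hash onto a circle; a key is stored at the first
            node at or after its position, wrapping around at the end."""
            def __init__(self, nodes=(), replicas=3):
                self.replicas = replicas  # virtual nodes smooth the load
                self.ring = []            # sorted list of (position, node)
                for node in nodes:
                    self.add(node)

            def _pos(self, label):
                return int(hashlib.md5(label.encode()).hexdigest(), 16)

            def add(self, node):
                for r in range(self.replicas):
                    bisect.insort(self.ring, (self._pos(f"{node}#{r}"), node))

            def lookup(self, key):
                i = bisect.bisect(self.ring, (self._pos(key), ""))
                return self.ring[i % len(self.ring)][1]

        ring = HashRing(["node-a", "node-b", "node-c"])
        print(ring.lookup("dataset-42"))  # owning node; stable as nodes come and go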