924 research outputs found

    Towards the Automatic Classification of Documents in User-generated Classifications

    Get PDF
    There is a huge amount of information scattered on the World Wide Web. As the information flow occurs at a high speed in the WWW, there is a need to organize it in the right manner so that a user can access it very easily. Previously the organization of information was generally done manually, by matching the document contents to some pre-defined categories. There are two approaches for this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task, and in the second case supervised classifiers are used to automatically classify resources. In a supervised classification, manual interaction is required to create some training data before the automatic classification task takes place. In our new approach, we intend to propose automatic classification of documents through semantic keywords and building the formulas generation by these keywords. Thus we can reduce this human participation by combining the knowledge of a given classification and the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification is not only related to search engines, but also to many other fields like, document organization, text filtering, semantic index managing

    Top-Down Skiplists

    Full text link
    We describe todolists (top-down skiplists), a variant of skiplists (Pugh 1990) that can execute searches using at most log2εn+O(1)\log_{2-\varepsilon} n + O(1) binary comparisons per search and that have amortized update time O(ε1logn)O(\varepsilon^{-1}\log n). A variant of todolists, called working-todolists, can execute a search for any element xx using log2εw(x)+o(logw(x))\log_{2-\varepsilon} w(x) + o(\log w(x)) binary comparisons and have amortized search time O(ε1logw(w))O(\varepsilon^{-1}\log w(w)). Here, w(x)w(x) is the "working-set number" of xx. No previous data structure is known to achieve a bound better than 4log2w(x)4\log_2 w(x) comparisons. We show through experiments that, if implemented carefully, todolists are comparable to other common dictionary implementations in terms of insertion times and outperform them in terms of search times.Comment: 18 pages, 5 figure

    Improved Bounds for Multipass Pairing Heaps and Path-Balanced Binary Search Trees

    Get PDF
    We revisit multipass pairing heaps and path-balanced binary search trees (BSTs), two classical algorithms for data structure maintenance. The pairing heap is a simple and efficient "self-adjusting" heap, introduced in 1986 by Fredman, Sedgewick, Sleator, and Tarjan. In the multipass variant (one of the original pairing heap variants described by Fredman et al.) the minimum item is extracted via repeated pairing rounds in which neighboring siblings are linked. Path-balanced BSTs, proposed by Sleator (cf. Subramanian, 1996), are a natural alternative to Splay trees (Sleator and Tarjan, 1983). In a path-balanced BST, whenever an item is accessed, the search path leading to that item is re-arranged into a balanced tree. Despite their simplicity, both algorithms turned out to be difficult to analyse. Fredman et al. showed that operations in multipass pairing heaps take amortized O(log n * log log n / log log log n) time. For searching in path-balanced BSTs, Balasubramanian and Raman showed in 1995 the same amortized time bound of O(log n * log log n / log log log n), using a different argument. In this paper we show an explicit connection between the two algorithms and improve both bounds to O(log n * 2^{log^* n} * log^* n), respectively O(log n * 2^{log^* n} * (log^* n)^2), where log^* denotes the slowly growing iterated logarithm function. These are the first improvements in more than three, resp. two decades, approaching the information-theoretic lower bound of Omega(log n)

    08081 Abstracts Collection -- Data Structures

    Get PDF
    From February 17th to 22nd 2008, the Dagstuhl Seminar 08081 ``Data Structures\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. It brought together 49 researchers from four continents to discuss recent developments concerning data structures in terms of research but also in terms of new technologies that impact how data can be stored, updated, and retrieved. During the seminar a fair number of participants presented their current research. There was discussion of ongoing work, and in addition an open problem session was held. This paper first describes the seminar topics and goals in general, then gives the minutes of the open problem session, and concludes with abstracts of the presentations given during the seminar. Where appropriate and available, links to extended abstracts or full papers are provided

    Control of Humanoid Robots for Use in Unstructured Environments

    Get PDF
    Humanoid robots have the potential to replace human beings for dangerous tasks, such as disaster relief. One of the most important abilities for a humanoid robot is the ability to manipulate its surroundings. We developed human-in-the-loop techniques for the Atlas platform to compete in the DARPA Robotics Challenge. Many of the tasks in the event required actuation of the environment, such as turning a valve, pulling a lever, and opening a door. This paper will detail our work on manipulation for humanoid robots. In particular, we will discuss our approaches to effective operator interface design, manipulation techniques and motion planning

    Approximate resilience, monotonicity, and the complexity of agnostic learning

    Full text link
    A function ff is dd-resilient if all its Fourier coefficients of degree at most dd are zero, i.e., ff is uncorrelated with all low-degree parities. We study the notion of approximate\mathit{approximate} resilience\mathit{resilience} of Boolean functions, where we say that ff is α\alpha-approximately dd-resilient if ff is α\alpha-close to a [1,1][-1,1]-valued dd-resilient function in 1\ell_1 distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class CC over the uniform distribution. Roughly speaking, if all functions in a class CC are far from being dd-resilient then CC can be learned agnostically in time nO(d)n^{O(d)} and conversely, if CC contains a function close to being dd-resilient then agnostic learning of CC in the statistical query (SQ) framework of Kearns has complexity of at least nΩ(d)n^{\Omega(d)}. This characterization is based on the duality between 1\ell_1 approximation by degree-dd polynomials and approximate dd-resilience that we establish. In particular, it implies that 1\ell_1 approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal α\alpha-approximately Ω~(αn)\widetilde{\Omega}(\alpha\sqrt{n})-resilient monotone functions for all α>0\alpha>0. Prior to our work, it was conceivable even that every monotone function is Ω(1)\Omega(1)-far from any 11-resilient function. Furthermore, we construct simple, explicit monotone functions based on Tribes{\sf Tribes} and CycleRun{\sf CycleRun} that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas
    corecore