291 research outputs found

    The Traveling Salesman Problem in the Natural Environment

    Can humans navigate the natural environment so that the path taken between various destinations is 'optimal' in some way? In the domain of optimization this challenge is traditionally framed as the "Traveling Salesman Problem" (TSP). What strategies and ecological considerations are plausible for human navigation? When given a two-dimensional map-like presentation of the destinations, participants solve this optimization exceptionally well (only 2-3% longer than optimum)^1, 2^. In the following experiments we investigate the effect of effort and its environmental affordance on navigation decisions when humans solve the TSP in the natural environment. Fifteen locations were marked on each of two outdoor landscapes, one with flat terrain and one with varied terrain. Performance in the flat-field condition was excellent (∼6% error) and was worse but still quite good in the variable-terrain condition (∼20% error), suggesting that participants do not globally pre-plan routes but rather develop them on the fly. We suggest that perceived effort guides participant solutions due to the dynamic constraints of effortful locomotion and obstacle avoidance.

    Fifty years of spellchecking

    A short history of spellchecking from the late 1950s to the present day, describing its development through dictionary lookup, affix stripping, correction, confusion sets, and edit distance to the use of gigantic databases.

    Units of measure identification in unstructured scientific documents in microbial risk in food

    OBJECTIVE(S): A preliminary step in microbial risk assessment in food is to gather and capitalize experimental data. Data capitalization is crucial to an overall decision support system that predicts microbial behavior [1]. In the framework of the French ANR project MAP'OPT (Equilibrium Gas Composition in Modified Atmosphere Packaging and Food Quality), the predictive modeling platform Sym'Previus (www.symprevius.org) should be able to propose a global approach to establish a scientifically sound method for choosing an appropriate modified atmosphere and associated packaging solution. Our work is part of this overall system and aims at semi-automatically extracting experimental data from unstructured scientific documents. These documents combine natural language with domain-specific terminology, which makes the data extremely time-consuming and tedious to extract from free text and therefore to gather and capitalize. Our work relies on the MAP'OPT-Onto ontology [4], which was built as an extension of the ontology used in Sym'Previus by adding concepts about food packaging, quantity concepts, and concepts for managing units of measure. Experimental data are often expressed with concepts (e.g. packaging, permeability) or with a numerical value, often followed by its unit of measure (e.g. 258 amol m-1 s-1 Pa-1). This paper deals with unit recognition, a known scientific challenge. METHOD(S): Automatically extracting quantitative data is a painstaking process because units are written in different ways within documents. The same unit can appear in several forms, such as amol m-1 s-1 Pa-1 written as amol.m-1 .s-1 .Pa-1 or as amol/m/s/Pa. We focus on extracting and identifying these variant units, treated as synonyms, in order to iteratively enrich an ontology that represents a predefined vocabulary used to annotate, capitalize, and query experimental data extracted from texts [2]. Our work addresses unit extraction and identification from texts to enrich an ontology in a two-step approach. First, we use text-mining methods and supervised learning approaches to predict the relevant parts of the text where synonyms of units or new units occur. The second step extracts the specific strings representing units from the text segments found in the first step. The extracted candidates are compared to units already present in the ontology using a new edit measure based on Damerau-Levenshtein [3]. RESULTS: We ran experiments on 115 scientific documents (i.e. around 35,000 sentences) on food packaging. Each unit is recognized from a list of 211 units already defined in MAP'OPT-Onto. Our learning algorithms predict that almost 5,000 sentences contain units. This prediction is correct in 95.5% of cases. In the second step, we successfully extracted 38 terms as either synonyms or new units from the sentences selected in the first step. This lets us propose an 18% enrichment of the pre-existing MAP'OPT-Onto.
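To make the candidate-comparison step concrete, here is a minimal sketch that matches an extracted unit string against known units. It assumes the standard restricted Damerau-Levenshtein distance (optimal string alignment), not the new edit measure the abstract mentions, which is not described here. The function names and the short list of known units are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Assumption: this is the textbook measure, used for illustration only;
    it is not the modified measure from the abstract.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def closest_unit(candidate: str, known_units: list[str]) -> str:
    """Return the known unit with the smallest distance to the candidate."""
    return min(known_units, key=lambda unit: osa_distance(candidate, unit))


# Hypothetical excerpt of an ontology's unit list.
KNOWN_UNITS = ["amol m-1 s-1 Pa-1", "mol m-2 s-1", "kg m-3"]
print(closest_unit("amol.m-1.s-1.Pa-1", KNOWN_UNITS))
```

A plain character-level distance handles separator variants such as dots versus spaces, but it can rank a slash-notation variant like amol/m/s/Pa closer to an unrelated unit, which is one plausible reason to adapt the measure for units.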

    Improving the Usability and Security of Digital Authentication

    The need for both usable and secure authentication is more pronounced than ever before. Security researchers and professionals will need a deep understanding of human factors to address these issues. Due to their ubiquity, recoverability, and low barrier to entry, passwords remain the most common means of digital authentication. However, fundamental human nature dictates that it is exceedingly difficult for people to generate secure passwords on their own. System-generated random passwords can be secure but are often unusable, which is why most passwords are still created by humans. We developed a simple system for automatically generating mnemonic phrases and supporting mnemonic images for randomly generated passwords. We found that study participants remembered their passwords significantly better using our system than with existing systems. To combat shoulder surfing (looking at a user's screen or keyboard as he or she enters sensitive input such as passwords), we developed an input masking technique that was demonstrated to minimize the threat of shoulder surfing attacks while improving the usability of password entry over existing methods. Extending this previous work to support longer passphrases will lead to advancements in the state of digital authentication.

    A practical index for approximate dictionary matching with few mismatches

    Approximate dictionary matching is a classic string matching problem (checking if a query string occurs in a collection of strings) with applications in, e.g., spellchecking, online catalogs, geolocation, and web searchers. We present a surprisingly simple solution called a split index, which is based on the Dirichlet principle, for matching a keyword with few mismatches, and experimentally show that it offers competitive space-time tradeoffs. Our implementation in the C++ language is focused mostly on data compaction, which is beneficial for the search speed (e.g., by being cache friendly). We compare our solution with other algorithms and show that it performs better for the Hamming distance. Query times on the order of 1 microsecond were reported for one mismatch with a dictionary size of a few megabytes on a medium-end PC. We also demonstrate that a basic compression technique consisting of q-gram substitution can significantly reduce the index size (up to 50% of the input text size for the DNA), while still keeping the query time relatively low.
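The idea behind a split index can be sketched in a few lines. The Dirichlet principle is the pigeonhole principle: if a word is split into two parts and the query differs from it in at most one position, at least one part must match exactly. The sketch below assumes one mismatch, two parts, and plain Python dictionaries. It is not the compact C++ implementation from the abstract, and the class and method names are hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


class SplitIndex:
    """Illustrative split index for Hamming distance <= 1 (hypothetical API)."""

    def __init__(self, words):
        self.buckets = defaultdict(list)
        for word in words:
            for key in self._keys(word):
                self.buckets[key].append(word)

    @staticmethod
    def _keys(word):
        # Split into two halves; the key also stores the word length and the
        # half's position, so only equal-length words share a bucket.
        mid = len(word) // 2
        return [(len(word), 0, word[:mid]), (len(word), 1, word[mid:])]

    def search(self, query):
        """Return all indexed words within one mismatch of the query."""
        found = set()
        for key in self._keys(query):
            for candidate in self.buckets.get(key, ()):
                # One half matched exactly; verify the whole word.
                if hamming(candidate, query) <= 1:
                    found.add(candidate)
        return sorted(found)


index = SplitIndex(["table", "cable", "tablet", "sable", "fable"])
print(index.search("tible"))
```

Supporting k mismatches in the same way would need k + 1 parts per word, so that at least one part is guaranteed to be free of errors.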

    Spelling Checker using Algorithm Damerau Levenshtein Distance and Cosine Similarity

    Writing embodies the author's ideas that are to be conveyed to others. Writers often make typos while typing, which can change the meaning of the text, so a system is needed to detect misspelled words. In this study, checking is done with the Dictionary Lookup method, and candidate words are generated with the Damerau-Levenshtein Distance algorithm. The candidates are then ranked by splitting each word into bigrams and computing a similarity value with the Cosine Similarity algorithm. The test results on the data used yield different Mean Reciprocal Rank (MRR) values for each type of error: deletion errors produce an MRR value of 88.89%, insertion errors produce an MRR value of 97.78%, substitution errors produce an MRR value of 88.89%, and transposition errors produce an MRR value of 89
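The ranking step described in this abstract can be illustrated with a small sketch. It assumes character bigrams counted as a bag and the usual cosine formula. The study's exact tokenization and tie-breaking are not given here, and the function names and candidate list are hypothetical.

```python
from collections import Counter
from math import sqrt


def bigrams(word: str) -> Counter:
    """Count the overlapping character bigrams of a word."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bigram count vectors of two words."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # words shorter than two characters have no bigrams
    return dot / (norm_a * norm_b)


def rank_candidates(typo: str, candidates: list[str]) -> list[str]:
    """Order candidate corrections from most to least similar to the typo."""
    return sorted(candidates, key=lambda c: cosine_similarity(typo, c), reverse=True)


# Hypothetical candidates, e.g. words already filtered by an edit-distance threshold.
print(rank_candidates("speling", ["sapling", "spoiling", "spelling"]))
```

Here the candidate list stands in for the output of the edit-distance step, and the cosine score only decides the order in which suggestions are shown.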