    Viewing the Dictionary as a Classification System

    Information retrieval is one of the earliest applications of computers. Starting with the speculative work of Vannevar Bush on Memex [Bush 45], through the development of Key Word in Context (KWIC) indexing by H.P. Luhn [Luhn 60] and Boolean retrieval by John Horty [Horty 62], to the statistical techniques for automatic indexing and document retrieval developed in the 1960s and continuing to the present [Salton and McGill 83], information retrieval has continued to develop and progress. However, there is a growing consensus that current-generation statistical techniques have gone about as far as they can go, and that further improvement requires natural language processing and knowledge representation. We believe the best place to start is the lexicon: indexing documents not by words but by word senses. Why use word senses? Conventional approaches advocate either indexing by the words themselves or manual indexing with a controlled vocabulary. Manual indexing offers some of the advantages of word senses, in that the terms are not ambiguous, but it suffers from problems of consistency. In addition, as text databases continue to grow, it will only be possible to index a fraction of them by hand. In advocating word senses as indices we are not suggesting that they are the ultimate answer. There is much more to the meaning of a document than the senses of the words it contains; we are only saying that senses are a good start. Any approach to providing a semantic analysis must deal with the problem of word meaning. Existing retrieval systems try to go beyond single words by using a thesaurus, but this has the problem that words are not synonymous in all contexts. The word 'term' may be synonymous with 'word' (as in a vocabulary term), 'sentence' (as in a prison term), or 'condition' (as in 'terms of agreement'). If we expand the query with words from a thesaurus, we must be careful to use the right senses of those words. We not only have to know the sense of the word in the query (here, the sense of 'term'), but also the sense of the word used to augment it (e.g., the appropriate sense of 'sentence'). The thesaurus we use should be one in which the senses of words are explicitly indicated [Chodorow et al. 88]. We contend that the best place to obtain word senses is a machine-readable dictionary. Although another list of senses could be constructed by hand, that strategy risks overlooking some senses and entails a great deal of effort.
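The idea of indexing by sense rather than surface word can be sketched in a few lines. This is a hypothetical toy, not the authors' system: the sense labels and the `SENSES` inventory stand in for a machine-readable dictionary, and disambiguation is assumed to have already happened.

```python
# Toy sketch of sense-based indexing: tokens are (word, sense) pairs,
# so a query on the vocabulary sense of 'term' does not match documents
# that use 'term' in its prison-sentence sense.
from collections import defaultdict

# Hypothetical sense inventory standing in for a machine-readable dictionary.
SENSES = {
    ("term", "vocabulary"): "term#1",
    ("term", "prison"): "term#2",
    ("sentence", "grammar"): "sentence#1",
    ("sentence", "prison"): "sentence#2",
}

def build_index(docs):
    """docs: {doc_id: [(word, sense_label), ...]} -> inverted index on sense ids."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for word, sense in tokens:
            index[SENSES[(word, sense)]].add(doc_id)
    return index

docs = {
    "d1": [("term", "vocabulary")],
    "d2": [("term", "prison"), ("sentence", "prison")],
}
index = build_index(docs)
# Both documents contain the surface word 'term', but each sense id
# retrieves only the document that uses that sense.
```

Thesaurus-based query expansion would then add only sense ids synonymous with the query sense (e.g., the prison sense of 'sentence' expands term#2, not term#1).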

    Viewing morphology as an inference process

    Morphology is the area of linguistics concerned with the internal structure of words. Information retrieval has generally not paid much attention to word structure, other than to account for some of the variability in word forms via the use of stemmers. We report on our experiments to determine the importance of morphology, and the effect that it has on performance. We found that grouping morphological variants makes a significant improvement in retrieval performance. Improvements are seen by grouping inflectional as well as derivational variants. We also found that performance was enhanced by recognizing lexical phrases. We describe the interaction between morphology and lexical ambiguity, and how resolving that ambiguity will lead to further improvements in performance.
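The grouping the abstract measures can be illustrated with a minimal conflation sketch. The suffix table below is a hand-written assumption, not the authors' morphological analyzer; it merely shows inflectional ('computes', 'computed') and derivational ('computation') variants collapsing to one index term.

```python
# Hypothetical suffix-stripping conflation, longest suffix first.
SUFFIXES = ["ation", "ing", "ed", "es", "s", "e"]

def conflate(word):
    """Map a word to a crude stem by stripping the first matching suffix."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

variants = ["computes", "computed", "computation", "compute"]
groups = {conflate(w) for w in variants}  # all four conflate to one stem
```

A real stemmer (and the lexical-phrase recognition the abstract mentions) is far more involved; the point is only that one index term now covers the whole variant group.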

    Wave Attenuation in Salt Marshes

    Salt marshes can provide several ecosystem services, including shoreline protection. One of the ways marshes do this is by attenuating waves. However, the rate at which waves are attenuated by marshes is highly variable. Previous studies have found that the rate of wave attenuation varies with water depth, vegetation characteristics, bottom profile, and wave conditions. This study examined how these factors affect wave attenuation rate at four marsh sites in coastal North Carolina with different vegetation and bottom profile characteristics. The rate of wave attenuation was found to increase with decreasing water depth and with increasing vegetation height and density. Sites with short vegetation, which is submerged more of the time, had larger rates of decrease in wave attenuation with increasing depth than sites with tall vegetation. Waves with long periods were attenuated less than waves with short periods. Attenuation rates were much lower, and decreased more with increasing water depth, during storm conditions than during normal conditions at the same site.
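Wave attenuation over a marsh is commonly modeled as exponential decay of wave height with distance; that model and the numbers below are assumptions for illustration, not values from this study.

```python
import math

def wave_height(H0, k, x):
    """Wave height after travelling distance x (m) through the marsh,
    under an assumed exponential decay model H(x) = H0 * exp(-k * x)."""
    return H0 * math.exp(-k * x)

# Illustrative numbers only: a larger decay coefficient k (shallower
# water, taller or denser vegetation) attenuates the wave faster.
H0 = 0.5  # incident wave height, m
sparse = wave_height(H0, k=0.01, x=50)  # weakly attenuating marsh
dense = wave_height(H0, k=0.05, x=50)   # strongly attenuating marsh
```

The study's depth and vegetation effects would appear here as changes in k from site to site and tide to tide.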

    The universality of iterated hashing over variable-length strings

    Iterated hash functions process strings recursively, one character at a time. At each iteration, they compute a new hash value from the preceding hash value and the next character. We prove that iterated hashing can be pairwise independent, but never 3-wise independent. We show that it can be almost universal over strings much longer than the number of hash values, and we bound the maximal string length given the collision probability.
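The recursive shape the abstract describes can be sketched directly. The mixing step below is a hypothetical linear recurrence chosen for brevity; real almost-universal families choose the per-character function and parameters carefully, and the abstract's point is that no such choice yields 3-wise independence.

```python
def iterated_hash(s, a, b, m=2**31 - 1):
    """Iterated hashing: h_{i+1} = (a * h_i + b * c_i) mod m,
    consuming one character c_i of the string per step."""
    h = 0
    for ch in s:
        h = (a * h + b * ord(ch)) % m
    return h

h1 = iterated_hash("hello", a=31, b=7)
h2 = iterated_hash("world", a=31, b=7)
```

With (a, b) drawn at random one gets a hash family; the paper's bounds concern how long strings can get before the family's collision probability degrades.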

    Inferring hierarchical descriptions


    Robust Authenticated-Encryption: AEZ and the Problem that it Solves

    With a scheme for robust authenticated-encryption a user can select an arbitrary value λ ≥ 0 and then encrypt a plaintext of any length into a ciphertext that is λ characters longer. The scheme must provide all the privacy and authenticity possible for the requested λ. We formalize and investigate this idea, and construct a well-optimized solution, AEZ, from the AES round function. Our scheme encrypts strings at almost the same rate as OCB-AES or CTR-AES (on Haswell, AEZ has a peak speed of about 0.7 cpb). To accomplish this we employ an approach we call prove-then-prune: prove security and then instantiate with a scaled-down primitive (e.g., reducing rounds for blockcipher calls).
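The defining interface property, ciphertext exactly λ bytes longer than the plaintext, can be made concrete with a toy. This is emphatically not AEZ: keystream and tag here come from HMAC-SHA256 purely to demonstrate the length behaviour, and a λ-byte truncated tag is all the authenticity λ buys.

```python
# Toy robust-AE *interface* sketch (not AEZ): caller picks expansion
# lam >= 0; ciphertext is len(msg) + lam bytes.
import hmac, hashlib

def encrypt(key, nonce, msg, lam):
    # Derive a keystream by chaining HMAC outputs (illustrative only).
    stream = hmac.new(key, b"enc" + nonce, hashlib.sha256).digest()
    while len(stream) < len(msg):
        stream += hmac.new(key, stream[-32:], hashlib.sha256).digest()
    body = bytes(m ^ s for m, s in zip(msg, stream))
    # Authenticity comes from a tag truncated to exactly lam bytes.
    tag = hmac.new(key, b"tag" + nonce + body, hashlib.sha256).digest()[:lam]
    return body + tag

ct = encrypt(b"k" * 16, b"n", b"attack at dawn", lam=4)
```

In the paper's framing, the scheme must squeeze the best achievable privacy and authenticity out of whatever λ the caller requested, including λ = 0.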

    OCB Mode

    This paper was prepared for NIST, which is considering new block-cipher modes of operation. It describes a parallelizable mode of operation that simultaneously provides both privacy and authenticity. OCB mode encrypts-and-authenticates an arbitrary message M ∈ {0,1}* using only ⌈|M|/n⌉ + 2 block-cipher invocations, where n is the block length of the underlying block cipher. Additional overhead is small. OCB refines a scheme, IAPM, suggested by Jutla [IACR-2000/39], who was the first to devise an authenticated-encryption mode with minimal overhead compared to standard modes. Desirable new properties of OCB include: very cheap offset calculations; operating on an arbitrary message M ∈ {0,1}*; producing ciphertexts of minimal length; using a single underlying cryptographic key; making a nearly optimal number of block-cipher calls; avoiding the need for a random IV; and rendering it infeasible for an adversary to find pretag collisions. The paper provides a full proof of security for OCB.
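The ⌈|M|/n⌉ + 2 call count is easy to check as arithmetic; the helper below is only that arithmetic, with |M| in bits and n the cipher's block length.

```python
import math

def ocb_calls(msg_bits, n=128):
    """Block-cipher invocations OCB needs for a msg_bits-bit message:
    ceil(|M|/n) + 2, per the abstract."""
    return math.ceil(msg_bits / n) + 2

# e.g. a 1000-bit message with a 128-bit block cipher such as AES:
# ceil(1000/128) + 2 = 8 + 2 = 10 invocations.
```

A generic encrypt-then-MAC composition costs roughly twice the per-block work, which is why the "+2" overhead is notable.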

    On the Molecular Basis of Ion Permeation in the Epithelial Na+ Channel

    The epithelial Na+ channel (ENaC) is highly selective for Na+ and Li+ over K+ and is blocked by the diuretic amiloride. ENaC is a heterotetramer made of two α, one β, and one γ homologous subunits, each subunit comprising two transmembrane segments. Amino acid residues involved in binding of the pore blocker amiloride are located in the pre-M2 segment of β and γ subunits, which precedes the second putative transmembrane α helix (M2). A residue in the α subunit (αS589) at the NH2 terminus of M2 is critical for the molecular sieving properties of ENaC. ENaC is more permeable to Li+ than Na+ ions. The concentration of half-maximal unitary conductance is 38 mM for Na+ and 118 mM for Li+, a kinetic property that can account for the differences in Li+ and Na+ permeability. We show here that mutation of amino acid residues at homologous positions in the pre-M2 segment of α, β, and γ subunits (αG587, βG529, γS541) decreases the Li+/Na+ selectivity by changing the apparent channel affinity for Li+ and Na+. Fitting single-channel data of the Li+ permeation to a discrete-state model including three barriers and two binding sites revealed that these mutations increased the energy needed for the translocation of Li+ from an outer ion binding site through the selectivity filter. Mutation of βG529 to Ser, Cys, or Asp made ENaC partially permeable to K+ and larger ions, similar to the previously reported αS589 mutations. We conclude that the residues αG587 to αS589 and homologous residues in the β and γ subunits form the selectivity filter, which tightly accommodates Na+ and Li+ ions and excludes larger ions like K+.
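The half-maximal concentrations quoted in the abstract imply saturating conductance curves. The simple one-site binding form below is an illustrative assumption, not the paper's three-barrier/two-site model, but it shows how a higher half-saturation concentration (118 mM for Li+ vs 38 mM for Na+) translates into slower saturation.

```python
def conductance(conc_mM, k_half_mM, g_max=1.0):
    """Assumed one-site saturation: g = g_max * C / (K_half + C),
    where K_half is the concentration of half-maximal conductance."""
    return g_max * conc_mM / (k_half_mM + conc_mM)

# At 38 mM, Na+ (K_half = 38 mM) is exactly half-saturated,
# while Li+ (K_half = 118 mM) is still well below half-maximal.
na = conductance(38, 38)
li = conductance(38, 118)
```

The paper's discrete-state fit plays the same role quantitatively: mutations that raise the translocation barrier for Li+ shift its apparent affinity and hence the Li+/Na+ selectivity.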

    A Cooperative Paradigm for Fighting Information Overload
