
    Multilingual Word Sense Induction to Improve Web Search Result Clustering

    In [12] a novel approach to Web search result clustering based on Word Sense Induction (WSI), i.e. the automatic discovery of word senses from raw text, was presented. Key to the proposed approach is the idea of, first, automatically inducing senses for the target query and, second, clustering the search results based on their semantic similarity to the induced word senses. In [1] we proposed an innovative Word Sense Induction method based on multilingual data; key to our approach was the idea that a multilingual context representation, in which the context of a word is expanded with its translations in different languages, may improve the WSI results; the experiments showed a clear performance gain. In this paper we present some preliminary ideas for applying our multilingual Word Sense Induction method to Web search result clustering.
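    The two-step idea described in the abstract (induce senses, then assign each result to its closest sense) can be sketched as follows. This is a minimal illustration, not the authors' method: the sense vectors, the example query "jaguar", and the bag-of-words cosine similarity are all assumptions for demonstration.

```python
from collections import Counter
import math

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_results(results, senses):
    """Assign each search-result snippet to the induced sense
    whose context vector it is most similar to."""
    clusters = {name: [] for name in senses}
    for snippet in results:
        vec = Counter(snippet.lower().split())
        best = max(senses, key=lambda s: cosine(vec, senses[s]))
        clusters[best].append(snippet)
    return clusters

# Hypothetical senses induced for the ambiguous query "jaguar".
senses = {
    "animal": Counter("jaguar cat predator jungle wildlife".split()),
    "car": Counter("jaguar car engine luxury vehicle".split()),
}
results = [
    "the jaguar is the largest cat in the american jungle",
    "new jaguar car models feature a powerful engine",
]
print(cluster_results(results, senses))
```

    In the multilingual variant proposed in [1], the context vectors would additionally contain translations of the context words, which tends to sharpen the separation between senses.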

    Semantic Fuzzing with Zest

    Programs expecting structured inputs often consist of both a syntactic analysis stage, which parses raw input, and a semantic analysis stage, which conducts checks on the parsed input and executes the core logic of the program. Generator-based testing tools in the lineage of QuickCheck are a promising way to generate random syntactically valid test inputs for these programs. We present Zest, a technique which automatically guides QuickCheck-like random-input generators to better explore the semantic analysis stage of test programs. Zest converts random-input generators into deterministic parametric generators. We present the key insight that mutations in the untyped parameter domain map to structural mutations in the input domain. Zest leverages program feedback in the form of code coverage and input validity to perform feedback-directed parameter search. We evaluate Zest against AFL and QuickCheck on five Java programs: Maven, Ant, BCEL, Closure, and Rhino. Zest covers 1.03x-2.81x as many branches within the benchmarks' semantic analysis stages as baseline techniques. Further, we find 10 new bugs in the semantic analysis stages of these benchmarks. Zest is the most effective technique in finding these bugs reliably and quickly, requiring at most 10 minutes on average to find each bug.
    Comment: To appear in Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA'19).
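    The key insight of the abstract, that a deterministic parametric generator maps an untyped parameter sequence to a structured input, so mutating one parameter yields a structural mutation, can be illustrated with a toy sketch. This is not Zest's actual Java implementation; the expression grammar and parameter encoding here are assumptions for demonstration.

```python
def parametric_gen(params):
    """Deterministically map a sequence of untyped integer parameters
    to a structured input (here, a tiny arithmetic expression).
    The same parameter sequence always yields the same input, so
    mutating a parameter produces a structural mutation of the input."""
    it = iter(params)

    def take(n):
        # Consume the next untyped parameter, reduced modulo n.
        return next(it) % n

    def expr(depth=0):
        if depth > 2 or take(2) == 0:
            return str(take(10))      # leaf: a single digit
        op = "+-*"[take(3)]           # internal node: a binary operator
        return f"({expr(depth + 1)}{op}{expr(depth + 1)})"

    return expr()

# Identical parameters give identical inputs (deterministic):
print(parametric_gen([1, 2, 0, 7, 0, 3]))  # -> (7*3)
# Mutating the single parameter that selects the operator
# changes the structure of the generated input:
print(parametric_gen([1, 0, 0, 7, 0, 3]))  # -> (7+3)
```

    A coverage-guided loop can then search this flat parameter space while always producing syntactically valid inputs, which is what lets the search focus on the semantic analysis stage.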

    A linked data approach for linking and aligning sign language and spoken language data

    We present work dealing with a Linked Open Data (LOD)-compliant representation of Sign Language (SL) data, with the goal of supporting the cross-lingual alignment of SL data and their linking to Spoken Language (SpL) data. The proposed representation is based on activities of groups of researchers in the field of SL who have investigated the use of Open Multilingual Wordnet (OMW) datasets for (manually) cross-linking SL data or for linking SL and SpL data. Another group of researchers is proposing an XML encoding of articulatory elements of SLs and (manually) linking those to an SpL lexical resource. We propose an RDF-based representation of those various kinds of data. This unified formal representation offers a semantic repository of information on SL and SpL data that could be accessed to support the creation of datasets for training or evaluating NLP applications dealing with SLs, for example Machine Translation (MT) between SLs and between SLs and SpLs.
    peer-reviewed
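    The kind of RDF-based representation the abstract describes can be sketched as a handful of triples linking an SL entry to an articulatory element, an OMW synset, and an SpL lexical resource. Everything here is hypothetical: the namespaces, the sign identifier, the predicate names, and the synset ID are placeholders, not the vocabulary actually proposed in the paper.

```python
# Minimal sketch: RDF triples as plain (subject, predicate, object) tuples.
SL = "http://example.org/signlang/"   # hypothetical SL namespace
OMW = "http://example.org/omw/"       # hypothetical OMW namespace

triples = [
    # an articulatory element of the sign (cf. the XML encoding mentioned above)
    (SL + "sign/HOUSE", SL + "hasHandshape", "flat"),
    # cross-link to an Open Multilingual Wordnet synset (hypothetical ID)
    (SL + "sign/HOUSE", SL + "alignedWith", OMW + "synset/house-n-1"),
    # link to a Spoken Language lexical entry
    (SL + "sign/HOUSE", SL + "translatesTo", "http://example.org/spl/lex/house"),
]

def to_ntriples(triples):
    """Serialize the tuples in (simplified) N-Triples syntax:
    IRIs in angle brackets, literals in double quotes."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(triples))
```

    In practice one would use an RDF library such as rdflib and the actual vocabularies of the proposed model, but the unified repository idea is exactly this: signs, their articulatory features, OMW synsets, and SpL entries all queryable in one graph.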

    Invoking the Cyber-Muse:

    The technology of automatic essay assessment has advanced rapidly in the past ten years. Several products are now commercially available. Although initially targeted for use in grading aptitude tests, these products will soon be integrated with online learning systems. This presents researchers with an opportunity to consider what it is they really wish to accomplish. The potential impact of automatic essay assessment on the learning environment is great and raises important issues for the online learning community. While automatic writing assessment promises new efficiencies for essay grading, it has the potential to redefine the learning activities it is intended to measure. As we approach emergent technology, such as automatic writing assessment, we need to think carefully about what we really want out of these innovations. There will be pressure to adopt the technology just because it is innovative. Persuasive arguments based on cost-effectiveness will be advanced. Convenience and availability will be touted. But it is important to weigh all the issues. No plateau in technological innovation has been reached, nor is any in sight. The pressures brought to bear on culture will continue to intensify as the development of technology continues to accelerate. Turning away from the challenge is a common enough impulse, and this is true of governments as well as of individuals, but given the ubiquity and depth of technological penetration, turning away is not a workable option.