
    Benchmarking Diverse-Modal Entity Linking with Generative Models

    Entities can be expressed in diverse formats, such as texts, images, or column names and cell values in tables. While existing entity linking (EL) models work well in per-modality configurations, such as text-only EL, visual grounding, or schema linking, it is more challenging to design a unified model for diverse modality configurations. To bring various modality configurations together, we constructed a benchmark for diverse-modal EL (DMEL) from existing EL datasets, covering all three modalities: text, image, and table. To approach the DMEL task, we proposed a generative diverse-modal model (GDMM) following a multimodal encoder-decoder paradigm. Pre-training GDMM with rich corpora builds a solid foundation for DMEL without storing the entire KB for inference. Fine-tuning GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 score on average. Additionally, extensive error analyses are conducted to highlight the challenges of DMEL, facilitating future research on this task. Comment: 15 pages. ACL 202
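
    The abstract does not spell out how GDMM avoids storing the entire KB at inference. A common device in generative entity linking is to constrain decoding with a prefix trie over entity names, so the decoder can only emit token sequences that spell a valid entity. A minimal sketch of that idea follows; the entity list and whitespace tokenization are illustrative, not taken from the paper.

```python
# Sketch of trie-constrained decoding for generative entity linking.
# Instead of scoring every KB entry, the decoder may only emit token
# sequences that spell a valid entity name, enforced by a prefix trie.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def build_trie(entity_names):
    """Index tokenized entity names so decoding can query legal next tokens."""
    root = TrieNode()
    for name in entity_names:
        node = root
        for token in name.split():
            node = node.children.setdefault(token, TrieNode())
        node.is_end = True
    return root

def allowed_next_tokens(root, prefix_tokens):
    """Tokens the decoder may generate after prefix_tokens.

    None in the result marks that the prefix is already a complete name,
    so emitting end-of-sequence is also legal.
    """
    node = root
    for token in prefix_tokens:
        if token not in node.children:
            return set()
        node = node.children[token]
    allowed = set(node.children)
    if node.is_end:
        allowed.add(None)
    return allowed

trie = build_trie(["New York", "New York City", "Newark"])
print(allowed_next_tokens(trie, ["New"]))          # only 'York' can follow
print(allowed_next_tokens(trie, ["New", "York"]))  # 'City' or end-of-name
```

At each decoding step the model's softmax would be masked to the returned set, so any completed output is guaranteed to be a KB entry.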

    COSPO/CENDI Industry Day Conference

    The conference's objective was to provide a forum where government information managers and industry information technology experts could have an open exchange, discuss their respective needs, and compare them to the available, or soon to be available, solutions. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed heterogeneous and large database support; and communications - speed, bandwidth, and wireless.

    Learning natural language interfaces with neural models

    Language is the primary and most natural means of communication for humans. The learning curve of interacting with various devices and services (e.g., digital assistants and smart appliances) would be greatly reduced if we could talk to machines using human language. However, in most cases computers can only interpret and execute formal languages. In this thesis, we focus on using neural models to build natural language interfaces which learn to map naturally worded expressions onto machine-interpretable representations. The task is challenging due to (1) structural mismatches between natural language and formal language, (2) the need to ensure the well-formedness of output representations, (3) the lack of uncertainty information and interpretability, and (4) limited model coverage of language variations. In this thesis, we develop several flexible neural architectures to address these challenges. We propose a model based on attention-enhanced encoder-decoder neural networks for natural language interfaces. Beyond sequence modeling, we propose a tree decoder that exploits the compositional nature and well-formedness of meaning representations, recursively generating hierarchical structures in a top-down manner. To model meaning at different levels of granularity, we present a structure-aware neural architecture which decodes semantic representations following a coarse-to-fine procedure. The proposed neural models remain difficult to interpret, acting in most cases as a black box. We explore ways to estimate and interpret the model's confidence in its predictions, which we argue can provide users with immediate and meaningful feedback regarding uncertain outputs. We estimate confidence scores that indicate whether model predictions are likely to be correct. Moreover, we identify which parts of the input contribute to uncertain predictions, allowing users to interpret their model. Limited model coverage is one of the major sources of uncertainty in natural language interfaces. Therefore, we develop a general framework to handle the many different ways natural language expresses the same information need. We leverage external resources to generate felicitous paraphrases for the input, and then feed them to a neural paraphrase scoring model which assigns higher weights to linguistic expressions most likely to yield correct answers. The model components are trained end-to-end using supervision signals provided by the target task. Experimental results show that the proposed neural models can be easily ported across tasks. Moreover, the robustness of natural language interfaces can be enhanced by considering output well-formedness, confidence modeling, and improved model coverage.
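
    The confidence-estimation idea above can be illustrated with a deliberately simple proxy: score a decoded sequence by the length-normalised product of its per-token probabilities, and surface low-probability tokens to the user. The thesis combines richer signals than this; the sketch below only shows the shape of such an interface, and the probabilities and threshold are illustrative.

```python
# Sketch of sequence-level confidence from per-token decoder probabilities.
# Confidence is the geometric mean of token probabilities (so it does not
# shrink merely because the output is long), and individual tokens below a
# threshold are reported so the user can inspect the uncertain spans.
import math

def sequence_confidence(token_probs):
    """Geometric mean of per-token probabilities (length-normalised)."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))

def uncertain_positions(token_probs, threshold=0.5):
    """Indices of tokens whose probability falls below the threshold."""
    return [i for i, p in enumerate(token_probs) if p < threshold]

probs = [0.95, 0.90, 0.30, 0.88]
print(round(sequence_confidence(probs), 3))
print(uncertain_positions(probs))  # the third token is the uncertain one
```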

    Convergence: the next big step

    Recently, web-based multimedia services have gained popularity and have proven themselves to be a viable means of communication. This has inspired telecommunication service providers and network operators to reinvent themselves and try to provide value-added IP-centric services. There was a need for a system which would allow new services to be introduced rapidly, with reduced capital expense (CAPEX) and operational expense (OPEX) through increased efficiency in network utilization. Various organizations and standardization agencies have been working together to establish such a system; the Internet Protocol Multimedia Subsystem (IMS) is the result of these efforts. IMS is an application-level system. It is being developed by 3GPP (3rd Generation Partnership Project) and 3GPP2 (3rd Generation Partnership Project 2) in collaboration with the IETF (Internet Engineering Task Force), ITU-T (International Telecommunication Union – Telecommunication Standardization Sector), and ETSI (European Telecommunications Standards Institute), among others. Initially, the main aim of IMS was to bring together the internet and the cellular world, but it has been extended to include traditional wireline telecommunication systems as well. It utilizes existing internet protocols such as SIP (Session Initiation Protocol), AAA (Authentication, Authorization and Accounting), and COPS (Common Open Policy Service), and modifies them to meet the stringent requirements of reliable, real-time communication systems. The advantages of IMS include easy quality-of-service (QoS) management, mobility management, and service control and integration. At present, much attention is being paid to providing bundled services in the home environment. Service providers have been successful in offering traditional telephony, high-speed internet, and cable services in a single package, but there is very little integration among these services. IMS can provide a way to integrate them, as well as extend the possibility of adding various other services to allow increased automation in the home environment. This thesis extends the concept of IMS to provide convergence and facilitate internetworking of the various bundled services available in the home environment; this may include, but is not limited to, communications (wired and wireless), entertainment, and security. In this thesis, I present a converged home environment which has a number of elements providing a variety of communication and entertainment services. The proposed network would allow effective interworking of these elements, based on the IMS architecture. My aim is to show the possible advantages of using IMS to provide convergence, automation, and integration at the residential level.

    A GIS based multi-modal multiple optimal path transit advanced traveler information system

    A method for the design and use of a Transit Advanced Traveler Information System (TATIS) using an off-the-shelf Geographic Information System (GIS) is developed in this thesis. The research design included: 1) representing multi-modal transit networks in digital form with schedule databases; 2) development of a multiple optimal path algorithm that takes into account walking transfers using published time schedules; 3) incorporating user preferences and penalties into the algorithm; 4) development of a user interface with suitable output capabilities; and 5) using the prototype for sample inquiries, yielding performance measures. The prototype was built with the Arc/Info GIS from ESRI, Inc. The principal results of the research demonstrated the effectiveness and robustness of the TATIS prototype with respect to the five previously mentioned issues. Areas of future improvement and research focus on performance measures and added functionality.
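
    The core of such a system, earliest-arrival routing over published timetables with walking transfers and a user-tunable transfer penalty, can be sketched as a label-setting search. The stops, times, and penalty below are invented for illustration, and the thesis's algorithm additionally returns multiple optimal paths and richer user preferences.

```python
# Sketch of schedule-based earliest-arrival routing with walking transfers.
# Vehicle trips come from a timetable (you can only board at the published
# departure time); walking links are always available but incur a duration
# plus a user-tunable transfer penalty.
import heapq

# (from, to, departure, arrival) timetable entries.
TIMETABLE = [
    ("A", "B", 10, 20), ("A", "B", 30, 40),
    ("B", "C", 25, 35), ("B", "C", 45, 55),
]
WALKS = [("B", "C", 40)]  # (from, to, walking duration)

def earliest_arrival(source, target, start_time, transfer_penalty=5):
    """Dijkstra-style search over arrival times; returns earliest arrival."""
    best = {source: start_time}
    queue = [(start_time, source)]
    while queue:
        t, stop = heapq.heappop(queue)
        if stop == target:
            return t
        if t > best.get(stop, float("inf")):
            continue  # stale queue entry
        for u, v, dep, arr in TIMETABLE:
            if u == stop and dep >= t and arr < best.get(v, float("inf")):
                best[v] = arr
                heapq.heappush(queue, (arr, v))
        for u, v, dur in WALKS:
            if u == stop:
                arr = t + dur + transfer_penalty
                if arr < best.get(v, float("inf")):
                    best[v] = arr
                    heapq.heappush(queue, (arr, v))
    return None  # target unreachable from this start time

print(earliest_arrival("A", "C", 5))
```

Raising `transfer_penalty` steers the search away from walking transfers, which is one simple way user preferences enter the algorithm.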

    Towards an automatic speech recognition system for use by deaf students in lectures

    According to the Royal National Institute for Deaf People there are nearly 7.5 million hearing-impaired people in Great Britain. Human-operated machine transcription systems, such as Palantype, achieve low word error rates in real time. The disadvantage is that they are very expensive to use because of the difficulty of training operators, making them impractical for everyday use in higher education. Existing automatic speech recognition systems also achieve low word error rates, the disadvantage being that they only work for read speech in a restricted domain. Moving a system to a new domain requires a large amount of relevant data for training acoustic and language models. The adopted solution makes use of an existing continuous speech phoneme recognition system as a front-end to a word recognition sub-system. The sub-system generates a lattice of word hypotheses using dynamic programming, with robust parameter estimation obtained using evolutionary programming. Sentence hypotheses are obtained by parsing the word lattice using a beam search and contributing knowledge consisting of anti-grammar rules, which check the syntactic incorrectness of word sequences, and word frequency information. On an unseen spontaneous lecture taken from the Lund Corpus and using a dictionary containing 2,637 words, the system achieved 81.5% words correct with 15% simulated phoneme error, and 73.1% words correct with 25% simulated phoneme error. The system was also evaluated on 113 Wall Street Journal sentences. The achievements of the work are: a domain-independent method, using the anti-grammar, to reduce the word lattice search space whilst allowing normal spontaneous English to be spoken; a system designed to allow integration with new sources of knowledge, such as semantics or prosody, providing a test-bench for determining the impact of different knowledge upon word lattice parsing without the need for the underlying speech recognition hardware; and the robustness of the word lattice generation using parameters that withstand changes in vocabulary and domain.
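
    The word-lattice parsing step described above can be sketched as a beam search in which hypotheses that violate an anti-grammar rule are discarded outright rather than scored. The lattice, rules, and scores below are invented for illustration; the thesis's anti-grammar covers far more than forbidden bigrams.

```python
# Sketch of beam-search parsing of a word lattice with anti-grammar pruning.
# A hypothesis containing a syntactically impossible word pair is pruned,
# shrinking the search space before any further scoring happens.

# lattice: start node -> list of (word, end node, word score)
LATTICE = {
    0: [("recognise", 1, 0.9), ("wreck a nice", 1, 0.6)],
    1: [("speech", 2, 0.8), ("beach", 2, 0.7)],
}
ANTI_BIGRAMS = {("recognise", "beach")}  # pairs ruled out by the anti-grammar

def beam_parse(lattice, end_node, beam_width=3):
    """Return (score, words) for the best hypothesis reaching end_node."""
    beams = [(1.0, 0, [])]  # (score, current node, words so far)
    complete = []
    while beams:
        new_beams = []
        for score, node, words in beams:
            if node == end_node:
                complete.append((score, words))
                continue
            for word, nxt, w_score in lattice.get(node, []):
                if words and (words[-1], word) in ANTI_BIGRAMS:
                    continue  # anti-grammar rule fires: prune this hypothesis
                new_beams.append((score * w_score, nxt, words + [word]))
        new_beams.sort(reverse=True)
        beams = new_beams[:beam_width]  # keep only the top hypotheses
    return max(complete, default=None)

print(beam_parse(LATTICE, 2))
```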

    AN ARCHITECTURAL APPROACH FOR REDUCING POWER AND INCREASING SECURITY OF RFID TAGS

    Radio Frequency Identification (RFID) technology is currently employed for a variety of applications such as RFID-based wireless payment, healthcare, homeland security, and asset management. Due to newer privacy requirements and increasingly secure applications, typical RFID tags are required to support expanded security features such as data encryption and safe transactions. However, RFID tags have extremely strict low-power consumption requirements. Thus, reduced power consumption and secure data transactions are the two main problems for the next generation of RFID tags. This dissertation presents an architectural approach to address these two main problems. It provides a multi-domain solution to improve power consumption and security, while also reducing the design and verification time of the system. In particular, I describe (1) a smart buffering technique that allows a tag to remain in a standby mode until addressed, (2) a multi-layer, low-power technique that transcends the passive-transaction, physical, and data layers to provide secure transactions, (3) an FPGA-based traffic profiler system to generate traces of RFID communications for both tag verification and power analysis without the need for actual hardware, and (4) a design automation technique to create physical-layer encoding and decoding blocks in hardware suitable for RFID tags. This dissertation presents four contributions: (1) Based on a Markov process energy model, the smart buffering technique is shown to reduce power consumption by 85% over a traditional active tag. (2) The multi-layer, low-power security technique provides protection against malicious reader attacks that attempt to disable the tag or to steal the information stored in or communicated to the device. The power consumption overhead of implementing these layers of security is approximately 13% over the basic tag controller. (3) The FPGA-based traffic profiler system has been able to generate traces for the ISO 18000 Part 6C (EPC Gen2) protocol. (4) The designs of the encoding/decoding blocks are generated automatically by the Physical Layer Synthesis tool for five protocols used in or related to RFID; the power consumption of each of the five generated designs is less than 5 μW, and compared with hand-implemented counterparts, their power consumption differs by less than 7%.
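
    The Markov-process energy argument behind smart buffering can be illustrated with a two-state chain (standby/active): the tag's average power is each state's power weighted by the chain's stationary distribution. The transition probabilities and power figures below are illustrative, not the dissertation's measured values.

```python
# Two-state Markov energy model for a smart-buffered tag: the tag idles in a
# low-power standby state and wakes to the active state only when addressed.

def stationary_two_state(p_wake, p_sleep):
    """Stationary distribution of a 2-state chain where standby -> active
    happens with probability p_wake and active -> standby with p_sleep."""
    pi_standby = p_sleep / (p_wake + p_sleep)
    return pi_standby, 1.0 - pi_standby

def expected_power(p_wake, p_sleep, p_standby_uw, p_active_uw):
    """Long-run average power: state powers weighted by time spent in each."""
    pi_s, pi_a = stationary_two_state(p_wake, p_sleep)
    return pi_s * p_standby_uw + pi_a * p_active_uw

# Illustrative numbers: the tag is addressed rarely and returns to standby
# quickly, so it spends most of its time in the cheap state.
avg = expected_power(0.05, 0.80, p_standby_uw=2.0, p_active_uw=60.0)
always_on = 60.0
print(f"average power: {avg:.2f} uW "
      f"({100 * (1 - avg / always_on):.0f}% saving vs always-active)")
```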

    Natural language interface to relational database: a simplified customization approach

    Natural language interfaces to databases (NLIDBs) allow end-users with no knowledge of a formal language like SQL to query databases. One of the main open problems currently investigated is the development of NLIDB systems that are easily portable across domains. The present study focuses on the development and evaluation of methods that simplify the customization of NLIDBs targeting relational databases without sacrificing coverage and accuracy. This goal is approached through two authoring frameworks that aim to reduce the workload required to port an NLIDB to a new domain. The first authoring approach, called top-down, assumes the existence of a corpus of unannotated natural language sample questions used to pre-harvest key lexical terms and thereby simplify customization. The top-down approach further reduces the configuration workload by automatically including in the configuration model the semantics for negative forms of verbs and for comparative and superlative forms of adjectives. The second authoring approach, called bottom-up, explores the possibility of building a configuration model with no manual customization, using only the information in the database schema and an off-the-shelf dictionary. The evaluation of the prototype system on GeoQuery, a benchmark query corpus, has shown that the top-down approach significantly reduces the customization workload: 93% of the entries defining the meaning of verbs and adjectives, which represent the hard work, were generated automatically by the system; only 26 straightforward mappings and 3 manual definitions of meaning were required for customization. The top-down approach answered 74.5% of the questions correctly. The bottom-up approach, however, correctly answered only a third of the questions, due to an insufficient lexicon and missing semantics. The use of an external lexicon did not improve the system's accuracy. The bottom-up model nevertheless correctly answered three quarters of the 105 simple retrieval questions in the query corpus that do not require nesting. Therefore, the bottom-up approach can be useful for building an initial lightweight configuration model that can then be refined incrementally, for example by using the failed queries to train a top-down model. The experimental results for the top-down approach suggest that it is indeed possible to construct a portable NLIDB that reduces the configuration effort while maintaining decent coverage and accuracy.
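
    The bottom-up idea, building a configuration model from the schema alone, can be sketched as deriving a word-to-(table, column) lexicon from schema identifiers plus a synonym list, then matching question words against it. The schema and synonyms below are illustrative; the thesis draws its synonyms from an off-the-shelf dictionary.

```python
# Sketch of a bottom-up lexicon: map surface words to (table, column)
# targets using nothing but schema identifiers and a small synonym list,
# then match a question's words against that lexicon.

SCHEMA = {
    "city": ["name", "population", "state_name"],
    "river": ["name", "length", "traverse"],
}
SYNONYMS = {"people": "population", "long": "length"}  # dictionary-derived

def schema_lexicon(schema, synonyms):
    """Build word -> (table, column) entries; (table, None) names a table.
    Ambiguous identifiers (e.g. 'name' in two tables) keep the last entry,
    one of the coverage gaps the abstract reports for this approach."""
    lexicon = {}
    for table, columns in schema.items():
        lexicon[table] = (table, None)
        for col in columns:
            lexicon[col] = (table, col)
    for word, canonical in synonyms.items():
        if canonical in lexicon:
            lexicon[word] = lexicon[canonical]
    return lexicon

def match_question(question, lexicon):
    """Return the schema targets mentioned by the question's words."""
    return [lexicon[w] for w in question.lower().split() if w in lexicon]

lex = schema_lexicon(SCHEMA, SYNONYMS)
print(match_question("how many people in each city", lex))
```

Words with no lexicon entry are simply dropped, which mirrors why the bottom-up model failed on questions whose vocabulary the schema does not cover.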

    Analysis and Design of Speech-Recognition Grammars

    Currently, most commercial speech-enabled products are constructed using grammar-based technology. Grammar design is a critical issue for good recognition accuracy. Two methods are commonly used for creating grammars: 1) generating them automatically from a large corpus of input data, which is very costly to acquire; or 2) constructing them through an iterative process involving manual design, followed by testing with end-user speech input. This is a time-consuming and very expensive process requiring expert knowledge of language design, as well as of the application area. Another hurdle to the creation and use of speech-enabled applications is that expertise is also required to integrate the speech capability with the application code and to deploy the application for wide-scale use. The alternative approach we propose is: 1) to construct grammars using the iterative process described above, but to replace end-user testing with an analysis of the recognition grammars using a set of grammar metrics that have been shown to be good indicators of recognition accuracy; 2) to improve recognition accuracy during design by encoding semantic constraints in the syntax rules of the grammar; 3) to augment the above process by generating recognition grammars automatically from specifications of the application; and 4) to provide tools for creating speech-enabled applications, together with an architecture for their deployment, that enable expert users, as well as users without expertise in language processing, to easily build speech applications and add them to the web.
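
    The grammar-metrics idea in step 1 can be illustrated by computing a few structural measures over a context-free recognition grammar. Which measures actually predict recognition accuracy is the subject of the thesis; the toy grammar and the three metrics below (rule count, average alternatives per non-terminal, average right-hand-side length) are merely illustrative.

```python
# Sketch of computing structural metrics for a recognition grammar.
# More alternatives per non-terminal generally means more acoustically
# confusable paths, which is the intuition behind metric-based analysis.

GRAMMAR = {  # non-terminal -> list of alternative right-hand sides
    "COMMAND": [["call", "NAME"], ["dial", "NUMBER"]],
    "NAME": [["alice"], ["bob"], ["carol"]],
    "NUMBER": [["DIGIT"], ["DIGIT", "NUMBER"]],
    "DIGIT": [[str(d)] for d in range(10)],
}

def grammar_metrics(grammar):
    """Rule count, mean alternatives per non-terminal, mean RHS length."""
    rules = [rhs for alts in grammar.values() for rhs in alts]
    return {
        "rules": len(rules),
        "avg_alternatives": len(rules) / len(grammar),
        "avg_rhs_length": sum(len(r) for r in rules) / len(rules),
    }

print(grammar_metrics(GRAMMAR))
```

In the proposed workflow, a designer would recompute such metrics after each revision and keep the variant the metrics predict will recognise best, instead of re-running end-user tests.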

    Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students, and postdocs in the Computer Science and Linguistics Departments and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as Combinatory Categorial Grammars, Tree Adjoining Grammars, syntactic parsing, and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html. In addition, you can find many of the papers referenced in the CLiFF Notes on the net; most can be obtained by following links from the authors’ abstracts in the web version of this report. The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.