Utilizing sub-topical structure of documents for information retrieval.

Author: Ganguly Debasis
Jones Gareth J.F.
Leveling Johannes
Publication date: 28/10/2011

Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks: for example, scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented as a single document to a searcher. Feedback in ad hoc IR tasks is shown to benefit from using sentences extracted from the pseudo-relevant documents, instead of terms, for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments that are more readable, with fewer unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is both to aggregate relevant and novel information from multiple documents to satisfy the user's information need and to ensure that the automatically generated content presented to the user is easily readable without reference to the original source document.
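The idea of scoring a document by combining the retrieval scores of its segments can be illustrated with a minimal sketch. The abstract does not say which scoring or combination function the authors use, so everything here is an assumption for illustration: `segment_score` is a toy term-overlap measure, and `document_score` accepts a hypothetical `combine` argument (maximum or mean).

```python
from statistics import mean


def segment_score(query_terms, segment):
    """Toy relevance score (assumed): fraction of query terms found in the segment."""
    words = set(segment.lower().split())
    return sum(term in words for term in query_terms) / len(query_terms)


def document_score(query, segments, combine=max):
    """Score a document by combining the scores of its segments.

    `combine` is a hypothetical knob: `max` rewards one strongly matching
    segment, `mean` rewards documents that match throughout.
    """
    query_terms = query.lower().split()
    scores = [segment_score(query_terms, s) for s in segments]
    return combine(scores)


# Two hand-written segments standing in for the output of a text segmenter.
segments = [
    "text segmentation splits a document into subtopics",
    "patent search uses long queries",
]
best = document_score("text segmentation", segments, combine=max)
average = document_score("text segmentation", segments, combine=mean)
```

With `max`, a single on-topic segment is enough to rank the document highly; with `mean`, off-topic segments pull the score down.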

CiteSeerX

Irish Universities

DCU Online Research Access Service

Extending the 5S Framework of Digital Libraries to support Complex Objects, Superimposed Information, and Content-Based Image Retrieval Services

Author: Archer David
Delcambre Lois
Fox Edward
Goncalves Marcos
Kozievitch Nadia
Leidig Jonathan
Murthy Uma
Torres Ricardo
Yang Seungwon
Publication date: 01/01/2010

Advanced services in digital libraries (DLs) have been developed and widely used to address the required capabilities of an assortment of systems as DLs expand into diverse application domains. These systems may require support for images (e.g., Content-Based Image Retrieval), Complex (information) Objects, and use of content at fine grain (e.g., Superimposed Information). Due to the lack of consensus on precise theoretical definitions for those services, implementation efforts often involve ad hoc development, leading to duplication and interoperability problems. This article presents a methodology to address those problems by extending a precisely specified minimal digital library (in the 5S framework) with formal definitions of the aforementioned services. The theoretical extensions of digital library functionality presented here are reinforced with practical case studies as well as scenarios for the individual and integrative use of services, to balance theory and practice. The methodology implies that other advanced services can be integrated into our current extended framework as they are identified. The theoretical definitions and case studies we present may impact future development efforts and a wide range of digital library researchers, designers, and developers.

Computer Science Technical Reports @Virginia Tech

Poor Man's Content Centric Networking (with TCP)

Author: Budigere K.
Ott J.
Perkins C.S.
Sarolahti P.
Publication venue: Aalto University
Publication date: 01/01/2011

A number of different architectures have been proposed in support of data-oriented or information-centric networking. Besides similar visions, they share the need for designing a new networking architecture. We present an incrementally deployable approach to content-centric networking based upon TCP. Content-aware senders cooperate with probabilistically operating routers for scalable content delivery (to unmodified clients), effectively supporting opportunistic caching for time-shifted access as well as de facto synchronous multicast delivery. Our approach is application-protocol-independent and provides support beyond HTTP caching or managed CDNs. We present our protocol design along with a Linux-based implementation and some initial feasibility checks.

Aaltodoc Publication Archive

Enlighten

Internet of things security implementation using blockchain for wireless technology

Author: Karim Karim Jabur
Publication date: 01/07/2019

Blockchain is a new security system that groups data into blocks, in other words classifying the data into blocks. Blocks can be of many types, and each contains data and a security code. With a decentralized mechanism, one security code protects all the data; this could happen at the server. In this research, a network based on wireless sensor technology is proposed. The sensor data is transmitted via Internet of Things (IoT) technology. Because a large amount of data is transmitted, it has to be classified and grouped into blocks. All the blocks are then sent to a central processing unit, such as a microcontroller. Each block of data is then processed, identified, and encrypted before being sent over the internet. At the receiver, a GUI or app is developed to open and view the data. The app or GUI holds encrypted data or a security code, and the user must key in the password before viewing the data. The password used by the end user at the app or GUI must be equivalent to the one encrypted at the sensor nodes. This is to satisfy the decentralized concept used in blockchain. To demonstrate blockchain technology applied to the wireless sensor network, a MATLAB Simulink function is used. The expected results should show a number of blocks of data, in cryptographic form and chained together. There are two sets of data, both encrypted using a hash. A black dot indicates that the data has been encrypted, whereas a white dot indicates that the data is not encrypted. A half-white, half-black dot indicates that encryption of the data is in progress. All this data should be arranged in cryptographic order and chained together in a vertical line. A protocol called block and chain groups the data into blocks and then chains them. The data appears in the blocks and is sent over the network. As seen in the simulation results, the yellow color represents the user data, which has a default amplitude of 1 or 5. The data is chained and blocked to produce the blockchain waveform.
Keywords: Blockchain, Internet of things, Wireless Sensor Network and MATLAB Simulin
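The grouping of sensor readings into blocks that are hashed and chained together can be sketched as follows. This is not the author's MATLAB Simulink model; it is a minimal Python illustration under stated assumptions: SHA-256 as the hash, a hypothetical `block_size` parameter, and made-up readings that use the amplitudes 1 and 5 mentioned in the abstract.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(readings, block_size):
    """Group readings into fixed-size blocks and chain them by hash."""
    chain = []
    prev_hash = GENESIS_HASH
    for start in range(0, len(readings), block_size):
        data = readings[start:start + block_size]
        index = len(chain)
        digest = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": digest}
        )
        prev_hash = digest
    return chain


def is_valid(chain):
    """Check that every block's hash and link to its predecessor still hold."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


# Illustrative sensor readings with amplitudes 1 and 5.
chain = build_chain([1, 5, 1, 5, 5, 1], block_size=2)
```

Because each block's hash covers the previous block's hash, changing any reading after the chain is built makes `is_valid` return `False` for the chain.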

UTHM Institutional Repository

Team Learning, Development, and Adaptation

Author: Bell Bradford S.
Kozlowski Steve W. J.
Publication venue: DigitalCommons@ILR
Publication date: 01/01/2008

[Excerpt] Our purpose is to explore conceptually these themes centered on team learning, development, and adaptation. We note at the onset that this chapter is not a comprehensive review of the literature. Indeed, solid conceptual and empirical work on these themes are sparse relative to the vast amount of work on team effectiveness more generally, and therefore a thematic set of topics that are ripe for conceptual development and integration. We draw on an ongoing stream of theory development and research in these areas to integrate and sculpt a distinct perspective on team learning, development, and adaptation

DigitalCommons@ILR

eCommons@Cornell

Entities of interest: Discovery in digital traces

Author: Graus D.P.
Publication date: 01/01/2017

International Migration, Integration and Social Cohesion online publications

Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands

Author: Alvarez-Melis David
Banarescu Laura
Chen David L
Chu Shumo
Ganitkevitch Juri
Kate Rohit J
Kingma Diederik P
Pasupat Panupong
Quirk Chris
Shetty Jitesh
Steedman Mark
Trakhtenbrot Boris A.
Wang Yushi
Wong Yuk Wah
Xu Xiaojun
Zelle John M
Zettlemoyer Luke S
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/04/2019

To understand diverse natural language commands, virtual assistants today are trained with numerous labor-intensive, manually annotated sentences. This paper presents a methodology and the Genie toolkit that can handle new compound commands with significantly less manual effort. We advocate formalizing the capability of virtual assistants with a Virtual Assistant Programming Language (VAPL) and using a neural semantic parser to translate natural language into VAPL code. Genie needs only a small realistic set of input sentences for validating the neural model. Developers write templates to synthesize data; Genie uses crowdsourced paraphrases and data augmentation, along with the synthesized data, to train a semantic parser. We also propose design principles that make VAPL languages amenable to natural language translation. We apply these principles to revise ThingTalk, the language used by the Almond virtual assistant. We use Genie to build the first semantic parser that can support compound virtual assistant commands with unquoted free-form parameters. Genie achieves 62% accuracy on realistic user inputs. We demonstrate Genie's generality by showing a 19% and 31% improvement over the previous state of the art on a music skill, aggregate functions, and access control. Comment: To appear in PLDI 201

arXiv.org e-Print Archive

Crossref

On enhancing the robustness of timeline summarization test collections

Author: Macdonald Craig
McCreadie Richard
Ounis Iadh
Rajput Shahzad
Soboroff Ian
Publication venue: 'Elsevier BV'
Publication date: 01/09/2019

Timeline generation systems are a class of algorithms that produce a sequence of time-ordered sentences or text snippets extracted in real time from high-volume streams of digital documents (e.g. news articles), focusing on retaining relevant and informative content for a particular information need (e.g. topic or event). These systems have a range of uses, such as producing concise overviews of events for end-users (human or artificial agents). To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information nugget or cluster-based ground truth representations, respectively. These methodologies rely on human assessors manually mapping timeline items (e.g. sentences) to an explicit representation of what information a ‘good’ summary should contain. However, while these evaluation methodologies produce reusable ground truth labels, prior works have reported cases where such evaluations fail to accurately estimate the performance of new timeline generation systems due to label incompleteness. In this paper, we first quantify the extent to which the timeline summarization test collections fail to generalize to new summarization systems, then we propose, evaluate and analyze new automatic solutions to this issue. In particular, using a depooling methodology over 19 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being mis-ranked is low). However, we show that the risk of systems being mis-ranked increases as the effectiveness of systems held out from the pool increases.
To reduce the risk of mis-ranking systems, we also propose a range of different automatic ground truth label expansion techniques. Our results show that the proposed expansion techniques can be effective at increasing the robustness of the TREC-TS test collections, as they are able to generate large numbers of missing matches with high accuracy, markedly reducing the number of mis-rankings by up to 50%.

Enlighten