Text Segmentation Using Exponential Models
This paper introduces a new statistical approach to partitioning text
automatically into coherent segments. Our approach enlists both short-range and
long-range language models to help it sniff out likely sites of topic changes
in text. To aid its search, the system consults a set of simple lexical hints
it has learned to associate with the presence of boundaries through inspection
of a large corpus of annotated data. We also propose a new probabilistically
motivated error metric for use by the natural language processing and
information retrieval communities, intended to supersede precision and recall
for appraising segmentation algorithms. Qualitative assessment of our algorithm
as well as evaluation using this new metric demonstrate the effectiveness of
our approach in two very different domains, Wall Street Journal articles and
the TDT Corpus, a collection of newswire articles and broadcast news
transcripts. Comment: 12 pages, LaTeX source and postscript figures for EMNLP-2 paper
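The abstract does not define the proposed error metric, so the following is only a minimal sketch of the general idea, assuming the fixed-window form that later segmentation literature commonly calls Pk: slide a window of width k over the text and count how often the reference and the hypothesis disagree about whether the two ends of the window fall in the same segment. The function name `pk`, the per-position label representation, and the default choice of k (half the mean reference segment length) are assumptions made for illustration, not details taken from the paper.

```python
def pk(reference, hypothesis, k=None):
    """Window-based segmentation error (illustrative sketch).

    reference, hypothesis: equal-length lists giving a segment label for
    each position (for example, each sentence). Lower is better; 0.0 means
    the two segmentations agree on every window.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same positions")
    n = len(reference)
    if k is None:
        # Assumed default: half the mean segment length in the reference.
        num_segments = 1 + sum(
            1 for a, b in zip(reference, reference[1:]) if a != b
        )
        k = max(1, round(n / (2 * num_segments)))
    windows = n - k
    if windows <= 0:
        raise ValueError("window width k must be smaller than the text length")
    errors = 0
    for i in range(windows):
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / windows


reference = [0, 0, 0, 1, 1, 1]      # one boundary after the third position
no_boundary = [0, 0, 0, 0, 0, 0]    # hypothesis that misses the boundary
print(pk(reference, reference, k=2))
print(pk(reference, no_boundary, k=2))
```

Unlike precision and recall on exact boundary positions, a window-based score of this kind gives partial credit to a boundary that is placed close to the true one.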
Wireless audio networking modifying the IEEE 802.11 standard to handle multi-channel real-time wireless audio networks
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Audio networking is a rapidly growing field which introduces exciting new possibilities for the professional audio industry. When well established, it will drastically change the way live sound systems are designed, built and used. Today's networks have enough bandwidth to transfer hundreds of high-quality audio channels, replacing the analogue cables and intricate installations of conventional analogue audio systems. Many systems currently on the market distribute audio over networks for live music and studio applications, but the technology is not yet widespread. The main reasons audio networks are not as popular as expected are the lack of interoperability between vendors and the continuing need for a wired network infrastructure. Therefore, the development of a wireless digital audio networking system based on existing, widespread wireless technology is a major research challenge. However, the IEEE 802.11 standard, the primary wireless networking technology today, appears unable to handle this type of application despite the large bandwidth available. Apart from the well-known drawbacks of interference and security encountered in all wireless data transmission systems, the way IEEE 802.11 arbitrates wireless channel access causes a high collision rate, low throughput and long overall delay. The aim of this research was to identify what prevents this technology from supporting real-time wireless audio networks and to propose possible solutions. Initially the standard was tested thoroughly using a data traffic model which emulates a multi-channel real-time audio environment. Broadcasting was found to be the optimal communication method for satisfying live audio's intolerance of delay.
The results were analysed and the drawback was traced to the inherent weakness of the IEEE 802.11 standard in managing broadcasts from multiple sources in the same network. To resolve this, a series of modifications was proposed for the Medium Access Control algorithm of the standard. First, the extended use of the "CTS-to-Self" control message was introduced to act as a protection mechanism in broadcasting, similar to the RTS/CTS protection mechanism already used in unicast transmission. Then, an alternative "random backoff" method was proposed, taking into account the characteristics of live audio wireless networks. For this method a novel "Exclusive Backoff Number Allocation" (EBNA) algorithm was designed, aiming to minimize collisions. The results showed that a significant improvement in throughput can be achieved using the above modifications, but further improvement in delay was needed in order to reach the internationally accepted standards for real-time audio delivery. Thus, a traffic-adaptive version of the EBNA algorithm was designed. This algorithm monitors the traffic in the network, calculates the probability of collision and accordingly switches between classic IEEE 802.11 MAC and EBNA, which is applied only between active stations rather than to all stations in the network. All amendments were designed to operate as an alternative mode of the existing technology rather than as an independent proprietary system. For this reason, interoperability with classic IEEE 802.11 was also tested and analysed in the last part of this research. The results showed that the IEEE 802.11 standard, suitably modified, is able to support multiple broadcasting transmissions and can therefore be the platform upon which future wireless audio networks will be developed.
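The abstract does not spell out how EBNA assigns backoff numbers, so the following is only a rough illustration of why exclusive allocation can reduce collisions, not the thesis's algorithm. It assumes a simplified model in which each station picks one backoff slot per contention round and a collision occurs when two or more stations share the smallest slot. The function names, the contention window size and the station count are all hypothetical.

```python
import random


def random_backoff_collides(num_stations, cw, rng):
    """Classic-style draw: every station picks a slot independently."""
    draws = [rng.randrange(cw) for _ in range(num_stations)]
    # The stations holding the smallest slot transmit first; a tie collides.
    return draws.count(min(draws)) > 1


def exclusive_backoff_collides(num_stations, cw, rng):
    """Assumed exclusive allocation: no two stations share a slot."""
    if num_stations > cw:
        raise ValueError("need at least one slot per station")
    draws = rng.sample(range(cw), num_stations)
    return draws.count(min(draws)) > 1  # never true, slots are distinct


def collision_rate(collides, num_stations, cw, rounds, seed=0):
    """Fraction of contention rounds that end in a collision."""
    rng = random.Random(seed)
    hits = sum(collides(num_stations, cw, rng) for _ in range(rounds))
    return hits / rounds


print(collision_rate(random_backoff_collides, 8, 16, 10_000))
print(collision_rate(exclusive_backoff_collides, 8, 16, 10_000))
```

In this toy model the exclusive variant never collides by construction. The practical difficulty, which the abstract addresses with its traffic-adaptive version, lies in deciding which stations take part in the allocation.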
An efficient multichannel wireless sensor networks MAC protocol based on IEEE 802.11 distributed co-ordinated function.
This research aimed to create new knowledge and pioneer a path in the area of future WSN trends by resolving some of the issues at the MAC layer in Wireless Sensor Networks (WSNs). This work introduced a Multi-channel Distributed Coordinated Function (MC-DCF) which takes advantage of multi-channel assignment. The backoff algorithm of the IEEE 802.11 distributed coordination function (DCF) was modified to invoke channel switching, based on threshold criteria, in order to improve the overall throughput of wireless sensor networks.
This work commenced by surveying different protocols: contention-based MAC protocols, transport layer protocols, cross-layered design and multichannel multi-radio assignments. A number of existing protocols were analysed, each attempting to resolve one or more problems faced by the current layers.
IEEE 802.15.4 performed very poorly at high data rates and at long range. It is therefore not suitable for multimedia sensing or surveillance systems with streaming data in future multichannel multi-radio systems.
A survey of 802.11 DCF, which was designed mainly for wireless networks, confirms that it has a power-saving mechanism which is used to synchronise nodes. However, it uses a random back-off mechanism that cannot provide deterministic upper bounds on channel access delay and as such cannot support real-time traffic. The weaknesses identified by surveying this protocol form the backbone of this thesis.
The overall aim of this thesis was to introduce multichannel operation with a single radio as a new paradigm for the IEEE 802.11 Distributed Coordinated Function (DCF) in wireless sensor networks (WSNs). WSNs are used in a wide range of applications, including military applications, environmental monitoring, medical care, smart buildings and other industries. A further aim was to extend WSNs with multimedia capability, for instance sensors that detect sound or motion and video sensors that capture video events of interest.
Traditionally, WSNs have not needed high data rates and throughput, since events are normally captured periodically. With the paradigm shift in technology, multimedia streaming has become more demanding than data-sensing applications, hence the need for a high-data-rate protocol for WSNs, an emerging technology in this area. IEEE 802.11 can support data rates up to 54 Mbps, and 802.11 DCF was designed specifically for use in wireless networks.
This thesis focused on designing an algorithm that applies multichannel operation to the IEEE 802.11 DCF back-off algorithm, to reduce the waiting time of a node attempting to access the medium and to increase throughput. Data collection in WSNs tends to suffer from heavy congestion, especially at nodes nearer to the sink node. Therefore, this thesis proposes a contention-based MAC protocol to address this problem, inspired by the 802.11 DCF backoff algorithm and resulting from a comparison of IEEE 802.11 and IEEE 802.15.4 for Future Green Multichannel Multi-radio Wireless Sensor Networks.
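The abstract says only that the DCF backoff was modified to invoke channel switching based on threshold criteria, so the sketch below is a guess at the general shape of such a rule, not the MC-DCF algorithm itself. It assumes the threshold is a count of consecutive failed access attempts, that channels are tried in round-robin order, and that the contention window doubles on each failure as in binary exponential backoff. The names `BackoffState` and `on_failed_attempt` and the window limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackoffState:
    channel: int = 0   # index of the channel currently in use
    retries: int = 0   # consecutive failed attempts on this channel
    cw: int = 16       # current contention window size (assumed value)


def on_failed_attempt(state, num_channels, switch_threshold,
                      cw_min=16, cw_max=1024):
    """Return the next state after one failed channel access attempt."""
    retries = state.retries + 1
    if retries >= switch_threshold:
        # Assumed rule: give up on a congested channel, move to the next
        # one and start again with the smallest contention window.
        next_channel = (state.channel + 1) % num_channels
        return BackoffState(channel=next_channel, retries=0, cw=cw_min)
    # Otherwise stay on the channel and double the window, up to cw_max.
    return BackoffState(channel=state.channel, retries=retries,
                        cw=min(state.cw * 2, cw_max))


state = BackoffState()
for _ in range(4):
    state = on_failed_attempt(state, num_channels=3, switch_threshold=2)
    print(state)
```

The intent of a rule like this is to cap how long a node waits on one channel: instead of letting the window grow to its maximum, the node moves to a channel that may be less congested.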
Innovative energy-efficient wireless sensor network applications and MAC sub-layer protocols employing RTS-CTS with packet concatenation
of energy-efficiency as well as the number of available applications. As a consequence there
are challenges that need to be tackled for the future generation of WSNs. The research work
from this Ph.D. thesis has involved the actual development of innovative WSN applications contributing
to different research projects. In the Smart-Clothing project contributions have been
given in the development of a Wireless Body Area Network (WBAN) to monitor the foetal movements
of a pregnant woman in the last four weeks of pregnancy. The creation of an automatic
wireless measurement system for remotely monitoring concrete structures was a contribution
to the INSYSM project. This was accomplished by using an IEEE 802.15.4 network to enable
remote monitoring of the temperature and humidity within civil engineering structures. In the
framework of the PROENEGY-WSN project contributions have been given in the identification
of the spectrum opportunities for Radio Frequency (RF) energy harvesting through power density
measurements from 350 MHz to 3 GHz. The design of the circuits to harvest RF energy
and the requirements needed for creating a WBAN with electromagnetic energy harvesting and
Cognitive Radio (CR) capabilities have also been addressed. A performance evaluation of
state-of-the-art WSN hardware platforms has also been carried out. This is motivated by
the fact that, even when using optimized Medium Access Control (MAC) protocols, if the WSN
platforms do not allow for minimizing the energy consumption in the idle and sleeping states,
energy efficiency and long network lifetime will not be achieved.
The research also involved the development of new mechanisms that try to reduce
overhead, one of the fundamental reasons for the IEEE 802.15.4 standard MAC inefficiency. In
particular, this Ph.D. thesis proposes an IEEE 802.15.4 MAC layer performance enhancement by
employing RTS/CTS combined with packet concatenation. The results have shown that the use
of the RTS/CTS mechanism improves channel efficiency by decreasing the deferral time before
transmitting a data packet. In addition, the Sensor Block Acknowledgment MAC (SBACK-MAC)
protocol has been proposed that allows the aggregation of several acknowledgment responses
in one special Block Acknowledgment (BACK) Response packet. Two different solutions are
considered. The first one considers the SBACK-MAC protocol in the presence of BACK Request
(concatenation) while the second one considers the SBACK-MAC in the absence of BACK Request
(piggyback). The proposed solutions address a distributed scenario with single-destination and
single-rate frame aggregation. The throughput and delay performance is mathematically derived
under both ideal conditions (a channel environment with no transmission errors) and non-ideal
conditions (a channel environment with transmission errors). An analytical model is proposed,
capable of taking into account the retransmission delays and the maximum number of
backoff stages. The simulation results successfully validate our analytical model. For more
than 7 TX (aggregated packets) all the MAC sub-layer protocols employing RTS/CTS with packet
concatenation allow for the optimization of channel use in WSNs, v8-48 % improvement in the
maximum average throughput and minimum average delay, and decreased energy consumption.
Decision Tree-based Syntactic Language Modeling
Statistical Language Modeling is an integral part of many natural language processing applications, such as Automatic Speech Recognition (ASR) and Machine Translation. N-gram language models dominate the field, despite having an extremely shallow view of language: a Markov chain of words. In this thesis, we develop and evaluate a joint language model that incorporates syntactic and lexical information in an effort to "put language back into language modeling." Our main goal is to demonstrate that such a model is not only effective but can be made scalable and tractable. We utilize decision trees to tackle the problem of sparse parameter estimation, which is exacerbated by the use of syntactic information jointly with word context. While decision trees have been previously applied to language modeling, there has been little analysis of factors affecting decision tree induction and probability estimation for language modeling. In this thesis, we analyze several aspects that affect decision tree-based language modeling, with an emphasis on syntactic language modeling. We then propose improvements to the decision tree induction algorithm based on our analysis, as well as methods for constructing forest models, that is, models consisting of multiple decision trees. Finally, we evaluate the impact of our syntactic language model on large-scale Speech Recognition and Machine Translation tasks.
In this thesis, we also address a number of engineering problems associated with the joint syntactic language model in order to make it tractable. In particular, we propose a novel decoding algorithm that exploits the decision tree structure to eliminate unnecessary computation. We also propose and evaluate an approximation of our syntactic model by word n-grams. This approximation makes it possible to incorporate our model directly into the CDEC Machine Translation decoder rather than using the model for rescoring hypotheses produced using an n-gram model.
Argument Mining with Structured SVMs and RNNs
We propose a novel factor graph model for argument mining, designed for
settings in which the argumentative relations in a document do not necessarily
form a tree structure. (This is the case in over 20% of the web comments
dataset we release.) Our model jointly learns elementary unit type
classification and argumentative relation prediction. Moreover, our model
supports SVM and RNN parametrizations, can enforce structure constraints (e.g.,
transitivity), and can express dependencies between adjacent relations and
propositions. Our approaches outperform unstructured baselines in both web
comments and argumentative essay datasets. Comment: Accepted for publication at ACL 2017. 11 pages, 5 figures. Code at
https://github.com/vene/marseille and data at http://joonsuk.org
Posterior Regularization for Learning with Side Information and Weak Supervision
Supervised machine learning techniques have been very successful for a variety of tasks and domains including natural language processing, computer vision, and computational biology. Unfortunately, their use often requires the creation of large problem-specific training corpora, which can make these methods prohibitively expensive. At the same time, we often have access to external problem-specific information that we cannot always easily incorporate. We might know how to solve the problem in another domain (e.g. for a different language); we might have access to cheap but noisy training data; or a domain expert might be available who would be able to guide a human learner much more efficiently than by simply creating an IID training corpus. A key challenge for weakly supervised learning is then how to incorporate such kinds of auxiliary information arising from indirect supervision.
In this thesis, we present Posterior Regularization, a probabilistic framework for structured, weakly supervised learning. Posterior Regularization is applicable to probabilistic models with latent variables and exports a language for specifying constraints or preferences about posterior distributions of latent variables. We show that this language is powerful enough to specify realistic prior knowledge for a variety of applications in natural language processing. Additionally, because Posterior Regularization separates model complexity from the complexity of structural constraints, it can be used for structured problems with relatively little computational overhead. We apply Posterior Regularization to several problems in natural language processing, including word alignment for machine translation, transfer of linguistic resources across languages, and grammar induction. Additionally, we find that we can apply Posterior Regularization to the problem of multi-view learning, achieving particularly good results for transfer learning. We also explore the theoretical relationship between Posterior Regularization and other proposed frameworks for encoding this kind of prior knowledge, and show a close relationship to Constraint Driven Learning as well as to Generalized Expectation Constraints.
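The abstract gives no formulas, so the following is a sketch of how the Posterior Regularization objective is commonly written in the literature on the framework. The symbols are not taken from the abstract and are assumptions about notation: $\mathcal{L}(\theta)$ is the marginal log-likelihood, $\mathbf{Y}$ the latent variables, $\boldsymbol{\phi}$ a vector of constraint features and $\mathbf{b}$ their bounds.

```latex
% Constraint set: distributions over the latent variables whose expected
% constraint features stay within the bounds b.
\mathcal{Q} = \left\{ q(\mathbf{Y}) \;:\;
    \mathbb{E}_{q}\!\left[ \boldsymbol{\phi}(\mathbf{X}, \mathbf{Y}) \right]
    \le \mathbf{b} \right\}

% Regularized objective: likelihood minus the KL distance from the
% constraint set to the model posterior.
J_{\mathcal{Q}}(\theta) = \mathcal{L}(\theta)
  - \min_{q \in \mathcal{Q}}
    \mathrm{KL}\!\left( q(\mathbf{Y}) \,\middle\|\,
        p_{\theta}(\mathbf{Y} \mid \mathbf{X}) \right)
```

Written this way, the constraints act on the posterior $q$ rather than on the model parameters $\theta$, which is one reading of the abstract's claim that the framework separates model complexity from the complexity of the structural constraints.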
Performance improvement of ad hoc networks using directional antennas and power control
Over the last decade, there has been remarkable interest in wireless ad hoc networks, which can organize themselves without infrastructure support. Potential uses of such networks exist in many scenarios, ranging from civil engineering and disaster relief to sensor networks and military applications. The Distributed Coordination Function (DCF) of the IEEE 802.11 standard is the dominant protocol for wireless ad hoc networks. However, DCF does not make efficient use of the shared channel and suffers from various problems such as the exposed terminal and hidden terminal problems. Consequently, in recent years, various methods have been developed to address these problems, which has led to increases in aggregate network throughput. These methods essentially comprise tuning the carrier-sense threshold, replacing omnidirectional antennas with directional antennas, and power control so that packets are transmitted at an appropriate power. Compared with omnidirectional antennas, directional antennas have many advantages and can improve the performance of ad hoc networks. These antennas focus their energy only towards the target direction and have a longer transmission and reception range for the same amount of power. This property can be exploited to adjust a transmitter's power when a directional antenna is used. Several directional MAC power control protocols have been proposed in the literature. Most of these proposals consider only directional transmission and, in their simulation results, these studies usually assume that the transmission range of omnidirectional and directional antennas is the same. Evidently, this assumption does not always hold in real situations.
Moreover, research that takes heterogeneity in ad hoc networks into account is insufficient. This thesis is devoted to proposing a MAC power control protocol for ad hoc networks with directional antennas that takes all these problems into consideration. AUTHOR KEYWORDS: Ad hoc networks, Directional antennas, Power control