Text Augmentation: Inserting markup into natural language text with PPM Models
This thesis describes a new optimisation and new heuristics for automatically marking up XML documents. These are implemented in CEM, using PPM models. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods.
Four corpora are discussed, including the bibliography corpus of 14682 bibliographies laid out in seven standard styles using the BIBTEX system and marked up in XML with every field from the original BIBTEX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists’ Communique corpus and the Reuters’ corpus. A detailed examination is presented of the methods of evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory.
A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and optimisation is examined using the four corpora.
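As a rough illustration of how a character n-gram model with an escape method can drive markup decisions, the sketch below scores a token under one small model per tag and picks the tag that gives the shortest code length. It is not CEM: the class PPMSketch, the function best_tag, the tag names and the training strings are all hypothetical, and the model is a simplified PPM with method C escapes and no exclusions.

```python
import math
from collections import defaultdict


class PPMSketch:
    """Simplified character model with PPM-style backoff (escape method C, no exclusions)."""

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed size of the order -1 (uniform) alphabet
        # counts[k][context][symbol]: how often `symbol` followed the k-character `context`
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(order + 1)]

    def train(self, text):
        for i, sym in enumerate(text):
            for k in range(self.order + 1):
                if i >= k:
                    self.counts[k][text[i - k:i]][sym] += 1

    def bits(self, text):
        """Code length of `text` in bits; lower means the model fits the text better."""
        return sum(self._symbol_bits(text[:i], sym) for i, sym in enumerate(text))

    def _symbol_bits(self, history, sym):
        bits = 0.0
        for k in range(min(self.order, len(history)), -1, -1):
            seen = self.counts[k].get(history[len(history) - k:])
            if not seen:
                continue  # context never observed: move to a shorter one at no cost
            n, d = sum(seen.values()), len(seen)
            if sym in seen:
                return bits - math.log2(seen[sym] / (n + d))
            bits -= math.log2(d / (n + d))  # pay for an escape, then try a shorter context
        return bits + math.log2(self.alphabet_size)  # order -1: uniform over the alphabet


def best_tag(token, models):
    """Pick the tag whose model encodes the token in the fewest bits."""
    return min(models, key=lambda tag: models[tag].bits(token))


# Hypothetical training data for two bibliography fields.
models = {"year": PPMSketch(), "author": PPMSketch()}
for example in ["1998", "2001", "1987"]:
    models["year"].train(example)
for example in ["Smith", "Jones", "Witten"]:
    models["author"].train(example)

print(best_tag("1995", models), best_tag("Smyth", models))
```

A full system would search over tag insertion points in running text rather than classify isolated tokens; the sketch only shows the scoring step.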
The Corpus Expansion Toolkit: finding what we want on the web
This thesis presents the Corpus Expansion Toolkit (CET), a generally applicable toolkit that allows researchers to build domain-specific corpora from the web. The main purpose of the work presented in this thesis and the development of the CET is to provide a solution to discovering desired content on the web from possibly unknown locations or a poorly defined domain. Using an iterative process, the CET is able to solve the problem of discovering domain-specific online content and expand a corpus using only a very small number of example documents or characteristic phrases taken from the target domain. Using a human-in-the-loop strategy and a chain of discrete software components, the CET also allows the concept of a domain to be iteratively defined using the very online resources used to expand the original corpus. The CET combines feature extraction, search, web crawling and machine learning methods to collect, store, filter and perform information extraction on collected documents. Using a small number of example ‘seed’ documents, the CET is able to expand the original corpus by finding more relevant documents from the web, and it provides a number of tools to support their analysis. This thesis presents a case study-based methodology that introduces the various contributions and components of the CET through the discussion of five case studies covering a wide variety of domains and requirements to which the CET has been applied. These case studies illustrate three main use cases, listed below, where the CET is applicable:
1. Domain known – source known
2. Domain known – source unknown
3. Domain unknown – source unknown
The first covers use cases where the sites for document collection are known and the topic of research is clearly defined. The second covers instances where the topic of research is clearly defined but where to find relevant documents on the web is unknown. The third, the most extreme use case, is where the domain is poorly defined or unknown to the researcher and the location of the information is also unknown. This thesis presents a solution that allows researchers to begin with very little information on a specific topic, iteratively build a clear conception of a domain, and translate that to a computational system.
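The iterative process described above (extract characteristic terms from the corpus, search, filter, let a person approve, repeat) can be sketched as a small loop. This is a toy under stated assumptions, not the CET itself: the in-memory list web stands in for a search engine and crawler, a vocabulary-overlap score stands in for the machine learning filter, and every function name, threshold and document here is hypothetical.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on"}


def tokens(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


def characteristic_terms(corpus, k=8):
    """Feature extraction stand-in: the k most frequent non-stopword terms."""
    counts = Counter(w for doc in corpus for w in tokens(doc))
    return [w for w, _ in counts.most_common(k)]


def search(web, terms):
    """Search stand-in: pages that mention at least one query term."""
    return [page for page in web if set(tokens(page)) & set(terms)]


def overlap(page, vocab):
    """Filter stand-in: share of the page's terms already in the corpus vocabulary."""
    words = tokens(page)
    return sum(w in vocab for w in words) / len(words) if words else 0.0


def expand(seeds, web, rounds=3, threshold=0.5, approve=lambda page: True):
    """Grow the corpus round by round; `approve` is the human-in-the-loop hook."""
    corpus = list(seeds)
    for _ in range(rounds):
        terms = characteristic_terms(corpus)
        vocab = {w for doc in corpus for w in tokens(doc)}
        new = [p for p in search(web, terms)
               if p not in corpus and overlap(p, vocab) >= threshold and approve(p)]
        if not new:
            break  # nothing new was found, so the corpus has stabilised
        corpus.extend(new)
    return corpus


seeds = ["pump scheduling for water supply networks"]
web = [
    "water supply pump optimisation",
    "energy tariffs and pump scheduling",
    "energy tariffs for pumping",
    "pump it up dance playlist",
    "neuronal network simulation tools",
]
corpus = expand(seeds, web)
print(corpus)
```

In this toy run the page about energy tariffs for pumping shares no term with the seed, so it only becomes discoverable in the second round, after the first round has added vocabulary to the corpus. That is the sense in which the domain definition is built from the documents being collected.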
Análise e optimização de sistemas de abastecimento de água (Analysis and optimisation of water supply systems)
Master's in Mechanical Engineering
Growing water consumption raises concerns, mainly about its distribution. Delivering water to population centres entails high energy and financial costs, because there is no control over the pumping of water to the water towers and reservoirs from which it is supplied to the population, services or industry. Adapting the pumping of water to energy tariffs could yield substantial savings for those who carry out the pumping. This work is part of a project to develop software that, through hydraulic modelling and mathematical tools, minimises pumping costs and controls the pumps of the water supply system. In this dissertation, two optimisation algorithms were implemented and tested to compare their ability to minimise the costs associated with pumping water. The selected methods were L-BFGS-B (limited-memory algorithm for bound-constrained optimisation), a classical optimisation method, and εDE (ε-constrained Differential Evolution), a metaheuristic method. Both algorithms were first tested on benchmark functions: εDE obtained good results on all of them, while L-BFGS-B ran into difficulties on the more complex functions. Both algorithms were then tested on two distinct benchmark water networks: the Basic Network, defined only by the essential elements and with a single pump, and the Walski network, a more complex looped network with two pumps. Both algorithms achieved cost reductions on both networks. L-BFGS-B was the faster of the two, while εDE obtained better results on the more complex Walski network. Because εDE tests for constraint violations first, it is more likely to produce results that respect the constraints. A graphical user interface was created to control the process and display the results of the optimisations.
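The constraint handling attributed to εDE above, where constraint violation is compared before cost, can be sketched on a toy pump-scheduling problem. This is a minimal illustration, not the dissertation's implementation: the tariff values, the demand, the population settings and all function names are hypothetical, the ε level is fixed at 0 rather than being controlled over the run, and no hydraulic model is involved. Seeding the population with the all-pumps-on schedule is an added assumption so that at least one feasible schedule exists from the start.

```python
import random

TARIFF = [0.08, 0.08, 0.15, 0.22, 0.22, 0.12]  # hypothetical energy price per period
DEMAND = 3.0  # hypothetical total volume that must be pumped


def cost(x):
    """Energy cost of a schedule; x[t] in [0, 1] is the pumping level in period t."""
    return sum(p * xi for p, xi in zip(TARIFF, x))


def violation(x):
    """Shortfall against the demand; 0 means the schedule is feasible."""
    return max(0.0, DEMAND - sum(x))


def eps_better(a, b, eps=0.0):
    """ε-level comparison of (cost, violation) pairs: violation is looked at first."""
    (fa, va), (fb, vb) = a, b
    if (va <= eps and vb <= eps) or va == vb:
        return fa < fb  # both acceptable (or equally violated): lower cost wins
    return va < vb      # otherwise the smaller violation wins


def eps_de(pop_size=20, generations=200, F=0.6, CR=0.9, seed=0):
    rng = random.Random(seed)
    dim = len(TARIFF)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    pop[0] = [1.0] * dim  # assumption: start with one known feasible schedule
    score = [(cost(x), violation(x)) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])  # DE/rand/1 mutation
                    trial.append(min(1.0, max(0.0, v)))          # clip to the bounds
                else:
                    trial.append(pop[i][j])
            s = (cost(trial), violation(trial))
            if eps_better(s, score[i]):
                pop[i], score[i] = trial, s
    best = 0
    for i in range(1, pop_size):
        if eps_better(score[i], score[best]):
            best = i
    return pop[best], score[best]


schedule, (best_cost, best_violation) = eps_de()
print([round(x, 2) for x in schedule], round(best_cost, 3), best_violation)
```

Under this comparison an infeasible trial can never replace a feasible schedule, which is one way to read the claim that εDE tends to produce results that respect the constraints.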
New tools and specification languages for biophysically detailed neuronal network modelling
Increasingly detailed data are being gathered on the molecular, electrical and anatomical properties of neuronal systems both in vitro and in vivo. These range from the kinetic properties and distribution of ion channels, synaptic plasticity mechanisms and electrical activity in neurons to detailed anatomical connectivity within neuronal microcircuits from connectomics data. Publications describing these experimental results often set them in the context of higher level network behaviour. Biophysically detailed computational modelling provides a framework for consolidating these data, for quantifying the assumptions about underlying biological mechanisms, and for ensuring consistency in the explanation of the phenomena across scales. Such multiscale biophysically detailed models are not currently in widespread use by the experimental neuroscience community, however. Reasons for this include the relative inaccessibility of software for creating these models, the range of specialised scripting languages used by the available simulators, and the difficulty in creating and managing large scale network simulations. This thesis describes new solutions to facilitate the creation, simulation, analysis and reuse of biophysically detailed neuronal models. The graphical application neuroConstruct allows detailed cell and network models to be built in 3D, and run on multiple simulation platforms without detailed programming knowledge. NeuroML is a simulator independent language for describing models containing detailed neuronal morphologies, ion channels, synapses, and 3D network connectivity. New solutions have also been developed for creating and analysing network models at much closer to biological scale on high performance computing platforms. A number of detailed neocortical, cerebellar and hippocampal models have been converted for use with these tools. The tools and models I have developed have already started to be used for original scientific research.
It is hoped that this work will lead to a more solid foundation for creating, validating, simulating and sharing ever more realistic models of neurons and networks.