Unsupervised learning of relation detection patterns
Information extraction is the area of natural language processing whose goal is to obtain structured data from the relevant
information contained in textual fragments.
Information extraction requires a significant amount of linguistic knowledge. The specificity of this knowledge is a
drawback for the portability of systems, since a change of language, domain or style demands costly human effort.
Machine learning techniques have been applied for decades to overcome this portability bottleneck, progressively
reducing the amount of human supervision involved. However, as the availability of large document collections increases,
completely unsupervised approaches become necessary in order to mine the knowledge they contain.
The proposal of this thesis is to incorporate clustering techniques into pattern learning for information extraction, in order to
further reduce the elements of supervision involved in the process. In particular, the work focuses on the problem of relation
detection. Achieving this ultimate goal has required, first, considering the different strategies by which this
combination could be carried out; second, developing or adapting clustering algorithms suited to our needs; and third,
devising pattern learning procedures that incorporate clustering information.
By the end of this thesis, we had developed and implemented an approach for learning relation detection patterns
which, using clustering techniques and minimal human supervision, is competitive with and even outperforms other comparable
approaches in the state of the art.
Postprint (published version)
Computer Aided Verification
The open access two-volume set LNCS 11561 and 11562 constitutes the refereed proceedings of the 31st International Conference on Computer Aided Verification, CAV 2019, held in New York City, USA, in July 2019. The 52 full papers, presented together with 13 tool papers and 2 case studies, were carefully reviewed and selected from 258 submissions. The papers were organized in the following topical sections: Part I: automata and timed systems; security and hyperproperties; synthesis; model checking; cyber-physical systems and machine learning; probabilistic systems, runtime techniques; dynamical, hybrid, and reactive systems; Part II: logics, decision procedures, and solvers; numerical programs; verification; distributed systems and networks; verification and invariants; and concurrency.
Anomaly Detection with Complex Data Structures
Identifying anomalies with complex patterns differs from the conventional anomaly detection problem. First, in cross-modal anomaly detection, a large portion of data instances within a multi-modal context are not anomalous when viewed separately in each modality, but present abnormal patterns or behaviors when multiple sources of information are jointly considered and analyzed. Second, in attributed network anomaly detection, the definition of an anomaly becomes more complicated and obscure. Apart from anomalous nodes whose nodal attributes differ markedly from those of the majority of reference nodes from a global perspective, nodes whose nodal attributes deviate remarkably from those of their communities are also considered anomalies. Third, given a specific task with different data structures, building a suitable, high-quality deep learning-based outlier detection system still relies heavily on human expertise and laborious trial and error. It is therefore also necessary to search automatically for suitable outlier detection models for different tasks. In this dissertation, we make a series of contributions that enable advanced anomaly detection techniques for complex data structures, and we discuss how to automatically design anomaly detection frameworks for various data structures.