A Hybrid Continual Machine Learning Model for Efficient Hierarchical Classification of Domain-Specific Text in the Presence of Class Overlap (Case Study: IT Support Tickets)
In today’s world, support ticketing systems are employed by a wide range of businesses. A ticketing system facilitates the interaction between customers and support teams when a customer faces an issue with a product or a service. For large-scale IT companies with many clients and a great volume of communications, automating the classification of incoming tickets is key to retaining long-term clients and ensuring business growth.
Although the problem of text classification has been widely studied in the literature, the majority of proposed approaches revolve around state-of-the-art deep learning models. This thesis addresses the following research questions: What are the reasons behind employing black-box models (i.e., deep learning models) for text classification tasks? What is the level of polysemy (i.e., the coexistence of many possible meanings for a word or phrase) in technical (i.e., specialized) text? How do static word embeddings like Word2vec fare against traditional TF-IDF vectorization? How do dynamic word embeddings (e.g., pretrained language models, PLMs) compare against a linear classifier such as a Support Vector Machine (SVM) for classifying domain-specific text?
This integrated-article thesis investigates the aforementioned issues through five empirical studies conducted over the past four years. The outcome of these studies is an emerging theory that explains why traditional ML models offer a more efficient solution to domain-specific text classification than state-of-the-art DL language models (i.e., PLMs).
Based on extensive experiments on a real-world dataset, we propose a novel Hybrid Online Offline Model (HOOM) that can efficiently classify IT support tickets in a real-time (i.e., dynamic) environment. Our classification model is anticipated to build trust and confidence when deployed into production, as the model is interpretable, efficient, and can detect concept drift in the data.
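The kind of traditional pipeline the abstract argues for can be illustrated with a minimal, self-contained sketch: TF-IDF weighting plus a simple linear decision rule (a class-centroid stand-in for the SVM mentioned above). The ticket texts and labels are hypothetical, and a production system would use a library such as scikit-learn rather than this hand-rolled version.

```python
# Minimal sketch: TF-IDF vectors + centroid classifier (hypothetical data).
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into TF-IDF dictionaries."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # smoothed IDF
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vecs, idf

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy support tickets with hypothetical labels.
tickets = [
    ("vpn connection drops every hour", "network"),
    ("cannot reach internal network share", "network"),
    ("password reset link not working", "account"),
    ("locked out of my account after update", "account"),
]
docs = [t.split() for t, _ in tickets]
vecs, idf = tfidf_vectors(docs)

# One centroid per class: a stand-in for a trained linear classifier (e.g. SVM).
centroids = {}
for vec, (_, label) in zip(vecs, tickets):
    c = centroids.setdefault(label, Counter())
    for t, w in vec.items():
        c[t] += w

def classify(text):
    tf = Counter(text.split())
    q = {t: (tf[t] / sum(tf.values())) * idf.get(t, 0.0) for t in tf}
    return max(centroids, key=lambda lab: cosine(q, centroids[lab]))

print(classify("vpn network is down"))  # -> network
```

The centroid rule is deliberately simpler than an SVM, but it shows why such models stay interpretable: the decision reduces to token overlap weighted by TF-IDF.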
Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics
Three recent AI breakthroughs in the arts and sciences serve as motivation: an award-winning digital image, protein folding, and fast matrix multiplication.
Many recent developments in artificial neural networks, particularly deep learning (DL), applied and relevant to computational mechanics (solids, fluids, finite-element technology) are reviewed in detail. Both hybrid and pure machine learning (ML) methods are discussed. Hybrid methods combine traditional PDE discretizations with ML methods either (1) to help model complex nonlinear constitutive relations, (2) to nonlinearly reduce the model order for efficient simulation (turbulence), or (3) to accelerate the simulation by predicting certain components in the traditional integration methods. Here, methods (1) and (2) rely on the Long Short-Term Memory (LSTM) architecture, while method (3) relies on convolutional neural networks. Pure ML methods for solving (nonlinear) PDEs are represented by Physics-Informed Neural Network (PINN) methods, which can be combined with attention mechanisms to address discontinuous solutions. Both LSTM and attention architectures, together with modern optimizers and classic optimizers generalized to include stochasticity for DL networks, are extensively reviewed. Kernel machines, including Gaussian processes, are covered in sufficient depth to support more advanced work such as shallow networks with infinite width. The review does not address only experts: readers are assumed to be familiar with computational mechanics but not with DL, whose concepts and applications are built up from the basics, aiming to bring first-time learners quickly to the forefront of research. The history and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions about the classics, even in well-known references. Positioning and pointing control of a large-deformable beam is given as an example.
Comment: 275 pages, 158 figures. Appeared online on 2023-03-01 in CMES-Computer Modeling in Engineering & Sciences.
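The core idea behind the PINN methods mentioned above can be illustrated without a neural network: a physics-informed loss penalizes a candidate function's PDE residual at collocation points plus its boundary mismatch. The ODE below (u'' + π²u = 0 with zero boundary values, solved exactly by sin(πx)) is a hypothetical stand-in for the PDEs treated in the review; in a real PINN the candidate u would be a network trained to minimize this loss by gradient descent.

```python
# Sketch of a physics-informed loss, evaluated (not trained) on two candidates.
import math

def pinn_loss(u, n_collocation=50, h=1e-3):
    """Mean squared ODE residual (central differences) + boundary penalty."""
    residual = 0.0
    for i in range(1, n_collocation):
        x = i / n_collocation
        u_xx = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)  # second derivative
        residual += (u_xx + math.pi ** 2 * u(x)) ** 2      # u'' + pi^2 u = 0
    residual /= n_collocation - 1
    boundary = u(0.0) ** 2 + u(1.0) ** 2                   # u(0) = u(1) = 0
    return residual + boundary

exact = lambda x: math.sin(math.pi * x)   # true solution: near-zero loss
wrong = lambda x: x * (1.0 - x)           # satisfies boundaries, not the ODE

assert pinn_loss(exact) < 1e-3 < pinn_loss(wrong)
```

The loss is small only for functions that satisfy both the equation and the boundary conditions, which is exactly the signal a PINN uses in place of labeled solution data.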
Operational research: methods and applications
Throughout its history, Operational Research has evolved to include a variety of methods, models and algorithms that have been applied to a diverse and wide range of contexts. This encyclopedic article consists of two main sections: methods and applications. The first aims to summarise the up-to-date knowledge and provide an overview of the state-of-the-art methods and key developments in the various subdomains of the field. The second offers a wide-ranging list of areas where Operational Research has been applied. The article is meant to be read in a nonlinear fashion. It should be used as a point of reference or first port of call for a diverse pool of readers: academics, researchers, students, and practitioners. The entries within the methods and applications sections are presented in alphabetical order.
Computational Argumentation for the Automatic Analysis of Argumentative Discourse and Human Persuasion
[Thesis by compendium] Computational argumentation is the area of research that studies and analyses the use of different techniques and algorithms that approximate human argumentative reasoning from a computational viewpoint. In this doctoral thesis we study the use of different techniques proposed under the framework of computational argumentation to perform an automatic analysis of argumentative discourse, and to develop argument-based computational persuasion techniques. With these objectives in mind, we first present a complete review of the state of the art and propose a classification of existing works in the area of computational argumentation. This review allows us to contextualise and understand the previous research more clearly from the human perspective of argumentative reasoning, and to identify the main limitations and future trends of the research done in computational argumentation. Secondly, to overcome some of these limitations, we create and describe a new corpus that allows us to address new challenges and investigate previously unexplored problems (e.g., automatic evaluation of spoken debates). In conjunction with this data, a new system for argument mining is proposed and a comparative analysis of different techniques for this same task is carried out. In addition, we propose a new algorithm for the automatic evaluation of argumentative debates and we evaluate it with real human debates. Thirdly, a series of studies and proposals are presented to improve the persuasiveness of computational argumentation systems in the interaction with human users. In this way, this thesis presents advances in each of the main parts of the computational argumentation process (i.e., argument mining, argument-based knowledge representation and reasoning, and argument-based human-computer interaction), and proposes some of the essential foundations for the complete automatic analysis of natural language argumentative discourses.
This thesis has been partially supported by the Generalitat Valenciana project PROMETEO/2018/002 and by the Spanish Government projects TIN2017-89156-R and PID2020-113416RB-I00.
Ruiz Dolz, R. (2023). Computational Argumentation for the Automatic Analysis of Argumentative Discourse and Human Persuasion [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/194806
Sentiment Analysis on Twitter Data and Social Trends: The Case of Greek General Elections
Sentiment analysis and opinion mining use natural language processing and various techniques (machine learning, lexicons) to identify and extract subjective information from text data. They are commonly used to determine the overall sentiment of a piece of text, such as whether it is positive, negative, or neutral.
The purpose of this thesis is to analyze sentiment in Twitter data. More specifically, a lexicon-based approach has been implemented to analyze sentiment in tweets related to the 2019 general elections in Greece. The tweets are in Greek and are classified as positive, negative, or neutral based on the overall sentiment they express. Sentiment analysis of the datasets, implemented in the Python programming language, yields insights into the social trends that developed on pre-election Twitter around the six (6) political parties that elected Members of Parliament (MPs) in the 2019 elections. The results are presented through visualizations built with the Tableau tool for a clearer and more complete understanding.
In addition to describing the implementation, the thesis presents the main challenges, limitations, and difficulties encountered in processing the Greek language, along with aspects of the implementation that can be improved, as well as open issues in sentiment analysis and opinion mining more broadly.
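The lexicon-based approach described above can be sketched in a few lines: count lexicon hits and compare the totals. The tiny English lexicon here is a hypothetical stand-in for the Greek-language resources actually used in the thesis.

```python
# Minimal lexicon-based sentiment classifier (hypothetical lexicon).
POSITIVE = {"hope", "progress", "win", "support", "great"}
NEGATIVE = {"crisis", "corruption", "lose", "failure", "angry"}

def tweet_sentiment(text):
    """Classify a tweet as positive / negative / neutral by lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(tweet_sentiment("great progress for the country"))  # -> positive
print(tweet_sentiment("another crisis another failure"))  # -> negative
print(tweet_sentiment("polls open at seven"))             # -> neutral
```

Real systems add stemming, negation handling, and weighted lexicon entries; for a morphologically rich language like Greek, the tokenization and lemmatization steps are where most of the difficulty reported in the thesis arises.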
Efficient Neural Methods for Coreference Resolution
Coreference resolution is a core task in natural language processing and in creating language technologies. Neural methods and models for automatically resolving references have emerged and developed over the last several years. This progress is largely marked by continuous improvements on a single dataset and metric. In this thesis, the assumptions that underlie these improvements are shown to be unrealistic for real-world use due to the computational and data tradeoffs made to achieve apparently high performance. The thesis outlines and proposes solutions to three issues. First, to address the growing memory requirements and restrictions on input document length, a novel, constant-memory neural model for coreference resolution is proposed and shown to attain performance comparable to contemporary models. Second, to address the failure of these models to generalize across datasets, continued training is evaluated and shown to be successful for transferring coreference resolution models between domains and languages. Finally, to combat the gains obtained via the use of increasingly large pretrained language models, multitask model pruning can be applied to maintain a single (small) model for multiple datasets. These methods reduce the computational cost of running a model and the annotation cost of creating a model for any arbitrary dataset. As real-world applications continue to demand resolution of coreference, methods that reduce the technical cost of training new models and making predictions are greatly desired, which this thesis addresses.
Data, deep learning and depression: can artificial neural networks learn risk factors for depression from genetic variants and radiology reports?
Major Depressive Disorder (MDD) is a psychiatric disorder characterised by persistent low mood and loss of enjoyment or interest. MDD affects around 1 in 8 people worldwide and is one of the leading causes of global disability. Studies have found both genetic and environmental risk factors. In this thesis, automated and scalable models using artificial neural networks are used to analyse two sources of data where risk factors can be found and quantified.
A number of genes each have a small effect size on MDD, making MDD a polygenic disease. To investigate polygenic diseases, we can analyse Single Nucleotide Polymorphisms (SNPs), base pairs in DNA that commonly differ between individuals. Genome-wide association studies (GWAS) are used to quantify the association between SNPs and MDD. By modelling these associations in combination, a Polygenic Risk Score (PRS) can be devised, which quantifies an individual’s genetic risk of MDD.
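A PRS as described above is, at its core, a weighted sum of genotype dosages (0, 1, or 2 copies of the effect allele) with GWAS effect sizes as weights. A minimal sketch with hypothetical SNP identifiers and effect sizes:

```python
# Hypothetical GWAS effect sizes (log odds) per SNP.
GWAS_EFFECTS = {
    "rs0001": 0.12,
    "rs0002": -0.05,
    "rs0003": 0.08,
}

def polygenic_risk_score(genotypes):
    """genotypes: SNP id -> effect-allele count in {0, 1, 2}."""
    return sum(GWAS_EFFECTS[snp] * genotypes.get(snp, 0) for snp in GWAS_EFFECTS)

person_a = {"rs0001": 2, "rs0002": 0, "rs0003": 1}  # carries more risk alleles
person_b = {"rs0001": 0, "rs0002": 2, "rs0003": 0}

print(round(polygenic_risk_score(person_a), 3))  # -> 0.32
print(round(polygenic_risk_score(person_b), 3))  # -> -0.1
```

Methods such as p-value thresholding and clumping, SBayesR, and LDPred2 differ mainly in how the effect-size weights are selected and shrunk, not in this final weighted-sum step; the thesis's neural networks instead learn a nonlinear function of the genotypes.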
Through scanning the brain using CT or MRI, we can find evidence of disease, including stroke and small vessel disease. A number of brain diseases have been linked to subsequent development of MDD, and combined with genetics they could give a better overall prediction of the risk of developing MDD than either in isolation.
This thesis focuses on these two key biological disciplines in MDD research (genetics and imaging), where deep learning, in the form of artificial neural networks, might provide improvement on key problems. Specific problems are chosen due to their tractable nature and the ability to benchmark the new techniques against the current state-of-the-art methods.
The first project of this thesis uses artificial neural networks that take SNP genotypes as input and output a polygenic risk score for MDD. A number of hyperparameters are tested, as well as different architectures. The best of these models, chosen by performance (measured using AUC) on a validation set, is then compared on a held-out test set to existing methods, including p-value thresholding and clumping, SBayesR, and LDPred2.
The second project uses graph-based neural networks, which introduce an additional layer involving a graph to add structure to the network computation. This structure allows the use of existing biological information, in this case data detailing which SNPs act as expression quantitative trait loci (eQTLs) for specific genes. A number of graph networks are designed and tested, with the best of these compared to the methods in the first project. Across both the first and second projects, the neural network models achieve an AUC, accuracy, and Nagelkerke R² comparable to the best of the current methods tested. Additionally, when using ensemble modelling, the best-performing models included both a neural network based model and a summary-statistics Bayesian model (LDPred2 or SBayesR). This indicates that the neural network models find information not used by the best existing methods, and that an ensemble of models provides the highest performance as defined by the above-mentioned metrics.
The final project uses neuroradiology reports, which are written reports that accompany radiology scans such as CT or MRI scans and are used to describe abnormalities that indicate disease. There is evidence that some of the diseases observable in these scans are risk factors for MDD. Part of the processing of the reports needed for further analysis is negation detection: the task of deciding whether a mention of a disease (such as ischaemic stroke) indicates presence or absence of the disease. An artificial neural network (NN) is developed for this task, and its predictions are assessed against a gold standard labelled by domain experts. The performance of the NN, measured using F1 score, is then compared against that of a rule-based model developed on the same datasets as the NN, and against two state-of-the-art rule-based models developed on different datasets. The NN achieves performance similar to the other models and outperforms the rule-based models not developed on our datasets. Neural networks have previously shown greater adaptability to new datasets than rule-based methods, demonstrating a potential advantage over rule-based models in transferability between data sources, such as different health boards or studies.
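The rule-based baselines the NN is compared against can be sketched in the spirit of NegEx-style systems: a disease mention is treated as negated if a negation cue appears within a small window before it. The cues, window size, and example reports below are hypothetical simplifications, not the systems actually evaluated.

```python
# Minimal window-based negation detector (hypothetical cues and examples).
NEGATION_CUES = {"no", "not", "without", "negative"}
WINDOW = 4  # how many tokens to look back from the mention

def is_negated(tokens, mention_index):
    """True if a negation cue occurs within WINDOW tokens before the mention."""
    start = max(0, mention_index - WINDOW)
    return any(t in NEGATION_CUES for t in tokens[start:mention_index])

report = "there is no evidence of ischaemic stroke".split()
print(is_negated(report, report.index("stroke")))   # -> True

report2 = "findings consistent with ischaemic stroke".split()
print(is_negated(report2, report2.index("stroke"))) # -> False
```

Rules like these are brittle across data sources because cue vocabularies and report styles vary between health boards, which is the transferability gap where the learned NN model shows its advantage.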
The work on this final project has contributed to enabling the automatic annotation of a much larger dataset with increased accuracy. Using this larger dataset, further analysis has linked hypertension with increased risk of stroke, and baseline depression with increased risk of cerebral small vessel disease. Additionally, approval for access to electronic health records for the entire Scottish population has been granted, made possible by the utility and effectiveness of the machine learning approaches.
Overall, the deep learning (artificial neural network) models developed in this thesis are stronger on the negation detection task than on the polygenic risk scoring task, performing well against all the models tested and proving useful for processing large datasets for future work.
The models developed for assessing genetic risk of MDD currently have more limited use, but deliver results comparable to current methods, particularly when summary statistics are not available. Additionally, the performance (AUC and Nagelkerke R²) of the ensemble models indicates that the NN models find information in the data unused by the other methods, indicating potential for future mechanistic insights. While a number of challenges prevent improvements in the predictive performance of NN models, larger samples of individuals with MDD with contemporaneous imaging and genetic data are likely to lead to improvements when these models are used for predictive analytics.
This thesis represents a beginning of the work possible with deep learning for MDD research, and these experiments are just a subset of the potential problems where deep learning may provide benefit. The methods used here have the potential to lead to more accurate prediction, further mechanistic insights, and better automation of dataset processing and creation for a number of other problems and challenges in MDD research.
Toward Annotation Efficiency in Biased Learning Settings for Natural Language Processing
The goal of this thesis is to improve the feasibility of building applied NLP systems for more diverse and niche real-world use cases of extracting structured information from text. A core factor in determining this feasibility is the cost of manually annotating enough unbiased labeled data to achieve a desired level of system accuracy, and our goal is to reduce this cost. We focus on reducing this cost by making contributions in two directions: (1) easing the annotation burden by leveraging high-level expert knowledge in addition to labeled examples, thus making approaches more annotation-efficient; and (2) mitigating known biases in cheaper, imperfectly labeled real-world datasets so that we may use them to our advantage. A central theme of this thesis is that high-level expert knowledge about the data and task can allow for biased labeling processes that focus experts on only manually labeling aspects of the data that cannot be easily labeled through cheaper means. This combination allows for more accurate models with less human effort. We conduct our research on this general topic through three diverse problems with immediate applications to real-world settings.
First, we study an applied problem in biased text classification. We encounter a rare-event text classification system that has been deployed for several years. We are tasked with improving this system's performance using only the severely biased incidental feedback provided by the experts over years of system use. We develop a method that combines importance weighting and an unlabeled data imputation scheme that exploits the selection-bias of the feedback to train an unbiased classifier without requiring additional labeled data. We experimentally demonstrate that this method considerably improves the system performance.
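The importance-weighting idea in this first problem can be sketched with a toy example: if each feedback example's probability of being observed (its propensity) is known or modelled, reweighting by the inverse propensity recovers an unbiased estimate from the biased feedback stream. The propensities and labels here are hypothetical, and the thesis's actual method additionally imputes unlabeled data.

```python
# Inverse-propensity weighting on selection-biased feedback (hypothetical data).
# Each tuple is (label, propensity of being observed in the feedback stream).
feedback = [
    (1, 0.8), (1, 0.8), (1, 0.8), (1, 0.8),  # rare-event hits: often reviewed
    (0, 0.1),                                 # misses: rarely reviewed
]

def weighted_positive_rate(samples):
    """Unbiased estimate of the positive rate under known selection propensities."""
    num = sum(y / p for y, p in samples)
    den = sum(1 / p for y, p in samples)
    return num / den

naive = sum(y for y, _ in feedback) / len(feedback)  # 0.8: badly biased upward
corrected = weighted_positive_rate(feedback)         # ~0.333 after reweighting

print(round(naive, 3), round(corrected, 3))  # -> 0.8 0.333
```

The single under-reviewed negative example stands in for many unobserved ones, which is exactly what its weight of 1/0.1 = 10 encodes.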
Second, we tackle an applied problem in named entity recognition (NER) concerning learning tagging models from data that have very low recall for annotated entities. To solve this issue we propose a novel loss, the Expected Entity Ratio (EER), that uses an uncertain estimate of the proportion of entities in the data to counteract the false-negative bias in the data, encouraging the model to have the correct ratio of entities in expectation. We justify the principles of our approach by providing theory that shows it recovers the true tagging distribution under mild conditions. Additionally we provide extensive empirical results that show it to be practically useful. Empirically, we find that it meets or exceeds performance of state-of-the-art baselines across a variety of languages, annotation scenarios, and amounts of labeled data. We also show that, when combined with our approach, a novel sparse annotation scheme can outperform exhaustive annotation for modest annotation budgets.
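The intuition behind the EER objective can be sketched as a penalty term: the model's expected fraction of entity tokens, averaged over its per-token probabilities, should stay near a prior estimate of the true ratio, counteracting the pull toward predicting no entities on low-recall data. The numbers and the exact penalty form below are illustrative, not the thesis's actual loss.

```python
# Sketch of an expected-entity-ratio penalty (illustrative form and numbers).
def eer_penalty(token_entity_probs, target_ratio, margin=0.02):
    """Penalize models whose expected entity ratio leaves [target ± margin]."""
    expected = sum(token_entity_probs) / len(token_entity_probs)
    gap = abs(expected - target_ratio)
    return max(0.0, gap - margin) ** 2

# Model A: expected entity ratio ~0.10, matching the prior -> no penalty.
probs_a = [0.91] + [0.01] * 9
# Model B: collapsed to predicting almost no entities -> penalized.
probs_b = [0.01] * 10

print(eer_penalty(probs_a, 0.10))      # -> 0.0
print(eer_penalty(probs_b, 0.10) > 0)  # -> True
```

In training, this penalty would be added to the likelihood on the sparse annotations, so the model can leave unannotated tokens as entities instead of treating every missing label as a negative.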
Third, we study the challenging problem of syntactic parsing in low-resource languages. We approach the problem from a cross-lingual perspective, building on a state-of-the-art transfer-learning approach that underperforms on "distant" languages that have little to no representation in the training corpus. Motivated by the field of syntactic typology, we introduce a general method called Expected Statistic Regularization (ESR) to regularize the parser on distant languages according to their expected typological syntax statistics. We also contribute general approaches for estimating the loss supervision parameters from the task formalism or small amounts of labeled data. We present seven broad classes of descriptive statistic families and provide extensive experimental evidence showing that using these statistics for regularization is complementary to deep learning approaches in low-resource transfer settings.
In conclusion, this thesis contributes approaches for reducing the annotation cost of building applied NLP systems through the use of high-level expert knowledge to impart additional learning signal on models and cope with cheaper biased data. We publish implementations of our methods and results, so that they may facilitate future research and applications. It is our hope that the frameworks proposed in this thesis will help to democratize access to NLP for producing structured information from text in wider-reaching applications by making them faster and cheaper to build.