154 research outputs found
Towards Advancing the Earthquake Forecasting by Machine Learning of Satellite Data
Earthquakes have become one of the leading causes of death from natural hazards over the last fifty years. Continuous efforts have been made to understand the physical characteristics of earthquakes and the interaction between these hazards and the environment, so that appropriate warnings can be issued before an earthquake strikes. However, earthquake forecasting is far from trivial. Reliable forecasting requires identifying and analyzing signals that indicate a coming significant quake. Unfortunately, such signals are rarely evident before earthquakes occur, so detecting these precursors in seismic analysis is challenging. Among the technologies available for earthquake research, remote sensing is widely used for its fast imaging and wide image-acquisition range. Nevertheless, early studies on pre-earthquake remote-sensing anomalies have mostly focused on identifying and analyzing anomalies in a single physical parameter. Many analyses are also based on single events, which offers limited insight into this complex natural phenomenon, because earthquake signals are usually hidden in environmental noise; the universality of such analyses has not yet been demonstrated on a worldwide scale. In this paper, we investigate physical and dynamic changes in seismic data and develop a novel machine learning method, Inverse Boosting Pruning Trees (IBPT), to issue short-term forecasts based on the satellite data of 1,371 earthquakes of magnitude six or above, chosen for their impact on the environment. We have analyzed and compared our proposed framework against several state-of-the-art machine learning methods using ten different infrared and hyperspectral measurements collected between 2006 and 2013. Our proposed method outperforms all six selected baselines and shows a strong capability to improve the likelihood of earthquake forecasting across different earthquake databases.
CREATE: Clinical Record Analysis Technology Ensemble
In this thesis, we describe an approach that won a psychiatric symptom severity prediction challenge. The challenge was to correctly predict the severity of psychiatric symptoms on a 4-point scale. Our winning submission uses a novel stacked machine learning architecture in which (i) a base data ingestion/cleaning step was followed by (ii) the derivation of a base set of features defined using text analytics, after which (iii) association rule learning was used in a novel way to generate new features, followed by (iv) a feature selection step to eliminate irrelevant features, then (v) a classifier training step in which a total of 22 classifiers, including new variants of AdaBoost and RandomForest, were trained on seven different data views, and (vi) finally an ensemble learning step, in which ensembles of the best learners were used to improve on the accuracy of the individual learners. All of this was tested via standard 10-fold cross-validation on training data provided by the N-GRID challenge organizers, and the three best ensembles were selected for submission to N-GRID's blind testing. The best of our submitted solutions garnered an overall final score of 0.863 according to the organizers' measure. All three of our submissions placed within the top 10 of the 65 total submissions. The challenge constituted Track 2 of the 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDOC Individualized Domains (N-GRID) Shared Task in Clinical Natural Language Processing.
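The stacked pipeline described in steps (iv)-(vi) can be sketched in miniature. This is a hedged stand-in, not the thesis's system: the 22 classifiers, association-rule features, and seven data views are replaced by a small scikit-learn pipeline, and the synthetic data stands in for the text-derived clinical features.

```python
# Minimal sketch of the described pipeline: feature selection (iv) ->
# multiple classifiers (v) -> ensemble (vi), evaluated with 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in; the real task predicted severity on a 4-point scale,
# modeled here as four classes.
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)

ensemble = VotingClassifier([
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
])
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=15)),  # step (iv): drop weak features
    ("ensemble", ensemble),                    # steps (v)-(vi): train + combine
])
scores = cross_val_score(pipe, X, y, cv=10)    # standard 10-fold CV
print(round(scores.mean(), 3))
```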
Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods
A vast amount of textual web streams is influenced by events or phenomena
emerging in the real world. The social web forms an excellent modern paradigm,
where unstructured user-generated content is published on a regular basis and
in most cases is freely distributed. The present Ph.D. Thesis deals with
the problem of inferring information - or patterns in general - about events
emerging in real life based on the contents of this textual stream. We show
that it is possible to extract valuable information about social phenomena,
such as an epidemic or even rainfall rates, by automatic analysis of the
content published in Social Media, and in particular Twitter, using Statistical
Machine Learning methods. An important intermediate task regards the formation
and identification of features which characterise a target event; we select and
use those textual features in several linear, non-linear and hybrid inference
approaches, achieving strong performance in terms of the applied
loss function. By examining further this rich data set, we also propose methods
for extracting various types of mood signals revealing how affective norms - at
least within the social web's population - evolve during the day and how
significant events emerging in the real world are influencing them. Lastly, we
present some preliminary findings showing several spatiotemporal
characteristics of this textual information as well as the potential of using
it to tackle tasks such as the prediction of voting intentions.
Comment: PhD thesis, 238 pages, 9 chapters, 2 appendices, 58 figures, 49
tables
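The thesis's core idea, inferring a real-world signal (such as an illness rate) from the textual content of a social stream via selected word features and a linear model, can be sketched as follows. Everything below is invented for illustration: the toy "tweet" collections, the target rates, and the candidate terms are not from the thesis's data.

```python
# Hedged sketch: regress a made-up daily illness rate on word frequencies
# extracted from toy daily text collections.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

days = [
    "feeling fine today sunny walk",
    "bad cough and fever staying home",
    "flu everywhere fever cough terrible",
    "lovely weather park picnic",
    "fever again cough flu spreading fast",
    "quiet day reading coffee",
]
rates = np.array([0.1, 0.6, 0.9, 0.1, 1.0, 0.1])  # invented target signal

# Feature formation (word counts), then a regularised linear inference model.
vec = CountVectorizer()
X = vec.fit_transform(days)
model = Ridge(alpha=1.0).fit(X, rates)

pred = model.predict(vec.transform([
    "new fever and cough cases",
    "sunny walk in the park",
]))
print(pred[0] > pred[1])  # illness-related text scores higher
```

The thesis also explores non-linear and hybrid inference approaches; a linear model is used here only because it is the simplest instance of the scheme.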
ANALYZING CUSTOMER REVIEWS IN TURKISH USING MACHINE LEARNING AND DATA SCIENCE METHODOLOGIES
Digital life, especially after the introduction of Web 2.0, has significantly altered
human relations, granting all people the “right of public speech”. Ideas, emotions,
and opinions on many topics are generously shared in virtual environments. A new-age,
global, and digital word of mouth is shaping a society in which knowledge is the most
influential power. Since social media data are highly dynamic in both volume and form,
automatic handling is indispensable.
Natural Language Processing, in cooperation with Machine Learning techniques, plays
an important role in analyzing written textual data. The traditional techniques of
the literature become more powerful when hybrid ones are applied, in accordance with
the characteristic properties of the language used and the domain-specific data.
Although all steps of the text classification chain are important, adequate feature
selection has a notably large impact on accurate classification prediction.
In this study, document-level classification of the sentiment polarity of subjective
texts in Turkish is performed. The domains include customer reviews of company
products, movies, and healthcare services, where the task is deciding on the
positivity or negativity of the comments. Another domain comprises doctors’ notes on
patients’ symptoms, aiming to predict, and thus recommend, some of the most
frequently ordered medical tests according to general medical procedure.
The features used included some or all of the distinct word roots, together with
their binary (presence) or frequency information. Linear and vector analyses of the
feature sets were carried out using Machine Learning algorithms provided by the Weka
tool. A hybrid feature set was proposed and found to be more efficient: it combines
binary vectors with frequency meta-features from the nodes and leaves of a J48 tree
classifier, for all or a correlation-based subset of the features, improving both
prediction accuracy and classification performance.
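The binary-versus-frequency feature distinction described above can be sketched briefly. This uses scikit-learn rather than the Weka/J48 setup of the study, and the four Turkish reviews (with English glosses) and their labels are invented toy examples.

```python
# Sketch of binary (presence) vs. frequency bag-of-words features for
# sentiment polarity, with a simple Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = [
    "urun harika cok begendim",          # "great product, I liked it a lot"
    "kargo hizli urun kaliteli",         # "fast shipping, quality product"
    "berbat bir deneyim asla almayin",   # "awful experience, never buy"
    "urun bozuk geldi cok kotu",         # "product arrived broken, very bad"
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

preds = []
for binary in (True, False):
    # binary=True yields presence/absence features; False yields frequencies.
    vec = CountVectorizer(binary=binary)
    X = vec.fit_transform(reviews)
    clf = MultinomialNB().fit(X, labels)
    pred = clf.predict(vec.transform(["urun cok kaliteli"]))
    preds.append(int(pred[0]))
print(preds)
```

The study's hybrid scheme additionally mixes in meta-features taken from the nodes and leaves of a trained J48 (C4.5) tree, which is not reproduced here.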
ALEC: Active learning with ensemble of classifiers for clinical diagnosis of coronary artery disease
Invasive angiography is the reference standard for coronary artery disease (CAD) diagnosis but is expensive and
associated with certain risks. Machine learning (ML) using clinical and noninvasive imaging parameters can be
used for CAD diagnosis to avoid the side effects and cost of angiography. However, ML methods require labeled
samples for efficient training. The labeled data scarcity and high labeling costs can be mitigated by active
learning. This is achieved through selective query of challenging samples for labeling. To the best of our
knowledge, active learning has not been used for CAD diagnosis yet. An Active Learning with Ensemble of
Classifiers (ALEC) method is proposed for CAD diagnosis, consisting of four classifiers. Three of these classifiers
determine whether a patient’s three main coronary arteries are stenotic or not. The fourth classifier predicts
whether the patient has CAD or not. ALEC is first trained using labeled samples. For each unlabeled sample, if the
outputs of the classifiers are consistent, the sample along with its predicted label is added to the pool of labeled
samples. Inconsistent samples are manually labeled by medical experts before being added to the pool. The
training is performed once more using the samples labeled so far. The interleaved phases of labeling and training
are repeated until all samples are labeled. Compared with 19 other active learning algorithms, ALEC combined
with a support vector machine classifier attained superior performance with 97.01% accuracy. Our method is
justified mathematically as well. We also comprehensively analyze the CAD dataset used in this paper. As part of
this analysis, pairwise feature correlations are computed. The top 15 features contributing to CAD and stenosis
of the three main coronary arteries are determined. The relationship between stenosis of the main arteries is
presented using conditional probabilities. The effect of considering the number of stenotic arteries on sample
discrimination is investigated. The discrimination power over dataset samples is visualized, assuming each of the
three main coronary arteries as a sample label and considering the two remaining arteries as sample features.
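The labeling loop described in the abstract, train on labeled data, auto-label unlabeled samples on which the ensemble agrees, query an expert for the rest, and retrain, can be sketched as below. This is a hedged stand-in, not the paper's ALEC: the dataset is synthetic rather than the CAD data, the three classifiers are arbitrary choices, the "expert" is simulated by the true labels, and the batch size is an assumption.

```python
# Sketch of an ALEC-style active-learning loop with an ensemble consistency
# check; disagreements are "expert-labeled" using the ground truth.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rng = np.random.default_rng(0)
labeled = rng.choice(len(X), size=30, replace=False)  # small seed set
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
pool_X, pool_y = X[labeled], y[labeled]

classifiers = [LogisticRegression(max_iter=1000),
               KNeighborsClassifier(3),
               DecisionTreeClassifier(random_state=0)]
expert_queries = 0
while len(unlabeled) > 0:
    batch, unlabeled = unlabeled[:90], unlabeled[90:]  # one labeling round
    preds = np.array([c.fit(pool_X, pool_y).predict(X[batch])
                      for c in classifiers])
    agree = (preds == preds[0]).all(axis=0)   # consistent ensemble output
    new_y = preds[0].copy()
    new_y[~agree] = y[batch][~agree]          # expert labels disagreements
    expert_queries += int((~agree).sum())
    pool_X = np.vstack([pool_X, X[batch]])    # grow the labeled pool
    pool_y = np.concatenate([pool_y, new_y])

print(len(pool_y), expert_queries)
```

The point of the scheme is that `expert_queries` stays well below the number of initially unlabeled samples, which is how active learning reduces labeling cost.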
Knowledge Modelling and Learning through Cognitive Networks
One of the most promising developments in modelling knowledge is cognitive network science, which aims to investigate cognitive phenomena driven by the networked, associative organization of knowledge. For example, investigating the structure of semantic memory via semantic networks has illuminated how memory recall patterns influence phenomena such as creativity, memory search, learning, and more generally, knowledge acquisition, exploration, and exploitation. In parallel, neural network models for artificial intelligence (AI) are also becoming more widespread as inferential models for understanding which features drive language-related phenomena such as meaning reconstruction, stance detection, and emotional profiling. Whereas cognitive networks map explicitly which entities engage in associative relationships, neural networks perform an implicit mapping of correlations in cognitive data as weights, obtained after training over labelled data and whose interpretation is not immediately evident to the experimenter. This book aims to bring together quantitative, innovative research that focuses on modelling knowledge through cognitive and neural networks to gain insight into mechanisms driving cognitive processes related to knowledge structuring, exploration, and learning. The book comprises a variety of publication types, including reviews and theoretical papers, empirical research, computational modelling, and big data analysis. All papers here share a commonality: they demonstrate how the application of network science and AI can extend and broaden cognitive science in ways that traditional approaches cannot
The Convergence of Human and Artificial Intelligence on Clinical Care - Part I
This edited book contains twelve studies, both large-scale and pilot, in five main categories: (i) adaptive imputation to increase the density of clinical data for improving downstream modeling; (ii) machine-learning-empowered diagnosis models; (iii) machine learning models for outcome prediction; (iv) innovative use of AI to improve our understanding of the public view; and (v) understanding of providers' attitudes toward trusting insights from AI for complex cases. This collection is an excellent example of how technology can add value in healthcare settings, and it hints at some of the pressing challenges in the field. Artificial intelligence is gradually becoming a go-to technology in clinical care; it is therefore important to work collaboratively and to shift from performance-driven outcomes to risk-sensitive model optimization, improved transparency, and better patient representation, to ensure more equitable healthcare for all.