Population-aware Hierarchical Bayesian Domain Adaptation
Population attributes are essential in health for understanding who the data
represents and for precision medicine efforts. Even within disease infection
labels, patients can exhibit significant variability; "fever" may mean
something different when reported in a doctor's office versus from an online
app, precluding directly learning across different datasets for the same
prediction task. This problem falls into the domain adaptation paradigm.
However, research in this area has to date not considered who generates the
data; symptoms reported by a woman versus a man, for example, could also have
different implications. We propose a novel population-aware domain adaptation
approach by formulating the domain adaptation task as a multi-source
hierarchical Bayesian framework. The model improves prediction in the case of
largely unlabelled target data by harnessing both domain and population
invariant information.
Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
arXiv:1811.0721
Adaptation of WASH Services Delivery to Climate Change and Other Sources of Risk and Uncertainty
This report urges WASH sector practitioners to take more seriously the threat of climate change and the consequences it could have on their work. By considering climate change within a risk and uncertainty framework, the field can use the multitude of approaches laid out here to adequately protect itself against a range of direct and indirect impacts. Eleven methods and tools for this specific type of risk management are described, including practical advice on how to implement them successfully.
Hybrid Recommender Systems: A Systematic Literature Review
Recommender systems are software tools used to generate and provide
suggestions for items and other entities to the users by exploiting various
strategies. Hybrid recommender systems combine two or more recommendation
strategies in different ways to benefit from their complementary advantages.
This systematic literature review presents the state of the art in hybrid
recommender systems of the last decade. It is the first quantitative review
work focused entirely on hybrid recommenders. We address the most relevant
problems considered and present the associated data mining and recommendation
techniques used to overcome them. We also explore the hybridization classes
each hybrid recommender belongs to, the application domains, the evaluation
process and proposed future research directions. Based on our findings, most of
the studies combine collaborative filtering with another technique, often in a
weighted way. Cold-start and data sparsity are the two traditional and most
frequently addressed problems, appearing in 23 and 22 studies respectively, while movies and movie
datasets are still widely used by most of the authors. As most of the studies
are evaluated by comparisons with similar methods using accuracy metrics,
providing more credible and user oriented evaluations remains a typical
challenge. Besides this, newer challenges were also identified such as
responding to the variation of user context, evolving user tastes or providing
cross-domain recommendations. Being a hot topic, hybrid recommenders represent
a good basis with which to respond accordingly by exploring newer opportunities
such as contextualizing recommendations, involving parallel hybrid algorithms,
processing larger datasets, etc.
Comment: 38 pages, 9 figures, 14 tables. The final authenticated version is
available online at
https://content.iospress.com/articles/intelligent-data-analysis/ida16320
"In vivo" spam filtering: A challenge problem for data mining
Spam, also known as Unsolicited Commercial Email (UCE), is the bane of email
communication. Many data mining researchers have addressed the problem of
detecting spam, generally by treating it as a static text classification
problem. True in vivo spam filtering has characteristics that make it a rich
and challenging domain for data mining. Indeed, real-world datasets with these
characteristics are typically difficult to acquire and to share. This paper
demonstrates some of these characteristics and argues that researchers should
pursue in vivo spam filtering as an accessible domain for investigating them.
A decision support methodology to enhance the competitiveness of the Turkish automotive industry
This is the post-print (final draft post-refereeing) version of the article. Copyright © 2013 Elsevier B.V. All rights reserved.
Three levels of competitiveness affect the success of business enterprises in a globally competitive environment: the competitiveness of the company, the competitiveness of the industry in which the company operates and the competitiveness of the country where the business is located. This study analyses the competitiveness of the automotive industry in association with the national competitiveness perspective using a methodology based on Bayesian Causal Networks. First, we structure the competitiveness problem of the automotive industry through a synthesis of expert knowledge in the light of the World Economic Forum’s competitiveness indicators. Second, we model the relationships among the variables identified in the problem structuring stage and analyse these relationships using a Bayesian Causal Network. Third, we develop policy suggestions under various scenarios to enhance the national competitive advantages of the automotive industry. We present an analysis of the Turkish automotive industry as a case study. It is possible to generalise the policy suggestions developed for the case of the Turkish automotive industry to the automotive industries in other developing countries where country and industry competitiveness levels are similar to those of Turkey.
The Application of Data Mining to Build Classification Model for Predicting Graduate Employment
Data mining has been applied in various areas because of its ability to
rapidly analyze vast amounts of data. This study builds the Graduates
Employment Model using a classification task in data mining, and compares
several data-mining approaches, such as the Bayesian method and the Tree method.
The Bayesian method includes five algorithms: AODE, BayesNet, HNB,
NaiveBayes, and WAODE. The Tree method includes five algorithms: BFTree,
NBTree, REPTree, ID3, and C4.5. The experiment uses a classification task in WEKA,
and we compare the results of each algorithm, where several classification
models were generated. To validate the generated models, the experiments were
conducted using real data collected from graduate profiles at Maejo
University in Thailand. The model is intended to be used for predicting whether
a graduate is employed, unemployed, or in an undetermined situation.
The future of statistical disclosure control
Statistical disclosure control (SDC) was not created in a single seminal
paper, nor did it follow the invention of a new mathematical technique; rather, it
developed slowly in response to the practical challenges faced by data
practitioners based at national statistical institutes (NSIs). SDC's subsequent
emergence as a specialised academic field was an outcome of three interrelated
socio-technical changes: (i) the advent of accessible computing as a research
tool in the 1980s meant that it became possible - and then increasingly easy -
for researchers to process larger quantities of data automatically; this
naturally increased demand for such data; (ii) it became possible for data
holders to process and disseminate detailed data as digital files and (iii) the
number of organisations holding data about individuals proliferated. This also
meant the number of potential adversaries with the resources to attack any
given dataset increased exponentially. In this article, we describe the state
of the art for SDC and then discuss the core issues and future challenges. In
particular, we touch on SDC and big data, on SDC and machine learning, and on
SDC and anti-discrimination.
Comment: A contributing article to the National Statistician's Quality Review
into Privacy and Data Confidentiality Method
Uncertainty Aware AI ML: Why and How
This paper argues the need for research to realize uncertainty-aware
artificial intelligence and machine learning (AI&ML) systems for decision
support by describing a number of motivating scenarios. Furthermore, the paper
defines uncertainty-awareness and lays out the challenges along with surveying
some promising research directions. A theoretical demonstration illustrates how
two emerging uncertainty-aware ML and AI technologies could be integrated and
be of value for a route planning operation.
Comment: Presented at AAAI FSS-18: Artificial Intelligence in Government and
Public Sector, Arlington, Virginia, US
How scientific research changes the Vietnamese higher education landscape: Evidence from social sciences and humanities between 2008 and 2019
Background: In the context of globalization, Vietnamese universities, whose primary function is teaching, need to improve their research performance.
Methods: Based on SSHPA data, an exclusive database of Vietnamese social sciences and humanities researchers’ productivity, for the 2008-2019 period, this study analyzes the research output of Vietnamese universities in the field of social sciences and humanities.
Results: Vietnamese universities have been steadily producing a high volume of publications in the 2008-2019 period, with a peak of 598 articles in 2019. Moreover, many private universities and institutions are also joining the publication race, pushing competitiveness in the country.
Conclusions: Solutions to improve both the quantity and quality of Vietnamese universities’ research practice in the context of the industrial revolution 4.0 could include applying international criteria in Vietnamese higher education, developing scientific and critical thinking for general and STEM education, and promoting science communication.
Predictive Situation Awareness for Ebola Virus Disease using a Collective Intelligence Multi-Model Integration Platform: Bayes Cloud
Humanity has been facing a plethora of challenges associated with
infectious diseases, which kill more than 6 million people a year. Although
continuous efforts have been made to mitigate the potential damage from such
unfortunate events, it is unquestionable that many persistent
challenges have yet to be overcome. One related issue we particularly address here is
the assessment and prediction of such epidemics. In this field of study,
traditional and ad-hoc models frequently fail to provide proper predictive
situation awareness (PSAW), characterized by understanding the current
situations and predicting the future situations. Comprehensive PSAW for
infectious disease can support decision making and help to hinder disease
spread. In this paper, we develop a computing system platform focusing on
collective intelligence causal modeling, in order to support PSAW in the domain
of infectious disease. Analyses of global epidemics require integration of
multiple different data and models, which can originate from multiple
independent researchers. These models should be integrated to accurately assess
and predict the infectious disease from a holistic view. The system shall
provide three main functions: (1) collaborative causal modeling, (2) causal
model integration, and (3) causal model reasoning. These functions are
supported by subject-matter experts and artificial intelligence (AI), with
uncertainty treatment. Subject-matter experts, as collective intelligence,
develop causal models and integrate them as one joint causal model. The
integrated causal model shall be used to reason about: (1) the past, regarding
how the causal factors have occurred; (2) the present, regarding how the spread
is going now; and (3) the future, regarding how it will proceed. Finally, we
introduce one use case of predictive situation awareness for the Ebola virus
disease.