Distributional Measures of Semantic Distance: A Survey
The ability to mimic human notions of semantic distance has widespread
applications. Some measures rely only on raw text (distributional measures) and
some rely on knowledge sources such as WordNet. Although extensive studies have
been performed to compare WordNet-based measures with human judgment, the use
of distributional measures as proxies to estimate semantic distance has
received little attention. Even though they have traditionally performed poorly
when compared to WordNet-based measures, they lay claim to certain uniquely
attractive features, such as their applicability in resource-poor languages and
their ability to mimic both semantic similarity and semantic relatedness.
Therefore, this paper presents a detailed study of distributional measures.
Particular attention is paid to flesh out the strengths and limitations of both
WordNet-based and distributional measures, and how distributional measures of
distance can be brought more in line with human notions of semantic distance.
We conclude with a brief discussion of recent work on hybrid measures.
Two knowledge-based methods for High-Performance Sense Distribution Learning
Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at http://trainomatic.org
Global disease monitoring and forecasting with Wikipedia
Infectious disease is a leading threat to public health, economic stability,
and other key social structures. Efforts to mitigate these impacts depend on
accurate and timely monitoring to measure the risk and progress of disease.
Traditional, biologically-focused monitoring techniques are accurate but costly
and slow; in response, new techniques based on social internet data such as
social media and search queries are emerging. These efforts are promising, but
important challenges in the areas of scientific peer review, breadth of
diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs
from the online encyclopedia Wikipedia. Using linear models, language as a
proxy for location, and a systematic yet simple article selection procedure, we
tested 14 location-disease combinations and demonstrate that these data
feasibly support an approach that overcomes these challenges. Specifically, our
proof-of-concept yields models with $r^2$ up to 0.92, forecasting value up to
the 28 days tested, and several pairs of models similar enough to suggest that
transferring models from one location to another without re-training is
feasible.
Based on these preliminary results, we close with a research agenda designed
to overcome these challenges and produce a disease monitoring and forecasting
system that is significantly more effective, robust, and globally comprehensive
than the current state of the art.
Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein
and adjust novelty claims accordingly; revise title; various revisions for
clarit
Insights from Machine-Learned Diet Success Prediction
To support people trying to lose weight and stay healthy, more and more
fitness apps have sprung up that include the ability to track both calorie
intake and expenditure. Users of such apps are part of a wider ``quantified self''
movement, and many opt in to publicly share their logged data. In this paper, we
use public food diaries of more than 4,000 long-term active MyFitnessPal users
to study the characteristics of an (un)successful diet. Concretely, we train a
machine learning model to predict repeatedly being over or under self-set daily
calorie goals and then look at which features contribute to the model's
prediction. Our findings include both expected results, such as the token
``mcdonalds'' or the category ``dessert'' being indicative of being over the
calorie goal, and less obvious ones, such as the difference between pork
and poultry concerning dieting success, or the use of the ``quick added
calories'' functionality being indicative of overshooting calorie-wise. This
study also hints at the feasibility of using such data for more in-depth data
mining, e.g., looking at the interaction between consumed foods such as mixing
protein- and carbohydrate-rich foods. To the best of our knowledge, this is the
first systematic study of public food diaries.
Comment: Preprint of an article appearing at the Pacific Symposium on
Biocomputing (PSB) 2016 in the Social Media Mining for Public Health
Monitoring and Surveillance trac
Strategic I/O Psychology and the Role of Utility Analysis Models
In the 1990s, the significance of human capital in organizations has been increasing, and measurement issues in human resource management have achieved significant prominence. Yet, I/O psychology research on utility analysis and measurement has actually declined. In this chapter we propose a decision-based framework to review developments in utility analysis research since 1991, and show that through the lens of this framework there are many fertile avenues for research. We then show that both I/O psychology and strategic HRM research and practice can be enhanced by greater collaboration and integration, particularly regarding the link between human capital and organizational success. We present an integrative framework as the basis for that integration, and illustrate its implications for future research.
Development and evaluation of a web-based learning system based on learning object design and generative learning to improve higher-order thinking skills and learning
This research aims to design, develop and evaluate the effectiveness of a Web-based learning system prototype called the Generative Object Oriented Design (GOOD) learning system. Results from the preliminary study showed that most of the students were at the level of lower-order thinking skills (LOTS) rather than higher-order thinking skills (HOTS) based on Bloom’s Taxonomy. In response to this concern, the GOOD learning system was designed and developed based on learning object design and generative learning to improve HOTS and learning. A conceptual model design of the GOOD learning system, called the Generative Learning Object Organizer and Thinking Tasks (GLOOTT) model, has been proposed from the theoretical framework of this research. The topic selected for this research was Computer System (CS), which focused on the hardware concepts from the first-year Diploma of Computer Science subjects. The GOOD learning system acts as a mindtool to improve HOTS and learning in CS. A pre-experimental research design of one group pretest and posttest was used in this research. The samples of this research were 30 students and 12 lecturers. Data were collected from the pretest, posttest, portfolio, interview and Web-based learning system evaluation form. The paired-samples t test was used to analyze the achievement of the pretest and posttest, and the result showed that there was a significant difference between the mean scores of pretest and posttest at the significance level α = 0.05 (p = 0.000). In addition, the paired-samples t test analysis of the cognitive operations from Bloom’s Taxonomy showed that there was a significant difference for each of the cognitive operations of the students before and after using the GOOD learning system. Results from the study showed improvement of HOTS and learning among the students. In addition, analysis of the portfolio showed that the students engaged HOTS during the use of the system.
Most of the students and lecturers gave positive comments about the effectiveness of the system in improving HOTS and learning in CS. From the findings in this research, the GOOD learning system has the potential to improve students’ HOTS and learning.