Constructing Cooking Ontology for Live Streams
We build cooking domain knowledge using an ontology schema that reflects natural language processing, and we enhance the ontology instances with semantic queries. Our research helps audiences better understand live streams, especially when they have just switched to a show. The practical contribution of our research is to use the cooking ontology to map clips of cooking live stream video to recipe instructions. The architecture of our study comprises three parts: ontology construction, ontology enhancement, and mapping cooking video to the cooking ontology. Our preliminary evaluations use three hierarchies (nodes, ordered pairs, and 3-tuples) to assess (1) ontology enhancement performance in the first experiment and (2) the accuracy ratio of the mapping between video clips and the cooking ontology in the second experiment. Our results indicate that ontology enhancement is effective and raises accuracy ratios when matching video clips to the cooking ontology.
THE IDENTIFICATION OF NOTEWORTHY HOTEL REVIEWS FOR HOTEL MANAGEMENT
The rapid emergence of user-generated content (UGC) inspires knowledge sharing among Internet users. A good example is the well-known travel site TripAdvisor.com, which enables users to share their experiences and express their opinions on attractions, accommodations, restaurants, etc. Travel-related UGC provides valuable information to users as well as to staff in the travel industry. In particular, identifying reviews that are noteworthy for hotel management is critical to the success of hotels in the competitive travel industry. We engaged two hotel managers to examine reviews of hotels in Taiwan on TripAdvisor.com and found that noteworthy reviews can be characterized by their content features, sentiments, and review qualities. Through experiments using TripAdvisor.com data, we find that all three types of features are important in identifying noteworthy hotel reviews. Specifically, content features have the most impact, followed by sentiments and review qualities. Among the methods for representing content features, the LDA method achieves performance comparable to that of the TF-IDF method, with higher recall and far fewer features.
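As a rough illustration of the TF-IDF representation mentioned in the abstract above, the following is a minimal sketch. It assumes whitespace tokenization and the plain `tf * log(N / df)` weighting; the abstract does not say which TF-IDF variant or preprocessing the authors used, and the function name `tf_idf` and the sample reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: weight = (term count / document length) * log(N / df),
    where N is the number of documents and df is the number of
    documents containing the term. Other variants exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Hypothetical toy reviews, not data from the study.
reviews = ["clean room", "dirty room"]
vectors = tf_idf(reviews)
```

In this toy case "room" appears in every review, so its weight is zero, while "clean" and "dirty" receive positive weights. A topic model such as LDA would instead describe each review with a small number of topic proportions, which is consistent with the abstract's remark about needing far fewer features.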
The Research on the Detection of Noteworthy Symptom Descriptions
Advances in mobile devices and communication technologies enable patients to communicate with their doctors more conveniently. We have developed an app that allows patients to record their symptoms and submit them to their doctors. Physicians can keep track of patients’ conditions by reading these self-report messages. Nevertheless, physicians are usually busy and may be overwhelmed by the large number of incoming messages. As a result, critical messages may not receive immediate attention, and patient care is compromised. It is therefore imperative to identify the messages that require physicians’ attention, which we call noteworthy messages. In this research, we propose an approach that applies text-mining technologies to identify the medical symptoms conveyed in the messages and their associated sentiment orientation, as well as other factors. Noteworthy messages are subsequently characterized by symptom sentiment and symptom change features. We then construct a prediction model to identify messages that are noteworthy to physicians. Our experiments, using data collected from a teaching hospital in Taiwan, show that the different features have different degrees of impact on the performance of the prediction model and that the proposed approach can effectively identify noteworthy messages.
PREDICTING COMPANY REVENUE TREND USING FINANCIAL NEWS
Text data analysis has found its way into many applications, and our study focuses on the financial field. Previous studies on financial indicator prediction are mostly based on econometric models. In recent years, with the advance of text mining techniques, more and more studies employ financial news as the data source for analysis. Most studies, however, aim to predict stock prices, identify stock market trends, or detect company bankruptcy or fraud. We observe that a company’s revenue, which can reflect the company’s cash flow and market share, is an important financial indicator. In our study, we identify a few features that potentially affect a company’s revenue and propose an approach to deriving feature values from financial news data. Specifically, we develop a lexicon-based method that involves the automatic expansion of an existing financial sentiment dictionary and the aggregation of sentiment values. Preliminary experimental results show that we are able to predict the revenue trend from the news articles of the last quarter with an accuracy of up to 80%.
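The idea of aggregating lexicon-based sentiment values over a quarter's news can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the lexicon `TOY_LEXICON`, the function names, the simple summation, and the sign-based trend rule are all hypothetical, and the dictionary expansion step described in the abstract is not modeled.

```python
# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
TOY_LEXICON = {
    "growth": 1,
    "profit": 1,
    "record": 1,
    "loss": -1,
    "decline": -1,
    "lawsuit": -1,
}


def article_sentiment(text, lexicon):
    """Sum the polarities of all lexicon words found in one article."""
    tokens = (tok.strip(".,;:!?") for tok in text.lower().split())
    return sum(lexicon.get(tok, 0) for tok in tokens)


def quarter_trend(articles, lexicon):
    """Aggregate article scores over a quarter and map the sign to a label.

    Assumption: a plain sum with a sign threshold; a real model would
    feed such aggregated values into a trained classifier.
    """
    total = sum(article_sentiment(article, lexicon) for article in articles)
    if total > 0:
        return "up"
    if total < 0:
        return "down"
    return "flat"


# Hypothetical headlines, not data from the study.
news = ["Record profit and strong growth.", "A minor lawsuit was filed."]
trend = quarter_trend(news, TOY_LEXICON)
```

Here the first headline scores +3 and the second scores -1, so the aggregated value is positive and the sketch labels the quarter "up".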
Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples
Funder: NCI U24CA211006
Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.
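The comparison described in the abstract above (shared versus private calls, and the share of private calls with low VAF) can be illustrated with set operations. This is a minimal sketch under stated assumptions: mutations are keyed as hypothetical `(chromosome, position, ref, alt)` tuples, the overlap fraction is taken over the union of both call sets, and the function names are invented for illustration. The abstract does not specify how the study defined or computed these quantities.

```python
def overlap_summary(wes_calls, wgs_calls):
    """Count shared and private mutation calls between two call sets."""
    wes, wgs = set(wes_calls), set(wgs_calls)
    shared = wes & wgs
    union = wes | wgs
    return {
        "shared": len(shared),
        "private_wes": len(wes - wgs),
        "private_wgs": len(wgs - wes),
        # Assumption: overlap measured relative to the union of both sets.
        "overlap_fraction": len(shared) / len(union) if union else 0.0,
    }


def low_vaf_fraction(private_calls, vaf, threshold=0.15):
    """Fraction of private calls whose variant allele fraction is below threshold."""
    private_calls = list(private_calls)
    if not private_calls:
        return 0.0
    low = [call for call in private_calls if vaf[call] < threshold]
    return len(low) / len(private_calls)


# Hypothetical toy calls, not data from the study.
wes = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "T")]
wgs = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr3", 75, "T", "G")]
vaf = {("chr2", 50, "C", "T"): 0.08, ("chr3", 75, "T", "G"): 0.30}

summary = overlap_summary(wes, wgs)
private_wes = set(wes) - set(wgs)
low_fraction = low_vaf_fraction(private_wes, vaf)
```

In this toy example two of four distinct calls are shared, and the single WES-private call falls below the 15% VAF threshold.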
Forecasting Company Revenue Trend Using Financial News
Text mining has emerged as an important suite of techniques in recent years and has found its way into many applications. In finance, most recent text mining-based studies focus on predicting the stock market trend or detecting company bankruptcy or fraud. Other important economic indicators of companies, such as revenue, are seldom addressed. Yet these indicators can be quite important, as they reflect the company’s cash flow and market share. In this paper, we adopt a lexicon-based approach that first builds several lexicons of different types, including sources, entities, aspects, sentiments, and past times. Twelve sentiment features are identified as predictors of revenue trend, and a lexicon-based method for determining the sentiment of each feature is proposed. In addition, one more feature, computed using ARIMA from previous revenue data, is incorporated. Our experimental results, using news articles about seven major Taiwan-based PC manufacturing companies, demonstrate that both financial news articles and previous revenue data are important for accurately predicting revenue trend. The prediction model constructed using the proposed approach is able to predict revenue trend with an accuracy of more than 80%.
Interval-valued distributed preference relation and its application to group decision making.
As an important way to express the preference relation between alternatives, a distributed preference relation (DPR) can simultaneously represent the preferred, non-preferred, indifferent, and uncertain degrees of one alternative over another. A DPR, however, is unavailable in situations where a decision maker cannot provide the precise degrees of one alternative over another due to a lack of knowledge, experience, or data. To address this issue, we propose the interval-valued DPR (IDPR) and present its properties of validity and normalization. By constructing two optimization models, an IDPR matrix is transformed into a score matrix to facilitate the comparison between any two alternatives. The properties of the score matrix are analyzed. To guarantee the rationality of the comparisons between alternatives derived from the score matrix, the additive consistency of the score matrix is developed. On this basis, IDPR is applied to model and solve multiple criteria group decision making (MCGDM) problems. In particular, we analyze the relationship between the parameters for the consistency of the score matrix associated with each decision maker and those for the consistency of the score matrix associated with the group of decision makers. A manager selection problem is investigated to demonstrate the application of IDPRs to MCGDM problems.
On Classifying Discussion Threads Using Travel Information Goal-Oriented Model
We study how to recommend discussion threads in the tourism domain to meet visitors’ travel information needs. This research-in-progress paper reports the first stage of our research, namely classifying discussion threads into travel goals. We propose an information goal-oriented model consisting of four goals (Initiation, Attraction, Accommodation, and Route planning) that can be characterized using nine features. Seven of these nine features can be quantified based on lexicons, and the other two can be measured using named entity recognition. Three of the lexicons can be further enhanced using WordNet. We conduct an experiment to evaluate the impact of these features on goal classification with a data set collected from TripAdvisor.com, the world's largest travel website. The experimental results show that our approach generally performs comparably to or better than classification using purely lexical features, namely TF-IDF.
Process of generating a solution to the MCGDM problem with IDPRs.
Score intervals of the group IDPRs between neighboring candidates in the manager selection problem.