76 research outputs found
Constructing Cooking Ontology for Live Streams
We build cooking domain knowledge using an ontology schema that reflects natural language processing and enhances ontology instances with semantic queries. Our research helps audiences better understand live streams, especially when they have just switched to a show. The practical contribution of our research is to use the cooking ontology to map clips of cooking live stream video to recipe instructions. The architecture of our study comprises three parts: ontology construction, ontology enhancement, and mapping cooking video to the cooking ontology. Our preliminary evaluations use three hierarchies (nodes, ordered pairs, and 3-tuples) to assess (1) ontology enhancement performance in the first experiment and (2) the accuracy ratio of the mapping between video clips and the cooking ontology in the second experiment. Our results indicate that ontology enhancement is effective and raises accuracy ratios when matching video clips to the cooking ontology.
On the Patent Claim Eligibility Prediction Using Text Mining Techniques
With the widespread use of computer software in recent decades, software patents have become controversial for the patent system. Of the many patentability requirements, patentable subject matter serves a gatekeeping function to prevent a patent from preempting future innovation. Software patents may easily fall into the gray area of abstract ideas, whose allowance may hinder future innovation. However, without a clear definition of abstract ideas, determining patent claim subject matter eligibility is a challenging task for examiners and applicants. In this research, to address software patent eligibility issues, we propose an effective model that determines patent claim eligibility using text-mining and machine learning techniques. Drawing upon guidelines issued by the USPTO, we identify 66 patent cases to design domain knowledge features, including abstractness features and distinguishable word features, as well as other textual features, to develop the claim eligibility prediction model. The experimental results show that our proposed model reaches an accuracy of more than 80%, and that domain knowledge features play a crucial role in our prediction model.
The Research on the Detection of Noteworthy Symptom Descriptions
The advance of mobile devices and communication technologies enables patients to communicate with their doctors more conveniently. We have developed an app that allows patients to record their symptoms and submit them to their doctors. Physicians can keep track of patients' conditions by looking at the self-report messages. Nevertheless, physicians are usually busy and may be overwhelmed by the large number of incoming messages. As a result, critical messages may not receive immediate attention, and patient care is compromised. It is imperative to identify the messages that require physicians' attention, called noteworthy messages. In this research, we propose an approach that applies text-mining technologies to identify medical symptoms conveyed in the messages and their associated sentiment orientation, as well as other factors. Noteworthy messages are subsequently characterized by symptom sentiment and symptom change features. We then construct a prediction model to identify messages that are noteworthy to physicians. Our experiments, using data collected from a teaching hospital in Taiwan, show that the different features have different degrees of impact on the performance of the prediction model, and that our proposed approach can effectively identify noteworthy messages.
THE IDENTIFICATION OF NOTEWORTHY HOTEL REVIEWS FOR HOTEL MANAGEMENT
The rapid emergence of user-generated content (UGC) inspires knowledge sharing among Internet users. A good example is the well-known travel site TripAdvisor.com, which enables users to share their experiences and express their opinions on attractions, accommodations, restaurants, etc. UGC about travel provides valuable information to users as well as to staff in the travel industry. In particular, identifying reviews that are noteworthy for hotel management is critical to the success of hotels in the competitive travel industry. We employed two hotel managers to examine reviews of Taiwan's hotels on TripAdvisor.com and found that noteworthy reviews can be characterized by their content features, sentiments, and review qualities. Through experiments using TripAdvisor.com data, we find that all three types of features are important in identifying noteworthy hotel reviews. Specifically, content features are shown to have the most impact, followed by sentiments and review qualities. Among the methods for representing content features, the LDA method achieves performance comparable to the TF-IDF method, with higher recall and far fewer features.
Combining Coauthorship Network and Content for Literature Recommendation
This paper studies literature recommendation approaches that use both content features and coauthorship relations of articles in literature databases. Most literature databases allow data access (via site subscription) without having to identify users, and thus task-focused recommendation is more appropriate in this context. Previous work mostly utilizes content and usage logs for making task-focused recommendations. More recent work has started to incorporate the coauthorship network for recommendation and found it beneficial when the specified articles preferred by authors are similar in their content. However, it was also found that recommendation based on content features achieves better performance under other circumstances. Therefore, in this work we propose to incorporate both content and the coauthorship network in making task-focused recommendations. Three hybrid methods, namely switching, proportional, and fusion, are developed and compared. Our experimental results show that, in general, the proposed hybrid approach achieves better performance than approaches that utilize only one source of knowledge. In particular, the fusion method tends to have higher recommendation accuracy for articles of higher ranks. In addition, the content-based approach is the most likely to recommend articles of low fidelity, whereas the coauthorship network-based approach is the least likely to do so.
PREDICTING COMPANY REVENUE TREND USING FINANCIAL NEWS
Text data analysis has found its way into many applications, and our study focuses on the financial field. Previous studies in financial indicator prediction are mostly based on econometric models. In recent years, with the advance of text mining techniques, more and more studies employ financial news as the data source for analysis. Most studies, however, aim to predict stock prices, identify stock market trends, or detect company bankruptcy or fraud. We observe that a company's revenue, which can indicate the company's cash flow and market share, is an important financial indicator. In our study, we identify a few features that potentially impact a company's revenue and further propose an approach to deriving feature values from financial news data. Specifically, we develop a lexicon-based method that involves the automatic expansion of an existing financial sentiment dictionary and the aggregation of sentiment values. Preliminary experimental results show that we are able to predict the revenue trend from the news articles of the last quarter with an accuracy of up to 80%.
High-Throughput Identification of Long-Range Regulatory Elements and Their Target Promoters in the Human Genome
Enhancer elements are essential for tissue-specific gene regulation during mammalian development. Although these regulatory elements are often distant from their target genes, they affect gene expression by recruiting transcription factors to specific promoter regions. Because of this long-range action, the annotation of enhancer element–target promoter pairs remains elusive. Here, we developed a novel analysis methodology that takes advantage of Hi-C data to comprehensively identify these interactions throughout the human genome. To do this, we used a geometric distribution-based model to identify DNA–DNA interaction hotspots that contact gene promoters with high confidence. We observed that these promoter-interacting hotspots significantly overlap with known enhancer-associated histone modifications and DNase I hypersensitive sites. Thus, we defined thousands of candidate enhancer elements by incorporating these features, and found that they have a significant propensity to be bound by p300, an enhancer binding transcription factor. Furthermore, we revealed that their target genes are significantly bound by RNA Polymerase II and demonstrate tissue-specific expression. Finally, we uncovered that these elements are generally found within 1 Mb of their targets, and often regulate multiple genes. In total, our study presents a novel high-throughput workflow for confident, genome-wide discovery of enhancer–target promoter pairs, which will significantly improve our understanding of these regulatory interactions.
On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs
This paper examines the capacity of LLMs to reason with knowledge graphs
using their internal knowledge graph, i.e., the knowledge graph they learned
during pre-training. Two research questions are formulated to investigate the
accuracy of LLMs in recalling information from pre-training knowledge graphs
and their ability to infer knowledge graph relations from context. To address
these questions, we employ LLMs to perform four distinct knowledge graph
reasoning tasks. Furthermore, we identify two types of hallucinations that may
occur during knowledge reasoning with LLMs: content and ontology hallucination.
Our experimental results demonstrate that LLMs can successfully tackle both
simple and complex knowledge graph reasoning tasks from their own memory, as
well as infer from input context. Comment: Presented at the Generative-IR Workshop during SIGIR 2023.
https://coda.io/@sigir/gen-i