9 research outputs found

    Low Dimensional Relevance Coding for Personalized Tag Recommendation in Image Tagging Applications

    This paper presents an image coding approach for tag recommendation based on feature clustering and weighted coding. Existing tag recommendation approaches make decisions based on the correlation between image features and their annotated tags. The descriptive features of an image sample define its content and are correlated with database features to recommend tags. The feature dimensionality and representation strongly affect recommendation performance. A recent tag recommendation method used CNN-based visual features and recommended tags based on a weight factor. High feature dimensionality and isolated weight allocation limit the performance of that system. This paper presents a new weight allocation and feature clustering method for tag recommendation, together with an integral coding approach for weighted image-tag pairs to improve recommendation accuracy. The performance of the proposed system is evaluated on a Flickr dataset in terms of retrieval and recommendation accuracy.
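
    The abstract describes coding image features against feature clusters and weighting image-tag relations. As a hedged illustration only, and not the authors' integral coding method, the sketch below assumes that feature vectors have already been extracted (e.g., by a CNN) and that cluster centroids are precomputed (e.g., by k-means). The names soft_code, cosine, recommend_tags and the parameter beta are hypothetical. Each image is coded as a low-dimensional vector of soft-assignment weights (one per cluster), and candidate tags are scored by similarity-weighted voting over already-tagged images.

```python
import math
from collections import defaultdict

def soft_code(feature, centroids, beta=1.0):
    """Encode a feature vector as soft-assignment weights over cluster centroids.

    The code has one weight per centroid and sums to 1. Distances are shifted
    by their minimum so that exp() cannot underflow to all zeros.
    """
    dists = [sum((f - c) ** 2 for f, c in zip(feature, centroid)) for centroid in centroids]
    d_min = min(dists)
    weights = [math.exp(-beta * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend_tags(query_feature, database, centroids, top_n=3):
    """Rank tags for a query image.

    database: list of (feature_vector, tags) pairs for already-tagged images.
    Each tag accumulates the cosine similarity between the query's code and
    the code of every database image annotated with that tag.
    """
    query_code = soft_code(query_feature, centroids)
    scores = defaultdict(float)
    for feature, tags in database:
        sim = cosine(query_code, soft_code(feature, centroids))
        for tag in tags:
            scores[tag] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]

# Hypothetical 2-D features and two precomputed centroids.
centroids = [[0.0, 0.0], [10.0, 10.0]]
database = [([0.0, 1.0], ["beach", "sea"]), ([1.0, 0.0], ["sea"]), ([10.0, 9.0], ["city"])]
print(recommend_tags([0.5, 0.5], database, centroids, top_n=2))  # ['sea', 'beach']
```

    Coding against a small set of centroids keeps each image's representation at k dimensions regardless of the original feature size, which is the kind of dimensionality reduction the abstract motivates.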

    Rating consistency is consistently underrated: An exploratory analysis of movie-tag rating inconsistency

    Publisher Copyright: © 2022 ACM. Content-based and hybrid recommender systems rely on item-tag ratings to make recommendations. An example of an item-tag rating is the degree to which the tag "comedy" applies to the movie "Back to the Future (1985)". Ratings are often generated by human annotators, who can be inconsistent with one another. However, many recommender systems take item-tag ratings at face value, assuming them all to be equally valid. In this paper, we investigate the inconsistency of item-tag ratings, together with contextual factors that could affect consistency, in the movie domain. We conducted semi-structured interviews to identify potential reasons for rating inconsistency. Next, we used these reasons to design a survey, which we ran on Amazon Mechanical Turk. We collected 6,070 ratings from 665 annotators across 142 movies and 80 tags. Our analysis shows that ∼45% of ratings are inconsistent with the mode rating for a given movie-tag pair. We found that the single most important factor for rating inconsistency is the annotator's perceived ease of rating, suggesting that annotators are at least tacitly aware of the quality of their own ratings. We also found that subjective tags (e.g., "funny", "boring") are more inconsistent than objective tags (e.g., "robots", "aliens"), and are associated with lower tag familiarity and lower perceived ease of rating. Peer reviewed.
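
    The inconsistency measure reported above (the share of ratings that differ from the mode rating of their movie-tag pair) can be computed as in the following minimal sketch. The input format, the 1-5 rating scale, the ratings themselves, and the tie-breaking rule for pairs with several modes are assumptions for illustration; the abstract does not say how ties are handled. The function name inconsistency_rate is hypothetical.

```python
from collections import Counter, defaultdict

def inconsistency_rate(ratings):
    """Fraction of ratings that differ from the mode rating of their movie-tag pair.

    ratings: iterable of (movie, tag, rating) triples.
    Ties between modes are broken by Counter.most_common, which keeps the
    value first encountered for that pair (an assumption, not the paper's rule).
    """
    by_pair = defaultdict(list)
    for movie, tag, rating in ratings:
        by_pair[(movie, tag)].append(rating)

    total = inconsistent = 0
    for values in by_pair.values():
        mode, _ = Counter(values).most_common(1)[0]
        inconsistent += sum(1 for v in values if v != mode)
        total += len(values)
    return inconsistent / total if total else 0.0

# Hypothetical ratings on an assumed 1-5 scale.
ratings = [
    ("Back to the Future (1985)", "comedy", 4),
    ("Back to the Future (1985)", "comedy", 4),
    ("Back to the Future (1985)", "comedy", 2),
    ("Back to the Future (1985)", "robots", 1),
    ("Back to the Future (1985)", "robots", 1),
]
print(inconsistency_rate(ratings))  # 0.2
```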

    Stop-words in keyphrase extraction problem

    The keyword extraction problem is one of the most significant tasks in information retrieval. High-quality keyword extraction significantly influences progress in downstream information retrieval subtasks: classification and clustering, data mining, knowledge extraction and representation, etc. The research community has established a general framework for keyphrase extraction; however, some possible design decisions remain outside this paradigm. In this paper, the authors survey interdisciplinary methods applicable to automatically populating stop lists. The chosen method belongs to the class of experiential models. The research procedure based on this method improves the quality of keyphrase extraction at the candidate keyphrase building stage. Several methods for automatically populating stop lists are also proposed. One is based on principles of lexical statistics, and the results of applying it to this task point to the non-Gaussian nature of text corpora. The second, which populates the stop lists using the Inspec training collection, improves quality considerably.
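
    The abstract does not give the paper's exact procedures. As a hedged sketch of the general idea of populating a stop list from a labelled training collection (such as Inspec) and applying it at the candidate-building stage, the code below assumes a simple, hypothetical rule: words that occur in at least min_df training documents but never inside a gold keyphrase become stop words. Candidates are then the maximal runs of non-stop words between stop words and punctuation, in the style of RAKE. All function names and the toy data are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_stop_list(train_docs, train_keyphrases, min_df=2):
    """Hypothetical heuristic: a word is a stop word if it occurs in at least
    min_df training documents but never inside any gold keyphrase."""
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(tokenize(doc)))
    keyphrase_words = {w for kps in train_keyphrases for kp in kps for w in tokenize(kp)}
    return {w for w, n in doc_freq.items() if n >= min_df and w not in keyphrase_words}

def candidate_keyphrases(text, stop_words):
    """Split text at punctuation and stop words; each maximal run of
    remaining words becomes a candidate keyphrase."""
    candidates = []
    for fragment in re.split(r"[.,;:!?()]", text):
        current = []
        for word in tokenize(fragment):
            if word in stop_words:
                if current:
                    candidates.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(" ".join(current))
    return candidates

# Toy training collection with gold keyphrases (hypothetical).
train_docs = ["The neural network is trained.", "The model is a neural network."]
train_keyphrases = [["neural network"], ["neural network"]]
stop = build_stop_list(train_docs, train_keyphrases)
print(sorted(stop))                                                 # ['is', 'the']
print(candidate_keyphrases("The neural network is robust.", stop))  # ['neural network', 'robust']
```

    A better-populated stop list splits text into shorter, cleaner candidates, which is why stop-list quality affects the candidate keyphrase building stage.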

    A tree based keyphrase extraction technique for academic literature

    Automatic keyphrase extraction techniques aim to extract high-quality keyphrases that summarize a document at a higher level. Some existing techniques are domain-specific and require application domain knowledge, some rely on higher-order statistical methods and are computationally expensive, and some require large amounts of training data, which are scarce in many applications. To overcome these issues, this thesis proposes a new unsupervised automatic keyphrase extraction technique, named TeKET (Tree-based Keyphrase Extraction Technique), which is domain-independent, employs limited statistical knowledge, and requires no training data. The proposed technique also introduces a new variant of the binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. Depending on the candidate keyphrases, the KePhEx tree is expanded, shrunk, or maintained. In addition, a measure called the Cohesiveness Index (CI) is derived, which denotes the degree of cohesiveness of a given node with respect to the root. CI is used to extract final keyphrases from the resulting tree in a flexible manner and, alongside Term Frequency, to rank keyphrases. The effectiveness of the proposed technique is evaluated experimentally on the SemEval-2010 benchmark corpus, comprising 244 training and test articles, and compared with other relevant unsupervised techniques, including representatives of both statistical approaches (such as Term Frequency-Inverse Document Frequency and YAKE) and graph-based approaches (PositionRank, CollabRank (SingleRank), TopicRank, and MultipartiteRank). Precision, recall, and F1 score are used as evaluation metrics. The results show that the proposed technique outperforms the other similar techniques in terms of precision, recall, and F1 score.
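
    The three metrics named above can be computed per document as follows. This minimal sketch uses exact, case-insensitive matching of keyphrase strings; official benchmark scoring may apply further normalization (such as stemming), so values from this function are not directly comparable to published results. The function name and the example keyphrases are hypothetical.

```python
def precision_recall_f1(extracted, gold):
    """Exact-match precision, recall and F1 between extracted and gold keyphrases.

    Matching is case-insensitive on whitespace-stripped strings; duplicates
    are collapsed by converting both lists to sets.
    """
    pred = {p.strip().lower() for p in extracted}
    ref = {g.strip().lower() for g in gold}
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1(
    ["keyphrase extraction", "binary tree", "term frequency", "corpus"],
    ["Keyphrase extraction", "binary tree", "cohesiveness index"],
)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```

    Here 2 of 4 extracted phrases match (precision 1/2) and 2 of 3 gold phrases are found (recall 2/3), giving F1 = 4/7.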

    Automatic keyphrase extraction and ontology mining for content-based tag recommendation

    Collaborative tagging offers the Web a potential means of organizing and sharing information and of enhancing the capabilities of existing search engines. However, because automatic methods for generating tags and supporting the tagging activity are lacking, many resources on the Web have little tag information, and recommending appropriate tags is both an open issue and an exciting challenge. This paper approaches the problem by combining a set of techniques and tools (using tags, domain ontologies, and keyphrase extraction methods) to generate tags automatically. The proposed approach is implemented in the PIRATES (Personalized Intelligent tag Recommender and Annotator TEStbed) framework, a prototype system for personalized content retrieval, annotation, and classification. A case study application is developed using a domain ontology for software engineering.
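
    One simple way to combine keyphrase extraction with a domain ontology is to keep only extracted keyphrases that match ontology concepts and to add their broader concepts as extra tags. The sketch below illustrates that idea under stated assumptions: it is not the PIRATES implementation, the ontology is reduced to a hypothetical child-to-parent dictionary, and the function name, max_depth parameter, and concept labels are invented for illustration.

```python
def recommend_from_ontology(keyphrases, parent_of, max_depth=2):
    """Recommend tags by matching keyphrases to ontology concepts.

    keyphrases: candidate keyphrases extracted from a resource.
    parent_of: dict mapping a concept label to its parent label
               (a hypothetical, minimal stand-in for a domain ontology).
    Each matched concept is returned, followed by up to max_depth ancestors;
    keyphrases that match no concept are dropped.
    """
    concepts = set(parent_of) | set(parent_of.values())
    tags = []
    for phrase in keyphrases:
        node = phrase.strip().lower()
        if node not in concepts:
            continue
        depth = 0
        while node is not None and depth <= max_depth:
            if node not in tags:
                tags.append(node)
            node = parent_of.get(node)  # None once the root is reached
            depth += 1
    return tags

# Hypothetical fragment of a software engineering ontology.
parent_of = {
    "unit testing": "software testing",
    "software testing": "software engineering",
    "refactoring": "software maintenance",
    "software maintenance": "software engineering",
}
print(recommend_from_ontology(["Unit testing", "coffee", "refactoring"], parent_of))
# ['unit testing', 'software testing', 'software engineering', 'refactoring', 'software maintenance']
```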

    Real-world, high-stakes deceptive speech: Theoretical validation and an examination of its potential for detection automation

    Deception research, and the theories developed from it, have relied heavily on laboratory experiments in controlled environments, using American college students participating in mock scenarios. The goal of this study was to validate previous deception research in a real-world, high-stakes environment. An additional focus was the development of procedures to process data (e.g., video or audio recordings) from real-world environments so that behavioral measures can be extracted and analyzed. This study used speech cues and constructs previously confirmed to be associated with deception in an attempt to validate a leading deception theory, Interpersonal Deception Theory (IDT). Several measures and constructs used and validated in existing research were explored and validated in this study. The data analyzed came from an adjudicated real-world, high-stakes criminal case in which the subject was sentenced in federal court to 470 years in prison for creating child pornography, rape, sexual exploitation of children, child sexual assault, and kidnapping; the crime spree spanned five years and four states. The results validated IDT, although results for individual measures and their constructs were mixed. The exploratory nature of the study, the volume of data, and the numerous methods of analysis generated many possibilities for future research.

    Enhancing the interactivity of a clinical decision support system by using knowledge engineering and natural language processing

    Mental illness is a serious health problem that affects many people. Clinical Decision Support Systems (CDSS) are increasingly used for diagnosis, and it is important to improve their reliability and performance. Missing a potential clue or making a wrong diagnosis can have a detrimental effect on the patient's quality of life and could lead to a fatal outcome. The context of this research is the Galatean Risk and Safety Tool (GRiST), a mental-health risk assessment system. Previous research has shown that the success of a CDSS depends on its ease of use, reliability, and interactivity. This research addresses these concerns for GRiST by applying data mining techniques. Both clinical narratives and numerical data have been analysed for this purpose. Clinical narratives have been processed with natural language processing (NLP) technology to extract knowledge from them. SNOMED-CT was used as a reference ontology, and the performance of different extraction algorithms has been compared. A new Ensemble Concept Mining (ECM) method is proposed, which may eliminate the need for domain-specific phrase annotation. Word embeddings have been used to filter phrases semantically and to build a semantic representation of each GRiST ontology node. The Chi-square and FP-growth methods have been used to find relationships between GRiST ontology nodes. Interesting patterns have been found that could be used to provide real-time feedback to clinicians. Information gain has been used effectively to explain the differences between the clinicians and the consensus risk. A new risk management strategy has been explored by analysing repeat assessments. Several novel methods are proposed to perform automatic background analysis of patient data and improve the interactivity and reliability of GRiST and similar systems.
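
    To illustrate how a Chi-square test can indicate whether two ontology nodes are related, the sketch below computes Pearson's chi-square statistic (without continuity correction) for a 2x2 contingency table that counts assessments in which each of two nodes is or is not flagged. Framing node co-occurrence as a 2x2 table is an assumption about how such an analysis might be set up, not a description of the thesis's procedure; the function name and counts are hypothetical. With one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table:

                       node Y flagged   node Y not flagged
    node X flagged           a                  b
    node X not flagged       c                  d

    Requires every row and column total to be non-zero.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical counts of assessments for two ontology nodes.
print(round(chi_square_2x2(10, 20, 30, 40), 4))  # 0.7937
```

    Here the statistic (50/63) is well below 3.84, so these hypothetical counts would not indicate a significant association between the two nodes.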