346 research outputs found
Adopted topic modeling for business process and software component conformity checking
Business processes and software components, especially class diagrams, have a firm connection. Considering software components support the business process in providing an excellent product and service. Besides, business process changes affect on software component design. One of them usually appears on the label or name of the software component or business process. Sometimes, a related business process and software component appears in the different label but the same meaning rather than using the same label. This situation is problematic when there are many changes to be made, in which the software component's modifying process becomes quite long. Therefore, the software maintainers should obtain an efficient procedure to shorten the modifying process. One solution is by using conformity checking, which helps the software maintainers know which software component is related to a specific business process. This paper compared two leading topic modeling techniques, namely probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), to determine which one has a better performancefor process traceability
Learning Behavioural Context
The original publication is available at www.springerlink.co
Comprehensive Review of Opinion Summarization
The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. People have introduced various techniques and paradigms to solving this special task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that are left to be addressed as this will help set the trend for future research in this area.unpublishednot peer reviewe
Identification-method research for open-source software ecosystems
In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source-software projects and developers in the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information in GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we used our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems usually contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem, and we also found that interactive information has greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework
A Novel Semantic Statistical Model for Automatic Image Annotation Using the Relationship between the Regions Based on Multi-Criteria Decision Making
Automatic image annotation has emerged as an important research topic due to the existence of the semantic gap and in addition to its potential application on image retrieval and management. In this paper we present an approach which combines regional contexts and visual topics to automatic image annotation. Regional contexts model the relationship between the regions, whereas visual topics provide the global distribution of topics over an image. Conventional image annotation methods neglected the relationship between the regions in an image, while these regions are exactly explanation of the image semantics, therefore considering the relationship between them are helpful to annotate the images. The proposed model extracts regional contexts and visual topics from the image, and incorporates them by MCDM (Multi Criteria Decision Making) approach based on TOPSIS (Technique for Order Preference by Similarity to the Ideal Solution) method. Regional contexts and visual topics are learned by PLSA (Probability Latent Semantic Analysis) from the training data. The experiments on 5k Corel images show that integrating these two kinds of information is beneficial to image annotation.DOI:http://dx.doi.org/10.11591/ijece.v4i1.459
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
Semantic multimedia analysis using knowledge and context
PhDThe difficulty of semantic multimedia analysis can be attributed to the
extended diversity in form and appearance exhibited by the majority of
semantic concepts and the difficulty to express them using a finite number
of patterns. In meeting this challenge there has been a scientific debate
on whether the problem should be addressed from the perspective of using
overwhelming amounts of training data to capture all possible instantiations
of a concept, or from the perspective of using explicit knowledge about
the concepts’ relations to infer their presence. In this thesis we address
three problems of pattern recognition and propose solutions that combine
the knowledge extracted implicitly from training data with the knowledge
provided explicitly in structured form. First, we propose a BNs modeling
approach that defines a conceptual space where both domain related evi-
dence and evidence derived from content analysis can be jointly considered
to support or disprove a hypothesis. The use of this space leads to sig-
nificant gains in performance compared to analysis methods that can not
handle combined knowledge. Then, we present an unsupervised method
that exploits the collective nature of social media to automatically obtain
large amounts of annotated image regions. By proving that the quality of
the obtained samples can be almost as good as manually annotated images
when working with large datasets, we significantly contribute towards scal-
able object detection. Finally, we introduce a method that treats images,
visual features and tags as the three observable variables of an aspect model
and extracts a set of latent topics that incorporates the semantics of both
visual and tag information space. By showing that the cross-modal depen-
dencies of tagged images can be exploited to increase the semantic capacity
of the resulting space, we advocate the use of all existing information facets
in the semantic analysis of social media
A Topic Modeling approach for Code Clone Detection
In this thesis work, the potential benefits of Latent Dirichlet Allocation (LDA) as a technique for code clone detection has been described. The objective is to propose a language-independent, effective, and scalable approach for identifying similar code fragments in relatively large software systems. The main assumption is that the latent topic structure of software artifacts gives an indication of the presence of code clones. It can be hypothesized that artifacts with similar topic distributions contain duplicated code fragments and to prove this hypothesis, an experimental investigation using multiple datasets from various application domains were conducted. In addition, CloneTM, an LDA-based working prototype for code clone detection was developed. Results showed that, if calibrated properly, topic modeling can deliver a satisfactory performance in capturing different types of code clones, showing particularity good performance in detecting Type III clones. CloneTM also achieved levels of performance comparable to already existing practical tools that adopt different clone detection strategies
Recommender Systems
The ongoing rapid expansion of the Internet greatly increases the necessity
of effective recommender systems for filtering the abundant information.
Extensive research for recommender systems is conducted by a broad range of
communities including social and computer scientists, physicists, and
interdisciplinary researchers. Despite substantial theoretical and practical
achievements, unification and comparison of different approaches are lacking,
which impedes further advances. In this article, we review recent developments
in recommender systems and discuss the major challenges. We compare and
evaluate available algorithms and examine their roles in the future
developments. In addition to algorithms, physical aspects are described to
illustrate macroscopic behavior of recommender systems. Potential impacts and
future directions are discussed. We emphasize that recommendation has a great
scientific depth and combines diverse research fields which makes it of
interests for physicists as well as interdisciplinary researchers.Comment: 97 pages, 20 figures (To appear in Physics Reports
Learning Contextualized Semantics from Co-occurring Terms via a Siamese Architecture
One of the biggest challenges in Multimedia information retrieval and
understanding is to bridge the semantic gap by properly modeling concept
semantics in context. The presence of out of vocabulary (OOV) concepts
exacerbates this difficulty. To address the semantic gap issues, we formulate a
problem on learning contextualized semantics from descriptive terms and propose
a novel Siamese architecture to model the contextualized semantics from
descriptive terms. By means of pattern aggregation and probabilistic topic
models, our Siamese architecture captures contextualized semantics from the
co-occurring descriptive terms via unsupervised learning, which leads to a
concept embedding space of the terms in context. Furthermore, the co-occurring
OOV concepts can be easily represented in the learnt concept embedding space.
The main properties of the concept embedding space are demonstrated via
visualization. Using various settings in semantic priming, we have carried out
a thorough evaluation by comparing our approach to a number of state-of-the-art
methods on six annotation corpora in different domains, i.e., MagTag5K, CAL500
and Million Song Dataset in the music domain as well as Corel5K, LabelMe and
SUNDatabase in the image domain. Experimental results on semantic priming
suggest that our approach outperforms those state-of-the-art methods
considerably in various aspects
- …