Using webcrawling of publicly available websites to assess E-commerce relationships
We investigate e-commerce success factors and their impact on the success of commerce transactions between business companies. Many e-commerce success factors have been introduced in the scientific literature. Most of them focus on the quality of companies' websites and are evaluated with respect to companies' success in the business-to-consumer (B2C) environment, where consumers choose their preferred e-commerce websites based on success factors such as website content quality, website interaction, and website customization. In contrast to previous work, this research focuses on using existing e-commerce success factors to predict the success of business-to-business (B2B) e-commerce. The introduced methodology is based on identifying semantic textual patterns representing success factors on the websites of B2B companies. The predictive value of the identified success factors for B2B e-commerce is evaluated by regression modeling. As a result, it is shown that some B2C e-commerce success factors also enable the prediction of B2B e-commerce success while others do not. This contributes to the existing literature on e-commerce success factors. Further, these findings are valuable for the creation of B2B e-commerce websites.
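The regression-modeling step described above can be sketched as follows. This is a hypothetical illustration, not the paper's actual data or model: the factor names, sample size, and weights are invented, and ordinary least squares stands in for whatever regression variant the authors used.

```python
import numpy as np

# Hypothetical sketch: regress a B2B success measure on website-derived
# success-factor scores. All data here is synthetic.
rng = np.random.default_rng(0)

# One row per B2B company; columns are success-factor scores extracted
# from its website (e.g. content quality, interaction, customization).
factors = rng.random((50, 3))
true_weights = np.array([0.8, 0.0, 0.4])  # factor 2 carries no signal
success = factors @ true_weights + 0.05 * rng.standard_normal(50)

# Ordinary least squares with an intercept column. A near-zero estimated
# weight suggests the corresponding B2C factor does not predict B2B success.
X = np.column_stack([np.ones(len(factors)), factors])
coef, *_ = np.linalg.lstsq(X, success, rcond=None)
intercept, weights = coef[0], coef[1:]
print(np.round(weights, 2))
```

In this framing, "some B2C success factors predict B2B success while others do not" corresponds to some fitted weights being clearly non-zero and others negligible.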
Web Video in Numbers - An Analysis of Web-Video Metadata
Web video is often used as a source of data in various fields of study. While
specialized subsets of web video, mainly earmarked for dedicated purposes, are
often analyzed in detail, there is little information available about the
properties of web video as a whole. In this paper we present insights gained
from the analysis of the metadata associated with more than 120 million videos
harvested from two popular web video platforms, Vimeo and YouTube, in 2016 and
compare their properties with the ones found in commonly used video
collections. This comparison has revealed that existing collections do not (or
no longer) properly reflect the properties of web video "in the wild".
Comment: Dataset available from http://download-dbis.dmi.unibas.ch/WWIN
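The kind of comparison the paper describes — checking whether a curated collection reflects web video "in the wild" — can be illustrated with a toy metadata summary. The duration values below are invented; the paper's actual analysis covers over 120 million videos and many more metadata properties.

```python
import statistics

# Invented example durations (seconds): a curated collection vs. a web crawl.
collection_durations = [600, 720, 540, 660, 580]
web_durations = [45, 120, 30, 210, 90, 60, 150]

def summarize(durations):
    # Median and range are robust to the heavy-tailed distributions
    # typical of web-video metadata.
    return statistics.median(durations), max(durations) - min(durations)

print(summarize(collection_durations))
print(summarize(web_durations))
```

A large gap between the two summaries, as in this toy case, is the sort of divergence the paper reports between existing collections and web video at large.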
Graph-RAT programming environment
Graph-RAT is a new programming environment specializing in relational data mining. It incorporates a number of different techniques into a single framework for data collection, data cleaning, propositionalization, and analysis. The language is functional, with algorithms executed over arbitrary sub-graphs of the data. Analysis can be conducted using collaborative filtering or machine learning techniques. The example algorithms are under the BSD license.
Local Ranking Problem on the BrowseGraph
The "Local Ranking Problem" (LRP) is related to the computation of a
centrality-like rank on a local graph, where the scores of the nodes could
significantly differ from the ones computed on the global graph. Previous work
has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a
graph where nodes are webpages and edges are browsing transitions. Recently,
this graph has received more and more attention in many different tasks such as
ranking, prediction and recommendation. However, a web server has access only
to the browsing traffic performed on its own pages (the local BrowseGraph)
and, as a consequence, the local computation can lead to estimation errors,
which hinder the growing number of applications in the state of the art. Also, although
the divergence between the local and global ranks has been measured, the
possibility of estimating such divergence using only local knowledge has been
mainly overlooked. These aspects are of great interest for online service
providers who want to: (i) gauge their ability to correctly assess the
importance of their resources only based on their local knowledge, and (ii)
take into account real user browsing fluxes that better capture the actual user
interest than the static hyperlink network. We study the LRP problem on a
BrowseGraph from a large news provider, considering as subgraphs the
aggregations of browsing traces of users coming from different domains. We show
that the distance between rankings can be accurately predicted based only on
structural information of the local graph, being able to achieve an average
rank correlation as high as 0.8.
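The core comparison in the Local Ranking Problem can be sketched on a toy graph: compute a centrality-like rank on the global graph and on a local subgraph, then measure the rank correlation between the two. The graph below is invented, PageRank stands in for the centrality measure, and a tie-free Spearman formula is used for simplicity.

```python
import numpy as np

def pagerank(adj, d=0.85, iters=100):
    # Power iteration on a row-adjacency matrix; dangling nodes
    # (no outlinks) are given uniform transitions.
    n = len(adj)
    out = adj.sum(axis=1, keepdims=True)
    M = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (M.T @ r)
    return r

def spearman(a, b):
    # Rank correlation between two score vectors (no tie handling).
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    n = len(a)
    return 1 - 6 * np.sum((ra - rb) ** 2) / (n * (n**2 - 1))

# Invented global graph on 6 pages; the "local" server sees only pages 0-3.
G = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)
local = G[:4, :4]  # browsing transitions among local pages only

global_rank = pagerank(G)[:4]   # scores local pages get on the full graph
local_rank = pagerank(local)    # scores computed from local knowledge only
rho = spearman(global_rank, local_rank)
```

The paper's contribution is, in these terms, to predict how far `rho` falls from 1 using only structural features of the local graph, without ever seeing the global one.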
Text Analytics for Android Project
Most advanced text analytics and text mining tasks include text classification, text clustering, ontology building, concept/entity extraction, summarization, deriving patterns within structured data, production of granular taxonomies, sentiment and emotion analysis, document summarization, entity relation modelling, and interpretation of the output. Existing text analytics and text mining tools cannot develop text material alternatives (perform a multi-variant design), perform multiple criteria analysis,
automatically select the most effective variant according to different aspects (citation index of papers (Scopus, ScienceDirect, Google Scholar) and authors (Scopus, ScienceDirect, Google Scholar), Top 25 papers, impact factor of journals, supporting phrases, document name and contents, density of keywords), or calculate utility degree and market value. However, the Text Analytics for Android Project can perform the aforementioned functions. To the best of our knowledge, these functions have not been implemented previously; thus this is the first attempt to do so. The Text Analytics for Android Project is briefly described in this article.
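The multiple-criteria selection and utility-degree calculation mentioned above can be illustrated with a simple additive weighting scheme. This is a generic sketch: the criteria values, the weights, and the weighting method itself are assumptions, since the article does not specify its exact multiple-criteria model.

```python
# Hypothetical text-variant alternatives scored on three criteria:
# [citation count, journal impact factor, keyword density].
variants = {
    "variant_a": [120, 2.5, 0.03],
    "variant_b": [300, 1.8, 0.05],
    "variant_c": [80,  4.1, 0.02],
}
weights = [0.5, 0.3, 0.2]  # assumed relative importance of each criterion

# Normalize each criterion by its column maximum, then take the weighted
# sum; the utility degree is each score relative to the best variant (%).
maxima = [max(vals[i] for vals in variants.values()) for i in range(3)]
scores = {
    name: sum(weights[i] * vals[i] / maxima[i] for i in range(3))
    for name, vals in variants.items()
}
best = max(scores.values())
utility = {name: round(100 * s / best, 1) for name, s in scores.items()}
print(utility)
```

The best variant receives a utility degree of 100%, and the others are ranked by how close they come to it.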
The contribution of data mining to information science
The information explosion is a serious challenge for current information institutions. Data mining, the search for valuable information in large volumes of data, is one of the solutions to this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For these three types of application, we discuss how data mining can enhance their functions. The reader is expected to get an overview of the state-of-the-art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research.
Look back, look around: A systematic analysis of effective predictors for new outlinks in focused Web crawling
Small and medium enterprises rely on detailed Web analytics to be informed
about their market and competition. Focused crawlers meet this demand by
crawling and indexing specific parts of the Web. Critically, a focused crawler
must quickly find new pages that have not yet been indexed. Since a new page
can be discovered only by following a new outlink, predicting new outlinks is
very relevant in practice. In the literature, many feature designs have been
proposed for predicting changes in the Web. In this work we provide a
structured analysis of this problem, using new outlinks as our running
prediction target. Specifically, we unify earlier feature designs in a
taxonomic arrangement of features along two dimensions: static versus dynamic
features, and features of a page versus features of the network around it.
Within this taxonomy, complemented by our new (mainly dynamic network)
features, we identify the best predictors for new outlinks. Our main
conclusion is that the most informative features are the recent history of new
outlinks on a page
itself, and of its content-related pages. Hence, we propose a new 'look back,
look around' (LBLA) model, that uses only these features. With the obtained
predictions, we design a number of scoring functions to guide a focused crawler
to pages with the most new outlinks, and compare their performance. The LBLA
approach proved extremely effective, outperforming other models, including
those that use the most complete set of features. One of the learners we use is
the
recent NGBoost method that assumes a Poisson distribution for the number of new
outlinks on a page, and learns its parameters. This connects two thus far
unrelated avenues in the literature: predictions based on features of a page,
and those based on probabilistic modelling. All experiments were carried out on
an original dataset, made available by a commercial focused crawler.
Comment: 23 pages, 15 figures, 4 tables, uses arxiv.sty, added new title,
heuristic features and their results added, figures 7, 14, and 15 updated,
accepted version
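The 'look back, look around' idea can be sketched as a simple predictor: estimate a page's future new outlinks from its own recent history ("look back") and the recent history of its content-related pages ("look around"). The data and the plain averaging rule below are illustrative assumptions, not the paper's actual learner, which uses trained models such as NGBoost with a Poisson output distribution.

```python
import statistics

def lbla_predict(own_history, related_histories):
    # "Look back": recent new-outlink counts on the page itself.
    look_back = statistics.mean(own_history)
    # "Look around": recent counts on content-related pages.
    look_around = statistics.mean(
        statistics.mean(h) for h in related_histories
    )
    # Assumed combination rule: equal weight to the two signals.
    return (look_back + look_around) / 2

# Invented histories: the page gained 3, 5, 4 new outlinks in recent
# crawls; two related pages gained (2, 2, 2) and (6, 4, 5).
pred = lbla_predict([3, 5, 4], [[2, 2, 2], [6, 4, 5]])
```

A focused crawler could then use such predictions as a scoring function, revisiting first the pages with the highest predicted counts.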