15,512 research outputs found
Patent Analytics Based on Feature Vector Space Model: A Case of IoT
The number of approved patents worldwide increases rapidly each year, which
requires new patent analytics to efficiently mine the valuable information
attached to these patents. Vector space model (VSM) represents documents as
high-dimensional vectors, where each dimension corresponds to a unique term.
While originally proposed for information retrieval systems, VSM has also seen
wide applications in patent analytics, and used as a fundamental tool to map
patent documents to structured data. However, VSM method suffers from several
limitations when applied to patent analysis tasks, such as loss of
sentence-level semantics and curse-of-dimensionality problems. In order to
address the above limitations, we propose a patent analytics based on feature
vector space model (FVSM), where the FVSM is constructed by mapping patent
documents to feature vectors extracted by convolutional neural networks (CNN).
The applications of FVSM for three typical patent analysis tasks, i.e., patents
similarity comparison, patent clustering, and patent map generation are
discussed. A case study using patents related to Internet of Things (IoT)
technology is illustrated to demonstrate the performance and effectiveness of
FVSM. The proposed FVSM can be adopted by other patent analysis studies to
replace VSM, based on which various big data learning tasks can be performed
Collaborative recommendations with content-based filters for cultural activities via a scalable event distribution platform
Nowadays, most people have limited leisure time and the offer of (cultural) activities to spend this time is enormous. Consequently, picking the most appropriate events becomes increasingly difficult for end-users. This complexity of choice reinforces the necessity of filtering systems that assist users in finding and selecting relevant events. Whereas traditional filtering tools enable e.g. the use of keyword-based or filtered searches, innovative recommender systems draw on user ratings, preferences, and metadata describing the events. Existing collaborative recommendation techniques, developed for suggesting web-shop products or audio-visual content, have difficulties with sparse rating data and can not cope at all with event-specific restrictions like availability, time, and location. Moreover, aggregating, enriching, and distributing these events are additional requisites for an optimal communication channel. In this paper, we propose a highly-scalable event recommendation platform which considers event-specific characteristics. Personal suggestions are generated by an advanced collaborative filtering algorithm, which is more robust on sparse data by extending user profiles with presumable future consumptions. The events, which are described using an RDF/OWL representation of the EventsML-G2 standard, are categorized and enriched via smart indexing and open linked data sets. This metadata model enables additional content-based filters, which consider event-specific characteristics, on the recommendation list. The integration of these different functionalities is realized by a scalable and extendable bus architecture. Finally, focus group conversations were organized with external experts, cultural mediators, and potential end-users to evaluate the event distribution platform and investigate the possible added value of recommendations for cultural participation
Information Retrieval Models
Many applications that handle information on the internet would be completely\ud
inadequate without the support of information retrieval technology. How would\ud
we find information on the world wide web if there were no web search engines?\ud
How would we manage our email without spam filtering? Much of the development\ud
of information retrieval technology, such as web search engines and spam\ud
filters, requires a combination of experimentation and theory. Experimentation\ud
and rigorous empirical testing are needed to keep up with increasing volumes of\ud
web pages and emails. Furthermore, experimentation and constant adaptation\ud
of technology is needed in practice to counteract the effects of people that deliberately\ud
try to manipulate the technology, such as email spammers. However,\ud
if experimentation is not guided by theory, engineering becomes trial and error.\ud
New problems and challenges for information retrieval come up constantly.\ud
They cannot possibly be solved by trial and error alone. So, what is the theory\ud
of information retrieval?\ud
There is not one convincing answer to this question. There are many theories,\ud
here called formal models, and each model is helpful for the development of\ud
some information retrieval tools, but not so helpful for the development others.\ud
In order to understand information retrieval, it is essential to learn about these\ud
retrieval models. In this chapter, some of the most important retrieval models\ud
are gathered and explained in a tutorial style
Data Mining in Electronic Commerce
Modern business is rushing toward e-commerce. If the transition is done
properly, it enables better management, new services, lower transaction costs
and better customer relations. Success depends on skilled information
technologists, among whom are statisticians. This paper focuses on some of the
contributions that statisticians are making to help change the business world,
especially through the development and application of data mining methods. This
is a very large area, and the topics we cover are chosen to avoid overlap with
other papers in this special issue, as well as to respect the limitations of
our expertise. Inevitably, electronic commerce has raised and is raising fresh
research problems in a very wide range of statistical areas, and we try to
emphasize those challenges.Comment: Published at http://dx.doi.org/10.1214/088342306000000204 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Techniques for effective and efficient fire detection from social media images
Social media could provide valuable information to support decision making in
crisis management, such as in accidents, explosions and fires. However, much of
the data from social media are images, which are uploaded in a rate that makes
it impossible for human beings to analyze them. Despite the many works on image
analysis, there are no fire detection studies on social media. To fill this
gap, we propose the use and evaluation of a broad set of content-based image
retrieval and classification techniques for fire detection. Our main
contributions are: (i) the development of the Fast-Fire Detection method
(FFDnR), which combines feature extractor and evaluation functions to support
instance-based learning, (ii) the construction of an annotated set of images
with ground-truth depicting fire occurrences -- the FlickrFire dataset, and
(iii) the evaluation of 36 efficient image descriptors for fire detection.
Using real data from Flickr, our results showed that FFDnR was able to achieve
a precision for fire detection comparable to that of human annotators.
Therefore, our work shall provide a solid basis for further developments on
monitoring images from social media.Comment: 12 pages, Proceedings of the International Conference on Enterprise
Information Systems. Specifically: Marcos Bedo, Gustavo Blanco, Willian
Oliveira, Mirela Cazzolato, Alceu Costa, Jose Rodrigues, Agma Traina, Caetano
Traina, 2015, Techniques for effective and efficient fire detection from
social media images, ICEIS, 34-4
Understanding Chat Messages for Sticker Recommendation in Messaging Apps
Stickers are popularly used in messaging apps such as Hike to visually
express a nuanced range of thoughts and utterances to convey exaggerated
emotions. However, discovering the right sticker from a large and ever
expanding pool of stickers while chatting can be cumbersome. In this paper, we
describe a system for recommending stickers in real time as the user is typing
based on the context of the conversation. We decompose the sticker
recommendation (SR) problem into two steps. First, we predict the message that
the user is likely to send in the chat. Second, we substitute the predicted
message with an appropriate sticker. Majority of Hike's messages are in the
form of text which is transliterated from users' native language to the Roman
script. This leads to numerous orthographic variations of the same message and
makes accurate message prediction challenging. To address this issue, we learn
dense representations of chat messages employing character level convolution
network in an unsupervised manner. We use them to cluster the messages that
have the same meaning. In the subsequent steps, we predict the message cluster
instead of the message. Our approach does not depend on human labelled data
(except for validation), leading to fully automatic updation and tuning
pipeline for the underlying models. We also propose a novel hybrid message
prediction model, which can run with low latency on low-end phones that have
severe computational limitations. Our described system has been deployed for
more than months and is being used by millions of users along with hundreds
of thousands of expressive stickers
- …