BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into web data extraction, conducted in the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs both the RSS feeds and the HTML representations of blogs. It outlines the possibilities for extracting the semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to inform the design and development of the BlogForever platform.
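The idea of pairing a blog's RSS feed with its HTML pages can be illustrated with a small sketch. It assumes that a feed item's description roughly repeats the post body shown on the page, so the HTML element whose text best matches that description is a likely candidate for the post-content block. The class `TextBlockCollector`, the function `best_matching_block`, and the set of candidate tags are hypothetical illustrations built on Python's standard library, not the BlogForever implementation.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical choice: which elements count as candidate content blocks.
BLOCK_TAGS = {"div", "article", "section"}


class TextBlockCollector(HTMLParser):
    """Collects the whitespace-normalized text of every candidate block element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open blocks: (tag, label, list of text parts)
        self.blocks = []  # closed blocks: (label, text)

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            attrs = dict(attrs)
            name = attrs.get("id") or attrs.get("class") or ""
            self.stack.append((tag, f"{tag}:{name}", []))

    def handle_endtag(self, tag):
        # Only close a block when the end tag matches the innermost open block.
        if self.stack and self.stack[-1][0] == tag:
            _, label, parts = self.stack.pop()
            text = " ".join(" ".join(parts).split())
            self.blocks.append((label, text))
            if self.stack:  # a parent block also contains its child's text
                self.stack[-1][2].append(text)

    def handle_data(self, data):
        if self.stack:
            self.stack[-1][2].append(data)


def best_matching_block(html, rss_description):
    """Return (label, text, similarity) for the block most similar to the feed text."""
    collector = TextBlockCollector()
    collector.feed(html)
    collector.close()
    target = " ".join(rss_description.split())
    scored = [(label, text, SequenceMatcher(None, text, target).ratio())
              for label, text in collector.blocks]
    return max(scored, key=lambda item: item[2]) if scored else None


page = """<html><body>
<div id="sidebar"><p>Archive links and blogroll</p></div>
<div class="post-body"><p>Web archiving preserves blogs for the future.</p></div>
<div id="footer">Copyright notice</div>
</body></html>"""
feed_description = "Web archiving preserves blogs for the future."
print(best_matching_block(page, feed_description))
```

Here the feed text acts as a weak, automatically available label, which is what makes this style of extraction unsupervised; in a fuller system the location found for one post could plausibly be reused for other posts from the same blog template.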
Spatio-temporal patterns of human mobility from geo-social networks for urban computing: Analysis, models & applications
The availability of rich information about fine-grained user mobility in urban environments, drawn from increasingly geographically aware social networking services, together with the rapid development of machine learning applications, greatly facilitates the investigation of urban issues. In this setting, urban computing has emerged to tackle a variety of challenges faced by today's cities and to offer promising approaches to improving our living environment. In this dissertation, we show how to leverage massive amounts of unprecedentedly rich geo-social network data to devise novel algorithmic techniques that reveal underlying urban mobility patterns, supporting better policy-making and more efficient mobile applications.
Building on existing research in the urban computing field and on basic machine learning techniques, this dissertation proposes a general framework for urban computing with geo-social network data and develops novel algorithms tailored to three urban computing tasks. We begin by exploring how transition data recording human movements between urban venues on geo-social networks can be aggregated and utilised to detect spatio-temporal changes in local graphs of urban areas. We further explore how these changes can serve, through supervised machine learning, as a proxy to track and predict changes in socio-economic deprivation as government funding is invested in developing areas. We then study how to extract latent patterns from collective user-venue interactions with a spatio-temporally aware topic modeling approach, for the benefit of urban infrastructure planning. After that, we propose a model that detects the gap between user-side demand and venue-side supply levels for certain types of services in urban environments, to inform policy-making and investment optimisation. Finally, we address a mobility prediction task whose application is recommending new places in the city for mobile users to explore. To this end, we develop a deep learning framework that integrates memory networks and topic modeling techniques. Extensive experiments indicate that the proposed architecture improves prediction performance across various recommendation scenarios while remaining highly interpretable.
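The first task above aggregates transition data into local graphs and tracks how those graphs change. The aggregation step can be sketched under several assumptions that do not come from the dissertation: check-ins are `(user, venue, timestamp)` tuples, a transition is two consecutive check-ins by the same user at different venues at most three hours apart, and change between two periods is measured as the L1 distance between normalized edge weights. The function names `transition_graph` and `graph_change` are hypothetical.

```python
from collections import Counter, defaultdict


def transition_graph(checkins, max_gap_seconds=3 * 3600):
    """Count directed venue-to-venue transitions from (user, venue, timestamp) check-ins.

    A transition is two consecutive check-ins by the same user at different venues
    that are at most max_gap_seconds apart (a hypothetical rule for this sketch).
    """
    by_user = defaultdict(list)
    for user, venue, ts in checkins:
        by_user[user].append((ts, venue))
    edges = Counter()
    for visits in by_user.values():
        visits.sort()  # chronological order per user
        for (t1, v1), (t2, v2) in zip(visits, visits[1:]):
            if v1 != v2 and t2 - t1 <= max_gap_seconds:
                edges[(v1, v2)] += 1
    return edges


def graph_change(before, after):
    """L1 distance between normalized edge weights (for non-empty graphs: 0 same, 2 disjoint)."""
    total_before = sum(before.values()) or 1
    total_after = sum(after.values()) or 1
    return sum(abs(before[e] / total_before - after[e] / total_after)
               for e in set(before) | set(after))


period_1 = [("u1", "cafe", 0), ("u1", "gym", 1800), ("u2", "cafe", 0), ("u2", "gym", 3600)]
period_2 = [("u1", "cafe", 0), ("u1", "park", 600), ("u2", "cafe", 0), ("u2", "gym", 36000)]
g1, g2 = transition_graph(period_1), transition_graph(period_2)
print(dict(g1), dict(g2), graph_change(g1, g2))
```

In the second period, `u2`'s move to the gym falls outside the time window and is not counted, so the two graphs share no edges and the change score is maximal. A real pipeline would compute such scores per local area and feed them to the downstream prediction model.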
All in all, the insights drawn and the techniques developed in this dissertation take a substantial step toward addressing urban issues and open the door to future possibilities in the promising field of urban computing.
Latent Network Mining in Social Networks and E-commerce Platforms
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2023. Advisor: 권태경.
With the explosive growth of web-based services, users are becoming broadly connected online. On online platforms, users influence one another and tend to reflect their experiences and opinions in their decision-making. This dissertation studies user behavior on two representative online platforms: social network services and e-commerce platforms.
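User behavior on online platforms can be expressed as relations between users and platform components. A user's purchase is represented as a relation between the user and a product, and a user's check-in as a relation between the user and a place. These relations can also carry information such as the time of the action, ratings, and tags.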
This study presents research on identifying the latent networks that influence the user behavior graphs defined on these two platforms. On location-based social network services, many posts take the form of check-ins at specific places, and users' visits to places are strongly influenced by pre-existing friendships between users. Identifying the relationships between users that lie latent beneath the user behavior network can help predict behavior; to this end, this dissertation proposes an unsupervised learning approach that extracts social relationships between users from the behavior network.
Previously studied methods either predict relationships between users by focusing on co-visitation, in which two users visit the same place at the same time, or perform representation learning with network embedding or graph neural networks (GNNs). However, these approaches fail to capture users' behavioral patterns that appear as periodic visits or long-distance movements. To learn behavioral patterns better, ANES learns aspect-oriented relations between users and points of interest (POIs) within user contexts. ANES is the first unsupervised approach that divides user behavior in the structure of the User-POI bipartite graph into multiple aspects and extracts behavioral patterns by considering each of these relations. In extensive experiments on real-world LBSN data, ANES outperforms previously proposed methods.
Unlike location-based social networks, e-commerce review systems let users exchange information and exert influence on one another through the platform, without active actions such as following. This behavioral characteristic is easily exploited by review spam, which conceals genuine users' opinions and manipulates ratings to convey false information. To address this, I look for possible prior collusiveness between users in review data and propose SC-Com, a method that uses this collusiveness for spam detection. SC-Com computes collusion scores between users from the collusiveness of their behavior and, based on these scores, groups all users into communities of similar users. It then extracts graph-based features that are important for distinguishing spammers from ordinary users and uses them as input to a supervised classifier. SC-Com effectively detects groups of collusive spammers, and in experiments on a real-world dataset it outperformed prior work in spam detection.
Because the implicit network detection models studied on various data in this dissertation predict users who are likely to have been connected beforehand even in unlabeled data, they can provide useful information for diverse data such as real-time location data and app usage data, and they are expected to contribute to fields such as advertisement recommendation systems and malicious user detection.
With the explosive growth in the use of online services, people are connected with one another more broadly. On online platforms, people influence each other and tend to reflect one another's opinions in their decision-making. Social network services (SNSs) and e-commerce platforms are typical examples of online platforms.
User behavior on online platforms can be defined as relations between users and platform components. A user's purchase is a relation between a user and a product, and a user's check-in is a relation between a user and a place; these relations may also carry information such as the time of the action, ratings, and tags. Many studies represent platform user behavior as a graph, in which the nodes are platform objects such as users, products, and places, and an interaction between a user and a platform element is expressed as an edge connecting the two nodes.
In this thesis, I present studies that identify the latent networks affecting the user behavior graphs defined on these two platforms.
In ANES, I focus on representation learning for social link inference from user trajectory data. While traditional methods predict relations between users from hand-crafted features, recent studies first perform representation learning with network/node embedding or graph neural networks (GNNs) for downstream tasks such as node classification and link prediction. However, these approaches fail to capture individuals' behavioral patterns ingrained in periodic visits or long-distance movements. To better learn behavioral patterns, this work proposes a novel scheme called ANES (Aspect-oriented Network Embedding for Social link inference), which learns aspect-oriented relations between users and Points of Interest (POIs) within their contexts. ANES is the first approach that extracts users' complex behavioral patterns from both trajectory data and the structure of User-POI bipartite graphs. Extensive experiments on several real-world datasets show that ANES outperforms state-of-the-art baselines.
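For contrast with the learned representations ANES relies on, here is a sketch of one classic hand-crafted feature, co-visitation: how often two users check in at the same POI within a short time window. This is a baseline-style feature, not ANES; the one-hour window and the name `co_visitation_counts` are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_visitation_counts(checkins, window_seconds=3600):
    """Count, per unordered user pair, check-ins at the same POI within window_seconds.

    checkins: iterable of (user, poi, timestamp). The pairwise loop is quadratic in
    the number of visits per POI, acceptable for a sketch but not for large datasets.
    """
    by_poi = defaultdict(list)
    for user, poi, ts in checkins:
        by_poi[poi].append((user, ts))
    counts = Counter()
    for visits in by_poi.values():
        for (u1, t1), (u2, t2) in combinations(visits, 2):
            if u1 != u2 and abs(t1 - t2) <= window_seconds:
                counts[tuple(sorted((u1, u2)))] += 1
    return counts


checkins = [
    ("alice", "poi1", 0), ("bob", "poi1", 600), ("carol", "poi1", 7200),
    ("alice", "poi2", 100), ("bob", "poi2", 200),
]
# Pairs with more co-visits would be ranked as more likely to be socially linked.
print(co_visitation_counts(checkins).most_common())
```

A feature like this only sees simultaneous presence, which is exactly why it misses the periodic-visit and long-distance patterns the abstract says representation-learning approaches should capture.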
In contrast to active social networks, on some platforms, such as online shopping websites and restaurant review sites, people are connected to other users regardless of their intentions. They know nothing about each other in advance; their only common point is that they have visited, or plan to visit, the same place, or have purchased the same product. Notably, users' purchase intentions tend to be influenced by review data.
Unfortunately, this tendency is easily exploited by opinion spammers. In SC-Com, I focus on opinion spam detection in online shopping services. In many cases, consumers' decision-making is closely tied to online reviews. However, opinion spam written by hired reviewers is an increasing threat: it aims to mislead potential customers by hiding genuine consumers' opinions. To falsify true information, opinion spam must be posted collectively, and this collusiveness offers a way to detect it. I therefore propose SC-Com, an optimized collusive community detection framework. It constructs a reviewer graph from behavioral collusiveness and partitions the graph into communities based on mutual suspiciousness. I then extract community-based and temporal abnormality features that are critical for discriminating spammers from genuine users. I show that my method detects collusive opinion spam reviewers effectively and precisely from their collective behavioral patterns. On a real-world dataset, my approach showed strong performance while using only primary data such as time and ratings.
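The shape of the pipeline described above (pairwise collusiveness, a reviewer graph, then communities) can be sketched under loudly stated assumptions. Here two reviewers score one point each time they review the same product within a day with identical ratings, pairs at or above a threshold become edges, and connected components stand in for communities. The scoring rule, the parameters `window`, `rating_gap`, and `min_score`, and the use of connected components instead of a proper community detection algorithm are all simplifications chosen for illustration, not SC-Com's actual definitions; the feature extraction and supervised classifier are not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations


def collusion_scores(reviews, window=86400, rating_gap=0):
    """Count suspicious co-reviews per reviewer pair: same product, close in time, similar rating.

    reviews: iterable of (reviewer, product, rating, timestamp). Hypothetical scoring rule.
    """
    by_product = defaultdict(list)
    for reviewer, product, rating, ts in reviews:
        by_product[product].append((reviewer, rating, ts))
    scores = Counter()
    for items in by_product.values():
        for (r1, s1, t1), (r2, s2, t2) in combinations(items, 2):
            if r1 != r2 and abs(t1 - t2) <= window and abs(s1 - s2) <= rating_gap:
                scores[tuple(sorted((r1, r2)))] += 1
    return scores


def collusive_groups(scores, min_score=2):
    """Connected components of the graph whose edges are pairs scoring at least min_score."""
    adj = defaultdict(set)
    for (a, b), s in scores.items():
        if s >= min_score:
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


reviews = [
    ("s1", "p1", 5, 0), ("s2", "p1", 5, 100), ("s3", "p1", 5, 200),
    ("s1", "p2", 5, 1000), ("s2", "p2", 5, 1500), ("s3", "p2", 1, 1200),
    ("g1", "p1", 3, 500000),
]
scores = collusion_scores(reviews)
print(dict(scores), collusive_groups(scores))
```

With the threshold at 2, only `s1` and `s2`, who co-reviewed two products in lockstep, form a group; lowering the threshold pulls in `s3`, which illustrates why the threshold (or a real community detection method) matters.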
The implicit network inference models studied on various data in this thesis predict which users are likely to be connected in advance, even in unlabeled data. By providing this information, they are expected to contribute to areas such as advertisement recommendation systems and malicious user detection.
Chapter 1 Introduction
Chapter 2 Social link Inference in Location-based check-in data
2.1 Background
2.2 Related Work
2.3 Location-based Social Network Service Data
2.4 Aspect-wise Graph Decomposition
2.5 Aspect-wise Graph learning
2.6 Inferring Social Relation from User Representation
2.7 Performance Analysis
2.8 Discussion and Implications
2.9 Summary
Chapter 3 Detecting collusiveness from reviews in Online platforms and its application
3.1 Background
3.2 Related Work
3.3 Online Review Data
3.4 Collusive Graph Projection
3.5 Reviewer Community Detection
3.6 Review Community feature extraction and spammer detection
3.7 Performance Analysis
3.8 Discussion and Implications
3.9 Summary
Chapter 4 Conclusion
Histopathological image analysis: a review
Over the past decade, dramatic increases in computational power and improvements in image analysis algorithms have enabled powerful computer-assisted analytical approaches to radiological data. With the recent advent of whole-slide digital scanners, tissue histopathology slides can now be digitized and stored as digital images. Consequently, digitized tissue histopathology has become amenable to computerized image analysis and machine learning techniques. Analogous to the role of computer-assisted diagnosis (CAD) algorithms in medical imaging, which complement the opinion of a radiologist, CAD algorithms have begun to be developed for disease detection, diagnosis, and prognosis prediction to complement the opinion of the pathologist. In this paper, we review the recent state-of-the-art CAD technology for digitized histopathology. The paper also briefly describes the development and application of novel image analysis technology for a few specific histopathology-related problems being pursued in the United States and Europe.
Text-based Sentiment Analysis and Music Emotion Recognition
Today, with the expansion of social media, large amounts of user-generated texts such as tweets, blog posts, and product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is used in recommender systems, market prediction, business intelligence, and more. Deep learning techniques have also become top performers on these tasks. However, several problems must be solved before deep neural networks can be used efficiently for text mining and text polarity analysis.
First, deep neural networks are data-hungry: they need large datasets that are cleaned, preprocessed, and properly labeled. Second, word embeddings, the modern natural language processing approach of representing text with dense, distributed features, solve the sparsity and dimensionality problems of the traditional bag-of-words model. Still, their use raises open questions: should word vectors be generated from the same dataset that is used to train the model, or is it better to source them from large, popular collections that serve as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for documents of various lengths and types. Recurrent neural networks perform poorly on longer texts, and optimal convolution-pooling combinations are not easy to design. It is thus convenient to have generic neural network architectures that are effective, adapt to various texts, and encapsulate much of the design complexity.
This thesis addresses the above problems, providing methodological and practical insights for applying neural networks to sentiment analysis of texts and achieving state-of-the-art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored, and two medium-sized, emotion-labeled song datasets are created using social tags. Because one of Telecom Italia's research interests was the relation between musical emotional stimulation and driving style, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains was conducted. Word embeddings trained with different parameters were evaluated, and the results revealed that their quality depends mostly, but not only, on the size of the texts they were created from. When working with small text datasets, it is thus important to source word features from popular, generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers was conducted, and various patterns relating text properties and network parameters to optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business, and product reviews.
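The combination of word, bigram, and trigram convolutions with regional max-pooling can be illustrated with a forward pass in plain Python. This is a sketch under assumptions: one filter per width, toy 2-dimensional word vectors, hand-set weights instead of trained ones, ReLU activation, and a fixed number of pooling regions; the thesis's actual layer sizes and the stacking of such blocks are not reproduced, and the function names are hypothetical.

```python
def conv1d_relu(seq, kernel, bias=0.0):
    """Valid 1-D convolution of word vectors with one filter spanning len(kernel) words, then ReLU."""
    width = len(kernel)
    out = []
    for i in range(len(seq) - width + 1):
        total = bias
        for offset, weights in enumerate(kernel):
            total += sum(w * x for w, x in zip(weights, seq[i + offset]))
        out.append(max(0.0, total))
    return out


def regional_max_pool(values, regions):
    """Split a feature map into `regions` contiguous chunks and keep each chunk's maximum."""
    n = len(values)
    bounds = [round(k * n / regions) for k in range(regions + 1)]
    return [max(values[bounds[k]:bounds[k + 1]])
            for k in range(regions) if bounds[k] < bounds[k + 1]]


def ngram_features(seq, kernels, regions=2):
    """Concatenate regionally pooled feature maps from filters of different widths."""
    features = []
    for kernel in kernels:
        features.extend(regional_max_pool(conv1d_relu(seq, kernel), regions))
    return features


# Toy 2-dimensional word vectors for a 4-word sentence and hand-set filters.
sentence = [[1, 0], [0, 1], [1, 1], [0, 0]]
unigram = [[1, 0]]
bigram = [[1, 0], [0, 1]]
trigram = [[1, 0], [0, 1], [-1, -1]]
print(ngram_features(sentence, [unigram, bigram, trigram]))
```

Unlike a single global max-pool, regional pooling keeps one maximum per part of the document, so some coarse positional information survives; in a trained model these features would feed a classification layer.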
Given that labeled data are becoming the bottleneck of current deep learning systems, a future research direction could be exploring data programming approaches for constructing even larger labeled datasets. Investigating feature-level or decision-level ensemble techniques for deep neural networks could also be fruitful, since different feature types usually capture complementary characteristics of the data. Combining word embeddings with traditional text features, or applying recurrent networks to document splits and then aggregating the predictions, could further increase the prediction accuracy of such models.
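Running a model on document splits and aggregating the predictions amounts to decision-level averaging. A minimal sketch, assuming a hypothetical `score_chunk` callable that returns a positive-sentiment probability for a list of words; the toy lexicon scorer only stands in for a trained recurrent network, and the chunk size and length weighting are arbitrary choices.

```python
def split_into_chunks(text, chunk_size):
    """Split a document into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def document_probability(text, score_chunk, chunk_size=50):
    """Length-weighted average of per-chunk positive-sentiment probabilities."""
    chunks = split_into_chunks(text, chunk_size)
    if not chunks:
        return 0.5  # no evidence: neutral
    total_words = sum(len(chunk) for chunk in chunks)
    return sum(score_chunk(chunk) * len(chunk) for chunk in chunks) / total_words


# Hypothetical stand-in for a trained recurrent model: a tiny sentiment lexicon.
POSITIVE = {"great", "good", "excellent"}
NEGATIVE = {"bad", "boring", "awful"}


def toy_score_chunk(words):
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    return 0.5 if pos + neg == 0 else pos / (pos + neg)


review = "great acting good story boring ending"
print(document_probability(review, toy_score_chunk, chunk_size=3))
```

Weighting by chunk length keeps a short trailing chunk from counting as much as a full one; simple unweighted averaging or majority voting would be equally valid decision-level choices.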