Template Mining for Information Extraction from Digital Documents
Published or submitted for publication
Applying digital content management to support localisation
The retrieval and presentation of digital content such as that on the World Wide Web (WWW) is a substantial area of research. While recent years have seen a huge expansion in the size of web-based archives that can be searched efficiently by commercial search engines, the presentation of potentially relevant content is still limited to ranked document lists represented by simple text snippets or image keyframe surrogates. There is growing interest in techniques to personalise the presentation of content to improve the richness and effectiveness of the user experience. One of the most significant challenges to achieving this is the increasingly multilingual nature of this data, and the need to provide suitably localised responses to users based on this content. The Digital Content Management (DCM) track of the Centre for Next Generation Localisation (CNGL) is seeking to develop technologies to support advanced personalised access to and presentation of information by combining elements from the existing research areas of Adaptive Hypermedia and Information Retrieval. The combination of these technologies is intended to produce significant improvements in the way users access information. We review key features of these technologies and introduce early ideas for how they can support localisation and localised content, before concluding with some impressions of future directions in DCM.
Detection of Trending Topic Communities: Bridging Content Creators and Distributors
The rise of a trending topic on Twitter or Facebook leads to the temporal emergence of a set of users currently interested in that topic. Given the temporary nature of the links between these users, being able to dynamically identify communities of users related to this trending topic would allow for a rapid spread of information. Indeed, individual users inside a community might receive recommendations of content generated by the other users, or the community as a whole could receive group recommendations with new content related to that trending topic. In this paper, we tackle this challenge by identifying coherent topic-dependent user groups, linking those who generate the content (creators) and those who spread this content, e.g., by retweeting/reposting it (distributors). This is a novel problem on group-to-group interactions in the context of recommender systems. An analysis on real-world Twitter data compares our proposal with a baseline approach that considers the retweeting activity, and validates it with standard metrics. Results show the effectiveness of our approach in identifying communities interested in a topic, where each community includes both content creators and content distributors, facilitating users' interactions and the spread of new information.
Comment: 9 pages, 4 figures, 2 tables, Hypertext 2017 conference
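To make the creator/distributor grouping concrete, here is a minimal sketch of a purely retweet-based grouping, closer in spirit to the baseline mentioned above than to the authors' method, which the abstract does not detail. It assumes hypothetical input records of the form `(distributor, creator, hashtag)`, keeps only edges for one trending hashtag, and returns the connected components (via union-find), each split into creators and distributors. The function name `topic_communities` and the record layout are illustrative assumptions.

```python
from collections import defaultdict


def topic_communities(retweets, topic):
    """Group users linked by retweets of one topic into connected components.

    retweets: iterable of (distributor, creator, hashtag) tuples (hypothetical layout).
    Returns a list of {"creators": set, "distributors": set} dicts, one per component.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    creators, distributors = set(), set()
    for distributor, creator, hashtag in retweets:
        if hashtag != topic:
            continue  # only edges about the trending topic define communities
        creators.add(creator)
        distributors.add(distributor)
        union(creator, distributor)

    # Bucket every user under the root of its component, by role.
    groups = defaultdict(lambda: {"creators": set(), "distributors": set()})
    for u in creators:
        groups[find(u)]["creators"].add(u)
    for u in distributors:
        groups[find(u)]["distributors"].add(u)
    return list(groups.values())
```

For example, with retweets `("bob", "alice", "#fa")`, `("carol", "alice", "#fa")`, and `("erin", "dave", "#fa")`, the topic `#fa` yields two communities: alice with bob and carol, and dave with erin. A user who both posts and retweets on the topic appears in both role sets of the same community.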
Improving aircraft maintenance, repair, and overhaul: A novel text mining approach
Aircraft Maintenance, Repair and Overhaul (MRO) feedback commonly includes an engineer's complex text-based inspection report. Capturing and normalizing the content of these textual descriptions is vital to cost and quality benchmarking, and provides information to facilitate continuous improvement of MRO processes and analytics. As data analysis and mining tools require highly normalized data, raw textual data is inadequate. This paper offers a text-mining solution to efficiently analyse bulk textual feedback data.
Even when the same parts and/or sub-parts are replaced, the actual service cost for a repair often differs markedly from that of similar previous jobs. Regular expression algorithms were combined with an aircraft MRO glossary dictionary to help provide additional information about the reason for cost variation. Professional terms and conventions were included in the dictionary to avoid ambiguity and improve the results. Testing results show that most descriptive inspection reports can be appropriately interpreted, allowing extraction of highly normalized data. This additional normalized data strongly supports data analysis and data mining, whilst also increasing the accuracy of future quotation costing. This solution has been used effectively by a large aircraft MRO agency with positive results.
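The regular-expression-plus-glossary idea can be illustrated with a small sketch. The glossary entries below are hypothetical examples, not the dictionary used in the paper: each regular expression captures spelling variants and abbreviations an engineer might write, and maps them to one normalized term, so free-text reports become a sequence of normalized terms suitable for analysis.

```python
import re

# Hypothetical glossary: regex for variant wordings -> normalized term.
# Entries are illustrative only; a real MRO glossary would be far larger.
GLOSSARY = {
    r"\bcorr(?:oded|osion)?\b": "CORROSION",
    r"\bcrack(?:ed|s)?\b": "CRACK",
    r"\b(?:repl(?:aced)?|r/r|removed and replaced)\b": "REPLACE",
    r"\bdent(?:ed|s)?\b": "DENT",
}

# Compile once; IGNORECASE because inspection notes mix upper and lower case.
PATTERNS = [(re.compile(p, re.IGNORECASE), term) for p, term in GLOSSARY.items()]


def normalize_report(text: str) -> list[str]:
    """Return the normalized terms found in a report, ordered by first appearance."""
    hits = []
    for pattern, term in PATTERNS:
        match = pattern.search(text)
        if match:
            hits.append((match.start(), term))
    hits.sort()  # order terms by where they first occur in the report
    return [term for _, term in hits]


print(normalize_report("Skin panel corroded near rivet; panel R/R. Small dent noted."))
# → ['CORROSION', 'REPLACE', 'DENT']
```

Keeping the glossary as data rather than code means domain experts can add professional terms and conventions without touching the matching logic, which mirrors the abstract's point about using the dictionary to reduce ambiguity.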
A rule dynamics approach to event detection in Twitter with its application to sports and politics
The increasing popularity of Twitter as a social networking tool for opinion expression as well as information retrieval has resulted in the need to derive computational means to detect and track relevant topics/events in the network. The application of topic detection and tracking methods to tweets enables users to extract newsworthy content from the vast and somewhat chaotic Twitter stream. In this paper, we apply our technique named Transaction-based Rule Change Mining to extract newsworthy hashtag keywords present in tweets from two different domains, namely sports (The English FA Cup 2012) and politics (US Presidential Elections 2012 and Super Tuesday 2012). Noting the peculiar nature of event dynamics in these two domains, we apply different time windows and update rates to each of the datasets in order to study their impact on performance. The effectiveness results reveal that our approach is able to accurately detect and track newsworthy content. In addition, the results show that adapting the time window yields better performance, especially on the sports dataset, which can be attributed to the usually shorter duration of football events.
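The role of the time window and update rate can be illustrated with a deliberately simplified sketch. This is not Transaction-based Rule Change Mining (which mines changes in association rules between windows); it is a swapped-in frequency-growth heuristic, assumed here only to show how the two parameters interact. `window` is the window length and `step` the update rate, both in seconds; `min_count`, `growth`, and the function name are hypothetical. The first window is compared against an empty baseline, so any sufficiently frequent hashtag in it is flagged.

```python
from collections import Counter


def emerging_hashtags(tweets, window, step, min_count=3, growth=2.0):
    """Flag hashtags whose count jumps between consecutive time windows.

    tweets: list of (timestamp_seconds, [hashtags]), sorted by timestamp.
    window: window length; step: update rate (how far each window slides).
    Returns (window_start, hashtag) pairs where the hashtag appears at least
    min_count times and at least `growth` times its count in the previous window.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if not tweets:
        return []
    start, end = tweets[0][0], tweets[-1][0]
    prev = Counter()
    flagged = []
    t = start
    while t <= end:
        # Count hashtags in [t, t + window); rescans all tweets for simplicity.
        cur = Counter(tag for ts, tags in tweets if t <= ts < t + window for tag in tags)
        for tag, n in cur.items():
            if n >= min_count and n >= growth * prev[tag]:
                flagged.append((t, tag))
        prev = cur
        t += step
    return flagged


tweets = [(0, ["#facup"]), (10, ["#other"])] + [(60 + i, ["#goal"]) for i in range(4)]
print(emerging_hashtags(tweets, window=60, step=60))
# → [(60, '#goal')]
```

Shrinking `window` and `step` makes the detector react to short bursts (as in a football match) at the cost of noisier counts, which is the kind of trade-off the abstract reports studying per domain.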