1,174 research outputs found

    A web content mining application for detecting relevant pages using Jaccard similarity

    The tremendous growth in the availability of text data from a variety of sources creates a slew of concerns and obstacles to discovering meaningful information. This advancement of technology in the digital realm has resulted in the dispersion of texts over millions of web sites. Unstructured texts are densely packed with textual information. The discovery of valuable and intriguing relationships in unstructured texts demands more computer processing. So, text mining has developed into an attractive area of study for obtaining organized and useful data. One of the purposes of this research is to discuss text pre-processing in the automobile marketing domain in order to create a structured database. Regular expressions were used to extract data from unstructured vehicle advertisements, resulting in a well-organized database. We manually develop unique rule-based ways of extracting structured data from unstructured web pages. Using the information retrieved from these advertisements, a systematic search for certain noteworthy qualities is performed. There are numerous approaches for query recommendation, and it is vital to understand which one should be employed. Additionally, this research attempts to determine the optimal similarity value for query suggestions based on user-supplied parameters by comparing MySQL pattern matching and Jaccard similarity
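    As an illustration of the Jaccard similarity mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes that queries and advertisements are compared as sets of lowercase alphanumeric tokens; the function names `tokenize`, `jaccard`, and `rank_ads` and the sample strings are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the two token sets."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        # Assumption: two empty inputs are treated as having no similarity.
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_ads(query: str, ads: list[str]) -> list[str]:
    """Order advertisements from most to least similar to the query."""
    return sorted(ads, key=lambda ad: jaccard(query, ad), reverse=True)


if __name__ == "__main__":
    # Hypothetical advertisement snippets, for illustration only.
    ads = ["honda civic 2012 manual", "toyota corolla 2015 automatic"]
    for ad in rank_ads("toyota corolla", ads):
        print(f"{jaccard('toyota corolla', ad):.2f}  {ad}")
```

    Unlike a SQL `LIKE` pattern, which either matches or does not, this score is graded between 0 and 1, so a similarity threshold can be tuned against user-supplied parameters.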

    Engineering an Open Web Syndication Interchange with Discovery and Recommender Capabilities

    Web syndication has become a popular means of delivering relevant information to people online, but the complexity of standards, algorithms and applications poses considerable challenges to engineers. This paper describes the design and development of a novel Web-based syndication intermediary called InterSynd and a simple Web client as a proof of concept. We developed format-neutral middleware that sits between content sources and the user. Additional objectives were to add feed discovery and recommendation components to the intermediary. A search-based feed discovery module helps users find relevant feed sources. Implicit collaborative recommendations of new feeds are also made to the user. The syndication software uses open standard XML technologies and free open source libraries. Extensibility and re-configurability were explicit goals. The experience shows that a modular architecture can combine open source modules to build state-of-the-art syndication middleware and applications. The data produced by software metrics indicate the high degree of modularity retained

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research

    User needs in television archive access: Acquiring knowledge necessary for system design

    This paper presents a methodical approach for generating deep knowledge about users, as a prerequisite for design and construction of digital information access to cultural heritage information objects. We exemplify this methodical approach by reporting on an explorative study of information need characteristics in a television broadcast context. The methodical approach is inspired by naturalistic research, and our main data is nine in-depth interviews conducted with scholars and students within the academic field of Media Studies. The analysis identifies four characteristics. Firstly, broadcasts are needed as objects of analysis in empirical research. Secondly, the needs are related to three broadcast dimensions: 1) Transmission; 2) Archive; and 3) Reception. Thirdly, four fundamental types of information needs are verified in a television broadcast context: 1) Known item; 2) Factual data; 3) Known topic or content; and 4) Muddled topic or content. Fourthly, the interviewees’ needs consist of four phases: 1) Getting an overview of transmitted broadcasts; 2) Identification of borderline exemplars; 3) Selection of specific programmes; and 4) Verification of facts. The present paper presents novel research on characteristics of information needs in a television broadcast context. We demonstrate how one may go about generating knowledge which is imperative for the design and construction of future broadcast retrieval systems

    Information-seeking on the Web with Trusted Social Networks - from Theory to Systems

    This research investigates how synergies between the Web and social networks can enhance the process of obtaining relevant and trustworthy information. A review of literature on personalised search, social search, recommender systems, social networks and trust propagation reveals limitations of existing technology in areas such as relevance, collaboration, task-adaptivity and trust. In response to these limitations I present a Web-based approach to information-seeking using social networks. This approach takes a source-centric perspective on the information-seeking process, aiming to identify trustworthy sources of relevant information from within the user's social network. An empirical study of source-selection decisions in information- and recommendation-seeking identified five factors that influence the choice of source, and its perceived trustworthiness. The priority given to each of these factors was found to vary according to the criticality and subjectivity of the task. A series of algorithms have been developed that operationalise three of these factors (expertise, experience, affinity) and generate from various data sources a number of trust metrics for use in social network-based information seeking. The most significant of these data sources is Revyu.com, a reviewing and rating Web site implemented as part of this research, that takes input from regular users and makes it available on the Semantic Web for easy re-use by the implemented algorithms. Output of the algorithms is used in Hoonoh.com, a Semantic Web-based system that has been developed to support users in identifying relevant and trustworthy information sources within their social networks. Evaluation of this system's ability to predict source selections showed more promising results for the experience factor than for expertise or affinity. This may be attributed to the greater demands these two factors place in terms of input data. Limitations of the work and opportunities for future research are discussed

    Subject Searching in Online Catalogs: Metaknowledge Used by Experienced Searchers

    This paper begins to identify and characterize the knowledge used by experienced librarians while searching for subject information in online catalogs. Ten experienced librarians performed the same set of six subject searches in an online catalog. The study investigated the knowledge used to solve retrieval problems. This knowledge represents expertise in the use of the catalog. Data were collected through the use of think-aloud protocols, transaction logs, and structured interviews. Knowledge was defined as knowledge of objects (factual knowledge), knowledge of events (experiential knowledge), knowledge of performance (process knowledge), and metaknowledge. Metaknowledge is the sense of the whole derived from the integration of factual, process, and experiential knowledge about the search and the conditions under which it is performed. The focus of this paper is on metaknowledge. For evidence of metaknowledge, the data were examined for explanations that participants gave for their actions and observations, and for ways that participants evaluated their own progress during the process of searching. Reasons and explanations given by searchers were related to all phases of the library information retrieval process, from the user's receipt of material to policies for collection development, and not just events directly related to the performance of a particular search task

    Analysis of community question‐answering issues via machine learning and deep learning: State‐of‐the‐art review

    Over the last couple of decades, community question-answering sites (CQAs) have been a topic of much academic interest. Scholars have often leveraged traditional machine learning (ML) and deep learning (DL) to explore the ever-growing volume of content that CQAs engender. To clarify the current state of the CQA literature that has used ML and DL, this paper reports a systematic literature review. The goal is to summarise and synthesise the major themes of CQA research related to (i) questions, (ii) answers and (iii) users. The final review included 133 articles. Dominant research themes include question quality, answer quality, and expert identification. In terms of datasets, the most widely studied platforms include Yahoo! Answers, Stack Exchange and Stack Overflow. The scope of most articles was confined to just one platform, with few cross-platform investigations. Articles with ML outnumber those with DL. Nonetheless, the use of DL in CQA research is on an upward trajectory. A number of research directions are proposed

    Bibliography versus Auto-Bibliography: Tackling the Transformation of Traditions in the Research Process

    Ms. Babb reports on a study conducted to determine whether researchers will identify the same works recommended by scholarly bibliographies if their searching is limited to the confines of the library catalog and its subject headings. She explores how the auto-bibliography of the catalog compares to more traditionally compiled bibliographies, and what—if anything—is sacrificed when users rely upon auto-bibliography rather than scholarly bibliography