
    A framework for the Comparative analysis of text summarization techniques

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.

    With the boom of information technology and the IoT (Internet of Things), the volume of information, which is fundamentally data, is growing at an alarming rate. If channeled in the right direction, this data can be harnessed to yield meaningful information. The problem is that the data is not always numerical: some problems are entirely textual, and meaning has to be derived from the text. Going through such texts manually would take hours or even days to produce a concise and meaningful account of their content. This is where the need for an automatic summarizer arises, easing manual intervention and reducing time and cost while retaining the key information held by the texts. In recent years, new methods and approaches have been developed to do so, and they are applied in many domains; for example, search engines provide snippets as document previews, while news websites produce shortened descriptions of news items, usually as headlines, to make browsing easier. Broadly speaking, there are two main approaches to text summarization: extractive and abstractive. Extractive summarization filters the important sections out of the whole text to form its condensed version. Abstractive summarization interprets and examines the text as a whole and, after discerning its meaning, generates new sentences that describe the important points in a concise way.
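
    As a minimal illustration of the extractive approach described above, the sketch below scores sentences by word frequency and keeps the top-scoring ones. It is a generic frequency-based baseline under assumed tokenization rules, not the framework evaluated in the dissertation.

```python
# Frequency-based extractive summarization: a toy baseline, not the
# dissertation's framework. Sentences are scored by the average corpus
# frequency of their words; the top-scoring ones form the summary.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "that", "it"}

def summarize(text: str, n_sentences: int = 2) -> str:
    # Split into sentences at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Corpus-wide word frequencies, ignoring stopwords.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```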

    Improving Search Engine Results by Query Extension and Categorization

    Since its emergence, the Internet has changed the way in which information is distributed, and it has strongly influenced how people communicate. Nowadays, Web search engines are widely used to locate information on the Web, and online social networks have become pervasive platforms of communication. Retrieving relevant Web pages in response to a query is not an easy task for Web search engines, due to the enormous corpus of data that the Web stores and the inherent ambiguity of search queries. We present two approaches to improve the effectiveness of Web search engines. The first allows us to retrieve more Web pages relevant to a user's query by extending the query to include synonyms and other variations. The second gives us the ability to retrieve Web pages that more precisely reflect the user's intentions by filtering out those pages which are not related to the user-specified interests. Discovering communities in online social networks (OSNs) has attracted much attention in recent years. We introduce the concept of subject-driven communities and propose to discover such communities by modeling a community as a posting/commenting interaction graph relevant to a given subject of interest, and then applying link analysis on the interaction graph to locate the core members of the community.
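
    A toy sketch of the first approach, query extension: each query term is OR-grouped with its synonyms before the query is submitted. The synonym table below is a hypothetical stand-in for whatever lexical resource (a thesaurus, WordNet, query logs) a real system would use.

```python
# Query extension sketch: expand each term with synonyms so the engine can
# match any variant. SYNONYMS is a hypothetical, hand-rolled stand-in for a
# real lexical resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "movie": ["film", "picture"],
}

def extend_query(query: str) -> str:
    groups = []
    for term in query.lower().split():
        variants = [term] + SYNONYMS.get(term, [])
        # OR-group the variants so a page matching any of them is retrieved.
        groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)

print(extend_query("car movie"))
# (car OR automobile OR vehicle) AND (movie OR film OR picture)
```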

    Crossing the academic ocean? Judit Bar-Ilan's oeuvre on search engines studies

    The main objective of this work is to analyse the contributions of Judit Bar-Ilan to search engine studies. To do this, two complementary approaches have been carried out: first, a systematic literature review of 47 publications authored and co-authored by Judit and devoted to this topic; second, an interdisciplinarity analysis based on the cited references (publications cited by Judit) and citing documents (publications that cite Judit's work) through Scopus. The systematic literature review unravels an immense number of search engines studied (43) and indicators measured (especially technical precision, overlap, and fluctuation over time). In addition, an evolution over the years is detected, from descriptive statistical studies towards empirical user studies with a mixture of quantitative and qualitative methods. In turn, the interdisciplinarity analysis evidences that a significant portion of Judit's oeuvre was intellectually founded on computer science, achieving a significant, but not exclusive, impact on library and information science.
    Orduña-Malea, E. (2020). Crossing the academic ocean? Judit Bar-Ilan's oeuvre on search engines studies. Scientometrics, 123(3), 1317–1340. https://doi.org/10.1007/s11192-020-03450-4
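
    One family of indicators from the review, overlap between engines, can be illustrated with a small helper that compares two engines' result lists for the same query. The exact definition below (intersection over union, as a percentage) is an assumption for illustration, not necessarily the formula used in the reviewed studies.

```python
# Overlap between two search engines' results for the same query, expressed
# as a percentage of the union of retrieved URLs. Illustrative definition.
def overlap(results_a: list[str], results_b: list[str]) -> float:
    a, b = set(results_a), set(results_b)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0

print(overlap(["u1", "u2", "u3"], ["u2", "u3", "u4"]))  # 50.0
```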

    Guided generation of pedagogical concept maps from the Wikipedia

    We propose a new method for guided generation of concept maps from open-access online knowledge resources such as wikis. Based on this method, we have implemented a prototype that extracts semantic relations from the sentences surrounding hyperlinks in Wikipedia's articles and lets a learner create customized learning objects in real time, based on collaborative recommendations that take her earlier knowledge into account. Open-source modules enable pedagogically motivated exploration in wiki spaces, corresponding to an intelligent tutoring system. The method extracted compact noun–verb–noun phrases, suggested for labeling the arcs between nodes that were labeled with article titles. On average, 80 percent of these phrases were useful, while their length was only 20 percent of the length of the original sentences. Experiments indicate that even simple analysis algorithms can support user-initiated information retrieval and the building of intuitive learning objects that follow the learner's needs. Peer reviewed.
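
    A rough sketch of the noun–verb–noun extraction step, assuming spaCy and its small English model are installed; the prototype's actual extraction rules are not specified in the abstract, so the dependency patterns below are illustrative.

```python
# Noun-verb-noun triple extraction via dependency parsing. Requires:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def nvn_triples(sentence: str) -> list[tuple[str, str, str]]:
    doc = nlp(sentence)
    triples = []
    for token in doc:
        if token.pos_ == "VERB":
            # Subject to the left of the verb, object to its right.
            subjects = [c for c in token.lefts if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.rights if c.dep_ in ("dobj", "obj", "attr")]
            if subjects and objects:
                triples.append((subjects[0].text, token.lemma_, objects[0].text))
    return triples

print(nvn_triples("Paris hosts the Louvre."))  # [('Paris', 'host', 'Louvre')]
```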

    Mission Assurance: A Review of Continuity of Operations Guidance for Application to Cyber Incident Mission Impact Assessment (CIMIA)

    Military organizations have embedded information technology (IT) into their core mission processes as a means to increase operational efficiency, improve decision-making quality, and shorten the sensor-to-shooter cycle. This IT-to-mission dependence can place the organizational mission at risk when an information incident (e.g., the loss or manipulation of a critical information resource) occurs. Non-military organizations typically address this type of IT risk through an introspective, enterprise-wide risk management program that continuously identifies, prioritizes, and documents risks so that an economical set of control measures (e.g., people, processes, technology) can be selected to mitigate them to an acceptable level. The explicit valuation of information resources in terms of their ability to support organizational mission objectives provides transparency and enables the creation of a continuity-of-operations plan and an incident recovery plan. While this type of planning has proven successful in static environments, military missions often involve dynamically changing, time-sensitive, complex, coordinated operations across multiple organizational entities. As a consequence, risk mitigation efforts tend to be localized to each organizational entity, making the enterprise-wide risk management approach to mission assurance infeasible. This thesis investigates the concept of mission assurance and presents a content analysis of continuity-of-operations elements in existing military and non-military guidance. The analysis assesses the current policy landscape, highlights best practices, and identifies policy gaps, with the aim of improving the timeliness and relevance of notification following an information incident and thereby enhancing mission assurance.

    Supplier Ranking System and Its Effect on the Reliability of the Supply Chain

    Today, due to the growing use of social media and the increasing number of customers sharing their opinions globally, customers can review products and services in many novel ways. However, since most reviewers lack in-depth technical knowledge, the true picture of product quality remains unclear. Furthermore, although product defects may originate on the supplier side, making the supplier responsible for repair costs, it is ultimately the manufacturer whose name is damaged when such defects are revealed. In this context, we need to revisit the cost-versus-quality equation. Observations of customer behavior towards brand name and reputation suggest that, contrary to the currently dominant production model in which manufacturers are expected to control only Tier 1 suppliers and hold them responsible for all higher tiers, manufacturers should have a better hold on the entire supply chain. Said differently, the current system treats all parts in Tier 1 as equally important and underestimates the impact of each piece on the final product. Another flaw of the current system is that, by sharing common parts across several different products, such as different car models of the same manufacturer, to reduce cost, only the supplier of the most common parts is considered essential and thus gets the most attention during quality control. To address these concerns, in the present study we created a parts/supplier ranking algorithm, drawing on link-analysis methods such as HITS and PageRank (Massimo, 2011), and implemented it in our supply chain system. Upon ranking all suppliers and parts, we calculated the minimum number of elements, from Tier 1 to Tier 4, that have to be checked in our supply chain. In doing so, we prioritized keeping the cost as low as possible with the fewest possible defects.
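
    A compact sketch of a PageRank-style ranking over a toy supply-chain graph, in the spirit of the link-analysis methods the abstract cites; the graph shape, damping factor, and iteration count are illustrative assumptions, not the study's actual parameters.

```python
# PageRank by power iteration over a dependency graph in which an edge
# points from a product/part to the suppliers it depends on. Parts that
# many assemblies depend on accumulate rank and deserve scrutiny first.
def pagerank(graph: dict[str, list[str]], d: float = 0.85, iters: int = 50) -> dict[str, float]:
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - d) / len(nodes) for n in nodes}
        for node, outs in graph.items():
            if outs:
                share = d * rank[node] / len(outs)
                for m in outs:
                    new[m] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for m in nodes:
                    new[m] += d * rank[node] / len(nodes)
        rank = new
    return rank

# Toy chain: the product uses two Tier-1 suppliers that share one Tier-2 part.
chain = {"product": ["t1_a", "t1_b"], "t1_a": ["t2_x"], "t1_b": ["t2_x"], "t2_x": []}
print(sorted(pagerank(chain).items(), key=lambda kv: -kv[1]))  # t2_x ranks highest
```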

    Understanding the Cloud Computing Ecosystem: Results from a Quantitative Content Analysis

    An increasing number of companies make use of Cloud Computing services in order to reduce costs and increase flexibility of their IT infrastructure. This has enlivened a debate on the benefits and risks of Cloud Computing, among both practitioners and researchers. This study applies quantitative content analysis to explore the Cloud Computing ecosystem. The analyzed data comprises high quality research articles and practitioner-oriented articles from magazines and web sites. We apply n-grams and the cluster algorithm k-means to analyze the literature. The contribution of this paper is twofold: First, it identifies the key terms and topics that are part of the Cloud Computing ecosystem, which we aggregated to a comprehensive model. Second, this paper discloses the sentiments of key topics as reflected in articles from both practice and academia.
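
    A minimal sketch of the described pipeline, n-gram features clustered with k-means, here using scikit-learn; the corpus, n-gram range, and number of clusters are assumptions for illustration.

```python
# N-gram features + k-means clustering, echoing the paper's method; the four
# toy documents stand in for the analyzed research and practitioner articles.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "cloud computing reduces infrastructure cost",
    "security risks of cloud services",
    "cloud pricing models and cost flexibility",
    "data privacy and security in the cloud",
]
X = TfidfVectorizer(ngram_range=(1, 2), stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g., a cost-themed cluster and a security-themed cluster
```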