
    Implementation and Web Mounting of the WebOMiner_S Recommendation System

    The ability to quickly extract information from large amounts of heterogeneous web data published by different Business-to-Consumer (B2C) or e-commerce stores selling similar products (such as laptops), for comparative querying and knowledge discovery, remains a challenge because web sites differ in structure and their data are unstructured. For example: find the best and cheapest deal for a Dell laptop across BestBuy.ca and Amazon.com given the specification Model: Inspiron 15 series, RAM: 16 GB, processor: i5, HDD: 1 TB. The “WebOMiner” and “WebOMiner_S” systems perform automatic extraction by first parsing web HTML source code into a document object model (DOM) tree and then applying pattern mining techniques to discover heterogeneous data types (e.g., text, images, links, lists), so that product schemas are extracted and stored in a back-end data warehouse for querying and recommendation. However, a web interface application for this system still needs to be developed to make it accessible to all users on the web. This thesis proposes a web recommendation system with a graphical user interface that is mounted readily on the web and accessible to all users. It also integrates the web data retained from the extraction process, covering product features such as product model name, product description, and market price subject to the retailer. The implementation uses Java Server Pages (JSP) for the GUI, designed in HTML, CSS and JavaScript, with the Spring framework acting as a bridge between the GUI and the data warehouse. An SQL database stores the extracted product schemas for further integration, querying and knowledge discovery. All the technologies used are compatible with UNIX systems for hosting the application.
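
    Neither WebOMiner's extraction code nor its pattern mining stage is reproduced in this abstract; the following is only a minimal sketch of the parse-then-extract idea, assuming the jsoup library for DOM parsing and hypothetical CSS class names (.product, .title, .description, .price) standing in for a real store's markup:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Sketch: parse a product-listing page into a DOM tree and pull out a flat
// "product schema" record. Selectors and field names are placeholders, not
// the WebOMiner schema.
public class ProductExtractorSketch {

    record Product(String model, String description, String price) {}

    static java.util.List<Product> extract(String html) {
        Document dom = Jsoup.parse(html);              // HTML source -> DOM tree
        java.util.List<Product> products = new java.util.ArrayList<>();
        for (Element block : dom.select(".product")) { // one block per product
            products.add(new Product(
                block.select(".title").text(),
                block.select(".description").text(),
                block.select(".price").text()));
        }
        return products;                               // ready to load into a warehouse
    }

    public static void main(String[] args) {
        String html = "<div class='product'><span class='title'>Inspiron 15</span>"
                    + "<span class='description'>i5, 16GB RAM, 1TB HDD</span>"
                    + "<span class='price'>$799</span></div>";
        extract(html).forEach(System.out::println);
    }
}
```

    In the actual systems, such extracted records would then be loaded into the back-end data warehouse for comparative querying and recommendation.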

    Measuring for privacy: From tracking to cloaking

    We rely on various types of online services to access information for different purposes, and we often provide sensitive information during our interactions with these services. These online services are of different types, e.g., commercial websites (banking, education, news, shopping, dating, social media) and essential websites (e.g., government), and are available through websites as well as mobile apps. The growth of web sites, mobile devices and the apps that run on those devices has resulted in the proliferation of online services. This whole ecosystem of online services has created an environment in which everyone using it is being tracked. Several past studies have performed privacy measurements to assess the prevalence of tracking in online services. Most of these studies used institutional (i.e., non-residential) resources for their measurements and lacked a global perspective. Tracking on online services, and its impact on privacy, may differ across locations. To fill this gap, we perform a privacy measurement study of popular commercial websites using residential networks at various locations. Unlike commercial online services, there are categories of essential online services (e.g., government, hospital, religion) where users do not expect to be tracked. The users of these essential online services often provide information of an extremely personal and sensitive nature (e.g., social insurance numbers, health information, prayer requests or confessions made to a religious minister) when interacting with those services. However, contrary to users' expectations, these essential services include user tracking capabilities. We built frameworks to perform privacy measurements of these online services (including both web sites and Android apps) of different types (i.e., government, hospital and religious services in jurisdictions around the world). The instrumented tracking metrics (i.e., stateless, stateful, session replaying) from the privacy measurements of these online services are then analyzed. Malicious sites (e.g., phishing) mimic online services to deceive users, causing them harm. We found that 80% of the analyzed malicious sites are cloaked and not blocked by search engine crawlers; therefore, sensitive information collected from users through these sites is exposed. In addition, the underlying Internet-connected infrastructure (e.g., networked devices such as routers and modems) used by online users can suffer from security issues due to the absence of TLS or the use of weak SSL/TLS certificates. Such security issues (e.g., spying on a CCTV camera) can compromise data integrity, confidentiality and user privacy. Overall, we found that tracking on commercial websites differs based on the location of the corresponding residential users. We also observed widespread use of tracking by commercial trackers and session replay services that expose sensitive information from essential online services. Sensitive information is also exposed due to vulnerabilities in online services (e.g., Cross-Site Scripting). Furthermore, a significant proportion of malicious sites evade detection by security/search engine crawlers, which may make such sites readily available to users. We also detect weaknesses in the TLS ecosystem of the Internet-connected infrastructure that supports these online services. These observations call for more research on the privacy of online services, as well as on information exposure from malicious online services, to understand the significance of these privacy issues and to adopt appropriate mitigation strategies.
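
    The thesis's measurement framework is not shown in this abstract; as a small, hedged illustration of how the TLS weaknesses mentioned above could be probed with standard platform APIs, the sketch below connects to a host over HTTPS and reports certificate properties a crawler might flag. The host name and the "weak" thresholds (RSA shorter than 2048 bits, SHA-1 signatures) are illustrative assumptions:

```java
import java.net.URL;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.security.interfaces.RSAPublicKey;
import javax.net.ssl.HttpsURLConnection;

// Sketch: inspect a server's TLS certificate chain for properties commonly
// treated as weak in measurement studies. Not the thesis's actual tooling.
public class TlsCheckSketch {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "example.com"; // placeholder host
        HttpsURLConnection conn =
            (HttpsURLConnection) new URL("https://" + host).openConnection();
        conn.connect();
        for (Certificate cert : conn.getServerCertificates()) {
            if (!(cert instanceof X509Certificate x509)) continue;
            System.out.println("Subject:   " + x509.getSubjectX500Principal());
            System.out.println("Signature: " + x509.getSigAlgName());
            System.out.println("Expires:   " + x509.getNotAfter());
            if (x509.getPublicKey() instanceof RSAPublicKey rsa
                    && rsa.getModulus().bitLength() < 2048) {
                System.out.println("WEAK: RSA key shorter than 2048 bits");
            }
            if (x509.getSigAlgName().contains("SHA1")) {
                System.out.println("WEAK: SHA-1 based signature");
            }
        }
        conn.disconnect();
    }
}
```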

    Scalable and Declarative Information Extraction in a Parallel Data Analytics System

    Information extraction (IE) on very large data sets requires highly complex, scalable, and adaptive systems. Although numerous IE algorithms exist, their seamless and extensible combination in a scalable system is still a major challenge. This work presents a query-based IE system for a parallel data analysis platform, which is configurable for specific application domains and scales to terabyte-sized text collections. First, configurable operators are defined for basic IE and web analytics tasks, which can be used to express complex IE tasks in the form of declarative queries. All operators are characterized in terms of their properties to highlight the potential and importance of optimizing non-relational, user-defined operators (UDFs) in dataflows. Subsequently, we survey the state of the art in optimizing non-relational dataflows and highlight that comprehensive optimization of UDFs is still a challenge. Based on this observation, an extensible logical optimizer (SOFA) is introduced, which incorporates the semantics of UDFs into the optimization process. SOFA analyzes a compact set of operator properties and combines automated analysis with manual UDF annotations to enable comprehensive optimization of dataflows. SOFA is able to logically optimize arbitrary dataflows from different application areas, resulting in significant runtime improvements compared to other techniques. Finally, the applicability of the presented system to terabyte-sized corpora is investigated, and we systematically evaluate the scalability and robustness of the employed methods and tools in order to pinpoint the most critical challenges in building an IE system for very large data sets.
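
    SOFA's actual operator model is not given in this abstract; the sketch below only illustrates the general idea of attaching manual property annotations to a UDF (fields read, selectivity) so that a logical optimizer could reason about reordering operators. All annotation, class and field names here are invented for the example:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.function.Function;

// Illustrative sketch (not SOFA's API): a UDF carries declared properties
// that an optimizer could read, e.g. to push a selective operator ahead of
// a more expensive one without changing the dataflow's semantics.
public class UdfPropertySketch {

    @Retention(RetentionPolicy.RUNTIME)
    @interface OperatorProps {
        String[] readsFields();   // fields the UDF inspects
        boolean filters();        // true if it may drop records (selective)
    }

    @OperatorProps(readsFields = {"language"}, filters = true)
    static class EnglishOnly implements Function<String, Boolean> {
        public Boolean apply(String language) { return "en".equals(language); }
    }

    public static void main(String[] args) {
        OperatorProps props = EnglishOnly.class.getAnnotation(OperatorProps.class);
        // A hypothetical optimizer would inspect these declared properties:
        System.out.println("reads: " + String.join(",", props.readsFields())
                + ", selective: " + props.filters());
    }
}
```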

    The Journal of Mine Action Issue 5.1 (2001)

    The Journal of Mine Action, Issue 5.1: Landmines in Asia and the Pacific.

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and we can assume that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract that knowledge from the data. Even though many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to mining under-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research fields.

    Opening up the fuzzy front-end phase of service innovation

    The “fuzzy front-end” (FFE) of innovation begins when an opportunity is first considered worthy of further ideation, exploration, and assessment, and ends when a firm decides to invest in or terminate the idea (Khurana & Rosenthal, 1998). Since this early phase is often characterised as highly uncertain and unstructured, scholars have suggested that uncertainty must be reduced as much as possible during the FFE to achieve success in innovation (Frishammar et al., 2011; Moenaert et al., 1995; Verworn, 2009; Verworn et al., 2008). Although openness has been proposed as crucial to innovation success (Chesbrough, 2003; Chesbrough et al., 2006), little effort has been put into studying its role in reducing uncertainty in the FFE of service innovation. To address this gap, the current study examines the effect of “openness competence” within the FFE – i.e., the ability of an FFE team to explore, gather and assimilate operant resources from external sources by means of external searches and inter-organisational partnerships – on the success of service innovation. It also identifies the key dimensions of openness competence. This mixed methods study comprises two main phases. In the first phase, we interviewed 12 informants who participated in the FFE of 6 distinctive online service innovations. The data were analysed through a service-dominant (S-D) logic analytical lens. The case findings, together with the extant literature, were used to develop a formative second-order construct of openness competence and to form a series of hypotheses concerning an “open service innovation” (OSI) model. In the second phase, a total of 122 valid survey responses were collected and analysed using a partial least squares structural equation modelling (PLS-SEM) technique with the aim of validating the proposed OSI model. The key findings of this study include the four dimensions of openness competence within the FFE, namely searching capability, coordination capability, collective mind and absorptive capacity. An FFE team’s IT capability was identified as an antecedent of openness competence. Further, we found that openness competence is positively associated with the amount of market and technical uncertainty reduced during the FFE. Contrary to our expectations, the impact of openness competence on service innovation success is direct, rather than mediated by the degree of uncertainty reduction. These findings offer several implications for research on open innovation and on the FFE. Additionally, by identifying the key dimensions of openness competence, the current study provides guidance to front-end managers and presents new areas for future research.

    Behind the search box: the political economy of a global Internet industry

    With the rapid proliferation of the Web, the search engine became an increasingly vital tool in everyday life, and offered technical capabilities that might, under different circumstances, have lent themselves to a sweeping democratization of information provision and access. Instead, the search function was transformed into the most profitable large-scale global information industry. This dissertation examines the evolution of search engine technologies within the context of the commercialization and commodification of the Internet. Grounded in critical political economy, the research details how capital has progressively shifted information search activities further into the market, transforming them into sites of profit-making and poles of capitalist growth. It applies historical and political-economic analysis, drawing on an extensive array of sources including trade journals, government documents, industry reports, and financial and business newspapers. The first chapter situates the development of the search engine within the wider political economy of the Internet industry. The second shows how the technology of search was reorganized to enable profitable accumulation. The third and fourth chapters focus on another primary concern of political economy: the labor structures and labor processes that typify this emergent industry. These pivot around familiar compulsions: profit maximization and management control. The search industry is famous for the almost incredible perks it affords to a select group of highly paid, highly skilled engineers and managers. However, the same industry relies not only on a large number of low-wage workers but also on an unprecedented mass of unwaged labor. Google and other search engines have also found means of reconstructing the practices of a seemingly bygone industrial era of labor control: corporate paternalism and scientific management. Today, the search engine industry sits at the “magnetic north pole” of economic growth – the Internet. This vital function of search is controlled disproportionately by US digital capital, mainly Google. US dominance in search seems to carry forward the existing, deeply unbalanced international information order; however, this US-led industry actually faces jarring oppositions within a changing and conflicted global political economy. Chapter Five investigates two of the most important and contested zones: China, whose economic growth has been unsurpassed throughout the entire period spanned by this study of the search engine’s development and which has nurtured a highly successful domestic Internet industry, including the search engine company Baidu; and Europe, the US’s long-time ally, where units of capital, both European and non-European, are struggling with one another. By situating search within these contexts, this chapter sheds light on the ongoing reconfiguration of international information services and on the geopolitical-economic conflicts that are altering the dynamics of information-intensive transnational capitalism. There is a well-developed critical scholarship in political economy that foregrounds the role of information in contemporary capitalist development. This dissertation contributes to and expands this research by looking at search to uncover the capital logics that undergird and shape contemporary information provision.

    Denial of Service in Web-Domains: Building Defenses Against Next-Generation Attack Behavior

    The existing state of the art in application-layer Distributed Denial of Service (DDoS) protection is generally designed, and thus effective, only for static web domains. To the best of our knowledge, our work is the first to study application-layer DDoS defense in web domains of dynamic content and organization, and for next-generation bot behaviour. In the first part of this thesis, we focus on the following research tasks: 1) we identify the main weaknesses of existing application-layer anti-DDoS solutions as proposed in the research literature and in industry, 2) we obtain a comprehensive picture of current-day as well as next-generation application-layer attack behaviour, and 3) we propose novel techniques, based on a multidisciplinary approach that combines offline machine learning algorithms and statistical analysis, for the detection of suspicious web visitors in static web domains. In the second part of the thesis, we propose and evaluate a novel anti-DDoS system that detects a broad range of application-layer DDoS attacks, in both static and dynamic web domains, through the use of advanced data mining techniques. The key advantage of our system over systems that rely on challenge-response tests (such as CAPTCHAs) to combat malicious bots is that it minimizes the number of such tests presented to valid human visitors while preventing most malicious attackers from accessing the web site. The experimental evaluation of the proposed system demonstrates effective detection of current and future variants of application-layer DDoS attacks.
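
    The thesis's detector itself is not reproduced in this abstract; as a minimal sketch of the statistical-analysis ingredient only, the following flags visitors whose request rate deviates strongly from the population mean, so that a challenge (e.g., a CAPTCHA) could be reserved for them. The feature choice, example IPs and z-score threshold are purely illustrative assumptions:

```java
import java.util.Map;

// Sketch: z-score outlier test on per-visitor request rates. Real detectors
// would combine many behavioural features and learned models.
public class VisitorAnomalySketch {

    static Map<String, Boolean> flagSuspicious(Map<String, Double> requestsPerMinute,
                                               double zThreshold) {
        double mean = requestsPerMinute.values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
        double variance = requestsPerMinute.values().stream()
                .mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(0.0);
        double std = Math.sqrt(variance);
        return requestsPerMinute.entrySet().stream()
                .collect(java.util.stream.Collectors.toMap(
                        Map.Entry::getKey,
                        e -> std > 0 && (e.getValue() - mean) / std > zThreshold));
    }

    public static void main(String[] args) {
        Map<String, Double> visitors = Map.of(
                "10.0.0.1", 4.0, "10.0.0.2", 6.0, "10.0.0.3", 5.0,
                "10.0.0.4", 120.0);                    // likely automated
        flagSuspicious(visitors, 1.5).forEach((ip, suspicious) ->
                System.out.println(ip + " suspicious=" + suspicious));
    }
}
```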

    A framework for the dynamic management of Peer-to-Peer overlays

    Peer-to-Peer (P2P) applications have been associated with inefficient operation, interference with other network services and large operational costs for network providers. This thesis presents a framework which can help ISPs address these issues by means of intelligent management of peer behaviour. The proposed approach involves limited control of P2P overlays without interfering with the fundamental characteristics of peer autonomy and decentralised operation. At the core of the management framework lies the Active Virtual Peer (AVP). Essentially intelligent peers operated by the network providers, the AVPs interact with the overlay from within, minimising redundant or inefficient traffic, enhancing overlay stability and facilitating the efficient and balanced use of available peer and network resources. They offer an “insider’s” view of the overlay and permit the management of P2P functions in a compatible and non-intrusive manner. AVPs can support multiple P2P protocols and coordinate to perform functions collectively. To account for the multi-faceted nature of P2P applications and to allow the incorporation of modern techniques and protocols as they appear, the framework is based on a modular architecture. Core modules for overlay control and transit traffic minimisation are presented; for the latter, a number of suitable P2P content caching strategies are proposed. Using a purpose-built P2P network simulator and small-scale experiments, it is demonstrated that introducing AVPs inside the network can significantly reduce inter-AS traffic, minimise costly multi-hop flows, increase overlay stability and load balancing, and offer improved peer transfer performance.
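
    The concrete caching strategies evaluated in the thesis are not listed in this abstract; the sketch below shows one generic possibility, a least-recently-used chunk cache that an AVP could use to keep popular content inside the provider's network and so avoid repeated inter-AS downloads. Capacity and chunk identifiers are placeholders:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: LRU content cache for an Active Virtual Peer. Requests served from
// this cache stay inside the ISP instead of crossing AS boundaries.
public class AvpLruCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public AvpLruCacheSketch(int capacity) {
        super(16, 0.75f, true);        // access-order = true gives LRU behaviour
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;      // evict the least-recently-used chunk
    }

    public static void main(String[] args) {
        AvpLruCacheSketch<String, byte[]> cache = new AvpLruCacheSketch<>(2);
        cache.put("chunk-A", new byte[]{1});
        cache.put("chunk-B", new byte[]{2});
        cache.get("chunk-A");          // touch A so B becomes least recent
        cache.put("chunk-C", new byte[]{3});
        System.out.println(cache.keySet());   // prints [chunk-A, chunk-C]
    }
}
```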