19,244 research outputs found

    Cyber security situational awareness

    Get PDF

    A Survey to Fix the Threshold and Implementation for Detecting Duplicate Web Documents

    Get PDF
    The rapid growth of information accessible on the World Wide Web has made automated tools for locating, tracking, and analyzing information resources of interest a necessity. Web mining is the branch of data mining that deals with the analysis of the World Wide Web, drawing on concepts from areas such as data mining, Internet technology, the World Wide Web and, more recently, the Semantic Web. Web mining can be defined as the process of discovering hidden yet potentially useful knowledge from data available on the web. It comprises three sub-areas: web content mining, web structure mining, and web usage mining. Web content mining extracts knowledge from web pages and other web objects; web structure mining extracts knowledge from the link structure connecting web pages and other web objects; and web usage mining extracts the usage patterns created by users accessing web pages. Search engine technology has driven the development of the World Wide Web, and search engines are the chief gateways for accessing information on the web. The ability to locate content of particular interest within a vast collection has made businesses more profitable and productive. Search engines answer queries from an indexed repository of web pages populated by web crawling: crawler programs build a local repository of the segment of the web they visit by navigating the web graph and retrieving pages. There are two main types of crawling, generic and focused. Generic crawlers retrieve documents and links across diverse topics, whereas focused crawlers limit the pages they fetch with the aid of previously acquired specialized knowledge. Systems that index, mine, and otherwise analyze pages (such as search engines) take their inputs from the repositories built by web crawlers. The rapid growth of the Internet and the increasing need to integrate heterogeneous data bring with them the problem of near-duplicate data: documents that are not bit-wise identical yet are remarkably similar. Duplicate and near-duplicate web pages inflate index storage space, slow serving, and raise serving costs, annoying users and causing serious problems for web search engines. Hence it is essential to design algorithms to detect such pages.
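
    The survey's central question is where to set the similarity threshold. As a minimal illustration only (not the survey's own method, and with a shingle size and threshold chosen purely for demonstration), a common baseline flags two pages as near duplicates when the Jaccard similarity of their word-shingle sets exceeds a tuned threshold:

        # Minimal sketch of shingle-based near-duplicate detection. The shingle
        # size k=4 and threshold 0.9 are illustrative assumptions, not values
        # taken from the survey.
        def shingles(text, k=4):
            """Return the set of k-word shingles of a document."""
            words = text.lower().split()
            return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

        def jaccard(a, b):
            """Jaccard similarity |a & b| / |a | b| of two shingle sets."""
            if not a and not b:
                return 1.0
            return len(a & b) / len(a | b)

        def is_near_duplicate(doc_a, doc_b, threshold=0.9):
            """Flag two documents as near duplicates above the threshold."""
            return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold

    At web scale, exact shingle sets are usually replaced by sketches such as minhash or simhash so that candidate pairs can be found without comparing every pair of pages.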

    Red River chloride remote sensing study

    Get PDF
    Side-looking radar, infrared thermal imagery, and color photography, together with a few examples of black-and-white panoramic photos, are used to supplement hydrologically and geologically oriented information on the natural saline pollution problem. The study area was explored concurrently by ground methods, and a reasonably good understanding of hydrogeological conditions has been achieved. Examples of the products acquired, their interpretation, and techniques for their use are included.

    Engineering Crowdsourced Stream Processing Systems

    Full text link
    A crowdsourced stream processing (CSP) system is a system that incorporates crowdsourced tasks in the processing of a data stream. This can be seen as enabling crowdsourced work to be applied to a sample of large-scale data at high speed or, equivalently, enabling stream processing to employ human intelligence; it also leads to a substantial expansion of the capabilities of data processing systems. Engineering a CSP system requires combining human and machine computation elements, which, from a general systems theory perspective, means taking into account both inherited and emergent properties of these elements. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems through a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that, compared to a pure stream processing system, AIDR achieves higher data classification accuracy, while, compared to a pure crowdsourcing solution, it makes better use of human workers by requiring much less manual effort.
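
    One recurring hybrid design pattern in such systems is to let a machine classifier handle the full stream and escalate only low-confidence items to crowd workers. The sketch below illustrates that pattern in the abstract; the function names and the 0.8 confidence cutoff are hypothetical and are not taken from AIDR's implementation:

        # Hypothetical sketch of a hybrid human/machine stream processing loop.
        # classify() and ask_crowd() are assumed callables, not a real API.
        def process_stream(items, classify, ask_crowd, min_confidence=0.8):
            """Yield (item, label) pairs, escalating uncertain items to the crowd.

            classify(item)  -> (label, confidence) from a machine model
            ask_crowd(item) -> label from a (blocking) crowdsourced task
            """
            for item in items:
                label, confidence = classify(item)
                if confidence < min_confidence:
                    # Human intelligence is spent only on the hard cases,
                    # which is what keeps the manual effort low.
                    label = ask_crowd(item)
                yield item, label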

    Development of a biomarker for penconazole: a human oral dosing study and a survey of UK residents’ exposure

    Get PDF
    Penconazole is a widely used fungicide in the UK; however, to date, there have been no peer-reviewed publications reporting human metabolism, excretion, or biological monitoring data. The objectives of this study were to i) develop a robust analytical method, ii) determine biomarker levels in volunteers exposed to penconazole, and iii) measure the metabolites in samples collected as part of a large investigation of rural residents’ exposure. An LC-MS/MS method was developed for penconazole and two oxidative metabolites. Three volunteers received a single oral dose of 0.03 mg/kg body weight, and timed urine samples were collected and analysed. The volunteer study demonstrated that both penconazole-OH and penconazole-COOH are excreted in humans following an oral dose and are viable biomarkers. Excretion is rapid, with a half-life of less than four hours. Mean recovery of the administered dose was 47% (range 33%–54%) in urine treated with glucuronidase to hydrolyse any conjugates. The results of the residents’ study showed that penconazole-COOH levels in this population were low, with >80% of samples below the limit of detection. Future sampling strategies that include both end-of-exposure and next-day urine samples, as well as contextual data about the route and time of exposure, are recommended.
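
    To see why the four-hour half-life motivates the recommended sampling times, assume simple first-order elimination (an assumption for illustration; the abstract reports only the half-life itself). The fraction of a dose still unexcreted after t hours is then 0.5 raised to the power t / t_half:

        # Back-of-the-envelope first-order elimination, assuming t_half = 4 h
        # (the upper bound quoted in the abstract).
        t_half = 4.0  # hours
        for t in (4, 8, 24):
            remaining = 0.5 ** (t / t_half)
            print(f"after {t:2d} h: {remaining:.1%} of the dose remains")
        # after  4 h: 50.0%; after  8 h: 25.0%; after 24 h: 1.6% -- so a
        # next-day sample mainly reflects late or repeated exposure.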

    Challenges Using the Linux Network Stack for Real-Time Communication

    Get PDF
    Starting in the early 2000s, human-in-the-loop (HITL) simulation groups at NASA and the Air Force Research Lab began using the Linux network stack for some real-time communication. More recently, SpaceX has adopted Ethernet as the primary bus technology for its Falcon launch vehicles and Dragon capsules. As the Linux network stack makes its way from ground facilities to flight-critical systems, it is necessary to recognize that the stack is optimized for communication over the open Internet, which cannot provide latency guarantees. The Internet protocols and their implementation in the Linux network stack embody numerous design decisions that favor throughput over determinism and latency. These decisions often require workarounds in the application, or customization of the stack, to maintain a high probability of low latency on closed networks, especially if the network must be fault-tolerant to single-event upsets.
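
    A classic instance of such a throughput-over-latency decision is Nagle's algorithm, which coalesces small TCP writes into fewer segments; latency-sensitive applications commonly work around it by disabling it per socket. The sketch below shows that standard workaround in general terms; the address and payload are placeholders, and the example is not drawn from any of the systems named above:

        # Disable Nagle's algorithm so small messages go out immediately
        # instead of being batched for throughput.
        import socket

        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        sock.connect(("192.0.2.10", 5000))  # placeholder address (TEST-NET-1)
        sock.sendall(b"telemetry frame")    # sent without Nagle coalescing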