2,083 research outputs found

    An Efficient Approach for Finding Near Duplicate Web pages using Minimum Weight Overlapping Method

    The existence of billions of web data has severely affected the performance and reliability of web search. The presence of near duplicate web pages plays an important role in this performance degradation while integrating data from heterogeneous sources. Web mining faces huge problems due to the existence of such documents. These pages increase the index storage space and thereby increase the serving cost. Introducing efficient methods to detect and remove such documents from the Web not only decreases computation time but also increases the relevance of search results. We present a novel idea for finding near duplicate web pages that can be applied to plagiarism detection, spam detection and focused web crawling. Here we propose an efficient method for finding near duplicates of an input web page in a huge repository. A TDW matrix based algorithm is proposed with three phases: rendering, filtering and verification. It receives an input web page and a threshold in the first phase, applies prefix filtering and positional filtering to reduce the size of the record set in the second phase, and returns an optimal set of near duplicate web pages in the verification phase using the Minimum Weight Overlapping (MWO) method. The experimental results show gains in two benchmark measures, precision and recall, and a reduction in the size of the competing record set. DOI: http://dx.doi.org/10.11591/ijece.v1i2.7
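The filter-then-verify idea in the abstract above can be illustrated with a minimal sketch. It assumes plain unweighted Jaccard similarity over token sets, which is a stand-in: the abstract's TDW matrix, positional filtering and MWO scoring are not specified here and are not reproduced. The names `near_duplicates`, `repository` and `threshold` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token collections (assumed measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def near_duplicates(query_tokens, repository, threshold):
    """Return {doc_id: similarity} for documents with Jaccard >= threshold.

    repository maps a document id to its list of tokens.
    threshold is expected to lie in (0, 1].
    """
    # One global token order shared by all records: rarest tokens first,
    # ties broken alphabetically. Prefix filtering needs a fixed order.
    freq = Counter(t for toks in repository.values() for t in set(toks))

    def prefix(tokens):
        ordered = sorted(set(tokens), key=lambda t: (freq.get(t, 0), t))
        n = len(ordered)
        # Two sets with Jaccard >= threshold must share a token within
        # their first n - ceil(threshold * n) + 1 tokens in this order.
        # The small epsilon guards against floating-point round-up.
        k = n - math.ceil(threshold * n - 1e-9) + 1
        return ordered[:k]

    # Filtering phase: inverted index over prefix tokens only.
    index = defaultdict(set)
    for doc_id, toks in repository.items():
        for t in prefix(toks):
            index[t].add(doc_id)

    candidates = set()
    for t in prefix(query_tokens):
        candidates |= index[t]

    # Verification phase: compute the exact similarity for candidates only.
    result = {}
    for doc_id in candidates:
        sim = jaccard(query_tokens, repository[doc_id])
        if sim >= threshold:
            result[doc_id] = sim
    return result
```

The prefix index means that only documents sharing at least one rare token with the query reach the verification step, which is where the reduction in the candidate record set comes from in this simplified setting.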

    An Approach for Optimal Feature Subset Selection using a New Term Weighting Scheme and Mutual Information

    With the development of the web, large numbers of documents are available on the Internet, and their number is growing rapidly day by day. Hence automatic text categorization becomes more and more important for dealing with massive data. However, the major problem of document categorization is the high dimensionality of the feature space. Reducing the feature dimension without degrading recognition performance is known as the problem of optimal feature extraction or selection. Working with a reduced set of relevant features can be more efficient and effective. The objective of feature selection is to find a subset of features that retains the characteristics of the full feature set. Dependency among features is also important for classification. In past years, various metrics have been proposed to measure the dependency among different features. A popular approach to realizing dependency is maximal relevance feature selection: selecting the features with the highest relevance to the target class. The new feature weighting scheme we propose yields a large improvement in dimensionality reduction of the feature space. The experimental results clearly show that this integrated method works far better than the others.
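Maximal relevance selection, as described in the abstract above, can be sketched by scoring each term with its mutual information against the class label and keeping the top-scoring terms. This is a minimal illustration that assumes binary term presence per document. It does not implement the new term weighting scheme the abstract mentions, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the label.

    docs is a list of token lists; labels is a parallel list of class names.
    """
    n = len(docs)
    # Joint counts over (term present?, class label).
    joint = Counter((term in set(d), y) for d, y in zip(docs, labels))
    px, py = Counter(), Counter()
    for (x, y), c in joint.items():
        px[x] += c
        py[y] += c
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest mutual information score."""
    vocab = sorted({t for d in docs for t in d})
    scored = [(mutual_information(docs, labels, t), t) for t in vocab]
    # Highest score first; alphabetical order breaks ties deterministically.
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [t for _, t in scored[:k]]
```

A term that appears in exactly the documents of one class scores highest, while a term spread evenly across classes scores zero and is dropped first.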

    Balltracking: an highly efficient method for tracking flow fields

    We present a method for tracking solar photospheric flows that is highly efficient, and demonstrate it using high resolution MDI continuum images. The method involves making a surface from the photospheric granulation data, and allowing many small floating tracers or balls to be moved around by the evolving granulation pattern. The results are tested against synthesised granulation with known flow fields and compared to the results produced by Local Correlation tracking (LCT). The results from this new method have similar accuracy to those produced by LCT. We also investigate the maximum spatial and temporal resolution of the velocity field that it is possible to extract, based on the statistical properties of the granulation data. We conclude that both methods produce results that are close to the maximum resolution possible from granulation data. The code runs very significantly faster than our similarly optimised LCT code, making real time applications on large data sets possible. The tracking method is not limited to photospheric flows, and will also work on any velocity field where there are visible moving features of known scale length

    Radiative forcing of climate


    THE EFFECT OF INDIVIDUALISED COACHING INTERVENTIONS ON ELITE YOUNG FAST BOWLERS’ TECHNIQUE

    Fast bowling in cricket is an activity well recognised as having a high injury prevalence. Previous research has associated lower back injury with aspects of fast bowling technique. Coaching interventions that may decrease the likelihood of injury, whilst maintaining or increasing ball speed, remain a priority within the sport. Selected kinematics of the bowling action of 14 elite young fast bowlers were measured using an 18 camera Vicon Motion Analysis System. Subjects were tested before and after a two year coaching intervention period, during which subject-specific coaching interventions were provided. Mann-Whitney tests were used to identify significant differences in the change in the selected kinematics between those bowlers who were coached or un-coached on each specific aspect. Coached athletes demonstrated a significant change in shoulder alignment at back foot contact (more side-on, P = 0.002) and shoulder counter-rotation (decreased, P = 0.001) relative to un-coached athletes. There was no difference in the amount of change in flexion angles of the front or back knee or lower trunk side-flexion between those who received coaching intervention and those that did not. This study shows that specific aspects of fast bowling technique in elite players can change over a two year period and may be attributed to coaching intervention
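For readers unfamiliar with the Mann-Whitney test named in the abstract above, the U statistic it is built on can be computed as in the sketch below. The function name is hypothetical and no data from the study is used. Turning U into a P value needs exact tables or a normal approximation, which are omitted here.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x equals the number of pairs (xi, yj) with xi > yj, counting each
    tied pair as one half. Tied values receive the average of their ranks.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the run of equal values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    return u_x, n1 * n2 - u_x
```

In a setting like the one described, `x` and `y` would hold the change in one kinematic measure for the coached and un-coached groups respectively. A U value near 0 or near `len(x) * len(y)` indicates that one group's changes are consistently larger than the other's.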