Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: Learning from Labeled and Unlabeled Data.

Author: Turney Peter
Publication date: 01/01/2001

A journal article is often accompanied by a list of keyphrases, composed of about five to fifteen important words and phrases that capture the article's main topics. Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. Good performance on this task has been obtained by approaching it as a supervised learning problem. An input document is treated as a set of candidate phrases that must be classified as either keyphrases or non-keyphrases. To classify a candidate phrase as a keyphrase, the most important features (attributes) appear to be the frequency and location of the candidate phrase in the document. Recent work has demonstrated that it is also useful to know the frequency of the candidate phrase as a manually assigned keyphrase for other documents in the same domain as the given document (e.g., the domain of computer science). Unfortunately, this keyphrase-frequency feature is domain-specific (the learning process must be repeated for each new domain) and training-intensive (good performance requires a relatively large number of training documents in the given domain, with manually assigned keyphrases). The aim of the work described here is to remove these limitations. In this paper, I introduce new features that are conceptually related to keyphrase-frequency and I present experiments that show that the new features result in improved keyphrase extraction, although they are neither domain-specific nor training-intensive. The new features are generated by issuing queries to a Web search engine, based on the candidate phrases in the input document. The feature values are calculated from the number of hits for the queries (the number of matching Web pages). In essence, these new features are derived by mining lexical knowledge from a very large collection of unlabeled data, consisting of approximately 350 million Web pages without manually assigned keyphrases.
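The abstract above says feature values come from search-engine hit counts but does not give the formula. As a rough illustration only, the sketch below assumes a PMI-style association score between a candidate phrase and a second phrase; the function name, the scoring formula, and the idea of passing counts in directly (instead of querying a search engine) are all assumptions, not the paper's method.

```python
import math

# Collection size quoted in the abstract (approximately 350 million pages).
TOTAL_PAGES = 350_000_000


def pmi_from_hits(hits_a, hits_b, hits_both, total=TOTAL_PAGES):
    """PMI-style score from page counts (an assumed formula, for illustration).

    hits_a, hits_b: pages matching each phrase on its own.
    hits_both: pages matching both phrases together.
    """
    if min(hits_a, hits_b, hits_both) <= 0:
        return float("-inf")  # no evidence of association
    return math.log2((hits_both * total) / (hits_a * hits_b))
```

A score near zero would mean the two phrases co-occur about as often as chance predicts; larger values suggest a stronger association that a classifier could use as one feature.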

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

Object Discovery From a Single Unlabeled Image by Mining Frequent Itemset With Multi-scale Features

Author: Guan Qingji
Huang Yaping
Ling Haibin
Pu Mengyang
Zhang Jian
Zhang Runsheng
Zou Qi
Publication date: 08/08/2020

The goal of our work is to discover dominant objects in a very general setting where only a single unlabeled image is given. This is far more challenging than typical co-localization or weakly-supervised localization tasks. To tackle this problem, we propose a simple but effective pattern mining-based method, called Object Location Mining (OLM), which exploits the advantages of data mining and the feature representations of pre-trained convolutional neural networks (CNNs). Specifically, we first convert the feature maps from a pre-trained CNN model into a set of transactions, and then discover frequent patterns from the transaction database through pattern mining techniques. We observe that the discovered patterns, i.e., co-occurrence highlighted regions, typically hold appearance and spatial consistency. Motivated by this observation, we can easily discover and localize possible objects by merging relevant meaningful patterns. Extensive experiments on a variety of benchmarks demonstrate that OLM achieves competitive localization performance compared with state-of-the-art methods. We also compare our approach with unsupervised saliency detection methods and achieve competitive results on seven benchmark datasets. Moreover, we conduct experiments on fine-grained classification to show that our proposed method can locate the entire object and its parts accurately, which can significantly improve the classification results.
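The abstract does not spell out how feature maps become transactions. One plausible reading, sketched below under stated assumptions, treats each spatial position as a transaction whose items are the channels that are active there, then counts channel pairs that co-occur often. The threshold rule, the restriction to pairs, and both function names are hypothetical and are not taken from OLM.

```python
from collections import Counter
from itertools import combinations


def feature_maps_to_transactions(fmap, threshold):
    """fmap is a nested list indexed as [channel][row][col].

    Returns one transaction per spatial position: the set of channel
    indices whose activation exceeds `threshold` (an assumed rule).
    """
    channels = len(fmap)
    rows, cols = len(fmap[0]), len(fmap[0][0])
    transactions = []
    for r in range(rows):
        for c in range(cols):
            items = frozenset(
                ch for ch in range(channels) if fmap[ch][r][c] > threshold
            )
            transactions.append(items)
    return transactions


def frequent_itemsets(transactions, min_support, size=2):
    """Count itemsets of a fixed size and keep those seen at least min_support times."""
    counts = Counter()
    for items in transactions:
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}
```

Positions that contain a frequent itemset could then be marked as candidate object regions, which is the merging step the abstract describes only in outline.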

arXiv.org e-Print Archive

Influence of Strip-Mining on the Mortality of a Wetland Caddisfly, Limnephilus Indivisus (Trichoptera: Limnephilidae).

Author: Foote B. A
Usis J. D
Publication venue: ValpoScholar
Publication date: 27/11/2017

A coal mine about 2.2 km upstream from Stillfork Swamp Nature Preserve, Carroll Co., Ohio was suspected of causing a reduction in Limnephilus indivisus caddisflies in the south half of the preserve. Second instar L. indivisus larvae collected from the south half of the preserve and from two control areas were reared in cages at the site of collection and at the other two sites in a replicated experiment. Elevated total dissolved solids in water samples from within rearing enclosures displayed a strong correlation (r² = 0.864) with increased mortality when compared to larvae reared in unaffected areas. This investigation suggests that larvae of L. indivisus are useful in biomonitoring of wetlands impacted by acid-mine drainage, and potentially other perturbations.

Valparaiso University

Analysis and evaluation of fragment size distributions in rock blasting at the Erdenet Mine

Author: Dondov Erdenebaatar
Дондов Эрдэнэбаатар
Publication date: 01/08/2015

Master's Project (M.S.) University of Alaska Fairbanks, 2015

Rock blasting is one of the most important operations in mining. It significantly affects the subsequent comminution processes and, therefore, is critical to successful mining production. In this study, for the evaluation of the blasting performance at the Erdenet Mine, we analyzed rock fragment size distributions with the digital image processing method. The uniformities of rock fragments and the mean fragment sizes were determined and applied in the Kuz-Ram model. Statistical prediction models were also developed based on the field measured parameters. The results were compared with the Kuz-Ram model predictions and the digital image processing measurements. A total of twenty-eight images from eleven blasting patterns were processed, and rock size distributions were determined by the Split-Desktop program in this study. Based on the rock mass and explosive properties and the blasting parameters, the rock fragment size distributions were also determined with the Kuz-Ram model and compared with the measurements by digital image processing. Furthermore, in order to improve the prediction of rock fragment size distributions at the mine, regression analyses were conducted and statistical models were developed for the estimation of the uniformity and characteristic size. The results indicated that there were discrepancies between the digital image measurements and those estimated by the Kuz-Ram model. The uniformity indices of image processing measurements varied from 0.76 to 1.90, while those estimated by the Kuz-Ram model were from 1.07 to 1.13. The mean fragment size of the Kuz-Ram model prediction was 97.59% greater than the mean fragment size of the image processing. The multivariate nonlinear regression analyses conducted in this study indicated that rock uniaxial compressive strength and elastic modulus, explosive energy input in the blasting, bench height to burden ratio and blast area per hole were significant predictor variables in determining the fragment characteristic size and the uniformity index. The regression models developed based on the above predictor variables showed much closer agreement with the measurements.
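To make the roles of the uniformity index and characteristic size concrete: the Kuz-Ram model describes the size distribution with a Rosin-Rammler curve, in which the fraction passing a screen of size x is 1 - exp(-(x / x_c)^n). The sketch below evaluates that curve; the characteristic size of 30 and the screen size of 15 are hypothetical values chosen for illustration, since the abstract reports neither.

```python
import math


def fraction_passing(size, characteristic_size, uniformity_index):
    """Rosin-Rammler cumulative fraction of fragments finer than `size`."""
    ratio = size / characteristic_size
    return 1.0 - math.exp(-(ratio ** uniformity_index))


# Hypothetical characteristic size; units only need to match `size`.
X_C = 30.0

# Uniformity indices inside the ranges quoted in the abstract.
for n in (0.76, 1.10, 1.90):
    print(n, round(fraction_passing(15.0, X_C, n), 3))
```

At the characteristic size itself the curve always gives 1 - 1/e (about 63% passing), so differences in the uniformity index change how steeply the distribution rises around that point, not where it is centred.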

ScholarWorks@UA

Remotely sensed mid-channel bar dynamics in downstream of the Three Gorges Dam, China

Author: Shao Guofan
Wen Zhaofei
Wu Shengjun
Yang Hong
Zhang Ce
Publication venue: MDPI AG
Publication date: 28/01/2020

The downstream reach of the Three Gorges Dam (TGD) along the Yangtze River (1560 km) hosts numerous mid-channel bars (MCBs). MCB dynamics are crucial to the river’s hydrological processes and local ecological function. However, a systematic understanding of such dynamics and their linkage to the TGD is still largely lacking. Using Landsat-image-extracted MCBs and several spatial-temporal analysis methods, this study presents a comprehensive understanding of MCB dynamics in terms of number, area, and shape downstream of the TGD during the period 1985–2018. On average, a total of 140 MCBs were detected and grouped into four types representing small (<2 km²), middle (2–7 km²), large (7–33 km²) and extra-large (>33 km²) MCBs, respectively. The number of MCBs decreased after TGD closure, but most of this decrease happened in the lower reach. The total area of MCBs showed an increasing trend (2.77 km²/yr, p-value <0.01) over the last three decades. The extra-large MCBs had a higher rate of area increase than MCBs of the other sizes. Small MCBs tended to become relatively round, whereas the others became elongated in shape after TGD operation. Impacts of TGD operation generally diminished in the longitudinal direction from TGD to Hankou and from TGD to Jiujiang for shape and area dynamics, respectively. The quantified longitudinal and temporal dynamics of MCBs across the entire Yangtze River downstream of TGD provide a crucial monitoring basis for continuous investigation of the changing mechanisms affecting the morphology of the Yangtze River system.

Multidisciplinary Digital Publishing Institute

Central Archive at the University of Reading

Lancaster E-Prints

Explore Bristol Research

Exploring time diaries using semi-automated activity pattern extraction

Author: Kajsa Ellegård
Katerina Vrotsou
Matthew Cooper

Identifying patterns of activities in time diaries in order to understand the variety of daily life in terms of combinations of activities performed by individuals in different groups is of interest in time use research. So far, activity patterns have mostly been identified by visually inspecting representations of activity data or by using sequence comparison methods, such as sequence alignment, in order to cluster similar data and then extract representative patterns from these clusters. Both these methods are sensitive to data size: pure visual methods become too cluttered and sequence comparison methods become too time consuming. Furthermore, the patterns identified by both methods represent mostly general trends of activity in a population, while detail and unexpected features hidden in the data are often never revealed. We have implemented an algorithm that searches the time diaries and automatically extracts all activity patterns meeting user-defined criteria of what constitutes a valid pattern of interest for the user’s research question. Amongst the many criteria which can be applied are a time window containing the pattern, minimum and maximum occurrences of the pattern, and number of people that perform it. The extracted activity patterns can then be interactively filtered, visualized and analyzed to reveal interesting insights. Exploration of the results of each pattern search may result in new hypotheses which can be subsequently explored by altering the search criteria. To demonstrate the value of the presented approach we consider and discuss sequential activity patterns at a population level, from a single day perspective.

Keywords: time-geography, diaries, everyday life, activity patterns, visualization, data mining, sequential pattern mining
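The criteria listed in the abstract (a time window, a minimum number of occurrences, and the number of people performing a pattern) can be illustrated with a small filter. This is only a sketch under simplifying assumptions: a diary is a time-ordered list of (start_minute, activity) pairs, a pattern must appear as consecutive activities, and the window is measured between the first and last start times. The data layout and function names are hypothetical and do not reproduce the authors' algorithm.

```python
def find_occurrences(diary, pattern, window):
    """Return start times where `pattern` occurs as consecutive activities
    whose start times span at most `window` minutes."""
    hits = []
    k = len(pattern)
    for i in range(len(diary) - k + 1):
        chunk = diary[i:i + k]
        activities = [activity for _, activity in chunk]
        span = chunk[-1][0] - chunk[0][0]
        if activities == list(pattern) and span <= window:
            hits.append(chunk[0][0])
    return hits


def people_matching(diaries, pattern, window, min_occurrences=1):
    """Names of people whose diary contains the pattern often enough."""
    return sorted(
        person
        for person, diary in diaries.items()
        if len(find_occurrences(diary, pattern, window)) >= min_occurrences
    )
```

Comparing the length of the returned list against a minimum number of people would give the population-level criterion; loosening or tightening `window` corresponds to altering the search criteria as the abstract describes.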

Research Papers in Economics