6,920 research outputs found
United we fall, divided we stand: A study of query segmentation and PRF for patent prior art search
Previous research in patent search has shown that reducing queries by extracting a few key terms is ineffective, primarily because of the vocabulary mismatch between patent applications used as queries and existing patent documents. This finding has led to the use of full patent applications as queries in patent prior art search. In addition, standard information retrieval (IR) techniques such as query expansion (QE) do not work effectively with patent queries, principally because of the presence of noise terms in the massive queries. In this study, we take a new approach to QE for patent search. Text segmentation is used to decompose a patent query into self-coherent sub-topic blocks. Each of these much shorter sub-topic blocks, which is representative of a specific aspect or facet of the invention, is then used as a query to retrieve documents. Documents retrieved using the different resulting sub-queries or query streams are interleaved to construct a final ranked list. This technique can exploit the potential benefit of QE since the segmented queries are generally more focused and less ambiguous than the full patent query. Experiments on the CLEF-2010 IP prior-art search task show that the proposed method outperforms the retrieval effectiveness achieved when using a single full patent application text as the query, and also demonstrate the potential benefits of QE to alleviate the vocabulary mismatch problem in patent search
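The interleaving step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which interleaving scheme the authors use, so the round-robin merge below is an assumption, and the function name `interleave` and the document IDs are hypothetical.

```python
from itertools import zip_longest


def interleave(ranked_lists):
    """Round-robin merge of ranked lists, keeping the first occurrence of each document.

    Assumption: one ranked list per sub-query, best document first.
    """
    seen = set()
    merged = []
    # zip_longest walks rank 1 of every list, then rank 2, and so on,
    # padding exhausted lists with None.
    for tier in zip_longest(*ranked_lists):
        for doc in tier:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical results for three sub-queries taken from one patent application.
runs = [["d1", "d2", "d3"], ["d2", "d4"], ["d5", "d1", "d6"]]
final_ranking = interleave(runs)
```

With these toy inputs, a document returned by several sub-queries is kept only at its earliest position in the merged list.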
Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis
This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work
Node Classification in Uncertain Graphs
In many real applications that use and analyze networked data, the links in
the network graph may be erroneous, or derived from probabilistic techniques.
In such cases, the node classification problem can be challenging, since the
unreliability of the links may affect the final results of the classification
process. If the information about link reliability is not used explicitly, the
classification accuracy in the underlying network may be affected adversely. In
this paper, we focus on situations that require the analysis of the uncertainty
that is present in the graph structure. We study the novel problem of node
classification in uncertain graphs, by treating uncertainty as a first-class
citizen. We propose two techniques based on a Bayes model and automatic
parameter selection, and show that the incorporation of uncertainty in the
classification process as a first-class citizen is beneficial. We
experimentally evaluate the proposed approach using different real data sets,
and study the behavior of the algorithms under different conditions. The
results demonstrate the effectiveness and efficiency of our approach
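To make the idea of using link reliability explicitly more concrete, here is a minimal sketch. It is not the Bayes model proposed in the paper: it is a simple probability-weighted neighbor vote, and the function name `classify_node`, the edge probabilities, and the labels are all hypothetical.

```python
from collections import defaultdict


def classify_node(node, edges, labels):
    """Predict a label for `node` from its labeled neighbors.

    edges:  dict mapping an undirected pair (u, v) to the probability that the link exists.
    labels: dict mapping already-labeled nodes to their class.
    Each labeled neighbor votes for its class with weight equal to the link probability.
    """
    scores = defaultdict(float)
    for (u, v), p in edges.items():
        if u == node and v in labels:
            scores[labels[v]] += p
        elif v == node and u in labels:
            scores[labels[u]] += p
    if not scores:
        return None  # no labeled neighbors to learn from
    return max(scores, key=scores.get)


# Toy uncertain graph: one reliable link to a "spam" node, two weak links to "ham" nodes.
edges = {("a", "x"): 0.9, ("b", "x"): 0.3, ("c", "x"): 0.4}
labels = {"a": "spam", "b": "ham", "c": "ham"}
predicted = classify_node("x", edges, labels)
```

In this toy graph an unweighted majority vote would pick "ham" (two neighbors against one), whereas weighting by link probability favors the single reliable "spam" link.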
Query refinement for patent prior art search
A patent is a contract between the inventor and the state, granting a limited time period to the inventor to exploit his invention. In exchange, the inventor must put a detailed description of his invention in the public domain. Patents can encourage innovation and economic growth, but in times of economic crisis patents can hamper such growth. The long duration of the application process is a big obstacle that needs to be addressed to maximize the benefit of patents on innovation and the economy. This time can be significantly reduced by changing the way we search the patent and non-patent literature. Despite the recent advancement of general information retrieval and the revolution of Web search engines, there is still a huge gap between the technologies emerging from research labs and adopted by major Internet search engines, and the systems in use by the patent search communities. In this thesis we investigate the problem of patent prior art search in patent retrieval with the goal of finding documents which describe the idea of a query patent. A query patent is a full patent application composed of hundreds of terms which does not represent a single focused information need. Other relevance evidence (e.g. classification tags and bibliographical data) provides additional details about the underlying information need of the query patent. The first goal of this thesis is to estimate a unigram query model from the textual fields of a query patent. We then improve the initial query representation using noun phrases extracted from the query patent. We show that expansion in a query-dependent manner is useful. The second contribution of this thesis is to address the term mismatch problem from a query formulation point of view by integrating multiple sources of relevance evidence associated with the query patent.
To do this, we enhance the initial representation of the query with the term distribution of the community of inventors related to the topic of the query patent. We then build a lexicon using classification tags and show that query expansion using this lexicon and considering proximity information (between query and expansion terms) can improve the retrieval performance. We perform an empirical evaluation of our proposed models on two patent datasets. The experimental results show that our proposed models can achieve significantly better results than the baseline and other enhanced models
Health Biotechnology Innovation for Social Sustainability - A Perspective from China
China is not only becoming a significant player in the production of high-tech products, but also an increasingly important contributor of ideas and influence in the global knowledge economy. This paper identifies the promises and the pathologies of the biotech innovation system from the perspective of social sustainability in China, looking at the governance of the system and beyond. Based on The STEPS Centre's 'Innovation, Sustainability, Development: A New Manifesto', a '3D' approach has been adopted, bringing together social, technological and policy dynamics, and focusing on the directions of biotechnological innovation, the distribution of its benefits, costs and risks and the diversity of innovations evolving within it and alongside it
Automatic Learning of A Supervised Classifier for Patent Prior Art Retrieval
Prior art retrieval is the process of determining a set of possibly relevant prior art for a specific patent or patent application. Such a process is essential for various patent practices, e.g. patentability search, validity search, and infringement search. To support the automatic retrieval of prior art, existing studies generally adopt the traditional information retrieval (IR) approach or extend the IR approach by incorporating additional information such as citations and patent classes. Those approaches exploit only partial information about patents and thus may limit the performance of prior art retrieval. In response, we propose a novel approach which employs comprehensive information about patents and performs supervised learning for prior art retrieval. Unlike the traditional supervised learning approach, which requires manual preparation of a set of positive and negative training examples, the proposed supervised technique includes a simple but effective mechanism for automatic generation of training examples. Our empirical evaluation on a large dataset consisting of 52,311 semiconductor-related patents indicates that the proposed supervised technique significantly outperforms the traditional full-text-based IR approach
Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is a keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, being 22.3% higher than the median infAP of the participants'
best submissions. Overall, it ranks in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method showed positive impact on the system's performance increasing our
baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm seems to be robust, in particular compared to
the Divergence From Randomness framework, having smaller performance variations
under different training conditions. Finally, the result categorization did not
have significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data driven query expansion methods could be an
alternative to the complexity of biomedical terminologies
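A query expansion model based on word embeddings, as mentioned in this abstract, can be sketched roughly as follows. This is an illustration under stated assumptions, not the system described: the function names, the similarity threshold `min_sim`, and the three-dimensional toy vectors are all made up, and a real system would use embeddings trained on a large corpus.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expand_query(terms, embeddings, k=1, min_sim=0.8):
    """Append up to k nearest embedding neighbors of each query term.

    Assumption: a neighbor is added only if its cosine similarity reaches min_sim.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are left unexpanded
        candidates = [
            (cosine(embeddings[term], vec), word)
            for word, vec in embeddings.items()
            if word not in expanded
        ]
        candidates.sort(reverse=True)
        for sim, word in candidates[:k]:
            if sim >= min_sim:
                expanded.append(word)
    return expanded


# Made-up toy vectors, purely for illustration.
toy_embeddings = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.85, 0.15, 0.05],
    "mouse": [0.0, 0.2, 0.9],
    "murine": [0.05, 0.25, 0.85],
    "protein": [0.3, 0.9, 0.1],
}
expanded = expand_query(["tumor", "mouse"], toy_embeddings)
```

The threshold keeps loosely related words (here "protein") out of the expanded query, which is one simple way to limit the noise that expansion can introduce.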
The Essential Facilities Doctrine Under United States Antitrust Law
The issue of essential facilities has attracted renewed attention in Europe in recent years because of the controversy between IMS Health Inc. and NDC Health Corporation, two competitors in pharmaceutical data services in Germany . . . After an extensive investigation, the European Commission (EC) ordered that IMS grant access to the 1860 brick structure on commercially reasonable terms, and the EC decision is now on appeal in the Court of First Instance in Luxembourg. One issue that emerged in that litigation is whether a decision by European authorities to grant access to the alleged essential facility, especially one whose market power derived in part from a copyright, would open a gap between European and U.S. antitrust law. In response to that contention, the authors of this piece filed a statement in the Court of First Instance describing U.S. law on the subject. We argued that the EC's ruling is consistent with U.S. jurisprudence on the subject of essential facilities. The remainder of this article consists of a revised version of the Court of First Instance filing