
    Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents

    The advancements of search engines for traditional text documents have enabled the effective retrieval of massive textual information in a resource-efficient manner. However, such conventional search methodologies often suffer from poor retrieval accuracy, especially when documents exhibit unique properties that call for specialized, deeper semantic extraction. Recently, AlgorithmSeer, a search engine for algorithms, has been proposed; it extracts pseudo-codes and shallow textual metadata from scientific publications and treats them as traditional documents so that conventional search engine methodology can be applied. However, such a system fails to support user queries that seek algorithm-specific information, such as the datasets on which algorithms operate, their performance, and their runtime complexity. In this paper, a set of enhancements to the previously proposed algorithm search engine is presented. Specifically, we propose methods to automatically identify and extract algorithmic pseudo-codes and the sentences that convey related algorithmic metadata using machine-learning techniques. In an experiment with over 93,000 text lines, we introduce 60 novel features, comprising content-based, font-style-based, and structure-based feature groups, to extract algorithmic pseudo-codes. Our proposed pseudo-code extraction method achieves a 93.32% F1-score, outperforming state-of-the-art techniques by 28%. Additionally, we propose a method to extract algorithm-related sentences using deep neural networks and achieve an accuracy of 78.5%, outperforming a rule-based model and a support vector machine model by 28% and 16%, respectively.
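
    The abstract's idea of scoring text lines with content-based and structure-based features can be illustrated with a minimal sketch. The feature names, keywords, and threshold below are hypothetical: the actual system uses 60 features (including font-style features unavailable from plain text) and a trained model, neither of which is reproduced here.

    ```python
    import re

    # A few illustrative content/structure features for deciding whether a
    # text line looks like pseudo-code. These are invented stand-ins for
    # the feature groups the paper describes, not the authors' feature set.
    PSEUDOCODE_KEYWORDS = re.compile(
        r"\b(for each|while|if|else|return|end for|end if|repeat|until)\b",
        re.IGNORECASE)

    def content_features(line: str) -> dict:
        """Extract simple content- and structure-based features from one line."""
        return {
            "has_keyword": bool(PSEUDOCODE_KEYWORDS.search(line)),
            "starts_numbered": bool(re.match(r"\s*\d+[:.)]", line)),  # "3:" / "3."
            "indented": (len(line) - len(line.lstrip())) >= 4,
            "assign_arrow": "<-" in line or ":=" in line,
        }

    def looks_like_pseudocode(line: str) -> bool:
        """Toy rule: call a line pseudo-code if at least two features fire."""
        return sum(content_features(line).values()) >= 2
    ```

    A real system would feed such feature vectors to a classifier rather than applying a fixed threshold.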

    Deep feature engineering using full-text publications

    We have observed a rapid proliferation in scientific literature, and advancements in web technologies have shifted information dissemination to digital libraries [1]. In general, the research conducted by the scientific community is articulated through scholarly publications containing high-quality algorithms along with algorithm-specific metadata such as achieved results, deployed datasets, and runtime complexity. By one estimate, approximately 900 algorithms were published in top core conferences during the years 2005-2009 [2]. With this significant increase in algorithms reported in these conferences, more efficient search systems with advanced searching capabilities must be designed to find an algorithm and its supporting metadata in the full body text of an article: evaluation results such as precision and recall, the particular dataset on which the algorithm was executed, or the time complexity it achieved. Such advanced search systems could support researchers and software engineers looking for cutting-edge algorithmic solutions. Recently, state-of-the-art systems have been designed to search for an algorithm in full-text articles [3-5]. In this work, we design an advanced search engine for full-text publications that leverages deep learning techniques to classify algorithm-specific metadata and thereby improve the searching capabilities of a search system.

    Mining algorithmic complexity in full-text scholarly documents

    Non-textual document elements (NTDE) such as charts, diagrams, and algorithms play an important role in presenting key information in scientific documents [1]. Recent advancements in information retrieval systems tap this information to answer more complex queries by mining text pertaining to non-textual document elements. However, linking document elements to the corresponding text can be non-trivial. For instance, linking text about algorithmic complexity to its root algorithm can be challenging: these elements are sometimes placed at the start or end of a page instead of following the flow of the document text, and the discussion of an element may or may not be on the same page. In recent years, quite a few attempts have been made to extract NTDE [2-3], and these techniques are actively applied for effective document summarization and to improve existing IR systems. Generally, asymptotic notations are used to identify the complexity lines in full text. We mine the relevant complexities of algorithms from full text by comparing the metadata of an algorithm with the context of the paragraph in which the authors discuss its complexity. In this paper, we present a mechanism for identifying algorithmic complexity lines using regular expressions, compiling algorithmic metadata, and linking complexity-related text lines to that metadata.
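
    The identification of complexity lines via asymptotic notation can be sketched with a single regular expression. The pattern below is an assumption for illustration; the abstract does not specify the expressions the authors actually use.

    ```python
    import re

    # Match common asymptotic notations such as O(n log n), Θ(n^2), Ω(log n).
    # This is an illustrative pattern, not the one used in the paper.
    COMPLEXITY_RE = re.compile(r"\b[OΘΩ]\s*\(\s*[^)]+\)")

    def find_complexity_mentions(text: str) -> list:
        """Return all asymptotic-notation expressions found in a text line."""
        return COMPLEXITY_RE.findall(text)
    ```

    Lines containing such matches would then be candidates for linking to an algorithm's metadata (e.g., its caption or function name).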

    Extracting algorithmic complexity in scientific literature for advance searching

    Non-textual document elements such as charts, diagrams, algorithms, and tables play an important role in presenting key information in scientific documents. Recent advances in information retrieval systems tap this information to answer more complex user queries by mining text pertaining to non-textual document elements from full text. Algorithms are critically important in computer science: researchers improve existing algorithms for critical applications, and new algorithms for unsolved and newly encountered problems are under development. These enhanced and new algorithms are mostly published in scholarly documents, and their complexity is discussed in the same document by the authors. The complexity of an algorithm is also an important factor for information retrieval (IR) systems. In this paper, we mine the relevant complexities of algorithms from full-text documents by comparing the metadata of the algorithm, such as its caption and function name, with the context of the paragraph in which the authors discuss its complexity. Using a dataset of 256 documents downloaded from the CiteSeerX repository, we manually annotate 417 links between algorithms and their complexities. We then apply our novel rule-based approach, which identifies the desired links with 81% precision, 75% recall, 78% F1-score, and 65% accuracy. Overall, our method of identifying these links has the potential to improve information retrieval systems that tap full text and, more specifically, non-textual document elements.
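
    The reported 78% F1-score follows directly from the stated 81% precision and 75% recall, since F1 is their harmonic mean. A quick check:

    ```python
    def f1_score(precision: float, recall: float) -> float:
        """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
        return 2 * precision * recall / (precision + recall)

    # With the abstract's figures: 2 * 0.81 * 0.75 / (0.81 + 0.75) ≈ 0.779,
    # which rounds to the reported 78%.
    ```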

    Parsing AUC Result-Figures in Machine Learning Specific Scholarly Documents for Semantically-enriched Summarization

    Machine-learning-specific scholarly full-text documents contain a number of result figures expressing valuable data, including experimental results, evaluations, and cross-model comparisons. Scholarly search systems often overlook this vital information while indexing important terms using conventional text-based content extraction approaches. In this paper, we propose creating semantically enriched document summaries by extracting meaningful data from the result figures specific to the evaluation metric of the area under the curve (AUC), together with their associated captions, from full-text documents. First, we classify the extracted figures and analyze them by parsing the figure text, legends, and data plots, using a convolutional neural network classification model with a ResNet-50 pre-trained on 1.2 million images from ImageNet. Next, we extract information from the result figures specific to AUC by approximating the region under the function's graph as a series of trapezoids and calculating its area, i.e., the trapezoidal rule. Using over 12,000 figures extracted from 1,000 scholarly documents, we show that figure-specialized summaries contain more enriched terms about figure semantics. Furthermore, we empirically show that the trapezoidal rule can calculate the area under the curve by dividing the curve into multiple intervals. Finally, we measure the quality of the specialized summaries using ROUGE, edit distance, and Jaccard similarity metrics. Overall, we observe that figure-specialized summaries are more comprehensive and semantically enriched. The applications of our research are numerous, including improved document searching, figure searching, and figure-focused plagiarism detection. The data and code used in this paper can be accessed at the following URL: https://github.com/slab-itu/fig-ir/
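
    The trapezoidal rule the abstract applies to parsed curve points can be sketched in a few lines. This is the standard numerical method, not the authors' code; the function and parameter names are illustrative.

    ```python
    def trapezoidal_auc(xs, ys):
        """Approximate the area under a curve sampled at points (xs[i], ys[i])
        by summing the area of the trapezoid over each interval:
        sum over i of (x[i+1] - x[i]) * (y[i] + y[i+1]) / 2."""
        area = 0.0
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
            area += (x1 - x0) * (y0 + y1) / 2.0
        return area
    ```

    For instance, the diagonal ROC curve y = x over [0, 1] yields an AUC of 0.5, the expected value for a random classifier.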

    A comprehensive review of artificial intelligence for pharmacology research

    With the innovation and advancement of artificial intelligence, more and more artificial intelligence techniques are employed in drug research, biomedical frontier research, and clinical medicine practice, especially in the field of pharmacology research. Thus, this review focuses on the applications of artificial intelligence in drug discovery, compound pharmacokinetic prediction, and clinical pharmacology. We briefly introduce the basic knowledge and development of artificial intelligence, present a comprehensive review, summarize the latest studies, and discuss the strengths and limitations of artificial intelligence models. Additionally, we highlight several important studies and point out possible research directions.

    Towards an AEC-AI Industry Optimization Algorithmic Knowledge Mapping: An Adaptive Methodology for Macroscopic Conceptual Analysis

    The Architecture, Engineering, and Construction (AEC) industry is one of the most important productive sectors and hence has a high impact on economic balances, societal stability, and global challenges such as climate change. Its adoption of technologies, applications, and processes is also known for its status quo, slow innovation pace, and conservative approaches. However, a new technological era, Industry 4.0 fueled by AI, is driving productive sectors into a highly pressurized global technological competition and sociopolitical landscape. In this paper, we develop an adaptive approach to mining text content in the research literature corpus related to the AEC and AI (AEC-AI) industries, in particular their relation to technological processes and applications. We present a first-stage approach to an adaptive assessment of AI algorithms, to form an integrative AI platform in the AEC industry: AEC-AI Industry 4.0. At this stage, a macroscopic adaptive method is deployed to characterize "Optimization," a key term in the AEC-AI industry, using a mixed methodology incorporating machine learning and a classical evaluation process. Our results show that effective use of metadata, constrained search queries, and domain knowledge allows a macroscopic assessment of the target concept. This enables the extraction of a high-level mapping and a characterization of the conceptual structure of the literature corpus. The results are comparable, at this level, to classical methodologies for literature review. In addition, our method is designed as an adaptive assessment that can incorporate further stages. This work was supported by CONICYT/FONDECYT/INICIACION under Grant 11180056 to Jose Garcia and by the Spanish Ministry of Science and Innovation through FEDER funding under Project PID2020-117056RB-I00 to Victor Yepes.
    Maureira, C.; Pinto, H.; Yepes, V.; García, J. (2021). Towards an AEC-AI Industry Optimization Algorithmic Knowledge Mapping: An Adaptive Methodology for Macroscopic Conceptual Analysis. IEEE Access, 9:110842-110879. https://doi.org/10.1109/ACCESS.2021.3102215