8 research outputs found

    A systematic comparison of different approaches of unsupervised extraction of text from scholarly figures

    Different approaches have been proposed to address the challenge of extracting text from scholarly figures. However, a comparative evaluation of these approaches has not yet been conducted. Based on an extensive study, we compare the 7 most relevant approaches described in the literature as well as 25 systematic combinations of methods for extracting text from scholarly figures. To this end, we define a generic pipeline consisting of six individual steps. We map the existing approaches to this pipeline and re-implement their methods for each pipeline step. This method-wise re-implementation allows the different methods for each pipeline step to be freely combined. Overall, we have evaluated 32 different pipeline configurations and systematically compared the different methods and approaches. We evaluate the pipeline configurations over four datasets of scholarly figures of different origin and characteristics. The quality of the extraction results is assessed using F-measure and Levenshtein distance. In addition, we measure the runtime performance. The experimental results show that one approach achieves the best overall text extraction quality on all datasets. Regarding runtime, we observe huge differences, ranging from very fast approaches to those running for several weeks.
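
The two quality measures named in this abstract can be illustrated with a minimal sketch. The abstract does not say how extracted text is matched against the ground truth, so the exact-word multiset matching used for the F-measure here is an assumption, and the function names `levenshtein` and `f_measure` are hypothetical rather than taken from the paper.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def f_measure(extracted: list[str], reference: list[str]) -> float:
    """Balanced F-measure over exact word matches.

    Assumption: words are compared as multisets, so a word counts as
    correct only as often as it occurs in the reference annotation.
    """
    overlap = sum((Counter(extracted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(extracted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical extraction result compared with a hypothetical annotation.
    print(levenshtein("Accurcy", "Accuracy"))
    print(f_measure(["Time", "Accurcy"], ["Time", "Accuracy"]))
```

Exact matching gives a misrecognised word no credit at all, while the edit distance still reflects how close the extracted string is to the reference, which is one plausible reason to report both measures.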

    Automatically learning topics and difficulty levels of problems in online judge systems

    Online Judge (OJ) systems have been widely used in many areas, including programming, mathematical problem solving, and job interviews. Unlike other online learning systems, such as Massive Open Online Courses, most OJ systems are designed for self-directed learning without the intervention of teachers. Also, in most OJ systems, problems are simply listed in volumes and there is no clear organization of them by topic or difficulty level. As such, problems in the same volume are mixed in terms of topics and difficulty levels. By analyzing large-scale users’ learning traces, we observe that there are two major learning modes (or patterns). Users either practice problems in a sequential manner from the same volume regardless of their topics, or they attempt problems about the same topic, which may be spread across multiple volumes. Our observation is consistent with the findings in classic educational psychology. Based on this observation, we propose a novel two-mode Markov topic model to automatically detect the topics of online problems by jointly characterizing the two learning modes. To further predict the difficulty level of online problems, we propose a competition-based expertise model using the learned topic information. Extensive experiments on three large OJ datasets have demonstrated the effectiveness of our approach in three different tasks: skill topic extraction, expertise competition prediction, and problem recommendation.

    Exploratory Browsing

    In recent years, digital media have influenced many areas of our lives. The transition from analogue to digital has substantially changed our ways of dealing with media collections. Today’s interfaces for managing digital media mainly offer fixed linear models corresponding to the underlying technical concepts (folders, events, albums, etc.), or metaphors borrowed from their analogue counterparts (e.g., stacks, film rolls). However, people’s mental interpretations of their media collections often go beyond the scope of a linear scan. Apart from explicit search with specific goals, current interfaces cannot sufficiently support explorative and often non-linear behavior. This dissertation presents an exploration of interface design to enhance the browsing experience with media collections. The main outcome of this thesis is a new model of Exploratory Browsing to guide the design of interfaces that support the full range of browsing activities, especially Exploratory Browsing. We define Exploratory Browsing as the behavior when the user is uncertain about her or his targets and needs to discover areas of interest (exploratory), in which she or he can explore in detail and possibly find some acceptable items (browsing). According to the browsing objectives, we group browsing activities into three categories: Search Browsing, General Purpose Browsing and Serendipitous Browsing. In the context of this thesis, Exploratory Browsing refers to the latter two browsing activities, which go beyond explicit search with specific objectives. We systematically explore the design space of interfaces to support the Exploratory Browsing experience. Applying the methodology of User-Centered Design, we develop eight prototypes, covering two main usage contexts: browsing personal collections and browsing in online communities. The main media types studied are photographs and music.
The main contribution of this thesis lies in deepening the understanding of how people’s exploratory behavior affects interface design. This thesis contributes to the field of interface design for media collections in several respects. With the goal of informing interface design that supports the Exploratory Browsing experience with media collections, we present a model of Exploratory Browsing, covering the full range of exploratory activities around media collections. We investigate this model in different usage contexts and develop eight prototypes. The substantial implications gathered during the development and evaluation of these prototypes inform the further refinement of our model: we uncover the underlying transitional relations between browsing activities and discover several stimulators that encourage a fluid and effective activity transition. Based on this model, we propose a catalogue of general interface characteristics, and employ this catalogue as criteria to analyze the effectiveness of our prototypes. We also present several general suggestions for designing interfaces for media collections.

    Analysis and Modular Approach for Text Extraction from Scientific Figures on Limited Data

    Scientific figures are widely used as compact, comprehensible representations of important information. The re-usability of these figures is, however, limited, as one can rarely search for them directly: they are mostly indexed by their surrounding text (e.g., publication or website), which often does not contain the full message of the figure. In this thesis, the focus is on making the content of scientific figures accessible by extracting the text from these figures. A modular pipeline for unsupervised text extraction from scientific figures, based on a thorough analysis of the literature, was built to address the problem. This modular pipeline was used to build several unsupervised approaches, in order to evaluate different methods from the literature as well as new methods and method combinations. Some supervised approaches were built as well for comparison. One challenge in evaluating the approaches was the lack of annotated data, which especially needed to be considered when building the supervised approach. Three existing datasets were used for evaluation, as well as two datasets of 241 scientific figures which were manually created and annotated. Additionally, two existing datasets for text extraction from other types of images were used for pretraining the supervised approach. Several experiments showed the superiority of the unsupervised pipeline over common Optical Character Recognition engines and identified the best unsupervised approach. This unsupervised approach was compared with the best supervised approach, which, despite the limited amount of training data available, clearly outperformed the unsupervised approach.
Infographics are a widely used medium for the compact presentation of key messages. The re-usability of these figures is, however, often limited, as they are hard to find: they are mostly indexed via their surrounding media, such as publications or websites, rather than via their content. The focus of this work is on extracting the textual content from infographics in order to make their content accessible. Based on an extensive analysis of related work, a generalizing, modular approach for unsupervised text extraction from scientific figures was developed. With this modular approach, several unsupervised approaches and also some supervised approaches were implemented in order to compare various methods from the literature as well as new and previously unused methods. One challenge in the evaluation was the small number of annotated figures, which had to be taken into account especially for the supervised approach. For the evaluation, three existing datasets were used, and in addition two datasets with a total of 241 infographics were created and annotated with the necessary information, so that a total of 5 datasets could be used for the evaluation. For pretraining the supervised approach, two datasets from related text extraction domains were additionally used. Various experiments show that the unsupervised approach works better than classic text recognition methods, and the best of the various unsupervised approaches is identified. This unsupervised approach is compared with the supervised approach, which, despite limited training data, delivers the best results.

    Eight Biennial Report : April 2005 – March 2007


    Multispace & Multistructure. Neutrosophic Transdisciplinarity (100 Collected Papers of Sciences), Vol. IV

    The fourth volume in my book series of “Collected Papers” includes 100 published and unpublished articles, notes, (preliminary) drafts containing just ideas to be further investigated, scientific souvenirs, scientific blogs, project proposals, small experiments, solved and unsolved problems and conjectures, updated or alternative versions of previous papers, short or long humanistic essays, and letters to the editors, all collected over the previous three decades (1980-2010), though most of them are from the last decade (2000-2010); some were lost and found, while others are extended, diversified, improved versions. This is an eclectic tome of 800 pages with papers in various fields of science, alphabetically listed, such as: astronomy, biology, calculus, chemistry, computer programming codification, economics and business and politics, education and administration, game theory, geometry, graph theory, information fusion, neutrosophic logic and set, non-Euclidean geometry, number theory, paradoxes, philosophy of science, psychology, quantum physics, scientific research methods, and statistics. It reflects my long-standing preoccupation and collaboration as author, co-author, translator, or co-translator, and editor with many scientists from around the world. Many topics in this book are incipient and need to be expanded in future explorations.