1,123 research outputs found

    OpenML: networked science in machine learning

    Many sciences have made significant breakthroughs by adopting online tools that help organize, structure and mine information that is too detailed to be printed in journals. In this paper, we introduce OpenML, a place for machine learning researchers to share and organize data in fine detail, so that they can work more effectively, be more visible, and collaborate with others to tackle harder problems. We discuss how OpenML relates to other examples of networked science and what benefits it brings for machine learning research and individual scientists, as well as for students and practitioners.

    Searching Data: A Review of Observational Data Retrieval Practices in Selected Disciplines

    A cross-disciplinary examination of the user behaviours involved in seeking and evaluating data is surprisingly absent from the research data discussion. This review explores the data retrieval literature to identify commonalities in how users search for and evaluate observational research data. Two analytical frameworks rooted in information retrieval and science and technology studies are used to identify key similarities in practices as a first step toward developing a model describing data retrieval.

    Privacy-preserving efficient searchable encryption

    Data storage and computation outsourcing to third-party managed data centers, in environments such as Cloud Computing, is increasingly being adopted by individuals, organizations, and governments. However, as cloud-based outsourcing models expand to society-critical data and services, the lack of effective and independent control over security and privacy conditions in such settings presents significant challenges. An interesting solution to these issues is to perform computations on encrypted data, directly in the outsourcing servers. Such an approach benefits from not requiring major data transfers and decryptions, increasing the performance and scalability of operations. Searching operations, an important application case as cloud-backed repositories increase in number and size, are good examples where security, efficiency, and precision are relevant requirements. Yet existing proposals for searching encrypted data are still limited from multiple perspectives, including usability, query expressiveness, and client-side performance and scalability. This thesis focuses on the design and evaluation of mechanisms for searching encrypted data with improved efficiency, scalability, and usability. There are two particular concerns addressed in the thesis: on one hand, the thesis aims at supporting multiple media formats, especially text, images, and multimodal data (i.e. data with multiple media formats simultaneously); on the other hand, the thesis addresses client-side overhead and how it can be minimized in order to support client applications executing in both high-performance desktop devices and resource-constrained mobile devices. From the research performed to address these issues, three core contributions were developed and are presented in the thesis: (i) CloudCryptoSearch, a middleware system for storing and searching text documents with privacy guarantees, while supporting multiple modes of deployment (user device, local proxy, or computational cloud) and exploring different tradeoffs between security, usability, and performance; (ii) a novel framework for efficiently searching encrypted images based on IES-CBIR, an Image Encryption Scheme with Content-Based Image Retrieval properties that we also propose and evaluate; (iii) MIE, a Multimodal Indexable Encryption distributed middleware that allows storing, sharing, and searching encrypted multimodal data while minimizing client-side overhead and supporting both desktop and mobile devices.
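    As a rough illustration of the index-based searching-over-encrypted-data idea this abstract refers to, the following Python sketch builds a keyword index of HMAC-derived trapdoor tokens that point at symmetrically encrypted document identifiers. It is a minimal toy under stated assumptions (the class name, key handling, and use of the third-party `cryptography` package are mine), not the CloudCryptoSearch, IES-CBIR, or MIE designs contributed by the thesis.

```python
# Minimal sketch of index-based searchable symmetric encryption (illustrative only;
# not the CloudCryptoSearch / IES-CBIR / MIE designs described in the thesis).
# Requires the third-party `cryptography` package for the symmetric cipher.
import hmac
import hashlib
from collections import defaultdict
from cryptography.fernet import Fernet

class EncryptedKeywordIndex:
    """Toy combining client and server roles: keyword trapdoors -> encrypted doc ids."""

    def __init__(self):
        self.index_key = b"client-held-index-key"   # trapdoor key, kept by the client
        self.data_key = Fernet.generate_key()       # data key, kept by the client
        self.cipher = Fernet(self.data_key)
        self.index = defaultdict(list)              # what the untrusted server would store

    def _token(self, keyword: str) -> str:
        # Deterministic trapdoor: the server only ever sees HMAC tokens, never keywords.
        return hmac.new(self.index_key, keyword.lower().encode(), hashlib.sha256).hexdigest()

    def add_document(self, doc_id: str, text: str) -> None:
        enc_id = self.cipher.encrypt(doc_id.encode())
        for word in set(text.lower().split()):
            self.index[self._token(word)].append(enc_id)

    def search(self, keyword: str) -> list[str]:
        # The client submits only the trapdoor; matching ids are decrypted client-side.
        return [self.cipher.decrypt(e).decode() for e in self.index.get(self._token(keyword), [])]

if __name__ == "__main__":
    idx = EncryptedKeywordIndex()
    idx.add_document("doc-1", "privacy preserving search over encrypted text")
    idx.add_document("doc-2", "image retrieval with content based features")
    print(idx.search("encrypted"))   # ['doc-1']
```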

    Linking genes to literature: text mining, information extraction, and retrieval applications for biology

    Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet.
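    To make the kind of regularity-driven extraction described above concrete, here is a deliberately small Python sketch of dictionary- and pattern-based gene-mention tagging. The lexicon, the regular expression, and the function name are illustrative assumptions for this listing, not a reconstruction of any BioCreative system.

```python
# Toy illustration of dictionary- and pattern-based gene mention extraction,
# the style of regularity-driven tagging the review surveys (not any specific
# BioCreative system; the gene list and pattern below are illustrative).
import re

GENE_LEXICON = {"BRCA1", "TP53", "EGFR"}               # tiny illustrative dictionary
GENE_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\b")   # crude gene-symbol-like token shape

def extract_gene_mentions(sentence: str) -> list[tuple[str, int, int]]:
    """Return (mention, start, end) spans for tokens that match the lexicon."""
    mentions = []
    for match in GENE_PATTERN.finditer(sentence):
        token = match.group()
        # Keep only dictionary hits; real systems add ML models, fuzzy matching,
        # and normalization to database identifiers on top of this.
        if token in GENE_LEXICON:
            mentions.append((token, match.start(), match.end()))
    return mentions

print(extract_gene_mentions("Mutations in BRCA1 and TP53 were associated with the phenotype."))
# [('BRCA1', 13, 18), ('TP53', 23, 27)]
```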

    AI in Medical Imaging Informatics: Current Challenges and Future Directions

    This paper reviews state-of-the-art research solutions across the spectrum of medical imaging informatics, discusses clinical translation, and provides future directions for advancing clinical practice. More specifically, it summarizes advances in medical imaging acquisition technologies for different modalities, highlighting the necessity for efficient medical data management strategies in the context of AI in big healthcare data analytics. It then provides a synopsis of contemporary and emerging algorithmic methods for disease classification and organ/tissue segmentation, focusing on AI and deep learning architectures that have already become the de facto approach. The clinical benefits of in-silico modelling advances linked with evolving 3D reconstruction and visualization applications are further documented. In conclusion, integrative analytics approaches driven by the associated research branches highlighted in this study promise to revolutionize imaging informatics as known today across the healthcare continuum for both radiology and digital pathology applications. The latter is projected to enable informed, more accurate diagnosis, timely prognosis, and effective treatment planning, underpinning precision medicine.

    Integration among databases and data sets to support productive nanotechnology: Challenges and recommendations

    Many groups within the broad field of nanoinformatics are already developing data repositories and analytical tools driven by their individual organizational goals. Integrating these data resources across disciplines and with non-nanotechnology resources can support multiple objectives by enabling the reuse of the same information. Integration can also serve as the impetus for novel scientific discoveries by providing the framework to support deeper data analyses. This article discusses current data integration practices in nanoinformatics and in comparable mature fields, and nanotechnology-specific challenges impacting data integration. Based on results from a nanoinformatics-community-wide survey, recommendations for achieving integration of existing operational nanotechnology resources are presented. Nanotechnology-specific data integration challenges, if effectively resolved, can foster the application and validation of nanotechnology within and across disciplines. This paper is one of a series of articles by the Nanomaterial Data Curation Initiative that address data issues such as data curation workflows, data completeness and quality, curator responsibilities, and metadata.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing system performance, more particularly the CHEMDNER and CHEMDNER-patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
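    The following Python sketch illustrates, under stated assumptions, the two steps the review connects: recognizing chemical names in text and mapping them to structures. The tiny name-to-SMILES lexicon and the tagging function are hypothetical stand-ins for the dictionaries and cheminformatics resources such systems actually use.

```python
# Illustrative sketch of the pipeline the review describes: tag chemical names
# in text, then link each mention to a structure. The lexicon below (names and
# SMILES) is a hand-picked illustration, not a CHEMDNER system or resource.
import re

NAME_TO_SMILES = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
    "ethanol": "CCO",
}

def tag_chemicals(text: str) -> list[dict]:
    """Find lexicon names in text and annotate each mention with its structure."""
    annotations = []
    for name, smiles in NAME_TO_SMILES.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE):
            annotations.append({
                "mention": match.group(),
                "span": (match.start(), match.end()),
                "smiles": smiles,   # structure linked to the extracted name
            })
    return sorted(annotations, key=lambda a: a["span"])

doc = "Patients received aspirin; caffeine intake was also recorded."
for ann in tag_chemicals(doc):
    print(ann)
```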

    Infrastructuring educational genomics: Associations, architectures and apparatuses

    Technoscientific transformations in molecular genomics have begun to influence knowledge production in education. Interdisciplinary scientific consortia are seeking to identify ‘genetic influences’ on ‘educationally relevant’ traits, behaviors, and outcomes. This article examines the emerging ‘knowledge infrastructure’ of educational genomics, attending to the assembly and choreography of organizational associations, epistemic architecture, and technoscientific apparatuses implicated in the generation of genomic understandings from masses of bioinformation. As an infrastructure of datafied knowledge production, educational genomics is embedded in data-centered epistemologies and practices which recast educational problems in terms of molecular genetic associations, insights about which are deemed discoverable from digital bioinformation and potentially open to genetically informed interventions in policy and practice. While scientists claim to be ‘opening the black box of the genome’ and its association with educational outcomes, we open the black box of educational genomics itself as a source of emerging scientific authority. Data-intensive educational genomics does not straightforwardly ‘discover’ the biological bases of educationally relevant behaviors and outcomes. Rather, this knowledge infrastructure is also an experimental ‘ontological infrastructure’ supporting particular ways of knowing, understanding, explaining, and intervening in education, and recasting the human subjects of education as being surveyable and predictable through the algorithmic processing of bioinformation.

    Technical Debt in Software Development: Examining Premises and Overcoming Implementation for Efficient Management

    Software development is a unique field of engineering: all software constructs retain their modifiability (arguably, at least) until client release, no single project stakeholder has exhaustive knowledge about the project, and even this partial knowledge is generally acquired only at project completion. These characteristics imply that software development is subject to design decisions that are known to be sub-optimal, either deliberately emphasizing the interests of particular stakeholders or inadvertently harming the project due to a lack of exhaustive knowledge. Technical debt is a concept that accounts for these decisions and their effects. Its intention is to capture, track, and manage the decisions and their products: the affected software constructs. In light of this, it is vital for software development projects to acknowledge technical debt both as an enabler and as a hindrance. This thesis looks into facilitating efficient technical debt management for varying software development projects. In the thesis, an examination of technical debt's role in software development produces the premises on which a management implementation approach is introduced. The thesis begins with a revision of motivations. Building on prior research in technical debt management and software engineering in general, the five motivations establish the premises for technical debt in software development. These include notions of subjectivity in technical debt estimation, the update-frequency demands posed on technical debt information, and technical debt's polymorphism. Three research questions are derived from the motivations. They ask for tooling support for technical debt management, for capturing and modelling technical debt propagation, and for characterizing software development environments and their technical debt instances. The questions imply consecutive completion, as the first pursued tool would benefit from (possibly automatically assessable) propagation models, and the tool's eventual introduction to software development organizations could be assisted by tailoring it based on the characterizations of software development environments and their technical debt instances. The thesis has seven included publications. In introducing them, the thesis maps their backgrounds to the motivations and their outcomes to the research questions. Among the outcomes are the DebtFlag tool for technical debt management, procedures for retrospectively capturing technical debt from software repositories, a procedure for creating technical debt propagation models from these retrospectives, and a multi-national survey characterizing software development environments and their technical debt instances. The thesis concludes that the tooling support, the technical debt propagation modelling, and the characterization of software development environments and their technical debt instances describe an implementation approach that furthers efficient technical debt management. At the same time, future work is implied, as all of the efforts described above need to be continued and extended. Challenges also remain in the introduced approach; an example is the combinatorial explosion of technology and development-context combinations that technical debt propagation modelling needs to consider, all of which have to be managed if exhaustive modelling is desired. There is, however, a great deal of motivation to pursue these efforts when one notes again that technical debt is a permanent component of software development that, when correctly managed, is a development-efficiency mechanism comparable to a financial loan investment.
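    As a hedged sketch of what technical debt propagation modelling over a dependency structure could look like, the following Python example spreads a component's debt to its direct and transitive dependents with a per-hop attenuation factor. The component graph, the debt values, and the attenuation constant are illustrative assumptions, not the DebtFlag models or the retrospective procedures developed in the thesis.

```python
# Hedged sketch of one way technical debt propagation over a dependency graph
# could be modelled (the components, weights, and attenuation factor are
# illustrative assumptions, not the thesis's DebtFlag models).
from collections import deque

# Edges point from a component to the components that depend on it.
DEPENDENTS = {
    "parser": ["compiler", "linter"],
    "compiler": ["ide_plugin"],
    "linter": ["ide_plugin"],
    "ide_plugin": [],
}

def propagate_debt(debt: dict[str, float], attenuation: float = 0.5) -> dict[str, float]:
    """Spread each component's debt to its dependents, attenuated per hop."""
    total = dict(debt)
    for source, initial in debt.items():
        queue = deque([(source, initial)])
        visited = {source}   # guards against cycles; counts each dependent once per source
        while queue:
            component, amount = queue.popleft()
            for dependent in DEPENDENTS.get(component, []):
                if dependent in visited:
                    continue
                visited.add(dependent)
                total[dependent] = total.get(dependent, 0.0) + amount * attenuation
                queue.append((dependent, amount * attenuation))
    return total

# Example: 8 units of debt in the parser reach its direct and transitive dependents.
print(propagate_debt({"parser": 8.0}))
# {'parser': 8.0, 'compiler': 4.0, 'linter': 4.0, 'ide_plugin': 2.0}
```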
