Search CORE

6 research outputs found

Discovering shifts in competitive strategies in probiotics, accelerated with TechMining

Author: Artacho Ramírez Miguel Ángel
de la Calle Begoña
Jiménez Sara
Palli Anna
Vicente Gomila José Miguel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2017

Profiling the technological strategy of competitors is a key task for companies in a given industry, as well as for technology planners and R&D strategists. The analysis of a company's patent portfolio and of its evolution over time is of interest to technology analysts and decision makers. However, the need to involve both experts in the company's field and patent specialists slows the process down. Bibliometrics and text mining techniques support the specialists' interpretation. This paper offers a step-by-step procedure for analyzing the technology strategy of several companies through the claims in their patent portfolios, using TechMining with the help of a text mining tool. The procedure, complemented by a semantic TRIZ analysis, provides key insights into the technological strategies of some competitors in the field of probiotics for livestock health. The results show interesting shifts in the key probiotic and prebiotic ingredients for which companies claim protection, and therefore offer clues about their technological intentions in the life sciences industry in a more dynamic, convenient, and simple way.

The authors would like to thank the research institute IRTA, the TRIZ company TRIZ XXI, and Fernando Palop for their wise insights and guidance. The authors acknowledge the use of Search Technology's VantagePoint and IHS Markit's Goldfire.

Vicente Gomila, JM.; Palli, A.; De La Calle, B.; Artacho Ramírez, MÁ.; Jiménez, S. (2017). Discovering shifts in competitive strategies in probiotics, accelerated with TechMining. Scientometrics. 111(3):1907-1923. https://doi.org/10.1007/s11192-017-2339-5
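
As a rough illustration of what "shifts in the ingredients for which companies claim protection" could look like in practice, the following sketch compares how often a few terms appear in patent claims from two periods. It is not the authors' procedure (which relies on VantagePoint and Goldfire); the function names, the toy claims, and the vocabulary are all hypothetical, and matching is naive substring search.

```python
from collections import Counter


def term_shares(claims, vocabulary):
    """Return, for each vocabulary term, the fraction of claims mentioning it."""
    counts = Counter()
    for text in claims:
        lowered = text.lower()
        for term in vocabulary:
            if term in lowered:  # naive substring match, an assumption of this sketch
                counts[term] += 1
    total = len(claims) or 1  # avoid division by zero for an empty period
    return {term: counts[term] / total for term in vocabulary}


def share_shift(early_claims, late_claims, vocabulary):
    """Difference in term share between the late and the early period."""
    early = term_shares(early_claims, vocabulary)
    late = term_shares(late_claims, vocabulary)
    return {term: late[term] - early[term] for term in vocabulary}


# Hypothetical toy claims, invented for illustration only.
early = [
    "A feed additive comprising Lactobacillus acidophilus.",
    "A composition comprising Lactobacillus and a carrier.",
]
late = [
    "A feed composition comprising Bacillus subtilis spores.",
    "A synbiotic comprising Bacillus subtilis and inulin.",
]
vocab = ["lactobacillus", "bacillus subtilis", "inulin"]

shifts = share_shift(early, late, vocab)
for term, delta in shifts.items():
    print(f"{term}: {delta:+.2f}")
```

A positive value means the term appears in a larger share of the later claims. A real analysis would need term normalization and expert validation, as the abstract itself stresses.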

RiuNet

The combination of the disciplines of Techmining and semantic TRIZ for better and faster analyzing technology evolution

Author: Vicente Gomila José Miguel
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 08/09/2018

Thesis by compendium.

The purpose of this thesis is to explore and demonstrate how the combination of two methodological approaches, text mining and the systemic vision of TRIZ empowered by semantics, can yield a broader and more comprehensive analysis of the evolution of a technology. The two approaches had not been combined before the first of the four papers that make up this thesis, which is based on a compendium of publications; since then, the combination has appeared increasingly in the scientific literature on technology evolution. The combination brings a second benefit: it improves access to knowledge and the systematic connection of knowledge from disparate scientific literatures ("disparate literature discovery"). The common element in all these papers is the use of the technology mining approach, 'techmining' (the application of text mining techniques grounded in technology management knowledge), combined with semantic TRIZ (syntactic and semantic analysis applied to the systemic vision of TRIZ). The papers show that this combination achieves a better analysis of evolving technologies, e.g. by profiling technologies from a systemic point of view, and better access to knowledge, e.g. by semantically connecting concepts. The research on applying the combination to the analysis of scientific and technological information explores the advantages and new possibilities for assessing technology trends, as well as the semantic connection of concepts, which changes the way information research can be done. The thesis comprises three papers published in international academic journals indexed in the most prestigious databases and one chapter in the proceedings of an international congress.

Although many methods and approaches for assessing the evolution of technologies can be found across the literature, there is still a need to better understand which technologies may emerge, which may evolve faster, and at what pace they can reach the market. Combining techmining with semantic TRIZ makes it possible to understand trends enriched with a systemic vision of the links, functions, and influences of the constituent and enabling elements of a technology. This systemic linking of a technology's elements with its components and ecosystem also allows a multidimensional view of the technology and further reduces the uncertainty in forecasting its progress. The papers apply the combination of the TRIZ methodology, techmining, and semantic TRIZ to different technologies in different domains in order to prove its advantages and implications. They examine the interactions of the combined approaches in the assessment of lithium batteries for the electric car, a medical case concerning Ménière's disease, the prognosis of prostate cancer, and the use of probiotics as substitutes for antibiotics in animal health. This wide range of technologies was selected to show the benefits either of combining the two approaches or of applying predominantly one of them, as in the Ménière's disease article. The differing nature of the technologies also helped to better understand the systemic point of view of a technology, exploring new applications based on Bertalanffy's general system theory as well as other related approaches.

Vicente Gomila, JM. (2017). The combination of the disciplines of Techmining and semantic TRIZ for better and faster analyzing technology evolution [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/89088

Crossref

RiuNet

Scientometrics for tech mining: an introduction

Author: Chiavetta D
Porter AL
Zhang Y
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2017

OPUS - University of Technology Sydney

Systematic reviews and tech mining: A methodological comparison with case study

Author: Anderson Patricia F.
Bickett Skye
Doucette Joanne
Herring Pamela
Kepsel Andrea
Lyons Tierney
McLachlan Scott
Shannon Carol
Wu Lin
Publication venue: 'Wiley'
Publication date: 01/12/2018

Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/147169/1/jrsm1318_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/147169/2/jrsm1318.pd

Deep Blue Documents at the University of Michigan

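KERANGKA PEMANFAATAN KONTEN REPOSITORI BAGI INDUSTRI: STUDI KASUS ANALISIS TRIPLE HELIX RISET BIDANG PANGAN [Framework for utilizing repository content for industry: a case study of triple helix analysis of food research]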

Author: yaniasih yaniasih
Publication venue: 'Institut Pertanian Bogor'
Publication date: 17/12/2019

Most universities and research institutions in Indonesia have an institutional repository. Once research outputs have been collected and managed in a repository, how should the repository content be processed so that it becomes more widely useful? This paper discusses a framework for turning repository content into knowledge products tailored to the needs of target users. The study uses a descriptive method, applying a framework model for utilizing repository content for industry. The framework then serves as the reference for analyzing the content of publications in the food field, in order to trace the flow of technology from research institutions to industry under the triple helix concept. The proposed framework model consists of five stages: (1) identifying target users, (2) determining topics, (3) analyzing publications, (4) packaging products, and (5) dissemination. The repository's publication content is analyzed using quantitative science study methods. The analysis of food research shows that collaboration between research institutions and industry is still very low. The results need to be conveyed to institutional leaders, researchers, government, and industry in the form of packaged knowledge products so that they are easy to understand and disseminate. Dissemination channels may include a dedicated menu on the repository website, social media, and direct communication forums with target users. Keywords: information packaging, research institutions, food, repository, scientometrics
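
The collaboration finding in the abstract above rests on a quantitative measure that the listing does not spell out. One simple possibility, shown here purely as an assumption, is the share of publications whose author affiliations span both a research institution and an industry partner. The sector labels and the sample records are hypothetical.

```python
def collaboration_share(publications, sector_a="research", sector_b="industry"):
    """Fraction of publications whose affiliations include both sectors.

    Each publication is represented as a set of sector labels, one per
    distinct affiliation type (a simplification made for this sketch).
    """
    if not publications:
        return 0.0
    joint = sum(
        1 for sectors in publications
        if sector_a in sectors and sector_b in sectors
    )
    return joint / len(publications)


# Hypothetical affiliation data, invented for illustration only.
pubs = [
    {"research"},
    {"research", "university"},
    {"research", "industry"},
    {"university"},
]

share = collaboration_share(pubs)
print(f"research-industry co-publication share: {share:.2f}")
```

A low share on real repository data would be consistent with the abstract's conclusion, although the study itself may use a different indicator.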

Scientific Journals of Bogor Agricultural University

Jurnal Pustakawan Indonesia

Transformer Models: From Model Inspection to Applications in Patents

Author: PUCCETTI Giovanni
Publication venue: Scuola Normale Superiore
Publication date: 07/11/2023

Natural Language Processing is used to address several tasks, both linguistic ones, e.g. part-of-speech tagging and dependency parsing, and downstream ones, e.g. machine translation and sentiment analysis. Dedicated approaches to these tasks have been developed over time.

A methodology that increases performance on all of these tasks in a unified manner is language modeling: a model is pre-trained to replace masked tokens in large amounts of text, either at random within chunks of text or sequentially one after the other, so that it develops general-purpose representations that can improve performance on many downstream tasks at once.

The neural network architecture that currently performs this task best is the transformer; moreover, model size and data scale are essential to developing information-rich representations. The availability of large-scale datasets and the use of models with billions of parameters are currently the most effective path towards better representations of text.

However, large models make the output they provide more difficult to interpret. Several studies have therefore investigated the representations provided by transformer models trained on large-scale datasets.

In this thesis I investigate these models from several perspectives. I study the linguistic properties of the representations provided by BERT, a language model mostly trained on the English Wikipedia, to understand whether the information it encodes is localized within specific entries of the vector representation. In doing so, I identify special weights that show high relevance to several distinct linguistic probing tasks. I then investigate the cause of these special weights and link them to token distribution and special tokens.

To complement this general-purpose analysis and extend it to more specific use cases, given the wide range of applications for language models, I study their effectiveness on technical documentation, specifically patents. I use both general-purpose and dedicated models to identify domain-specific entities, such as the users of inventions and technologies, or to segment patent text. Throughout, I complement the performance analysis with careful measurements of data and model properties to understand whether the conclusions drawn for general-purpose models hold in this context as well.
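
The random-masking variant of pre-training described in the abstract can be sketched in a few lines. This is a minimal illustration of how masked inputs and their prediction targets might be built, not the thesis's code: the function name, the `[MASK]` placeholder, and the default 15% rate are assumptions taken from common practice, and no model is trained here.

```python
import random


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Randomly hide tokens; return the masked sequence and the hidden targets.

    targets maps each masked position to the original token a model
    would be trained to predict at that position.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[position] = token
        else:
            masked.append(token)
    return masked, targets


# Hypothetical example sentence, already split into tokens.
tokens = "a feed additive comprising a probiotic strain".split()
masked, targets = mask_tokens(tokens, mask_rate=0.3)
print(masked)
print(targets)
```

The sequential variant mentioned in the abstract would instead hide only the next token after each prefix; it is left out to keep the sketch short.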

Archivio istituzionale della Ricerca - Scuola Normale Superiore