The current study investigates lexical complexity in second language (L2) spoken English. Studies on learner language have shown that vocabulary knowledge is one of the best predictors of overall language proficiency (e.g. Milton, 2013). Various measures of vocabulary knowledge have been proposed in the field of second language acquisition, and lexical complexity indices play a key role among them (e.g. Kyle & Crossley, 2015). However, little is known about the different aspects of lexical complexity in L2 spoken production, because most research on vocabulary has been conducted on L2 writing (Koizumi & In’nami, 2013; McCarthy & Jarvis, 2013). In addition, there is no general agreement about which of the many existing complexity indices perform best for the analysis of spoken texts (Jarvis, 2013). Using corpus methods, this study aims to address these gaps by (i) identifying reliable and informative measures of lexical complexity for the analysis of spoken language, and (ii) examining the relationship between lexical complexity and several independent variables related to learners’ characteristics and task design in L2 spoken English exams. In this study, "task" refers to a communicative activity used in language tests to elicit L2 production with a focus on meaning, within a well-defined communicative context and purpose (Norris, 2016). The dataset is the Trinity Lancaster Corpus (Gablasova et al., 2019b), a 4.2-million-word corpus based on the Graded Examination in Spoken English (GESE), a high-stakes exam of L2 English developed and administered by Trinity College London, a large international examination board. The corpus consists of transcripts of learners’ spoken performance across four tasks that differ in interactivity (a monologue and three dialogues) and topic familiarity (test-takers choose the topic of two tasks). 
It includes the performance of over 2,000 learners at four CEFR proficiency levels (B1 to C2), who differ in first language (L1) and age. The study adopts a corpus linguistic approach, combining quantitative and qualitative analysis with data visualisation techniques. RQ1 focuses on the validation of lexical complexity indices, with the aim of identifying stable and reliable measures for the analysis of spoken language production. Lex Complexity Tool, an automated tool written in Python, was developed to compute lexical complexity scores. The tool includes existing indices of lexical complexity as well as new metrics developed for this study. It also contains a new wordlist extracted from a large reference corpus of spontaneous L1 English speech, the Spoken BNC2014 (Love et al., 2017), selected to match the language mode of the target learner corpus. Inter-index correlations and linear regression analyses were used to evaluate the indices computed by Lex Complexity Tool, examining their stability across texts of different lengths and their ability to capture differences in learner- and task-related features. These results informed the selection of lexical metrics used to address RQ2 and RQ3. The findings show that lexical sophistication indices based on content words are more informative than general sophistication metrics. Among lexical diversity metrics, MTLD (Measure of Textual Lexical Diversity) is the most stable across texts of different lengths and the most sensitive to task variation, MATTR (Moving-Average Type-Token Ratio) displays the strongest associations with learners’ proficiency, and HD-D shows the greatest sensitivity to learners’ L1. RQ2 examines whether L2 speakers’ characteristics affect lexical complexity in spoken production. Three independent variables were investigated, namely learners’ proficiency level, L1, and age. A factorial ANOVA showed that all three variables have main effects on lexical complexity measures (η² ≤ .36). 
Post-hoc tests identified differences between proficiency groups and between L1 groups, and Pearson’s correlations were used to investigate the effect of age further. The results suggest that more proficient learners tend to produce longer texts, characterised by more diverse vocabulary, more infrequent function words, and more frequent content words than speakers at lower proficiency levels. Learners’ L1 also has a main effect on lexical scores. Learners whose L1 is typologically closer to English (e.g. Spanish) appear to produce less diverse vocabulary and more frequent lexical items than learners whose L1 is typologically more distant from English (e.g. Chinese). With respect to age, the results show that older learners produce longer texts, characterised by less diverse but more sophisticated vocabulary, than learners of compulsory school age. Possible explanations discussed for these findings include the use of polysemous and cognate words, L1 transfer, and educational factors. RQ3 examines whether task-related features (interactivity and topic familiarity) affect lexical complexity in L2 spoken production. Repeated-measures ANOVAs showed that task interactivity has a significant main effect on all lexical complexity measures (η² ≤ .43), while topic familiarity has a significant main effect on lexical density and sophistication (η² ≤ .35). The findings suggest that interactive speech is characterised by less diverse and less sophisticated vocabulary than monologic production. With regard to topic familiarity, when the conversation centres on a topic of their own choice, learners produce shorter texts with higher lexical density and a larger number of sophisticated verbs and nouns. Factors linked to register variation, real-time processing, and social features of language use are discussed. This study complements and extends previous research on lexical complexity. 
Its methodological contribution consists of an automated tool for computing lexical scores and the identification of reliable indices for measuring lexical complexity in spoken language. Its theoretical contribution builds on this methodological advance by examining the relationships between lexical complexity, learners’ characteristics, and task design. In doing so, this research develops an account of lexical patterns in L2 English speech that broadens our understanding of L2 users’ vocabulary knowledge, informs learner corpus research and second language acquisition, and has practical implications for language testing and teaching.
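The three lexical diversity indices compared under RQ1 (MATTR, MTLD and HD-D) can be illustrated with a short Python sketch. This is not the code of Lex Complexity Tool; it is a minimal sketch of the commonly described formulations of these indices, assuming that each text has already been tokenised and lowercased. The function names, the 50-token MATTR window, the 0.72 MTLD factor threshold and the 42-token HD-D sample are assumptions chosen as conventional defaults, not parameters reported in this study.

```python
from collections import Counter
from math import comb

# Sketch only: assumes pre-tokenised, lowercased input. Function names and
# default parameters are illustrative assumptions, not Lex Complexity Tool's.


def mattr(tokens: list[str], window: int = 50) -> float:
    """Moving-average type-token ratio: mean TTR over every window of
    `window` consecutive tokens (plain TTR if the text is shorter)."""
    if not tokens:
        return float("nan")
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def _mtld_one_pass(tokens: list[str], threshold: float) -> float:
    """One directional MTLD pass: text length divided by the number of
    (possibly fractional) factors needed for the TTR to reach `threshold`."""
    factors = 0.0
    types: set[str] = set()
    count = 0
    ttr = 1.0
    for tok in tokens:
        count += 1
        types.add(tok)
        ttr = len(types) / count
        if ttr <= threshold:
            # A full factor is complete: start a new segment.
            factors += 1
            types, count, ttr = set(), 0, 1.0
    # Credit the unfinished final segment with a partial factor.
    factors += (1.0 - ttr) / (1.0 - threshold)
    # No factor at all (e.g. every token distinct): MTLD is undefined.
    return len(tokens) / factors if factors > 0 else float("nan")


def mtld(tokens: list[str], threshold: float = 0.72) -> float:
    """MTLD as the mean of a forward and a backward pass."""
    forward = _mtld_one_pass(tokens, threshold)
    backward = _mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2


def hdd(tokens: list[str], sample_size: int = 42) -> float:
    """HD-D: for each type, the hypergeometric probability of drawing it at
    least once in a random sample of `sample_size` tokens, divided by
    `sample_size` and summed over all types."""
    n = len(tokens)
    if n < sample_size:
        return float("nan")  # too short to draw a sample of this size
    total = comb(n, sample_size)
    score = 0.0
    for freq in Counter(tokens).values():
        # comb() returns 0 when n - freq < sample_size, i.e. the type
        # cannot be absent from the sample.
        p_absent = comb(n - freq, sample_size) / total
        score += (1.0 - p_absent) / sample_size
    return score
```

MATTR averages over fixed-size windows and MTLD counts how many tokens are needed for the running type-token ratio to fall to a threshold, so both are designed to reduce the dependence on text length that affects the plain type-token ratio; this is the property examined in the stability analysis for RQ1.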