Completing the English Vocabulary Profile: C1 and C2 vocabulary


1. The English Vocabulary Profile

The decision to move away from the original working title of ‘Wordlists’ reflects how this project has grown, not just in its coverage of vocabulary, which now includes many more phrases, phrasal verbs and idioms rather than just words, but also in the way the current online resource has been developed into a fully interactive database rather than a static listing of vocabulary. Even now, when the compiling process has come to an end with the inclusion of C1 and C2 level data, the resource is not (and never will be) set in stone. It represents the extent of the description that is currently achievable given the learner data and other sources that the project team has access to.

It goes without saying that the resource will need to be regularly monitored and refined, partly to keep it up to date but equally to ensure that it accurately reflects typical learner competence. As additional learner evidence becomes available in the form of spoken and non-exam written data – the Cambridge English Profile Corpus – and as more people use the resource and give their feedback on it, this community project will be honed and augmented.

Public access to the resource is currently limited to the A and B levels of the British-English and American-English versions, which have both been validated over a twelve-month period (see Section 3 below). At the time of writing (February 2012), the resource is available for free on the main English Profile website. It is hoped that the complete six-level resource will be available on general release in May 2012.

2. Coverage within the six levels of the English Vocabulary Profile

For CEFR levels A1 to B2, the rationale for inclusion and decisions on level have focused on the vocabulary that learners around the world seem to know and use. To establish this, we have referred to a range of sources, including written learner data in the Cambridge Learner Corpus, first language corpus data, exam wordlists, and wordlists in coursebooks and other classroom materials. All of the draft entries compiled for the A1−B2 version were reviewed by experienced English language teaching professionals, and other experts were involved in the later validation phase of these levels (see Section 3).

As Section 3 of Capel (2010) suggests, the gap between receptive understanding and productive use at these levels may not be as wide as some people have claimed (see Melka 1997). Modern communicative classrooms encourage far more spoken practice than was the case a generation ago and outside the classroom there are endless opportunities for actively using new language, through mobile technology and the Internet. For this reason, we have not made a distinction between receptive knowledge and productive use up to B2 level.

For the C levels, the methodology is somewhat different (see Section 4 below). Here, receptive knowledge is likely to be broader than actual productive range. Learners will also be using skills of deduction to process unknown words and phrases in context, a strategy that is commonly introduced at the B2 level and is standard practice at the C levels, where there is a need to process large amounts of ungraded text in a field of work or study. Given the domain-specificity at these higher levels and the lack of coursebook wordlists at C1 and C2, we have focused on core vocabulary and have based our research at these levels on actual learner evidence, frequency information from first language corpora, and additional sources for Academic English.

3. Validation phase of the A and B levels

The evaluation and validation of the first four levels of the resource aimed to test the usability of the online platform, to verify the decisions taken on CEFR levels and to assess the actual coverage, with a view to adding anything relevant at A1−B2 that had been inadvertently omitted. To this end, password access was provided to known user groups, notably Cambridge University Press authors, editors and lexicographers, and Cambridge ESOL item writers and exam developers, who worked with the resource over a twelve-month period and submitted detailed comments via the feedback button. These comments were acted on and any apparent level discrepancies were further researched, with revisions often made as a result. An online questionnaire was also completed by these users, which largely focused on the first aim, usability.

Specific validation tasks were carried out by academics based in Tokyo, Miami, Cambridge and Nottingham. In Tokyo, Professor Masashi Negishi and colleagues at Tokyo University of Foreign Studies developed a phrasal verbs test to assess the accuracy of the CEFR levels assigned, which was administered to more than 2,500 students in Japan. This test was also administered to smaller groups of learners in Spain and the Czech Republic.

At Miami Dade University, Dr Michelle Thomas validated the American English version. At the University of Cambridge, Professor John Hawkins and Dr Luna Filipović used the resource during their work on criterial features (Hawkins & Filipović, forthcoming). Cambridge ESOL’s Research and Validation expert Dr Angeliki Salamoura carried out quantitative validation research on the A1−B2 data in June 2011, which is described below in Section 4.

Dr Ron Martínez at the University of Nottingham carried out extensive analysis of the phrases in the pilot version, using his own PhD research (Martínez 2011), a list of phrasal expressions based on native-speaker frequency in the British National Corpus. As a result, some 200 ‘missing’ phrases were flagged for possible inclusion, either within the A and B levels or at the C levels. Some of these phrases were in fact ‘embedded’ in dictionary examples for individual senses rather than omitted altogether, but in several cases it was decided to raise their profile by recording them separately. An interesting example of this policy is the phrase ‘a number of’ meaning ‘several’, which was embedded in the B1 sense ‘amount’ and later became a separate phrase entry at B2. A large proportion of the truly missing phrases turned out to be more suited to the C levels and were added to the subsequent compilation process. For further discussion of this aspect of the project, see Section 6 below.

