The Detection of Learner Difficulties from Unannotated Corpora

24 April 2019

Talk by PD Dr. Gerold Schneider, University of Zurich

We detect typical learner errors and areas of difficulty in a data-driven fashion, applying machine learning and collocation statistics to learner corpora and parallel corpora. Our aim is to deliver targeted teaching material to students that helps them avoid typical pitfalls and lets them explore idioms and subtle differences between related words.

We focus on collocation errors in English as the target language, in particular prepositional constructions such as verbs with PP complements (e.g. depend on), adjectives with PP complements (e.g. responsible for), and phrasal verbs (e.g. turn down). We refer to all of these as VPP in the following. The learner corpus we focus on (ICLE) has not been annotated for errors, which makes error detection challenging, but its size allows us to detect hundreds of typical errors.

We detect errors with collocation metrics, using a large native speaker corpus (BNC) as the point of comparison. A typical VPP error stands out because it reaches high collocational status in the learner corpus but has a much lower collocation value in the native corpus.
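
The comparison described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the talk: pointwise mutual information (PMI) stands in for whichever collocation metric is actually used, pairs unseen in the native corpus are given a neutral score of 0.0, and the function names (pmi_scores, error_candidates) and the tiny word-pair lists are invented for illustration, not taken from ICLE or the BNC.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return PMI (log base 2) for each (head, preposition) pair in a corpus sample."""
    pair_counts = Counter(pairs)
    head_counts = Counter(head for head, _ in pairs)
    prep_counts = Counter(prep for _, prep in pairs)
    n = len(pairs)
    return {
        (head, prep): math.log2(count * n / (head_counts[head] * prep_counts[prep]))
        for (head, prep), count in pair_counts.items()
    }


def error_candidates(learner_pairs, native_pairs, min_gap=0.5, unseen=0.0):
    """Rank pairs whose learner PMI exceeds their native PMI by at least min_gap.

    Assumption: a pair never seen in the native sample gets the neutral
    score `unseen` (0.0 means "no association"), a deliberate simplification.
    """
    learner = pmi_scores(learner_pairs)
    native = pmi_scores(native_pairs)
    gaps = {pair: score - native.get(pair, unseen) for pair, score in learner.items()}
    flagged = [(pair, gap) for pair, gap in gaps.items() if gap >= min_gap]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented toy samples of (head, preposition) pairs, for illustration only.
learner = (
    [("depend", "of")] * 4
    + [("depend", "on")] * 1
    + [("think", "about")] * 3
    + [("think", "of")] * 1
    + [("turn", "down")] * 3
)
native = (
    [("depend", "on")] * 5
    + [("think", "about")] * 3
    + [("think", "of")] * 2
    + [("turn", "down")] * 3
)

candidates = error_candidates(learner, native)
for pair, gap in candidates:
    print(pair, round(gap, 3))
```

On this toy data only the pair (depend, of) is flagged, because it is strongly associated in the learner sample and absent from the native one. A real study would need far larger counts, frequency thresholds, and a more robust treatment of unseen pairs.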

We further present machine learning techniques that use parallel corpora (Europarl) to detect learner difficulties and subtle linguistic differences. We also describe features of learner language, such as the overuse of the few most frequent idioms paired with the underuse of most others (the so-called teddy bear effect), and explore vocabulary profiles.


From: 24 April 2019, 17:00
To: 24 April 2019, 18:00


Winterthur, Theaterstrasse 15c, Room: SM O1.05


ILC Institute of Language Competence