Specifics of the acquisition of a closely related language in a corpus of Czech produced by Polish learners

Elżbieta Kaczmarska1, Gabriela Gawrońska2

University of Warsaw1,2


Because Polish (L1) is closely related to Czech (L2), strong L1 interference is observed at all levels: pronunciation, morphology, syntax, lexicon, including phraseology (false friends), and metalinguistic communication. To make teaching (and learning) more efficient, we need to focus on the learner's specific weaknesses and strengths at each level. To identify them, both incorrect and correct use of Czech by the learners should be studied. For this purpose, we are building a corpus of Czech texts produced by Polish students by extending the L1 Polish – L2 Czech subcorpus of CzeSL (Czech as a Second Language), a learner corpus built at Charles University in Prague.

Before the start of our project, the Polish–Czech subcorpus of CzeSL was quite small (77 texts, 15 thousand words). It has since grown to 200 texts and 60 thousand words, a fourfold increase in word count.

Extending the CzeSL corpus

Analysis of errors
