1. Instructing Polish students on collecting texts and explaining the purpose of the project
2. Filling in metadata questionnaires
a. About the author
- Person ID, e.g. TOU_H305
- Gender
- Age
- Age category: 6–11; 12–15; 16–
- First language: two-character code according to ISO 639-1
- First language group: Indo-European (IE); non-Indo-European (nIE); Slavic (S)
- Knowledge of other foreign languages: ISO 639-1 codes
- Proficiency in Czech at the time of writing: A1; A1+; A2; A2+; B1; B2; C1; C2
- Knowledge of Czech in the family: mother; father; partner; siblings; others; nobody
- Length of stay in Czechia in years: –1; 1; 2; 2–
- Completed or current Czech language courses: individual; commercial; self-taught; university; abroad; primary school; secondary school; other
- Intensity of Czech language tuition in hours per week: –3; 5–15; 15–
- Textbooks used: Basic Czech (BC), Communicative Czech (CC), Čeština pro economy (CE), Chcete mluvit česky? (CMC), Čeština pro cizince (CpC), Easy Czech Elementary (ECE), New Czech Step by Step (NCSS), other
- Bilingual: yes; no
b. About the text
- Text ID, e.g. TOU_H305_442
- Date when the text was collected (YYYY-MM-DD)
- Medium of the text: manuscript; PC
- Time limit for writing the text in minutes: 10; 15; 20; 30; 40; 45; 60; other; no
- Additional help while writing the text: dictionary; student’s book; other; no
- Text written during an exam: interim; final; no
- Word limit
- Title of the text, e.g. The event that changed my life
- Topic type: general; specific
- Activity before writing the text: practice; discussion; visual; vocabulary; other; no
- Freedom to choose the topic: selection from several topics; assigned topic; any topic; other
- Genre: any; assigned
- Actual text type: informative; descriptive; opinion; short story
- Actual number of words
3. Collecting students’ essays
- 10 doc files
- 109 manuscripts
4. Digitization of the collected texts
- Transcription of manuscripts
- Entering metadata into a spreadsheet file
5. Release of an extended version of CzeSL-SGT with automatic annotation in KonText (the Czech team):
- Automatic error annotation (corrections suggested by a spell/grammar checker; error types assigned by an automatic error type identifier)
- Automatic linguistic annotation (tags and lemmas for the original and the corrected form)
- Adding metadata
6. Preparation of tasks for Polish students of Czech based on CzeSL-SGT (the Polish team, work in progress)
7. Release of an extended version of CzeSL with manual error annotation in TEITOK and KonText (the Czech team, to do)
- Manual multi-level correction in TEITOK, building on the existing manual annotation made in feat (the Czech team, work in progress)
- Automatic linguistic annotation
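The metadata questionnaire (step 2) and the spreadsheet entry (step 4) together amount to a record schema with controlled vocabularies. The Python sketch below shows how a few of these fields might be checked before being exported to a CSV file. It rests on several assumptions: the field names, the ID pattern (inferred only from the examples TOU_H305 and TOU_H305_442) and the sample values are all hypothetical, not the project's actual spreadsheet format.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import List

# Controlled vocabularies from the questionnaire; the Python names are hypothetical.
CEFR_LEVELS = {"A1", "A1+", "A2", "A2+", "B1", "B2", "C1", "C2"}
MEDIA = {"manuscript", "PC"}

# Assumed ID shape, inferred only from the examples TOU_H305 and TOU_H305_442.
PERSON_ID = re.compile(r"[A-Z]+_[A-Z]\d+")
ISO_639_1 = re.compile(r"[a-z]{2}")


@dataclass
class TextRecord:
    text_id: str         # e.g. TOU_H305_442
    person_id: str       # e.g. TOU_H305
    first_language: str  # two-letter ISO 639-1 code
    czech_level: str     # proficiency in Czech at the time of writing
    collected: str       # collection date, YYYY-MM-DD
    medium: str          # manuscript or PC
    word_count: int      # actual number of words


def validate(rec: TextRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    errors = []
    if not PERSON_ID.fullmatch(rec.person_id):
        errors.append(f"unexpected person ID: {rec.person_id}")
    # Assumption: a text ID is the person ID plus an underscore and a number.
    if not rec.text_id.startswith(rec.person_id + "_"):
        errors.append("text ID does not start with the person ID")
    if not ISO_639_1.fullmatch(rec.first_language):
        errors.append(f"not an ISO 639-1 code: {rec.first_language}")
    if rec.czech_level not in CEFR_LEVELS:
        errors.append(f"unknown proficiency level: {rec.czech_level}")
    try:
        date.fromisoformat(rec.collected)
    except ValueError:
        errors.append(f"date not in YYYY-MM-DD format: {rec.collected}")
    if rec.medium not in MEDIA:
        errors.append(f"unknown medium: {rec.medium}")
    if rec.word_count <= 0:
        errors.append("word count must be positive")
    return errors


def to_csv(records: List[TextRecord]) -> str:
    """Serialize records to CSV text: a header row, then one row per text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(TextRecord)])
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()


# Hypothetical sample record, not taken from the collected data.
sample = TextRecord("TOU_H305_442", "TOU_H305", "pl", "B1",
                    "2019-05-14", "manuscript", 180)
print(validate(sample))  # []
print(to_csv([sample]))
```

Keeping each controlled vocabulary in a set means the remaining questionnaire fields (for example textbook codes or topic type) could be added in the same way.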
CzeSL: http://utkl.ff.cuni.cz/learncorp
KonText: https://kontext.korpus.cz
TEITOK: http://teitok.corpuswiki.org