Difference between revisions of "About Refugee TermiKnowledge"

From Refugee Terminology
(Created page with "Refugee Terminology is an outcome of TermiKnowledge, an educational project within the framework of the [https://4euplus.eu/4EU-1.html 4EU+ Alliance] running in the academic y...")
 
Revision as of 06:14, 30 July 2022

Refugee Terminology is an outcome of TermiKnowledge, an educational project within the framework of the 4EU+ Alliance that ran in the academic year 2021–2022 for students of the University of Warsaw, Charles University in Prague, Heidelberg University and the University of Milan.

The website of our sister project on COVID-19-related terminology can be found here.

About the Project

Our team consisted of six teachers and more than twenty student participants (three to eight students plus one Student Assistant per university).

Our goal in the 2nd semester was to compile a multilingual knowledge base on the refugee crisis. There was a preparatory stage in which students who had taken part in compiling the COVID-19 knowledge base (https://terminology.mimuw.edu.pl/index.php?title=Main_Page) shared their experiences and gave hints and tips to newcomers to the project. Recordings of the lectures delivered during the first semester were available, and students were encouraged to watch them when prompted to do so during the course.

Unlike in the sister project on COVID-19, one corpus was compiled for each language. Each corpus comprised texts representing three genres: normative texts (legal regulations such as international conventions, e.g. the 1951 Geneva Convention and its translations into Czech, German, Italian and Polish, EU directives, and national regulations), research texts (published research papers) and press texts (general press). Since all texts in a particular language are stored in one corpus, subcorpus work remains a possibility. The corpora are hosted on Sketch Engine.

When the corpora were ready, we generated keyword lists and used them to identify and agree on a list of more than 30 key concepts. This was done in stages, with the first batch consisting of 10 entries, originally in English. Equivalent terms in Czech, German, Italian and Polish (the languages of the countries where our universities are located) were then identified, and knowledge base entries were compiled in these languages and in English. The knowledge base entries were based on text excerpted from the corpora in the five languages.

The reference corpora for establishing keyword lists were the default national web-crawler corpora suggested by Sketch Engine.
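To illustrate how a keyword list can be derived by comparing a focus corpus with a reference corpus, here is a minimal sketch. It assumes the "simple maths" keyness score, (focus frequency per million + N) / (reference frequency per million + N), with N = 1; the function names, the smoothing default and the toy token lists are hypothetical and do not reproduce the project's actual Sketch Engine settings.

```python
from collections import Counter


def per_million(count, corpus_size):
    """Normalise a raw count to a frequency per million tokens."""
    return count * 1_000_000 / corpus_size


def keyness(focus_counts, focus_size, ref_counts, ref_size, smoothing=1.0):
    """Score each focus-corpus word against the reference corpus.

    Assumed formula (a sketch, not the project's configuration):
    (fpm_focus + smoothing) / (fpm_ref + smoothing).
    """
    scores = {}
    for word, count in focus_counts.items():
        fpm_focus = per_million(count, focus_size)
        fpm_ref = per_million(ref_counts.get(word, 0), ref_size)
        scores[word] = (fpm_focus + smoothing) / (fpm_ref + smoothing)
    return scores


def top_keywords(focus_tokens, ref_tokens, n=10, smoothing=1.0):
    """Return the n highest-scoring words; ties are broken alphabetically."""
    focus_counts = Counter(focus_tokens)
    ref_counts = Counter(ref_tokens)
    scores = keyness(
        focus_counts, len(focus_tokens), ref_counts, len(ref_tokens), smoothing
    )
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


# Toy data, purely illustrative.
focus = ["refugee", "asylum", "the", "the", "refugee", "border"]
reference = ["the", "the", "the", "weather", "the", "news"]
print(top_keywords(focus, reference, n=2))
```

Words that are frequent in the focus corpus but rare in the reference corpus score well above 1, while common function words such as "the" score below 1 and drop to the bottom of the list.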

Some technical parameters of our corpora can be found on the page "Information about the corpora".

In the last batch of entries, we included four that were attested in just one corpus but were deemed characteristic of the refugee discourse in that language. These were tendopoli for the Italian corpus, Willkommenskultur for the German corpus, pushback for the Polish corpus and uprchlické kvóty for the Czech corpus. Equivalent entries in the other languages were often very poorly developed because equivalent terms occurred only rarely in the respective corpora.

Entry Structure

Entry structure generally parallels that in the COVID-19 knowledge base, but some changes were introduced during a meeting of all students and teachers.

At the top of an entry is a page navigation menu, below which is a line of links to equivalent articles in the other languages.

The first field in an entry is “Related terms”, with the relation specified in natural language in each case. Most, but not all, related terms are also headwords in the knowledge base, and these always appear as clickable links to the corresponding entries. Related terms that do not have entries in the knowledge base are generally not unique to the refugee issue; for example, the related terms for the headword residence permit include ‘right to education’.

The relations are presented in natural language in an italic font and placed in brackets to the left or right of the related term, depending on its syntactic position (e.g., subject vs object) in the sentence. The phrases describing relations are not complete sentences as the headword is never repeated or even replaced with a pronoun.

The fields within each section relate to conceptual and linguistic aspects of the headword.

The conceptual domain is reflected in definitions, which are often followed by encyclopaedic information.

Nearly all definitions and pieces of encyclopaedic information are direct quotations from texts in the respective corpora. The quotations were not altered in any way. In some cases, we used definitions from texts not included in the relevant corpus. External sources are always indicated as such.

Sources of definitions, encyclopaedic information and examples are always indicated with clickable links to the documents from which they were retrieved. The links generally show only the originating website rather than the entire access path, but this varies between languages.

The linguistic fields begin with Variants and Synonyms. As in the COVID-19 knowledge base, we decided not to distinguish between variants (spelling, morphological, etc.) and true synonyms. There are generally fewer variants and synonyms than in the entries of the previous project, which most probably reflects the greater stability of the terminology. Any synonym or variant can serve as a starting point for accessing entries through the search box.

The remaining fields providing linguistic information comprise Collocations, followed by Examples. The collocations generally begin with adjectival collocations for the headword, followed by collocations of the headword with other nouns (as headwords or subordinate elements), and then by verbal collocations with the headwords as the object/complement and subject, in that order.

Sentences that serve as Examples were selected from among those that did not quite qualify as definitions or descriptions, for example because they did not contain general information. The presence of frequent collocations also played a role. There was a limit of three examples per headword, with additional examples allowed for variants or synonyms if deemed necessary.
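The entry structure described above can be summarised as a small data model. This is only a sketch: the class names, field names, the `MAX_EXAMPLES` constant and the sample values (including the relation wording, the synonym and the URL) are hypothetical and are not taken from the wiki's actual implementation. For simplicity, the limit is applied to all examples in an entry, ignoring the extra examples allowed for variants or synonyms.

```python
from dataclasses import dataclass, field

MAX_EXAMPLES = 3  # assumed cap per headword, following the description above


@dataclass
class RelatedTerm:
    term: str
    relation: str            # relation phrased in natural language
    has_entry: bool = False  # True if the term is itself a headword (linked)


@dataclass
class Quotation:
    text: str                # quoted verbatim, never altered
    source_url: str          # link to the originating document or website
    external: bool = False   # True if the source is outside the corpus


@dataclass
class Entry:
    headword: str
    language: str
    related_terms: list[RelatedTerm] = field(default_factory=list)
    definitions: list[Quotation] = field(default_factory=list)
    encyclopaedic_info: list[Quotation] = field(default_factory=list)
    variants_and_synonyms: list[str] = field(default_factory=list)
    collocations: list[str] = field(default_factory=list)
    examples: list[Quotation] = field(default_factory=list)

    def add_example(self, quotation: Quotation) -> None:
        """Append an example, enforcing the assumed three-example limit."""
        if len(self.examples) >= MAX_EXAMPLES:
            raise ValueError(f"at most {MAX_EXAMPLES} examples per headword")
        self.examples.append(quotation)

    def search_keys(self) -> set[str]:
        """Headword plus all variants and synonyms, lower-cased for search."""
        return {self.headword.lower(),
                *(v.lower() for v in self.variants_and_synonyms)}


# Hypothetical sample entry.
entry = Entry(
    headword="residence permit",
    language="en",
    related_terms=[RelatedTerm("right to education", "(gives access to)")],
    variants_and_synonyms=["residency permit"],
)
entry.add_example(Quotation("An illustrative sentence.", "https://example.org"))
print(sorted(entry.search_keys()))
```

Keeping variants and synonyms in a single list mirrors the decision not to distinguish between them, and `search_keys` reflects the fact that any of them can lead to the entry through the search box.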