Detailed Information on Publication Record
2023
How to Build a Corpus of Legal Language: Ensuring its Representativeness
GLOGAR, Ondřej
Basic information
Original name
How to Build a Corpus of Legal Language: Ensuring its Representativeness
Name in Czech
Jak vytvořit korpus právního jazyka: Zajištění jeho reprezentativnosti
Authors
Edition
Argumentation 2023: International Conference on Alternative Methods of Argumentation in Law, 2023
Other information
Language
English
Type of outcome
Conference presentation
Country of publisher
Czech Republic
Confidentiality degree
not subject to state or trade secrecy
References:
Organization unit
Theatre Faculty
Keywords (in Czech)
právní jazyk; jazykový korpus; právní pragmatika; aplikovaná lingvistika
Keywords in English
legal language; language corpus; legal pragmatics; applied linguistics
Tags
International impact
Changed: 19/9/2024 14:55, Mgr. Ondřej Glogar
Abstract
In the original
Although the premise of the importance of language for law has resonated in legal theory for some time, existing research on legal language either lacks findings supported by sufficient data or does not cover all aspects of legal language. In particular, it may seem problematic that legal theorists, with few exceptions, describe legal language based solely on their own linguistic experience and a random selection of examples (as noted, for instance, by Mouritsen, 2017). One way of avoiding this reliance on intuition and lack of empirical data is to use a language corpus that reflects the actual use of the language in everyday practice. A standard corpus collects a range of texts in a form accessible to software, so that (mainly linguistic) hypotheses can be easily tested. Although some corpora focused on legal language already exist, they usually capture only a narrow segment or a single genre (e.g. a corpus covering only case law or only statutes). It is therefore advisable to design a comprehensive and balanced corpus that includes representatives of each genre of legal language. However, we may encounter many intersections when creating such a corpus, and we need a suitable methodology first. In my paper, I thus discuss the various risks and procedures to be considered when building such a corpus. Through an analysis of the applied linguistics literature (e.g. Meyer, 2002), I evaluate the individual criteria for sample collection and segmentation and adapt them to the specifics of legal language. Perhaps the most important of these is the question of the representativeness of such a corpus, which is the focus of the paper. The criteria for the selection of texts and utterances must necessarily differ from those used for general language, as the different branches of law, speakers of legal language, and genres of legal language need to be taken into account (cf. Tiersma, 2000; Cao, 2007).
The main aim of this paper is to present reflections on the design and methodology for the creation of such a corpus of legal language, with a particular focus on its representativeness.