2023

How to Build a Corpus of Legal Language: Ensuring its Representativeness

GLOGAR, Ondřej

Basic information

Original name

How to Build a Corpus of Legal Language: Ensuring its Representativeness

Name in Czech

Jak vytvořit korpus právního jazyka: Zajištění jeho reprezentativnosti

Authors

GLOGAR, Ondřej

Edition

Argumentation 2023: International Conference on Alternative Methods of Argumentation in Law, 2023

Other information

Language

English

Type of outcome

Conference presentations

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Organization unit

Theatre Faculty

Keywords (in Czech)

právní jazyk; jazykový korpus; právní pragmatika; aplikovaná lingvistika

Keywords in English

legal language; language corpus; legal pragmatics; applied linguistics

Tags

International impact
Changed: 19/9/2024 14:55, Mgr. Ondřej Glogar

Abstract

In the original

Although the importance of language for law has been acknowledged in legal theory for some time, existing research on legal language either lacks findings supported by sufficient data or does not cover all aspects of legal language. It is particularly problematic that legal theorists, with few exceptions, describe legal language based solely on their own linguistic experience and a random selection of examples (as noted, for instance, by Mouritsen, 2017). One way to avoid this reliance on intuition and the lack of empirical data is to use a language corpus that reflects the actual use of the language in everyday practice. A standard corpus collects a range of texts in a form that software can access and search, so that (mainly linguistic) hypotheses can be easily tested. Although some corpora focused on legal language already exist, they usually capture only a narrow segment or a single genre (e.g. a corpus covering only case law or only statutes). It is therefore advisable to design a comprehensive and balanced corpus that includes representatives of each genre of legal language. However, building such a corpus involves many pitfalls, and a suitable methodology is needed first. In my paper, I therefore discuss the various risks and procedures to be considered when building such a corpus. Through an analysis of the applied linguistics literature (e.g. Meyer, 2002), I evaluate the individual criteria for sample collection and segmentation and adapt them to the specifics of legal language. Perhaps the most important of these is the representativeness of such a corpus, which is the focus of the paper. The criteria for selecting texts and utterances must necessarily differ from those used for general language, because the different branches of law, the speakers of legal language, and the genres of legal language all need to be taken into account (cf. Tiersma, 2000; Cao, 2007).
The main aim of this paper is to present reflections on the design and methodology for the creation of such a corpus of legal language, with a particular focus on its representativeness.