How to Build a Corpus of Legal Language: Ensuring its Representativeness

GLOGAR, Ondřej

Basic information

Original name

How to Build a Corpus of Legal Language: Ensuring its Representativeness

Name in Czech

Jak vytvořit korpus právního jazyka: Zajištění jeho reprezentativnosti

Authors

GLOGAR, Ondřej

Edition

Argumentation 2023: International Conference on Alternative Methods of Argumentation in Law, 2023

Other information

Language

English

Type of outcome

Conference presentations

Country of publisher

Czech Republic

Confidentiality degree

is not subject to state or trade secret

References:

URL

Organization unit

Theatre Faculty

Keywords (in Czech)

právní jazyk; jazykový korpus; právní pragmatika; aplikovaná lingvistika

Keywords in English

legal language; language corpus; legal pragmatics; applied linguistics

Abstract

In the original

Although the importance of language for law has long been recognised in legal theory, existing research on legal language either lacks findings supported by sufficient data or does not cover all aspects of legal language. It is particularly problematic that legal theorists, with few exceptions, describe legal language solely on the basis of their own linguistic experience and a random selection of examples (as noted, for instance, by Mouritsen, 2017). One way to avoid this reliance on intuition and the lack of empirical data is to use a language corpus that reflects how the language is actually used in everyday practice. A standard corpus collects a range of texts in a form that software can search, so that (mainly linguistic) hypotheses can be easily tested. Although some corpora focused on legal language already exist, they usually capture only a narrow segment or a single genre (e.g. a corpus covering only case law or only statutes). It is therefore advisable to design a comprehensive and balanced corpus that includes representatives of each genre of legal language. However, creating such a corpus involves many competing considerations, so a suitable methodology is needed first. In my paper, I therefore discuss the risks and procedures to be considered when building such a corpus. Drawing on the applied linguistics literature (e.g. Meyer, 2002), I evaluate the individual criteria for sample collection and segmentation and adapt them to the specifics of legal language. Perhaps the most important of these is the representativeness of the corpus, which is the focus of the paper. The criteria for selecting texts and utterances must necessarily differ from those used for general language, because the different branches of law, the speakers of legal language, and the genres of legal language all need to be taken into account (cf. Tiersma, 2000; Cao, 2007).
The main aim of this paper is to present reflections on the design and methodology for creating such a corpus of legal language, with a particular focus on its representativeness.

Cite

GLOGAR, Ondřej. How to Build a Corpus of Legal Language: Ensuring its Representativeness. In Argumentation 2023: International Conference on Alternative Methods of Argumentation in Law. 2023.

@inproceedings{14196,
   author = {Glogar, Ondřej},
   booktitle = {Argumentation 2023: International Conference on Alternative Methods of Argumentation in Law},
   keywords = {legal language; language corpus; legal pragmatics; applied linguistics},
   language = {eng},
   title = {How to Build a Corpus of Legal Language: Ensuring its Representativeness},
   url = {https://argumentation.law.muni.cz/},
   year = {2023}
}

TY  - CONF
ID  - 14196
AU  - Glogar, Ondřej
PY  - 2023
TI  - How to Build a Corpus of Legal Language: Ensuring its Representativeness
KW  - legal language
KW  - language corpus
KW  - legal pragmatics
KW  - applied linguistics
UR  - https://argumentation.law.muni.cz/
ER  -
