Unlocking the Complexity of Legal Language: Legal Language Corpus Construction


GLOGAR, Ondřej

Basic information

Original name

Unlocking the Complexity of Legal Language: Legal Language Corpus Construction

Name in Czech

Rozkrývání komplexnosti právního jazyka: Konstrukce korpusu právního jazyka

Authors

GLOGAR, Ondřej

Edition

IVR World Congress, 2024

Other information

Language

English

Type of outcome

Conference presentations

Country of publisher

Republic of Korea

Confidentiality degree

is not subject to a state or trade secret

Organization unit

Theatre Faculty

Keywords (in Czech)

právní jazyk; jazykový korpus; korpusová lingvistika; právní diskurs

Keywords in English

legal language; language corpus; corpus linguistics; legal discourse

Abstract

In the original

While the significance of legal language is well recognized, research on it often lacks empirical data and comprehensive coverage. Legal theorists predominantly rely on their personal linguistic experience, which underscores the need for more robust methodologies (as noted, for instance, by Mouritsen, 2017). In this context, corpus linguistics proves invaluable for the analysis of legal language. Language corpora comprise large collections of linguistic data accessible through software, which makes linguistic hypotheses easy to test. Therefore, if we want to investigate actual linguistic reality, i.e., how (legal) language is used in practice and in everyday life, such a tool is essential. With it, we can examine how law relates to language and, in turn, come to a deeper understanding of law itself. This paper responds to the need for more empirically based exploration in legal language research (and, by extension, legal research) and shows, through a practical example, how corpus linguistics can be used in legal scholarship. I introduce my own corpus of legal Czech and, in particular, the process of its creation. Among other things, this corpus is intended to fill a gap in legal corpus linguistics: existing legal corpora tend to focus narrowly on specific genres, such as case law or statutes, rather than providing a comprehensive picture. The main goal is therefore a balanced and comprehensive corpus that includes representatives of all genres of legal language (cf. Tiersma, 2000). Drawing upon insights from the applied linguistics literature (e.g., Meyer, 2002), I carefully analyse criteria for sample collection and segmentation, adapting them to the unique demands of legal language. The resulting corpus aims to encompass a broad spectrum of Czech legal texts, ensuring representation across different legal branches, speakers, and genres.
Delving into the intricacies of the corpus creation process, I elucidate how the methodology contributes to a more robust comprehension of legal language. By presenting the rationale behind the corpus design and the adaptations made for the specifics of legal language, this paper contributes insights to the ongoing debate on the role of language in legal discourse.

Cite

GLOGAR, Ondřej. Unlocking the Complexity of Legal Language: Legal Language Corpus Construction. In IVR World Congress. 2024.

@proceedings{14198,
   author = {Glogar, Ondřej},
   booktitle = {IVR World Congress},
   keywords = {legal language; language corpus; corpus linguistics; legal discourse},
   language = {eng},
   title = {Unlocking the Complexity of Legal Language: Legal Language Corpus Construction},
   year = {2024}
}

TY  - CONF
ID  - 14198
AU  - Glogar, Ondřej
PY  - 2024
TI  - Unlocking the Complexity of Legal Language: Legal Language Corpus Construction
KW  - legal language
KW  - language corpus
KW  - corpus linguistics
KW  - legal discourse
N2  - While the significance of legal language is well recognized, research on it often lacks empirical data and comprehensive coverage. Legal theorists predominantly rely on their personal linguistic experience, which underscores the need for more robust methodologies (as noted, for instance, by Mouritsen, 2017). In this context, corpus linguistics proves invaluable for the analysis of legal language. Language corpora comprise large collections of linguistic data accessible through software, which makes linguistic hypotheses easy to test. Therefore, if we want to investigate actual linguistic reality, i.e., how (legal) language is used in practice and in everyday life, such a tool is essential. With it, we can examine how law relates to language and, in turn, come to a deeper understanding of law itself. This paper responds to the need for more empirically based exploration in legal language research (and, by extension, legal research) and shows, through a practical example, how corpus linguistics can be used in legal scholarship. I introduce my own corpus of legal Czech and, in particular, the process of its creation. Among other things, this corpus is intended to fill a gap in legal corpus linguistics: existing legal corpora tend to focus narrowly on specific genres, such as case law or statutes, rather than providing a comprehensive picture. The main goal is therefore a balanced and comprehensive corpus that includes representatives of all genres of legal language (cf. Tiersma, 2000). Drawing upon insights from the applied linguistics literature (e.g., Meyer, 2002), I carefully analyse criteria for sample collection and segmentation, adapting them to the unique demands of legal language. The resulting corpus aims to encompass a broad spectrum of Czech legal texts, ensuring representation across different legal branches, speakers, and genres.
Delving into the intricacies of the corpus creation process, I elucidate how the methodology contributes to a more robust comprehension of legal language. By presenting the rationale behind the corpus design and the adaptations made for the specifics of legal language, this paper contributes insights to the ongoing debate on the role of language in legal discourse.
ER  -

GLOGAR, Ondřej. Unlocking the Complexity of Legal Language: Legal Language Corpus Construction. In \textit{IVR World Congress}. 2024.
