The Parsed Corpus of Scottish Correspondence

The Parsed Corpus of Scottish Correspondence (PCSC) is a syntactically annotated corpus of correspondence written by Scottish writers between 1540 and 1750.

Under the Research tab, you can find examples of studies I have carried out using the PCSC data.

Annotation

The annotation is in the format of the Penn Parsed Corpora of Historical English [1]. More information, and the annotation manual, can be found here.
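For readers unfamiliar with this kind of annotation, the following is a minimal Python sketch of how a labelled-bracket tree in the general Penn style could be read and its tagged words extracted. The sample tree, its ID value, and the function names are invented for illustration and are not taken from the PCSC; real corpus files contain further node types (for example traces and metadata nodes) that this sketch does not handle, so the annotation manual remains the authority on the format.

```python
def tokenize(text):
    """Split a bracketed string into '(', ')' and label/word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Turn a token list into nested Python lists, one per bracketed node."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced brackets: unexpected ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced brackets: missing ')'")
    return stack[0]  # list of top-level trees


def leaves(node):
    """Yield (tag, word) pairs for every terminal node under `node`."""
    if len(node) == 2 and all(isinstance(x, str) for x in node):
        yield (node[0], node[1])
        return
    for child in node:
        if isinstance(child, list):
            yield from leaves(child)


# Hypothetical example tree (NOT from the PCSC), with an invented ID node.
sample = "( (IP-MAT (NP-SBJ (PRO I)) (VBP thank) (NP-OB1 (PRO you))) (ID SAMPLE,1.1))"

trees = parse(tokenize(sample))
# Drop the ID node so that only the tagged words of the token remain.
pairs = [(tag, word) for tag, word in leaves(trees[0]) if tag != "ID"]
print(pairs)
```

The nested-list representation is a deliberate simplification; for actual research on the corpus, a dedicated query tool for parsed corpora is likely to be the better choice.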

Data source

The PCSC data is taken from the Helsinki Corpus of Scottish Correspondence (ScotsCorr), compiled by Anneli Meurman-Solin and the VARIENG team [2]. For more information about ScotsCorr, including its compilation, metadata collection, and annotation, please refer to the ScotsCorr manual, which is available on the corpus's Kielipankki site.

Download

Both the PCSC and ScotsCorr are hosted by FIN-CLARIN, and can (soon) be downloaded from here (Kielipankki).*


References

[1] Kroch, Anthony. 2020. Penn Parsed Corpora of Historical English LDC2020T16. Web download. Philadelphia: Linguistic Data Consortium.

[2] Meurman-Solin, A. & VARIENG. 2017. The Helsinki Corpus of Scottish Correspondence (1540-1750). FIN-CLARIN: http://urn.fi/urn:nbn:fi:lb-201411071

*Note that the ScotsCorr data has restrictions on its distribution rights: it can only be distributed via CLARIN.

Manual

For more information about the data selection and annotation process for the PCSC, as well as descriptive statistics, please refer to chapter 3 of my PhD thesis. (This is in lieu of a dedicated manual, which is not complete yet!)

Error reporting

Please report any errors in the PCSC directly to me. I aim to upload corrected versions at most twice per year. Known errors will be listed on this page.

The corpus can be cited as:

Gotthard, L. 2024. The Parsed Corpus of Scottish Correspondence, source [data set]. Kielipankki. Retrieved from http://urn.fi/urn:nbn:fi:lb-2024070601

Known Errors

Acknowledgments

The majority of the annotation and corrections were undertaken with financial support from the Arts and Humanities Research Council.

I am also indebted to Sarah Einhaus for help with manual POS-tag corrections in 2020, which was made possible through financial support from George Walkden (Uni Konstanz).

Training and supervision by Beatrice Santorini and Rob Truswell were invaluable for completing this corpus. All errors are, without doubt, my own.