Dependency Bank

Overview

The Dependency Bank project provides linguists with online access to richly annotated corpora. To achieve this, we develop tools for the automatic syntactic annotation of corpora as well as tools and methodology for analysing the annotated data. In its current form, the dependency framework scales from small data sets like the ICE corpora to data sets of more than 2000 million words. The Dependency Bank encodes information at three levels: word class, chunking and dependency syntax.

We are currently preparing an annotated version of the British National Corpus for public access. See BNC Dependency Bank for more information. We have also made considerable progress on the BROWN family of corpora and many of the ICE Corpus components. In addition, we have experimented with the automatic annotation of Early and Late Modern English corpora (Schneider et al. 2014). Students and staff at UZH can access all these resources on the ES Corpus Server.

Selected Publications

Schneider, Gerold, Lehmann, Hans Martin & Schneider, Peter (2014). Parsing Early and Late Modern English corpora. Literary and Linguistic Computing.

Schneider, Gerold & Zipp, Lena (2013). Discovering new verb-preposition combinations in New Englishes. In Joybrato Mukherjee and Magnus Huber, editors, Studies in Variation, Contacts and Change in English, Volume 14 – Corpus Linguistics and Variation in English: Focus on non-native Englishes. Varieng, Helsinki.

Lehmann, Hans Martin & Schneider, Gerold (2012). BNC Dependency Bank 1.0. In Oksefjell, S., Ebeling, J. & Hasselgard, H. (Eds.), Aspects of corpus linguistics: compilation, annotation, analysis. Helsinki: Research Unit for Variation, Contacts, and Change in English.

Schneider, Gerold (unpublished). BNC Dependency Bank 1.0 & 2.0: The Pro3Gres Annotation scheme. Appendix to Dependency Bank 1.0, with examples as help to the users. Manuscript.

Lehmann, Hans Martin & Schneider, Gerold (2012). Syntactic variation and lexical preference in the dative-shift alternation. In Mukherjee, J. & Huber, M. (Eds.), Corpus Linguistics and Variation in English. Theory and Description. Amsterdam: Rodopi. 258.

Lehmann, Hans Martin & Schneider, Gerold (2012). Dependency Bank. In Mitielu, V. B., Popescu, O. & Pekar, V. (Eds.), LREC 2012 Challenges in the Management of Large Corpora. Istanbul. 67-77.

Lehmann, Hans Martin & Schneider, Gerold (2011). A large-scale investigation of verb-attached prepositional phrases. In Rayson, P., Hoffmann, S. & Leech, G. (Eds.), Methodological and Historical Dimensions of Corpus Linguistics. Helsinki: Varieng.

Lehmann, Hans Martin & Schneider, Gerold (2009). Parser-based analysis of syntax-lexis interactions. In Jucker, A. H., Schreier, D. & Hundt, M. (Eds.), Corpora: Pragmatics and Discourse. Papers from the 29th International Conference on English Language Research on Computerized Corpora (ICAME 29), Ascona, Switzerland, 14-18 May 2008. Amsterdam: Rodopi. 477-502.

Schneider, Gerold (2008). Hybrid Long-Distance Functional Dependency Parsing. Doctoral Thesis, Institute of Computational Linguistics, University of Zurich.