The aim of this project is to compile a new corpus of edited American English prose printed in the United States during the calendar year 1931 (+/- three years). This corpus will complement the existing Brown family of corpora, as it represents a ‘prequel’ to the Brown and Frown corpora.


The Brown family of corpora consists of several corpora modelled on the original Brown University corpus. The principal corpora are Brown (written American English, 1961), LOB (written British English, 1961), Frown (written American English, 1992), and FLOB (written British English, 1991). Another component has recently been compiled at Lancaster University: Lancaster1931 (or B-LOB), the 1931 equivalent of LOB and FLOB.

The new corpus being assembled at the English Department of the University of Zurich, called B-Brown (‘before Brown’), will extend the chronological span of the Brown corpora, providing an empirical basis for researchers who want to examine American English as it was written in the first half of the twentieth century. Furthermore, it will allow researchers to investigate real-time changes in the usage of American English, since B-Brown data can be directly compared to data from the Brown and Frown corpora. It should be pointed out that B-Brown (1931) and Brown (1961) are separated not only by thirty years but also by a disastrous, world-altering event, namely World War II. It will therefore be particularly interesting to see to what extent the war might have had an impact on the development of American English. Moreover, empirical data on American English in 1931 can shed new light on recent changes (i.e. grammatical changes revealed by the investigation of the 1961 and 1992 corpora). B-Brown can put these relatively short-term developments into a better perspective, as a third point in time makes it possible to reveal, substantiate, or undermine claims about their long-term evolution.

The Brown family of corpora has also been a major empirical resource for researchers interested in differences, commonalities and changes in British and American English. B-Brown represents a further element which can be used to trace and compare linguistic phenomena on both sides of the Atlantic.

Corpus Design

The design of B-Brown follows the model of the other corpora in the Brown family. It will consist of about one million words spread across 500 texts, i.e. each text sample consists of about 2,000 running words. The samples cover 15 different text categories (with further subdivisions), which range from newspaper reportage and academic writing to fiction. Owing to financial constraints and limited time, the sampling of B-Brown departs from the original procedure in that it permits a leeway of three years on either side of the target year (sampling is from 1928 to 1934 inclusive). Nonetheless, it is improbable that this change will diminish the utility of the corpus, and the researchers involved in the compilation of the British English equivalent, the B-LOB corpus, have relied on the same method. Consequently, the comparability of B-Brown and B-LOB should not be affected.
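
The design figures given above can be checked with a short Python sketch. The constant and function names are illustrative only and are not part of any project tooling; the values are taken directly from the description (target year 1931, three years of leeway, 500 texts of about 2,000 words each).

```python
# Design parameters as stated in the corpus description.
TARGET_YEAR = 1931
LEEWAY_YEARS = 3
NUM_TEXTS = 500
WORDS_PER_TEXT = 2000  # approximate length of each sample in running words


def sampling_window(target=TARGET_YEAR, leeway=LEEWAY_YEARS):
    """Return the admissible publication years, both ends inclusive."""
    return range(target - leeway, target + leeway + 1)


def in_window(year):
    """True if a text published in `year` may be sampled."""
    return year in sampling_window()


def approximate_corpus_size():
    """Approximate total number of running words in the corpus."""
    return NUM_TEXTS * WORDS_PER_TEXT


print(list(sampling_window()))    # the seven admissible years
print(approximate_corpus_size())  # about one million words
```

The window covers seven calendar years in total, which is what ‘1928 to 1934 inclusive’ amounts to.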

With regard to the periodicals used for sampling, the aim is to match them to the publications used in the other corpora, i.e. to use the exact same newspapers, journals etc. that were used in the Brown and Frown corpora. However, this is not always possible, e.g. when a particular periodical was not yet in print in 1931. In these cases we have to settle for other periodicals available for the specific time frame. The text-by-text equivalence of periodical sources obviously cannot apply to other categories, for instance fiction, where we have to rely on random selection within the desired category subdivision.
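
The selection logic described above might be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually implemented by the compilers: the function names, the idea of an ordered list of substitutes, and the placeholder titles are all hypothetical.

```python
import random


def choose_periodical(brown_title, in_print, substitutes):
    """Prefer the periodical sampled for Brown/Frown; otherwise fall back.

    brown_title -- title used in the later corpora
    in_print    -- set of titles available in the 1928-1934 window
    substitutes -- hypothetical ordered list of alternative titles
    Returns the chosen title, or None if nothing suitable is available.
    """
    if brown_title in in_print:
        return brown_title
    for title in substitutes:
        if title in in_print:
            return title
    return None


def choose_fiction_text(candidates, rng):
    """Pick one text at random within a category subdivision."""
    return rng.choice(candidates)


# Placeholder titles only; no real periodicals or novels are implied.
in_print = {"Periodical A", "Periodical C"}
print(choose_periodical("Periodical A", in_print, ["Periodical C"]))
print(choose_periodical("Periodical B", in_print, ["Periodical C"]))

rng = random.Random(0)  # fixed seed so that the selection is reproducible
print(choose_fiction_text(["Novel X", "Novel Y", "Novel Z"], rng))
```

Seeding the random generator is a design choice of this sketch: it allows a random selection to be documented and repeated later.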

See also:  http://www.helsinki.fi/varieng/CoRD/corpora/B-BROWN/index.html