Literary Statistics and Pauline Authorship I. Historical Background
From: JASA 23 (September 1971): 96-99.

Literary statistics has risen to some prominence in biblical studies in recent years. This two-part series is an attempt to survey and evaluate the general approach of literary statistics, especially as it applies to Pauline authorship. Part I introduces the biblical question in terms of Pauline authorship of the Pastoral Epistles. Then a survey is made of the rise of literary statistics, with emphasis upon its application to biblical studies. In order to present some of the ideas inherent in literary statistics, a study is made of the classic work by P. N. Harrison. Harrison acknowledges that every so-called Pauline letter has certain characteristic expressions and lacks others. Yet for the most part, the letters form a more or less clearly defined series within certain limits. However, in terms of comparative word usage, unique words, and certain grammatical features, Harrison concludes that the Pastorals form an exception to the Pauline series and must have been written by "a Paulist" at some later date. In order to complete this historical survey, various critiques of Harrison's study are reviewed.


Objections to Pauline authorship of the Pastoral Epistles (I, II Timothy, Titus) can be summarized under four main areas. Prof. Donald Guthrie has given a good review of these areas and the advocates of each.1 My approach here will be merely to sketch these objections as a prelude to specific consideration of one area in this paper. The four areas are as follows:

1) Historical problem: some scholars feel that it is impossible to fit the historical data of Paul's life as given in the Pastorals within the framework of history given in the book of Acts.
2) Ecclesiastical problem: basically this objection states that the church organization described is too advanced for Paul's time, and the heresy reflected comes from a much later time (presumably gnostic, from the 2nd century).
3) Doctrinal problem: the objection here is that characteristic Pauline teachings are missing, such as the Fatherhood of God, mystic union with Christ and the work of the Holy Spirit. Finally, it is felt that the view of faith is stereotyped and fixed, and doesn't fit the creative mind of Paul.
4) Linguistic problem: this objection involves the style of writing and word usage. There are a large number of words in the Pastorals which are unique in the New Testament and a large number of words which occur elsewhere in the New Testament, but not in the other, undisputed Pauline letters. Many words also show marked kinship with the Apostolic Fathers and late Apologists. It appears that this objection carries the most weight for those opting for non-Pauline authorship.

Thus, for many critics, the cumulative effect of these considerations rules out any possibility of Pauline authorship. Many feel that all the objections are overcome by the theory that a later "Paulist" in the early second century produced these Epistles to meet the needs of his own time.2 On the other hand, it should be pointed out that some scholars maintain Pauline authorship and seek to explain the above objections in terms of the use of an amanuensis, the occasional nature of the letters, etc.3

I have already indicated that the linguistic argument (along with the doctrinal problem) seems to be the most telling. The most influential work concerning the linguistic problem was published in 1921 by P. N. Harrison, who marshalled considerable stylistic and word-usage evidence against Pauline authorship in terms of tables, word counts and numerical data. His approach has come to be called the statistical method. The statistical method in general, and Harrison's work in particular, is widely cited and influential in contemporary works. Therefore it is necessary to evaluate it critically and consider it in any discussion of Pauline authorship of the Pastorals. Furthermore, the science of statistics has greatly spread and developed, and thus much more sophisticated and complicated approaches are now being applied. This makes it extremely difficult for a nonstatistician to evaluate objections to Pauline authorship in terms of statistical evidence. For these reasons, this paper will focus on linguistic objections to Pauline authorship, specifically as they are formulated in terms of statistical analysis. The approach of my paper will be to survey statistical critiques of Pauline authorship, using P. N. Harrison as representative of the relatively basic statistical approach, and A. Q. Morton as representative of the more sophisticated, "statistics proper" approach. However, the major purpose of this paper is a basic evaluation and critique of the statistical approach, rather than a detailed critique of any one specific approach. In order to give this a proper perspective, we turn first to a brief sketch of the development of literary statistics (i.e., the application of statistical analysis to literary criticism).

Rise of Literary Statistics

Modern literary statistics dates from the 1930's when two men wrote articles on sentence length distribution.4 In 1944, Yule published what has become a classic work: The Statistical Study of Literary Vocabulary.5 In 1956, the first textbook for literary statistics was published by G. Herdan: Language as Choice and Chance.6 Herdan refers to a list of about 150 publications dealing with the subject of literary statistics up to that time.
More recently, a Ph.D. dissertation by Wachal was published at the University of Wisconsin (1966).7 It contains a comprehensive summary of earlier studies, and it breaks down the survey into the six categories of parameters which have been developed. Wachal has over 20 pages of "selected" bibliography and presents a synthesized approach to defining a test for authorship. He has also indicated that the most sophisticated techniques of statistics have been used in literary analysis (including analysis of variance and regression analysis). A computer program was used to analyze the Federalist Papers as an evaluation of his approach to authorship. This work would be a necessary starting point in any detailed study of literary statistics at the present time, both from the standpoint of literature references and choice of parameters to describe style or characterize an author.8

In Biblical studies, the classic work concerning a statistical approach to defining authenticity is P. N. Harrison's book. Harrison based his case against genuineness upon language and style. His approach was called statistical because he made his analysis in terms of word counts and percentages. He counted the total vocabulary in the three books, distinguishing how many words did not appear elsewhere in the Pauline writings or in the New Testament. He further made a comparison of words and phrases characteristic of Paul, and compared the hapaxes with the second-century Fathers. Since Harrison's work, several critiques of it have been published, but no new work appeared for some time. These critiques will be considered later in the paper.

In 1948, W. C. Wake10 applied the sentence length work of Williams and Yule to Pauline studies. Wake applied the measurement of sentence length distribution to several classic Greek authors. He showed that for writers of continuous prose, all the works of one author form a statistically homogeneous distribution of sentence lengths. In this work, a statistical approach to prose analysis was on a more sophisticated level in terms of the science of statistics. Applied to Pauline writings, Wake's test indicated Romans, 1st and 2nd Corinthians and Galatians were indistinguishable, with a sentence length of just over eleven words, while the remainder had much longer average sentence length. A. Q. Morton has leaned heavily upon this work of Wake.
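The kind of measurement involved can be illustrated with a minimal sketch. It is not Wake's procedure or data: the function name, the rule that a sentence ends at a period, question mark, exclamation mark or semicolon, and the short English sample are all assumptions made for illustration. The sketch only shows how a list of sentence lengths, and its mean and spread, would be obtained from a text before any test of homogeneity is applied.

```python
import re
from statistics import mean, pstdev

def sentence_lengths(text):
    """Return the number of words in each sentence of `text`.

    Assumption (not from the paper): a sentence ends at '.', '?', '!' or ';',
    and words are separated by whitespace.
    """
    sentences = [s for s in re.split(r"[.?!;]", text) if s.strip()]
    return [len(s.split()) for s in sentences]

# Hypothetical sample text, standing in for one letter.
sample = "Grace to you and peace. I thank my God always for you. Brethren, pray for us."

lengths = sentence_lengths(sample)
print("sentence lengths:", lengths)
print("mean length:", mean(lengths))
print("standard deviation:", pstdev(lengths))
```

Comparing two texts would then mean comparing two such lists of lengths, which is where the question of statistical homogeneity enters.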

In 1958, Robert Morgenthaler11 published in tabular form a statistical analysis for all New Testament words, i.e., their frequency of occurrence. He presented a breakdown on the relative frequency of inflected forms, use of prepositions, participles, nouns, etc. Morgenthaler also discussed the question of Pauline authorship of the Pastoral Epistles.

In 1959, Msgr. deSolages12 published an exhaustive study of the synoptic gospel problem in terms of a comparison of words in common in different combinations (permutations) of the three gospels. Whereas earlier statistical approaches concentrated on single words or groups of words as the sampling basis, deSolages uses pericopes as the sampling base. He finds evidence for a "Q" tradition, as well as evidence for the Matthew-Luke use of Mark as a primary source.

In 1960, B. Van Elderen13 completed his doctoral dissertation, The Pauline Use of the Participle. Van Elderen classified and analyzed the 1206 participles occurring in the Pauline letters. By means of various numerical statistical tables, the participles were classified according to frequency of occurrence and syntax. The frequency was expressed as a percentage of the total words and as a percentage of the total number of participles, and the uses were classified according to the type of participle, its gender and position in the sentence.

In 1964, A. Q. Morton and James McLeman published a book dealing with statistical analysis of the Pauline writings. In this book, the authors reach the conclusion that only four Epistles of the traditional thirteen could have been written by Paul (Romans, I and II Corinthians, Galatians). However, in this book a great deal of background was presented along with conclusions, but very few data were presented and almost no description of procedures was given.14 Thus, in response to a great deal of criticism, Morton and McLeman published a much more complete treatment of their statistical approach to Pauline writings in 1966.15

Exposition and Analysis of Harrison's Approach

We begin this section with a consideration of P. N. Harrison's work in 1921. His linguistic argument may be summarized as follows. The language of the Pastorals shows obvious peculiarities as compared with the other ten letters. Harrison concedes that every Pauline letter has certain characteristic expressions and lacks others. Yet, taken as a whole, the letters form a clearly defined series with the variations among them within certain limits. Harrison feels, however, that the Pastorals cannot be brought into this series because of greater linguistic differences. Therefore, he suggests that the Pastorals were not written by Paul, but by a "Paulist" with the other ten Pauline letters before him, sometime between A.D. 94-150. There are, however, authentic Pauline fragments contained within the matrix of the present Pastorals as we know them. Harrison's (statistical) data are as follows: 1) 36% of the words (848 total vocabulary) occurring in the Pastorals do not occur in the other ten Pauline letters; 2) 175 hapax legomena; 3) 131 words occur in the Pastorals and other New Testament books but not in any other Pauline writing; 4) a large number of words that Paul uses in his other letters are absent from the Pastorals (582 words peculiar to Paul and 1053 that also occur in other New Testament books); 5) particles, prepositions and other minor parts of speech which are clearly Pauline are for the most part lacking in the Pastorals; 6) the language of the Pastorals is said to show a clear relationship with the language of the Apostolic Fathers and the Apologists in the second century.
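The first four of these figures are simple set counts on vocabularies, and the reported numbers are consistent with one another: 175 + 131 = 306 words absent from the other ten letters, and 306 out of 848 is about 36%. The following minimal sketch shows how such counts could be tabulated. It is an illustration only, not Harrison's procedure; the function name, the dictionary keys and the single-letter placeholder "words" are hypothetical.

```python
def harrison_style_counts(pastorals, other_paul, rest_of_nt):
    """Tabulate word-count figures of the kind Harrison reports.

    Each argument is a set of distinct words (a vocabulary). The three
    groupings are assumptions for illustration: the Pastorals, the other
    ten Pauline letters, and the rest of the New Testament.
    """
    not_in_paul = pastorals - other_paul            # item 1
    nowhere_else_in_nt = not_in_paul - rest_of_nt   # item 2
    nt_but_not_paul = not_in_paul & rest_of_nt      # item 3
    pauline_absent = other_paul - pastorals         # item 4
    return {
        "vocabulary": len(pastorals),
        "not_in_other_paul": len(not_in_paul),
        "percent_not_in_other_paul": 100 * len(not_in_paul) / len(pastorals),
        "nowhere_else_in_nt": len(nowhere_else_in_nt),
        "nt_but_not_paul": len(nt_but_not_paul),
        "pauline_words_absent": len(pauline_absent),
    }

# Placeholder vocabularies; the letters stand in for Greek words.
pastorals = {"a", "b", "c", "d", "e"}
other_paul = {"a", "b", "f", "g"}
rest_of_nt = {"c", "g", "h"}

counts = harrison_style_counts(pastorals, other_paul, rest_of_nt)
for name, value in counts.items():
    print(name, value)
```

Counts of this kind take no account of how long each text is, which is one of the points raised against Harrison in the critiques reviewed below.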

For many, Harrison's work closed the question concerning Pauline authorship of the Pastoral Epistles. However, from time to time Harrison's work has been criticized on different grounds; see, for instance, W. Michaelis,16,17 F. R. M. Hitchcock,18,19 D. Guthrie,20 B. M. Metzger,22 and K. Grayston and G. Herdan.23 Grayston and Herdan give the best summary of the objections that have been raised to Harrison's method. In this paper, I want to review only the objections of Guthrie, Metzger, and Grayston and Herdan. Guthrie basically objects to the application of mathematics to literature or language: "Literary art cannot be reduced to a mathematical equation . . . and mathematical equations can never prove linguistic affinity."23 Guthrie denies that frequency relationships such as those used by Harrison can be used to characterize the style of an author.
Grayston and Herdan challenge Guthrie as follows.

One of the principal results of structural linguistics, as we know it today, is that a language is characterized by phonemics (smallest distinctive feature), its vocabulary and grammar, but also by the frequency of use attached to particular linguistic forms through their continued use by members of the speech community. It has come to light that there is a far-reaching similarity between members of the speaking community, not only in the phonemic system, vocabulary and grammar, but also in the frequency of use of particular linguistic forms such as lexicon items, grammatical forms and structures as well as phonemes; in other words, a similarity not only in what is used but also how often it is used . . . . The importance of the frequency distribution of language as a linguistic factor has given rise to the construction of what may be called statistical dictionaries. We are fortunate to have a complete work of this type for the NT Greek in Morgenthaler's Statistik des neutestamentlichen Wortschatzes.24

Metzger's criticism is of a different nature than Guthrie's. He objects to Harrison's failure to consider the work of Yule, which concerned itself with the legitimacy and limitations of using word counts to establish authorship. Yule posits that adequate statistical analysis in prose would require a piece at least ten thousand words long. Metzger goes on to point out that the Pastorals are far from that long. However, Metzger seems to have forgotten that Yule's work was with the text of Imitation of Christ, not biblical or even Greek literature. It is very problematical whether conditions for one test can be generalized to another language and genre of literature.

Grayston and Herdan offer a penetrating critique (of Harrison's work) which comes to grips with the crucial problem, namely, what parameters to use to express style, and how specifically to analyze these statistically. Their critique can be summarized as follows:

1. Harrison failed to distinguish between a word which occurs only once in a text (hapax) and a word which is peculiar to the text in question (one-sample word). Hapaxes give frequency within the sample text, and one-sample words give the vocabulary connection between samples.
2. He was not aware of the finding (made in 1943 for some literature) that the proportion of a specified class of words in a prose text, such as those he called hapaxes, changes with text length. The authors insist it is more in keeping with our knowledge of the relation between vocabulary and frequency of occurrence to relate the number of words peculiar to a text to the total vocabulary, instead of working with the vocabulary occurrence per page as Harrison did. In other words, he didn't take both vocabulary occurrence and text length into account.
3. His work was incomplete in the sense that it considered only the words peculiar to each Pauline letter and not words common to two, three, etc. letters. These additional words will have some bearing upon vocabulary connectivity. Grayston and Herdan suggest the use of the sum of words peculiar to a given letter and words common to all letters relative to the total vocabulary of the letter concerned.
4. Harrison's method lacks a standard of comparison. He recognized this, so he compared the word class frequency in any one letter with the corresponding one in another letter. Grayston and Herdan, however, propose as a standard what one would expect in the way of vocabulary connectivity on pure chance. They go on to suggest a way of constructing such a standard in terms of "random partitioning of vocabulary."
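The distinction drawn in point 1 and the quantity suggested in point 3 can be made concrete with a minimal sketch. It follows only the verbal description given above (words peculiar to a letter plus words common to all letters, divided by the total vocabulary of that letter); the function names, the letter labels X, Y, Z and the placeholder "words" are hypothetical, and no attempt is made to reproduce Grayston and Herdan's chance standard from point 4.

```python
from collections import Counter

def hapaxes(tokens):
    """Words that occur exactly once within this one text."""
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n == 1}

def one_sample_words(name, vocab_by_letter):
    """Words found in the named letter and in no other letter."""
    others = set().union(
        *(v for k, v in vocab_by_letter.items() if k != name)
    )
    return vocab_by_letter[name] - others

def connectivity(name, vocab_by_letter):
    """(peculiar words + words common to all letters) / total vocabulary."""
    vocab = vocab_by_letter[name]
    common_to_all = set.intersection(*vocab_by_letter.values())
    peculiar = one_sample_words(name, vocab_by_letter)
    return (len(peculiar) + len(common_to_all)) / len(vocab)

# Placeholder running texts; the letters stand in for Greek words.
tokens_by_letter = {
    "X": ["a", "b", "a", "c", "f"],
    "Y": ["a", "c", "d", "d"],
    "Z": ["a", "b", "e"],
}
vocab_by_letter = {k: set(v) for k, v in tokens_by_letter.items()}

print("hapaxes in X:", sorted(hapaxes(tokens_by_letter["X"])))
print("one-sample words of X:", sorted(one_sample_words("X", vocab_by_letter)))
print("connectivity of X:", connectivity("X", vocab_by_letter))
```

In this toy case the words "b" and "c" occur once in X yet also appear in another letter, so they are hapaxes without being one-sample words, which is exactly the distinction Harrison is said to have overlooked. Because the measure is taken relative to the total vocabulary of the letter rather than per page, it also responds to the concern raised in point 2.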

In looking at Morton's work, we see a "change" in approach. But perhaps it would be better to describe this change as an extension or a sophistication, for it is a quantitative change rather than a qualitative change compared with Harrison and other works like his (such as Van Elderen and Morgenthaler). Harrison works with counts and averages, tabulation of occurrence in terms of average frequency, etc. However, he doesn't apply statistical procedures which are part of what we call the "science of statistics". The work by Morton involves use of concepts such as frequency distribution, standard deviation of the distribution, comparison of different distributions for homogeneity, etc. It is necessary to consider Morton's work because it represents a new and influential approach to authenticity work, and it also contains some potentially useful tools for study within an evangelical framework.

Part Two of this paper gives an exposition of A. Q. Morton's statistical approach to Pauline authorship, and then presents a critique of the statistical approach in general.


