While the patterns found are real enough, the interpretation of them
looks more like pareidolia: inferences are drawn without any explicit
images or messages to support an otherwise ambiguous reading.
JimA
Hon Wai Lai wrote:
> Claims of hidden code in the Bible are classical examples of data
> mining and blatant abuse of statistical techniques. There is a well
> written article on data mining by Ronald Kahn:
>
> http://www.barra.com/newsletter/nl165/biblenl165.aspx
>
> The Bible Code
>
> by Michael Drosnin (New York: Simon & Schuster, 1997)
>
> Reviewed by Ronald N. Kahn
> <http://www.barra.com/newsletter/NL160/NLbios.asp#Kahn>
>
> "For three thousand years a code in the Bible has remained hidden. Now
> it has been unlocked by computer--and it may reveal our future."
>
> So begins the jacket copy for The Bible Code by Michael Drosnin, a
> book receiving considerable public attention, though not much in the
> financial press. And yet we couldn't resist reviewing it because its
> methodologies are surprisingly similar to the worst data mining
> excesses of investment research. This issue's lead article on data
> mining discusses Norman Bloom, arguably the world's greatest data
> miner. He tried to prove the existence of God through baseball
> statistics and the Dow Jones average. Now Mr. Drosnin, armed with the
> Bible and a computer, has taken up the cause.
>
> The idea that the Bible contains encoded information has been around
> for quite some time. But in 1994, in Statistical Science, three
> statisticians reported their analysis of equidistant letter sequences
> (ELS) in the book of Genesis. An ELS is a fairly simple type of code.
> For example, a particular ten-letter word may begin with the 3,057th
> letter and continue with the 3,067th letter, 3,077th letter,..., and
> the 3,147th letter.
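
As a rough illustration of the ELS idea described above, here is a minimal Python sketch. The function names (`els_positions`, `extract_els`) and the toy text are hypothetical, made up for this example; this is not the procedure used in the Statistical Science paper, only the basic "every n-th letter" reading.

```python
def els_positions(start, skip, length):
    """1-based letter positions of an ELS with the given start and skip."""
    return [start + i * skip for i in range(length)]


def extract_els(text, start, skip, length):
    """Read an ELS out of `text`, counting letters only (hypothetical helper)."""
    letters = [c for c in text if c.isalpha()]  # ignore spaces and punctuation
    positions = els_positions(start, skip, length)
    if min(positions) < 1 or max(positions) > len(letters):
        raise ValueError("sequence runs outside the text")
    return "".join(letters[p - 1] for p in positions)


# The review's example: ten letters, starting at letter 3,057, skip of 10.
positions = els_positions(3057, 10, 10)
print(positions[0], positions[1], positions[-1])  # 3057 3067 3147

# A toy text in which "cat" sits at letters 1, 4 and 7 (skip of 3).
toy = "cxx axx txx"
print(extract_els(toy, 1, 3, 3))  # cat
```

Because start and skip are both free parameters, even a short text offers a very large number of candidate sequences to search.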
>
> Of course, words will appear encoded in the Bible just by random
> chance. So Doron Witztum, Eliyahu Rips, and Yoav Rosenberg devised a
> statistical test of whether Genesis contains any meaningful
> information. They assumed that if meaningfully related words appeared
> encoded "near" each other, that would imply meaningfully encoded
> information. So while the word "hammer" might appear at random and the
> word "anvil" might appear at random, these connected words wouldn't
> appear near each other unless the text contained meaningful encoded
> information. With that assertion, they developed a highly convoluted
> measure of the "closeness" of the encoded appearance of any two given
> words, chose a list of (according to them) meaningfully related word
> pairs (names and dates for a list of famous rabbis), and finally,
> analyzed whether those word pairs appeared closer than expected by
> random chance in Genesis. According to this test, the probability that
> random data would generate encoded word pairs as close as they
> observed was only 16 out of 1 million.
>
> Starting from this academic paper, author Michael Drosnin applied his
> computer to the entire Bible, without regard to any statistical
> principles. Searching now for individual words of interest, he then
> looked for other suggestive words nearby, backwards or forwards, after
> applying liberal interpretive skills. The result in his case is a book
> full of remarkable coincidences, completely lacking any statistical
> analysis of significance.
>
> For this review, let's consider the original statistical analysis and
> the popular book separately. The popular book is simply a fantastic
> example of data mining run amok. If Drosnin didn't find this
> coincidence, he would find another. If one interpretation of the word
> didn't fit, he used another. His quest would have been equally
> successful applied to War and Peace, Men are from Mars, Women are from
> Venus, or even an old Sears catalog. Proust's insight (see quotation
> in "Data Mining" article below) clearly applies here.
>
> The original statistical paper does include an analysis of
> significance. So my criticism here is more technical. The authors'
> definition of closeness is so contorted as to defy much intuition, but
> it may be very sensitive to just a small number of very close
> observations. Another similar analysis, by Dror Bar-Natan, Alec
> Gindis, Aryeh Levitan, and Brendan McKay of the Australian National
> University, found no unusual closeness for the same famous rabbis and
> their most famous books. And finally, the occasional appearance of
> encoded word pairs near each other is simply a far cry from finding or
> proving (let alone decoding) any meaningful information encoded in the
> Bible.
>
> For investment researchers, The Bible Code is just a wonderful example
> of the seductive appeal of random patterns found in large data sets.
> The book, if not also the paper, ignores all four guidelines discussed
> on page 29 of this issue: intuition, restraint, sensibility, and
> out-of-sample testing. Researchers--investment and biblical--ignore
> these at their own peril.
>
>
> Data Mining is Easy
>
> Seven Quantitative Insights into Active Management--Part 5
>
> by Ronald N. Kahn <http://www.barra.com/newsletter/NL160/NLbios.asp#Kahn>
>
> Why is it that so many strategies look great in backtests and
> disappoint upon implementation? Backtesters always have 95% confidence
> in their results, so why are investors disappointed far more than 5%
> of the time? It turns out to be surprisingly easy to search through
> historical data and find patterns that don't really exist.
>
> To understand why data mining is easy, we must first understand the
> statistics of coincidence. Let's begin with some non-investment
> examples. Then we will move on to investment research.
>
> The statistics of coincidence
>
> Several years ago Evelyn Adams won the New Jersey state lottery twice
> in four months. Newspapers put the odds of that happening at 17
> trillion to 1, an incredibly improbable event. A few months later, two
> Harvard statisticians, Persi Diaconis and Frederick Mosteller, showed
> that a double win in the lottery is not a particularly improbable
> event. They estimated the odds at 30 to 1. What explains the enormous
> discrepancy in these two probabilities?
>
> It turns out that the odds of Evelyn Adams winning the lottery twice
> are in fact 17 trillion to 1. But that result is presumably of
> interest only to her immediate family. The odds of someone, somewhere,
> winning two lotteries--given the millions of people entering lotteries
> every day--are only 30 to 1. If it wasn't Evelyn Adams, it could have
> been someone else.
>
> Coincidences appear improbable only when viewed from a narrow
> perspective. When viewed from the correct (broad) perspective,
> coincidences are no longer so improbable. Let's consider another
> non-investment example: Norman Bloom, arguably the world's greatest
> data miner.
>
> Norman died a few years ago in the midst of his quest to prove the
> existence of God through baseball statistics and the Dow Jones
> average. He argued that "BOTH INSTRUMENTS are in effect GREAT
> LABORATORY EXPERIMENTS wherein GREAT AMOUNTS OF RECORDED DATA ARE
> COLLECTED, AND PUBLISHED" (capitalization Bloom's). As but one example
> of thousands of his analyses of baseball, he argued that the fact that
> George Brett, the Kansas City third baseman, hit his third home run in
> the third game of the playoffs, to tie the score 3-3, could not be a
> coincidence--it must prove the existence of God. In the investment
> arena, he argued that the Dow's 13 crossings of the 1,000 line in 1976
> mirrored the 13 colonies which united in 1776--which also could not be
> a coincidence. (He pointed out, too, that the 12th crossing occurred
> on his birthday, deftly combining message and messenger.) He never
> took into account the enormous volume of data--in fact, an entire New
> York Public Library's worth--he searched through to find these
> coincidences. His focus was narrow, not broad.
>
> With Norman's passing, the title of world's greatest living data miner
> has been left open. Recently, however, Michael Drosnin, author of The
> Bible Code, seems to have filled it. (For details, see the book review
> <http://www.barra.com/newsletter/nl165/BiBleNl165.asp>.)
>
> The importance of perspective to understanding the statistics of
> coincidence was perhaps best summarized by, of all people, Marcel
> Proust--who often showed keen mathematical intuition:
>
> The number of pawns on the human chessboard being less than the
> number of combinations that they are capable of forming, in a
> theater from which all the people we know and might have
> expected to find are absent, there turns up one whom we never
> imagined that we should see again and who appears so opportunely
> that the coincidence seems to us providential, although, no
> doubt, some other coincidence would have occurred in its stead
> had we not been in that place but in some other, where other
> desires would have been born and another old acquaintance
> forthcoming to help us satisfy them. (The Guermantes Way, Cities
> of the Plain, Volume 2 of translation of Marcel Proust's
> Remembrance of Things Past [New York: Vintage Books, 1982], p.
> 178.)
>
> Investment research
>
> Investment research involves exactly the same statistics and the same
> issues of perspective. The typical investment data mining example
> involves t-statistics gathered from backtesting strategies. The narrow
> perspective says: "After 19 false starts, this 20th investment
> strategy finally works. It has a t-statistic of 2."
>
> But the broad perspective on this situation is quite different. In
> fact, given 20 informationless strategies, the probability of finding
> at least one with a t-statistic of 2 is 64%. The narrow perspective
> substantially inflates our confidence in the results. When viewed from
> the proper perspective, confidence in the results lowers accordingly.
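
The 64% figure can be reproduced with a one-line calculation. The sketch below assumes (this is not stated in the article) that a t-statistic of 2 is treated as the conventional 5% significance level and that the 20 strategies are independent; the function name is hypothetical.

```python
def prob_at_least_one_hit(n_strategies, p_false_positive=0.05):
    """Chance that at least one of n independent, informationless
    strategies clears the significance bar purely by luck.

    Assumption: each strategy has the same false-positive probability.
    """
    return 1.0 - (1.0 - p_false_positive) ** n_strategies


print(f"{prob_at_least_one_hit(20):.0%}")  # 64%
```

The same function shows how quickly the problem grows with the number of strategies tried, which is the point of the restraint guideline.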
>
> Four guidelines for backtesting integrity
>
> Given that data mining is easy, how can we safeguard against it? Here
> are four guidelines for data mining integrity:
>
> 1. Intuition
> 2. Restraint
> 3. Sensibility
> 4. Out-of-sample testing
>
> The intuition guideline demands that researchers investigate only
> those strategies with some ex ante expectation of success. Investment
> research should never involve free-ranging searches for patterns
> without regard for intuition.
>
> The restraint guideline attempts to minimize the number of strategies
> investigated--i.e., to keep the broad and narrow focus similar. In the
> best case, researchers decide ex ante exactly which strategies and
> variants they will investigate, run their tests, and look at the
> answers. They do not go back and continually refine their investigations.
>
> The sensibility guideline deletes results that seem improbably
> successful. Observed t-statistics that are too large may signal
> database errors or an improper methodology rather than a new strategy.
>
> The fourth guideline, out-of-sample testing, is the statistician's
> answer to the curse of data mining. Coincidences observed over one
> data set are quite unlikely to reoccur in another independent data set.
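
A small simulation can illustrate why out-of-sample testing helps. This is only a sketch under stated assumptions: the "strategies" are pure Gaussian noise with no real edge, and the choices of 20 strategies, 60 periods, 200 repetitions and the seed are arbitrary values picked for illustration, not taken from the article.

```python
import random
from statistics import mean, stdev


def t_stat(returns):
    """t-statistic of the mean return against zero."""
    return mean(returns) / (stdev(returns) / len(returns) ** 0.5)


def mine_and_retest(rng, n_strategies=20, n_periods=60):
    """Pick the best of several pure-noise strategies, then retest it.

    Returns (in-sample t-stat of the winner, its out-of-sample t-stat).
    """
    best_t = None
    for _ in range(n_strategies):
        in_sample = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        t = t_stat(in_sample)
        if best_t is None or t > best_t:
            best_t = t
    # The winner has no real edge, so its fresh data is just more noise.
    out_sample = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
    return best_t, t_stat(out_sample)


rng = random.Random(0)  # fixed seed so the run is repeatable
results = [mine_and_retest(rng) for _ in range(200)]
avg_in = mean(t for t, _ in results)
avg_out = mean(t for _, t in results)
print(f"average in-sample t of winner:     {avg_in:.2f}")
print(f"average out-of-sample t of winner: {avg_out:.2f}")
```

The winner of each search looks respectable in-sample simply because it was selected as the best of 20, while its out-of-sample t-statistic averages out near zero.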
>
> Conclusions
>
> Many backtesting results are not foolproof demonstrations of strategy
> value but merely coincidence. Four backtesting guidelines can help
> avoid data mining.
>
> -----Original Message-----
> From: asa-owner@lists.calvin.edu
> [mailto:asa-owner@lists.calvin.edu] On Behalf Of Vernon Jenkins
> Sent: 29 January 2005 21:39
> To: bivalve; ASA
> Subject: Re: Spellbound? (was Re: Cobb County)
>
> David,
>
> You said (28 Jan), " ...Genesis 1...makes it clear that
> methodological naturalism is highly appropriate for addressing
> penultimate or antepenultimate origins...". And as Christopher
> wrote earlier (23 Jan), " The simple reason why the _supernatural_
> must never be allowed 'a foot in the door' is that it cannot be
> tested, you cannot get a handle on it, and it is just another
> God-of-the-gaps argument."
>
> My case is simply this: the numero-geometrical features of the
> Bible's opening Hebrew words provide the _test-case_ that
> Christopher dismisses; for unless a reasonable _naturalistic_
> explanation can be found for these phenomena, the principle of MN
> is permanently undermined. In other words, the findings of science
> - particularly as they offer a direct challenge to scriptural
> revelation - may well be invalid because of failures to take
> account of the possibility of supernatural interference.
>
> By the way, I can't agree with Don who wrote (28 Jan), " We have
> no proof - and are unlikely ever to have proof - that these
> complexities actually required special divine input. (What form
> would such proof take, anyway?)." Whilst the overall probability
> of the many rare and unique features that we find crammed
> into these 7 opening words may be difficult to quantify, the
> impression created is that this must be _vanishingly small_. [And,
> let us observe that such situations have not deterred
> investigators from making far-reaching claims in other fields!]
>
> It is worth spending a little time contemplating this wonder, for
> it comprises a multi-stage development. When the words were first
> recorded, they possessed a relevant literal meaning - nothing
> more. Some centuries later - following the adoption by the Jews of
> a system of alphabetic numeration - the letters and words
> acquired, in addition, the status of numbers - and it is at this
> point that most of the geometrical and other features of interest
> _became available_ for inspection (but spotted by none,
> apparently!). The discovery of the universal constant 'e' by
> Euler, the adoption of the Metric System (both in the 18th
> century) and the creation of a standard for cut paper sizes in the
> 1960s, consolidated the Genesis 1:1 edifice as it is now known.
>
> I have written of this as a 'standing miracle' - something that
> will forever remind those of us who seek and value truth - of our
> Creator's Being and Sovereignty, and of His Grace in providing
> those who love Him with such firm assurances in these testing days.
>
> Vernon
> www.otherbiblecode.com <http://www.otherbiblecode.com>
>
>
>
> ----- Original Message -----
> From: "bivalve" <bivalve@mail.davidson.alumlink.com
> <mailto:bivalve@mail.davidson.alumlink.com>>
> To: "ASA" <asa@calvin.edu <mailto:asa@calvin.edu>>
> Sent: Friday, January 28, 2005 10:26 PM
> Subject: Re: Spellbound? (was Re: Cobb County)
>
> >> (3) You have completely ignored the second matter I raised, viz
> the lessons that might be learned from the widespread negative
> reaction to news of the numero-geometrical features of Genesis
> 1:1. Wouldn't you agree that these phenomena strongly challenge
> the view that _methodological naturalism_ is the only valid
> basis for the proper investigation of ultimate origins? It
> would be good and proper if you were to consider joining me in
> disabusing others of this significant error.
> >
> > I don't think the numero-geometrical features address whether
> methodological naturalism is relevant to ultimate origins.
> >
> > The text of Genesis 1:1 makes it clear that methodological
> naturalism will be unable to address ultimate origins. However,
> Genesis 1 also makes it clear that methodological naturalism is
> highly appropriate for addressing penultimate or antepenultimate
> origins:
> >
> > Everything is created by an orderly God.
> > There aren't any rogue powers, quarreling deities, or other
> factors that might disrupt the regular workings of creation,
> unlike pagan views.
> > We were made to rule over creation. In order to do so well, as
> good stewards, we need to be able to determine how creation works.
> >
> > Thus, there are good reasons to expect a study of the ordinary
> workings of the universe to be very productive and informative.
> >
> > Dr. David Campbell
> > Old Seashells
> > University of Alabama
> > Biodiversity & Systematics
> > Dept. Biological Sciences
> > Box 870345
> > Tuscaloosa, AL 35487-0345 USA
> > bivalve@mail.davidson.alumlink.com
> <mailto:bivalve@mail.davidson.alumlink.com>
> >
> > That is Uncle Joe, taken in the masonic regalia of a Grand
> Exalted Periwinkle of the Mystic Order of Whelks--P.G. Wodehouse,
> Romance at Droitgate Spa
> >
> >
>
Received on Sat Jan 29 19:25:56 2005