Si Si Blog - Technology


A computational analysis of the KJV and ASV
by TheEtruscan at 19:15 September 24, 2014

The New York Times of June 17, 2014 ran an interesting article, "Science - Computing Crime and Punishment", about a computational analysis of the words that appear in the British court transcripts of the Old Bailey archive, which was digitized a decade ago into a free, searchable database (oldbaileyonline.org).

I quote from that article:
To simplify their task, the researchers turned to the 1911 edition of Roget's Thesaurus, which sorts 26,000 distinct English words into 1,040 numbered categories called synonym sets. For example, words involving love and affection are in the high 800s, money and wealth in the low 800s. "Kick," as in striking a blow, is No. 276, while killing is No. 361.

"The beauty of this," Dr. DeDeo said, "is that for every word we have a number that equates with a meaning" that can be modeled mathematically.

That got me thinking. Wouldn't it be interesting to do the same computational analysis of the words that appear in both the Old and New Testaments? (Caveat: I don't know at this writing whether it has already been done by others.)

The King James Version of the Bible is a very good place to start: its language is very consistent, and its vocabulary is not so vast that it cannot be matched against the 1911 edition of Roget's Thesaurus. But since the analysis only associates words with categories and attributes counts to them, any version of the Bible could be analyzed.

I already had a copyright-free (in the USA), machine-readable King James Bible from a previous project of mine, and to double-check it I also downloaded a copyright-free, machine-readable copy of the American Standard Version (ASV).

I fooled around for some time with the downloaded ELKB computational analysis material (See Roget's Thesaurus Electronic Lexical Knowledge Base (ELKB)) but I didn't seem to get anywhere.

I then downloaded the 1911 edition of Roget's Thesaurus from Project Gutenberg and wrote a Java program with a couple of routines to carry out the tasks that I deemed necessary. Since the above files are in CSV format, Microsoft Excel could also be used to do the checking and the desired manipulations.
Of interest are the "Killing" (No. 361) and "Love" (No. 897) entries, which occur at a ratio of approximately 2:1 in favor of "Killing" in the Old Testament.
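
To illustrate the kind of tally involved, here is a minimal Java sketch. It is not the actual program: it assumes the thesaurus has already been loaded into a map from each word to its Roget category numbers, and the class name RogetTally, the method names, and the three-word index are hypothetical. The category numbers in the sample index are the ones listed for those words in the tables in this post.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class RogetTally {

    /** Lower-cases the text and splits it on anything that is not a letter, apostrophe or hyphen. */
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z'-]+");
    }

    /**
     * Counts, for every Roget category number, how many tokens of the text
     * fall under it. A word listed under several categories is counted once
     * for each of them.
     */
    static Map<Integer, Integer> countByCategory(String text, Map<String, List<Integer>> index) {
        Map<Integer, Integer> totals = new TreeMap<>();
        for (String token : tokenize(text)) {
            List<Integer> categories = index.get(token);
            if (categories == null) {
                continue; // token is empty or not in the thesaurus index
            }
            for (int category : categories) {
                totals.merge(category, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical three-word index (assumption: a real run would load the full thesaurus).
        Map<String, List<Integer>> index = new HashMap<>();
        index.put("kill", List.of(361));
        index.put("slay", List.of(361));
        index.put("love", List.of(865, 827, 897, 899, 894, 906, 931, 888));

        String sample = "Thou shalt not kill. Thou shalt love thy neighbour as thyself.";
        System.out.println(countByCategory(sample, index));
    }
}
```

Because a word such as "love" sits under several headers, this sketch credits every one of them; whether a word should instead be credited to a single header is a design choice the sketch does not settle.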

In addition to grouping words according to their meaning, Roget's Thesaurus also lists commonly used phrases. To address this I have coded two more routines:
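
A minimal sketch of how multi-word phrases might be recognized, not the actual routines: it assumes the phrases are available as a set of lower-case strings and prefers the longest phrase starting at each word. The class name PhraseMatcher, the maxWords parameter, and the sample phrase are hypothetical, and the sample phrase has not been checked against the thesaurus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PhraseMatcher {

    /**
     * Walks the tokens left to right. At each position it takes the longest
     * run of up to maxWords tokens found in the phrase set; otherwise it
     * keeps the single word.
     */
    static List<String> match(String text, Set<String> phrases, int maxWords) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z'-]+");
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (tokens[i].isEmpty()) { // split() can yield one leading empty token
                i++;
                continue;
            }
            String unit = tokens[i];
            int taken = 1;
            // Try the longest candidate first, then shorter ones.
            for (int n = Math.min(maxWords, tokens.length - i); n > 1; n--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                if (phrases.contains(candidate)) {
                    unit = candidate;
                    taken = n;
                    break;
                }
            }
            units.add(unit);
            i += taken;
        }
        return units;
    }

    public static void main(String[] args) {
        // Hypothetical phrase list (assumption, for illustration only).
        Set<String> phrases = Set.of("put to death");
        System.out.println(match("He shall surely be put to death.", phrases, 3));
    }
}
```

The resulting units, single words and phrases alike, could then be looked up against the category numbers in the same way as individual words.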

A tabular comparison based on the Old Testament of the American Standard Version (ASV) of the Bible.

# 360, 361, 362, 363 Killing
WORD                        COUNT  ROGET'S HEADERS
bereavement                 1      360 776 789
bier                        1      363
blood-thirsty               1      361
burying-place               1      363
carcase                     1      727 362 329 0
carcase carcases*           1      727 362 329 0
carrion                     1      362 653
choke choking*              1      706 361 261 641
coffin                      1      363
death deathly*              1      142 360 67
death deaths*               1      142 360 67
despatch                    1      298 532 361 682 684 592 692 741 729 297
despatch despatched*        1      298 532 361 682 684 592 692 741 729 297
dispatch                    1      297 532 298 684 692 592 729 741 682 361
dissolution                 1      756 360 49 162 335
drown                       1      361 731 732 337
drown drowning*             1      361 731 732 337
expire                      1      109 360 67
expire expiring*            1      109 360 67
funeral                     1      363
genocide                    1      361
gore                        1      43 361 260
hunter hunters*             1      271 361 622
kill killer*                1      361
lifeless                    1      360 172
losses                      1      619 776 638 40.a 360 659
manslaughter                1      361
martyrdom                   1      942 972 361 955 378
murder murdered*            1      361
murderous                   1      361
pyre                        1      363
rip                         1      949 363 962
rip ripping*                1      949 363 962
shoot shooteth*             1      167 284 972 194 361 378
slaughter slaughters*       1      361
slaughtering                1      361
stab                        1      260 361 659 649
stab stabbed*               1      260 361 659 649
stifle                      1      528 361 403
stone stoning*              1      972 727 716 321 558 319 323 363 361 635
stone stoning*              1      972 727 716 321 558 319 323 363 361 635
strangle strangling*        1      361 195 158
unburied                    1      362
undertaker undertakers*     1      363
behead beheaded*            2      361 972
corpse corpses*             2      362
deceased                    2      360
drown drowned*              2      361 731 732 337
embalm                      2      363 670 400
embalm embalmed*            2      363 670 400
extinct                     2      122 2 162 360
fatal fatally*              2      361
fell felled*                2      151 945 283 732 217 162 306 360 735 126
gore gored*                 2      43 361 260
gore gores*                 2      43 361 260
gore goring*                2      43 361 260
graveyard                   2      363
mortal mortals*             2      111 841 361 372
murder murders*             2      361
sepulchre sepulchres*       2      363 0
shooting                    2      361 378 622
slay slays*                 2      361
stillborn                   2      360 732
strangle                    2      361 195 158
bury buries*                3      229 528 300 363
butcher                     3      913 361
fatal                       3      361
hang hangs*                 3      45 361 214 972
hunt hunts*                 3      461 361 622
hunting                     3      361 622
rip ripped*                 3      949 363 962
slaughter slaughtered*      3      361
strangle strangled*         3      361 195 158
departure                   4      360 287 293 623 449
execution                   4      680 729 692 361 416 972 771
hunt hunted*                4      461 361 622
hunter                      4      271 361 622
nimrod                      4      361 622
carcasses                   5      727 362 329 0
mortal mortally*            5      111 841 361 372
murderer murderers*         5      949 361
sepulchre                   5      363 0
shoot shoots*               5      167 284 972 194 361 378
slay slaying*               5      361
bloody                      6      361 653
bury burying*               6      229 528 300 363
hung                        6      45 361 214 972
doom                        7      360 152 162 67 480 601 971
dying                       7      360
deadly                      8      361 649 360 162 657
hanging                     8      206 972 214 361 847
mortal                      8      111 841 361 372
murder                      8      361
killing                     9      361 829 731
hunt                        13     461 361 622
tomb tombs*                 13     363
released                    15     927.a 360
burial                      16     363 998
carcass                     16     727 362 329
gallows                     16     361 975
loss                        16     619 776 638 40.a 360 659
hanging hangings*           17     206 972 214 361 847
shot                        17     167 284 972 194 361 378
murderer                    18     949 361
shoot                       19     167 284 972 194 361 378
slayer                      19     361
cain                        20     361
tomb                        22     363
release                     24     970 360 777.a 927.a 918 750 807 760 284 768.a 790 771
hang                        25     45 361 214 972
kill kills*                 29     361
hang hanged*                30     45 361 214 972
fall falls*                 35     151 945 283 732 217 162 306 360 735 126
die dies*                   42     360 659 67 558 2 22
bury                        44     229 528 300 363
ashes                       62     653 362
slaughter                   63     361
fallen                      89     151 945 283 732 217 162 306 360 735 126
bones                       103    362 417
perish                      128    360 162 659 2
departed                    133    2 360
slain                       147    361
stone                       161    972 727 716 321 558 319 323 363 361 635
stone stones*               174    972 727 716 321 558 319 323 363 361 635
gone                        203    187 360 449 859 2 122
kill                        210    361
die died*                   215    360 659 67 558 2 22
fell                        226    151 945 283 732 217 162 306 360 735 126
dead                        241    360 172 376 52 429 408.a 361
fall                        248    151 945 283 732 217 162 306 360 735 126
kill killed*                267    361
die                         326    360 659 67 558 2 22
blood                       391    854 361 333 11 875
death                       430    142 360 67
Total                       4,517
# 897 Love
WORD                        COUNT  ROGET'S HEADERS
adore adores*               1      897 990
affect                      1      855 820 865 824 544 897 9 176
affect affects*             1      855 820 865 824 544 897 9 176
attract                     1      829 615 288 897 865
attract attracted*          1      829 615 288 897 865
attractiveness              1      288 374.a 829 897 615
bewitching                  1      897 829
captivate                   1      897 615 829 751
charm charms*               1      845 897 615 829 670 992 993
charming                    1      897 829
cherish                     1      897 707 902
darling                     1      899 897
enamoured                   1      897
enchantment                 1      827 992 829 897
fancy fancies*              1      484 608 609 865 453 515 450 897 514 451 842
favorite                    1      829 899 897
fellow-feeling              1      906 888 897 914
fond fonder*                1      897
fond fondest*               1      897
gallant                     1      962 897 894 861 961
idolatry                    1      984 991 897
liking                      1      865 897
loving lovingly*            1      897
popular                     1      931 897 873
prize                       1      793 733 731 618 931 775 897
prize prized*               1      793 733 731 618 931 775 897
prize prizes*               1      793 733 731 618 931 775 897
propitiate                  1      723 914 897 826 976 831 952 918
revere                      1      897 990 987 928
revere revered*             1      897 990 987 928
sympathetic                 1      914 906 888 897
tender tenderly*            1      273 726 324 32 428 897 763 914
tenderness                  1      906 897 914 822
yearning                    1      897 914 865
admiration                  2      931 897 928 870
attractive                  2      288 845 897 615 829
charm                       2      845 897 615 829 670 992 993
charmed                     2      897 992
devotion                    2      604 928 987 897 942 682 990 743
fond                        2      897
goddess                     2      979 897
passionate                  2      897 374.a 821 825 901
sympathetically             2      914 906 888 897
winning                     2      894 897 829
cherish cherished*          4      897 707 902
follower followers*         4      746 281 897 541
heart hearted*              4      221 602 897 450 642 861 222 820 5 8001
seduce seduced*             4      961 897 615
sweet sweeter*              4      413 829 396 428 652 897
sweet sweetly*              4      413 829 396 428 652 897
flame flames*               5      825 824 382 420 897 423
dear                        6      814 899 897
myrtle                      6      897
sympathy                    8      821 915 914 897 820 714 906 888
lovely                      9      829 845 897
affection                   10     821 897 888
enchantment enchantments*   12     827 992 829 897
regard regarded*            12     457 441 928 873 9 480 931 897
idol                        13     865 599 899 991 897
passion                     14     865 825 824 821 897 828 900 173 820
lover                       15     865 897
burn burns*                 20     897 972 197 825 480.a 384 348 382
angel angels*               24     977 711 599 948 897
lover lovers*               24     865 897
devoted                     27     987 735 897 743 828
desire desires*             29     829 600 602 630 620 858 865 817.a 897
tender                      33     273 726 324 32 428 897 763 914
regard                      36     457 441 928 873 9 480 931 897
flame                       40     825 824 382 420 897 423
passion passions*           44     865 825 824 821 897 828 900 173 820
captive captives*           51     897 754 781
love loves*                 58     865 827 897 899 894 906 931 888
loved                       65     897
precious                    71     814 648 31 897 812.a
sweet                       76     413 829 396 428 652 897
beloved                     77     897
captive                     92     897 754 781
desire                      112    829 600 602 630 620 858 865 817.a 897
idol idols*                 112    865 599 899 991 897
angel                       145    977 711 599 948 897
burn                        145    897 972 197 825 480.a 384 348 382
loving                      184    897
love                        208    865 827 897 899 894 906 931 888
burn burned*                233    897 972 197 825 480.a 384 348 382
heart                       908    221 602 897 450 642 861 222 820 5 8001
Total                       3,002
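
As a quick arithmetic check, the two totals can be compared directly. The sketch below only divides the totals of the two ASV Old Testament tables; the class and method names are hypothetical. Note that the first table covers headers 360 to 363 together, so this figure is not the same measurement as a comparison restricted to No. 361.

```java
import java.util.Locale;

class TotalsRatio {

    /** Returns first / second as a floating-point ratio. */
    static double ratio(int first, int second) {
        if (second == 0) {
            throw new IllegalArgumentException("second total must be non-zero");
        }
        return (double) first / second;
    }

    public static void main(String[] args) {
        int killingTotal = 4517; // total of the 360, 361, 362, 363 table
        int loveTotal = 3002;    // total of the 897 table
        // Prints "1.50 : 1"
        System.out.printf(Locale.ROOT, "%.2f : 1%n", ratio(killingTotal, loveTotal));
    }
}
```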