Machine Learning, AI, and Bible Data Project List

There are three kinds of people who come to me looking to use the Bible data I’ve compiled. One group wants to make their own visualizations for personal Bible study. Another set of people are interested in using it in an application. The third are academics.
If you’re in a data science program and want to use Bible data for a Machine Learning or Artificial Intelligence project, it can be hard to find open-licensed sources. You may also wonder if you’re alone in your interest in applying advanced data processing to scriptural study.
There are a few ministries and students working in this area, and the list is growing. Below are the projects I know of which use natural language processing, image recognition, or advanced data processing. I hope it will increase awareness of ongoing work and spark ideas for future research. If you would like to add a project to this list, please e-mail me.

Bible Lens

Bible Lens performs object detection on photos to match them with verses sharing the same theme. Then, the app creates stylized images with overlaid text, ready to share with others. Here are some examples of photos I tested with the app:
  • A photo of children is matched with Psalm 127:3: “Children are a heritage from the LORD…”
  • Focusing on someone’s feet returns Psalm 119:105: “Your word is a lamp unto my feet…”
  • My wedding ring picture yields 1 Corinthians 13:4: “Love is patient, love is kind…”

Logos Reference Scanner

When using the Logos Mobile app, users can point their camera at a verse reference to jump to the text. This is useful when you have a sermon outline or small group study guide and want to save some typing or page-flipping. It supports multiple references at a time and worked smoothly in my tests on an iPhone XR.

STEPBible Data – Tyndale Individualised Proper Names with all References (TIPNR)

Tyndale House Cambridge produces many research tools, including an online Bible app. What’s unique about the data behind this app is the Individualised Proper Names dataset. It is the first I’ve seen from a publisher which associates individual people and places with verse references, aliases, and Strong’s numbers. This kind of data is an important step in producing an open-source knowledge graph of biblical entities. The data is also available in JSON format here: github.com/robertrouse/STEPBible-Data/tree/master/json

Ecce (Bible Text AI)

Ecce (Explanatory Core Concept Extraction, and Latin for “behold”) uses natural language processing to find topics and verses related to a search phrase. It draws in data from the English Standard Version, Nave’s Topical Index, and Treasury of Scripture Knowledge cross-references to train a model relating each entry to possible input terms. Explanations of the method, models, code, and other materials are available at github.com/rcdilorenzo/ecce

XML-TEI Bible

Most encoded Bible text is limited to linguistics, such as tagging grammatical forms and original language pointers. The XML-TEI Bible goes further by encoding references to people, places, times, speakers and audience. It follows standards developed by the Text Encoding Initiative, a common set of guidelines across humanities research. The Spanish-language data is available for the New Testament and some Old Testament books.
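To give a feel for what this kind of encoding makes possible, here is a minimal sketch that reads people, places, and speakers out of a small TEI fragment. The fragment is made up for illustration: `persName`, `placeName`, and `said` are standard TEI elements, but the identifiers, the sample text, and the `extract_entities` helper are assumptions, and the XML-TEI Bible may tag these features differently.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is defined by the TEI Guidelines.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment: the ids and text are placeholders, not project data.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0" n="hypothetical.1.1">
  <said who="#per_A"><persName ref="#per_B">Name B</persName>, go to
  <placeName ref="#pla_C">Place C</placeName>.</said>
</ab>"""


def extract_entities(xml_text):
    """Collect (ref, text) pairs for people and places, plus speaker ids."""
    root = ET.fromstring(xml_text)

    def pairs(tag):
        found = root.iterfind(f".//tei:{tag}", TEI_NS)
        return [(e.get("ref"), "".join(e.itertext())) for e in found]

    return {
        "people": pairs("persName"),
        "places": pairs("placeName"),
        "speakers": [e.get("who") for e in root.iterfind(".//tei:said", TEI_NS)],
    }


print(extract_entities(SAMPLE))
```

Because the entities are marked up in the text itself, a query like "every place mentioned in a speech by a given person" becomes a matter of walking the tree rather than guessing from raw words.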

Contextual Scripture Recommendation for Writers

If you’re a blogger or a minister preparing a sermon, it may be helpful to get recommended verses based on your draft text. This system, developed by Joshua Mathias for his Master’s thesis, scans a paragraph and determines which verses are closely associated with the main ideas. Behind the scenes, it uses Term Frequency – Inverse Document Frequency modeling to match documents with each other. Learn more about it from the BibleTech 2019 slides and Thesis document.
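The general idea behind TF-IDF matching can be sketched in a few lines of Python. This is not the thesis implementation, only a minimal illustration under simple assumptions: the three "verses" are made-up placeholder texts, the tokenizer is naive, and the function names (`tokenize`, `tfidf`, `recommend`, and so on) are hypothetical. Each text becomes a vector of term weights, where words that appear in fewer documents count for more, and the draft is compared to each verse by cosine similarity.

```python
import math
from collections import Counter

# Hypothetical placeholder texts, not quotations from any translation.
VERSES = {
    "verse_a": "children are a gift and a heritage",
    "verse_b": "a lamp for my feet and a light for my path",
    "verse_c": "love is patient and love is kind",
}


def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def inverse_document_frequency(tokenized_docs):
    """Smoothed IDF: terms found in fewer documents get larger weights."""
    n = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Weight each known term by its relative frequency times its IDF."""
    counts = Counter(t for t in tokens if t in idf)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(draft, verses):
    """Return (score, label) pairs, most similar verse first."""
    labels = list(verses)
    tokenized = [tokenize(verses[label]) for label in labels]
    idf = inverse_document_frequency(tokenized)
    draft_vec = tfidf(tokenize(draft), idf)
    scored = [(cosine(draft_vec, tfidf(tokens, idf)), label)
              for label, tokens in zip(labels, tokenized)]
    return sorted(scored, reverse=True)


for score, label in recommend("Walking a dark path needs a light and a lamp.", VERSES):
    print(f"{label}: {score:.3f}")
```

A real system would need a full verse corpus, better tokenization, and probably stop-word handling, but the ranking step works the same way: the draft about a path, a light, and a lamp lands closest to the placeholder verse that shares those rarer words.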

Using Social Co-occurrence Networks to Analyze Biblical Narrative

Social Network data allows for automated identification of influential people and community associations. Frederik de Vree’s thesis work identifies names and builds co-occurrence networks to pull out important characters and communities in the biblical text. His paper gives a full discussion of methodology, limitations, and visualizations of the results. Read it here.
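A co-occurrence network is simple to sketch: count how often two names appear in the same unit of text, then rank names by how strongly they are connected. The example below is an illustration only and not the method from the thesis. It assumes the names are already known (the thesis identifies them as part of its work), it treats each sentence as one unit, and the sentences and function names are made up for the example.

```python
from collections import Counter
from itertools import combinations

# Toy sentences written for this example; they are not verse quotations.
UNITS = [
    "Abraham spoke with Sarah.",
    "Abraham and Isaac went up.",
    "Sarah bore Isaac.",
    "Abraham and Lot parted.",
]
NAMES = {"Abraham", "Sarah", "Isaac", "Lot"}


def cooccurrence_network(units, names):
    """Count, for each pair of names, the units in which both appear."""
    edges = Counter()
    for text in units:
        tokens = {w.strip(".,;:!?\"'()") for w in text.split()}
        present = sorted(tokens & names)  # sorted so each pair has one key
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights touching each name as a rough importance score."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


edges = cooccurrence_network(UNITS, NAMES)
for name, score in weighted_degree(edges).most_common():
    print(name, score)
```

Weighted degree is only one way to measure influence. The same edge counts could feed other centrality measures or a community detection algorithm in a graph library.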

Analyzing the Bible with a BERT model

Bidirectional Encoder Representations from Transformers (BERT) models are an increasingly popular means of analyzing text. Roland Szabo applied this natural language processing technique to the Bible (Old and New Testaments), focusing on uses of the words “love,” “soul,” and “spirit.” His results include findings such as “the word agape does sometimes refer to the love of God, in a seemingly special way, but it often refers to other kinds of love as well, in a way which BERT can’t really distinguish from philos love.” Distinctions between the soul and spirit are similarly mixed. What does this mean for how we understand distinctions and overlap between the meaning of words in Hebrew and Greek? You can get the full methodology, conclusions, and code here.