Protein-Binding Sites ENCODEd into the Design of the Human Genome

At last year’s AMP Conference, I delivered a talk titled: “How the Greatest Challenges Can Become the Greatest Opportunities for the Gospel.” I illustrated this point by describing three scientific concepts related to the origin of humanity that 20 years ago stood as insurmountable challenges to the traditional biblical view of human origins. But, thanks to scientific advances, these concepts have been replaced with new insights that turn these challenges into evidence for the Christian faith.

The Challenge of Junk DNA

One of the challenges I discussed centered on junk DNA: nonfunctional DNA littering the genomes of most organisms. Presumably, these nonfunctional DNA sequences arose through random biochemical, chemical, and physical events, with functional DNA converted into useless junk in some instances. In fact, when the scientific community declared the human genome sequence completed in 2003, estimates at that time indicated that around 95 percent of the human genome consisted of junk sequences.

For as long as I have been involved in apologetics (around 20 years), skeptics (and believers) have regarded the high percentages of junk DNA in genomes as a significant problem for intelligent design and creation models. Why would an all-powerful, all-knowing, and all-good God create organisms with so much junk in their genomes? The shared junk DNA sequences found among the genomes of humans and the great apes compound this challenge. For many, these shared sequences serve as compelling evidence for common ancestry among humans and the other primates. Why would a Creator introduce nonfunctional DNA sequences into corresponding locations in genomes of humans and the great apes?

But what if the junk DNA sequences are functional? It would undermine the case for common descent, because these shared sequences could reasonably be interpreted as evidence for common design.

The ENCODE Project

In recent years, numerous discoveries indicate that virtually every class of junk DNA displays function, providing mounting support for a common-design interpretation of junk DNA. (For a summary, see the expanded and updated edition of Who Was Adam?) Perhaps the most significant advance toward that end came in the fall of 2012 with the publication of phase II results of the ENCODE project—a program carried out by a consortium of scientists with the goal of identifying the functional DNA sequence elements in the human genome.

To the surprise of many, the ENCODE project reported that around 80 percent of the human genome displays function, with the expectation that this percentage should increase with phase III of the project. Many of the newly recognized functional elements play a central role in regulating gene expression. Others serve critical roles in establishing and maintaining the three-dimensional hierarchical structure of chromosomes.

If valid, the ENCODE results would force a radical revision of the way scientists view the human genome. Instead of a wasteland littered with junk DNA sequences, the human genome (and the genome of other organisms) would have to be viewed as replete with functional elements, pointing to a system far more complex and sophisticated than ever imagined—befitting a Creator’s handiwork. (See the articles listed in the Resources section below for more details.)

ENCODE Skeptics

Within hours of the publication of the phase II results, evolutionary biologists condemned the ENCODE project, citing a number of technical issues with the way the study was designed and the way the results were interpreted. (For a response to these complaints go here, here, and here.)

These technical complaints continue today, igniting the junk DNA war between evolutionary biologists and genomics scientists. Though the concerns expressed by evolutionary biologists are technical, some scientists have suggested that the real motivation behind the criticisms of the ENCODE project is philosophical, even theological, in nature. For example, molecular biologists John Mattick and Marcel Dinger write:

There may also be another factor motivating the Graur et al. and related articles (van Bakel et al. 2010; Scanlan 2012), which is suggested by the sources and selection of quotations used at the beginning of the article, as well as in the use of the phrase ‘evolution-free gospel’ in its title (Graur et al. 2013): the argument of a largely non-functional genome is invoked by some evolutionary theorists in the debate against the proposition of intelligent design of life on earth, particularly with respect to the origin of humanity. In essence, the argument posits that the presence of non-protein-coding or so-called ‘junk DNA’ that comprises >90% of the human genome is evidence for the accumulation of evolutionary debris by blind Darwinian evolution, and argues against intelligent design, as an intelligent designer would presumably not fill the human genetic instruction set with meaningless information (Dawkins 1986; Collins 2006). This argument is threatened in the face of growing functional indices of noncoding regions of the genome, with the latter reciprocally used in support of the notion of intelligent design and to challenge the conception that natural selection accounts for the existence of complex organisms (Behe 2003; Wells 2011).1

Is DNA-Binding Activity Functional?

Even though there may be nonscientific reasons for the complaints leveled against the ENCODE project, it is important to address the technical concerns. One relates to how the ENCODE project determined biochemical function. Critics argue that ENCODE scientists conflated biochemical activity with function. As a case in point, three of the assays employed by the ENCODE consortium measure binding of proteins to the genome, with the assumption that binding of transcription factors and histones to DNA indicates a functional role for the target sequences. ENCODE skeptics, on the other hand, argue that most of the measured protein binding to the genome was random.

Most DNA-binding proteins recognize and bind to short stretches of DNA (4 to 10 base pairs in length) composed of highly specific nucleotide sequences. But given the massive size of the human genome (3.2 billion genetic letters), nonfunctional binding sites will occur at random throughout the genome for statistical reasons alone. To illustrate: Many DNA-binding proteins target roughly between 1 and 100 sites in the genome. Yet the genome potentially harbors between 1 million and 1 billion binding sites. The hundreds of sites that are slight variants of the target sequence will have a strong affinity for the DNA-binding proteins, with thousands more having weaker affinities. Hence, the ENCODE critics maintain that much of the protein binding measured by the ENCODE team was random and nonfunctional. To put it differently, much of the protein binding measured in the ENCODE assays is merely a consequence of random biochemical activity.
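The statistical point can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration, not a calculation taken from the ENCODE critics: it assumes the four bases are equally likely and independent (real genomes are not), and the function name is hypothetical. It estimates how often one exact motif of a given length would turn up by chance in a genome of 3.2 billion base pairs.

```python
# Assumption: bases are independent and equally likely (A, C, G, T each 25%).
# Real genomes violate this, so treat the output as an order-of-magnitude guide.
GENOME_LENGTH_BP = 3_200_000_000  # genome size quoted in the text


def expected_random_matches(motif_length: int,
                            genome_length: int = GENOME_LENGTH_BP) -> float:
    """Expected chance occurrences of one exact motif, scanning both strands.

    Palindromic motifs read the same on both strands and would be counted
    twice here; that detail is ignored in this rough estimate.
    """
    positions = genome_length - motif_length + 1   # possible start positions
    per_position_probability = 0.25 ** motif_length
    return 2 * positions * per_position_probability


if __name__ == "__main__":
    # Motif lengths of 4 to 10 base pairs, the range given in the text.
    for k in range(4, 11):
        print(f"{k:2d} bp motif: ~{expected_random_matches(k):,.0f} chance matches")
```

Under these assumptions, even the longest motifs in the 4-to-10 base pair range are expected to appear thousands of times by chance, far more often than the 1 to 100 sites a typical DNA-binding protein targets.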

Nonfunctional Protein Binding to DNA Is Rare

This challenge does have some merit, but the criticism may not hold up. In an earlier response to this challenge, I acknowledged that some protein binding in genomes will be random and nonfunctional. Yet, based on my intuition as a biochemist, I argued that random binding of proteins throughout the genome would be disruptive to DNA metabolism and, from an evolutionary perspective, would have been eliminated by natural selection. (From an intelligent design/creation model vantage point, it is reasonable to expect that a Creator would design genomes with minimal nonfunctional protein-binding sites.)

As it happens, new work by researchers from NYU affirms my assessment.2 These investigators demonstrated that protein binding in genomes is not random but highly specific. As a corollary, the human genome (and genomes of other organisms) contains very few nonfunctional protein-binding sites.

To reach this conclusion, these researchers looked for nonfunctional protein-binding sites in the genomes of 75 organisms, representative of nearly every major biological group, and assessed the strength of their interaction with DNA-binding proteins. The researchers began their project by measuring the binding affinity for a sample of regulatory proteins (from humans, mice, fruit flies, and yeast) with every possible 8-base-pair sequence (32,896 in all, counting a sequence and its reverse complement once). Based on the binding affinity data, the NYU scientists discovered that nonfunctional binding sites with a high affinity for DNA-binding proteins occurred infrequently in genomes. To use scientific jargon to describe their findings: The researchers discovered a negative correlation between protein-binding affinity and the frequency of nonfunctional binding sites in genomes. Using statistical methods, they demonstrated that this pattern holds for all 75 genomes in their study.
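The figure of 32,896 may look odd, since four bases in eight positions give 65,536 sequences. One reading that reproduces the number is that double-stranded DNA presents a sequence and its reverse complement together, so each such pair counts once. The short sketch below checks that arithmetic by brute force; the function names are illustrative and are not drawn from the study.

```python
from itertools import product

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the sequence read from the opposite DNA strand."""
    return sequence.translate(COMPLEMENT)[::-1]


def count_distinct_kmers(k: int) -> int:
    """Count k-mers, treating a sequence and its reverse complement as one."""
    canonical = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        # Keep one representative per pair: the alphabetically smaller string.
        canonical.add(min(kmer, reverse_complement(kmer)))
    return len(canonical)


if __name__ == "__main__":
    print(count_distinct_kmers(8))  # 32896
```

The same total follows from a closed form: of the 65,536 sequences, 256 are their own reverse complement, so the count is (65,536 + 256) / 2 = 32,896.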

They attempted to account for the frequency of nonfunctional binding sequences in genomes by modeling the evolutionary process, assuming neutral evolution in which random mutations accrue over time free from the influence of natural selection. They discovered that this modeling failed to account for the sequence distributions they observed in the genomes, concluding that natural selection must have weeded out high-affinity nonfunctional binding sites from genomes.

These results make sense. The NYU scientists point out that protein mis-binding would be catastrophic for two reasons: (1) it would interfere with several key processes, such as transcription, gene regulation, replication, and DNA repair (the interference effect); and (2) it would create inefficiencies by rendering DNA-binding proteins unavailable to bind at functional sites (the titration effect). Though these problems may be insignificant for a given DNA-binding protein, the cumulative effects would be devastating because there are 100 to 1,000 DNA-binding proteins per genome with 10 to 10,000 copies of each protein.
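The titration effect can be illustrated with a textbook mass-action competition model. This is a minimal sketch under stated assumptions, not the model used by the NYU researchers: every site binds independently, all quantities share the same arbitrary units, and the numbers in the example (protein copies, site counts, dissociation constants) are hypothetical values chosen only to show the trend.

```python
def free_protein(total_protein, site_classes, tolerance=1e-9):
    """Solve the mass balance for unbound protein by bisection.

    site_classes is a list of (number_of_sites, dissociation_constant) pairs.
    Mass balance: total = free + sum(n * free / (free + kd)).
    """
    def bound(free):
        return sum(n * free / (free + kd) for n, kd in site_classes)

    low, high = 0.0, float(total_protein)
    while high - low > tolerance:
        mid = (low + high) / 2
        if mid + bound(mid) > total_protein:
            high = mid   # too much protein accounted for; free level is lower
        else:
            low = mid
    return (low + high) / 2


def functional_occupancy(total_protein, functional_sites, kd_functional,
                         decoy_sites, kd_decoy):
    """Fraction of functional sites occupied when decoy sites compete."""
    free = free_protein(total_protein,
                        [(functional_sites, kd_functional),
                         (decoy_sites, kd_decoy)])
    return free / (free + kd_functional)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 protein copies and 100 functional sites.
    without_decoys = functional_occupancy(1000, 100, 1.0, 0, 10.0)
    with_decoys = functional_occupancy(1000, 100, 1.0, 10_000, 10.0)
    print(f"occupancy without decoys: {without_decoys:.3f}")
    print(f"occupancy with 10,000 weaker decoys: {with_decoys:.3f}")
```

In this toy setting, a large pool of weaker decoy sites soaks up most of the protein, and the occupancy of the functional sites falls sharply even though each decoy binds ten times more weakly.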

The Human Genome Is ENCODEd for Design

Though the NYU researchers conducted their work from an evolutionary perspective, their results also make sense from an intelligent design/creation model vantage point. If genome sequences are truly the product of a Creator’s handiwork, then it is reasonable to think that the sequences comprising genomes would be optimized—in this case, to minimize protein mis-binding. Though evolutionary biologists maintain that natural selection shaped genomes for optimal protein binding, as a creationist, it is my contention that the genomes were shaped by an intelligent Agent—a Creator.

These results also have important implications for how we interpret the results of the ENCODE project. Given that the NYU researchers discovered that high affinity nonfunctional binding sites rarely occur in genomes (and provided a rationale for why that is the case), it is difficult for critics of the ENCODE project to argue that transcription factor and histone binding assays were measuring mostly random binding. Considering this recent work, it makes most sense to interpret the protein-binding activity in the human genome as functionally significant, bolstering the original conclusion of the ENCODE project—namely, that most of the human genome consists of functional DNA sequence elements. It goes without saying: If the original conclusion of the ENCODE project stands, the best evidence for the evolutionary paradigm unravels.

Our understanding of genomes is in its infancy. Forced by their commitment to the evolutionary paradigm, many biologists see genomes as the cobbled-together product of an unguided evolutionary history. But as this recent study attests, the more we learn about the structure and function of genomes, the more elegant and sophisticated they appear to be. And the more reasons we have to believe that genomes are the handiwork of our Creator.

Resources


Endnotes

  1. John S. Mattick and Marcel E. Dinger, “The Extent of Functionality in the Human Genome,” The HUGO Journal 7 (July 2013): doi:10.1186/1877-6566-7-2.
  2. Long Qian and Edo Kussell, “Genome-Wide Motif Statistics Are Shaped by DNA Binding Proteins over Evolutionary Time Scales,” Physical Review X 6 (October–December 2016): id. 041009, doi:10.1103/PhysRevX.6.041009.
Fazale Rana