Fractal Biology

You may be familiar with fractals.  Fractals are often popularized as repeating patterns, but repetition isn't really required as part of the definition.  Fractals are typically two-dimensional images (yes, I know, plus fractional dimensions, that's the whole point of calling them fractals) where you can zoom into any part of the image and additional complexity is revealed.  For example, if you want to draw the coastline of a nation like Great Britain, the level of detail you choose to include is entirely dependent on how far in you decide to 'zoom'.  There's no point at which we can say you've captured the 'true' coastline, because the complexity defies a solid definition by its very nature.

I want to make the case that biology approaches fractal complexity.  By this I don't mean that it's too complicated to study, or that it's infinitely complicated.  I think there is a finite amount of complexity to biology.  I'm invoking fractals as a way to help understand how biology feels from both a research and conceptual standpoint.

I could start with large animals and talk about biodiversity and the various almost-magical ways they use chemistry and physics to accomplish simple things.  But I'd rather 'zoom in' on human biology, specifically molecular cell biology.  By doing so, I'm also going to ignore cell-cell interactions, systemic signalling, environmental cues, and interactions with commensal micro-organisms.  I only mention these here because I could easily write this post focusing on each of those areas, drilling down to the same level of complexity, and make the same point.  That's because when you zoom in on part of a fractal, you necessarily remove from your focus all the other areas where similar complexity also reigns.

Before you ever get into the lab, as you take your first biology classes, you get the opposite feeling from what I want to instill here.  At least I did.  I got this feeling that biology was a defined, finite set of mechanisms and interactions.  I would learn about each one, and it felt like all biological problems could be solved once we finished cataloging all the different mechanisms.  Need to understand something?  Eventually we'll be able to look it up along some biochemical pathway, understand the genes involved, and solve it.  We just need to finish the hard, arduous work of figuring out how each little piece of the puzzle fits together.

I have since realized that even if that were the case, we have no way to get there from here.

Complexity as a feature of the system

The Human Genome Project was a huge undertaking, which produced long-lasting benefits that researchers still take advantage of daily.  However, after billions of dollars people wanted more than 'has lasting impacts'; they wanted something to put on the front page.  They wanted a headline.  One of the things we knew we'd discover from the genome project was the number of genes in the human genome.  Researchers expected a large number, perhaps 100,000 or more.  The answer shocked everyone because it was so small: roughly 20,000 protein-coding genes in total.

To understand why this was so shocking, let me paint a picture of the inside of a cell.  The problem is that the sizes are too unfamiliar to us, so let's scale everything up to make a more intuitive comparison.  We'll use the ninth-largest city in the USA, which is Dallas, TX, with a population of about 1.3 million (7 million in the metro area).  Most of the city lies within the 635 outer loop highway, around 9.3 miles (15 km) from downtown.  Cell sizes vary depending on the cell type, but a typical cell is around 15 µm in radius (a micron is 1*10^-6 meters).  Let's pretend we rode the Magic School Bus and shrank down until a cell looked as big as Dallas to our eyes, a magnification of about a billion times.  In that case, a ribosome, one of the cell's most common molecular machines, which is normally about 15 nm across (a nanometer is 1*10^-9 meters), would be about the size of a house at 15 meters (about 50 ft.).  DNA would be a little wider than most people are tall, with a full helical turn every 3.4 meters (~11 feet).  Water molecules would be about the size of a piece of paper.
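The scale model comes down to one multiplication: divide the city's radius by the cell's radius to get the magnification, then multiply every molecule's real size by it.  Here is a minimal Python sketch of that arithmetic.  The cell radius, Dallas radius, ribosome size and helical-turn length come from the text above; the DNA width (about 2 nm) and water-molecule size (about 0.28 nm) are approximate textbook values added here for illustration.

```python
# Back-of-the-envelope check of the "cell the size of Dallas" scale model.
# DNA width (~2 nm) and water-molecule size (~0.28 nm) are approximate
# textbook values added for illustration; the other sizes come from the post.

METERS_PER_FOOT = 0.3048


def scale_factor(cell_radius_m: float, city_radius_m: float) -> float:
    """How many times larger everything gets when the cell becomes the city."""
    return city_radius_m / cell_radius_m


def scaled_sizes(sizes_m: dict[str, float], factor: float) -> dict[str, float]:
    """Apply the same magnification to every object (sizes in meters)."""
    return {name: size * factor for name, size in sizes_m.items()}


if __name__ == "__main__":
    factor = scale_factor(cell_radius_m=15e-6, city_radius_m=15e3)
    print(f"magnification: {factor:.0e}")
    objects = {
        "ribosome": 15e-9,
        "DNA width": 2e-9,
        "one DNA helical turn": 3.4e-9,
        "water molecule": 0.28e-9,
    }
    for name, size in scaled_sizes(objects, factor).items():
        print(f"{name}: {size:.2f} m ({size / METERS_PER_FOOT:.1f} ft)")
```

Because 15 km divided by 15 µm is 10^9, every nanometer in the real cell becomes roughly a meter in the model city, which is why a 15 nm ribosome turns into a 15 m house.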

Now imagine this version of Dallas is filled with paper-size water, house-sized proteins, and a bunch of long strands of DNA coiled and packed tightly inside the inner loop.  Remember that cells are three-dimensional, and this Dallas-sized city has hundreds of millions of proteins busily going about their business, just like a city except above and below ground as well.  As in any city, truck-sized proteins constantly ferry cargo in and out of the cell, along the walls and up the dome.  New molecules are constantly being built, modified, and destroyed, like the work of a city's construction crews and waste-management department.  Thousands of different pathways are required just to keep the cell running, before we take into account this cell's specialties (skin, lung, liver, brain, heart, etc.).  (Incidentally, the published catalogs of known human metabolic pathways are wonderfully complex.  There are almost certainly many additional pathways we don't know about, and this is just metabolism!)

With this context, let's look at what it means to only have 20,000 genes.  The number of genes tells us how many different types of proteins the body can make.  Proteins are like nano-sized machines that do everything that happens inside a cell.  Think of it like a job, but in this case, instead of engineers and factory workers the jobs are more specialized and limited.  You'd need one type of protein to use a hammer and another type to use a screwdriver, and another type to use a copy machine, drive a forklift, etc.  Many genes contain the code for multiple different proteins, but they're usually all of the same type.  For example, one might use a claw hammer, another might use a mallet, another might use a sledgehammer, but they all do essentially the same function under slightly different conditions.

Okay, so there are roughly 20,000 different 'jobs' available inside a cell.  Why is that a problem?  Because everything a cell does is very complex.  Judging by all the known metabolic pathways, just keeping the cell alive would take up a large share of those 20,000 genes.  Then you have complicated pathways like blood clotting, where dozens of different proteins coordinate to make this one thing happen.  Multiply this pattern out to everything cells need to be able to do and you quickly run out of genes.  And blood clotting isn't the exception to complexity in biochemical pathways; it's the rule.  If every function requires dozens of genes, and there are hundreds of different functions for each cell type, and there are hundreds of different cell types, plus all the basic biochemical pathways every cell needs, how can we get away with a 'mere' 20,000 genes to create all the complexity of the human body?

As we've seen, since each protein performs one basic 'job', multiple proteins have to come together to do anything complicated.  If you want to build a birdhouse, you'll need a hammer, nails, saw, drill, glue, etc.  But you can use all those tools elsewhere.  This appears to be how cells function, but at an even higher level of complexity.
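To see why 20,000 genes looked like a shortfall, and why reuse resolves it, here is a rough Python sketch.  The numbers standing in for "dozens" and "hundreds" (24 genes per function, 200 functions per cell type, 200 cell types) are hypothetical placeholders, not measured values, and treating any protein as free to join any team badly overstates real flexibility.  The point is only the shape of the arithmetic: dedicated parts multiply, while reused parts combine.

```python
import math

# Hypothetical round numbers standing in for the post's "dozens" and "hundreds".
GENES_PER_FUNCTION = 24        # "dozens" of genes per function
FUNCTIONS_PER_CELL_TYPE = 200  # "hundreds" of functions per cell type
CELL_TYPES = 200               # "hundreds" of cell types
HUMAN_GENES = 20_000           # roughly 20,000 protein-coding genes


def genes_needed_without_reuse(per_function: int, functions: int, cell_types: int) -> int:
    """Genes required if every function in every cell type had its own dedicated proteins."""
    return per_function * functions * cell_types


def distinct_teams(gene_count: int, team_size: int) -> int:
    """Number of different unordered groups of `team_size` proteins drawn from `gene_count` genes."""
    return math.comb(gene_count, team_size)


if __name__ == "__main__":
    naive = genes_needed_without_reuse(GENES_PER_FUNCTION, FUNCTIONS_PER_CELL_TYPE, CELL_TYPES)
    print(f"dedicated genes needed: {naive:,} (vs ~{HUMAN_GENES:,} available)")
    print(f"possible 5-protein teams from reused parts: {distinct_teams(HUMAN_GENES, 5):.2e}")
```

With these placeholder numbers, dedicated genes would total 960,000, far more than 20,000, while the number of ways to choose just five proteins out of 20,000 is on the order of 10^19.  Real cells sit somewhere in between, because most proteins can only work with a limited set of partners.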

Down the rabbit hole

Allow me a brief foray into the weeds to help illustrate this point.  When I was an undergraduate, I learned about biochemical signaling pathways like NF-κB.  The name isn't important.  In fact, let's call it 'Mr. Smith'.  Now, I'd see Mr. Smith working in certain cells, required for some function or another, and think 'that function is driven by Mr. Smith'.  Indeed, it wouldn't work without him there, and there were clear ways to trace the signal to the response straight through Mr. Smith.

In some other context I'd see Mr. Smith in the same cell driving a different function.  What was going on?  Although I thought Mr. Smith was only one protein, Smith was actually a whole family of proteins.  I might see Smith mentioned in the literature, but it mattered which member of the Smith family the authors were talking about.  (Often the researchers wouldn't know.  They just knew they were talking about one of the Smiths.)  At this point I thought I understood the topic (just keep the different Smiths sorted out) and I moved on, if a little more carefully than before.

Later, I discovered that each individual Smith was a combination of sub-units, like Frankenstein's monster assembled from lots of interchangeable parts.  Four or six different sub-units might come together in different combinations to produce different effects.  "John Smith" wasn't just one protein, like I thought.  That was a description of potentially a dozen different possible John Smiths.  This was annoying, because I couldn't just look at a pathway and know what was going on.  I had to drill down to a much finer level of detail, but at least I was finally starting to understand the underlying biology.
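The "dozen John Smiths" problem is combinatorial.  As an illustration, assume the sub-units pair up two at a time, which is how the NF-κB family (the real 'Mr. Smith') works: its five sub-units, RelA (p65), RelB, c-Rel, p50 and p52, form homodimers and heterodimers.  Counting unordered pairs with repetition gives n(n+1)/2 possibilities, so four sub-units give 10, five give 15 and six give 21.  The short Python sketch below simply enumerates them; it says nothing about which dimers actually form or how often, which varies by cell and context, and it ignores localization and modifications entirely.

```python
from itertools import combinations_with_replacement

# The NF-kB family's five sub-units, which pair up as homo- or heterodimers.
NFKB_SUBUNITS = ["RelA", "RelB", "c-Rel", "p50", "p52"]


def possible_dimers(subunits: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair, including a sub-unit paired with itself (homodimer).

    This is a pure count of combinations; it does not model which pairs
    actually form in a real cell.
    """
    return list(combinations_with_replacement(subunits, 2))


if __name__ == "__main__":
    dimers = possible_dimers(NFKB_SUBUNITS)
    print(f"{len(dimers)} possible dimers")
    for a, b in dimers:
        print(f"  {a}/{b}")
```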

Then it got complicated.  I learned that a protein's effects depend on where it is in the cell, what else it's attached to (scaffold proteins, lipid rafts on the cell surface, etc.), how much of the protein is expressed in the cell, and what 'modifications' the protein has (other enzymes can come along and make small molecular changes to the protein that impact how it functions, and/or what it interacts with).

After 'complicated' we moved to Byzantine, and beyond.  I discovered that the genes controlling protein concentrations in the cell can be regulated directly, by epigenetic modifications of DNA and histones, and by special short, non-coding RNA sequences that reduce translation of a gene into protein.  Proteins can be ubiquitinated, and the extent of ubiquitination determines whether the protein is recycled or something else happens to it.  Finally, we know next to nothing about the addition of lipids and other moieties, even though they're all over the place.  Each of these modifications/regulators/siRNA/etc. has pathways that regulate it in turn.  Any single signal will activate multiple pathways that all converge to produce an enormously convoluted network of effects.  That network is then processed into a dance of output signals that eventually leads to the overall 'response' you're looking for.

This is repeated for each function, in each cell.

This is why biology approaches fractal complexity.  When you think of a fractal you think of something you can 'zoom in' on and see more detail.  In the sense that you can physically zoom in on the molecular and then sub-atomic level, the complexity may be reducible in a finite way.  There's a saying that "biology is just applied chemistry and chemistry is just applied physics".  And while that may be conceptually true at a basic level, it's entirely unhelpful for actually understanding what's going on.

Biology gets fractally complex, not when you physically zoom in with a microscope, but when you zoom in on the details of any one process.  As a researcher you think, "I should know more about how this protein works" and you go down the rabbit hole of different protein sub-units, modifications, gene regulations, modifications of the modifiers, etc.  Many of those sub-sub-subcategories remain to be explored, so you set them aside, hoping they're not integral to the effect you're studying but secretly wondering if you're missing something important at the next level of complexity down.

The rabbit hole goes much deeper, but we don't have the gear to get ourselves down there to explore it yet.  We have no idea how deep it goes.  Long before we've scratched the surface of the current level of complexity, we discover a new level nobody ever suspected existed.
