Getting software to “hallucinate” reasonable protein structures

Process resembles repeatedly asking the software "does this look like a protein?"

Top row: the hallucination and actual structure. Bottom row: the two structures superimposed.

Anishchenko et al.

Chemically, proteins are just a long string of amino acids. Their amazing properties come about because that chain can fold up into a complex, three-dimensional shape. So understanding the rules that govern this folding could not only give us insights into the proteins that life uses but could potentially help us design new proteins with novel chemical abilities.

There's been remarkable progress on the first half of that problem recently: researchers have tuned AIs to sort through the evolutionary relationships among proteins and relate common features to structures. As of yet, however, those algorithms aren't any help for designing new proteins from scratch. But that may change, thanks to the methods described in a paper released on Wednesday.

In it, a large team of researchers describes what it terms protein "hallucinations." These are the products of a process that resembles a game of hotter/colder with an algorithm, starting with a random sequence of amino acids, making a change, and asking, "Does this look more or less like a structured protein?" Several of the results were tested and do, in fact, fold up like they were predicted to.

AI hallucinations

The odd terminology here can't be blamed on the authors of the new paper. Instead, the term "hallucination" was applied to work done by Google's AI team. That work involved starting with an image of random pixels and asking a neural network trained to recognize fruit, "How much does this look like a banana?" After some random tweaks, the question was asked again; any changes that increased the image's banana-like properties were retained, and the process repeated.

The end result clearly has banana-like aspects, but it looks more like a cubist and impressionist both had a go at the bananas before running a few random Photoshop filters. While the term isn't used in Google's blog post, others labeled the images "hallucinations."

Random noise (left) gets converted to a banana-like hallucination (right) by repeated queries to a banana-recognition AI.

Google

The researchers thought that, if this works for AIs that handle image recognition, maybe it would also work with AIs that suggest 3D structures for proteins.

Those of you paying careful attention here may notice a problem, however. The biology-specific algorithms don't output a rating of whether something is structure-like; instead, they simply assume there's a structure and try to suggest what it is. So they're not inherently set up to do the sort of getting-hotter/getting-colder evaluation that's needed to create a hallucination.

The research team figured out a way around this, however. Unstructured proteins tend to spread out in space, with only a handful of neighboring amino acids interacting with each other. Highly structured proteins, by contrast, tend to be compact and fold so that amino acids in different parts of the chain can interact with each other. The algorithm they were using for structure prediction, trRosetta, outputs its predictions as the relative location of each amino acid in a 3D space. So, by using a measure of their spread, the authors were able to provide a sort of answer to the question "how structured does this look?"
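One simple way to picture a "measure of spread" is the radius of gyration: the root-mean-square distance of each amino acid from the center of the chain. The sketch below is only an illustration under that assumption; it is not the scoring the authors used, and the function name, the toy coordinates, and the 3.8-angstrom spacing are all chosen here for the example.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of a set of 3D points from their centroid."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    total = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                for x, y, z in coords)
    return math.sqrt(total / n)


# Illustrative spacing between neighboring residues, in angstroms (assumption).
SPACING = 3.8

# Toy "unstructured" chain: 27 residues stretched out along a line.
extended = [(SPACING * i, 0.0, 0.0) for i in range(27)]

# Toy "structured" chain: the same 27 residues packed into a 3x3x3 block.
compact = [(SPACING * x, SPACING * y, SPACING * z)
           for x in range(3) for y in range(3) for z in range(3)]

if __name__ == "__main__":
    print("extended:", round(radius_of_gyration(extended), 2))
    print("compact: ", round(radius_of_gyration(compact), 2))
```

A smaller value means a more compact arrangement, so a hotter/colder search could treat any drop in this number as "getting hotter."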

Starting from random

To start their structural hallucinations, the researchers generated numerous proteins composed of 100 random amino acids and fed them to the trRosetta software. As expected, all of the proteins were unstructured at the start. Then, for each of the 100 sequences, an amino acid was chosen at random and changed to a different amino acid that was also chosen at random. trRosetta then ran a new analysis, and the results were compared; any change that made things look more structured was retained.
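The loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `toy_structure_score` is a hypothetical stand-in for a real structure predictor such as trRosetta (it just counts hydrophobic residues so the example can run on its own), and the strictly greedy accept-if-better rule follows the description in this article; a real pipeline may use a different acceptance rule.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HYDROPHOBIC = set("AILMFVW")          # used only by the toy score below


def toy_structure_score(seq):
    """Hypothetical placeholder for 'how structured does this look?'.

    A real run would call a structure-prediction network here instead.
    """
    return sum(1 for aa in seq if aa in HYDROPHOBIC)


def hallucinate(score, length=100, steps=20000, seed=0):
    """Greedy hotter/colder search starting from a random sequence."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    best = score(seq)
    for _ in range(steps):
        pos = rng.randrange(length)          # pick a position at random
        old = seq[pos]
        # swap in a different amino acid, also chosen at random
        seq[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != old])
        new = score(seq)
        if new > best:
            best = new                       # "hotter": keep the change
        else:
            seq[pos] = old                   # "colder" or equal: undo it
    return "".join(seq), best


if __name__ == "__main__":
    sequence, final_score = hallucinate(toy_structure_score, steps=2000)
    print(sequence)
    print("final score:", final_score)
```

Swapping `toy_structure_score` for a function that wraps an actual predictor is the only change the loop itself would need; the expensive part is that each step requires a fresh prediction.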

After about 20,000 repeats of this process, the compactness of the amino acid arrangement in these hallucinations was similar to that of regular proteins. But, critically, the amino acid sequences didn't look like those of known proteins. The structures themselves didn't either. In the proteins used by life, there are often loops of poorly structured amino acids that perform key functions. But the hallucinations weren't selected for function; they were selected for compactness. So, those sorts of extended loops were not found in the hallucinations.

There are a couple of reasons to be skeptical that actual chains of amino acids would form these structures in the real world. trRosetta isn't the latest and greatest in structure-prediction software that's been making all the headlines. And trRosetta was trained to figure out structure in part by evaluating evolutionary relationships. These proteins are all brand new and have no evolutionary relatives. The process would only work if the neural network used in trRosetta had inferred principles of protein structure from those evolutionary relationships.

The only way to tell whether it worked is to make the actual proteins and see what they look like. So, the research team put together genes that encoded 129 hallucinatory proteins.

Out of computers, into E. coli

Two lessons emerged from what the researchers saw when they made the proteins. First, the researchers could purify 27 of the proteins and find indications that they were likely forming the sorts of structures that the software had predicted. In a handful of cases, they obtained detailed structures, which showed the computer predictions were good in all cases. So, the software seems to be pretty good at getting us into the rough neighborhood of a real structure.

The other lesson was that a lot of the hallucinatory proteins formed aggregates by sticking together. The researchers suspect this happened because many of the proteins that trRosetta was trained on are part of multiprotein complexes, and the basic principles of interactions among proteins work well when applied to new proteins. In other words, part of what the system recognized as "more like a structure" was actually a general feature of the proteins it was trained on.

The results as a whole, however, suggest that training AIs on proteins with known structure does, in fact, enable them to identify basic principles of the underlying biochemistry. And we can use that in reverse to come up with some novel structures, starting with nothing but a chain of random amino acids. So that's all good.

But there's not a clear link between this sort of structural information and any kind of function. So, while we can potentially compute a protein that will form a stable coil, we're still a long way off from getting that coil to, for example, break down a plastic. This work may be a critical first step toward that sort of capability. But there's no guarantee that it is at this point.

Nature, 2021. DOI: 10.1038/s41586-021-04184-w.
