Proteins carry out many of the functions inside cells that make life, breathing, and thought possible. They help cells communicate with each other, run a cell’s basic housekeeping, and turn the information in DNA into more proteins.
Their effectiveness relies on how the amino acids in proteins fold into complex, specific three-dimensional forms that allow them to work properly.
Before this decade, working out a protein’s 3D shape meant purifying the protein and putting it through a slow, labor-intensive process. This changed when DeepMind, an AI division of Google, introduced AlphaFold in 2021.
A similar project from the academic community followed soon after. The software had limitations: it struggled with larger proteins and did not produce a reliable solution for every protein. Even so, many of its predictions were impressively accurate.
However, these structures provided only part of the information. For proteins to work, they usually need to interact with other elements like other proteins, DNA, chemicals, and membranes. The first version of AlphaFold was able to manage some interactions between proteins, but many interactions were still unclear.
Now, DeepMind has released the third version of AlphaFold. This new version includes major changes and updates to its core system. With these improvements, the software can now deal with more types of protein interactions and modifications.
The first version of AlphaFold was built on two main software components. One considered the evolutionary constraints on a protein. By comparing the same protein across different species, it’s possible to identify which parts stay the same, which suggests they are crucial to the protein’s function.
These consistent parts are likely to sit in the same place and orientation within the protein’s structure. To find them, AlphaFold originally gathered as many examples of a protein as possible and aligned their sequences to spot areas with minimal changes.
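To make the idea of "areas with minimal changes" concrete, here is a minimal Python sketch that scores each column of an already-aligned set of sequences by how often its most common amino acid appears. The sequences are made up for illustration and the function name `column_conservation` is hypothetical; this is not AlphaFold's code, only a toy version of the underlying idea.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """For each alignment column, return the fraction of sequences
    that carry the most common residue (gaps never count as a residue)."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    scores = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != "-"]  # "-" marks a gap
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Toy alignment of one protein from four imaginary species.
toy_alignment = [
    "MKVLA",
    "MKILA",
    "MRVLG",
    "MK-LA",
]
conservation = column_conservation(toy_alignment)
print(conservation)
```

Columns that score close to 1.0 are the ones that barely change between species, which is the signal the article describes.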
However, lining up many proteins is computationally demanding, as it involves solving more constraints. In the updated version, the AlphaFold team still finds multiple related proteins but now primarily uses pair-wise alignments within the related group.
This method may not provide as much information as aligning many sequences together, but it is much more efficient computationally, and the information that gets lost doesn’t seem crucial for determining protein structures.
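For readers curious what a pair-wise alignment involves, the sketch below scores the best end-to-end alignment of two sequences using the classic Needleman-Wunsch dynamic programming method. This is a textbook technique chosen for illustration, not necessarily what AlphaFold uses, and the scoring values (`match`, `mismatch`, `gap`) and the helper `score_against_query` are assumptions; real tools use substitution matrices tuned for amino acids.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score for aligning sequences a and b end to end.
    Only two rows of the table are kept, so memory stays proportional to len(b)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def score_against_query(query, relatives):
    """Score every related sequence against the query, one pair at a time."""
    return {name: global_alignment_score(query, seq) for name, seq in relatives.items()}


# Made-up sequences standing in for a target protein and its relatives.
scores = score_against_query(
    "MKVLA",
    {"identical": "MKVLA", "one_change": "MKILA", "one_deletion": "MKLA"},
)
print(scores)
```

Each pair costs work proportional to the product of the two sequence lengths, and the pairs can be handled independently of one another.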
Using the alignments, a second software component determined the spatial relationships between pairs of amino acids in the target protein. In the earlier software, these relationships were then converted into exact spatial coordinates for each atom by a program that accounted for physical traits of amino acids, such as which parts could rotate.
In AlphaFold 3, a diffusion module is responsible for predicting the exact positions of atoms. This module is trained using a known structure and altered versions of it where some atom positions are randomly shifted. This training helps the module refine the approximate locations provided by relative positions into precise atomic coordinates. It doesn’t require information about the physical traits of amino acids because it learns this from studying many structures.
DeepMind trained the module with two types of noise: one where only atom positions were shifted but the overall structure stayed the same, and another where larger structural changes were made, affecting many atom positions.
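The data-corruption step behind this kind of training can be sketched in a few lines of Python. The example below takes a toy "known structure" and produces a lightly jittered copy and a heavily distorted copy, which stand in for the two types of noise. The coordinates, the noise levels (0.1 and 5.0), and the function names are all assumptions for illustration; the sketch shows only how noisy training inputs could be made, not the neural network that learns to undo the noise.

```python
import math
import random


def add_noise(coords, sigma, rng):
    """Shift every atom coordinate by Gaussian noise with standard deviation sigma."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of atom positions."""
    squared = sum(
        (x - y) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for x, y in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


rng = random.Random(0)  # fixed seed so the example is repeatable

# Toy "known structure": 50 atoms spaced along a straight line.
true_coords = [(float(i), 0.0, 0.0) for i in range(50)]

small_noise = add_noise(true_coords, sigma=0.1, rng=rng)  # local jitter only
large_noise = add_noise(true_coords, sigma=5.0, rng=rng)  # overall shape scrambled

print(rmsd(true_coords, small_noise), rmsd(true_coords, large_noise))
```

A denoising model would be shown the noisy copies and asked to recover `true_coords`; seeing both noise levels pushes it to learn fine local detail as well as overall shape.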
During training, the team found that AlphaFold 3 needed around 20,000 examples of protein structures before it accurately predicted 97 percent of the structures in a test set. At 60,000 examples, it was also accurately predicting interactions between proteins and complex formations with other molecules at a similar rate.
None of the complexes reached the level of accuracy seen for standalone protein structures. However, for proteins bound to a signaling molecule, around 75 percent of the predictions were correct. For protein-DNA complexes, accuracy was about 60 percent, and for protein-RNA complexes, it was around 40 percent.
These percentages are significantly higher than those achieved by other top prediction tools. AlphaFold 3 was also able to predict structures for proteins that had undergone chemical modifications, such as the attachment of sugars, which is a common modification.
The use of a diffusion engine in AlphaFold 3 raised concerns because these engines are known to occasionally create “hallucinations.” This happens especially with proteins that have segments without a set structure, such as loops of amino acids that move freely in water surrounding the protein. The diffusion module might create a structure for these loops even though they don’t actually have one.
To reduce hallucinations, the DeepMind team trained the module using structure predictions from a previous version of their software, which usually places unstructured protein parts in a very recognizable setup. This strategy was somewhat effective, and it helped identify most hallucinations as low-confidence predictions.
The team also encountered other sporadic issues. Sometimes, the software struggled with chirality, which involves molecules having mirror-image forms; biological molecules typically exist in just one of these forms. Additionally, the software occasionally positioned atoms so they would physically overlap. Lowering the scores of such predictions helped reduce, but not completely eliminate, this problem.
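The idea of lowering the score of predictions with overlapping atoms can be illustrated with a small Python sketch. It counts pairs of atoms that sit closer than a minimum distance and subtracts a penalty from the prediction's confidence before ranking. The distance cutoff, the penalty size, the confidence values, and the function names are all assumptions made up for this example; the actual scoring used by DeepMind is not described here.

```python
import math
from itertools import combinations


def count_clashes(coords, min_distance=1.0):
    """Count atom pairs closer together than min_distance (an assumed cutoff)."""
    return sum(
        1 for a, b in combinations(coords, 2) if math.dist(a, b) < min_distance
    )


def penalized_score(confidence, coords, penalty_per_clash=0.1):
    """Lower a prediction's confidence for every pair of overlapping atoms."""
    return confidence - penalty_per_clash * count_clashes(coords)


def rank_predictions(predictions):
    """Sort (name, confidence, coords) tuples from best to worst penalized score."""
    return sorted(
        predictions, key=lambda p: penalized_score(p[1], p[2]), reverse=True
    )


# Two toy predictions: one well spaced, one with two atoms nearly on top of each other.
clean = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
overlapping = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (3.0, 0.0, 0.0)]

predictions = [("overlapping", 0.85, overlapping), ("clean", 0.80, clean)]
ranked = rank_predictions(predictions)
print([name for name, _, _ in ranked])
```

In this toy case the overlapping prediction starts with the higher confidence but drops below the clean one once the penalty is applied. A prediction with a large enough confidence lead would still win despite a clash, which fits the article's point that the fix reduces the problem without eliminating it.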
Finally, the software can predict interactions between proteins and the antibodies that bind them, but this is highly resource-intensive. It often requires making multiple predictions and assessing their likely accuracy. While this is consistent with findings from other research, it’s still a disappointment given the potential value of understanding how antibodies interact with their targets.
What do we think?
We think AlphaFold 3 will greatly help scientists understand protein structures and their interactions. It appears to be more accurate than earlier versions and can handle more complex situations.
The improvements could lead to new discoveries in medicine and biology. However, some issues remain, such as hallucinations and errors in predicting certain complexes. Overall, we believe it’s a big step forward, with room for more improvement.