Date: 2024-01-01

Researchers at the University of Bonn have made a significant breakthrough in understanding the inner workings of machine learning applications used in pharmaceutical research. The traditional perception of artificial intelligence (AI) models as “black boxes” has been challenged by Prof. Dr. Jürgen Bajorath and his team. Their findings suggest that these AI systems primarily rely on recalling existing data rather than learning specific chemical interactions to predict the effectiveness of drugs.

In drug discovery research, finding the most effective drug molecules is crucial. Scientists often search for compounds that can dock onto proteins, triggering specific physiological actions or blocking undesirable reactions in the body. With a vast number of available chemical compounds, this process can sometimes feel like searching for a needle in a haystack. Scientific models, including AI applications, aim to predict which molecules will best bind to target proteins.

One type of AI application gaining popularity in drug discovery research is the graph neural network (GNN). These networks are trained on graph representations of protein-ligand complexes to predict how strongly a molecule binds to a target protein. However, how GNNs arrive at their predictions has remained unclear, like a “black box” that researchers cannot see into.
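To make the idea of a graph representation concrete, the following is a minimal sketch of how a protein-ligand complex could be stored as a graph, with atoms as nodes and bonds or contacts as edges. The class `ComplexGraph`, its method names, and the three edge labels ("ligand", "protein", "interaction") are assumptions made for illustration and are not taken from the Bonn team's code.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ComplexGraph:
    """Hypothetical toy container: atoms are nodes, bonds/contacts are edges."""
    elements: list = field(default_factory=list)  # chemical element per atom
    origins: list = field(default_factory=list)   # "ligand" or "protein" per atom
    edges: list = field(default_factory=list)     # pairs of atom indices

    def add_atom(self, element, origin):
        """Append an atom and return its node index."""
        self.elements.append(element)
        self.origins.append(origin)
        return len(self.elements) - 1

    def add_edge(self, i, j):
        """Connect two atoms by their node indices."""
        self.edges.append((i, j))

    def edge_kind(self, edge):
        """Label an edge by which side of the complex its two atoms belong to."""
        i, j = edge
        kinds = {self.origins[i], self.origins[j]}
        if kinds == {"ligand"}:
            return "ligand"
        if kinds == {"protein"}:
            return "protein"
        return "interaction"  # one ligand atom, one protein atom


def build_toy_complex():
    """Build a four-atom toy complex (not a real structure)."""
    g = ComplexGraph()
    c = g.add_atom("C", "ligand")
    o = g.add_atom("O", "ligand")
    n = g.add_atom("N", "protein")
    ca = g.add_atom("C", "protein")
    g.add_edge(c, o)   # bond inside the ligand
    g.add_edge(o, n)   # contact between ligand and protein
    g.add_edge(n, ca)  # bond inside the protein
    return g


toy = build_toy_complex()
edge_counts = Counter(toy.edge_kind(e) for e in toy.edges)
print(edge_counts)
```

Separating edges by origin in this way is one possible bookkeeping choice. It makes it easy to ask later whether a model's prediction depends on the ligand alone or on the contacts between ligand and protein.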

To shed light on this mystery, the researchers analyzed six different GNN architectures using their specially developed “EdgeSHAPer” method. They wanted to determine whether the GNNs truly learned protein-ligand interactions or arrived at predictions through other means. According to their findings, the GNN models mainly “remembered” chemically similar molecules encountered during training, regardless of the target protein. The learned chemical similarities then influenced the predictions.
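The general idea behind attributing a prediction to individual edges can be sketched with Shapley values estimated by random sampling. The code below is a toy illustration of that idea only and is not the EdgeSHAPer implementation: the function `edge_shapley`, the stand-in model `toy_predict`, and the edge weights are all hypothetical.

```python
import random


def edge_shapley(edges, predict, n_samples=200, seed=0):
    """Estimate each edge's Shapley value by sampling random edge orderings.

    `predict` is any function that maps a set of present edges to a number.
    """
    rng = random.Random(seed)
    totals = {e: 0.0 for e in edges}
    for _ in range(n_samples):
        order = list(edges)
        rng.shuffle(order)
        present = set()
        prev = predict(frozenset(present))
        for e in order:
            # Marginal contribution of edge e given the edges added before it.
            present.add(e)
            cur = predict(frozenset(present))
            totals[e] += cur - prev
            prev = cur
    return {e: total / n_samples for e, total in totals.items()}


# Hypothetical stand-in for a trained model: a prediction that depends
# almost entirely on ligand edges ("L" atoms) and barely on the
# ligand-protein contact ("L2"-"P0") or the protein edge.
WEIGHTS = {
    ("L0", "L1"): 1.0,
    ("L1", "L2"): 0.8,
    ("L2", "P0"): 0.1,
    ("P0", "P1"): 0.0,
}


def toy_predict(present_edges):
    return sum(WEIGHTS[e] for e in present_edges)


values = edge_shapley(list(WEIGHTS), toy_predict)
for edge, value in sorted(values.items(), key=lambda kv: -kv[1]):
    print(edge, round(value, 3))
```

In this toy setup the ligand edges receive nearly all of the attribution. That is the kind of pattern that would indicate a model relying on the ligand's chemistry rather than on its contacts with the protein.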

These findings suggest that GNNs may not actually learn the specific interactions between active substances and proteins. As a result, the predictive power of these models may be overrated, since forecasts of similar quality can be achieved using simpler methods and chemical knowledge. However, these findings also present opportunities for further improvement in AI models.
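One example of such a simpler method is a nearest-neighbor lookup based on chemical similarity, which ignores the protein entirely. The sketch below assumes molecules are described by sets of fingerprint feature IDs and compares them with the Tanimoto coefficient. The names `tanimoto` and `knn_predict`, the fingerprints, and the affinity values are invented for illustration and are not data from the study.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def knn_predict(query, training, k=1):
    """Predict affinity as the mean over the k most similar training molecules.

    `training` is a list of (fingerprint_set, affinity) pairs.
    """
    if not training:
        raise ValueError("training set is empty")
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    top = ranked[:k]
    return sum(affinity for _, affinity in top) / len(top)


# Made-up training molecules: (fingerprint feature IDs, affinity value).
training = [
    (frozenset({1, 2, 3, 4}), 7.5),
    (frozenset({1, 2, 3, 9}), 7.1),
    (frozenset({10, 11, 12}), 4.0),
]
query = frozenset({1, 2, 3, 4, 5})

print(knn_predict(query, training, k=1))
print(knn_predict(query, training, k=2))
```

A baseline like this uses no information about the target protein, so comparing it against a GNN is one way to check how much a more complex model adds beyond remembered chemical similarity.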

Further research is needed to explore new approaches that enhance GNNs’ ability to learn the most important interactions and improve the accuracy of predictions. By understanding the limitations and strengths of AI models, scientists can optimize their use in drug discovery research and make more informed decisions, ultimately accelerating the development of effective therapies.

FAQ

What is machine learning in drug research?

Machine learning is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable computers to learn from data and make predictions or decisions without explicit programming. In drug research, machine learning can be used to identify patterns in data, classify data into different categories, and make predictions about the effectiveness of drug molecules.

How do machine learning applications work in drug research?

Machine learning applications such as graph neural networks (GNNs) use graph representations of protein-ligand complexes to predict the binding strength of a molecule to a target protein. These applications are trained on existing data and use the patterns they find in it to make predictions about the effectiveness of drug molecules.

What were the findings of the University of Bonn researchers?

The researchers found that GNN models used in drug research primarily rely on recalling existing data rather than learning specific chemical interactions. The models “remembered” chemically similar molecules encountered during training and based their predictions on these learned chemical similarities, regardless of the target protein. These findings suggest that the predictions made by GNN models may not accurately reflect the actual interactions between active substances and proteins.