For a small percentage of cancer patients, doctors are unable to determine where their cancer originated, which makes it difficult to choose a treatment. Researchers at MIT and Dana-Farber Cancer Institute have now developed a machine-learning approach to identify the origins of these enigmatic cancers. Their computational model analyzes the sequence of about 400 genes to predict where in the body a tumor originated.

The researchers demonstrated that they could accurately classify at least 40 percent of tumors of unknown origin with high confidence using this model. This resulted in a 2.2-fold increase in the number of patients eligible for targeted treatment based on the origin of their cancer.

Not knowing where a cancer originated often prevents doctors from prescribing precision drugs, which are typically approved for specific cancer types. These drugs are often more effective and have fewer side effects than treatments used for a broad range of cancers. The new model could aid treatment decisions by guiding doctors toward personalized treatments for patients with cancers of unknown primary origin.

The researchers trained the machine learning model on data from nearly 30,000 patients diagnosed with one of 22 known cancer types. They then tested the model on about 7,000 tumors whose origin was known, achieving an accuracy of approximately 80 percent. They further validated the model on a set of about 900 tumors from patients with cancers of unknown primary, where it made high-confidence predictions for 40 percent of the tumors.
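The idea of a "high-confidence prediction" can be illustrated with a minimal sketch. It assumes the classifier outputs one probability per known cancer type and that a call counts as high-confidence when the top probability clears a cutoff. The label list, the 0.5 cutoff, and the function names are hypothetical and are not taken from the researchers' model.

```python
from typing import List, Optional, Sequence

# Hypothetical subset of the 22 cancer types; the real label set is not given here.
CANCER_TYPES = ["lung", "breast", "colorectal"]
# Hypothetical cutoff; the study's actual confidence criterion is not stated here.
CONFIDENCE_THRESHOLD = 0.5


def high_confidence_call(
    probs: Sequence[float],
    labels: Sequence[str] = CANCER_TYPES,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[str]:
    """Return the most probable cancer type, or None if the model is not confident."""
    if len(probs) != len(labels):
        raise ValueError("need exactly one probability per cancer type")
    # Index of the largest predicted probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best] if probs[best] >= threshold else None


def high_confidence_fraction(
    batch: Sequence[Sequence[float]],
    labels: Sequence[str] = CANCER_TYPES,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> float:
    """Fraction of tumors in the batch that receive a high-confidence call."""
    calls: List[Optional[str]] = [
        high_confidence_call(p, labels, threshold) for p in batch
    ]
    return sum(c is not None for c in calls) / len(calls)
```

Under these assumptions, a tumor with probabilities [0.8, 0.1, 0.1] would be called "lung", while one with [0.4, 0.35, 0.25] would receive no call. Raising the cutoff trades coverage (fewer tumors classified) for greater reliability of the calls that remain.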

The researchers also compared the model’s predictions with germline (inherited) mutations, analyzed in a subset of tumors, that can reveal a genetic predisposition to a particular type of cancer. They found that the model’s predictions were more likely to match the cancer type indicated by the germline mutations than any other type of cancer.

The model’s predictions were further validated by comparing patients’ survival times with the typical prognosis for the predicted cancer type. Patients whose predicted cancer type carries a poor prognosis had shorter survival times, while those whose predicted cancer type typically has a better prognosis survived longer.

In addition, the model identified an additional 15 percent of patients who could have received a targeted treatment if their cancer type had been known. These findings are potentially clinically actionable, because precision treatments already exist for this population.

The researchers aim to expand their model to include other types of data, such as pathology and radiology images, to provide a more comprehensive prediction using multiple data modalities. The goal is to predict not only the type of tumor and patient outcome but also the optimal treatment.