How big data is being mobilised in the fight against leukaemia

Healthy cell function relies on well-orchestrated gene activity. Via a fantastically complex network of interactions, around 30,000 genes cooperate to maintain this delicate balance in each of the 37.2 trillion cells in the human body.

Broadly speaking, cancer is a disruption of this balance by genetic changes, or mutations. Mutations can trigger over-activation of genes that normally instruct cells to divide, or inactivation of genes that suppress the development of cancer. When a mutated cell divides, it passes the mutation down to its daughter cells. This leads to the accumulation of non-functioning, abnormal cells that we recognise as cancer.

Our laboratory is focused on understanding how one particular cancer – chronic myeloid leukaemia or CML – works. Each year more than 700 patients in the UK – and over 100,000 worldwide – are diagnosed with CML. After recent advances, almost 90% of patients under the age of 65 now survive for more than five years.

But in the vast majority of patients, CML is currently incurable, and lifelong treatment means that patients must live with side effects and the risk of drug resistance arising. With increasing numbers of CML patients surviving (and treatment costing between £40,000 and £70,000 per patient a year), increasing strain is being placed on health services.

A single mutation

CML is perhaps unique among cancers in that a single mutation, named BCR-ABL, underlies the disease biology. This mutation originates in a single leukaemic stem cell, but is then propagated throughout the blood and bone marrow as leukaemia cells take over and block the healthy process of blood production. The presence of BCR-ABL affects the activity of thousands of genes, in turn preventing these cells from fulfilling their normal function as blood cells.

Drugs that specifically neutralise the aberrant effects of this mutation were introduced to the clinic in the early 2000s. These drugs have revolutionised CML patient care. Many patients are now able to live relatively normal lives with their leukaemia under good control.

But while these drugs kill the more mature daughter cells of the originally mutated leukaemia stem cell, they have not fully lived up to their initial billing as “magic bullets” in the fight against cancer. This is because the original “seed” population of leukaemic stem cells evades therapy, lying dormant in the bone marrow to stimulate new cancer growth when treatment is withdrawn.

To truly cure CML, we must expose the leukaemia stem cells, understand their inner workings and uproot them. And to do this, we need to learn more about them. How do they survive the treatment that so readily kills their more mature counterparts? Which overactive or inactivated genes protect them?

We believe that the answers to these questions lie in the analysis of biological “big data”. Genome-scale technologies now allow scientists to measure the activity (or “expression”) of every gene in the genome simultaneously, in any given population of cells, or even at the level of a single cell. Comparison of expression data generated from leukaemia stem cells with the same data generated from healthy blood stem cells will reveal single genes or networks of genes potentially targetable in the fight against leukaemia.
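As a rough illustration of what such a comparison involves, the sketch below ranks genes by how strongly their expression differs between two groups of samples. It is a minimal example only, not the method used in LEUKomics: the gene names and expression values are invented, the values are assumed to be on a log2 scale, and a real analysis would use dedicated statistical software and correct for testing thousands of genes at once.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def compare_expression(leukaemic, healthy):
    """Return (gene, difference in mean log2 expression, t-statistic),
    sorted so the genes with the largest absolute t-statistic come first."""
    results = []
    for gene in leukaemic.keys() & healthy.keys():
        diff = mean(leukaemic[gene]) - mean(healthy[gene])
        results.append((gene, diff, welch_t(leukaemic[gene], healthy[gene])))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Invented log2 expression values: four samples per group, three hypothetical genes.
leukaemic_stem = {
    "GENE_A": [9.1, 9.4, 8.9, 9.3],
    "GENE_B": [5.0, 5.2, 4.9, 5.1],
    "GENE_C": [3.1, 2.8, 3.0, 2.9],
}
healthy_stem = {
    "GENE_A": [6.0, 6.3, 5.9, 6.1],
    "GENE_B": [5.1, 4.9, 5.0, 5.2],
    "GENE_C": [6.2, 6.0, 6.4, 6.1],
}

ranked = compare_expression(leukaemic_stem, healthy_stem)
for gene, diff, t in ranked:
    print(f"{gene}: log2 difference {diff:+.2f}, t = {t:+.1f}")
```

In this toy data, the hypothetical GENE_A is more active in the leukaemic samples, GENE_C is less active, and GENE_B is unchanged. Genes of the first two kinds are the ones such a comparison is designed to surface.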

Big data to the rescue

In a project funded by Bloodwise and the Scottish Cancer Foundation, we have created LEUKomics. This online data portal brings together a wealth of CML gene expression data from specialised laboratories across the globe, including our own at the University of Glasgow.

Our intention is to eliminate the bottleneck surrounding big data analysis in CML. Each dataset is subjected to manual quality checks, and all the necessary computational processing to extract information on gene expression. This enables immediate access to and interpretation of data that previously would not have been easily accessible to academics or clinicians without training in specialised computational approaches.

Consolidating these data into a single resource also allows large-scale, computationally intensive research efforts by bioinformaticians (specialists in the analysis of big data in biology). From a computational perspective, the fact that CML is caused by a single mutation makes it an attractive disease model for cancer stem cells. However, existing datasets tend to have small sample numbers, which can limit their potential.

The more samples available, the higher the power to detect subtle changes that may be crucial to the biology of the cancer stem cells. By bringing all the globally available CML datasets together, we have significantly increased the sample size, from between two and six per dataset to more than 100 altogether. This offers an unprecedented opportunity to analyse gene expression data to expose underlying mechanisms of this disease.
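The link between sample size and statistical power can be sketched with a simple calculation. The example below is an approximation under stated assumptions, not a description of the LEUKomics analysis: it uses a two-sided test based on the normal distribution, ignores the negligible chance of detecting an effect in the wrong direction, and takes a hypothetical true difference of 0.5 log2 units with a standard deviation of 1.0. The group sizes of 3 and 50 are illustrative stand-ins for a single small dataset and a pooled collection.

```python
import math
from statistics import NormalDist


def approx_power(n_per_group, effect, sd, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    n_per_group: number of samples in each of the two groups
    effect:      true difference between the group means (assumed)
    sd:          standard deviation within each group (assumed equal)
    """
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means.
    se = sd * math.sqrt(2 / n_per_group)
    return standard_normal.cdf(effect / se - z_crit)


for n in (3, 50):
    power = approx_power(n, effect=0.5, sd=1.0)
    print(f"{n} samples per group: power is roughly {power:.2f}")
```

Under these assumptions, the chance of detecting the same subtle change rises sharply as the number of samples grows, which is the motivation for pooling datasets. With very small groups, a real analysis would use methods based on the t-distribution rather than this normal approximation.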

As of March 2017, the portal is up and running in the public domain. We are planning to tour Scotland and present at international conferences, aiming to train researchers in how best to exploit this new resource. Ultimately, we hope that this tool will lead to new ideas and approaches, and attract more funding, in the fight against CML. And while we continue to expand our representation of CML data in real time from research centres all over the world, we also plan to begin incorporating data from other types of leukaemia.

In recent years, targeted therapies have become hugely important in cancer research. By providing these data to the CML research community within LEUKomics, we hope to mobilise new research into cancer-causing leukaemic stem cells, and ultimately design treatments to target them without affecting healthy cells. Our database provides a critical stepping stone in this process.


Authors: Lorna Jackson, PhD candidate (Paul O’Gorman Leukaemia Research Centre), University of Glasgow and Lisa Hopcroft, Research Associate (Institute of Cancer Sciences), University of Glasgow.

