Data Science & Artificial Intelligence:

Unlocking new science insights


At AstraZeneca we harness data and technology to maximise time for the discovery and delivery of potential new medicines. Data science and artificial intelligence (AI) are embedded across our R&D to enable our scientists to push the boundaries of science to deliver life-changing medicines.

Data science and AI are transforming R&D, helping us turn science into medicine more quickly and with a higher probability of success. We are applying AI throughout the discovery and development process, from target identification to clinical trials, to uncover new insights to guide our drug discovery and development.

Jim Weatherall, Vice President, Data Science & AI, R&D

Today we are generating and have access to more data than ever before. In fact, more data has been created in the past two years than in the entire previous history of the human race. But the value of this data can only be realised if we are able to analyse, interpret and apply it. Right across our R&D, we are using AI to help us decipher this wealth of information with the aim of:

•  Gaining a better understanding of the diseases we want to treat

•  Identifying new targets for novel medicines

•  Predicting which molecules to make and how to make them

•  Better predicting clinical success

•  Pioneering new approaches in the clinic and beyond

Our scientists are using AI to help redefine medical science in the quest for new and better ways to discover, test and accelerate the potential medicines of tomorrow. The following sections tell just some of the stories behind how data science and AI are starting to make a difference to our R&D efforts.


Gaining a better understanding of diseases we want to treat

We are determined to advance our fundamental understanding of diseases such as cancer, respiratory disease and heart, kidney and metabolic diseases. By learning what causes or drives these diseases, we hope to find new ways to treat, prevent or even cure them.

Using graphs to turn knowledge into insights
Knowledge graphs are networks of contextualised scientific facts and the relationships between them. Our knowledge graphs integrate genomic, disease, drug, clinical and safety information, helping to overcome confirmation bias and to turn data into insights. Machine learning and AI applications such as graph neural networks can then mine this data to uncover previously unknown patterns and make novel target predictions. In 2021, we selected the first two AI-generated drug targets into our portfolio, from our collaboration with BenevolentAI. We share parts of our internal knowledge graph work on GitHub.
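To make the idea of a knowledge graph concrete, here is a minimal sketch in Python. It stores facts as (subject, relation, object) triples and walks the relationships to suggest candidate targets for a disease. Every entity name, relation name and function here is a made-up placeholder for illustration, and the simple rule-based traversal stands in for, and is much simpler than, the graph neural networks mentioned above.

```python
from collections import defaultdict

# Hypothetical, made-up facts for illustration only; not real biology.
TRIPLES = [
    ("GENE_A", "associated_with", "DISEASE_X"),
    ("GENE_B", "interacts_with", "GENE_A"),
    ("DRUG_1", "inhibits", "GENE_B"),
    ("GENE_C", "associated_with", "DISEASE_Y"),
]


def build_graph(triples):
    """Index triples as subject -> list of (relation, object) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def candidate_targets(graph, disease):
    """Return genes linked to the disease directly, or via one interacting gene."""
    # Genes with a direct association to the disease.
    direct = {
        subject
        for subject, edges in graph.items()
        if ("associated_with", disease) in edges
    }
    # Genes that interact with a directly associated gene (a two-hop link).
    indirect = {
        subject
        for subject, edges in graph.items()
        for relation, obj in edges
        if relation == "interacts_with" and obj in direct
    }
    return sorted(direct | indirect)


graph = build_graph(TRIPLES)
print(candidate_targets(graph, "DISEASE_X"))  # ['GENE_A', 'GENE_B']
```

In this toy example GENE_B has no recorded link to DISEASE_X, yet it surfaces as a candidate because of its relationship to GENE_A. That is the kind of indirect connection a graph representation makes easy to find.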

Unlocking secrets in our genes and beyond
Our Centre for Genomics Research is working towards the analysis of up to two million genomes by 2026. We use best-practice cloud environments to process this vast body of genomics data, and advanced data and AI tools to interpret it faster and more robustly than was previously possible.

Beyond the genome lie the dynamic realms of the transcriptome, proteome and metabolome – largely untapped repositories of rich information that, if connected, could tell us more about what is driving disease. Multi-omics is the integration of these datasets which, with the help of machine learning and AI, can help us predict what a drug molecule does in a cell with far greater certainty.

Predicting what molecules to make next and how to make them

Through AI, we are transforming medicinal chemistry, augmenting traditional drug design with sophisticated computational methods to predict what molecules to make next and how to make them.

Werngard Czechtizky, Head of Medicinal Chemistry, Research and Early Development, Respiratory & Immunology, BioPharmaceuticals R&D

As our discovery efforts identify new targets, we must find more efficient ways to design traditional or novel therapeutics that affect those targets and can move through our pipeline successfully. 

The traditional way of generating novel molecular ideas involves a lengthy and intensive period of optimisation cycles making and testing molecules, as well as manually reviewing vast amounts of literature and data.

Today we use AI to help us deduce the best molecules to make in the shortest time, across 70 percent of our small molecule chemistry projects.

AI is also helping us design and develop other therapeutic modalities including peptide or protein therapeutics, nucleotide-based therapeutics and cell-based therapeutics.

Using AI for fast, accurate image analysis

Every week, our pathologists analyse hundreds of tissue samples from our research studies. They check them for disease and for biomarkers that may indicate which patients are most likely to respond to our medicines. This work is very time-consuming, which is why we are training AI systems to help pathologists analyse samples accurately and with less effort. This has the potential to cut analysis time by over 30%.

For one of our AI systems, we implemented an approach inspired by how some self-driving cars understand their environment. We trained the AI system to score tumour cells and immune cells for a biomarker, called PD-L1, which has potential to help inform immunotherapy-based treatment decisions for bladder cancer.

Cancer is not the only disease where imaging and AI are transforming research. Recently one of our biopharmaceuticals research teams undertook an ambitious project to train deep neural networks to predict disease risk and related biomarkers from retinal fundus images.  

Accelerating clinical trials through data science and AI

Randomised Clinical Trials (RCTs) are currently the method of choice for pharma when it comes to assessing potential new medicines. However, published data shows they have become more expensive and complex over time.

Advances in data science can help us re-think clinical trials, enhancing current practice and finding new ways to discover and develop potential new medicines.

For example, the rapid adoption of high-quality Electronic Health Records (EHRs) has created a vast, rich and highly relevant data source with huge potential to improve clinical trial implementation.

Federated EHR technology is unlocking new opportunities to enhance clinical research and transform the way we do clinical trials. The technology has the potential to refine or replace many clinical trial processes including patient identification, selection, trial conduct, and capture of data.

We are also employing AI and machine learning tools to glean more value from clinical trial data. Historically, we have been proficient in using data from trials to analyse, interpret and report on the safety and efficacy of the trial drug. But we want to maximise the value of the data we have already collected.

Machine learning and AI are also being applied for event adjudication in clinical trials to enable us to optimise the process at different stages with the intent of reducing the time overall. 

Data re-use can help us better design our drug development strategies and programmes. It can help us design smarter trials and strengthen our scientific discoveries and, ultimately, has the potential to help our patients receive the best treatments.

Building the right data backbone

Today we are generating and have access to more data than ever before. Data and analytics have the potential to transform our business, but the true value of scientific data can only be realised if it is “FAIR” - Findable, Accessible, Interoperable and Reusable.
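As a rough illustration of what checking data against the FAIR principles could look like in practice, the Python sketch below flags which principles a dataset's metadata record fails to meet. The `DatasetRecord` fields and the one-field-per-principle mapping are simplifying assumptions made for this example; they do not describe AstraZeneca's actual data architecture, and real FAIR assessments cover far more than four fields.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""   # Findable: a persistent, unique identifier
    access_url: str = ""   # Accessible: where and how the data can be retrieved
    data_format: str = ""  # Interoperable: a standard, documented format
    licence: str = ""      # Reusable: clear terms of use


def fair_gaps(record):
    """Return the FAIR principles for which the record has no metadata."""
    checks = {
        "Findable": bool(record.identifier),
        "Accessible": bool(record.access_url),
        "Interoperable": bool(record.data_format),
        "Reusable": bool(record.licence),
    }
    return [principle for principle, ok in checks.items() if not ok]


record = DatasetRecord(identifier="ds-001", data_format="parquet")
print(fair_gaps(record))  # ['Accessible', 'Reusable']
```

A check like this only confirms that the metadata exists. Whether the identifier is truly persistent, or the licence truly permits re-use, still needs human review.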

AstraZeneca’s R&D and IT groups are working closely together to create an industry-leading enterprise data and AI architecture. This will help us answer key business questions and enhance our ability to harness new tools and technologies, such as AI and machine learning, both now and in the future.

We are also mobilising a team of data scientists, bioinformaticians, data engineers and machine learning experts from across the company to ensure we are collecting, organising and using the right data in the best way.

AstraZeneca’s principles for ethical data and AI


Rapid developments in AI technology have brought us into uncharted territory, and companies and regulators must work together to meet the new challenges posed. Our principles will empower us and our partners to navigate this new environment safely and effectively. By encouraging innovation and evolution while maintaining our values, they provide a long-term ethical foundation to uphold our AI governance.

During 2020, we engaged a diverse range of experts both inside and outside AstraZeneca to develop principles for ethical data and AI, aligned with our Code of ethics and values. These values work for patients and employees and enable AstraZeneca to make a positive contribution to society.

Pushing the boundaries of science through AI expertise

Our leading scientists are using AI to help redefine medical science in the quest for new and better ways to discover, test and accelerate the potential medicines of tomorrow.

Where could you fit in?

Whatever your role, everyone in Data and AI makes a big contribution to our purpose and our enterprise-wide transformation. Whether you’re a Data Scientist, Data Engineer or Information Architect, a Cheminformatician, Bioinformatician or Machine Learning Engineer – there’s a team for you.

Collaborating to help answer big questions in AI

We partner globally to innovate together, building an ecosystem that brings the outside in.

We start with the challenge we need to solve and identify the best partners, whether academic, tech or industry, all with the aim of fuelling scientific discovery and development.


Join us

If you believe in the power of what science can do, join us in our endeavour to push the boundaries of science to deliver life-changing medicines.

Collaborate with us

We know that however innovative our science, however effective our medicines and delivery, to achieve all we want to achieve, we cannot do it alone.

Veeva ID: Z4-48758
Date of preparation: September 2022