My views on anything

Beyond building predictive models: TwinOps in biomanufacturing

As more and more manufacturers embrace the mission to build digital twins, the biopharmaceutical industry also envisions a significant paradigm shift in digitalisation, towards an intelligent factory where bioprocesses continuously learn from data to optimise and control productivity. While extensive efforts are being made to build and combine the best mechanistic and data-driven models, there has not yet been a complete digital twin application in pharma. One of the main reasons is that production deployment becomes more complex once you consider the possible impact such digital technologies could have on vaccine products and ultimately on patients. To address current technical challenges and fill regulatory gaps, this paper explores some best practices for TwinOps in biomanufacturing, from experiment to GxP validation, and discusses approaches to oversight and compliance that could work with these best practices towards building bioprocess digital twins at scale.

Please read our whole pre-print here:

Senior AI/ML engineer in Bengaluru, India at GSK

I’m hiring a Senior AI/ML engineer in Bengaluru, India. You will work with the rest of our international team on delivering cutting-edge AI/ML solutions to support our vaccines business. This is a great role in which to grow into a lead data scientist while developing your machine learning and modern DevOps skills.—Karnataka—Bengaluru/Senior-AIML-Engineer_272917

My next career step: GSK Vaccines

Weird day: after nearly 5 years at Microsoft, I’ve handed in my badge and laptop. I’m very excited about my next step, which takes me even deeper into healthcare, but I’m also sad to leave behind such a great company with amazing people.

I could not be more proud to join GSK as their new director of Analytics and AI. Their mantra feels like a homecoming: “We are a science-led global healthcare company with a special purpose: to help people do more, feel better, live longer.”

The economic case for clinical genomics

A great systematic review by Schwarze et al. in Genetics in Medicine on the costs and benefits of Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES) in clinical settings.

Main findings that interested me:

  • Molecular testing for genetic disorders (using single-gene tests, panel testing, or microarrays) yields a molecular diagnosis in only 50% of cases. Many patients then still go through extensive further diagnostic testing, which is both slow and expensive.
  • Although the raw costs of sequencing are dropping, in the clinical genetics setting the costs of both WGS and WES are stable and are not decreasing.
  • Diagnostic yield varies a lot for both WES and WGS: it ranges from 3% to 79% for WES and from 17% to 73% for WGS. The authors do note that many of the patients in these studies were hard to diagnose with traditional methods.

Skin lesion segmentation using Deep Learning framework Keras – ISIC 2018 challenge

Every summer I try to learn something new (methods, techniques, frameworks, tools, …). This year, I decided to focus on Keras, a Python framework for rapid AI prototyping. Or as they state: “Being able to go from idea to result with the least possible delay is key to doing good research.” Keras also seems to be the place where a lot of the AutoML innovation is happening. Beyond just learning the framework, spending some time with Keras will also help me hone my deep learning and machine learning skills.

ISIC 2018 challenge for lesion boundary detection

As a dataset, I’ve been using the ISIC 2018 challenge data and in particular challenge 1. In this challenge, we try to detect lesion boundaries. The training data consists of 2594 images and 2594 corresponding ground truth response masks (arXiv paper).

The first challenge has skin lesion images and corresponding masks. These will be used for training and evaluation purposes. These masks have been manually created (or at least curated) and should represent what a medical expert would consider as the lesion.

Neural network design: U-net

As I’m trying to test the framework and am not looking for the “best” model, I’ve decided to go with an implementation of the U-net architecture. This architecture was used in the 2015 ISBI challenge for cell tracking. Read more in their arXiv paper: U-Net: Convolutional Networks for Biomedical Image Segmentation.
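To get a feel for the architecture: U-net halves the spatial resolution at every encoder level and doubles it again in the decoder, joining matching levels through skip connections. The sketch below only does the size arithmetic. It assumes four 2×2 pooling levels (as in the paper) and "same" padding, which is common in Keras re-implementations but is an assumption here; the function names are hypothetical.

```python
def unet_level_sizes(size, depth=4):
    """Return the spatial size at each encoder level, input first, bottleneck last."""
    sizes = [size]
    for _ in range(depth):
        sizes.append(sizes[-1] // 2)  # 2x2 max pooling halves the size
    return sizes


def skips_line_up(size, depth=4):
    """True if doubling back up from the bottleneck reproduces every encoder size."""
    return size % (2 ** depth) == 0


print(unet_level_sizes(256))  # [256, 128, 64, 32, 16]
print(skips_line_up(256))     # True
```

Under these assumptions, an input side that is not a multiple of 16 would need padding or cropping somewhere, because the upsampled decoder maps no longer match the encoder maps they are concatenated with.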

Initial training

As I’m testing my models on my Surface Book 2 (the version with a GPU, that is), I’ve decided to resize the images to make sure they fit in memory. In a first try, I resized the images to a square format (256×256 pixels), assuming that this would make things easier with the implementation in Keras. The loss function for training is simply the negative of the Dice coefficient. When testing on my first try, we got results as shown below. We do manage to get a segmentation with decent results, but the model doesn’t seem to “learn” much new after ~8 epochs.
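For readers who haven’t met the Dice coefficient: it measures the overlap between the predicted mask and the ground-truth mask, from 0 (no overlap) to 1 (identical). Here is a minimal NumPy sketch of the idea, not the exact code I trained with. The `smooth` term is an assumption, a common trick to avoid dividing by zero on empty masks.

```python
import numpy as np


def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Overlap between two masks: 2*|A∩B| / (|A| + |B|), with smoothing."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)


def dice_loss(y_true, y_pred, smooth=1.0):
    """Negative Dice: minimising this loss maximises the overlap."""
    return -dice_coefficient(y_true, y_pred, smooth)


mask = np.ones((2, 2))
print(dice_loss(mask, mask))  # -1.0 for a perfect prediction
```

In Keras the same arithmetic would be written with backend tensor operations so that it stays differentiable.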

More epochs, better resizing, image augmentation

To start tweaking our results, I decided to try the following steps: 1) resize masks/images while keeping aspect ratios (500×666 pixels) and 2) augment our images/masks by shifting, zooming, flipping, and rotating them. Luckily, image augmentation is extremely easy in Keras. I just followed the steps in this blog. I only had to make a few small tweaks to make it work for our scenario, where we augment both masks and images (and want to make sure we apply the same augmentation to both at the same time). See the code snippet below:

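from keras.preprocessing.image import ImageDataGenerator

seed = 1  # any fixed integer; images and masks must share it
# identical augmentation settings for images and masks
data_gen_args = dict(width_shift_range=0.2, height_shift_range=0.2, horizontal_flip=True, vertical_flip=True, rotation_range=90, zoom_range=0.2)

image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)
# fit() only matters when featurewise statistics are enabled
image_datagen.fit(imgs_train, augment=True, seed=seed)
mask_datagen.fit(imgs_mask_train, augment=True, seed=seed)

# the shared seed makes both generators draw the same random transforms
image_generator = image_datagen.flow(imgs_train, seed=seed)
mask_generator = mask_datagen.flow(imgs_mask_train, seed=seed)
# create a single generator that yields (image batch, mask batch) pairs
train_generator = zip(image_generator, mask_generator)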
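Why does the shared seed matter? If the image and its mask get different random flips, the mask no longer lies on top of the lesion. The toy sketch below illustrates the principle with plain NumPy. It is not how Keras implements augmentation internally, and `random_flips` is a hypothetical helper.

```python
import numpy as np


def random_flips(batch, seed):
    """Randomly flip each 2-D sample in the batch, driven only by the seed."""
    rng = np.random.default_rng(seed)
    out = batch.copy()
    for i in range(len(out)):
        if rng.random() < 0.5:
            out[i] = out[i][:, ::-1].copy()  # horizontal flip
        if rng.random() < 0.5:
            out[i] = out[i][::-1, :].copy()  # vertical flip
    return out


# Toy data: two 3x3 "images" and masks derived pixel-by-pixel from them.
images = np.arange(18).reshape(2, 3, 3)
masks = images % 2

seed = 42
aug_images = random_flips(images, seed)
aug_masks = random_flips(masks, seed)

# With a shared seed, every mask pixel still sits on top of its image pixel.
print(np.array_equal(aug_masks, aug_images % 2))  # True
```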

These 2 steps already gave much better results. We went from a Dice coefficient of ~0.58 to ~0.68 (the loss being the negative of that value).
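For the resizing step, the target size just follows from scaling both sides by the same factor. A small sketch of that arithmetic is below. The 3000×4000 source size is a made-up example (the original images come in several sizes), and `fit_longest_side` is a hypothetical helper.

```python
def fit_longest_side(height, width, longest_side):
    """Scale (height, width) so the longer side equals longest_side, keeping aspect ratio."""
    longer = max(height, width)
    # Integer round-half-up avoids floating-point surprises.
    new_h = (height * longest_side + longer // 2) // longer
    new_w = (width * longest_side + longer // 2) // longer
    return new_h, new_w


print(fit_longest_side(3000, 4000, 666))  # (500, 666)
```

The actual resampling would then be done by an image library. Note that some of them, Pillow for example, expect the size as (width, height) rather than (height, width).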

Results of image segmentation

Looking at the “best” model created in my last try, we notice that most of the masks look rather good (or at least, as expected). However, when comparing them to the ground-truth masks, we see that the current model generalises the borders and creates more rounded borders than the actual masks have. These rounded borders are most likely caused by the massive down-scaling of the images.

Conclusion and next steps

The most significant learning for me was that it has become almost effortless to train a deep neural network. Once you have decided on (or copied, as in my case) a network design, you can implement it reasonably quickly and check whether learning takes place (and thus whether it is possible to use AI/deep learning for this problem). This means more time to spend on the actual model than on the plumbing, AND more time for curating and collecting the high-quality data needed for doing AI at scale. With the current model, I think some improvements could be made by down-scaling the images less. I’m currently looking into whether Auto-Keras can be used to find a better architecture for our type of problem.