Digital twins and hybrid models in biologics production: A model comparison

Thomas de Marchin

Digital twin technology has the potential to play a transformational role in drug development and manufacturing.

In our first article, "How Digital Twins and Hybrid Modelling Optimize Cell Culture Process for Better Production," we introduced our vision of digital twins for bioprocesses as virtual replicas that combine data-driven and mechanistic models to optimize biologics manufacturing. We showed how hybrid models can support real-time optimization, scale-up, and regulatory compliance. Digital twins are especially crucial for personalized medicine and cell and gene therapies, as they enable adaptable production of small batches tailored to individualized treatments.¹

In this second article, we put theory into practice by comparing a classical data-driven model with a hybrid model for predicting viable cell density (VCD) and product accumulation – key parameters of productivity – in a perfusion bioreactor.

We demonstrate how digital twins enable innovators to answer questions such as:

How can we identify optimal conditions for maximizing cell growth and productivity?
When should we initiate perfusion and adjust feed strategies to avoid substrate depletion and cell density collapse?
Can we detect early signs of instability and predict hard-to-measure parameters such as substrate concentration in the bioreactor?

Figure 1: Illustration of a perfusion bioreactor

Case study: Perfusion bioreactor cell culture modeling

In a perfusion bioreactor, cells are continuously supplied with fresh nutrients while waste and product are removed, but the cells themselves are retained. This creates a stable, nutrient-rich environment where cells can thrive for long periods, leading to higher cell densities and increased productivity. Perfusion bioreactors are used extensively for production of biologics.²

Building predictive models that relate attributes such as viable cell density (VCD) or product accumulation to process parameters like substrate concentration, temperature, or agitation is a crucial step in drug process characterization.³

Classical bioreactor modeling typically relies on purely data-driven approaches, where simple mathematical functions (like lines or curves) are fitted to experimental data to make predictions.³ Although this method is straightforward and computationally efficient, it often fails to capture the underlying biological or physical mechanisms driving the system. As a result, it provides limited insight into why specific outcomes occur and struggles to generalize beyond the training data range.

Moreover, data-driven methods face challenges when dealing with complex systems involving numerous process parameters and limited experimental data, which is common in drug development. Mechanistic or hybrid models resolve some of these issues by incorporating prior process knowledge in the form of biological and physical laws.⁴ Hence, while classical data-driven models serve as a useful starting point, we believe integrating mechanistic knowledge through hybrid approaches enhances predictive power and facilitates better process optimization in bioreactors.

We developed a hybrid model that integrates mechanistic knowledge – such as nutrient consumption rates and metabolic pathways – with data-driven machine learning components that capture complex or poorly understood aspects. The mechanistic part provides a solid foundation of prior process knowledge, while the machine learning component models unknown behaviors. Hybrid models offer enhanced robustness in extrapolation and can accurately capture dynamic behaviors like changes in cell growth rates and nutrient uptake under varying conditions. Although more complex to develop, they can reduce the need for extensive experimentation and enable reuse of existing models across products. They also excel in real-time optimization, predictive scale-up, and regulatory compliance by improving process understanding during characterization (FDA Stage 1) and supporting development of real-time control limits for specific batches (FDA Stage 3).

Our hybrid model takes process parameters as inputs and predicts biological variables such as rates, yields, and plateau levels for biomass and product formation. These predicted parameters are then fed into a system of ordinary differential equations (ODEs), which calculate the dynamic responses of the bioreactor over time. Figure 2 illustrates the key steps of both the classical purely data-driven approach and the hybrid approach.
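The two-stage structure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the model used in the study: the data-driven part is replaced by a hand-written placeholder (`predict_kinetics`), the mechanistic part assumes Monod-type growth with full cell retention, and every parameter value is illustrative.

```python
import math


def predict_kinetics(temp_c, agitation_rpm):
    """Placeholder for the data-driven part (hypothetical closed form).

    In a real hybrid model this mapping would be a regressor trained on
    experiments; it is hand-written here so the example runs on its own.
    """
    temp_effect = math.exp(-((temp_c - 37.0) / 6.0) ** 2)  # assumed optimum at 37 C
    agit_effect = agitation_rpm / (agitation_rpm + 100.0)  # assumed saturation
    return {
        "mu_max": 0.04 * temp_effect * agit_effect,  # max growth rate, 1/h
        "k_d": 0.005,  # death rate, 1/h
        "k_s": 0.5,  # Monod constant, g/L
        "y_xs": 1.0,  # yield, 1e6 cells/mL per g/L of substrate
        "q_p": 0.002 * temp_effect,  # specific productivity (arbitrary units)
    }


def derivatives(state, kin, dilution, s_feed):
    """Mechanistic part: mass balances for cells (x), substrate (s), product (p)."""
    x, s, p = state
    s_pos = max(s, 0.0)
    mu = kin["mu_max"] * s_pos / (kin["k_s"] + s_pos)
    dx = (mu - kin["k_d"]) * x  # cells are retained, so no washout term
    ds = dilution * (s_feed - s) - mu * x / kin["y_xs"]
    dp = kin["q_p"] * x - dilution * p  # product leaves with the harvest
    return (dx, ds, dp)


def rk4_step(state, dt, f):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        y + dt / 6 * (a + 2 * b + 2 * c + d)
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(temp_c, agitation_rpm, hours=240.0, dt=0.1,
             x0=0.5, s0=5.0, dilution=0.04, s_feed=10.0):
    """Predict kinetic parameters, then integrate the ODEs over time."""
    kin = predict_kinetics(temp_c, agitation_rpm)
    state = (x0, s0, 0.0)
    trajectory = [(0.0,) + state]
    for step in range(1, int(round(hours / dt)) + 1):
        state = rk4_step(state, dt, lambda y: derivatives(y, kin, dilution, s_feed))
        trajectory.append((step * dt,) + state)
    return trajectory


if __name__ == "__main__":
    for temp in (25.0, 37.0):
        t, x, s, p = simulate(temp, 250.0)[-1]
        print(f"T={temp} C: VCD={x:.2f}, substrate={s:.2f}, product={p:.2f}")
```

Because the process parameters act only through the predicted kinetic parameters, the mass balances hold for any condition the placeholder is asked about. That separation is one plausible reason a hybrid structure can extrapolate more gracefully than a curve fitted directly to time courses.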

Figure 2: Comparison of data-driven and hybrid models

To evaluate both approaches, we simulated 29 experiments varying temperature (20–40 °C) and agitation rate (150–350 rpm), capturing different dynamics of biomass growth, product formation, and glucose consumption over time (Figure 3). Both models were trained on 21 of these experiments. Performance was then tested on two sets of four unseen experiments: one set within the training domain (test 1) to assess interpolation capability, and another outside this domain (test 2) to evaluate extrapolation and generalization.

Figure 3: Experimental domain of the data used to train and test the models. Test 1 dataset lies within the experimental domain of the training dataset, while test 2 dataset lies outside.
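One simple way to decide whether a new condition counts as interpolation or extrapolation is to compare it with the range spanned by the training conditions. The sketch below uses a rectangular bounding box over temperature and agitation rate. Both the helper names and the training points are hypothetical, and a convex hull or a leverage-based criterion would be a stricter choice.

```python
def bounding_box(conditions):
    """Return ((t_min, t_max), (rpm_min, rpm_max)) for (temp, rpm) pairs."""
    temps = [t for t, _ in conditions]
    rpms = [r for _, r in conditions]
    return (min(temps), max(temps)), (min(rpms), max(rpms))


def is_interpolation(condition, box):
    """True if the condition lies inside the box spanned by the training data."""
    (t_lo, t_hi), (r_lo, r_hi) = box
    temp, rpm = condition
    return t_lo <= temp <= t_hi and r_lo <= rpm <= r_hi


# Hypothetical training conditions (temperature in C, agitation in rpm)
train = [(25, 200), (35, 200), (25, 300), (35, 300), (30, 250)]
box = bounding_box(train)

for candidate in [(30, 220), (38, 340)]:
    label = "interpolation" if is_interpolation(candidate, box) else "extrapolation"
    print(candidate, label)
```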

Figure 4 and Figure 5 show the predictions on the two test datasets. The hybrid model (green) fits the data (red points) better than the classical approach (blue). Figure 6 confirms this with the Root Mean Square Error (RMSE), a common measure of prediction accuracy defined as the square root of the mean squared difference between predicted and observed values. The smaller the RMSE, the better the model's predictive performance. The hybrid model achieves consistently lower RMSE than the classical approach in both domains. This indicates that the hybrid model generalizes better and makes more accurate predictions, even when extrapolating beyond the training conditions.
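For readers who want to reproduce this kind of comparison on their own data, RMSE can be computed as follows. The function name is an illustrative choice and the numbers are made up, not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up VCD values (1e6 cells/mL) for illustration only
observed = [1.0, 2.0, 4.0, 8.0]
model_a = [1.1, 2.2, 3.9, 7.5]
model_b = [1.5, 3.0, 5.0, 6.0]

print("model A RMSE:", round(rmse(observed, model_a), 3))
print("model B RMSE:", round(rmse(observed, model_b), 3))
```

Because errors are squared before averaging, a few large misses weigh more heavily than many small ones, which makes RMSE a demanding metric for extrapolation tests.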

Figure 4: Data (red) and predictions by the hybrid model (green) and classical approach (blue) for the test 1 dataset (within the experimental domain).

Figure 5: Data (red) and predictions by the hybrid model (green) and classical approach (blue) for the test 2 dataset (outside the experimental domain).

Figure 6: Predictive performance comparison of Classical and Hybrid modelling approaches measured by RMSE (root mean square error). Lower RMSE values indicate better predictive accuracy.

As our case study shows, hybrid models not only fit well within the training domain but also remain markedly more robust when extrapolating to new conditions, an essential feature for real-world drug manufacturing. In this simulated comparison, the hybrid approach outperformed the commonly used classical approach on both test sets.

Historically, creating and fitting hybrid models required specialized knowledge in mathematics and data science, which limited their accessibility across many companies. However, this landscape is changing with the advent of user-friendly applications like TwinLab, shown in Figure 7. We created this application to allow users without deep technical expertise to easily explore different process scenarios and predict outcomes. Such tools make advanced hybrid modeling practical and actionable, supporting scientists and engineers in bioprocess development by integrating mechanistic knowledge with data-driven insights through intuitive interfaces.

Figure 7: Simulation application enabling scenario analysis by varying key process parameters including agitation rate, temperature, dilution rate, substrate concentration in the feed, and initial seed concentration. These inputs are processed by a hybrid model that predicts the time courses of viable cell density (VCD), product formation, and substrate consumption.

Even with a hybrid model, prediction accuracy is not perfect, and requiring 21 experiments for training may present a barrier, especially for small biotech companies that cannot readily generate this volume of high-quality data. Several approaches can help address this limitation.

One approach is careful design of experiments. Covering the process parameter space remains very important, and optimal designs tailored to mechanistic models help ensure that data are collected where they are most informative. Alternatively, mechanistically informed Bayesian optimization could serve as a first step to explore the experimental domain quickly, although care is needed to ensure that the domain is properly covered.

Another solution is the use of Intensified Design of Experiments (iDoE), which introduces deliberate shifts in process parameters within a single experiment (e.g. bioreactor run). This strategy effectively condenses multiple conventional DoE combinations into a smaller number of experimental runs, thereby maximizing the information gained per experiment.
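The bookkeeping behind this condensation can be illustrated with a naive sketch. It simply chunks a full factorial design into runs with several setpoint stages each. Real iDoE designs choose the order and duration of stages with optimal-design criteria and must account for the dynamics carried over between stages, so the function names and the chunking rule here are illustrative assumptions only.

```python
from itertools import product


def conventional_runs(temps, rpms):
    """One bioreactor run per setpoint combination (classical DoE)."""
    return [[combo] for combo in product(temps, rpms)]


def intensified_runs(temps, rpms, stages_per_run):
    """Group setpoint combinations into runs with several intra-run stages."""
    combos = list(product(temps, rpms))
    return [combos[i:i + stages_per_run]
            for i in range(0, len(combos), stages_per_run)]


temps = [30, 34, 38]    # C, illustrative levels
rpms = [150, 250, 350]  # rpm, illustrative levels

classic = conventional_runs(temps, rpms)
intensified = intensified_runs(temps, rpms, stages_per_run=3)

print("conventional runs:", len(classic))
print("intensified runs:", len(intensified))
for i, run in enumerate(intensified, start=1):
    print(f"run {i}: stages {run}")
```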

Finally, a more ambitious but highly promising strategy involves leveraging pre-trained digital twins. In this scenario, the hybrid model would first be pre-trained on thousands of experimental datasets aggregated from various sources, similar to how large language models like ChatGPT are developed. Users could then fine-tune the model on their own limited training data, and their contributions would in turn improve its performance for everyone involved, ultimately benefiting both users and patients. To protect proprietary data, the platform would use federated learning, a privacy-preserving approach in which users improve the digital twin collaboratively. Individual experimental data remain confidential and are never shared directly between users, while still contributing to an ever-improving collective model.
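The core idea of federated learning can be shown with a toy version of federated averaging. In this sketch, a one-variable linear model stands in for the digital twin, each site trains on its own private data, and only the resulting model weights are sent to be averaged. The site data, function names, and hyperparameters are all hypothetical.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Gradient descent on one site's private data for y = w * x + b."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Average site models, weighted by the number of local samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train_federated(site_datasets, rounds=10):
    """Alternate local training and central averaging; raw data never moves."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in site_datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in site_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Two hypothetical sites observing different regions of the same process,
# both generated from y = 2x + 1
site_a = [(-2.0, -3.0), (-1.0, -1.0), (0.0, 1.0)]
site_b = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

w, b = train_federated([site_a, site_b])
print(f"global model: y = {w:.3f} * x + {b:.3f}")
```

Only the `(w, b)` pairs cross site boundaries in this sketch. A production system would typically add safeguards such as secure aggregation, since model updates alone can still leak information about the underlying data.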

We will continue to explore the potential of the digital twin and statistical methodologies to support better predictions, including the use of iDoE and Bayesian optimal designs.

Note: Results were generated with the assistance of BioWin ASBL and the financial support from the Region, in accordance with the provisions of the Grant Agreement (Convention 8881 ATMP Thérapie cellulaire).

*Sources continued below

About the author:

Thomas de Marchin is Associate Director of Statistics and Data Science at Cencora, where he applies advanced statistical methodologies and machine learning algorithms to optimize drug discovery and manufacturing efficiency. With deep expertise in regulatory compliance – particularly FDA and GMP standards – he bridges the gap between complex analytical approaches and practical pharmaceutical applications.

Disclaimer:
The information provided in this article does not constitute legal advice. Cencora, Inc., strongly encourages readers to review available information related to the topics discussed and to rely on their own experience and expertise in making decisions related thereto.

Sources:

^1. Digital Twins: From Personalised Medicine to Precision Public Health, J Pers Med., July 2021. https://pubmed.ncbi.nlm.nih.gov/34442389/
^2. Perfusion Bioreactors Industry Research Report 2025, Research and Markets. https://www.globenewswire.com/news-release/2025/10/09/3164326/28124/en/Perfusion-Bioreactors-Industry-Research-Report-2025-Biopharmaceutical-Growth-Drives-Demand-Amid-Cell-Culture-Advancements-Global-Forecast-to-2032.html
^3. Predictive models for upstream mammalian cell culture development - A review, Digital Chemical Engineering, March 2024. https://www.sciencedirect.com/science/article/pii/S2772508123000558
^4. Hybrid semi-parametric modeling in process systems engineering: Past, present and future, Computers & Chemical Engineering, Jan 2014. https://www.sciencedirect.com/science/article/pii/S0098135413002639
