Synthetic Data Exercise

Author

Asmith Joseph

Published

February 11, 2025

Introduction of the TBpregnancy Synthetic Dataset

The TB pregnancy dataset is a synthetic dataset designed to study the pharmacokinetic (PK) properties of antiretroviral (ARV) and anti-tuberculosis (TB) drugs in pregnant and postpartum women. The dataset was generated to model real-world drug metabolism patterns and assess how pregnancy impacts drug clearance, exposure (AUC), and birth outcomes. The data simulates clinical trial observations and incorporates known physiological changes during pregnancy that affect drug absorption, distribution, metabolism, and elimination.

Additionally, the dataset examines how pregnancy affects drug metabolism through demographic, pharmacokinetic, and clinical variables. It includes age, BMI, trimester (1st, 2nd, 3rd, postpartum), and drug type (ARV or anti-TB), along with Cmax, AUC, clearance (CL), half-life (T½), and protein binding. Clinical outcomes include CD4 count (for ARV users), viral load, ALT levels, and birth outcomes (normal vs. adverse). Key research questions explore pregnancy’s effect on AUC and clearance, differences in ARV vs. anti-TB metabolism, factors influencing adverse birth outcomes, and inter-individual variability in drug metabolism.

Generating Synthetic Data of pharmacokinetic (PK) data for Antiretroviral (ARV) and Anti-Tuberculosis (TB) drugs during pregnancy and postpartum

# Load required libraries
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.4.3
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
# Set seed for reproducibility
set.seed(123)

# Number of participants
n <- 300  

# Generate participant IDs
participant_id <- 1:n  

# Pregnancy periods (1st, 2nd, 3rd trimester, Postpartum)
trimester <- sample(c("1st Trimester", "2nd Trimester", "3rd Trimester", "Postpartum"), 
                    size = n, replace = TRUE, prob = c(0.25, 0.25, 0.25, 0.25))

# Drug Type (ARV or Anti-TB)
drug_type <- sample(c("ARV", "Anti-TB"), size = n, replace = TRUE)

# Generate demographic data
age <- sample(18:45, size = n, replace = TRUE)  # Age between 18-45 years
bmi <- ifelse(trimester == "Postpartum", 
              rnorm(n, mean = 26, sd = 3), 
              rnorm(n, mean = 30, sd = 3))  # Higher BMI in pregnancy

# Gestational Age (weeks) for pregnancy cases
gestational_age <- ifelse(trimester == "Postpartum", NA, sample(12:40, size = n, replace = TRUE))

# Dose (mg), adjusted for pregnancy
dose <- ifelse(trimester == "Postpartum", 
               rnorm(n, mean = 600, sd = 50), 
               rnorm(n, mean = 650, sd = 50))  # Slightly increased dose in pregnancy

# Generate pharmacokinetic data
cmax <- ifelse(trimester == "Postpartum", 
               rnorm(n, mean = 5, sd = 1), 
               rnorm(n, mean = 4, sd = 1))  # Lower in pregnancy

auc <- ifelse(trimester == "Postpartum", 
              rnorm(n, mean = 45, sd = 5), 
              rnorm(n, mean = 35, sd = 5))  # Reduced drug exposure in pregnancy

half_life <- ifelse(trimester == "Postpartum", 
                    rnorm(n, mean = 8, sd = 1), 
                    rnorm(n, mean = 7, sd = 1))  # Shorter in pregnancy

clearance <- ifelse(trimester == "Postpartum", 
                    rnorm(n, mean = 10, sd = 2), 
                    rnorm(n, mean = 12, sd = 2))  # Higher clearance in pregnancy

# Protein Binding (%)
protein_binding <- rnorm(n, mean = 85, sd = 5)  # Between 80-90%

# CD4 Count (HIV patients, ARV only)
cd4_count <- ifelse(drug_type == "ARV", 
                    ifelse(trimester == "Postpartum", rnorm(n, mean = 500, sd = 100), rnorm(n, mean = 400, sd = 100)), 
                    NA)  # Only for ARV users

# Viral Load (log10 copies/mL) - inverse of CD4 count
viral_load <- ifelse(drug_type == "ARV", 
                     5 - (cd4_count / 200), 
                     NA)  # Only for ARV users

# Liver Enzyme Levels (ALT, IU/L) - indicator of liver function
alt_levels <- rnorm(n, mean = 25, sd = 10)  # Between 15-35 IU/L

# Birth Outcome (0 = Normal, 1 = Adverse) influenced by drug exposure
prob_adverse_outcome <- 1 / (1 + exp(-(-1 + 0.05 * clearance - 0.1 * auc + 0.01 * alt_levels)))
birth_outcome <- rbinom(n, size = 1, prob = prob_adverse_outcome)

# Create the dataset
TBpregnancy <- data.frame(ParticipantID = participant_id, 
                 Age = age,
                 BMI = bmi,
                 GestationalAge = gestational_age,
                 Trimester = trimester, 
                 DrugType = drug_type, 
                 Dose_mg = dose,
                 Cmax = cmax, 
                 AUC = auc, 
                 HalfLife = half_life, 
                 Clearance = clearance, 
                 ProteinBinding = protein_binding,
                 CD4Count = cd4_count,
                 ViralLoad = viral_load,
                 ALT_Levels = alt_levels,
                 BirthOutcome = birth_outcome)

# View first few rows
head(TBpregnancy)
  ParticipantID Age      BMI GestationalAge     Trimester DrugType  Dose_mg
1             1  45 30.39880             18 3rd Trimester  Anti-TB 690.5561
2             2  33 28.22766             25 1st Trimester  Anti-TB 731.4292
3             3  26 29.43845             21 3rd Trimester      ARV 641.5331
4             4  28 35.93455             35 1st Trimester  Anti-TB 692.9263
5             5  28 25.94956             27 1st Trimester      ARV 627.9202
6             6  36 29.20644             36 2nd Trimester      ARV 704.2929
      Cmax      AUC HalfLife Clearance ProteinBinding CD4Count ViralLoad
1 5.051530 31.87629 5.439194  9.920776       78.50281       NA        NA
2 3.911380 26.52678 6.122411 10.812195       75.52868       NA        NA
3 5.561617 41.72005 5.674557  8.264870       84.76765 397.1953  3.014024
4 4.966893 28.51780 6.877079 13.380295       84.80144       NA        NA
5 3.949942 30.39506 6.690450 13.424845       82.75101 373.9001  3.130500
6 2.427745 37.21709 7.535453 13.231151       84.61910 499.8509  2.500746
  ALT_Levels BirthOutcome
1   14.62036            0
2   21.40848            0
3   14.65765            0
4   26.06639            0
5   41.08755            0
6   24.63504            0

Exploratory of the Dataset

summary(TBpregnancy)
 ParticipantID         Age             BMI        GestationalAge 
 Min.   :  1.00   Min.   :18.00   Min.   :19.37   Min.   :12.00  
 1st Qu.: 75.75   1st Qu.:24.00   1st Qu.:26.53   1st Qu.:19.25  
 Median :150.50   Median :30.00   Median :28.89   Median :27.50  
 Mean   :150.50   Mean   :31.04   Mean   :28.92   Mean   :26.89  
 3rd Qu.:225.25   3rd Qu.:39.00   3rd Qu.:31.31   3rd Qu.:35.00  
 Max.   :300.00   Max.   :45.00   Max.   :37.23   Max.   :40.00  
                                                  NA's   :74     
  Trimester           DrugType            Dose_mg           Cmax      
 Length:300         Length:300         Min.   :507.8   Min.   :1.175  
 Class :character   Class :character   1st Qu.:610.3   1st Qu.:3.596  
 Mode  :character   Mode  :character   Median :639.9   Median :4.223  
                                       Mean   :639.2   Mean   :4.235  
                                       3rd Qu.:669.6   3rd Qu.:4.967  
                                       Max.   :763.4   Max.   :7.852  
                                                                      
      AUC           HalfLife        Clearance      ProteinBinding 
 Min.   :24.02   Min.   : 3.866   Min.   : 5.126   Min.   :66.32  
 1st Qu.:33.41   1st Qu.: 6.421   1st Qu.: 9.881   1st Qu.:81.81  
 Median :37.16   Median : 7.137   Median :11.280   Median :85.17  
 Mean   :37.80   Mean   : 7.186   Mean   :11.377   Mean   :84.78  
 3rd Qu.:42.52   3rd Qu.: 7.984   3rd Qu.:12.951   3rd Qu.:87.74  
 Max.   :54.96   Max.   :10.334   Max.   :17.451   Max.   :99.56  
                                                                  
    CD4Count        ViralLoad       ALT_Levels      BirthOutcome 
 Min.   : 92.31   Min.   :1.227   Min.   :-3.729   Min.   :0.00  
 1st Qu.:364.42   1st Qu.:2.488   1st Qu.:18.851   1st Qu.:0.00  
 Median :429.95   Median :2.850   Median :25.494   Median :0.00  
 Mean   :429.62   Mean   :2.852   Mean   :25.157   Mean   :0.03  
 3rd Qu.:502.43   3rd Qu.:3.178   3rd Qu.:32.009   3rd Qu.:0.00  
 Max.   :754.55   Max.   :4.538   Max.   :48.426   Max.   :1.00  
 NA's   :150      NA's   :150                                    
str(TBpregnancy)
'data.frame':   300 obs. of  16 variables:
 $ ParticipantID : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Age           : int  45 33 26 28 28 36 27 40 19 42 ...
 $ BMI           : num  30.4 28.2 29.4 35.9 25.9 ...
 $ GestationalAge: int  18 25 21 35 27 36 NA 33 NA 22 ...
 $ Trimester     : chr  "3rd Trimester" "1st Trimester" "3rd Trimester" "1st Trimester" ...
 $ DrugType      : chr  "Anti-TB" "Anti-TB" "ARV" "Anti-TB" ...
 $ Dose_mg       : num  691 731 642 693 628 ...
 $ Cmax          : num  5.05 3.91 5.56 4.97 3.95 ...
 $ AUC           : num  31.9 26.5 41.7 28.5 30.4 ...
 $ HalfLife      : num  5.44 6.12 5.67 6.88 6.69 ...
 $ Clearance     : num  9.92 10.81 8.26 13.38 13.42 ...
 $ ProteinBinding: num  78.5 75.5 84.8 84.8 82.8 ...
 $ CD4Count      : num  NA NA 397 NA 374 ...
 $ ViralLoad     : num  NA NA 3.01 NA 3.13 ...
 $ ALT_Levels    : num  14.6 21.4 14.7 26.1 41.1 ...
 $ BirthOutcome  : int  0 0 0 0 0 0 0 0 0 0 ...

Visualizations of the dataset

# Boxplot of AUC (Drug Exposure) across pregnancy stages
ggplot(TBpregnancy, aes(x = Trimester, y = AUC, fill = Trimester)) +
  geom_boxplot() +
  theme_minimal() +
  ggtitle("Drug Exposure (AUC) Across Pregnancy and Postpartum")

# Scatter plot: Clearance vs. AUC
ggplot(TBpregnancy, aes(x = Clearance, y = AUC, color = Trimester)) +
  geom_point() +
  theme_minimal() +
  ggtitle("Clearance vs. Drug Exposure (AUC)")

# Birth Outcome Distribution
ggplot(TBpregnancy, aes(x = factor(BirthOutcome), fill = factor(BirthOutcome))) +
  geom_bar() +
  theme_minimal() +
  labs(x = "Birth Outcome (0 = Normal, 1 = Adverse)", y = "Count", title = "Distribution of Birth Outcomes")

Fitting Regression Models

I want to explore this question: Does Pregnancy Affect Drug Exposure (AUC)?

lm_model <- lm(AUC ~ Trimester + DrugType + BMI, data = TBpregnancy)
summary(lm_model)  # Expect AUC to be lower in pregnancy

Call:
lm(formula = AUC ~ Trimester + DrugType + BMI, data = TBpregnancy)

Residuals:
     Min       1Q   Median       3Q      Max 
-11.2311  -2.9273  -0.4594   3.0240  10.0642 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            36.78303    2.59611  14.169   <2e-16 ***
Trimester2nd Trimester  1.39998    0.74434   1.881    0.061 .  
Trimester3rd Trimester  0.34391    0.72871   0.472    0.637    
TrimesterPostpartum    10.84955    0.80261  13.518   <2e-16 ***
DrugTypeARV            -0.81245    0.52043  -1.561    0.120    
BMI                    -0.05828    0.08543  -0.682    0.496    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.465 on 294 degrees of freedom
Multiple R-squared:  0.5152,    Adjusted R-squared:  0.507 
F-statistic: 62.49 on 5 and 294 DF,  p-value: < 2.2e-16

Interpretation The linear regression model examines the effect of trimester, drug type, and BMI on AUC (drug exposure). - The intercept (36.78, p < 0.001) represents the estimated AUC in the first trimester for anti-TB drugs, assuming BMI is zero. - Postpartum significantly increases AUC by 10.85 units (p < 0.001), indicating higher drug exposure after pregnancy. - The second trimester shows a marginally significant increase (1.40, p = 0.061), while the third trimester has no meaningful effect (p = 0.637). - ARV drugs tend to reduce AUC (-0.81), but this effect is not significant (p = 0.120). BMI has a negligible effect on AUC (p = 0.496).

The model explains 51.5% of the variance in AUC (R² = 0.515), and the overall model is highly significant (p < 0.001), confirming that trimester is the primary driver of AUC changes.

Fitting Logistic Regression to Predict Birth Outcomes

glm_model <- glm(BirthOutcome ~ AUC + Clearance + ALT_Levels + Trimester, data = TBpregnancy, family = binomial)
summary(glm_model)  # Expect higher clearance & ALT levels to increase risk of adverse birth outcomes

Call:
glm(formula = BirthOutcome ~ AUC + Clearance + ALT_Levels + Trimester, 
    family = binomial, data = TBpregnancy)

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)  
(Intercept)               3.82120    3.78056   1.011   0.3121  
AUC                      -0.20101    0.08760  -2.295   0.0218 *
Clearance                 0.09208    0.17360   0.530   0.5958  
ALT_Levels               -0.03880    0.03740  -1.037   0.2995  
Trimester2nd Trimester   -0.20570    0.80006  -0.257   0.7971  
Trimester3rd Trimester   -1.91442    1.13286  -1.690   0.0910 .
TrimesterPostpartum     -15.43338 1959.80257  -0.008   0.9937  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 80.845  on 299  degrees of freedom
Residual deviance: 63.949  on 293  degrees of freedom
AIC: 77.949

Number of Fisher Scoring iterations: 19

Interpretattion: This logistic regression model assesses the relationship between AUC (drug exposure), Clearance, ALT levels, and Trimester on the probability of adverse birth outcomes. - The intercept (3.82, p = 0.312) suggests the baseline log-odds of an adverse birth outcome when all predictors are at their reference levels. - Higher AUC significantly reduces the risk of adverse birth outcomes (-0.201, p = 0.0218), indicating that increased drug exposure is protective. - Clearance (0.092, p = 0.5958) and ALT levels (-0.0388, p = 0.2995) are not significant predictors, suggesting that drug metabolism and liver function have limited impact in this model. - The third trimester shows a marginally lower risk (-1.91, p = 0.091), while postpartum has an extreme coefficient (-15.43), likely due to data imbalance.

The model improves prediction compared to the null model (lower residual deviance: 63.95 vs. null deviance: 80.85), and the overall fit is reasonable (AIC = 77.95), but additional covariates or interaction terms may refine predictions.

Linear Regression: Predict Drug Exposure (AUC)

# Linear Model: AUC as the dependent variable
lm_auc <- lm(AUC ~ Trimester + BMI + DrugType + Clearance + ProteinBinding, data = TBpregnancy)
summary(lm_auc)  # Expect Trimester and Clearance to be significant

Call:
lm(formula = AUC ~ Trimester + BMI + DrugType + Clearance + ProteinBinding, 
    data = TBpregnancy)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.8389  -2.9398  -0.3856   3.1062  10.0087 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            44.67683    5.55159   8.048 2.14e-14 ***
Trimester2nd Trimester  1.40815    0.74509   1.890   0.0598 .  
Trimester3rd Trimester  0.32169    0.72716   0.442   0.6585    
TrimesterPostpartum    10.46530    0.85853  12.190  < 2e-16 ***
BMI                    -0.06334    0.08530  -0.743   0.4583    
DrugTypeARV            -0.84857    0.52081  -1.629   0.1043    
Clearance              -0.16896    0.12772  -1.323   0.1869    
ProteinBinding         -0.06733    0.05426  -1.241   0.2157    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.454 on 292 degrees of freedom
Multiple R-squared:  0.5207,    Adjusted R-squared:  0.5092 
F-statistic: 45.32 on 7 and 292 DF,  p-value: < 2.2e-16

Interpretation This linear regression model examines the effect of Trimester, BMI, Drug Type, Clearance, and Protein Binding on AUC (drug exposure). - The intercept (44.68, p < 0.001) represents the baseline AUC when all categorical variables are at their reference levels. - Postpartum is significantly associated with a 10.47-unit increase in AUC (p < 0.001), confirming that drug exposure is higher after pregnancy. - The second trimester shows a marginally significant increase (1.41, p = 0.0598), while the third trimester has no significant effect (p = 0.6585). - Drug type (ARV vs. Anti-TB) slightly reduces AUC (-0.85, p = 0.1043), but the effect is not statistically significant. BMI (-0.063, p = 0.4583), Clearance (-0.168, p = 0.1869), and Protein Binding (-0.067, p = 0.2157) do not significantly influence AUC.

The model explains 52.1% of the variance in AUC (R² = 0.5207), indicating a moderate fit, and the overall model is highly significant (p < 0.001), confirming that trimester remains the strongest predictor of AUC changes.

2 Logistic Regression to Predict Adverse Birth Outcome

# Logistic Regression: Predicting BirthOutcome
glm_birth <- glm(BirthOutcome ~ AUC + Clearance + ALT_Levels + Trimester + DrugType, 
                 data = TBpregnancy, family = binomial)
summary(glm_birth)  # Expect Clearance & ALT levels to increase risk

Call:
glm(formula = BirthOutcome ~ AUC + Clearance + ALT_Levels + Trimester + 
    DrugType, family = binomial, data = TBpregnancy)

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)  
(Intercept)               4.29376    4.04765   1.061   0.2888  
AUC                      -0.20583    0.08914  -2.309   0.0209 *
Clearance                 0.07990    0.17648   0.453   0.6507  
ALT_Levels               -0.04112    0.03841  -1.071   0.2844  
Trimester2nd Trimester   -0.20060    0.79934  -0.251   0.8018  
Trimester3rd Trimester   -1.88516    1.13819  -1.656   0.0977 .
TrimesterPostpartum     -15.38573 1957.06453  -0.008   0.9937  
DrugTypeARV              -0.26425    0.75192  -0.351   0.7253  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 80.845  on 299  degrees of freedom
Residual deviance: 63.824  on 292  degrees of freedom
AIC: 79.824

Number of Fisher Scoring iterations: 19

Interpretation This logistic regression model evaluates the impact of AUC, Clearance, ALT Levels, Trimester, and Drug Type on the probability of adverse birth outcomes. - The intercept (4.29, p = 0.2888) represents the baseline log-odds of an adverse birth outcome when all predictors are at their reference levels. - AUC is the only significant predictor (-0.2058, p = 0.0209), indicating that higher drug exposure significantly reduces the risk of adverse birth outcomes. - Clearance (0.0799, p = 0.6507) and ALT levels (-0.0411, p = 0.2844) do not significantly influence birth outcomes. - Trimester has no significant impact, although the third trimester shows a marginally protective effect (-1.88, p = 0.0977). - The postpartum coefficient (-15.39) is extreme and non-significant (p = 0.9937), likely due to data imbalance. - Drug type (ARV vs. Anti-TB) does not significantly influence birth outcomes (-0.2642, p = 0.7253).

The model improves prediction compared to the null model (residual deviance: 63.82 vs. null deviance: 80.85), and AUC remains the strongest predictor of birth outcomes. However, the AIC (79.82) suggests that additional predictors may improve the model’s fit.

Random Forest of Non-Linear Model for Birth Outcomes

# Load required libraries
library(randomForest)
Warning: package 'randomForest' was built under R version 4.4.2
randomForest 4.7-1.2
Type rfNews() to see new features/changes/bug fixes.

Attaching package: 'randomForest'
The following object is masked from 'package:dplyr':

    combine
The following object is masked from 'package:ggplot2':

    margin
library(ggplot2)
library(dplyr)

# Convert categorical variables (ensuring classification mode)
TBpregnancy$Trimester <- as.factor(TBpregnancy$Trimester)
TBpregnancy$DrugType <- as.factor(TBpregnancy$DrugType)
TBpregnancy$BirthOutcome <- as.factor(TBpregnancy$BirthOutcome)  # Ensuring classification mode

# Train a Random Forest classification model
set.seed(123)
rf_model <- randomForest(BirthOutcome ~ AUC + Clearance + ALT_Levels + BMI + Age + ProteinBinding, 
                         data = TBpregnancy, ntree = 500, importance = TRUE)

# Extract variable importance (correctly using "MeanDecreaseGini" for classification models)
var_imp <- as.data.frame(importance(rf_model))  # Extract importance values
var_imp$Variable <- rownames(var_imp)  # Add variable names
var_imp <- var_imp %>% arrange(desc(MeanDecreaseGini))  # Sort by importance

# Create a color-coded variable importance plot
ggplot(var_imp, aes(x = reorder(Variable, MeanDecreaseGini), y = MeanDecreaseGini, fill = MeanDecreaseGini)) +
  geom_bar(stat = "identity", width = 0.7, color = "black") +  # Adds border for clarity
  coord_flip() +
  scale_fill_gradient(low = "#00BFC4", high = "#F8766D") +  # Blue to red gradient (low to high importance)
  theme_minimal() +
  labs(title = "Variable Importance in Predicting Birth Outcomes",
       x = "Predictor Variables", y = "Mean Decrease in Gini",
       fill = "Importance") +
  theme(text = element_text(size = 14),
        plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "right")

Simulate Repeated Measures

# Expand the dataset: Each participant has 3 time points
set.seed(123)
TBpregnancy_long <- TBpregnancy %>%
  slice(rep(1:n(), each = 3)) %>%  # Repeat each row 3 times
  mutate(Timepoint = rep(1:3, times = n))  # Assign time points

# Introduce slight variations in AUC and Clearance per timepoint
TBpregnancy_long <- TBpregnancy_long %>%
  mutate(AUC = AUC + rnorm(n() , mean = 0, sd = 2),  # Small variations
         Clearance = Clearance + rnorm(n(), mean = 0, sd = 0.5))

# Check the structure
head(TBpregnancy_long)
  ParticipantID Age      BMI GestationalAge     Trimester DrugType  Dose_mg
1             1  45 30.39880             18 3rd Trimester  Anti-TB 690.5561
2             1  45 30.39880             18 3rd Trimester  Anti-TB 690.5561
3             1  45 30.39880             18 3rd Trimester  Anti-TB 690.5561
4             2  33 28.22766             25 1st Trimester  Anti-TB 731.4292
5             2  33 28.22766             25 1st Trimester  Anti-TB 731.4292
6             2  33 28.22766             25 1st Trimester  Anti-TB 731.4292
     Cmax      AUC HalfLife Clearance ProteinBinding CD4Count ViralLoad
1 5.05153 30.75534 5.439194  9.413719       78.50281       NA        NA
2 5.05153 31.41594 5.439194  9.525120       78.50281       NA        NA
3 5.05153 34.99371 5.439194 10.070573       78.50281       NA        NA
4 3.91138 26.66780 6.122411 11.631721       75.52868       NA        NA
5 3.91138 26.78536 6.122411 11.354504       75.52868       NA        NA
6 3.91138 29.95691 6.122411 10.499911       75.52868       NA        NA
  ALT_Levels BirthOutcome Timepoint
1   14.62036            0         1
2   14.62036            0         2
3   14.62036            0         3
4   21.40848            0         1
5   21.40848            0         2
6   21.40848            0         3

Mixed-Effects Model: Accounting for Individual Differences

library(lme4)
Warning: package 'lme4' was built under R version 4.4.3
Loading required package: Matrix
Warning: package 'Matrix' was built under R version 4.4.3
# Fit a mixed model for AUC with random intercept per Participant
mixed_model <- lmer(AUC ~ Trimester + Clearance + (1 | ParticipantID), data = TBpregnancy_long)
summary(mixed_model)
Linear mixed model fit by REML ['lmerMod']
Formula: AUC ~ Trimester + Clearance + (1 | ParticipantID)
   Data: TBpregnancy_long

REML criterion at convergence: 4610

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.60844 -0.57569 -0.00148  0.54177  2.90555 

Random effects:
 Groups        Name        Variance Std.Dev.
 ParticipantID (Intercept) 19.501   4.416   
 Residual                   3.923   1.981   
Number of obs: 900, groups:  ParticipantID, 300

Fixed effects:
                       Estimate Std. Error t value
(Intercept)            35.39729    1.33250  26.564
Trimester2nd Trimester  1.49325    0.76096   1.962
Trimester3rd Trimester  0.18794    0.73955   0.254
TrimesterPostpartum    10.80113    0.79607  13.568
Clearance              -0.05564    0.10071  -0.553

Correlation of Fixed Effects:
            (Intr) Trms2T Trms3T TrmstP
Trmstr2ndTr -0.326                     
Trmstr3rdTr -0.310  0.521              
TrmstrPstpr -0.556  0.496  0.502       
Clearance   -0.914  0.040  0.013  0.306

Interpretation of the last model This linear mixed model (LMM) examines the effect of Trimester and Clearance on AUC (drug exposure), while accounting for individual variability through random effects for ParticipantID. - The random intercept variance (19.501, SD = 4.416) indicates substantial differences in AUC between individuals, whereas residual variance (3.923, SD = 1.981) reflects within-individual fluctuations. - The postpartum period is significantly associated with a 10.80-unit increase in AUC (p < 0.001), confirming that drug exposure rises after pregnancy. - The second trimester shows a marginally significant increase in AUC (+1.49, p ≈ 0.05), while the third trimester has no meaningful effect. Clearance (-0.056, p = 0.55) does not significantly influence AUC, suggesting that other metabolic factors may play a stronger role. - The large variation across participants highlights the need for individualized drug monitoring during pregnancy and postpartum.

Key Results & Findings

  • Pregnancy significantly reduces drug exposure (AUC), with postpartum showing a strong rebound (+10.8, p < 0.001), confirming higher drug levels after delivery.
  • The second trimester shows a marginal increase in AUC (p ≈ 0.05), while the third trimester has no meaningful effect.
  • Higher AUC is protective against adverse birth outcomes (-0.2058, p = 0.0218), while drug clearance does not show a significant effect.
  • A linear mixed model (LMM) revealed substantial inter-individual variability in drug metabolism (ParticipantID variance = 19.501), highlighting the need for personalized drug monitoring during pregnancy.