API

Module contents

A package to integrate unpaired multi-omics single-cell data via single-cell Multi-omics Regularized Disentangled Representations (scMRDR).

Submodules

scMRDR.data module

class scMRDR.data.CombinedDataset(X, b, m, i, w)

Bases: Dataset

Dataset holding the combined multi-omics data together with its per-sample annotations.

Parameters:
  • X – (n, d) feature matrix

  • b – (n, ) covariates, such as sequencing batches

  • m – (n, ) one-hot encoded modality index

  • i – (n, ) index indicating which masked-feature group each sample belongs to

  • w – (n, ) one-hot encoded cell type index

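A minimal construction sketch (the tensor shapes, dtypes, and DataLoader usage are illustrative assumptions, not part of the API):

   import torch
   from torch.utils.data import DataLoader
   from scMRDR.data import CombinedDataset

   n, d = 1000, 3000                                       # illustrative sizes
   X = torch.randn(n, d)                                   # feature matrix
   b = torch.zeros(n, 1)                                   # covariates (single batch)
   m = torch.nn.functional.one_hot(
       torch.randint(0, 2, (n,)), num_classes=2).float()   # modality indicator
   i = torch.randint(0, 2, (n,))                           # masked-feature group index
   w = torch.nn.functional.one_hot(
       torch.randint(0, 3, (n,)), num_classes=3).float()   # cell type indicator

   dataset = CombinedDataset(X, b, m, i, w)
   loader = DataLoader(dataset, batch_size=64, shuffle=True)
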
scMRDR.loss module

class scMRDR.loss.ZINBLoss

Bases: Module

Zero-Inflated Negative Binomial loss. This loss is used for modeling count data with excess zeros: it combines a zero-inflation component with a negative binomial distribution.

Parameters:
  • x – observed count data (batch_size, num_features)

  • rho – mean parameter of the negative binomial distribution (batch_size, num_features)

  • dispersion – dispersion parameter of the negative binomial distribution (batch_size, num_features)

  • pi – zero-inflation probability (batch_size, num_features)

  • s – scaling factor (batch_size, num_features)

  • mask – optional mask to ignore certain elements in the loss computation (batch_size, num_features)

  • eps – small value to avoid log(0) (default: 1e-8)

Returns:

mean_loss – mean loss value across the batch

forward(x, rho, dispersion, pi, s, mask=None, eps=1e-08)

Compute the zero-inflated negative binomial loss; the parameters and return value are those documented on the class above.

Note

Call the module instance rather than forward() directly: the instance call runs the registered hooks, while a direct forward() call silently skips them.

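A usage sketch with dummy parameter tensors (the shapes follow the docstring above; the toy values are assumptions):

   import torch
   from scMRDR.loss import ZINBLoss

   B, G = 64, 3000                                  # batch and feature sizes
   x = torch.poisson(torch.rand(B, G) * 5)          # observed counts
   rho = torch.softmax(torch.randn(B, G), dim=-1)   # NB mean parameter
   dispersion = torch.rand(B, G) + 0.1              # NB dispersion
   pi = torch.sigmoid(torch.randn(B, G))            # zero-inflation probability
   s = x.sum(dim=1, keepdim=True).expand(B, G)      # scaling factor

   loss_fn = ZINBLoss()
   loss = loss_fn(x, rho, dispersion, pi, s)        # call the instance, not forward()
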
scMRDR.loss.isometric_loss(X, X_prime, m, p=2)

Compute the isometric loss, preserving the pairwise-distance structure within each class separately.

Parameters:
  • X – Feature matrix in the original space (batch_size, feature_dim)

  • X_prime – Feature matrix in the latent space (batch_size, latent_dim)

  • m – One-hot encoded class labels (batch_size, num_classes)

  • p – Norm type for distance computation (default: Euclidean distance, p=2)

Returns:

loss – Isometric Loss (Mean Squared Error between pairwise distances within each class)

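A reference sketch of the quantity being computed, assuming the per-class pairwise-distance comparison described above (the actual implementation may differ in reduction details):

   import torch

   def isometric_loss_sketch(X, X_prime, m, p=2):
       loss, count = 0.0, 0
       labels = m.argmax(dim=1)                       # recover class labels from one-hot
       for c in labels.unique():
           idx = labels == c
           if idx.sum() < 2:                          # need at least one pair
               continue
           d_x = torch.cdist(X[idx], X[idx], p=p)     # distances in original space
           d_z = torch.cdist(X_prime[idx], X_prime[idx], p=p)  # distances in latent space
           loss = loss + ((d_x - d_z) ** 2).mean()    # MSE between distance matrices
           count += 1
       return loss / max(count, 1)
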
scMRDR.loss.klLoss(mu, logvar)

Compute KL divergence between q(z|x) ~ N(mu, exp(logvar)) and p(z) ~ N(0, 1).

Parameters:
  • mu – Mean of q(z|x) (batch_size, latent_dim)

  • logvar – Log variance of q(z|x) (batch_size, latent_dim)

Returns:

kl – KL divergence, reduced over the batch (scalar)

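The standard closed form for this term, as a sketch (the reduction over dimensions and over the batch is an assumption):

   import torch

   def kl_standard_normal(mu, logvar):
       # KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent
       # dimensions and averaged over the batch.
       return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
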
scMRDR.loss.klLoss_prior(mu_q, logvar_q, mu_p, logvar_p)

Compute KL(q || p) for two Gaussians q(z|x) ~ N(mu_q, exp(logvar_q)) and p(z) ~ N(mu_p, exp(logvar_p)).

Parameters:
  • mu_q – mean of q

  • logvar_q – log variance of q

  • mu_p – mean of p

  • logvar_p – log variance of p

Returns:

kl – KL divergence

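The corresponding closed form for two diagonal Gaussians, again with an assumed reduction:

   import torch

   def kl_two_gaussians(mu_q, logvar_q, mu_p, logvar_p):
       # KL( N(mu_q, exp(logvar_q)) || N(mu_p, exp(logvar_p)) )
       var_q, var_p = logvar_q.exp(), logvar_p.exp()
       kl = 0.5 * (logvar_p - logvar_q
                   + (var_q + (mu_q - mu_p).pow(2)) / var_p - 1)
       return kl.sum(dim=1).mean()
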
scMRDR.loss.mseLoss(x, y, mask=None)

Mean Squared Error Loss

Parameters:
  • x – predicted values (batch_size, num_features)

  • y – target values (batch_size, num_features)

  • mask – optional mask to ignore certain elements in the loss computation (batch_size, num_features)

Returns:

mean_loss – mean squared error loss across the batch

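A sketch of the masked reduction (how the mask enters is an assumption: masked-out elements contribute to neither numerator nor denominator):

   import torch

   def mse_masked(x, y, mask=None):
       se = (x - y) ** 2                      # element-wise squared error
       if mask is None:
           return se.mean()
       return (se * mask).sum() / mask.sum().clamp(min=1)
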
scMRDR.model module

class scMRDR.model.Decoder(device, input_dim=3000, covariate_dim=1, modality_num=2, layer_dims=[500, 100], latent_dim=20, dropout_rate=0.5)

Bases: Module

ZINB Decoder for the VAE model.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • covariate_dim (int) – Dimension of the covariates (such as sequencing batches).

  • modality_num (int) – Number of modalities.

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim (int) – Dimension of the latent space.

  • dropout_rate (float) – Dropout rate for regularization.

forward(z, b, m, dispersion_strategy='gene-modality')

Forward pass through the decoder.

Parameters:
  • z (torch.Tensor) – Latent variable tensor of shape (batch_size, latent_dim).

  • b (torch.Tensor) – Batch information tensor of shape (batch_size, covariate_dim).

  • m (torch.Tensor) – Modality information tensor of shape (batch_size, modality_num).

  • dispersion_strategy (str) – Strategy used to parameterize the dispersion (default: “gene-modality”).

Returns:
  • rho (torch.Tensor) – Mean of the output distribution.

  • dispersion (torch.Tensor) – Dispersion parameter of the output distribution.

  • pi (torch.Tensor) – Dropout probabilities for the output distribution.

class scMRDR.model.EmbeddingNet(device, input_dim, modality_num, covariate_dim=1, celltype_num=0, layer_dims=[500, 100], latent_dim_shared=20, latent_dim_specific=20, dropout_rate=0.5, beta=2, gamma=1, lambda_adv=0.01, feat_mask=None, distribution='ZINB', encoder_covariates=False, eps=1e-10)

Bases: Module

Model that produces the unified latent embeddings.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • modality_num (int) – Number of modalities.

  • covariate_dim (int) – Dimension of the covariates (like sequencing batches).

  • celltype_num (int) – Dimension of the cell type information. Default is 0.

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim_shared (int) – Dimension of the shared latent space.

  • latent_dim_specific (int) – Dimension of the modality-specific latent space.

  • dropout_rate (float) – Dropout rate for regularization.

  • beta (float) – Weight for the KL divergence term.

  • gamma (float) – Weight for the isometric loss term.

  • lambda_adv (float) – Weight for the adversarial loss term.

  • feat_mask (torch.Tensor) – Feature mask for the input data.

  • distribution (str) – Distribution of the data, can be “ZINB”, “NB”, “Normal”, “Normal_positive”.

  • encoder_covariates (bool) – Whether to include covariates in the encoder.

  • eps (float) – Small value to avoid division by zero in loss calculations.

forward(x, b, m, i, w, stage='vae')

Forward pass through the embedding network.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_size, input_dim).

  • b (torch.Tensor) – Batch information tensor of shape (batch_size, covariate_dim).

  • m (torch.Tensor) – Modality information tensor of shape (batch_size, modality_num).

  • i (torch.Tensor) – Mask indicator tensor of shape (batch_size, input_dim).

  • w (torch.Tensor) – Cell type information tensor of shape (batch_size, celltype_num).

  • stage (str) – Stage of the model, can be “vae”, “discriminator”, or “warmup”.

Returns:
  • mu_shared (torch.Tensor) – Mean of the shared latent variable distribution.

  • mu_specific (torch.Tensor) – Mean of the specific latent variable distribution.

  • total_loss (torch.Tensor) – Total loss for the VAE model.

  • loss_dict (dict) – Dictionary containing individual loss components.

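A forward-pass sketch with dummy tensors; the shapes of i and w follow the CombinedDataset description and are assumptions:

   import torch
   from scMRDR.model import EmbeddingNet

   device = torch.device("cpu")
   model = EmbeddingNet(device, input_dim=3000, modality_num=2, celltype_num=3)

   B = 64
   x = torch.rand(B, 3000)                                 # input features
   b = torch.zeros(B, 1)                                   # covariates
   m = torch.nn.functional.one_hot(
       torch.randint(0, 2, (B,)), num_classes=2).float()   # modality indicator
   i = torch.randint(0, 2, (B,))                           # mask-group index
   w = torch.nn.functional.one_hot(
       torch.randint(0, 3, (B,)), num_classes=3).float()   # cell types

   mu_shared, mu_specific, total_loss, loss_dict = model(x, b, m, i, w, stage="vae")
   total_loss.backward()
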
reparameterize(mu, logvar)

Reparameterization trick to sample from the latent variable distribution.

Parameters:
  • mu (torch.Tensor) – Mean of the latent variable distribution.

  • logvar (torch.Tensor) – Log variance of the latent variable distribution.

Returns:

z (torch.Tensor) – Sampled latent variable tensor.

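The trick itself, as typically written:

   import torch

   def reparameterize(mu, logvar):
       std = torch.exp(0.5 * logvar)   # standard deviation from log variance
       eps = torch.randn_like(std)     # noise drawn from N(0, 1)
       return mu + eps * std           # differentiable sample from N(mu, std^2)
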
sample_sequencing_depth(x, strategy='observed')

Sample sequencing depth based on the strategy.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_size, input_dim).

  • strategy (str) – Strategy for sampling sequencing depth, can be “batch_sample” or “observed”.

Returns:

s (torch.Tensor) – Sampled sequencing depth tensor of shape (batch_size, 1).

class scMRDR.model.Encoder(device, input_dim=3000, layer_dims=[500, 100], latent_dim=20, dropout_rate=0.5)

Bases: Module

Encoder for the VAE model.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim (int) – Dimension of the latent space.

  • dropout_rate (float) – Dropout rate for regularization.

forward(x)

Forward pass through the encoder.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, input_dim).

Returns:
  • z (torch.Tensor) – Latent variable tensor of shape (batch_size, latent_dim).

  • mu (torch.Tensor) – Mean of the latent variable distribution.

  • logvar (torch.Tensor) – Log variance of the latent variable distribution.

reparameterize(mu, logvar)

Reparameterization trick to sample from the latent variable distribution.

Parameters:
  • mu (torch.Tensor) – Mean of the latent variable distribution.

  • logvar (torch.Tensor) – Log variance of the latent variable distribution.

Returns:

z (torch.Tensor) – Sampled latent variable tensor.

class scMRDR.model.MSEDecoder(device, input_dim=3000, covariate_dim=1, layer_dims=[500, 100], latent_dim=20, dropout_rate=0.5, positive_outputs=True)

Bases: Module

MSE Decoder for the VAE model.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • covariate_dim (int) – Dimension of the covariates (such as sequencing batches).

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim (int) – Dimension of the latent space.

  • dropout_rate (float) – Dropout rate for regularization.

  • positive_outputs (bool) – Whether to constrain the decoder outputs to be positive.

forward(z, b)

Forward pass through the decoder.

Parameters:
  • z (torch.Tensor) – Latent variable tensor of shape (batch_size, latent_dim).

  • b (torch.Tensor) – Batch information tensor of shape (batch_size, covariate_dim).

Returns:

rho (torch.Tensor) – Mean of the output distribution.

class scMRDR.model.ModalityDiscriminator(z_dim, num_modalities, layer_dims=[128, 128], dropout_rate=0.2)

Bases: Module

Discriminator for modality classification.

Parameters:
  • z_dim (int) – Dimension of the input latent space.

  • num_modalities (int) – Number of modalities to classify.

  • layer_dims (list) – List of hidden layer dimensions.

  • dropout_rate (float) – Dropout rate for regularization.

forward(z)

Forward pass through the discriminator.

Parameters:

z (torch.Tensor) – Input tensor of shape (batch_size, z_dim).

Returns:

torch.Tensor – Output tensor of shape (batch_size, num_modalities).

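A sketch of the adversarial step that the lambda_adv weight suggests; the training-loop details are assumptions:

   import torch
   import torch.nn.functional as F
   from scMRDR.model import ModalityDiscriminator

   disc = ModalityDiscriminator(z_dim=20, num_modalities=2)
   z = torch.randn(64, 20)                      # shared latent embeddings
   target = torch.randint(0, 2, (64,))          # true modality labels
   d_loss = F.cross_entropy(disc(z), target)    # discriminator objective;
   # the encoder is trained to confuse the discriminator, weighted by lambda_adv.
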
class scMRDR.model.NBDecoder(device, input_dim=3000, covariate_dim=1, modality_num=2, layer_dims=[500, 100], latent_dim=20, dropout_rate=0.5)

Bases: Module

NB Decoder for the VAE model.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • covariate_dim (int) – Dimension of the covariates (such as sequencing batches).

  • modality_num (int) – Number of modalities.

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim (int) – Dimension of the latent space.

  • dropout_rate (float) – Dropout rate for regularization.

forward(z, b, m, dispersion_strategy='gene-modality')

Forward pass through the decoder.

Parameters:
  • z (torch.Tensor) – Latent variable tensor of shape (batch_size, latent_dim).

  • b (torch.Tensor) – Batch information tensor of shape (batch_size, covariate_dim).

  • m (torch.Tensor) – Modality information tensor of shape (batch_size, modality_num).

  • dispersion_strategy (str) – Strategy used to parameterize the dispersion (default: “gene-modality”).

Returns:
  • rho (torch.Tensor) – Mean of the output distribution.

  • dispersion (torch.Tensor) – Dispersion parameter of the output distribution.

  • pi (torch.Tensor) – Dropout probabilities for the output distribution.

scMRDR.module module

class scMRDR.module.Integration(data, layer=None, modality_key='modality', batch_key=None, celltype_key=None, distribution='ZINB', mask_key=None, feature_list=None)

Bases: object

Integration class wrapping data preparation, model setup, training, inference, and cross-modality prediction.

Parameters:
  • data – AnnData object

  • layer – str, layer name in adata.layers containing the data to be integrated

  • modality_key – str, key in adata.obs for modality information

  • batch_key – str, key in adata.obs for batch information

  • celltype_key – str, key in adata.obs for cell type information (optional)

  • distribution – str, distribution of the data, can be “ZINB”, “NB”, “Normal”, “Normal_positive”

  • feature_list – dictionary containing the unmasked feature indices for each mask group (by default, each modality). Default is None, indicating that all features are unmasked.

  • mask_key – str, key in adata.obs to indicate mask information, corresponding to feature_list. Default is None, indicating modality_key will be used.

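A typical end-to-end workflow using the methods below (the file name and obs keys are illustrative assumptions):

   import anndata as ad
   from scMRDR.module import Integration

   adata = ad.read_h5ad("combined.h5ad")        # unpaired multi-omics AnnData
   integration = Integration(adata, modality_key="modality", batch_key="batch")
   integration.setup(hidden_layers=[100, 50], latent_dim_shared=15)
   integration.train(epoch_num=200, batch_size=64)
   integration.inference()
   adata = integration.get_adata()              # latent embeddings stored in .obsm
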
get_adata()

Get the AnnData object with latent embeddings.

Returns:

AnnData object with latent embeddings in obsm.

inference(n_samples=1, dataset=None, batch_size=None, update=True, returns=False)

Run inference with the model.

Parameters:
  • n_samples – int, number of samples to average over in the reparameterization trick

  • dataset – dataset to use for inference

  • batch_size – int, batch size

  • update – bool, whether to update the latent embeddings in the adata

  • returns – bool, whether to return the results (the shared and modality-specific latent embeddings)

predict(predict_modality, batch_size=None, strategy='observed', library_size=None, method='ot', k=10)

Predict the missing modality data.

Parameters:
  • predict_modality – str, modality to predict

  • batch_size – int, batch size

  • strategy – str, strategy for predicting the missing modality (default: “observed”):

    - “observed”: use the observed data from the other modalities to predict the missing modality.

    - “latent”: use the latent embeddings to predict the missing modality.

  • library_size – array, library size used for generation. Default is None, in which case the library size estimated by the model is used.

  • method – str, method to use for prediction, can be “ot” or “knn”

  • k – int, number of neighbors for knn method

Returns:

x_pred – predicted data for the missing modality

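Continuing the workflow sketched above, a missing modality can be imputed (the modality name “ATAC” is an illustrative assumption):

   x_pred = integration.predict("ATAC", strategy="observed", method="ot")
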
setup(hidden_layers=[100, 50], latent_dim_shared=15, latent_dim_specific=15, dropout_rate=0.5, beta=2, gamma=1, lambda_adv=0.01, device=None)

Setup the model.

Parameters:
  • hidden_layers – list, hidden layer dimensions of the model

  • latent_dim_shared – int, latent dimension of the shared latent space

  • latent_dim_specific – int, latent dimension of the specific latent space

  • dropout_rate – float, dropout rate in neural network

  • beta – float, weight for the KL divergence term

  • gamma – float, weight for the isometric loss term

  • lambda_adv – float, lambda parameter for the adversarial loss

  • device – device on which to train the model. Default is None, in which case a GPU is used if available.

train(epoch_num=200, batch_size=64, lr=1e-05, accumulation_steps=1, adaptlr=False, valid_prop=0.1, num_warmup=0, early_stopping=True, patience=10, weighted=False, tensorboard=False, savepath='./', random_state=42)

Train the model.

Parameters:
  • epoch_num – int, number of epochs

  • batch_size – int, batch size

  • lr – float, learning rate

  • accumulation_steps – int, number of steps to accumulate gradients

  • adaptlr – bool, whether to adapt learning rate

  • valid_prop – float, proportion of data to use for validation

  • num_warmup – int, number of warmup epochs

  • early_stopping – bool, whether to use early stopping

  • patience – int, patience for early stopping

  • weighted – bool, whether to use weighted sampling based on modality sizes

  • tensorboard – bool, whether to use tensorboard

  • savepath – str, path to save the tensorboard logs

  • random_state – int, random seed

scMRDR.module.to_dense_array(x)

Convert input to a dense numpy array.

Parameters:

x – Input data, can be a sparse matrix, numpy array, or other types.

Returns:

Dense numpy array.

scMRDR.train module

class scMRDR.train.EarlyStopping(patience=10, delta=0.0, verbose=False)

Bases: object

Early stopping for training.

Parameters:
  • patience – int, patience for early stopping

  • delta – float, minimum change in the monitored loss to qualify as an improvement

  • verbose – bool, whether to print early stopping information

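A sketch of the usual early-stopping pattern; the call and attribute interface (stopper(val_loss), stopper.early_stop) is an assumption about this class:

   from scMRDR.train import EarlyStopping

   stopper = EarlyStopping(patience=10, delta=0.0, verbose=True)
   for epoch in range(200):
       val_loss = run_validation_epoch()   # hypothetical validation step
       stopper(val_loss)                   # assumed __call__ interface
       if stopper.early_stop:              # assumed attribute
           break
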
scMRDR.train.inference_model(device, inference_dataset, model, batch_size)

Run inference with the model.

Parameters:
  • device – device to inference the model

  • inference_dataset – dataset to run inference on

  • model – model to run inference with

  • batch_size – batch size

scMRDR.train.train_model(device, writer, train_dataset, validate_dataset, model, epoch_num, batch_size, num_batch, lr, accumulation_steps=1, num_warmup=0, adaptlr=False, early_stopping=True, patience=25, sample_weights=None)

Train the model.

Parameters:
  • device – device to train the model

  • writer – writer for logging training progress (e.g., a TensorBoard SummaryWriter)

  • train_dataset – training dataset

  • validate_dataset – validation dataset

  • model – model to train

  • epoch_num – number of epochs

  • batch_size – batch size

  • num_batch – number of batches

  • lr – learning rate

  • accumulation_steps – number of steps to accumulate gradients

  • num_warmup – number of warmup epochs

  • adaptlr – whether to adapt learning rate

  • early_stopping – whether to use early stopping

  • patience – patience for early stopping

  • sample_weights – sample weights for weighted sampling

scMRDR.train.validate_model(device, validate_dataset, model, batch_size)

Validate the model.

Parameters:
  • device – device to validate the model

  • validate_dataset – validation dataset

  • model – model to validate

  • batch_size – batch size