API

Module contents

A package to integrate unpaired multi-omics single-cell data via single-cell Multi-omics Regularized Disentangled Representations (scMRDR).

Submodules

scMRDR.data module

class scMRDR.data.CombinedDataset(X, b, m, i, w)

Bases: Dataset

Dataset holding the combined multi-omics data together with its per-sample annotations.

Parameters:
  • X – (n, d) feature matrix

  • b – (n, ) covariates, such as sequencing batches

  • m – (n, ) one-hot encoded modality index

  • i – (n, ) index indicating which masked-feature group each sample belongs to

  • w – (n, ) one-hot encoded cell type index

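A minimal construction sketch (the tensor shapes, dtypes, and DataLoader usage are illustrative assumptions, not part of the API):

   import torch
   from torch.utils.data import DataLoader
   from scMRDR.data import CombinedDataset

   n, d = 1000, 3000                                       # illustrative sizes
   X = torch.randn(n, d)                                   # feature matrix
   b = torch.zeros(n, 1)                                   # covariates (single batch)
   m = torch.nn.functional.one_hot(
       torch.randint(0, 2, (n,)), num_classes=2).float()   # modality indicator
   i = torch.randint(0, 2, (n,))                           # masked-feature group index
   w = torch.nn.functional.one_hot(
       torch.randint(0, 3, (n,)), num_classes=3).float()   # cell type indicator

   dataset = CombinedDataset(X, b, m, i, w)
   loader = DataLoader(dataset, batch_size=64, shuffle=True)
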
scMRDR.loss module

class scMRDR.loss.ZINBLoss

Bases: Module

Zero-Inflated Negative Binomial loss. This loss is used for modeling count data with excess zeros: it combines a zero-inflation component with a negative binomial distribution.

Parameters:
  • x – observed count data (batch_size, num_features)

  • rho – mean parameter of the negative binomial distribution (batch_size, num_features)

  • dispersion – dispersion parameter of the negative binomial distribution (batch_size, num_features)

  • pi – zero-inflation probability (batch_size, num_features)

  • s – scaling factor (batch_size, num_features)

  • mask – optional mask to ignore certain elements in the loss computation (batch_size, num_features)

  • eps – small value to avoid log(0) (default: 1e-8)

Returns:

mean_loss – mean loss value across the batch

forward(x, rho, dispersion, pi, s, mask=None, eps=1e-08)

Compute the zero-inflated negative binomial loss; the parameters and return value are those documented on the class above.

Note

Call the module instance rather than forward() directly: the instance call runs the registered hooks, while a direct forward() call silently skips them.

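A usage sketch with dummy parameter tensors (the shapes follow the docstring above; the toy values are assumptions):

   import torch
   from scMRDR.loss import ZINBLoss

   B, G = 64, 3000                                  # batch and feature sizes
   x = torch.poisson(torch.rand(B, G) * 5)          # observed counts
   rho = torch.softmax(torch.randn(B, G), dim=-1)   # NB mean parameter
   dispersion = torch.rand(B, G) + 0.1              # NB dispersion
   pi = torch.sigmoid(torch.randn(B, G))            # zero-inflation probability
   s = x.sum(dim=1, keepdim=True).expand(B, G)      # scaling factor

   loss_fn = ZINBLoss()
   loss = loss_fn(x, rho, dispersion, pi, s)        # call the instance, not forward()
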
scMRDR.loss.isometric_loss(X, X_prime, m, p=2)

Compute the isometric loss, preserving the pairwise-distance structure within each class separately.

Parameters:
  • X – Feature matrix in the original space (batch_size, feature_dim)

  • X_prime – Feature matrix in the latent space (batch_size, latent_dim)

  • m – One-hot encoded class labels (batch_size, num_classes)

  • p – Norm type for distance computation (default: Euclidean distance, p=2)

Returns:

loss – Isometric Loss (Mean Squared Error between pairwise distances within each class)

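A reference sketch of the quantity being computed, assuming the per-class pairwise-distance comparison described above (the actual implementation may differ in reduction details):

   import torch

   def isometric_loss_sketch(X, X_prime, m, p=2):
       loss, count = 0.0, 0
       labels = m.argmax(dim=1)                       # recover class labels from one-hot
       for c in labels.unique():
           idx = labels == c
           if idx.sum() < 2:                          # need at least one pair
               continue
           d_x = torch.cdist(X[idx], X[idx], p=p)     # distances in original space
           d_z = torch.cdist(X_prime[idx], X_prime[idx], p=p)  # distances in latent space
           loss = loss + ((d_x - d_z) ** 2).mean()    # MSE between distance matrices
           count += 1
       return loss / max(count, 1)
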
scMRDR.loss.klLoss(mu, logvar)

Compute KL divergence between q(z|x) ~ N(mu, exp(logvar)) and p(z) ~ N(0, 1).

Parameters:
  • mu – Mean of q(z|x) (batch_size, latent_dim)

  • logvar – Log variance of q(z|x) (batch_size, latent_dim)

Returns:

kl – KL divergence, reduced over the batch (scalar)

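The standard closed form for this term, as a sketch (the reduction over dimensions and over the batch is an assumption):

   import torch

   def kl_standard_normal(mu, logvar):
       # KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent
       # dimensions and averaged over the batch.
       return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
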
scMRDR.loss.klLoss_prior(mu_q, logvar_q, mu_p, logvar_p)

Compute KL(q || p) for two Gaussians q(z|x) ~ N(mu_q, exp(logvar_q)) and p(z) ~ N(mu_p, exp(logvar_p)).

Parameters:
  • mu_q – mean of q

  • logvar_q – log variance of q

  • mu_p – mean of p

  • logvar_p – log variance of p

Returns:

kl – KL divergence

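The corresponding closed form for two diagonal Gaussians, again with an assumed reduction:

   import torch

   def kl_two_gaussians(mu_q, logvar_q, mu_p, logvar_p):
       # KL( N(mu_q, exp(logvar_q)) || N(mu_p, exp(logvar_p)) )
       var_q, var_p = logvar_q.exp(), logvar_p.exp()
       kl = 0.5 * (logvar_p - logvar_q
                   + (var_q + (mu_q - mu_p).pow(2)) / var_p - 1)
       return kl.sum(dim=1).mean()
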
scMRDR.loss.mseLoss(x, y, mask=None)

Mean Squared Error Loss

Parameters:
  • x – predicted values (batch_size, num_features)

  • y – target values (batch_size, num_features)

  • mask – optional mask to ignore certain elements in the loss computation (batch_size, num_features)

Returns:

mean_loss – mean squared error loss across the batch

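A sketch of the masked reduction (how the mask enters is an assumption: masked-out elements contribute to neither numerator nor denominator):

   import torch

   def mse_masked(x, y, mask=None):
       se = (x - y) ** 2                      # element-wise squared error
       if mask is None:
           return se.mean()
       return (se * mask).sum() / mask.sum().clamp(min=1)
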
scMRDR.model module

class scMRDR.model.Decoder(device, input_dim=3000, covariate_dim=1, modality_num=2, layer_dims=[500, 100], latent_dim=20, dropout_rate=0.5)

Bases: Module

ZINB Decoder for the VAE model.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • covariate_dim (int) – Dimension of the covariates (such as sequencing batches).

  • modality_num (int) – Number of modalities.

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim (int) – Dimension of the latent space.

  • dropout_rate (float) – Dropout rate for regularization.

forward(z, b, m, dispersion_strategy='gene-modality')

Forward pass through the decoder.

Parameters:
  • z (torch.Tensor) – Latent variable tensor of shape (batch_size, latent_dim).

  • b (torch.Tensor) – Batch information tensor of shape (batch_size, covariate_dim).

  • m (torch.Tensor) – Modality information tensor of shape (batch_size, modality_num).

  • dispersion_strategy (str) – Strategy used to parameterize the dispersion (default: “gene-modality”).

Returns:
  • rho (torch.Tensor) – Mean of the output distribution.

  • dispersion (torch.Tensor) – Dispersion parameter of the output distribution.

  • pi (torch.Tensor) – Dropout probabilities for the output distribution.

class scMRDR.model.EmbeddingNet(device, input_dim, modality_num, covariate_dim=1, celltype_num=0, layer_dims=[500, 100], latent_dim_shared=20, latent_dim_specific=20, dropout_rate=0.5, beta=2, gamma=1, lambda_adv=0.01, feat_mask=None, distribution='ZINB', encoder_covariates=False, eps=1e-10)

Bases: Module

Model that produces the unified latent embeddings.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • modality_num (int) – Number of modalities.

  • covariate_dim (int) – Dimension of the covariates (like sequencing batches).

  • celltype_num (int) – Dimension of the cell type information. Default is 0.

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim_shared (int) – Dimension of the shared latent space.

  • latent_dim_specific (int) – Dimension of the modality-specific latent space.

  • dropout_rate (float) – Dropout rate for regularization.

  • beta (float) – Weight for the KL divergence term.

  • gamma (float) – Weight for the isometric loss term.

  • lambda_adv (float) – Weight for the adversarial loss term.

  • feat_mask (torch.Tensor) – Feature mask for the input data.

  • distribution (str) – Distribution of the data, can be “ZINB”, “NB”, “Normal”, “Normal_positive”.

  • encoder_covariates (bool) – Whether to include covariates in the encoder.

  • eps (float) – Small value to avoid division by zero in loss calculations.

forward(x, b, m, i, w, stage='vae')

Forward pass through the embedding network.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_size, input_dim).

  • b (torch.Tensor) – Batch information tensor of shape (batch_size, covariate_dim).

  • m (torch.Tensor) – Modality information tensor of shape (batch_size, modality_num).

  • i (torch.Tensor) – Mask indicator tensor of shape (batch_size, input_dim).

  • w (torch.Tensor) – Cell type information tensor of shape (batch_size, celltype_num).

  • stage (str) – Stage of the model, can be “vae”, “discriminator”, or “warmup”.

Returns:
  • mu_shared (torch.Tensor) – Mean of the shared latent variable distribution.

  • mu_specific (torch.Tensor) – Mean of the specific latent variable distribution.

  • total_loss (torch.Tensor) – Total loss for the VAE model.

  • loss_dict (dict) – Dictionary containing individual loss components.

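A forward-pass sketch with dummy tensors; the shapes of i and w follow the CombinedDataset description and are assumptions:

   import torch
   from scMRDR.model import EmbeddingNet

   device = torch.device("cpu")
   model = EmbeddingNet(device, input_dim=3000, modality_num=2, celltype_num=3)

   B = 64
   x = torch.rand(B, 3000)                                 # input features
   b = torch.zeros(B, 1)                                   # covariates
   m = torch.nn.functional.one_hot(
       torch.randint(0, 2, (B,)), num_classes=2).float()   # modality indicator
   i = torch.randint(0, 2, (B,))                           # mask-group index
   w = torch.nn.functional.one_hot(
       torch.randint(0, 3, (B,)), num_classes=3).float()   # cell types

   mu_shared, mu_specific, total_loss, loss_dict = model(x, b, m, i, w, stage="vae")
   total_loss.backward()
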
reparameterize(mu, logvar)

Reparameterization trick to sample from the latent variable distribution.

Parameters:
  • mu (torch.Tensor) – Mean of the latent variable distribution.

  • logvar (torch.Tensor) – Log variance of the latent variable distribution.

Returns:

z (torch.Tensor) – Sampled latent variable tensor.

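The trick itself, as typically written:

   import torch

   def reparameterize(mu, logvar):
       std = torch.exp(0.5 * logvar)   # standard deviation from log variance
       eps = torch.randn_like(std)     # noise drawn from N(0, 1)
       return mu + eps * std           # differentiable sample from N(mu, std^2)
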
sample_sequencing_depth(x, strategy='observed')

Sample sequencing depth based on the strategy.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_size, input_dim).

  • strategy (str) – Strategy for sampling sequencing depth, can be “batch_sample” or “observed”.

Returns:

s (torch.Tensor) – Sampled sequencing depth tensor of shape (batch_size, 1).

class scMRDR.model.Encoder(device, input_dim=3000, layer_dims=[500, 100], latent_dim=20, dropout_rate=0.5)

Bases: Module

Encoder for the VAE model.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim (int) – Dimension of the latent space.

  • dropout_rate (float) – Dropout rate for regularization.

forward(x)

Forward pass through the encoder.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, input_dim).

Returns:
  • z (torch.Tensor) – Latent variable tensor of shape (batch_size, latent_dim).

  • mu (torch.Tensor) – Mean of the latent variable distribution.

  • logvar (torch.Tensor) – Log variance of the latent variable distribution.

reparameterize(mu, logvar)

Reparameterization trick to sample from the latent variable distribution.

Parameters:
  • mu (torch.Tensor) – Mean of the latent variable distribution.

  • logvar (torch.Tensor) – Log variance of the latent variable distribution.

Returns:

z (torch.Tensor) – Sampled latent variable tensor.

class scMRDR.model.MSEDecoder(device, input_dim=3000, covariate_dim=1, layer_dims=[500, 100], latent_dim=20, dropout_rate=0.5, positive_outputs=True)

Bases: Module

MSE Decoder for the VAE model.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • covariate_dim (int) – Dimension of the covariates (such as sequencing batches).

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim (int) – Dimension of the latent space.

  • dropout_rate (float) – Dropout rate for regularization.

  • positive_outputs (bool) – Whether to constrain the decoder outputs to be positive.

forward(z, b)

Forward pass through the decoder.

Parameters:
  • z (torch.Tensor) – Latent variable tensor of shape (batch_size, latent_dim).

  • b (torch.Tensor) – Batch information tensor of shape (batch_size, covariate_dim).

Returns:

rho (torch.Tensor) – Mean of the output distribution.

class scMRDR.model.ModalityDiscriminator(z_dim, num_modalities, layer_dims=[128, 128], dropout_rate=0.2)

Bases: Module

Discriminator for modality classification.

Parameters:
  • z_dim (int) – Dimension of the input latent space.

  • num_modalities (int) – Number of modalities to classify.

  • layer_dims (list) – List of hidden layer dimensions.

  • dropout_rate (float) – Dropout rate for regularization.

forward(z)

Forward pass through the discriminator.

Parameters:

z (torch.Tensor) – Input tensor of shape (batch_size, z_dim).

Returns:

torch.Tensor – Output tensor of shape (batch_size, num_modalities).

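A sketch of the adversarial step that the lambda_adv weight suggests; the training-loop details are assumptions:

   import torch
   import torch.nn.functional as F
   from scMRDR.model import ModalityDiscriminator

   disc = ModalityDiscriminator(z_dim=20, num_modalities=2)
   z = torch.randn(64, 20)                      # shared latent embeddings
   target = torch.randint(0, 2, (64,))          # true modality labels
   d_loss = F.cross_entropy(disc(z), target)    # discriminator objective;
   # the encoder is trained to confuse the discriminator, weighted by lambda_adv.
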
class scMRDR.model.NBDecoder(device, input_dim=3000, covariate_dim=1, modality_num=2, layer_dims=[500, 100], latent_dim=20, dropout_rate=0.5)

Bases: Module

NB Decoder for the VAE model.

Parameters:
  • device (torch.device) – Device to run the model on.

  • input_dim (int) – Dimension of the input data.

  • covariate_dim (int) – Dimension of the covariates (such as sequencing batches).

  • modality_num (int) – Number of modalities.

  • layer_dims (list) – List of hidden layer dimensions.

  • latent_dim (int) – Dimension of the latent space.

  • dropout_rate (float) – Dropout rate for regularization.

forward(z, b, m, dispersion_strategy='gene-modality')

Forward pass through the decoder.

Parameters:
  • z (torch.Tensor) – Latent variable tensor of shape (batch_size, latent_dim).

  • b (torch.Tensor) – Batch information tensor of shape (batch_size, covariate_dim).

  • m (torch.Tensor) – Modality information tensor of shape (batch_size, modality_num).

  • dispersion_strategy (str) – Strategy used to parameterize the dispersion (default: “gene-modality”).

Returns:
  • rho (torch.Tensor) – Mean of the output distribution.

  • dispersion (torch.Tensor) – Dispersion parameter of the output distribution.

  • pi (torch.Tensor) – Dropout probabilities for the output distribution.

scMRDR.module module

class scMRDR.module.Integration(data, layer=None, modality_key='modality', batch_key=None, celltype_key=None, distribution='ZINB', mask_key=None, feature_list=None)

Bases: object

Integration class wrapping data preparation, model setup, training, inference, and cross-modality prediction.

Parameters:
  • data – AnnData object

  • layer – str, layer name in adata.layers containing the data to be integrated

  • modality_key – str, key in adata.obs for modality information

  • batch_key – str, key in adata.obs for batch information

  • celltype_key – str, key in adata.obs for cell type information (optional)

  • distribution – str, distribution of the data, can be “ZINB”, “NB”, “Normal”, “Normal_positive”

  • feature_list – dictionary containing the unmasked feature indices for each mask group (by default, each modality). Default is None, indicating that all features are unmasked.

  • mask_key – str, key in adata.obs to indicate mask information, corresponding to feature_list. Default is None, indicating modality_key will be used.

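A typical end-to-end workflow using the methods below (the file name and obs keys are illustrative assumptions):

   import anndata as ad
   from scMRDR.module import Integration

   adata = ad.read_h5ad("combined.h5ad")        # unpaired multi-omics AnnData
   integration = Integration(adata, modality_key="modality", batch_key="batch")
   integration.setup(hidden_layers=[100, 50], latent_dim_shared=15)
   integration.train(epoch_num=200, batch_size=64)
   integration.inference()
   adata = integration.get_adata()              # latent embeddings stored in .obsm
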
get_adata()

Get the AnnData object with latent embeddings.

Returns:

AnnData object with latent embeddings in obsm.

inference(n_samples=1, dataset=None, batch_size=None, update=True, returns=False)

Run inference with the model.

Parameters:
  • n_samples – int, number of samples to average over in the reparameterization trick

  • dataset – dataset to use for inference

  • batch_size – int, batch size

  • update – bool, whether to update the latent embeddings in the adata

  • returns – bool, whether to return the results (the shared and modality-specific latent embeddings)

predict(predict_modality, batch_size=None, strategy='observed', library_size=None, method='ot', k=10)

Predict the missing modality data.

Parameters:
  • predict_modality – str, modality to predict

  • batch_size – int, batch size

  • strategy – str, strategy for predicting the missing modality (default: “observed”):

    - “observed”: use the observed data from the other modalities to predict the missing modality.

    - “latent”: use the latent embeddings to predict the missing modality.

  • library_size – array, library size used for generation. Default is None, in which case the library size estimated by the model is used.

  • method – str, method to use for prediction, can be “ot” or “knn”

  • k – int, number of neighbors for knn method

Returns:

x_pred – predicted data for the missing modality

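Continuing the workflow sketched above, a missing modality can be imputed (the modality name “ATAC” is an illustrative assumption):

   x_pred = integration.predict("ATAC", strategy="observed", method="ot")
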
setup(hidden_layers=[100, 50], latent_dim_shared=15, latent_dim_specific=15, dropout_rate=0.5, beta=2, gamma=1, lambda_adv=0.01, device=None)

Setup the model.

Parameters:
  • hidden_layers – list, hidden layer dimensions of the model

  • latent_dim_shared – int, latent dimension of the shared latent space

  • latent_dim_specific – int, latent dimension of the specific latent space

  • dropout_rate – float, dropout rate in neural network

  • beta – float, weight for the KL divergence term

  • gamma – float, weight for the isometric loss term

  • lambda_adv – float, lambda parameter for the adversarial loss

  • device – device on which to train the model. Default is None, in which case a GPU is used if available.

train(epoch_num=200, batch_size=64, lr=1e-05, accumulation_steps=1, adaptlr=False, valid_prop=0.1, num_warmup=0, early_stopping=True, patience=10, weighted=False, tensorboard=False, savepath='./', random_state=42)

Train the model.

Parameters:
  • epoch_num – int, number of epochs

  • batch_size – int, batch size

  • lr – float, learning rate

  • accumulation_steps – int, number of steps to accumulate gradients

  • adaptlr – bool, whether to adapt learning rate

  • valid_prop – float, proportion of data to use for validation

  • num_warmup – int, number of warmup epochs

  • early_stopping – bool, whether to use early stopping

  • patience – int, patience for early stopping

  • weighted – bool, whether to use weighted sampling based on modality sizes

  • tensorboard – bool, whether to use tensorboard

  • savepath – str, path to save the tensorboard logs

  • random_state – int, random seed

scMRDR.module.to_dense_array(x)

Convert input to a dense numpy array.

Parameters:

x – Input data, can be a sparse matrix, numpy array, or other types.

Returns:

Dense numpy array.

scMRDR.train module

class scMRDR.train.EarlyStopping(patience=10, delta=0.0, verbose=False)

Bases: object

Early stopping for training.

Parameters:
  • patience – int, patience for early stopping

  • delta – float, minimum change in the monitored loss to qualify as an improvement

  • verbose – bool, whether to print early stopping information

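A sketch of the usual early-stopping pattern; the call and attribute interface (stopper(val_loss), stopper.early_stop) is an assumption about this class:

   from scMRDR.train import EarlyStopping

   stopper = EarlyStopping(patience=10, delta=0.0, verbose=True)
   for epoch in range(200):
       val_loss = run_validation_epoch()   # hypothetical validation step
       stopper(val_loss)                   # assumed __call__ interface
       if stopper.early_stop:              # assumed attribute
           break
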
scMRDR.train.inference_model(device, inference_dataset, model, batch_size)

Run inference with the model.

Parameters:
  • device – device to inference the model

  • inference_dataset – dataset to run inference on

  • model – model to run inference with

  • batch_size – batch size

scMRDR.train.train_model(device, writer, train_dataset, validate_dataset, model, epoch_num, batch_size, num_batch, lr, accumulation_steps=1, num_warmup=0, adaptlr=False, early_stopping=True, patience=25, sample_weights=None)

Train the model.

Parameters:
  • device – device to train the model

  • writer – writer for logging training progress (e.g., a TensorBoard SummaryWriter)

  • train_dataset – training dataset

  • validate_dataset – validation dataset

  • model – model to train

  • epoch_num – number of epochs

  • batch_size – batch size

  • num_batch – number of batches

  • lr – learning rate

  • accumulation_steps – number of steps to accumulate gradients

  • num_warmup – number of warmup epochs

  • adaptlr – whether to adapt learning rate

  • early_stopping – whether to use early stopping

  • patience – patience for early stopping

  • sample_weights – sample weights for weighted sampling

scMRDR.train.validate_model(device, validate_dataset, model, batch_size)

Validate the model.

Parameters:
  • device – device to validate the model

  • validate_dataset – validation dataset

  • model – model to validate

  • batch_size – batch size