High-level entry point for exploring the natural clustering structure of pairwise TCR distances. For each level of splitField, the function extracts the within-group distances from assayName, downsamples them to maxSamples via quantile-stratified sampling (deciles with the default nBins = 10), fits a Gaussian Dirichlet process mixture, and collects the per-cluster parameters (mean distance, spread, mixing weight) into a tidy data frame.
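The per-group extraction step can be sketched in plain R. For illustration only, this assumes that the distances for one assay are available as a symmetric clone-by-clone matrix plus one group label per clone. The actual layout of @misc$TCR_Distances may differ, and within_group_distances() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: assumes a symmetric clone-by-clone distance matrix and
# one group label per row; the real @misc$TCR_Distances layout may differ.
within_group_distances <- function(distMat, groups, minClonesPerGroup = 2) {
  stopifnot(nrow(distMat) == ncol(distMat), length(groups) == nrow(distMat))
  out <- list()
  for (g in unique(groups)) {
    idx <- which(groups == g)
    # Skip groups with fewer than minClonesPerGroup clones
    if (length(idx) < minClonesPerGroup) next
    sub <- distMat[idx, idx, drop = FALSE]
    # Keep each unordered clone pair once: lower triangle, diagonal excluded
    out[[as.character(g)]] <- sub[lower.tri(sub)]
  }
  out
}

# Toy input: four clones, three in group A and one in group B
m <- matrix(c(0, 1, 5, 6,
              1, 0, 4, 7,
              5, 4, 0, 2,
              6, 7, 2, 0), nrow = 4, byrow = TRUE)
d <- within_group_distances(m, c("A", "A", "A", "B"))
d  # group B has a single clone, so only group A is returned
```

A group of k clones yields k(k - 1)/2 distances, which is why large groups quickly outgrow the maxSamples budget and need downsampling.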

The resulting cluster means tell you where the natural modes of the distance distribution sit, which directly informs the dianaHeight parameter in RunTcrClustering.
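As one illustration of turning cluster means into a cut height, the sketch below takes, for each group, the midpoint between the two lowest cluster means after dropping low-weight clusters. suggest_diana_height() and its minProportion threshold are hypothetical, and the midpoint rule is an assumed heuristic rather than the package's recommendation. The toy data frame only mimics the documented cluster_summary columns.

```r
# Hypothetical heuristic, not part of the package: per group, cut midway
# between the two lowest distance modes, ignoring low-weight clusters.
suggest_diana_height <- function(cluster_summary, minProportion = 0.05) {
  keep <- cluster_summary[cluster_summary$MixingProportion >= minProportion, ]
  sapply(split(keep$Mu, keep$Group), function(mu) {
    mu <- sort(mu)
    if (length(mu) < 2) NA_real_ else (mu[1] + mu[2]) / 2
  })
}

# Toy summary with the documented columns (values are made up)
cs <- data.frame(
  Cluster = c(1, 2, 3, 1, 2),
  Mu = c(12, 40, 95, 14, 42),
  Sigma = c(3, 8, 10, 4, 9),
  MixingProportion = c(0.30, 0.66, 0.04, 0.25, 0.75),
  PointsPerCluster = c(30, 66, 4, 25, 75),
  Group = c("g1", "g1", "g1", "g2", "g2")
)
h <- suggest_diana_height(cs)
h
```

Here the small g1 cluster at Mu = 95 falls below the threshold and is ignored, so each group's suggestion depends only on its two dominant modes.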

DirichletClusterAnalysis(
  seuratObj,
  assayName,
  splitField,
  maxSamples = 100,
  nIterations = 1000,
  minClonesPerGroup = 2,
  nBins = 10,
  samplesPerBin = NULL,
  seed = 42,
  verbose = TRUE
)

Arguments

seuratObj

A Seurat object produced by CalculateTcrDistances (must have @misc$TCR_Distances).

assayName

Character. Distance assay to analyse (e.g., "TRA_fl", "TRB_cdr3", "TRA_TRB_fl").

splitField

Character. Metadata column whose levels define groups (e.g., "cDNA_ID", "Tissue").

maxSamples

Integer. Maximum number of pairwise distances to sample per group before fitting the Dirichlet process model. Controls compute cost. Default 100.

nIterations

Integer. MCMC iterations for each DirichletProcess fit. Default 1000.

minClonesPerGroup

Integer. Groups with fewer than minClonesPerGroup clones are skipped. Default 2.

nBins

Integer. Number of equal-frequency (quantile) bins used for stratified downsampling; the default of 10 gives deciles. Increase it to better preserve rare modes in the tails of the distance distribution. Default 10.

samplesPerBin

Integer. Number of distances drawn from each bin. When NULL (the default), it is set to max(1, floor(maxSamples / nBins)), which spreads the sampling budget evenly across bins. Set it explicitly to draw more observations from each stratum.

seed

Integer. RNG seed for reproducible downsampling. Default 42.

verbose

Logical. Print progress messages. Default TRUE.
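The interplay of nBins, samplesPerBin and seed can be illustrated with a minimal sketch of quantile-stratified downsampling. This is an assumed reconstruction, not the package's code: stratified_downsample() is hypothetical, bins are built from quantile() breaks, and each bin contributes at most samplesPerBin values.

```r
# Hypothetical sketch of quantile-stratified downsampling; not the package code.
stratified_downsample <- function(x, maxSamples = 100, nBins = 10,
                                  samplesPerBin = NULL, seed = 42) {
  if (is.null(samplesPerBin)) {
    samplesPerBin <- max(1, floor(maxSamples / nBins))
  }
  set.seed(seed)  # note: this resets the global RNG state
  # Equal-frequency bin edges; tied values can make quantiles coincide
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nBins + 1),
                            names = FALSE))
  bin <- if (length(breaks) < 2) {
    rep(1L, length(x))  # all values identical: a single stratum
  } else {
    cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  }
  # Draw up to samplesPerBin indices from every bin
  idx <- unlist(lapply(split(seq_along(x), bin), function(i) {
    i[sample.int(length(i), min(length(i), samplesPerBin))]
  }), use.names = FALSE)
  x[sort(idx)]
}

# Toy distances: a dominant mode at 1 and a rare tail at 50-59
x <- c(rep(1, 90), 50:59)
s <- stratified_downsample(x)
```

In this toy input the heavy ties at 1 collapse most quantile breaks, leaving only two strata, so 20 values are returned. Simple random sampling of 20 out of 100 would be expected to include only about 2 of the 10 tail values, whereas the stratified draw keeps all of them.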

Value

A list of class "tcrDirichletResult" containing:

cluster_summary

A tidy data.frame with one row per group-cluster combination. Columns: Cluster, Mu, Sigma, MixingProportion, PointsPerCluster, Group.

models

Named list of raw dirichletprocess model objects.

assayName

The assay used.

splitField

The metadata split field used.
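To make the shape of cluster_summary concrete, the sketch below assembles it from per-group cluster parameters. The params list is made-up toy input standing in for values read from each fitted model, and computing MixingProportion as each cluster's share of points is an assumption for this sketch.

```r
# Toy per-group cluster parameters (made up), standing in for values that
# would be read out of each fitted model; extraction itself is not shown.
params <- list(
  g1 = list(mu = c(12, 40), sigma = c(3, 8), n = c(30, 70)),
  g2 = list(mu = c(14), sigma = c(4), n = c(25))
)

# One row per group-cluster combination, using the documented column names
cluster_summary <- do.call(rbind, lapply(names(params), function(g) {
  p <- params[[g]]
  data.frame(
    Cluster = seq_along(p$mu),
    Mu = p$mu,
    Sigma = p$sigma,
    # Assumed here: mixing proportion = share of the group's points
    MixingProportion = p$n / sum(p$n),
    PointsPerCluster = p$n,
    Group = g
  )
}))
cluster_summary
```

Because the frame is tidy, per-group views such as split(cluster_summary, cluster_summary$Group) or filtering on MixingProportion need no reshaping.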

Examples

if (FALSE) { # \dontrun{
# after CalculateTcrDistances():
dp <- DirichletClusterAnalysis(seuratObj, "TRA_fl", "metadata_variable")

# diagnostic plots
PlotClusterMeans(dp)
PlotMixingProportions(dp)

# use cluster means to pick a dianaHeight:
dp$cluster_summary

# to better capture rare modes, draw more distances from each quantile bin
dp <- DirichletClusterAnalysis(seuratObj, "TRA_fl", "metadata_variable",
                               nBins = 10, samplesPerBin = 100)

} # }