DirichletClusterAnalysis.Rd
High-level entry point for exploring the natural clustering
structure of pairwise TCR distances. For each level of splitField,
the function extracts within-group distances from assayName,
downsamples them to at most maxSamples values via quantile-stratified
sampling (deciles by default, see nBins), fits a
Gaussian Dirichlet process mixture, and collects the per-cluster parameters
(mean distance, spread, mixing weight) into a tidy data frame.
The resulting cluster means tell you where the natural modes of the distance
distribution sit, which directly informs the dianaHeight parameter in
RunTcrClustering.
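The downsampling step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name stratifiedDownsample and the simulated distances are assumptions made for the example.

```r
# Hypothetical helper (not part of the package): quantile-stratified downsampling.
stratifiedDownsample <- function(distances, nBins = 10, samplesPerBin = 10, seed = 42) {
  set.seed(seed)
  # Equal-frequency bin edges; unique() guards against tied quantiles
  edges <- unique(quantile(distances,
                           probs = seq(0, 1, length.out = nBins + 1),
                           names = FALSE))
  if (length(edges) < 2) {
    # All values identical: nothing to stratify
    return(distances[seq_len(min(length(distances), samplesPerBin))])
  }
  bins <- cut(distances, breaks = edges, include.lowest = TRUE, labels = FALSE)
  # Draw up to samplesPerBin values from every bin, so sparse tails stay represented
  sampled <- lapply(split(distances, bins), function(x) {
    x[sample.int(length(x), size = min(length(x), samplesPerBin))]
  })
  unlist(sampled, use.names = FALSE)
}

# Simulated bimodal "distances" (made-up values, for illustration only)
set.seed(1)
d <- c(rnorm(900, mean = 50, sd = 5), rnorm(100, mean = 150, sd = 10))
sub <- stratifiedDownsample(d, nBins = 10, samplesPerBin = 10)
length(sub)
```

Because every quantile bin contributes the same number of draws, the top bin (which here contains only values from the minor mode near 150) is sampled as heavily as the bins around the dominant mode.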
DirichletClusterAnalysis(
  seuratObj,
  assayName,
  splitField,
  maxSamples = 100,
  nIterations = 1000,
  minClonesPerGroup = 2,
  nBins = 10,
  samplesPerBin = NULL,
  seed = 42,
  verbose = TRUE
)
seuratObj
A Seurat object produced by CalculateTcrDistances
(must have @misc$TCR_Distances).
assayName
Character. Distance assay to analyse (e.g., "TRA_fl",
"TRB_cdr3", "TRA_TRB_fl").
splitField
Character. Metadata column whose levels define groups
(e.g., "cDNA_ID", "Tissue").
maxSamples
Integer. Maximum number of pairwise distances to sample per group
before fitting the Dirichlet process model. Controls compute cost. Default 100.
nIterations
Integer. MCMC iterations for each Dirichlet process fit. Default
1000.
minClonesPerGroup
Integer. Groups with fewer clones than this are skipped.
Default 2.
nBins
Integer. Number of equal-frequency quantile bins used during
stratified downsampling (10 bins gives deciles). Increase to preserve rare
modes at the tails of the distance distribution. Default 10.
samplesPerBin
Integer. Samples drawn from each bin. Defaults to
max(1, floor(maxSamples / nBins)), which evenly distributes the
budget across bins. Override to draw more observations from each stratum.
seed
Integer. RNG seed for reproducible downsampling. Default
42.
verbose
Logical. Print progress messages. Default TRUE.
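To make the samplesPerBin default concrete, the documented formula can be evaluated directly. The wrapper name defaultSamplesPerBin is hypothetical and exists only for this illustration.

```r
# Documented default: max(1, floor(maxSamples / nBins)).
# defaultSamplesPerBin is a hypothetical name used only for this example.
defaultSamplesPerBin <- function(maxSamples, nBins) {
  max(1, floor(maxSamples / nBins))
}

perBinDefault  <- defaultSamplesPerBin(100, 10)   # documented defaults: 10 per bin
perBinManyBins <- defaultSamplesPerBin(100, 150)  # floor() gives 0, max() lifts it to 1
perBinUneven   <- defaultSamplesPerBin(105, 10)   # remainder of 5 is dropped: 10 per bin

c(perBinDefault, perBinManyBins, perBinUneven)
```

When nBins exceeds maxSamples, one draw per bin could add up to more than maxSamples in total, assuming no further cap is applied; this page does not say whether the function enforces one.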
A list of class "tcrDirichletResult" containing:
cluster_summary
A tidy data.frame with one row per
group-cluster combination. Columns: Cluster, Mu,
Sigma, MixingProportion, PointsPerCluster,
Group.
Named list of raw dirichletprocess model objects.
The assay used.
The metadata split field used.
if (FALSE) { # \dontrun{
# after CalculateTcrDistances() + RunTcrClustering():
dp <- DirichletClusterAnalysis(seuratObj, "TRA_fl", "metadata_variable")
# diagnostic plots
PlotClusterMeans(dp)
PlotMixingProportions(dp)
# use cluster means to pick a dianaHeight:
dp$cluster_summary
# optionally, draw more samples per quantile bin to better capture rare modes
dp <- DirichletClusterAnalysis(seuratObj, "TRA_fl", "metadata_variable",
                               nBins = 10, samplesPerBin = 100)
} # }
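One possible way to turn cluster means into candidate cut heights is to take the midpoints between adjacent sorted Mu values within each group. This is an illustrative heuristic, not a method prescribed by the package. The data frame below is a made-up stand-in for dp$cluster_summary that only mirrors the documented columns, and it assumes that Mu and dianaHeight are on the same distance scale.

```r
# Mock stand-in for dp$cluster_summary; all values are invented for illustration
clusterSummary <- data.frame(
  Cluster          = c(1, 2, 3, 1, 2),
  Mu               = c(20, 95, 160, 25, 110),
  Sigma            = c(5, 12, 9, 6, 15),
  MixingProportion = c(0.15, 0.60, 0.25, 0.30, 0.70),
  PointsPerCluster = c(15, 60, 25, 30, 70),
  Group            = c("A", "A", "A", "B", "B")
)

# Midpoints between neighbouring modes, computed separately for each group
candidateHeights <- lapply(split(clusterSummary, clusterSummary$Group), function(df) {
  mu <- sort(df$Mu)
  (head(mu, -1) + tail(mu, -1)) / 2
})

candidateHeights
```

A cut at such a midpoint falls between two neighbouring modes. Whether that is an appropriate dianaHeight still depends on the data and should be checked against the diagnostic plots.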