An R package for clustering and analyzing T-cell receptor (TCR) sequences to identify ‘TCR families’ via sequence similarity. This package uses tcrdist3 for TCR distance calculations and provides flexible clustering algorithms to identify groups of functionally related TCRs. Similar in concept to GLIPH and CoNGA, but with more direct control over clustering parameters.
Full documentation: https://bimberlabinternal.github.io/tcrClustR/
The fastest way to cluster TCR data:
library(tcrClustR)
# Step 1: compute TCR distance matrices (stored in seuratObj@misc$TCR_Distances)
seuratObj <- CalculateTcrDistances(
inputData = seuratObj,
chains = c("TRA", "TRB"),
minimumCloneSize = 2,
calculateChainPairs = TRUE
)
# Step 2: cluster TCRs via DIANA and store results in metadata
seuratObj <- RunTcrClustering(
seuratObj_TCR = seuratObj,
dianaHeight = 20,
clusterSizeThreshold = 1
)
# Cluster assignments live in metadata columns like TRB_fl_ClusterIdx
Seurat::DimPlot(seuratObj, reduction = "umap", group.by = "TRB_fl_ClusterIdx", label = TRUE)
# Retrieve a raw distance matrix
distance_mat <- GetDistanceMatrix(seuratObj, chains = "TRA")
T-cell receptor (TCR) clustering groups TCR sequences by a similarity metric, which helps identify functionally related TCR families in single-cell sequencing data. TCRs with similar sequences (CDR3 regions and V/J gene segments) may recognize the same or related antigens.
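To make the idea of distance-based clustering concrete, here is a minimal sketch using `cluster::diana` from the `cluster` package that ships with R. It is not tcrClustR's implementation: the toy one-dimensional values stand in for TCR distances, and the height cutoff `h = 20` is only assumed to play a role similar to `dianaHeight`.

```r
library(cluster)

# Toy values standing in for TCRs: two well-separated groups (illustrative only)
toy_values <- c(0, 2, 5, 100, 103, 107)
toy_dist <- dist(toy_values)

# Divisive hierarchical clustering (DIANA) on the distance object
dv <- diana(toy_dist)

# Cut the tree at a fixed height; items whose subtree lies below the cutoff share a cluster
clusters <- cutree(as.hclust(dv), h = 20)
print(clusters)
```

A lower cutoff splits the tree into more, tighter clusters; a higher cutoff merges them.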
dianaHeight selectiondianaHeight
# Install devtools if needed
if (!require("devtools")) install.packages("devtools")
# Install tcrClustR
devtools::install_github("bimberlabinternal/tcrClustR")
tcrClustR requires Python 3.8+ with tcrdist3 and related packages. The package includes tools to simplify setup.
Use the built-in helper function to validate and install Python dependencies:
library(tcrClustR)
#check and install Python dependencies automatically
SetupPythonEnvironment()
#or just validate without installing
SetupPythonEnvironment(installMissing = FALSE)
#use specific Python executable
SetupPythonEnvironment(pythonExecutable = "/path/to/python3")
This function:
- Validates the Python installation (requires 3.8+)
- Checks for required modules (tcrdist3, pandas, numpy, rpy2)
- Installs missing packages from requirements.txt
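The module check described above can be approximated by hand. The sketch below is a hypothetical helper, not part of tcrClustR; it relies on `reticulate::py_module_available()` as the default checker and assumes the import names `tcrdist`, `pandas`, `numpy`, and `rpy2`. A stand-in checker is used in the demonstration so the code runs without a Python installation.

```r
# Hypothetical helper (not part of tcrClustR): report which Python modules are missing.
# The default checker is reticulate::py_module_available; it is only evaluated
# when the caller does not supply a replacement.
find_missing_modules <- function(modules,
                                 is_available = reticulate::py_module_available) {
  present <- vapply(modules, is_available, logical(1))
  modules[!present]
}

# Import names assumed from the text; tcrdist3 is imported as "tcrdist"
required <- c("tcrdist", "pandas", "numpy", "rpy2")

# Demonstration with a stand-in checker that pretends only pandas and numpy exist
fake_checker <- function(m) m %in% c("pandas", "numpy")
missing_modules <- find_missing_modules(required, is_available = fake_checker)
print(missing_modules)
```

In a real session, calling `find_missing_modules(required)` without the stand-in would query the Python environment that reticulate is bound to.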
If you prefer manual installation:
# install individual packages
pip install pandas numpy scikit-learn rpy2
pip install git+https://github.com/bimberlabinternal/tcrdist3.git@0.3
#optional: install from requirements.txt in this repo
pip install -r requirements.txt
Set the Python path in R if needed:
Sys.setenv(RETICULATE_PYTHON = "/path/to/python3")
If you encounter Python-related errors:
SetupPythonEnvironment(verbose = TRUE)
python3 -c 'import tcrdist; print(tcrdist.__version__)'
reticulate::py_config()
Common error messages and solutions:
# Error: "Missing required Python modules: tcrdist"
# Solution: Run SetupPythonEnvironment() to install
# Error: "No valid Python executable found"
# Solution: Install Python 3.8+ or specify path:
SetupPythonEnvironment(pythonExecutable = "/usr/bin/python3")
For exploratory analysis with RMarkdown:
# Export example workflow template
GetExampleMarkdown(dest = 'tcrClustR_workflow.Rmd')
# Or view built-in vignettes
browseVignettes("tcrClustR")
DirichletClusterAnalysis() fits a non-parametric Gaussian Dirichlet process mixture to within-group pairwise TCR distances. The discovered cluster means (mu) and spreads (sigma) reveal the natural modes of the distance distribution, which you can use to select an informed dianaHeight cutoff for RunTcrClustering().
# Fit DP mixture models per group
dp <- DirichletClusterAnalysis(
seuratObj = seuratObj,
assayName = "TRA_fl",
splitField = "Population",
maxSamples = 1000,
nIterations = 500
)
# Two diagnostic plots (combine with patchwork)
library(patchwork)
(PlotClusterMeans(dp) + Seurat::NoLegend()) +
PlotMixingProportions(dp) +
plot_layout(guides = "collect")
# Inspect the cluster parameter table
dplyr::glimpse(dp$cluster_summary)
#Rows: 11
#Columns: 6
#$ Cluster <int> 1, 2, 3, 4, 5, 6, 1, 2,…
#$ Mu <dbl> 135.98787492, 31.812791…
#$ Sigma <dbl> 4.9843801, 3.4540618, 1…
#$ MixingProportion <dbl> 0.79400000, 0.17100000,…
#$ PointsPerCluster <int> 794, 171, 31, 1, 2, 1, …
#$ Group <chr> "MR1-5-OP-RU-Tet", "MR1…
TCR distance distributions often have small, rare modes in the tails that uniform random sampling would miss. DirichletClusterAnalysis() uses quantile-stratified (n-tile) downsampling by default: distances are divided into nBins equal-frequency bins and up to samplesPerBin values are drawn from each, preserving the full distributional shape. Because fitting the Dirichlet process is computationally expensive, the stratified sample is further downsampled to maxSamples when nBins * samplesPerBin > maxSamples.
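The sampling scheme described above can be sketched in base R. This is an illustrative approximation under stated assumptions, not the package's code: the function name `stratified_sample` is hypothetical, its arguments simply mirror the parameter names in the text, and the toy distances are simulated.

```r
# Hypothetical sketch of quantile-stratified downsampling (not tcrClustR's code)
stratified_sample <- function(x, nBins = 10, samplesPerBin = 100, maxSamples = 1000) {
  # Equal-frequency bin edges; unique() guards against duplicated quantiles
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nBins + 1), names = FALSE))
  bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  # Draw up to samplesPerBin values from each bin (index sampling avoids
  # the sample() pitfall for length-one vectors)
  picked <- unlist(lapply(split(x, bin), function(v) {
    if (length(v) <= samplesPerBin) v else v[sample.int(length(v), samplesPerBin)]
  }), use.names = FALSE)
  # Final cap when the stratified sample exceeds maxSamples
  if (length(picked) > maxSamples) {
    picked <- picked[sample.int(length(picked), maxSamples)]
  }
  picked
}

set.seed(42)
# Simulated distances: one dominant mode plus a small, rare low-distance mode
toy_distances <- c(rnorm(5000, mean = 130, sd = 5), rnorm(50, mean = 30, sd = 3))
sampled <- stratified_sample(toy_distances, nBins = 20, samplesPerBin = 150, maxSamples = 1000)
length(sampled)
```

Note that the final cap is applied uniformly at random in this sketch, so a very small maxSamples can thin out the rare bins again.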
Increase nBins and samplesPerBin (and maxSamples) to improve resolution of rare modes:
dp <- DirichletClusterAnalysis(
seuratObj = seuratObj,
assayName = "TRA_fl",
splitField = "Population",
maxSamples = 1000,
nIterations = 500,
nBins = 20,
samplesPerBin = 150
)
DirichletClusterAnalysis() returns a tcrDirichletResult list containing:
| Field | Description |
|---|---|
| cluster_summary | Tidy data.frame: one row per group × cluster with Mu, Sigma, MixingProportion, PointsPerCluster, Group |
| models | Named list of raw dirichletprocess model objects for downstream inspection |
| assayName | The distance assay that was analysed |
| splitField | The metadata column used for grouping |
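One way the cluster_summary table might inform a dianaHeight choice is sketched below. The heuristic is an assumption made for illustration (it is not a method the package documents): take the lowest-mean mode that carries at least a minimum mixing weight and place the cutoff a couple of sigmas above its mean. The data frame is a made-up stand-in that uses the column names listed above, and `suggest_height` is a hypothetical helper.

```r
# Made-up stand-in for dp$cluster_summary, single group for simplicity
toy_summary <- data.frame(
  Cluster = c(1L, 2L, 3L),
  Mu = c(135, 32, 12),
  Sigma = c(5, 3.5, 2),
  MixingProportion = c(0.79, 0.17, 0.04),
  Group = "GroupA"
)

# Hypothetical heuristic: cutoff = Mu + nSigma * Sigma of the lowest-mean supported mode
suggest_height <- function(summary_df, minWeight = 0.01, nSigma = 2) {
  supported <- summary_df[summary_df$MixingProportion >= minWeight, ]
  lowest <- supported[which.min(supported$Mu), ]
  lowest$Mu + nSigma * lowest$Sigma
}

candidate_height <- suggest_height(toy_summary)
print(candidate_height)
```

Treat the result as a starting point to compare against the diagnostic plots rather than a definitive threshold.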
- PlotClusterMeans(dp): Error-bar plot of mu ± sigma per cluster, dodged by group. The y-axis corresponds directly to TCR distance and can be compared against dianaHeight.
- PlotMixingProportions(dp): Dodged bar chart of cluster mixing weights. Clusters with high weight and low mu identify well-supported clonotype families.
If you need the raw per-group distance vectors without fitting a DP model (e.g., for your own downstream analysis):
vecs <- ExtractGroupDistanceVectors(
seuratObj = seuratObj,
assayName = "TRB_cdr3",
splitField = "Tissue"
)
# Returns a named list of numeric vectors, one per group
hist(vecs[["Spleen"]], breaks = 50)
- RETICULATE_PYTHON environment variable if tcrdist3 fails
- JoinLayers() before accessing assay data in v5 objects
- TRBV7-9*01 → TRBV7-9
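The gene-name mapping noted above (TRBV7-9*01 → TRBV7-9) amounts to dropping the allele suffix after the asterisk. A minimal base R sketch follows; `strip_allele` is a hypothetical helper, and whether tcrClustR applies this step internally is not stated here.

```r
# Hypothetical helper: remove the allele suffix (everything from "*" onward)
strip_allele <- function(gene) {
  sub("\\*.*$", "", gene)
}

genes <- c("TRBV7-9*01", "TRAV1-2*02", "TRBJ2-1")
stripped <- strip_allele(genes)
print(stripped)
```

Names without an allele suffix pass through unchanged.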
# Update documentation
devtools::document()
# Build vignettes
devtools::build_vignettes()
# Check package
devtools::check()
- devtools::check() (0 errors/warnings/notes)
This project is licensed under the GPL (>= 3) License; see LICENSE.md for details.