Human bone marrow of healthy human donors - Annotation#

In this notebook, we annotate CyTOF data of bone marrow samples from 8 healthy donors. Data were provided by Oetjen et al. (JCI Insight, 2018). We employ the following steps:

  1. Read anndata formatted data

  2. Annotate based on clustering

  3. Compare to annotation provided by the authors (in publication).

import scanpy as sc
import anndata as ann
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as pl
from matplotlib import rcParams
from matplotlib import colors
import seaborn as sb
import datetime
import pytometry as pm


sc.logging.print_versions()
sc.settings.verbosity = 3
WARNING: If you miss a compact list, please try `print_header`!
The `sinfo` package has changed name and is now called `session_info` to become more discoverable and self-explanatory. The `sinfo` PyPI package will be kept around to avoid breaking old installs and you can downgrade to 0.3.2 if you want to use it without seeing this message. For the latest features and bug fixes, please install `session_info` instead. The usage and defaults also changed slightly, so please review the latest README at https://gitlab.com/joelostblom/session_info.
-----
anndata     0.7.6
scanpy      1.8.2
sinfo       0.3.4
-----
PIL                         8.4.0
anyio                       NA
asciitree                   NA
attr                        21.2.0
babel                       2.9.1
backcall                    0.2.0
beta_ufunc                  NA
binom_ufunc                 NA
brotli                      1.0.9
certifi                     2021.10.08
cffi                        1.15.0
charset_normalizer          2.0.7
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.8.2
debugpy                     1.5.1
decorator                   5.1.0
defusedxml                  0.7.1
entrypoints                 0.3
fasteners                   0.17.3
flowio                      1.0.1
google                      NA
h5py                        2.10.0
idna                        3.3
igraph                      0.9.8
importlib_resources         NA
ipykernel                   6.5.0
ipython_genutils            0.2.0
ipywidgets                  7.6.5
jedi                        0.18.0
jinja2                      3.0.2
joblib                      1.1.0
json5                       NA
jsonschema                  4.2.1
jupyter_server              1.11.2
jupyterlab_server           2.8.2
kiwisolver                  1.3.2
leidenalg                   0.8.8
llvmlite                    0.37.0
louvain                     0.7.0
markupsafe                  2.0.1
matplotlib                  3.4.3
matplotlib_inline           NA
mpl_toolkits                NA
msgpack                     1.0.3
natsort                     8.0.0
nbclassic                   NA
nbformat                    5.1.3
nbinom_ufunc                NA
numba                       0.54.1
numcodecs                   0.9.1
numexpr                     2.7.3
numpy                       1.19.5
packaging                   21.2
pandas                      1.3.4
parso                       0.8.2
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
prometheus_client           NA
prompt_toolkit              3.0.22
ptyprocess                  0.7.0
pvectorc                    NA
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.10.0
pyparsing                   2.4.7
pyrsistent                  NA
pytometry                   0.0.1
pytz                        2021.3
readfcs                     0.1.5
requests                    2.26.0
scipy                       1.7.2
seaborn                     0.11.2
send2trash                  NA
six                         1.16.0
sklearn                     1.0.1
sniffio                     1.2.0
socks                       1.7.1
statsmodels                 0.13.0
storemagic                  NA
tables                      3.6.1
terminado                   0.12.1
texttable                   1.6.4
threadpoolctl               3.0.0
tornado                     6.1
traitlets                   5.1.1
typing_extensions           NA
urllib3                     1.26.7
wcwidth                     0.2.5
websocket                   1.2.1
yaml                        6.0
zarr                        2.11.3
zipp                        NA
zmq                         22.3.0
-----
IPython             7.29.0
jupyter_client      7.0.6
jupyter_core        4.9.1
jupyterlab          3.2.2
notebook            6.4.5
-----
Python 3.8.6 (default, Oct 26 2021, 09:26:31) [GCC 8.3.0]
Linux-4.18.0-305.12.1.el8_4.x86_64-x86_64-with-glibc2.28
288 logical CPU cores
-----
Session information updated at 2022-08-10 09:26
sc.settings.figdir = "./../figures/"

Add date.

now = datetime.datetime.now()
today = now.strftime("%Y%m%d")

Define a nice colour map for marker intensity.

colors2 = pl.cm.Reds(np.linspace(0, 1, 80))
colors3 = pl.cm.Greys_r(np.linspace(0.7, 0.8, 10))
colorsComb = np.vstack([colors3, colors2])
mymap = colors.LinearSegmentedColormap.from_list("my_colormap", colorsComb)
import os
data_path = "./../data/Oetjen_2018/"

Read data#

Read the anndata object from the previous notebook. Here, we stored the arcsinh-normalised and filtered events for all donors.

adata_all = sc.read(data_path + "anndata/" + "cytof_data_norm.h5ad")
adata_all
AnnData object with n_obs × n_vars = 4829382 × 34
    obs: 'sample', 'Time', 'Event-length', 'Center', 'Offset', 'Width', 'Residual', 'batch', 'DNA1', 'DNA2', 'VIABILITY'
    var: 'channel', 'marker', 'signal_type', 'AB'
    uns: 'meta', 'neighbors', 'pca', 'sample_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'compensated'
    obsp: 'connectivities', 'distances'

Check sample size, i.e. the number of events per donor.

adata_all.obs["sample"].value_counts()
B    940911
O    923137
H    827621
J    584961
T    503578
A    484678
C    370639
U    193857
Name: sample, dtype: int64

Exploratory data analysis#

In this section, we aim to get an overview of the data set. In particular, we check visually for batch effects (donor-specific shifts in the data distribution). The PCA gives us an idea of the intrinsic dimension of the data.

sc.pl.pca_overview(adata_all, color="sample")
../_images/5bc43f8b5ebdb0d55b63d19c942ec95ec4646c25c71c167f59155d781421b6f5.png ../_images/892626b629e191df177b9797638800d7fa4786d92b45fedb3c47f7fc261a611d.png ../_images/a1c8491ecd0b85aa81331374307a163c80237214f4864007a461063754d671df.png
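
To make the notion of intrinsic dimension more concrete, the sketch below computes the cumulative explained variance of a PCA on synthetic data. It uses plain NumPy instead of scanpy; the matrix `X`, the three latent factors and the 90 % cut-off are illustrative assumptions and not values taken from this data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an events x markers matrix: 3 latent factors
# embedded in 10 observed channels, plus a little noise (assumption).
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 10))
X = latent @ loadings + 0.05 * rng.normal(size=(500, 10))

# PCA via SVD of the centred matrix.
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained)

# Number of components needed to reach 90 % of the variance
# (the 90 % cut-off is an arbitrary illustrative choice).
n_components_90 = int(np.searchsorted(cumulative, 0.90) + 1)
print(cumulative.round(3))
print(n_components_90)
```

After `sc.pp.pca`, scanpy stores the per-component variance ratios in `adata.uns["pca"]["variance_ratio"]`, so the same cumulative sum could be applied to the real data.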

Visualise the data as UMAP (pre-computed in the previous notebook). Cells from different donors overlap well in the embedding, which suggests that shifts due to batch effects are minor.

rcParams["figure.figsize"] = (5, 5)
sc.pl.umap(adata_all, color="sample")
../_images/8ace503982cae49b78e6ea5d52a62cfe99d0b8b77c2be8de9c9477b888e0ff3f.png

Next, we color the UMAP by all markers, where we use the custom color scheme of grey (no signal or only background) and reds to indicate the marker intensity. This map gives us an intuition for the cell distribution.

sc.pl.umap(
    adata_all,
    gene_symbols="AB",
    color=[
        "CD45",
        "CD45RA",
        "CD45RO",
        "CD11a",
        "CD16",
        "CD2",
        "CD5",
        "CD3",
        "CD4",
        "CD8a",
        "CD25",
        "CD27",
        "CD28",
        "CD44",
        "CD49D",
        "CD57",
        "CD69",
        "CD7",
        "CD9",
        "FAS",
        "HLA-DR",
        "CD127",
        "OX40",
        "41BB",
        "CTLA4",
        "CD161",
        "CD183",
        "CD194",
        "CD195",
        "CD197",
        "LAG3",
        "ICOS",
        "PD1",
        "TIM3",
    ],
    ncols=3,
    color_map=mymap,
)
../_images/44b836f37cfcbea20c96f6ffda86b70e7c6d4543fac7d61e0f5cc6176e5d1a15.png

The exploratory analysis also shows that we have a variety of CD45 negative cells in the dataset. These cells are likely not immune cells. Before we continue with the cell type annotation, we filter out the CD45 negative cells.

Annotation#

For cell type annotation, we use several levels of granularity based on marker intensity. The measured panel contains mostly T cell markers. We therefore cannot annotate B cells or the myeloid lineage. In the lymphoid lineage, we can distinguish NK cells and T cells. The annotation focusses on T cell subtypes, following the annotation of the original publication. The different levels are:

  • Level 1: Immune cells and non-immune cells (CD45 marker)

  • Level 2: NK cells and T cells

  • Level 3: T cell subtypes (CD4, CD8, double positive and negative)

  • Level 4: CD4 and CD8 T cell subtypes (based on marker CD197 intensity)

  • Level 5: naive, central and effector memory T cell subtypes
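
Because each level refines the previous one, the per-cluster label dictionaries used at the different levels should agree with each other. The following sketch shows one way to check this. The `parent` mapping and the short dictionary excerpts are assumptions made for illustration; they are not part of the original workflow.

```python
# Parent of each level 3 label at level 2 (assumed hierarchy that mirrors
# the levels listed above).
parent = {
    "CD4+ T cell": "T cell",
    "CD8+ T cell": "T cell",
    "Double positive T cell": "T cell",
    "Double negative T cell": "T cell",
    "NK cell": "NK cell",
    "not annotated": "not annotated",
}

# Short, illustrative excerpts of two cluster-to-label dictionaries.
lvl2 = {"0": "not annotated", "1": "T cell", "5": "T cell", "7": "NK cell"}
lvl3 = {"0": "not annotated", "1": "CD4+ T cell", "5": "CD8+ T cell", "7": "NK cell"}

# Clusters whose fine label does not roll up to the coarse label.
inconsistent = [c for c in lvl3 if parent[lvl3[c]] != lvl2[c]]
print(inconsistent)
```

An empty list means that every fine label rolls up to the coarse label assigned to the same cluster.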

Level 1#

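We start with a rough annotation into CD45+ and CD45- cells, using a threshold of 0.5 on the normalised CD45 intensity. The labels are stored in the column `cell_type_lvl0`.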

adata_all.obs["cell_type_lvl0"] = adata_all.X[:, adata_all.var["AB"] == "CD45"] > 0.5
adata_all.obs["cell_type_lvl0"] = adata_all.obs["cell_type_lvl0"].map(
    {True: "CD45+", False: "CD45-"}
)
sc.pl.umap(adata_all, color="cell_type_lvl0")
/opt/python/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'cell_type_lvl0' as categorical
../_images/eccd9b7f2e3e85b917540a655b2d273171ef33f5a6295a7bf43830fa68204b67.png

Examine the number of CD45 positive and negative cells.

adata_all.obs["cell_type_lvl0"].value_counts()
CD45+    4162929
CD45-     666453
Name: cell_type_lvl0, dtype: int64
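
The thresholding step can be illustrated on a small toy matrix. This is a minimal sketch that uses NumPy and pandas only; the intensity values and the three-marker panel are made up, and only the cut-off of 0.5 is taken from the code above.

```python
import numpy as np
import pandas as pd

# Toy arcsinh-scaled intensities for three markers (values are made up).
markers = np.array(["CD3", "CD45", "CD4"])
X = np.array(
    [
        [1.2, 2.1, 0.3],
        [0.1, 0.2, 0.0],
        [0.8, 0.6, 1.5],
        [0.0, 0.4, 0.1],
    ]
)

# Select the CD45 column and flatten it to one value per event.
cd45 = X[:, markers == "CD45"].ravel()

# Same cut-off as above (0.5); events above it are labelled CD45+.
labels = pd.Series(np.where(cd45 > 0.5, "CD45+", "CD45-"), dtype="category")
print(labels.value_counts().to_dict())
```

Flattening the selected column with `ravel()` gives one Boolean per event, which keeps the resulting label column one-dimensional.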

Level 2#

We exclude the CD45- cells and continue with annotating the CD45+ cells.

adata_cd45 = adata_all[adata_all.obs["cell_type_lvl0"] == "CD45+"].copy()

Recompute PCA representation, UMAP embedding and clustering for subsequent cell type annotation.

sc.pp.pca(adata_cd45)
sc.pp.neighbors(adata_cd45, n_neighbors=10, n_pcs=10)
sc.tl.umap(adata_cd45)
computing PCA
    with n_comps=33
    finished (0:00:15)
computing neighbors
    using data matrix X directly
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:17:42)
computing UMAP
sc.tl.leiden(adata_cd45, resolution=1.0)

Save temporary result to file.

adata_cd45.write(data_path + "anndata/" + "cytof_data_tmp.h5ad")

Read in the temporary data file.

adata_cd45 = sc.read(data_path + "anndata/" + "cytof_data_tmp.h5ad")

Visualize data#

Similar to the exploratory data analysis shown above, we visualise the data using UMAP. In this step, we visualise the Leiden clustering of the data. Leiden clustering is a community-detection-based approach that tends to overcluster the data, but captures clusters of very different sizes. In this way, we can discover very rare cell types, even though abundant cell types are split into several subclusters.

rcParams["figure.figsize"] = (5, 5)
sc.pl.umap(adata_cd45, color="leiden")
../_images/fc86e70f0ec81a78f613f719817c811d5d2ffc97542beeec32467dac4140dba0.png
rcParams["figure.figsize"] = (5, 5)
sc.pl.umap(adata_cd45, color="leiden", legend_loc="on data")
../_images/afae880ef665d6e51fdd16044a96bc625397885fd26c773104cc0fbc5d9e9f0e.png

Let us inspect the marker intensity on a UMAP.

sc.pl.umap(
    adata_cd45,
    gene_symbols="AB",
    color=[
        "CD45",
        "CD45RA",
        "CD45RO",
        "CD11a",
        "CD16",
        "CD2",
        "CD5",
        "CD3",
        "CD4",
        "CD8a",
        "CD25",
        "CD27",
        "CD28",
        "CD44",
        "CD49D",
        "CD57",
        "CD69",
        "CD7",
        "CD9",
        "CD95-FAS",
        "HLA-DR",
        "CD127",
        "CD134-OX40",
        "CD137-41BB",
        "CD152-CTLA4",
        "CD161",
        "CD183",
        "CD194",
        "CD195",
        "CD197",
        "CD223-LAG3",
        "CD278-ICOS",
        "CD279-PD1",
        "CD366-TIM3",
    ],
    ncols=3,
    color_map=mymap,
    vmax="p99",
)
../_images/1c1043e696291b295bf68c6403d5a9c2ae3e9b6632ebc221313080ffa5046b4b.png

Let us inspect the mean marker intensity as a matrixplot. Clusters are organised based on hierarchical clustering.

sc.pl.matrixplot(
    adata_cd45,
    groupby="leiden",
    gene_symbols="AB",
    var_names=[
        "CD45",
        "CD45RA",
        "CD45RO",
        "CD11a",
        "CD16",
        "CD2",
        "CD5",
        "CD3",
        "CD4",
        "CD8a",
        "CD25",
        "CD27",
        "CD28",
        "CD44",
        "CD49D",
        "CD57",
        "CD69",
        "CD7",
        "CD9",
        "CD95-FAS",
        "HLA-DR",
        "CD127",
        "CD134-OX40",
        "CD137-41BB",
        "CD152-CTLA4",
        "CD161",
        "CD183",
        "CD194",
        "CD195",
        "CD197",
        "CD223-LAG3",
        "CD278-ICOS",
        "CD279-PD1",
        "CD366-TIM3",
    ],
    dendrogram=True,
    vmin=0,
    cmap=mymap,
)
WARNING: dendrogram data not found (using key=dendrogram_leiden). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
    using data matrix X directly
Storing dendrogram info using `.uns['dendrogram_leiden']`
../_images/662d4ead429fee22c3021adb677264cee1515ad42a73254caffc0f031d26b520.png

Markers for NK cells:

  • CD16+

  • HLA-DR-

  • CD3-

  • CD44-

  • CD45RA+

Markers for neutrophils (in distinction to NK cells):

  • CD16+

  • CD44+

  • CD3-

Markers for T cells:

  • CD3+ (general marker for T cells)

  • CD4+

  • CD8a+
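
In this notebook, clusters are assigned by visual inspection of the plots. As a rough illustration of how the marker rules above could be written down explicitly, the sketch below applies them to a table of mean intensities per cluster. The table, the cluster names `a` to `d`, the cut-off `THRESHOLD` and the helper `call_cell_type` are all hypothetical.

```python
import pandas as pd

# Hypothetical mean arcsinh intensities per cluster (made-up numbers).
cluster_means = pd.DataFrame(
    {
        "CD3": [2.5, 0.1, 0.2, 0.1],
        "CD16": [0.1, 2.0, 2.2, 0.2],
        "CD44": [1.0, 0.2, 2.4, 1.1],
        "HLA-DR": [0.1, 0.1, 0.3, 2.0],
        "CD45RA": [1.0, 1.8, 0.4, 0.9],
    },
    index=["a", "b", "c", "d"],
)

THRESHOLD = 1.0  # assumed cut-off between "negative" and "positive"


def call_cell_type(row: pd.Series) -> str:
    """Apply the marker rules listed above to one cluster."""
    pos = row > THRESHOLD
    if pos["CD3"]:
        return "T cell"
    if pos["CD16"] and pos["CD44"]:
        return "neutrophil-like"  # not a label used in this notebook
    if pos["CD16"] and pos["CD45RA"] and not pos["HLA-DR"]:
        return "NK cell"
    return "not annotated"


calls = cluster_means.apply(call_cell_type, axis=1)
print(calls.to_dict())
```

A fixed cut-off is a simplification. Real marker intensities differ in dynamic range, which is one reason why the assignment below is done manually.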

Annotate T cells and NK cells in a second round.

cluster2cell = {
    "0": "not annotated",
    "1": "T cell",  #
    "2": "T cell",  #
    "3": "not annotated",
    "4": "not annotated",
    "5": "T cell",  #
    "6": "T cell",  #
    "7": "NK cell",  #
    "8": "not annotated",
    "9": "T cell",  #
    "10": "T cell",  #
    "11": "NK cell",  #
    "12": "not annotated",
    "13": "T cell",  #
    "14": "T cell",  #
    "15": "not annotated",
    "16": "T cell",  #
    "17": "not annotated",
    "18": "T cell",  #
    "19": "T cell",  #
    "20": "T cell",  #
    "21": "T cell",  # Double positive CD4/CD8
    "22": "not annotated",
    "23": "not annotated",
    "24": "not annotated",
    "25": "T cell",  #
    "26": "T cell",  #
    "27": "T cell",  #
    "28": "NK cell",  # HLA-DR positive NK cell?
    "29": "T cell",  #
    "30": "not annotated",
    "31": "not annotated",
    "32": "not annotated",
    "33": "T cell",  #
    "34": "not annotated",
    "35": "T cell",  #
}
adata_cd45.obs["cell_type_lvl2"] = adata_cd45.obs["leiden"].map(cluster2cell).copy()
sc.pl.umap(adata_cd45, color="cell_type_lvl2")
/opt/python/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'cell_type_lvl2' as categorical
../_images/10c1ca8d0f42370d7c3c2fa25095d0495178f02c2b5ef6f63e3121d8fcb2cc7f.png
df = pd.crosstab(
    adata_cd45.obs["sample"], adata_cd45.obs["cell_type_lvl2"], normalize=0
)
df
cell_type_lvl2 NK cell T cell not annotated
sample
A 0.048344 0.624290 0.327366
B 0.092156 0.472933 0.434912
C 0.031945 0.403321 0.564733
H 0.084707 0.571623 0.343670
J 0.028097 0.510479 0.461425
O 0.069115 0.448752 0.482133
T 0.083106 0.431413 0.485481
U 0.037466 0.533421 0.429112
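
The argument `normalize=0` divides each row by its row total, so every donor's fractions sum to 1. A minimal sketch on made-up data, using the equivalent spelling `normalize="index"`:

```python
import pandas as pd

# Toy per-event table: donor and assigned label (made-up data).
obs = pd.DataFrame(
    {
        "sample": ["A", "A", "A", "A", "B", "B"],
        "cell_type": [
            "T cell",
            "T cell",
            "NK cell",
            "not annotated",
            "T cell",
            "NK cell",
        ],
    }
)

# normalize="index" is equivalent to normalize=0: each row sums to 1.
fractions = pd.crosstab(obs["sample"], obs["cell_type"], normalize="index")
print(fractions)
print(fractions.sum(axis=1))
```
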
ax = sb.boxplot(data=df[["NK cell", "T cell"]], orient="v")
ax = sb.swarmplot(
    data=df[["NK cell", "T cell"]], orient="v", color=".25", size=10, alpha=0.8
)
ax.set_ylim([0, 1])
(0.0, 1.0)
../_images/7e7c8fd0edbc309104a3ee637b54a5b2f849e799b243b5e81ed86dcc8c59e10b.png
boxplot = df.boxplot(column=["NK cell", "T cell"])
../_images/a49b550ab475aaa647cc7ef0dad2bded9fb7da83d1a4ac1e2e47a3f30d5d59bb.png

Level 3#

Annotate T cell subtypes in a third round. Notes on markers:

  • Double positive T cells: express both CD4 and CD8a

  • Double negative T cells: express only CD3, but not CD4 or CD8a

  • Distinguish CD4+ T cells and CD8+ T cells

  • CCR7 is also known as CD197

cluster2cell = {
    "0": "not annotated",
    "1": "CD4+ T cell",  #
    "2": "CD4+ T cell",  #
    "3": "not annotated",
    "4": "not annotated",
    "5": "CD8+ T cell",  #
    "6": "CD8+ T cell",  #
    "7": "NK cell",  #
    "8": "not annotated",
    "9": "CD4+ T cell",  #
    "10": "CD8+ T cell",  #
    "11": "NK cell",  #
    "12": "not annotated",
    "13": "CD4+ T cell",  #
    "14": "CD8+ T cell",  #
    "15": "not annotated",
    "16": "CD4+ T cell",  #
    "17": "not annotated",
    "18": "CD4+ T cell",  #
    "19": "CD4+ T cell",  #
    "20": "CD8+ T cell",  #
    "21": "Double positive T cell",  # Double positive CD4/CD8
    "22": "not annotated",
    "23": "not annotated",
    "24": "not annotated",
    "25": "CD8+ T cell",  #
    "26": "Double negative T cell",  # very little CD8a
    "27": "Double negative T cell",  # has only CD3 marker
    "28": "NK cell",  # HLA-DR positive NK cell?
    "29": "CD8+ T cell",  #
    "30": "not annotated",
    "31": "not annotated",
    "32": "not annotated",  # special activated T cell or simply autofluorescence?
    "33": "CD8+ T cell",  #
    "34": "not annotated",
    "35": "CD4+ T cell",  #
}
adata_cd45.obs["cell_type_lvl3"] = adata_cd45.obs["leiden"].map(cluster2cell).copy()
sc.pl.umap(adata_cd45, color="cell_type_lvl3")
/opt/python/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'cell_type_lvl3' as categorical
../_images/4fcdcc6e7a9b2e9af39fe88f4c35ba4dc9d4b9e901f6f0f5662f420676e19405.png

Level 4#

Annotate T cell subtypes in a fourth round. Notes on markers:

  • Distinguish CCR7+/- cells in CD4+ T cells and CD8+ T cells

  • CCR7 is also known as CD197

cluster2cell = {
    "0": "not annotated",
    "1": "CCR7+ CD4+ T cell",  #
    "2": "CCR7+ CD4+ T cell",  #
    "3": "not annotated",
    "4": "not annotated",
    "5": "CCR7- CD8+ T cell",  #
    "6": "CCR7+ CD8+ T cell",  #
    "7": "NK cell",  #
    "8": "not annotated",
    "9": "CCR7+ CD4+ T cell",  #
    "10": "CCR7- CD8+ T cell",  #
    "11": "NK cell",  #
    "12": "not annotated",
    "13": "CCR7- CD4+ T cell",  #
    "14": "CCR7- CD8+ T cell",  #
    "15": "not annotated",
    "16": "CCR7- CD4+ T cell",  #
    "17": "not annotated",
    "18": "CCR7+ CD4+ T cell",  #
    "19": "CCR7+ CD4+ T cell",  #
    "20": "CCR7- CD8+ T cell",  #
    "21": "Double positive T cell",  # Double positive CD4/CD8
    "22": "not annotated",
    "23": "not annotated",
    "24": "not annotated",
    "25": "CCR7+ CD8+ T cell",  #
    "26": "Double negative T cell",  # very little CD8a
    "27": "Double negative T cell",  # has only CD3 marker
    "28": "NK cell",  # HLA-DR positive NK cell?
    "29": "CCR7- CD8+ T cell",  #
    "30": "not annotated",
    "31": "not annotated",
    "32": "not annotated",  # special activated T cell or simply autofluorescence?
    "33": "CCR7- CD8+ T cell",  #
    "34": "not annotated",
    "35": "CCR7+ CD4+ T cell",  #
}
adata_cd45.obs["cell_type_lvl4"] = adata_cd45.obs["leiden"].map(cluster2cell).copy()
sc.pl.umap(adata_cd45, color="cell_type_lvl4")
/opt/python/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'cell_type_lvl4' as categorical
../_images/0a87a96b6dc201c42bf7edcf489a267c7a5c593ee46598a9196e005474c790b1.png

Check proportions. Restrict to CD4 and CD8 T cells.

obs_tmp = adata_cd45.obs.loc[
    adata_cd45.obs["cell_type_lvl3"].isin(["CD4+ T cell", "CD8+ T cell"])
]
df = pd.crosstab(obs_tmp["sample"], obs_tmp["cell_type_lvl4"], normalize=0)
df
cell_type_lvl4 CCR7+ CD4+ T cell CCR7+ CD8+ T cell CCR7- CD4+ T cell CCR7- CD8+ T cell
sample
A 0.345225 0.052842 0.237266 0.364666
B 0.592302 0.140645 0.024481 0.242573
C 0.546661 0.055512 0.146351 0.251477
H 0.502886 0.136178 0.062442 0.298493
J 0.614723 0.263139 0.015905 0.106233
O 0.374888 0.153879 0.097987 0.373247
T 0.407082 0.245514 0.045433 0.301972
U 0.657344 0.156213 0.045153 0.141291
ax = sb.boxplot(
    data=df[
        [
            "CCR7+ CD4+ T cell",
            "CCR7- CD4+ T cell",
            "CCR7+ CD8+ T cell",
            "CCR7- CD8+ T cell",
        ]
    ],
    orient="v",
)
ax = sb.swarmplot(
    data=df[
        [
            "CCR7+ CD4+ T cell",
            "CCR7- CD4+ T cell",
            "CCR7+ CD8+ T cell",
            "CCR7- CD8+ T cell",
        ]
    ],
    orient="v",
    color=".25",
    size=10,
    alpha=0.8,
)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
ax.set_ylim([0, 1])
(0.0, 1.0)
../_images/86550bd4152acf8a51e1bd9051c645ed83b6feaede1a0cfbcd551d5e6840839e.png

Level 5#

Annotate T cell subtypes in a fifth round. Notes on markers:

  • Distinguish Naive, Central Memory (CM), Effector Memory (EM) and terminally differentiated effector memory T cells (TEMRA) in CD4+ T cells and CD8+ T cells with CD45RA:

    • CCR7+ CD45RA+ is a Naive T cell

    • CCR7+ CD45RA- is a CM T cell

    • CCR7- CD45RA+ is a TEMRA

    • CCR7- CD45RA+ in CD8+ T cells is an effector T cell (TE)

    • CCR7- CD45RA- is an EM T cell

    • CCR7- CD45RA- CD69+ is a Tissue-resident T cell (TRM)

  • CCR7 is also known as CD197
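
The marker combinations listed above amount to a small decision table. The sketch below writes it as a function; the name `memory_subtype` and the Boolean inputs are hypothetical, and deciding whether a cluster counts as positive for a marker is still done by inspecting the plots.

```python
def memory_subtype(ccr7_pos: bool, cd45ra_pos: bool, cd69_pos: bool = False) -> str:
    """Map CCR7 (CD197), CD45RA and CD69 status to a T cell memory subtype."""
    if ccr7_pos and cd45ra_pos:
        return "Naive"
    if ccr7_pos:
        return "CM"
    if cd45ra_pos:
        return "TEMRA"  # termed effector T cell (TE) for CD8+ T cells above
    # CCR7- CD45RA-: effector memory, or tissue-resident if also CD69+
    return "TRM" if cd69_pos else "EM"


for ccr7 in (True, False):
    for cd45ra in (True, False):
        print(ccr7, cd45ra, memory_subtype(ccr7, cd45ra))
```
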

cluster2cell = {
    "0": "not annotated",
    "1": "CD4+ CM T cell",  #
    "2": "Naive CD4+ T cell",  #
    "3": "not annotated",
    "4": "not annotated",
    "5": "CD8+ EM T cell",  #
    "6": "Naive CD8+ T cell",  #
    "7": "NK cell",  #
    "8": "not annotated",
    "9": "CD4+ CM T cell",  #
    "10": "CD8+ TE T cell",  #
    "11": "NK cell",  #
    "12": "not annotated",
    "13": "CD4+ EM T cell",  # mixed with TRM CCR7- CD45RA- CD69+
    "14": "CD8+ EM T cell",  # somewhat different to cluster 5
    "15": "not annotated",
    "16": "CD4+ TEMRA",  #
    "17": "not annotated",
    "18": "CD4+ CM T cell",  #
    "19": "Naive CD4+ T cell",  #
    "20": "CD8+ TE T cell",  #
    "21": "Double positive T cell",  # Double positive CD4/CD8
    "22": "not annotated",
    "23": "not annotated",
    "24": "not annotated",
    "25": "Naive CD8+ T cell",  #
    "26": "Double negative T cell",  # very little CD8a
    "27": "Double negative T cell",  # has only CD3 marker
    "28": "NK cell",  # HLA-DR positive NK cell?
    "29": "CD8+ TRM T cell",  # CCR7- CD45RA- CD69+
    "30": "not annotated",
    "31": "not annotated",
    "32": "not annotated",  # special activated T cell or simply autofluorescence?
    "33": "CD8+ EM T cell",  #
    "34": "not annotated",
    "35": "CD4+ CM T cell",  #
}
adata_cd45.obs["cell_type_lvl5"] = pd.Categorical(
    adata_cd45.obs["leiden"].map(cluster2cell).copy()
)

Subcluster#

In addition to the initial clustering, we observe that cluster 13 is a mixture of CD4+ TRM cells and CD4+ EM T cells. Let us subcluster cluster 13 to resolve CD4+ TRMs.

sc.tl.leiden(adata_cd45, key_added="leiden_R", restrict_to=["leiden", ["13"]])
running Leiden clustering
    finished: found 52 clusters and added
    'leiden_R', the cluster labels (adata.obs, categorical) (0:00:46)

Cluster 5 is a mixture of naive CD8+ cells and CD8+ CM T cells. Subcluster cluster 5 to resolve CD8+ CMs.

sc.tl.leiden(adata_cd45, key_added="leiden_R", restrict_to=["leiden_R", ["5"]])
running Leiden clustering
    finished: found 67 clusters and added
    'leiden_R', the cluster labels (adata.obs, categorical) (0:07:12)
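
Subclusters created with `restrict_to` are named after their parent cluster, for example `13,6` or `5,4`. The sketch below mimics this naming scheme with pandas on made-up labels. It only illustrates how the composite labels are formed and does not run Leiden itself; the sub-cluster ids are hypothetical.

```python
import pandas as pd

# Toy parent clustering for six events (made-up labels).
leiden = pd.Series(["13", "13", "5", "13", "2", "5"], dtype="category")

# Hypothetical sub-cluster ids found inside cluster "13" only.
sub_ids = pd.Series(["0", "1", "0"], index=leiden.index[leiden == "13"])

# Compose "<parent>,<sub>" labels for refined events; keep others unchanged.
leiden_R = leiden.astype(str)
leiden_R.loc[sub_ids.index] = "13," + sub_ids
leiden_R = leiden_R.astype("category")
print(leiden_R.tolist())
```

Because the second call above restricts to `leiden_R` rather than `leiden`, the refinement of cluster 13 is kept while cluster 5 is split.
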
rcParams["figure.figsize"] = (10, 10)
sc.pl.umap(adata_cd45, color="leiden_R", legend_loc="on data")
../_images/9dfc33b6a344d477645ea1f8776cb2e371cdf26b8cd94fdd7019117c6c740f87.png

Let us visualise the mean marker intensity as matrixplot to examine the intensity levels in the subclustered data. Also, we want to see where the subclusters are grouped.

sc.pl.matrixplot(
    adata_cd45,
    groupby="leiden_R",
    gene_symbols="AB",
    var_names=[
        "CD45",
        "CD45RA",
        "CD45RO",
        "CD11a",
        "CD16",
        "CD2",
        "CD5",
        "CD3",
        "CD4",
        "CD8a",
        "CD197",
        "CD25",
        "CD27",
        "CD28",
        "CD44",
        "CD49D",
        "CD57",
        "CD69",
        "CD7",
        "CD9",
        "CD95-FAS",
        "HLA-DR",
        "CD127",
        "CD134-OX40",
        "CD137-41BB",
        "CD152-CTLA4",
        "CD161",
        "CD183",
        "CD194",
        "CD195",
        "CD223-LAG3",
        "CD278-ICOS",
        "CD279-PD1",
        "CD366-TIM3",
    ],
    dendrogram=True,
    vmin=0,
    cmap=mymap,
)
WARNING: dendrogram data not found (using key=dendrogram_leiden_R). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
    using data matrix X directly
Storing dendrogram info using `.uns['dendrogram_leiden_R']`
../_images/b6c493205027e081849fe28bea3ef426c238cc7f11fba3d920bea37eccce1e9f.png

Annotate cells from subclustering#

Clusters 13,6 and 13,8 are CD4+ T cells that are negative for CD197 and CD45RA, so we term them CD4+ TRM T cells. In contrast, cluster 13,1 is slightly positive for CD197, so we keep the original annotation. All three clusters have a distinctly higher intensity for CD69, though.

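# Register the new label before assigning it to the categorical column.
adata_cd45.obs["cell_type_lvl5"] = adata_cd45.obs["cell_type_lvl5"].cat.add_categories(
    ["CD4+ TRM T cell"]
)
# Use a single .loc assignment instead of chained indexing.
adata_cd45.obs.loc[
    adata_cd45.obs["leiden_R"].isin(["13,6", "13,8"]), "cell_type_lvl5"
] = "CD4+ TRM T cell"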

Clusters 5,4 and 5,9 are CD8+ T cells that are positive for CD197 and negative for CD45RA, so we term them CD8+ CM T cells. Cluster 5,10 is positive for both CD197 and CD45RA, which is characteristic of naive CD8+ T cells.

# Register the new label before assigning it to the categorical column.
adata_cd45.obs["cell_type_lvl5"] = adata_cd45.obs["cell_type_lvl5"].cat.add_categories(
    ["CD8+ CM T cell"]
)
# Use single .loc assignments instead of chained indexing.
adata_cd45.obs.loc[
    adata_cd45.obs["leiden_R"].isin(["5,4", "5,9"]), "cell_type_lvl5"
] = "CD8+ CM T cell"
adata_cd45.obs.loc[
    adata_cd45.obs["leiden_R"].isin(["5,10"]), "cell_type_lvl5"
] = "Naive CD8+ T cell"
# Drop labels that are no longer used by any cell.
adata_cd45.obs["cell_type_lvl5"] = adata_cd45.obs[
    "cell_type_lvl5"
].cat.remove_unused_categories()
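
The relabelling steps above rely on the behaviour of pandas categoricals: a label has to be registered as a category before it can be assigned. A minimal sketch on made-up labels, independent of the AnnData object:

```python
import pandas as pd

# Toy annotation and subcluster labels for three events (made-up data).
labels = pd.Series(pd.Categorical(["CD4+ EM T cell", "CD4+ EM T cell", "NK cell"]))
subcluster = pd.Series(["13,6", "13,1", "7"])

# A new label must be registered before it can be assigned.
labels = labels.cat.add_categories(["CD4+ TRM T cell"])

# Assign with a Boolean mask in a single .loc call.
labels.loc[subcluster.isin(["13,6", "13,8"])] = "CD4+ TRM T cell"

# Drop categories that no longer occur.
labels = labels.cat.remove_unused_categories()
print(labels.tolist())
print(list(labels.cat.categories))
```

Assigning a value that is not among the categories raises an error in pandas, which is why `add_categories` comes first.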

Save to file.

adata_cd45.write(data_path + "anndata/" + "cytof_data_annotated.h5ad")

Visualise final annotation#

Read anndata object with final annotation.

adata_cd45 = sc.read(data_path + "anndata/" + "cytof_data_annotated.h5ad")

Reorder the cell type names for visualisation.

adata_cd45.obs["cell_type_lvl5"] = adata_cd45.obs[
    "cell_type_lvl5"
].cat.reorder_categories(
    [
        "Naive CD4+ T cell",
        "CD4+ CM T cell",
        "CD4+ EM T cell",
        "CD4+ TRM T cell",
        "CD4+ TEMRA",
        "Naive CD8+ T cell",
        "CD8+ CM T cell",
        "CD8+ EM T cell",
        "CD8+ TRM T cell",
        "CD8+ TE T cell",
        "NK cell",
        "Double negative T cell",
        "Double positive T cell",
        "not annotated",
    ]
)

Adjust color scheme and set gray as color for not annotated cells.

adata_cd45.uns["cell_type_lvl5_colors"][:-1] = np.flip(
    adata_cd45.uns["cell_type_lvl5_colors"][:-1]
)
adata_cd45.uns["cell_type_lvl5_colors"][-1] = "#bbbbbb"  # not annotated

Plot the mean marker intensity for all cell types at the highest level of granularity. Save the plots as PDF and PNG files.

sc.pl.matrixplot(
    adata_cd45,
    groupby="cell_type_lvl5",
    gene_symbols="AB",
    var_names=[
        "CD45",
        "HLA-DR",
        "CD16",
        "CD44",
        "CD3",
        "CD4",
        "CD8a",
        "CD197",
        "CD45RA",
        "CD57",
        "CD69",
        "CD2",
        "CD5",
        "CD7",
        "CD9",
        "CD11a",
        "CD25",
        "CD27",
        "CD28",
        "CD45RO",
        "CD49D",
        "CD95-FAS",
        "CD127",
        "CD134-OX40",
        "CD137-41BB",
        "CD152-CTLA4",
        "CD161",
        "CD183",
        "CD194",
        "CD195",
        "CD223-LAG3",
        "CD278-ICOS",
        "CD279-PD1",
        "CD366-TIM3",
    ],
    dendrogram=False,
    vmin=0,  # standard_scale='var',
    save=f"{today}_Tcell_subtypes.pdf",
    cmap=mymap,
)
WARNING: saving figure to file ../figures/matrixplot_20220728_Tcell_subtypes.pdf
../_images/817b6f3d21017bcb59f56cd77748828fa44e1627da019e564d51b909dfbed741.png
sc.pl.matrixplot(
    adata_cd45,
    groupby="cell_type_lvl5",
    gene_symbols="AB",
    var_names=[
        "CD45",
        "HLA-DR",
        "CD16",
        "CD44",
        "CD3",
        "CD4",
        "CD8a",
        "CD197",
        "CD45RA",
        "CD57",
        "CD69",
        "CD2",
        "CD5",
        "CD7",
        "CD9",
        "CD11a",
        "CD25",
        "CD27",
        "CD28",
        "CD45RO",
        "CD49D",
        "CD95-FAS",
        "CD127",
        "CD134-OX40",
        "CD137-41BB",
        "CD152-CTLA4",
        "CD161",
        "CD183",
        "CD194",
        "CD195",
        "CD223-LAG3",
        "CD278-ICOS",
        "CD279-PD1",
        "CD366-TIM3",
    ],
    dendrogram=False,
    vmin=0,  # standard_scale='var',
    save=f"{today}_Tcell_subtypes.png",
    cmap=mymap,
)
WARNING: saving figure to file ../figures/matrixplot_20220728_Tcell_subtypes.png
../_images/817b6f3d21017bcb59f56cd77748828fa44e1627da019e564d51b909dfbed741.png

Visualise the cell type annotation on a UMAP plot. Save the plot as PDF and PNG files.

rcParams["figure.figsize"] = (5, 5)
sc.pl.umap(
    adata_cd45, color="cell_type_lvl5", save="_" + today + "_cytof_cd45_lvl5.pdf"
)
WARNING: saving figure to file ../figures/umap_20220323_cytof_cd45_lvl5.pdf
../_images/4ed89bf7d3700f4de0fb8a8c77fd7a60eb459f60d045ef09b34830fa03cd9083.png
rcParams["figure.figsize"] = (5, 5)
sc.pl.umap(
    adata_cd45, color="cell_type_lvl5", save="_" + today + "_cytof_cd45_lvl5.png"
)
WARNING: saving figure to file ../figures/umap_20220323_cytof_cd45_lvl5.png
../_images/4ed89bf7d3700f4de0fb8a8c77fd7a60eb459f60d045ef09b34830fa03cd9083.png

Visualise the cells colored by donor on a UMAP plot and save the plot as both a PDF and a PNG file.

rcParams["figure.figsize"] = (5, 5)
sc.pl.umap(adata_cd45, color="sample", save="_" + today + "_cytof_donor.pdf")
WARNING: saving figure to file ../figures/umap_20220728_cytof_donor.pdf
../_images/335f04c7451aac6fbd40004c647a5ed542adc8b34b2020711dae0ae1ceb28f1e.png
rcParams["figure.figsize"] = (5, 5)
sc.pl.umap(adata_cd45, color="sample", save="_" + today + "_cytof_donor.png")
WARNING: saving figure to file ../figures/umap_20220728_cytof_donor.png
../_images/335f04c7451aac6fbd40004c647a5ed542adc8b34b2020711dae0ae1ceb28f1e.png

Boxplot of cell fractions#

To check our annotation, we compute the proportion of each immune cell type per donor, expressed as a percentage of all CD45+ cells of that donor.

obs_tmp = adata_cd45.obs
df = pd.crosstab(obs_tmp["sample"], obs_tmp["cell_type_lvl2"], normalize=0) * 100
df
cell_type_lvl2    NK cell     T cell  not annotated
sample
A                4.834401  62.429026      32.736573
B                9.215583  47.293266      43.491151
C                3.194541  40.332118      56.473341
H                8.470708  57.162303      34.366989
J                2.809668  51.047882      46.142450
O                6.911501  44.875164      48.213335
T                8.310561  43.141323      48.548116
U                3.746645  53.342113      42.911242
df.columns
CategoricalIndex(['NK cell', 'T cell', 'not annotated'], categories=['NK cell', 'T cell', 'not annotated'], ordered=False, dtype='category', name='cell_type_lvl2')
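The role of normalize=0 in the crosstab call above can be illustrated on a toy table. This is only a minimal sketch: the data frame obs_toy and its labels are made up for illustration and are not part of the notebook data.

```python
import pandas as pd

# Toy observation table; sample and cell type labels are hypothetical.
obs_toy = pd.DataFrame(
    {
        "sample": ["A", "A", "A", "A", "B", "B"],
        "cell_type": ["T cell", "T cell", "T cell", "NK cell", "T cell", "NK cell"],
    }
)

# normalize=0 (equivalent to normalize="index") divides every row by its
# row total, so the percentages of each sample add up to 100.
pct = pd.crosstab(obs_toy["sample"], obs_toy["cell_type"], normalize=0) * 100
row_sums = pct.sum(axis=1)

print(pct)
print(row_sums)
```

Because the normalisation is per row, donors with very different total cell counts remain directly comparable in the boxplots.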

Show the per-donor proportions of NK cells, T cells and not annotated cells from the lowest level of granularity (cell_type_lvl2) as boxplots. Save the plot as both a PDF and a PNG file.

rcParams["figure.figsize"] = (3, 5)
ax = sb.boxplot(
    data=df[["NK cell", "T cell", "not annotated"]],
    orient="v",
    palette=["#1f77b4", "#ff7f0e", "#bbbbbb"],
)
ax = sb.swarmplot(
    data=df[["NK cell", "T cell", "not annotated"]],
    orient="v",
    color=".25",
    size=5,
    alpha=0.8,
)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)

ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)

ax.set_ylim([-2, 65])
ax.set_ylabel("Percentage of all CD45+ cells")
pl.savefig(f"./../figures/{today}_boxplot_Oetjen_cytof_CD45.pdf", bbox_inches="tight")
pl.show()
../_images/7451d86334a5008930cc56db3357bcb0fb3bb94900df6f43ef57f0190294f700.png
rcParams["figure.figsize"] = (3, 5)
ax = sb.boxplot(
    data=df[["NK cell", "T cell", "not annotated"]],
    orient="v",
    palette=["#1f77b4", "#ff7f0e", "#bbbbbb"],
)
ax = sb.swarmplot(
    data=df[["NK cell", "T cell", "not annotated"]],
    orient="v",
    color=".25",
    size=5,
    alpha=0.8,
)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)

ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)

ax.set_ylim([-2, 65])
ax.set_ylabel("Percentage of all CD45+ cells")
pl.savefig(f"./../figures/{today}_boxplot_Oetjen_cytof_CD45.png", bbox_inches="tight")
pl.show()
../_images/7451d86334a5008930cc56db3357bcb0fb3bb94900df6f43ef57f0190294f700.png

Boxplot of T cell fractions#

Next, we compute the proportions of T cell subtypes. Here we exclude the NK cells and the not annotated cells, so the percentages are relative to all T cells of a donor.

obs_tmp = adata_cd45.obs.loc[
    adata_cd45.obs["cell_type_lvl3"].isin(
        [  #'NK cell',
            "CD4+ T cell",
            "CD8+ T cell",
            "Double negative T cell",
            "Double positive T cell",
        ]
    )
]
df = pd.crosstab(obs_tmp["sample"], obs_tmp["cell_type_lvl5"], normalize=0) * 100
df
cell_type_lvl5  Naive CD4+ T cell  CD4+ CM T cell  CD4+ EM T cell  CD4+ TRM T cell  CD4+ TEMRA  Naive CD8+ T cell  CD8+ CM T cell  CD8+ EM T cell  CD8+ TRM T cell  CD8+ TE T cell  Double negative T cell  Double positive T cell
sample
A                       15.218739       18.446668        9.277919         1.562233   12.297399           5.413229        1.259448       19.655145         0.260562       14.125899                0.203885                2.278874
B                       25.461977       31.797843        1.793686         0.501674    0.071262          14.594000        3.018309       14.359816         0.045442        5.029396                0.719333                2.607261
C                       17.528144       36.018856        3.178619         1.026850   10.130046           5.510870        1.849318       18.156945         0.724399        3.828847                1.161181                0.885926
H                       24.642163       22.860104        2.627395         0.357778    2.913099          13.428097        2.484190       10.187133         0.819662       14.139649                3.561057                1.979673
J                       30.615129       29.517360        1.112910         0.390318    0.052575          25.830372        1.065803        7.704990         0.206515        1.324473                0.901769                1.277786
O                       10.641064       24.461048        4.141803         0.583726    4.449321          14.672224        1.583755       19.141179         0.690654       13.268893                4.477711                1.888620
T                       10.567680       28.205243        3.732956         0.509420    0.084903          23.841655        2.576148       23.647271         1.011579        1.069112                3.319611                1.434421
U                       17.355147       46.807631        2.898825         1.308842    0.199634          15.293840        1.898293       10.336070         0.485500        1.025338                1.152915                1.237966
df.columns
CategoricalIndex(['Naive CD4+ T cell', 'CD4+ CM T cell', 'CD4+ EM T cell',
                  'CD4+ TRM T cell', 'CD4+ TEMRA', 'Naive CD8+ T cell',
                  'CD8+ CM T cell', 'CD8+ EM T cell', 'CD8+ TRM T cell',
                  'CD8+ TE T cell', 'Double negative T cell',
                  'Double positive T cell'],
                 categories=['Naive CD4+ T cell', 'CD4+ CM T cell', 'CD4+ EM T cell', 'CD4+ TRM T cell', 'CD4+ TEMRA', 'Naive CD8+ T cell', 'CD8+ CM T cell', 'CD8+ EM T cell', ...], ordered=False, dtype='category', name='cell_type_lvl5')
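The effect of subsetting before the crosstab can be sketched on toy data: once the non-T cells are dropped, each row is normalised by the number of T cells only. The table obs_toy, its column names and its labels are hypothetical and only mimic the structure of adata_cd45.obs.

```python
import pandas as pd

# Toy observation table with a coarse and a fine annotation level (hypothetical labels).
obs_toy = pd.DataFrame(
    {
        "sample": ["A"] * 5 + ["B"] * 5,
        "lineage": [
            "CD4+ T cell", "CD4+ T cell", "CD8+ T cell", "NK cell", "NK cell",
            "CD4+ T cell", "CD8+ T cell", "CD8+ T cell", "CD8+ T cell", "NK cell",
        ],
        "subtype": [
            "Naive CD4+", "CD4+ CM", "Naive CD8+", "NK cell", "NK cell",
            "Naive CD4+", "Naive CD8+", "CD8+ EM", "CD8+ EM", "NK cell",
        ],
    }
)

# Keep only T cells, then normalise per sample: the denominator is now
# the number of T cells of the sample, not the number of all cells.
t_cells = obs_toy.loc[obs_toy["lineage"].isin(["CD4+ T cell", "CD8+ T cell"])]
pct_t = pd.crosstab(t_cells["sample"], t_cells["subtype"], normalize=0) * 100

print(pct_t)
```

In this toy example sample B has four T cells, two of which are "CD8+ EM", so that entry is 50 % even though the sample holds five cells in total.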

Plot the proportions of T cell subtypes as boxplots. Every dot is the proportion from a single donor. Save the plot as both a PDF and a PNG file.

rcParams["figure.figsize"] = (10, 5)
ax = sb.boxplot(
    data=df[
        [
            "Naive CD4+ T cell",
            "CD4+ CM T cell",
            "CD4+ EM T cell",
            "CD4+ TEMRA",
            "CD4+ TRM T cell",
            "Naive CD8+ T cell",
            "CD8+ CM T cell",
            "CD8+ EM T cell",
            "CD8+ TE T cell",
            "CD8+ TRM T cell",
            "Double negative T cell",
            "Double positive T cell",
        ]
    ],
    orient="v",
    palette=adata_cd45.uns["cell_type_lvl5_colors"],
)
ax = sb.swarmplot(
    data=df[
        [
            "Naive CD4+ T cell",
            "CD4+ CM T cell",
            "CD4+ EM T cell",
            "CD4+ TEMRA",
            "CD4+ TRM T cell",
            "Naive CD8+ T cell",
            "CD8+ CM T cell",
            "CD8+ EM T cell",
            "CD8+ TE T cell",
            "CD8+ TRM T cell",
            "Double negative T cell",
            "Double positive T cell",
        ]
    ],
    orient="v",
    color=".25",
    size=5,
    alpha=0.8,
)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)

ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)

ax.set_ylim([-2, 51])
ax.set_ylabel("Percentage of all T cells")
pl.savefig(f"./../figures/{today}_boxplot_Oetjen_cytof.pdf", bbox_inches="tight")
pl.show()
../_images/acbffa6aff2e576a14a0b03ffe684038ddc8596979b7af001040af74ae923cdc.png
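A list passed as palette is applied by position, and the boxplot above lists the columns in a different order than df.columns (for example, "CD4+ TEMRA" comes before "CD4+ TRM T cell"). If the colors should match those of the UMAP plot, one option is to look the colors up by name. The following is a minimal sketch, assuming that the color list is stored in the same order as the categories; the three categories and hex codes are placeholders, not the values in adata_cd45.uns.

```python
# Placeholder category order and colors (assumption: colors follow category order).
categories = ["Naive CD4+ T cell", "CD4+ TRM T cell", "CD4+ TEMRA"]
colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]

# Map every cell type name to its color.
color_by_type = dict(zip(categories, colors))

# Order in which the columns are plotted (TEMRA before TRM, as in the boxplot).
plot_order = ["Naive CD4+ T cell", "CD4+ TEMRA", "CD4+ TRM T cell"]

# Reorder the colors so that each plotted column keeps its own color.
palette = [color_by_type[name] for name in plot_order]
print(palette)
```

The resulting list (or the dictionary itself, keyed by column name) could then be passed as the palette argument in place of the raw color list.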
rcParams["figure.figsize"] = (10, 5)
ax = sb.boxplot(
    data=df[
        [
            "Naive CD4+ T cell",
            "CD4+ CM T cell",
            "CD4+ EM T cell",
            "CD4+ TEMRA",
            "CD4+ TRM T cell",
            "Naive CD8+ T cell",
            "CD8+ CM T cell",
            "CD8+ EM T cell",
            "CD8+ TE T cell",
            "CD8+ TRM T cell",
            "Double negative T cell",
            "Double positive T cell",
        ]
    ],
    orient="v",
    palette=adata_cd45.uns["cell_type_lvl5_colors"],
)
ax = sb.swarmplot(
    data=df[
        [
            "Naive CD4+ T cell",
            "CD4+ CM T cell",
            "CD4+ EM T cell",
            "CD4+ TEMRA",
            "CD4+ TRM T cell",
            "Naive CD8+ T cell",
            "CD8+ CM T cell",
            "CD8+ EM T cell",
            "CD8+ TE T cell",
            "CD8+ TRM T cell",
            "Double negative T cell",
            "Double positive T cell",
        ]
    ],
    orient="v",
    color=".25",
    size=5,
    alpha=0.8,
)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)

ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)

ax.set_ylim([-2, 51])
ax.set_ylabel("Percentage of all T cells")
pl.savefig(f"./../figures/{today}_boxplot_Oetjen_cytof.png", bbox_inches="tight")
pl.show()
../_images/acbffa6aff2e576a14a0b03ffe684038ddc8596979b7af001040af74ae923cdc.png
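The plotting cells above are repeated once per output format. As a possible simplification, the figure could be drawn once and written to several formats in a loop. The helper save_figure_formats below is hypothetical (it is not part of scanpy, seaborn or pytometry), and the example writes a toy boxplot to a temporary directory rather than to ../figures.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as pl


def save_figure_formats(fig, stem, formats=("pdf", "png")):
    """Save one figure under the same file stem in several formats (hypothetical helper)."""
    paths = []
    for fmt in formats:
        path = Path(f"{stem}.{fmt}")
        # matplotlib infers the output format from the file extension.
        fig.savefig(path, bbox_inches="tight")
        paths.append(path)
    return paths


# Toy figure standing in for the boxplots of the notebook.
fig, ax = pl.subplots(figsize=(3, 5))
ax.boxplot([[1, 2, 3], [2, 3, 4]])
ax.set_ylabel("Percentage (toy data)")

out_dir = Path(tempfile.mkdtemp())
saved = save_figure_formats(fig, out_dir / "boxplot_demo")
pl.close(fig)
print(saved)
```

Drawing the figure only once also guarantees that the PDF and PNG versions show exactly the same plot.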

End of the annotation notebook.