Preprocess flow data#

In this notebook, we load an FCS file into the anndata format, move the forward scatter (FSC) and side scatter (SSC) information to the .obs section of the anndata file, and perform compensation on the data. Next, we apply different types of normalisation to the data. The FCS file is part of the following reference and was originally deposited on the FlowRepository.

import readfcs
import pytometry as pm
%load_ext autoreload
%autoreload 2

Read the example dataset provided by the readfcs package.

path_data = readfcs.datasets.Oetjen18_t1()
adata = pm.io.read_fcs(path_data)
adata
AnnData object with n_obs × n_vars = 241552 × 20
    var: 'n', 'channel', 'marker', '$PnR', '$PnB', '$PnE', '$PnV', '$PnG'
    uns: 'meta'

Reduce features#

We split the data matrix into the marker intensity part and the FSC/SSC part. Moreover, we move all height-related features to the .obs part of the anndata file. Notably, the function split_signal checks whether a feature name is either FSC or SSC, and whether it ends with -A (area-related features) or -H (height-related features).

Let us check the var_names of the features and the channel names. In this example, the var_names have been cleaned: the markers are listed by name without the -A or -H suffix, whereas the channel column keeps the suffix.

adata.var
           n   channel  marker     $PnR    $PnB  $PnE  $PnV  $PnG
FSC-A      1   FSC-A               262144  32    0,0   510   1.0
FSC-H      2   FSC-H               262144  32    0,0   510   1.0
FSC-W      3   FSC-W               262144  32    0,0   510   1.0
SSC-A      4   SSC-A               262144  32    0,0   310   1.0
SSC-H      5   SSC-H               262144  32    0,0   310   1.0
SSC-W      6   SSC-W               262144  32    0,0   310   1.0
CD95       7   R660-A   CD95       262144  32    0,0   490   1.0
CD8        8   R780-A   CD8        262144  32    0,0   475   1.0
CD27       9   B515-A   CD27       262144  32    0,0   470   1.0
CXCR4      10  B710-A   CXCR4      262144  32    0,0   417   1.0
CCR7       11  V450-A   CCR7       262144  32    0,0   400   1.0
LIVE/DEAD  12  V545-A   LIVE/DEAD  262144  32    0,0   495   1.0
CD4        13  V605-A   CD4        262144  32    0,0   400   1.0
CD45RA     14  V655-A   CD45RA     262144  32    0,0   375   1.0
CD3        15  V800-A   CD3        262144  32    0,0   400   1.0
CD49B      16  G560-A   CD49B      262144  32    0,0   400   1.0
CD14/19    17  G610-A   CD14/19    262144  32    0,0   415   1.0
CD69       18  G660-A   CD69       262144  32    0,0   470   1.0
CD103      19  G780-A   CD103      262144  32    0,0   435   1.0
Time       20  Time                262144  32    0,0         0.01

We use the channel column of the adata.var data frame to split the matrix.

pm.pp.split_signal(adata, var_key="channel")
adata
AnnData object with n_obs × n_vars = 241552 × 13
    obs: 'FSC-A', 'FSC-H', 'FSC-W', 'SSC-A', 'SSC-H', 'SSC-W', 'Time'
    var: 'n', 'channel', 'marker', '$PnR', '$PnB', '$PnE', '$PnV', '$PnG', 'signal_type'
    uns: 'meta'

The data matrix was reduced by seven features, from 20 to 13: the six scatter channels (FSC-A, FSC-H, FSC-W, SSC-A, SSC-H and SSC-W) and Time were moved to .obs.
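The name-based rule that split_signal is described as applying can be illustrated with a small stand-alone sketch. The helper classify_feature is hypothetical and is not pytometry's implementation; it only mimics the described behaviour (FSC/SSC names are scatter features, -A marks area, -H marks height) on a few channel names taken from the table above.

```python
def classify_feature(name):
    """Hypothetical helper: label a channel name by its naming pattern."""
    upper = name.upper()
    if upper.startswith(("FSC", "SSC")):
        return "scatter"
    if upper.endswith("-A"):
        return "area"
    if upper.endswith("-H"):
        return "height"
    return "other"


# A few channel names from the example dataset.
channels = ["FSC-A", "FSC-H", "SSC-W", "R660-A", "V450-A", "Time"]
labels = {ch: classify_feature(ch) for ch in channels}

# Area-related marker channels stay in the data matrix;
# everything else would be moved to the per-cell annotations.
keep = [ch for ch, label in labels.items() if label == "area"]
move = [ch for ch, label in labels.items() if label != "area"]

print(keep)  # ['R660-A', 'V450-A']
print(move)  # ['FSC-A', 'FSC-H', 'SSC-W', 'Time']
```

Under this rule the scatter channels and Time leave the matrix while the area-based marker channels remain, which agrees with the .obs columns listed in the output above.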

Compensation#

Next, we compensate the data using the compensation matrix that is included in the FCS file header. Alternatively, one may provide a custom compensation matrix.

The compensate function matches the var_names of adata with the column names of the spillover matrix to compensate the correct channels.

pm.pp.compensate(adata)
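To make the idea concrete, here is a minimal NumPy sketch of spillover compensation. It is not pytometry's code. It assumes the common convention that the observed signal equals the true signal multiplied by the spillover matrix, so compensation amounts to solving that linear system. The two-channel spillover values and the intensities are invented for illustration, and the channel names are borrowed from the table above only to show how columns can be aligned by name first.

```python
import numpy as np

# Hypothetical spillover matrix: row i holds the fraction of fluorochrome i's
# signal that is detected in each channel (values invented for illustration).
spill_channels = ["B515-A", "B710-A"]
spill = np.array([[1.0, 0.2],
                  [0.1, 1.0]])

# Invented "true" intensities for three cells, in spillover-matrix order.
true_signal = np.array([[100.0, 0.0],
                        [0.0, 50.0],
                        [80.0, 40.0]])

# The instrument records a mixture; here the data columns are stored in a
# different order than the spillover matrix.
data_channels = ["B710-A", "B515-A"]
observed = (true_signal @ spill)[:, [1, 0]]

# Align the data columns with the spillover matrix by channel name.
order = [data_channels.index(ch) for ch in spill_channels]
aligned = observed[:, order]

# Solve aligned = compensated @ spill for the compensated intensities.
compensated = np.linalg.solve(spill.T, aligned.T).T

print(np.allclose(compensated, true_signal))  # True
```

Solving the linear system rather than explicitly inverting the matrix is a design choice of this sketch; both give the same result for a well-conditioned spillover matrix.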

Normalize data#

In the next step, we normalize the data. By default, normalization is an in-place operation, i.e. a new anndata object is only created if we set the argument inplace=False. We demonstrate four different normalization methods that are built into pytometry:

  • arcsinh

  • logicle

  • bi-exponential

  • auto-logicle

adata_arcsinh = pm.tl.normalize_arcsinh(adata, cofactor=150, inplace=False)
adata_logicle = pm.tl.normalize_logicle(adata, inplace=False)
adata_biex = pm.tl.normalize_biExp(adata, inplace=False)
adata_autologicle = pm.tl.normalize_autologicle(adata, inplace=False)
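For intuition about the first of these methods, the following sketch applies an arcsinh transform to a plain NumPy array. It assumes the common definition arcsinh(x / cofactor) and is not taken from pytometry's source; the function name arcsinh_transform and the input values are made up for illustration. Like the inplace=False calls above, it returns a new array and leaves the input untouched.

```python
import numpy as np


def arcsinh_transform(values, cofactor=150.0):
    """Hypothetical helper: return arcsinh(values / cofactor) as a new array."""
    return np.arcsinh(np.asarray(values, dtype=float) / cofactor)


# Invented raw intensities, including a negative value as can occur
# after compensation.
raw = np.array([-150.0, 0.0, 150.0, 15000.0])
scaled = arcsinh_transform(raw, cofactor=150.0)

# The transform is roughly linear around zero and logarithmic for large values,
# and it is defined for negative inputs, unlike a plain logarithm.
print(scaled)
```

A larger cofactor widens the near-linear region around zero, so the value chosen (150 in the call above) determines how strongly low intensities are compressed.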