RNA-seq & clustering algorithms
RNA-seq
From sequencing data to gene abundance table (matrix)
- RNA-seq: full-length transcript protocols (e.g., Smart-seq2) vs. tag-based protocols (e.g., 10x Chromium); bulk RNA-seq vs. single-cell RNA-seq
- Preprocessing of the sequencing data
- Quantification
Read mapping -> read counts -> normalization. The key to fast read mapping is a good indexing technique (check out some slides about read mapping).
"Sample-specific reads were aligned to the mouse reference genome (GRCm38.p3; Ensembl V.80) and genomic features determined using featureCounts." (paper)
- Postprocessing
"Low-quality cells were filtered resulting in normalised data from 325 cells and 34,769 genes being passed onto downstream clustering analyses." (paper)
- Feature selection
Paper: M3Drop: Dropout-based feature selection for scRNASeq
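The steps from read counts to a feature-selected matrix can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the method of either paper quoted above: the matrix is assumed to be genes x cells, normalization is plain counts per million (CPM), the cell filter is a simple "minimum number of detected genes" threshold, and feature selection ranks genes by the variance of log-transformed expression as a stand-in for M3Drop's dropout-based criterion. All function names and thresholds are hypothetical.

```python
import numpy as np

def filter_cells(counts, min_genes_detected):
    """Drop cells (columns) in which too few genes have a nonzero count."""
    counts = np.asarray(counts)
    genes_detected = (counts > 0).sum(axis=0)   # detected genes per cell
    keep = genes_detected >= min_genes_detected
    return counts[:, keep], keep

def cpm_normalize(counts):
    """Scale every cell (column) so that its counts sum to one million."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)              # total reads per cell
    if np.any(lib_sizes == 0):
        raise ValueError("every cell needs at least one read")
    return counts / lib_sizes * 1e6

def top_variable_genes(log_expr, n_top):
    """Return the row indices of the n_top genes with the highest variance.

    Variance ranking is an assumption made for this sketch; M3Drop itself
    selects genes from their dropout rates.
    """
    variances = log_expr.var(axis=1)
    order = np.argsort(variances)[::-1]         # highest variance first
    return np.sort(order[:n_top])

# Toy genes x cells count matrix (4 genes, 4 cells); the third cell is low quality.
counts = np.array([
    [ 10,  20, 0,  30],
    [ 90,  80, 0,  70],
    [  0, 100, 1,   0],
    [100,   0, 0, 100],
])

filtered, keep = filter_cells(counts, min_genes_detected=2)
cpm = cpm_normalize(filtered)
log_expr = np.log1p(cpm)                        # log(1 + CPM) to damp large values
hvg = top_variable_genes(log_expr, n_top=2)
```

In this toy example the third cell is dropped because only one gene is detected in it, and the two genes that switch on and off between cells are the ones kept for downstream clustering.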
Clustering algorithms
The power of visualization (in combination with dimension reduction, and luck)
- "We defined six cell clusters within the LIN–HLA-DR+CD14– population using unsupervised analysis that did not rely on known markers of DCs. Briefly, we identified 595 genes exhibiting high variability across single cells, reduced the dimensionality of these data with principal components analysis (PCA), and identified five significant PCs using a previously described permutation test (6, 9). We used these PC loadings as input to t-distributed stochastic neighbor embedding (t-SNE) (10) for visualization, and clustered cells using a graph-based approach similar to one recently developed for mass cytometry data (6, 11)." (paper: "Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors")
- PCA
- t-SNE
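As a rough illustration of the dimension-reduction step described in the quote above, PCA can be computed from a singular value decomposition of the centered expression matrix. The sketch below is an assumption-laden toy, not the paper's pipeline: the matrix is cells x genes, the data are simulated, the number of components is fixed by hand rather than chosen by a permutation test, and the function name `pca_scores` is hypothetical.

```python
import numpy as np

def pca_scores(expr, n_components):
    """PCA via SVD on a cells x genes matrix.

    Returns the per-cell coordinates on the leading principal components
    and the fraction of total variance each component explains.
    """
    X = np.asarray(expr, dtype=float)
    X = X - X.mean(axis=0)                       # center every gene
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)              # singular values are sorted
    return scores, explained[:n_components]

# Simulated data: 40 cells x 50 genes, two groups of 20 cells each.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(20, 50))
group_b = rng.normal(0.0, 1.0, size=(20, 50))
group_b[:, :10] += 6.0                           # 10 genes are higher in group B
expr = np.vstack([group_a, group_b])

scores, ratio = pca_scores(expr, n_components=5)
pc1_a, pc1_b = scores[:20, 0], scores[20:, 0]    # PC1 coordinates per group
```

Here the first principal component alone separates the two simulated groups. On real data the retained scores would typically be handed to a t-SNE implementation (for example scikit-learn's `TSNE`) for a 2-D visualization, and to a clustering method of choice.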