How to Use t-SNE Effectively

t-SNE (t-Distributed Stochastic Neighbor Embedding) is a popular technique for dimensionality reduction, introduced by van der Maaten and Hinton in 2008. It is particularly well suited to the visualization of high-dimensional genomic or proteomic datasets (e.g. gene expression or mass spectrometry data).

The most widely used method for dimensionality reduction in the genomics/proteomics literature is Principal Component Analysis (PCA). However, PCA might not be the best choice, as it is a linear and parametric method. Low-dimensional maps produced by PCA have been used as input to clustering algorithms, but PCA is not necessarily a method developed primarily for clustering, or even for dimension reduction. Lior Pachter's post explains very well what PCA is.
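To make the "linear" part concrete, here is a minimal sketch in plain Python (my own illustration, not taken from the post mentioned above). It finds the first principal component of a tiny made-up dataset by power iteration on the covariance matrix; the function names and the data are assumptions chosen for the example. The point to notice is that a PCA score is just a dot product, so the map from data to low-dimensional coordinates is a straight-line projection.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (column means, unit vector along the direction of largest variance)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # arbitrary non-zero starting vector
    for _ in range(iterations):
        # Power iteration: repeatedly apply the covariance matrix and renormalise.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, direction):
    # A PCA score is a dot product with a fixed direction: a linear function of the input.
    return sum((x - m) * u for x, m, u in zip(row, mean, direction))


# Made-up 2-D data lying roughly along the line y = 2x.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
mean, pc1 = first_principal_component(data)
scores = [project(r, mean, pc1) for r in data]
```

Because the projection is linear, structure that only shows up along a curved surface in the original space cannot be "unrolled" by it, which is one reason to look at non-linear methods.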

Recently, t-SNE (a non-linear and non-parametric method) has been gaining popularity in the genomics and proteomics fields. It is often used to embed high-dimensional data into low dimensions for visualisation. Fonville et al. (2013) have shown that t-SNE outperforms PCA when used in the visual analysis of high-dimensional molecular data. You can easily use t-SNE in R with the “tsne” package, or read this blog post to start.
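For intuition about what t-SNE is trying to do, here is a rough sketch in plain Python of the quantity it minimises. This is not the “tsne” R package and not a full implementation: there is no optimiser, and as a simplifying assumption it uses one shared Gaussian bandwidth `sigma` normalised over all pairs, whereas real t-SNE tunes a bandwidth per point from a perplexity setting. The function names and the toy points are made up for the example. It computes Gaussian similarities between points in the original space, Student-t similarities between points in a candidate 2-D layout, and the Kullback-Leibler divergence between the two.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalise(weights):
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


def high_dim_similarities(points, sigma=1.0):
    """Gaussian similarities p_ij between the original points.

    Simplification (assumption): a single shared sigma, normalised over all pairs.
    """
    n = len(points)
    return normalise([[0.0 if i == j else
                       math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
                       for j in range(n)] for i in range(n)])


def low_dim_similarities(layout):
    """Student-t (one degree of freedom) similarities q_ij between layout points."""
    n = len(layout)
    return normalise([[0.0 if i == j else
                       1.0 / (1.0 + sq_dist(layout[i], layout[j]))
                       for j in range(n)] for i in range(n)])


def kl_divergence(p, q):
    """Cost KL(P || Q): small when neighbours in P are also neighbours in Q."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0.0)


# Toy data: two tight pairs of points, far apart in 3-D (order: a1, a2, b1, b2).
points = [(0, 0, 0), (1, 0, 0), (10, 10, 10), (11, 10, 10)]
P = high_dim_similarities(points)

# A layout that keeps each pair together, and one that splits the pairs up.
good_layout = [(0, 0), (1, 0), (10, 0), (11, 0)]
bad_layout = [(0, 0), (10, 0), (1, 0), (11, 0)]

kl_good = kl_divergence(P, low_dim_similarities(good_layout))
kl_bad = kl_divergence(P, low_dim_similarities(bad_layout))
```

The layout that keeps neighbours together gets the lower cost. An actual t-SNE run starts from a random layout and moves the points by gradient descent to reduce this cost, so the result depends on the settings and on the random start.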

How to use t-SNE effectively:

I found a fantastic blog post that I think everyone should read when using t-SNE:

Click below:

How to Use t-SNE Effectively



