When a color scheme hides batch effects

I am using a gene expression dataset collected from 269 patients in 19 different batches over time. One way to visually inspect batch effects in gene expression measurements is to run a principal component analysis (PCA) and plot the samples on the first two components.
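The post does not show how PC1 and PC2 were computed. As a rough sketch, one common way is base R's prcomp(). Everything below is an assumption for illustration: the matrix `expr` is simulated noise standing in for the real expression data, the batch labels are made up, and whether the original analysis centered or scaled the genes is not stated.

```r
# Simulated stand-in for the real data (assumption): 269 samples x 500 genes,
# with samples in rows and genes in columns.
set.seed(1)
n_samples <- 269
n_genes <- 500
expr <- matrix(rnorm(n_samples * n_genes), nrow = n_samples, ncol = n_genes)

# Hypothetical batch labels: 19 batches assigned in collection order.
batches <- factor(sort(rep(1:19, length.out = n_samples)))

# PCA on centered, scaled genes; prcomp() expects samples in rows.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Scores of each sample on the first two principal components.
PC1 <- pca$x[, 1]
PC2 <- pca$x[, 2]
```

With real data, `expr` would be replaced by the actual samples-by-genes matrix and `batches` by the recorded batch of each sample.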

Apparently there are no batch effects, right?

In the PCA plot, I colored the 269 samples by the batch they originate from. If no batch effects exist, samples should cluster independently of the batch of origin.
It seems like there are no batch effects in the data, right?

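# Color each sample by its batch: as.integer() turns the batch factor into color indices 1..19
plot(PC1, PC2, col=as.integer(batches), pch=19, main="random colors")
legend("topright", legend = levels(batches), col = seq_along(levels(batches)), lty = 1, lwd = 5, cex = 0.5)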
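One likely reason the integer color codes are so hard to read: R's default palette has only 8 entries, and integer color indices beyond that wrap around. The sketch below works out which of the 19 batches end up sharing a color. The variable names are mine, and it assumes the default palette has not been changed.

```r
# Reset to R's built-in palette and count its entries.
palette("default")
n_default <- length(palette())

# Color indices 1..19 wrap around the palette length.
batch_ids <- 1:19
effective <- (batch_ids - 1) %% n_default + 1

# Group batches by the palette entry they actually get drawn with.
shared <- split(batch_ids, effective)
print(shared)
```

Batches 1, 9 and 17 land on the same palette entry, as do 2, 10 and 18, and so on, so early and late batches cannot be told apart by color.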

Now with a different color scheme…

However, simply changing the color scheme makes a huge difference:

# Draw the two color schemes side by side
par(mfrow=c(1,2))

# One color per batch level; indexing by the factor picks each sample's batch color
gTypeCols <- topo.colors(nlevels(batches))
colors <- gTypeCols[batches]
plot(PC1, PC2, col=colors, pch=19, main="topo.colors scheme")
legend("topright", legend = levels(batches), col = gTypeCols, lty = 1, lwd = 5, cex = 0.5)

# Same plot with the rainbow palette
gTypeCols <- rainbow(nlevels(batches))
colors <- gTypeCols[batches]
plot(PC1, PC2, col=colors, pch=19, main="rainbow scheme")
legend("topright", legend = levels(batches), col = gTypeCols, lty = 1, lwd = 5, cex = 0.5)

Now we can clearly see that there are some batch effects. In particular, the first batches (batches 1-10) cluster separately from the later batches (batches 11-19).

Bottom line: The choice of color scheme can change how data are perceived in a plot. I like to use continuous color schemes for continuous or ordered data, which is definitely useful if your batches represent data collection over time!
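If neither topo.colors nor rainbow suits, a two-color gradient is another option for time-ordered batches. This is a minimal sketch using base R's colorRampPalette(). The blue-to-red endpoints and the made-up batch labels are my assumptions, not something from the analysis above.

```r
# Hypothetical batch labels for 269 samples in 19 time-ordered batches.
batches <- factor(sort(rep(1:19, length.out = 269)))

# Interpolate one color per batch, from blue (earliest) to red (latest).
ramp <- colorRampPalette(c("blue", "red"))
batch_cols <- ramp(nlevels(batches))

# Indexing by a factor uses its integer codes, so each sample gets its batch color.
sample_cols <- batch_cols[batches]
```

The result can be passed straight to the plotting call, for example plot(PC1, PC2, col = sample_cols, pch = 19), with batch_cols used for the legend.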
