
No more excuses: a list of references to learn how to use color

... and finally stop polluting our eyes I'd say!


I was talking the other day with Ilya, a new PhD student in our department, and in front of a prototype he had developed he said something like: "oh yes, and I should find the right color mapping here but ... how?" Oh well ... good question! Originally I wanted to write a whole new post on it, but after some thought I came to the conclusion that not only is it a daunting task but also, and more importantly, I don't know enough to seriously teach it.

But wait a minute, does that mean I cannot help him and the ever-increasing pool of poor color choosers? No, there is at least one thing I can do: share my list of favorite sources of information on color. And maybe add some tips and rules of thumb I often use myself.

So, no more excuses to use poor color schemes. Here is my annotated list of resources, plus some personal tips.

Research Papers

Here are the papers I found most useful for understanding how color is used in practice. Some are written for a general audience; others take real effort to understand. Together, however, they cover a very large part of what should be learned, and the effort pays off.

Color Use Guidelines for Data Representation. Brewer, C. A., Proceedings of the Section on Statistical Graphics, American Statistical Association, Alexandria VA. pp. 55-60 (1999).
[ If you can read only one, read this ]
If you don't have time to read and need a single source of practical advice, stop here. This is the best and most concise explanation of how to use color in visualization you'll ever find. Cynthia Brewer is a cartographer who has focused much of her work on color in geographical data, but her suggestions apply broadly to any kind of data. You can see the result of her work in Color Brewer, an online tool for learning how to select color scales. The tool alone is an eye-opener for those who don't know anything about the topic.

How NOT to lie with visualization. BE Rogowitz, LA Treinish, S Bryson, Computers in Physics (1996).
[ More into color for SciVis but still very useful and great examples ]
This is another classic, quite short and easy to read. I like it especially for its focus on how harmful color can be if not used properly. The use of color is discussed mainly in the context of scientific visualization, where continuous shades of color are common, as in medical images and geographical mapping, but the results can be applied to any other visualization. Especially interesting is the notion that different color mapping strategies can (and should) be used depending on the task at hand (e.g., segmentation, highlighting, etc.).

Designing pixel-oriented visualization techniques: Theory and applications. DA Keim, IEEE Transactions on Visualization and Computer Graphics (2000).
[ Discussion (and code) of a "perceptually" optimal color scale ]
Though it is not only about color, this paper contains a very useful section on color and on how to build a perceptually optimal color scale. The color scale is called HSI (Hue, Saturation, Intensity) and is a variation on the more common RGB, HSB, etc. What makes it valuable is that it is a very rare example of an article where both color theory and practical implementation are discussed in the same place. The HSI color scale can easily be re-implemented by following the code the authors provide in a related paper: Issues in visualizing large databases. DA Keim, HP Kriegel - Proc. Conf. on Visual Database Systems, VDB'95 (1995).

Color Scales for Image Data. H. Levkowitz, G. T. Herman, IEEE Computer Graphics and Applications (12):1 pp.72 - 80 (1992).
[ Some relevant psychophysics theory and its relevance in color scale design ]
This is a purely theoretical paper. I included it because it contains some information that is difficult to find elsewhere, and also because I find it especially intriguing. Here we learn that (1) not all differences in color intensity are perceived by our eyes and (2) a linear increase in color intensity is not necessarily perceived as linear. The concept of Just Noticeable Difference (JND) is introduced and applied to color scale design. One practical consequence is that no matter how well we map our data to color, some differences will always be lost.
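To get a feeling for point (2), here is a small Python sketch. It does not come from the paper: it uses the standard CIE 1976 lightness formula (L*) as a stand-in for "perceived intensity", and the function name cie_lightness is just a label chosen for this example. It also ignores display gamma, so take it as an illustration rather than a calibrated model.

```python
def cie_lightness(y: float) -> float:
    """CIE 1976 lightness L* (0 to 100) for a relative luminance y in [0, 1]."""
    # Near black the formula is linear; elsewhere it follows a cube-root law.
    if y <= (6 / 29) ** 3:
        return (29 / 3) ** 3 * y
    return 116 * y ** (1 / 3) - 16


# Equal steps in luminance do not give equal steps in lightness.
for y in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"luminance {y:.2f} -> L* {cie_lightness(y):5.1f}")
```

Running it shows that half of the maximum luminance already lands well above the middle of the lightness range, which is the kind of nonlinearity a color scale designer has to compensate for.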

Choosing Effective Colours for Data Visualization. Healey, C. G., Proceedings IEEE Visualization '96, pp. 263-270 (1996).
[ Not an easy read, hard-core experimentation, but unique info on categorical colors ]
This is even more theoretical than the paper above. And be warned, it is not an easy read! Anyway, I put it in the list because it is the only "serious" reference I know in which the selection of categorical colors, that is, colors that represent categories rather than quantities, is discussed in fine detail and an algorithm for selecting them is presented. Here we learn that color is not as powerful as we may think. The maximum number of distinguishable colors we can use to label data is around 12. Not so many indeed!
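To make the idea of categorical colors concrete, here is a minimal Python sketch. It is not Healey's algorithm, which is based on perceptual experiments; it only spaces hues evenly around the color wheel at a fixed lightness and saturation, using the standard library's colorsys module. The function name categorical_palette and its default lightness and saturation values are assumptions made for illustration, and the cap of 12 simply echoes the rough limit mentioned above.

```python
import colorsys


def categorical_palette(n: int, lightness: float = 0.55, saturation: float = 0.65) -> list:
    """Return n hex colors with evenly spaced hues and fixed HLS lightness/saturation."""
    if not 1 <= n <= 12:
        # Beyond roughly a dozen categories, colors become hard to tell apart.
        raise ValueError("n should be between 1 and 12")
    colors = []
    for i in range(n):
        hue = i / n  # evenly spaced positions on the hue circle, in [0, 1)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


print(categorical_palette(6))
```

Keep in mind that equal HLS lightness is not equal perceived lightness (a yellow will still look brighter than a blue), so a palette built in a perceptually based color space would likely do better than this sketch.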

Book Chapters

Information visualization: perception for design (Chapter 4: Color) by Colin Ware.
Colin Ware's book is simply the best resource for anything concerning perception theory applied to visualization. In fact, it is probably the best book on visualization ever. Chapter 4 is all about color theory and its content is, of course, great. Theory and practice are well balanced and useful examples are given throughout the chapter. The only thing I think it lacks is practical advice on how to implement its suggestions, but ok, maybe that would be beyond the scope of the book.

Envisioning Information (Chapter 5: Color and Information) by Edward Tufte.
I don't think this book needs any introduction. It is part of Tufte's famous trilogy and of course it contains some guidance on color use. Although much of what you find here is also discussed in other books and papers (here in a usefully summarized form), it also contains some unique content in Tufte's usual original style. A great piece of knowledge is given right away as the chapter opens. Tufte summarizes the uses of color in information design as: to label, to measure, to represent or imitate reality, and to enliven or decorate. These few tasks provide a useful framework for the work of a visualization designer.

Show Me the Numbers (Chapter 6: Visual perception and quantitative communication) by Stephen Few.
This chapter by Stephen Few is the best summary I have ever seen of visual perception theory applied to visualization. Here you will find not only how to use color effectively but also how to boil down the basic theory of how human vision works to a few simple rules to apply in visual design. In a way it can be considered a sort of Colin Ware's book compressed into one pill. So again, if you don't have enough time to read, pick this one and study this chapter. You won't regret your choice.

Tips and rules of thumb

Finally, let me add something of my own. This is just a random list of rules I learned the hard way, by doing.

  • Don't overestimate the power of color - Color is attractive and powerful and, let's admit it, it is what makes most of our visualizations pretty and nice to look at. But for any serious use it is important to realize how limited it is. The number of colors we can easily distinguish is incredibly low (you can learn this from the refs above). For instance, it is estimated that the maximum number of categorical colors we can easily detect in a representation is around 12. Similar limits hold when presenting continuous data. Compared to other visual features like position, length, and size, color is perceived less efficiently. So just don't believe color mapping will do wonders; it is useful within its bounds.

  • Always provide a color legend - I think this one belongs in the list of the most common mistakes in visualization: some data feature is represented with color but then there's nothing in the interface that tells you what the color represents. A color legend is always needed, and not only for labeling. For example, when color represents quantitative data, the legend must also tell us which numbers the brightest and darkest colors map to. So in short, please do your homework: provide a legend.

  • Use color with extreme care and parsimony (above all do no harm!) - This is a sort of repetition of the first point but from a different angle. As color is added to an interface it soon becomes noise. Learn to use it with extreme care and parsimony. It is important, for instance, to realize that once color is used to represent a data feature it is extremely hard to use it for other elements in the interface. In the end, what Tufte says is extremely important: "above all do no harm".

  • Learn to love grays and gray scales (grids!) - The best way to use color well is to understand how powerful colorless graphics are. In particular, shades of gray are so useful in data representation that I am surprised there are so few, if any, specialists advocating their use (Tufte mentions it, by the way). Take a look around, pick the best known and best crafted tools, and you'll see that most of the time their design is based on shades of gray. Gray is especially useful for segmenting the visualization space and organizing it into regions. The most obvious examples are grids in charts and alternating rows in tables (Stephen Few shows excellent examples in Show Me the Numbers), but the same principle applies to thousands of other visualization components. So in short: learn to love gray and gray scales, they can do wonders and rarely do harm.

  • Don't represent unordered data with ordered colors - This is self-explanatory but I see it so often that I think it's worth adding. Also, I think not everybody would agree with me on this. Some people use different intensities of the same "hue" to represent categories. In my opinion this is a poor use of color and opens the door to false interpretations. Ordered colors are automatically read by our brain as "there's some order here". Why would we want to fool our mind when there are better solutions? Use distinguishable hues and, if possible, make them of the same intensity. This will work best.

  • Keep an eye on skewed distributions - Personally I always run into this problem in my data visualizations and I am surprised it is not discussed more. When the dimension you map to color has a skewed distribution the result is incredibly poor: a few items are shown at the highest intensity and all the others are flattened at the low end. In short, there's nothing really useful to see apart from the fact that there are two or three items with very high values. In this case one option is to adopt a nonlinear mapping between the data feature and color. Common solutions are logarithmic or square root functions, which alleviate the problem and make it possible to show a full progression of values.
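The skewed-distribution tip can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name normalize, the sample data, and the choice of log(1 + v) for the logarithmic variant (so that zero values are allowed) are all made up for this example. The output is a number in [0, 1] that you would then feed to whatever color scale you use.

```python
import math


def normalize(values, transform="linear"):
    """Map non-negative values to [0, 1] after an optional sqrt or log transform."""
    transforms = {
        "linear": lambda v: v,
        "sqrt": math.sqrt,
        "log": math.log1p,  # log(1 + v), so that v = 0 is allowed
    }
    f = transforms[transform]
    t = [f(v) for v in values]
    lo, hi = min(t), max(t)
    if hi == lo:
        # All values identical: nothing to spread over the scale.
        return [0.0 for _ in t]
    return [(x - lo) / (hi - lo) for x in t]


# Hypothetical skewed data: many small values and two very large ones.
data = [1, 2, 3, 5, 8, 13, 900, 1000]
for name in ("linear", "sqrt", "log"):
    print(name, [round(x, 2) for x in normalize(data, name)])
```

With the linear mapping the six small values are squeezed into the very bottom of the scale; the square root and especially the logarithm spread them out, at the price of a legend that is harder to read, so say clearly in the legend which mapping was used.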

That was my list and ... oh, before I forget, there is one last major one!

  • Don't use the (infamous) rainbow color scale - Maybe some would laugh at this advice as too obvious, but thanks to Ilya I discovered that there is nothing to laugh about. If you are not convinced, see this study on the uses of the rainbow color scale and discover how many professionals and researchers still believe it has some value: Rainbow Color Map (Still) Considered Harmful

Conclusion

If you want to design great visualizations, learning to use color properly and effectively cannot be avoided. The whole system is only as strong as its weakest link, so if color is used badly your design will suffer a lot. Take your time, read as many of these references as you can, and you won't regret it. They come from top-class researchers and designers, so you can trust their words. Your visualizations will improve, your clients will thank you, and the visual world will definitely and finally be less polluted.

Comments (3)

For the sake of completeness, two more papers on the topic:
M. Stone, Choosing Colors for Data Visualization, 2006
Penny Rheingans, Task-based Color Scale Design, 1999

Enrico Bertini:

Thanks Ilya! In fact I was aware of these two papers but I decided to keep them apart to simplify the selection and because I saw too much overlap with the others.

Anyway, for those who want to spend some time on it and really dig into the problem of color use, there is also a full book by Maureen Stone, "A Field Guide to Digital Color".

A recent article on HCL color space:

Zeileis, Hornik, Murrell. Escaping RGBland: Selecting colors for statistical graphics.
1 July 2009. Computational Statistics & Data Analysis.


About

This page contains a single entry from the blog posted on May 27, 2009 6:05 PM.

The previous post in this blog was Extended Excentric Labeling.

The next post in this blog is New Stephen Few's book out soon: "Now you see it".
