diagrams Archives

August 14, 2006

Potential faults of interpreting web log visualizations with linear scales

Jakob Nielsen has a very nice set of simple (but powerful) diagrams showing a website's page popularity (using his useit.com website as a testbed). The common diagram, with pages ranked by popularity on the x-axis and the number of page views on the y-axis, can hide interesting information if plotted on linear scales.

zipf_visualization_logarithmic.gif

The diagrams show exactly the same data, but the one on the right, which uses a log-log scale, tells you something about the website that simply could not be inferred from the linear one on the left.

It's now clear that we have a drooping tail: the site simply doesn't have enough content to supply the predicted demand at the low end.
Without this fancy log-log plot, we would have never seen the site's potential for increasing traffic by adding large amounts of low-volume content. I'm amazed at how often articles analyzing Web traffic or "long tail"-type businesses use linear plots that fail to show what's really going on.
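To make the point concrete, here is a minimal sketch in Python. It uses made-up numbers, not Nielsen's data: the `ideal` and `drooping` series and the `loglog_slope` helper are hypothetical names introduced only for illustration. On a log-log scale an ideal Zipf curve is a straight line with slope -1, so a tail that bends away from that line shows up as a steeper slope.

```python
import math

def loglog_slope(points):
    """Least-squares slope of log10(views) against log10(rank).

    `points` is a list of (rank, views) pairs with rank >= 1 and views > 0.
    """
    xs = [math.log10(rank) for rank, _ in points]
    ys = [math.log10(views) for _, views in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical site following an ideal Zipf law: views = 10000 / rank.
ideal = [(rank, 10000 / rank) for rank in range(1, 1001)]

# Hypothetical site whose pages ranked below 500 get fewer views than
# Zipf predicts (views fall off as 1 / rank**2 there): a "drooping tail".
drooping = [
    (rank, 10000 / rank if rank <= 500 else 10000 * 500 / rank ** 2)
    for rank in range(1, 1001)
]

head = [p for p in drooping if p[0] <= 500]
tail = [p for p in drooping if p[0] > 500]

print("ideal slope:", round(loglog_slope(ideal), 3))
print("drooping head slope:", round(loglog_slope(head), 3))
print("drooping tail slope:", round(loglog_slope(tail), 3))
```

On a linear plot both series look like the same curve hugging the axes. In log-log space the change in slope at rank 500 is the visible droop.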

There is another related article from Nielsen pushing the analysis a bit further, showing statistics on search engine queries issued to useit.com and on incoming traffic from other websites. The same rule about graphics still holds.

Interestingly enough, you can see from the visualization that queries from Google are disproportionately high and that the distribution of incoming traffic does not drop off at the lower end of the tail.

August 19, 2006

Bootchart: visualizing the Linux boot process

Here is Bootchart, a nice tool for visualizing the performance of the Linux boot process. It meets an important need of all Linux techies (yes, those obsessed with compiling a new kernel at least once a week :-)) who want to understand how the boot process performs.

The challenge is to create a single poster showing graphically what is going on during the boot, what is the utilization of resources, how the current boot differs from the ideal world of 100% disk and CPU utilization, and thus, where are the opportunities for optimization.

bootchart.sortreadahead.png

There are charts for several Linux flavors:
SUSE | fedora | Debian

The application collects data during the boot process and passes it to a Java application that renders the chart:

The log tarball is later passed to the Java application for parsing and rendering the data. The CPU and disk statistics are used to render stacked area and line charts. The process information is used to create a Gantt chart showing process dependency, states and CPU usage.

Since the amount of data is very large, some pruning techniques are used:

A typical boot sequence consists of several hundred processes. Since it is difficult to visualize such amount of data in a comprehensible way, tree pruning is utilized. Idle background processes and short-lived processes are removed. Similar processes running in parallel are also merged together.
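The pruning described in the quote can be sketched roughly as follows. This is not Bootchart's actual code (the real tool is a Java application). The `Proc` record, the `prune` function and the thresholds are all hypothetical, and the sketch makes two simplifications: it merges same-named siblings without checking that they really overlap in time, and it does not re-parent the children of a removed process.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    pid: int
    ppid: int          # parent process id
    name: str
    duration_ms: int   # how long the process lived
    cpu_ms: int        # CPU time it consumed

def prune(procs, min_duration_ms=300, min_cpu_ms=1):
    """Drop short-lived and idle processes, then merge same-named siblings.

    The thresholds are illustrative assumptions, not Bootchart's values.
    """
    # Step 1: remove short-lived processes and idle background processes.
    kept = [
        p for p in procs
        if p.duration_ms >= min_duration_ms and p.cpu_ms >= min_cpu_ms
    ]

    # Step 2: merge processes that share a parent and a name.
    merged = {}
    for p in kept:
        key = (p.ppid, p.name)
        if key in merged:
            m = merged[key]
            m.duration_ms = max(m.duration_ms, p.duration_ms)
            m.cpu_ms += p.cpu_ms
        else:
            # Copy so the input list is left untouched.
            merged[key] = Proc(p.pid, p.ppid, p.name, p.duration_ms, p.cpu_ms)
    return list(merged.values())

# Made-up sample of a boot sequence.
sample = [
    Proc(1, 0, "init", 5000, 200),
    Proc(2, 1, "modprobe", 900, 40),
    Proc(3, 1, "modprobe", 700, 30),
    Proc(4, 1, "sleep", 4000, 0),   # idle: removed
    Proc(5, 1, "true", 10, 1),      # short-lived: removed
]

pruned = prune(sample)
for p in pruned:
    print(p.name, p.duration_ms, p.cpu_ms)
```

With this sample, five processes collapse into two rows, `init` and a single merged `modprobe`, which is the kind of reduction that keeps a Gantt chart readable.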

October 29, 2006

Histograms vs Barcharts

A nice, simple and plain post from statgraphics.blog.com explaining what histograms are and why they differ from barcharts:

Histograms are often mistaken with barcharts. The fundamental distinction between the two is:
  • Barcharts show counts (or weights) for the discrete axis of a categorical variable
  • Histograms show an approximation of the density function (if scaled accordingly) of a continuous variable.
hist-density.png barchart.jpg
Histogram | Barchart

That's absolutely right. Even though I never realized it, I used to mix the two names up too.

And there's more:

As a consequence, the only thing that can be quantified in a barchart is the bar height ... On the other hand, in a histogram, the area of the boxes is proportional to the density approximation. If all bars have the same width in a barchart, or gaps are drawn in a histogram (which is complete nonsense), the two plots can get mixed up.
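The height-versus-area distinction is easy to check with a few lines of Python. This is only a sketch with made-up data: `bar_heights`, `histogram_density`, the sample values and the bin edges are all hypothetical names and numbers chosen for illustration, and every value is assumed to fall inside the given edges.

```python
from collections import Counter

def bar_heights(categories):
    """Barchart: one bar per category, height = count."""
    return Counter(categories)

def histogram_density(values, edges):
    """Histogram: density per bin = count / (n * bin width).

    Bins are half-open [lo, hi), except the last one, which also
    includes its right edge. All values are assumed to lie within edges.
    """
    n = len(values)
    last_bin = len(edges) - 2
    densities = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        count = sum(
            1 for v in values
            if lo <= v < hi or (i == last_bin and v == hi)
        )
        densities.append(count / (n * (hi - lo)))
    return densities

# Categorical data: only the bar heights carry meaning.
colors = ["red", "blue", "red", "green", "red", "blue"]
heights = bar_heights(colors)
print(heights.most_common())

# Continuous data with deliberately unequal bin widths.
values = [1, 2, 2, 3, 4, 6, 7, 9]
edges = [0, 2, 4, 10]
densities = histogram_density(values, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
areas = [d * w for d, w in zip(densities, widths)]
print(densities)
print(sum(areas))
```

The last bin holds half of the values, yet it is not the tallest box, because it is three times as wide as the others. The areas, not the heights, add up to 1.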

I would add one more consequence: the bars of a barchart can be reordered to reveal different patterns. The most common orderings are by frequency and alphabetical, but others are possible.

Barcharts and histograms may not look "cool" enough, but they are among the most powerful visualizations available. Used properly, they convey a lot of information in a compact and effective way.

November 21, 2006

World History Timeline

The World History Timeline is a beautifully crafted wall chart that compresses thousands of years of major historical events and human achievements into a 195 cm x 134 cm (78" x 53") frame.

global_view3.jpg_image_medium.jpeg

In the author's words:

It took more than four years to work out this timeline in a coherent and enlightening manner. The structure of this timeline is set up to facilitate interdisciplinary understanding of diverse intellectual, artistic and scientific movements, discoveries, and cultural developments. The history of the sciences, literature, art, music, philosophy, religion and political milestones, have been mapped in a coherent and synchronoptic manner.

The map has an interesting structure: a large part of it (about 80% of its width, on the left side) is devoted to events spanning a time period between 3000 BC and 2000, covering literature, religions, art, evolution, etc. On the remaining space, on the right side, there are modern items like economics, sociology, biology, etc. Here is a picture making its structure clear.

I think this is a very powerful example of how vital and useful visualization can be. I like the idea of visualization as a discipline not necessarily related to computer screens. I can imagine the impact of a chart like this in a classroom to let boys and girls understand history at a glance and stimulate their curiosity. Also, this is yet another example of how one of the primary purposes of visualization is the illustration of complex information in a way that understanding becomes easy.

In fact, it's strange how much relevance is given to the role of discovery in visualization over understanding (or, as I prefer to say, "making sense of data"). Sure, the discovery of previously unknown information is valuable and exciting, but it is extremely rare and, in any case, understanding is always a prerequisite for it.

By the way, if you are interested, the wall chart can be preordered from here. I think I'm going to buy one!

[Thanks to Bruno, via pasta&vinegar.]

About diagrams

This page contains an archive of all entries posted to Visuale in the diagrams category. They are listed from oldest to newest.
