When and how to use Box and Whisker Plots for Visual Analysis
The box within the chart displays where around 50 percent of the data points fall. The mark with the greatest value is called the maximum. The mark with the lowest value is called the minimum, and it will likely fall outside the box on the opposite side from the maximum. The box itself contains the lower quartile, the upper quartile, and the median in the center. The median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. You can think of the median as "the middle" value in a set of numbers based on a count of your values, rather than the midpoint between the smallest and largest values. For example, the median of 1, 2, 3, 10, and 50 is 3, whereas the midpoint of their range is 25.5. The lower quartile is the 25th percentile, while the upper quartile is the 75th percentile; together they help the viewer see where the median falls within the distribution. The median marks the middle, but the quartiles around it give a better sense of what to expect from these measurements. The whiskers (the lines extending from both sides of the box) typically reach to the most extreme data points within 1.5 times the interquartile range (IQR, the length of the box) beyond each quartile. Points past that boundary are considered outliers.
Box and whisker plot example and how-to
Use a box and whisker plot when the goal of your analysis is to understand how data points are distributed within a range of values. Box plots also help you identify outliers within the dataset. Keep in mind that the steps to build a box and whisker plot vary between software, but the principles remain the same. You need a qualitative categorical field to partition your view by. You also need a more granular qualitative field to partition that categorical field by. Finally, you need a single set of values to measure.
The view below compares distributions across each category using a histogram. The categorical field is “Sport”, the more granular qualitative field we partition by is “Athlete”, and the value measured is “Age”. This type of visualization works well for comparing distributions across a small number of members in a category. However, it can become cluttered when there are many members to display, and the same is true of standard bar charts used to show a distribution. When the number of members in a category increases (as in the view above), shifting to a box plot (the view below) gives us the same information in a condensed space. It also adds a few pieces of information missing from the chart above, such as the median and quartiles.
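The numbers a box plot draws for each category can also be computed directly. Below is a minimal Python sketch, assuming Python 3.8+ and the standard library's `statistics.quantiles` with `method="inclusive"`. The function name `box_plot_stats`, the `ages_by_sport` dictionary, and the ages themselves are hypothetical. They are chosen only to mirror the “Sport” / “Athlete” / “Age” example. The sketch follows the common 1.5 × IQR whisker convention. Charting tools may use a different quartile method, so their boxes can differ slightly.

```python
from statistics import quantiles


def box_plot_stats(values, whisker_factor=1.5):
    """Hypothetical helper: the numbers a Tukey-style box plot draws for one group."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values to compute quartiles")
    # quantiles(n=4) returns the three cut points: Q1, Q2 (the median), Q3.
    # "inclusive" interpolates between data points; other tools may differ.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the length of the box
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": data[-1],
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        # Anything beyond the fences is drawn as an individual outlier mark.
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical ages grouped by the categorical field ("Sport"); each value
# stands for one athlete.
ages_by_sport = {
    "Gymnastics": [16, 17, 17, 18, 19, 20, 21, 35],
    "Swimming": [18, 20, 21, 22, 24],
}

for sport, ages in ages_by_sport.items():
    stats = box_plot_stats(ages)
    print(sport, stats["q1"], stats["median"], stats["q3"], stats["outliers"])
```

With these made-up numbers, the 35-year-old gymnast lands beyond the upper fence (20.25 + 1.5 × 3.25 = 25.125). A box plot would therefore draw that point as an outlier and end the upper whisker at 21.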