Histogram: Shape, Statistics, and Interpretation
Quality Advisor
A free online reference for statistical process control, process capability analysis, measurement systems analysis,
control chart interpretation, and other quality metrics.
Histogram: Study the shape
A histogram can be created using software such as SQCpack. How would you describe the shape of the histogram?
Bell-shaped: A bell-shaped picture, shown below, usually represents a normal distribution.
Bimodal: A bimodal shape, shown below, has two peaks. This shape may show that the data has come from two different systems. If this shape occurs, the two sources should be separated and analyzed separately.
Skewed right: Some histograms will show a skewed distribution to the right, as shown below. A distribution skewed to the right is said to be positively skewed. This kind of distribution has a large number of occurrences in the lower value cells (left side) and few in the upper value cells (right side). A skewed distribution can result when data is gathered from a system that has a boundary such as zero. In other words, all the collected data has values greater than zero.
Skewed left: Some histograms will show a skewed distribution to the left, as shown below. A distribution skewed to the left is said to be negatively skewed. This kind of distribution has a large number of occurrences in the upper value cells (right side) and few in the lower value cells (left side). A skewed distribution can result when data is gathered from a system with a boundary such as 100. In other words, all the collected data has values less than 100.
Uniform: A uniform distribution, as shown below, provides little information about the system. An example would be a state lottery, in which each class has about the same number of elements. It may describe a distribution which has several modes (peaks). If your histogram has this shape, check to see if several sources of variation have been combined. If so, analyze them separately. If multiple sources of variation do not seem to be the cause of this pattern, different groupings can be tried to see if a more useful pattern results. This could be as simple as changing the starting and ending points of the cells, or changing the number of cells. A uniform distribution often means that the number of classes is too small.
Random: A random distribution, as shown below, has no apparent pattern. Like the uniform distribution, it may describe a distribution that has several modes (peaks). If your histogram has this shape, check to see if several sources of variation have been combined. If so, analyze them separately. If multiple sources of variation do not seem to be the cause of this pattern, different groupings can be tried to see if a more useful pattern results. This could be as simple as changing the starting and ending points of the cells, or changing the number of cells. A random distribution often means there are too many classes.
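The effect of regrouping can be explored with a minimal sketch like the one below. The helper name `histogram_counts` and the sample data are hypothetical and exist only for illustration; the sketch counts the same data into different numbers of equal-width cells so the resulting shapes can be compared.

```python
import random

def histogram_counts(data, num_cells, start=None, end=None):
    """Count observations in equal-width cells covering [start, end]."""
    start = min(data) if start is None else start
    end = max(data) if end is None else end
    width = (end - start) / num_cells
    counts = [0] * num_cells
    for x in data:
        if start <= x <= end:
            # Keep the maximum value in the last cell rather than past it.
            index = min(int((x - start) / width), num_cells - 1)
            counts[index] += 1
    return counts

random.seed(0)  # hypothetical sample data, for illustration only
data = [random.gauss(50, 5) for _ in range(200)]

# Too few cells tends to look flat; too many tends to look ragged.
for cells in (4, 10, 40):
    print(cells, histogram_counts(data, cells))
```

Passing explicit `start` and `end` values shifts the cell boundaries, which is the other regrouping option described above.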
Follow these steps to interpret histograms.
- Study the shape.
- Calculate descriptive statistics.
- Compare the histogram to the normal distribution.
Histogram: Calculate descriptive statistics
There are several statistics which are useful to describe and analyze a histogram. They are calculated to describe the area under the curve formed by its shape. These descriptive statistics can be calculated using software such as SQCpack.
Central location
The central location of a set of data points is where (on what value) the middle of the data set is located. Central location is commonly described by the mean, the median, and/or the mode. The mean is the average value of the data points. The median is the middle number in the data set when the data points are arranged from low to high. The mode is the value in the data set that occurs most often.
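As a small illustration, the three measures of central location can be computed with Python's standard `statistics` module. The data values are hypothetical.

```python
import statistics

# Hypothetical measurements, already sorted for readability.
data = [4, 7, 7, 8, 9, 10, 11]

mean = statistics.mean(data)      # average of the data points
median = statistics.median(data)  # middle value when sorted low to high
mode = statistics.mode(data)      # value that occurs most often

print(mean, median, mode)
```

Here the mean and median agree while the mode sits slightly lower, which already hints that the data is not perfectly symmetrical.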
Spread
Both the range and the standard deviation describe data spread. Range is calculated by subtracting the minimum data value from the maximum data value. The standard deviation is a measure that indicates how different the values are from each other and from the mean. There are two methods of calculating standard deviation: using individual data points, or using the average range of the samples. Both formulas are available here.
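The two approaches to standard deviation can be sketched as follows. This is an illustration under stated assumptions, not the reference formulas themselves: the subgroup data is hypothetical, the individual-values method uses the sample (n - 1) standard deviation, and the average-range method divides the average subgroup range by the tabulated constant d2, which is 2.326 for subgroups of five.

```python
import statistics

# Hypothetical subgroups of 5 measurements each.
subgroups = [
    [10.1, 9.8, 10.0, 10.3, 9.9],
    [10.2, 10.0, 9.7, 10.1, 10.0],
    [9.9, 10.4, 10.0, 9.8, 10.1],
]
values = [x for group in subgroups for x in group]

# Range: maximum minus minimum over all data points.
data_range = max(values) - min(values)

# Method 1: standard deviation from the individual data points.
sigma_individuals = statistics.stdev(values)

# Method 2: estimate from the average subgroup range.
D2_N5 = 2.326  # tabulated d2 constant for subgroup size 5 (assumed size)
average_range = statistics.mean([max(g) - min(g) for g in subgroups])
sigma_from_range = average_range / D2_N5

print(data_range, sigma_individuals, sigma_from_range)
```

The two estimates generally differ somewhat, because the average-range method reflects only variation within subgroups.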
Skewness
Skewness is the measure of the asymmetry of a histogram (frequency distribution). A histogram with normal distribution is symmetrical. In other words, the same amount of data falls on both sides of the mean. A normal distribution will have a skewness of 0. The direction of skewness is “to the tail.” The larger the number, the longer the tail. If skewness is positive, the tail on the right side of the distribution will be longer. If skewness is negative, the tail on the left side will be longer. The formula for skewness is available here.
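The sign behavior described above can be checked with a short sketch. It assumes the simple moment-based definition of skewness (third central moment divided by the 1.5 power of the second); the formula referenced in the text, and the one used by any particular SPC package, may apply a sample-size adjustment and give slightly different values.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (an assumed, unadjusted form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets for illustration.
print(skewness([1, 2, 3, 4, 5]))        # symmetrical data
print(skewness([1, 1, 2, 2, 3, 10]))    # long tail on the right
print(skewness([1, 8, 9, 9, 10, 10]))   # long tail on the left
```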
Kurtosis
Kurtosis is a measure of the combined weight of the tails in relation to the rest of the distribution. As the tails of a distribution become heavier, the kurtosis value will increase. As the tails become lighter, the kurtosis value will decrease. A histogram with a normal distribution has a kurtosis of 0 when the statistic is reported as excess kurtosis (kurtosis minus 3), which is the convention used here. If the distribution is peaked (tall and skinny), it will have a kurtosis greater than 0 and is said to be leptokurtic. If the distribution is flat, it will have a kurtosis value less than 0 and is said to be platykurtic. The formula for kurtosis is available here.
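A normal distribution scoring 0 corresponds to the excess form of kurtosis, that is, the fourth standardized moment minus 3. The sketch below assumes that unadjusted moment-based definition; the formula referenced in the text may include a sample-size correction. The data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (an assumed, unadjusted form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                          # evenly spread, light tails
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # most values central, two extremes

print(excess_kurtosis(flat))         # negative: platykurtic
print(excess_kurtosis(heavy_tails))  # positive: leptokurtic
```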
Coefficient of variance
The coefficient of variance, more commonly called the coefficient of variation, is a measure of how much variation exists in relation to the mean. It may also be described as a measure of the significance of the sigma in relation to the mean. The larger the coefficient of variance, the more significant the sigma, relative to the mean. For example, what does a standard deviation of 10 mean? If the process average (mean) is 1000, a sigma value of 10 is not very significant. However, if the average is 15, a standard deviation of 10 is very significant. The formula for coefficient of variance is available here.
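The two cases in the example can be worked through numerically. The sketch assumes the usual definition, standard deviation divided by the mean, expressed as a percentage; the function name is hypothetical.

```python
def coefficient_of_variation(sigma, mean):
    """Standard deviation relative to the mean, as a percentage."""
    return 100.0 * sigma / mean

# Same standard deviation, very different process averages.
print(coefficient_of_variation(10, 1000))  # sigma is small relative to the mean
print(coefficient_of_variation(10, 15))    # sigma is large relative to the mean
```

A sigma of 10 is 1% of a mean of 1000 but roughly 67% of a mean of 15.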
Chi-square
In SPC, the chi-square statistic is used to determine how well the actual distribution fits the expected distribution. Chi-square compares the number of observations found in each cell in a histogram (actual) to the number of observations that would be found in an expected distribution. If the differences are small, the distribution fits the theoretical distribution. If the differences are large, the distribution probably does not fit the expected distribution.
Using Chi-square with the assumption of a normal distribution
- The calculated chi-square is compared to the value in the table of constants for chi-square based on the number of “degrees of freedom.”
- If the calculated chi-square is less than the value in the table, the chi-square test passes: the data is consistent with a normal distribution, and the assumption of normality is not rejected.
- If the chi-square is larger than the value in the table, the chi-square test fails. At this confidence level, you either do not have enough data to judge the process, or you should reject the assumption that the process has a normal distribution.
Note: Theoretical percent outside of specifications may be misleading.
The formula for chi-square is available here along with the degrees of freedom table.
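The comparison of actual and expected cell counts can be sketched as below. This is a minimal illustration, not the referenced formula or table: it assumes the standard statistic, the sum of (actual - expected)² / expected over the cells, takes expected counts from a normal curve with a given mean and sigma, and treats the outermost cells as open-ended. The function name, cell edges, and counts are hypothetical, and the degrees-of-freedom rule shown is one common convention.

```python
from statistics import NormalDist

def chi_square_vs_normal(observed, edges, mean, sigma):
    """Chi-square statistic comparing observed cell counts to a normal fit.

    observed: number of observations in each cell
    edges:    ascending cell boundaries, len(observed) + 1 values
    """
    dist = NormalDist(mean, sigma)
    total = sum(observed)
    chi_square = 0.0
    for i, actual in enumerate(observed):
        # Outer cells are open-ended so expected counts sum to the total.
        lower = 0.0 if i == 0 else dist.cdf(edges[i])
        upper = 1.0 if i == len(observed) - 1 else dist.cdf(edges[i + 1])
        expected = total * (upper - lower)
        chi_square += (actual - expected) ** 2 / expected
    return chi_square

# Hypothetical histogram: 6 cells of width 1 around a mean of 10.
edges = [7, 8, 9, 10, 11, 12, 13]
observed = [3, 14, 33, 35, 12, 3]
statistic = chi_square_vs_normal(observed, edges, mean=10.0, sigma=1.0)

# Assumed convention: cells - 1, minus 2 when mean and sigma are
# estimated from the same data.
degrees_of_freedom = len(observed) - 1 - 2
print(round(statistic, 2), degrees_of_freedom)
```

The resulting statistic would then be compared with the tabulated chi-square value for those degrees of freedom, as described in the steps above.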
Histogram: Compare to normal distribution
Is the shape of the histogram normal? The following characteristics of normal distributions will help in studying your histogram, which you can create using software like SQCpack.
- The first characteristic of the normal distribution is that the mean (average), median, and mode are equal.
- A second characteristic of the normal distribution is that it is symmetrical. This means that if the distribution is cut in half, each side would be the mirror of the other. It also must form a bell-shaped curve to be normal. A bimodal or uniform distribution may be symmetrical; however, these do not represent normal distributions.
- A third characteristic of the normal distribution is that the total area under the curve is equal to one. The total area, however, is not shown. This is because the tails extend to infinity. Standard practice is to show 99.73% of the area, which is plus and minus 3 standard deviations from the average.
- The fourth characteristic of the normal distribution is that the area under the curve can be determined. If the spread of the data (described by its standard deviation) is known, one can determine the percentage of data under sections of the curve. To illustrate, refer to the sketches at right. For Figure A, 1 times the standard deviation to the right and 1 times the standard deviation to the left of the mean (the center of the curve) captures 68.26% of the area under the curve. For Figure B, 2 times the standard deviation on either side of the mean captures 95.44% of the area under the curve. Similarly, for Figure C, 3 times the standard deviation on either side of the mean captures 99.73% of the area under the curve. These percentages are true for all data that falls into a normally distributed pattern. These percentages are found in the standard normal distribution table.
- Once the mean and the standard deviation of the data are known, the area under the curve can be described. For instance, 3 times the standard deviation on either side of the mean captures 99.73% of the data.
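The areas quoted above can be reproduced with the standard library's `NormalDist` class. The helper name `area_within` is hypothetical. Computed to more digits, the one- and two-sigma areas round to 68.27% and 95.45%; the 68.26% and 95.44% figures arise from doubling four-digit table entries.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {area_within(k):.2%}")
```

Because the areas depend only on the number of standard deviations from the mean, the same function applies to any normally distributed data.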
What is a Histogram?
A histogram is a graphical frequency distribution of raw data values. Histograms reveal the distribution of data values, compare them with specification limits, and generate useful metrics and statistics that describe the data set in detail.
When analyzing data, histograms are often used with statistical process control (SPC) control charts. That is, the same data used to create control charts can be used to create histograms. While control charts display data in time sequence, histograms do not. Instead, histograms show individual data values summarized and compared to engineering specifications.
Using Histograms
Histograms communicate the central tendency and spread of a data set. When compared with specification limits, histograms reveal how close the average of the data is to the engineering target. In the histogram below, the average (represented by the black vertical line) falls to the right of the target value. This indicates that the process is not centered. This fact is also indicated by the yellow bars, which show that most of the data values fall to the right of the target.
Histograms also allow users to compare individual data values to both upper and lower engineering specification limits. This allows quality professionals to access a variety of different statistics and improvement metrics.
This histogram represents the measurements of a feature, “Location C,” from a specific part, revealing the following important information:
- The mean value is larger than the target value.
- Data values are “bunching up” against the upper specification limit.
- The yellow bars do not seem to match the normal curve fit.
- No data value falls below the lower specification limit.
- If unchanged, the process is expected to generate a loss of about 4.8% above the upper specification limit.
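An expected loss beyond a specification limit is typically estimated from the fitted normal curve. The sketch below shows one way such an estimate could be computed; the mean, sigma, and specification limits are hypothetical values chosen for illustration, not the "Location C" data, and the function name is an assumption.

```python
from statistics import NormalDist

def fraction_outside_specs(mean, sigma, lsl, usl):
    """Theoretical fractions below the LSL and above the USL for a normal fit."""
    fitted = NormalDist(mean, sigma)
    below = fitted.cdf(lsl)
    above = 1.0 - fitted.cdf(usl)
    return below, above

# Hypothetical process: mean sits closer to the upper limit than the lower.
below, above = fraction_outside_specs(mean=10.2, sigma=0.3, lsl=9.0, usl=10.7)
print(f"below LSL: {below:.2%}, above USL: {above:.2%}")
```

When the bars do not match the normal curve fit, as noted in the list above, a theoretical percentage like this may be misleading and is best read alongside the actual counts.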