A Complex Complex with Complicated Complications

August 3rd, 2010

I haven’t had the time to develop a mock chassis to show an example of a real world tolerance analysis.  I’m tired of writing about tolerance analysis anyway, and you are probably tired of reading about it.  Let’s examine a different problem.

For a tolerance analysis, the tolerance (or loop) equation is simple addition or subtraction, and looks something like this

$X_{Total} = X_1 + X_2 + \dots + X_n$

Suppose we have mathematical descriptions that are more complex. For example

$f(X) = X_1 X_2$

$f(X) = mX + b$

$T(t) = T_i + T_f(1 - e^{-t/\tau})$

How do we go about analyzing the quality of our design with these types of mathematical relationships?  Let’s look at a simple example.

Anyone with even a modicum of technical training is familiar with the spring force equation, or Hooke’s law.  (Hooke was commissioned to develop a reliable timepiece to help ships more accurately estimate their longitude, and he did so using a spring.)  The spring equation is

F = kΔX

What would be the minimum and maximum force?  We could say the spring constant, k, varies by +/- 10% and hence the force varies +/- 10%.  We could also say the ΔX varies by +/- 10% and hence the force varies by +/- 10%.  Or, we could say both k and ΔX vary by +/- 10%, and hence the force would vary by +/- 21% (1.1 X 1.1 = 1.21).  Although these may be good guesses, and follow a logical line of thinking, these guesses are not technically correct.  We can do better.

There are two methods that we can use to solve this type of problem (two methods of which I am aware, anyway).  One is using a Monte Carlo simulation.  I am going to wait for a future post to talk about Monte Carlo simulations, but it is an iterative method for estimating the variation for complex problems, not terribly different than Finite Element Analysis (FEA) or Computational Fluid Dynamics (CFD) modeling.  The most popular Monte Carlo simulation software is called Crystal Ball.  It was developed by Decisioneering and runs on top of MS Excel.  Crystal Ball is now owned by Oracle, so if you want to buy it, you have to go through Oracle.  (If you really understand how Monte Carlo simulation works, and you are good with spreadsheets and macros, you can develop your own Monte Carlo simulator without shelling out a bunch of money.)  It depends on whether you have more time than money.  One of the more popular applications for Monte Carlo simulation is in the financial industry, where financial institutions invest a considerable amount of manpower and hardware (computers) in an effort to predict the economic future.

The other method is more theoretical and manual, and it requires a basic understanding of differential calculus.  This is the method that I will explain in this post.  Don’t be scared.  It is not as difficult as it sounds.  The good thing about this method is that if a system can be described by a linear equation (linear equation does not mean a straight line), and if the equation can be differentiated explicitly, we can estimate the standard deviation of the system, regardless of its complexity. Once we have an estimate of the standard deviation, and the specification limits, we can estimate the quality of our design.  Let’s jump in.

Let’s examine the following generic equation

$f(X) = f(X_1, X_2, \dots, X_n)$; $f(X)$ is the same as $y$.

Equations like this mean that the result, f(x), is dependent on several independent inputs, Xn.  You are already familiar with one example.  In a tolerance analysis, the tolerance equation looks like this

$X_{Total} = X_1 + X_2 + \dots + X_n$

The result depends on the length or distance of the parts of the loop.  You see, you are already on your way to solving more complex problems.

The standard deviation of the system can be described as follows

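$\sigma_{Total}^2 = \left(\frac{\partial f}{\partial X_1}\right)^2 \sigma_1^2 + \left(\frac{\partial f}{\partial X_2}\right)^2 \sigma_2^2 + \dots + \left(\frac{\partial f}{\partial X_n}\right)^2 \sigma_n^2$, where $f = f(X_1, X_2, \dots, X_n)$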

I can imagine your reaction.  This is way too complicated!!  Wait a minute, Hileman, you’re supposed to make complex stuff simple!!  You betrayed us!!  Don’t worry, I will simplify this.

You should notice a couple of things about this equation.  The standard deviation is not necessarily a single number.  The standard deviation depends on where you are within the solution space.  The first derivative of any function is its slope.  In this equation, we are essentially taking the first derivative of a function with respect to each input, and hence the standard deviation depends on the slope of the function at the point of interest.

The previous equation uses partial differentiation.  Before we can appreciate partial differentiation, let’s look at regular differentiation.

What is differentiation?  Differentiation is nothing more than determining the slope, or grade, of a line.  It’s simply rise over run.

               Slope = Δy/Δx

If we make Δy very small, infinitesimally small, and we make Δx small, infinitesimally small, then we can determine the slope anywhere on a line, regardless of whether the line is straight or curved.

Mathematically

Slope = dy/dx, or d(f(x))/dx

Here are examples of differentiation of some popular functions

Let f(x) = C, or a constant

df(x)/dx = d(C)/dx = 0.   The slope of a constant is 0. It’s a horizontal line where Δy always = 0.

f(x) = x

df(x)/dx = d(x)/dx = 1.  As x changes, f(x), or y, changes by the exact same amount.

f(x) = mx

df(x)/dx = d(mx)/dx = m.  As x changes, f(x) changes more or less than x, depending on whether m is greater or less than 1.

$f(x) = x^n$

$df(x)/dx = d(x^n)/dx = n x^{n-1}$

Commonly $d(x^2)/dx = 2x$

Other common functions

$d(e^x)/dx = e^x$.  This is the only function (apart from constant multiples of it) where the slope is equal to f(x).

d(ln(x))/dx = 1/x

d(sin(x))/dx = cos(x)

d(cos(x))/dx = -sin(x)

Differentiation can get more complex when functions are embedded in other functions, but we don’t need to go into that here, and, for the most part, we rarely encounter these types of functions in basic engineering.
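If you want to convince yourself of these rules without a calculus book, a quick numerical check is one way to do it.  The sketch below is my own illustration, not part of the method: it estimates rise over run across a very small run and compares the result with each rule listed above.  The helper name `slope`, the step size, and the test values x = 2, m = 3 and n = 2 are arbitrary assumptions.

```python
import math

def slope(f, x, h=1e-6):
    """Approximate df/dx at x as rise over run across a very small run."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
m, n = 3.0, 2  # hypothetical values chosen only for this check

# Each entry pairs the numerical slope with the rule from the list above.
checks = {
    "constant": (slope(lambda v: 5.0, x), 0.0),
    "x": (slope(lambda v: v, x), 1.0),
    "m*x": (slope(lambda v: m * v, x), m),
    "x**n": (slope(lambda v: v ** n, x), n * x ** (n - 1)),
    "exp(x)": (slope(math.exp, x), math.exp(x)),
    "ln(x)": (slope(math.log, x), 1 / x),
    "sin(x)": (slope(math.sin, x), math.cos(x)),
    "cos(x)": (slope(math.cos, x), -math.sin(x)),
}

for name, (numeric, exact) in checks.items():
    print(f"{name:8s} numeric = {numeric:.6f}   rule = {exact:.6f}")
```

The same rise-over-run trick is handy later on, when a function is too messy to differentiate by hand.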

Now that we know a little about regular differentiation, we can look at partial differentiation.  When we perform partial differentiation, we differentiate with respect to one of the independent variables, and all other variables are treated as constants.  As we saw earlier, the slope of a constant is zero.

Here’s an example

               f(X) = X1 + X2

               ∂(f(X))/∂X1 = 1

X2 is treated as a constant, and the slope of a constant is zero.  d(X1)/dX1 = 1, as we saw before.

Now let’s apply what we know to a simple example and estimate the system’s standard deviation.

We have three elements with lengths X1, X2, and X3.  The standard deviations of these components are σ1, σ2 and σ3 respectively.

The total length can be described as

XTotal = X1 + X2 + X3 

The system standard deviation can be described as

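$\sigma_{Total}^2 = \left(\frac{\partial (X_1 + X_2 + X_3)}{\partial X_1}\right)^2 \sigma_1^2 + \left(\frac{\partial (X_1 + X_2 + X_3)}{\partial X_2}\right)^2 \sigma_2^2 + \left(\frac{\partial (X_1 + X_2 + X_3)}{\partial X_3}\right)^2 \sigma_3^2$

When we follow the rules we have established, each partial derivative equals 1, and we end up with

$\sigma_{Total}^2 = \sigma_1^2 + \sigma_2^2 + \sigma_3^2$

$\sigma_{Total} = (\sigma_1^2 + \sigma_2^2 + \sigma_3^2)^{1/2}$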

This should look familiar.  The total standard deviation of a system, when the elements are simply added or subtracted, is the square root of the sum of the squares of the elements’ standard deviations.
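The general equation can also be evaluated numerically, which is useful when the partial derivatives are tedious to work out by hand.  What follows is a minimal sketch under my own assumptions: the function name `system_sd`, the central-difference step, and the three lengths and standard deviations are all made up for illustration.  For a simple sum, it should reproduce the root-sum-square result.

```python
import math

def system_sd(f, nominals, sds, h=1e-6):
    """First-order estimate: sqrt(sum((df/dXi)^2 * sd_i^2)) at the nominals."""
    variance = 0.0
    for i, sd in enumerate(sds):
        up = list(nominals)
        dn = list(nominals)
        up[i] += h
        dn[i] -= h
        # Partial derivative: only input i moves, the others stay constant.
        partial = (f(*up) - f(*dn)) / (2 * h)
        variance += (partial * sd) ** 2
    return math.sqrt(variance)

def total_length(x1, x2, x3):
    return x1 + x2 + x3

nominals = [10.0, 20.0, 30.0]  # hypothetical lengths, mm
sds = [0.03, 0.04, 0.12]       # hypothetical standard deviations, mm

estimate = system_sd(total_length, nominals, sds)
rss = math.sqrt(sum(s ** 2 for s in sds))
print(f"numerical estimate = {estimate:.4f} mm, root sum square = {rss:.4f} mm")
```

Because the slopes are evaluated at the nominal values, the answer applies at that point in the solution space only.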

Now, let’s get back to the spring force problem.

F = kΔX

The spring constant, k, is not really constant.  It has some variation.  In a coiled spring, the spring constant is dependent on the wire diameter to the 4th power, inversely dependent on the coil diameter to the 3rd power, and dependent on the material’s torsional stiffness.  If there is no variation in any of these elements, then yes, the spring constant is constant.  But the elements do vary, and the spring constant is not constant.

The system standard deviation is

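$\sigma_{F,Total}^2 = \left(\frac{\partial (k\Delta X)}{\partial k}\right)^2 \sigma_k^2 + \left(\frac{\partial (k\Delta X)}{\partial \Delta X}\right)^2 \sigma_{\Delta X}^2$

$\sigma_{F,Total}^2 = \Delta X^2 \sigma_k^2 + k^2 \sigma_{\Delta X}^2$

$\sigma_{F,Total} = (\Delta X^2 \sigma_k^2 + k^2 \sigma_{\Delta X}^2)^{1/2}$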

Let’s examine this relationship a little.  The standard deviation of the spring force is not constant.  Even if the standard deviation of the spring constant is constant, and the standard deviation of the spring deflection is constant, the total standard deviation is not!  The more the spring deflects, the greater the influence the standard deviation of the spring constant has on the spring force.  The greater the spring constant, the more influence the standard deviation of the spring deflection has on the spring force.  This result may be more intuitive to some, and less intuitive to others.  One thing is true: we now have an accurate estimate for the standard deviation of the spring force.
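To put numbers on the spring force result, here is a small sketch that evaluates the formula and cross-checks it with a crude Monte Carlo run.  The spring (k = 10 N/mm, ΔX = 5 mm) is hypothetical, and I am assuming that the +/- 10% limits on k and ΔX are 3 sigma limits (Cp = 1.00) and that both inputs are normally distributed and independent.  None of those numbers come from a real part.

```python
import math
import random

def spring_force_sd(k, dx, sd_k, sd_dx):
    """sigma_F = sqrt(dx^2 * sd_k^2 + k^2 * sd_dx^2)."""
    return math.sqrt((dx * sd_k) ** 2 + (k * sd_dx) ** 2)

# Hypothetical spring: +/- 10% limits treated as 3 sigma limits (Cp = 1.00).
k, dx = 10.0, 5.0                        # N/mm and mm
sd_k, sd_dx = 0.1 * k / 3, 0.1 * dx / 3

analytic = spring_force_sd(k, dx, sd_k, sd_dx)

# Crude Monte Carlo cross-check with a fixed seed so the run is repeatable.
rng = random.Random(0)
forces = [rng.gauss(k, sd_k) * rng.gauss(dx, sd_dx) for _ in range(100_000)]
mean = sum(forces) / len(forces)
simulated = math.sqrt(sum((f - mean) ** 2 for f in forces) / (len(forces) - 1))

print(f"analytic sd = {analytic:.3f} N, simulated sd = {simulated:.3f} N")
print(f"3 sigma spread = +/- {100 * 3 * analytic / (k * dx):.1f}% of nominal")
```

Under these assumptions the 3 sigma spread on the force works out to roughly +/- 14% of nominal, noticeably tighter than the +/- 21% guess from multiplying the extremes together.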

I should also note that this method has an advantage over Monte Carlo simulation because we can see how the system’s components influence the system standard deviation.  It’s more difficult to pick this information out of a Monte Carlo simulation.  Some experts feel that a Monte Carlo simulation is faster, but I have done a lot of analysis both ways, and I am not convinced that Monte Carlo simulation saves that much time.

It took a little bit of effort to get here, but with some imagination, we can now apply this method to heat sink attachment, system level cooling, kinematics, dynamics, stress analysis …  You could also apply this method when using FEA or CFD models.  You could vary the nominal input variables to obtain a mathematical relationship for stress or temperature.  You may need to use some type of linear regression, or multiple regression, to get there, but it can be done.  Minitab is a great software package for regression analysis.  You can also use a simple fractional factorial to do pretty much the same thing (and you can do it in a spreadsheet).  I will discuss fractional factorials in a later post.

In my next post, my discussion will move from theoretical predictions to using data to help us predict quality when we actually start building stuff.

Rivets, Tabs and Slots … Where am I?

June 24th, 2010
Rivets are great for clamping things together, particularly pieces of sheet metal.  I have seen many designs where rivets, either intentionally or unintentionally, are used to locate other features of interest.  When analyzing the connector alignment of an assembly, for example, rivets in the assembly always seem to be a part of the loop.  Rivets, however, are not precise locating features, as I will demonstrate.  Tabs and slots, if implemented correctly, can improve the quality of assemblies.

Let’s start with two pieces of sheet metal that are riveted together and determine how the rivets will contribute to position variation.    

[Figure: Sheet Metal Riveted Assembly]

In this example I am using a #4 rivet that has the following specifications     

  • Rivet Diameter = 3.18mm +/- 0.08 mm
  • Hole Diameter = 3.3 mm min, 3.4 mm max (3.35 +/- 0.05 mm)

      

[Figure: #4 Rivet Assembly]

Let’s develop the tolerance equation.  There are at least two different ways to develop the tolerance equation.  One is to use the radius of the components.  The other is to use the diameter of the components.  In the end, we will end up with the same answer.  The radial method takes advantage of symmetry; we are analyzing only half of the feature, or rivet assembly.  This means that the standard deviation that we calculate applies to only half of the riveted assembly.  In other words, we are able to calculate the variation in the plus or minus direction only.  In order to estimate the variation in both the plus and minus directions, we would need to double the standard deviation that we calculate.  I prefer the radial method just because it seems to suit my mind better, but both the radial and diametral methods are valid.

[Figure: Rivet Tolerance Loop]

  • A = Radius of Hole in Plate A, 1.675 mm
  • B = Radius of Rivet, 1.59 mm
  • C = Gap Between Rivet and Hole in Plate A, 0.085 mm
  • D = Gap Between Hole in Plate B and Rivet, 0.085mm
  • E = Radius of Rivet, 1.59 mm
  • F = Radius of Hole in Plate B, 1.675 mm

The tolerance equation is       

A - B – C + D + E – F = 0       

1.675 - 1.59 - 0.085 + 0.085 + 1.59 - 1.675 = 0       

0 = 0       

The sum equals 0, which is good because we started at the surface of the hole in Plate A and ended on the coincident surface of the hole in Plate B.

Let’s apply the variation.  The tolerance of the diameter of the rivet is 0.08 mm. The radial tolerance is half of the diametrical tolerance, or 0.04 mm.  I will model this as a triangular distribution because the supplier has not provided any quality limits with respect to the tolerance.  The standard deviation is       

Rivet SD = 2*T/5 = 2(0.04)/5 = 0.016 mm       

The radial clearance is 0.085 mm.  When the rivet is installed, it could end up anywhere in the hole.  The gap could be 0.0 mm, or as high as 0.085 mm.  The gap cannot be higher than 0.085 mm because we are using the radial, or symmetry method, and nothing “exists” below the center of the hole or rivet.  The standard deviation is

Clearance SD = (0.085 – 0.0)/3.5 = (0.085)/3.5 = 0.024 mm

According to the rivet specifications, the minimum recommended hole diameter is 3.3 mm, and the maximum hole diameter is 3.4 mm.  From this, we can conclude that the tolerance for the diameter of the hole is +/- 0.05 mm.  This is a tight tolerance for a punched hole.  The generally accepted recommended tolerance by sheet metal fabricators is +/- 0.08 mm, so that is what I will use.  The radial tolerance will be half of 0.08mm, or 0.04 mm.  I will also assume a Cp = 1.00 for the hole diameter.  The standard deviation for the hole diameter is       

Hole SD = 2T/6 = 2(0.04)/6 = 0.0133 mm       

Let’s sum the standard deviations as we did in previous posts.  Note that this calculation is for only the plus or minus direction.

$\sigma_{Total} = (\sigma_{HoleA}^2 + \sigma_{Rivet}^2 + \sigma_{Gap}^2 + \sigma_{Gap}^2 + \sigma_{Rivet}^2 + \sigma_{HoleB}^2)^{1/2}$

$\sigma_{Total} = ((0.0133)^2 + (0.016)^2 + (0.024)^2 + (0.024)^2 + (0.016)^2 + (0.0133)^2)^{1/2}$

$\sigma_{Total} = 0.045$ mm

$\sigma_{\pm Total} = 2(0.045) = 0.09$ mm
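The rivet loop is easy to reproduce in a few lines of code, which also makes it simple to try other tolerances.  This is only a sketch of the arithmetic above.  The helper names are mine, and the divisors 6, 5 and 3.5 are the rules of thumb used in this post for normal (Cp = 1.00), triangular and uniform distributions.

```python
import math

def sd_normal(t, cp=1.0):
    """Normal feature with tolerance +/- t held to the given Cp."""
    return 2 * t / (6 * cp)

def sd_triangular(t):
    """Tolerance +/- t with no stated quality level."""
    return 2 * t / 5

def sd_uniform(lo, hi):
    """Clearance that can land anywhere between lo and hi."""
    return (hi - lo) / 3.5

hole = sd_normal(0.04)         # radial tolerance of punched hole, mm
rivet = sd_triangular(0.04)    # radial tolerance of rivet, mm
gap = sd_uniform(0.0, 0.085)   # radial clearance, mm

# Hole A, rivet, gap, gap, rivet, hole B: the six terms of the loop.
contributors = [hole, rivet, gap, gap, rivet, hole]
sigma_total = math.sqrt(sum(s ** 2 for s in contributors))
sigma_both = 2 * sigma_total

print(f"sigma total = {sigma_total:.3f} mm, plus and minus = {sigma_both:.2f} mm")
```

Printing the squared contributors side by side is a quick way to see that the two gap terms account for more than half of the variance.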

This is a sizable amount of variation and is on par with the standard deviation of a punched feature to a bend, 0.08 mm.  This analysis assumes that the rivet is inserted perfectly perpendicular to the sheet metal surface.  Rivets can be installed crooked, but I think this is still a pretty good estimate for the variation we should expect to see.

Next let’s look at a typical tab and slot.      

[Figure: Tab and Slot Assembly]

Determine the appropriate tab and slot clearance.  In order to determine the variance of this feature, we need to determine the appropriate (or smallest) clearance for the slot about the tab.  A tab and slot can be used as a positioning mechanism in two different directions: perpendicular to the face of the tab, and parallel to the face of the tab.  Let’s look at perpendicular to the tab first.  The sheet metal is 1 mm thick, or about 19 gauge.  The tolerance range on sheet metal thickness is about 10%.  For 1 mm thick sheet metal, the tolerance is 0.1 mm.  It’s difficult to imagine that it is that much, but based on my findings from various sources, it is indeed 10%.  If we assume the quality of this tolerance limit has a Cp = 1.00, the standard deviation can be calculated as follows.

Tab Thickness SD = 2T/6 = 2(0.1)/6 = 0.033 mm

A stamped feature will usually have a tolerance of 0.08 mm.  If we assume a Cp = 1.00, the standard deviation is

Slot SD = 2T/6 = 2(0.08)/6 = 0.027 mm

We should apply the same rigor for the tab and slot as we would for any other tolerance analysis.

  • A = Width of Slot, ???
  • B = Gap Between Slot and Tab, ???
  • C = Tab Thickness, 1.0 mm
  • D = Gap Between Tab and Slot, ???

The tolerance equation is       

A – B – C – D = 0       

Solve for the gap.  If we assume the tab is centered in the slot, then B and D are equal.  The tolerance equation reduces to

Gap = (A – C)/2

This is the gap between the tab and slot on each side of the tab.  Notice that this tolerance equation is also the equation used for the Gap’s standard deviation.  If I had not gone through the rigorous development of the tolerance equation, then I could have estimated the gap standard deviation to be the standard deviation of the slot plus the standard deviation of the tab, rather than this sum divided by two!  I could have made the clearance between the tab and the slot too large.

The lower specification limit on the Gap will be 0.0 mm.  The standard deviation for the gap will be

$\sigma_{Gap} = ((\sigma_{Slot}^2 + \sigma_{Tab}^2)^{1/2})/2$

$\sigma_{Gap} = (((0.027)^2 + (0.033)^2)^{1/2})/2$

$\sigma_{Gap} = 0.021$ mm

If we want a 4 sigma fit between the tab and the slot, then the clearance on each side of the tab should be

Clearance = 4(0.021) = 0.084 mm
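Here is the same gap calculation as a short script, in case you want to try a different sheet thickness or sigma level.  It is a sketch of the arithmetic above and nothing more.  The variable names are mine, and Cp = 1.00 is assumed for both the tab and the slot, as in the text.

```python
import math

tab_tol = 0.1     # +/- tolerance on 1 mm sheet thickness, mm
slot_tol = 0.08   # +/- tolerance on a stamped slot, mm
z_goal = 4        # desired sigma level for the fit

sd_tab = 2 * tab_tol / 6     # Cp = 1.00 assumed
sd_slot = 2 * slot_tol / 6   # Cp = 1.00 assumed

# Gap = (slot width - tab thickness) / 2, so the root sum square is halved too.
sd_gap = math.sqrt(sd_slot ** 2 + sd_tab ** 2) / 2
clearance = z_goal * sd_gap

print(f"gap sd = {sd_gap:.4f} mm, clearance per side = {clearance:.3f} mm")
```

Carrying unrounded values gives a clearance of about 0.085 mm instead of 0.084 mm.  Either way it rounds comfortably to 0.1 mm.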

In most designs I have seen for tabs and slots, 0.2 mm on each side is used, but we don’t need that much clearance.  0.084 mm is close enough to 0.1 mm, and 0.1 mm is easier to remember, so that’s what I will use.  Let’s examine how the tab and slot affect position.  The slot width we will use is 1.2 mm.

In this analysis, let’s use the “radial” method.       

  • A = Half Slot Width, 0.6 mm
  • B = Half Sheet Metal Thickness, 0.5 mm
  • C = Gap, 0.1 mm

[Figure: Tab and Slot Loop]

The tolerance equation is       

A – B – C = 0       

0.6 – 0.5 -0.1 = 0       

0 = 0       

Let’s apply the variation       

$\sigma_{HalfSlot} = 0.027/2 = 0.0135$ mm

$\sigma_{HalfTab} = 0.033/2 = 0.0165$ mm

The nominal gap is 0.1 mm.  It could be as small as 0.0 mm, or as big as 0.1 mm.  Since this is a clearance, we should use a uniform distribution.

$\sigma_{Gap} = (Max - Min)/3.5 = (0.1)/3.5 = 0.0285$ mm

The total standard deviation is

$\sigma_{Total} = (\sigma_{HalfSlot}^2 + \sigma_{HalfTab}^2 + \sigma_{Gap}^2)^{1/2}$

$\sigma_{Total} = ((0.0135)^2 + (0.0165)^2 + (0.0285)^2)^{1/2}$

$\sigma_{Total} = 0.036$ mm

$\sigma_{\pm Total} = 0.072$ mm

Notice that the tab and slot provide less positional variation than the rivet assembly.  0.018 mm may not seem like much, but it is a 20% improvement.  This improvement could be the difference between a high quality design and an average design.
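The comparison with the rivet is worth checking with unrounded numbers.  The sketch below repeats the tab and slot loop and computes the improvement over the 0.09 mm rivet figure from earlier in this post.  The variable names are mine, and the rivet value is simply carried over as a constant.

```python
import math

sd_half_slot = (2 * 0.08 / 6) / 2   # half of slot SD, mm
sd_half_tab = (2 * 0.1 / 6) / 2     # half of tab thickness SD, mm
sd_gap = (0.1 - 0.0) / 3.5          # uniform clearance, 0.0 to 0.1 mm

sigma_total = math.sqrt(sd_half_slot ** 2 + sd_half_tab ** 2 + sd_gap ** 2)
sigma_both = 2 * sigma_total

rivet_both = 0.09                   # rivet result from earlier, mm
difference = rivet_both - sigma_both
improvement = difference / rivet_both

print(f"tab and slot = {sigma_both:.3f} mm, rivet = {rivet_both:.2f} mm")
print(f"difference = {difference:.3f} mm ({100 * improvement:.0f}% improvement)")
```

With rounded intermediate values the difference is 0.018 mm.  Without rounding it comes out near 0.019 mm, which is still an improvement of about 20%.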

Let’s examine one more alignment scheme, and that is a tab and slot where the positional variation is parallel to the tab.       

We will follow the same method as before.       

Tab Width SD = 2T/6 = 2(0.08)/6 = 0.027 mm

A stamped feature will usually have a tolerance of 0.08 mm.  If we assume a Cp = 1.00, the standard deviation is

Slot SD = 2T/6 = 2(0.08)/6 = 0.027 mm

We should apply the same rigor for the tab and slot as we would for any other tolerance analysis.

  • A = Width of Slot, ???
  • B = Gap Between Slot and Tab, ???
  • C = Tab Width, 12.0 mm
  • D = Gap Between Tab and Slot, ???

[Figure: Tab Width]

The tolerance equation is       

A – B – C – D = 0       

Solve for the gap.  If we assume the tab is centered in the slot, then B and D are equal.  The tolerance equation reduces to

Gap = (A – C)/2

The lower specification limit on the Gap will be 0.0 mm.  The standard deviation for the gap will be

$\sigma_{Gap} = ((\sigma_{Slot}^2 + \sigma_{Tab}^2)^{1/2})/2$

$\sigma_{Gap} = (((0.027)^2 + (0.027)^2)^{1/2})/2$

$\sigma_{Gap} = 0.019$ mm

If we want a 4 sigma fit between the tab and the slot, then the clearance on each side of the tab should be

Clearance = 4(0.019) = 0.076 mm, or about 0.08 mm

In this analysis, let’s use the “radial” method.       

  • A = Half Slot Width, 6.08 mm
  • B = Half Tab Width, 6.0 mm
  • C = Gap, 0.08 mm

[Figure: Tab Width Loop]

The tolerance equation is       

A – B – C = 0       

6.08 - 6.0 -0.08 = 0       

0 = 0       

Let’s apply the variation       

$\sigma_{HalfSlot} = 0.027/2 = 0.0135$ mm

$\sigma_{HalfTab} = 0.027/2 = 0.0135$ mm

The nominal gap is 0.08 mm.  It could be as small as 0.0 mm, or as big as 0.08 mm.  Since this is a clearance, we should use a uniform distribution.

$\sigma_{Gap} = (Max - Min)/3.5 = (0.08)/3.5 = 0.023$ mm

The total standard deviation is

$\sigma_{Total} = (\sigma_{HalfSlot}^2 + \sigma_{HalfTab}^2 + \sigma_{Gap}^2)^{1/2}$

$\sigma_{Total} = ((0.0135)^2 + (0.0135)^2 + (0.023)^2)^{1/2}$

$\sigma_{Total} = 0.03$ mm

$\sigma_{\pm Total} = 0.06$ mm
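Since the tab thickness and tab width loops have exactly the same shape, the calculation can be wrapped in one small function and reused.  This is a sketch under the same assumptions as the text (Cp = 1.00 for stamped features, uniform distribution for the gap).  The function name and argument names are my own.

```python
import math

def tab_slot_position_sd(slot_tol, tab_tol, gap, cp=1.0):
    """One-sided ("radial") position SD for a tab located in a slot."""
    sd_half_slot = (2 * slot_tol / (6 * cp)) / 2
    sd_half_tab = (2 * tab_tol / (6 * cp)) / 2
    sd_gap = gap / 3.5   # clearance can be anywhere from 0.0 to the nominal gap
    return math.sqrt(sd_half_slot ** 2 + sd_half_tab ** 2 + sd_gap ** 2)

# Locating on the tab width: stamped slot, stamped tab, 0.08 mm gap.
width_case = tab_slot_position_sd(slot_tol=0.08, tab_tol=0.08, gap=0.08)

# Locating on the tab thickness: stamped slot, 1 mm sheet, 0.1 mm gap.
thickness_case = tab_slot_position_sd(slot_tol=0.08, tab_tol=0.1, gap=0.1)

print(f"tab width:     {width_case:.3f} mm one side, {2 * width_case:.3f} mm both")
print(f"tab thickness: {thickness_case:.3f} mm one side, {2 * thickness_case:.3f} mm both")
```

Passing a different `cp` shows how much of the benefit depends on the supplier actually holding the stamped features to the assumed quality level.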

This is a substantial improvement over a rivet, and almost a 17% improvement over the tab and slot when using the tab thickness rather than the tab width.  A half shear with a clearance hole would yield very similar results to the tab and slot just analyzed.  The downside of the half shear is that if it is used for long sheet metal parts, the parts can bow, and the half shear may not engage the adjoining sheet metal before the parts are assembled.

Now that we have gone through this analysis, we don’t need to go through it again.  In a large assembly, if a #4 rivet is in the tolerance loop, I can apply a standard deviation of 0.09 mm.  If a tab and slot is in the loop, I can apply a standard deviation of 0.072 mm or 0.06 mm.  For future analysis, I have just saved myself a ton of work, and improved the quality at the same time.  Who says you need to pick two between Speed, Quality and Cost?

So what’s this post all about?  It’s about two things really.  One is that we should not assume rivets, tabs and slots, or half shears locate anything precisely.  Two is that a detailed tolerance analysis means paying attention to the details.

Sloppy Joes

June 9th, 2010

In my previous post, I said that I would go through a detailed example of a tolerance analysis of a blind mating connector.  It has taken a little longer to build that CAD model than I thought.  In the meantime, there are a few things that I would like to discuss regarding tolerance analysis.

In the prior posts, we have been concerned with features that follow the normal distribution.  In this case we know the design nominal and the standard deviation.  What happens when the supplier provides tolerance limits with no indication of the feature quality?  We could assume a quality for the tolerance limit, a Cp = 1.00 for example, and that would probably work.  I prefer to model this type of feature with a Triangular Distribution as shown below.

[Figure: Triangular Distribution]

In a triangular distribution, the left and right vertices are the tolerance limits.  A rough estimate for the standard deviation is to divide the tolerance range (2T) by 5.

σ = 2T/5

We can see that this is a conservative estimate for the standard deviation, meaning it will yield a larger standard deviation than if we assumed a Cp = 1.00, for example.  You can choose whichever estimate you feel is appropriate, but I prefer the more conservative approach of using a triangular distribution.

The next distribution that I commonly use is a Uniform Distribution.       

[Figure: Uniform Distribution]

The uniform distribution has the shape of a rectangle.  It means that the probability of a feature’s value is the same everywhere between the tolerance limits.  It is just as likely that the feature’s value will be at the rightmost tolerance limit as at the leftmost tolerance limit or the design nominal.  I use this distribution for clearances.  The most common use would be a guide pin, screw or rivet that fits within a clearance hole.  We do not know where within the clearance hole the pin will end up, and the chance of the pin being anywhere within the clearance hole is equal.  Of course we could bias the pin to one side or the other, but when the assembly is complex, and there are several pin/clearance hole interfaces, biasing all those pins becomes difficult and can lead to a worst case type of analysis.

The rough estimate for the standard deviation is to divide the tolerance range (2T) by 3.5

σ = 2T/3.5       

Compare the estimates for a normal distribution that has tolerance limits with a Cp = 1.00.  The standard deviation is estimated by dividing the tolerance limits by 6 rather than 3.5!  This is why assemblies that have a lot of slop don’t seem to work very consistently.       
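To see the three rules of thumb side by side, here is a short sketch that applies them to the same tolerance and compares the triangular and uniform estimates with the textbook values for those distributions (T/√6 for a symmetric triangle on +/- T, and T/√3 for a uniform distribution on +/- T).  The function names and the 0.1 mm tolerance are arbitrary choices of mine.

```python
import math

def sd_normal(t, cp=1.0):
    """Normal feature, tolerance +/- t held to the given Cp."""
    return 2 * t / (6 * cp)

def sd_triangular(t):
    """Rule of thumb for a triangular distribution on +/- t."""
    return 2 * t / 5

def sd_uniform(t):
    """Rule of thumb for a uniform distribution on +/- t."""
    return 2 * t / 3.5

t = 0.1  # hypothetical tolerance of +/- 0.1 mm

exact_triangular = t / math.sqrt(6)   # symmetric triangle on [-t, +t]
exact_uniform = t / math.sqrt(3)      # uniform on [-t, +t]

print(f"normal (Cp = 1.00): {sd_normal(t):.4f} mm")
print(f"triangular:         {sd_triangular(t):.4f} mm (exact {exact_triangular:.4f})")
print(f"uniform:            {sd_uniform(t):.4f} mm (exact {exact_uniform:.4f})")
```

The divisors 5 and 3.5 are rounded versions of √24 and √12, so the rules of thumb land within a couple of percent of the exact values.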

As I have mentioned in my previous post, tolerance analysis can be most effective during design development when simple changes can be made on the fly.  Let’s take a look at a simple example of a guide pin that is used to align a PCB.

[Figure: Guide Pin Assembly]

It seems to be common practice to model clearance holes with a nominal size that is 0.25 mm radially larger than the guide pin.  

[Figure: Original Clearance Hole]

Let’s calculate the standard deviation of the nominal clearance in this simple assembly.  Since it is a clearance hole about the guide pin, we can estimate the standard deviation using a uniform distribution.     

σ = 2(0.25)/3.5 = 0.143 mm      

The guide pin I have chosen is a Pencom self clinching guide pin.  According to the specification, the diameter is 5 mm + 0.00 mm, – 0.08 mm.  If you have paid any attention to my previous post, you will know that I don’t prefer this type of tolerance scheme.  I have adjusted the guide pin diameter in the model to 2.96 mm +/- 0.04 mm.  The specification makes no mention of a quality standard.  I do not know the Cp relative to the tolerances.  This being the case, I will estimate the standard deviation of the guide pin using a triangular distribution.      

σ = 2(0.04)/5 = 0.016 mm      

The tolerance on a drilled hole is +/- 0.05 mm.  I have talked to a number of PCB suppliers, and it seems that most suppliers admit to holding this tolerance to Cp = 1.00.  We can now estimate the standard deviation of the hole.     

σ = (0.05)/(3*1.00) = 0.017     

The total standard deviation for the guide pin and hole assembly can be estimated.     

$\sigma_{Total} = ((0.016)^2 + (0.017)^2)^{1/2} = 0.0233$ mm

If we want a 4 sigma quality fit between the guide pin and the clearance hole, then the clearance should be

Nominal Clearance = 4 * 0.0233 = 0.093 mm

This is close enough to 0.1 mm, and 0.1 mm is a lot easier to remember.  Now we can change the design from 0.25 mm radial clearance to 0.1 mm radial clearance.  

[Figure: Improved Clearance Hole]

Let’s estimate the standard deviation of the nominal clearance for this improved guide pin assembly.  

σ = 2(0.1)/3.5 = 0.057 mm    

Compare this standard deviation to the 0.143 mm estimated standard deviation for the original design.  The new design could be a substantial reduction in assembly variation.  In my experience, the largest contributors to assembly variation are clearances and rivets.  We now know how to manage some of the clearances.  In the next blog, I will go into a detailed analysis of rivets and why rivets make for poor positioning mechanisms.    
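The guide pin numbers can be reproduced with a few lines of code.  This sketch simply restates the arithmetic from this post.  The variable and function names are mine, and the distributions are the ones chosen above (triangular for the pin, normal with Cp = 1.00 for the drilled hole, uniform for the clearance).

```python
import math

sd_pin = 2 * 0.04 / 5          # triangular, +/- 0.04 mm
sd_hole = 0.05 / (3 * 1.00)    # normal, +/- 0.05 mm at Cp = 1.00

sd_fit = math.sqrt(sd_pin ** 2 + sd_hole ** 2)
clearance_4sigma = 4 * sd_fit

def clearance_sd(radial_clearance):
    """Uniform distribution over +/- the radial clearance."""
    return 2 * radial_clearance / 3.5

original = clearance_sd(0.25)
improved = clearance_sd(0.1)
reduction = 1 - improved / original

print(f"pin and hole sd = {sd_fit:.4f} mm, 4 sigma clearance = {clearance_4sigma:.3f} mm")
print(f"clearance sd: {original:.3f} mm -> {improved:.3f} mm ({100 * reduction:.0f}% lower)")
```

Because the clearance standard deviation scales directly with the radial clearance, going from 0.25 mm to 0.1 mm cuts this contributor by 60%.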

A NOTE:  Although we were able to reduce the radial clearance for the alignment hole, we probably cannot make the fit as tight for all of the mounting holes because there are additional sources of variation that need to be considered before determining the clearance for those holes.

The Chicken or The Egg? Pass the Potatoes, Please

May 25th, 2010

Tolerance analysis is not a glamorous engineering endeavor.  In fact, it borders on drudgery.  Right up there with washing dishes after a major holiday meal.  I’m the cook in the family, and I have learned that cleaning up is a lot easier when I wash as I cook.  Clean up is more difficult when I wait until after everyone is done eating.  Especially when my shirt is busting its buttons and my pants feel a lot tighter than they did a couple of hours ago. The job is much nastier if I wait until the next day.  Congealed gravy.  Dried up pie filling.  Wine glasses with lipstick.  Yuck!! 

And so it is with tolerance analysis.  If we can perform even simple analysis as we are designing, simple changes can be made on the fly.  If we wait for pre-production, then changes will probably be more difficult and costly.  Parts may need to be scrapped, and tools may need to be modified.  If we wait until production, and we have the pleasure of the customer informing us of design errors, then we are really in trouble.  Remember the old saying “There’s never enough time to do it right, but always enough time to do it over”.  Baloney!!! 

Dish washing and tolerance analysis.  Not too many people like to do either, but they have to be done. 

What is the goal of tolerance analysis?  Tolerance analysis is intended to theoretically estimate the functional quality of an assembly while taking into account the natural part and assembly manufacturing process variation.   Just add up the sources of variation and compare that total to the specification limits.  It is really that simple. 

Tolerances and manufacturing variation (standard deviation) are related, but are quite different.  The manufacturing variation is inherent to the process.  The variation does not change regardless of the tolerance that we apply to the feature.  Tolerances are calculated from the variation.  The variation is not calculated from the tolerance. 

Suppose we have two holes stamped into a piece of sheet metal.  The standard deviation for the center to center distance is about 0.03 mm.  If we would like the center to center distance to have a Cp = 1.33, what should the tolerance be? 

Cp = 1.33 = T/(3σ) 

T = 1.33 * (3σ) 

T = 1.33 * 3 * 0.03 

T = 0.12 mm 

Now we have a tolerance that will allow for the center to center distance to have the quality level that we desire. 

Suppose we have determined that, in order to meet the assembly’s functional quality goal, the tolerance can be only 0.1 mm, and we still need a feature quality of Cp = 1.33.  A common practice would be to divide the tolerance by 4 to get a process standard deviation of 0.025 mm.  On the surface, it seems we have a good design, but we have only kidded ourselves because we are asking the supplier to create parts with a variation that is too small.  If we actively change the process, let’s say going from an NC punch to a progressive die, then we may be able to justify the change in tolerance.
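The two directions of this calculation are easy to mix up, so here they are side by side.  The sketch uses the relationship Cp = T/(3σ) from above.  The function names are my own, and the numbers are the ones from the stamped hole example.

```python
def tolerance_for(cp, sd):
    """Tolerance (+/- T) that a process with this SD can hold at the given Cp."""
    return cp * 3 * sd

def cp_achieved(tolerance, sd):
    """Cp that results when a tolerance is imposed on a process with this SD."""
    return tolerance / (3 * sd)

def sd_required(cp, tolerance):
    """Process SD that would be needed to hold the tolerance at the given Cp."""
    return tolerance / (3 * cp)

process_sd = 0.03  # stamped center to center distance, mm

print(f"tolerance for Cp 1.33: +/- {tolerance_for(1.33, process_sd):.2f} mm")
print(f"Cp if we insist on +/- 0.1 mm: {cp_achieved(0.1, process_sd):.2f}")
print(f"sd needed for +/- 0.1 mm at Cp 1.33: {sd_required(1.33, 0.1):.3f} mm")
```

Imposing +/- 0.1 mm on the existing process gives a Cp of about 1.11, not 1.33.  Only a change to the process itself can move the standard deviation down to 0.025 mm.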

I must emphasize this point.  Tolerances are derived from process variation, and not the other way around. 

I would like to impart one additional thought on tolerances.  Tolerances should be symmetric about the design nominal.  I often see tolerances that are greater on one side of the design nominal than the other.  This is acceptable if there is a priori knowledge that the manufacturing process is skewed or asymmetric.  Most processes are symmetric, and hence tolerances should be symmetric. 

One tolerance scheme that really chaps my hide is +0, -T.  Let’s think about this a little.  If I am a supplier, then I will shift my manufacturing process so that the process is centered at the Design Nominal – T/2 (µ – T/2).  The “tolerance window” I will shoot for is +/- T/2.  If this is the case, then why not have a tolerance that is +/- T/2?  As I have suggested before, a tolerance without a Cp is useless.  In the case of +0, -T, the upper specification limit is the same as the design nominal.  If the manufacturer centers the manufacturing process about the design nominal, then we can expect 50% of the parts to be out of tolerance.  That may be fine, but who wants to live with a 50% defect rate?  I will show in a future post that an “inspect and reject” process will result in a lower assembly quality.
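The 50% figure is easy to verify.  The sketch below assumes a normally distributed feature and a process that would hold the +/- T/2 window at Cp = 1.00.  The nominal and tolerance values are hypothetical, and the helper names are mine.

```python
import math

def normal_cdf(x, mean, sd):
    """Probability that a normally distributed feature falls below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def fraction_out_of_tolerance(mean, sd, lsl, usl):
    below = normal_cdf(lsl, mean, sd)
    above = 1.0 - normal_cdf(usl, mean, sd)
    return below + above

nominal, t = 5.0, 0.08           # hypothetical feature toleranced +0, -T
lsl, usl = nominal - t, nominal
sd = (t / 2) / 3                 # assumes Cp = 1.00 over the +/- T/2 window

centered_on_nominal = fraction_out_of_tolerance(nominal, sd, lsl, usl)
shifted_to_midpoint = fraction_out_of_tolerance(nominal - t / 2, sd, lsl, usl)

print(f"process centered on nominal:     {100 * centered_on_nominal:.1f}% defective")
print(f"process centered on nominal-T/2: {100 * shifted_to_midpoint:.2f}% defective")
```

Centering on the design nominal puts half the parts above the upper limit, while shifting the process to the middle of the window brings the defect rate down to about 0.27% under these assumptions.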

How do we determine the acceptable functional quality level of an assembly?  It depends on the consequence of failure.  Will a defect cause injury or death?  Will the defect cause valuable data to be lost?  Will the defect cause customer dissatisfaction?  The greater the consequence of failure, the higher the functional quality of the design should be.  Since most of my experience is in the computer industry, I have found that a functional quality goal of Z = 4 (4 sigma, Cp = 1.33) works pretty well. 

I will now outline the method that I use for tolerance analysis.  I have performed hundreds of tolerance analyses, and this method works well for me.  My method is quite rigorous though.  One of the unexpected benefits of a rigorous analysis process is that it provides a structured method in which to review designs in detail.  In my experience in working on other engineers’ designs, I have found many design errors that would have been missed, or not discovered until late in the development process.

Tolerance Analysis Process 

  1. Determine the Functional Quality Goal of the Assembly
  2. Develop the Tolerance Equation
  3. Apply Variation to the Components
  4. Estimate the Functional Quality of the Assembly
  5. Compare the Estimated Functional Quality of the Assembly to the Quality Goal
  6. Make adjustments if necessary

Below is a simple assembly of some blocks inside a bracket.  I will admit that this example is not very practical, but it is useful to illustrate the tolerance analysis method. 

Block and Bracket Assembly

Step 1 – Determine the Functional Quality Goal of the Assembly 

In this example, I am concerned that the blocks will always fit inside the bracket.  I am not concerned with the total length of the blocks being too short.  I want to determine the chance that the Gap will be less than or equal to zero.  I will set the quality goal of the assembly to Z = 4 (32 DPM) because I want all of the blocks to fit all of the time.

Step 2 – Develop the Tolerance Equation 

Most people use the term Tolerance Loop.  I prefer Tolerance Equation.  This step applies to all forms of tolerance analysis: worst case, RSS, and Statistical Tolerance Analysis.  The shocking discovery that I have made in teaching Tolerance Analysis is that the vast majority of engineers and designers do not know how to do it.  All of them claim to know, but when asked to solve some simple classroom exercises, many of the students struggle mightily.

The tolerance equation is valid for one dimension (1D) only.  The tolerance equation, and the subsequent analysis, assumes that the components of interest are flat, parallel, and rigid.  That’s a lot of assumptions, and may not seem realistic, but if the design cannot meet the quality goal for an idealized case, then there is even less chance of meeting the quality goal when all of the other variables are brought into play.  As David Trindade (Applied Reliability, Chapman & Hall/CRC, 1998) says “All models are wrong.  Some are useful.”

Development of the tolerance equation requires three steps. 

1.  Determine a start point – The start point should be a feature that is associated with the area that is being analyzed.  In this case, I am analyzing the gap between Block 3 and the Bracket.  I have chosen the start point to be the left surface of Block 3. 

Start Point

2.  Determine a sign convention – When we sum the lengths of components, it is critical to keep track of whether we are coming or going.  In this example, the positive X direction points to the right.  All measurements that move to the right are positive, and all those measurements that move to the left are negative. 

Sign Convention

3.  Select the components – This is an area where the tolerance analysis seems to fall apart for some engineers, particularly for those who do RSS.  It is common in an RSS analysis to pick the “big hitters”, and base the analysis on those components only.  It’s the old garbage in, garbage out theory. 

The tolerance equation for this example is 

Block 3 + Block 2 + Block 1 – Bracket + Gap = 0 

Notice the Gap is a component in the tolerance equation.  I must end up where I started.  I will solve the tolerance equation for the Gap 

Gap = Bracket – Block 3 – Block 2 – Block 1 

Replacing the component names with the nominal design values 

Gap = 610 – 200 – 200 – 200 = 10

It is very important that the sum of the components’ nominal design values precisely matches the gap that I measure in the CAD model.  Suppose I measured the Gap to be 10.028.  The tolerance equation should sum to exactly 10.028.  If it does not, then I need to go back and determine why.  I may have missed a component.  The relationship between components may be different than I had intended.  As I stated earlier, a side benefit of tolerance analysis is that it provides a structured method in which to review the details of the design.  If the tolerance equation and the measured Gap do not match, then some investigation is required.
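The bookkeeping for this check can be sketched in a few lines of Python.  This is only an illustration of the idea: the component list mirrors the block and bracket example, and the CAD measurement is an assumed value typed in by hand.

```python
import math

# Signed nominal values, with the signs as they appear once the tolerance
# equation is solved for the Gap:
#   Gap = Bracket - Block 3 - Block 2 - Block 1
components = [
    ("Bracket", +1, 610.0),
    ("Block 3", -1, 200.0),
    ("Block 2", -1, 200.0),
    ("Block 1", -1, 200.0),
]

def nominal_gap(parts):
    """Sum the signed nominal values of every component in the equation."""
    return sum(sign * nominal for _name, sign, nominal in parts)

gap = nominal_gap(components)
cad_gap = 10.0  # gap measured in the CAD model (assumed value)

if math.isclose(gap, cad_gap, abs_tol=1e-6):
    print(f"Tolerance equation matches the CAD model: Gap = {gap}")
else:
    print(f"Mismatch: equation gives {gap}, CAD gives {cad_gap}. Investigate.")
```

A mismatch here usually points to a missed component or a sign error, which is exactly the kind of design review the process is meant to force.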

Step 3 – Apply Variation to the Components 

It may be difficult to get the proper information from the supplier.  The supplier will often say “I can hold that feature to +/- 0.25mm”.  To which I would ask “What is the Cpk of that tolerance?”  If the supplier asks “What’s a Cpk?”  Get a different supplier.  If the supplier responds with “What Cpk would you like?”  Get a different supplier (they obviously have no idea of their own process variation).  If the supplier responds with “I can hold the tolerance to a Cpk of 1.33”, then we have a winner.  It would be even better if the supplier would say “The standard deviation for that feature is 0.062 mm”, but that is infrequent.

In this example, assume that I have been able to wrangle the standard deviation out of the supplier.  The standard deviation of the Blocks is 1.25 mm. The standard deviation of the bracket is 1.5 mm. 

Step 4 – Estimate the Functional Quality of the Assembly 

At long last we are almost there. All that remains to be done is to calculate the assembly standard deviation and compare it to the Gap. 

σAssy = (σBlock1^2 + σBlock2^2 + σBlock3^2 + σBracket^2)^(1/2)

σAssy = ((1.25)^2 + (1.25)^2 + (1.25)^2 + (1.5)^2)^(1/2)

σAssy = 2.634

ZAssy = Gap/σAssy

ZAssy = 10/2.634

ZAssy = 3.8 or 72 DPM
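For anyone who wants to check the arithmetic, the calculation above can be reproduced with a short Python sketch.  It assumes normally distributed, independent components; the variable names are mine.

```python
from math import sqrt
from statistics import NormalDist

# Standard deviations from the supplier (mm), as given in Step 3.
sigmas = {"Block 1": 1.25, "Block 2": 1.25, "Block 3": 1.25, "Bracket": 1.5}
gap = 10.0  # nominal gap from the tolerance equation (mm)

# Variances add, so the assembly sigma is the root of the summed squares.
sigma_assy = sqrt(sum(s ** 2 for s in sigmas.values()))

# Z counts how many assembly sigmas fit between the nominal gap and zero.
z_assy = gap / sigma_assy

# Convert Z to defects per million using the standard normal distribution.
dpm = (1.0 - NormalDist().cdf(z_assy)) * 1_000_000

print(f"sigma_assy = {sigma_assy:.3f} mm")
print(f"Z_assy     = {z_assy:.2f}")
print(f"DPM        = {dpm:.0f}")
```

The DPM printed here may differ by a count or so from the 72 DPM quoted above, because the hand calculation rounds Z to 3.8 before looking up the defect rate.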

Step 5 – Compare the Estimated Functional Quality of the assembly to the Quality Goal 

Step 6 – Make adjustments if necessary  

(I have combined Step 5 and Step 6 in this example)

The Functional Quality Estimate is lower than the quality goal, but not by much.  I have some decisions to make.  Is the estimated quality close enough?  Can I increase the gap?  Is there a possibility of changing or improving the manufacturing process to reduce the process variation?  Can I change the design of the feature(s) to a feature that has less variability?  All good questions and all things that I would consider before deeming the design complete. 

If we know the largest contributors to the assembly variation, then we can use that information as a possible starting point.  From the previous post we know the total Variance is

VTotal = V1 + V2 + … + Vn  

The relative contribution of each component’s variation to the assembly variation is 

% Contribution = Vn/VTotal x 100

% Contribution = σn^2/σAssy^2 x 100

% Contribution of Blocks = σBlock^2/σAssy^2 x 100

% Contribution of Blocks = (1.25)^2/(2.634)^2 x 100

% Contribution of Blocks = 22.5% each

% Contribution of Bracket = (1.5)^2/(2.634)^2 x 100

% Contribution of Bracket = 32.4%

The biggest individual contributor to the assembly variance is the bracket.  However, the biggest bang for the buck would be to investigate the variation of the blocks, since their collective contribution to the variance is 67.5%.
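The contribution breakdown is easy to automate.  Here is a minimal Python sketch using the same standard deviations as the example; nothing beyond the numbers already given is assumed.

```python
# Standard deviations from the block and bracket example (mm).
sigmas = {"Block 1": 1.25, "Block 2": 1.25, "Block 3": 1.25, "Bracket": 1.5}

# Variance is the square of the standard deviation, and variances add.
variances = {name: s ** 2 for name, s in sigmas.items()}
v_total = sum(variances.values())

# Each component's share of the total assembly variance, in percent.
contribution = {name: 100.0 * v / v_total for name, v in variances.items()}

# List the contributors from largest to smallest.
for name, pct in sorted(contribution.items(), key=lambda item: -item[1]):
    print(f"{name:8s} {pct:5.1f}%")

blocks_total = sum(pct for name, pct in contribution.items() if name.startswith("Block"))
print(f"All blocks combined: {blocks_total:.1f}%")
```

Sorting the contributions this way makes it obvious where a process improvement would buy the most.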

Blocks and brackets are not very interesting.  In my next post I will illustrate the use of tolerance analysis on blind mate connectors.  I will also have a simple spreadsheet that you can use to make the calculation part of the analysis easier.

Tolerance Wasteland

May 17th, 2010

In this post I will discuss the quality of an assembly.  Specifically, the nominal value of the final assembly is the sum of the nominal values of the components that comprise that assembly.  The generic equation follows this format

XTotal = X1 + X2 + … +Xn

The theoretical mean of the assembly, is simply the sum of the theoretical means of the components that comprise that assembly.  In most cases, the component’s design nominal and the component’s theoretical mean are the same.

µTotal = µ1 + µ2 + … + µn

The total theoretical variance of an assembly is the sum of the variance of each component in the assembly

VTotal = V1 + V2 + … + Vn

Most engineers are familiar with the standard deviation, not the variance.  The relationship between variance and standard deviation is

V = σ^2

The total assembly variance using standard deviation is shown below

σTotal^2 = σ1^2 + σ2^2 + … + σn^2

The total assembly standard deviation can be derived by taking the square root of both sides of the above equation.

σTotal = (σ1^2 + σ2^2 + … + σn^2)^(1/2)

If you recall from my previous post, the mean and the standard deviation by themselves do not have much value.  We will need to compare these quantities to the specification limit to obtain Z, which in turn can be used to estimate the defect rate of the assembly.

ZAssy = (USL – µTotal)/ σTotal

ZAssy = (µTotal – LSL)/ σTotal

That was pretty painless, wasn’t it?

Let’s talk about specification limits a little bit.  In almost every assembly, specification limits are independent from the assembly.  Suppose I have two blind mating connectors, and the allowable radial misalignment is 1.5mm.  The upper, or lower, specification limit is 1.5mm.  This specification limit is independent of the assembly, or the connectors’ implementation.  The specification limits apply regardless of whether the assembly is used on the moon, or at the bottom of the ocean.  I have seen some engineers modify the specification limits to “improve” the theoretical quality of their design.   This practice, by and large, is really only cheating, and is analogous to using a “foot wedge” in golf.

This is probably a good place to broach the subject of tolerance analysis.  There are three commonly used techniques for tolerance analysis.

  • Worst Case Tolerance Analysis
  • Root Sum of Squares Tolerance Analysis
  • Statistical Tolerance Analysis

Worst Case Tolerance Analysis

This technique employs a simple addition of the nominal value and the tolerance limit of the assembly components.  It follows the form

(XTotal + TTotal) = (X1 + T1) + (X2 + T2) + … + (Xn + Tn)

There are not very many engineers that use this method anymore because, with an assembly of even modest complexity, it is often difficult to meet the specification limits.  This technique has some fundamental flaws as well.  If the components follow a normal distribution, and the tolerances are chosen properly, the chance of a component being near the tolerance limit is small.  The chance of all of the components being near the tolerance limit simultaneously (which is the underlying assumption in worst case analysis) is even smaller.  The design tends to be optimized toward the worst case limits, which will probably never happen, rather than near the nominal, which is much more likely to happen.

A common practice is to tighten the tolerance.  This is probably the worst thing you can do and without much benefit.  The supplier will now be asked to perform tasks that are beyond their normal process.  The quality of the part will go down, the price will go up, and the lead time will be less predictable.

Root Sum of Squares Analysis (RSS)

This is a common technique that is used mostly because it gives a “better” answer than worst case analysis.  RSS is often called RMS or Root Mean Squared.  RMS is the incorrect term because we are adding the tolerances, not averaging the tolerances.  RSS analysis has the following form.

TTotal = (T1^2 + T2^2 + … + Tn^2)^(1/2)

Where does this RSS calculation come from?  It comes from the relationship for total assembly standard deviation.

σTotal = (σ1^2 + σ2^2 + … + σn^2)^(1/2)

From my previous post, we know

Cp = (USL – µ)/(3σ)

Let the tolerance = (USL – µ) = T

Cp = T/(3σ)

σ = T/(3Cp)

Substituting this relationship into the equation for total assembly standard deviation

TTotal/(3CpAssy) = ((T1/(3Cp1))^2 + (T2/(3Cp2))^2 + … + (Tn/(3Cpn))^2)^(1/2)

If we assume the quality of each component in the assembly, relative to the tolerance, is equal, then we can multiply each side of the equation by 3Cp.  We end up with the below relationship.

TTotal = (T1^2 + T2^2 + … + Tn^2)^(1/2)

Voilà!  The RSS equation!  Pretty cool!  There are a couple of things I should bring to your attention.  RSS analysis is only valid when the Cp’s of all of the components are equal.  It is not valid if the Cp of one component is 1.33, and the Cp of another component is 0.5 (which would happen when the tolerance is tightened without regard to the supplier’s process variation).

The resulting assembly tolerance, TTotal has the same Cp as the individual components.  This means that if the assembly tolerance is less than the specification limit, you can conclude that the assembly quality is at least the Cp of the components that make up the assembly.  If all of the components have a Cp = 1.33, then the assembly will have a Cp of at least 1.33, or Z = 4, or 32 defects per million.
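The equal-Cp condition can be checked numerically.  The Python sketch below uses made-up tolerances and Cp values purely for illustration; it compares worst case with RSS, confirms that RSS falls out of the standard deviation relationship when every Cp is equal, and then shows what happens to the assembly Cp when one component’s Cp is only 0.5.

```python
from math import sqrt, isclose

tolerances = [0.25, 0.10, 0.50]  # hypothetical component tolerances (mm)
cp = 1.33                        # common Cp assumed for every component

# Worst case: every component sits at its tolerance limit at once.
worst_case = sum(tolerances)

# RSS: root of the sum of the squared tolerances.
rss = sqrt(sum(t ** 2 for t in tolerances))

# Same answer via sigma = T / (3 Cp) and the assembly sigma equation.
sigmas = [t / (3 * cp) for t in tolerances]
sigma_total = sqrt(sum(s ** 2 for s in sigmas))
t_total_from_sigma = 3 * cp * sigma_total

print(f"Worst case: {worst_case:.3f}  RSS: {rss:.3f}  From sigmas: {t_total_from_sigma:.3f}")
print("RSS matches sigma-based result:", isclose(rss, t_total_from_sigma))

# Unequal Cp's: the RSS tolerance no longer carries the Cp you expect.
cps_mixed = [1.33, 1.33, 0.5]
sigmas_mixed = [t / (3 * c) for t, c in zip(tolerances, cps_mixed)]
sigma_mixed = sqrt(sum(s ** 2 for s in sigmas_mixed))
cp_assy_mixed = rss / (3 * sigma_mixed)
print(f"Assembly Cp with one weak component: {cp_assy_mixed:.2f}")
```

In the mixed case the RSS tolerance is unchanged, but the assembly Cp it actually represents drops well below 1.33, which is why RSS alone can be misleading.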

Statistical Tolerance Analysis

This method uses the relationships that were introduced at the beginning of this post.

XTotal = X1 + X2 + … +Xn

σTotal = (σ1^2 + σ2^2 + … + σn^2)^(1/2)

ZAssy = (USL – µTotal)/ σTotal

ZAssy = (µTotal – LSL)/ σTotal

This is the method that I prefer.  It requires no more effort than RSS analysis, but it has some important benefits.  I can use components that have distributions that are not normal.  I am not tied to components that have equal Cp’s.  I can precisely predict the defect rate.

When you review the equations for assemblies, you may notice that something is missing, tolerances.  I do not need a single tolerance to perform a valid tolerance analysis.  All I need is the standard deviation of the supplier’s process.

Why do we put tolerances on drawings?

  • We put tolerances on drawings to account for the supplier’s process variation.
  • We put tolerances on drawings to indicate the quality of the components we would like to receive from the supplier.

If you buy into these two statements, then you will conclude that every tolerance must have a Cpk.  (I use Cpk rather than Cp since we will be sampling the parts).  I cannot be any clearer on this point.  A tolerance without a Cpk is useless.  If I am a supplier, and I have a tolerance without a quality statement or Cpk, then how am I supposed to know the quality of the parts?  Do you want 50% of the parts within tolerance?  Do you want all of the parts within tolerance? (This is statistically nearly impossible unless the tolerance relative to the process variation is huge.)

If you agree that a tolerance must have a Cpk, then you will agree with this statement.  “Having all parts within tolerance does not mean the parts are within specification”.  I will go into this statement in much more detail in a future post, but it means we are moving from a mere tolerance statement to a quality statement.  After all, we are really interested in part and assembly quality, not tolerances.

This post was concerned with 1st order polynomials, i.e., simple addition and subtraction.  In a future post I will discuss techniques to estimate the variation of assemblies that are described by higher order polynomials and mathematical functions.  Heat transfer and assemblies with spring elements would be good examples of assemblies that are described by higher order mathematical relationships.

In my next post,  I will discuss tolerance analysis in detail.

Smokey the Bear or A Hard Day’s Night

May 10th, 2010
What is the quality of your design?  I have taught engineering statistics, in one form or another, to a few hundred people around the world, and I open the instruction with the same question – What is the quality of your design?  The usual response is a blank stare.  A deer in the headlights look.  Sometimes people respond with “My designs are always high quality.”  “I design to worst case.”  “I always design with plenty of margin.”  But can the quality of your designs be quantified?  How close is it to failing?  How far is it from working?

Time, more than quality, is all too often the driver of many designs.  Engineers are often grateful that the design works for nominal conditions, and maybe a couple of corner cases.  Many managers have the thought that the design needs to “get out”, and we can fix it later.  These are the type of managers that revel in fighting fires.  These are the managers that love Dingo (Tiger) teams.  These are the managers that reward engineers for working the weekend to solve a problem that should not have been there to begin with.  It is a shame that some good engineers, who consistently produce designs that require no fire fighting, are often not rewarded for their efforts.

Let’s think about this a little.  Would you rather work eight days a week fighting fires, or would you rather relax on the weekend?  Huddled over a lab work bench or watching the game on the couch with a couple of cold ones?  Fighting CAD model regeneration errors, or going to the zoo with your kids?  If you like relaxing more than working, the next several blog posts are for you.  As Smokey the Bear says “Only You Can Prevent (Forest) Fires”.

In this post, I will explain some basics of statistics.  Statistics is THE most effective tool for producing quality designs.  It is also the most effective weapon against mediocre quality designs.  Statistics also works pretty darn well in fighting fires (that is, of course, if you want to put the fire out for good).  So what is statistics?  Statistics is nothing more than gathering some information on a subset of a population, and using that information to predict the behavior of the population.  First things first though.  The population of interest must be defined with a fair degree of precision.  

Let’s examine the plastic module guide shown below.  There are several features on this module guide that could be of interest.  The width of the guide slot.  The depth of the guide slot.  The diameter of the bosses.  The center to center distance of the bosses.  The location of the boss relative to the slot.  You can probably think of many more features of interest.

Module Guide

Rear View

What is the population of interest?  Below is a list that you may want to consider.  

  • All module guides that have been or ever will be manufactured
    • All module guides that have been manufactured
      • All module guides produced in the first six months of production
        • All module guides produced on press A
        • All module guides produced on press B
      • All module guides that have been produced before a process change
      • All module guides that have been produced after a process change
      • All module guides produced during cold months
    • All module guides that have yet to be manufactured
      • All module guides with a new design change
      • All module guides with a new process change
      • All module guides from a new supplier

As you can see, defining the population of interest will influence the type of information you gather, and the decisions that you will make.

Let’s begin with a population that has a normal (Gaussian, bell shaped) distribution as shown below.  Ironically, almost no population follows a normal distribution precisely, but in the great majority of cases, it’s close enough.  A normal distribution can be defined by two population parameters – the mean (average) and standard deviation.  The mean is the central tendency of the population.  The standard deviation is the dispersion or spread of the population.

Normal Distribution

The population mean is denoted by the Greek symbol µ.

µ = (∑ xi)/N

Where N is the size of the entire population and xi is the data from each element of the population

The standard deviation is denoted by the Greek symbol σ, commonly called sigma.  There are two ways to calculate the standard deviation – Biased and Unbiased.

Biased  

σ = (∑(µ – xi)^2/N)^(1/2)

Unbiased  

σ = (∑(µ – xi)^2/(N – 1))^(1/2)

The unbiased calculation is almost always used.  Why?  Within the calculation for standard deviation is the mean.  In order to calculate the mean, we need to use the population size, N.  Since we have used N once in calculating the mean, we are not allowed to use all of N again when calculating the standard deviation.  That is why the unbiased calculation for the standard deviation is N – 1.  (That’s my story, and I’m sticking with it.)
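If you want to see the difference between the two calculations, Python’s standard library has both: statistics.pstdev divides by N (biased) and statistics.stdev divides by N – 1 (unbiased).  The measurements below are made-up numbers, used only to illustrate the point.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Hypothetical measurements of a feature (mm).
measurements = [10.02, 9.98, 10.05, 9.97, 10.01]
n = len(measurements)

x_bar = mean(measurements)
biased = pstdev(measurements)    # divides the summed squares by N
unbiased = stdev(measurements)   # divides the summed squares by N - 1

print(f"mean     = {x_bar:.4f}")
print(f"biased   = {biased:.4f}")
print(f"unbiased = {unbiased:.4f}")

# The two differ only by the factor sqrt(N / (N - 1)).
print(f"ratio    = {unbiased / biased:.4f} (expected {sqrt(n / (n - 1)):.4f})")
```

The unbiased value is always the larger of the two, and the gap shrinks as the number of measurements grows.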

At this point, you should be made aware of an important distinction between a population parameter and a sample statistic.  Population parameters, such as the mean and standard deviation, are absolute descriptors of the population.  These can only be calculated when ALL elements of the population are measured, and are measured with absolute precision.  A statistic, such as the sample mean or sample standard deviation, estimates a population parameter by measuring a subset of the population.

There are certain things that us humans are just not meant to know, and one of them is population parameters.  We cannot measure all of the elements in a population with absolute precision.  There will always be some uncertainty (I will go into detail on uncertainty in a future post).  As Bill Diamond (Practical Experiment Designs, John Wiley and Sons, 2001) used to say, “Only God knows the true mean and standard deviation.”

The population mean and standard deviation are good things to know, but by themselves, they are not very useful.  We can use these parameters to help us figure out the percentage of the population that is within or out of spec.  The quantity that lets us do this is called Z.

Z = (USL – µ)/σ     where USL is Upper Specification Limit  

Z = (µ – LSL)/σ     where LSL is Lower Specification Limit  

This is where your eyes may start to glaze over.  Let’s consider the standard deviation as a unit of measure, not much different than a foot, a meter, a gallon or a liter.  The Z value is nothing more than counting how many standard deviations it takes to go from the mean to the specification limit.  If it takes three standard deviations, then Z = 3.  You have a 3 sigma design.  If it takes 3 and 1/3 standard deviations to get from the mean to the specification limit, then Z = 3.3.  I hope this helps to demystify the definition of Z.

The Z value does not have much value by itself.  We need to convert the Z value to a rate of defects.  Before the wonders of Excel, we would use Z tables.  But now it’s easy.  Just type =(1 – NORMSDIST(Z)) into the spreadsheet, and it will return the defect rate.  Multiply it by 100 (or use the % conversion) to obtain the percentage of parts beyond the specification limit.  Multiply it by 1,000,000 to obtain Defects per Million or DPM.  Below is a table for some common values of Z.

Z      Defect Rate    Percentage    DPM
0.0    0.500000       50.00%        500000
0.5    0.308538       30.85%        308538
1.0    0.158655       15.87%        158655
1.5    0.066807       6.68%         66807
2.0    0.022750       2.28%         22750
2.5    0.006210       0.62%         6210
3.0    0.001350       0.13%         1350
3.5    0.000233       0.02%         233
4.0    0.000032       0.00%         32
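If you do not have Excel handy, the same table can be generated with Python’s standard library.  This is a small sketch in which NormalDist().cdf plays the role of NORMSDIST; the function name defect_rate is my own.

```python
from statistics import NormalDist

def defect_rate(z):
    """One-sided fraction of a normal population beyond Z standard deviations."""
    return 1.0 - NormalDist().cdf(z)

print(f"{'Z':>4} {'Defect Rate':>12} {'Percentage':>11} {'DPM':>8}")
for z in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]:
    rate = defect_rate(z)
    print(f"{z:>4.1f} {rate:>12.6f} {rate:>11.2%} {rate * 1_000_000:>8.0f}")
```

Any other value of Z can be passed to defect_rate in the same way.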

Suppliers and supply engineers often like to use a quantity called Process Capability or Cp.  (I will discuss Process Capability Index, or Cpk, in a future post).  Cp and Z are almost the same.  

Cp = (USL – µ)/(3σ)     where USL is Upper Specification Limit

Cp = (µ – LSL)/(3σ)     where LSL is Lower Specification Limit

Notice that the only difference between Cp and Z is that in the denominator of Cp, the standard deviation is multiplied by 3.  I don’t know why Cp is done this way, or why suppliers don’t use Z.  Maybe that’s just another thing that us humans are not meant to know.  
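Since Cp as defined here is just Z divided by 3, converting between the two is a one-liner.  A tiny Python sketch, with function names of my own choosing:

```python
def cp_from_z(z):
    """Cp as defined in this post: Z divided by 3."""
    return z / 3.0

def z_from_cp(cp):
    """Inverse relationship: Z is three times Cp."""
    return 3.0 * cp

for z in (3.0, 4.0, 6.0):
    print(f"Z = {z:.1f}  ->  Cp = {cp_from_z(z):.2f}")
```

This is where the pairing of Z = 4 with Cp = 1.33 comes from.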

In my next post, I will discuss how to combine multiple parts with different means and standard deviations to estimate the quality of an assembly.

“There’s water at the bottom of the ocean.”

May 3rd, 2010

In their song Once In A Lifetime, the Talking Heads state “There is water at the bottom of the ocean”.  The good thing is you don’t have to go far to find it, water at the bottom of the ocean that is.  And so it is with information, you don’t have to go far to find it.  The specific topic of this post is acoustic specifications, but in general, it’s about getting the correct information you need to make a sound engineering decision.  Few things are more professionally frustrating to me than having a technical discussion with someone whose “facts” are based on opinion, hearsay and innuendo.  As you may have guessed, I have a couple of stories on the subject.

I was working on a telecommunications server several years back.  Telecommunications companies require a standardized label that allows them to manage their inventory.  In some strange way, this inventory information is used as part of the calculation for your land line and mobile phone charges.  With all the old legacy equipment most telecommunication companies have holding down the building, it’s little wonder why cell phone bills remain higher than they should.  The label is called a CLEI (Common Language Equipment Identification) label.  In the development of the server, we didn’t pay much attention to this label until we were just getting ready to release the product to production.  The ideal implementation of the label required it to be pretty big, about 1 x 2 inches.  The ideal location was somewhere on the front of the system where, as luck would have it, we had no room.

Now the fire storm starts.  Marketing says we need the label to do this or that because that’s what he or she said they overheard at lunch from a waitress who was yelling at a dog that was eating out of the garbage can that had a red collar with the name REX on it, and that could only mean that CLEI labels had to be exactly one size and in exactly one location, and if we could not implement it exactly so, then we may as well cancel the program – and who cares about the 2 years and $2M in development it took to get here.  After about two or three days of this bologna, I decided to get the facts.  I ordered the specification for CLEI labels from Telcordia, GR-383-CORE.  In a few days, I received it in the mail.  I sifted through the specification, and it turns out that there was a considerable amount of flexibility.  The specification stated the ideal size of the label, the ideal information to be on the label, the ideal format for the human readable characters and bar code, and the ideal location.  It also said (very loosely paraphrasing of course) “You know what?  If your computer can’t accommodate a label with all of the ideal stuff, then you can put less stuff on it, but you must put some stuff on it.  And not only that, here’s a few options on the label’s location”.  In the end, we had a smaller label, in a different location, that met the standard.  The product shipped, and we all lived happily ever after.

One more.  I was working on a server that was a thermal challenge from the get go.  The fans used to cool the system were loud – extremely loud when operating at full speed.  Management asked me if the system’s acoustic noise was below the OSHA (Occupational Safety and Health Administration) limit.  I didn’t know the answer, but one of the respected engineers that was “in the know” stated that the OSHA limit was 85 dB(A)  (sound pressure? sound power? Just 85 dB(A)).  We were certainly above that limit, no matter how you sliced it.   The engineering and management team were really starting to shake in their boots.  What if we can’t ship?  I took it upon myself to dig into the matter.  After navigating through a labyrinth of specifications from a variety of government agencies, both US and international, it turns out that, unlike FCC, UL, TUV, CSA or other international regulatory bodies,  OSHA had no charter to prevent any manufacturer from shipping anything, regardless of the noise level.  OSHA is concerned with the worker’s environment after the equipment is installed.  Although OSHA cannot prevent shipment, there may be few customers if the equipment is too loud.   I will go into the gist of the OSHA specifications later.

These two examples show that a lot of aggravation and stress can be avoided once we move from “I think I know” to “I know I know”.

I am presenting acoustic standards from three main standards bodies: ECMA, ISO and IPC.

As I stated at the end of my previous post, I will show you how to get the standards information you need for free.  There are a few places on the net you can go to that will allow you to download pirated versions of some of the standards for free.  In my mind, doing such a thing is dishonest and the same as stealing.  Just like buying a pirated version of Windows 7, or ProE, or DVDs, or games.  It’s flat out wrong.

The good folks at ECMA give their standards away for free.  The information in these standards is very similar to, and in some cases more comprehensive than, the standards offered by ISO.  ISO standards range in price from $40 – $200 US.  IPC also charges for their standards, but they have only one standard on the subject of acoustics ($52 US).

For generalized sound pressure and sound power measurements, the following standards can be used.

  • Standard ECMA – 74  Measurement of Airborne Noise Emitted by Information Technology and Telecommunications Equipment
    • ISO 7779  Measurement of Airborne Noise Emitted by Information Technology and Telecommunications Equipment
    • ISO 3740 – Acoustics – Determination of Sound Power Levels of Noise Sources – Guidelines for the Use of Basic Standards
    • ISO 3741 – Acoustics – Determination of Sound Power Levels of Noise Sources Using Sound Pressure – Precision Methods for Reverberation Rooms
    • ISO 3744 -  Acoustics – Determination of Sound Power Levels of Noise Sources Using Sound Pressure – Engineering Method in an Essentially Free Field Over a Reflecting Plane
    • ISO 3745 – Acoustics – Determination of Sound Power Levels of Noise Sources Using Sound Pressure – Precision Methods for Anechoic and Hemi-Anechoic Chambers (This is most commonly used)
    • ISO 11201 – Acoustics – Noise Emitted by Machinery and Equipment – Measurement of Emissions at a Work Station and at Other Specified Positions – Engineering Method in an Essentially Free Field over a Reflecting Plane

All measurements have some uncertainty, and all systems have some random variation.  Most fans have a 10% speed variation at a given voltage and back pressure.  The following standards show how to account for this uncertainty and variation.  The result is usually a 0.3 Bel adder to the sound power measurements.

  • Standard ECMA – 109  Declared Noise Emission Values of Information Technology and Telecommunications Equipment
    • ISO 4871 – Declaration and verification of Noise Emission Values of Machinery and Equipment
    • ISO 7574 Parts 1, 2 and 4  Acoustics – Statistical Methods for Determining and Verifying Stated Noise Emission Values

There has been some special consideration for fans by standards bodies.

  • ECMA Technical Report TR/99  Constant Sound Power Curves for Small Air-Moving Devices
  • ISO 10302 - Method for the Measurement of Airborne Noise Emitted by Small Air-Moving Devices
  • IPC9591 – Performance Parameters (Mechanical, Environmental and Quality Reliability) for Air Moving Devices

I am not familiar with the IPC standard, but it seems to not only encompass the acoustics of fans, but also standardizes on reliability testing.  It’s only $52, so probably a good investment.  ISO 10302 is a very good standard, and even describes how to build a fixture to measure the sound power with varying back pressure.  Most fan manufacturers make sound power measurements in accordance with ISO 7779.  Only a few make sound power measurements according to ISO 10302.  ISO 10302 provides much richer information, and can be used as an indicator of the quality of the fan’s venturi and blade design.  Better venturi and blade designs will show smaller increases in sound power when the back pressure is increased.

The OSHA standard for noise, OSHA 1910, if you pick through it, says a worker cannot be exposed to a sound pressure of 85 dB(A) for more than 8 hours in a 24 hour day (without hearing protection).  If the sound pressure is higher, then the length of exposure must be lower so that the product of 85 dB(A) x 8 hours is not exceeded.  This requirement is why most customers who deploy equipment in large data centers are more and more sensitive to system level noise.

One last item is the subject of weighting.  The human ear does not respond to extremely low frequencies (below 40 Hz) or high frequencies (above 5,000 Hz) as it does to mid range frequencies (300 – 2,000 Hz).  A weighting (or filter) is applied to raw sound power or sound pressure data so that the noise to which the ear is most sensitive is weighted more than noise to which the ear is less sensitive.  This weighting is called A Weighted, and sound power is in Bels(A) and sound pressure is in dB(A).  The human ear does not respond linearly to noise level.  For very high noise levels, airports would be a good example, a different weighting is applied.  These are usually C Weighted or D Weighted.

This concludes my series on acoustics, and now you know pretty much as much as I know on the subject.

My next blog post will begin a series on quality and engineering statistics.  Don’t be scared.  Allies are often found where you least expect them.

Ducks, Freeways and Airplanes

April 27th, 2010

In this post, I discuss the relationship between sound pressure and distance.  If you recall from my earlier post “In Search of the Holy Heater”, the sound pressure is highly dependent on the environment, unlike sound power.  The sound pressure depends on the distance from the noise source.  

I will discuss three cases.  

  • Point Noise Source
  • Linearly Distributed Noise Source
  • Ducted (ducks) Noise Source

Point Source – For an ideal point source, the sound pressure will emanate uniformly in the shape of a sphere as shown in the figure below.  

Point Noise Source

The sound pressure density (not to be confused with sound intensity) will be inversely proportional to the surface area of the sphere, which is A = 4πR^2, where R is the distance from the noise source.  Mathematically

Sound Pressure ∝ P/R^2

If I stand 1 meter from the noise source, and a second person stands 2 meters from the noise source, the sound pressure heard by the second person will be ¼ of the sound pressure that I hear.

Linearly Distributed Noise Source – A good example of a linearly distributed noise source is a busy freeway.  The sound pressure emanates more in the shape of a cylinder than a sphere as shown in the figure below.

Linearly Distributed Noise Source

The sound density will be inversely proportional to the surface area of the cylinder.  If we normalize the length, then mathematically, the sound pressure follows this relationship  

Sound Pressure ∝ P/R

Notice that the sound pressure for a linearly distributed noise source is inversely proportional to distance from the noise source, rather than inversely proportional to the square of the distance from a point noise source.  This explains why I can hear the freeway from my house, even though the freeway is over 5 miles away.  I can hear the freeway, that is, until my next door neighbor’s pool pump fires up, and then I cannot hear even a bird chirp.  
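To put some numbers on the two cases, here is a minimal Python sketch that uses only the proportionalities described in this post (1/R^2 for a point source and 1/R for a linearly distributed source).  The function name and the choice of 1 meter as the reference distance are my own assumptions for illustration.

```python
def relative_sound_pressure(distance, reference_distance=1.0, source="point"):
    """Sound pressure at `distance` relative to that at `reference_distance`.

    Assumes the proportionalities used in this post:
    1/R^2 for a point source, 1/R for a linearly distributed source.
    """
    exponent = {"point": 2, "line": 1}[source]
    return (reference_distance / distance) ** exponent


# Compare how quickly the two kinds of source fall off with distance.
for r in (1, 2, 4, 8):
    point = relative_sound_pressure(r, source="point")
    line = relative_sound_pressure(r, source="line")
    print(f"R = {r} m: point source = {point:.4f}, line source = {line:.4f}")
```

At 8 times the distance, the point source has dropped to 1/64 of its reference value, while the line source has dropped only to 1/8.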

A more practical example of a linearly distributed noise source would be a computer system that occupies a full 42 RU rack, where the rack could have a virtual wall of fans.  In this case, the sound pressure at the bystander position relative to the user position is inversely proportional to the ratio of their distances from the rack, and not to the square of that ratio.  The implication is that, with larger systems, meeting acoustic specifications is challenging not only because there are probably many more fans, but also because the noise source is distributed rather than a point.

Ducted Noise Source– If we assume that a duct has constant cross section, the area does not change, and hence the sound pressure density does not change.  Ideally this means that sound pressure entering the duct is the same as sound pressure exiting the duct.  Practically speaking, the sound pressure exiting the duct will be less than that entering the duct because there will be losses.  The sound pressure is really no more than a gaseous vibration source, which will lose energy as it causes the duct to vibrate.  Structural vibration loses energy through hysteretic damping, which will eventually result in small amounts of heat.  The longer the duct, the greater the energy loss, and eventually the sound pressure energy will drop to zero.  

A very common application of a ducted noise source is the head phones on commercial airlines.  (For US domestic flights, I refuse to pay the $3 usage fee based on general principles alone).  These head phones are nothing more than a hollow tube with a quick connect on one end to allow the sound source to enter the tube, and the head set on the other end where the sound exits.  The sound pressure entering and exiting the tube is virtually the same.  

In a previous post “When the sum of the parts does not equal the whole”, I explained that placing fans in the middle of the system would result in only marginal reductions in sound power.  This is true if the cross section of the fans and the cross section of the duct are nearly equal.  This is also true if the inlet and exhaust sides of the chassis are very open using a perforation pattern with a high percentage of open area.  If the inlet and/or exhaust are covered with solid material, e.g., I/O panels, filler panels, connectors …, then the sound power and sound pressure that exits the system will be less than the sound power and sound pressure emanating from the fans.  This is not very different than rolling up the window on your car.  For cases where fans are deeply embedded in a system, and the inlet and exhaust have a substantial amount of blockage, the calculations for sound power and sound pressure, although still valuable, will likely yield a conservative estimate.

In my next post, I discuss acoustic specifications, and if you are on a budget, I will show you how to get most of the information you need for free.

The Butler Did It

April 20th, 2010

The great acoustic mystery is if one has two identical noise sources, the sound power is doubled, but is the sound pressure doubled?  “It’s elementary my dear Watson.” 

Before I develop the relationship for sound pressure and multiple noise sources, I would like to impart a couple of interesting anecdotes on the subject.

When I was in my last year of college, a professor told me that one of his graduate students was doing research to determine why the sound pressure is not doubled when the number of noise sources is doubled.  Now if I wasn’t flat broke by the time I graduated, I could have gone to graduate school, and if my master’s thesis was on the subject that was described by this professor, I could have completed my thesis in about an hour.  That’s an hour using my old Brother electric typewriter and bottles of white out.  I could have spent the rest of my time in graduate school getting a virtual degree in oenology (the beer part), studying the differences between pilsners, lagers, and ales (and ports, stouts and bitters (or bi’ers as they say in Scotland)).  The moral of the story – Things are not always as difficult as they seem.

Big corporations are often comprised of a core engineering organization that is responsible for the designs behind the products that produce the bulk of the corporate revenue.  Usually the engineers in “the core” walk around as though whatever is produced from their bowels (or brains) doesn’t stink (despite the opinions of others).  Corporations also have smaller, ad hoc organizations that are focused on smaller, yet strategic business opportunities.

I was in one of the smaller organizations, and it just so happened that a desk side workstation architecture that I was working on was noticed by “the core”.  (Deskside workstations at that time were about three times the size of deskside workstations today.)  Before I knew it, this architecture was competing against the architecture that was being developed by “the core”.  The drama came to a head with a summit meeting (more like a western shoot out) that, in addition to engineers, was attended by only the finest VPs the corporation had to offer.  In an effort to tip the scales in their favor, “the core” even hired “a gun”.  A thermal guy who knew how to use CFD software before the good folks at Flomerics made CFD available to the masses.

To make a long story short, the architecture developed by “the core” stunk.  My architecture had quite a few fans that were operating in parallel.  There was some concern about the acoustic sound pressure.  I told them that doubling the number of fans did not translate into twice the sound pressure.  “The core” told me I was wrong, and before I could prove to them that I was right with a quick derivation on the white board, one of the “finest VPs” told me that he agreed with “the core” and that I should consult with “the core” to clarify my technical understanding.  The moral of the story – Management should spend more time smelling organizations that “don’t stink”.

In the previous few blogs, I have talked about sound power.  In this blog I will talk mostly about sound pressure.  The units for sound power are usually Bels, whereas the units of sound pressure are usually decibels.  I don’t know why this is, but it must have made sense to someone at some time.

The mathematical relationships for sound pressure are derived from sound power.  This is why the pressure term is squared.

Sound Pressure = SP

SP = 10Log((P/Pref)^2)

where Pref = 20 micro-Pascals or 20 µPa

SP = 20Log(P/Pref)

Let’s double the sound pressure so that P becomes 2P.

Total SP = 20Log(2P/Pref)

Total SP = 20(Log(P/Pref) + Log(2))

Total SP = 20Log(P/Pref) + 6 dB

Twice the sound pressure is an increase of 6 dB.

Generically:  Total SP = 20(Log(P/Pref) + Log(Change))

Where “Change” is a multiplier on the pressure; Log(Change) is positive when the multiplier is greater than 1 and negative when it is less than 1

What happens when the number of noise sources is doubled?

Total SP = 10Log((P/Pref)^2 + (P/Pref)^2)

Total SP = 10Log(2(P/Pref)^2)

Total SP = 10(Log((P/Pref)^2) + Log(2))

Total SP = 10Log((P/Pref)^2) + 10Log(2)

Total SP = 20Log(P/Pref) + 3 dB

Generically: Total SP = 20Log(P/Pref) + 10Log(N) dB

Where N is the number of identical noise sources

Doubling the number of noise sources increases the sound pressure by 3 dB, not 6 dB!  This is an amazing, and counterintuitive, result.  The implications are particularly important for systems that are used in close proximity to the user, such as desk side workstations where sound pressure is more important than sound power.
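If you would rather check the two results numerically than on a white board, here is a minimal Python sketch of the two derivations.  The function names are mine, chosen only for illustration.

```python
import math


def db_change_from_pressure_multiplier(change):
    """Level change in dB when the pressure P is multiplied by `change`."""
    return 20 * math.log10(change)


def db_change_from_source_count(n):
    """Level change in dB when one source is replaced by n identical sources."""
    return 10 * math.log10(n)


print(round(db_change_from_pressure_multiplier(2), 2))  # 6.02
print(round(db_change_from_source_count(2), 2))         # 3.01
```

The first function corresponds to the 20Log(Change) term and the second to the 10Log(N) term.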

It just so happens that the relationship for sound pressure from multiple noise sources is remarkably similar to that for sound power.  We need to convert decibels to Bels first though.  In the relationship below, I make the conversion inside the formula.

For multiple identical noise sources.

Total SP (dB) = 10Log(10^(SP1(dB)/10) + 10^(SP2(dB)/10)

+ … + 10^(SPN(dB)/10))

The estimate for sound pressure as a function of fan speed is also very similar to those estimates for sound power.  Notice that the coefficient is 50 for sound pressure.  This is because the units for sound pressure are decibels.

Total SP = Sound Pressure (dB) + 50Log(RPMLow/RPMHigh) (dB)

Just as with sound power, we can develop a generic relationship for multiple fans running at multiple speeds.

Total SP (dB) = 10Log(10^((SP1 + 50Log(RPM-Low/RPM-High))/10)

+ 10^((SP2 + 50Log(RPM-Low/RPM-High))/10)

+ … + 10^((SPN + 50Log(RPM-Low/RPM-High))/10))
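For those who prefer a few lines of code to a spreadsheet, here is a minimal Python sketch of the generic relationship above.  It assumes each fan is described by its sound pressure in dB at its rated (high) speed, that rated speed, and the speed it is actually running at.  The function name and the 40 dB figures in the usage lines are made up for illustration and are not measured data.

```python
import math


def total_sound_pressure_db(fans):
    """Estimate the combined sound pressure in dB.

    `fans` is an iterable of (sp_db_at_rated_speed, rated_rpm, actual_rpm).
    Each level is adjusted for speed with the 50*Log(RPM ratio) term and
    the adjusted levels are then summed on a power basis.
    """
    total = 0.0
    for sp_db, rated_rpm, actual_rpm in fans:
        adjusted_db = sp_db + 50 * math.log10(actual_rpm / rated_rpm)
        total += 10 ** (adjusted_db / 10)
    return 10 * math.log10(total)


# Two identical (hypothetical) 40 dB fans at full speed: about 43 dB.
print(round(total_sound_pressure_db([(40, 8000, 8000), (40, 8000, 8000)]), 2))

# One of the same fans slowed to half speed.
print(round(total_sound_pressure_db([(40, 8000, 4000)]), 2))
```

The first case reproduces the 3 dB increase for doubling the number of identical sources.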

I have a spreadsheet that you can download that will calculate the estimate for sound pressure from multiple noise sources and fans running at multiple speeds.  As with the spreadsheet for sound power, you may copy it, or change it.  The pink and yellow calculating cells are protected, but you can unprotect the document with the password “fanspeed”.

System Sound Pressure Estimator

In my next blog, I will talk about ducks, freeways and airplanes.

(Fan) Speed Kills

April 12th, 2010

In this blog I discuss the relationship between sound power and fan speed.  The relationship for a single fan, operating against zero back pressure is

               Total Lw = Lw + 5Log(RPMlow/RPMHigh)

Where Lw is the sound power when the fan is operating at RPMHigh.  (This relationship also works for fan speeds measured in Hertz or radians/sec, as long as the units are consistent.)

Before we go too far on this, compare the relationship between sound power and fan speed to the relationship between sound power and the number of identical noise sources.

Fan Speed      Total Lw = Lw + 5Log(RPMlow/RPMHigh)

Number of Noise Sources      Total Lw = Lw + Log(N)

Notice that reducing the fan speed is FIVE TIMES more effective in lowering the sound power than reducing the number of fans.  I have been involved with some projects where the acoustic noise was an issue.  Turning fans off seems to be the first consideration by some teams, until they are made aware of the fan speed relationship.

We can use the relationship that was developed in the previous blog (Noah’s Ark, Gilligan, and Sound Power).  We can toss in fan speed to obtain a very generic relationship for the sound power emitted by a system.

Total Lw = Log(10^Lw1 + 10^Lw2 + … + 10^LwN)

Total Lw = Log(10^(Lw1 + 5Log(RPM-Low/RPM-High))

+ 10^(Lw2 + 5Log(RPM-Low/RPM-High))

+ … + 10^(LwN + 5Log(RPM-Low/RPM-High)))

Notice that this relationship now allows us to estimate the system level sound power for any number of different fans operating at any number of different speeds.  Let’s do some simple examples.

  1. The team would like to reduce the sound power by half, 0.3 Bels.  The system has 8 fans operating in parallel.  (Part 1) How many fans must be turned off to achieve a 0.3 Bels reduction?  (Part 2) If the speed of the fans is 8,000 RPM, how much must the fan speed be reduced for the sound power of 8 fans to be reduced by 0.3 Bels?

Part 1

Reduction Lw = -0.3 Bels = Lw + Log(N) – (Lw + Log(8))

-0.3 Bels = Log(N) – Log(8)

Log(N) = Log(8) – 0.3 Bels

N = 10^(Log(8) – 0.3)

N = 4  We could have guessed this since we already know reducing N by half equals -0.3 Bels

 

Part 2

Reduction Lw = -0.3 Bels = Lw + Log(8) + 5Log(RPM/8,000) – (Lw + Log(8) + 5Log(8,000/8,000))

-0.3 = 5Log(RPM/8,000)

-0.06 = Log(RPM/8,000)

10^(-0.06) = RPM/8,000

RPM = 10^(-0.06) x 8,000

RPM = 6,968   This speed is 87% of full speed.

Intuitively, which acoustic solution will provide the better cooling solution?  In a future blog, I will combine the acoustic and cooling analysis to demonstrate a comprehensive solution, but for now, I will put my money on the solution to Part 2.
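Both parts of this example can be checked with a few lines of Python.  This is only a sketch of the arithmetic above, and the function names are my own.

```python
import math


def fans_remaining_for_reduction(n_fans, reduction_bels):
    """Number of identical fans left running for a given Lw reduction (Bels)."""
    return 10 ** (math.log10(n_fans) - reduction_bels)


def rpm_for_reduction(rpm_high, reduction_bels):
    """Fan speed that lowers Lw by `reduction_bels` using the 5*Log(RPM ratio) term."""
    return rpm_high * 10 ** (-reduction_bels / 5)


n = fans_remaining_for_reduction(8, 0.3)
rpm = rpm_for_reduction(8000, 0.3)
print(f"Part 1: about {n:.2f} fans remain running")
print(f"Part 2: about {rpm:.0f} RPM, or {rpm / 8000:.0%} of full speed")
```

Part 1 costs half of the fans, while Part 2 costs about 13% of the fan speed.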

  2. Marketing has told the team that the system sound power must be less than 7.5 Bels when the system is operating at 25 °C.  There are two sets of fans in the system.  There are 8 system fans, and each fan emits a sound power of 8.5 Bels at a fan speed of 8,000 RPM.  There are three power supplies with two fans each.  Each fan in the power supply emits 7.5 Bels at a fan speed of 10,000 RPM.  Based on some preliminary thermal work, the system will cool at a room ambient of 25 °C when the system fans run at 4,200 RPM and the power supplies run at 6,000 RPM.  Will the system meet the acoustic goal?

 

Total Lw = Log(10^(8.5 + 5Log(4,200/8,000) + Log(8)) + 10^(7.5 + 5Log(6,000/10,000) + Log(6)))

Total Lw= 8.1 Bels  The system will not meet the acoustic goal.  More work needs to be done.  This is a prime example where the cooling solution is not complete without an acoustic solution.
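Here is the same calculation as a minimal Python sketch.  Each group of identical fans is described by the sound power per fan in Bels at its rated speed, the rated speed, the actual speed, and the number of fans.  The function name and the tuple layout are my own choices for illustration.

```python
import math


def system_sound_power_bels(fan_groups):
    """Estimate system sound power in Bels.

    `fan_groups` is an iterable of (lw_bels, rated_rpm, actual_rpm, count).
    Each group contributes 10^(Lw + 5*Log(RPM ratio) + Log(count)).
    """
    total = 0.0
    for lw_bels, rated_rpm, actual_rpm, count in fan_groups:
        adjusted = (lw_bels
                    + 5 * math.log10(actual_rpm / rated_rpm)
                    + math.log10(count))
        total += 10 ** adjusted
    return math.log10(total)


example_2 = [
    (8.5, 8000, 4200, 8),    # 8 system fans
    (7.5, 10000, 6000, 6),   # 3 power supplies x 2 fans each
]
lw = system_sound_power_bels(example_2)
print(f"Total Lw = {lw:.1f} Bels, goal met: {lw < 7.5}")
```

The system fans dominate the total, so that is where any further speed reduction would pay off first.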

 

I must remind you that the relationships that have been used to estimate the system level sound power are just that – estimates.  When fans are running at different speeds, with different numbers of blades, the frequency spectrum emitted by different groups of fans will be different, and the actual sound power that is measured will be a little different.  That is not to say these estimates are useless.  As Example 2 clearly illustrates, without a preliminary acoustic estimate, the problems with the cooling solution would not have been discovered until much later in the development process.

Let’s take a brief detour to check our estimates against some actual measured data.  The good folks at Nidec have an abundance of fan information available.  The table below includes data from fans that range in size from 40mm to 120mm, and includes tube axial and vane axial fans.

Full Speed Lw (Bels)   Full Speed (RPM)   % of Full Speed   Estimated Lw (Bels)   Measured Lw (Bels)
7.5                    7950               71%               6.8                   6.4
7.7                    10000              73%               7.0                   6.9
6.9                    6000               74%               6.3                   6.1
7.7                    5300               94%               7.6                   7.5
7.5                    7200               74%               6.8                   6.5
8.1                    8500               75%               7.5                   7.1
8.0                    5300               57%               6.8                   6.6

 

You will notice that the estimates do not exactly match the actual measurements, but for the most part, they are pretty close.  Close enough to make some informed engineering decisions anyway.
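As a cross-check, the estimates in the table can be recomputed from the full speed sound power and the percentage of full speed.  The Python sketch below does that and prints the difference from the measured value.  Because the percentages in the table are rounded, a recomputed estimate can differ from the tabulated one by 0.1 Bels in the last digit.

```python
import math

# (full speed Lw in Bels, fraction of full speed, measured Lw in Bels)
rows = [
    (7.5, 0.71, 6.4),
    (7.7, 0.73, 6.9),
    (6.9, 0.74, 6.1),
    (7.7, 0.94, 7.5),
    (7.5, 0.74, 6.5),
    (8.1, 0.75, 7.1),
    (8.0, 0.57, 6.6),
]

errors = []
for lw_full, fraction, measured in rows:
    # Same relationship as before: Lw + 5*Log(RPMlow/RPMHigh)
    estimate = lw_full + 5 * math.log10(fraction)
    errors.append(estimate - measured)
    print(f"estimate = {estimate:.2f}, measured = {measured:.1f}, "
          f"difference = {estimate - measured:+.2f} Bels")
```

For these seven fans the estimate comes out a little higher than the measurement every time, by less than 0.4 Bels.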

As you recall from the beginning of this blog, I said the estimates were for fans “operating against zero back pressure” – the fans’ free flow condition.  The amount of back pressure or resistance (or impedance) can have quite a dramatic effect on the sound power emitted by a fan.  Once again, the good folks at Nidec have provided some good data.  The table below is an example.

Operating Voltage (V)   Fraction of Max. Air Flow   Fan Speed (rpm)   Static Pressure (inwg)   NPEL (Bels)
13.8                    100%                        10000             0                        7.7
13.8                    80%                         9950              0.7                      7.8
13.8                    20%                         9820              1.75                     8.3
12                      100%                        10000             0                        7.7
12                      80%                         9950              0.7                      7.8
12                      20%                         9820              1.75                     8.3
7                       100%                        7300              0                        6.9
7                       80%                         7250              0.35                     7
7                       20%                         7190              0.94                     7.5

 

Notice that the fan speed stays nearly constant as the back pressure is increased.  More importantly, as the back pressure increases, so does the sound power.  This increase in sound power is not taken into account in our estimate.  It would be nice if the relationship between sound power and system back pressure was the same for all fans.  Then we could toss the back pressure into our generic estimate as well.  But that is not the case.  For the example above, the selling point of this fan is that it stays at constant speed regardless of the back pressure.  For other fans, the fan speed goes down as the back pressure increases (despite the best PWM control attempts).  This business of sound power and back pressure is another good criterion to use when making fan selections.

All is not lost though.  If you have a good idea of the back pressure of the system, and you can get the sound power and fan speed from the manufacturer for this condition, you can then use this as a starting point for your sound power estimates.  There are several international standards that are used to measure the sound power of fans and systems.  In a future blog, I will discuss these specifications in detail.

As promised in the last blog, I have developed a simple spreadsheet to help with the system level sound power estimates.  It is quite basic.  I learned a long time ago that when you need to pound a nail, you need only a hammer and not the entire hardware store.  This spreadsheet is the hammer.  It should open (and work) in both Excel 2003 and Excel 2007.  It is yours for the taking.  You can copy it, use it, share it, change it…  In fact, if you think you have made some good improvements, send it back to me, and I will post it for others to use.

 System Sound Power Estimator

The input cells are in gray.  The calculating cells are in pink and yellow and are protected.  You can unprotect these cells.  The password is “fanspeed”.

In my next blog, I will search for the truth behind the “Great Acoustic Mystery”.