The average distance moved in a given time
Consider a fluid, and focus on an imaginary plane through it (the red line in the Figure) across which diffusion is occurring in the x direction.
Suppose that in time t, the average particle will have moved a distance s along the x direction. Since the movement is random, half the particles will have moved to the left, and half to the right. Thus, half the particles in the left zone between the tall dashed line and P will cross P. Xa is the midpoint of this zone, Ca is the concentration of diffusing material at Xa, and we assume Ca is close to the average concentration in that zone. Per unit area, the total flow, Fa, in time t, from left to right across P is thus equal to half the number of particles in that zone, which is s times the concentration:
 Fa = (s Ca) / 2
Since the total flow is equal to the flow per unit time, Ja, multiplied by the time, we can substitute Ja t for Fa, and then divide both sides of the equation by t :
 Ja = (s Ca) / (2 t)
and of course a similar relation for Jb, but with Cb replacing Ca, is valid for the right zone. The net flow across P is just the difference between the flow to the right and the flow to the left:
 J = Ja - Jb = (s Ca - s Cb) / (2 t) = s (Ca - Cb) / (2 t)
We want to compare this equation with the 1st law of diffusion:
 J = -D dc/dx
so we want to express (Ca - Cb) in terms of the concentration gradient, dc/dx. Since dc/dx is the change in concentration divided by the distance, the change in concentration is just dc/dx times the distance between Xa and Xb, which is s:
 Ca - Cb = - ( Cb - Ca ) = - s dc/dx
Substituting this expression for (Ca - Cb) into the equation for the net flow J gives:
 J = - s^2 (dc/dx) / (2 t) = - (s^2 / (2 t)) dc/dx
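As a quick arithmetic check of that substitution, here is a minimal Python sketch. The linear concentration profile and the numbers for s, t and the position of P are made up for illustration; they are not from the text. It computes the net flow from the two zones and compares it with the gradient form.

```python
import math

def net_flux_from_zones(concentration, p, s, t):
    """Net flow per unit area and time across the plane at x = p.

    Ca and Cb are the concentrations at the midpoints of the left and
    right zones, each of thickness s, as in the argument above.
    """
    ca = concentration(p - s / 2.0)
    cb = concentration(p + s / 2.0)
    return s * (ca - cb) / (2.0 * t)

# Hypothetical linear profile: c(x) = 10 - 3 x, so dc/dx = -3.
GRADIENT = -3.0

def concentration(x):
    return 10.0 + GRADIENT * x

s, t, p = 0.2, 0.5, 1.0  # arbitrary illustrative values

zone_form = net_flux_from_zones(concentration, p, s, t)
gradient_form = -(s**2 / (2.0 * t)) * GRADIENT

print("from the two zones:", zone_form)
print("from the gradient: ", gradient_form)
print("agree:", math.isclose(zone_form, gradient_form))
```

For a linear profile the two forms agree exactly (up to rounding); for a curved profile they agree only to the extent that s is small compared with the distance over which the gradient changes.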
Now look again at the 1st law of diffusion; comparing it with the expression we have just derived for J, we see that D must be:
 D = s^2 / (2 t) or s^2 = 2 t D Q.E.D.
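If you would like to see this relation emerge from random motion, the following is a minimal sketch of a one-dimensional random walk in Python. It assumes each particle takes steps of fixed length, left or right with equal probability, once per time step, and it reads s^2 as the mean of the squared displacement. The step length, step time and particle count are arbitrary choices, not values from the text.

```python
import random

def mean_square_displacement(n_particles, n_steps, step_length, seed=0):
    """Average of x^2 over many independent one-dimensional random walkers."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_particles):
        x = 0.0
        for _ in range(n_steps):
            # Equal chance of a step to the right or to the left.
            x += step_length if rng.random() < 0.5 else -step_length
        total += x * x
    return total / n_particles

STEP_LENGTH = 1.0  # arbitrary length unit (assumption)
STEP_TIME = 1.0    # arbitrary time unit (assumption)

# For this kind of walk the diffusion coefficient is (step length)^2 / (2 * step time).
D = STEP_LENGTH**2 / (2.0 * STEP_TIME)

for n_steps in (100, 400, 900):
    t = n_steps * STEP_TIME
    s2 = mean_square_displacement(1000, n_steps, STEP_LENGTH)
    print(f"t = {t:5.0f}   simulated s^2 = {s2:7.1f}   2 t D = {2.0 * t * D:7.1f}")
```

The simulated values scatter by a few percent around 2 t D because only a finite number of particles is averaged; the point is that s^2, not s, grows in proportion to t.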
We started talking about diffusion of (presumably) invisible particles in a fluid, but we end up with an equation that is much more useful when we can actually see the particles. It's not clear how you measure s during diffusion, but when you look at Brownian motion, it's just the average distance moved in time t by the particle.
I think the s squared term is remarkable (those steeped in dimensional analysis will claim it was obvious from the beginning, but I'm talking about intuitive understanding here). The first s comes from the fact that the amount of material crossing P per unit area, from the zones of thickness s on each side of P, increases linearly with s because the volume of the zones increases with s. The second s comes from the fact that, for a given concentration gradient, the difference in the average concentration in the two zones increases linearly with s.
What is really going on during Brownian motion is revealed in the s squared term, which means that s increases with the square root of the time. The big particle is being hit on all sides by solvent molecules, which we can't see. This bombardment is random in time, and because every particle, large or small, has the same average thermal kinetic energy, the total energy in all the smaller solvent molecules is actually much larger than the energy in the one large particle that we can see. This random bombardment causes the large particle to jump around with very small steps, in very short time intervals, because the solvent molecules are about a nanometer in diameter and a nanometer apart, but move at a speed of many meters per second, while a typical pollen grain is thousands of nanometers in diameter. Thus the fundamental time scale for Brownian motion, in which you could see the actual thermal motion, is on the order of fractions of a nanosecond.
Our eye-brains can't see events occurring faster than about 1/30th of a second (otherwise TV wouldn't work), and thus we see only the sum of many small thermal movements. But because s increases only with the square root of time, the apparent speed, s divided by t, decreases when we go from nanoseconds to seconds. Thus, the pollen grain appears to move much slower than the real thermal speed of either the pollen grain or the solvent molecules. A thermal kinetic origin for Brownian motion had been suggested before Einstein's paper, but the apparent low speed seemed to rule that theory out. Einstein rescued the thermal kinetic model by his quantitative analysis of the process.
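To put rough numbers on that, here is a small Python sketch of the apparent speed s / t = sqrt(2 t D) / t for different observation times. The value of D is an assumed order of magnitude for a micron-sized particle in water, not a figure from the text, and the sketch takes s^2 = 2 t D at face value on every time scale, as the argument above does.

```python
import math

def apparent_speed(D, t):
    """Displacement s = sqrt(2 t D) divided by the observation time t."""
    return math.sqrt(2.0 * D * t) / t

D = 1e-13  # m^2/s, assumed order of magnitude (not from the text)

observation_times = [
    ("1 nanosecond", 1e-9),
    ("1 microsecond", 1e-6),
    ("1 millisecond", 1e-3),
    ("1/30 second", 1.0 / 30.0),
    ("1 second", 1.0),
]

for label, t in observation_times:
    print(f"{label:>14}: apparent speed = {apparent_speed(D, t):.3e} m/s")
```

Each time the observation time increases by a factor of 100, the apparent speed drops by a factor of 10, which is why the motion we can follow by eye looks so sluggish.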
More thoughts on the units of D
What are we to think about the units of D as applied to the motion of one particle (besides pedantically declaring that they are what they are)? There is no one way, but try this one. Think of the particle as time increases, and its most probable position (the motion is random, and thus we can only talk about probabilities). The most probable position is on an expanding spherical surface, and it's the area of that surface which increases linearly with time, as the distance of the surface from the starting point increases with the square root of time.
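One way to play with this picture is the Python sketch below. It assumes motion in three dimensions, where each of the three directions contributes 2 t D to the mean squared distance from the start, so the squared radius of the sphere is 6 t D; that three-dimensional step and the value of D are assumptions added here, not part of the one-dimensional derivation above.

```python
import math

def rms_radius(D, t):
    """Root-mean-square distance from the start in three dimensions."""
    return math.sqrt(6.0 * D * t)

def sphere_area(D, t):
    """Area of the sphere whose radius is the rms distance at time t."""
    return 4.0 * math.pi * rms_radius(D, t) ** 2

D = 1e-13  # m^2/s, assumed illustrative value (not from the text)

for t in (1.0, 2.0, 4.0):
    print(f"t = {t:.0f} s   radius = {rms_radius(D, t):.3e} m   "
          f"area = {sphere_area(D, t):.3e} m^2")
```

Doubling the time doubles the area, while the radius only doubles when the time is quadrupled. The area per unit time is constant and has the same units as D, length squared over time.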