of the filter. This too is rubbish, because this ringing of the playback filter does not cause time smearing of the music. In fact this ringing is vital to reconstructing the correct music waveform, as only a perfect boxcar filter (which rings like a banshee) can. If we look at the reconstruction process in the time domain, we see that the energy to fill in the missing gaps of the sketchy data from the CD, in order to complete the correct music waveform, has to come from somewhere, and it can only come from a ringing filter response. If the impulse response of a digital filter looked like the good quality impulse some people espouse, without ringing, then such a filter could only track the sample dots coming off the CD in a simplistic connect-the-dots fashion. For example, suppose a single high frequency musical transient spike comes along. A filter with a so-called good impulse response could track and reconstruct that single musical transient waveform and its single peak only as high as the incoming sampling dot from the CD just happened to be lucky enough to sample that peak. As you know, digitizers sample the analog music waveform at discrete time intervals (determined by the sampling rate), and do not look at what the music signal does between sampling points. If perchance the digitizer at the beginning of the digital chain just happened to be lucky enough to sample that musical waveform peak precisely at the instant of its very top peak amplitude, then the sample dot would very fortuitously be right at the peak amplitude of that musical transient, and then even a simple non-ringing filter with a so-called good impulse response could connect the sample dots to reconstruct the waveform pretty well. But 99.9% of the time the sample dot will not be located at the very top of the music waveform peak, and instead will be located down on the left or right skirt somewhere. This puts the digital filter with the so-called good impulse response in deep trouble.
It can reconstruct up to the height of the sample dot, but no higher, and more particularly no higher at any time other than the time of the single sample dot and the filter's single (non-ringing) impulse in response to that sample dot. Thus, this so-called good non-ringing filter will reconstruct the single musical peak with the wrong amplitude (too low) AND AT THE WRONG POINT IN TIME!!! That's gross distortion of the music waveform, which often sounds smoothed down and fuzzily defocused, especially at high frequencies, where the connect-the-dots model fails the worst. This distortion sounds less dynamic because the true high amplitude of the peak was not reproduced; it sounds smooth because a lot of high frequency details are smoothed over and missed; and it sounds fuzzily defocused because the peaks don't even occur at the right time, so they don't coherently focus with the rest of the musical note. All these distortions occur because a filter with so-called good (non-ringing) impulse response can't deliver energy at any time other than at the time of the sample dot. If it can't deliver energy at a time other than a sample dot, then it can't re-create a musical transient peak that occurs at a different time than a sample dot, especially if that peak has a higher amplitude than any of the sample dots. But how then could it be possible to ever re-create a musical peak whose timing did not happen to coincide with one of the sample dots? And how and where could we ever find the energy to re-create a musical peak that was higher in amplitude than any of the sample dots? The strong ringing of the ideal boxcar filter supplies abundant energy before and after the filter's dominant single impulse response to a single sample dot. This added energy off to the sides in time is vital to filling in the missing gaps of energy among the too sketchy sample dots coming in off the CD.
For example, consider that sample dot which 99.9% of the time will be located on the skirt of a single musical transient peak, rather than right at the top of that peak. The pre and post ringing energy from the filter's ringing response, to previous and later sample dots, furnishes additional energy at just the right moment, such that the sum of ringing responses over time does indeed add up to the true original musical transient peak reproduced at the correct instant -- even if that peak occurs at a different instant than the sample dot, and even if that musical peak was higher in amplitude than the value of the sample dot that was only partway up on the skirts of this peak. As you can imagine, sometimes the sample dot will be on the left skirt of the peak, and sometimes on the right skirt. This means that both pre and post ringing are vitally necessary to the success of the digital filter. Some writers have opined that pre-ringing of a digital filter must somehow be sonically obnoxious, since it is a crime against nature to have an effect precede the cause, and as such pre-ringing should be banished or at least reduced. And they have therefore also opined that higher frequency sampling sounds better because this dreaded pre-ringing doesn't last as long. That's all rubbish. In truth, the pre-ringing helps to fill in the gaps to provide the correct music waveform, and without the pre-ringing the reconstructed waveform would look and therefore sound less like the original music waveform. The pre-ringing, together with the post-ringing, is the only source of energy to enable musical peaks to be re-created at their correct amplitude and correct temporal instant, even if that amplitude and instant do not coincide with the amplitude and instant of any sample dot.
And the precise nature and pattern of the pre-ringing and post-ringing determines just how accurately the music waveform will be re-created at all amplitudes and instants other than where the sample points are. At progressively higher frequencies within the passband, there are progressively fewer sample points per waveform cycle, so progressively more of the music waveform is located where there are no sample points, and thus progressively more of the music waveform must literally be generated (from sketchy clues) by the precise ringing pattern of the digital filter. If this pre and post ringing pattern is a little off the ideal, then the music waveform re-created between sample points will be a little off. An ideally perfect boxcar filter has a certain characteristic profile in the frequency domain, with a flat passband, abruptly sharp corner, and infinitely steep cutoff slope. This ideal profile in the frequency domain has a corresponding profile in the time domain, in the form of a precise ringing pattern (pre and post). We cannot in practice achieve the ideal boxcar profile in the frequency domain, and therefore we likewise cannot achieve the ideal ringing pattern in the time domain (for example, our practical filters have corners that are rounded, and therefore ring less, than the abruptly sharp corner of an ideal boxcar filter would). The reason that slightly imperfect real world boxcar digital filters fail to accurately re-create the correct musical waveform is not that their frequency domain performance is inadequate at filtering out ultrasonic spuriae. Rather, the actual reason centers on their inadequate performance in the time domain. The true shortcoming of our best practical approximations to the boxcar filter ideal is that their strong ringing pattern isn't quite strong enough (and precise enough), to accurately and completely fill in the gaps in the sketchy data coming off the CD. 
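The point about progressively fewer sample points per waveform cycle can be put in rough numbers. The sketch below is only an illustration under stated assumptions: it assumes a pure sine tone, a 44.1 kc sampling rate, and the worst case for a connect-the-dots model, in which the peak falls exactly midway between two sample dots. The function names are ours, not part of any standard library. Under those assumptions the straight-line model misses roughly 1% of the peak at 2 kc and well over 80% of it at 20 kc.

```python
import math

FS = 44_100.0  # CD sampling rate in Hz (assumed for this illustration)


def samples_per_cycle(freq_hz, fs=FS):
    """Number of sample dots available per cycle of a tone at freq_hz."""
    return fs / freq_hz


def connect_the_dots_shortfall(freq_hz, fs=FS):
    """Worst-case fraction of a sine peak missed by straight-line interpolation.

    Assumed worst case: the peak sits exactly midway between two sample
    dots, so both neighboring dots reach only cos(pi * f / fs) of the peak,
    and a straight line between them can go no higher than that.
    """
    return 1.0 - math.cos(math.pi * freq_hz / fs)


if __name__ == "__main__":
    for f in (2_000, 4_000, 8_000, 20_000):
        print(f"{f:>6} Hz: {samples_per_cycle(f):5.2f} dots per cycle, "
              f"peak shortfall {100 * connect_the_dots_shortfall(f):5.1f}%")
```

Whatever the straight-line model misses is what the ringing pattern of the filter has to supply.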
So, contrary to popular speculation, what we actually want are digital filters that ring more, not digital filters that ring less. Indeed, CD players and D-A processors that attempt to reduce or banish this ringing (in order to misguidedly obtain presumptively better time domain impulse response from their digital filter) actually degrade the re-creative ability of their digital filter by taking its design farther away from the ideal boxcar that liberally rings, and they thereby cripple its ability to even closely re-create the original music waveform accurately (such attempts can sound smooth and sweet, but they miss a lot of important musical details, and we've seen and measured them [for example] generating a simplistic V shape in the music waveform where there should have been a W shape [in other words they totally missed the central transient spike of the W]). Many have speculated that the abundant ringing of a boxcar filter has adverse audible effects, especially the pre-ringing, which must sound especially awful and unnatural because it precedes the main transient, and in nature no effect ever precedes the cause. This speculation is rubbish. In truth, the correctly large amount of ringing, even larger than the ringing our boxcar approximations do, and including pre as well as post ringing, would be entirely inaudible. It would be inaudible because it would blend into the correct re-creation of the original music waveform. A fortiori, it would be inaudible because it forms an essential constituent of the re-creation of the original music waveform. Indeed, it is the ABSENCE of sufficient, correctly large amounts of ringing that is audible, and that audibly degrades the music. If there is not enough ringing to accurately re-create the original music waveform, the re-created music waveform is degraded, probably audibly so.
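The argument above can be tried numerically. What follows is a minimal sketch, not a model of any real CD player: it assumes NumPy is available, it invents a band-limited test transient (bandwidth F0 and peak instant T0 are our assumptions), and it truncates the ideal ringing sum to N sample dots on each side, which a true boxcar filter would not do. It compares a connect-the-dots reconstruction with one that sums the sin(x)/x ringing responses to every sample dot.

```python
import numpy as np

FS = 44_100.0   # CD sampling rate in Hz
F0 = 15_000.0   # assumed bandwidth of the test transient, below FS / 2
T0 = 0.4 / FS   # assumed peak instant: 0.4 sample period after a sample dot
N = 2000        # sample dots kept on each side (the ideal sum is infinite)


def transient(t):
    """Band-limited test pulse with a peak of exactly 1.0 at t = T0."""
    return np.sinc(2.0 * F0 * (t - T0))


# The sample dots: the transient is seen only at multiples of 1 / FS.
n = np.arange(-N, N + 1)
samples = transient(n / FS)

# A dense time grid around the peak, 100 points per sample period.
t = np.linspace(-2.0 / FS, 2.0 / FS, 401)

# Non-ringing reconstruction: straight lines between the sample dots.
linear = np.interp(t, n / FS, samples)

# Ringing reconstruction: every sample dot launches a sin(x)/x response
# (np.sinc(x) is sin(pi x) / (pi x)), and the responses are summed.
ringing = np.sinc(FS * t[:, None] - n[None, :])
rebuilt = ringing @ samples

linear_peak = float(linear.max())
linear_peak_time = float(t[linear.argmax()])
rebuilt_peak = float(rebuilt.max())
rebuilt_peak_time = float(t[rebuilt.argmax()])

if __name__ == "__main__":
    print(f"true peak      1.000 at {T0 * FS:+.2f} sample periods")
    print(f"connect-dots   {linear_peak:.3f} at {linear_peak_time * FS:+.2f}")
    print(f"ringing sum    {rebuilt_peak:.3f} at {rebuilt_peak_time * FS:+.2f}")
```

With these assumed values the connect-the-dots version tops out at roughly 0.88 of the true peak and places it at the sample instant, while the summed ringing responses come back to very nearly 1.0 at the original instant between the dots.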
Nyquist Theory vs. Practice
If all these common speculations are rubbish, what's the true explanation? Why does doubling the original digitizing sampling rate, and extending system response superfluously into the ultrasonic region, produce so many sonic improvements of such remarkable degree over so much of the spectrum? To find the answer, let's first spend a minute visiting everyone's friend, Harry Nyquist. Nyquist's famous theory is often misquoted as saying that if you digitally sample a signal at a sampling rate twice the highest frequency of interest, then you will preserve all the information of that signal in digital form. In truth, Nyquist's theory imposes a lot more restrictions, qualifications, and caveats than this common misquote suggests. Nyquist's theory also requires that the analog input signal first be totally band limited before it is digitized. And his theory does not promise that all the waveform information will actually be preserved in digital form; rather, it merely promises that enough clues are preserved so that the original signal could be reconstructed later, at least in theory, if you could somehow access the theoretically correct reconstruction tools (not accessible in practice). And his theory requires that the reconstruction tool be the theoretically correct, perfect, ideal reconstruction algorithm known as a boxcar filter, which as we've seen is impossible to achieve in practice. 
And his theory requires that the corner of this theoretically perfect boxcar filter be set at the Nyquist frequency (half the sampling rate), so that its ringing will have precisely the correct temporal pattern to add and subtract at the correct instants, so as to correctly reconstruct the desired original waveform, especially at higher frequencies within the passband (but this theoretical requirement is also impossible to meet for 44.1 kc CDs, since a necessarily imperfect boxcar filter must in practice have its corner set at 20 kc, not the theoretically ideal 22.05 kc Nyquist frequency, in order to establish a protective guard band in the 20-22.05 kc region). Now, there's nothing wrong with Harry Nyquist's theory, once we understand its requirements. The theory is correct, in theory. But it's impossible to implement correctly in practice. Now, our hearing mechanism, wonderfully sensitive as it may be, is limited to hearing signals as they are realized in practice, and sadly cannot discern signals that are realizable only in theory. Thus we must amend the Nyquist gospel of the theoretical to allow for what is practically realizable. The best we can do in practice is to crudely approximate the requirements of the Nyquist theory. But then it becomes our responsibility to recognize specifically where and how our practical limitations fall short of the theoretical Nyquist requirements. It becomes our responsibility to discover and analyze what adverse consequences will result from our specific shortfalls in practice. It becomes our responsibility to find and engineer practical remedies to deal with these recognized shortfalls and adverse consequences. We cannot simply bury our head in the sand and pretend everything must be OK because Harry our guru had a theory, which if misquoted promised us that everything would be OK so long as we merely sampled music at least twice as fast as the highest frequency of interest.
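For reference, the requirements just listed are usually written as the interpolation formula below. This is the standard textbook statement, not something taken from IAR's measurements; the symbols (T for the sample period, f_s for the sampling rate, x[n] for the sample dots, X(f) for the spectrum of the input) are our notation.

```latex
% T = 1/f_s is the sample period; x[n] = x(nT) are the sample dots.
% The sum runs over ALL sample dots, past and future (pre and post ringing).
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,
       \frac{\sin\bigl(\pi (t - nT)/T\bigr)}{\pi (t - nT)/T},
\qquad \text{valid only if } X(f) = 0 \text{ for } |f| \ge \tfrac{f_s}{2}.

% The ideal boxcar filter, with its corner at the Nyquist frequency f_s/2,
% and the ringing impulse response that corresponds to it:
H(f) = \begin{cases} T, & |f| < f_s/2 \\ 0, & \text{otherwise} \end{cases}
\quad\Longleftrightarrow\quad
h(t) = \frac{\sin(\pi t / T)}{\pi t / T}.
```

Each caveat in the text shows up as a term here: the band-limiting condition on X(f), the infinite sum that no practical filter can compute, and the corner at exactly f_s/2 that fixes the spacing of the ringing.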
Four Problems in Practical Digital Systems
There are many aspects in which practical digital systems fall short of the theoretical ideal necessary for Nyquist's theory to work as advertised. We've already discussed some of them above. Let's concentrate on four problems here. First, the analog signal must be severely band limited before it is even digitized. If the input sampling rate is 44.1 or 48 kc, and the highest frequency of interest we wish to record is 20 kc (less than half of 44.1 or 48 kc), then some would have you believe that the Nyquist theory promises everything should be perfect. However, some musical instruments radiate a lot of spectral energy well above 20 kc. For example, IAR measured even a gentle cymbal kiss as having its strongest power peak at 40 kc. This means that the analog music signal must be put through a sharp, steep, complex analog filter before the digital system can even handle it. This degrades the music signal in a number of ways. The number of parts alone that such a complex filter has, and throws into the music signal path, inevitably degrades the music signal. Also, a sharp, steep analog filter like this ruins the music signal waveform so that it is no longer even vaguely recognizable. This waveform brutalization is mostly due to severe phase rotation, which at least is not as sonically obnoxious as other types of distortion -- but it makes the whole issue of music waveform fidelity a difficult issue to master for the rest of the digital chain, where even subtle waveform distortions can have devastatingly obnoxious sonic consequences. Second, as already discussed above, practical real world digital filters cannot achieve the ideal boxcar shape required to correctly reconstruct the music waveform from the sketchy clues coming off the CD. 
Their failure to correctly re-create the music waveform becomes noticeable above 2 kc, where there are no longer enough sample dots to adequately outline the music waveform sufficiently well for a connect-the-dots model or algorithm to be able to discern what the music waveform should be. And this failure of digital filters becomes progressively worse at higher frequencies, as the calculations spawned by their non-boxcar shape stray progressively farther from what the waveform should be. Thus, these errors from a practical digital filter's non-ideal nature are worse in the 4 kc to 8 kc octave than they were in the 2 kc to 4 kc octave. Third, a specific failing of practical real world digital filters is worth attention in this context. For the digital filter's re-creation process to work correctly, especially at higher passband frequencies where the incoming clues from the CD are sketchier, the ringing pattern of the boxcar filter must be precisely right. The ringing pattern (the classic sin(x)/x function) has wavelets of ringing on both sides of the main impulse spike. These ringing wavelets are spaced apart a precise distance in time, and they have precise (progressively smaller) amplitudes. The energy of these precisely patterned ringing wavelets, from one sample point, adds to the energy from other nearby sample points, both before and after. The added ringing energy has to be precisely the right amount at precisely the right time, in order to add up to an accurate reconstruction of the original music waveform. If the amplitude of the ringing wavelets is wrong, then they won't add the correct amount of energy needed for an accurate reconstruction at that instant. And if the time distance between the ringing wavelets is wrong, then they won't add their energy at the correct time needed for an accurate reconstruction.
In sum, we're screwed if the amount of ringing isn't right, and we're screwed if the temporal spacing of the ringing isn't right, and if they're both not right then we're doubly screwed. We already know that the amount of ringing isn't right. We would actually need more ringing to get an accurate reconstruction of the music waveform. A theoretically ideal boxcar filter would have enough ringing, but we can't get enough ringing because we can't practically make a filter with an abruptly sharp enough corner and infinitely steep enough cutoff slope. But now it's time for discovery of another nasty shortcoming. We also can't get the temporal spacing of the ringing pattern to be right. The temporal spacing of a boxcar filter is determined by the frequency at which we make the corner of the filter. For the temporal spacing of the filter's ringing pattern to be correct, we merely have to design our reconstruction filter so its corner is at the Nyquist frequency, half the sampling frequency. But we can't do this (at least not in a simple 44.1 kc CD player handling 44.1 kc media). Why not? The digital reconstruction filter also acts as an anti-imaging filter, a playback filter role that is similar to the anti-aliasing role of the recording filter. In both cases, the goal is to eliminate ultrasonic spuriae beyond the Nyquist frequency (in this case above 22.05 kc) that could cause distortion. But, since our practical real world playback filter cannot have an infinitely sharp cutoff slope, we are forced to set its corner frequency at a lower frequency (typically 20 kc) in order that its skirt can achieve adequate rejection of images by the time it reaches 22.05 kc, above which the dreaded images begin. We need to establish a guard band between the desired passband (to 20 kc) and the Nyquist frequency (here 22.05 kc). That's why most CD players and D-A processors have had their playback reconstruction/anti-imaging filter designed with a corner frequency at 20 kc, not 22.05 kc. 
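The dependence of the ringing's temporal spacing on the corner frequency is simple arithmetic, sketched below. The sketch assumes an ideal boxcar at each corner frequency (which, as noted, cannot be built) and uses NumPy's `np.sinc`, which computes sin(pi x)/(pi x); the function names are ours. With the corner at 22.05 kc the zero crossings of the ringing fall one sample period apart (about 22.7 microseconds), so each dot's ringing passes through zero exactly at its neighbors. With the corner at 20 kc the spacing stretches to 25 microseconds, and the ringing from one dot is still at about a tenth of full height when it reaches the next dot.

```python
import numpy as np

FS = 44_100.0  # CD sampling rate in Hz


def ringing_zero_spacing(corner_hz):
    """Seconds between zero crossings of an ideal boxcar's sin(x)/x ringing."""
    return 1.0 / (2.0 * corner_hz)


def response_at_next_dot(corner_hz, fs=FS):
    """Impulse response (main spike normalized to 1.0) one sample period away.

    An ideal boxcar with its corner at corner_hz has an impulse response
    proportional to sinc(2 * corner_hz * t); here t = 1 / fs.
    """
    return float(np.sinc(2.0 * corner_hz / fs))


if __name__ == "__main__":
    print(f"sample period: {1e6 / FS:.2f} microseconds")
    for corner in (22_050.0, 20_000.0):
        print(f"corner {corner / 1000:.2f} kc: zero crossings every "
              f"{1e6 * ringing_zero_spacing(corner):.2f} microseconds, "
              f"response at the next dot {response_at_next_dot(corner):+.3f}")
```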
But this 20 kc corner frequency gives us a filter ringing pattern with erroneous time intervals between the wavelets. This filter design with the corner set at 20 kc might do a good job as an anti-imaging filter, but it does a lousy job as a reconstruction filter, since its ringing wavelet pattern is temporally wrong, and so it supplies the energy at the wrong time to accurately re-create the original music waveform. In sum, practical reconstruction filters, even when they try to approach the boxcar ideal, suffer from two distinct kinds of inaccuracies. Their amplitude inaccuracy arises in part because the corner cannot be made perfectly abrupt and the cutoff slope cannot be made infinitely steep, thereby producing a ringing pattern with incorrect and insufficient amplitude to correctly re-create the original music waveform. Their distinct temporal inaccuracy arises in part because the corner cannot be set at the correct frequency, thereby producing a ringing pattern with the wrong temporal spacing between ringing wavelets. Thus, practical reconstruction filters supply the wrong amount of energy at the wrong time, in their attempt to correctly reconstruct the original music waveform. No wonder digital sounds like digital! Incidentally, there are also other factors contributing to digital errors, such as amplitude errors from computational approximations and temporal errors due to jitter, but these are beside the point of the present discussion. Fourth, even the magic of high power averaging, discussed above, has its limitations. Basically, it runs out of steam at the top end of the music passband, where there are no longer plural sample points to calculate an average for. High power averaging achieves some awesome sonic miracles by fitting a smooth curve through a noisy, error-laden statistical scatter of sample dots. But if the statistical scatter contains only one dot (as it does for the top frequencies of the passband), you can't