The oscillators are the only interesting aspect of SARAH's design. All of the other elements shown in the signal-flow diagram (Envelope Generators, LFOs, summing and scaling) are entirely conventional.
SARAH uses two oscillator instances per voice. The oscillators themselves are identically structured, but their settings are independent, i.e., they can be set to produce different waveforms, detuned, and mixed so as to provide at least a basic set of options to create composite timbres, and each oscillator's inherent “harmonic shaping” can also be set up differently, to provide further control. The following explanation (which is just an overview) applies equally to OSC1 and OSC2.
SARAH's oscillators are essentially wave-table based. They simply play out samples from a 1024-element digitized representation of one cycle of the selected waveform—sine, triangle, square, or sawtooth. What is interesting and new is how the 1024-element wave tables are populated.
There is a common sine wave table which is generated once and shared by all oscillator instances (including the LFOs in sine-wave mode). This is adequate, because the sine wave has no higher-order harmonics which need to be suppressed to avoid aliasing.
The triangle, square, and sawtooth wave tables are generated dynamically, as follows:
The interesting stuff happens at step 4, but to understand it we must first discuss step 5: When the oscillator is assigned a note frequency, the note's frequency in cycles per second (Hertz) is divided by the sampling rate in use (typically 44100 Hz) to yield a float
-valued “phase increment” in samples per cycle. The oscillator also has a float
-valued “phase” variable, restricted to the range 0.0 to 1.0, where 0.0 represents the beginning of the cycle and 1.0 represents the end. Each time the oscillator generates a new sample, it multiplies the phase by 1024 and rounds the result, to obtain a wave-table index in the range 0-1023, plucks that sample out of its wave-table and outputs it, then adds the phase increment to the phase, ensuring the result “wraps around” if necessary, so it remains in the range 0.0 to 1.0.
When the phase-increment is exactly 1.0, the oscillator replays its wave-table exactly. At a sampling rate of 44100 Hz, this would happen at an oscillator frequency of 44100/1024 = 43.066 Hz. This is a bit below F1, way down in the bottom octave of the piano range. Even lower notes result in a phase-increment a bit lower than 1.0, in which case, wave-table samples are occasionally repeated, resulting in slight quantization artifacts which are not very noticeable at such low notes.
For all notes above F1—i.e., just about every note you'll ever play—the phase increment will be greater than 1.0, meaning that some wave-table samples will be skipped. Basically, you are trying to replay the basic 43 Hz note at the higher pitch, and so all harmonics of the original tone will be multiplied by the phase increment value. The 512th harmonic of 43.066 Hz is 43.066 x 512 = 22049.79 Hz, just below half the 44100 Hz sampling rate—the so-called Nyquist frequency. Playing F2 means a phase-increment of around 2.0, so all harmonics above the 256th would be above the Nyquist frequency and would be “aliased” to lower frequencies. Up near the top of the piano range, the aliasing of even very low-numbered harmonics (which have substantial energy) will result in very noticeable aliasing artifacts.
In SARAH, this is avoided at step 4 above, by figuring out the highest-numbered harmonic which will still be less than the Nyquist frequency, and setting all higher-numbered harmonic coefficients to zero. When the resulting table is inverse-FFT-transformed, we obtain a version of the original waveform which is almost perfectly band-limited, and plays back at the new rate with no aliasing whatsoever.
In order to write the code for step 4, it's critical to understand the format of data produced by juce::dsp::FFT after a forward FFT operation. This is not published, and it's possible that some FFT implementations may differ (the JUCE code is structured so as to support multiple implementations in future), but I wrote my initial code based on my experience working with other FFT code, and my guess turned out to be correct.
I'm not going to give a full explanation of discrete Fourier transforms and the FFT here. Refer to Wikipedia or (preferably) any good Digital Signal Processing textbook to learn more. The following description concentrates entirely on pragmatic details relevant to C++ coding.
juce::dsp::FFT requires arrays that are exactly a power of 2 in size. The specific power is referred to as the order of the FFT, and must be specified when declaring a juce::dsp::FFT object. In SARAH, we use order 10, implying arrays with 2^10 = 1024 data points. The Fourier Transform is defined only for Complex Numbers, which are ordered pairs (re, im) consisting of a real and an imaginary component. These are represented in C++ as something like
struct Complex { float re; float im; };
such that sizeof(Complex)
is two times sizeof(float)
. However, juce::dsp::FFT itself is declared so that all array arguments are treated as float[]
, and the documentation states that all arrays must be twice the size given by the order, e.g., float[2048]
for order 10. This is because, internally, each float[2048]
array is interpreted as a Complex[1024]
array.
After a forward FFT operation, the frequency-domain data are again in a Complex[1024]
array, arranged as follows:
Why are the 1st through 511th harmonics represented twice? This is because the Fourier Transform works in terms of both positive and negative frequencies, where a negative frequency is interpreted as the Nyquist frequency minus the absolute frequency value. (This is related to the phenomenon of aliasing. When a positive frequency is beyond the Nyqyist frequency, it is “aliased” to the corresponding negative frequency. The human ear hears this negative frequency just the same as the corresponding positive one.) The 0th harmonic, whose frequency is zero Hz, is its own negative. Owing to the arcane mathematics of the FFT, so is the 512th harmonic.
The result of the above analysis—which mercifully seems to be correct for the default juce::dsp::FFT implementations on both Windows and Macintosh—is that the code for reconstructing an anti-aliased and harmonic-shaped time-domain wave table is as follows:
void SynthOscillator::setFilterCutoffDelta(float fcDelta) { if (waveform.isSine()) return; zeromem(waveTable, sizeof(waveTable)); int maxHarmonicToRetain = int(1.0 / (2.0 * phaseDelta)); if (maxHarmonicToRetain >= fftSize / 2) maxHarmonicToRetain = fftSize / 2; float cutoffHarmonic = (filterCutoff + fcDelta) * fftSize / 2; for (int ip = 1; ip < maxHarmonicToRetain; ip++) { int in = fftSize - ip; float* fftBuffer = SynthOscillatorBase::getFourierTable(waveform); memcpy(waveTable + ip, fftBuffer + ip, 2 * sizeof(float)); // positive frequency coefficient memcpy(waveTable + in, fftBuffer + in, 2 * sizeof(float)); // negative frequency coefficient float fv = 1.0f; if (ip > int(cutoffHarmonic)) fv = std::pow(ip / cutoffHarmonic, -filterSlope / 6.0f); waveTable[ip + 0] *= fv; waveTable[ip + 1] *= fv; waveTable[in + 0] *= fv; waveTable[in + 1] *= fv; } inverseFFT->performRealOnlyInverseTransform(waveTable); }
This code works as follows:
waveTable[]
array.waveTable[]
is cleared to all zeroes, and we compute the index of the maximum harmonic to retain, as the reciprocal of twice the phase delta (oscillator frequency in cycles per sample)cutoffHarmonic
which represents the cutoff frequency of our simulated low-pass filter. This is based on the fixed filterCutoff
parameter and a dynamic fcDelta
which is the sum of the shape-LFO and scaled shape-EG outputs. (Both these quantities have the range 0.0 to 1.0.)for
loop runs through the harmonic indices 1 through maxHarmonicToRetain
, copies both the positive and negative-frequency coefficients out of the master table (a pointer to which is returned by SynthOscillatorBase::getFourierTable(waveform), and then scales the resulting copies by a factor fv
computed by an expression which models the response curve of a simple low-pass filter with a given cutoff frequency (expressed in cycles per sample) and slope (expressed in dB per octave).waveTable[]
array.
The astute reader will have noted that the for
loop skips the 0th harmonic. The 0th harmonic of an digitized AC signal represents the DC component (net offset from zero) which is neither audible nor desirable in a digital signal processing system. Ensuring that the 0th-harmonic coefficient is always zero results in output waveforms which are guaranteed to be symmetric about zero—yet another free gift from the FFT.
As I said earlier, the Fourier Transform is defined over the Complex domain. An ordinary sequence of time-domain samples is “real-only”—corresponding to Complex numbers having zero imaginary components. After a forward FFT operation, the Complex frequency components will be such that for each positive-harmonic coefficient (re, im), the corresponding negative-harmonic coefficient will be (re, -im); they are so-called Complex conjugates. These special properties can be utilized to define “real-only” variants of the forward and inverse FFT algorithms which use less CPU than the “full Complex” version, and the juce::dsp::FFT module includes these.
Any CPU efficiency is a bonus, but for practical purposes, what you need to know is that the “real-only” FFTs interpret the time-domain data not as an array of 1024 Complex
values (a real float
followed by the corresponding imaginary float
), but rather as an array of 1024 real float
s followed by an array of 1024 imaginary float
s (which are all zeros). Hence, even though the array is declared as size 2048, we simply ignore the second half, and use only the first 1024 float
s. SynthOscillator::getSample() is simply
float SynthOscillator::getSample() { int sampleIndex = int(phase * fftSize + 0.5f); if (sampleIndex >= fftSize) sampleIndex = 0; phase += phaseDelta; while (phase > 1.0) phase -= 1.0; if (waveform.isSine()) return sineTable[sampleIndex]; else return waveTable[sampleIndex]; }