Author: Felix Voigtlaender

Description: This is a rough transcript of the blackboard part of the tutorial on Wavelets and Shearlets which I presented at the DEDALE Tutorial Day, November 16, 2016. The second part of these notes is best (or perhaps only) understood in combination with the associated slides.
Warning
This document is intended to serve as an informal tutorial. My aim is not that every statement is 100% mathematically correct or precise. Rather, I want to present the underlying ideas and techniques.
This whole tutorial is concerned with different systems for sparsely representing certain signals, e.g. sound signals, images, videos, etc. Mathematically, these signals will be elements of a certain Hilbert space ℋ, e.g. ℋ = L2(ℝ) for sound signals or L2(ℝ2) for (continuous) images. In fact, for (quadratic, continuous) images, one might also use ℋ = L2([0, 1]2).
Of course, in practice, an image is not a function f:[0, 1]2 → ℝ, since this would require us to store uncountably many values. Instead, one usually models an image (consisting of discrete pixels) using the finite dimensional Hilbert space ℋ = ℂN × N (or ℋ = ℝN × N).
Formally, what we are looking for is a linear transformation

Φ : ℋ → ℋC

of ℋ into a certain coefficient Hilbert space ℋC. To justify the name coefficient Hilbert space, this space should consist of (coefficient) sequences. Hence, we will assume ℋC = ℓ2(I) for a certain index set I. Here,

ℓ2(I) = {(ci)i ∈ I in ℂ : ∑i ∈ I|ci|2 < ∞}.

Typical examples are ℓ2(ℤ), ℓ2(ℤ2), or ℓ2({1, …, N}).

Hence, for each f ∈ ℋ, the transformation Φf is of the form

Φf = (Φif)i ∈ I.

In fact, using the theory of Hilbert spaces (precisely, the Riesz representation theorem), one can show that there is a family (φi)i ∈ I in ℋ such that

Φif = ⟨f, φi⟩ for all i ∈ I and all f ∈ ℋ.
Of course, we want the transformation Φ to satisfy a number of nice properties:
It should be possible to reconstruct f ∈ ℋ from its coefficients Φf in a stable way. In other words, there should be a (linear, continuous) reconstruction map R:ℋC → ℋ satisfying f = RΦf for all f ∈ ℋ.
As we will explain later on, this is equivalent to requiring that the family (φi)i ∈ I forms a frame.
For a certain class of important/interesting signals f ∈ G ⊂ ℋ (the good signals), we want the coefficients Φf to be sparse.
In a finite dimensional setting, this means that most of the coefficients Φif should be zero, or at least negligible in comparison to the significant coefficients. In the infinite-dimensional setting, this is encoded by requiring Φf ∈ ℓp(I) for some p ≤ 1. Here, ℓp(I) is defined similarly to ℓ2(I), i.e.,

ℓp(I) = {(ci)i ∈ I in ℂ : ∑i ∈ I|ci|p < ∞}.
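To get some intuition for why membership in ℓp(I) with p ≤ 1 encodes sparsity, one can compare a sparse and a non-sparse vector with the same ℓ2 norm. The following small numerical illustration (my own addition, not part of the original tutorial) shows that the ℓp quasi-norms with p ≤ 1 heavily penalize the non-sparse vector:

# Two vectors with the same l2 norm, one sparse and one dense.
# For p <= 1, the l^p quasi-norm of the dense vector is much larger,
# which is why "small l^p norm" is a useful proxy for sparsity.
import numpy as np

sparse = np.zeros(1000)
sparse[:10] = 1 / np.sqrt(10)              # 10 nonzero entries, l2 norm 1
dense = np.full(1000, 1 / np.sqrt(1000))   # 1000 nonzero entries, l2 norm 1

def norm_p(x, p):
    return np.sum(np.abs(x)**p)**(1 / p)

for p in (0.5, 1, 2):
    print(p, norm_p(sparse, p), norm_p(dense, p))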
The first question is of course why such a system for sparse signal representation is interesting at all. In fact, there are many areas where such systems are useful. In the following, we only mention three prominent examples.
Such a system can be used for data compression: Since for a good signal f ∈ G, most of the coefficients Φif vanish, or are at least very small, one can store only the significant (large) coefficients and set the remaining ones to zero, which yields the compressed coefficients C(Φf).
This saves memory and the resulting (compressed) coefficients yield a good approximation to f, i.e., ∥f − RC(Φf)∥ℋ is small.
This general idea is the basis for the JPEG 2000 standard for image compression, which uses Wavelet systems as the system for sparse signal representations.
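To make this concrete, here is a minimal sketch of compression by coefficient thresholding, using a Haar wavelet transform as the map Φ. The wavelet, the decomposition level and the threshold are my own illustrative choices; this is not the actual JPEG 2000 pipeline, which uses biorthogonal wavelets and sophisticated coefficient coding:

# A minimal sketch of compression by coefficient thresholding.
# The Haar wavelet, the decomposition level and the threshold are
# arbitrary illustrative choices, NOT the actual JPEG 2000 pipeline.
import numpy as np
import scipy.misc
import pywt

im = scipy.misc.face(gray=True).astype(float)
im = im[:, :768]

# Phi: compute the wavelet coefficients and flatten them into one array
coeff = pywt.wavedec2(im, 'haar', level=3)
arr, slices = pywt.coeffs_to_array(coeff)

# C: keep only the coefficients above 5% of the maximal magnitude
thresh = 0.05 * np.max(np.abs(arr))
compressed = np.where(np.abs(arr) >= thresh, arr, 0)
print("fraction of kept coefficients:", np.mean(compressed != 0))

# R: reconstruct from the compressed coefficients and compute the error
recon = pywt.waverec2(pywt.array_to_coeffs(compressed, slices,
                                           output_format='wavedec2'), 'haar')
print("relative error:", np.linalg.norm(recon - im) / np.linalg.norm(im))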
In many applications, one wants to recover a signal f from incomplete measurements. Formally, this means that we only know (e.g. from taking certain measurements like MRI measurements) the quantity

y = Ax ∈ ℂn

for a certain measurement matrix A ∈ ℂn × N, where x ∈ ℂN is the unknown signal.
Since n ≪ N, this linear system is highly underdetermined, i.e., there is no unique solution if one has no additional a priori information about x.
The main statement of the theory of compressive sensing, however, is that x is uniquely determined (and can be computed efficiently), if it is known that x is sparse. Of course, this only holds under additional suitable assumptions on the measurement matrix A. These assumptions, however, are outside of the scope of this tutorial.
Typically, x will not be sparse in the standard basis. For example, for most natural images, it is not true that most pixels of the image are (almost) zero (black). Hence, to apply the theory of compressive sensing, one needs to know a system Φ which sparsifies x, i.e., such that Φx is sparse. Then we have

y = Ax = A(RΦx) = (AR)(Φx) = (AR)z,

where the vector z = Φx is sparse. Now, one can apply compressive sensing methods (with the modified measurement matrix AR) to recover z = Φx and then also x = RΦx = Rz.
Note
For the theory of compressive sensing to be applicable, one needs to have a certain compatibility between the measurement matrix A and the sparsifying system Φ. This, however, is outside the scope of this tutorial.
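To make the sparse recovery step a bit more concrete, the following toy example sketches one classical approach: minimizing ∥Az − y∥2 + λ∥z∥ℓ1 by iterative soft thresholding (ISTA). All concrete choices here (the random Gaussian measurement matrix, the step size, the parameter λ and the iteration count) are illustrative assumptions of mine, not recommendations for practice:

# Toy compressive sensing example: recover a sparse vector z from the
# underdetermined measurements y = A z via iterative soft thresholding (ISTA).
# The random Gaussian matrix, the step size, lambda and the iteration count
# are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
N, n, s = 200, 60, 5                       # ambient dimension, measurements, sparsity
A = rng.standard_normal((n, N)) / np.sqrt(n)

z = np.zeros(N)
z[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
y = A @ z                                  # the incomplete measurements

x = np.zeros(N)
step = 1 / np.linalg.norm(A, 2)**2         # step size 1 / ||A||^2
lam = 0.01
for _ in range(500):
    x = x + step * (A.T @ (y - A @ x))     # gradient step for ||A x - y||^2 / 2
    x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0)  # soft thresholding

print("recovery error:", np.linalg.norm(x - z))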
In the preceding point, we considered an underdetermined linear system in which the solution (if it exists) is not unique. But even if, in principle, there is a unique solution, i.e., if we want to reconstruct x from

y = Ax with an invertible matrix A,

simply computing this solution might be a bad idea.
To see this, we consider a numerical example: In this example, we take a certain image (a raccoon) and blur this image by convolving it with a Gaussian of a certain size. One can mathematically show (see The inverse of the convolution operation) that this blurring operation is invertible.
However, once the blurring is strong enough, the numerical implementation of the inverse operation yields catastrophic results:
# First, we import some useful packages
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc
from numpy.fft import fft2, ifft2, fftshift

# We load the raccoon image and display it for visual inspection
im = scipy.misc.face(gray=True)
im = im[:, :768]
plt.imshow(im, cmap='gray')
plt.colorbar()
plt.show()
# We compute and display the Gaussian with which we will convolve the raccoon image.
# In the plot below, we show a zoomed version of the actual plot
grid = np.mgrid[-384:384, -384:384]
gaussian = np.exp(-(grid[1]**2 + grid[0]**2) / 5)
gaussian /= np.sum(gaussian)
plt.imshow(gaussian, cmap='gray')
plt.show()
# We convolve the raccoon image with the Gaussian and display the result
blurred = ifft2(fft2(im) * fft2(fftshift(gaussian)))
plt.imshow(np.real(blurred), cmap='gray')
plt.show()
# We apply the exact inverse operation of the blurring operation and display the result
recon = ifft2(fft2(blurred) / fft2(fftshift(gaussian)))
plt.imshow(np.real(recon), cmap='gray')
plt.show()
# We plot the (very small) reconstruction error
plt.imshow(np.abs(recon - im), cmap='gray')
plt.colorbar()
plt.show()
# We convolve the raccoon image with a wider Gaussian and display the Gaussian and the result
# Again, the plot of the Gaussian which is shown below is a zoomed version
gaussian = np.exp(-(grid[1]**2 + grid[0]**2) / 10)
gaussian /= np.sum(gaussian)
plt.imshow(gaussian, cmap='gray')
plt.show()

blurred = ifft2(fft2(im) * fft2(fftshift(gaussian)))
plt.imshow(np.real(blurred), cmap='gray')
plt.show()
# Again, we apply the exact inverse operation and display the result, which is horrible
recon = ifft2(fft2(blurred) / fft2(fftshift(gaussian)))
plt.imshow(np.real(recon), cmap='gray')
plt.colorbar()
plt.show()
The reason for this phenomenon is that while the blurring operation considered above is indeed invertible, it is very close to being non-invertible. One says that the deblurring problem is ill-conditioned [1] or unstable.
One can circumvent this instability by regularization. As we saw above, the horrible reconstruction in the second case has very large numerical values. The idea of Tikhonov regularization is to penalize such a blow-up by solving the minimization problem

x∗ = argminx′ (∥Ax′ − y∥2 + λ·∥x′∥2)

for a certain parameter λ > 0. Here, the first term (the data fidelity term) forces the measurements Ax∗ belonging to the reconstruction x∗ to be close to the measurements y = Ax belonging to the ground truth x. The second term prevents the blow-up discussed above.
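As a side remark (a standard computation, stated here without proof): the minimizer x∗ of the Tikhonov functional can be described explicitly, since it satisfies the normal equations

(AᵀA + λ·Id)x∗ = Aᵀy,

where Aᵀ denotes the (conjugate) transpose of A. If A is a convolution with a kernel g, as in the deblurring example above, these equations diagonalize under the Fourier transform, so that ^x∗ = conj(^g)·^y ⁄ (|^g|2 + λ). Note that the code below instead uses the simpler damped division ^y ⁄ (λ + ^g) with λ = 0.01, which stabilizes the division by the near-zero values of ^g in a similar spirit.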
Another possibility is to use the prior knowledge (or assumption) that the ground truth x is sparse with respect to Φ. In this case, one solves the minimization problem

x∗ = argminx′ (∥Ax′ − y∥2 + λ·∥Φx′∥ℓ1).
For simplicity, we only present the numerical results using the Tikhonov regularization:
# First, we import some useful packages and load the raccoon image
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc
from numpy.fft import fft2, ifft2, fftshift

im = scipy.misc.face(gray=True)
im = im[:, :768]

# We compute the wide Gaussian from above
grid = np.mgrid[-384:384, -384:384]
gaussian = np.exp(-(grid[1]**2 + grid[0]**2) / 10)
gaussian /= np.sum(gaussian)

# We compute and display the blurred image (plot omitted here)
blurred = ifft2(fft2(im) * fft2(fftshift(gaussian)))
plt.imshow(np.real(blurred), cmap='gray')
plt.show()

# We compute the solution to the Tikhonov regularized problem
# and display the reconstruction.
# (We use the simplified damped division by (0.01 + g_hat) instead of the
# exact Tikhonov solution conj(g_hat) / (|g_hat|^2 + lambda); both stabilize
# the division by the near-zero values of g_hat.)
better_recon = ifft2(fft2(blurred) / (0.01 + fft2(fftshift(gaussian))))
plt.imshow(np.real(better_recon), cmap='gray')
plt.colorbar()
plt.show()

# We plot the reconstruction error
plt.imshow(np.abs(better_recon - im), cmap='gray')
plt.colorbar()
plt.show()
As we see, the reconstruction is much better, although not perfect, as the plot of the reconstruction error shows.
In a nutshell, the main theorem of Fourier analysis on the interval (0, 1) is that the family of pure frequencies (ek)k ∈ ℤ, where ek(x) = e2πikx, forms an orthonormal basis (ONB) of the Hilbert space L2((0, 1)).
In particular, this implies that the Fourier transform

ℱ : L2((0, 1)) → ℓ2(ℤ), f ↦ ^f = (^f(k))k ∈ ℤ with ^f(k) = ⟨f, ek⟩ = ∫₀¹ f(x)·e−2πikx dx,

is a well-defined linear map and in fact an isometry, i.e., ∥^f∥ℓ2 = ∥f∥L2((0, 1)).

Furthermore, the properties of an orthonormal basis also show that we have

f = ∑k ∈ ℤ ^f(k)·ek, with convergence in L2((0, 1)),

from which it follows that f can be reconstructed from its Fourier transform (coefficients) in a stable way.
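As a quick numerical sanity check (my own addition, not part of the original tutorial): the discrete Fourier transform satisfies the same isometry property, up to the normalization convention of numpy's unnormalized fft:

# Sanity check of the isometry property in the discrete setting.
# numpy's fft is unnormalized, so we divide by sqrt(N) to make it an isometry.
import numpy as np
from numpy.fft import fft

N = 1024
f = np.random.randn(N)
print(np.linalg.norm(f))                    # l2 norm of f
print(np.linalg.norm(fft(f)) / np.sqrt(N))  # equal up to rounding errors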
Note
By what we just saw, the Fourier transform Φ = ℱ satisfies the first of our two desired properties from The general setting.
Our second desired property is that the Fourier transform of a function f should be sparse if f belongs to a certain class of "nice" functions. To see what this class of nice functions could be for the Fourier transform, let us first assume that f : ℝ → ℂ is 1-periodic (i.e., f(x + 1) = f(x) for all x ∈ ℝ) and continuously differentiable. In this case, an integration by parts yields, for k ≠ 0,

^f(k) = ∫₀¹ f(x)·e−2πikx dx = [−f(x)·e−2πikx ⁄ (2πik)]₀¹ + (1 ⁄ (2πik))·∫₀¹ f′(x)·e−2πikx dx = (1 ⁄ (2πik))·^(f′)(k).   (∗)

Here, we used that

f(1)·e−2πik·1 = f(0)·e−2πik·0, since f(1) = f(0) and e−2πik = 1 = e0,

so that the boundary terms cancel out. By rearranging equation (∗), we see

|^f(k)| = |^(f′)(k)| ⁄ (2π|k|) ≤ ∥f′∥L1 ⁄ (2π|k|) ≲ |k|⁻¹ for k ≠ 0.
More generally, if f is 1-periodic and N times continuously differentiable, we get a very fast decay of the Fourier coefficients of f, namely |^f(k)| ≲ |k|⁻ᴺ for k ≠ 0.
Conversely, if the Fourier coefficients ^f(k) of f have a fast decay, namely, if ∑k ∈ ℤ|^f(k)| < ∞, then one can show (using the so-called Weierstrass M-test) that the series

∑k ∈ ℤ ^f(k)·e2πikx
converges uniformly. Since each summand of the series is continuous, this implies that f itself must already be continuous. More generally, one can show that if the Fourier coefficients have very fast decay (we omit the details), then f has to be differentiable.
Important
We have thus seen that there is a very tight connection between smoothness of f and the decay (sparsity) of the Fourier coefficients of f.
By contraposition, this means that if f is not differentiable or not even continuous, then the Fourier coefficients of f cannot be sparse, since otherwise the argument from above would imply that f is continuous (or even differentiable). Hence, even if f is badly behaved (discontinuous, non-differentiable) at only one point, the Fourier coefficients of f will behave badly.
Let us now see these theoretic arguments in practice:
# First, import packages for plotting and computing the Fourier transform
import numpy as np
import matplotlib.pyplot as plt
from numpy.fft import fft, ifft, fftshift

# Generate and plot a very smooth function f
N = 100
grid = np.linspace(0, 1, N)
f = 20 * grid**2 * (1 - grid)**2
plt.plot(grid, f)
plt.show()
# Compute and display the Fourier coefficients of f
# -> they decay very quickly
plt.plot(np.arange(-N//2, N//2), np.abs(fftshift(fft(f))))
plt.show()
# Create and plot a modified version g of f which is still very smooth,
# except at TWO POINTS.
g = np.copy(f)
g[N//4:3*N//4] -= 1
plt.plot(grid, g)
plt.show()
# Compute and display the Fourier coefficients of g
# -> they have very bad decay!
plt.plot(np.arange(-N//2, N//2), np.abs(fftshift(fft(g))))
plt.show()
As we saw in the preceding section, the pure frequencies are very well suited for sparsely representing functions that are smooth. However, even if the function only has one point at which it is badly behaved, the pure frequencies are not a good choice anymore, mainly because they are badly localized.
In the following, we will thus consider Wavelets and Shearlets as alternative systems. But at least for Shearlets (and also in many cases for Wavelets), one does not get an orthonormal basis for the Hilbert space under consideration. Hence, we will now briefly discuss the concept of frames. In a nutshell, these are a generalization of the concept of an orthonormal basis.
As explained in The general setting, we always consider a coefficient map of the form

Φ : ℋ → ℓ2(I), f ↦ Φf = (⟨f, φi⟩)i ∈ I.
One can then show that the following are equivalent:
The map Φ is well-defined (i.e., ∑i ∈ I|⟨f, φi⟩|2 < ∞ for all f ∈ ℋ) and one can reconstruct each f in a stable way from its coefficients Φf, i.e., there is a linear, continuous reconstruction map R:ℓ2(I) → ℋ satisfying RΦf = f for all f ∈ ℋ.
There are constants A, B > 0 such that

A·∥f∥2 ≤ ∑i ∈ I|⟨f, φi⟩|2 ≤ B·∥f∥2

holds for all f ∈ ℋ.
If either of these properties is satisfied, the family (φi)i ∈ I is called a frame for ℋ.
In fact, one can describe the reconstruction map more precisely: For each frame (φi)i ∈ I, there exists a dual frame (φ♮i)i ∈ I which satisfies

f = ∑i ∈ I⟨f, φi⟩φ♮i

for all f ∈ ℋ. This should be compared to the formula f = ∑i ∈ I⟨f, φi⟩φi which holds for orthonormal bases. Hence, for orthonormal bases, one possible dual frame (and in fact the only one) is the orthonormal basis itself.
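To see these concepts in a concrete finite dimensional example, consider the so-called Mercedes-Benz frame of three unit vectors in ℝ2. The following small sketch (my own illustration, not taken from the slides) computes the frame operator and the canonical dual frame, and verifies the reconstruction formula:

# The "Mercedes-Benz" frame: three unit vectors in R^2 at angles 0, 120, 240 degrees.
# This is an illustrative example of my own choosing.
import numpy as np

angles = 2 * np.pi * np.arange(3) / 3
F = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # row i is phi_i

# The frame operator S f = sum_i <f, phi_i> phi_i, as a 2x2 matrix
S = F.T @ F
print(S)  # = 1.5 * identity, so this is a tight frame with A = B = 3/2

# The canonical dual frame is given by applying S^{-1} to each phi_i
F_dual = F @ np.linalg.inv(S)

# Verify the reconstruction formula f = sum_i <f, phi_i> phi_i^dual
f = np.array([0.3, -1.2])
print(np.allclose(F_dual.T @ (F @ f), f))  # True

# Three frame vectors in a 2-dimensional space: the redundancy is 3/2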
Note
In a finite dimensional Hilbert space of dimension N, each orthonormal basis consists of exactly N elements.

For a frame, this need not be the case: Since each element of the Hilbert space can be written as a linear combination of the frame elements, it follows that each frame (φi)i ∈ I has at least N elements, but in general, it can have many more. The quotient #I ⁄ N of the number of frame elements to the dimension of the space is called the redundancy of the frame.
Now, even if a highly redundant frame yields a very sparse representation in the sense that only a small fraction of the coefficients |⟨f, φi⟩| is non-negligible, a less redundant frame yielding a less sparse representation may nevertheless lead to a smaller absolute number of non-negligible coefficients.
Thus, for data compression, one often still uses Wavelets instead of Shearlets or Curvelets: although Wavelets usually provide less sparse expansions, they are much less redundant. For regularization purposes (cf. Motivation), however, sparsity is often more important than having a small redundancy.
For the definition of the different types of Wavelet systems, we refer to slide 3 of the associated slides to this document.
As indicated by the example on slide 4, wavelets yield fairly sparse representations of natural images. Furthermore, there is a mathematically rigorous theorem stating that Wavelets are optimally suited for approximating functions which are smooth apart from finitely many point singularities.
Hence, Wavelets yield a great improvement compared to the highly nonlocal Fourier basis, where one single singularity completely ruins the decay of the coefficients.
Let us now see the improvement of Wavelets in comparison to the Fourier basis in practice:
# First import packages for plotting and for computing Wavelet transforms
import numpy as np
import matplotlib.pyplot as plt
import pywt

# Compute and display the same smooth function f which we used above
# in the example with the Fourier transform
N = 128
grid = np.linspace(0, 1, N)
f = 20 * grid**2 * (1 - grid)**2
plt.plot(grid, f)
plt.show()
# Compute and display (the absolute value of) the Wavelet coefficients
# -> These have a nice decay
coeff = pywt.wavedec(f, 'haar', level=4)
coeff_j = np.concatenate(coeff)
plt.plot(np.abs(coeff_j))
plt.show()
# Compute and display a modified version g of f which is smooth
# except AT TWO POINTS. This is the same g as in the Fourier transform example
g = np.copy(f)
g[N//4:3*N//4] -= 1
plt.plot(grid, g)
plt.show()

# Compute and display (the absolute value of) the Wavelet coefficients of g
coeff2 = pywt.wavedec(g, 'haar', level=4)
coeff2_j = np.concatenate(coeff2)
plt.plot(np.abs(coeff2_j))
plt.show()
As we see, the wavelet coefficients of the more singular function do have a slightly worse decay than those of the smooth function, but the decay is significantly better than that of the Fourier coefficients from the preceding section.
We remark that we used one of the simplest Wavelets here, the so-called Haar wavelet. One could probably achieve even better sparsity by using a different (mother) Wavelet. A list of available Wavelets can be obtained via
pywt.wavelist()
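For instance, one could rerun the experiment above with a smoother Daubechies wavelet; this short snippet (my own addition) assumes that g and the imports from the cells above are still defined:

# Rerun the experiment with a smoother Daubechies mother wavelet ('db3').
# Assumes that g (as well as numpy, matplotlib and pywt) from above are defined.
coeff3 = pywt.wavedec(g, 'db3', level=4)
plt.plot(np.abs(np.concatenate(coeff3)))
plt.show()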
As we saw above, Wavelets provide sparse representations for signals that only have finitely many point singularities. But as indicated on slide 5, natural images typically exhibit singularities (discontinuities) along curves instead of point-like singularities.
To capture these types of singularities in a precise mathematical definition, Donoho introduced the (highly simplified) model of cartoon-like functions, see slide 5. Donoho also proved a "no-go theorem" which provides a certain upper limit on how well any system can approximate cartoon-like functions, cf. slide 6. The same slide also shows that neither the Fourier basis nor Wavelets achieve this optimal approximation rate.
For Wavelets, the intuitive reasoning (shown on slide 6) is that since they use isotropic (scalar) dilations, all wavelets are approximately square-shaped, which is not well suited for approximating a curve. Instead, one needs building blocks which are elongated and which can adapt to the direction of the curve.
As seen on slide 8, the idea of Shearlets is to combine anisotropic (parabolic) dilations, which produce the desired elongated building blocks, with shearings, which adjust the orientation of these building blocks.

Thus, a first definition of Shearlets (see slide 9) is to take a single mother shearlet ψ and to build the shearlet system only of translated, sheared and parabolically dilated versions of ψ.
However, as seen on slides 9 and 10, if one considers the associated Fourier tiling (i.e., the (essential) Fourier supports of the different elements of the Shearlet system), then this tiling behaves very differently with respect to the ξ1 direction and the ξ2 direction. This is undesirable in practice: For any "nice" image f, if we flip the x- and y-axis, we should still have a "nice" image, which should thus have a sparse representation using the shearlet system.
The solution to this problem is to divide the frequency plane into a horizontal and a vertical cone and a low-frequency part, cf. slide 10. For each of these regions, one then uses a different generator, as explained on slide 11.
Slide 13 shows that one can even construct shearlet frames which are compactly supported, which is important for theoretical as well as practical purposes.
Slide 14 shows that shearlets provide the optimal approximation rate (and also sparsity of the Shearlet coefficients) for the class of cartoon-like functions. This cannot be improved in view of Donoho's "no-go theorem" (slide 6).
Finally, let us compare the performance of Wavelets and Shearlets in a concrete example:
# Import packages for plotting and computing Wavelet and Shearlet transforms.
# Here, we use the (alpha-)shearlet transform that we implemented
# as part of the DEDALE project
from AlphaTransform import AlphaShearletTransform as AST
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc
import pywt

# Load and display a sample image (the usual raccoon image)
im = scipy.misc.face(gray=True)
im = im[:, :768]
plt.imshow(im, cmap='gray')
plt.show()
# Compute and plot (the absolute value of) the shearlet coefficients
trafo = AST(768, 768, [0.5] * 3, subsampled=True, verbose=False)
coeff = trafo.transform(im, do_norm=False)
coeff_j = np.concatenate([c.ravel() for c in coeff])
plt.plot(np.abs(coeff_j))
plt.show()
# Compute and plot (the absolute value of) the wavelet coefficients
# using the Haar mother wavelet
coeff2 = pywt.wavedec2(im, 'haar', level=3)
coeff2_joined = np.concatenate([coeff2[0].ravel()]
                               + [c.ravel() for cs in coeff2[1:] for c in cs])
plt.plot(np.abs(coeff2_joined))
plt.show()
# Compute and plot (the absolute value of) the wavelet coefficients
# using a Daubechies mother wavelet
coeff3 = pywt.wavedec2(im, 'db3', level=3)
coeff3_joined = np.concatenate([coeff3[0].ravel()]
                               + [c.ravel() for cs in coeff3[1:] for c in cs])
plt.plot(np.abs(coeff3_joined))
plt.show()
Here, we clearly see one of the phenomena already discussed at the end of the Section Frames for Hilbert spaces: The shearlet coefficients are very sparse in the sense that a very large fraction of the coefficients is negligible, while the wavelet coefficients are slightly less sparse (for both choices of the mother wavelet). Note though that the total number of wavelet coefficients is identical to the dimension of the image (768 × 768 = 589824 ≈ 600000 pixels), while the number of shearlet coefficients is 5670929, which is about 10 times the number of pixels in the original image.
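One rough way to quantify this observation (an illustrative measure of my own choosing; the 1% threshold is arbitrary) is to compute, for each system, the fraction of coefficients whose magnitude exceeds 1% of the maximal magnitude. This assumes the arrays coeff_j, coeff2_joined and coeff3_joined from the cells above:

# Fraction of "significant" coefficients, here (arbitrarily) defined as those
# whose magnitude exceeds 1% of the maximal magnitude of the respective system.
# Assumes coeff_j, coeff2_joined and coeff3_joined from the cells above.
for name, c in [('shearlets', coeff_j),
                ('haar wavelets', coeff2_joined),
                ('db3 wavelets', coeff3_joined)]:
    c = np.abs(c)
    print(name, np.mean(c > 0.01 * np.max(c)))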
In this tutorial, we discussed different systems which yield sparse representations for different classes of signals.
We saw that these representation systems can be used for data compression, for compressive sensing and for the regularization of inverse problems.
Finally, we saw that Wavelets as well as Shearlets and Curvelets have a certain associated tiling of the Fourier domain which determines many of the properties of the respective system.
Footnotes
[1] As seen above, the inverse operation uses a division by the Fourier transform of the Gaussian. While it is nonzero, the Fourier transform of the Gaussian is almost zero at most frequencies. Hence, we are dividing by a value which is almost zero. Together with very small, unavoidable rounding errors, this explains the catastrophic result from above.