Project - Image Denoising via Sparse Coding
1 Introduction
In the area of signal/image processing, the tasks are often related to recovering a target x^⋆ from its measurements, which can be modeled as

f = A x^⋆ + ε,    (1)

where A is the observation operator and ε stands for noise. One simple example of (1) is signal sampling and reconstruction: suppose we have an analog signal which we sample at a rate of F samples per second without noise; the classical Nyquist sampling theorem tells us that, in order to obtain a perfect reconstruction of the signal, the sampling rate must be at least twice its maximum frequency. The sampling theorem poses a hard constraint on the signals we can recover, or challenges the hardware design, if the target we wish to recover is not band-limited or its maximum frequency is too high.
However, when the signal x^⋆ has certain structure, the Nyquist theorem can be bypassed and the signal can be recovered from a much smaller number of observations. One example of such "certain structure" is sparsity or sparse representation, which in general means that though the dimension of the signal can be very large (for example, a high sampling frequency means a huge dimension), its representation requires a much smaller number of points.
Example 1.1 (Staircase function). For the staircase function in Figure 1, it is clear that representing it with the Fourier transform requires the whole spectrum due to the singularity. However, representing it using only position + jump requires only two numbers.
Figure 1: Staircase function.
Hence in sparse representation, the term “sparse” implies that most of the coefficients in the linear
combination are zero, or in other words, each signal or image can be represented as a combination of a
small number of basic elements. Sparse coding was initially developed in the context of neural coding,
where it was proposed as a model for how the mammalian visual cortex represents visual information
efficiently. The theoretical breakthrough of sparse coding came around 2005, due to E. Candès, T. Tao, J. Romberg and D. Donoho. In the work Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information by Candès, Romberg and Tao, they show that for problem (1), if the number of non-zero elements in x^⋆, say κ, is small enough and A satisfies certain properties, then x^⋆ can be recovered from O(κ) measurements (up to logarithmic factors). Later on, Donoho proposed the term compressed sensing in a paper of the same name.
2 Sparse Representation and Dictionary Learning
The basic elements used in sparse coding come from a so-called “dictionary” of atoms. Each atom is
a small signal or image patch, and a complete signal or image can be approximated by adding up the
atoms, each multiplied by a corresponding coefficient. The goal of sparse representation is to find the
coefficients that give the best approximation, under the constraint that the coefficients are sparse.
Dictionary Learning, on the other hand, is the process of learning the dictionary of atoms used in
sparse coding. Instead of using a predefined dictionary, such as a Fourier basis or a wavelet basis, the
idea in dictionary learning is to learn the dictionary from the data itself. This has the advantage that
the dictionary can adapt to the specific structures in the data, leading to more efficient representations.
In the process of dictionary learning, an initial dictionary is typically chosen at random, and then
iteratively updated to better fit the data. This process involves two alternating steps: a sparse coding
step, where the coefficients for the current dictionary are found, and a dictionary update step, where
the dictionary atoms are updated given the current coefficients. This is a non-convex optimization
problem, and various algorithms have been proposed to solve it, such as the K-SVD algorithm, the
method of optimal directions (MOD), and online dictionary learning.
Sparse coding and dictionary learning together form the basis for a class of models known as sparse
coding models. These models have been successful in a variety of tasks, such as image denoising,
inpainting, compression, audio source separation, and more. They have also been used as a
tool for feature extraction in machine learning, where the sparse coefficients are used as features for
classification or regression tasks.
2.1 Sparse representation
Before the era of deep learning, sparse representation was one of the most successful techniques in signal/image processing and computer vision. As a matter of fact, sparse coding and dictionary learning
provide a framework for understanding deep learning models. In particular, convolutional neural networks can be seen as a generalization of sparse coding models, where the dictionary is learned in a
hierarchical fashion; see for instance Visualizing and understanding convolutional networks by Zeiler
and Fergus.
Given an image x of size m × n, we first decompose it into patches; then for each patch we concatenate the columns of the patch into a single column vector; lastly, putting these columns together, we get a matrix representation of the image. More precisely, let p_i be a patch of the image, and let \vec{p}_i be the column vector obtained by concatenating the columns of p_i; then

X = [ \vec{p}_1, \vec{p}_2, ⋯, \vec{p}_N ] ∈ R^{P×N},    (2.1)

where P is the size of each patch and N is the total number of patches. Now let the set {d_1, d_2, ..., d_K} be a bank of atoms, i.e. our dictionary; each atom has the same size as the image patches above. Vectorizing each atom (denoted \vec{d}_i) and putting them together, we obtain the matrix representation of the dictionary

D = [ \vec{d}_1, \vec{d}_2, ⋯, \vec{d}_K ] ∈ R^{P×K}.    (2.2)
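For concreteness, the following is a minimal MATLAB sketch of this patch decomposition for non-overlapping patches (the function name image_to_patches is ours and not from the provided reference file; overlapping patches, see the Tricks in Task 1, would need a different loop stride):

% Minimal sketch: extract non-overlapping p-by-p patches from a grayscale
% image x and stack them as the columns of X, as in (2.1).
function X = image_to_patches(x, p)
    [m, n] = size(x);
    x = x(1:floor(m/p)*p, 1:floor(n/p)*p);   % crop so the size is a multiple of p
    [m, n] = size(x);
    N = (m/p) * (n/p);                       % total number of patches
    X = zeros(p*p, N);                       % each column is a vectorized patch, P = p^2
    k = 0;
    for j = 1:p:n
        for i = 1:p:m
            k = k + 1;
            patch = x(i:i+p-1, j:j+p-1);
            X(:, k) = patch(:);              % concatenate the columns of the patch
        end
    end
end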
Given a patch p, it can be written as a linear combination of the atoms, that is

p = \sum_{i=1}^{K} a_i d_i + e,    (2.3)

where a_i is the coefficient of each atom, and e is the error (meaning that p cannot be perfectly represented by the atoms). In vectorized form, we have

\vec{p} = D a + e,    (2.4)

where a = (a_1, a_2, ..., a_K)^T ∈ R^K is the coefficient vector. With the above setup, we can now represent X with D, which reads

X = D A + E,    (2.5)

where A = [a_1, a_2, ⋯, a_N] contains the coefficients of each patch and E = [e_1, e_2, ⋯, e_N] is the error in representing each patch.
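Conversely, once a sparse approximation D A of X is available, the image can be re-assembled from the columns. A minimal sketch, assuming the same non-overlapping layout as image_to_patches above (the function name is again ours):

% Minimal sketch: rebuild an m-by-n image from a patch matrix X (e.g. X ≈ D*A),
% the inverse of image_to_patches; m and n must be multiples of the patch size p.
function x = patches_to_image(X, p, m, n)
    x = zeros(m, n);
    k = 0;
    for j = 1:p:n
        for i = 1:p:m
            k = k + 1;
            x(i:i+p-1, j:j+p-1) = reshape(X(:, k), p, p);
        end
    end
end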
When we are given a perfect dictionary, the representation of X under D is sparse, that is, the number of non-zeros in A is the smallest. However, in practice it is impossible to have the perfect dictionary, hence we need to learn it from the given image(s).
2.2 Dictionary learning: a basic model
Given an image in patch form X, to learn a dictionary from it we need to solve the following optimization problem

\min_{D,A} \frac{1}{2} ∥X − DA∥_F^2 + λ ∥A∥_0,    (2.6)

where
• ∥ · ∥_F^2 is the sum of the squares of all the elements;
• ∥ · ∥_0 returns the number of non-zero elements;
• λ > 0 is a balancing weight.
Apparently, the above problem is non-convex due to ∥·∥_0. To circumvent this difficulty, we can replace it with the ℓ1-norm, leading to

\min_{D,A} \frac{1}{2} ∥X − DA∥_F^2 + λ ∥A∥_1.    (2.7)

Note that the best λ's of the two models have different values. By "best", we mean that the resulting A is the sparsest.
Though we have made the regularization term convex, two problems remain:
• The first problem is that (2.7) is non-smooth, making it impossible to use gradient descent schemes. As a result, we need to further smooth the ℓ1-norm, for example with the Huber function, denoted h(A), which results in

\min_{D,A} Φ(A, D) = \frac{1}{2} ∥X − DA∥_F^2 + λ h(A).    (2.8)

• The second problem is that (2.8) is non-convex, as we have the bilinear term DA. However, when either variable is fixed, the problem is convex. More precisely, when A is fixed, (2.8) is a convex problem in D. As a consequence, we can use an alternating strategy to solve (2.8) (a code sketch is given after this list):

A^{(k+1)} = A^{(k)} − γ_A ∇_A Φ(A^{(k)}, D^{(k)}),
D^{(k+1)} = D^{(k)} − γ_D ∇_D Φ(A^{(k+1)}, D^{(k)}),    (2.9)

where γ_A, γ_D are step-sizes.
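A minimal MATLAB sketch of scheme (2.9) is given below. It assumes h is the elementwise Huber function with parameter delta and uses Lipschitz-based step-sizes (cf. Remark 2.1); the function names and the optional re-normalization of the atoms are our own choices, not part of the model:

% Minimal sketch of the alternating gradient scheme (2.9) for model (2.8).
function [D, A] = dict_learn_gd(X, K, lambda, delta, iters)
    [P, ~] = size(X);
    D = randn(P, K);  D = D ./ vecnorm(D);        % random initial dictionary, unit-norm atoms
    A = zeros(K, size(X, 2));
    for k = 1:iters
        % sparse coding step: gradient step in A with D fixed
        gA = D' * (D*A - X) + lambda * huber_grad(A, delta);
        gammaA = 1 / (norm(D)^2 + lambda/delta);  % 1/Lipschitz constant of the A-gradient
        A = A - gammaA * gA;
        % dictionary update step: gradient step in D with A fixed
        gD = (D*A - X) * A';
        gammaD = 1 / max(norm(A)^2, eps);         % 1/Lipschitz constant of the D-gradient
        D = D - gammaD * gD;
        D = D ./ max(vecnorm(D), eps);            % optional: keep atoms unit-norm
    end
end

function g = huber_grad(A, delta)
    % Gradient of the elementwise Huber function with parameter delta.
    g = sign(A) .* min(abs(A)/delta, 1);
end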
See Figure 2 below for an image and a dictionary learned from it.
Figure 2: Given image and learned dictionary from the image. Taken from the paper “Sparse Representation for Color Image Restoration” by Julien Mairal, Michael Elad, and Guillermo Sapiro.
Remark 2.1. Take fixed A for example: the gradient of Φ with respect to D is Lipschitz continuous; however, the Lipschitz constant depends on A. The same holds for the gradient with respect to A when D is fixed. So a proper strategy for choosing the step-sizes should be devised.
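For example (a sketch, assuming model (2.8) with the Huber smoothing h of parameter δ): with A fixed,

∇_D Φ(A, D) = (DA − X) A^T,

which is Lipschitz in D with constant ∥A A^T∥_2, while with D fixed,

∇_A Φ(A, D) = D^T (DA − X) + λ ∇h(A),

which is Lipschitz in A with constant ∥D^T D∥_2 + λ/δ. Step-sizes of the form γ_D ≤ 1/∥A A^T∥_2 and γ_A ≤ 1/(∥D^T D∥_2 + λ/δ) therefore have to be recomputed as A and D are updated.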
Remark 2.2. In the above discussion we use the Huber function for smoothing; you can also consider other smoothing strategies. For the algorithm, you can also consider alternatives to (2.9).
Remark 2.3. The weight parameter λ here is a scalar; you can also consider other types of λ, such as a vector where each λ_i is assigned to a row of A. Another option is a reweighted λ; please search the literature for references.
3 Tasks
In this project, we provide 18 images for numerical experiments and 3 grayscale images for testing; see below.
ALERT: the noisy images for denoising have been updated and are now in .mat file format; please use the new data in the project.zip file.
Figure 3: McM dataset.
Figure 4: Grayscale images: (a) Barbara, (b) Cameraman, (c) Lena.
These images can be found in the data.zip file, which includes both the clean and the noisy images.
3.1 Task 1: dictionary learning model for grayscale image and algorithm
The first task of the project consists of three sub-tasks:
• Design your dictionary learning model, either based on (2.8) or on other approaches from the literature. Necessary references must be provided if you use any result from existing work!
• Design the optimization algorithm for solving the model. A detailed derivation should be provided. It is good to have theoretical analysis, but this is not mandatory. Necessary references should be provided.
• For the three images in Figure 4, output the dictionaries learned using the model and algorithm above (a small visualization sketch is given after this list). The specifics of the settings should be provided.
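To report the learned dictionaries, one simple way is to tile the atoms into a grid image. Below is a minimal MATLAB sketch under the setup of (2.2); the function name show_dictionary and the per-atom rescaling to [0, 1] are our own illustrative choices:

% Minimal sketch: display the K atoms of D (each a p^2-vector) as a grid of
% p-by-p tiles for visual inspection of the learned dictionary.
function show_dictionary(D, p)
    K = size(D, 2);
    g = ceil(sqrt(K));                        % side length of the display grid
    canvas = zeros(g*p, g*p);
    for k = 1:K
        atom = reshape(D(:, k), p, p);
        atom = (atom - min(atom(:))) / max(max(atom(:)) - min(atom(:)), eps);  % rescale to [0,1]
        [r, c] = ind2sub([g g], k);
        canvas((r-1)*p+1:r*p, (c-1)*p+1:c*p) = atom;
    end
    imagesc(canvas); colormap(gray); axis image; axis off;
end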
The above task is for grayscale images; color images are the subject of the second task.
Tricks There are tricks that can be applied to enhance the dictionary learning result, such as i) using overlapping patches; ii) removing the DC component (the mean value of the patch elements) from each patch; iii) normalizing the contrast of the patches so that each patch has unit norm; a small sketch of ii) and iii) is given below. There are also other tricks available; you can discover them by checking the literature.
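A minimal sketch of tricks ii) and iii), applied to the patch matrix X of (2.1) (the function name preprocess_patches is ours; the returned dc and scales allow the normalization to be undone after denoising):

% Minimal sketch: remove the DC component of each vectorized patch (column of X)
% and normalize each patch to unit norm.
function [X, dc, scales] = preprocess_patches(X)
    dc = mean(X, 1);                % DC component (mean) of each patch
    X = X - dc;                     % trick ii): remove the mean
    scales = max(vecnorm(X), eps);  % guard against all-zero (flat) patches
    X = X ./ scales;                % trick iii): unit-norm patches
end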
You can also explore different patch sizes and try to find the optimal one (if one exists), but be careful about what criterion defines such an "optimality".
Remark 3.1. In the data.zip file, a very simple MATLAB file is provided for your reference.
3.2 Task 2: color images
Generalize the model in Task 1 to color images. For color images there are different color spaces, such as the RGB color space, which is the most natural one as it is how we perceive color. Other spaces can be considered, including the CIELab, YCbCr, or HSI color spaces.
You can choose the color space as you prefer; one issue to bear in mind is that if you choose a color space other than RGB, the statistical properties of the noise in that space will change!
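As an illustration, here is a minimal sketch of channel-wise denoising in the YCbCr space (it assumes the MATLAB Image Processing Toolbox; the file name noisy.png and the routine denoise_channel are placeholders for your own data and your Task 1/3 method). Since the RGB-to-YCbCr conversion is affine, additive Gaussian noise stays Gaussian, but its variance differs per channel and the channels become correlated.

% Minimal sketch: denoise a color image channel by channel in YCbCr space.
rgbNoisy = im2double(imread('noisy.png'));     % placeholder file name
ycc = rgb2ycbcr(rgbNoisy);                     % affine transform of the RGB values
for c = 1:3
    ycc(:,:,c) = denoise_channel(ycc(:,:,c));  % denoise_channel: your own routine
end
rgbDenoised = ycbcr2rgb(ycc);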
3.3 Task 3: image denoising
The third task is the image denoising implementation. In the data.zip file, for each image of the McM dataset, besides the clean image there is also a noisy one. For this task, use the dictionary learning methods from the first task to conduct image denoising. To this end, we have the following sub-tasks:
• Derive or describe the details of the method that you design for image denoising using the learned dictionaries.
• To measure the performance of your denoising result, the PSNR (peak signal-to-noise ratio) can be considered (the higher the better); a computation sketch is given after this list. The definition is

MSE = \frac{1}{N} \sum_{i,j} (x^⋆_{i,j} − x̄_{i,j})^2   and   PSNR = 10 × \log_{10} \frac{255^2}{MSE},

where x^⋆ is the clean image, x̄ is the denoised image, and N is the total number of pixels. For color images, the PSNR is computed for each R/G/B channel (if you use a different color space, then denoise in that space and convert back to the RGB space).
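A minimal MATLAB sketch of this metric, assuming 8-bit images stored as doubles in the range [0, 255] (the function name psnr_255 is ours, chosen to avoid confusion with any toolbox built-in):

% Minimal sketch: compute the PSNR between a clean and a denoised image.
function val = psnr_255(xClean, xDenoised)
    err = xClean(:) - xDenoised(:);
    mse = mean(err.^2);                       % mean squared error over all pixels
    val = 10 * log10(255^2 / max(mse, eps));
end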
For the above sub-tasks, as Figure 3 shows there are 18 images in total, so we can treat each image individually and repeat the procedure 18 times. On the other hand, we can also put the 18 images together and run the denoising pipeline only once. However, the scale of this configuration will be very large, which may take a very long time to optimize. Moreover, the number of atoms in the dictionary should be large enough to capture the diverse features of the 18 images. There are two possible tricks to overcome these two difficulties:
• To reduce the scale of the problem, from each image we can choose a subset of patches and combine them together (see the sketch after this list). The question then becomes how to choose the patches from each image.
• To capture features from the images, we can increase the number of atoms, for example to 256 as shown in Figure 2, or even more.
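A minimal sketch of the first trick (the cell array patchMatrices and the per-image budget M are illustrative assumptions, not part of the provided data):

% Minimal sketch: pick M random patches from each image's patch matrix and
% pool them into one training set for a shared dictionary.
M = 1000;                                      % patches kept per image (illustrative)
Xpool = [];
for t = 1:numel(patchMatrices)                 % patchMatrices{t}: P-by-N_t matrix of image t
    Xt  = patchMatrices{t};
    idx = randperm(size(Xt, 2), min(M, size(Xt, 2)));
    Xpool = [Xpool, Xt(:, idx)];               %#ok<AGROW> pooled training patches
end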
Remark: The configuration of putting the 18 images together is now optional; there is no need to combine the 18 images into one problem.
3.4 Task 4: unknown ground truth
In the above tasks, the dictionary is learned from the clean images, which in most practical scenarios is impossible. Therefore, for this task, given a noisy image, design a model which can perform dictionary learning and image denoising either at the same time or separately, and output the PSNR values. Details of the model and algorithm should be provided.
4 Assessments
Below we describe the format and assessment criteria of the project.
• Project format It is a group project; the maximum number of team members is 5.
• Grades The grade for the project includes two parts: the report and the result ranking (in terms of PSNR values), with weights 50% and 50%, respectively.
For the result ranking, the scores are based on the percentile, as given in the table below.
Percentile    Scores
0 - 10%       100
11 - 30%      90
31 - 60%      80
61 - 90%      70
91 - 100%     60
• Submission Project report and source code. The source code should be executable, and all results should be reproducible. Accordingly, the submitted code should include a readme file explaining the parameter choices of every result, so that our TA can check them.
• Report and code submission deadline 23:59, 14 January, 2024.
• Warnings
– Do not use existing tools, like OpenCV, scikit-learn, code from GitHub, or others. The submitted source code must comply with the model and algorithm designed in the submitted report.
– Cheating is not allowed; if found, all involved will receive a penalty.