These two architectures should then be compared and commented on.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Input data}
The dataset was taken from \href{https://www.kaggle.com/yassershrief/video-classification-tutorial}{Kaggle}.
It is a 5 minute long video of the popular kids show Tom and Jerry.
The video was parsed into frames and preprocessed into a dataset of 1000 frames.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Research}
Super resolution is a broad field with many different approaches.
Since the U-Net and GAN architectures must be used in the solution, the research focused mainly on these topics.
U-Net is a very common approach for image segmentation.
It first appeared in \cite{ronneberger2015unet}, where it was used for segmentation of biomedical images.
Thanks to its unique architecture, it has since been applied to many other tasks, including super resolution.
In \cite{Park_2018}, U-Net was used for resolution upscaling of x-ray images.
The input image is downscaled (encoded) using convolution and max pool layers.
Then, using upsample convolution together with skip connections from the encoding phase, which carry important information about the original frame, the image is reconstructed (decoded) at a higher resolution.
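The encode/decode flow described above can be illustrated in a few lines of NumPy. This is only a toy sketch of the pooling, upsampling, and skip-connection steps on a single channel, not the actual network from the cited papers:

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling: keep the strongest activation in each block (encoder step)."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbour upsampling back to twice the size (decoder step)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Toy "feature map" standing in for one channel of an encoded frame.
features = np.arange(16, dtype=float).reshape(4, 4)

encoded = max_pool2x2(features)   # 4x4 -> 2x2
decoded = upsample2x(encoded)     # 2x2 -> 4x4

# Skip connection: the decoder combines its upsampled map with the matching
# encoder map, here by simple concatenation along a new channel axis.
skip_combined = np.stack([decoded, features])
print(skip_combined.shape)
```

In a real U-Net the pooled maps pass through learned convolutions at every level; the concatenation is what lets fine detail from the encoder survive into the decoded, higher-resolution output.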
Another example can be found in \cite{lu2021single}, where the original architecture from \cite{ronneberger2015unet} was modified. First, all batch normalizations and one convolution layer in each block are removed.
The input image is upscaled and has a skip connection to the output image.
Lastly, the error is measured by mixing \emph{Mean Square Error} (MSE) with \emph{weighted Mean Gradient Error} ($\lambda_G$MGE), which is proposed for sharp edge reconstruction.
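A minimal sketch of such a mixed loss is shown below, with simple finite differences standing in for the gradient operator actually used in the paper:

```python
import numpy as np

def mse(pred, target):
    """Plain pixel-wise Mean Square Error."""
    return np.mean((pred - target) ** 2)

def mge(pred, target):
    """Mean Gradient Error: MSE between gradient magnitude maps.
    Finite differences are a stand-in for the paper's gradient filters."""
    def grad_mag(img):
        gy = np.diff(img, axis=0, append=img[-1:, :])
        gx = np.diff(img, axis=1, append=img[:, -1:])
        return np.sqrt(gx ** 2 + gy ** 2)
    return np.mean((grad_mag(pred) - grad_mag(target)) ** 2)

def mixed_loss(pred, target, lambda_g=0.1):
    """MSE + lambda_G * MGE: the gradient term penalizes smeared edges."""
    return mse(pred, target) + lambda_g * mge(pred, target)

target = np.zeros((8, 8)); target[:, 4:] = 1.0                        # sharp vertical edge
blurry = np.clip((np.arange(8) - 2) / 4.0, 0, 1) * np.ones((8, 1))    # smeared edge
print(mixed_loss(blurry, target))
```

Raising $\lambda_G$ shifts the optimization toward matching edges rather than just average intensity, which is the motivation given for the term.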
GAN was first introduced in \cite{goodfellow2014generative}.
This approach consists of two main parts: a generator and a discriminator.
The generator takes some noise as input and generates an image, which is passed to the discriminator.
The discriminator then measures how similar the generated image is to a reference image.
A good analogy is to picture the discriminator and the generator as opponents trying to beat each other in a minimax game \cite{whatIsGAN}.
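The two sides of that minimax game can be made concrete with the usual binary cross-entropy objective. The discriminator scores below are made-up numbers for illustration only:

```python
import numpy as np

def bce(probs, labels):
    """Binary cross-entropy, the standard GAN objective building block."""
    eps = 1e-12  # avoid log(0)
    return -np.mean(labels * np.log(probs + eps)
                    + (1 - labels) * np.log(1 - probs + eps))

# Hypothetical discriminator outputs D(x) in (0, 1): probability "real".
d_on_real = np.array([0.9, 0.8, 0.95])   # scores on real images
d_on_fake = np.array([0.1, 0.3, 0.2])    # scores on generated images

# The discriminator wants real -> 1 and fake -> 0 ...
d_loss = bce(d_on_real, np.ones(3)) + bce(d_on_fake, np.zeros(3))
# ... while the generator wants the very same fakes scored as real,
# which is the opposing side of the minimax game.
g_loss = bce(d_on_fake, np.ones(3))
print(d_loss, g_loss)
```

With these scores the discriminator is "winning" (low `d_loss`, high `g_loss`); training alternates gradient steps on the two losses until neither side can easily improve.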
The first implementation of this approach for super resolution, SRGAN, was described in \cite{ledig2017photo}.
This approach combines a GAN with a deep residual network with skip connections.
Perceptual loss is used instead of MSE as the loss function to improve the visual quality of the reconstructed result.
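The idea behind perceptual loss, comparing images in a feature space rather than pixel space, can be illustrated with a stand-in feature extractor. A fixed gradient filter is used here purely for illustration; the paper uses features from a pretrained VGG network:

```python
import numpy as np

def features(img):
    """Stand-in feature extractor: fixed gradient filters playing the role
    of the pretrained network used for the real perceptual loss."""
    return np.concatenate([np.diff(img, axis=0).ravel(),
                           np.diff(img, axis=1).ravel()])

def pixel_mse(pred, target):
    return np.mean((pred - target) ** 2)

def perceptual_loss(pred, target):
    """MSE between feature representations instead of raw pixels."""
    return np.mean((features(pred) - features(target)) ** 2)

target = np.zeros((4, 4)); target[:, 2:] = 1.0   # image with one sharp edge
shifted = target + 0.5                           # same structure, brighter

# A uniform brightness shift changes every pixel, but not the structure:
print(pixel_mse(shifted, target))        # nonzero
print(perceptual_loss(shifted, target))  # zero: the edge is unchanged
```

This is why perceptual loss tends to reward plausible texture and structure over exact pixel agreement, which pure MSE optimizes at the cost of over-smoothing.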
Another approach, found in \cite{sajjadi2017enhancenet}, is a modification of \cite{ledig2017photo}.
Upsampling by nearest neighbor is used instead of bicubic interpolation, which is more computationally efficient but produces checkerboard artifacts.
That is solved by adding a convolution layer.
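A toy NumPy sketch of that sequence follows: nearest-neighbour upsampling, then a convolution to smooth the blocky result. The averaging kernel below is a stand-in for the learned convolution layer:

```python
import numpy as np

def nn_upsample(x, scale=2):
    """Nearest-neighbour upsampling: cheap, but produces blocky output."""
    return x.repeat(scale, axis=0).repeat(scale, axis=1)

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2D convolution used to smooth the artifacts."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

low_res = np.array([[0.0, 1.0], [1.0, 0.0]])
upsampled = nn_upsample(low_res)            # 2x2 -> 4x4, hard-edged blocks
smooth_kernel = np.full((3, 3), 1.0 / 9.0)  # averaging filter standing in
                                            # for a learned convolution
smoothed = conv2d_same(upsampled, smooth_kernel)
print(upsampled.shape, smoothed.shape)
```

The convolution blends the repeated pixel values, which is the mechanism the paper relies on to suppress the checkerboard pattern.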
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Methods}
The input video has a resolution of 1280x720.
Preprocessing of the input dataset includes parsing the video into individual frames and resizing them to equal height and width (in particular 512x512).
Low resolution images and corresponding high resolution images are needed to train the models.
Therefore, for every frame a low resolution version is created by bicubic downsampling (4x $\rightarrow$ 128x128), while the original image represents the high resolution version.
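The pairing step can be sketched as follows. Block averaging is used here as a simple stand-in for bicubic downsampling, and a random array stands in for a real 512x512 grayscale frame:

```python
import numpy as np

def downsample4x(hr):
    """4x downsampling by block averaging, a simple stand-in for the
    bicubic downsampling described above."""
    h, w = hr.shape
    return hr.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))

# Hypothetical 512x512 grayscale frame standing in for a real one.
rng = np.random.default_rng(0)
hr_frame = rng.random((512, 512))

lr_frame = downsample4x(hr_frame)   # 512x512 -> 128x128
pair = (lr_frame, hr_frame)         # one (input, target) training example
print(lr_frame.shape, hr_frame.shape)
```

Each such pair gives the network a degraded input and the original frame as the ground-truth target it should reconstruct.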
The implementation is not finished yet.
The GAN architecture implementation is currently in progress and will follow \cite{ledig2017photo}.
The U-Net architecture implementation will probably be inspired by \cite{lu2021single}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% --- FUTURE WORK
\section{Future work}
First of all, the implementation of the described approaches must be written and experimentally tested.
The models can then be trained and tested on the described dataset and compared to each other.
% \begin{figure}[h]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliography{reference}