**Disclaimer - In the following, the royal 'we' is used, mostly in reference to the "Grab Cut" paper authors.**
Background
The Grab Cut method addresses the challenge of separating an object from the background in a color image, given certain constraints. The user is asked to mark a single rectangle around the object, defining the outer part of the rectangle as definite background, and the inner part as an unknown mixture of object (foreground) and some background. As explained below, these constraints serve as an initial solution to the problem, leading to an iterative method that ultimately assigns each pixel in the image a label - background or foreground.
The algorithm in a nutshell
1. Theory - Like other popular methods for the segmentation/matting/re-targeting challenges, Grab Cut uses the power of the "Graph Cut" algorithm, which was designed to solve the min-cut/max-flow problem. The theoretical prologue of the algorithm starts by defining an optimization problem (an energy cost function) which can be solved by creating a specific graph model (vertices and weighted edges) and running the Graph Cut algorithm on it.
So what is this energy cost function (which never appears explicitly in the implementation)? Its inputs are the input image and a labeling that defines, for each pixel, whether it belongs to the background or the foreground. Our goal is to give better labelings a lower cost. To do so, we encourage (by lowering the cost) similar (same-color) neighboring pixels to take the same label, and vice versa. Moreover, we encourage pixels to match a certain color distribution model, according to their values.
A brief description of the energy function - The function has 2 parts - a data term and a smoothness term. The data term measures how well we fit a certain model - in our case, the color distribution. For every pixel we take its label alpha and its color Z, and use them as inputs to see how well it matches a color distribution model using h(). In our case, the model is a K-component Gaussian Mixture Model - remember this K, we will use it later.
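As a concrete illustration of the data term, here is a minimal sketch of the negative log-likelihood of a pixel's color under a K-component GMM. The function name `gmm_nll` and the explicit parameter lists are illustrative choices, not the paper's notation; the parameters are assumed to have already been fit to the current foreground or background pixels.

```python
import numpy as np

def gmm_nll(z, weights, means, covs):
    """Negative log-likelihood of color z under a K-component GMM.

    Illustrative stand-in for the data term: weights, means and covs are
    the (assumed already fitted) mixture weights, component means and
    full covariance matrices of the foreground or background model.
    """
    z = np.asarray(z, dtype=float)
    total = 0.0
    for pi, mu, cov in zip(weights, means, covs):
        d = z - mu
        inv = np.linalg.inv(cov)
        norm = np.sqrt(((2 * np.pi) ** len(z)) * np.linalg.det(cov))
        total += pi * np.exp(-0.5 * d @ inv @ d) / norm
    # Lower value = pixel fits this label's color model better.
    return -np.log(total)
```

A pixel whose color sits near a heavy mixture component gets a low value, which later translates into a strong link to that label's terminal vertex in the graph.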
The smoothness term measures how smooth the labeling is over similar/dissimilar neighboring pixels. For each neighboring pair that does not share a label, we increase the energy function according to a parameter Beta that lies in the exponent and effectively determines how smooth our labeling has to be (more details can be found in the paper).
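Putting the two terms together, a toy version of the energy can be sketched as follows. This is a naive, readable loop rather than the efficient formulation used in practice, and `data_cost` is a hypothetical placeholder for a per-pixel negative log-likelihood under the label's color model.

```python
import numpy as np

def energy(labels, img, data_cost, beta, gamma=50.0):
    """Toy Grab Cut-style energy: data term + smoothness term.

    labels    : (H, W) array of 0 (background) / 1 (foreground)
    img       : (H, W, 3) float array of pixel colors
    data_cost : hypothetical function (color, label) -> fit cost
    beta      : contrast sensitivity of the smoothness term
    gamma     : relative weight of the smoothness term (assumed constant)
    """
    H, W = labels.shape
    # Data term: how well each pixel fits the color model of its label.
    U = sum(data_cost(img[y, x], labels[y, x])
            for y in range(H) for x in range(W))
    # Smoothness term: penalize label changes, less across strong edges.
    V = 0.0
    for y in range(H):
        for x in range(W):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
                ny, nx = y + dy, x + dx
                if ny < H and nx < W and labels[y, x] != labels[ny, nx]:
                    diff = img[y, x] - img[ny, nx]
                    V += gamma * np.exp(-beta * float(diff @ diff))
    return U + V
```

Note how the exponential makes a label change across a strong color edge (large `diff`) nearly free, while a label change inside a flat region costs the full `gamma`.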
2. From theory to practice - To solve this theoretical optimization problem, we use the "Graph Cut" method. As shown and proven in [1], creating a graph with weights as in Figure 1 and then running the min-cut algorithm yields 2 groups of vertices (each of which represents a pixel), with a labeling that minimizes the cost function.
We start creating the graph by taking N vertices (N = number of image pixels) and linking neighboring vertices (meaning vertices which represent neighboring pixels) with edges whose weights reflect the color similarity of the corresponding pixels. The next step is to add 2 more vertices (representing the foreground and background labels), each of which is linked to all N pixel vertices with an edge whose weight reflects the probability of that pixel matching the background/foreground color distribution.
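The construction above can be sketched as follows. The function returns plain edge lists rather than invoking a real max-flow solver; in practice these edges would be fed to a min-cut library. All names (`build_graph`, `fg_nll`, `bg_nll`) are illustrative, and the t-link weighting is the usual convention: a pixel's link to the foreground terminal carries the background cost (and vice versa), so the min-cut severs the cheaper side.

```python
import numpy as np

def build_graph(img, fg_nll, bg_nll, beta, gamma=50.0):
    """Sketch of the graph of Figure 1 (names are illustrative).

    img            : (H, W, 3) float image
    fg_nll, bg_nll : (H, W) per-pixel negative log-likelihoods under the
                     foreground / background color models
    Returns (n_links, t_links), each a list of (u, v, weight) edges.
    """
    H, W, _ = img.shape
    idx = lambda y, x: y * W + x
    SOURCE, SINK = H * W, H * W + 1  # the 2 extra terminal vertices
    n_links, t_links = [], []
    for y in range(H):
        for x in range(W):
            i = idx(y, x)
            # t-links: cutting the cheaper one assigns the likelier label.
            t_links.append((SOURCE, i, bg_nll[y, x]))
            t_links.append((i, SINK, fg_nll[y, x]))
            # n-links to right and down neighbors: heavy when colors match,
            # so the cut prefers to pass along strong color edges.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < H and nx < W:
                    diff = img[y, x] - img[ny, nx]
                    w = gamma * np.exp(-beta * float(diff @ diff))
                    n_links.append((i, idx(ny, nx), w))
    return n_links, t_links
```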
Figure 1
3. Practice - Our first iteration starts with the background constraints as marked manually. Our mission now is to create the graph, starting with assigning the inter-neighboring-pixel weights. These are calculated from the image's horizontal and vertical gradients, mapped so that low-gradient edges get high weights and vice versa (using an exponential function with the Beta parameter). To get the background/foreground labeling weights, we fit two K-component Gaussian Mixture Models - one to the color distribution of the certain-background pixels (outside the rectangle), and one to the uncertain, might-be-foreground pixels (inside the rectangle). Using the log-likelihood, the weights grow with the probability of fitting the model. After solving the min-cut of this graph, we get a new label for each pixel; these labels are the starting point for the next iteration (with the certain-background pixels now being the ones labeled as background), resulting in an iterative Expectation-Maximization-style method.
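One practical detail worth making concrete is how Beta itself is chosen. Following the paper's suggestion, it can be set from the image contrast - the reciprocal of twice the mean squared color difference over all neighboring pairs - so the smoothness weights adapt to how noisy or smooth the image is. A minimal sketch, with the function name `estimate_beta` as an illustrative choice:

```python
import numpy as np

def estimate_beta(img):
    """Estimate Beta from image contrast.

    img : (H, W, 3) float image.
    Returns 1 / (2 * mean squared color difference over neighbor pairs),
    so that the exponential smoothness weight is calibrated to the image;
    returns 0 for a perfectly flat image (degenerate case).
    """
    dh = img[:, 1:] - img[:, :-1]   # horizontal color differences
    dv = img[1:, :] - img[:-1, :]   # vertical color differences
    sq = np.concatenate([(dh ** 2).sum(axis=-1).ravel(),
                         (dv ** 2).sum(axis=-1).ravel()])
    mean_sq = sq.mean()
    return 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0
```

A high-contrast image yields a small Beta (tolerating large color jumps before the penalty drops), while a low-contrast image yields a large Beta, keeping the cut sensitive to its subtler edges.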
[1] - BOYKOV, Y., AND JOLLY, M.-P. 2001. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In Proc. IEEE Int. Conf. on Computer Vision.