^{1} Department of Management Studies, Centre for Sustainable Technologies

Abstract

Land cover (LC) refers to what is actually present on the ground and provides insights into the underlying solutions for many issues, from water pollution to sustainable economic development. One of the greatest challenges in modelling LC changes using remotely sensed (RS) data is the scale-resolution mismatch: the spatial resolution of the data is coarser than the detail required, and the subpixel-level heterogeneity, though important, is not readily knowable. Many pixels consist of a mixture of multiple classes. The solution to the mixed-pixel problem typically centres on soft classification techniques, which estimate the proportion of each class within a pixel. However, the spatial distribution of these class components within the pixel remains unknown. This study investigates Orthogonal Subspace Projection (OSP), an unmixing technique, and uses the pixel-swapping algorithm to predict the spatial distribution of LC at subpixel resolution. Both algorithms were applied to several simulated and actual satellite images for validation. The accuracy on the simulated images is ~100%, while IRS LISS-III and MODIS data show accuracies of 76.6% and 73.02% respectively. This demonstrates the relevance of these techniques for applications such as urban/non-urban and forest/non-forest classification studies.

1. Introduction

Land cover (LC) describes the physical state of the earth's surface in terms of the natural environment (such as vegetation, soil and water) and man-made structures (e.g. buildings). These LC features have been classified using remotely sensed (RS) satellite imagery of different spatial (panchromatic), spectral (multispectral, superspectral and hyperspectral) and temporal resolutions.
Superspectral (SS) imagery provides additional benefits over multispectral (MS) imagery in many applications, such as detection, discrimination, classification and identification of objects. Earlier, SS imagery was processed and analysed by MS image processing algorithms via preprocessing such as feature extraction, dimensionality reduction and band selection. However, such MS-to-SS approaches are not straightforward extensions of MS image processing. When the spectral resolution is low (3-4 bands), as in MS images, the image processing techniques are generally developed to explore spatial information for spatial domain analysis. When the spectral resolution is increased, as in SS imagery, such spatial-domain-based MS imaging techniques may be less effective in certain applications. For example, in applications where the object size is smaller than the pixel resolution, such as houses/buildings in 250 m spatial resolution data, processing must operate at the subpixel level. To address this problem, unmixing techniques have been developed to exploit subpixel-level information for image analysis [1]. Subpixel class composition is estimated through the use of techniques such as linear mixture modelling [2], supervised fuzzy c-means classification [3] and artificial neural networks [4]. The linear mixture model is briefly stated here: let h(x, y) be the signature collected by the sensor at the pixel with spatial coordinates (x, y). This signature can be considered an N-dimensional (N-D) vector, where N is the number of spectral bands; it can also be modelled as a linear combination of endmember (pure pixels, or pixels representing only one category in the image) vectors e_i, i = 1, ..., E, using equation (1):

h(x, y) = Σ_{i=1}^{E} a_i e_i    (1)
where a_i is a scalar value representing the fractional coverage of endmember vector e_i at pixel h(x, y). Two constraints are usually imposed on equation (1): the abundance non-negativity constraint and the abundance sum-to-one constraint, defined by equations (2) and (3):

a_i ≥ 0, i = 1, ..., E    (2)

Σ_{i=1}^{E} a_i = 1    (3)
This allows the proportions of each pixel to be partitioned between classes. The output of this technique is a set of abundance maps that display the proportion of a certain class within each pixel; there are as many abundance maps as there are classes. The abundance ranges from 0 to 1 in any given pixel, where 0 indicates absence of a particular class and 1 indicates full presence of that class in that particular pixel. Intermediate values represent a fraction of that class: for example, 0.4 may represent 40% presence of a class in an abundance map, with the remaining 60% belonging to other classes. Orthogonal Subspace Projection (OSP), proposed by Chang [5], has been implemented here using SS data based on (i) best utilization of a priori target knowledge and (ii) effective use of the information in numerous spectral bands. The prior target knowledge is characterized in terms of the target signature of interest, referred to as the desired target signature d, and the undesired target signature matrix U formed by those target signatures that are known but not wanted in the image analysis. OSP separates d from U in a signal detection model, then eliminates the undesired target signatures in U prior to detection of d so as to improve signal detectability. This approach is effective when the number of spectral bands is higher than the number of target signatures of interest. The abundance maps obtained from this technique are a less error-prone representation of LC than those obtained using hard classification techniques [6]. However, the spatial distribution of these class components within the pixel remains unknown. Atkinson [7] examined a pixel-swapping optimisation algorithm within a geostatistical framework for allocating classes to subpixels. The proportion of LC within each pixel, derived using the soft classification methods described above, is used to map the location of class components within the pixels.
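As an illustrative sketch (not the C implementation used in this study), estimating per-pixel class proportions under the linear mixture model of equations (1)-(3) can be written in a few lines of numpy. The endmember matrix and the mixed-pixel spectrum below are hypothetical values chosen only for demonstration:

```python
import numpy as np

def unmix_pixel(E, r):
    """Estimate abundance fractions for one pixel spectrum r, given an
    endmember matrix E (bands x classes): solve least squares, then clip
    to the non-negativity constraint and renormalise to sum-to-one."""
    a, *_ = np.linalg.lstsq(E, r, rcond=None)  # unconstrained solution
    a = np.clip(a, 0.0, None)                  # abundance non-negativity
    s = a.sum()
    return a / s if s > 0 else a               # abundance sum-to-one

# hypothetical endmembers for two classes observed in three bands
E = np.array([[0.9, 0.1],
              [0.5, 0.4],
              [0.1, 0.8]])
r = E @ np.array([0.4, 0.6])   # synthetic 40/60 mixed pixel
print(unmix_pixel(E, r))       # recovers approximately [0.4, 0.6]
```

For a noise-free synthetic mixture like this, the unconstrained least-squares solution already satisfies both constraints, so the clip and renormalisation are no-ops; on real data they enforce physically meaningful proportions.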
Soft pixel proportions are initially allocated randomly to hard subpixel binary classes, and the allocation is then iterated by swapping subpixel values in the hard classified map. The objectives of this paper are: (i) generation of abundance maps from SS RS data of 250 m spatial resolution using OSP; (ii) prediction of LC's spatial distribution at subpixel resolution by maximizing the spatial autocorrelation between neighbouring subpixels.

2. Methods

2.1 Orthogonal Subspace Projection (OSP): SS data (such as MODIS with 250 to 1000 m spatial resolution) more often than not encompass several different LC classes, forming mixed pixels. OSP eliminates all undesired spectral signatures within a pixel and then uses a matched filter to extract the desired spectral signature (endmember). The proportion of a class within a pixel is estimated as follows: let λ be the number of spectral bands and r a λ x 1 column vector of digital numbers (reflectance). Assume that there are p targets, t_1, t_2, ..., t_p, present in an image scene; therefore, there are p spectrally distinct endmembers. Let m_1, m_2, ..., m_p denote their corresponding target signatures. A linear mixture model expresses the spectral signature of r as a linear combination of m_1, m_2, ..., m_p with appropriate abundance fractions specified by α_1, α_2, ..., α_p. M is the λ x p target spectral signature matrix (made up of linearly independent columns), denoted by [m_1, m_2, ..., m_p], where λ > p (over-determined system) and m_j is the λ x 1 column vector of the spectral signature of the j^th target t_j resident in the pixel vector r. Let α = (α_1, α_2, ..., α_p)^T be the p x 1 abundance column vector associated with r, where α_p denotes the fraction of the p^th target signature m_p present in the pixel vector r.
A mixed pixel problem assumes that the spectral signature of the pixel vector r is linearly mixed by m_1, m_2, ..., m_p, the spectral signatures of the p targets t_1, t_2, ..., t_p, and equation (1) can also be written as

r = Mα + n    (4)

where n is the noise, or can be interpreted as a measurement or model error. Equation (4) is a standard signal detection model where Mα is the desired signal vector to be detected and n is corrupting noise. Since we are interested in detecting one target at a time, we can divide the set of p targets t_1, t_2, ..., t_p into a desired target, say t_p, and a class of undesired targets, t_1, t_2, ..., t_{p-1}. We need to eliminate the effects caused by the undesired targets t_1, t_2, ..., t_{p-1}, which are considered interferers to t_p, before the detection of t_p takes place. With annihilation of the undesired target signatures, the detectability of t_p can be enhanced. First, m_p is separated from m_1, m_2, ..., m_p in M, and equation (4) is rewritten as

r = dα_p + Uγ + n    (5)

where d = m_p is the desired target signature of t_p and U = [m_1, m_2, ..., m_{p-1}] is the λ x (p-1) undesired target spectral signature matrix, made up of the spectral signatures of the remaining p-1 undesired targets t_1, t_2, ..., t_{p-1}. Equation (5) is called a (d, U) model; d is a λ x 1 column vector [d_1, d_2, ..., d_λ]^T, γ is the (p-1) x 1 column vector containing the p-1 component fractions [α_1, α_2, ..., α_{p-1}]^T of α, and n is a λ x 1 column vector representing additive, zero-mean white noise with cov = σ²I, where I is the λ x λ identity matrix. Using the (d, U) model, the orthogonal subspace projector

P = I - UU^#, where U^# = (U^T U)^{-1} U^T    (6)

can annihilate U from the pixel vector r prior to detection of t_p, similar to [8]. Applying P to equation (5) gives

Pr = Pdα_p + Pn    (7)

where the undesired signals in U have been annihilated and the original noise n has also been suppressed to ñ = Pn. Equation (7) is called an OSP model.
The operator P minimizes the energy associated with the signatures not of interest, as opposed to minimizing the total least squares error. The general signal-detection-in-noise model given in (4) separates a signal source from noise. The (d, U) model in (5) is derived from it by breaking up the signal source into two types of signal sources, d and U, that can be processed separately on the basis of prior knowledge. The OSP model is a one-signal-source (d) model derived from the (d, U) model with the U term annihilated by P. Thus the OSP model is a custom-designed signal-detection-in-noise model derived from (4), where the signal source has been pre-processed by P for signal enhancement. It should be noticed that P operating on Uγ reduces the contribution of U to approximately zero. So,

PUγ ≈ 0    (8)
On using a linear filter specified by a weight vector x^T on the OSP model, the filter output is given by

x^T Pr = x^T Pdα_p + x^T Pn    (9)
An optimal criterion here is to maximize the signal-to-noise ratio (SNR) of the filter output:

SNR = E{(x^T Pdα_p)²} / E{(x^T Pn)²} = (α_p² x^T Pd d^T Px) / (σ² x^T Px)    (10)
where E{ } denotes the expected value. Maximisation of this SNR is a generalized eigenvalue-eigenvector problem

Pd d^T Px = λ Px    (11)

where λ = (σ²/α_p²) SNR. The eigenvector which has the maximum λ is the solution of the problem, and it turns out to be d; here the idempotent (P² = P) and symmetric (P^T = P) properties of the interference rejection operator are used. The corresponding maximal eigenvalue is λ = d^T Pd, and the value of x^T (the filter) which maximizes the SNR is

x^T = d^T    (12)
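Putting the pieces together, the OSP abundance estimate (the normalised matched-filter output d^T Pr / d^T Pd) can be sketched in numpy. This is a minimal illustration with random synthetic signatures, not the study's own C implementation:

```python
import numpy as np

def osp_abundance(d, U, r):
    """OSP estimate of the desired-target fraction in pixel vector r.
    d: (bands,) desired signature; U: (bands, p-1) undesired signatures."""
    P = np.eye(len(d)) - U @ np.linalg.pinv(U)   # projector annihilating span(U)
    return float(d @ P @ r) / float(d @ P @ d)   # matched filter + normalisation

# synthetic check: 5 bands, one desired and two undesired targets
rng = np.random.default_rng(42)
d = rng.random(5)
U = rng.random((5, 2))
r = 0.3 * d + U @ np.array([0.5, 0.2])           # noise-free mixed pixel
print(round(osp_abundance(d, U, r), 6))          # -> 0.3
```

Because P annihilates the columns of U exactly, the estimate recovers the true fraction in the noise-free case, mirroring the remark after equation (12) that the estimate matches the exact value in the absence of noise.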
Applying this filter to the OSP model and normalising gives

α̂_p = d^T Pr / (d^T Pd)

which is the abundance estimate of the p^th target material. In the absence of noise, the estimate matches the exact value in (5).

2.2 Pixel-swapping algorithm: The pixel-swapping algorithm is designed to receive, as input, an image of LC proportions in K = 2 classes (typically obtained by application of a soft classifier). The assumption here is that LC exhibits intra- and inter-pixel spatial dependence, so subpixel allocation is cast as an optimisation problem that maximizes the autocorrelation between subpixels. First, the pixel-level LC proportions are transformed into subpixel hard LC classes. Thus, if 10 by 10 subpixels are to be mapped within each pixel, a LC proportion of 57 percent means that 57 subpixels are allocated to that class. Second, these subpixels are allocated randomly within each pixel. Once allocated, only the spatial arrangement of the subpixels can vary, not the actual attribute values; furthermore, the number of subpixels allocated within each pixel remains fixed. Once the random initialization is done, the objective is to vary the spatial arrangement of the subpixels in such a way that the spatial correlation between neighbouring subpixels (both within and between pixels) is maximized, given that the pixel-level proportions cannot vary. The three basic steps are: 1. For every subpixel i, the attractiveness A_i is predicted as a distance-weighted function of its j = 1, 2, ..., J neighbours:

A_i = Σ_{j=1}^{J} λ_ij z(x_j)
where z(x_j) is the (binary) class of the j^th neighbouring subpixel at location x_j, and λ_ij is a distance-dependent weight predicted as:

λ_ij = exp(-h_ij / α)
where h_ij is the distance between the location x_i of subpixel i, for which the attractiveness is desired, and the location x_j of a neighbouring subpixel j, and α is the non-linear parameter of the exponential model. The exponential weighting function chosen here is arbitrary; other alternatives, such as a simple inverse distance weighting function or a Gaussian model, could also be used. The choice of the non-linear parameter and the number of nearest neighbours are also important considerations. 2. Once the attractiveness of each subpixel location has been predicted based on the current arrangement of subpixel classes, the algorithm ranks the scores on a pixel-by-pixel basis. For each pixel, the least attractive location at which a subpixel is currently allocated to a "1" (i.e., a "1" surrounded mainly by "0"s) is stored:

x_1 = arg min_i { A_i : z(x_i) = 1 }    (17)

Similarly, the most attractive location at which a subpixel is currently allocated to a "0" (i.e., a "0" surrounded mainly by "1"s) is also stored:

x_0 = arg max_j { A_j : z(x_j) = 0 }    (18)

3. Subpixel classes are swapped on the following basis: if the attractiveness of the least attractive location is less than that of the most attractive location, then the classes are swapped for the two subpixels in question:

if A(x_1) < A(x_0): z(x_1) ← 0, z(x_0) ← 1    (19)
If it is more attractive, no change is made. The above 3 steps are repeated such that a solution is approached iteratively. The process is stopped either at a fixed number of iterations or when the algorithm converges on a solution. The algorithm is summarized below in simple terms:
(a) For each pixel:
(i) Find the least attractive "1" subpixel (attractiveness A_i) and the most attractive "0" subpixel (attractiveness A_j).
(ii) If A_i < A_j, swap the single pair of subpixel allocations.

3. Data

To test the performance of the algorithms, several datasets of various spatial resolutions were simulated. Algorithm implementation and artificial dataset generation were done in the C language, and the algorithms were validated on RS data of different spatial and spectral resolutions. 1. Simple and polygonal targets: To test the performance of the algorithms, two-class images of simple geometric shapes (a line of size 1000 rows x 1000 columns and a circle of size 700 x 700) were generated. These class maps were then used to generate target images of 100 x 100 pixels for the line and 70 x 70 pixels for the circle, one-tenth of the original class map resolution. The purpose of these simulated images was to use them in lieu of real soft classified images of LC proportions. Similarly, a complex-shape class map (1360 x 1400) was generated to test the performance of the algorithm on irregular shapes that might correspond to any LC feature; this image was degraded to 136 x 140 to assess the performance of the pixel-swapping algorithm. 2. High resolution IRS LISS-III data: IRS LISS-III (Indian Remote Sensing Satellite Linear Imaging Self Scanner-III) data acquired on 28th March 2006 (procured from NRSA, Hyderabad, India), having 3 spectral bands (Green, Red and NIR) with 23.5 m spatial resolution, covering Bangalore (12°39'00'' to 13°13'00''N and 77°22'00'' to 77°52'00''E), were used. The data were geo-registered with known ground control points (GCPs), geometrically corrected, and resampled to 25 m (1360 x 1400 pixels), which enabled pixel-level comparison of the abundance maps obtained by unmixing MODIS data with the LISS-III classified map. 3. SS MODIS data: MODIS (Moderate Resolution Imaging Spectroradiometer) data were downloaded from the GLCF (Global Land Cover Facility), NASA, USA. These data have 7 bands, with bands 1 and 2 at 250 m and bands 3 to 7 at 500 m spatial resolution.
The data were geo-registered with GCPs and geometrically corrected, and bands 3 to 7 were resampled to 250 m (136 x 140 pixels) so that all 7 bands are at the same spatial resolution.

4. Results and discussion

4.1 Classification of LISS-III data: The class spectral characteristics for four LC categories (buildings/urban, vegetation, water bodies and others) using LISS-III MSS bands 2, 3 and 4 were obtained from the training pixel spectra to assess their inter-class separability, and the images were classified using a Gaussian Maximum Likelihood classifier with training data uniformly distributed over the study area, collected with a pre-calibrated GPS (figure 1). This was validated with representative field data (training sets collected covering the entire city, and validation data covering ~10% of the study area) and also using Google Earth imagery (http://www.earth.google.com). The LC statistics, producer's accuracy, user's accuracy and overall accuracy computed are listed in table 1. 4.2 Spectral unmixing of MODIS data: Unmixing of MODIS data was done to obtain abundance maps using OSP. Visual inspection as well as accuracy assessment of the abundance images corresponding to each endmember showed that the OSP algorithm maps LC categories spatially and proportionally similar to the supervised classified image of LISS-III. The proportions of the endmembers in these images (figure 2) range from 0 to 1, with 0 indicating absence of the endmember and increasing values showing higher abundance; stretched from black to white, bright pixels represent abundances of 50% or more. Figure 1: LC classification using LISS-III MSS. Table 1: LC details using LISS-III MSS.
Accuracy assessment at pixel level: A pixel of MODIS (250 m) corresponds spatially to a kernel of 10 x 10 pixels of LISS-III. The built-up abundance map was validated pixel by pixel against the LISS-III built-up map. The proportions of built-up percentage obtained in these 136 x 140 cells were compared using three measures of performance: Root Mean Square (RMS) error (0.160), Bivariate Distribution Functions (BDFs) between the LISS-III classified map (real proportions) and the MODIS-based abundance map (predicted proportions), and the Wilcoxon signed rank test. The bivariate plot (figure 3) was helpful to visualise the accuracy of prediction by the mixture model. While most of the points fall between the two 20% difference lines, some points lie outside the lines, indicating that predictions can be higher or lower than the actual values. The Pearson's product-moment correlation was 0.84 (p-value < 2.2 x 10^{-16}). 10% of the difference pixels between the LISS-III proportions and the MODIS abundances were randomly selected at the same spatial locations in the two images by random number generation. These difference values failed the normality tests (table 2). Table 2: Normality tests for the difference in LISS-III and MODIS proportions.
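The first two pixel-level comparison measures (RMS error and Pearson's product-moment correlation) can be computed directly from the two proportion grids. A sketch with toy proportion arrays (the actual 136 x 140 grids are not reproduced here):

```python
import numpy as np

def rms_error(actual, predicted):
    """Root Mean Square error between real and predicted proportions."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def pearson_r(a, b):
    """Pearson's product-moment correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1])

# toy example: predicted proportions track the real ones closely
real = np.array([0.0, 0.2, 0.5, 0.8, 1.0])
pred = np.array([0.1, 0.2, 0.4, 0.8, 0.9])
print(rms_error(real, pred))   # small error
print(pearson_r(real, pred))   # close to 1
```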
Figure 2: LC abundance maps of built-up, vegetation, water bodies and others. Figure 3: BDF of LISS-III classified map and MODIS-based abundance map. The data points corresponding to the built-up proportion in a MODIS pixel and the corresponding LISS-III pixels were then subjected to boxplot and histogram analysis. The distribution was symmetrical, and hence the Wilcoxon signed rank test with continuity correction was performed (p-value = 0.6633), which showed little evidence against the null hypothesis that the true location of the median of the difference between the two data sets is equal to 0. 4.3 Pixel-swapping algorithm on simple and irregular shape images: The simulated class maps of the line and circle are shown in figures 4(A) and 4(E). The pixel-swapping algorithm was implemented on the simulated abundance maps of the line [figure 4(B)], circle [figure 4(F)] and polygon shape. The algorithm initially allocated each subpixel to a binary hard class randomly, so as to maintain the original pixel proportions [figures 4(C) and 4(G)]. Thereafter, the algorithm made a maximum of one swap per pixel in each iteration. The number of nearest neighbours used was 3, and the non-linear parameter of the exponential model α was also set to 3. The line image took 34 iterations, the circle 18 iterations and the complex shape 43 iterations to converge to class maps [figures 4(D) and 4(H)] from their abundance maps. For all three features the results are visually acceptable. The overall accuracy of the final converged map is 99.97% for the line, 99.94% for the circle and 99.84% for the polygon. Figure 4: Subpixel mapping of a linear feature and a circle: (A) test image of line, (B) abundance image, (C) random initial allocation of subpixels, (D) solution after convergence for the line, (E) test image of circle, (F) abundance image of circle, (G) random initial allocation, (H) converged map of the circle after pixel-swapping iterations.
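The iterative procedure of Section 2.2 that produced these converged maps can be sketched as follows. This is a minimal, illustrative two-class implementation in numpy (the study's own implementation was in C); the neighbourhood window, exponential parameter and subpixel factor below are defaults chosen for the sketch:

```python
import numpy as np

def attractiveness(z, lam, w):
    """Distance-weighted sum of neighbouring subpixel classes (step 1)."""
    zp = np.pad(z, w)
    A = np.zeros(z.shape, dtype=float)
    H, W = z.shape
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            if dy == 0 and dx == 0:
                continue
            A += lam[dy + w, dx + w] * zp[w + dy:w + dy + H, w + dx:w + dx + W]
    return A

def pixel_swap(prop, sub=10, alpha=3.0, w=3, n_iter=50, seed=0):
    """Allocate binary subpixel classes from a pixel-level proportion image,
    then iteratively swap the least attractive '1' with the most attractive
    '0' inside each pixel (steps 2 and 3), preserving per-pixel counts."""
    rng = np.random.default_rng(seed)
    R, C = prop.shape
    z = np.zeros((R * sub, C * sub), dtype=int)
    # random initial allocation honouring each pixel's class proportion
    for i in range(R):
        for j in range(C):
            blk = np.zeros(sub * sub, dtype=int)
            n1 = int(round(prop[i, j] * sub * sub))
            blk[rng.choice(sub * sub, n1, replace=False)] = 1
            z[i*sub:(i+1)*sub, j*sub:(j+1)*sub] = blk.reshape(sub, sub)
    # exponential distance weights within a (2w+1) x (2w+1) window
    dy, dx = np.mgrid[-w:w+1, -w:w+1]
    lam = np.exp(-np.hypot(dy, dx) / alpha)
    for _ in range(n_iter):
        A = attractiveness(z, lam, w)
        swapped = False
        for i in range(R):
            for j in range(C):
                s = np.s_[i*sub:(i+1)*sub, j*sub:(j+1)*sub]
                zb, Ab = z[s], A[s]
                if zb.min() == zb.max():        # pure pixel: nothing to swap
                    continue
                a1 = np.where(zb == 1, Ab, np.inf).argmin()   # weakest '1'
                a0 = np.where(zb == 0, Ab, -np.inf).argmax()  # strongest '0'
                if Ab.flat[a1] < Ab.flat[a0]:
                    zb.flat[a1], zb.flat[a0] = 0, 1
                    swapped = True
        if not swapped:                          # converged
            break
    return z

# toy proportion image: pixel proportions are preserved after swapping
out = pixel_swap(np.array([[0.57, 0.0], [1.0, 0.25]]), sub=10, n_iter=5)
print(out[:10, :10].sum())   # 57 ones remain in the first pixel
```

Note that swaps only rearrange subpixels within a pixel, so the per-pixel class counts (and hence the input proportions) are preserved exactly, as the algorithm requires.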
Sensitivity = TP / (TP + FN)    (20)

Specificity = TN / (TN + FP)    (21)

PPV = TP / (TP + FP)    (22)

NPV = TN / (TN + FN)    (23)

Figure 5: (A) Built-up pixels shown in black and non-built-up in white from the LISS-III classified image, (B) subpixel map of built-up, (C) converged map of built-up after applying the pixel-swapping algorithm. The sensitivity measures the proportion of actual positives which are correctly identified. The specificity measures the proportion of negatives which are correctly identified. PPV is the precision of positives that were correctly identified; NPV is the precision of negatives correctly identified. Table 3 is the contingency table for the LISS-III class map of built-up and the converged map. The analysis showed that sensitivity was 0.6, specificity was 0.69, PPV was 0.6 and NPV was 0.69. Accuracy assessment was also performed using the error matrix. The producer's and user's accuracies were 60% and the overall accuracy was 65% with respect to the LISS-III original class image. Table 3: Contingency table for LISS-III MSS class image and converged image.
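The four measures in equations (20)-(23), plus the overall accuracy, follow directly from the four cells of the 2 x 2 contingency table. A sketch with hypothetical counts (not the actual cell counts of table 3):

```python
def diagnostics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, NPV and overall accuracy
    from the four cells of a 2 x 2 contingency table."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate, eq. (20)
        "specificity": tn / (tn + fp),   # true negative rate, eq. (21)
        "ppv": tp / (tp + fp),           # precision of positives, eq. (22)
        "npv": tn / (tn + fn),           # precision of negatives, eq. (23)
        "overall": (tp + tn) / (tp + fn + fp + tn),
    }

# hypothetical counts for illustration only
d = diagnostics(tp=60, fn=40, fp=40, tn=90)
print({k: round(v, 2) for k, v in d.items()})
```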
However, when the converged map was compared with the ground truth, the accuracy was much higher (76.6%). Next, the proportions of built-up pixels before and after applying the pixel-swapping algorithm were verified. The original as well as the converged map were resampled to 1400 x 1400 and divided into sub-images of 100 by 100 pixels (14 x 14 blocks), and the proportion of built-up per block was computed. The BDF of actual and converged built-up pixels is given in figure 6, which shows that the class map obtained from the simulated LISS-III abundance map using the pixel-swapping algorithm is as good as the original image (r = 0.9998275, p < 2.2 x 10^{-16}). Pixel swapping on the MODIS abundance map: Similar to LISS-III, the pixel-swapping algorithm was implemented on the MODIS abundance map. Initially the data were grouped into two categories (built-up/non-built-up) as shown in figure 7(A). Then subpixels within each pixel were randomly allocated to one of the two classes (built-up/non-built-up) as shown in figure 7(B). The non-linear parameter of the exponential model and the bandwidth were adjusted at each run; the best converged map was obtained with both set to 3. A 2 by 2 contingency table was generated to assess the accuracy of the prediction (table 4). Figure 6: BDF of LISS-III class map and LISS-III converged map. Table 4: Contingency table for LISS-III MSS class and MODIS converged image.
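The block-wise proportion check described above (divide each binary map into blocks, compute the built-up fraction per block, and correlate the two sets of fractions) can be sketched as follows, with toy maps and block size in place of the actual 1400 x 1400 data:

```python
import numpy as np

def block_proportions(binary_map, block=100):
    """Built-up fraction in each (block x block) tile of a binary map."""
    H, W = binary_map.shape
    tiles = binary_map.reshape(H // block, block, W // block, block)
    return tiles.mean(axis=(1, 3))

# toy 200 x 200 'actual' map with block proportions 1, 0.5, 0.25 and 0
actual = np.zeros((200, 200), dtype=int)
actual[:100, :100] = 1
actual[:100, 100:150] = 1
actual[100:150, :50] = 1
converged = actual.copy()
converged[0, :10] ^= 1                      # perturb 10 subpixels
pa = block_proportions(actual).ravel()      # -> [1.0, 0.5, 0.25, 0.0]
pc = block_proportions(converged).ravel()
r = np.corrcoef(pa, pc)[0, 1]               # very close to 1
```

Because pixel swapping only rearranges subpixels, the block proportions of a converged map should track those of the original class map closely, which is what the high r values reported for figures 6 and 8 reflect.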
Figure 7: (A) Subpixel map of built-up from MODIS, (B) random allocation of built-up subpixels, (C) converged map after applying the pixel-swapping algorithm. Accuracy assessment showed that the producer's and user's accuracies were 52% and the overall accuracy was 58% with respect to the LISS-III original classified image. Sensitivity was 0.52, specificity was 0.62, PPV was 0.51 and NPV was 0.63, indicating a roughly 50% match of the MODIS converged map with the LISS-III class map. However, when the converged map was compared with the ground truth, the accuracy was 73.1%. Sub-images of 100 by 100 pixels (14 x 14 blocks) were generated as earlier, and the BDF of actual and predicted built-up was plotted (figure 8), which indicates a significant match (r = 0.9736454, p < 2.2 x 10^{-16}) in terms of the pixel proportions. The proportion of the class is almost retained in the converged map compared to the class map, even though the spatial location may shift due to the inherent correlation-maximisation property of the pixel-swapping algorithm. Figure 8: BDF of LISS-III class map and MODIS converged map. There are certain conditions under which pixel swapping does not perform accurately. The algorithm fails if a fuzzy set represents the classes in the scene, such that the mixed pixels arise through vagueness rather than ambiguity. The formation of compact convex shapes in the LISS-III and MODIS converged maps [figures 5(C) and 7(C)] is observed because the pixel-swapping algorithm attempts to maximize the spatial correlation between neighbouring subpixels. The presence of multiple small objects in a pixel also leads to coalescing of the smaller objects, and hence to incorrect prediction of a single large object. There are some issues related to the choice of parameters: the distance-weighted function, the number of neighbours and the non-linear parameter of the exponential function. With a smaller number of neighbours the algorithm is very fast, but it produces unsatisfactory results.
An important consideration is that the number of neighbours should not be so large that a given subpixel is attracted to subpixels it cannot neighbour, such as those belonging to a feature in a pixel that its own pixel does not neighbour [7]. Thus the maximum non-linear parameter should be one subpixel less than the number of subpixels along a pixel side. The image resampling factor (the image of 136 x 140 is resampled to 1360 x 1400, i.e. 10 times) is another important parameter: it is the relative increase in spatial resolution from the pixel-level image of proportions to the subpixel-level image of hard LC classes.

5. Conclusion

Statistical analysis and validation of the LISS-III proportions of 10 by 10 built-up pixels and the corresponding MODIS pixels confirm the utility of OSP (Orthogonal Subspace Projection) for the generation of abundance maps. The validation showed high accuracy in the LISS-III converged image (76.6%) as well as the MODIS converged image (73.1%) using the pixel-swapping algorithm. These techniques would be useful for detecting objects of interest (such as urban sprawl detection, forest/non-forest classification, etc.) from coarse resolution data in the absence of high spatial resolution imagery.

6. Acknowledgement

We are grateful to Prof. N. V. Joshi for suggestions during the analysis. We thank Dr. K. V. Gururaja for his useful comments during the implementation of the algorithms. We thank the Indian Institute of Science for financial and infrastructure support. NRSA, Hyderabad is acknowledged for providing subsidised LISS-III data. MODIS data were downloaded from http://glcf.umiacs.umd.edu/index.shtml. Google Earth data (http://earth.google.com) was used as reference data for image analysis.

7. References
