The application of cellular automata modeling for enhanced landcover classification in the Ecuadorian Amazon

Joseph P. Messina, Stephen J. Walsh, Gabriela Valdivia, and Gregory Taff
University of North Carolina at Chapel Hill, Department of Geography
E-mail: messina@email.unc.edu swalsh@email.unc.edu

Abstract

Cellular automata modeling is proposed and analyzed as a tool for predicting local urban expansion in the Ecuadorian Amazon. The model deconstruction includes the identification of growth rules, an assessment of self-modification, and a discussion of the model validation procedures. An ERDAS Imagine-based model is proposed. While still in the early phases of development, the model effectively predicts the expansion of a central place, Lago Agrio, into the surrounding environment. By further developing the specific growth rules and developing accurate alternatives, the model will become extensible throughout the Oriente region of the Amazon.

1. Study site

The proposed study site in the northeastern Ecuadorian Amazon is significant on social, biophysical, and geographical grounds. Settlers in the Napo and Sucumbios provinces are generally poor, small-scale farmers, who settle on 50-hectare plots, clearing primary forest to grow subsistence crops, coffee, and later, pasture for cattle. Particularly high fertility and mortality rates characterize this rural population of in-migrant settlers. The "natural" process of family farmer frontier settlement in Ecuador's Amazon is not confounded by chronic conflict and land disputes (as in much of Brazil) or by illegal coca growing (as in Peru). Furthermore, rapid urbanization, large-scale logging, and large ranches are not found in Ecuador as in other countries. The opportunity to evaluate the combination of agricultural intensification and extensification on a declining resource base with patterns of regional frontier settlement in the study site lends itself to a clearer understanding of land use and land cover change. The fundamental question concerns the magnitude, direction, and rate of landscape change, modeled through cellular automata linked to demographic, socioeconomic, biophysical, and geographical variables, in order to expand the understanding of landscape variation (spatial pattern and compositional changes in landuse/landcover) within the region.

The focus on the Ecuadorian Amazon also is significant for environmental and socio-economic reasons. The western Amazon region, bordering the Andes and lying at the headwaters of the Amazon River basin, possesses several major centers of endemism, including the Napo. Despite its global biodiversity and carbon sequestration significance, agricultural settlement and concurrent deforestation threaten the region. Further, oil exploration, road construction, and immigration threaten major conservation areas contiguous to colonization zones. Within the scope of this project, intervention and degradation in major Ecuadorian Amazon protected areas will be examined.

2. Data

The input data for the model include landuse/landcover data layers organized as binary data structures. These layers were derived from Landsat TM data. The TM data sets range temporally from 1986 through 1997. These layers were classified using a hybrid classification and the ERDAS Imagine software package. Class validation and accuracy assessment have not yet been completed; however, initial fieldwork conducted in February 1999 supports the findings. Additional data layers include roads digitized from 1:50k topographic maps and slope data. These data are rescaled (1 bit to 2 byte) in order to optimize space and computing efficiency.

3. Introduction to cellular automata

Cellular automata (CA) were originally conceived by Ulam and von Neumann in the 1940s to provide a formal framework for investigating the behavior of complex, extended systems (von Neumann 1966). Cellular automata are dynamic discrete space and time systems. A cellular automaton system consists of a regular grid of cells, each of which can be in one of a finite number of k possible states, updated synchronously in discrete time steps according to a local, identical interaction rule. The state of a cell is determined by the previous states of a surrounding neighborhood of cells (Wolfram 1984; Toffoli and Margolus 1987).

The infinite or finite cellular array (grid) is n-dimensional, where n = 1, 2, or 3 is used in practice. The identical rule contained in each cell is essentially a finite state machine, usually specified in the form of a transition function or growth rule that addresses every possible neighborhood configuration of states. The neighborhood of a cell consists of the surrounding (adjacent) cells. For 1-D CA models, a cell is connected to r local neighbors (cells) on either side, where r is a parameter referred to as the radius (i.e., each cell has 2r+1 neighbors, including itself). For 2-D CA models, two types of cellular neighborhoods are usually considered: 5 cells, consisting of the cell along with its four immediate non-diagonal neighbors, and 9 cells, consisting of the cell along with its eight surrounding neighbors. The term configuration refers to an assignment of states to cells in the grid. When considering a finite-sized grid, spatially periodic boundary conditions are frequently applied, resulting in a circular grid for the 1-D case, and a toroidal one for the 2-D case. A 1-D CA is illustrated in Figure 1 (Mitchell 1996).

Rule table

neighborhood	111	110	101	100	011	010	001	000
Output	1	1	1	0	1	0	0	0

Example

t=0	0	1	1	0	1	0	1	1	0	1	1	1	1
t=1	1	1	1	1	0	1	1	1	1	1	1	1	1
t=2	1	1	1	1	1	1	1	1	1	1	1	1	1

Figure 1: Illustration of a one-dimensional, 2-state CA (Mitchell 1996). Each cell can be in one of two states, denoted 0 and 1. The connectivity radius is r=1, meaning that each cell has two neighbors, one to its immediate left and one to its immediate right. Grid size is N=13. The rule table for updating the grid is shown on top. The grid configuration over two time steps is shown at the bottom. Spatially periodic boundary conditions are applied, meaning that the grid is viewed as a circle, with the leftmost and rightmost cells each acting as the other's neighbor.
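
The update in Figure 1 can be reproduced with a short Python sketch. The rule table and the initial row are copied from the figure; the function and variable names are illustrative only and are not part of the original work.

```python
# Rule table from Figure 1: (left, self, right) -> next state of the cell.
RULE = {
    (1, 1, 1): 1, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 1, (0, 1, 0): 0, (0, 0, 1): 0, (0, 0, 0): 0,
}

def step(cells):
    """Update every cell synchronously, treating the row as a circle."""
    n = len(cells)
    return [RULE[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

t0 = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]   # row t=0 of Figure 1
t1 = step(t0)
t2 = step(t1)
print(t1)
print(t2)
```

With this rule each cell takes the majority state of its three-cell neighborhood, so the isolated 0 remaining at t=1 is absorbed at t=2.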

Over the years, CA models have been applied to the study of general phenomenological aspects of the world, including communication, computation, construction, growth, reproduction, competition, and evolution (see, e.g., Burks 1970; Smith 1969; Perrier et al. 1996). The increasing application of cellular automata in general phenomenological modeling is an important indicator of the developmental potential of CA. The ability of a system to grow and then alter its rate of growth and possibly reverse or "die" is a fundamental goal in biological or human system CA modeling. Ermentrout and Edelstein-Keshet (1993) applied CA to biological modeling. The urban systems modeled by Clarke et al. (1996) and in the Oriente attempt to follow biological patterns of development with variable success. In the following section, the Clarke et al. (1996) model is used as an initial model template for the breakdown and construction of our model.

Synopsis: Urban Growth Model (derived in part from Clarke et al., 1996)

I. Follow classical CA approaches:

Reducing space to a grid or tessellation of cells/square grids
Establishing an initial set of conditions, which does not have to be the origin of the entire system, but can be any spatial arrangement of the phenomenon
Establishing a set of transition rules between iterations
Recursively applying the rules in a sequence of iterations of the spatial pattern.

II. Developing such a model involves:

Determining the rules from an existing system
Calibrating the CA to give results consistent with historical data (the present from the past).
Predicting the future by allowing the model to continue to iterate with the same rules.

III. Data include:

Defining cell sizes and context within the Lago Agrio region.
Defining initial conditions by 'seed' cells determined by locating and dating the founding of various settlements identified from historical maps, and remotely sensed imagery.
Developing behavior rules by: selecting locations at random, investigating the spatial properties of the neighboring cells, and deciding whether or not to urbanize any given cell by testing a probability function against a pseudo-random number generated by the program.

IV. Model Operation for a single year:

Selecting a location at random--if this location has at least one urban neighbor or passes a randomized test of slope suitability, make this a new urban location.
If the first location chosen is entirely isolated but it meets tests for diffusion constraint and slope, urbanize this cell. Allow it to become a new spreading center. Search the immediate vicinity and spread at random, subject to the breed constraint, to ensure new growth.
For all cells with at least three neighbors, and repeating under the spread constraint, if the slope test is passed, make this a new urban location.
Selecting a new growth location at random, and repeating according to the diffusion coefficient, search outward a given distance; if a road is found, move to the road and along it a distance half the diffusion coefficient, then spread to enough neighbors to ensure new growth from this location.
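
As a rough illustration of the synopsis, the following Python sketch runs a seed image through several model years using only a simplified form of the first rule in part IV. It is not the Clarke et al. (1996) implementation: the sketch requires both an urban neighbor and a passed slope test (an assumed, stricter reading of the rule), and the slope test form, the grid size, the trial count, and all names are hypothetical.

```python
import random

def urban_neighbors(grid, r, c):
    """Count urban cells among the eight neighbors; off-grid cells are ignored."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                count += grid[rr][cc]
    return count

def grow_one_year(grid, slope, rng, trials=200, max_slope=20.0):
    """One model year: pick random cells in sequence and try to urbanize each."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for _ in range(trials):
        r, c = rng.randrange(rows), rng.randrange(cols)
        # Assumed randomized slope test: steeper cells pass less often.
        slope_ok = rng.random() >= slope[r][c] / max_slope
        if urban_neighbors(new, r, c) >= 1 and slope_ok:
            new[r][c] = 1
    return new

rng = random.Random(42)
size = 15
seed = [[0] * size for _ in range(size)]
seed[7][7] = 1                       # a single seed settlement
# Hypothetical slope layer: flat along the central column, steeper outward.
slope = [[abs(c - 7) * 2.0 for c in range(size)] for _ in range(size)]

history = [seed]
for year in range(5):                # same rules applied year after year
    history.append(grow_one_year(history[-1], slope, rng))
extent = [sum(map(sum, grid)) for grid in history]
print(extent)                        # urban cell count per year
```

Calibration, in this framing, would amount to adjusting the trial count and slope limit until the yearly extents resemble those of the historical layers.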

4. Discussion and new model posit

The model uses the various rules to produce an output image of the urbanized landscape. Other variables including probability surfaces need to be inferred or derived through the output of additional software components. A number of issues appeared with respect to model assembly, cohesion, and validation. In order to minimize IO issues, the ERDAS Imagine Spatial Modeler, an interactive visual tool, was used for model development. The Spatial Modeler’s primary shortcoming is its limited ability to handle iterative loops. The model may be exported into a script format that does support loops, though the model presented here has not yet reached the point where significant improvements may be gained by doing so. In order to better parameterize the image dynamics of the model, the individual growth rules were assembled and tested as discrete elements to better visualize each step.

The first model component written creates a random number image. The random number generation procedure is vital to successful modeling, and the variety of methods used in the various CA models to date supports this assertion. For example, Clarke et al. (1996) followed a procedure whereby the random number field was created over an existing image base. The process appears to select a single random row-column pixel for urbanization. While this may seem inherently logical, the combination of the iteration rules with the possibility that no random pixels are selected, simply because of urban extent, is very real. In effect, this problem manifests itself as an iteration of no growth, and is built into the model as the decision rule for iteration conclusion. The problem artificially inhibits growth with increasing urbanization; in effect, it is likely a manifestation of an edge effect in the data. CA theoretically wraps around; however, such a capability is not readily apparent in the Clarke model, nor does this model effectively wrap. The handling of the random number generation is the first and possibly most significant difference between our model and Clarke’s. In Clarke’s model, a random number is generated one time, placed on the image, and all the growth rules are applied. Any given random pixel has multiple opportunities to become urbanized. By design, the total number must be low, but the low number of random pixels selected necessitates an additive set of rules. While not explicitly tested, this design element likely improves the results of the Monte Carlo simulations by minimizing spatial randomness. In our model (see page 1 of the model graphic) the random number generator creates a random number field by selecting each pixel by row and column, and then applying a random number as a pixel digital number. This pixel digital number is then altered via an adjustable scalar value. The adjustability of the scalar value should allow for iterative adjustments in increased or decreased growth rate conditions, approximating the self-modification characteristics of Clarke’s model.
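
The random number field described above can be sketched in Python as follows. This is only an approximation of the Spatial Modeler component: the thresholding step, the value range of the digital numbers, and every name in the block are assumptions made for illustration.

```python
import random

def random_field(rows, cols, rng):
    """Give every pixel its own random digital number in [0, 1)."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]

def candidate_mask(field, scalar, threshold=0.5):
    """Scale each digital number; keep pixels whose scaled value exceeds the
    threshold (the threshold is an assumption, not taken from the model)."""
    return [[1 if value * scalar > threshold else 0 for value in row]
            for row in field]

rng = random.Random(0)
field = random_field(50, 50, rng)
slow = candidate_mask(field, scalar=0.6)   # smaller scalar, fewer candidates
fast = candidate_mask(field, scalar=0.9)   # larger scalar, more candidates
print(sum(map(sum, slow)), sum(map(sum, fast)))
```

Because every pixel carries a value, raising or lowering the scalar changes the number of candidate pixels directly, which is the lever that would stand in for self-modification.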

Spontaneous growth occurs when a randomly chosen cell falls near an already urbanized cell, simulating the influence urban areas have on their surroundings (Clarke et al., 1996). Organic growth spreads outward from existing urban centers, representing the tendency of cities to expand. Within Clarke’s model, the breed-coefficient determines how likely a newly generated detached settlement is to begin its own growth cycle. The spread-coefficient controls the amount of outward "organic" expansion. Both organic and spontaneous growth rules are modeled in the first phase of our model. The focal filter creates a field of neighbor effect in order to approximate urban influence and allow the city to expand based on the location of the random cell. Existing models do not appear to handle these two characteristics as significant non-interacting events. This apparent odd handling of these variables might be the result of the limited random number selection problem outlined in the previous paragraph. Clarke’s model, for example, should grow more quickly than it does simply due to the growth criteria. By combining the two components into one section, the random effect is modeled as an additive component to the organic urbanization as a whole.
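
One way to picture the combined organic and spontaneous phase is the Python sketch below, in which a 3x3 focal mean of the urban layer supplies the neighbor effect and a scaled random value is added to it. The additive scoring rule, the scalar and threshold values, and all names are assumptions for illustration; the actual component is a Spatial Modeler graphic, not this code.

```python
import random

def focal_mean(grid, r, c):
    """Mean of the 3x3 window centred on (r, c), clipped at the grid edge."""
    rows, cols = len(grid), len(grid[0])
    window = [grid[rr][cc]
              for rr in range(max(0, r - 1), min(rows, r + 2))
              for cc in range(max(0, c - 1), min(cols, c + 2))]
    return sum(window) / len(window)

def organic_growth(urban, field, scalar=0.45, threshold=0.5):
    """Urbanize a cell when neighbor influence plus the scaled random value
    exceeds the threshold; existing urban cells are always kept."""
    out = [row[:] for row in urban]
    for r in range(len(urban)):
        for c in range(len(urban[0])):
            influence = focal_mean(urban, r, c)          # neighbor effect
            if influence + scalar * field[r][c] > threshold:
                out[r][c] = 1
    return out

rng = random.Random(1)
size = 11
# Hypothetical 3x3 urban core in the middle of the grid.
urban = [[1 if 4 <= r <= 6 and 4 <= c <= 6 else 0 for c in range(size)]
         for r in range(size)]
field = [[rng.random() for _ in range(size)] for _ in range(size)]
grown = organic_growth(urban, field)
print(sum(map(sum, urban)), "->", sum(map(sum, grown)))
```

With the scalar kept below the threshold, a cell with no urban cell in its window cannot be urbanized by the random term alone, so growth stays attached to the existing city.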

Diffusive growth promotes the random dispersed development of urban centers regardless of proximity functions (Clarke et al., 1996). This type of growth component is handled later in the model with the second application of the random number field. In existing models, the diffusion-coefficient controls the overall dispersiveness of growth, both of single grid cells and of the movement of new settlements outward through the road system. From Clarke et al. (1996), "If the first location chosen is entirely isolated but it meets tests for the diffusion constraint and slope, the cell is urbanized. In addition, to determine whether this location will become a new spreading center, the immediate vicinity is searched for urbanized cells, and the urbanization spreads at random from the selected cell, subject to the breed constraint." In our model, the diffusion component is modeled using its own adjustable random number field. The current version of the test model urbanizes all of the pixels selected. However, none of the truly isolated pixels randomly grows. This "random spreading" may be modeled, but it is not defined to the extent where another random number field and search routines would satisfactorily replicate the desired results. This phase may be modeled by modifying the organic growth routine so that the focal filter uses the maximum value criterion rather than the mean, as before. The random dispersiveness could be modeled either by generating a random number image and applying a search function, or by applying a random number table to the focal filter to vary the field characteristics. By creating an urban area of 5 or more pixels, the newly defined diffusive pixel should become a spreading center. Random non-isotropic spreading will be considered with the spreading center assignment acting randomly.

Road influenced growth encourages urbanized cells to develop along the transportation network, replicating the effect of increased accessibility (Clarke et al., 1996). The road access and growth component was initially overlooked. It appeared to be simply another urban layer and subject to the same growth constraints; however, when combined with the urban layers and modeled under the organic growth criteria, urban areas appeared along most roads. This was obviously unacceptable, and given the explicit organic growth criteria, an alternative component was developed. Road accessibility growth is modeled with another random number image, an adjustable search function, and an adjustable conditional statement. The model component outputs a "roadgrowth" image.
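
The three ingredients of the road component (random number image, adjustable search, adjustable conditional) can be sketched in Python as below. The square search window, the cutoff value, the toy road layout, and all names are hypothetical; the sketch only shows how the pieces might fit together.

```python
import random

def near_road(roads, r, c, radius):
    """Search function: True if any road cell lies within `radius` cells
    (square window) of (r, c)."""
    rows, cols = len(roads), len(roads[0])
    for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
        for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
            if roads[rr][cc]:
                return True
    return False

def road_growth(roads, field, radius=1, cutoff=0.9):
    """Conditional: flag cells near a road whose random value exceeds the
    cutoff. The result plays the role of the 'roadgrowth' image."""
    return [[1 if field[r][c] > cutoff and near_road(roads, r, c, radius) else 0
             for c in range(len(roads[0]))]
            for r in range(len(roads))]

rng = random.Random(7)
size = 20
roads = [[1 if r == 10 else 0 for _ in range(size)] for r in range(size)]  # one east-west road
field = [[rng.random() for _ in range(size)] for _ in range(size)]

roadgrowth = road_growth(roads, field, radius=1)
wider = road_growth(roads, field, radius=3)    # larger search distance
print(sum(map(sum, roadgrowth)), sum(map(sum, wider)))
```

Because the random cutoff thins the candidates, only scattered cells along the road corridor are flagged, rather than the continuous ribbon produced when roads were treated as an urban layer.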

The physical element, slope, is often iteratively applied. All the pixels at each step are analyzed with respect to the slope layer. This method seems unnecessarily repetitive, as the slope values themselves do not change, though it is possible that some variable effect exists whereby slope constraints vary among growth rules. While no evidence of this exists, nor is it explicitly defined, it is possible and worthy of further exploration. Our model incorporates slope as a separate layer with an adjustable scalar function to modify the slope desirability over time and because of land demands. The only flaw with this method is that it is possible, though very unlikely, that an urban area will cross an unacceptable slope area and urbanize a pixel that will only become isolated after the slope parameter is applied. The randomness of this occurrence and the reality that just such a scenario might occur make it reasonable. It manifests itself as blank areas within an urban core.
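
A minimal Python sketch of slope handled as a separate layer, applied once after growth, is given below. The slope limit, the way the scalar multiplies the slope value, and the toy layers are assumptions chosen for illustration, not parameters of the model.

```python
def apply_slope(urban, slope, scalar=1.0, max_slope=15.0):
    """Keep an urban cell only if its scaled slope is at or below the limit."""
    return [[1 if u and s * scalar <= max_slope else 0
             for u, s in zip(urban_row, slope_row)]
            for urban_row, slope_row in zip(urban, slope)]

urban = [[1, 1, 1],
         [1, 1, 1]]
slope = [[5.0, 12.0, 20.0],     # hypothetical slope values
         [10.0, 16.0, 30.0]]

strict = apply_slope(urban, slope, scalar=1.5)    # slope weighs more heavily
relaxed = apply_slope(urban, slope, scalar=0.7)   # land demand relaxes the constraint
print(strict)
print(relaxed)
```

Lowering the scalar over time admits steeper land, which is one way the slope desirability could respond to land demand.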

In the final phases of model execution, excluded areas are removed, and the original urban extent image is added. First, because water areas and other such features are not taken into account during growth, pixels will inevitably be urbanized incorrectly; the excluded image is therefore subtracted from the whole. Second, the original seed image is added back into the growth image, because the slope criteria function will have removed some seed-defined urban areas.
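
The order of these two final operations matters, as the small Python sketch below suggests. The three toy layers and the function name are hypothetical.

```python
def finalize(growth, excluded, seed):
    """Subtract the excluded image from the growth image, then add the seed back."""
    rows, cols = len(growth), len(growth[0])
    return [[1 if (growth[r][c] and not excluded[r][c]) or seed[r][c] else 0
             for c in range(cols)]
            for r in range(rows)]

growth = [[1, 1, 0],
          [0, 1, 1],
          [0, 0, 1]]
excluded = [[0, 1, 0],      # e.g. water
            [0, 0, 0],
            [0, 0, 1]]
seed = [[0, 0, 0],
        [1, 1, 0],          # cell (1, 0) was removed earlier by the slope step
        [0, 0, 0]]

final = finalize(growth, excluded, seed)
print(final)
```

Adding the seed last guarantees that every originally urban cell survives, even one that the slope or exclusion steps had removed.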

5. Self-modification

Self-modification is necessary as the model would otherwise produce linear or exponential growth (Clarke et al., 1996). The self-modification design element was included in order to better approximate the S-curve growth rate of urban expansion; however, considering the artificial and non-rule based component of this implementation of self-modification, the resulting growth becomes temporally scale dependent. The self-modification criteria can be adjusted interactively; however, the annual iterative decrease is hard-coded. By limiting the areal extent of the region, the model is forced to slow down in order to maintain equivalency in growth functionality. Furthermore, existing models are designed to reduce themselves every year. Over the course of 1,000 model runs, the boom and bust cycles likely cancel each other out, minimizing the effect.

6. Validation

The basic calibration approach was through comparison of the model's output to a historical data set with respect to the key variable, urban areal extent. While the output from our model is certainly suitable for this type of analysis, the summary correlations by class tend to be more appropriate. As the complexity of the seed image directly influences the shape and pace of growth, complexity may ultimately be the best measure of similarity. Most existing CA models, after multiple iterations, tend to produce smooth isotropically consistent output.
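
The basic comparison of urban areal extent can be expressed as a correlation between modeled and historical series, as in the Python sketch below. The extent values are invented purely to exercise the function and are not results from this study; the function name is likewise illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented urban extents (cell counts) for four dates; illustration only.
modeled_extent = [10, 20, 30, 40]
observed_extent = [12, 22, 32, 42]

r = pearson(modeled_extent, observed_extent)
print(round(r, 6))
```

A high correlation of extents says nothing about where the growth occurred, which is why class-level correlations or a complexity measure may be the more informative comparison.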

7. Scale

While most geographers recognize the implications of scale on geographic inference and decision-making, cellular automata models are purported to be scale independent. In fact, Clarke comments, "The growth rules are integral to the data set being used because they are defined in terms of the physical nature of the location under study, thus producing a scale-independent model" (Clarke et al., 1996). Scale independence seemed unlikely and counterintuitive. Initially, we planned to model scale effects by modifying the elevation data in order to effect a scale-based change in the expansion of the urbanized area. While certainly plausible, it seemed too easy an effect to modify and control. Region growing within CA constructs seems to be independent of spatial scale, though the data sets are scale dependent among themselves. The scale of interaction among the input data layers is vital to successfully modeling the expansion dynamics; however, the impact of dramatically altering the scale while maintaining the growth rules remains unclear. It is possible that a correlation with time steps will occur that will, at least in Clarke’s model, extend the length of, or number of, iterative loops within a given time step. It is easy to demonstrate by isolated example that scale poses constraints and limitations on geographic information, spatial analysis, and models of the real world, but scale independence in model design may be possible. As mentioned earlier, the models may be temporally scale dependent. It certainly appears that some models (e.g., Clarke et al., 1996) are temporally dependent, as demonstrated in the self-modification routines. Further exploration is required for verification.

Recent work on the scaling behavior of various phenomena and processes (including research in global change) has shown that many processes do not scale linearly. The implication is that in order to characterize a pattern or process at a scale other than the scale of observation, some knowledge of how that pattern or process changes with scale is needed. Attempts to describe scaling behavior by fractals or self-affine models, which mathematically relate complexity and scale, have proven ineffective because the properties of many geographic phenomena are not strictly repeated across multiple spatial or temporal scales. Non-uniform CA models also can be considered in which the local update rules need not be identical for all grid cells (Sipper 1994). The application of cellular automata to the Oriente may prove dependent on the incorporation of non-uniform CA models. If variable temporal and spatial scaling must be handled in a strictly deterministic way (i.e., without random spatial scaling rules), current state-of-the-art CA models will not be able to model the necessary landscape dynamics effectively or accurately.

8. Conclusions

The regional implications of landuse and landcover change are significant and complex. Cellular automata modeling promotes research into predictive spatial systems. While the project work presented here is by no means complete, the model has been tested and works well. The next development cycle will introduce a J++ transition, and more significantly, enhance landuse characterizations. The data layer production stream is in place, though also not complete. With improved data handling the modeling scheme will become extensible to a variety of tropical environments and regional contexts.

References

Burks, A., Ed., 1970. Essays on cellular automata. University of Illinois Press, Urbana, IL.

Clarke, K.C., L. Gaydos and S. Hoppen, 1996. "A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area," Environment and Planning B. (in press).

Clarke, K.C., S. Hoppen and L. Gaydos, 1996. "Methods and techniques for rigorous calibration of a cellular automaton model of urban growth," Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, NM, January 21-25, 1996. Santa Barbara: National Center for Geographic Information and Analysis.

Ermentrout, G.B. and L. Edelstein-Keshet, 1993. Cellular automata approaches to biological modeling. Journal of Theoretical Biology, 160, pp. 97-133.

Mitchell, M., 1996. An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA.

Perrier, J.Y., M. Sipper and J. Zahnd, 1996. Toward a viable, self-reproducing universal computer. Physica D, 97, pp. 335-352.

Sipper, M., 1994. "Non-Uniform Cellular Automata: Evolution in Rule Space and Formation of Complex Structures," in R.A. Brooks and P. Maes, Eds., Artificial Life IV. The MIT Press, Cambridge, MA, pp. 394-399.

Smith, A., 1969. Cellular automata theory. Technical Report 2, Stanford Electronic Lab., Stanford University, CA.

Toffoli, T. and N. Margolus, 1987. Cellular Automata Machines. The MIT Press, Cambridge, MA.

von Neumann, J., 1966. Theory of Self-Reproducing Automata. Edited and completed by A.W. Burks. University of Illinois Press, Urbana, IL.

Wolfram, S., 1984. Cellular automata as models of complexity. Nature, 311, pp. 419-424.