SPIN!-project Working Paper

 

State-of-the-Art Geographical Data Mining

 

Andy Turner

 

a.turner@geog.leeds.ac.uk

 

 

Disclaimer

 

The views expressed in this paper are those of the author and do not necessarily reflect those of the SPIN!-project consortium.

 

Abstract

 

This SPIN!-project working paper was drafted in December 2002 to provide an assessment of the state-of-the-art in geographical data mining (GDM).  The paper contends that the spatial data mining system for data of public interest developed during the SPIN!-project (SPIN) is a prototype GDM system which is state-of-the-art in terms of its architecture and functionality.

 

Two years ago a SPIN!-project report was compiled that assessed the state-of-the-art in Exploratory Spatial Data Analysis (Turner, 2000).  Since then the state-of-the-art has not progressed far.  Although most geographical software that has been developed during this time has looked to take advantage of lower level improvements, and some software features have been enhanced and additional functionality has been incorporated, on the whole there have been no major breakthroughs.  This opinion is based on experience and a careful examination of relevant literature rather than a large scale practical and critical evaluation.  This is unfortunate, but a consequence of the restricted availability of resources.  Hopefully, the subjective assessment of the state-of-the-art that this paper offers will be useful.  It aims to provide a reference for further research and highlights some developments that are likely to play an important role in developing future state-of-the-art GDM systems.

 

Additionally, this working paper details a crucial difference in the meanings of clustering terms as used in data mining and geography.

 

 

1.   Background and Introduction

 

Available computational power has increased at an accelerating exponential rate (URL 1).  The increasing power comes from faster and more connected processors and memory, larger and more organised memory banks, and more efficient operating systems and software.  As computational power increases and the means for human-computer interaction evolve, new opportunities arise for the analysis of the vast amounts of geographical data that are collected.  The human and computer resources available for software development are immense, yet the available software for GDM does not readily facilitate the processing of massive volumes of geographical data into more useful information that can be analysed in highly automated ways so as to develop our understanding of geographical phenomena.

 

Open source development and the Internet are coming of age.  This is coinciding with trends to modularise, to develop well specified libraries of functionality, and to adopt and develop standards.  The data analysis capabilities of geographical information systems (GIS), data mining systems (DMS), mathematical and statistical packages, and other bespoke tools are continually being enhanced.  Cross-fertilisation is occurring whereby methods developed in one type of data analysis software are being adapted and incorporated in others.  More concise functional toolkits for specialist applications can be more readily assembled.  Methods are also becoming more robust and packages more interoperable.

 

The spatial data mining system for data of public interest being built in the SPIN!-project (SPIN) is an attempt to integrate geographical information system (GIS) and data mining system (DMS) functionality in an open and extensible way.  In essence SPIN is a GDM system and any such system will have the following features:

1.      a database - for storing, querying and retrieving data;

2.      a graphical user interface (GUI) - for display and for interacting with and developing functionality; and,

3.      a suite of GDM analysis and GeoVisualisation methods.

 

Until recently most DMS were backed by databases which had special handlers for temporal references, but had no special handlers for spatial references and so could only readily treat them in the same way as general attribute data.  Until recently most GIS were backed by databases which had special handlers for spatial references, but had no special handlers for temporal references and so could only readily treat them in the same way as general attribute data.  Databases are standardising, whether or not they are proprietary databases for backing GIS and DMS.  Most database software being developed is geared to handle complex data types (e.g. geographical data) that may have both spatial and temporal references.  Open source GDM systems developed in the near future are likely to be based on PostgreSQL and MySQL (URL 2, URL 3).

 

Display environments that facilitate interactive and dynamic linking of data plots, graphs, tables, maps etc., and which enable animation are likely to be standard in the next generation of GDM systems.  Along with interactive display functionality, GUIs are likely to have visual programming workspaces which can be used to develop functions which in turn can be encapsulated and added to the widgets (buttons, menus etc.) and made accessible to command line interfaces and for scripting.

 

Arguably a GeoVisualisation system can be based solely on the first two features as described above.  What is needed for a GDM system are the additional classification, generalisation, cluster detection and inductive analysis tools geared for analysing large volumes of geographical data.

 

This section ends by listing the aims of the SPIN!-project:

·        To develop an integrated, interactive, internet-enabled spatial data mining system for data of public interest.

·        To improve knowledge discovery by providing an enhanced capability to visualise data mining results in spatial, temporal and attribute dimensions.

·        To develop new and integrated ways of revealing complex patterns in spatio-temporally referenced data that were previously undiscovered using existing methods.

·        To enhance decision making capabilities by developing interactive GIS techniques, which provide an integrated exploratory and statistical basis for investigating spatial patterns.

·        To deepen the understanding of spatio-temporal patterns by visual simulation.

·        To publish and disseminate geographical data mining services over the internet.

 

 

2.   State-of-the-art ESDA at the beginning of the SPIN!-project circa 2000

 

This section looks back to the state-of-the-art in exploratory spatial data analysis (ESDA) as it was described by Turner (2000).

 

At the beginning of the year 2000 most GIS offered:

·        no inductive analysis methods;

·        only very limited statistical functionality;

·        only basic mathematical operations that would work on tables of data; and,

·        no linked dynamic display environments.

 

With the exception of network analysis, most GIS were limited to fairly basic forms of spatial analysis.  However, it was noted that a new generation of GIS was emerging which provided linked dynamic display environments, e.g. CDV (Dykes 1997, 2002), Descartes (Andrienko and Andrienko, 1999) and GeoVista Studio (Gahegan et al., 2000a).  These systems are now commonly referred to as GeoVisualisation software, although some may reasonably claim to be GDM systems.

 

GIS were becoming more modular with extensions being developed that could be plugged in to enhance a core of functionality.  The modular architectures came hand in hand with the establishment of larger communities of users working together by sharing scripts and indeed playing a part in the development of extensions married to their specialist requirements.

 

The Open GIS Consortium (OGC) had begun delivering spatial interface specifications that were to encourage interoperability between technologies that use geographical information.  For developers of open source GIS and the academic community in general, this led to the establishment of many open source projects and the provision of much freeware GIS.  Details on much of this can be obtained via the Free GIS web site (URL 4).

 

In 2000 there were numerous data mining systems (DMS) that had developed largely unconnected to GIS research.  These offered:

·        most multivariate statistical methods including linear and logistic regression;

·        various classification and modelling tools, such as decision trees, rule induction methods, neural networks, memory or case based reasoning and k-means and other so called clustering methods; and,

·        sequence discovery for finding patterns in time series data.

 

It had been shown that many of these methods (neural networks in particular) could be usefully applied to geographical data (Openshaw and Openshaw, 1997; Openshaw and Abrahart, 2000).  However, GIS contained little or none of this functionality.

 

Despite their powerful, flexible and user-friendly toolkits, even the most advanced DMS had no mechanism for handling location or spatial aggregation, or for coping with spatial entities or spatial concepts such as a region, a circle, distance or map topology.  The need to support so-called non-standard data types that contain special patterns (including multi-media data, images and audio) had, however, been recognised (Goebel and Gruenwald, 1999).

 

There was (and arguably still is) a considerable amount of hype about the capabilities and functionality of DMS.  Vendors argue that no credible evaluation of their software can be performed by evaluators that have not undertaken (often expensive) training courses.  Furthermore the systems were (and many still are) tuned to work on very specific hardware with specially configured operating systems underlying them.

 

Goebel and Gruenwald (1999) offered an intelligible survey of data mining and knowledge discovery software tools and noted the need for seamless integration with databases so that algorithms are scalable and do not rely on having all data in fast access memory.  It was observed that many systems relied on querying an underlying database (whose data is stored in large but slow memory banks) and holding the resulting data from the query in the fast access memory banks (RAM, real memory or “database engine”) in order to run many of the data mining methods.  This often meant that the methods worked poorly with large volumes of data and were not really as scalable as data mining often demands.
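
The scalability point can be illustrated with a minimal Python sketch.  It assumes a hypothetical SQLite table called obs with a single numeric column called value; neither name comes from any system discussed in this paper.  Rather than holding the whole query result in fast access memory, the sketch streams rows from the database in fixed-size batches and updates running statistics as it goes (Welford's update), so memory use does not grow with the size of the table.

```python
import sqlite3


def streaming_mean_variance(conn, query, batch_size=1000):
    """Return (n, mean, sample variance) of the first column of `query`
    without materialising the full result set in memory."""
    cur = conn.cursor()
    cur.execute(query)
    n, mean, m2 = 0, 0.0, 0.0
    while True:
        rows = cur.fetchmany(batch_size)  # only one batch is held at a time
        if not rows:
            break
        for (value,) in rows:
            n += 1
            delta = value - mean
            mean += delta / n
            m2 += delta * (value - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance


def demo_streaming():
    # Hypothetical in-memory table standing in for a large database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (value REAL)")
    conn.executemany("INSERT INTO obs VALUES (?)",
                     [(float(v),) for v in range(1, 11)])
    return streaming_mean_variance(conn, "SELECT value FROM obs", batch_size=3)
```

Only statistics that can be updated incrementally fit this pattern directly; methods that need repeated passes over the data would have to re-issue the query or push more of the computation into the database.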

 

In 2000 most ESDA tools were based on interactive graphics.  The Geographical Analysis Machine GAM/K was an exception and was already planned to be integrated as a key component of SPIN.  GAM/K is detailed by Openshaw (1995).  More recently, Openshaw (1998) also detailed a Geographical Explanations Machine (GEM) which, like GAM/K, is based on the concept of identifying clusters.  It operates similarly to GAM/K in that data from overlapping circular regions are analysed across a range of different scales and the result is output as a map.
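
The idea of analysing overlapping circular regions across a range of scales can be sketched in Python as follows.  This is a simplified illustration and not the GAM/K or GEM implementation: the Poisson significance test, the grid spacing rule (a fraction of the radius, called overlap here), the threshold alpha and the toy data are all assumptions, and no kernel smoothing of the results into a map surface and no correction for multiple testing is attempted.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k)
        cdf += term
    return max(0.0, 1.0 - cdf)


def circle_scan(areas, radii, overlap=0.5, alpha=0.01):
    """areas: list of (x, y, population, cases).
    Returns (cx, cy, r, observed, expected) for every circle whose case
    count is significantly above the count expected from the overall rate."""
    rate = sum(a[3] for a in areas) / sum(a[2] for a in areas)
    min_x, max_x = min(a[0] for a in areas), max(a[0] for a in areas)
    min_y, max_y = min(a[1] for a in areas), max(a[1] for a in areas)
    flagged = []
    for r in radii:                       # range of spatial scales
        spacing = r * overlap             # centres closer than r => overlap
        nx = int((max_x - min_x) / spacing) + 1
        ny = int((max_y - min_y) / spacing) + 1
        for i in range(nx):
            for j in range(ny):
                cx, cy = min_x + i * spacing, min_y + j * spacing
                pop = cases = 0
                for x, y, n, c in areas:
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                        pop += n
                        cases += c
                if pop == 0:
                    continue
                expected = rate * pop
                if cases > expected and poisson_upper_tail(cases, expected) < alpha:
                    flagged.append((cx, cy, r, cases, expected))
    return flagged


def demo_scan():
    # Toy data: a 5 x 5 lattice of areas, each with population 100 and
    # 1 case, except the central area at (2, 2) which has 15 cases.
    areas = [(float(x), float(y), 100, 15 if (x, y) == (2, 2) else 1)
             for x in range(5) for y in range(5)]
    return circle_scan(areas, radii=[0.5, 1.0])
```

In the toy data every flagged circle covers the central area, which is the only place where the observed count exceeds the expected count.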

 

In addition to GAM/K and GEM there were a number of other cluster detection methods.  These have been detailed in other SPIN!-project reports and those with merit are included in SPIN.  Back in 2000 it was also argued that SPIN should contain a geographically weighted regression method which could be used to examine the geographical variation in regression model parameter estimates (Fotheringham et al. 2002; Brunsdon et al. 1996).  A case was also made for including local indicators of spatial association methods and a couple of other spatial statistical methods.
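
To indicate what examining the geographical variation in regression parameter estimates involves, the following Python sketch fits a separate weighted least squares line at each location of interest, with observations weighted by their distance from that location.  It is a minimal illustration under stated assumptions rather than the method as specified by Fotheringham et al. (2002): the Gaussian kernel, the fixed bandwidth, the restriction to a single predictor and the toy data are all choices made here for brevity.

```python
import math


def local_regression(obs, at, bandwidth):
    """Fit y = a + b*x by weighted least squares, weighting each
    observation by a Gaussian kernel of its distance from `at`.
    obs: list of (u, v, x, y), where (u, v) is the observation location.
    Returns (intercept, slope) as estimated for location `at`."""
    weighted = []
    sw = swx = swy = 0.0
    for u, v, x, y in obs:
        d = math.hypot(u - at[0], v - at[1])
        w = math.exp(-0.5 * (d / bandwidth) ** 2)
        weighted.append((w, x, y))
        sw += w
        swx += w * x
        swy += w * y
    mx, my = swx / sw, swy / sw            # weighted means
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in weighted)
    sxx = sum(w * (x - mx) ** 2 for w, x, y in weighted)
    slope = sxy / sxx
    return my - slope * mx, slope


def demo_local_regression():
    # Toy data: ten locations along a line; the true slope is 1 in the
    # west (u < 5) and 3 in the east, so a single global fit would mislead.
    obs = [(float(u), 0.0, x, (1.0 if u < 5 else 3.0) * x)
           for u in range(10) for x in (1.0, 2.0, 3.0)]
    west = local_regression(obs, (0.0, 0.0), bandwidth=1.0)
    east = local_regression(obs, (9.0, 0.0), bandwidth=1.0)
    return west, east
```

Mapping the local slope estimates obtained at many locations is what reveals the non-stationarity; the choice of bandwidth controls how local each estimate is.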

 

Shortly before the SPIN!-project began Openshaw (1999) outlined the key design issues for developing GDM systems and highlighted the need to respect the special features of geographical information:

·        it is spatially referenced

·        it is often temporally referenced

·        observations are not independent

·        data uncertainty and errors tend to be spatially structured

·        spatial coverage is rarely global

·        non-stationarity is to be expected

·        relationships are often geographically localised

·        non-linearity is the norm

·        data distributions are usually non-normal

·        there are often many variables but much redundancy

·        there are often many missing values

·        spatial, temporal and spatio-temporal clustering is important

·        data can be aggregated and disaggregated in space and time and in space-time

 

Turner (2000) noted that the geocyberspace contained increasing amounts of 3D spatial geographical information and whilst global coverage was rare some datasets did exist and were increasingly being used for a wide range of environmental applications.  It was argued that functionality to analyse and visualise 3D spatial information would be of great utility for the SPIN!-project especially for a seismic application in order to examine the relation between earthquake occurrences, crustal stresses and tidal patterns.

 

It is appreciated that there are many ways to represent spatial, temporal and spatio-temporal data, and that the most appropriate representation depends on what information is required and/or what phenomenon/process is under study.   In many cases topographic maps, time maps and cartograms provide more useful visualisations than Euclidean maps (Orford et al., 1998).  The ability to develop such representations in SPIN was discussed but much of the functionality has not been integrated.

 

In addition to being able to work with 3D spatial data with temporal references and alternatives to Euclidean representations, it is necessary to develop methods that:

·        can handle no-data by attempting to minimise assumptions of what the value of data would be;

·        are scalable and can cope with very large datasets; and,

·        are robust and precise so as to handle the potentially large ranges of numbers and detail involved.

 

 

3.   The state-of-the-art in GDM

 

There are a number of summaries regarding the challenges and achievements of GDM research (see for example Buttenfield et al., 2001; Yuan et al., 2001; Openshaw and Abrahart, 2000; Openshaw, 1999).  These agree that the development of data mining and knowledge discovery tools for geographical use must be fundamentally based on using and coping with the special features of geographical information as listed in the previous section.

 

SPIN integrates GIS and spatial data mining functionality in an open and extensible way using a client-server architecture based on Enterprise JavaBeans (May and Savinov, 2002).  Enterprise JavaBeans (EJB) offers a specification and guidelines for using Java Remote Method Invocation (RMI) and JDBC (URL ref).  Java RMI enables elements of Java programs to take some actions on other Java elements on remote machines (URL ref).  JDBC is the set of interfaces for connecting to and using database software (URL ref).  The SPIN architecture was adopted at an early stage of the SPIN!-project and has also been adopted by other similar systems (Bertolotto et al., 2001; Takatsuka and Gahegan, 2002).  Since the start of the SPIN!-project the EJB specification has been revised and, among other enhancements, version 2.1 has added support for web services.  EJB offers a state-of-the-art architecture for developing a GDM system.

 

SPIN offers state-of-the-art GeoVisualisation functionality derived from Descartes and CommonGIS (Voss et al., 2002).  GeoVista Studio, developed mainly by a research team based in the USA, offers similar functionality (Gahegan et al., 2000a; Takatsuka, 2002; Takatsuka and Gahegan, 2002).

 

GeoVista Studio has a useful and convenient deployment mechanism via Java Web Start (Sun, 2002b) and can readily be customised and packaged up as an applet that can be run in most web browsers.  It is moving towards being open source and towards basing itself on GT2.  GT2 is an open source Java GIS toolkit for developing standards-compliant solutions.  It aims to support Open GIS and other relevant standards as they are developed.

 

 

4.   Differences in clustering terminology

 

Research into developing and testing methods of detecting and measuring clustering in space and over time is being undertaken in epidemiology, criminology, geography, data mining and science in general.  Unfortunately, there are different interpretations of what a cluster is and confusion about the terms clustering, spatial cluster and spatial clustering.

 

In the field of data mining, the term cluster is generally used to mean things which share similar characteristics or attributes, like classes, sets or groups; and clustering is a term reserved for the process of classifying, sub-setting or grouping a set of things.  It differs from classification in that the number of clusters is not pre-specified.  A spatial cluster in data mining is a term which has been used for a collection of spatial objects with similar locations in space irrespective of their other attributes or characteristics.  The term spatial clustering has thus been reserved for the process of classifying or grouping spatial objects into spatial clusters without specifying a priori how many spatial clusters there are (Murray, 1997; Estivill-Castro and Houle, 1999; Han et al., 2001).

 

Although the above offers a reasonable interpretation, it is only comparatively recently that researchers in the data mining field have begun to consider clustering and spatial clustering in geographical data.  This has led to a clash in terminology, so it is important to note that:

·        spatial clustering in data mining pays no attention to the attributes associated with spatial location;

·        geographic data is of a higher dimensionality and is not only a set of locations, but comprises a set of measured attributes which may include temporal references.

 

The simplest way of defining a cluster (as used in epidemiology and much geography) is as a localised excess incidence rate that is unusual in that there is more of some variable than might be expected.  For example, a cluster could be:

·        a local excess disease rate;

·        an unusual crime rate (hot spot);

·        an unusual unemployment rate or road traffic accident rate (black spot);

·        a region of unusually high positive residuals from a model;

·        an unusual concentration of plant species or earthquake epicentres, etc.

 

Virtually any variable that has a spatial distribution will contain some degree of pattern or concentration.  Clusters are where and when these concentrations are extreme.  There are essentially two types of cluster: clusters of excess and clusters of deficit.  The former are unusually high concentrations of some rate and the latter are unusually low concentrations of some rate.
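
The difference between the two interpretations can be made concrete with a short Python sketch.  The first function groups areas purely by proximity and ignores their attributes, which is spatial clustering in the data mining sense; the second labels each area by how its rate compares with the overall rate, which is closer to the geographical sense of a cluster of excess or deficit.  The distance threshold, the rate ratio thresholds (2.0 and 0.5) and the toy data are illustrative assumptions only; a real analysis would use a statistical test rather than a fixed ratio.

```python
import math


def group_by_location(coords, max_gap):
    """Data mining sense: group objects that are chained together by
    gaps of at most `max_gap`.  Attributes play no part."""
    unvisited = set(range(len(coords)))
    groups = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        group, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.hypot(coords[i][0] - coords[j][0],
                                  coords[i][1] - coords[j][1]) <= max_gap]
            for j in near:
                unvisited.remove(j)
                group.append(j)
                frontier.append(j)
        groups.append(sorted(group))
    return groups


def label_by_rate(counts, high=2.0, low=0.5):
    """Geographical sense: label each area 'excess', 'deficit' or 'none'
    according to the ratio of its rate to the overall rate.
    counts: list of (population, cases)."""
    overall = sum(c for _, c in counts) / sum(p for p, _ in counts)
    labels = []
    for pop, cases in counts:
        ratio = (cases / pop) / overall
        labels.append("excess" if ratio >= high
                      else "deficit" if ratio <= low else "none")
    return labels


def demo_cluster_terms():
    # Six areas in two spatially separate groups of three.
    coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
    counts = [(1000, 10), (1000, 10), (1000, 40),
              (1000, 10), (1000, 10), (1000, 1)]
    return group_by_location(coords, max_gap=1.5), label_by_rate(counts)
```

On the toy data the location-only grouping finds two spatial clusters of three areas each, whereas the rate-based labelling singles out one area of excess and one of deficit; the two results answer different questions about the same data.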

 

It is very important to appreciate the difference between the interpretations of these terms.  Likewise it is important that GDM methods deal with the special nature of geographical data.

 

 

5.   The future state-of-the-art in GDM

 

There is demand for GDM tools:

·        for analysing geographical data all the time and as they are produced;

·        that analyse patterns in the spatial locations and attributes of geographical objects over time;

·        that search for spatial correlation and autocorrelation; and,

·        that work across a range of spatial scales.
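
As an indication of what a search for spatial autocorrelation computes, the following Python sketch evaluates a global autocorrelation statistic.  Moran's I is used here as one widely used measure; it is a choice made for illustration and is not a method named in this paper.  Binary neighbour weights and a simple chain of six cells are assumed for the toy data.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.
    values: list of numbers, one per area.
    neighbours: dict mapping each area index to a list of neighbour indices."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbours.values())  # sum of all weights
    cross = sum(dev[i] * dev[j]
                for i, js in neighbours.items() for j in js)
    return (n / s0) * cross / sum(d * d for d in dev)


def chain_neighbours(n):
    """Each cell in a one-dimensional chain neighbours the cells either side."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def demo_morans_i():
    w = chain_neighbours(6)
    grouped = morans_i([1, 1, 1, 5, 5, 5], w)      # like values adjacent
    alternating = morans_i([1, 5, 1, 5, 1, 5], w)  # unlike values adjacent
    return grouped, alternating
```

Positive values indicate that similar values tend to be neighbours and negative values that dissimilar values tend to be neighbours.  Repeating the calculation with neighbourhoods defined at different distances is one simple way of working across a range of spatial scales.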

 

 

6.   Conclusions

 

The spatial data mining system for data of public interest being developed in the SPIN!-project (SPIN) is an attempt to integrate state-of-the-art GIS and data mining functionality in an open and extensible way.  The result is effectively a prototype GDM system which is arguably the most complete example of its kind.  Its range of functionality is very broad and it has many advanced capabilities; however, much can be done to improve it.

 

SPIN has not been subject to much user testing and has not been scientifically evaluated.  Is it really user-friendly?  If there are patterns hidden in spatial data or indeed relationships encrypted in spatial information then can SPIN help users to find them by at least pointing them in the right direction?

 

SPIN is not really open because it is not open source.  If the SPIN!-project comes to an end then what will happen to it?  Is SPIN only to be available to the SPIN!-project consortium for research purposes?  Will it not evolve into an open source project and become openly distributed?

 

Currently some algorithms in SPIN are dependent on using Oracle database software.  Would it not be better if it could optionally use one of the open source databases as well?

 

In general there is a need for new analysis methods and more example applications that show how GDM tools can be used by presenting novel results that are meaningful in a clear and understandable way.

 

 

References and Bibliography

 

Andrienko G., Andrienko N. (1999)  Interactive Maps for Visual Data Exploration.  In the International Journal of Geographical Information Science 13 (4) 355-374.

Adhikary J. (1996) Knowledge Discovery in Spatial Databases: Progress and Challenges.  Paper presented at an Association for Computing Machinery Workshop on Research Issues on Data Mining and Knowledge Discovery, Canada.

Alexander F., Boyle P. (2000) Do cancers cluster?  In Elliott P., Wakefield J., Best N., Briggs D. (eds.) Spatial Epidemiology: Methods and Applications. (Oxford University Press).

Alexander F., Boyle P. (1996) Methods for Investigating Localised Clustering of Disease. International Agency for Research on Cancer scientific publication 135.

Andrienko N., Andrienko G. (2001) Intelligent Support for Geographic Data Analysis and Decision Making in the Web. Geographical Information and Decision Analysis 5 (2) 115-128.

Bertolotto M., McGeown L., Carswell J., McMahon J. (2001) e-Spatial™ Technology for Spatial Analysis and Decision Making in Web-Based Land Information Management Systems.  In Journal of Geographic Information and Decision Analysis 5 (2) 95-114.

Bivand R. (2002) Implementing spatial data analysis software tools in R*.  Paper presented at a Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Brunsdon C., Fotheringham S., Charlton M. (1996) Geographically Weighted Regression: A method for exploring non-stationarity. In Geographical Analysis, 28 (4), 281-298.

Brunsdon C., MacGill J., Openshaw S., Turner A., Turton I. (1999) Testing space-time and more complex hyperspace geographical analysis tools.  Paper presented at the 7th GISRUK conference, UK.

Buttenfield B., Gahegan M., Miller H., Yuan M. (2001) Geospatial Data Mining and Knowledge Discovery.  A UCGIS White Paper on Emergent Research Themes submitted to UCGIS Research Committee.

Câmara G., Neves M., Monteiro A. (2002) SPRING and TerraLib: Integrating Spatial Analysis and GIS. Paper presented at Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Carr D., Chen J., Bell S., Pickle L., Zhang Y. (2002) Interactive Linked Micromap Plots and Dynamically Conditioned Choropleth Maps.  Paper presented at the Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Diggle P. (2000) Overview of statistical methods for disease mapping and its relationship to cluster detection.  In Elliott P., Wakefield J., Best N., Briggs D. (eds.) Spatial Epidemiology: Methods and Applications. (Oxford University Press).

Dykes J. (2002) Developing Tools for GeoVisualization Research. Paper presented at Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Dykes J. (1997) Exploring spatial data representations with dynamic graphics. In Computers & Geosciences 23 (4), 347-370.

Edsall R. (1999) Tools for the Exploration and Multivariate Classification of Large Geographical Databases. Paper presented at the 95th Annual Meeting of the Association of American Geographers, USA.

Edsall R., Roedler A. (2002) An Enhanced GIS Environment for Multivariate Exploration: a Linked Parallel Coordinate Plot Applied to Urban Greenway Use Survey Data. Paper presented at the Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Estivill-Castro V. (2002) Why so many clustering algorithms – A position paper.  In SIGKDD explorations newsletter of the ACM Special Interest Group on Knowledge Discovery in Data and Data Mining 4 (1) 65-75.

Estivill-Castro V., Houle M. (1999) Robust Distance-Based Clustering with Applications to Spatial Data Mining.  Algorithmica 30 (2) 216-242.

Estivill-Castro V., Lee I. (2001) Data Mining Techniques for Autonomous Exploration of Large Volumes of Geo-referenced Crime Data. Paper presented at the 6th International Conference on GeoComputation, Australia.

Fotheringham S., Brunsdon C., Charlton M. (2002) Geographically Weighted Regression (Wiley, Chichester).

Fotheringham S., Brunsdon C., Charlton M. (2000) Quantitative Geography: Perspectives on Spatial Analysis. (Sage, London).

Fotheringham S., Charlton M. (1994) GIS and exploratory data analysis: An overview of some basic research issues. In Geographical Systems, 1 (4), 315-327.

Fotheringham S., Rogerson P. (1994) Spatial Analysis and GIS. (Taylor & Francis, London).

Fulcher C., Barnett Y., Barnett C. (2002) Spatial Analysis Software for Community Decision Support.  Paper presented at a Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Gahegan M. (2001) Data mining and knowledge discovery in the geographical domain.  White Paper: National Academies Computer Science and Telecommunications Board. (Intersection of Geospatial Information and IT content and Knowledge distillation) http://www7.nationalacademies.org/cstb/wp_geo_gahegan.pdf

Gahegan M. (2000) On the application of inductive machine learning tools to geographical analysis. In Geographical Analysis 32 (2), 113-139.

Gahegan M., Miller H., Yuan M. (2001) Geospatial Data Mining and Knowledge Discovery.  http://www.ucgis.org/emerging/gkd.pdf (Accessed in December 2002)

Gahegan M., Takatsuka M., Wheeler M., Hardisty F. (2000a) GeoVISTA Studio: a geocomputational workbench. Paper presented at Geocomputation 2000, UK.

Gahegan M., Wachowicz M., Harrower M., Rhyne T-M. (2000b) The Integration of Geographic Visualization with Knowledge Discovery in Databases and Geocomputation.  In Cartography and Geographical Information Systems (special issue on the International Cartographic Association research agenda)

Goebel M., Gruenwald L. (1999) A survey of Data Mining and Knowledge Discovery Software Tools.  In SIGKDD Explorations 1 (1) 20-33.

Han J., Kamber M., Tung A. K. H. (2001) Spatial Clustering Methods in Data Mining: A Survey.  In H. Miller and J. Han (eds.) Geographic Data Mining and Knowledge Discovery (Taylor & Francis, London).

Hewitson B., Crane R. (eds.) (1994) Neural Nets: Applications in Geography (Kluwer, London).

Indulska M., Orlowska E. (2002) Gravity Based Spatial Clustering.  Paper presented at the 10th Association for Computing Machinery International Symposium on Advances in Geographical Information Systems, USA.

Krivoruchko K. (2002) Bridging the Gap Between GIS and Solid Spatial Statistics.  Paper presented at a Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Koperski K., Adhikary J., Han J. (1996) Spatial Data Mining: Progress and Challenges Survey paper.  Paper presented at an Association for Computing Machinery Workshop on Research Issues on Data Mining and Knowledge Discovery, Canada.

Koperski K., Han J., Adhikary J. (1998) Mining Knowledge in Geographical Data.  ftp://ftp.fas.sfu.ca/pub/cs/han/pdf/geo_survey98.pdf (Accessed December 2002).

Lawson A. (1999) A Review of Cluster Detection Methods.  In Lawson A., Biggeri A., Böhning D., Lesaffre E., Viel J-F., Bertollini R. (eds.) Disease Mapping and Risk Assessment for Public Health. (Wiley, Chichester)

Lazarevic A., Fiez T., Obradovic Z. (2000) A Software System for Spatial Data Analysis and Modeling. Paper presented at the 33rd International Conference on Systems Sciences, USA.

Levine N. (1998) Hot Spot Analysis Using both the SYSTAT K-Means Routine and a Risk Assessment.  Paper presented at the Academy of Criminal Justice Sciences Annual Conference, USA.

MacEachren A., Hardisty F., Gahegan M., Wheeler M., Dai X., Guo D., Takatsuka M. (2001) Supporting visual integration and analysis of geospatially-referenced data through web-deployable, cross-platform tools. Paper presented at 20th International Cartographic Conference, China.

Masters R., Edsall R. (2000) Interaction Tools to Support Knowledge Discovery: A Case Study Using Data Explorer and Tcl/Tk.  Paper presented at the Visualization Development Environments workshop, New Jersey, USA.

May M., Savinov A. (2002) An integrated platform for spatial data mining and interactive visual analysis.  Paper presented at the 3rd International Conference on Data Mining Methods and Databases for Engineering, Finance and Other Fields, Italy.

Miller H., Wentz E. (2002) Geographic Information Systems and Spatial Analysis: Enhancing Analytical Capabilities by Expanding Geographic Representations.  http://www.geog.utah.edu/~hmiller/research.html (Accessed in December 2002).

Miller H., Han J. (2000) Discovering Geographic Knowledge in Data Rich Environments: A Report on a Specialist Meeting. In SIGKDD Explorations 1 (2), 105-107.

Openshaw S. (1999) Geographical data mining: key design issues. Paper presented at the 4th International Conference on GeoComputation, USA.

Openshaw S. (1998) Building automated Geographical Analysis and Exploration Machines.  In Longley P., Brooks S., McDonnell R., Macmillan B. (eds.) Geocomputation: A Primer (Wiley, Chichester) 95-115.

Openshaw S. (1995) Developing automated and smart spatial pattern exploration tools for geographical information systems applications. The Statistician 44 (1), 3-16.

Openshaw S., Abrahart R. (eds.) (2000) GeoComputation (Taylor & Francis, London).

Openshaw S., Charlton M., Wymer C., Craft A. W. (1987) A mark I geographical analysis machine for the automated analysis of point data sets.  International Journal of Geographical Information Systems 1, 335-358.

Openshaw S., Openshaw C. (1997) Artificial Intelligence in Geography (Wiley, Chichester).

(…details incomplete) Openshaw S., Turton I. (1999) Using a Geographical Explanations Machine to Analyse Spatial Factors Relating to Primary School Performance.  In Geographical & Environmental Modelling.

Openshaw S., Turton I. (1998) Application of GAM to Crime Analysis Data.  Paper presented at the Academy of Criminal Justice Sciences Annual Conference, USA.

Openshaw S., Turton I., MacGill J. (1999) Using the Geographical Analysis Machine to Analyse Limiting Long Term Illness.  In Geographical & Environmental Modelling 3 83-99.

Orford S., Dorling D., Harris R. (1998) Review of Visualization in the Social Sciences: A State of the Art Survey and Report.  Report for the Advisory Group on Computer Graphics.

Pacheco B. (2001) Assessing the Applicability and Usability of GeoVISTA Studio for Health Geographics.  Published in: The Pennsylvania State University Summer Research Opportunities Program Journal 9 221-233.

Paddenburg A., Wachowicz M. (2001) The Effect of Spatial Generalisation On Filtering Noise For Spatio-Temporal Analyses.  Paper presented at the 6th International Conference on GeoComputation, Australia.

Roddick J., Spiliopoulou M. (2002) A survey of Temporal Knowledge Discovery Paradigms and Methods.  IEEE Transactions on Knowledge and Data Engineering 14 (4) 750-767.

Roddick J., Hornsby K., Spiliopoulou M. (2001) Temporal, Spatial and Spatio-Temporal Data Mining Research and Knowledge Discovery Research Bibliography.  http://kdm.first.flinders.edu.au/IDM/STDMBib.html (Accessed in December 2002)

Roddick J., Lees B. (2001) Paradigms for Spatial and Spatio-Temporal Data Mining.  In Miller H., Han J. (eds.) Geographic Data Mining and Knowledge Discovery (Taylor & Francis, London).

Sun (2002a) Enterprise JavaBeans™ Specification, Version 2.1.  http://java.sun.com/products/ejb/docs.html (Accessed in December 2002).

Sun (2002b) Java™ Web Start.  http://java.sun.com/products/javawebstart/ (Accessed in December 2002).

Shekhar S., Vatsavai R. (2002) Spatial Data Mining Research by the Spatial Database Research Group, University of Minnesota.  Paper presented at a National Science Foundation workshop on Spatio-temporal Data Models for Biophysical Fields, USA.

Shekhar S., Huang Y., Wu W., Lu C., Chawla S. (2001) What’s Spatial About Spatial Data Mining: Three Case Studies.  In Kumar V., Grossman R., Kamath C., Namburu R. (eds.) Data Mining for Scientific and Engineering Applications (Kluwer).

Symanzik J., Swayne D., Lang D., Cook D. (2002) Software Integration for Multivariate Exploratory Spatial Data Analysis.  Paper presented at a Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Takatsuka M. (2002) An Open Component-Oriented Visual Programming Environment for Integrating Geospatial Data Analysis and Visualization Tools.  Paper presented at the Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Takatsuka M., Gahegan M. (2002) GeoVISTA Studio: A Codeless Visual Programming Environment for Geoscientific Data Analysis and Visualization.  To appear in Computers & Geosciences 28 (10) 1131-1144.

Tango T. (1999) Comparison of General Tests for Spatial Clustering.  In Lawson A., Biggeri A., Böhning D., Lesaffre E., Viel J-F., Bertollini R. (eds.) Disease Mapping and Risk Assessment for Public Health. (Wiley, Chichester)

Tung A., Hou J., Han J. (2001) Spatial Clustering in the Presence of Obstacles.  Paper presented at the 17th International Conference on Data Engineering, Germany.

Turner A. (2000) State-of-the-art Exploratory Spatial Data Analysis. SPIN!-project working paper.

Turner A., Turton I., Walder A. (2000) Evaluation of Applied Statistical Cluster Detection Methods. SPIN!-project report Deliverable 5.3.

Turton I., Walder A. (2002) Algorithms for Investigation of Temporal Change. SPIN!-project report Deliverable 6.4.

Turton I., Walder A. (2000) Why Spatial Pattern Detection is Harder than it Looks.  Paper presented at the 5th International Conference on GeoComputation.

Voss H., Andrienko N., Andrienko G., Gatalsky P. (2001) Web-based Spatio-temporal Presentation and Analysis of Thematic Maps. In the Cities and Regions Journal of the Standing Committee on Regional and Urban Statistics and Research.

Voss H., Andrienko N., Andrienko G. (2002) Exploratory Data Analysis and Decision Making with Descartes and CommonGIS.  Paper presented at a Center for Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis Software Tools, USA.

Wakefield J., Kelsall J., Morris S. (2000) Clustering, cluster detection and spatial variation in risk.  In Elliott P., Wakefield J., Best N., Briggs D. (eds.) Spatial Epidemiology: Methods and Applications. (Oxford University Press).

Wu Y-H., Miller H. (Forthcoming) Computational Tools for Measuring Space-Time Accessibility within Transportation Networks with Dynamic Flow.  In Journal of Transportation and Statistics special issue on accessibility 4 (2/3) 1-14.

 

The following literature was sought but could not be obtained and read:

Andrienko N., Andrienko G., Savinov A., Voss H., Wettschereck D. (2001) Exploratory Analysis of Spatial Data Using Interactive Maps and Data Mining. In Cartography and Geographic Information Science 28 (3) 151-165.

Edsall R. (1999) Development of Interactive Tools for the exploration of Large Geographic Databases. Paper presented at the 19th International Cartographic Conference, Canada.

MacEachren A., Wachowicz M., Edsall R., Haug D., Masters R. (1999) Constructing knowledge from multivariate spatiotemporal data: Integrating Geographic Visualization with Knowledge Discovery in Database Methods. In a special issue of the International Journal of Geographical Information Science.

 

URLs

1.      Computer history

      http://home.earthlink.net/~mrob/pub/computer-history.html

2.      Free GIS

      http://www.freegis.org/

3.      Geographically Weighted Regression (GWR)

      http://www.ncl.ac.uk/geography/GWR

4.      GeoTools

      http://geotools.sourceforge.net

5.      Open GIS Consortium

      http://opengis.org/

6.      Oracle Spatial

      http://otn.oracle.com/products/spatial/

7.      PostGIS

      http://postgis.refractions.net/

8.      PostgreSQL

      http://www.postgresql.org/

9.      MySQL

      http://www.mysql.com