A review of some GEOGRAPHICAL tools for Data Mining



Table of Contents

Contents

Some introductory comments

But

There is an increasingly serious problem caused by IT developments ..

it's called ?????????

DATA

AND..

there is LOTS of it!

Every day there is more DATA than before

More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and

and

More DATA

=

?

A bigger ...

No! We now store it in Data Archives also called Data Warehouses and Data marts!

BUT

Data warehouses are expensive!

Justification is HARD because the potential value can only be expressed in terms of NEW USES that were previously impossible making economic assessment uncertain

And

As a result there is a FOCUS on short-term objectives that have little real relevance to the underlying ambitions of data mining

Maximizing Profits

If YOU don’t evolve better DATA USING technologies then having access to more and more data, more bandwidth, downsized hardware, faster computers, and bigger data repositories WILL NOT HELP at all

Indeed

The problem is that in our current IT age, with a fully computerized bureaucracy and computer-based management systems covering most areas of modern life, data are being created and stored many times faster than they can be processed and used!

As a result, probably 90%+ of all databases are not being fully exploited via state-of-the-art analysis and modeling technologies, and most are not being used at all!

The situation is rapidly becoming WORSE..

Data Mining and Knowledge Discovery to the RESCUE?

Well that is the HOPE!

My Definition

Data Mining is HOT!

The generic Aims and Objectives in Data Mining for Knowledge Discovery are fine .. it's just that much of the technology being used to do it is next to useless, and no one REALLY knows how to do it properly as yet!!!

One problem is that people are focusing on the TRENDY parts ignoring the broader picture

To exploit your (or other people’s) Data Riches you need 3 things..

You should recognize that much of Data Mining is..

There is a GROSS underestimation of the problems of modelling the BEHAVIOUR of people

worse still!

There is gross ignorance of the Geographical Dimension

Some common Data Mining Tools

BUT

So WHERE are the DISTINCTLY GEOGRAPHICAL data mining tools?

Geography is unlike any other variable!!

The Geography variable is very SPECIAL because:

A geographical approach to Data Mining

The most USEFUL conventional geographical tool is the MAP

The map is a wonderful data viewing device BUT does little else!

The problem here is that most GIS experts have not the vaguest idea of how to do it!

There is a deep PREJUDICE against Data Mining in Geography

Yet.. Geographical Data Mining is PRECISELY what geography and many other FACT-based social sciences actually need if they are to move forward in the IT Age

The Aggregation Operator

Type 1 Aggregation

Type 2 Aggregation

Flow data aggregation

Both Aggregation Operations make the data

Geographical AGGREGATION tends to be very useful for drawing out the patterns in databases

Both types of Aggregation operations can be applied to Data Warehouse Databases

HOWEVER.. BEWARE!!!!

Database Pattern Summarizers #1

Database Pattern Summarizers #2

There are various EXISTING methods for creating these classifications

You then need to EMBED them in some kind of intelligent targeting system

The Intelligent Geodemographic Targeting Machine (IGTM)

Intelligence due to matching method to context!

Geographical CONTEXT is another very useful predictor

Life Style Classification is often important as a surrogate for more complex relationships!

Pattern and Process Models #1

Many of these suggestions could be carried out using fairly well understood legacy technologies. The NEW aspect is LINKING these models to Data Warehouses using High Performance Computing

Doing better

Various ways of creating BETTER models

Two Dimensional Spatial Pattern Detectors

The Geographical Analysis Machine

The GAM worked as follows

The GAM was used to analyze cancer data

The Geographical Correlates Exploration Machine (GCEM)

MAP Explorer (MAPEX)

BUT

Spatial Data Mining

Conventional Data Mining methods focus on the WHAT question

For Example

A marketing example: Predicting Alcohol Sales

Data Mining tools are mainly UNI-SPACE explorers

Developing tri-space database explorers

It's a HARD problem because the three Data Domains are characterised by data with measurements that are not in the same units and cannot be related to each other in any simple way

Space-Time-Attribute Creatures STACs

Geocyberspace Movies

Robustness

Example 1. Financial Services application

Example 2. Crime Data Analysis

Conclusions

AND

Data Mining and Knowledge Discovery in Databases cannot safely ignore the GEOGRAPHICAL dimension

Finally...

Do Not be too SIMPLE MINDED!

Author: Stan Openshaw

Email: stan@geog.leeds.ac.uk

Home Page: http://www.geog.leeds.ac.uk/staff/s.openshaw/