Spatial data quality is connected to many of the processes involving spatial applications. However, some people may be unaware of that relationship and why it is important. Vector1 Media editor Jeff Thurston interviewed Steven Ramage and Graham Stickler of the UK-based company 1Spatial, who are involved in many spatial data quality applications around the globe. The result was a series of probing questions designed to get at the meaning of spatial data quality, why it is important, and where and how it makes a difference.
Question: Different
people think of data quality in different ways. A large number of
people are still grappling with what data involves and what
it means. What does 'data quality' mean?
Answer: That’s a very good
question. It’s interesting that the Open Geospatial Consortium
(OGC) Data Quality Working Group has taken steps to try and
understand that question. They recently conducted a worldwide survey
to explore what ‘data quality’ means to the geospatial industry;
with almost 800 responses this is obviously a topic of concern to
industry professionals. There is an expectation that it means
different things to different stakeholders across the spatial data
supply chain. We all understand that it includes more than just
geometric accuracy; it must also include areas like logical or
semantic consistency, and it must also relate to fitness for purpose
and the data’s intended use. As an example, if you were to receive
a road network dataset that was highly accurate from a geometric
standpoint, but that was not actually connected at all nodes, then
its ‘quality’ would be very poor if you intended to use it for
routing purposes.
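The road network example can be sketched as a simple connectivity check. This is a minimal illustration, not a description of any 1Spatial product: the function names, the sample coordinates and the rule that two segments connect only when their endpoints match exactly are all assumptions made for the sketch.

```python
from collections import defaultdict


def connected_components(segments):
    """Group segment endpoints into connected components.

    Each segment is a pair of (x, y) endpoints. Two segments are treated
    as connected only if they share an identical endpoint (an assumption;
    a real check would usually apply a snapping tolerance).
    """
    adjacency = defaultdict(set)
    for start, end in segments:
        adjacency[start].add(end)
        adjacency[end].add(start)

    seen = set()
    components = []
    for node in adjacency:
        if node in seen:
            continue
        # Depth-first walk over everything reachable from this node.
        stack = [node]
        component = set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        components.append(component)
    return components


def is_routable(segments):
    """A network is usable for routing here only if it forms one component."""
    return len(connected_components(segments)) == 1


# Hypothetical data: the third segment starts slightly away from the
# end of the second, so the network is accurate to look at but disconnected.
roads = [
    ((0.0, 0.0), (1.0, 0.0)),
    ((1.0, 0.0), (1.0, 1.0)),
    ((1.0, 1.001), (2.0, 1.0)),
]

print(is_routable(roads))
print(len(connected_components(roads)))
```

Every coordinate in the sample could be correct to within a tight tolerance and the dataset would still fail this check, which is the sense in which quality depends on intended use.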
The following quote from J. M. Juran, the
recently deceased ‘quality guru’, sums up the situation well:
“Data are of
high quality if they are fit for their intended uses in operations,
decision making and planning. Alternatively, the data are deemed of
high quality if they correctly represent the real-world construct to
which they refer. These
two views can often be in disagreement, even about the same set of
data used for the same purpose.”
Question: Can data quality impact
work processes? How? Have you been able to see the benefits of
improved data quality, and could you explain some of them, please?
Answer: Very much so.
Improving data quality is not a once-only task, and implementing an
ongoing quality control regime can have a major impact on work
processes and benefit a business by cutting out ambiguity and
improving overall operational efficiency, e.g. limiting duplication
of effort. Data quality has been defined earlier as assessing
fitness for purpose and plays a vitally important role when looking
at the integration of datasets. We often talk about the ‘supply
chain problem’ and data quality is pivotal to this concept. A
classic manufacturing supply chain derives source materials from
various places, manufactures a product from these materials and then
delivers the product to distributors or end customers.
In the IT
world, we are attempting to do the same thing, and spatial data is
playing an increasingly large role in this process. We take data from
multiple sources, combine and integrate it (often to a single source)
to create a product – information. We then need to distribute this
product to the end customer or decision-makers. Source data may well
be of high quality in the context of what it is being collected for
and operationally used for (fit for purpose). In other words, the source
systems will be ‘doing things right’ in the context of their
operational role. What we are increasingly trying to achieve, as an
organisation, community or enterprise, is combining all these data so
that we can undertake effective analysis, planning and business
decisions – i.e. ‘doing the right thing’. This spatial data
integration or conflation is what 1Spatial has been doing
successfully for many years and one of the key areas of expertise
that will be highlighted at the Conference.
One organisation attempting to tackle
the data quality issue is MidCoast Water, in New South Wales,
Australia. MidCoast Water serves an area of 7000 square kilometres
and the authority is responsible for reticulated water supply and
sewerage systems to communities in the Manning and Great Lakes
regions. The delivery of services to such a vast area brings with it
many challenges. Formed just under 10 years ago from the water and
sewerage sections of three local authorities, MidCoast Water has
quickly grown to be an industry leader, and sees its encouragement of
innovation as crucial to this positioning. MidCoast Water introduced
a programme to provide improved access to accurate geographical and
asset information. The programme was developed in two stages:
1. Improving the efficiency and accuracy with which information is gathered and recorded
2. Improving the accessibility of this information.
The authority needed to ensure its data
quality in order to provide the best possible service to its end
customers. They needed to be confident in the reliability of their
data at the attribute level so that no manual checking was required
and that time was not wasted searching for assets in the field due to
errors within their data. Topological connectivity between networks
also needed to be assured to prevent the duplication of editing
tasks, which could be a drain on their manpower. As well as
MidCoast’s ongoing programme of internal improvement, from 1st July
2006, state government regulations required utility organisations to
have the ability to accurately pinpoint their assets. MidCoast Water
needed to ensure that they were fully compliant with this before the
legislation came into force.
Following the implementation of a
spatial data quality regime, MidCoast Water now enjoys the following
advantages and benefits:
• Interoperability – centrally
stored data is error-free and accessible via multiple applications
across the company
• Enhanced productivity –
significant time and cost savings have been achieved through
increased query performance
• Enterprise-wide data management –
business and spatial data has been centralised into a single
database, reducing the duplication of effort in maintenance of that
data
• Improvements in data gathering –
it now takes just a few hours, instead of a week, to translate
spatial data into maps
• Efficient processing of property
information – there has been a 60% reduction in staff time for this
task when combined with other processes
• Versatility of application – the
return on investment has increased as a result of being able to apply
the data in a variety of new ways and through data mining
opportunities.
The return on investment for MidCoast
Water has been substantial. Not only have they met their objectives
to centralise business data and to accurately manage and pinpoint
their assets, they have also achieved measurable time and cost
savings. As just one example, before the implementation of the new
regime it appeared that two sewer stations were needed at a
particular development. Under the new system, with the resultant
improvement in data quality, the information supplied clearly showed
that only one sewer station was required, saving MidCoast Water up to
$300,000 AUD.
Question: There has been a
growing discussion relating to spatial data infrastructure (SDI). It
is hard to understand how all these countries and people are going to
share data and information. What initiatives are you involved in
which will help to make SDI a reality? What are the challenges that
SDIs face, in your opinion?
Answer: 1Spatial has been
involved with the Digital National Framework (DNF) group in Great
Britain since its inception several years ago. The vision of the DNF
is “to enable and support easy and reliable integration of business
and geographic information regardless of who is responsible for its
maintenance and where this is undertaken, thus achieving the goal of
"plug and play information".” This vision runs parallel
to the idea of building a spatial data infrastructure (SDI) since it
is all about providing increased access to, and being able to share,
quality information across organisations. DNF is not an activity
that is unique to ‘building SDIs’; it crosses all areas of
spatial information management and therefore supports SDI efforts as
well as its own wider goals. 1Spatial
is also involved in SDI research in Scandinavia and the Benelux
region. For example, one project is assessing ‘quality elements
within the supply chain to deliver an SDI’.
Although not called an SDI, the work
carried out as part of the VISTA project could be described as
exactly that. The VISTA project is examining innovative ways of
integrating, presenting and making use of utility data with the aim
of reducing the direct and indirect costs of streetworks.
Inaccuracies in existing information, and the lack of methods to
integrate, share, reuse and effectively communicate knowledge held by
the owners of underground assets, mean that more excavations are
required to locate those assets, causing unnecessary traffic
congestion and increased costs for the economy. Unnecessary
damage to underground assets also results in increased costs, injury
or possible death of workers, and loss of service to consumers, both
business and domestic.
UK Water Industry Research (UKWIR) is
the lead co-ordinating partner for the project, with Leeds and
Nottingham Universities providing the research input. Over 20
utility and other industrial organisations are also involved and have
been working on a Global Schema to provide a framework that will
unify the utility data from each of the partner domains (electricity,
gas, sewer, telecoms and water). The Global Schema should not require a
change in the source schema, and it should provide enough flexibility to
articulate the visualisation and analytical requirements of each
domain.
The VISTA project industry partners have provided access to
16 datasets unique to company and domain, which were used extensively
in the modelling and design phases. 1Spatial has helped to define the
Global Schema mapping metadata, which is complex and is
maintained within Radius Studio. It is envisaged that Radius Studio
will significantly reduce the time taken to integrate these different
assets and improve the mechanisms of domain validation with the
project partners. This will result in decreased development time and
an increase in data quality and domain knowledge.
From our experience of working with
organisations such as Fujitsu and ESRI as our partners on the
Ordnance Survey of Northern Ireland GeoHub NI project, we know there
are some practical issues to be addressed with an SDI. GeoHub NI is a
great example of an SDI since it aims to host and
connect to spatial data from multiple sources, enable metadata
searching and, via Web thin clients, allow multiple data layers to be
accessed and shared. In addition to the cultural aspects of people
buying in and using the service, there are also commercial
considerations around licensing the data. The main problems
that 1Spatial addressed were to do with data management issues and
included the validation of critical third party datasets, ease of
administration, data integration, security, standards compliance and
robustness of the solution. It highlighted how encompassing an SDI
can be with its many components.
While trying to provide a secure
gateway to web services and web mapping, we encountered some
limitations in building these elements of SDIs due to the limited
maturity of OGC Web Feature Service (WFS) tools. The project used a
modular architecture, which highlights that no single vendor can
provide it all. Interoperability therefore becomes a key consideration
when mixing and matching components and using a number of mainstream IT
approaches, such as Simple Object Access Protocol (SOAP) and Web
Services Security (WS-Security), for remote access via WFS and Web Map
Service (WMS). As experienced with GeoHub NI, the practical issues of
secure, reliable access to fit-for-purpose spatial data will be
significant challenges for any SDI.
This Vector1Media article covers it
well:
http://vector1media.com/article/feature/source-of-truth:-is-the-it-community-prepared-for-spatial-data-infrastructures?/
Question: There is a trend
to web services for many organisations. In some places it appears
people are still at the desktop approach, while at others, almost all
of the work is web service based. Are the data quality needs for
desktop versus web service oriented approaches any different? Can you
explain how data quality applies to web services and desktop
environments and the connection to organisational processes for each?
Answer: These needs are no
different from a corporate standpoint. Individual users may have
different needs in terms of what they do to ‘their’ data (e.g.
tidying up geometric accuracy on a GIS workstation) but organisations
are now seeking to build on and, in some instances, move beyond this
departmental approach to integrate, share and reuse data across the
entire enterprise. In this scenario, and if we think back to question
two and the concept of supply chains, then the benefits of web
services, or component architectures, seem obvious. Having the
flexibility to ‘interface’ into a data quality control regime
from any access point in an information workflow is vital to creating
a sustainable, enterprise-wide culture and the creation of a single
source of truth for spatial data. This has now been made possible
by web services and the maturity of mainstream tools for areas like
Business Process Execution Language (BPEL) and transaction management.
Question: It seems that
data access is a big concern for many people. When you hear the term
'access' what does it mean to you? If someone has to transform data,
what are the quality considerations involved? Are there alternate
approaches involving non-transformation?
Answer: Access can be widely
interpreted and has many issues associated with it depending on the
interpretation. However, the issue of security is perhaps becoming
the most important concern, and goes way beyond digital rights
management and IT security issues. The ‘Google effect’ has opened
up access to spatial data but have we thought through the
implications? As data becomes more available at ever increasing
levels of detail, are we happy for it to be available to everyone?
Open Source Software has been around for a while, but what are the
implications of Open Source Data? You may be happy for the floor
plans of your buildings to be available to the Emergency Services,
but do you want other organisations to have unrestricted access to
such data? Data transformation is all about the repurposing of such
data for alternate uses. This may involve spatial data integration
considerations or simply involve physical change to the data to
enable its reuse for other purposes. The quality considerations go
back to the previous answers in that the data needs to be fit for
purpose. The Quality Challenge is how to define and describe a
fitness for purpose measure and to report on this during the data
supply stage. This is the next step for the OGC Data Quality Working
Group (also referred to earlier).
Question: Increasingly we see a
trend toward real-time or near real-time applications. These are not
all necessarily mobile based. Many of them are field based from
static sites which are continually collecting information and feeding
that to other parts of networks. How does one go about ensuring data
quality over a real-time network?
Answer: Our approach to data
quality is generic, i.e. it is a rules-based approach for automating
as much of the spatial data mining, rules discovery and conformance
checking as possible. If you can provide the data specifications or
the business rules associated with the spatial data coming in over a
real-time network, then it is still about the data validation process
and reducing the roundtrip engineering costs associated with carrying
out data quality checking manually on input or once the data has been
integrated into a workflow.
Real-time data
validation services should provide tools to increase data accuracy at
the time it is entered. When the collected information is shared or
distributed to other parts of the network, a data validation Web
Service, such as Radius Studio, can assess the conformance of the
incoming data against minimum quality levels or service level
agreements, provided the associated business rules are
known. In the event that invalid data is submitted, a system of
automatic checks can be set up to prompt the provider to enter
corrected information, reject a transaction, or apply business rules
to improve the performance of the data collection efforts. The end
result is an improved workflow that cuts out unnecessary and
additional manual data validation stages.
Automated data validation in real-time
requires some thought as to the subset of rules to be applied and the
source of the data. For live data streams, obvious checks would be
things like "all vehicle locations must be within x metres of a
road or ferry route" and could be applied in real-time. In other
situations, particularly for data being input by a user or copied
from another system, some rules must be deferred or ignored until
enough information is present. For example, "pipes must have
connectors at both ends" and "connectors must be on a pipe"
should not be checked for each feature when it is input, as the data
will always fail at first (a new pipe will have no connectors, and a
new connector will have no pipe). Instead, a related group of inputs
needs to be read before the checks can take place. Other checks, which
require large amounts of contextual data, need to be applied at
different intervals when enough data has been captured to allow a
meaningful result.
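The split between rules that can run on every incoming record and rules that must wait for a related group of inputs can be sketched as follows. This is an illustrative sketch only and does not represent Radius Studio: the function names and sample data are hypothetical, coordinates are assumed to be planar and in metres, and "connectors must be on a pipe" is simplified to "a connector must sit at a pipe end".

```python
import math


def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its two ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def vehicle_near_route(position, routes, max_metres):
    """Immediate rule: can be applied to each position as it arrives."""
    return any(distance_to_segment(position, a, b) <= max_metres
               for a, b in routes)


def check_pipe_batch(pipes, connectors):
    """Deferred rules: run once a related group of inputs is complete.

    pipes maps a pipe id to its (start, end) points; connectors maps a
    connector id to its point.
    """
    connector_points = set(connectors.values())
    pipe_ends = {end for ends in pipes.values() for end in ends}
    errors = []
    for pipe_id, (start, end) in pipes.items():
        if start not in connector_points or end not in connector_points:
            errors.append(f"pipe {pipe_id}: missing connector at one or both ends")
    for connector_id, point in connectors.items():
        if point not in pipe_ends:
            errors.append(f"connector {connector_id}: not on a pipe")
    return errors


# Hypothetical sample data.
routes = [((0, 0), (100, 0))]
print(vehicle_near_route((50, 3), routes, max_metres=5))
print(vehicle_near_route((50, 20), routes, max_metres=5))

pipes = {"P1": ((0, 0), (10, 0))}
connectors = {"C1": (0, 0), "C2": (10, 0), "C3": (5, 5)}
print(check_pipe_batch(pipes, connectors))
```

Calling `check_pipe_batch` with the pipe alone and no connectors reports a failure, which is why such rules are deferred until the group of related features has been read.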
Question: The architecture
industry is heavily involved in the use of spatial geometry for
design, construction and operation of buildings. In some cases
hundreds if not thousands of polygons, nodes and lines are involved,
yet many architects are primarily concerned with building design. It
seems like data quality would play an important role in building
geometry, or is this not the case?
Answer: The CAD arena does
indeed develop very sophisticated models and, fundamentally, they are
made up of basic geometric primitives in the same way as we in the GI
space put together maps in the 2D world. In the past it has been a
case of never the twain shall meet, but this is changing and we are
seeing a convergence of these data ‘spaces’. From a quality
standpoint, the issues are no different to those mentioned above.
These data are certainly fit for purpose in their source systems, but
as soon as we look to integrate the data then we have to consider a
new set of quality rules. 1Spatial is currently involved in research
projects in the USA and has been looking at how 3D models can be
‘stitched’ into the 2D space and how topology can then be built
in line with the 2D data to give a uniform structure and internal
intelligence – i.e. a uniform ‘quality’ measure. Our partner
LSI will be presenting on this topic as part of the CAD to GIS Track
at the Conference.
Question: In Europe the INSPIRE
program aims to harmonise spatial data and enable more integration at
the policy level. Why should Europeans care about data quality in
INSPIRE?
Answer: Policy is good for
setting the path for where we want to go; maybe we could look at data
quality as one of the vehicles for getting there. If we define
spatial data quality as ‘fit for purpose spatial data’ and follow
the old adage, ‘garbage in, garbage out’, then how can
organisations even consider providing increased access, sharing or
migrating spatial data without knowing and understanding the quality
of their spatial data? INSPIRE references data quality numerous
times, but it is not clear who has ownership of data quality in
Europe at any regional, national or European level. It is also not
clear what tools exist in the market place to discover information on
quality.
As mentioned, one of the key aims of
setting up the OGC Data Quality Working Group was to address
specifically this topic, i.e. data quality across the geospatial
supply chain. A further key aim is finding better ways to measure,
communicate and report on spatial data quality by building on the work
already done by ISO with the ISO 19000 series standards. Working with
organisations like ePSIplus, EuroSDR and EuroGeographics, 1Spatial is
trying to connect a number of stakeholders and raise awareness of
issues around spatial data reuse and data quality to help prepare for
INSPIRE.
If European geospatial professionals
don’t consider data quality issues now, it will be too late once
spatial data is accessible in INSPIRE and used in applications and
the data is not fit for purpose. At that stage they will have to
tackle it retrospectively and incur the huge costs, headaches and
business process breakdowns associated with poor quality data. We
should learn from painful lessons in other IT data management
disciplines, such as CRM (customer details and addresses) and ERP
(asset information), where there is now a multi-billion-dollar industry
addressing data quality problems, so that the geospatial industry
does not have to spend similar amounts around INSPIRE. Expertise,
methodologies and tools exist now to avoid this situation with
geospatial data and to help prepare for INSPIRE.
Question: The game industry is
another industry that is heavily based on spatial geometry for the design
of new games as well as virtual environments. Much of this work is
leaning toward 3D. Can you explain how spatial data applies to games
in general, and also the impact on the trend towards 3D
applications?
Answer: We can learn from the
gaming industry, in particular from a visualisation perspective and
how individuals may interact with spatial data in the future. By
combining 3D models with 2D geospatial data and providing access
through gaming style platforms and interfaces, it will be possible to
create authentic virtual versions of our world, rather than fictional
ones, that we may explore remotely. The application possibilities for
planning, emergency services and, of course, the military are also
now becoming apparent. For other 3D applications it will eventually
become important for analysis and decision making to have greater
intelligence in the data, such as topology. Without this type of
connectivity, extensive manual efforts will be necessary to try and
obtain more information relating to the nature of the data and make
joined-up decisions.
Question: What are some of the
environmental and sustainability applications that involve your
products and how have they fared?
Answer: Our products are all
involved in spatial data integration or data management exercises on
a worldwide basis. Some examples that will have an impact on
environmental and sustainability applications are:
- The Environment Agency used
1Spatial’s data re-engineering expertise to speed up spatial
queries and image rendering by simplifying the data within its
National Flood and Coastal Defence Database (NFCDD) system. An
initial prototype exercise showed that there were significant
benefits to be gained from generalising the LIDAR data using
persistent topology. Image rendering showed a 115% increase in
speed, whilst spatial queries using address points showed a 229%
increase. There was also an accompanying increase in the accuracy of
the query results returned.
- Rural Payments under the Common
Agricultural Policy have data quality challenges associated with
aligning raster plots with the national topographic vector datasets.
INGA, the Portuguese Ministry of Agriculture, is implementing a
solution to ensure more efficient administration of the system which
co-ordinates and pays EU subsidies to over 400,000 Portuguese
farmers per year. This work has been done in collaboration with
Intergraph, a member of the 1Spatial Community.
- CEH (Centre for Ecology and
Hydrology, part of the Natural Environment Research Council)
initially undertook a joint feasibility study with 1Spatial and
Ordnance Survey Great Britain to generalise OS MasterMap® data
as a potential foundation for the Land Cover Map (LCM) 2007. This is
now at production rollout stage and will form the basis of the UK’s
submission for the Coordination of information on the environment
(CORINE) programme.
- MidCoast Water, Australia
purchased Radius Topology from 1Spatial to improve asset data
quality and made substantial savings, as previously mentioned in
question 2.
Question: Today, more
people are interested in extracting 'value' from all the costs
involved in their data holdings. Could you elaborate on how value
could be realised more fully and provide a few examples as well?
Answer: There has been a huge
amount of investment made in the collection of both 2D and 3D data
over the years and this process shows no signs of slowing down. In
many cases it could be argued that these data represent an
organisation’s biggest and most valuable asset. Software, databases
and even staff will come and go, technologies will change, but the
data remains. However, very often the knowledge will disappear or be
isolated and the data itself will drift and slowly deteriorate.
In order to extract the maximum use
from these data (maximum value from the asset) two things are needed.
Firstly, we need to store all the intelligence, knowledge and wisdom
about these data alongside the relevant data and not have them locked
in the logic of some software application. Secondly, we must be able
to communicate how these data may be used. Metadata takes us a long
way towards the first of these aims, but the ability to define the
quality in terms of fitness for purpose will enable us to solve the
second issue; this will allow us to fully recognise or realise the
value.
As an example, Pira International Ltd
noted in a report in 2000 that the spatial component of data held
within the Public Sector across Europe was over €36bn, and growing
rapidly. At that time, the re-collection cost of these datasets was
estimated at in excess of €100bn - one can only imagine the total
investment today. At that same time, the estimated costs for North
America were $200bn. This was almost 10 years ago and there has been
a data deluge since that period. By assessing the fitness for purpose
for spatial data, organisations can take large strides towards
realising the value.
Question: “We have been doing
it this way for a long time and it is the procedure we use to collect
our information and make our decisions.” How often do you hear
this? What pieces of wisdom would you add to this line of thinking?
Answer: Unfortunately it is
human nature to respond in this way, either because change threatens
the status quo in terms of business processes and could be seen to
entail additional work, or because it can be perceived as threatening
job security or highlighting problems that may have been well hidden
until procedures were reviewed and adapted. So the answer is that we
hear it very often.
It’s a very open statement, but
change is the only constant in business life, so we would encourage
all geospatial professionals to consider this fact when thinking
about their spatial data management. It’s only by challenging the
status quo and reviewing existing processes and procedures that you
can improve them and the overall effectiveness of your operation.
We believe that quality underpins
analysis, planning and decision making based on spatial data, so we
would encourage you to step back from the way you have been doing it
and think about how you could be doing it better, starting with an
assessment of your data’s fitness for use.
Question: What is unique
about your upcoming 1Spatial Conference in the UK and what might I
learn by attending? What are some of the topics that will be
discussed and how can I find out more about it?
Answer: The 1Spatial Conference
brings together a wide range of organisations to consider
industry-relevant topics, as opposed to simply being a user group or
an event limited to one specific area of the market. The experiences
shared and the problems addressed by the conference are global in
nature and cross into all sectors of the geospatial community, which
is why the Conference will highlight the topics of Spatial
Data Quality, Spatial Data Infrastructures and CAD to GIS
convergence.
The Conference aims to provide
practical advice and share experiences, for example in trying to
achieve INSPIRE goals. There will be a free INSPIRE seminar
and metadata workshop. Speakers from the European Commission Joint
Research Centre and EuroGeographics, and the Chief Executive of Ordnance
Survey of Northern Ireland, will also provide insights on this topic.
Two useful opportunities in this area
concern INSPIRE and DNF activities; delegates can come and learn
about tools and processes that can be used for the discovery of
metadata that can subsequently be published in catalogues for other
users to assess its fitness for use. Published metadata consists of
data quality items (qualitative and quantitative metrics) and is
encoded in a standard form (ISO 19139 Metadata – XML schema
implementation). The final results of the conformance tests using
Radius Studio are obtained in the form of metadata, which is
compliant with the conceptual model of ISO 19115 Metadata
and encoded in the form recommended in ISO 19139. There will
also be a DNF demonstration that will highlight the benefits of
Unique IDs and how users can profit from data providers supplying
stable identifiers with lifecycles. It will also highlight how the
spatial data supply chain can use the lifecycle information once it
has been generated.
The speaker line-up is exceptional and
provides a comprehensive insight into a range of geospatial areas,
including CAD/GIS convergence, database technologies, industry
standards and best practice, Open Source and SDIs.
You can find out more, and register for
the Conference, by visiting the conference web pages at
www.1spatial.com/conference