Persistent and well-behaved identifiers

By Richard Duffield, Senior Consultant, GeoPlace.

Identifiers are having a bit of a “moment”.

For nearly two decades we have helped local government create and maintain the Unique Property Reference Number (UPRN) and Unique Street Reference Number (USRN), ensuring that they are nationally unique, persistent and otherwise well-behaved. Each local authority in England, Scotland and Wales allocates a 12-digit number known as the UPRN to each new “addressable object” at the earliest possible stage of its lifecycle.

This has never been particularly headline-grabbing work and doesn’t help you make friends at parties, but it has been incredibly effective at improving services and decision making in Great Britain. The UPRN now underpins many important aspects of government including the real-time transfer of incidents between emergency services, understanding citizens and delivering better services. It’s a helpful way to link people to places. The trend is spreading across the private sector too.

So, it is strange, although not surprising, to find that identifiers are becoming a little bit trendy – they even have their own festival (yep, really).

Eddie Copeland, Director of Government Innovation at NESTA, recently began a series of “ideas on a page” with a mention of the UPRN.

Back in 2016, Thomson Reuters published a report highlighting the importance of identifiers saying:

“Identifiers are fundamentally important in being able to form connections between data, which puts them at the heart of how we create value from structured data to make it meaningful”.

The report highlights how the potential of identifiers can be unlocked by web technologies and the growing world of Open Data. In parallel we’re finding that the UPRN is key to effective machine learning and this will be the subject of another GeoPlace blog post coming soon.

A recent blog post by open data activist Owen Boswarva provided a handy list of property-related identifiers available in Great Britain. Later in this post I’ll go on to detail how these are linked to the UPRN.

Leigh Dodds from the Open Data Institute followed up Owen’s blog with his own thread ‘Lets talk about identifiers’.

The subject is technical – when required I will delve into the detail and at times I’ll simplify for the sake of readability. If you’re interested in exploring the ideas in further detail, then contact me for a more detailed discussion.

So, first allow me to make an important technical clarification. The identifiers Owen highlighted all relate to property and address in some way – but they do not necessarily identify or locate a property or address.

For example, the Royal Mail identifier identifies a postal delivery point. This may be on the ground floor when your flat is on the 5th floor. Similarly, the meter identified by the MPAN (an asset identifier for your utility meter) for the same flat may be in the basement. The Land Registry Transaction ID identifies the transaction of you buying or selling your house, not the property. So, the identifiers may not have the same meaning or even represent the same location, but linking them is incredibly powerful.

Here goes…

Across England, Northern Ireland, Scotland and Wales the address data creators use a standardised system of Unique Property Reference Numbers (UPRNs). GeoPlace manages these centrally to ensure that they remain unique across territories. AddressBase products compile the UPRNs from these sources into a single data model based on the British Standard BS 7666: 2006 Spatial datasets for geographical referencing. To ensure conformance to the standard, for Northern Ireland we create additional records including “parent properties” and street records and allocate UPRNs to them.

In Northern Ireland they have an additional concept for buildings which has an identifier called the Unique Building Identifier and this is published alongside the UPRN in the Pointer dataset. The equivalent concept does not exist in England, Wales and Scotland and so the identifier is not required in these countries.

The Unique Delivery Point Reference Number (UDPRN) is created and maintained by Royal Mail and identifies each postal delivery point. It is available from Royal Mail and their resellers in the Postcode Address File (PAF). GeoPlace and local government work together to link the UDPRN to the UPRN, and the links are available from Ordnance Survey in AddressBase products.

Royal Mail also maintains a dataset of multiple residences – e.g. blocks of flats – which has its own identifier called the Unique Multiple Residence Reference Number (UMRRN). We have not linked the UPRN to this data as we see no value in doing so and have found no evidence of demand.

The Valuation Office Agency (VOA) values properties for the purpose of Council Tax and for business rates in England and Wales. As stated above, it is important to note that the VOA definition of property is built on a specific set of case law and is not standardised with any of the other datasets listed in this post. The VOA creates its own identifier called the Unique Address Reference Number (UARN) which they make available to some users in their ratings lists data. GeoPlace and local government create a link between UPRN and UARN and this is available from Ordnance Survey in AddressBase Premium and AddressBase Plus.

ADDRESS-POINT is a discontinued and unsupported legacy address product from Ordnance Survey which has been replaced by AddressBase products. ADDRESS-POINT featured an identifier called the Ordnance Survey Address Point Reference (OSAPR). This is no longer created, maintained or supported but may still be stored in some users’ legacy systems. To aid migration to the UPRN Ordnance Survey created a link between the identifiers and this is available from them on request.

The MasterMap Topography Layer Topographic Object Identifier (TOID) is an identifier for the spatial representation of the extent of an object as defined in the Ordnance Survey capture specification. GeoPlace creates a link between the UPRN and the TOID and Ordnance Survey make this available in AddressBase Premium and AddressBase Plus.

Land Registry Price Paid Data is available as open data and contains a Transaction Identifier which relates to that transaction, rather than an identifier for the property. A property can be sold many times so each property may have many Transaction Identifiers.  GeoPlace links this data to the UPRN and this link is available from Ordnance Survey and has been used successfully to build an Automated Valuation Model for properties.
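To make the one-to-many relationship concrete, here is a minimal sketch in Python. It assumes a hypothetical link table of (transaction identifier, UPRN) pairs; the identifiers and UPRNs are invented for illustration and are not real records or the real file layout.

```python
from collections import defaultdict

# Hypothetical link table: (transaction_id, uprn). All values are invented.
links = [
    ("TX-0001", "100000000001"),
    ("TX-0002", "100000000002"),
    ("TX-0003", "100000000001"),  # the same property sold a second time
]


def transactions_by_uprn(link_rows):
    """Group transaction identifiers under the property (UPRN) they relate to."""
    grouped = defaultdict(list)
    for transaction_id, uprn in link_rows:
        grouped[uprn].append(transaction_id)
    return dict(grouped)


sales = transactions_by_uprn(links)
```

Once the transactions are grouped by UPRN, a property's sale history becomes a simple lookup rather than an address-matching exercise.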

Ministry of Housing, Communities and Local Government (MHCLG) publishes Energy Performance of Buildings data sometimes known as EPC data. This data contains two identifiers: the LMK_KEY and the Certificate Hash. Currently the LMK_KEY is not unique and so the Certificate Hash is used as the unique identifier, although MHCLG says this will change. As the data is a register of certificates the identifier identifies the certificate or inspection, not the property. Over time a property may be inspected more than once and so multiple certificates may exist for one address. When the energy performance data is created it contains a property identifier called the UPRN but this is not the same as the UPRN found in the AddressBase products. This UPRN is not published in the EPC data and doing so would not help link the data and would be confusing for users. To help users use this data alongside AddressBase, GeoPlace links the Certificate Hash and LMK_KEY to the AddressBase UPRN. This data is available from Ordnance Survey on request. This link has been used successfully for energy efficiency analysis and predicting fires.
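Because one property can have several certificates, a common first step after linking is to pick the most recent certificate for each UPRN. The sketch below is one way to do that, under assumptions: the row layout (certificate identifier, UPRN, inspection date) and all values are hypothetical, not the published EPC schema.

```python
from datetime import date

# Hypothetical rows: (certificate_id, uprn, inspection_date). All invented.
certificates = [
    ("cert-a", "100000000001", date(2012, 5, 1)),
    ("cert-b", "100000000001", date(2018, 9, 30)),  # later inspection, same property
    ("cert-c", "100000000002", date(2015, 1, 15)),
]


def latest_certificate_per_uprn(rows):
    """Return {uprn: certificate_id}, keeping the most recently inspected certificate."""
    latest = {}
    for cert_id, uprn, inspected in rows:
        current = latest.get(uprn)
        if current is None or inspected > current[1]:
            latest[uprn] = (cert_id, inspected)
    return {uprn: cert_id for uprn, (cert_id, _) in latest.items()}


latest = latest_certificate_per_uprn(certificates)
```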

The energy industry makes widespread use of the Meter Point Administration Number (MPAN) for electricity and the Meter Point Reference Number (MPRN) for gas. Each is an asset identifier for your meter, rather than your property. As you may have multiple meters – for example one for gas and one for electricity – you may have more than one meter identifier for your property. The energy industry and its regulator – OFGEM – recognise the value of linking these numbers to a unique property identifier and are currently working to make this happen. One potential benefit is to make it easier and more reliable for consumers to switch energy providers. This will bring efficiency to the market and better prices and service to consumers.

Who knew a twelve-digit number could be so powerful!

The Companies House data identifies each company but does not necessarily describe the location as most people understand it. For example, a retail chain may have many outlets but would record the address of its head office with Companies House. While a UPRN could be added we’ve yet to hear of a good rationale.

To our knowledge there are no published links between the UPRN and:

  • Land Registry Title Number
  • Land Registry Index Polygon ID
  • Land Registry INSPIRE polygon ID
  • Flood Re
  • Food hygiene ratings
  • Planning inspectorate appeals

If you know of a dataset linking these to the UPRN or have a need for the links, then let us know.

Under a suitable licence, Open Data publishers (or the Open Data community) may add the UPRN to their data. Commercial data providers also add the UPRN to their data – for example PointX add the UPRN to their Points of Interest (POI) product. Other UPRN users – for example Land Registry and Flood Re – may hold links to their data internally but choose not to publish them.

Linking these datasets to each other without the UPRN would be crazy.

If you have seven datasets and you link each one to each of the others you would need to make twenty-one sets of links (7 × 6 ÷ 2). By linking each dataset to one master identifier – the UPRN – we reduce this to seven sets of links.
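The arithmetic is easy to check with a short Python sketch. For n datasets, linking every pair needs n(n-1)/2 sets of links, while linking each one to a single hub identifier needs only n. Seven datasets give 21 pairwise sets; counting the UPRN list itself as an eighth dataset gives 28. The dataset names below are just labels picked for illustration.

```python
from itertools import combinations


def pairwise_link_sets(n):
    """Sets of links needed when every dataset is linked to every other one."""
    return n * (n - 1) // 2


def hub_link_sets(n):
    """Sets of links needed when every dataset is linked to one master identifier."""
    return n


# Labels for illustration only.
datasets = ["PAF", "VOA", "TOID", "Price Paid", "EPC", "MPAN", "Companies House"]

# Enumerate the pairs explicitly as a cross-check on the formula.
pairs = list(combinations(datasets, 2))

print(len(pairs), pairwise_link_sets(len(datasets)), hub_link_sets(len(datasets)))
```

The gap widens quickly: the pairwise count grows with the square of the number of datasets, while the hub count grows in a straight line.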

Each of these datasets contains a subset of properties and so there will always be records left over if you try to link them together. By using the UPRN as the complete master list we can link all records (give or take a little bit of “devil in the detail” and quality issues in the source data).
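A small sketch shows why the complete master list matters. Here two hypothetical datasets each cover only some properties; linking them directly keeps only the overlap, while linking both to the master list keeps every property. All UPRNs and values are invented, and `link_on_master` is a name made up for this example.

```python
# Invented data: a complete master list and two partial datasets keyed by UPRN.
master_uprns = ["100000000001", "100000000002", "100000000003"]
council_tax = {"100000000001": "Band C", "100000000002": "Band D"}
epc = {"100000000002": "B", "100000000003": "E"}


def link_on_master(master, **datasets):
    """Build one record per master UPRN, with None where a dataset has no entry."""
    return {
        uprn: {name: data.get(uprn) for name, data in datasets.items()}
        for uprn in master
    }


# Direct linking of the two datasets keeps only properties present in both.
direct_overlap = set(council_tax) & set(epc)

# Linking through the master list keeps every property.
combined = link_on_master(master_uprns, council_tax=council_tax, epc=epc)
```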

It’s great to hear so much support for identifiers and data linking – however we also need to sound a note of caution. Data linking comes with many risks. As Thomson Reuters point out (emphasis added):

“The lack of identifiers, or the poor use of them, stifles the power of information gained from linking multiple datasets together. Some of these shortcomings might be overcome using intelligent search and fuzzy matching, but the lower precision of these techniques means that the data never reaches its full potential and there is little incentive to drive improvement of precision over time.”

When these links are used for important decisions and services such as mobilising a fire engine or switching energy providers then errors can be serious and irreversible. Interested parties should review the Multi agency incident transfer guidance, a protocol which allows incident records to be shared electronically from one emergency service to another through defined fields and values so that they can be injected into the receiving organisation’s application.

Links should not be made without due care and it is better not to make a link than to make an incorrect one. Chasing high match rates at the expense of quality is likely to cause serious problems when the data is put into live use.
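As an illustration of "better no link than a wrong link", here is a deliberately conservative sketch. It is not how GeoPlace matches addresses; it simply assumes a light normalisation step and only returns a UPRN when exactly one candidate matches, abstaining otherwise. The function names, addresses and UPRNs are all hypothetical.

```python
def normalise(address):
    """Upper-case, drop commas and collapse whitespace. A deliberately light touch."""
    return " ".join(address.upper().replace(",", " ").split())


def link_or_abstain(address, candidates):
    """Return a UPRN only when exactly one candidate matches; otherwise None."""
    wanted = normalise(address)
    hits = [uprn for uprn, text in candidates.items() if normalise(text) == wanted]
    return hits[0] if len(hits) == 1 else None


# Invented candidates; two of them normalise to the same text, so they are ambiguous.
candidates = {
    "100000000001": "1 High Street, Anytown",
    "100000000002": "Flat 1, 1 High Street, Anytown",
    "100000000003": "1 HIGH STREET ANYTOWN",
}

unique = link_or_abstain("flat 1 1 high street anytown", candidates)
ambiguous = link_or_abstain("1 High Street Anytown", candidates)
missing = link_or_abstain("3 Low Road, Anytown", candidates)
```

The ambiguous and missing cases both come back as None, which lowers the match rate but leaves those records for manual review rather than guessing.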

The datasets listed above – for example addresses, buildings, postal delivery points, electricity meters – are Master Data for Great Britain. It’s essential that appropriate data governance is in place for these datasets providing single, definitive sources.

This is equally true for the links between them. What would you do if you had two sets of links between the same datasets – and the links were different?  Like all master data it makes sense to have a single high-quality dataset of links from an authoritative source.
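To see the problem in miniature, the sketch below compares two hypothetical sets of links from the same source identifier to the UPRN and reports where they disagree. The identifiers are invented and `conflicting_links` is a name made up for this example.

```python
def conflicting_links(links_a, links_b):
    """Return {source_id: (uprn_a, uprn_b)} for source records the two sets link differently."""
    shared = links_a.keys() & links_b.keys()
    return {
        source_id: (links_a[source_id], links_b[source_id])
        for source_id in shared
        if links_a[source_id] != links_b[source_id]
    }


# Invented link sets from two different suppliers.
supplier_one = {"DP-1": "100000000001", "DP-2": "100000000002", "DP-3": "100000000003"}
supplier_two = {"DP-1": "100000000001", "DP-2": "100000000009", "DP-4": "100000000004"}

conflicts = conflicting_links(supplier_one, supplier_two)
```

Finding the conflicts is the easy part. Deciding which link is right needs knowledge of both source datasets, which is the argument for a single authoritative set of links.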

Ideally this will take the form of a collaboration between the two creators of the source datasets as they have a deep understanding of their data including their meaning and how they are specified, created and maintained. This is how we create the links to Royal Mail, VOA and Ordnance Survey datasets and why we’re willing to publish the results to our users.

Our vision is to move past the current practice of building separate datasets and then batch-matching them. Instead, we want to work with other bodies to link data “right at the start” (as Leigh Dodds advocates), for example when a council tax valuation is made by the VOA or a property sale is recorded by Land Registry. Maybe this will happen quickly, maybe it will take another two decades, but however long it takes we’ll continue to create and link the data and bang the drums.

In the meantime, we expect to continue to link data to the UPRN and to see others doing the same. We are working towards the ubiquitous use of UPRNs within the systems of public sector organisations.  The recent discussions around unique identifiers clarify why they are such an important aspect of a national data infrastructure. Questions around access to data are likely to continue and we welcome the debate.

If you have any questions or comments, then please do get in touch and let us know. We’d love to hear from you.

UPRN tying different address structures together

 
