558 views
# Accessing and Visualizing OSM Data using Python This is a document by Stefan Keller, Institute for Software, OST Campus Rapperswil, stefan.keller@ost.ch, accompanying a talk at VIScon Digital at ETH Zurich, 2020-10-10, 14:00 (30 min.), about **"[How to mine the planet - About accessing and visualizing OpenStreetMap open data and more](https://symposium.vis.ethz.ch/en/viscon2020/public/events/172)"**. :::info There's a [public **repository** on Gitlab](https://gitlab.com/geometalab/blogposts-tutorials-presentations/accessing-and-visualizing-openstreetmap-data-using-python) associated with this document. This document is CC-BY-SA and it's written in markdown (.md) note editor CodiMD. ::: This document is a compilation of solutions and tips for the following: 1. How to extract and geo-visualize OSM data on a map? 1. How to display OSM in an interactive map? 1. How to geocode own data containing addresses and geonames? 1. How to do a spatial join with 3rd party open data? 1. Outlook - More resources An introduction to [OpenStreetMap](https://openstreetmap.org) (including Overpass API and the Overpass Turbo GUI) and more can be found at [OpenSchoolMaps](https://openschoolmaps). ## A Show Case - Outdoor Table Tennis Tables! Let's take outdoor table tennis tables or table tennis courts as show case from the story "Zurich is the hottest place in Switzerland to play ping-pong (according to OpenStreetMap of course :wink:)": see [PingPongMap](https://pingpongmap.net/8495) or [MyBinder](https://mybinder.org/v2/gh/vulpec/pingpong_map/master?filepath=Pingpong-map_Zurich.ipynb). ![](https://md.coredump.ch/uploads/upload_3b673c0f2c05be4675c5aadb1f2ca6fb.png) *Figure 1: Playing table tennis table outdoors. (Source: www.concretesports.co.uk)* ## What is OpenStreetMap? - A Short intro for techies OpenStreetMap (OSM) is a free, editable map of the whole world, a crowdsourced project of open geospatial data. It's the largest community in the geospatial domain, 2nd after Wikimedia regarding community size, and 2nd after Google Maps regarding market share of base maps, Points-of-Interest (POI) etc.. It covers base maps, POI, routing and geocoding (building addresses). The data structure consists of key-values, called "Tags", as attributes ("NoSQL Schema"), and it has a topological vector-geometry model, based on "Nodes", "Ways" plus "Relations": * Node: osm_id, lat/lon coordinates, set of tags. * Way: osm_id, array of Node osm_ids (foreign keys), set of tags. The OSM data structure is quite different from tradition Geographic Information Systems (GIS) (no layers) and from a typical relational database model (no distinct attributes, like name, class). This means among others: * It's still possible to manage Point, Lines and Polygons (polygons are either closed ways or a relation referring to set of ways). * Real world objects can be modeled either as Point, Line and Polygon. * etc. An entity in OSM is just it's visual built parts - only rarely the function. Details are added piecwise. There's no strict "conceptual modeling" of en "entity" like in Software Engineering and in GIS services. This means, that an element like the following represent no single aspects * church: is a building, is a historic place, and a disco, ... * bath: private swimming pool, a public bath, a park with bathing facilities, a water park, ... See e.g. this Overpass query which tries to catch all baths objects in OSM: ``` [out:json]; area["name"="Schweiz/Suisse/Svizzera/Svizra"]; ( nwr[sport=swimming][name](area); nwr[leisure=swimming_area][name]; nwr[leisure=water_park]; nwr[amenity=public_bath]; ); out center; ``` :::info **TIP:** Overpass... ::: ## Preliminaries: Software Prerequisite for this solution is that the following software is installed (see bibliography and resources below): * Python 3, * Jupyter notebook, * Some libraries (see ```environment.yml``` of Jupyter notebook). ## Extracting, processing and geo-visualizing OSM data on a map 1. Load OSM data (step 1), 2. process and save it as GeoJSON file (step 2) 3. geovisualize file as static map with base map. ### Solution with ~50 lines of Python code First find out what the tag, i.a. the key/value of table tennis tables are: it's ```sport=table_tennis``` together with ```leisure=pitch```. Let's check this with Overpass-Turbo GUI with this [link](https://osm.li/VxG). ``` /* Table Tennis Tables */ [out:json]; node["sport"="table_tennis"]({{bbox}}); out body; >; out skel qt; ``` This is a Python script solution with about 50 lines of Python code. We will use the same code in the next chapter using Jupyter. ```python import plotly.express as px import geopandas as gpd import matplotlib.pyplot as plt from collections import namedtuple from osmxtract import overpass from osm2geojson import json2geojson OVERPASS_ENDPOINT = "http://overpass.osm.ch/api/interpreter" # BBox with components ordered as expected by overpass.ql_query Bbox = namedtuple("Bbox", ["south", "west", "north", "east"]) Tag = namedtuple("Tag", ["key", "values"]) def main(): tag = Tag(key="sport", values=["table_tennis"]) bbox_zurich = Bbox(west=8.471668, south=47.348834, east=8.600454, north=47.434379) table_tennis_gdf = load_osm_from_overpass(bbox_zurich, tag) gdf = table_tennis_gdf.to_crs(epsg=3857) gdf.geometry = gdf.geometry.centroid gdf = gdf.to_crs(epsg=4236) fig = plot_gdf_on_map(gdf, title="Zurich Table Tennis") # if no window pops up, uncomment the following line # and open the standalone_map.html in your browser # fig.write_html("standalone_map.html") fig.show() def load_osm_from_overpass(bbox, tag, crs="epsg:4326") -> gpd.GeoDataFrame: geojson = load_osm_from_overpass_geojson(bbox, tag, crs) return gpd.GeoDataFrame.from_features(geojson, crs=crs) # TODO: Issue is that in OSM an object can be point or linestring or area # Solution: See https://gis.stackexchange.com/questions/302430/polygon-to-point-in-geopandas def load_osm_from_overpass_geojson(bbox, tag, crs="epsg:4326"): values = None if "*" in tag.values else tag.values query = overpass.ql_query(bbox, tag=tag.key, values=values) response = overpass.request(query, endpoint=OVERPASS_ENDPOINT) return json2geojson(response) def plot_gdf_on_map(gdf, color="k", title=""): fig = px.scatter_mapbox( gdf, lat=gdf.geometry.y, lon=gdf.geometry.x, color_discrete_sequence=["fuchsia"], height=500, zoom=11, ) fig.update_layout(title=title, mapbox_style="open-street-map") fig.update_layout(margin={"r": 0, "l": 0, "b": 0}) return fig if __name__ == "__main__": main() ``` *Listing 1: The Jupyter Notebook script in Python.* ### The result visualized with Matplotlib ![](https://md.coredump.ch/uploads/upload_41e2d09cb3fd2501e2d13acc135b7c7f.png) *Figure 2: Map of table tennis tables in the city of Zurich. Created with Python and Matplotlib. (Source: Own work)* ## Interactive map in a Jupyter notebook Python code can be used as above. You just have to call it directly instead of `main()`. ### Jupyter Notebooks with library Folium There are many resources on the web for an introduction to Jupyter. So this is just a brief introduction: A Jupyter notebook is a web-based interactive software environment with which Jupyter notebook documents can be created. A Jupyter notebook document is a JSON file (with extension ".ipynb"). It contains a list of input and output cells, each of which can contain code, text and plots (i.e. maps). It is very easy to execute a notebook document on the free web service MyBinder, for example. See `> "Launch on Binder"` at the bottom of the chapter in the repository mentioned above. ### Publish map on the web Next, we want to publish the map created in Jupyter Notebook simply on the web, without backend, i.e. as generated HTML5 files (incl. Javascript and CSS). This can be done by using `"export"` with Folium: see repository. That's it for now, folks :smile: - at least regarding accessing and visualizing OSM data using Python! ## The "magic 3 steps" of location analytics The "magic 3 steps" of location analytics consist of the combination of own data with foreign open data. This so-called "Geographic Data Enrichment" process ([here's a post](https://www.redpointglobal.com/blog/what-is-data-enrichment/)) is possible, since location is "the universal foreign key". One of the crucial processes and services is the enhancement of own data - which most probably contains a geoname, or an address or such - using a so-called geocoding, which is a quite a topic on it's own. Geocoding is the assignment of a coordinate given a postal address or a geoname. So, given geonames or address data, 1. step one is to geocode own data, which should contain an address or geoname (i.e. adding a coordinate attribute), then 2. step two is to spatially join (i.e. combining) it with 3rd party geodata (like table tennis tables) using a geospatial software (like PostGIS), then finally 3. step three, do the data analysis or geovisualization. We'll show this using PostgreSQL with extensions PostGIS and using the built-in language pl/pgsql plus Python. :::info TIP: An easy alternative to geocoding - besides the approach with Python and PostgreSQL+PostGIS we show here - would be using this [online service from Localfocus.nl](https://geocode.localfocus.nl/). For commercial geocoders see also ["Help! I need a geocoder"](https://getlon.lat/). ::: ### Step one: Geocoding addresses and geonames Let's look at the core of a geocoding function in Python, which is wrapped as a PostgreSQL function in PL/pgSQL language (source: [*](https://etherpad.wikimedia.org/p/Location-the-Universal-Key)). :::info TIP: In case ```CREATE LANGUAGE plpython3u;``` fails, install Python 3 for PostgreSQL as follows: * Install Python 3: See https://www.python.org/downloads/ . You can also install QGIS 3 which comes along with Python. * The system environment variables PATH and PYTHONPATH need to point to the installed Python libs. To do this, you need to edit or add them (once for all accounts). In Windows this can be done with "Start" in the lower left corner and "Edit system environment variables". Example (Windows): ``` PATH=$PATH$;C:\Program Files\QGIS 3.8\apps\Python37 PYTHONPATH=C:\Program Files\QGIS 3.8\apps\Python37 ``` * Then PostgreSQL Server must be restarted. See e.g. https://tableplus.com/blog/2018/10/how-to-start-stop-restart-postgresql-server.html . Example solution for Windows * Open CMD as Administrator * Open Run Window by Winkey + R * Type services.msc * Search Postgres service based on version installed. * Click stop, start or restart the service option. ::: ```sql CREATE LANGUAGE plpythonu; CREATE OR REPLACE FUNCTION geocode(address text) RETURNS text AS $$ """Returns WKT geometry which is EMPTY if address was not found. Please respect the Nominatim free usage policy: https://operations.osmfoundation.org/policies/nominatim/ """ import requests from time import sleep base_url = 'https://nominatim.openstreetmap.org/search?' params = { 'q': address, 'format': 'json', 'limit': 1, 'User-Agent': 'geocode hack (pls. replace)' } response = requests.get(base_url, params=params) loc = response.json() if len(loc) == 0: wkt = 'SRID=4326;POINT EMPTY' else: wkt = f'SRID=4326;POINT({loc[0]['lon']} {loc[0]['lat']})' sleep(1) # Throttle, to avoid overloading Nominatim return wkt $$ LANGUAGE plpythonu VOLATILE COST 1000; -- Test: SELECT ST_AsGeoJSON(geocode('Vulkanstrasse 106, Zürich')); ``` *Listing 2: Geocode hack with Nominatim API in Python.* Discussion: * This is just the code. It isn't yet run. We'll do this within a single SQL query in step two below (BTW.: Pls. respect the [Nominatim free usage policy](https://operations.osmfoundation.org/policies/nominatim/)). * WKT -"Well Known Text" for encoding of geometries; example "POINT(47.36829, 8.54836)". * ```SRID=4326``` - Spatial Reference ID, an id for the coordinate reference system WGS84, known for lat/lon coordinates. * Pls. replace "geocode hack" from ```User-Agent``` with a user agent name specific to you. Else you risk being banned from the Nominatim service. ### Step two: Joining spatially Now - given we've the geocoding function in place - we do a spatial join. This means combining our data - like three addresses - with 3rd party geodata - like table tennis tables - using a geospatial software (like PostGIS). We will use a dataset of OSM Switzerland imported into a PostgreSQL/PostGIS instance using the osm2pgsql bulk loader tool. And that's what we' like to analyse: **Give me a list of five closest table tennis tables with related 'customers'. Output: customer_id, distance, osm_id + osm_type (Node, Way, Relation), geometry.**: * 1: Vulkanstrasse 106, Zürich * 2: Bellevue, 8001 Zürich * 3: Weihnachtsmann ``` WITH my_customers_tmp(id, address, geom) AS ( SELECT id, address, ST_Transform(ST_MakeValid(GeomFromEWKT(geocode(address))),3857) AS geom FROM (values (1,'Vulkanstrasse 106, Zürich'), (2,'Bellevue, 8001 Zürich'), (3,'Weihnachtsmann') ) address(id,address) ), poi_tmp AS ( SELECT osm_id, gtype, -- POI address would need to reverse_geocode! tags, way AS geom FROM osm_poi WHERE tags->'sport' IN ('table_tennis') ) SELECT my_customers_tmp.id as customer_id, ST_Distance(my_customers_tmp.geom, poi_tmp.geom)::integer AS distance, poi_tmp.osm_id, poi_tmp.gtype, ST_Transform(poi_tmp.geom,4326) AS geom FROM my_customers_tmp, poi_tmp ORDER BY my_customers_tmp.geom <-> poi_tmp.geom LIMIT 5; ``` *Listing 3: SQL query of the geocode hack with Nominatim API in Python and PostgreSQL/PostGIS.* Discussion: * This is a modern SQL query using a Common Table Expression (a WITH statement). It puts the geocoding result in a temporary table 'my_customers_tmp'. * The second temporary table 'poi_tmp', stores the table tennis tables, queries from base bale 'osm_poi', which is a view on osm_point and osm_polygon from a OSM databases containing Swiss data (osm_poi thus is similar to the "nwr and center" in Overpass query before). * The note "POI address would need to reverse_geocode!" tries to indicate, that one probably would want to get addresses for the POI - here: table tennis tables - found. ### Step 3: Doing the data analysis... This is the result of the spatial SQL query above: ![](https://md.coredump.ch/uploads/upload_b2d85cba7d4b239f9af295ca47e78f49.png) *Table 1: Restult of geocode hack with Nominatim API in Python and PostgreSQL/PostGIS.* Discussion: * Customer 2 from Bellevue is closest to a table tennis table, distance 438 meters. * Customer 1 from Vulkanstrasse is second, distance 693 m. * Customer 3, the Weihnachtsmann :santa: is'nt appearing in the data output of the query - the geocoder did not find an address for this place. ## Outlook - What's next? We've mainly used the following software: The Overpass API service and the PostgreSQL/PostGIS tool (see fig. 3). ![](https://md.coredump.ch/uploads/upload_30e57852d7a7a7246c0ed1b3b3231a67.png) *Figure 3: Decision tree of tools to access OSM data. Note: Blocks in red are the services and tools used in this document. (Source: Inspired by [GIScience News Blog, Sep 10th, 2020 by Moritz Schott](http://k1z.blog.uni-heidelberg.de/2020/09/10/the-future-of-working-with-osm-data/))* The advantage of the Overpass API service is, that it's a service and recent. The drawbacks of the service are that the data can't be larger than few thousand features, and weird and the limited query language. The advantage of PostgreSQL/PostGIS is the spatial SQL query interface (see p.ex. these [Tipps and Tricks](https://giswiki.hsr.ch/PostGIS_-_Tipps_und_Tricks)). The drawbacks of this tool are the installation time and local resources needed - plus the issue that PostGIS queries (nor osm2pgsql bulk loader) can't yet be parallelized - although still fast enough for many applications if used properly with the right indexes and operators (e.g. KNN '<->'). Beside the special webservice Overpass API called from Python, there are other approaches like SQL together with pgRouting which is another PostgreSQL extension like PostGIS. There is also an interesting university course which includes ["Download and visualize OpenStreetMap data with OSMnx"](https://automating-gis-processes.github.io/site/notebooks/L6/retrieve_osm_data.html#Download-and-visualize-OpenStreetMap-data-with-OSMnx). OSMnx is quite useful and the course covers many GIS topics, from vector overlay and network analysis to raster data analysis. Finally data is everything. So the question remains: How to discover more open geodata? There is no simple answer on this and requires much experience. See e.g. opendata.swiss or "[Freie Geodaten](https://giswiki.hsr.ch/Freie_Geodaten)" for start. Last but not least, the learning portal [OpenSchoolMaps](https://openschoolmaps.ch) should be mentioned again, which is being periodically updated with new learning materials. ## Bibliography and Resources Installing Jupyter software locally (Windows): 1. [Anaconda - software distribution tool](https://docs.anaconda.com/anaconda/install/windows/) 2. [Jupyter installation](https://jupyter.org/install) (more: https://mas-dse.github.io/startup/anaconda-windows-install/ , https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html) Jupyter notebooks / How-to's: * Different open source tools for geospatial data visualization in Jupyter Notebooks (Python): https://medium.com/@bartomolina/geospatial-data-visualization-in-jupyter-notebooks-ffa79e4ba7f8 * OSMNx: ["Download and visualize OpenStreetMap data with OSMnx"](https://automating-gis-processes.github.io/site/notebooks/L6/retrieve_osm_data.html#Download-and-visualize-OpenStreetMap-data-with-OSMnx) (Python) * "A Guide: Turning OpenStreetMap Location Data into ML Features - How to pull shops, restaurants, public transport modes and other local amenities into your ML models." by Daniel L J Thomas, Sep 17 2020. https://towardsdatascience.com/a-guide-turning-openstreetmap-location-data-into-ml-features-e687b66db210 (Note: includes OSMnx, Overpass, K-D Tree to calculate distances) Software libraries and frameworks for OSM and Python: * Python Lib "OSMnx" (Line Network, POI): https://osmnx.readthedocs.io/en/stable/osmnx.html#module-osmnx.pois und https://automating-gis-processes.github.io/site/notebooks/L6/retrieve_osm_data.html * Other Python Libs: "Overpass API python wrapper" by mvexel: https://github.com/mvexel/overpass-api-python-wrapper . OverPy: https://github.com/DinoTools/python-overpy . * "Awesome OpenStreetMap" - A curated list of 'everything' about OpenStreetMap (not yet so complete as one would wish): https://github.com/osmlab/awesome-openstreetmap#python Overpass API and GUI: * Overpass API: https://towardsdatascience.com/loading-data-from-openstreetmap-with-python-and-the-overpass-api-513882a27fd0 * Howto use Overpass: https://gis.stackexchange.com/questions/203300/how-to-download-all-osm-data-within-a-boundingbox-with-overpass * Typical Overpass Query: ```` [out:json]; area["ISO3166-2"="CH-ZH"]; ( nwr[sport="table_tennis"](area); nwr[leisure="table_tennis_table"]; ); out center; ````