Work towards supporting data sets containing more records
Assignee: Andy Dufilie
Category: Internal Code Refactoring
Required by: Grand Rapids, Michigan
#3 Updated by Andy Dufilie over 6 years ago
Here's what the code is doing now.
There are two sets of tiles: geometry tiles and metadata tiles. The geometry detail is not downloaded until you zoom in far enough to see the shapes. When you first load a geometry column, the client requests the complete list of "metadata tiles", which contain keys and bounding boxes. This takes a long time if there are a lot of records. The code can easily be changed so that it only downloads the metadata tiles that would actually be rendered at the current zoom level.
One problem is that if we don't download all the metadata tiles, a query using your custom radius or polygon won't find the keys whose metadata hasn't been downloaded yet, because the spatial query is computed on the client side. Likewise, if you draw a selection rectangle on the map before the keys have been downloaded, you won't get the same selection result as you would with all the keys and bounding boxes downloaded.
Another problem is that we don't cancel any download requests yet; there's no logic in the client that determines whether a pending download is still needed. That logic can be added, along with logic that prioritizes the parsing of already-downloaded data based on the current zoom. We could also store AMF3-encoded objects that the client wouldn't have to parse as much (AMF = Action Message Format).
When you import a shapefile in the Admin Console, it creates a table containing a 3-dimensional bounding box (x,y,z) for each tile and a binary blob of data for the tile. The 3-D bounding boxes for the tiles are sent to the client so the client knows which tiles it needs to request. Based on the current zoom level of the map, individual tiles are requested as needed. The geometry data (coordinates for the polygon vertices) is not readily available for querying on the server. Right now, to compute a polygon intersection, the binary blobs must be downloaded and parsed on the client side. If you want to do spatial queries on the server side, we may be better off using a WFS server, since all that functionality is already implemented there. If we start using server-side spatial queries, the client probing/selection code would need refactoring, because each query becomes asynchronous instead of an instant client-side KD-tree lookup.
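A minimal Python sketch of that filtering step, assuming each metadata tile is described by an ID and a 2-D bounding box, and approximating "would be rendered" as "overlaps the visible map extent". The names `Bounds`, `MetadataTile` and `tiles_to_request` are hypothetical, not Weave's ActionScript API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Axis-aligned 2-D bounding box in map coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def overlaps(self, other: "Bounds") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (other.x_min > self.x_max or other.x_max < self.x_min
                    or other.y_min > self.y_max or other.y_max < self.y_min)


@dataclass(frozen=True)
class MetadataTile:
    """Hypothetical descriptor: a tile ID plus the extent of the records (keys + boxes) it holds."""
    tile_id: int
    bounds: Bounds


def tiles_to_request(tiles, viewport: Bounds, downloaded: set) -> list:
    """Return IDs of metadata tiles that overlap the viewport and have not been fetched yet."""
    return [t.tile_id for t in tiles
            if t.tile_id not in downloaded and t.bounds.overlaps(viewport)]
```

Calling `tiles_to_request` again after every pan or zoom would fetch only the newly exposed tiles, which is the trade-off the next paragraph describes: anything outside the viewport stays unknown to client-side queries.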
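To illustrate why the results differ, here is a sketch of a client-side rectangle selection that can only test the bounding boxes it already has. The keys and coordinates are made up, and `Box` and `select_keys` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box for one record's shape."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def intersects(self, other: "Box") -> bool:
        # Boxes intersect when their x ranges and y ranges both overlap.
        return (self.x_min <= other.x_max and other.x_min <= self.x_max
                and self.y_min <= other.y_max and other.y_min <= self.y_max)


def select_keys(selection: Box, known_boxes: dict) -> list:
    """Return the sorted keys whose *known* bounding boxes intersect the selection rectangle."""
    return sorted(key for key, box in known_boxes.items() if box.intersects(selection))


if __name__ == "__main__":
    all_boxes = {
        "tract_a": Box(0, 0, 2, 2),
        "tract_b": Box(3, 3, 4, 4),
        "tract_c": Box(1, 1, 1.1, 1.1),  # a small shape
    }
    # Simulate the state before the small shape's metadata tile has arrived.
    partial = {k: v for k, v in all_boxes.items() if k != "tract_c"}
    selection = Box(0.5, 0.5, 1.5, 1.5)
    print(select_keys(selection, all_boxes))  # ['tract_a', 'tract_c']
    print(select_keys(selection, partial))    # ['tract_a']
```

The same rectangle selects a different set of keys depending only on which metadata has been downloaded, which is exactly the inconsistency described above.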
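One way to add both cancellation and prioritization is a priority queue of pending tiles that can drop entries the current view no longer needs. This is a Python sketch using the lazy-deletion pattern from the `heapq` documentation; `TileRequestQueue` is hypothetical, and it only stops queued requests from being started (aborting a request already in flight is not shown).

```python
import heapq
import itertools


class TileRequestQueue:
    """Hypothetical queue of pending tile downloads, ordered by priority (lower value = sooner)."""

    def __init__(self):
        self._heap = []                    # entries: [priority, sequence, tile_id]
        self._entries = {}                 # tile_id -> its live heap entry
        self._counter = itertools.count()  # tie-breaker so equal priorities stay FIFO

    def request(self, tile_id, priority):
        """Queue a tile; requesting it again replaces its old priority."""
        self.cancel(tile_id)
        entry = [priority, next(self._counter), tile_id]
        self._entries[tile_id] = entry
        heapq.heappush(self._heap, entry)

    def cancel(self, tile_id):
        """Mark a pending tile as no longer needed; it is skipped when popped."""
        entry = self._entries.pop(tile_id, None)
        if entry is not None:
            entry[2] = None

    def retain_only(self, needed):
        """Cancel every pending tile that the current view does not need."""
        for tile_id in list(self._entries):
            if tile_id not in needed:
                self.cancel(tile_id)

    def pop_next(self):
        """Return the highest-priority tile still needed, or None if nothing is pending."""
        while self._heap:
            _, _, tile_id = heapq.heappop(self._heap)
            if tile_id is not None:
                del self._entries[tile_id]
                return tile_id
        return None
```

The priority value could be, for example, each tile's distance from the center of the view, recomputed (via `request`) whenever the zoom changes; `retain_only` would be called with the set of tiles the new view needs.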
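A sketch of how 3-D tile bounds could drive which geometry tiles get requested. It assumes (this is not stated above) that z holds an importance value for each vertex, so a tile whose highest z falls below a zoom-dependent threshold adds no visible detail. The one-pixel-area threshold and all names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileBounds3D:
    """Hypothetical 3-D tile bounds: x/y extent plus the range of z values in the tile.

    Assumption: z measures how important a vertex is to a shape's outline, so
    high-z vertices are needed at every zoom and low-z ones only when zoomed in.
    """
    x_min: float
    y_min: float
    z_min: float
    x_max: float
    y_max: float
    z_max: float


def min_importance_for_zoom(map_units_per_pixel: float) -> float:
    # Assumption: detail smaller than one pixel's area would not change the drawing.
    return map_units_per_pixel ** 2


def needed_tiles(tiles, view, map_units_per_pixel: float) -> list:
    """Return indices of tiles that overlap the 2-D view and hold vertices worth drawing."""
    x_min, y_min, x_max, y_max = view
    threshold = min_importance_for_zoom(map_units_per_pixel)
    return [i for i, t in enumerate(tiles)
            if t.x_min <= x_max and x_min <= t.x_max      # x ranges overlap
            and t.y_min <= y_max and y_min <= t.y_max     # y ranges overlap
            and t.z_max >= threshold]                     # some vertex is important enough
```

Zooming in lowers `map_units_per_pixel`, which lowers the threshold, so tiles of finer detail start qualifying only when they could actually be seen.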
I designed this geometry tile system early in the project (about 2.5 years ago), and it works well for large shapefiles containing a small number of very detailed polygons (for example, a 40 MB US States shapefile displays very quickly). It does not use the GIS functionality available in PostGIS or other databases because we don't want to force users to use a specific database. I don't remember all the details now, but when I tested the spatial querying features of PostGIS and MySQL, they were too slow (and MySQL only supports bounding boxes).
The problem we are having is related to a large number of records. This problem is not limited to shapefiles; the attribute columns in Weave do not support a large number of records either. Any improvement we make to the geometry support would not improve the performance of normal string/number column data.
#4 Updated by Andy Dufilie over 6 years ago
I've added an option to request only the bounding box information visible at the current zoom level (now enabled by default, changeable through the global settings panel under "Advanced").
Selection and probing on the map will not catch shapes that are too small to be seen unless their bounding box info has been downloaded. To see this occur, follow these steps:
1. Open the Boston demo and wait for the shapes to finish downloading.
2. Draw a small selection rectangle inside Boston.
3. Draw a small zoom box inside Boston where you made the selection.
The smaller shapes will download when you zoom in, and you will see that they are not selected because their bounding boxes had not been downloaded when you made the selection.
These demos still have the problems I mentioned before: downloads are not cancelled, and the parsing of the shapes is not prioritized.
#7 Updated by Andy Dufilie about 6 years ago
We are currently making incremental changes based on profiling and our understanding of various inefficiencies to improve the performance of the code, but it is unknown whether or not Weave will be able to fully support 300,000 records in the current rendering system by the requested deadline (end of 2011).
One option is to start using WMS and WFS for displaying and querying the shapes. In that case, we would have to create an adapter for the particular WMS server you use, and we could write asynchronous wrapper functions for querying the WFS service.
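A minimal sketch of such an asynchronous wrapper, in Python. It builds a standard WFS 1.1.0 `GetFeature` request restricted by `bbox`; the server URL, the feature type name and the injected `fetch` function are caller-supplied placeholders, and the function names are hypothetical. The point is the calling convention: the probing/selection code would have to `await` the result instead of reading it immediately from a client-side index.

```python
import asyncio
from urllib.parse import urlencode


def wfs_get_feature_url(base_url: str, type_name: str, bbox) -> str:
    """Build a WFS 1.1.0 GetFeature request URL limited to a bounding box."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
    }
    return base_url + "?" + urlencode(params)


async def query_features(base_url: str, type_name: str, bbox, fetch):
    """Asynchronous wrapper: callers await the server's answer.

    `fetch` is any blocking function that takes a URL and returns the response
    body; it runs on a worker thread so the event loop stays responsive.
    """
    url = wfs_get_feature_url(base_url, type_name, bbox)
    return await asyncio.to_thread(fetch, url)  # requires Python 3.9+
```

`urlencode` percent-encodes the commas in `bbox` (`%2C`), which WFS servers decode as ordinary query-string data.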
Chris, what do you think of a WMS/WFS solution? Do you already have data on a server that supports those protocols?
#8 Updated by Chris Stefanich about 6 years ago
Currently we only have a WMS server set up, using Mapnik 0.7.1 and OGCServer (more info can be found here: https://github.com/mapnik/OGCServer). We do not have a WFS server set up, nor have we ever set one up, so it would be a new learning experience for us.
We do have geography data (not indicator data) loaded into PostGIS, so it could easily feed our WMS to make the raw geography tiles (such as blocks), but it would not fill them in with the indicator data. I think we would need to know more about the proposed solution and whether it would behave like the rest of Weave does.
#12 Updated by david percy about 6 years ago
I swear Andy said on the conference call yesterday that it does!
We even talked about support for the wmsGetFeatureInfo request!
I'm happy to make a feature request, but some clarification of what we were talking about on Wednesday would be useful first...
#16 Updated by david percy about 6 years ago
The different implementations that you run into are the TILING schemes; again, the list of layer types in OpenLayers will help enumerate these. ArcGIS Server has one tiling scheme, OpenStreetMap has another, etc.
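As an example of how tiling schemes differ, this Python sketch converts OpenStreetMap-style XYZ tile coordinates to longitude/latitude bounds, and shows that TMS numbers the rows of the same Web Mercator grid from the opposite end. ArcGIS Server tiling is not covered; the function names are illustrative.

```python
import math


def xyz_tile_bounds(z: int, x: int, y: int):
    """Return (west, south, east, north) in degrees for an OSM-style XYZ tile.

    In this scheme there are 2**z columns and rows, and row 0 is the northernmost.
    """
    n = 2 ** z

    def lon(col):
        # Columns divide the -180..180 longitude range evenly.
        return col / n * 360.0 - 180.0

    def lat(row):
        # Inverse Web Mercator projection of a row edge.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


def tms_to_xyz_row(z: int, y_tms: int) -> int:
    """TMS counts rows from the south, so the same tile has a flipped row index."""
    return (2 ** z - 1) - y_tms
```

A client adapter would need one such mapping per provider, which is why each tiling scheme tends to require its own layer type.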
There's an OGC initiative that standardizes tiling, and it's supported by several open source products...
I'll go file that feature request now :)
Kyle Monico wrote:
Weave doesn't support custom WMS currently. Many WMS providers use their own formats for requesting and encoding tiles. If you want it, please make a feature request :)
#17 Updated by Andy Dufilie about 6 years ago
The spatial index was being recreated too many times and it was being created all at once instead of asynchronously. I've changed it so it is now asynchronous, and the interface is now more responsive. It was unresponsive previously because ActionScript is single-threaded. Overall the idea is to eliminate unnecessary duplicate or extra work and make long computations asynchronous.
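The general technique can be sketched in Python with a generator: the work is split into batches, and a driver runs as many batches as fit in a per-frame time budget before returning control to the single-threaded UI loop. This is not Weave's ActionScript implementation; a plain dict stands in for the spatial index, and the function names, batch size and budget are hypothetical.

```python
import time


def build_index_incrementally(records, index, batch_size=1000):
    """Insert (key, bounds) records into `index` a batch at a time.

    Yields after each batch (True once everything is inserted), so the caller
    decides when to resume and can handle UI events in between.
    """
    for start in range(0, len(records), batch_size):
        for key, bounds in records[start:start + batch_size]:
            index[key] = bounds
        yield start + batch_size >= len(records)


def run_for_one_frame(task, budget_seconds=0.01):
    """Advance `task` until this frame's time budget is used; return True when finished."""
    deadline = time.perf_counter() + budget_seconds
    for done in task:
        if done:
            return True
        if time.perf_counter() >= deadline:
            return False  # resume from the same point on the next frame
    return True  # an empty task finishes immediately
```

A UI loop would call `run_for_one_frame(task)` once per frame until it returns True, rendering and handling input between calls instead of freezing while the whole index is built.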
#18 Updated by Andy Dufilie about 6 years ago
- Subject changed from Work towards supporting data sets containing 500,000 records to Work towards supporting data sets containing more records
- Description updated (diff)
I'm changing the subject of this issue because the existing one is too vague. It doesn't mention anything about the content of the records (number of columns? data type?) or how many visualizations of what type would be used. These things make a big difference.
#19 Updated by Chris Stefanich about 6 years ago
That is a significant speed increase in showing the geometries. Once we added an indicator to shade the map, however, it slowed way down.
I downloaded and compiled this build yesterday around 3:30 or so.
Making good progress, thanks!