Derive Metadata from Usage

Overview

GeoNode should be able to derive information about a dataset from the way that it is used, without requiring users to directly input information.

Use Cases

Currently:

Dozens of users have uploaded several layers to a GeoNode site and used them in maps illustrating various hazards for various regions. Being regular humans on tight deadlines, they provide no metadata for these layers (after all, they can build their maps without it). Now Tom sets out to put together an aggregate hazard map for a larger region, but because the layers carry minimal metadata, he has trouble finding the data he needs. Tom can see in the map viewer that the site holds plenty of hazard data, so he is frustrated and gives up.

With this feature:

Those same users have uploaded those same layers to the same GeoNode site and built the same maps on top of them. This time, though, GeoNode is able to add some keywords based on the titles and descriptions of the maps that include those layers. Now, when Tom searches for hazard data, all of those layers show up because of the maps they are in (and maybe some kindly users have added them to tags or topic lists of their own). This time Tom is able to build his map easily.

Specification

This feature should be minimally visible to users and administrators alike, although perhaps administrators and data owners should be able to view statistics such as frequency of usage and the auto-extracted tags for their data.

Inferred metadata should be statistically weighted based on things like how many users have added a tag, how many maps include a layer, etc. Explicitly provided metadata should have a higher weight than metadata derived from only a handful of observed usages.
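
As a rough illustration of the weighting idea, the sketch below scores each tag by the number of distinct sources (users or maps) that suggested it and adds a fixed bonus for tags the data owner supplied explicitly. The function name `score_tags`, the two weight constants, and their values are assumptions made for this example; the proposal does not yet define a scoring formula.

```python
from collections import defaultdict

# Hypothetical weights: the proposal only says explicit metadata should
# outweigh metadata inferred from a handful of observations.
EXPLICIT_WEIGHT = 5.0
INFERRED_WEIGHT = 1.0


def score_tags(explicit_tags, inferred_observations):
    """Return a {tag: score} dict for one layer.

    explicit_tags: tags entered directly by the data owner.
    inferred_observations: iterable of (tag, source_id) pairs, where
        source_id identifies the user or map the tag was derived from.
    """
    # Count each source once per tag so repeated observations from the
    # same map or user do not inflate the score.
    sources = defaultdict(set)
    for tag, source_id in inferred_observations:
        sources[tag.lower()].add(source_id)

    scores = {tag: INFERRED_WEIGHT * len(ids) for tag, ids in sources.items()}

    # Explicit tags receive a fixed bonus on top of any inferred support.
    for tag in explicit_tags:
        key = tag.lower()
        scores[key] = scores.get(key, 0.0) + EXPLICIT_WEIGHT
    return scores


scores = score_tags(
    ["flood"],
    [("hazard", "map1"), ("hazard", "map2"), ("Flood", "map1"), ("hazard", "map1")],
)
```

With these assumed weights, a tag inferred from fewer than five distinct sources ranks below an explicitly provided one. A real implementation would likely tune the weights or replace them with a proper statistical model.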

Technical Details

(Links to the feature specification and technical specification will be added when they are complete.)
Some more research and fleshing-out of these ideas is needed, but for starters, this feature would require:
* work in GeoNode to record usage of the data layers in maps and directly provided “social” metadata like tags and topic lists
* work in GeoServer to record usage of data layers that bypasses the GeoNode webapp
* some processing (probably batched) to actually extract meaningful statistics from a sea of observations
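
The batched processing step in the list above could look something like the following minimal sketch. It assumes usage has already been recorded as simple (layer name, map title) pairs and counts, per layer, how many map titles mention each keyword. The record shape, the stop-word list, and the function names `extract_keywords` and `aggregate_usage` are all hypothetical; none of them is an existing GeoNode or GeoServer API.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real deployment would need a fuller,
# language-aware one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "map", "maps"}


def extract_keywords(text):
    """Return the set of candidate keywords found in a title or description."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def aggregate_usage(observations):
    """Build {layer_name: Counter({keyword: number_of_maps})}.

    observations: iterable of (layer_name, map_title) pairs, one per
    recorded use of a layer in a map.
    """
    counts = defaultdict(Counter)
    for layer_name, map_title in observations:
        # A keyword is counted once per map, however often the title repeats it.
        counts[layer_name].update(extract_keywords(map_title))
    return counts


observations = [
    ("roads", "Flood hazard map of the coast"),
    ("roads", "Coastal flood risk"),
    ("rivers", "Flood hazard map"),
]
keyword_counts = aggregate_usage(observations)
```

The resulting per-layer counts are the kind of raw statistic that a weighting step could then turn into ranked keywords.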

Estimated costs

Ballpark costs to be determined