Mingke Erin Li

Summary and Reflections on the OGC DGGS AI Pilot Project Panel Discussion

This post summarizes the panel discussion held as part of the OGC DGGS × AI pilot project. The conversation brought together pilot contributors and geospatial AI researchers to reflect on the current state and future potential of combining Discrete Global Grid Systems (DGGS) with artificial intelligence.

From Pilots to Possibilities

The discussion started from specific pilots, such as DGGS-based flood risk and common operating picture applications, then moved into more speculative territory involving geospatial foundation models, attention over DGGS hierarchies, and vector databases of zone embeddings. Ontology and semantics reappeared throughout, especially when panellists discussed similarity between zones or the selection of services and datasets for reasoning workflows.

Michael (GeoInsight)

Michael framed the intersection of DGGS and AI in two complementary ways: first, LLMs that query DGGS services, as in the pilot, where the model simply calls DGGS APIs to retrieve zonal data; second, DGGS zones as tokens for AI models, which moves closer to classic machine learning and custom model training. He emphasized DGGS as a spatial tokenization of the world that aligns with the GeoAI idea of spatial tokens, letting us represent space as discrete, reusable units similar to text tokens. Practically, he described building tables where each zone accumulates attributes from many datasets, some zones are labeled with phenomena such as flood or high risk, and models are then trained on these structured representations. Michael also stressed that DGGS can be seen as a graph whose nodes are zones and whose edges encode adjacency and parent-child relationships, which makes it well suited to graph neural networks. Each zone can further be represented as a vector of attributes - a digital fingerprint capturing climate, population, elevation and other values - so that similarity is defined by similar fingerprints, even when zones are far apart geographically. In his view, DGGS shows its true power at scale, when many datasets are integrated into rich multi-dimensional profiles per zone.
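Michael's "digital fingerprint" idea can be sketched in a few lines: each zone maps to a vector of normalized attribute values, and similarity is cosine similarity between fingerprints, independent of geographic distance. The zone IDs and attribute values below are invented purely for illustration.

```python
import math

# Hypothetical per-zone fingerprints: [climate index, population density, elevation],
# already normalized to comparable scales. Zone IDs and values are invented.
fingerprints = {
    "zone-A": [0.90, 0.80, 0.10],  # warm, dense, low-lying
    "zone-B": [0.85, 0.75, 0.15],  # similar profile, geographically far away
    "zone-C": [0.10, 0.05, 0.90],  # cold, sparse, high-altitude
}

def cosine_similarity(u, v):
    """Similarity of two zone fingerprints, regardless of where the zones are."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(zone, table):
    """Return the other zone whose fingerprint is closest to `zone`'s."""
    return max((z for z in table if z != zone),
               key=lambda z: cosine_similarity(table[zone], table[z]))

print(most_similar("zone-A", fingerprints))  # zone-B: similar fingerprint, not nearby
```

With richer attribute tables, the same comparison could run over hundreds of dimensions per zone, which is where Michael's "true power at scale" point applies.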

Jérôme Jacovella-St-Louis (Ecere)

Jérôme focused on the potential of DGGS subzones as a bridge to modern deep learning architectures, especially vision transformers and graph-based models. He highlighted an underused API capability that returns a parent zone together with all its subzone values at high throughput, enabling millions of values to be handled at once. Conceptually, he suggested treating each parent zone and its subzones like an image patch, where every subzone is a token and attention layers learn relationships among them inside that spatial tile. This mirrors how transformers process sequences or images: alternating attention and feed-forward layers build deep contextual representations, but now over hexagonal or rectilinear DGGS cells instead of pixels. Jérôme proposed that models could move up and down the resolution hierarchy, aggregating or drilling down as needed, and that this structure would be particularly powerful once combined with vision transformer-style architectures and graph neural networks over DGGS adjacency graphs. He framed all of this as a promising future direction, not something achieved in the pilot, but a key opportunity to marry DGGS structure with modern AI.
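Jérôme's "subzones as tokens" idea maps directly onto scaled dot-product attention: the subzone values of one parent zone form the token sequence, and each subzone updates its representation by attending to all the others. A minimal sketch with invented subzone features, and with random matrices standing in for the learned projections of a real model (no actual DGGS API is called here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parent zone with 7 subzones, each carrying a 4-dimensional
# feature vector (elevation, water fraction, etc.) - one "spatial patch".
subzone_features = rng.normal(size=(7, 4))

def self_attention(x, d_k=4):
    """Single-head scaled dot-product attention over subzone tokens.
    In a trained model w_q, w_k, w_v would be learned; here they are
    random matrices just to show the mechanics."""
    w_q, w_k, w_v = (rng.normal(size=(x.shape[1], d_k)) for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d_k)                  # subzone-to-subzone affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over subzones
    return weights @ v                               # contextualized subzone tokens

out = self_attention(subzone_features)
print(out.shape)  # one contextual vector per subzone
```

Stacking such layers, interleaved with feed-forward blocks, is exactly the transformer pattern the paragraph describes, applied to a DGGS tile instead of an image patch.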

Sina Taghvakish (OGC)

Sina emphasized strategic and practical aspects of DGGS in the AI context. He argued that societies should implement DGGS once and then reuse it across projects rather than repeatedly reinventing spatial frameworks, to protect public investments in geospatial infrastructure. In his view, DGGS turns maps into machine-readable facts that AI systems can directly ingest: instead of loose spatial layers, we get cells that say which people, roads, assets and hazards they contain. Using floods as an example, he described how DGGS produces cells that already encode populations, infrastructure and risk metrics, making them immediately actionable for decision support and AI-driven analysis.

Nathan McEachen (TerraFrame)

Nathan stressed that while LLMs clearly bring value, especially for interaction and some reasoning, they have limitations when it comes to deep geospatial reasoning over DGGS zones and multi-domain attributes. He noted that the team experimented with LLM-based systems and observed both benefits and constraints, particularly in findability and structured reasoning over zonal data. Nathan pointed to a broader landscape of geospatial foundation models that are not purely text-based and said he is actively investigating these for DGGS contexts, because understanding content inside zones and across domains may require paradigms beyond text semantics. He also raised a key technical question about representing DGGS zone values in vector databases: unlike text embeddings, where similarity is well understood, DGGS embeddings must somehow combine cell location, resolution and domain values, and it is not obvious how “closeness” in that vector space should be interpreted geographically or across multiple themes. To make such embeddings meaningful, Nathan argued that attribute fields must be tied to formal ontologies so that terms like “population” in a user query can be reliably mapped to the correct fields, enabling robust semantic reasoning over DGGS-based vectors.
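Nathan's open question about DGGS embeddings can be made concrete with a sketch: one naive option is to concatenate a zone's centroid (as unit-sphere coordinates), its resolution level, and its attribute values into a single vector, with weights controlling the trade-off. The weights, the scaling constant, and the attribute values below are all arbitrary choices, which is precisely his point: the resulting "closeness" has no agreed geographic or thematic interpretation.

```python
import math

def zone_embedding(lat_deg, lon_deg, resolution, attributes,
                   w_loc=1.0, w_res=1.0, w_attr=1.0):
    """Naively combine a zone's centroid, resolution level and attribute
    values into one vector. The weights are arbitrary: how location,
    resolution and domain values should trade off is the open question."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Unit-sphere coordinates avoid the longitude wrap-around problem.
    loc = [math.cos(lat) * math.cos(lon),
           math.cos(lat) * math.sin(lon),
           math.sin(lat)]
    return ([w_loc * c for c in loc]
            + [w_res * resolution / 15.0]   # assumed max hierarchy depth of 15
            + [w_attr * a for a in attributes])

# Two zones with identical attributes on opposite sides of the globe:
e1 = zone_embedding(45.0, -75.0, 9, [0.8, 0.2])
e2 = zone_embedding(-33.0, 151.0, 9, [0.8, 0.2])
print(round(math.dist(e1, e2), 3))
# With w_loc=0 these zones become "identical"; with w_attr=0 they stay far
# apart - the metric's meaning depends entirely on the chosen weighting.
```

A vector database would then index such embeddings for nearest-neighbour search, but without the ontological grounding Nathan calls for, the neighbours it returns are hard to interpret.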

Dean Hintz (Safe Software)

Dean reflected on the pilot from a workflow and orchestration perspective, describing both the value DGGS delivered and the limits of their current approach. In the TerraFrame use case, DGGS helped identify initial high flood risk zones that then served as inputs to a spatial knowledge graph, which inferred downstream impacts on population, infrastructure and other factors; DGGS thus functioned as a trigger and a clear spatial unit for subsequent AI reasoning. However, he pointed out that this chain was largely prescriptive: they followed a fixed pipeline from flood risk detection to predefined impact queries, rather than exploring more flexible, AI-driven orchestration. He envisioned next steps where AI agents start from natural language questions like “What natural hazards affect this region?” and dynamically choose which DGGS services to call, at what resolutions and in what sequence. Dean characterized LLMs in the pilot as mainly chat front ends that translated natural language into DGGS and knowledge graph queries, while precise reasoning happened in structured back-end models; he suggested that future work will need smaller or domain-specific language models tuned to hazards and risk, preloaded with relevant data sources, vocabularies and rules. He also distinguished between the one-way flow they implemented - from text to DGGS lookup - and a richer iterative loop where LLMs repeatedly use zone IDs and subzones as stable, linkable entities, accumulating knowledge about them over time. That evolution will depend on better ontologies for DGGS, domains and services, plus standardized metadata so AI can determine which DGGS services are relevant to a given question.
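The contrast Dean draws between a prescriptive pipeline and question-driven orchestration can be sketched as follows. All service names and return values are invented stand-ins; a real system would call actual DGGS APIs and use an LLM, not keyword matching, to choose services.

```python
def call_service(name, payload):
    # Stand-in for a real DGGS service call; returns fake zone identifiers.
    return [f"{name}:zone-{i}" for i in range(2)]

def prescriptive_pipeline(region):
    """The pilot's fixed chain: flood-risk zones first, then predefined
    impact queries, in a hard-coded order."""
    zones = call_service("flood-risk", region)
    return [call_service(q, zones)
            for q in ("population-impact", "infrastructure-impact")]

# Hypothetical registry mapping hazard keywords to services - the kind of
# standardized metadata Dean says AI would need to pick services itself.
HAZARD_SERVICES = {"flood": "flood-risk",
                   "wildfire": "wildfire-risk",
                   "earthquake": "seismic-risk"}

def dynamic_orchestration(question, region):
    """Toy 'agent': choose services from the question instead of a fixed
    pipeline; fall back to all services for open-ended questions."""
    chosen = [svc for kw, svc in HAZARD_SERVICES.items()
              if kw in question.lower()]
    return {svc: call_service(svc, region)
            for svc in (chosen or HAZARD_SERVICES.values())}

print(dynamic_orchestration("Which flood hazards affect this region?", "region-X"))
```

The iterative loop Dean describes would go further: each returned zone ID becomes a stable entity the agent can query again at other resolutions or against other services.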

Lucio Colaiacomo

Lucio reinforced the importance of graph-based representations and ontologies layered on top of DGGS. He advocated transforming textual and numeric information into graph structures where nodes and relationships are defined according to formal ontologies, with DGGS zones serving as the spatial anchors for those graph nodes. In this perspective, each zone becomes a hub that connects many information layers: environmental measurements, infrastructure, socio-economic data and more, all linked through a graph database. Lucio also emphasized that ontologies need to be dynamic, evolving as new data sources and query types emerge, so that the knowledge graph can adapt over time while remaining coherent.

Stelios Contarinis (ER Editor)


Personal Reflections

A first reflection is that DGGS can act as the stage for a geospatial foundation model by turning each cell into a spatial token in a way that is directly comparable to text tokens in a language model. I like the term ‘spatial tokenization’. The DGGS cell identifier is the discrete token, and the many attributes that can be attached to that cell, such as elevation, temperature, population, land cover or emissions, form an input feature vector that plays a similar role to the initial embeddings in a transformer. The equal-area property helps avoid giving unfair weight to some parts of the globe, but the main advantage is the consistent, multi-scale grid and its explicit neighborhood structure. In a transformer, self-attention lets each token look at all other tokens and decide which ones matter most when updating its own representation. On DGGS, an adapted attention mechanism can let each cell look at its neighbors and its parent or child cells and learn which relationships are most informative, for example, strong upstream-downstream links along a river or patterns in fine-scale subcells inside a larger region. Jérôme’s patch idea fits naturally here, where a parent cell and its subcells form something like an image tile that a vision-style transformer can process. If such a DGGS-based model is trained on large collections of spatial attributes and time series, with objectives like predicting masked attributes or future states from surrounding cells, it could start to internalize geospatial knowledge in ways similar to how large language models internalize grammar and facts, for example learning that low-lying coastal urban cells tend to be flood prone, that certain industrial zones share distinctive emission profiles, or that particular climate and land cover combinations are associated with specific vegetation patterns.
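The masked-attribute objective mentioned above can be made concrete with a toy example: hide one cell's value and predict it from its neighbors on the adjacency graph. A real model would learn attention weights over neighbors, parents and children; here the "model" is just a neighbor mean, to show the shape of the training signal. The adjacency structure and elevation values are invented.

```python
# Tiny invented DGGS-style adjacency graph (cell IDs are arbitrary).
neighbours = {"c0": ["c1", "c2", "c3"],
              "c1": ["c0", "c2"],
              "c2": ["c0", "c1", "c3"],
              "c3": ["c0", "c2"]}
elevation = {"c0": 10.0, "c1": 12.0, "c2": 11.0, "c3": 9.0}

def predict_masked(cell):
    """Predict a cell's hidden attribute from its neighbours' values -
    the neighbour-mean baseline a learned attention model should beat."""
    vals = [elevation[n] for n in neighbours[cell]]
    return sum(vals) / len(vals)

def masked_loss():
    """Mean squared error of the predictor over all cells: the quantity a
    trainable DGGS model would minimize during self-supervised pretraining."""
    errs = [(predict_masked(c) - elevation[c]) ** 2 for c in elevation]
    return sum(errs) / len(errs)

print(round(masked_loss(), 3))
```

Replacing the neighbor mean with attention over neighbors, parents and subcells, trained over many attributes and time steps, is essentially the foundation-model recipe sketched in the paragraph.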

A second reflection is that ontology is crucial if DGGS-based representations and similarities are to be meaningful. Simply assigning a variable like population to two cells says very little unless the meaning of that variable is specified, for example, which year it refers to, whether it counts usual residents or daytime population, which statistical method was used, and which agency produced the data. Ontologies and domain-specific schemas encode these details and the relationships between variables so that a system can tell when two attributes are comparable and how they should be combined. In a DGGS context, this means an ontology can guide which attributes are included in the feature vector for a given task, how they are normalized, and how similarity between cells is defined, for example, for flood vulnerability, emission profile, or habitat type. EmissionML is a good example of a domain schema playing exactly this role. When such ontological information is linked to DGGS attributes, similarity between cells is no longer just the result of a generic embedding distance, but is grounded in shared definitions of concepts like population, emission sector, or risk indicator, which in turn improves the interpretability and transferability of any DGGS-based foundation model.
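The grounding step described here can be sketched as a small lookup: attribute fields carry formal definitions (concept, year, method, producing agency), and user-facing terms resolve to fields through those definitions rather than by string matching. Field names, years and agencies below are invented for illustration.

```python
# Hypothetical ontology entries for per-cell attribute fields.
ONTOLOGY = {
    "pop_2021_census":  {"concept": "population", "year": 2021,
                         "method": "usual residents", "agency": "StatAgency"},
    "pop_2021_daytime": {"concept": "population", "year": 2021,
                         "method": "daytime estimate", "agency": "StatAgency"},
    "elev_dem30":       {"concept": "elevation",
                         "method": "30 m DEM", "agency": "MapAgency"},
}

def resolve(concept, **constraints):
    """Find fields whose ontology entry matches a concept plus optional
    constraints (e.g. method='usual residents')."""
    return [field for field, meta in ONTOLOGY.items()
            if meta["concept"] == concept
            and all(meta.get(k) == v for k, v in constraints.items())]

# "population" alone is ambiguous; the ontology makes the choice explicit:
print(resolve("population"))                            # two candidate fields
print(resolve("population", method="usual residents"))  # unambiguous
```

The same mechanism lets a feature-vector builder, or an LLM answering a query, pick comparable fields across cells instead of trusting that identically named columns mean the same thing.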