Our world is undergoing dramatic digital transformations, with data generated at never-before-seen volume and velocity [1, 2]. These data come from mobile devices, satellite and ground sensors, and social media and citizen-science platforms, and are coupled with advances in cloud and high-performance computing and machine learning. Despite these technological, scientific and societal developments, progress towards solutions is not keeping pace with humanity’s greatest challenges.
Additionally, although our understanding of planetary-scale processes has improved, we are far from being able to accurately track key dynamics and critical thresholds across diverse scales and drivers. Key processes, entities (e.g., nations, watersheds, households, ecological communities) and their interdependencies across scales are far too complicated for individual human brains to disentangle [9]. Simultaneously, today’s repositories of human intelligence, such as the scientific publication system, fall short of connecting the pieces of knowledge produced by different fields. AI assistance offers a path forward.
To provide needed decision support, AI must ultimately simulate Earth as a real-time, dynamic system composed of nested social-ecological systems. A “digital twin Earth” has been included in recent communications on the European Green Deal [2]. The idea of building a simulation of the planet has been proposed in different forms by global (e.g., UN Environment, Group on Earth Observations), European and U.S. institutions (e.g., the European Commission and European Space Agency, the U.S. Geological Survey and NASA), and the private sector (e.g., Microsoft AI for Earth, Google Earth Engine). However, these are mostly understood as massive machine-learning efforts built on Earth observations from a wide range of sources, with limited attention paid to semantics and machine reasoning. Recently, a global digital ecosystem for the planet was proposed by the UN Environment Programme as “a complex distributed network” consisting of four key elements: (1) data, (2) algorithms and analytics (i.e., models), (3) supporting technological infrastructure and (4) insights and applications [10]. A primary technological bottleneck in building such cyberinfrastructures, which aim to bring data, models and processing power together in various clouds, is making independently produced data and models seamlessly interoperable.
We argue for a solution built upon semantics and machine reasoning [11, 12] (see Box 1). AI research points toward a convergence of technologies (machine reasoning and machine learning, geospatial intelligence, data analytics and visualization, sensors and smart connected objects) to sustain governance platforms in natural and social systems [13]. Machine reasoning is driven by facts and knowledge that can be used to validate and link information using logical inference [14]. Concepts, entities, their relationships and (to some extent) behaviours are described in shared documents (ontologies) that establish a logical foundation to consistently annotate web-accessible data and model resources. This knowledge base, paired with AI, could bring the FAIR (findable, accessible, interoperable and reusable) principles to full fruition. Such AI can help manage the complexity of integrating independently produced data and models with the goal of maximizing human well-being and restoring ecosystem functioning [15]. Multidisciplinary semantics that are explicitly engineered to support reasoning can make human knowledge interoperable at large scale and in a distributed fashion, so that machines can assemble it to address complex social-ecological issues. Widespread use of semantics would vastly improve the status quo, where inconsistent and imprecise use of terms across different fields impedes the synthesis of scientific evidence (e.g., [16]).
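To make the ontology-and-reasoning idea concrete, here is a minimal sketch, assuming only a toy concept hierarchy and nothing of ARIES’ actual ontologies or reasoner: two independently produced resources are annotated with shared concepts, and a simple transitive subclass inference decides whether a dataset can serve as a model’s input.

```python
# A toy ontology: each concept maps to its direct superconcepts.
# Real systems use OWL/RDF ontologies and a description-logic reasoner;
# this dict-based hierarchy is a deliberately minimal stand-in.
ONTOLOGY = {
    "RiverDischarge": {"WaterFlow"},
    "WaterFlow": {"HydrologicalQuantity"},
    "HydrologicalQuantity": {"PhysicalQuantity"},
    "Precipitation": {"HydrologicalQuantity"},
}

def superconcepts(concept: str) -> set[str]:
    """Return all concepts reachable via the (transitive) subclass relation."""
    found, frontier = set(), {concept}
    while frontier:
        c = frontier.pop()
        for parent in ONTOLOGY.get(c, set()):
            if parent not in found:
                found.add(parent)
                frontier.add(parent)
    return found

def is_a(concept: str, candidate: str) -> bool:
    """Logical inference: does `concept` specialize `candidate`?"""
    return concept == candidate or candidate in superconcepts(concept)

# Two independently produced resources, annotated with shared concepts.
dataset_annotation = "RiverDischarge"        # what a data provider published
model_input_need = "HydrologicalQuantity"    # what a model declares it accepts

# The machine, not a human, concludes the dataset can feed the model.
print(is_a(dataset_annotation, model_input_need))  # True
```

The key point is that neither the data provider nor the modeller needs to know about the other; agreement on the shared concept hierarchy is what makes the match computable.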
When peer-reviewed, web-based scientific information is labelled in ways readable by both humans and computers, and common standards are used for machine-actionable data and models, machines can search, organize, reuse and combine information quickly and in novel ways, forming a semantic web of knowledge [17, 18]. Achieving this will require several actions on the part of scientists that go beyond today’s open-science practice. For example, the Artificial Intelligence for Environment and Sustainability project (ARIES, [19]), described below, provides infrastructure to enable these steps. Specifically, key elements in ARIES enable (1) data and model developers to expose and maintain knowledge resources as independently hosted, open web services using a networked architecture, open standards and application programming interfaces (APIs); (2) consistent semantic annotation practices that data and model developers can apply while concurrently participating in the development of ontologies and producing more modular models that carry documentation and appropriate reuse conditions; and (3) a vision of a peer-to-peer network hosting content available for machine-actionable synthesis, with institutions maintaining interoperable data and model resources over time. More details on each of these steps can be found in Villa et al. [20].
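As a rough illustration of elements (1) and (2), the following sketch shows what a semantically annotated resource record and a catalog lookup might look like. The URLs, field names and catalog structure are invented for the example and do not reflect the actual ARIES network protocol.

```python
from dataclasses import dataclass

@dataclass
class ResourceRecord:
    """Metadata a provider publishes alongside a web-accessible resource.
    Field names are illustrative; real networks use richer open standards."""
    url: str            # where the data or model service is hosted
    concept: str        # semantic annotation from a shared ontology
    license: str        # machine-readable reuse conditions
    documentation: str  # human-readable description, kept with the resource

# A toy catalog: in practice, each record would live on the provider's
# own server and be discovered through open APIs rather than a local list.
CATALOG = [
    ResourceRecord("https://example.org/wfs/discharge", "RiverDischarge",
                   "CC-BY-4.0", "Gauged discharge, daily, 2000-2020"),
    ResourceRecord("https://example.org/api/rainfall", "Precipitation",
                   "CC-BY-4.0", "Gridded precipitation reanalysis"),
]

def find_resources(concept: str) -> list[ResourceRecord]:
    """Discover resources annotated with a concept (exact match here; a real
    system would use subclass reasoning, as in the previous sketch)."""
    return [r for r in CATALOG if r.concept == concept]

for record in find_resources("Precipitation"):
    print(record.url, "-", record.documentation)
```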
This approach connects existing, web-accessible data and models so that new multidisciplinary scientific knowledge can be generated from them on demand, complementing much slower human-driven model coupling and reuse [5]. AI-supported, on-the-fly assembly of scientific workflows enables newly produced data sources to be incorporated as they become available on the network, reducing latency and providing a path toward much-needed near-real-time modelling. Widely used semantics call for open, transparent and well-documented models, encouraging a simple and modular coding style in which encapsulated documentation can be made mandatory. In this way, integrated computational workflows can collect and process information about each individually documented modelling component, delivering fully transparent assessments to model users [20]; a toy version of this assembly process is sketched below.
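The sketch assumes each component declares its input and output concepts and carries its own documentation; all names and components are invented, and a real resolver would rank alternatives rather than take the first match.

```python
from dataclasses import dataclass

@dataclass
class Model:
    """A modular model component carrying its own documentation (hypothetical)."""
    name: str
    inputs: list[str]   # concepts the model needs
    output: str         # concept the model produces
    documentation: str

# Independently published components on the network (illustrative).
MODELS = [
    Model("runoff_model", ["Precipitation", "LandCover"], "Runoff",
          "Curve-number runoff estimate"),
    Model("flood_risk_model", ["Runoff", "Elevation"], "FloodRisk",
          "Simple inundation index"),
]
AVAILABLE_DATA = {"Precipitation", "LandCover", "Elevation"}

def resolve(concept: str, provenance: list[str]) -> bool:
    """Assemble a workflow for `concept` on the fly, recording every
    component used so the final assessment is fully documented."""
    if concept in AVAILABLE_DATA:
        provenance.append(f"data:  {concept}")
        return True
    for m in MODELS:
        if m.output == concept and all(resolve(c, provenance) for c in m.inputs):
            provenance.append(f"model: {m.name} ({m.documentation})")
            return True
    return False

steps: list[str] = []
if resolve("FloodRisk", steps):
    print("\n".join(steps))  # transparent record of the assembled workflow
```

Because every step is appended to the provenance record as it is resolved, the end user receives not just a result but the full chain of data and models that produced it.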
In the face of widespread use of, and publicity for, “big data”-driven machine learning [9], we believe wider understanding and use of semantics and machine reasoning in scientific modelling is critical to addressing today’s sustainability challenges. Approaches such as ARIES have demonstrated how semantics can maximize data and model reusability and interoperability when assessing ecosystem services and, more generally, when modelling complex human-nature interactions and their consequences.
Notably, ARIES has been applied to the System of Environmental-Economic Accounting (SEEA), an international statistical standard for measuring the linkages between national economic accounts, natural capital stocks and ecosystem service flows in physical and monetary terms, along with information on the extent and condition of ecosystems [21]. ARIES for SEEA, released in April 2021 and accessible at https://seea.un.org/content/aries-for-seea, provides a common platform to make data and models interoperable and to improve the ability of National Statistical Offices to automate the compilation of environmental-economic accounts and related indicators, a task that requires integrating national statistics with spatial data and models. ARIES for SEEA thus demonstrates a path forward for synthesizing the information required to monitor complex, linked social-ecological systems through indicators such as the Sustainable Development Goals.
Semantic-driven integration technologies such as ARIES offer six critically needed advantages to twenty-first century interdisciplinary science and decision-making, pioneering a new generation of distributed digital infrastructure that integrates independently produced data and models served online: a web of scientific observations with the capability to:
1. Combine independently produced scientific products into workflows that would be too complex for individual humans to conceive, validate and navigate.

2. Integrate different modelling paradigms, from simple (e.g., deterministic and probabilistic models) to complex (e.g., agent-based and network models), depending on context and scale.

3. Rescale intelligently from local to global, promoting adaptive solutions that are automatically customized to the scale of observation.

4. Flexibly incorporate the best-available knowledge, from curated global public datasets to “big data” to user-provided data.

5. Adopt common, unambiguous semantics in both the implementation and delivery of products.

6. Track quality and uncertainty throughout modelling workflows (see the sketch following this list).
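As a minimal sketch of point 6, the example below carries a one-sigma uncertainty alongside each value through a two-step workflow using first-order Gaussian error propagation; all quantities and coefficients are invented for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Estimate:
    """A value carried through a workflow with its 1-sigma uncertainty."""
    value: float
    sigma: float

def scale(x: Estimate, k: float) -> Estimate:
    """Linear step: uncertainty scales with the coefficient."""
    return Estimate(k * x.value, abs(k) * x.sigma)

def add(x: Estimate, y: Estimate) -> Estimate:
    """Sum of independent estimates: variances add in quadrature."""
    return Estimate(x.value + y.value, math.hypot(x.sigma, y.sigma))

# Invented inputs: two upstream data sources with known uncertainty.
rainfall = Estimate(120.0, 15.0)  # mm/month, from a gridded product
snowmelt = Estimate(40.0, 10.0)   # mm/month, from a model output

# Each workflow step transforms the uncertainty along with the value,
# so the final result reports how reliable it is.
water_input = add(rainfall, snowmelt)
runoff = scale(water_input, 0.35)  # invented runoff coefficient
print(f"runoff = {runoff.value:.1f} ± {runoff.sigma:.1f} mm/month")
```

Propagating uncertainty as a first-class part of every intermediate result, rather than reporting it only at the end, is what lets an assembled workflow remain transparent about the quality of its inputs.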