The software Covidence will be used to manage the review process, which will occur in two stages. After removing duplicates, titles and abstracts of all papers will be reviewed to exclude those that do not meet the eligibility criteria at this first stage (see “Eligibility criteria” section below for more detail). For the papers that meet the eligibility criteria in the first stage, a second stage screening will follow, where we will skim the full text following the eligibility criteria set out below. Metadata coding will be conducted for the papers that pass the second stage of screening. None of the systematic reviewers have authored articles within the scope of this review, so there is no concern about excluding reviewers from screening their own papers.
The review team will perform a calibration exercise on a random sample of ten studies to ensure that they agree on the inclusion and exclusion criteria. Once the team reaches 90% inter-rater agreement, two reviewers will independently screen titles and abstracts (stage 1). Disagreements at stage 1 will be resolved by discussion between the two reviewers or by the tie-breaking vote of a third reviewer. Following the completion of title and abstract screening, the team will perform another calibration exercise on a further ten papers, again with the goal of reaching 90% agreement. The full text of studies will then be screened for eligibility (stage 2), and conflicts at this stage will be resolved by a third reviewer.
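As an illustration of the calibration check, the sketch below computes simple percent agreement between two reviewers on a ten-paper sample and compares it with the 90% threshold. It assumes that agreement means the share of identical include/exclude decisions; the protocol does not specify the statistic, and the function name and the decisions shown are hypothetical.

```python
def percent_agreement(decisions_a, decisions_b):
    """Share of papers on which two reviewers made the same decision."""
    if len(decisions_a) != len(decisions_b):
        raise ValueError("Both reviewers must screen the same papers")
    if not decisions_a:
        raise ValueError("At least one paper is required")
    matches = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return matches / len(decisions_a)


# Hypothetical decisions for a ten-paper calibration sample
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "exclude", "exclude", "exclude", "exclude"]

agreement = percent_agreement(reviewer_1, reviewer_2)
# Assumed reading of the threshold: at least 90% identical decisions
calibrated = agreement >= 0.90
```

A chance-corrected statistic such as Cohen's kappa could be substituted if the team prefers, but with only ten papers simple agreement is easier to interpret.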
We will skim papers to determine whether they meet the following eligibility criteria (derived from research question components per Table 1): (1) settings: study is situated in the context of or related to environmental governance and/or management; (2) phenomena: study has a focus on ecosystem services as a tool for collaboration; (3) outcomes: study describes a collaborative process resulting from applying the ecosystem services concept as a tool or approach; (4) additional criterion: studies will be eligible for inclusion if they discuss the ES concept being used in some type of collaboration or group process in a substantial manner. This includes empirical studies, both those using experiments or games and those using real-world cases. Only English-language studies will be eligible, due to the knowledge constraints of the reviewers. We have defined ‘substantial manner’ to mean that the paper mentions that either the objective or the result of using the ES concept was to affect the nature of interactions among members of a particular group. If the ES concept is used in a more results-focused way, such as an analytical framework (e.g. ES assessments) or policy tool (e.g. payments for ecosystem services), the paper will be excluded. A grey area would be, for example, papers that discuss workshops held with stakeholders for the purpose of ranking or discussing ES. If such a paper discusses the quality of the interactions among the stakeholders substantially (more explanation than a single sentence), it will be included. Excluded articles will be recorded in a list (including reasons for exclusion) that will be available as supplementary information. Additional file 4 contains the framework used to screen, exclude, and check reporting quality of studies; an example is provided as Additional file 5, and screening criteria are diagrammed as a coding tree in Additional file 6.
Any articles with missing data will be noted; however, due to the nature of the evidence under review, missing data will not be considered a critical issue and will not necessarily require exclusion from the review.
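The eligibility criteria above can be read as a sequence of checks, each with a recorded reason for exclusion. The following is a minimal sketch of that reading, not the coding tree in Additional file 6: the dictionary keys, the order of the checks, and the wording of the reasons are assumptions introduced for illustration, and each judgement would still be made by a reviewer.

```python
# Hypothetical keys: each is a reviewer's yes/no judgement for one paper.
CHECKS = [
    ("english", "not an English-language study"),
    ("setting", "not related to environmental governance or management"),
    ("phenomenon", "ES not a focus as a tool for collaboration"),
    ("outcome", "no collaborative process resulting from the ES concept"),
    ("substantial", "ES use in a group process not discussed substantially"),
]


def screen(paper):
    """Return (decision, reason); a missing judgement counts as not met."""
    for key, reason in CHECKS:
        if not paper.get(key, False):
            return ("exclude", reason)
    return ("include", None)


# Hypothetical example: a stakeholder workshop paper that only ranks ES
workshop_paper = {"english": True, "setting": True, "phenomenon": True,
                  "outcome": True, "substantial": False}
decision, reason = screen(workshop_paper)
```

Returning the first unmet criterion gives one reason per excluded article, which matches the planned list of excluded articles with reasons for exclusion.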
Critical appraisal of study validity assessment
We will not conduct a study validity assessment, which is in line with accepted methodological guidance for systematic maps. Instead, we will extract information about reporting quality. The reporting quality of each paper will be rated as ‘red flag’, ‘green flag’, or ‘undetermined’. Assessment will be conducted using the following criteria for papers included in the review:
The paper contains a complete and detailed description of methods.
The paper makes data available.
The paper draws conclusions using suitable data.
There are no red flags that would be of clear concern to an academic peer reviewer, such as personal conflicts of interest, private sector funders of the study, etc.
Any paper that does not meet all four of these criteria will be labeled with a ‘red flag’. These ‘red flag’ papers will be discussed between the two reviewers, and if the second reviewer agrees with the first reviewer’s assessment, the study will be marked with a ‘red flag’ in the dataset. A ‘red flag’ will not cause a study to be excluded; it simply ensures that the concern is noted and mentioned in the study synthesis and presentation. Studies will be labeled ‘undetermined’ if the reviewers cannot determine whether the study meets the criteria above. Grey and practitioner literature will also be critically assessed for reporting quality in this way.
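The rating rule can be sketched as a small function over the four criteria. This is an illustration under stated assumptions: each criterion is recorded as True (met), False (not met), or None (cannot be determined), and a failed criterion takes precedence over an undetermined one. The protocol does not state that precedence, and the confirmation step between the two reviewers is not modeled here.

```python
from typing import Optional, Sequence


def reporting_quality_flag(criteria: Sequence[Optional[bool]]) -> str:
    """Rate one paper from its four criterion judgements.

    True = criterion met, False = not met, None = cannot be determined.
    """
    if len(criteria) != 4:
        raise ValueError("Expected judgements for exactly four criteria")
    if any(c is False for c in criteria):
        # Assumption: a clear failure outweighs an undetermined criterion
        return "red flag"
    if any(c is None for c in criteria):
        return "undetermined"
    return "green flag"


# Hypothetical judgements: methods, data availability, suitable data, no concerns
flag_complete = reporting_quality_flag([True, True, True, True])
flag_no_data = reporting_quality_flag([True, False, True, True])
flag_unclear = reporting_quality_flag([True, None, True, True])
```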
Data coding strategy
Relevant metadata consist of key information that will be extracted from each included study where ES was used as a tool to foster collaborative governance and management. Our data coding strategy is based on a similar strategy created by Lemasson et al. Metadata will be coded for each study that has passed the screening requirements and will include a unique identifier. Studies will be coded by their bibliographic information (article reference, year, and publication journal) and reviewer information; the full framework and an example are available in Additional files 4 and 5, respectively. These studies and extracted metadata will be used to create an open-access database. If data are missing, we will write to the corresponding author (or, if no corresponding author is indicated, the first author) of the paper to obtain or confirm missing or unclear data.
The repeatability of the data coding process will be tested by both reviewers independently entering metadata for 10 papers and then checking for discrepancies. Any discrepancies will be discussed to determine how to code, and a third reviewer will be brought in to resolve disagreements.
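The discrepancy check could be run as a field-by-field comparison of the two reviewers' entries. The sketch below assumes each reviewer's coding is held as a mapping from a paper's unique identifier to its coded fields; the identifiers, field names, and values are hypothetical and do not come from the coding framework.

```python
def find_discrepancies(coding_a, coding_b):
    """List (paper_id, field, value_a, value_b) wherever two codings differ.

    A paper or field entered by only one reviewer is reported with None
    for the other reviewer's value.
    """
    discrepancies = []
    for paper_id in sorted(set(coding_a) | set(coding_b)):
        record_a = coding_a.get(paper_id, {})
        record_b = coding_b.get(paper_id, {})
        for field in sorted(set(record_a) | set(record_b)):
            value_a = record_a.get(field)
            value_b = record_b.get(field)
            if value_a != value_b:
                discrepancies.append((paper_id, field, value_a, value_b))
    return discrepancies


# Hypothetical entries for two of the ten test papers
reviewer_1 = {"P01": {"year": 2018, "tool_type": "workshop"},
              "P02": {"year": 2020, "tool_type": "serious game"}}
reviewer_2 = {"P01": {"year": 2018, "tool_type": "workshop"},
              "P02": {"year": 2021, "tool_type": "serious game"}}

to_discuss = find_discrepancies(reviewer_1, reviewer_2)
```

Each tuple in the result identifies one item for the reviewers to discuss, with unresolved items passed to the third reviewer.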
The included papers will be reviewed and the following data will be extracted from them (i.e., “core questions”):
What tool/mechanism is the primary focus or example in this paper?
Why was this tool chosen for this study?
How was this tool empirically tested in a collaborative process (case study, experiment, etc.)?
What did the study find regarding the usefulness of ES as a tool in a group process?
If clearly stated in the paper, what challenges, if any, were mentioned about the process of implementing the tool in a collaborative process?
These data will be tracked by reviewers in a common spreadsheet. Papers will be coded inductively and grouped by type of tool or mechanism in order to understand author objectives and conclusions about utilizing each type of tool. Our data coding framework is available as Additional file 7.
Study mapping and presentation
The findings and respective metadata from this study will be available through Environmental Evidence in an open-access database and as a systematic map. Descriptive statistics (frequencies, mode, range) may be used to describe the nature of the body of literature reviewed (i.e. the metadata). The data collected that address the core questions of interest will be coded using an inductive, in vivo approach and presented as themes emerging in response to the core questions above. A narrative synthesis of metadata from eligible studies will be conducted, and figures or tables will be used to present the results as the need arises. Figures will be particularly useful for highlighting knowledge gaps and knowledge clusters, as described in Lemasson et al. Knowledge gaps will be identified where one or more of the core questions of interest cannot be (fully) answered from the evidence available in the systematic map. By identifying these gaps, the findings can inform decision-makers and prompt further research into the potential uses of the ES concept as a collaboration tool. Specifically, this protocol could be used to address systematic review questions such as “How does the ES concept affect decision-makers’ values and priorities in environmental governance?” and “How does the ES concept relate to human-nature relationships?”
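The descriptive statistics named above (frequencies, mode, range) can be computed from the coded metadata with the Python standard library. The sketch below is illustrative only: the records, field names, and tool categories are hypothetical placeholders, not data or codes from the review.

```python
from collections import Counter
from statistics import multimode

# Hypothetical coded metadata; the real fields come from the coding framework
records = [
    {"year": 2015, "tool_type": "participatory mapping"},
    {"year": 2018, "tool_type": "workshop"},
    {"year": 2018, "tool_type": "serious game"},
    {"year": 2021, "tool_type": "workshop"},
]

# Frequencies: number of studies per type of tool or mechanism
tool_frequencies = Counter(r["tool_type"] for r in records)

# Mode and range of publication years
years = [r["year"] for r in records]
modal_years = multimode(years)  # a list, so ties are reported rather than hidden
year_range = (min(years), max(years))
```

Categories with low or zero counts in such a frequency table are one simple way to spot candidate knowledge gaps, while high counts point to knowledge clusters.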