Response of chlorophyll a to total nitrogen and total phosphorus concentrations in lotic ecosystems: a systematic review protocol

Background: Eutrophication of freshwater ecosystems resulting from nitrogen and phosphorus pollution is a major stressor across the globe. Despite recognition by scientists and stakeholders of the problems of nutrient pollution, rigorous synthesis of scientific evidence is still needed to inform nutrient-related management decisions, especially in streams and rivers. Nutrient stressor-response relationships are complicated by multiple interacting environmental factors, complex and indirect causal pathways involving diverse biotic assemblages and food web compartments, legacy (historic) nutrient sources such as agricultural sediments, and the naturally high spatiotemporal variability of lotic ecosystems. Determining nutrient levels at which ecosystems are affected is a critical first step for identifying, managing, and restoring aquatic resources impaired by eutrophication and maintaining currently unimpaired resources. The systematic review outlined in this protocol will compile and synthesize literature on the response of chlorophyll a to nutrients in streams, providing a state-of-the-science body of evidence to assess nutrient impacts to one of the most widely used measures of eutrophication. This review will address two questions: “What is the response of chlorophyll a to total nitrogen and total phosphorus concentrations in lotic ecosystems?” and “How are these relationships affected by other factors?”
Methods: Searches for published and unpublished articles (peer-reviewed and non-peer-reviewed) will be conducted using bibliographic databases and search engines. Searches will be supplemented with bibliography searches and requests for material from the scientific and management community. Articles will be screened for relevance at the title/abstract and full text levels using pre-determined inclusion criteria; 10% (minimum 50, maximum 200) of screened papers will be examined by multiple reviewers to ensure consistent application of criteria. Study risk of bias will be evaluated using a questionnaire developed from existing frameworks and tailored to the specific study types this review will encounter. Results will be synthesized using meta-analysis of correlation coefficients, as well as narrative and tabular summaries, and will focus on the shape, direction, strength, and variability of available nutrient-chlorophyll relationships. Sensitivity analysis and meta-regression will be used to evaluate potential effects of study quality and modifying factors on nutrient-chlorophyll relationships.

Biota integrate impacts over time and so can better represent ecological condition than snapshot water quality measurements [21][22][23][24]. Environmental managers often use this biological information to evaluate impacts of chronic pollution (e.g. [25]). However, high spatiotemporal variability and other factors (e.g. those mentioned above) can mask links between nutrients and biota [26]. A synthesis of nutrient stressor-response relationships and how these relationships are modified by other factors could aid the setting of regulatory limits and the identification of impacted systems based on biota (e.g. [27]).
Algae are the main primary producers in lotic systems, and algal biomass is expected to be one of the first ecological endpoints to respond to nutrient pollution [28]. Increases in algal biomass are also associated with many of the negative human health and ecological consequences of eutrophication, such as reduced drinking water quality [29,30] and altered species composition [4]. Chlorophyll a (chl-a) is a photosynthetic pigment used to measure algal biomass [31]. In streams and rivers, researchers may sample benthic chl-a from hard substrates or sestonic chl-a from the water column [31,32] to determine chl-a concentrations.
This systematic review will compile and synthesize literature on chl-a responses to nutrients in streams and rivers, to provide a state-of-the-science body of evidence for assessing nutrient impacts. The review focuses on total nitrogen (TN) and total phosphorus (TP) concentrations in the water column. These constituents were selected for both ecological and practical reasons. Although dissolved nutrient forms may be more available for immediate uptake by biota, total nutrient forms are often more highly correlated with chl-a [28]. Dissolved forms may undergo rapid uptake and release by primary producers, such that concentrations of dissolved nutrients in the water column may not represent true availability [33,34]. In contrast, total nutrient forms may best represent trophic state and nutrient limitation in most lotic ecosystems because TN and TP account for N and P held within algae and sediment particles and thus represent integrated measures of biologically available nutrients [26,34,35]. TN and TP are also the most common nutrient measures used by environmental managers in the United States and around the globe to assess eutrophication of lotic ecosystems [36]. This review was motivated by a need for comprehensive information on stressor-response relationships to aid water quality scientists at the U.S. Environmental Protection Agency (USEPA) and state environmental agencies in better understanding the effects of nutrient pollution. In several meetings held during 2016-2017, these potential end users helped refine the scope, specific questions and objectives (including the relevant population, exposure, and outcome) of the systematic review, and the modifying factors of interest.

Objective of the review
The primary question addressed by this review is: What is the response of chl-a to TN and TP concentrations in lotic ecosystems? The nutrient stressor (TN or TP) and biotic response (chl-a) were chosen based on measures commonly used by U.S. state agencies to evaluate and make regulatory decisions about impairment of lotic ecosystems due to eutrophication. This question consists of the following components:

Population:
Lotic fresh waters, or mesocosms that mimic these systems, in any geographic location.

Exposure:
Concentration of TN or TP. We define TN as the sum of ammonia N, nitrate N, nitrite N, and organic nitrogen forms; we define TP as the sum of dissolved and particulate phosphorus forms.

Comparator:
Control group (no added TN or TP, or low exposure to TN or TP) (for experimental studies), or comparison to lower or higher TN or TP concentrations across a gradient (for observational studies).
The secondary question addressed by this review is: How are the relationships identified in the primary question affected by other factors? An initial list of potential modifying factors is provided below (see "Methods" and "Potential effect modifiers and reasons for heterogeneity"); others may be added as studies are examined in more detail.

Search strategy
Search terms and filters-Bibliographic databases will be searched using a combination of terms representing the nutrient stressors (TN or TP), the biological response (chl-a), and habitat- or study-specific terms (e.g. terms associated with types of lotic fresh waters and experimental stream studies) (Table 1). Databases vary in how they handle search strings, so searches will be adapted as needed for each database. An appendix of search strings used for each database will be provided in the full systematic review (see Additional file 1 for an example based on the Web of Science™ database). Books, book chapters, pamphlets and conference abstracts will be excluded from consideration unless they are submitted through calls for additional information (see "Supplemental searches"), because they generally do not have sufficient relevant primary data and results to extract, and non-electronic library resource limitations prevent a full evaluation of these resources. No language restrictions will be applied to database searches, and any other filters used for specific databases (e.g. excluding full text search to limit irrelevant literature) will be detailed in the full systematic review.
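As an illustration of how the three term groups might be combined, the following minimal Python sketch builds one Boolean string per nutrient. The terms are those visible in this copy of Table 1, which may be incomplete. Joining terms with OR within a group and AND between groups is an assumption for illustration, as are the function names; the actual strings will be adapted to each database's syntax.

```python
# Terms as they appear in this copy of Table 1 (possibly incomplete).
HABITAT = ["benth*", "catchment", "watershed", "stream*", "creek*", "river*"]
NUTRIENT_TN = ['"total nitrogen"', '"total N"']
NUTRIENT_TP = ['"total phosphorus"', '"total P"']
CHLOROPHYLL = ["chlorophyll", '"chlorophyll-a"', '"chl-a"', '"chl a"']


def or_block(terms):
    """Join the terms of one group with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(nutrient_terms):
    """Combine habitat, nutrient, and chlorophyll groups with AND (assumed)."""
    groups = [HABITAT, nutrient_terms, CHLOROPHYLL]
    return " AND ".join(or_block(group) for group in groups)


# Separate strings for TN and TP, mirroring the separate searches
# described for websites and search engines.
tn_query = build_query(NUTRIENT_TN)
tp_query = build_query(NUTRIENT_TP)
print(tn_query)
print(tp_query)
```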
Databases-At least 16 bibliographic databases, representing peer-reviewed, non-peer-reviewed, and unpublished material, will be searched to obtain articles for the review (Table 2). When databases limit the search results that can be viewed or downloaded, results will be filtered by year, when possible, to obtain subsets for viewing and download. Due to limitations on batch downloading of citations, three databases (DART, National Technical Reports Library, and OpenGrey) will be treated similarly to website searches and the first 50 items returned (for separate searches for TN and TP) will be examined (see below) (Table 2).
Specialist websites-Websites of the following organizations will be searched for relevant literature: The first 50 items returned, sorted by relevance, will be examined for each search. For websites without a search function, relevant "publications" sections will be examined to find documents. Because many websites do not accept Boolean search strings, separate searches will be conducted for TN and TP, and a smaller set of terms will be used for each of these searches. All website searches will be documented in a spreadsheet that will include the search date, the specific web URL and search terms used for each site, any website subsections used, the total number of items returned, and the number of items deemed relevant. Although the specialist website list is biased toward western countries, resource constraints limit our ability to search more broadly in non-English speaking countries. The "Supplemental searches" will be used to increase capture of relevant articles from other countries.
Search engines-Searches using Google and Google Scholar will be conducted, and the first 50 search results will be examined for relevance as with website searches. Separate searches will be conducted for TN and TP, and search terms used for each search will be documented.
Supplemental searches-To supplement these searches, additional resources will be requested from colleagues with disciplinary knowledge and through ECOLOG-L, Twitter, and ResearchGate. "Snowball" searches will also be conducted: references that cite or are cited by a small set of highly relevant literature (see below) will be compiled and any novel references not found during database searches will be evaluated.
Reference management-Articles returned by the search strategy will be stored in an EndNote library. Duplicate entries will be removed, and an initial title screen within EndNote will be used to remove entries that are clearly not relevant (e.g. Front Matter, Meeting Programs and Abstracts, Books Reviewed). The number of entries removed will be recorded. The remaining articles will be imported into the Rayyan software [37] (http://rayyan.qcri.org/) for title/abstract screening.
Assessing search comprehensiveness-Comprehensiveness of the search strategy will be assessed by: (1) determining whether all articles in a predetermined "test set" of approximately 15 relevant papers per stressor-response relationship (i.e., TN-chl-a, TP-chl-a; Table 3) are found with the search strategy; and (2) examining bibliographies of these "test set" papers, and papers that cite the "test set" papers, to determine whether relevant citations are captured in our search. If articles are missed, the search strategy will be evaluated and revised accordingly. The "test set" was created by searching the authors' personal libraries for highly relevant articles until at least 15 papers per stressor-response relationship were obtained, and includes both journal articles and reports (Table 3).
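The first comprehensiveness check amounts to a set comparison between the "test set" and the retrieved records. The Python sketch below shows one way this could be done. Matching on normalized titles is an assumption (a DOI or other identifier could be used instead), and the function names and example titles are hypothetical.

```python
def normalize_title(title):
    """Lowercase and keep only letters and digits, so punctuation and
    spacing differences do not prevent a match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def missing_from_search(test_set_titles, retrieved_titles):
    """Return the test-set titles that the search did not retrieve."""
    retrieved = {normalize_title(t) for t in retrieved_titles}
    return [t for t in test_set_titles if normalize_title(t) not in retrieved]


# Hypothetical titles, for illustration only.
test_set = ["Nutrients and Algae in Streams", "River Chlorophyll Survey"]
retrieved = ["nutrients and algae in streams.", "An unrelated lake study"]
missed = missing_from_search(test_set, retrieved)
print(missed)
```

A non-empty result would indicate that the search strategy needs to be evaluated and revised.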

Article screening and study inclusion criteria
Screening process-Before screening all articles, consistency in applying inclusion criteria will be evaluated on a subset of articles using the kappa statistic (where 1 indicates complete agreement and 0 indicates agreement no better than chance [38]). Two to four reviewers will assess the same randomly-selected set of 10% of studies to be screened (minimum 50, maximum 200) at the title/abstract level. Kappa will be calculated, using modifications for more than two raters if necessary [39]. If kappa is low (<0.50) [40], reviewers will examine inconsistencies and clarify inclusion criteria; if kappa is moderate or high (≥0.50) [40], one to four reviewers will proceed to screen all retrieved articles at the title/abstract level and, subsequently, all relevant articles at the full text level. Consistency during full text screening will be addressed by frequently convening reviewers to review the screening strategy and resolve any questions.
The inclusion criteria (see below) will be applied to systematically exclude articles that are topically irrelevant or do not contain relevant data, based on review of the title and abstract. Any article for which there is uncertainty about whether to include or exclude it based on title/abstract screening will be included for full text screening. Following evaluation of all titles and abstracts, full text screening will occur simultaneously with data extraction and quality assessment: as full text articles are examined for data extraction and quality assessment, any article judged to be irrelevant will be excluded and added to the appendix of excluded references, along with the justification based on inclusion criteria. Articles obtained through website searches will be screened during those searches by examining title/abstract/summary and full text when necessary, and information on the number of returns and relevant articles will be recorded separately.
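The consistency check can be sketched in Python as follows. This is a minimal illustration, not the review's implementation: it covers two raters only (Cohen's kappa, without the multi-rater modifications cited above), rounds the 10% subset up to a whole article (an assumption), and uses hypothetical function names and ratings.

```python
from collections import Counter


def consistency_subset_size(n_articles, percent=10, minimum=50, maximum=200):
    """Size of the shared screening subset: percent of the articles,
    rounded up, limited to [minimum, maximum] and to n_articles."""
    target = (n_articles * percent + 99) // 100  # integer ceiling
    limited = max(minimum, min(maximum, target))
    return min(limited, n_articles)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions for ten abstracts.
rater_1 = ["in", "in", "ex", "ex", "in", "ex", "in", "ex", "in", "in"]
rater_2 = ["in", "ex", "ex", "ex", "in", "ex", "in", "in", "in", "in"]
kappa = cohens_kappa(rater_1, rater_2)
print(consistency_subset_size(1234), round(kappa, 3))
```

In this hypothetical case the raters agree on 8 of 10 abstracts, chance agreement is 0.52, and kappa is about 0.58, so screening would proceed under the 0.50 threshold.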
Inclusion criteria-The following inclusion criteria will be used to determine relevant studies (see also Table 4):
Relevant population: Lotic freshwaters anywhere in the world or mesocosms made to mimic these systems.
Relevant exposure: Exposure to total nitrogen (TN) or total phosphorus (TP) measured as concentration (e.g. mg/L).
Relevant comparator: Comparison to sites or treatments with lower or higher levels of TN or TP across a gradient, or comparison to a control group (no or background TN or TP) or to lower or higher concentrations of TN or TP in experimental studies.
Relevant study type(s): Experimental studies in mesocosms or field sites, or field-based observational studies.
Relevant publication type(s): Study must contain original data and sufficient detail on methodology to assess study quality. Book chapters and conference abstracts will be excluded unless specifically suggested by outside experts.
Language: No language restrictions will be applied.
Date: No date restrictions will be applied.
Multiple studies using same datasets-For cases in which multiple studies use the same or similar datasets (e.g. a dissertation and one or more published articles from that dissertation), the following criteria (listed in order of priority) will be used to select a single source: the study with the more complete dataset, the version published as a peer-reviewed journal article, or the most recent version. The excluded duplicative study or studies may be used to fill in gaps in methodology or contextual information. These decisions will be documented in an appendix.
Unobtainable articles-Attempts to obtain full text of all articles not excluded during the screening process will be made using available library resources or by contacting authors. Articles for which full text is not obtainable will be listed in an appendix. Abstracts of non-English language articles will be translated using Google Translate to assess relevance. Every effort will be made to obtain translations of any highly relevant, non-English language papers; however, this will depend on available resources. All non-English articles considered relevant based on title/abstract screening but not fully translated will be listed in an appendix.

Potential effect modifiers and reasons for heterogeneity
One motivation for this review is the apparent variability in nutrient stressor-response relationships in lotic ecosystems. Factors that potentially modify stressor-response relationships will be extracted from relevant studies when these factors were examined in the original study. Based on evaluation of highly relevant studies and consultation with stakeholders and experts, the modifiers considered include:
• ecoregion;
• latitude;
• altitude;
• land cover/land use;
• stream size;
• watershed area;
• geographic location;
• date/season/duration of sampling;
• stream gradient;
• flood stage/flow regime/flow permanence;
• nutrient concentration range (lowest and highest TN and/or TP);
• existing background nutrient concentrations;
• temperature;
• canopy cover/light availability;
• pH;
• alkalinity;
• sediment/turbidity;
• conductivity;
• dominant algal species/groups; and
• grazing (primary consumer) pressure.
Other relevant modifying factors will be recorded as they are encountered during screening and data extraction. Existing geographic information system (GIS) layers and tools that summarize important landscape and environmental factors (e.g. StreamCat [41], Google Earth) may be used to obtain relevant modifying factors (e.g. latitude, flow regime, land use/land cover, watershed area) for studies that do not report this information. If any outside data are associated with studies, care will be taken so as not to combine data from disparate sources (e.g. if the National Land Cover Dataset is used to estimate land cover, it will be used for all studies). Methodological modifiers, such as extraction method, measurement method, or sampling location (benthic, sestonic) for chl-a [31,42], or fraction of water sample used for nutrient measurement (filtered, unfiltered), will also be recorded.

Study quality assessment
Studies from articles included after title/abstract screening that are still categorized as relevant upon full text screening will be assessed for quality and risk of bias. Aspects of quality and risk of bias from published critical appraisal frameworks in environmental science and medicine [43][44][45] were examined to develop a quality assessment approach specific to this review, similar to [46] (Tables 5, 6 and 7). For each study, aspects of study quality contributing to a "low" or "high" risk of bias will be rated, based on specific criteria for three different study designs: (1) observational field studies, which typically sample chl-a along a gradient of nutrient concentrations; (2) mesocosm experiments; and (3) field experiments (e.g. Before-After-Control-Impact designs [47]) (Tables 5, 6 and 7). An overall risk of bias estimate for each study will be generated by dividing the number of "high" scores by the number of questions. Results of the systematic review will be discussed and analyzed in the context of this study quality assessment. All relevant studies will undergo quality assessment. To assess accuracy in quality assessment, a reviewer not involved in the initial quality assessment will independently assess quality for 25% of the studies evaluated by other reviewers, and reviewers will discuss and resolve any differences.
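The overall risk of bias estimate described above is a simple proportion. A minimal Python sketch follows; the function name and the example ratings are hypothetical, and the actual questions are those in Tables 5, 6 and 7.

```python
def risk_of_bias_score(ratings):
    """Proportion of quality questions rated 'high' risk of bias.

    `ratings` holds one 'low' or 'high' entry per question.
    """
    if not ratings:
        raise ValueError("at least one rated question is required")
    unexpected = [r for r in ratings if r not in ("low", "high")]
    if unexpected:
        raise ValueError(f"unexpected ratings: {unexpected}")
    return sum(r == "high" for r in ratings) / len(ratings)


# Hypothetical study rated on four questions, one of them 'high'.
score = risk_of_bias_score(["low", "high", "low", "low"])
print(score)
```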

Data extraction
Data will be extracted from studies found in articles that are considered relevant after full text screening. The majority of studies of nutrient stressor-response relationships examine biotic responses across field sites with varying nutrient concentrations, although some compare "reference" to "impacted" sites or experimentally manipulate nutrient concentrations. Most studies will thus use correlation or regression to assess relationships between nutrients and chl-a. The shape and direction (e.g. linear-increasing, linear-decreasing, logarithmic, exponential, sigmoidal) and strength of these relationships will form the basis for meta-analysis and narrative summary of the review results. In most instances, Pearson's correlation coefficient or Spearman's rho (r) between TN or TP and chl-a will be used as the effect size. Other effect size measures (e.g. standardized slope coefficients: change in standard deviations of y associated with a change of one standard deviation of x [47][48][49][50]) will also be extracted and explored; however, the correlation coefficient was the most widely used and easily calculable from the example studies examined. Sample sizes will also be extracted for each effect size to estimate effect size variances using meta-analysis models (see "Data synthesis and presentation"). For experimental studies that manipulate nutrient concentrations and report differences in chl-a concentration between control and treatment groups, we will extract or calculate an appropriate "standardized mean difference" effect statistic such as Cohen's d [50,51].
Authors will be contacted if a study indicates that an effect size was calculated, but not reported (e.g. for negative associations). For studies not reporting effect sizes, raw data will be extracted from figures using image analysis software when possible and effect sizes will be calculated. If no effect size is reported and raw data are not presented (e.g. only site means are provided in a table), these studies will not be used in meta-analysis. The initial "test set" of relevant literature will be used to refine the data extraction fields as needed.
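For experimental studies, a standardized mean difference can be computed from group means, standard deviations, and sample sizes. The Python sketch below uses the pooled-standard-deviation form of Cohen's d and a common large-sample approximation for its variance. The variance formula is an assumption not specified in this protocol, and the function name and example values are hypothetical.

```python
import math


def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Cohen's d (treatment minus control) and its approximate variance."""
    if n_treat < 2 or n_ctrl < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n_treat - 1) * sd_treat ** 2
                  + (n_ctrl - 1) * sd_ctrl ** 2) / (n_treat + n_ctrl - 2)
    if pooled_var <= 0:
        raise ValueError("pooled variance must be positive")
    d = (mean_treat - mean_ctrl) / math.sqrt(pooled_var)
    # Large-sample approximation to the sampling variance of d.
    var_d = ((n_treat + n_ctrl) / (n_treat * n_ctrl)
             + d ** 2 / (2 * (n_treat + n_ctrl)))
    return d, var_d


# Hypothetical chl-a means (same units in both groups), SDs, and group sizes.
d, var_d = cohens_d(30.0, 20.0, 10.0, 10.0, 10, 10)
print(d, var_d)
```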
One to six reviewers will participate in data extraction from all relevant studies. To assess accuracy in data extraction, a reviewer not involved in initial data extraction will independently extract data for 25% of studies, and any differences will be discussed and resolved. Extracted data from relevant studies will be provided as an appendix or in a publicly-available USEPA data repository.

Data synthesis and presentation
Meta-analysis and narrative and tabular summaries of stressor-response relationships will be used to synthesize data from the systematic review. For all studies, the direction or shape of the response will be noted (see "Data extraction") and summarized across studies and subgroups of interest (e.g. subsets based on ecoregion, stream size, chl-a or nutrient measurement method). For studies with sufficient information, effect sizes (see "Data extraction") and variance within and among studies will be examined across studies using a random effects model. Random effects models assume that the true effect size differs among studies and treat this heterogeneity as random, and are appropriate for making unconditional inferences about a set of studies of which the obtained studies are assumed to be a random sample [51][52][53][54]. Pearson's correlation coefficient or Spearman's rho (r) between TN or TP and chl-a will be used as the effect size in most instances. A Fisher's z-transformation of r will likely be necessary to bring the sampling distribution closer to normal and to stabilize its variance [55,56], although other effect size measures (e.g. standardized slope coefficients) will be explored. Equations in Nakagawa and Cuthill [50], Lajeunesse [51] and meta-analysis packages in the R environment [57] (e.g. 'MAc' [58] and citations therein) will be used to convert other effect sizes (e.g. multiple regression coefficients) to Pearson's r. Results for TN and TP will be analyzed and presented separately.
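The transformation and pooling steps can be illustrated with a short Python sketch. It is not the planned analysis, which will use R packages: the DerSimonian-Laird estimator of between-study variance is used here only because it has a closed form, the 1/(n - 3) variance applies to Fisher's z of Pearson's r, and all names and example values are hypothetical.

```python
import math


def random_effects_pool(correlations, sample_sizes):
    """Pool correlations on the Fisher z scale with a random effects model
    (DerSimonian-Laird estimate of between-study variance, tau^2)."""
    if len(correlations) != len(sample_sizes) or len(correlations) < 2:
        raise ValueError("need matching inputs for at least two studies")
    z = [math.atanh(r) for r in correlations]      # Fisher's z
    v = [1.0 / (n - 3) for n in sample_sizes]      # variance of z
    w = [1.0 / vi for vi in v]                     # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(z) - 1)) / c)
    w_re = [1.0 / (vi + tau2) for vi in v]         # random effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "r": math.tanh(z_re),                      # back-transformed mean
        "ci95": (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se)),
        "tau2": tau2,
    }


# Hypothetical TP-chl-a correlations and sample sizes from two studies.
pooled = random_effects_pool([0.1, 0.8], [103, 103])
print(pooled)
```

Back-transforming with tanh returns the pooled estimate and its interval to the correlation scale for presentation, for example in forest plots.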
Effects of modifying factors (e.g. canopy cover) or subgroupings (e.g. ecoregion) will be assessed using mixed-effects models or meta-regression. Effect size variation and mean effect size will be visualized using forest plots. Analyses will be conducted using several R packages, including 'metafor' [53] and 'MAc' [58]. Quality assessment scores will be used as factors in sensitivity analysis to explore the impact of study quality on overall effect sizes and response shapes [40]. Publication bias will be assessed using funnel plots comparing study effect sizes with standard error [59,60].

Table 1 Search terms to be used for database searches
Habitat terms: benth*, catchment, watershed, stream*, creek*, river*
Nutrient terms: "total nitrogen", "total N", "total phosphorus", "total P"
Chlorophyll a terms: chlorophyll, "chlorophyll-a", "chl-a", "chl a"

Table 4 Detailed inclusion and exclusion criteria used to determine study inclusion in the systematic review

Population (unit of study) a
Inclusion criteria: Lotic fresh waters anywhere in the world; mesocosms made to mimic lotic freshwater systems.
Exclusion criteria: Lentic or non-fresh waters (wetlands, lakes, reservoirs, ponds, oceans, estuaries).

Exposure (environmental variable to which population is exposed)
Inclusion criteria: Exposure to total nitrogen (TN) or total phosphorus (TP) measured as concentration (e.g. mg/L).
Exclusion criteria: Exposure only to other nutrients, or nitrogen and phosphorus not reported as TN or TP.

Comparators (control or alternative intervention)
Inclusion criteria: Comparison to sites or treatments with lower or higher levels of TN or TP across a gradient; comparison to control group (no or background TN or TP) or to lower or higher levels of TN or TP in experimental studies.
Exclusion criteria: Studies of single sites (without sampling across time) or those without comparison to lower or higher levels of TN or TP.

Outcomes (relevant outcomes resulting from exposure)
Inclusion criteria: Concentration of benthic or sestonic chlorophyll a, measured as mass per area or volume (e.g. μg/cm2, mg/m2, ...).

Publication type
Inclusion criteria: Study must contain sufficient detail on methodology to assess study quality.
Exclusion criteria: Articles with no original data (e.g. editorials, reviews); articles without sufficient information to evaluate pertinent relationships (chlorophyll a response to TN or TP) or study quality (e.g. methodology); retracted articles.

a We included some search terms that may capture studies in lentic habitats related to flowing systems (e.g. floodplain, riparian) in an attempt to obtain relevant studies that might otherwise be missed. We recognize that there is some uncertainty with the lotic/lentic distinction (e.g. flowing freshwater springs) and will liberally include such articles at the title/abstract screening if otherwise relevant.