What is the available evidence for the application of genome editing as a new tool for plant trait modification and the potential occurrence of associated off-target effects: a systematic map protocol

Background: Plant breeding is a continuously developing process, and breeding methods have evolved over time. In recent years, genome editing techniques such as clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins (CRISPR/Cas), transcription activator-like effector nucleases (TALENs), zinc-finger nucleases (ZFN), meganucleases (MN) and oligonucleotide-directed mutagenesis (ODM) have enabled the precise modification of DNA sequences in plants. Genome editing has already been applied in a wide range of plant species because it is simple, time-saving and cost-effective compared to earlier breeding techniques, including classical mutagenesis. Although genome editing techniques induce far fewer unintended modifications in the genome (off-target effects) than classical mutagenesis techniques, off-target effects are a prominent point of criticism as they might cause genomic instability, cytotoxicity and cell death.
Methods: The aim of this systematic map is to address the following primary question: “What is the available evidence for the application of genome editing as a new tool for plant trait modification and the potential occurrence of associated off-target effects?” The primary question will be addressed through two secondary questions: one concerns the traits modified by genome editing in plants, and the other explores the occurrence of off-target effects. The systematic map will focus on model plants as well as on plants produced for agricultural production that were subjected to genome editing techniques. Academic and grey literature will be searched in English and German. Inclusion/exclusion criteria were developed for the two secondary questions and will be applied at the title/abstract and full-text stages. Included studies will be catalogued in a searchable, open-access database, and study results will be summarized using descriptive statistics. Furthermore, the extracted data will serve as a preparatory step for further in-depth analysis, e.g. by a systematic review.


Background
Technological progress in agriculture and plant breeding has contributed significantly to a stable food supply and has formed the basis for high yields and the agricultural production of high-quality products. However, in an ever-changing world, new challenges will be encountered within the next decades. Besides the demands of the growing global population and limited fossil resources, climate change is a driver of breeding efforts as it is associated with an increase in extreme weather events such as droughts or floods as well as changing dynamics of pests and diseases. Agriculture needs to secure and increase world agricultural production to meet growing demands with limited environmental resources such as soil and water [1]. At the same time, the intensification of agriculture has a considerable impact on nature, as naturally diverse landscapes are replaced by arable land for the cultivation of a few plant species. Biodiversity is threatened by habitat loss and pesticide use, which in turn is considered to increase disease and pest pressure [2]. Additionally, environmental impacts on agriculture are becoming increasingly important in societal debates. To meet all these challenges, a combination of good agricultural practice (GAP) and innovation in plant breeding is needed. GAP addresses environmental, economic and social sustainability, leading to safe and healthy food [3]. Examples of GAP are the preservation of natural soil fertility through suitable crop rotations, fertilization and plant protection according to the principles of integrated cultivation, or balanced and species-appropriate animal husbandry [3]. In addition, plant breeding is of crucial importance for managing environmental impacts on cultivation systems by providing varieties that are resistant to plant diseases or pests and tolerant to abiotic stress. This may reduce pesticide use and result in less intensive management efforts (e.g. irrigation). Further yield improvement will reduce the area required for food production and may free up land, e.g. for nature conservation [2].
Plant breeding essentially relies on the utilization of genetic variation within the breeding material that can be used for crossing and selection steps to develop improved varieties. New genetic variation can occur naturally through spontaneous mutations that enable populations to adapt to changing environmental conditions. However, as the mutation rate is fairly low and mutations occur at random, plant breeders and scientists have been artificially inducing mutations for several decades. The first generation of mutation breeding used chemical and physical mutagens to generate a large number of nonspecific mutations. The increased mutation rate results in plants with a few positive, many neutral and several negative characteristics. Thus, laborious backcrossing and selection steps are necessary in order to select for a desired trait. Nevertheless, more than 3200 mutant varieties from 214 different plant species have to date been generated through undirected mutagenesis and officially registered [4].

Glossary: [2, 5]
Backcrossing: Backcrossing is the crossing of a hybrid with one of its parents in order to obtain offspring that are genetically closer to the selected parent. In this way, desired heterologous traits from the hybrid can be transferred into the genetic background of a parental line. Since crossing recombines all genes, many backcrosses are necessary to achieve considerable dilution of unwanted genes from the hybrid.
Mutagen: A mutagen is an agent that increases the mutation rate within an organism or cell, e.g. X-rays, gamma rays or chemicals such as ethyl methanesulfonate (EMS).
Mutation breeding: A plant breeding approach using mutagens to enhance genetic variation. The resulting random mutations can generate new gene variants with positive traits that can be selected for further breeding. However, several of these mutations are negative and diminish the viability of the plant.
Selection: A process in breeding by which the breeder chooses only those individuals that show the desired trait(s).
In 1983, the first recombinant DNA was delivered to plant cells using Agrobacterium tumefaciens [6,7]. Since then, it has been possible to work at the single-gene level with genetic material from any organism, generating plants that cannot be bred by conventional breeding techniques [8,9]. Nevertheless, mutations induced by chemical and physical mutagens as well as the "classical" transgenic approach show limited efficiency because the modified site is targeted at random [10]. In recent years, genome editing techniques have been developed that enable the precise modification of DNA sequences in a specific and site-directed manner [11]. To date, genome editing comprises two molecular approaches that efficiently induce targeted alterations in genomes: (1) site-directed nucleases (SDN) and (2) oligonucleotide-directed mutagenesis (ODM). Site-directed nucleases induce double-strand breaks (DSBs) in the DNA, which are subsequently repaired by the host's own cellular mechanisms. The repair can be categorized into three main types [11][12][13][14]:
1. No additional template is added and the DSB is repaired by non-homologous end joining (NHEJ), resulting in small insertion-deletion (indel) mutations. This approach is defined as SDN1.
2. A repair template is added which, except for a few nucleotides, is identical to the sequence in which the DSB is introduced. The DSB is then repaired via homology-directed repair (HDR), causing nucleotide substitutions or targeted indels. This approach is defined as SDN2.
3. The repair template harbors a recombinant DNA sequence in addition to the sequences homologous to the site of the DSB, and the break is repaired via HDR, resulting in more complex alterations, e.g. the insertion of foreign genes. This approach is defined as SDN3.

Meganucleases
Meganucleases are naturally occurring endonucleases that can be modified to bind to a specific DNA sequence and cleave it [15]. The advantage of meganucleases is their small size, which makes them suitable for the majority of delivery methods [16]. However, the DNA-binding domain cannot be separated from the catalytic domain, which complicates the construction of MN [4]. MN have been applied successfully for genome editing in plants such as Arabidopsis [17], maize [18] and cotton [19].

Zinc-finger nucleases
In 1996, zinc-finger nucleases were reported as the first programmable site-specific nucleases [20]. ZFN are generated by fusing two independent protein domains. A zinc-finger protein, which comprises up to six zinc-finger domains each able to identify a nucleotide triplet of a specific DNA sequence, is fused with a synthetic endonuclease domain (most frequently FokI) [4,9,29]. Since the nuclease is active as a dimer, two zinc-finger nucleases are necessary in close proximity to target and cut a sequence in the genome.

TALENs
Similar to ZFN, TALENs are composed of two functional parts. The first part consists of the TALE, which is originally derived from Xanthomonas species and is crucial for binding to a specific DNA sequence. The TALE is composed of repeats of 34 amino acids, each binding specifically to a single nucleotide in the target DNA [21]. In order to mediate the introduction of a targeted DSB, the TALE is fused to a FokI endonuclease domain. Compared to ZFN, where each repeat recognizes a cluster of three nucleotides and interferes with neighboring repeats, the design of specific TALE DNA-binding domains is easier and more amenable to programming [22]. Like ZFN, TALENs are most frequently used as pairs to introduce a DSB at a specific target site of the DNA [9].

CRISPR/Cas9
The most widely used CRISPR/Cas system is derived from Streptococcus pyogenes and consists of two elements. An artificial single guide RNA (sgRNA) directs the nuclease to a specific DNA sequence, and the Cas endonuclease then induces a DSB at this targeted DNA sequence [23]. To induce a DSB, a protospacer-adjacent motif (PAM), such as 5′-NGG-3′, has to be present at the target site [24]. In contrast to TALENs and ZFN, the CRISPR/Cas9 system uses a short sgRNA instead of a protein for target recognition. This sgRNA can easily be adapted to match the target sequence. Compared to MN, ZFN and TALENs, the CRISPR/Cas9 system is therefore easier, faster and more flexible to use, since only the sgRNA, rather than an entire binding protein, has to be adapted to a new sequence [25].
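The PAM requirement described above can be illustrated with a minimal sketch that lists candidate Cas9 target sites in a DNA sequence. It is an illustration only: the function name `find_cas9_targets` is hypothetical, the 20-nucleotide protospacer length is an assumption not stated in this protocol, and only the forward strand is scanned.

```python
import re


def find_cas9_targets(sequence, protospacer_len=20):
    """List candidate target sites followed by a 5'-NGG-3' PAM.

    Assumptions (not taken from the protocol): a protospacer of
    `protospacer_len` nucleotides and a scan of the forward strand only.
    Returns tuples of (start index, protospacer, PAM).
    """
    seq = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy sequence: a 20-nt protospacer followed by the PAM "AGG".
for start, protospacer, pam in find_cas9_targets("ACGTACGTACGTACGTACGTAGGCC"):
    print(start, protospacer, pam)
```

A real guide design workflow would also scan the reverse complement and score candidates for specificity, which this sketch deliberately leaves out.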

Oligonucleotide-directed mutagenesis
In contrast to MN, ZFN, TALENs and CRISPR/Cas9, the ODM technique requires neither a nuclease nor a DSB [26]. The mechanism of ODM is based on the use of chemically synthesized oligonucleotides for the induction of site-specific mutations in the genome. The alteration generally affects one to four adjacent nucleotides, resulting in point mutations [27]. The oligonucleotide is a modified DNA or DNA/RNA molecule of 20-100 nucleotides. It is homologous to a genomic sequence except for the nucleotide(s) to be modified [8,27]. The introduced oligonucleotide binds to the targeted DNA sequence, which is then modified by the host cell's mismatch repair mechanism [11].
Genome editing offers substantial advantages over previous mutation breeding techniques and conventional genetic engineering in terms of speed and precision. Genome editing provides the opportunity to selectively mutate or modify one or a few genes (SDN1, SDN2). In addition, it is now possible to precisely modify or selectively replace (SDN3) entire genes from both closely and distantly related organisms [11]. With conventional genetic engineering, traces of recombinant DNA from the viruses or bacteria that were used as gene shuttles persist in the modified organism, leading to clearly characterized genetically modified organisms. In contrast, genome editing makes it possible to modify crops without inserting foreign DNA sequences at all [28]. This may reduce the regulatory burden for plant breeders and increase the acceptance of genome editing within society. Owing to its simplicity and its time-saving and cost-effective application, genome editing has already been applied in a wide range of cultivars. Genome editing has been used for:
i. Analysis of gene functions (e.g. effect of the RAV2 gene on salt stress in rice [29]).
ii. Improvement of product quality (e.g. decreased linoleic acid in soybean [30]).
iii. Development of disease-resistant varieties (e.g. virus-resistant cucumber [31]).
iv. Development of herbicide-tolerant varieties (e.g. resistance to the herbicide chlorsulfuron in oilseed rape [32]).
v. Improved adaptation to abiotic stress (e.g. drought tolerance in maize [33]).
Even in plants such as hexaploid wheat, which were so far largely inaccessible to targeted genetic alterations, the simultaneous mutation of all six alleles was successfully performed [34]. All of this opens new dimensions for the scientific, plant breeding and agricultural communities.
Compared to mutations randomly induced by chemicals or irradiation, the number of unintended mutations (off-target effects) is greatly reduced by genome editing techniques [11]. Nevertheless, their application does not per se exclude the occurrence of off-target effects. Off-target effects are changes in a DNA sequence that is similar to the targeted one but located at another site in the genome. They occur mainly because the recognition site is not sufficiently unique and/or long [11,35,36]. Several methods have been developed to predict and identify off-target sites linked to the use of genome editing. One can differentiate between the prediction of off-target effects using in silico methods and the detection of off-target effects using either biased detection methods, which analyze individual DNA sequences, or unbiased detection methods, in which genome-wide off-target analyses are conducted [37]. Depending on the detection method used, the results of identifying off-target mutations vary widely. Although genome editing techniques induce far fewer off-target effects than classical mutagenesis techniques, off-target effects are an important point of criticism as they may cause genomic instability, cytotoxicity and cell death [38][39][40].
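The idea behind in silico prediction, namely searching for sequences that resemble the target but lie elsewhere in the genome, can be sketched as a naive mismatch scan. This is a simplified illustration and not one of the prediction tools referred to above: the function names, the mismatch threshold and the toy sequence are assumptions, and PAM requirements, insertions/deletions and the reverse strand are ignored.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(genome, target, on_target_pos, max_mismatches=3):
    """Naive scan for sites that resemble `target` at other genome positions.

    `max_mismatches` is an illustrative threshold (an assumption).
    Returns tuples of (position, site sequence, number of mismatches).
    """
    genome, target = genome.upper(), target.upper()
    n = len(target)
    hits = []
    for i in range(len(genome) - n + 1):
        if i == on_target_pos:
            continue  # skip the intended target site itself
        site = genome[i:i + n]
        mismatches = count_mismatches(site, target)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


# Toy example: the intended site is at position 4; a similar site with one
# mismatch is located at position 16.
toy_genome = "TTTT" + "ACGTACGT" + "CCCC" + "ACGTACCT" + "GGGG"
print(candidate_off_targets(toy_genome, "ACGTACGT", on_target_pos=4, max_mismatches=1))
```

Predicted sites of this kind are only candidates; whether a mutation actually occurred there has to be confirmed by one of the detection methods.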
Risk assessors and decision makers depend on a reliable body of evidence to support conclusions about potential risks associated with the application of genome editing. Thus, the overview of the available evidence on the occurrence of off-target effects provided here will be of crucial importance. Furthermore, this systematic map facilitates an objective debate by informing interested stakeholder communities in a transparent and retraceable manner about the status of research, the progress in genome editing in plants and the available evidence about the potential occurrence of associated off-target effects. The results of the systematic map will be discussed at a stakeholder conference as well as within an expert group established as part of the ELSA-GEA project. These meetings will play an important part in identifying key aspects that should be analyzed further within a systematic review.

Objectives of the map
Due to its strong implications for plant breeding, genome editing is of particular relevance to scientists, regulators and policy-makers in the EU and worldwide. Therefore, we want to survey the available evidence about applications of genome editing in plants. The main objectives are:
• Overview of the traits modified by genome editing in model plants as well as in crops produced for agricultural production.
• Overview of the available evidence about the occurrence of off-target effects due to the use of genome editing techniques in model plants as well as in crops produced for agricultural production.
• Identification of the volume of the available literature, evidence clusters and key characteristics of the evidence base to inform interested stakeholder communities.
• Identification of knowledge gaps concerning the occurrence of off-target effects in order to inform decision makers about which future research might be needed for a risk assessment.
• Assessment of whether the available evidence base is suitable for in-depth analyses such as a systematic review.
The primary question of the systematic map is: "What is the available evidence for the application of genome editing as a new tool for plant trait modification and the potential occurrence of associated off-target effects?"
To answer this primary question, it is broken down into two secondary questions related to (1) the traits modified by genome editing and (2) the occurrence of off-target effects due to the use of genome editing.

Secondary question one
"What are the traits modified by genome editing in model plants as well as in crops produced for agricultural production?"
Population: Any model plant or crop produced for agricultural production.
Intervention: One of the following genome editing techniques was used to induce an alteration in the plant genome: clustered regularly interspaced short palindromic repeats/CRISPR associated proteins (CRISPR/Cas), transcription activator-like effector nucleases (TALENs), meganucleases (MN), zinc-finger nucleases (ZFN), oligonucleotide-directed mutagenesis (ODM).
Outcome: The alteration of the genome (i.e. insertion, deletion or replacement of nucleotides) induced by the use of one of the genome editing techniques.

Secondary question two
"What is the available evidence for the potential occurrence of associated off-target effects due to the use of genome editing in model plants as well as in crops produced for agricultural production?"
Population: Any model plant or crop produced for agricultural production.
Intervention: One of the following genome editing techniques was used to induce an alteration in the plant genome: clustered regularly interspaced short palindromic repeats/CRISPR associated proteins (CRISPR/Cas), transcription activator-like effector nucleases (TALENs), meganucleases (MN), zinc-finger nucleases (ZFN), oligonucleotide-directed mutagenesis (ODM).
Outcome: The occurrence of potential off-target events was assessed.

Search strategy
To test the comprehensiveness of the search strategy, a scoping search was carried out to validate the search string and to test it against a priori selected articles of relevance (Additional file 1). To refine the search string, an iterative process was applied: candidate search strings were run in Web of Science (WoS), the numbers of hits were recorded and the results were tested against the test library. The development of the search string is shown in Additional file 2. The search string will be composed of two parts. The first part defines the population of interest and comprises less specific terms such as crop, plant or seed, and in addition model plants and crops produced for agricultural production, including their English and Latin names, to ensure broad coverage. The second part defines the intervention, i.e. the genome editing technique applied to induce an alteration in the plant genome (CRISPR, TALENs, ZFN, MN or ODM). The search terms describing each key element will be combined by the Boolean operator "OR" and the different key elements will be combined with the "AND" operator. Wildcards ('*') will be used to search for variant word endings. The final search terms shown in Additional file 3 will be adapted to the specific needs of each database to which they are applied. Database searches will be conducted in English and German. Articles published after 1996, when the first study about a genome editing technique was published, will be considered.
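The structure of the search string described above (terms within a key element joined by "OR", key elements joined by "AND", wildcards for word endings) can be sketched as follows. The terms shown are illustrative placeholders and not the final search terms of Additional file 3; the helper names are hypothetical, and quoting of multi-word terms is an assumption about database syntax.

```python
def or_group(terms):
    """Join the terms of one key element with OR; quote multi-word terms."""
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_search_string(population_terms, intervention_terms):
    """Combine the two key elements (population, intervention) with AND."""
    return or_group(population_terms) + " AND " + or_group(intervention_terms)


# Illustrative placeholder terms only (not the terms of Additional file 3).
population = ["plant*", "crop*", "seed*", "Arabidopsis", "Zea mays"]
intervention = ["CRISPR*", "TALEN*", "zinc finger nuclease*", "meganuclease*"]
print(build_search_string(population, intervention))
```

Keeping the two term lists separate from the assembly step makes it straightforward to adapt the string to the syntax of each database.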
The following online publication databases will be searched to identify academic literature. Access to these databases is ensured by institutional subscriptions:
Furthermore, Google Scholar (https://scholar.google.com) will be searched using 30 different combinations of the most relevant (model) plants and genome editing terms. The first 20 search results, organized by relevance, of each combined search term will be assessed at the title/abstract stage. Additionally, the search engine Google will be used to identify companies working with genome editing and to search on websites of government agencies for clues on the application of genome editing for market approval.
Furthermore, the references of each review article will be scanned for further relevant papers. All hits from each database will be imported into an EndNote X8.0.1 library file. Duplicates will be removed using the appropriate function within the EndNote software. After removing duplicates the remaining records will be imported into the open-access and non-profit database CADIMA [41] to increase transparency and traceability during the review process.
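The principle of duplicate removal can be illustrated with a small sketch that treats two records as duplicates when their normalized title and publication year match. This is not the algorithm used by EndNote or CADIMA; the record fields (`title`, `year`) and function names are assumptions made for illustration.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record for each (normalized title, year) key.

    `records` is a list of dicts with hypothetical keys 'title' and 'year'.
    """
    seen = set()
    unique = []
    for record in records:
        key = (normalize_title(record["title"]), record.get("year"))
        if key in seen:
            continue  # same title and year already kept
        seen.add(key)
        unique.append(record)
    return unique


hits = [
    {"title": "Genome editing in rice.", "year": 2015},
    {"title": "Genome Editing in Rice", "year": 2015},
    {"title": "Targeted mutagenesis in maize", "year": 2014},
]
print(len(deduplicate(hits)))
```

Automatic matching of this kind can miss duplicates with differing titles, so a manual check of the remaining records is still advisable.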

Article screening and study inclusion criteria

Study inclusion criteria
In order to be included in the systematic map, each article has to meet all of the following inclusion criteria:
Relevant population: Any model plant or crop produced for agricultural production, as well as higher fungi, was used. Ornamental and medicinal plants as well as yeast will be excluded.
Relevant outcome: Due to the use of a genome editing technique, an alteration in the plant genome was reported (insertion, deletion or replacement). Other techniques which do not induce a DSB and therefore do not employ the DNA repair mechanism of the cell will be excluded; these include TALE (without a nuclease) and variations of a non-functional dCas9 (deadCas9) fused to a methylase, demethylase or transcription factor.
Primary data: Only those references will be included which comprise primary data referring to the use of a genome editing technique to induce a sequence alteration in the plant genome. If there is any doubt at the title/abstract stage about whether an article contains primary data, it will be kept for full-text assessment.

Article screening
When applying the selection criteria at title/abstract stage, a consistency check will be conducted by all participating reviewers to determine the inter-reviewer agreement. A minimum of 50 references, or 10% of the total number of references retrieved by the search up to a maximum of 200, will be checked until a kappa value of at least 0.6 indicates good reviewer agreement. If the kappa value is below 0.6, the reviewers will analyze the reasons for the insufficient agreement within the whole review team and reassess the inclusion criteria. In a first step, all identified records will be assessed at title/abstract stage. If insufficient information is provided, the records will be passed on to the full-text stage. Afterwards, the eligibility of records retained after title/abstract screening will be checked at full-text stage. A list of articles excluded at full-text stage, with the reason for exclusion, will be provided. At title/abstract and at full-text stage, the inclusion criteria will be applied to all articles by two reviewers working independently of each other. Inconsistencies in rating decisions will be documented and the reasons will be discussed in the review team.
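The consistency check can be illustrated with a short sketch. It assumes Cohen's kappa for a pair of reviewers, since the protocol does not specify which kappa statistic is used for more than two reviewers, and it reads the sample-size rule as 10% of all records, bounded below by 50 and above by 200. Function names and the example ratings are hypothetical.

```python
from collections import Counter


def consistency_sample_size(total_records):
    """10% of all records (rounded up), at least 50 and at most 200.

    This is one reading of the rule in the text (an assumption).
    """
    ten_percent = (total_records + 9) // 10  # integer ceiling of 10%
    return min(total_records, 200, max(50, ten_percent))


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two reviewers rating the same records."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of records with identical decisions.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each reviewer's decision frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical category
    return (observed - expected) / (1 - expected)


reviewer_1 = ["include"] * 5 + ["exclude"] * 5
reviewer_2 = ["include"] * 4 + ["exclude"] * 5 + ["include"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2), "sufficient" if kappa >= 0.6 - 1e-9 else "reassess criteria")
```

In the toy example the reviewers agree on 8 of 10 records while 50% agreement is expected by chance, which gives a kappa of 0.6, exactly the threshold named in the text.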

Study quality assessment
The aim of this systematic map is to provide a broad overview of the current progress in modifying the plant genome as well as of the occurrence of off-target effects observed due to the use of genome editing techniques. Therefore, a full critical appraisal of included studies will not be performed. In order to facilitate the decision as to whether a systematic review of a specific section of the map would be worthwhile, data indicative of the validity of an included study will be extracted (e.g. search for off-targets, off-target detection method).

Data coding strategy
For each study, data will be extracted by one reviewer and the extracted data will be cross-checked by a second reviewer to minimize human error.
The following superordinate categories will be considered:
More detailed information about the data extraction strategy and the data extraction mode is provided in Additional file 4. In a pilot scheme, 5% of the studies retained for data extraction (at least 20) will be checked a priori by all reviewers to assess the repeatability of the extraction process.

Study mapping and presentation
Included studies will be catalogued in a searchable database as well as in an Excel file. The database will be freely accessible on the project webpage http://www.dialog-gea.de. In addition, eligible studies will be characterized using descriptive statistics on key trends, including:
- Frequency distribution of countries which are working on genome editing.
Knowledge gaps (un- or underrepresented subtopics that warrant further primary research) and knowledge clusters (well-represented subtopics that are amenable to full synthesis by a systematic review) will be identified, e.g. by cross-tabulating key meta-data variables in heat maps. Furthermore, all results gained within this systematic map will be summarized in a narrative report. Moreover, additional files will include:
- An EndNote database of all studies included in the systematic map.

Authors' contributions
DM, FH, TS, CK, DK, JS and RW conceived the review question. DM undertook pilot research. DM drafted the protocol text with support from FH, TS, CK, DK and RW. DM will coordinate the mapping process, analysis and presentation of the results. DM, FH, TS and DK will screen the articles. All authors read and approved the final manuscript.