Effectiveness of animal conditioning interventions in reducing human–wildlife conflict: a systematic map protocol

Human–wildlife conflict (HWC) is currently one of the most pressing conservation challenges. We restrict ourselves here to wildlife behaviour that is perceived to negatively impact social, economic or cultural aspects of human life or to negatively impact species of conservation concern. HWC often involves wild animals consuming anthropogenic resources, such as crops or livestock, either out of necessity (loss of habitat and natural prey) or as a consequence of opportunistic behaviour. A variety of interventions are undertaken to reduce HWC, differing in practicability, cost and social acceptance. One such non-lethal intervention is animal conditioning, a technique that aims to reduce conflict by modifying the behaviour of ‘problem’ animals in the long term. Conditioning changes the associations animals have with resources or behaviours. Through ‘punishment’ of unwanted behaviour, ‘rewarding’ of alternative behaviour, or both, researchers aim to make the unwanted behaviour relatively less desirable to animals. Despite this potential, however, studies testing conditioning interventions have reported seemingly contradictory outcomes. To facilitate reduction of HWC via conditioning, we therefore need to better understand if and when conditioning interventions are effective. With this systematic map we intend to make the global evidence base for conditioning of free-ranging vertebrates more accessible to practitioners, to identify potential evidence clusters and effect modifiers for a subsequent systematic review, and to highlight evidence gaps for future research. We will compile evidence, including grey literature, from bibliographic databases, online search engines, specialist sites and expert contacts. Where possible, a Boolean-style full search string will be used, including Intervention and Outcome search terms. Searches will be conducted in English. Search comprehensiveness will be evaluated with an a priori list of benchmark articles.
We will base inclusion of articles on the presence of quantitative data, subject identity, comparator and outcome. Inclusion consistency checks will be performed on 10% of the titles, abstracts and full texts. We will assess the validity of the literature base on the basis of study design and sample size. Finally, we will develop a searchable literature database and an interactive evidence atlas, along with a narrative synthesis of the evidence.


Background
"Everyone knew there were wolves in the mountains, . . . , but they seldom came near the village – the modern wolves were the offspring of ancestors that had survived because they had learned that human meat had sharp edges."
Terry Pratchett, Equal Rites

Human-wildlife conflict (HWC) is increasing. Human populations and the numbers of associated livestock are growing and expanding, while natural habitat is declining [1]. At the same time, some wildlife populations are also (re)growing following conservation actions. As a consequence of one or both of these developments, the intensity and frequency of HWC have increased to the point that HWC is recognized as one of the most critical conservation challenges [2–5]. Conflict with wildlife can range from Canada geese (Branta canadensis) eating and defecating on golf courses, to wolves (Canis lupus) killing sheep, to polar bears (Ursus maritimus) and tigers (Panthera tigris) attacking and killing people. Conflicts thus cover a variety of 'problem' behaviours, ranging from a nuisance to threats to lives and livelihoods. These conflicts not only result in short-term costs for humans and, often as a consequence of retaliation, for animals, but in the long term they also decrease local support for wildlife conservation [3,5,6]. We restrict our definition of HWC here to wildlife behaviour that is perceived to negatively impact social, economic or cultural aspects of human life, or species of conservation concern, i.e. 'human-wildlife impacts' [7], but for simplicity we use the term 'human-wildlife conflict' (HWC).
Although there has been a recent surge in urgency, especially concerning conflicts with large carnivores [2,4,8,9], HWC has long been an issue, as illustrated by a quote from Titus Plautus (254–184 BC): "Where there are sheep, the wolves are never very far away." As such, many lethal and non-lethal interventions aimed at reducing conflicts have been proposed and tested, but no single type of intervention has proven to be a silver bullet [4,5,10–16]. Besides being effective, an intervention needs to fulfil a number of additional criteria, such as cost-effectiveness, feasibility, sustainability, and social, legal and ethical acceptability. Lethal interventions might be socially or legally undesirable even if they appear effective in some cases [15,17–19]; translocation might be too costly and risky for the animals, besides being generally ineffective for large carnivores [11,20–22]; and simple deterrents may be effective during the actual intervention but not in the long term [10,12,14,23–26]. Large-scale traditional fencing might be undesirable from a social or ethical perspective, and unfeasible when it strongly restricts movements of non-target species [11,27–30]; while virtual fences could prevent problems for non-target species, their usefulness may be mostly restricted to highly social species [31]. Finally, although guardian animals appear to be a promising tool, specifically for reducing livestock predation, they may not be effective against all kinds of problem species and behaviours [10–12,32]. In summary, the appropriateness and effectiveness of specific HWC intervention techniques depend strongly on the local context. Therefore, a combination of several techniques is likely always to be necessary to effectively reduce HWC. Ideally, these interventions are conducted in combination with preventive measures, for example those that reduce the problem animal's need for anthropogenic resources, such as habitat restoration and natural prey management; those that directly disrupt the problem animal's learning process before a conflict can form, such as olfactory pre-exposure [33]; and those that target the human side of the conflict (or impact), such as knowledge exchange and compensation schemes [16,34].
A promising HWC intervention that could be part of an effective 'HWC mitigation toolbox', and which does not involve highly invasive procedures such as killing or translocating animals, is 'animal conditioning' [35].
The key component of conditioning is associative learning. Associative learning involves memory, making conditioning in essence effective after, not just during, the intervention. Learned associations also have the potential to be generalized across locations, possibly making the intervention effective over larger areas or from ex situ (captivity) to in situ (wild) settings [36]. Conditioning has therefore been flagged as a potentially useful tool for reducing HWC [37–39]. Conditioning interventions in HWC specifically aim to change the behaviour of an animal in the long term.
Generally, two main arms of conditioning are recognised: classical and operant. Classical conditioning occurs when an animal learns that one external cue predicts another (e.g. a bell predictably sounds before food appears). This means that an animal learns to use a previously neutral cue (the bell) to predict the appearance of an important cue (food). The previously neutral cue is generally expected to precede the important cue in time. In contrast, operant conditioning involves an animal learning that its own behaviour is associated with a given outcome. For example, by approaching location X, an animal finds food. The animal is the active agent in this scenario, not just an observer of relations between external cues. There are four main methods to achieve operant conditioning. 'Wanted' behaviours can be reinforced by (1) adding an appetitive stimulus or (2) removing an existing aversive stimulus when that behaviour occurs. 'Unwanted' behaviours can be decreased by (3) adding an aversive stimulus or (4) removing an existing appetitive stimulus when that behaviour occurs (Table 1, based on [40]). In HWC situations, conditioning by introducing aversive stimuli is much more common than conditioning by adding appetitive stimuli. Even though the latter may be ethically preferable, finding an effective appetitive stimulus is usually more challenging. For example, pain is aversive at any time, while food might only be appetitive when an animal is hungry. Additionally, an appetitive stimulus (e.g. supplemental food) might artificially bolster the population, which may in turn lead to more conflict.
It should be noted, however, that when a behaviour is performed by an animal to acquire a resource that is essential to its health and survival (a biological imperative), for example because no alternative natural resources are sufficiently available, making the unwanted behaviour less desirable to the animal will require considerable effort, and the behaviour may be unlikely to be extinguished completely. Conversely, when accessing a resource is not (or no longer) a biological imperative, and the conflict thus involves somewhat opportunistic behaviour [5], conditioning has the potential to be a more effective and less laborious intervention.
There are, however, some practical challenges associated with applying conditioning as a HWC intervention. The first challenge is that conditioning is generally expected to be most effective when it is applied as a preventive rather than a remedial measure [41]. Second, to be successfully paired, the stimulus should be behaviourally contingent (i.e. follow the behaviour quickly). With certain sporadic and elusive unwanted behaviours, such as livestock predation, it may be very difficult to catch the animal in the act and immediately apply punishment. In this scenario, the behaviour to be punished is 'attacking sheep'. Because of the logistical (and ethical) challenges involved in trying to punish attack behaviour directly, proxies such as sheep carcasses are regularly used [42,43]. This can lead to counterproductive outcomes whereby 'eating sheep carcasses or baits' is punished, but not the actual unwanted attacking and killing behaviour [44–46]. That is, the wrong lesson is learned. This limited effectiveness might also be explained by a third challenge in animal conditioning, namely that not all types of stimuli can be effectively paired with each type of resource or behaviour. For example, wild rats were observed to avoid eating a food that made them sick, but not to avoid coming to a place that made them sick [47,48]. In cases where illness-inducing substances are used, limited effectiveness might also result from animals associating the smell of the substance (and not the resource) with the illness [45,49,50]. Generally, stimuli that are perceptually salient and generate more biologically relevant experiences are learned faster [40]. Mammalian predators are especially quick to learn associations between (unintended) olfactory cues and subsequent rewards or punishments, although pre-exposure to the smell might provide a solution in some cases [33]. Fourth, animals could learn to overcome the aversive stimulus (i.e. habituate or desensitize) and even start to use it as a cue for resource availability, otherwise known as the "dinner bell" effect [51]. Fifth, the social system of animals may influence the effectiveness of conditioning interventions, as social interactions can facilitate or modify learned associations [52–54]. Lastly, and perhaps most importantly, to determine whether conditioning has actually taken place, animals should be monitored at the individual level before or during and after the intervention, and some variation at this level should be expected.
Unsurprisingly, there is no clear agreement on the overall effectiveness of conditioning interventions in reducing HWC. Moreover, based on field trials with livestock-predating carnivores, certain conditioning interventions are often deemed unsuccessful [10,11,14]. Differences in outcomes are potentially explained by differences in methodology, context, the behaviour being targeted, species traits and individual traits. However, studies have also been criticised for lacking internal validity, by using too small a sample size and not using an (appropriate) control [10,55], and for lacking external validity, by using captive instead of wild animals or by focussing too much on one (type of) species [10,12,44].

To help facilitate a minimally invasive, yet long-term effective reduction in HWC via conditioning of free-ranging vertebrates, it is necessary to better understand if and when conditioning interventions in HWC contexts are indeed successful. We will first assess whether there is enough high-quality evidence available to evaluate the overall effectiveness of conditioning in free-ranging vertebrates, by synthesising existing conditioning intervention studies in a systematic map [56]. If there is sufficient high-quality evidence, a systematic map can provide a global evidence base for the premise of animal conditioning as a wildlife intervention technique. However, if not enough high-quality evidence can be found, our map will highlight an important knowledge gap. For example, a recent large-scale evaluation of human-carnivore conflict interventions concluded that such interventions are rarely quantitatively compared against experimental controls and that an appropriate and much-needed evidence base for carnivores is therefore still missing [57]. Yet, if our map highlights potential evidence clusters, these clusters may serve subsequent systematic reviews in assessing whether animal conditioning is an intervention technique worth pursuing overall, whether it should be restricted to certain species or behaviours, or whether resources might be better invested elsewhere.

Table 1 Operant conditioning by addition or removal of a stimulus (based on [40])
Addition (Positive conditioning), reinforcement: Offer an appetitive stimulus (e.g. food) at a location where we want an animal to go
Addition (Positive conditioning), punishment: Introduce an aversive deterrent at a location we want an animal to avoid
Removal (Negative conditioning), reinforcement: Remove aversive human-produced noise from a location where we want an animal to go
Removal (Negative conditioning), punishment: Remove an appetitive stimulus (e.g. food) from a location we want an animal to avoid

Stakeholder engagement
The topic of HWC reduction using animal conditioning techniques was first identified during discussions with an international group of fellow behavioural/conservation ecologists at a joint Collaboration for Environmental Evidence (CEE) training workshop (Oct 2017) [58]. Subsequently, an Advisory Team was established (i.e. the co-authors) and later expanded (Prof. Colleen Cassady St. Clair and Rob Appleby B.Sc), comprising experts in behavioural ecology, animal cognition, wildlife conservation, wildlife management and specifically HWC. St. Clair and Appleby have also been directly involved in the design and application of animal conditioning to reduce HWC [59,60]. The Advisory Team includes, but is not restricted to, staff of the Leibniz Institute for Zoo and Wildlife Research, the Institute for Conservation Research of San Diego Zoo, WWF-Netherlands and the company Wild Spy (Banyo, Australia). It also includes participants of the CEE workshop, who contributed to the search strategy and will be part of the consistency checking process. All Advisory Team members contributed to the lists of search terms, inclusion/exclusion criteria, literature, specialist websites and/or contact persons. Moreover, the Advisory Team aimed to make the primary question as relevant (for practitioners) and comprehensive (for a systematic map) as practically feasible.

Objectives of the review
With the proposed map we aim to provide an extensive evidence base of existing studies on the effectiveness of animal conditioning interventions in reducing HWC with free-ranging vertebrates. The map is the first step towards a systematic review on this topic, and we will use it to identify evidence clusters (appropriate subtopics/subcategories for systematic review) and potential effect modifiers. Additionally, we aim to identify evidence gaps as a basis for recommendations on relevant future research directions. In this map we thus aim to provide and assess the evidence base necessary to address the primary and secondary questions, but not to answer them. This systematic map protocol has been structured following the ROSES reporting standards [61,62] (see Additional file 1).

Primary question
Are animal conditioning techniques effective in reducing human-wildlife conflict (i.e. impact) with free-ranging vertebrates?

Secondary questions
1. Over what period of time are animal conditioning techniques generally effective in reducing human-wildlife conflict?
2. Are animal conditioning techniques more or less effective in reducing specific categories of human-wildlife conflict, such as crop raiding versus egg predation versus livestock predation?

Components of the primary question
The primary question can be broken down into the following PICO components:

Population (P)
All free-ranging vertebrate species involved in human-wildlife conflict (i.e. human-wildlife impact) as indicated by the respective study. Subjects should be free-ranging during the quantification of the outcome, but not necessarily during the intervention.

Intervention (I)
Non-lethal or lethal techniques that have conditioning of animals as a goal (e.g. aversive or appetitive conditioning) or have conditioning of (non-target) animals as a potential consequence (e.g. disruptive stimuli, such as deterrents and repellents, or hunting of conspecifics). Broadly, deterrents serve to 'hinder', while repellents serve to 'avert', at the moment of intervention. However, disruptive stimuli lie on a continuum and all of these stimuli may (unintentionally) lead to learned aversions. Therefore, we will include all applications of the above-mentioned stimuli, provided that the authors quantified a potential change in behaviour after the intervention.

Comparator (C)
No intervention (as described above) in time, space or both. Alternative interventions (e.g. killing, translocation and fencing) in time, space or both.

Outcome (O)
Human-wildlife incidents (e.g. undesired close encounters, attacks and kills), livestock or fisheries predation, depredation of eggs or species (plants or animals) with a high conservation value, damage to anthropogenic goods or food resources (e.g. crop raiding, beehive destruction, tree destruction and car break-ins) and visitations to specific (human-populated) areas.

Searching for articles
Search string
A list of relevant search terms and initial HWC research and review articles was compiled by the Advisory Team. Subsequently, we used these and 'snowballed' articles to generate word frequency lists and to complement the initial search term list with frequently used HWC terms. Next, we refined the search string via test searches in Web of Science, removing search terms that appeared to be too general. We formatted the search string for Web of Science in Boolean style and structured it using derivatives of two of the four PICO elements: Intervention (e.g. Condition* = conditioned, conditioning etc.) and Outcome (e.g. Depredat* = depredation, depredated etc.). Because we are interested in a very broad group of species (i.e. all vertebrates), we did not include a Population term. The search terms are combined using the Boolean operators "OR" and "AND" (Table 2). The asterisk (*) represents any number of additional characters, including none, and the dollar sign ($) represents at most one additional character. Quotation marks ("") are used to search for exact phrases (including hyphenated variations). Terms combined using 'NEAR/5' are retrieved when they occur within five words of each other. We will develop simplified search strings for databases and search engines that do not accept the elaborate search string proposed in Table 2. All adjustments and variations of the search string, together with the corresponding database and/or search engine name, will be recorded. For database, search engine and website searches, we will only use English search strings. Publications in other languages that include a relevant English abstract will be recorded separately. We will compile a database including the references of all returned publications.
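To make the truncation rules above concrete, the following Python sketch translates a truncated term into a regular expression and lists which candidate words it would retrieve. It is a simplified model of the wildcard behaviour described in the text ('*' for any number of characters including none, '$' for zero or one character), assumed for illustration; it is not the Web of Science matching engine, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a truncated search term into a case-insensitive regex.

    Simplified model (assumption, not the Web of Science engine):
    '*' matches any number of word characters, including none;
    '$' matches zero or one word character.
    """
    parts = []
    for char in term:
        if char == "*":
            parts.append(r"\w*")
        elif char == "$":
            parts.append(r"\w?")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.IGNORECASE)


def matching_words(term: str, words: list[str]) -> list[str]:
    """Return the words that the truncated term would retrieve (whole-word match)."""
    pattern = wildcard_to_regex(term)
    return [word for word in words if pattern.fullmatch(word)]


print(matching_words("Condition*", ["condition", "conditioned", "conditioning", "depredation"]))
print(matching_words("Flare$", ["flare", "flares", "flared", "flaring"]))
```

Under this model 'Flare$' retrieves "flare", "flares" and also "flared", because '$' admits any single extra character, not only a plural "s"; checking such side effects is one reason test searches were used to refine the string.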
We will evaluate search comprehensiveness with an a priori list of 23 benchmark articles, of which 20 are available in Web of Science (Additional file 2). The list was compiled via stakeholder suggestions, pilot searches on Google Scholar and snowballing of HWC review paper reference lists. The final percentage of benchmark articles retrieved via our search strategy will be reported.

Table 2 Composition of the initial Boolean-style full search string for Web of Science (WoS)
Search string (I): TI = ("Aversive conditioning" OR "Fear conditioning" OR "Appetitive conditioning") OR TS = (("Associative learning" OR "Avoidance learning" OR Banger$ OR (Bear NEAR/3 spray) OR "Capsicum spray" OR Clicker OR Collar* OR Conditioning OR Conditioned OR CTA OR Diversionary OR Flare$ OR Hazing OR "Illness inducing" OR "Negative punishment" OR "Negative reward" OR "Non-lethal management" OR "Non-lethal control" OR Pinger$ OR "Positive punishment" OR "Positive reward" OR Reinforcement OR "Response learning" OR "Rubber bullets" OR Slingshot$ OR "Taste aversion" OR Train* OR Vexing) AND Impact))) AND SU = ("Life Sciences Biomedicine" OR "Zoology")
This search string led to 14,016 initial hits (January 2019), including 20/20 of the "benchmark" articles available in WoS. TI: title; TS: topic; SU: research area
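The benchmark check amounts to a recall calculation: the share of benchmark articles that appear among the retrieved records, per platform and for all platforms combined. A minimal sketch, assuming each record is identified by a hypothetical key such as a DOI; for Web of Science the benchmark set passed in would be the 20 WoS-indexed articles, and for all platforms combined the full list of 23.

```python
def benchmark_recall(benchmarks: set[str], retrieved: set[str]) -> tuple[int, int, float]:
    """Return (found, total, percentage) of benchmark articles among retrieved records."""
    found = len(benchmarks & retrieved)
    total = len(benchmarks)
    percentage = 100.0 * found / total if total else 0.0
    return found, total, percentage


def recall_report(benchmarks: set[str],
                  hits_by_platform: dict[str, set[str]]) -> dict[str, tuple[int, int, float]]:
    """Benchmark recall per search platform, plus recall for all platforms combined."""
    report = {}
    combined: set[str] = set()
    for platform, retrieved in hits_by_platform.items():
        report[platform] = benchmark_recall(benchmarks, retrieved)
        combined |= retrieved  # union of everything retrieved so far
    report["All platforms combined"] = benchmark_recall(benchmarks, combined)
    return report


# Hypothetical identifiers, for illustration only.
example = recall_report({"b1", "b2", "b3", "b4"},
                        {"Web of Science": {"b1", "b2", "x9"}, "Google Scholar": {"b3", "x1"}})
for platform, (found, total, pct) in example.items():
    print(f"{platform}: {found}/{total} ({pct:.0f}%)")
```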
With our search strategy we aim to retrieve studies published as primary literature in scientific journals, as well as those published as grey literature (e.g. Ph.D. theses, NGO reports). We do this to be as inclusive as possible and to reduce the influence of the publication bias often associated with journal publications, i.e. an overrepresentation of articles reporting significant effects of conflict interventions [11]. The quality of the studies will be evaluated during the validity assessment phase and will not be based on the venue of publication (e.g. high-impact journals). If the time span between the initial search and the target date for final submission of the systematic map exceeds 2 years, we will conduct update searches to check for newly published studies. After final publication, we intend to update the map approximately every 5 to 10 years.

Bibliographic databases
We will search the following online bibliographic databases, using the institutional access provided by the host institutes of the Advisory Team. We will search "All Databases"; however, where possible, searches will exclude articles from clearly irrelevant research fields, such as Physical Sciences and Arts, for example by adding SU = ("Life Sciences Biomedicine" OR "Zoology") in Web of Science (see Table 2). Such specifications will be documented.

Search engines
We will use Google Scholar to search the internet for relevant articles. Google Scholar searches are limited to one 'phrase' (enclosed in double quotation marks), one 'OR substring' and 256 characters. Our search string will therefore be adjusted accordingly by splitting it into multiple search strings. All these strings and their numbers of hits will be recorded. We will examine the first 50 hits per search string, sorted by relevance, and will list any additional relevant specialist websites identified in this way. Searches will be run with cookies and browser history cleared, using private ('incognito') browsing in Google Chrome.
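Taking the Google Scholar limits stated above as given, splitting the search string can be done by greedily packing single-word terms into OR-chains that each stay within the character limit. The Python sketch below assumes one exact phrase per query plus one OR-chain of single words (a quoted term inside the chain would count as a second phrase); the function name, the example phrase and most example terms are hypothetical and not taken from Table 2.

```python
def build_scholar_queries(phrase: str, terms: list[str], max_chars: int = 256) -> list[str]:
    """Pack single-word terms into as few queries as possible.

    Each query holds one exact phrase plus one OR-chain of terms and stays
    within max_chars characters. Greedy packing; term order is preserved.
    """
    prefix = f'"{phrase}" '
    queries: list[str] = []
    group: list[str] = []
    for term in terms:
        if len(prefix + term) > max_chars:
            raise ValueError(f"term does not fit in a single query: {term!r}")
        candidate = prefix + " OR ".join(group + [term])
        if len(candidate) > max_chars:
            # The current OR-chain is full: close it and start a new one.
            queries.append(prefix + " OR ".join(group))
            group = [term]
        else:
            group.append(term)
    if group:
        queries.append(prefix + " OR ".join(group))
    return queries


# A deliberately small limit so the split is visible.
for query in build_scholar_queries("human-wildlife conflict",
                                   ["hazing", "aversion", "repellent", "deterrent"], max_chars=60):
    print(len(query), query)
```

Greedy packing keeps the number of queries low, which matters because the first 50 hits of every string are screened; each resulting string and its hit count would then be logged as described above.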

Specialist websites and databases
The Advisory Team compiled a list of specialist websites and databases (Additional file 3). We will screen these websites intensively and specialists will be contacted if there is evidence for (unpublished) HWC studies that might involve conditioning techniques or outcomes. This list is not final as we might encounter additional relevant websites throughout the search process.

Other literature sources
We will consult stakeholders within the network of our Advisory Team for relevant published and unpublished material. An open request for additional highly relevant material, including publications in other languages, will be made on ResearchGate, LinkedIn and Twitter. If relevant non-English papers are identified, an additional (open) request will be made for a researcher speaking the language to enter the associated metadata in English. We will scan (i.e. 'snowball') the reference lists of literature included at the final full-text stage for relevant missed articles and, if possible, retrieve such articles.

Search record log
We will document any adjustments of the proposed search string in Table 2 and for each search we will record the total number of hits per unique platform/literature source, together with the date of the search. The percentage of benchmark articles returned will be recorded for Web of Science and for all platforms combined. We will report additional relevant (unpublished) material put forward by stakeholders and specialists and additional publications identified by scanning the reference lists of included articles.

Reference management and literature reference archive
We will export references of articles per search platform to separate Zotero databases (Roy Rosenzweig Center for History and New Media, Fairfax, USA). Subsequently, when searches for all platforms are complete, we will export the Zotero references as one RIS database per search string and platform to CADIMA version 1.7.6 (Julius Kühn-Institut, Quedlinburg, Germany), an open-access evidence synthesis tool and database [63]. We will use CADIMA to identify and remove duplicates.
The resulting database will be the reference database (i.e. reference archive) for this systematic map and any subsequent systematic reviews following this map. Next, we will use CADIMA to screen titles and abstracts for relevance. Any missing full texts of articles included after abstract screening will be actively searched for and, if possible, retrieved using the institutional access of the Advisory Team and expert stakeholders, or by contacting the first and last authors (for publications less than 10 years old).
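The protocol delegates duplicate detection to CADIMA. The Python sketch below is not CADIMA's algorithm; it is a hypothetical pre-check, assuming each exported record is a dict with 'title' and 'year' keys, that treats two records as duplicates when their normalised titles and publication years match.

```python
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    alphanumeric = re.sub(r"[^a-z0-9]+", " ", no_accents.lower())
    return " ".join(alphanumeric.split())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each (normalised title, year) key."""
    seen: set[tuple[str, object]] = set()
    unique = []
    for record in records:
        key = (normalise_title(record["title"]), record.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records exported from two platforms.
records = [
    {"title": "Aversive conditioning of black bears", "year": 2005, "source": "WoS"},
    {"title": "Aversive Conditioning of Black Bears.", "year": 2005, "source": "Scholar"},
    {"title": "Aversive conditioning of black bears", "year": 2010, "source": "WoS"},
]
print([(r["source"], r["year"]) for r in deduplicate(records)])
```

Exact-key matching misses near-duplicates with typos or truncated titles, so any such pre-check would complement, not replace, the duplicate screening in CADIMA.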

Article screening and study eligibility criteria
Screening process
We will first screen the retrieved literature on the basis of title, then abstract and finally full text. Consistency of screening will be checked within CADIMA before the official screening. Two reviewers will evaluate a random subset of 10% of the articles at the (1) title, (2) abstract and (3) full-text stage (a maximum of 100 articles at the title and abstract stages and 50 at the full-text stage). We will analyse consistency of article inclusion using the kappa statistic; consistency will be deemed acceptable at a kappa of 0.6 or higher. We will discuss discrepancies irrespective of the score, and we will repeat the check with adjusted criteria definitions if the score falls below 0.6. Once the score is 0.6 or higher, the primary reviewer will continue screening. We will perform this process at the title, abstract and full-text stages. Inclusion will be conservative, meaning that when in doubt, we will include an article for review in the next stage. Articles with relevant titles but no abstract will automatically be transferred to the full-text screening stage. We will restrict inclusion decisions to reviewers who have not (co)authored any articles to be considered within the review.
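The consistency check above can be sketched in Python: draw the random 10% subset (capped at 100 or 50 articles), then compare the two reviewers' include/exclude decisions. The protocol only specifies a "Kappa score"; this sketch assumes Cohen's kappa for two reviewers and rounds the 10% up to a whole article. Function names are hypothetical, and CADIMA's own calculation may differ in detail.

```python
import random
from typing import Hashable, Optional, Sequence


def consistency_sample(article_ids: Sequence[str], percent: int = 10, cap: int = 100,
                       seed: Optional[int] = None) -> list[str]:
    """Random subset of `percent`% of the articles (rounded up), at most `cap` articles."""
    size = min((len(article_ids) * percent + 99) // 100, cap)  # integer ceiling
    return random.Random(seed).sample(list(article_ids), size)


def cohens_kappa(reviewer_a: Sequence[Hashable], reviewer_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two reviewers' decisions on the same articles."""
    if len(reviewer_a) != len(reviewer_b) or not reviewer_a:
        raise ValueError("need two non-empty decision lists of equal length")
    n = len(reviewer_a)
    observed = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n
    labels = set(reviewer_a) | set(reviewer_b)
    # Agreement expected by chance from each reviewer's marginal proportions.
    expected = sum((reviewer_a.count(label) / n) * (reviewer_b.count(label) / n)
                   for label in labels)
    if expected == 1.0:
        return 1.0  # both reviewers gave one and the same label to every article
    return (observed - expected) / (1.0 - expected)


def screening_consistent(reviewer_a, reviewer_b, threshold: float = 0.6) -> tuple[float, bool]:
    """Kappa and whether it meets the acceptance threshold."""
    kappa = cohens_kappa(reviewer_a, reviewer_b)
    return kappa, kappa >= threshold


# Hypothetical title-stage decisions for ten sampled articles.
a = ["in", "in", "out", "out", "in", "out", "out", "out", "in", "out"]
b = ["in", "out", "out", "out", "in", "out", "out", "in", "in", "out"]
kappa, acceptable = screening_consistent(a, b)
print(round(kappa, 3), acceptable)
```

In this toy example the reviewers agree on 8 of 10 articles, but because chance agreement is 0.52 the kappa is only about 0.58, below 0.6, so the criteria definitions would be discussed and the check repeated.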

Eligibility criteria
Eligible subjects: All vertebrate species (excluding humans) involved in HWC (see "Background" for working definition). Animals should be free-ranging at the time of the outcome measurement (but not necessarily during the intervention). This includes translocated or reintroduced animals that are known to have a high probability of becoming involved in HWC.
Eligible intervention: All methods that can result in conditioning of the animal, even if they were not intentionally designed for that purpose. For example, a repellent such as bear spray is designed for immediate aversion of conflict, but could as a consequence reduce the bear's overall tendency to approach humans.
Eligible comparator(s): The study should include a control, comprising before versus after treatment, treatment versus no intervention, or treatment versus a different intervention. Effectiveness of the conditioning intervention should be evaluated using behavioural data collected after the intervention (in the absence of the unconditioned stimuli). Otherwise, changes in behaviour cannot conclusively be attributed to animal conditioning or learning (i.e. the formation of a new association between the existing resource or behaviour and a reward or punishment).
Eligible outcomes: The animals should be free-ranging at the time of the outcome measurement. We will include precursor behaviours, i.e. behaviours that are essential for the unwanted behaviour to arise (e.g. approach before attack and attack before kill). Both individual-based and population-based outcome measures will be eligible for inclusion, but limitations of the latter will be part of the descriptive validity assessment (see "Study validity assessment" for details).
Eligible types of study design: Articles that include quantitative data on effectiveness will be eligible for inclusion, with the exception of meta-analyses. We will exclude meta-analyses, as well as review, opinion, comment and discussion papers, but will save and list them separately and scan their reference lists and supplemental materials for potentially missed primary studies. A study should at the very least have a before-after (BA) or control-impact (CI) design. We will include articles irrespective of study sample size and unit of analysis (i.e. individual or population), but we will document this information, together with the presence or absence of randomization, the length of the study and the study design, in the metadata file and use it for the descriptive validity assessment. We will not apply inclusion restrictions based on geography.
Eligible language and dates: We will only evaluate studies in English, unless highly relevant publications in other languages are proposed by experts/stakeholders. When such publications can be reliably translated we will include them as well. No date restrictions will be applied.
All inclusion/exclusion decisions in the full-text stage will be documented and made publicly available together with the literature reference archive and search records. When the same study is published twice, for example via a thesis and via a publication, we will include the most recent publication.
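The study-design criterion above reduces to a simple decision rule. This Python sketch assumes each candidate study has been coded with hypothetical yes/no fields; a study with both a before-after and a control-impact component is labelled BACI, the label later used in the descriptive validity assessment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyDesignInfo:
    """Hypothetical coding of one study's design, for illustration only."""
    quantitative_effectiveness_data: bool
    before_after: bool    # behaviour measured before and after the intervention
    control_impact: bool  # comparison with no intervention or a different intervention
    meta_analysis: bool = False


def classify_design(study: StudyDesignInfo) -> Optional[str]:
    """Label the design as BACI, BA or CI, or None if it has no comparator."""
    if study.before_after and study.control_impact:
        return "BACI"
    if study.before_after:
        return "BA"
    if study.control_impact:
        return "CI"
    return None


def design_eligible(study: StudyDesignInfo) -> bool:
    """Quantitative data, not a meta-analysis, and at least a BA or CI design."""
    if study.meta_analysis or not study.quantitative_effectiveness_data:
        return False
    return classify_design(study) is not None


print(classify_design(StudyDesignInfo(True, before_after=True, control_impact=False)))
print(design_eligible(StudyDesignInfo(True, before_after=False, control_impact=False)))
```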

Study validity assessment
We will collect metadata of individual studies (Additional file 4) for use in validity-based eligibility decisions of subsequent systematic reviews. These metadata will include sample size, use of individual- or population-based outcome measurements, presence of randomization and study design. We will check consistency of the validity-related metadata extraction in CADIMA with two reviewers extracting such metadata from 10% of the studies (maximum 50 studies). For the purpose of this map, we will only assess the validity of the evidence base at a basic descriptive level, meaning that we will quantitatively describe the presence or absence of study components known to affect validity, but will not use this information in map-related eligibility decisions. For example, we will create bar graphs to visualize the number of studies per research design (e.g. BA, CI, BACI) and frequency histograms to visualize variation in sample size among studies. If the data permit, we will subdivide these data by species family, type of unwanted 'problem' behaviour, conditioning technique and/or stimulus type. Additionally, we will pay special attention to the correspondence between the reported unwanted behaviour and the outcome measurement. For example, when the primary problem is an animal killing livestock, the quantified outcome should ideally be closely related to attack or kill behaviour, and not merely to eating behaviour: an animal that has been conditioned to stop consuming a dead sheep will not necessarily refrain from attacking and killing a live sheep. We will therefore discuss and graphically represent how many of the included studies show a potential mismatch between the described unwanted behaviour and the quantified outcome behaviour.
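The descriptive validity summaries above are frequency tallies. A minimal Python sketch of the counts behind the bar graph (studies per design) and the sample-size histogram follows; the bin edges and function names are hypothetical choices for illustration, and plotting is left out.

```python
from bisect import bisect_right
from collections import Counter


def design_counts(designs: list[str]) -> dict[str, int]:
    """Number of studies per research design, e.g. for a bar graph."""
    return dict(Counter(designs))


def sample_size_histogram(sizes: list[int], edges: list[int]) -> dict[str, int]:
    """Count studies per sample-size bin.

    Bins are [edges[i], edges[i+1]) with increasing integer edges,
    plus an open-ended last bin [edges[-1], infinity).
    """
    labels = [f"{low}-{high - 1}" for low, high in zip(edges, edges[1:])] + [f"{edges[-1]}+"]
    counts = dict.fromkeys(labels, 0)
    for size in sizes:
        index = bisect_right(edges, size) - 1  # index of the bin whose lower edge <= size
        if index < 0:
            raise ValueError(f"sample size {size} is below the first bin edge {edges[0]}")
        counts[labels[index]] += 1
    return counts


# Hypothetical coded studies.
print(design_counts(["BA", "CI", "BA", "BACI"]))
print(sample_size_histogram([3, 5, 12, 1, 40, 8], edges=[1, 5, 10, 20]))
```

The same tallies can be repeated within subgroups (species family, problem behaviour, technique or stimulus type) when the data permit.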

Data coding strategy
We will collect metadata on a variety of study aspects, including bibliographic information, study year and location characteristics, population characteristics, 'unwanted behaviour' characteristics, intervention and outcome characteristics, study design and comparator information, and any additional remarks. For example, in the category 'intervention and outcome characteristics', we will collect available data on the intensity, modality and frequency of exposure to the unconditioned stimulus, which are predicted to influence the effectiveness of conditioning interventions [41,64]. See Additional file 4 for a complete overview. To evaluate the consistency of data extraction, a second reviewer will also fill in the datasheet for ten publications. Any discrepancies will be discussed before further extraction and, if necessary, variable definitions will be refined and/or codes adjusted. When relevant metadata in an article appear to be missing or unclear, cannot be retrieved from other sources (e.g. the IUCN Red List), and the reported study was conducted less than 10 years ago, we will try to contact the authors of the article to obtain the information. For articles reporting on studies older than 10 years, we will leave fields with missing metadata blank. In addition, authors will not be contacted about a given type of metadata if it is missing or unclear in more than 50% of the included articles (irrespective of study year).
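The conditions for contacting authors combine into a single decision rule, sketched below. The function name and arguments are hypothetical. "Less than 10 years ago" is interpreted here as an age strictly below 10 years, so a study exactly 10 years old is treated as older; this boundary is an assumption. The 50% threshold is applied to the share of included articles in which that type of metadata is missing or unclear.

```python
def should_contact_authors(
    *,
    retrievable_elsewhere: bool,
    study_year: int,
    current_year: int,
    share_missing: float,
) -> bool:
    """Decide whether to contact authors about one missing or unclear metadata item.

    share_missing is the fraction (0-1) of included articles in which this
    type of metadata is missing or unclear, irrespective of study year.
    """
    if retrievable_elsewhere:  # e.g. the value can be found in the IUCN Red List
        return False
    if current_year - study_year >= 10:  # older studies: leave the field blank
        return False
    if share_missing > 0.5:  # metadata type missing in most articles: no contact
        return False
    return True
```

Checking the cheapest exclusion (retrieval from another source) first mirrors the order in which a reviewer would normally try to fill a gap before writing to authors.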

Study mapping and presentation
We will produce a narrative synthesis of the included studies. In this synthesis, we will discuss the availability of evidence with respect to the main research question and the two subquestions, as well as to specific metadata variables (e.g. species, social system, intervention type, study design). Where useful, descriptive statistics will be provided, and one or more study-frequency heatmaps will be created to visualise potential evidence clusters and gaps in the evidence base. In the narrative synthesis, we aim to discuss whether the identified evidence clusters might be suitable for systematic review. Based on the included studies, we will also discuss potentially important effect modifiers to be included in a subsequent systematic review. We will pay special attention to factors that previous studies have suggested may affect the effectiveness of conditioning (see "Background"), such as the social system of the subject species; the specific combination of unconditioned stimulus type and conditioned stimulus or behaviour; the frequency and duration of stimulus pairing; and the order of, and time interval between, the conditioned stimulus or behaviour and the presentation of the unconditioned stimulus. Finally, we will discuss any identified evidence gaps and suggest potentially relevant avenues for future research on this topic. Care will be taken to avoid vote-counting and discussions of the overall effectiveness of conditioning interventions. Alongside the narrative synthesis, we will create an interactive geographic map of the results (i.e. an evidence atlas) showing the geographical spread of the evidence within the literature. We will also make available a Microsoft Excel database containing all extracted metadata (see Additional file 4).
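A study-frequency heatmap is, at its core, a cross-tabulation of study counts over two metadata variables. The sketch below builds that count matrix with the Python standard library only, assuming for simplicity that each study carries a single value per variable (studies covering several species or interventions would need to be expanded first). The function names and dictionary keys are hypothetical. The resulting matrix could then be passed to a plotting function such as matplotlib's `imshow` to draw the heatmap.

```python
from collections import Counter


def frequency_table(studies, row_key, col_key):
    """Count studies for every combination of two metadata variables.

    Returns sorted row labels, sorted column labels and a matrix of counts
    (rows x columns). Assumes each study dict holds a single value per key.
    """
    counts = Counter((s[row_key], s[col_key]) for s in studies)
    rows = sorted({s[row_key] for s in studies})
    cols = sorted({s[col_key] for s in studies})
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix


def print_table(rows, cols, matrix):
    """Print the count matrix as plain text; zero cells flag potential gaps."""
    width = max((len(str(label)) for label in rows + cols), default=0) + 2
    print("".ljust(width) + "".join(str(c).ljust(width) for c in cols))
    for r, line in zip(rows, matrix):
        print(str(r).ljust(width) + "".join(str(n).ljust(width) for n in line))
```

In such a table, cells with many studies point to candidate evidence clusters for systematic review, while empty cells point to possible evidence gaps.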
In addition, we will present a flow diagram of the mapping process and publish all data related to the search strategy, consistency checking and other intermediate steps of the mapping process (as made available by CADIMA) together with the narrative synthesis.