Review descriptive statistics
From searches conducted between June and September 2021, a total of 8655 articles were returned from the queried databases and search engines (Fig. 1). After deduplication, 5950 articles remained, whose titles, abstracts and metadata were checked for inclusion. This step was done by two authors using the Rayyan web tool. Among the 416 articles screened by two authors (7.0%), there were only 24 conflicting inclusion or exclusion decisions. Additional file 2 provides a table of all references with their reason for exclusion. The main reasons for exclusion were a non-relevant topic (n = 1718), a non-relevant population (n = 976) and a language other than English, German, French, Spanish or Dutch (n = 840). The R software package ‘ROSES_flowchart 0.0.1’ was used to create the ROSES flow chart depicted in Fig. 1 [50].
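The proportions quoted for the screening step follow directly from the reported counts. The short sketch below recomputes them; the variable names are illustrative, and the raw agreement rate is a derived figure that is not stated in the text (it is simple percent agreement, not a chance-corrected statistic).

```python
# Counts as reported in the text above.
returned = 8655          # articles returned by all searches
after_dedup = 5950       # articles remaining after deduplication
double_screened = 416    # articles screened by two authors
conflicts = 24           # conflicting inclusion/exclusion decisions

# Derived quantities (not reported explicitly in the text).
duplicates_removed = returned - after_dedup
share_double_screened = double_screened / after_dedup
raw_agreement = 1 - conflicts / double_screened

print(f"duplicates removed: {duplicates_removed}")
print(f"double-screened share: {share_double_screened:.1%}")
print(f"raw agreement: {raw_agreement:.1%}")
```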
A subset of 591 articles was retained as eligible (see Additional file 5, which contains a table of all eligible articles and the database in which each was found). However, only 281 full texts were retrieved, of which 178 were discarded after reading the full text. The main reason for discarding was that the research outcomes fell outside those eligible for the review. The unretrievable articles are listed in Additional file 6.
In total, 104 articles were included in the analysis. Among these, 92.3% reported on more than one experiment (796 in total). With regard to the reported outcome, 16.5% of the articles reported more than one outcome measure per study; these were individually retained, giving 971 effect sizes in the final dataset (Additional file 4).
Despite efforts to include literature from Asia and Latin America, 84.6% of the results were confined to North America, Europe and Oceania (Fig. 2).
No study from the African continent or from the mountainous parts of south-east Asia was included. Relatively important producing countries like Argentina and Chile do not appear either. Studies from these countries were initially detected but did not fulfil the inclusion criteria in Table 2. Consequently, the most important producing regions in terms of harvest weight are severely underrepresented, and the dominance of studies from North America is not in accordance with the global distribution of annual total fruit production (Fig. 3).
The body of included literature spans the period from 1905 to 2021, with the oldest publications coming from North America and England (Fig. 2). During this period, publications dealing with apple appeared at a constant rate, while in recent years grapevine gained increased attention (Fig. 4b). Historical regionally destructive frost episodes, like those of 1991 and 2017 in Europe, did not show in the data. Studies of peach, cherry and apple were spread over countries and continents, while other fruits were confined to certain regions, e.g., pear to Western Europe and citrus to the United States. The United States is overrepresented in the dataset since nearly all included fruits are researched there.
Temperate fruits in this review also include citrus fruits [52], for which numerous studies on frost hardiness and leaf or stem survival (during the winter months) have been published. Since outcome measures in this review were restricted to effects on buds, flowers and yields in relation to spring frosts, the majority of these citrus fruit studies were excluded. Citrus fruit (lemon, orange, grapefruit and mandarin), as well as avocado, are therefore underrepresented in Fig. 4b and in Fig. 5a, b.
Interventions based on water, wind or heating installations were continuously studied through time (Fig. 4a). With the exception of the oldest included study, the interventions grouped as ‘foliar applications’ constitute a comparatively recent subject of research interest. Alternative approaches, including covering of the buds, rows or entire orchards, as well as cultivation practices (e.g., pruning or mowing), also received more attention in the last decades. No intervention type was studied for all included fruits (Fig. 5b). Foliar sprays were mostly studied in relation to apple, peach, cherry and pear. Wind machines were relatively often studied for vineyards.
Narrative synthesis including study validity assessment
In the 104 selected articles, 796 studies or experiments were identified, which yielded 971 data points on effect sizes (Fig. 6). The most common outcome was bud and flower damage reduction. Data extracted from each study, including metadata and individual study findings, along with other key information such as study location and reporting of effect modifiers, are accessible in Additional file 6.
Figure 7 shows the share of studies that report on selected details. Most studies reported on the cultivars (75%), but only 38.5% on the rootstock, despite the strong influence of the latter on frost resistance [25]. Only a few studies reported on the landform or terrain of the studied fields (14.4%), and 6.7% reported on notable surrounding land use, such as the presence of waterbodies and the ground cover between the rows. Pruning and training schemes are mentioned in as few as 14% of the studies, although they determine the amount of 1-year and multi-year wood and the resulting flowering times and exposure to frost. Likewise, only 13.5% report on the tree height, despite the potentially important vertical temperature gradient.
For the 104 articles, the details of the validity assessment are provided in Additional file 7. With the criteria for risks of bias as defined above, 73.1% of the 104 studies are considered to have a high risk of bias and only 12.5% were rated as having a low risk of bias, meaning that at most one criterion for a risk of bias was fulfilled. This is illustrated in Fig. 8, where the inner circles distinguish between study setups (Fig. 8a) and types of publication (Fig. 8b). The major share (78.8%) of the studies were field experiments, while the remainder comprised experiments in controlled environments like cold chambers, tunnels and greenhouses. Surprisingly, the share of low-bias studies was smaller in the controlled environments (9.1%) than in the field studies (15.9%), where environmental influences cannot be well controlled (see Additional file 8).
Experiments in conference contributions and articles published in practitioner (professional) journals were (nearly) entirely rated as having low validity (Fig. 8b).
The percentages of studies evaluated as having risks of bias are shown in Fig. 9 per bias type. The “Selection bias” as well as the criterion on “Comparable baselines” refer to biases that result from an unbalanced selection of samples. Most studies did not mention the randomisation of their samples. Studies conducted on selected tree branches in controlled environments should be more suitable for randomisation than those examining entire trees in fields. However, not a single study in controlled environments reported the exact way in which the randomisation was performed, as is the standard in other research fields [44].
In the case of interventions affecting larger spatial extents (e.g., wind machines) a strong spatial separation of the test populations is necessary, which may introduce baseline biases. Adjacent fields have been considered as comparable and risk of bias in this section was only assumed when it was explicitly stated that the control field was not adjacent to the field where the intervention occurred, or in case of other influencing factors like different cultivars.
The “Performance bias” may arise in the absence of blinding, through a potential (unconscious) tendency to record higher or lower scores depending on the desired research outcome. In field studies, blinding is practically very difficult, and only two such studies reported explicit blinding of the researchers [53, 54]. For example, night temperature data were analysed without knowing which data points were collected during wind machine operation.
Data synthesis
Effectiveness of intervention classes
Considering all data points, irrespective of the validity, the highest mean bud and flower damage reductions were observed for water-based interventions, followed by the group of cultivation practices (Fig. 10). The average improvement of flower/bud survival was 15.75%. The large group of ‘foliar applications’ appears to have little effect on average, although several experiments reported more than 30% higher survival. Since unsuccessful and even destructive treatments (excessive concentrations, extreme timings) were also included in this comparison, the range of outcomes is wide. Heating systems and the tested wind machines seem to have the lowest effectiveness, or may even have negative effects on flower and bud survival compared to control populations.
Considering only low risk of bias studies, the comparison does not cover all possible interventions and one single study on field covers emerges as highly effective, while the other techniques do not seem to be effective (Fig. 11).
Delaying of budding or flowering onset is meant to reduce the probability of frosts occurring during the sensitive stage of flowering. Interventions based on water and cultivation practice (mostly pruning techniques) and combined approaches led to increased delays (3–4 days more) compared to most studied foliar applications and installations of tunnels and nets. The mean delay over all techniques was 3.75 days. Here, examining results depending on the risk of bias does not change the conclusions by more than two days.
Only low validity studies reported that wind machines performed best at increasing the temperature in orchards and vineyards (Fig. 11). None of the high validity studies reported effective wind machines. The range of temperature increases was large, between 0 and 9 °C. Conventional vertical wind towers outperformed the new horizontal models. Sprinkler systems performed second best, and better than combinations of heaters and sprinklers or wind machines as well as heating systems on their own. Alternative systems of other categories failed to exceed a 2.5 °C increase, which may suffice only in case of light frosts. The average increase was only 2.1 °C.
For sprinkler systems the mean value was 0.5, which implies that yields with the interventions were 1.34 times the yields in the control population. As both high and low validity studies reported positive effects, this kind of intervention is worth further investigation.
A Kruskal–Wallis test confirmed significant differences between intervention classes for all the outcome categories (Additional file 1: Table S2) at a significance level of 0.05. Differences between specific groups are highlighted in the pairwise Wilcoxon test results per outcome category (Additional file 1: Table S3 to Additional file 1: Table S6). Significant differences were mostly found for bud and flower damage reductions and for flowering delays. For temperature increases, significant differences were found only between heating and water interventions; for yield ratios, only between foliar applications and water.
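The two-step procedure (an omnibus Kruskal–Wallis test followed by pairwise rank-sum comparisons) can be sketched as below. This is a minimal illustration under stated assumptions, not the analysis code of the review: the damage-reduction values are made up, ties are ranked by their average but no tie correction is applied, the pairwise p-values use a normal approximation without continuity or multiple-testing correction, and the closed-form p-value for the omnibus test holds only for three groups (two degrees of freedom).

```python
from itertools import combinations
from math import erfc, exp, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return erfc(abs(z) / sqrt(2))


# Hypothetical damage reductions (%) per intervention class; NOT review data.
classes = {
    "water": [30, 25, 40, 35, 28],
    "foliar": [5, 10, 2, 8, 12],
    "heating": [-5, 0, 3, -2, 1],
}

h = kruskal_wallis_h(list(classes.values()))
p_omnibus = exp(-h / 2)  # chi-square survival function, valid for df = 2 only

pairwise = {
    (a, b): rank_sum_p(classes[a], classes[b])
    for a, b in combinations(classes, 2)
}
```

With more than three classes, the omnibus p-value would need a general chi-square survival function, and the pairwise p-values would normally be adjusted for multiple comparisons.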
As pome, stone and citrus fruits and grapevine differ from both biological and managerial perspectives, we distinguish between these classes. A Kruskal–Wallis test confirmed significant differences between at least some fruit classes for all the outcome categories (Additional file 1: Table S7). While the average effects on bud and flower damage were relatively independent of the type of fruit (Fig. 10a, Additional file 1: Table S8), the potential of the interventions meant to delay flowering differed more strongly between stone fruits and grapevine (Additional file 1: Table S9). The increase in ambient temperature should not depend on the fruit type for biological reasons, but higher increases were measured in stone fruit orchards (significantly different only compared to pome fruit, Additional file 1: Table S10). Increases in yields were weaker for grapevine than for the other fruits (Additional file 1: Table S11).
Conditionality of effectiveness
We hypothesized that the effectiveness of a given measure is dependent on a range of environmental conditions. According to the protocol, we tested four models based on location (elevation, absolute latitude, and their interaction), a soil texture approximation and minimum temperature. The following models are restricted to ‘field experiments’ only and rely on externally retrieved data. The models differed in the defined random factor, which could be either (1) none, (2) the intervention type, (3) the fruit type, or (4) the phenological stage. Given the reporting quality of the collected data, no firm conclusions can be drawn from the exploratory regressions. The detailed model results are given in Additional file 1: Table S12 to Additional file 1: Table S15.
In apple orchards, daily temperature ranges were higher and minimum temperatures lower on sandy-loamy soils than on clayey soils [39]. The heat capacity and water retention potential of sandy soils differ from those of soils with a finer texture. However, depending on the employed model and outcome category, relations were both weakly positive and negative. Opposite effects between the impact on bud and flower damage reduction and the other outcomes were reported for the change in ambient temperature (Fig. 12a, d, g).
The latitude could not explain differences in the reported outcomes (Fig. 12b, e, h). Latitude and elevation were considered separately and in interaction, as low latitudes/high altitude locations can be exposed to similar thermal conditions as high latitude/low altitude locations. Effects of the latitude may be correlated with other factors, which are not further investigated, including the economic situation of the country in which a study was conducted, which might influence the means of conducting the study as well as the costs (and quality) of the equipment that was tested.
The severity of the frost (minimum temperature recorded during the experiment) was also expected to pose limits to certain installations more than others. Trend lines were of opposite direction, depending on the outcome measure (Fig. 12c, f, i). In nearly all tested models, the effect was statistically significant (p < 0.001). Based on the available data, which did not allow trends to be detected, the highest increases in temperature were documented for temperatures around −4 °C.
The difference between the models appeared to be influenced by the number of observations. The information on the development stage was only provided in half of the studies. For the interventions aiming at delaying the flowering, only two studies reported on the temperature and the development stage. With R² values of 0.176 (bud and flower damage reduction), 0.458 (temperature), 0.155 (yield ratio) and 0.193 (budding and flowering delay), the models have little explanatory power. This suggests that other factors, which could not be tested for, were dominant in determining the effectiveness.
The change in the reported effectiveness of the tested interventions over time is shown in Fig. 13. The reduction of damage to buds and flowers was reported to increase over time, and after 2005 no strongly negative outcomes were published anymore. With regard to the other outcomes, the trends in reported effectiveness seemed to be slightly negative. A possible explanation is the growing concern for the resource efficiency of the interventions, which favours techniques like low-volume micro-sprinklers or horizontal wind machines that do not necessarily deliver the same level of protection as more resource-consuming techniques [17].
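For readers less familiar with the metric, R² is the share of the outcome variance that a fitted model reproduces. The sketch below computes it for a single-predictor least-squares line using made-up numbers; the review's own models include several predictors and optional random factors, so this illustrates the metric only, and the function and variable names are assumptions.

```python
def r_squared(x, y):
    """R-squared of an ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up predictor and outcome values; NOT review data.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
fit_quality = r_squared(x, y)
```

A value near 1 means the line reproduces almost all of the variation in the outcome, whereas values such as those reported above leave most of the variation unexplained.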
Sensitivity
The sensitivity of the outcome was assessed with regard to the validity rating of each data point (Fig. 11). It must be noted that, in most cases, there were higher effect sizes in studies with lower validity (Fig. 11). The effect sizes reported by high validity studies are near zero with few exceptions. The other intervention/outcome combinations were not examined in both high and low risk of bias studies. Due to the lack of data and the large dominance of studies with low validity rating, no extensive analysis was possible.
Review limitations
Limitations of review methodology
Within the available resources and given the limited comparability, the review was restricted to studies on interventions that can be measured in terms of yields, temperature increase, budding and flowering delay as well as ambient temperature. It thereby excludes other important practices of (passive) frost protection, including genetic selection or modification, improved rootstocks, increases of frost resistance and antibacterial treatments. The innovativeness of research on spring frost risk management strategies cannot be judged based on the collected evidence base.
Searches were conducted mostly in English and additionally in four other European languages, but no publications in Asian languages (e.g., Chinese) were included. Given the importance of both local fruit production and applied agricultural research in China, a substantial part of the evidence base may have been omitted from the review.
Limitations of statistical methods
As only nine studies reported standard deviations or standard errors as a measure of precision, no meta-analysis including a quantification of the overall precision or of the heterogeneity of the studies was possible. The definition of sample sizes varied enormously between studies, hindering the estimation of publication biases [55]. Often, the results were closer to anecdotal evidence, with low numbers of repetitions. The statistical metrics are therefore mostly of a descriptive nature, and the interpretation of the (mixed) linear regression models to answer the question of conditions of effectiveness remains indicative.
Furthermore, due to the range of effect modifiers and the relatively low number of studies reporting on each category, regression analyses were limited to variables from external data sources, e.g., soil texture, or from metadata of the study, e.g., the latitude.
Limitations of the evidence base
A comparatively low share of full texts could be retrieved, relative to the number of articles retained on the basis of title and abstract. The median year of publication of included studies is 1987. A substantial part of the articles listed in the specialized databases Agricola, FAO Agris and Groene Kennis were not available in digital form, and two research centres that were contacted had disposed of printed copies dating from before 1980. As most of these articles were published in American, German or English practitioner journals, the general conclusions in terms of spatial research gaps are likely to remain valid and the included literature is estimated to be representative, based on a comparative analysis of the abstracts of the missing literature.
Numerous studies from Asian universities were identified in the queried databases, but the majority were published in Chinese and a good share focussed on topics outside the scope of the review, e.g., crop breeding. This results in a geographic bias. In addition, a focus on crops with a high value (grapes, peaches) or high production volumes (apple) became apparent, rather than on commercially less interesting fruits, such as plum.
Certain studies investigated patented products, like the Frostbuster or Frostguard. A minority of articles, especially from earlier years, disclosed their funding source and objectivity. On the other hand, few studies (17.3%) reported on the costs of the interventions, or on other variables which would allow deriving costs. Specialised studies and reports (i.e., [56,57,58,59,60,61,62]) cover costs but lack details on effectiveness, resulting in their exclusion from this review. There are also important gaps in the information provided on side effects of the employed techniques, like phytotoxicity [63], waterlogging, noise or reduced fruit quality. Information on these attributes was not extracted systematically in this review due to restricted resources.