This systematic review aimed at analysing the reliability and validity of field-based tests for assessing physical fitness in gymnasts.
Method
Three electronic databases (PubMed, SPORTDiscus, and Scopus) were searched up to March 2022 to identify studies that assessed the psychometric properties of field-based physical fitness tests among gymnasts.
Results
A total of 16 studies on several gymnastics modalities (artistic n = 11; rhythmic n = 3; artistic and rhythmic n = 1; aerobic n = 1) were analyzed. All studies reported on reliability measured through a test-retest design. Validity was reported in only four studies. Regarding specific tests, the split test (ICC = 0.998) and the handstand test (ICC = 1.00) showed the highest test-retest reliability. The highest validity values among specific tests were achieved by the split test (r² = 0.52), the hanging pikes test (r² = 0.86), and the handstand test (r² = 0.65).
Conclusion
A great variety of both specific and non-specific physical fitness tests have been analyzed in the field of gymnastics. The side split test, the handstand test, the vertical jump test, the 20-m run test, the agility test, and the aerobic gymnast anaerobic test could be useful tools to assess flexibility, strength, balance, muscular power, speed, agility, and cardiorespiratory fitness in gymnasts. Further investigations analyzing absolute reliability and criterion validity are needed.
It is estimated that about 50 million people of all ages worldwide regularly perform gymnastics in a club setting.1 The International Gymnastics Federation (FIG, http://www.fig.gymnastics.com) recognizes a total of eight disciplines, three of which (artistic, rhythmic and trampoline) are Olympic.1
Physical fitness (PF) is strongly involved in gymnastics, since its practice requires a combination of speed, strength, endurance, agility, flexibility, balance and power.2 Assessing PF in gymnastics is important because it helps coaches and trainers not only to monitor the development of their athletes, but also to promote healthy, injury-free participation and to identify talent.2–4 Consequently, coaches and trainers need meaningful, reliable, and sensitive gymnastics-specific fitness tests.
Laboratory tests represent the gold standard for assessing PF; however, these tests are expensive and require highly trained experimenters, which compromises their feasibility and applicability in the gymnastics context. Considering these circumstances, the use of field-based PF tests is recommended, since they are easy to administer, involve minimal equipment and cost, and allow a larger number of participants to be evaluated in a relatively short period of time.5 However, the quality and weight of the information obtained from field-based tests depend on the quality of their psychometric properties, especially reliability (consistency or repeatability of measurements) and validity (the capacity of the test to reflect what it has been designed to measure), which should be established in advance.6
Information regarding the accuracy of field-based tests for assessing athletes’ PF is usually provided by systematic reviews that summarize and critically analyze their psychometric properties, which allows coaches and trainers to identify the most accurate tests to administer to their athletes. However, these reviews usually focus on the most popular and widely practiced sports, such as soccer7 or basketball,8 while, to the best of the authors’ knowledge, scant research of this kind has been carried out in less popular sport modalities, such as gymnastics. In light of this, the objective of this study is to carry out a comprehensive review of the scientific evidence on the reliability and validity of field-based tests for assessing PF in gymnasts.
Methods
A systematic review of the reliability and/or validity of field-based tests used to assess fitness level in gymnasts was carried out. This systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.9
Search strategy
Three electronic databases (PubMed, SPORTDiscus, and Scopus) were searched from inception to March 2022. The literature search was conducted by one researcher. The following keywords, Boolean operators, and combinations were used: [“Gymnastics” OR “Rhythm gymnastics” OR “Artistics gymnastics”] AND [“Physical fitness” OR “Physical performance” OR “Strength” OR “Muscular strength” OR “Endurance” OR “Aerobic endurance” OR “Flexibility” OR “Anaerobic” OR “Aerobic endurance”] AND [“Evaluation” OR “Measurement”]. To be included in the review, studies were required to meet the following criteria: (i) provided information about the reliability and/or validity of at least one field-based PF test in gymnastics, (ii) published in English, Spanish or Portuguese, and (iii) published in a peer-reviewed journal. Investigations that reported data on the psychometric properties of field-based PF tests without describing the methodological approach used for identifying reliability or validity were excluded.
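As an illustration of how the three keyword groups combine, the following minimal Python sketch assembles the Boolean string reported above. The helper names (`or_group`, `build_query`) and the parenthesized output format are assumptions made for this example only; each database has its own field tags and query syntax, which are not reproduced here. The repeated term “Aerobic endurance” is collapsed to a single occurrence, which does not change the logic of the search.

```python
# Hypothetical helpers: not part of the review's actual search tooling.

def or_group(terms):
    """Join quoted terms with OR, dropping repeated terms but keeping order."""
    unique_terms = list(dict.fromkeys(terms))
    return "(" + " OR ".join(f'"{t}"' for t in unique_terms) + ")"


def build_query(groups):
    """Combine several OR-groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


# Keyword groups copied from the search strategy as reported.
POPULATION = ["Gymnastics", "Rhythm gymnastics", "Artistics gymnastics"]
FITNESS = [
    "Physical fitness", "Physical performance", "Strength",
    "Muscular strength", "Endurance", "Aerobic endurance",
    "Flexibility", "Anaerobic", "Aerobic endurance",
]
ASSESSMENT = ["Evaluation", "Measurement"]

query = build_query([POPULATION, FITNESS, ASSESSMENT])
print(query)
```

Keeping the groups as separate lists makes it easier to adapt the same search to the syntax of each database while leaving the terms themselves unchanged.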
Study selection
One author screened the titles and abstracts identified during the search. When the information provided suggested that the study met the selection criteria, a full-text copy was examined. Doubts about inclusion were discussed with a third author until a consensus was reached.
Data extraction
All included studies were reviewed by one author. Information on participants’ characteristics (n, age and gymnastics specialty), the gymnastics fitness test/s performed, and the values related to their reliability and/or validity (the method used to identify them, the type of statistical analysis and its coefficients) was extracted. Two expert authors classified the tests as specific or non-specific. Discrepancies were resolved by a second author. The reference lists of all selected studies were screened in search of additional evidence.
Results
A total of 677 studies were identified through the different search strategies. After removing duplicates and studies not related to the main aim, 56 studies remained. After reading the full texts, a total of 16 studies on the psychometric properties of field-based physical fitness tests in gymnasts were selected for further analysis (Table 1).
Studies included in the final selection.
CMJ: Countermovement jump; CMJA: Countermovement jump with arm swing; DJ: Drop jump; ICC: Intraclass correlation coefficient; KTK: Körperkoordinationstest für Kinder; NR: Not reported; SJ: Squat jump.
Regarding specialty, eleven studies involved artistic gymnastics,2,4,10–18 three involved rhythmic gymnastics,19–21 one involved aerobic gymnastics,22 and one included both rhythmic and artistic gymnastics.23
All studies reported on reliability measured through test-retest design. Validity was reported in only four studies.2,15,18,22
Relative reliability
A total of 15 investigations reported test-retest reliability,2,4,10–14,16–23 while only two studies reported inter-rater reliability.15,19
A total of 12 studies reported relative reliability for several physical fitness-related subtests.2,4,11–15,17–20,23 Four studies reported reliability data for a single physical fitness component: flexibility,10,16 cardiorespiratory fitness,22 and coordination.21 The time interval between test and retest ranged from the same day4,10,16,20 to ten days.19
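The included studies do not all state which form of the ICC they used. As a sketch only, the following Python function computes two common single-measurement forms from a two-way ANOVA decomposition of a test-retest table: a consistency form, often written ICC(3,1), and an absolute-agreement form, often written ICC(2,1). The function name and the scores are hypothetical and are not taken from any of the reviewed studies.

```python
def icc_single_measure(scores):
    """Return (consistency ICC, agreement ICC) for single measurements.

    scores: list of rows, one row per gymnast, one column per session.
    """
    n = len(scores)       # number of gymnasts
    k = len(scores[0])    # number of sessions (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return consistency, agreement


# Hypothetical side split scores (cm): every gymnast improves by 1 cm at retest.
example = [[10, 11], [12, 13], [15, 16], [20, 21]]
icc_c, icc_a = icc_single_measure(example)
print(round(icc_c, 3), round(icc_a, 3))
```

In this example the ranking of gymnasts is perfectly preserved, so the consistency form is essentially 1, while the agreement form is lower because it also penalizes the systematic shift between sessions. This is one reason why a very high ICC alone does not rule out learning effects between test and retest.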
Flexibility
Flexibility tests were the most frequent assessment, appearing in 10 studies.2,4,11,15,16,18–20,23,24 The side split test was the most reliable specific assessment (ICC = 0.998), while among non-specific tests, active shoulder flexibility was the most reliable (ICC = 0.996).4
Strength
A total of eight studies analyzed strength.2,4,12,15,17,18,20,23 The handstand test (ICC = 1.00)15 and the 4-m rope climb (ICC = 0.999)4 obtained the highest reliability values among the specific and the non-specific tests, respectively.
Muscular power
Eight investigations used muscular power assessments.2,4,11,13–15,18,20 The non-specific vertical jump test was the most reliable evaluation of muscular power (ICC = 0.999).4
Balance
A total of five studies used balance assessments.2,15,18,20,23 The handstand test also obtained the best value among the specific balance tests (ICC = 1.00),15 and among non-specific balance tests, the flamingo test achieved the highest result (ICC = 0.870).23
Speed
Four studies analyzed speed performance.2,4,15,23 The 20-m run obtained the best value among the non-specific speed assessments (ICC = 0.996).4
Agility
Three studies reported agility assessments.2,15,18 The non-specific agility test achieved the highest reliability value (ICC = 0.95).15,18
Cardiorespiratory fitness
Three studies analyzed cardiorespiratory fitness.20,22,23 The specific aerobic gymnast anaerobic test was the most reliable test (ICC = 0.97),22 and the 20-m shuttle run test obtained the highest value among the non-specific cardiorespiratory fitness evaluations (ICC = 0.91).20
Mobility
Mobility assessments were reported in two studies.20,23 The shoulder extension test obtained the best result among non-specific mobility tests (ICC = 0.97).
Coordination
Only one study reported a coordination assessment.21 In this study, the specific "throwing the ball and reversing forward" test obtained a Pearson's correlation coefficient of 0.799.
Absolute reliability
Standard error of measurement (SEM) and minimum detectable change (MDC) were calculated to assess absolute reliability in only four studies.13,15,17,22 One study indicated that SEM with 95% confidence intervals was 0.96 to 0.99 in the overall reliability analysis of the battery total score. Another study reported that SEM for all muscle groups varied from ±0.4 to ±1.0 kg at different joint angular positions in four muscle groups (shoulder flexors and extensors, hip flexors and extensors).17 MDC 95% resulting from the ICC was 0.12 s in the specific anaerobic field test for aerobic gymnastics.22 Another study reported that SEM (flight time) varied from 5.84 ms in the CMJ to 10.26 ms in the DJ40, while SEM (estimated mechanical power) varied between 1.84 W/kg (DJ60) and 2.25 W/kg (DJ40).13 MDC % (flight time) was also reported and ranged between 3.3 (CMJA) and 5.9 (DJ40), and MDC % (estimated mechanical power) ranged from 12.5 (DJ60) to 13.9 (DJ40).
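For readers less familiar with these indices, SEM and MDC are commonly derived from the between-subject standard deviation (SD) and the ICC as SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The Python sketch below applies these widely used formulas to hypothetical values. It is an assumption that these are the relevant formulas here: the included studies may have used other computations (for example, SEM taken from the ANOVA error term), and the function names and input values are illustrative only.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and the ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimum detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem


def mdc_percent(mdc, mean_score):
    """MDC expressed as a percentage of the group mean."""
    return 100.0 * mdc / mean_score


# Hypothetical jump-height data: SD = 2.0 cm, ICC = 0.91, group mean = 30 cm.
sem = sem_from_icc(sd=2.0, icc=0.91)
mdc = mdc95(sem)
print(round(sem, 2), round(mdc, 2), round(mdc_percent(mdc, 30.0), 1))
```

Under these assumptions, a change in an individual gymnast's score smaller than the MDC cannot be distinguished from measurement error, which is why absolute reliability is needed in addition to the ICC when monitoring athletes over time.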
Validity
Three studies analyzed construct validity using a simple regression analysis between total test scores and the gymnasts’ competition level.2,15,18 Only one study assessed criterion validity, evaluating whether the specific anaerobic field test for aerobic gymnastics correlated with the Wingate test.22
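The r² values reported in the following subsections come from simple regression analyses. For a regression with a single predictor, r² equals the square of Pearson's correlation coefficient, which the minimal Python sketch below illustrates. The function names, the test scores and the competition levels are hypothetical and are not data from the reviewed studies.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Coefficient of determination of a simple (one-predictor) regression."""
    return pearson_r(x, y) ** 2


# Hypothetical total test scores and competition levels for six gymnasts.
total_scores = [52, 61, 58, 70, 75, 83]
competition_level = [4, 5, 5, 6, 7, 8]
print(round(r_squared(total_scores, competition_level), 2))
```

An r² of 0.52, for example, would mean that about half of the variance in competition level is shared with the test score, which helps put the very different values reported across tests into perspective.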
Flexibility
Three studies reported flexibility assessments.2,15,18 The specific splits test was the most valid test (r² = 0.52),15 and the shoulder flexibility test obtained the highest value among non-specific tests (r² = 0.05).2
Strength
Strength assessments were used in three studies.2,15,18 The hanging pikes test achieved the highest value among specific strength assessments (r² = 0.86), while the push-up test obtained the highest value among non-specific tests (r² = 0.91).15
Muscular power
Three investigations reported muscular power evaluations.2,15,18 The non-specific jump test was the most valid muscular power assessment (r² = 0.88).15
Balance
Balance evaluations were analyzed in three investigations.2,15,18 The handstand test obtained the best value among the balance evaluations (r² = 0.65).15
Speed
Two studies reported speed evaluations.2,15 The 20-yard sprint test was the most valid test (r² = 0.92).15
Agility
Three investigations reported agility assessments.2,15,18 The agility test showed the highest value (r² = 0.96).15
Cardiorespiratory fitness
The validity of a cardiorespiratory fitness test was reported in one study.22 This study indicated that validity ranged from 0.69 to 0.73.
Discussion
The main goal of this review was to summarize the scientific evidence on the reliability and validity of field-based tests to assess PF in gymnastics. Coaches could use these findings to assess and monitor the physical condition of gymnasts, which could help improve athletes’ performance.
On the one hand, reliability values were high in most tests (ICC > 0.9). However, it is important to clarify that the protocols were not adequate in several studies. Most studies included in this review used too short a time between trials (< 2 days), and most of them did not include a familiarization session. There is scientific evidence indicating that subjects should be familiarized with the performance protocol through at least one trial before measurement commences.6 Furthermore, the time between assessments can affect test-retest reliability, and it should be neither too short nor too long.25 Although the optimal time interval varies depending on the construct being measured, the stability of the construct over time, and the target population, two weeks seems to be the most frequently recommended interval.26 It is worth highlighting that several studies included in this review used a one-week interval to control for fatigue or learning effects while avoiding the passage of enough time to allow a true change in a gymnast's overall fitness.2
Regarding physical abilities, flexibility, strength, power, balance, speed, agility, and cardiorespiratory fitness play an important role in the success of a competitive gymnast.2 With this in mind, this review found that the most reliable tests were the side split test (flexibility), the handstand test (strength and balance), the vertical jump test (muscular power), the 20-m run test (speed), the agility test (agility), and the aerobic gymnast anaerobic test (cardiorespiratory fitness). However, three of them (the vertical jump test, the 20-m run test and the agility test) were non-specific tests. This could represent a weakness, because non-specific tests do not always correlate well with gymnasts’ performance, whereas specific tests are useful tools for assessing sport-specific performance.27 Furthermore, some of these tests have shown high reliability values in other populations. For example, the 20-m run test showed high reliability among elite youth female soccer players (ICC = 0.96).28 Regarding jump assessments, the vertical jump test also showed high reliability values in physically active men and women (ICC = 0.87–0.94).29
It is relevant to highlight that only two studies reported inter-rater reliability data. One of them reported only inter-rater reliability data,15 while the other reported both intra- and inter-rater reliability values.19 In addition, the authors of one study acknowledged as a limitation that only one rater administered the measurements.17
Regarding reliability types, relative reliability, defined as the degree to which individuals maintain their position in a sample across repeated measurements,30 was reported in most studies included in this review. Nevertheless, only four studies reported absolute reliability data through SEM and MDC. Absolute reliability refers to the degree to which repeated measurements vary for individuals. In addition, it helps predict the magnitude of a real change in individual athletes and can be used to estimate statistical power for a repeated-measures experiment.30 Accordingly, future investigations should also report absolute reliability data in order to show how repeated assessments vary for individual subjects.
On the other hand, validity was reported in only four studies. Three of them assessed construct validity by contrasting total test scores with gymnasts’ competition level. Only one study assessed criterion validity, evaluating the correlation between the specific anaerobic field test for aerobic gymnastics and a gold standard assessment (the Wingate test). This represents a weakness because criterion validity allows for an objective measure of validity.6 The lack of studies analyzing the validity of the different tests used in gymnasts hinders the implementation of these tests to assess and monitor athletes’ performance.
Limitations
Some limitations need to be considered when interpreting the findings of this review. First, only four studies analyzed absolute reliability, which helps predict the magnitude of a true change. Moreover, criterion validity is the only objective measure of validity, and only one study reported criterion validity data. Thus, these results should be interpreted with caution.
In addition, most studies did not describe the tests adequately, even though detailed descriptions are needed to understand and apply these tests properly. Furthermore, most tests were non-specific assessments, which should be taken into account when interpreting the findings. Finally, language restrictions in the search process and the non-inclusion of grey literature might have affected these results.
Conclusions
There are several tests to assess gymnasts’ fitness. The side split test, the handstand test, the vertical jump test, the 20-m run test, the agility test, and the aerobic gymnast anaerobic test could be useful tools to assess flexibility, strength, balance, muscular power, speed, agility, and cardiorespiratory fitness in gymnasts. Further investigations aimed at analyzing absolute reliability and criterion validity are needed.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.