Validation of non-English versions of the Modified Checklist for Autism in Toddlers, Revised with Follow-Up

Background: The diagnosis of autism spectrum disorders (ASD) in young children remains complex. The two-stage Modified Checklist for Autism in Toddlers, Revised with Follow-Up (M-CHAT-R/F) has been shown to be effective in a large, geographically diverse sample of young children in primary care. In this study, the authors critically reviewed 16 validation studies, with 35,722 participants, of M-CHAT-R/F for early detection of ASD in non-English-speaking countries, and formulated recommendations for validating non-English-language versions of the M-CHAT-R/F screening method among toddlers. Aim: To examine available publications assessing the accuracy of the screening tool for detecting ASD in non-English-speaking countries and to identify possible errors in methodology. Material and methods: The search was conducted in databases. The inclusion criteria for publications were: studies of the diagnostic accuracy of the M-CHAT-R/F screening tool for early detection of ASD, conducted in a non-English-speaking country. The search depth was 6 years. Results: The sensitivity and specificity of M-CHAT-R/F screening were high, with averages of 0.86 (CI=0.95) and 0.80 (CI=0.95), respectively. Based on the findings of this analysis, each included study showed some degree of risk of error at some stage of assessing the validity and reliability of the M-CHAT-R/F instrument. Conclusion: Non-compliance with the 'blindness' principle and the inclusion of children already diagnosed with ASD and with known screening outcomes greatly raise the likelihood of error. A major challenge for researchers at the timing and flow stage was to retain the maximum number of participants from the beginning to the end of the study. It is more effective for researchers to work with a small sample, since this encourages offering diagnostics to every study participant.


Introduction
Autism spectrum disorder (ASD) is characterized by a combination of social communication impairments and repetitive sensorimotor behaviors. Genetic and environmental influences are often associated with ASD [1]. Research indicates that there is no widely accepted explanation of the origin of ASD, and the lack of appropriate therapeutic options means that ASD must at present be regarded as an incurable condition [2]. Moreover, the proper diagnosis of ASD requires regular evaluation and review of behavior [3]. Research also reports that individuals with ASD have difficulty obtaining health services and lack support throughout their lives [4,5].
In this respect, the value of thorough early detection is greatly enhanced. Intervention initiated before the age of three years greatly improves responsive behavior, raises IQ, and reduces the severity of ASD symptoms [6].
Even so, under current conditions it is very difficult to diagnose ASD in young children. Only 4 out of 21 cases of ASD have been identified by primary health care providers, suggesting the need for additional use of appropriate screening methods [7]. Screening methods used by health professionals should require minimal training and deliver good results with a high positive predictive value (PPV) [8].
The early version of the Modified Checklist for Autism in Toddlers (M-CHAT), created in 1999, consists of 23 questions and has been shown in more than 22 countries to be accurate and effective. M-CHAT performs well in identifying young children with other developmental disabilities, but it showed poor PPV in a cohort of children under the age of 27 months [9].
The Modified Checklist for Autism in Toddlers, Revised with Follow-Up (M-CHAT-R/F) was developed and tested in 2014 for the early detection of ASD in children aged 16-30 months. Screening can lower the age of diagnosis of ASD to 2 years, thereby increasing the time available for early intervention. Compared to the previous iteration of M-CHAT, M-CHAT-R/F detects ASD at a higher rate and also produces fewer false-positive findings owing to the second screening stage, an additional follow-up interview [10]. The efficacy of the two-stage screening variant has been shown in a large, geographically distributed primary care population of young children [11].
However, M-CHAT-R/F is a questionnaire that relies on parents' assessment of their child. Because parents differ in their views of what counts as normal behavior in their children, the efficacy of the screening method can vary [12]. Validity and reliability tests must therefore be properly conducted in order to avoid overestimating or underestimating the usefulness of the M-CHAT-R/F versions.
In this review, the authors carried out a critical examination of M-CHAT-R/F validation studies for early identification of ASD in toddlers. Methodological weaknesses and concerns that contributed to a high risk of error were identified, as well as ways of addressing them.

Material and methods
The criteria for inclusion in this review were: studies aimed at evaluating the validity and reliability of the M-CHAT-R/F screening tool in young children.
A literature search was conducted in common databases such as Scopus, Web of Science, eLibrary and Google Scholar. In total, 775 references were found with the keyword 'M-CHAT-R/F'. Among them, only 16 articles described validation trials of the M-CHAT-R/F screening instrument among children 24-30 months of age who are not at risk, in a non-English-speaking country.
The final papers were evaluated using QUADAS-2, a tool for the quality assessment of diagnostic accuracy studies [13]. The evaluation used a series of signaling questions grouped into four categories: patient selection, index test, reference standard, and timing and flow. In addition, the signaling questions were supplemented with the following items: training for health providers, training for the caregiver, and cross-cultural adaptation. All signaling questions are listed in Table 2.
Applicability was judged against criteria derived from the objective of the research, namely to assess the validity and reliability of the M-CHAT-R/F screening tool for early detection of ASD in non-English-speaking countries. Exclusion criteria were: studies conducted in an English-speaking country; studies evaluating the accuracy of the English-language version of M-CHAT-R/F; studies focused on risk groups for ASD rather than on the diagnostic accuracy of M-CHAT-R/F; and studies using only earlier versions of the questionnaire, such as M-CHAT.
According to the QUADAS-2 criteria, concerns regarding the applicability of each study should be rated as "low," "high," or "unclear." The "unclear" rating is used where there are insufficient data to assess the applicability of a study to this review (Table 5). The applicability of the studies was evaluated first; the level of applicability concerns is shown in Table 5. Although some studies have a high level of applicability concerns, all studies were included in the review because they have low applicability concerns in the "index-test" section.
The answer to each signaling question could be "yes", "no" or "unclear". "Unclear" is recorded as the answer to a signaling question when the publication contains no information about the item.

Results
A brief description of the studies included in the review is provided in Table 1.

Patient selection
Answers to the signaling questions are shown in Table 2. Ten of the studies (62.5%) avoided the "case-control" study design, 3 (18.75%) used it, and the remaining 3 (18.75%) gave no information with which to answer this question.
A consecutive or random approach to selecting children for testing was used in 10 of the studies (62.5%), 3 (18.75%) did not follow these recommendations, and 3 (18.75%) did not provide any details about the recruitment of sample subjects.
Only 3 (18.75%) of the studies used acceptable inclusion and exclusion criteria. In contrast, 8 (50%) of the studies included children with, or at risk of, a diagnosis of ASD, behavioral illness and other neurodevelopmental conditions. The remaining 5 (31.25%) studies did not provide any details of the inclusion and exclusion criteria in their publications (Table 2).

Index-test
The principle of 'blindness' was observed in 10 studies (62.5%): M-CHAT-R/F screening was conducted before the diagnostic test for ASD, and the screening findings were determined without knowledge of the diagnostic results. However, because some studies involved children with prior conditions, the principle of 'blindness' may not have been achievable in 5 studies (31.25%) (Table 2).
The threshold value of the original variant of M-CHAT-R/F for the possibility of ASD was used provisionally in half of the studies (Table 2). These studies identified their own thresholds, ranging from 2 to 7. The other half of the studies gave no information about their thresholds (Table 3).
To prevent misinterpretation, it is very important to ensure that parents understand the M-CHAT-R/F questions [30]. However, only some studies offered brief guidance for the doctors and caregivers involved in the study (8 out of 16 studies (50%) and 6 out of 16 studies (37.5%), respectively). The other articles contained no details of any training or explanation (Table 2).
The vast majority of publications (13 (81.25%)) carried out cultural adaptation of their versions of the M-CHAT-R/F questionnaire. Two studies did not carry out any cultural adaptation, but instead tested localized versions of the M-CHAT-R/F questionnaire from the official website [31]. One study did not mention whether cultural adaptation was performed (Table 2).
The sensitivity and specificity of M-CHAT-R/F screening were high, with averages of 0.86 (CI=0.95) and 0.80 (CI=0.95), respectively. The minimum values were observed in Study 7 (Sen=0.5) and in Study 9 (Sp=0.05). The mean positive predictive value (PPV) and negative predictive value (NPV) are 0.50 and 0.97, respectively. Several studies did not report these metrics (Table 3).
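As a reminder of how the four reported metrics relate to the underlying counts, the following minimal sketch computes them from a 2x2 table of screening result against reference diagnosis. The class name, function name and counts are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of screening result versus reference diagnosis."""
    tp: int  # screen positive, ASD confirmed
    fp: int  # screen positive, ASD not confirmed
    fn: int  # screen negative, ASD confirmed
    tn: int  # screen negative, ASD not confirmed


def accuracy_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts, for illustration only.
example = ConfusionCounts(tp=43, fp=43, fn=7, tn=907)
metrics = accuracy_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that all four cells are needed: a study that diagnoses only screen-positive children has no values for fn and tn, so sensitivity, specificity and NPV cannot be computed from its data.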

Reference standard
In 14 (87.5%) studies, only children at high risk of ASD according to the M-CHAT-R/F screening were diagnosed, which suggests that the participants were diagnosed with knowledge of the screening data. This is unacceptable for an independent examination of the questionnaire's validity and reliability (Table 1). Two studies (Study 5, Study 16) did not diagnose the study participants.
The diagnostic approaches used by the study authors differ substantially (Table 3). Most studies, however, used the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). The Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule (ADOS) have the largest evidence base and the highest reliability and accuracy, and have been recommended for combined use [32]. However, this combination was used in only one study (Study 12).

Timing and flow
The studies analyzed included 35,782 individuals in total. Owing to non-compliance with the eligibility criteria (age outside the 16-30-month group, diagnosis of ASD or other behavioral and developmental disabilities, lack of consent to enroll in the study), 7,987 participants were excluded. A further 2,479 participants were excluded because the M-CHAT-R was filled in incorrectly at the first stage of screening. Participation declined substantially (by 2,663 participants) at the second stage of screening due to comprehension issues, reluctance to continue in the research, and communication issues. Across the studies, the total number of individuals who were diagnosed is just 1,592 (Figure 1).
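To put these counts on a common scale, the short sketch below expresses each reported figure as a share of the 35,782 individuals given in this section. The stage labels are paraphrased, the helper name is hypothetical, and no assumption is made about how the exclusions chain from one stage to the next, since the text does not state this.

```python
# Counts as reported in this section; labels are paraphrased.
enrolled = 35_782
reported = {
    "excluded as ineligible": 7_987,
    "excluded for incorrectly completed M-CHAT-R": 2_479,
    "lost at the second screening stage": 2_663,
    "diagnosed": 1_592,
}


def share_of_enrolled(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


shares = {label: share_of_enrolled(n, enrolled) for label, n in reported.items()}
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}% of enrolled")
```

On these figures, fewer than one in twenty of the enrolled individuals reached the diagnostic stage.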
In 4 studies, participants completed the screening test, the Follow-Up, and diagnostics. By contrast, most studies (12 out of 16) did not take all participants through every stage. This is because most studies referred only children found at screening to have a high to medium risk of ASD for diagnostics, while children who screened below that risk level were not diagnosed. Of the 16 reports, 12 gave no detail on the interval between screening and diagnosis, making it hard to judge whether the interval was appropriate. In addition, in five studies participants were assessed with different diagnostic methods (Table 2).

Patient Selection
This section examined whether studies used consecutive or random sampling of young children, namely children in the age group of 24-30 months without a previously registered diagnosis of "autism spectrum disorder," "Atypical autism," "Childhood autism," "Asperger syndrome," "Pervasive developmental disorder," or other neurodevelopmental, mental, and behavioral conditions. Studies must also take into account the suitability of the caregivers as respondents, including their emotional and physical state. This is relevant because the parents of the children under review complete the M-CHAT-R/F screening, and inaccurate answers can lead to an overestimation or underestimation of the efficacy of M-CHAT-R/F. Studies should therefore include only children and parents who satisfy the stated requirements and have no grounds for exclusion. (Signaling questions: "A consecutive or random sample of patients enrolled", "Avoided case-control study design")
The "case-control" study design, in which children already affected by ASD or other neurodevelopmental, mental, or behavioral disorders are enrolled to assess the efficacy of M-CHAT-R/F in a high-risk population, should be avoided, because this design distorts the picture and leads to an overestimation of the usefulness and accuracy of the M-CHAT-R/F screening tool. (Signaling question: "Avoided case-control study design") The studies that failed this criterion included children who had already been diagnosed with ASD or other mental and developmental disorders at the time of the study. Some studies used children with ASD as a "case" group and children without a diagnosis as a "control" group, which constitutes a "case-control" design.
Half of the studies have a high risk of bias for this block. This is attributed largely to inadequate inclusion and exclusion criteria. The risk is low in 6 (37.5%) studies, and 3 (18.8%) studies give no information on the procedures for recruiting sample subjects or on the conditions for inclusion and exclusion (Table 5).

Index-test
In this analysis, M-CHAT-R/F is the index-test. The main concern of this section is that the outcomes of the M-CHAT-R/F screening be determined without knowledge of the diagnostic test findings. (Signaling question: "M-CHAT-R/F findings interpreted without knowledge of diagnostic test outcomes") Because the M-CHAT-R/F questions are open to subjective interpretation, the principle of 'blindness' must be followed here. It applies first of all to the caregivers taking part in the study. Before screening, health care providers should refrain from giving advice or asking questions about the child's development and mental health, and parents of children previously diagnosed with ASD or other mental, neurodevelopmental, and behavioral disabilities should not be included [33].
In addition, although the original version of M-CHAT-R/F sets cut-off scores for low, medium, and high risk, all children participating in the research must undergo the follow-up interview and diagnostic testing. After all stages of screening and diagnostics have been completed, each study should measure the sensitivity and specificity of each "bad" item and determine its own cut-off values. (Signaling question: "Was it pre-specified whether a threshold was used"). In each study, the cut-off values of the original M-CHAT-R/F version should be treated as provisional rather than final, as relying on them could lead to an overestimation of the efficacy of M-CHAT-R/F, which could be lower in another sample of children with the same threshold score [13,34].
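The recommendation to determine cut-off values from a study's own data can be illustrated with a minimal sketch that recomputes sensitivity and specificity at each candidate cut-off. The scores and diagnoses below are hypothetical, the function name is an assumption, and the rule that a score at or above the cut-off counts as screen positive is assumed for illustration; the candidate range of 2 to 7 mirrors the range of thresholds reported in the reviewed studies.

```python
def sens_spec_at_cutoff(scores, has_asd, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as screen positive."""
    pairs = list(zip(scores, has_asd))
    tp = sum(1 for s, d in pairs if s >= cutoff and d)
    fn = sum(1 for s, d in pairs if s < cutoff and d)
    tn = sum(1 for s, d in pairs if s < cutoff and not d)
    fp = sum(1 for s, d in pairs if s >= cutoff and not d)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical total scores and reference diagnoses for ten children,
# all of whom are assumed to have received the diagnostic evaluation.
scores = [0, 1, 1, 2, 3, 3, 5, 6, 8, 12]
has_asd = [False, False, False, False, False, True, False, True, True, True]

for cutoff in range(2, 8):
    sens, spec = sens_spec_at_cutoff(scores, has_asd, cutoff)
    print(f"cut-off {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cut-off trades sensitivity for specificity, which is why a threshold tuned on one sample may perform differently on another.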
This analysis also looked at whether the caregivers and health professionals involved in the research in primary health settings were given a brief explanation of M-CHAT-R/F screening. (Signaling questions: "Health professional training", "Caregiver training"). This is important because in most studies the instrument is new to caregivers and health professionals, who have not encountered it before. It is also important to explain the purpose of the screening and how it should be understood to the health care professionals engaged in the research, so that they can communicate it correctly to caregivers. For the most reliable and relevant screening, it is therefore important to discuss the purposes and value of screening with clinicians and to obtain informed consent from them. For the same reason, the cultural elements of the M-CHAT-R/F screening questions need to be adapted, so that both health care professionals and caregivers understand and interpret the screening questions properly. (Signaling question: "Cross-cultural adaptation of M-CHAT-R/F")
Most studies (10 (62.5%)) had a low risk of error for this block, 5 (31.3%) studies had a high risk, and 1 (6.3%) study provided insufficient information to assess the risk. In the studies with a high probability of error (Table 5), the key issue was that children with a prior diagnosis were included, which violated the 'blindness' principle in the study of the efficacy of M-CHAT-R/F.

Reference standard
This section analyzes the approaches used in the studies to identify ASD in toddlers. It is assumed that the diagnostic approaches used identify ASD with 100% accuracy (signaling question: "Reference standard correctly defined ASD"), so that discrepancies between the diagnostic results and the results of M-CHAT-R/F are treated as inaccurate screening outcomes [35].
At the same time, the principle of 'blindness' must be followed, as in the previous section. Knowledge of the screening results can influence the outcome of diagnostics and lead to an overestimation of the efficacy and detection accuracy of M-CHAT-R/F for ASD. (Signaling question: "Diagnostic findings interpreted without knowledge of the M-CHAT-R/F results") [33,36].
Most studies (11 (68.8%)) have a high probability of error for this block, since diagnostic findings were obtained only for those members of the sample that had a high or medium probability of ASD, implying that the screening outcomes were known. For this block, only Study 1 (6.3%) has a low risk of error (Table 5).

Flow and timing
Ideally, the findings of screening and diagnostic testing should be processed quickly. Because ASD follows a persistent course, however, neither regression nor recovery is expected, so a delay of a few days is not a concern. At the same time, the longer the interval between screening and diagnosis, the greater the chance of low diagnostic coverage across all research subjects, which in turn would lead to an erroneous estimate of the efficacy of M-CHAT-R/F screening. (Signaling question: "Appropriate period between screening and comparison assessment")
To prevent bias, studies should not choose among different diagnostic approaches under the influence of screening outcomes. Because many approaches are in use, this is particularly important for the diagnosis of ASD. It is therefore important to diagnose all research subjects, irrespective of the outcomes of M-CHAT-R/F screening, to prevent incorrect estimates of sensitivity and specificity. All children examined should be included in the analysis, regardless of the ASD screening and diagnostic findings [37]. (Signaling questions: "All patients receive a diagnostic", "All patients included in the analysis").
If the information given in a study satisfies the above criteria, the answer to the signaling question is "yes". If it does not, the answer is "no". The answer is "unclear" if the study provides insufficient information on the question posed.
Just two studies have a low risk of error for this block when testing the efficacy of their variant of the M-CHAT-R/F (Table 4). The majority of studies have a high probability of error, largely because not all subjects in the sample completed a diagnostic test or were included in the final analysis.
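The effect of diagnosing only part of the sample can be shown with a small numerical sketch. It compares sensitivity and specificity in a hypothetical cohort where every child is diagnosed with the same cohort under an assumed scenario in which all screen-positive children but only one in ten screen-negative children are diagnosed. All counts, the one-in-ten rate and the function names are assumptions made for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of children with ASD who screened positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of children without ASD who screened negative."""
    return tn / (tn + fp)


# Hypothetical cohort in which every child is diagnosed.
tp, fp, fn, tn = 40, 60, 10, 890

# Assumed scenario: all screen-positive children are diagnosed,
# but only one in ten screen-negative children is.
one_in = 10
fn_verified = fn // one_in
tn_verified = tn // one_in

full_sens, full_spec = sensitivity(tp, fn), specificity(tn, fp)
part_sens, part_spec = sensitivity(tp, fn_verified), specificity(tn_verified, fp)

print(f"all diagnosed:    Sen={full_sens:.2f}, Sp={full_spec:.2f}")
print(f"partly diagnosed: Sen={part_sens:.2f}, Sp={part_spec:.2f}")
```

In this illustration, restricting diagnostics mostly to screen-positive children raises the apparent sensitivity from 0.80 to about 0.98 and lowers the apparent specificity from about 0.94 to about 0.60, although the screening tool itself is unchanged.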

Conclusion
Based on the findings of this analysis, each study included in the review showed some degree of risk of error at some stage of assessing the validity and reliability of the M-CHAT-R/F instrument. Researchers evidently had the most problems at the "Reference Standard" and "Timing and Flow" levels. This may be because there is little consensus on the "gold standard" for diagnosing ASD in routine health care (Table 5), as can be seen in Table 4, where the diagnostic approaches and techniques vary from study to study. At the same time, non-compliance with the 'blindness' principle and the inclusion of children already diagnosed with ASD and with known screening outcomes greatly raise the likelihood of error. Future studies of the validity and efficacy of the M-CHAT-R/F screening method in their own populations should therefore concentrate on these two issues. First, instrumental diagnostic approaches for ASD, such as ADOS-2 and ADI-R, can be used in combination with multidisciplinary diagnosis of children. Second, any possibility of including children with existing ASD diagnoses or other developmental disabilities should be eliminated to prevent bias. A major challenge for researchers at the timing and flow stage was to retain the maximum number of participants from the very beginning to the end. However, as can be seen in Figure 1, the number of participants declines substantially at the diagnostic stage. This has a number of causes, ranging from an excessively large initial number of participants to a low level of health literacy in the study population or problems with health promotion activities. Thus, in future studies, researchers should pay more attention at this stage to the sample under analysis and set an acceptable interval between the stages of the study. Attention should also be given to increasing participants' trust in the study and to carrying out explanatory work.
Study 6 is the study with the lowest risk of error (Table 5). After evaluating all the studies, it can be inferred that this is partly due to its comparatively small sample size (110). A sample of this size can be taken through all phases of the study with relative ease. At the same time, it is more effective for researchers to work with a sample of this magnitude, since this encourages offering diagnostics to every study participant.
In addition to the aforementioned guidelines on sample size, the following points are recommended for prospective researchers:
-participation in the study by children who have not previously been diagnosed with ASD or other psychiatric and developmental disorders;
-the age of the children tested should be between 16 and 30 months;
-the absence of motor and speech disabilities in the sample participants;
-the implementation of the "gold standard" in the diagnosis of ASD;
-maximum participation of all participants in the research process at each stage;
-cultural adaptation of the M-CHAT-R/F version.

Disclosures:
The authors declare no conflict of interest.