A CONFIRMATORY FACTOR ANALYSIS OF THE TEACHER’S COMPETENCE IN ACTION RESEARCH QUESTIONNAIRE

Instruments that assess teachers’ competence in AR methodology are limited, which poses a problem for evaluating the effectiveness of professional development programs on designing AR projects: it is difficult to determine how much and what teachers have learned in a course or training. Thus, this cross-sectional study aimed to further evaluate the validity and reliability of the Teacher’s Competence in Action Research Questionnaire, a seven-factor instrument previously proposed by Cortes, Pineda, and Geverola (2020). That self-report scale had not been subjected to confirmatory factor analysis, had a small sample size, and had homogeneous participants. In the present study, 450 participants, both pre- and in-service teachers from different teaching specializations, answered the survey. The data were analyzed using confirmatory factor analysis with the Maximum Likelihood approach. Four model fit indices recorded satisfactory results (CFI = 0.890; TLI = 0.884; RMSEA = 0.072; SRMR = 0.039), thus supporting the seven-factor scale. The standardized factor loadings, composite reliability, average variance extracted, and Cronbach’s alpha coefficients of the entire scale and its subscales also provide evidence of the scale’s convergent validity and reliability. There may be an issue with the discriminant validity of the scale, but the conceptual distinctions among the factors, supported by theoretical foundations and arguments, provide a principal reason for retaining all the items and factors.


Introduction
A growing interest in continuing professional development (CPD) programs in recent years has been evident in the educational context, including but not limited to vocational teaching (Andersson and Köpsén, 2015), language teaching (e.g., Novozhenina and Lopez-Pinzon, 2018), science teaching (e.g., Kartal et al., 2018), and mathematics teaching (e.g., Jacob, Hill, and Corey, 2017). The primary goal of embedding these CPD programs is to support teachers' quest for lifelong learning and eventually improve educational outcomes. However, most professional learning opportunities are mainly conceptualized and implemented via top-down approaches, such as workshops, short-duration courses, and webinars, which may be demotivating, discourage collaboration and dialogue, and be incongruent with teachers' needs and interests (Wyatt and Ager, 2016; Manfra, 2019). In effect, developing teachers into lifelong learners remains a distant goal because there is no synergy between the goal and the fashion in which CPD programs are implemented. To become a lifelong learner, one should constantly and actively engage in professional learning experiences throughout their career for personal and professional reasons. In this regard, bottom-up teacher-led professional development processes characterized by active, reflective, and transformative professionalism have been proposed by many researchers in lieu of top-down approaches (Dehghan, 2020). One of these bottom-up approaches is action research (AR), a collaborative inquiry aimed at improving practices and attaining desirable educational outcomes.
The improvement of teachers' practices through AR is evident in several facets of education, including but not limited to classroom management (Sadruddin, 2012), technology integration (Kuo, 2015), pedagogy and instruction (Pennington, 2015; James and Augustin, 2017; Cortes, 2020), and assessment (Pang, 2020). Meanwhile, long-term outcomes of teachers' engagement in AR include learning how to conduct AR, developing reflective practice, a student-centred teaching approach, collaboration between peers, changing attitudes, and a lasting effect on teaching (Kember, 2002). Recognizing this significance, several efforts have been made, both historically and recently, to promote the culture of AR in pre-service and in-service teaching. These are increasingly visible aspects of educational reforms, such as the inclusion of AR in teacher preparation programs as a course (Stevens and Kitchen, 2004; Kizilaslan and Leutwyler, 2012; Cortes, Pineda, and Geverola, 2021). When pre-service teachers conduct AR during practicum, it is argued that it allows them to link theory with their developing classroom practices (Kennedy-Clark et al., 2018). In addition, AR methodology is included as a core subject in the graduate teacher education curriculum (Hine, 2013) and offered as a professional development program for the professional upgrading of in-service teachers (Cullen, Akerson, and Hanson, 2010).
However, one issue concerning evaluating the effectiveness of a professional development program on designing AR projects rests on the scarcity of instruments that assess teachers' competence in AR methodology. Although there are existing professional development evaluation models and frameworks for evaluating learning, these tend to be generic, conceptual, or processual in focus, such as Kirkpatrick's (1959) Four-Level Training Evaluation Model, Guskey's (2002) Theory of Teacher Change, Clarke and Hollingsworth's (2002) model, and the Triangulated Model of Assessment for Learning (Tan, 2013). In other words, these analytical frameworks may guide the evaluation of a program or a course but do not precisely specify the set of skills to evaluate, whereas conducting AR requires a variety of skills ranging from selecting a topic to disseminating research results. Hence, developing and validating a scale intended to evaluate teachers' perceived competential needs and development in AR methodology will help a researcher or organization appropriately design, implement, and evaluate a professional development program on AR.
One self-report scale which operationally defines the specific skills in conducting AR is the Teacher's Competence in Action Research (TCAR) scale (Cortes, Pineda, and Geverola, 2020). The skills described in the scale include the teacher's competence to select an AR topic, plan an AR project, integrate ethics, integrate technology in writing literature, analyze and present AR data, integrate technology in analyzing data, and reflect on and communicate results. When assessing teachers' needs and learning, professionals can use this scale to develop and evaluate professional learning opportunities for in-service teachers on AR methodology. In addition, university lecturers may adopt the scale to measure the extent of self-perceived competence of pre-service teachers before and after taking a course on AR. Other disciplines may even adopt the scale because AR is not only confined to being used in the educational context. In fact, this research method was applied originally in solving social conflicts. Therefore, other disciplines are also actively engaged in conducting AR to resolve social and institutional problems and grow professionally.
However, the scale has two apparent limitations. First, although the sample size was adequate, as evidenced by sampling adequacy tests such as the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's Test of Sphericity, the samples were not well represented. The participants were all in-service teachers within one province in Central Visayas, Philippines, and there was no report on teachers' profiles (e.g., specialization and tenure in service), although there were inclusion criteria before a teacher could participate in the survey. The study also excluded pre-service teachers, who receive formal training to design and implement AR as one of their course requirements, as evidenced by the teacher preparation curricula of several countries such as the Philippines (Cortes, 2019) and Australia (Vialle, Hall, and Booth, 1997). Hence, they would have been qualified participants during scale development and validation, and their participation could have helped verify the usability of the scale at their level. Second, discriminant and convergent validity were not established because confirmatory factor analysis was not performed, although it is considered mandatory (Taherdoost, 2016). Thus, the present study aims to address these limitations to further establish the validity and reliability of the scale. In particular, several steps were taken, namely: (a) increasing the sample size, (b) involving both pre-service and in-service teachers across the country, and (c) performing confirmatory factor analysis.

Participants
Before the recruitment of teachers, transmittal letters indicating the purpose of the study were sent to the superintendents of different school divisions and the deans of colleges in different teacher education institutions to obtain consent. Upon their approval, the questionnaires were distributed to pre-service and in-service teachers based on the lists provided by the superintendents and deans. Informed consent forms were distributed alongside the questionnaire to inform the teachers of the study's purpose and background, procedures, the extent of confidentiality, benefits, and the voluntary nature of their participation. However, regardless of whether teachers were part of the lists provided by their superintendents or deans, inclusion criteria were strictly applied before a teacher could participate in the survey. Pre-service teachers, although all required to undergo formal training in AR as part of the teacher preparation program, must have finished the course and completed an AR project. Meanwhile, in-service teachers were selected based on two criteria: attendance at previous training on AR and completion of an AR project.
A teacher who met the inclusion criteria above and approved the terms stipulated in the consent form was then asked to attend a retooling program, a four-session webinar on designing AR projects, as part of data collection. The sessions included lectures on writing the preliminary part of an AR proposal, ethical issues in AR, quantitative and mixed-methods research designs and data analysis, and qualitative research designs and data analysis. After completing the webinar, all teachers were encouraged to answer the TCAR questionnaire, which was administered online via Google Forms between April and May 2021. There were 450 valid responses, nearly three times the 166 teachers surveyed in the previous study by Cortes, Pineda, and Geverola (2020). The distribution of teacher-respondents when grouped according to specialization, sex, age, and state of residence is shown in Figure 1.

TCAR Scale
TCAR, as shown in Table 1, is a 54-item scale developed to assess the competential needs and competential development of teachers before and after a professional development program on designing AR projects. The 54 items are distributed unevenly across seven factors, namely: analyzing and presenting AR data (13 items), reflecting on and communicating results (13 items), planning an AR project (11 items), integrating ethics (8 items), selecting a topic for professional growth (4 items), integrating technology in writing literature (2 items), and integrating technology in analyzing data (3 items). The teachers' responses are collected using a 5-point Likert scale, where 5 is the highest rating and represents expert competence while 1 is the lowest and represents limited competence. In the present study, the questionnaire was created using Google Forms, and the corresponding link was distributed through different platforms such as Facebook Messenger, email, and text messaging. Upon receiving the survey form, teachers were reminded that their participation was voluntary and that they could withdraw during the process or even withdraw their responses afterward. The teachers were required to indicate the start and finish time when completing the questionnaire; the average completion time was 26 minutes. All the responses collected were transferred to IBM SPSS 23 in preparation for the subsequent data analysis.

Data Analysis
The data analysis began with determining sampling adequacy using the Kaiser-Meyer-Olkin (KMO) Test and Bartlett's Test of Sphericity (BTS) as a prerequisite to performing confirmatory factor analysis (CFA). Kline (2011) and Joseph et al. (2012) explained that the purpose of CFA is to test an existing theory or model or to verify the factor structure of a set of observed variables, which in this case is the seven-factor questionnaire previously proposed by Cortes et al. (2020). The adequacy of the sample size is met if the KMO value is greater than 0.6 or close to 1.0 and the significance value of BTS is less than 0.05 (Tabachnick and Fidell, 2007; Hair et al., 2010). Subsequently, the estimation results were examined using the t-value, or critical ratio, and the standardized factor loading (SFL) of each item. The acceptable t-value is greater than or equal to 1.96, or practically 2.00, while the acceptable SFL value is greater than or equal to 0.45 (Kline, 2016).
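For readers who wish to reproduce these adequacy checks outside a statistical package, both the KMO measure and Bartlett's test can be computed directly from the item correlation matrix. The following is a minimal sketch in Python using NumPy and SciPy; the function name `kmo_and_bartlett` and the raw-data input format are illustrative assumptions, not the study's actual code (the analysis reported here was conducted in SPSS).

```python
import numpy as np
from scipy.stats import chi2


def kmo_and_bartlett(data):
    """Compute the KMO measure and Bartlett's Test of Sphericity.

    `data` is an (n_respondents, n_items) array of item responses.
    Returns (kmo, bartlett_chi_square, bartlett_p_value).
    """
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)            # item correlation matrix
    inv_r = np.linalg.inv(r)

    # Partial (anti-image) correlations from the inverse correlation matrix
    d = np.sqrt(np.outer(np.diag(inv_r), np.diag(inv_r)))
    partial = -inv_r / d

    # KMO: squared correlations relative to squared partial correlations
    off = ~np.eye(p, dtype=bool)                   # off-diagonal mask
    kmo = np.sum(r[off] ** 2) / (np.sum(r[off] ** 2) + np.sum(partial[off] ** 2))

    # Bartlett's test: chi-square statistic based on the determinant of R
    chi_sq = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    df = p * (p - 1) / 2
    p_value = chi2.sf(chi_sq, df)
    return kmo, chi_sq, p_value
```

With highly intercorrelated items, as in a one-factor structure, the KMO value approaches 1.0 and Bartlett's p-value falls well below 0.05, matching the decision rules stated above.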
The overall model-data fit, which provides confirmatory evidence for the factor structure generated during exploratory factor analysis (EFA), was assessed using several goodness-of-fit indices (GFIs). In the literature, these fit indices are influenced by different sample sizes, data types, and acceptable score ranges (Hu and Bentler, 1999; MacCallum, Browne, and Sugawara, 1996). They are the comparative fit index (CFI), Tucker-Lewis index (TLI) or non-normed fit index (NNFI), root mean square error of approximation (RMSEA), and Chi-square/df ratio. It is suggested that an RMSEA smaller than .08 (Kenny, Kaniskan, and McCoach, 2014), an SRMR less than or equal to 0.08 (Hu and Bentler, 1999), a TLI larger than 0.85 (Sharma et al., 2005), a CFI larger than 0.80 (Browne and Cudeck, 1992; Garson, 2006), and a Chi-square/df ratio less than 2.0 (Kline, 1998) indicate relatively good model-data fit. However, the latter may not be recommended for large sample sizes when evaluating model fit (Çokluk, Şekercioğlu, and Büyüköztürk, 2014).
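These cutoffs can be collected into a simple checklist. The sketch below is only a convenience for readers (the function name and return format are our own); it applies the thresholds exactly as stated in the sources cited above.

```python
def evaluate_fit(cfi, tli, rmsea, srmr, chisq_df):
    """Check each goodness-of-fit index against the cutoffs cited in the text.

    Returns a dict mapping each criterion to True (met) or False (not met).
    """
    return {
        "CFI > 0.80": cfi > 0.80,
        "TLI > 0.85": tli > 0.85,
        "RMSEA < 0.08": rmsea < 0.08,
        "SRMR <= 0.08": srmr <= 0.08,
        "Chi-square/df < 2.0": chisq_df < 2.0,
    }
```

Note that a model can satisfy most criteria while failing the Chi-square/df ratio, which, as discussed above, is sensitive to sample size.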
Finally, the validation of the TCAR scale was established by presenting evidence of convergent validity and discriminant validity. Evidence of convergent validity was determined through the following parameters: standardized factor loading (SFL), composite reliability (CR), and average variance extracted (AVE). An SFL of at least 0.7, a CR of at least 0.7, and an AVE of at least 0.5 are generally considered good (Hair et al., 2010). With respect to the discriminant validity of the scale, the AVE estimate should be higher than the squared correlation between the two constructs (Hair Jr. et al., 2014). Reliability was calculated using Cronbach's alpha, where values above 0.70 are considered very reliable (Hair et al., 2010).
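Both CR and AVE are simple functions of the standardized factor loadings of a factor's items. The following minimal sketch implements the standard formulas, assuming a list of standardized loadings per factor as input; the function names are illustrative and not part of the study's analysis pipeline.

```python
import numpy as np


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each error variance is 1 - loading^2 for standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    num = lam.sum() ** 2
    return float(num / (num + np.sum(1.0 - lam ** 2)))


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings of a factor's items."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))
```

For example, a four-item factor with all standardized loadings equal to 0.8 yields an AVE of 0.64 and a CR of about 0.88, both of which meet the thresholds above.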

Sampling Size Adequacy
The adequacy of the sample size was tested using the Kaiser-Meyer-Olkin Test and Bartlett's Test of Sphericity before performing the CFA. The KMO value was 0.984, much higher than the predetermined value and very close to 1.0. Meanwhile, the BTS value was 28479.001 with a p-value of less than 0.001, much smaller than the standard threshold. Therefore, both sampling adequacy tests suggest that the required sample size for the CFA of TCAR was met.
Estimation Results

An analysis of the factor load estimation results was done on each observed variable using the critical ratio, or t-value, and the SFL before the overall model-data fit was tested. Table 2 shows the factor load estimation results of the model, in which all t-values are greater than 1.96 while all SFL values are greater than 0.45. The minimum t-value is 16.562 (F2_38) and the maximum is 35.028 (F7_57). Meanwhile, the minimum SFL value is 0.687 (F2_47) and the maximum is 0.934 (F7_57). In summary, no offending estimate was recorded in the factor load estimation results. Thus, the overall fit analysis of the TCAR model could proceed.

Overall Model Fit
Five goodness-of-fit indices (GFIs) were used to examine the overall fit of the model proposed by Cortes et al. (2020). Table 3 shows these model fit indices, of which satisfactory results were mostly obtained (CFI = 0.890; TLI = 0.884; RMSEA = 0.072; SRMR = 0.039), except for the Chi-square/df ratio, which was 3.302. Nonetheless, this may be considered tolerable because this GFI is not usually recommended when evaluating model fit, as it is sensitive to sample size; the value can be high even if the model is a good one (Karakaya-Ozyer and Aksu-Dunya, 2018). Based on the overall model fit indices, there is clear evidence that TCAR is statistically sound for measuring the AR competence of pre- and in-service teachers. Hence, the seven-factor model previously proposed after EFA is further reinforced.

Convergent Validity
With the overall model-data fit established through the GFIs, the convergent validity of the constructs was evaluated. Table 4 shows the results of the convergent validity of the scale. It can be noted that the factor loadings of all items are above 0.7, indicating good convergent validity (Gefen, Straub, and Boudreau, 2000). The composite reliability (CR) and average variance extracted (AVE) were also examined to further establish the convergent validity of the scale. The CR is greater than 0.6, indicating high internal consistency of all items in the scale. Meanwhile, the AVE is greater than 0.5, indicating that the items in the scale can adequately reflect the characteristics of each research variable in the model (Srinivasan, Lilien, and Rangaswamy, 2002). The CR values range from 0.968 to 0.997, while the AVE values range from 0.673 to 0.863. Thus, it may be safe to conclude that TCAR has an acceptable and adequate amount of evidence of convergent validity.

Discriminant Validity
The discriminant validity of the scale can be determined by comparing the squared correlations between constructs against the AVE estimates. Table 4 shows the squared correlation of each construct versus the AVE of a particular construct as evidence of the scale's discriminant validity. It can be observed that not all estimated AVE values are higher than the squared correlations of the constructs with which they are compared. For instance, the squared correlations of Factor 2 (Reflecting on and Communicating Results), Factor 3 (Planning an AR Project), Factor 4 (Integrating Ethics in AR), and Factor 5 (Selecting a Topic for Professional Growth) are higher than the AVE of Factor 1 (Analyzing and Presenting AR Data). This issue is also evident in Factor 2 and Factor 4 when their AVEs are compared against the squared correlations. While this may indicate weak discriminant validity among some factors of TCAR, it does not necessarily compromise the scale's psychometric properties or imply that the underlying concepts are identical; not all constructs have squared correlations higher than the AVEs with which they are compared. It is also important to note that discriminant validity is not the exclusive practical means of validating a model (Bagozzi and Phillips, 1982). The conceptual distinctions of each factor, as supported by theoretical foundations and arguments, should provide the principal reasons for constructs correlating or not (Bollen and Lennox, 1991).
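The pairwise comparison described above, often called the Fornell-Larcker criterion, can be automated: for each pair of factors, flag the pair if its squared correlation exceeds either factor's AVE. The sketch below is illustrative only; the function name and input layout are our assumptions, not the study's actual code.

```python
import numpy as np


def fornell_larcker_flags(ave, factor_corr):
    """Return the factor pairs (i, j) whose squared inter-factor correlation
    exceeds the AVE of either factor, i.e. potential discriminant validity
    problems under the Fornell-Larcker criterion.

    `ave` is a sequence of per-factor AVE values; `factor_corr` is the
    inter-factor correlation matrix.
    """
    ave = np.asarray(ave, dtype=float)
    r2 = np.asarray(factor_corr, dtype=float) ** 2   # squared correlations
    flagged = []
    for i in range(len(ave)):
        for j in range(i + 1, len(ave)):
            if r2[i, j] > ave[i] or r2[i, j] > ave[j]:
                flagged.append((i, j))
    return flagged
```

For example, two factors with AVEs of 0.7 and 0.6 that correlate at 0.9 (squared correlation 0.81) would be flagged, whereas the same factors correlating at 0.5 (squared correlation 0.25) would not.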

Reliability
Reliability is concerned with the ability of a scale to measure consistently and is closely associated with scale validity. A scale cannot be valid unless its reliability is established, but its reliability does not depend on its validity (Tavakol and Dennick, 2011). In the present study, the scale's reliability was calculated using Cronbach's alpha, with 0.70 or above as the reference value for determining whether the entire scale and its subscales are reliable. Table 4 shows the Cronbach's alpha coefficients of the subscales or factors. The largest value recorded is 0.970 for Factor 1 on analyzing and presenting AR data, while the lowest is 0.854 for Factor 6 on integrating technology in writing the related literature. This range of Cronbach's alpha coefficients reveals satisfactory reliability with reference to the value proposed by Hair et al. (2010). In addition, the overall Cronbach's alpha coefficient of the entire scale is 0.989, indicating that it is very reliable.
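Cronbach's alpha for a scale or subscale follows directly from the item variances and the variance of the total score. A minimal sketch of the standard formula, assuming an n-respondents-by-k-items response matrix as input (the function name is illustrative; the study's values were computed in SPSS):

```python
import numpy as np


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score).

    `items` is an (n_respondents, k_items) matrix of item responses.
    """
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)         # per-item sample variances
    total_variance = x.sum(axis=1).var(ddof=1)     # variance of the total score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)
```

When items are highly intercorrelated, as in TCAR's subscales, alpha approaches 1; uncorrelated items drive it toward 0.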

Conclusions
The Teacher's Competence in Action Research (TCAR) Scale was subjected to validity and reliability tests in the present study. Four GFIs from the CFA results support the seven-factor scale: CFI = 0.890, TLI = 0.884, RMSEA = 0.072, and SRMR = 0.039. The other analyses, namely convergent validity and reliability, also provide further evidence of the scale's validity. In particular, the SFL, CR, and AVE values exceeded the proposed thresholds. Also, the Cronbach's alpha coefficients of the entire scale and its subscales are higher than the predefined criteria. There may be an issue with the scale's discriminant validity, but this can be addressed in future studies. The principal ground for retaining the factors and items in the present study rests on the conceptual distinctions of each factor as supported by theoretical foundations and arguments. When comparing items on analyzing and presenting AR data in Factor 1 against items in Factors 2 to 4, there is apparent evidence that these factors measure different sets of skills or competences.
In conclusion, the validity and reliability of the TCAR questionnaire are further supported by the evidence of convergent validity and reliability in the present study. In the previous study by Cortes et al. (2020), only construct validity through exploratory factor analysis and reliability through Cronbach's alpha were established. In this regard, the scale may be used when developing and evaluating the effectiveness of a professional development program on conducting action research projects. The scale may be used for both pre- and in-service teachers and across different disciplines training for action research, including but not limited to science, mathematics, English, and other languages.