EFA Pre-Check Before Data Collection — Cronbach's Alpha, KMO, and Deciding the Number of Factors

Once cognitive interviewing is done and the items are refined, one last gate remains before data collection: running an exploratory factor analysis on a small pilot sample to confirm that the item structure behaves as expected. A problem that surfaces here can still be fixed before the main survey, which costs far less than discovering it after collecting hundreds of responses.

Exploratory factor analysis (EFA) examines the pattern of correlations among items to identify a latent factor structure. At this step you check whether the factor structure you designed from theory also appears in real data. If the EFA results differ greatly from the theoretical model, that is a signal to revise items or reconsider the construct definition.

Two things to check before EFA — KMO and Bartlett's test

Before running EFA, you first have to check whether the data are suitable for factor analysis. Two indices are used.

The KMO (Kaiser-Meyer-Olkin) index expresses how suitable the data are for factor analysis as a value between 0 and 1. It compares the correlations among variables with their partial correlations: the smaller the partial correlations relative to the plain correlations, the closer the KMO is to 1. By the commonly used reading of Kaiser's criterion, KMO ≥ 0.6 means you can proceed, and 0.8 or above is excellent. A KMO below 0.6 means the shared variance among items is insufficient, and you should reconsider the item composition.
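The KMO calculation can be sketched with NumPy alone. This is a minimal illustration, not a reference implementation: the function name `kmo` and the simulated pilot data are assumptions made for the example, and it returns only the overall index, not per-item values.

```python
import numpy as np

def kmo(data):
    """Overall KMO for a (respondents x items) array. Illustrative helper."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlation of each item pair, controlling for all other items.
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)     # squared correlations
    q2 = np.sum(partial[off_diag] ** 2)  # squared partial correlations
    return r2 / (r2 + q2)

# Hypothetical pilot data: 200 respondents, 6 items driven by one factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
pilot = 0.7 * factor + rng.normal(scale=0.7, size=(200, 6))
noise = rng.normal(size=(200, 6))  # unrelated items, for comparison

kmo_pilot = kmo(pilot)
kmo_noise = kmo(noise)
print(f"KMO, correlated items: {kmo_pilot:.2f}")
print(f"KMO, unrelated items:  {kmo_noise:.2f}")
```

Items that share a common factor should land well above the 0.6 threshold, while unrelated items should score clearly lower.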

Bartlett's test of sphericity checks whether the correlation matrix differs significantly from an identity matrix. A p-value below 0.05 indicates there is enough correlation among the variables to proceed with factor analysis. If the test is not significant, the data give no evidence that the items are correlated, so extracting common factors is meaningless in the first place.
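As a rough sketch, Bartlett's statistic can be computed from the determinant of the correlation matrix and compared against a chi-square distribution. The function name `bartlett_sphericity` and the simulated data are assumptions for illustration, and the example assumes SciPy is available for the p-value.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Return (statistic, degrees of freedom, p-value). Illustrative helper."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # log|R| is 0 for an identity matrix and negative once items correlate.
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    p_value = chi2.sf(statistic, dof)
    return statistic, dof, p_value

# Hypothetical pilot data: 200 respondents, 6 items driven by one factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
pilot = 0.7 * factor + rng.normal(scale=0.7, size=(200, 6))

statistic, dof, p_value = bartlett_sphericity(pilot)
print(f"chi-square = {statistic:.1f}, df = {dof}, p = {p_value:.3g}")
```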

Index	Threshold	Interpretation
KMO	≥ 0.6	Factor analysis can proceed (Kaiser)
KMO	≥ 0.8	Excellent
Bartlett's test	p < 0.05	Suitable for factor analysis

How to decide the number of factors — parallel analysis is the standard

There are several ways to decide the number of factors, but the most trusted method today is parallel analysis. The Kaiser criterion, which extracts factors with eigenvalues of 1 or more, tends to overestimate the number of factors, so it is best not used alone. The scree plot relies on visual judgment, so conclusions can differ from researcher to researcher.

Parallel analysis compares the eigenvalues of the real data with eigenvalues generated from random data. Only factors whose real-data eigenvalue exceeds the corresponding random-data eigenvalue are treated as meaningful. Because the random data have the same sample size and number of variables as the real data, the comparison accounts for how large eigenvalues can become through sampling error alone, which makes it more accurate than the Kaiser criterion.
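The comparison can be sketched as follows. This is one simple variant, assuming eigenvalues of the full correlation matrix and the 95th percentile of the random eigenvalues as the threshold; statistical packages differ on both choices. The function name `parallel_analysis` and the simulated two-factor data are hypothetical.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count factors whose eigenvalue beats random data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # eigvalsh returns ascending order, so reverse to get largest first.
    real = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        random_data = rng.normal(size=(n, p))
        sims[i] = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[::-1]
    threshold = np.percentile(sims, percentile, axis=0)
    keep = real > threshold
    # Retain factors only up to the first one that fails the comparison.
    n_factors = p if keep.all() else int(np.argmin(keep))
    return n_factors, real, threshold

# Hypothetical pilot data: 300 respondents, 8 items, two underlying factors.
rng = np.random.default_rng(1)
factors = rng.normal(size=(300, 2))
pattern = np.zeros((2, 8))
pattern[0, :4] = 0.8
pattern[1, 4:] = 0.8
pilot = factors @ pattern + 0.6 * rng.normal(size=(300, 8))

n_factors, real, threshold = parallel_analysis(pilot)
print("real eigenvalues:  ", np.round(real, 2))
print("random thresholds: ", np.round(threshold, 2))
print("factors retained:  ", n_factors)
```

Printing the two eigenvalue rows side by side makes it easy to see where the real data stop exceeding what random noise would produce.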

The number of factors should not be decided on statistical criteria alone. Judge by combining the parallel-analysis result with theoretical rationale, the scree plot, and interpretability. Even if the statistics suggest three factors, you can retain two when theory supports two, provided you state that rationale in the methods section.

Choosing a rotation method — Promax and Varimax

After extracting factors, you apply rotation to make interpretation easier. Rotation methods are divided by whether they assume the factors can be correlated with one another.

Promax (oblique rotation) allows correlation among factors. Psychosocial constructs are generally related to one another in reality. Assuming that job satisfaction and organizational commitment are completely independent is unrealistic. In such cases Promax is more appropriate.

Varimax (orthogonal rotation) assumes that factors are independent of one another. Use Varimax when there should theoretically be no correlation among factors.
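To make concrete what an orthogonal rotation does, here is a minimal sketch of the classical SVD-based Varimax iteration in NumPy. It is an illustration under stated assumptions, not a substitute for a statistics package: the function name `varimax`, the iteration limits, and the toy loading matrix are all hypothetical, and Promax is not implemented here.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate an (items x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the loadings.
        gradient = loadings.T @ (
            rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        new_objective = s.sum()
        if objective > 0 and new_objective < objective * (1 + tol):
            break
        objective = new_objective
    return loadings @ rotation

# Toy example: a clean two-factor structure, blurred by a 30-degree rotation.
simple = np.array([[0.9, 0.0], [0.7, 0.0], [0.5, 0.0],
                   [0.0, 0.9], [0.0, 0.7], [0.0, 0.5]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix
rotated = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each item's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors becomes easier to read.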

In psychometric scale development it is common to choose Promax by default. After rotation, each item's factor loading should meet the ≥ 0.5 criterion given by Hair et al.
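Screening a rotated loading matrix against the 0.5 criterion is straightforward to script. In this sketch the helper name `weak_items`, the item labels, and the loading values are all made up for illustration.

```python
import numpy as np

def weak_items(loadings, names, cutoff=0.5):
    """Return names of items whose largest absolute loading is below cutoff."""
    loadings = np.asarray(loadings, dtype=float)
    strongest = np.abs(loadings).max(axis=1)
    return [name for name, value in zip(names, strongest) if value < cutoff]

# Hypothetical rotated loadings for four items on two factors.
loadings = [[0.72, 0.10],
            [0.65, 0.05],
            [0.31, 0.28],
            [0.08, 0.81]]
names = ["q1", "q2", "q3", "q4"]

flagged = weak_items(loadings, names)
print("items below 0.5:", flagged)
```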

Cronbach's alpha — the threshold and interpretation of internal consistency

Once the factor structure is confirmed, check internal consistency for each factor. Cronbach's alpha is an index of how consistently items measuring the same construct are answered. By Nunnally's (1978) criterion, α ≥ 0.7 is the acceptable threshold.
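Cronbach's alpha follows directly from the item variances and the variance of the summed score: alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. A minimal sketch, with the function name `cronbach_alpha` and the simulated responses as illustrative assumptions:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) array of one factor."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical factor with 4 items answered by 500 respondents.
rng = np.random.default_rng(2)
factor = rng.normal(size=(500, 1))
items = factor + rng.normal(size=(500, 4))

alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")
```

Run this once per factor, passing only the columns that belong to that factor.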

An alpha below 0.7 signals that the items of that factor are not sufficiently consistent. In that case, check the CITC (Corrected Item-Total Correlation), which is the correlation between each item and the sum of the remaining items, to find the problem item. Items with a CITC below 0.3 are candidates for removal. When removing an item, however, weigh the statistical criterion against what the construct definition requires. Narrowing the meaning of the construct just to raise alpha should be avoided.
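CITC can be sketched by correlating each item with the total of the other items in the same factor. The function name `citc` and the simulated data, which deliberately include one unrelated item, are assumptions for illustration.

```python
import numpy as np

def citc(items):
    """Corrected item-total correlation for each column of the array."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        # Correlate the item with the total score excluding that item.
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Hypothetical factor: four consistent items plus one unrelated item.
rng = np.random.default_rng(3)
factor = rng.normal(size=(500, 1))
good = factor + rng.normal(size=(500, 4))
unrelated = rng.normal(size=(500, 1))
items = np.hstack([good, unrelated])

values = citc(items)
flagged = [j for j, value in enumerate(values) if value < 0.3]
print("CITC per item:", np.round(values, 2))
print("column indices below 0.3:", flagged)
```

A flagged index is only a candidate for removal; the final decision still depends on whether the construct definition needs that item.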

An excessively high alpha (α > 0.95) also calls for caution. It can be a sign of item redundancy, where items repeat very similar content.

Index	Threshold	Source
Cronbach's alpha	≥ 0.7	Nunnally (1978)
Factor loading	≥ 0.5	Hair et al.
CITC	≥ 0.3

The order of EFA and CFA

As a rule, EFA and CFA should be run on independent samples. If you explore the structure through EFA on one dataset and then run CFA on the same dataset, the CFA is fitted to the same sampling noise that shaped the EFA solution, so overfitting occurs and the validation becomes meaningless. The recommended procedure is to run EFA on pilot data and run CFA on an independent sample of the main-survey data.

If the sample size is sufficient, you can also randomly split the main-survey data into two groups and run EFA on one and CFA on the other.
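A random split of this kind can be sketched as below. The function name `split_for_efa_cfa` and the sample size are hypothetical; fixing the seed keeps the split reproducible so it can be reported in the methods section.

```python
import numpy as np

def split_for_efa_cfa(n_rows, seed=0):
    """Randomly split row indices into two non-overlapping halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    half = n_rows // 2
    return np.sort(order[:half]), np.sort(order[half:])

# Hypothetical main survey with 401 respondents.
efa_rows, cfa_rows = split_for_efa_cfa(401)
print(len(efa_rows), "rows for EFA,", len(cfa_rows), "rows for CFA")
```

The two index arrays can then be used to select rows from the response matrix, for example `responses[efa_rows]` and `responses[cfa_rows]`, where `responses` is your own data array.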

modidoc's pre-simulation stage

modidoc's pre-simulation stage automatically computes the KMO, Bartlett's test, parallel analysis, EFA results, and per-factor Cronbach's alpha when you enter pilot data. It flags items that fall short of the CITC criterion and proposes revision directions, and it compares and summarizes how well the EFA result matches the theoretical model. This process is implemented internally as the C4 pre-simulation engine.

You can get started for free at modidoc.com.

Frequently asked questions

What is the order of EFA and CFA?

Run EFA first to explore the factor structure, then validate with CFA on an independent sample. Running EFA and CFA together on the same data causes overfitting, making the validation results hard to trust. Run EFA on pilot data and CFA on main-survey data, or randomly split the main-survey data into two groups.

What is the Cronbach's alpha threshold?

By Nunnally's (1978) criterion, α ≥ 0.7 is the acceptable threshold. An alpha below 0.7 signals that the items of that factor are inconsistent; look for items with a CITC below 0.3 and consider removing them. Conversely, α > 0.95 should raise suspicion of item redundancy.

What should I do if the KMO value is low?

If the KMO is below 0.6, reconsider the item composition before proceeding with factor analysis. If a particular item shares no variance with the others, that item may be measuring an independent concept, or the item itself may be flawed. The usual procedure is to remove the problem item and then check the KMO again.

How do you decide the number of factors?

Parallel analysis is the most trusted method today. It treats only those factors whose real-data eigenvalue exceeds the eigenvalue generated from random data as meaningful. The Kaiser criterion (eigenvalue ≥ 1) tends to overestimate the number of factors, so avoid using it alone and decide by combining the parallel-analysis result with theoretical rationale and interpretability.

Next step

Once the factor structure is confirmed through EFA and the internal-consistency criterion is met, it is time to rigorously validate the structure with the main-survey data. The next article covers how to run a confirmatory factor analysis (CFA) and interpret the CFI, RMSEA, and SRMR fit indices, and how to assess convergent and discriminant validity with AVE and HTMT.

Previous: What Is Cognitive Interviewing? How to Solve Survey Items That Respondents Read Differently

Next: CFA Complete Guide — CFI, RMSEA, AVE, HTMT Thresholds and Interpretation (in preparation)