Using Student Characteristics to Predict Online Versus Face-to-Face Attrition

Audience Level: 
All
Institutional Level: 
Higher Ed
Strands (Select 1 top-level strand. Then select as many tags within your strand as apply.): 
Abstract: 

This study analyzes data from a large U.S. university system in the northeast (n=255,187). Results suggest that there is no strongly significant difference between online and face-to-face courses in the proportion of students who successfully complete a course with a C- or better. However, these results are sensitive to hidden bias. 

Extended Abstract: 

Research questions

  1. Are rates of successful course completion different for fully online versus face-to-face courses?
  2. To what extent does controlling for specific student or course characteristics, or matching on these characteristics, alter the results?
  3. How sensitive are these results to hidden bias or unmeasured variables?

 

Prior research on online outcomes

Numerous studies have found no significant difference in learning outcomes between face-to-face and online learning (see, e.g., 2). Nonetheless, online course dropout in the U.S. ranges from 20% to 40%, and online course attrition rates are reported as 7-20 percentage points higher than for face-to-face courses (for a review, see 9). There is little research on the effects of online learning on college persistence and completion, and the available results are mixed.4

 

The impact of student characteristics on online enrollment and success has been explored, but with conflicting results (for a review, see 9). The mixed findings are likely due to a key methodological limitation shared by these studies: they are based on single courses rather than on large-scale aggregate samples.

 

Methodology

Data source and sample: This research uses a sample of students who enrolled in online or comparable face-to-face courses at one of the colleges of the City University of New York (CUNY) from 2004 to 2017. The results reported here focus on the fall 2014 sample, with a total sample size of 255,187 students. The matched sample, obtained using propensity score matching, included 25,198 students.

 

Measures: The analysis uses successful course completion, defined as completing a course with a grade of C- or higher, as the primary dependent variable. This measure was selected because it may be more comparable across course mediums (for example, some studies have suggested that online students may drop out at higher rates while face-to-face students may fail at higher rates; e.g., 4); it is also the typical standard for receiving major or transfer credit, so it serves as a measure of degree progression. 

 

The main independent variable, course medium, was dichotomized in these analyses as face-to-face or fully online, based on Sloan Consortium definitions. Prior research suggests that students who take hybrid courses are substantially similar to students who take face-to-face courses and that their outcomes are similar;10 therefore, for the purposes of this analysis, hybrid courses (11.9% of the sample) were grouped with face-to-face courses. 

 

Covariates included: gender; race/ethnicity; age; developmental course placement; native speaker status; college level (two-year, four-year, or graduate); number of credits earned and GPA at the start of the semester; the student’s major; and number of credits/classes taken that semester. Non-linear versions of variables were explored; because both age and number of credits earned appeared to be non-linearly related to course outcomes, squared terms for these variables were included in the analysis. GPA was converted to a categorical variable based on the letter grade corresponding to the GPA value, with D/F grades grouped together and an extra category for students with no GPA. 
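
The variable preparation described above can be sketched as follows. Note that the GPA cutoffs below are illustrative assumptions; the study does not report its exact bucketing thresholds.

```python
import math

def gpa_category(gpa):
    """Bucket a numeric GPA into the letter-grade category used as a
    covariate; D/F grades are grouped, and students with no GPA get
    their own category. Cutoffs here are illustrative assumptions."""
    if gpa is None or math.isnan(gpa):
        return "no GPA"
    if gpa >= 3.7:
        return "A"
    if gpa >= 2.7:
        return "B"
    if gpa >= 1.7:
        return "C"
    return "D/F"

def add_squared_terms(age, credits_earned):
    """Age and credits earned appeared non-linearly related to course
    outcomes, so squared versions enter the model alongside the originals."""
    return {"age": age, "age_sq": age ** 2,
            "credits": credits_earned, "credits_sq": credits_earned ** 2}
```

In the actual analysis these transformed variables simply become additional columns in the covariate set passed to the propensity and outcome models.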

 

Analytical Approaches: Propensity scores were generated by fitting (using melogit in Stata) a multi-level model (with the course taken as the second level) that included, as covariates, all of the dependent and independent variables used in subsequent analyses, to predict the probability that each student in the sample enrolled in a fully online course. A matched dataset was generated from these propensity scores (using the psmatch2 package) with single nearest-neighbor matching with replacement, because this approach yielded good balance on the covariates based on the standardized bias for each variable. The median standardized bias across variables was 1.5%. Based on Rubin’s7 rule of thumb that standardized bias should be below 25% after matching, the matched dataset achieved good balance on all covariates. The distribution of propensity scores was evaluated before and after matching, and the dataset showed good balance and substantial overlap in the region of common support. Histograms of each variable in the matched dataset were also compared to ensure that the distributions of covariates across online and face-to-face students were unbiased. 
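
As a minimal illustration of the matching and balance-checking steps (the study itself used Stata's melogit and psmatch2; the toy data and single covariate below are invented for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the sample: pscore = estimated propensity of fully
# online enrollment; x = one covariate correlated with that propensity.
n = 1000
pscore = rng.uniform(0.05, 0.60, n)
treated = rng.random(n) < pscore          # online enrollment, score-driven
x = 2 * pscore + rng.normal(0, 0.2, n)    # covariate tied to the score

# Single nearest-neighbor matching with replacement on the propensity
# score: pair each treated unit with the closest-scoring control.
controls = np.flatnonzero(~treated)
dist = np.abs(pscore[treated][:, None] - pscore[controls][None, :])
matched = controls[dist.argmin(axis=1)]

def standardized_bias(x_t, x_c):
    """Difference in covariate means over the pooled SD, in percent."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return 100 * (x_t.mean() - x_c.mean()) / pooled_sd

bias_before = standardized_bias(x[treated], x[controls])
bias_after = standardized_bias(x[treated], x[matched])
```

Matching with replacement lets one control serve as the match for several treated units, which typically improves balance (the standardized bias shrinks well below Rubin's 25% threshold) at some cost in precision.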

 

Multilevel mixed-effects logistic regression models (using melogit in Stata) were run on unmatched and matched datasets with course as the second-level factor to control for unobserved heterogeneity between courses.  Sensitivity analysis was conducted using Rosenbaum's method6 and Mantel-Haenszel bounds.5   
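
The core idea behind Rosenbaum's bounds for matched pairs with a binary outcome can be sketched as below. The pair counts in the test are invented for illustration; the study's actual computation, including the Mantel-Haenszel bounds, is more involved.

```python
from math import comb

def rosenbaum_bounds(n_discordant, n_treated_worse, gamma):
    """One-sided McNemar-type p-value bounds under hidden bias Gamma.
    In a discordant matched pair, the probability that the treated
    (online) unit is the one with the worse outcome is bounded between
    1/(1+Gamma) and Gamma/(1+Gamma); each bound yields a binomial tail."""
    def upper_tail(n, k, p):
        return sum(comb(n, j) * p**j * (1 - p)**(n - j)
                   for j in range(k, n + 1))
    lo = upper_tail(n_discordant, n_treated_worse, 1 / (1 + gamma))
    hi = upper_tail(n_discordant, n_treated_worse, gamma / (1 + gamma))
    return lo, hi

# With no hidden bias (Gamma = 1) the two bounds coincide; as Gamma grows,
# the interval widens, and a conclusion significant at alpha = 0.05 may no
# longer hold once the upper bound crosses that threshold.
```

Reporting the smallest Γ at which the bounded p-value crosses α=0.05 is exactly the sensitivity statement made in the results below.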

 

Results

Probability of fully online enrollment, based on student characteristics

The dataset was modeled using a multi-level logistic regression model with the specific course taken as the second-level factor, fully online enrollment as the dependent variable, and all other independent variables as covariates. In this model, students with the following characteristics were significantly more likely to enroll in fully online sections of a course: women; white students; older students; students with an A GPA at the beginning of the semester and first-semester students without a GPA (in comparison with D/F students); students who had accumulated more credits by the beginning of the semester; students who were not placed into developmental mathematics courses; students whose first language is English; and four-year college students.

 

Online versus face-to-face outcomes compared for comparable courses

Comparing rates of successful course completion between students enrolled in fully online versus face-to-face courses using a simple logistic regression with no covariates revealed that students in online courses were slightly but significantly more likely to successfully complete the course. However, after controlling for the specific course taken by running the same model as a multi-level logistic regression with the course as the second level, this relationship reversed: students enrolled in online courses were significantly less likely to successfully complete the course. Re-running the multi-level analysis on the matched dataset, this relationship became much weaker but remained significant. Adding covariates to the multi-level model on the matched dataset reduced this relationship slightly further, and the difference in successful course completion between fully online and face-to-face courses became slightly less significant (p=0.02). Overall, these effect sizes are trivial based on Cohen’s standards as applied to the odds ratios.3
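
One common way to place a logistic-regression odds ratio on Cohen's d scale is the logistic-distribution approximation (Chinn's conversion); whether the study used exactly this formula is an assumption, and the odds ratio below is invented.

```python
import math

def odds_ratio_to_cohens_d(odds_ratio):
    """Approximate Cohen's d from an odds ratio via the logistic
    distribution: d = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

# An odds ratio close to 1 maps to a d far below the 0.2 "small" cutoff,
# i.e. a trivial effect even when statistically significant.
d = odds_ratio_to_cohens_d(0.93)   # hypothetical near-1 odds ratio
```

With samples this large (n=255,187), even trivial effects reach significance, which is why the effect-size benchmark matters more than the p-value here.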

 

The average treatment effect on the treated (ATT) was borderline significant, with the rate of successful course completion 1.4 percentage points lower for fully online courses than for face-to-face courses. Sensitivity analysis using Rosenbaum’s method6 showed that these results would be sensitive (for α=0.05) to an upward hidden bias of Γ = 1.02 and a downward hidden bias of Γ = 1.11. This means that the actual ATT (after accounting for hidden bias) could be negative and significant if unobserved factors significantly decrease the likelihood of successful course completion while increasing the odds of fully online course enrollment by 2%; or the actual ATT could be positive and significant (indicating that fully online courses have significantly higher successful completion rates) if unobserved factors significantly increase the likelihood of successful course completion while increasing the odds of fully online course enrollment by 11%. These results suggest that this model is sensitive to hidden bias, and therefore other factors that might affect both online course enrollment and course outcomes should be explored. 

 

Discussion

 

Limitations: This study shows what patterns can be observed using only the data points readily available to college institutional research offices; models based on these factors have the potential to be of the most use to colleges because they rely on easily obtainable data. However, before such models can be used in practice, further research exploring the relationship between these readily available factors and other, more difficult-to-observe factors is essential. For example, online students are significantly more likely to be student parents or to work full-time, and neither of these variables is included in this study because they require the collection of additional survey and interview data. Thus, this study is just the first step of a larger systematic exploration of student characteristics as predictors of online course outcomes. 

 

While the CUNY system is highly diverse and somewhat generalizable to a wider student population, it is not necessarily nationally representative.  CUNY does not have rural campuses.  Caution should be exercised before extending any results taken from the CUNY dataset to rural college students.  Furthermore, CUNY is significantly more diverse and has a higher proportion of low-income, foreign-born, and first-generation students than the average college in the U.S.  This means that these results may not be generalizable to other less-diverse college populations. 

 

Conclusion

This initial study suggests that when comparing similar students enrolled in the same course in a fully online versus hybrid/face-to-face format, rates of successful course completion are almost identical. Online completion rates are slightly lower; this difference is marginally significant, and the effect size is trivial. However, these results are sensitive to potential sources of unmeasured bias, so future research must explore a wider range of student characteristics before positing any definitive conclusions about the relationship between the online medium and successful course completion. 

 

References

  1. Atkins, S. (2013). Ambient Insight Whitepaper: The 2012 boom in learning technology investment. USnews.Com

  2. Bernard, R. M., Brauer, A., Abrami, P. C., & Surkes, M. (2004). The development of a questionnaire for predicting online learning achievement. Distance Education, 25(1), 31-47. DOI 10.1080/0158791042000212440

  3. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (second ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

  4. Jaggars, S. S. (2011). Online learning: Does it help low-income and underprepared students? (CCRC Working Paper 26). CCRC, Columbia University. Retrieved from http://ccrc.tc.columbia.edu/media/k2/attachments/online-learning-help-students.pdf

  5. Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22(4), 719-748.

  6. Rosenbaum, P. R. (2002). Observational studies. New York: Springer.

  7. Rubin, D. B. (2001). Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation. Health Services & Outcomes Research Methodology, 2(1), 169-188.

  8. Wladis, C. W., Hachey, A. C., & Conway, K. M. (2012). An analysis of the effect of the online environment on STEM student success. Paper presented at the Proceedings of the 15th Annual Conference on Research in Undergraduate Mathematics Education, 2.

  9. Wladis, C. W., Hachey, A. C., & Conway, K. M. (2015). The online STEM classroom – Who succeeds? An exploration of the impact of ethnicity, gender and non-traditional student characteristics in the community college context. Community College Review, 43(2), 142-164.

  10. Xu, D., & Jaggars, S. S. (2011). Online and hybrid course enrollment and performance in Washington state community and technical colleges. CCRC, Columbia University. Retrieved from http://ccrc.tc.columbia.edu/media/k2/attachments/online-hybrid-performance-washington.pdf

 

Session Type: 
Education Session - Research Highlights