Beating two sigma: Combining human and computer instruction

Audience Level: 
All
Institutional Level: 
Higher Ed
Strands (Select 1 top-level strand. Then select as many tags within your strand as apply.): 
Abstract: 

Human tutors and intelligent tutoring systems can be more effective working together than either alone, helping solve Bloom’s “two sigma” problem. We studied over 16,000 students who used an ITS along with online human tutoring and can predict when students use human tutoring and when they most benefit.

Extended Abstract: 

A long-standing goal for Intelligent Tutoring Systems (ITSs) is to solve Bloom’s “two sigma” problem (Bloom, 1984) and create instructional systems that are as effective as a personal human tutor. The wide availability of educational resources through the internet prompts us to reconsider this goal. Students are not facing a choice between receiving all of their instruction through an ITS or through a human tutor; instead, they are choosing, on a minute-by-minute basis, which instructional resource to use: for example, seeking help from the ITS or contacting a human tutor. Instead of asking whether a resource like an ITS can achieve the same results as a personal human tutor, we should ask how to build an instructional environment, including both ITSs and human tutors, that exceeds the results of any of these resources acting alone.

Accomplishing this requires thinking about how students allocate their time among different instructional resources, how they choose between one type of resource and another, and what particular advantages one resource holds over another.

To answer these questions, we have been using an extraordinary dataset, providing insight into student use of both an ITS and human tutors. Our data were collected from two online five-week developmental mathematics courses delivered to 16,905 students between June and December 2014. The courses required students to complete weekly assignments within Carnegie Learning’s Mika ITS. Students were also provided unlimited, free access to Tutor.com (TDC), an online, chat-based human tutoring service.

Within Mika, learners work through complex, multi-step, real-world problems (Figure 1). Students progress through their assignments by demonstrating mastery of each topic assigned to them. This approach was demonstrated to be effective as part of a K-12 Algebra course in one of the largest “gold-standard” experimental trials ever conducted on a mathematics curriculum (Pane et al., 2014).

 

Figure 1: Image from Mika. The student’s task is to complete the worksheet and graph corresponding to the word problem.

 

Tutor.com is the largest provider of online, one-to-one, on-demand tutoring for students. The company began offering online academic tutoring services 16 years ago and now works with clients ranging from public schools, universities, colleges, libraries, and corporations to the U.S. military. Its tutors have completed more than 15 million online tutoring sessions to date.

We linked data from students using Mika to chat logs from TDC to understand which characteristics of students led them to seek human tutoring and which tutoring actions led to the greatest improvement in outcomes. In all, 3,320 learners (19.6% of 16,905 learners) used TDC at least once, for a total of 19,248 TDC sessions lasting an average of about 25 minutes each. To study students who did not use TDC, we randomly sampled Mika clickstream data from 1,874 such students, yielding a dataset of over 88 million student actions (individual answer entries or help requests).
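The linkage itself can be pictured as a time-windowed join between the two logs. The sketch below is only illustrative: the field names, record shapes, and the 30-minute window are assumptions for exposition, not the study’s actual schema or method.

```python
# Illustrative sketch only: joining ITS clickstream records to a tutoring
# session by student ID and time, e.g. to ask what a student was doing in
# Mika just before contacting a TDC tutor. All field names are invented.
from datetime import datetime, timedelta

def actions_before_session(clickstream, session, window_minutes=30):
    """Return one student's Mika actions in the window before a TDC session.

    clickstream: dicts with 'student_id' and 'time' (a datetime).
    session: dict with 'student_id' and 'start' (a datetime).
    """
    window = timedelta(minutes=window_minutes)
    return [
        a for a in clickstream
        if a["student_id"] == session["student_id"]
        and session["start"] - window <= a["time"] < session["start"]
    ]

# Example: only the 9:50 action by student s1 falls in the 30-minute
# window before s1's 10:00 tutoring session.
clicks = [
    {"student_id": "s1", "time": datetime(2014, 7, 1, 9, 50)},
    {"student_id": "s1", "time": datetime(2014, 7, 1, 9, 10)},
    {"student_id": "s2", "time": datetime(2014, 7, 1, 9, 55)},
]
tdc = {"student_id": "s1", "start": datetime(2014, 7, 1, 10, 0)}
```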

Since both cognitive and non-cognitive factors may contribute to students’ decisions to use human tutoring, we used a set of data-based “detectors” for metacognitive behaviors such as gaming the system (Baker & de Carvalho, 2008), off-task behavior (Baker, 2007), and carelessness (San Pedro et al., 2011), as well as detectors for affective states including boredom, confusion, frustration, and engaged concentration (Baker et al., 2012). Such detectors rely on patterns of errors, hint use, pauses, and other features of learner interaction to infer student states. For example, “gaming the system” (attempting to provide correct answers without deeply engaging with the mathematics) may be indicated by rapidly asking for hints or entering distractor numbers that appear in the statement of a math problem, among other patterns.
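To make the idea concrete, a hand-written caricature of such a detector is sketched below. The published detectors are machine-learned models built over many interaction features; the rule set, field names, and thresholds here are invented purely for illustration.

```python
# Illustrative sketch only: a rule-based caricature of a "gaming the
# system" detector. Real detectors (Baker & de Carvalho, 2008) are
# machine-learned; every threshold and field name here is invented.

def is_gaming(actions, distractors, fast_secs=3.0, min_flags=2):
    """Flag a short clip of student actions as possible gaming.

    actions: dicts with 'type' ('hint' or 'attempt'), 'secs_since_prev'
        (pause before the action), and, for attempts, 'entry' (the value
        the student typed).
    distractors: numbers that appear in the problem statement but are
        not part of the answer.
    """
    flags = 0
    for a in actions:
        # Rapid-fire hint requests: asking for the next hint faster
        # than the previous one could plausibly have been read.
        if a["type"] == "hint" and a["secs_since_prev"] < fast_secs:
            flags += 1
        # Entering a distractor number copied from the problem text.
        elif a["type"] == "attempt" and a.get("entry") in distractors:
            flags += 1
    return flags >= min_flags
```

A clip with two rapid hint requests and a distractor entry would be flagged, while a slow, genuine attempt would not; the learned detectors combine many more features than these two.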

Results

The human tutoring offered by TDC is underused by many students and overused by a small subset. Figure 2 shows the distribution of TDC sessions. Over 80% of students never used TDC, but the top 10% of TDC users (2.1% of the population) accounted for almost 55% of total TDC time. The lack of TDC usage by many students may be particularly damaging, as even a single TDC session is associated with better course completion. Students who used TDC at least once were more likely to complete the Mika course material than students who did not (50% vs. 36%). This result holds despite the fact that TDC users appear less well prepared for the course: in the first week, they spend more time working through the material, and they make more errors and ask for more hints in Mika throughout the course (see Figure 3).

Figure 2: Number of TDC sessions by student

Our models provide a coherent picture of who uses TDC and when they are likely to use it. Students who are poorly prepared for the course are more likely to use TDC. TDC users are also less likely to be classified as bored, more likely to be classified as off-task and less likely to make errors after mastering a skill (i.e., they are less likely to be careless). Together, this profile suggests that TDC users, though poorly prepared for the course, want to succeed and are conscientious.

Figure 3: Students who go on to use TDC spend more time working on each topic (left) and make more errors (center), but complete more of the required work in the course (right).

 

Even among TDC users, the use of TDC is infrequent, so it is particularly important to understand the situational factors that lead students to use it. Our models suggest that gaming the system predicts when such users will go to TDC. Although “gaming the system” is typically described as trying to get the correct answer without deeply engaging with the mathematics, a closer look at the detector suggests that this result may instead reflect a shallow learning strategy: entering answers such as distractor numbers from a word problem. One explanation is that these students, though conscientious, resort to shallow strategies to try to complete problems and, when those fail, turn to TDC tutors to help them understand the deep structure of the mathematics. TDC tutors might particularly benefit from knowing the shallow answers a student had entered, and students might benefit from instruction in metacognitive strategies more productive than the shallow ones that trigger this detector.

 

Figure 4: For TDC users, rates of gaming the system (left) and frustration (right) increase in a Mika session prior to using TDC, suggesting that these increases may be factors leading to TDC use.

 

Once a student goes to TDC, what predicts a particularly successful TDC session? Our modeling shows that the key factor in this prediction is frustration (Figure 4). Frustration may result from a combination of lack of success (through the shallow strategies the student is employing) and a genuine desire to learn, perhaps driven by the student’s recognition that the topic they are struggling with is important. Though speculative, this account supports the general picture of these students as conscientious and driven to succeed in the course.

Conclusion

We have identified consistent and reliable models of why students choose to use human tutoring and what conditions lead them to benefit most from it. The misallocation we observed of the valuable instructional resource that human tutoring represents calls for tighter integration between the ITS and the tutors. The models developed here provide the basis for a recommendation system that helps students understand when tutors would be most valuable.

In addition, a teacher-facing recommendation system could convey to the teacher (or tutor) the conditions that led a student to seek human help and recommend the kind of assistance that would best benefit that student.

Although our work has focused on a fully online course with human assistance available online, we believe that our results will generalize to blended learning courses, in which students are using technology in the (physical) presence of a teacher. In such an environment, the same misallocation of the teacher’s time likely takes place. Additional work can help us understand the extent to which our findings apply to in-person instruction and to different populations of students.

Session Type: 
Education Session