PREDICTING STUDENT SUCCESS VIA ONLINE HOMEWORK USAGE

With the amount of data available through an online homework system about students’ study habits, it stands to reason that such systems can be used to identify likely student outcomes. A study was conducted to see how student usage of an online chemistry homework system, Online Web-based Learning (OWL), correlated with student success in a general chemistry course. Online chemistry homework activity was examined for first-year students taking general chemistry at a mid-size, private university. The six chemistry question sets examined were: bond properties; standard molar enthalpy; electronegativity; Lewis dot structures; calorimetry; and stoichiometry. Students’ OWL activity was then correlated with their exam grades and their final course grades. Results showed that higher average time spent per question correlated positively with student success as measured by final grades. However, multiple attempts per question correlated negatively with student success. A multiple linear regression model and other guidelines are presented for instructors’ use to identify chemistry topics where students may need additional instruction to improve their understanding.


Introduction
It is not practical or effective for students to learn chemistry exclusively in the classroom. Studying outside of class is essential to success. So how does one study "effectively"? It has been suggested that altering one's location or changing subjects to keep the mind fresh helps to improve information retention (Carey, 2010; Kornell, Castel, Eich, & Bjork, 2010).
Homework is, for most academic staff, the default method for ensuring that students study outside the classroom. It is not, however, necessarily the most effective method for encouraging students to learn. A study has shown that testing (i.e. exams) is just as important to the learning process as is homework in that tests help students with recall of information at a later time (Roediger & Karpicke, 2006). Furthermore, a review of 45 journal articles regarding research on online homework (some compared directly with paper homework) found that claims of homework effectiveness were frequently specious as the research designs were usually suspect and lacking in validated measurements of effectiveness (Bonham, Deardorff, & Beichner, 2003). Bonham et al. concluded that online homework is no less effective than paper-and-pencil homework and, in some cases, may be more effective. However, in cases where sound pedagogy is not used in designing the questions, online homework may encourage a "plug-and-chug" method where students keep trying answers instead of trying to understand their mistakes at the conceptual level, a method that does not lead to student learning (Kortemeyer, 2006). Spending considerable time on homework assignments has not been shown to correlate with success. Rather, effective study habits, such as concentrating on the content, were necessary for a positive relationship between study time and achievement (Nonis & Hudson, 2010).
Various studies, in multiple different disciplines, have found online homework to be effective (Bonham, Beichner, & Deardorff, 2001; Mestre, Hart, Rath, & Dufresne, 2002). Teachers who developed their own homework system through learning management systems (LMS) found that feedback on homework increased the retention of information (Cole & Todd, 2003; Zerr, 2007). Replacing traditional lectures with online homework and online discussions has also been effective (Riffell & Sibley, 2005).
Recent studies of online homework have focused more on correlating student effort to students' final course grades. In a longitudinal study with a number of mathematics classes, students who earned a grade of greater than 70% on the online homework assignments (equivalent to a C) did significantly better in the course than the students who failed to get at least 70%. The authors supposed this might be due to extra effort but did not actually monitor time on task (Kuhn, Watson, & Walters, 2013). A similar effect was noted in an introductory chemistry course; the more students completed online homework assignments, the better they did as measured both by final course grade and performance on the standardised ACS (American Chemical Society) final exam (Revell, 2014). As with the Kuhn et al. (2013) study, the Revell (2014) study did not report on time spent on task.
Online homework offers unique abilities to investigate student behaviour. In addition to monitoring when students get questions right or wrong, online systems automatically gather information about number of attempts and time spent on the homework, whether or not an instructor chooses to look at those data. Ngai, Poon, and Chan (2007) found time online to be a factor in student success, although interestingly, they chose to measure time online through student reporting rather than as recorded automatically by the online homework system.
Given the possibility to automatically identify students at risk of failing, we proposed the following research question: Can online homework activity predict student success? Specifically, we were interested in whether or not those automatically monitored variables (i.e., number of attempts, total time on a question, and average time per attempt) could be used to (a) determine topics that a majority of students are currently struggling to understand and (b) identify students who need more attention on a given topic.

Student Cohort
The students in this study were enrolled in a mid-sized, private university in Pennsylvania. Participants were enrolled in the first-term general chemistry course in Fall 2009 and 870 students (87%) gave their consent to participate in the study. Student participation in the study was voluntary and it was made clear to the students that their course grade would be unaffected by participation in the study. The students were not made aware of their results while the study was in progress.

Online Homework Setup
The OWL (Online Web-based Learning) homework assignments were set up with the intention of being a self-teaching tool. As such, there were no restrictions on the number of times students could attempt the OWL questions assigned. In addition, OWL provided feedback when students gave incorrect answers. The feedback was created by the authors of OWL, namely Cengage Learning, and was not controlled or specifically studied by the authors. There were also OWL topics available that were not graded although very few students took advantage of these optional opportunities. OWL question types reflect the range of homework questions seen in a typical general chemistry textbook, namely, some conceptual questions with multiple choice answers and a variety of calculations of varying difficulty and varying numbers of problem steps. The OWL grade, which was ten per cent of the final course grade, was assigned based on students' successful completion of the assigned questions regardless of the number of attempts it took to complete. Despite the fact that students had unlimited attempts to answer the questions, not all students achieved a perfect score (100) on OWL; in fact, only 51% of the students achieved a perfect OWL grade. Twenty-eight per cent gained a grade below 100 and greater than 85 while the remaining 21% received an OWL grade equal to or less than 85.

Topic Selection and Student Classification
The setup of the OWL online homework system provided a way to study student effort on individual chemistry questions. In order to study questions with varying levels of difficulty, six chemistry topics were chosen based on overall student performance for that topic on the exams:
1. two topics in which students performed well on both the in-term and final exam (electronegativity and Lewis dot structures);
2. two topics in which students showed improvement from the in-term to the final exam (calorimetry and stoichiometry); and
3. two topics in which students performed poorly on both the in-term and final exam (standard molar enthalpy and bond properties).
The selection of the individual chemistry topics was done without any prior knowledge of the student behaviour on the related homework questions. It is also important to note that the selection of the topics was based on overall class averages for each question and not on individual student performances. Topics in which students performed well on the in-term exams and poorly on the final exam were not given a category because this behaviour was only observed for individual students, not for the class.
For each chemistry topic, students were sorted into one of four categories: (i) wrong on both exams (WW); (ii) right on the in-term and wrong on the final exam (RW); (iii) wrong on the in-term and right on the final exam (WR); and (iv) right on both exams (RR). The four categories were then given pseudocodes of 1, 2, 3, and 4 respectively. Through analysis discussed later, it was determined that these pseudocodes also represented student success on the topic, with a higher number indicating better performance.
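The four-way sorting described above can be sketched in a few lines of Python. This is only an illustration of the coding scheme; the function and dictionary names are hypothetical and are not taken from the study.

```python
# Hypothetical helper illustrating the WW/RW/WR/RR coding described in the text.
# The codes 1-4 follow the order given in the text; the names are illustrative.
CATEGORY_CODES = {"WW": 1, "RW": 2, "WR": 3, "RR": 4}


def classify(in_term_correct: bool, final_correct: bool) -> tuple:
    """Return (label, code) for one student on one topic.

    The first letter reflects the in-term exam and the second the final exam.
    """
    label = ("R" if in_term_correct else "W") + ("R" if final_correct else "W")
    return label, CATEGORY_CODES[label]


# Example: wrong on the in-term exam, right on the final exam.
example_label, example_code = classify(False, True)
print(example_label, example_code)
```

A higher code corresponds to the ordering of categories that the text later associates with better performance on the topic.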

Statistical Analyses
All statistical analyses were conducted using SPSS 20 (IBM, 2011). Comparison of means was performed using Welch's F, a variant of ANOVA that does not assume equal variances across groups.
Correlations and partial correlations were calculated using Pearson's r. Bootstrapping, a resampling method for determining confidence intervals and reducing sample bias and reliance on parametric assumptions, was performed on each correlation to alleviate concerns about non-normal data (Preacher & Hayes, 2008; Singh & Xie, 2010). For each correlation, 95% bias-corrected and accelerated confidence intervals are reported (in brackets). Multiple linear regression was also performed with bootstrapping. All bootstrapping procedures used 1000 samples.
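To make the resampling idea concrete, the following is a minimal sketch of a bootstrapped confidence interval for Pearson's r. It is not the SPSS procedure used in the study: it computes a simple percentile interval rather than the bias-corrected and accelerated interval reported here, and the function names and the fixed random seed are assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_samples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for r (a simplification, not BCa)."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < n_samples:
        # Resample student pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples in which a variable is constant.
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        stats.append(pearson_r(xs, ys))
    stats.sort()
    lower_index = max(round((1 - level) / 2 * n_samples), 0)
    upper_index = min(round((1 + level) / 2 * n_samples) - 1, n_samples - 1)
    return stats[lower_index], stats[upper_index]
```

With 1000 resamples and a 95% level, the interval bounds are taken near the 2.5th and 97.5th percentiles of the resampled correlations.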

Total Time
In order to determine whether online homework usage data could be used to predict student performance, the three types of topics were first compared to see if students behaved differently. That is, were there significant differences in online homework usage between questions related to topics on which students performed well during the term, questions related to topics on which students improved over the term, and questions related to topics on which they performed poorly during the term?
A comparison of means showed that student usage of OWL varied greatly between the three groups of questions. The total time students spent on each of the categories of questions was significantly different (Welch's F(2, 2881.2) = 88.645, p < .001); post-hoc analysis confirmed that each of the three categories was significantly different from the others. In all, students spent an average of 8.4 minutes on the two topics in which they performed well, 11.7 minutes on the two topics in which they performed poorly, and 14.7 minutes on the two topics in which they improved their performance over the term.
These differences can be attributed, in part, to the different chemistry topics that were being tested. That is, calorimetry or stoichiometry questions (where students improved performance during the term) were likely to take longer to answer than electronegativity or Lewis dot structure questions (where students performed well all term) due to the mathematical requirements. However, each category is a combination of two different types of chemistry questions, each with its own solution time. The observation also reinforces the idea that more time spent studying appears to correlate with improved student performance, but this correlation is tempered by the fact that the subjects at which students performed best had the lowest total time.
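For readers unfamiliar with Welch's F, the sketch below computes the statistic and its degrees of freedom from raw group data using the standard Welch formulas. It is an illustration only, not the SPSS routine used in the study, and the function name is hypothetical. The fractional second degrees of freedom it produces is the same feature visible in the reported value of 2881.2.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way test on a list of samples."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n / s^2, so noisy groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_weight = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_weight
    between = sum(w * (m - grand_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_weight) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2
```

Calling `welch_anova` with three lists of per-student total times (one list per topic category) would return the statistic together with both degrees of freedom; obtaining a p-value additionally requires the F distribution, which is omitted here.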

Number of Attempts
Topics in which students performed poorly all term showed a significantly greater number of attempts per OWL question than either topics in which students improved or performed well all term. The difference between the three groups was found to be significant at greater than the 99.9% level (Welch's F(2, 3119.1) = 32.720, p < .001) but post-hoc analysis only showed differences between the low performance group and the other two. There was no significant difference between high performance and great improvement. Questions on topics where students performed poorly over the entire term had an average of 7.1 attempts per student whereas the average number of attempts for questions where students did well was approximately 5.5 attempts per student.
The result would seem counterintuitive if total time studying is the only driving force behind student success. However, it has been observed by various instructors at the institution that some students quickly repeat questions to see if they can get the right answer by random guessing rather than by their own chemistry skill. This does appear to mimic, in a microcosm, the seemingly contradictory results reported in various studies, as reviewed by Nonis and Hudson (2010), where time spent on homework correlated positively, negatively, or showed no correlation with student success depending on other study habits. A study using student journals to track homework time found no correlation between student study time and student success (Schmidt, 1983). However, another study found that first-year college students saw increased grades from increased study time (Michaels & Miethe, 1989) and another study found that students with less free time for studying performed better (as measured by Grade Point Average (GPA)) than students with more free time to study (Ackerman & Gross, 2003).

Average Time per Attempt
A comparison of average time per question attempt shows a difference from that already observed. Students spent roughly 1.6 minutes per attempt at either the questions related to topics on which they performed well all term or the questions related to topics on which they performed poorly all term, but 2.8 minutes per attempt on questions related to topics where they showed significant improvement over the term, nearly double the amount of time. The difference between groups was more significant than either of the other above comparisons (Welch's F(2, 3016.6) = 108.075, p < .001) and post-hoc analysis confirmed that the significant difference was between the topics that showed improvement and the other two categories only (no difference between consistent high and low performance). These results, combined with the above results, suggest that spending more time on a question, rather than just repeatedly trying it (plug-and-chug), is a much more effective studying strategy. The increased focus may give students time to go beyond just trying to find a solution, instead tackling the conceptual aspects of the question. What cannot be known from the current analysis is how, exactly, their time is being used. Students may be accessing other online materials, textbooks, or discussing the problems with their peers. These actions would likely increase a student's likelihood of success. Additionally, students may find themselves becoming distracted while online, which would increase the apparent time spent on the problem, though it would not necessarily alter their final success. The balance appears to favour more time being time well spent.
If concentrating on a question helps to improve understanding, as suggested by these data, it also explains why students accessed the low performance questions more often than any other question. For those topics where students did not improve their initial poor performance, it may be that students lack concentration: they do not spend enough time to engage with the concept behind the question and, instead, repeatedly try the question until their solution is accepted by the online system. This may also reflect a lack of metacognition; that is, the students may not be spending enough time thinking about why their responses were wrong or how to improve their approach to the problem.
Again, the differences between the topics may be due, in part, to the inherent length of calculation or degree of difficulty of the topics. However, such an explanation does not account for the differences seen in the number of attempts per question, which is time-independent. Furthermore, the idea that concentrating on homework is an effective studying strategy has support in the current literature. For example, Nonis and Hudson's (2010) study looked at 23-year-old students attending a business school in the Southern USA and issued the students with surveys to assess their study habits. The authors correlated student success, as measured by GPA, with student responses to the survey, while controlling for the ability to concentrate, as measured by the total time reported studying divided by their number of study sessions (their criteria). Greater concentration (i.e. longer average study sessions) led to higher GPAs.
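The three usage measures compared in the preceding sections (number of attempts, total time, and average time per attempt) can be derived from a raw attempt log. The sketch below assumes a hypothetical log format of one (student, question, minutes) record per attempt; OWL's actual export format is not described in this paper, so the record layout and all names here are assumptions.

```python
from collections import defaultdict


def usage_metrics(attempt_log):
    """Aggregate per-attempt records into the three usage measures.

    attempt_log: iterable of (student_id, question_id, minutes) tuples,
    one tuple per attempt (a hypothetical format chosen for illustration).
    """
    totals = defaultdict(lambda: [0, 0.0])  # [attempt count, total minutes]
    for student, question, minutes in attempt_log:
        entry = totals[(student, question)]
        entry[0] += 1
        entry[1] += minutes
    return {
        key: {
            "attempts": count,
            "total_time": total,
            "avg_time_per_attempt": total / count,
        }
        for key, (count, total) in totals.items()
    }
```

Because average time per attempt is total time divided by attempts, a student can accumulate a large total time either through a few long attempts or through many short ones, which is why the three measures are examined separately.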

Predicting Student Success as a Class
Efficient study time (i.e. higher average time per question attempt) appears to be more effective than quantity of study time (i.e. number of attempts). Using the above three measurements of online activity, it may be possible to identify which chemistry topics are proving difficult for students, as outlined in Table 1. For example, if students are spending little time per attempt (i.e. short average time) and making few attempts, they are likely grasping the topic and will perform well all term. If, however, they are spending little time per attempt and making many attempts on the same topic, they are apparently failing to grasp the concept and will need additional assistance to improve their performance by the end of the term.
Given that the matrix in Table 1 was derived from a composite of the entire class, it would be best applied to a class as a whole rather than to individual students.
When combined with exam results, this could be an effective means of identifying chemistry concepts that need more attention in class.
Note to Table 1. These values were taken from the specific OWL problems used in this study and are only general guidelines; the relative relationships are more important. The exact values can vary from problem to problem, and teachers are encouraged to look for outliers, i.e. much longer than expected averages, higher than average expected attempts, etc.
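The class-level guideline described in this section can be expressed as a simple lookup. The sketch below is an assumption-laden illustration: the cut-off values (2.0 minutes per attempt and 6 attempts) are hypothetical placeholders chosen to sit between the class averages quoted in the text, not the values from Table 1, and the combination of long time per attempt with many attempts is not characterised in the text and is therefore flagged rather than classified.

```python
def topic_profile(avg_minutes_per_attempt, avg_attempts,
                  time_cutoff=2.0, attempts_cutoff=6.0):
    """Classify a topic from class-wide averages (hypothetical cut-offs)."""
    short_time = avg_minutes_per_attempt < time_cutoff
    few_attempts = avg_attempts < attempts_cutoff
    if short_time and few_attempts:
        return "likely grasped: expect good performance all term"
    if short_time and not few_attempts:
        return "likely struggling: additional instruction suggested"
    if not short_time and few_attempts:
        return "likely improving over the term"
    # Long time per attempt with many attempts is not described in the text.
    return "not covered by the guideline: inspect manually"


# Class averages of the kind reported in the text for poorly performed topics.
print(topic_profile(1.6, 7.1))
```

As the note to Table 1 stresses, the relative relationships matter more than exact values, so any cut-offs should be recalibrated for the problems actually assigned.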

Analysis of Question Type within Topics
Textual analysis of the exam questions for the different topics revealed some surprising correlations. The topics in which students performed well during the entire term shared a common trait: both topics (electronegativity and Lewis dot structures) were discussed only on a surface level, rather than in depth. For both topics, once students learned to rely on the periodic table to yield most of the answers (either by counting the number of electrons donated by an atom or by looking at an atom's position in a period trend), there was little challenge. The multiple-choice format of the exam questions reinforced the need for only surface-level understanding of these topics, because the students were given a set of possible answers rather than being required to generate answers without assistance. For example, testing a student's grasp of molecular structures in a multiple choice format does not allow them to incorrectly draw a molecule, which could expose a weakness in their understanding. A surface-level understanding of both of these topics is usually enough to recognise the distractors in multiple choice format assessment.
The topics in which students improved during the term, calorimetry and stoichiometry, incorporated a weakness for many students: word problems. These questions contain considerable information in the question statement and students must learn to mine the question for the relevant details and numbers. To continue, students must then synthesise the method for completing the question. With stoichiometry, there are no pre-solved formulas that will yield a quick solution. Learning to do this effectively takes time but once a student grasps the concept, they usually can do it reliably for the rest of the term. Further, because students see stoichiometry questions in a number of different situations during the term, they see the question from different perspectives which gives them more opportunity to improve. Although there are some pre-solved formulas that students can use to solve calorimetry questions, these questions are generally more involved than stoichiometry problems, and navigating the problems successfully is also time-consuming.
The topics in which students did not improve over the term, standard molar enthalpy and bond properties, share a feature with calorimetry and stoichiometry questions: word question confusion. Unlike the above topics, however, students did not improve over the course of the term. These topics have two significant differences from calorimetry and stoichiometry that most likely proved difficult for students. The first is that enthalpy and bond properties questions are more conceptual in nature than calorimetry and stoichiometry. The second is that both have counterintuitive concepts that many students struggle with: short bonds mean strong bonds, and negative enthalpy is heat released rather than consumed. The sign convention in enthalpy is especially confusing because it is arbitrary and may be confused with the sign convention used in physics. That is, in chemistry, heat transfer out of a system is considered as negative while in physics, heat flow is designated as positive. It may have been these features that prevented students from improving over the term and led to repeated attempts on OWL.

Correlations with Course Grade
Another aspect of predicting student performance is predicting overall student grades. The previous analyses focused on predicting student success within a given topic. However, when online homework usage was examined for all students, correlations were found between a student's study habits and their final course grade (Table 2). Consistent with the observation that concentration helped students improve their performance on certain chemistry topics, a significant positive correlation was observed between average time spent on OWL questions and final course grade (r = .118, p < .001). Similarly, a weak positive correlation was shown between total time spent on OWL questions and final course grade, but the result had a lower effect size and was not significant. On the other hand, there was a significant negative correlation between the number of times questions were accessed and final grade (r = -.147, p < .001). This correlation is consistent with the observation that students accessed the OWL questions on topics on which they performed poorly more often than other questions. If the number of attempts is controlled for (i.e. partial correlations run), the correlations for both total time and average time become essentially equal. Their r-value is about .090, with a 95% confidence interval of .019 to .160. So, with the number of accesses being equal, more time spent equals higher success. If, however, the total time is controlled for, the remaining correlations become stronger (Table 3). With time spent on a problem being equal, fewer attempts correlated with higher success. This matches with instructor observations of students repeatedly submitting answers to try to answer the question by random guessing or to wait until the system gave them the same problem again. As observed above, it appears that fewer attempts per question and longer time spent per attempt are associated with student success. This suggests efficient study time is more important than just more study time.
According to the correlations in Tables 2 and 3, it is likely that if a student repeatedly attempts questions, that student's grades will suffer as a result of lack of focus. This is likely only true for their initial attempt at the assignment; if a student returns to the assignment later to attempt more problems on the topic, this extra studying would not show lack of focus nor would it likely lead to poorer grades. Likewise, if a student regularly focuses on homework questions, they are more likely to be studying effectively. As a result, the student would be learning more and achieve a higher grade at the end of the term. Measuring the time between attempts would give insight into this possible outcome; such data were not collected in this study, but will be collected in a follow-up study. Each of these correlations accounts for about 3% of the variance in final grades, suggesting that concentrating on solving a question, rather than repeatedly attempting a question, results in increased learning.
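Controlling for a third variable, as done in the partial correlations discussed in this section, follows the standard first-order partial correlation formula. The sketch below shows that formula only; the function name is hypothetical, and the numbers in the usage line are illustrative inputs, since the pairwise correlations among the usage variables themselves are not reported in the text.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    r_xy: correlation between the two variables of interest
    r_xz, r_yz: correlations of each variable with the controlled variable
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Illustrative inputs only (not values from this study).
print(round(partial_corr(0.5, 0.5, 0.5), 3))
```

When the controlled variable is uncorrelated with both of the others, the partial correlation reduces to the ordinary correlation, which is a quick sanity check on any implementation.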

Modelling Student Success
When students were compared based on their performance on any individual exam topic, one clear pattern for final course grades was observed. The final course grade increased for each group in the following order:
1. WW (wrong on in-term exam, wrong on final exam)
2. RW (right on in-term exam, wrong on final exam)
3. WR (wrong on in-term exam, right on final exam)
4. RR (right on both the in-term and final exams)
The exact difference between the four groups varied, but this pattern persisted in all six of the chosen chemistry topics. Analysis of variance showed significant differences between final grades for each topic. In most cases, post-hoc analysis showed the WW and RR groups to be significantly different from all other groups; RW and WR were usually not significantly different from each other, though for calorimetry and stoichiometry their scores were significantly different. Students who got the topic question right during the in-term exam but wrong on the final exam consistently scored lower than students who got the topic question right on the final exam. Though usually not a significant difference, this was a consistent pattern that suggested an ordering of the four groups as follows: WW, RW, WR, and RR. This order was then assigned values of 1 through 4, respectively, for use as a dependent variable in multiple regression. The results from one of the models generated can be seen in Table 4.
The previous analyses show that online homework activity is correlated to student success in the classroom. It is possible, then, that online activity may be used to predict student success. To test this, a series of multiple linear regressions was run; the dependent variables (i.e. what is being predicted) chosen were final course grade and exam performance on individual topics as described in the preceding paragraph. These models looked at individual performance rather than averages for the entire student cohort. Therefore, the models show the extent to which a student's online activity can predict that student's success.
Each model created shows that the most significant predictor of student success is the number of attempts on a question. As shown previously, there is a negative relationship between number of attempts and success; the higher the number of attempts, the less successful a student is likely to be. This correlation is related, in part, to the fact that students in this study were allowed unlimited attempts per question. This correlation may change if a penalty is added for incorrect attempts. However, average time per attempt has a much smaller influence and, depending on the model, may or may not be significant. It is, however, still a positive influence: higher average time leads to success. Changing the nature of the number of attempts may also influence this variable. Total time spent on a question is not significant in any of the models.
The contribution of average time per attempt may be smaller in these models because the number of attempts accounts for most of the variance that can be predicted with these variables. Average time also may not work well for short questions; if the time required to complete the question is too short, a statistical model may not be able to differentiate between students. The dependent variable is also limited in its ability to differentiate students. A more robust measure of success, with more than four categories, may aid in creating a better, more complete model.
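As an illustration of the kind of model discussed in this section, the sketch below fits an ordinary least squares regression of a success code on usage predictors by solving the normal equations. It is a simplified stand-in, not the SPSS analysis: it omits bootstrapping and significance testing, treats the 1 to 4 success code as a plain number, and the function name is hypothetical.

```python
def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + b2*x2 + ... by ordinary least squares.

    rows: list of predictor tuples, e.g. (attempts, avg_time_per_attempt)
    y:    list of outcomes, e.g. the 1-4 success code
    Returns [b0, b1, b2, ...].
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Augmented normal equations: (X'X | X'y).
    aug = [
        [sum(r[a] * r[b] for r in design) for b in range(p)]
        + [sum(r[a] * yi for r, yi in zip(design, y))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]
```

A negative fitted coefficient on the attempts predictor and a positive one on average time per attempt would match the direction of the relationships described in the text; the size and significance of such coefficients would still need the bootstrapped analysis reported in Table 4.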

Conclusions
It appears that online homework activity can be used to predict student performance as suggested by the observed homework behaviours and correlations to final course grades. It appears that concentrating on a question, rather than repeatedly attempting a question, is more effective for learning. For a teacher looking to determine which topics their students are struggling to understand, data such as those presented in Table 1 can give an idea of which topics are proving difficult for the majority of students. One could potentially use this to identify subjects that are causing difficulty for the class or to find students who are struggling (e.g. find the outliers with high attempts and/or low average time per attempt on a given subject) and implement appropriate interventions to correct that problem. Instructors can use the information to improve content delivery and address common misconceptions of topics identified as difficult.
Average time per attempt on online homework correlated positively with a student's final course grade, which suggests that focusing on individual questions is an effective studying technique. This was found to be true while controlling for either number of attempts or total time spent on homework. On the other hand, the total number of attempts on an online homework question was negatively correlated with final course grades. This may have been due to several reasons: a lack of ability to find the solutions, students who understand the concept would not need as many attempts on that subject, and/or unlimited attempts without penalty encouraged multiple attempts. Total time spent on online homework was not significantly correlated to student success.
In creating a model to predict student success using only online homework activity, the number of attempts on a topic was the largest contributor to predicting success with a significant negative correlation. Neither average time nor total time was a large contributor to predicting success, though average time was close and may, in future studies, become a factor.
In order to enhance students' success with online homework, online homework systems could be altered to require an increasing amount of time between resubmitting an answer for the same problem (i.e. each subsequent submission on the same problem would require a little more time). This should encourage students to think more about the problem before trying again without penalising early mistakes such as getting the formatting wrong. Thinking more about why students are not succeeding and analysing why (metacognition) will help students to learn more effectively (Chen, Xie, Ge, & Kauffman, 2008). A cap on the number of attempts could also deliver a similar message to students: think about your answers before you submit. Determining the correct interval between attempts or the proper cap will vary depending on the question type and will require additional research. A cap on submissions or increasing time between submission would be less likely to be useful if the homework system is already set up to deduct points for incorrect submissions because such a deterrent is already likely to force students to think about their answers before answer submission.
Online homework systems, as discussed, provide tools for instructors to analyse students' performance with the online homework. These tools are not, to our knowledge, made available to the students. Although we are not advocating allowing students to be able to view individual data from their classmates, these systems can and should create a report for each student to show them how they are doing and how they compare to averages for the question. The averages could be class averages, college averages collected over multiple courses, or even national or international averages, assuming such data are collected. Such a report should let students know how they are doing on different types of questions (e.g. surface-level questions, word questions, and conceptual questions). Ideally, these reports would contain some information about student success in comparison to the various averages and recommend which topics students need to give more of their attention. This should help students analyse their own study habits, which would allow them to adapt and improve, perhaps with guidance provided by the system.
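One way the proposed increasing delay between resubmissions might look is sketched below. Every parameter here is a hypothetical placeholder: the 30-second step, the 5-minute ceiling, and the choice to leave the first retry free (so that a formatting slip is not penalised) are assumptions for illustration, and the text itself notes that suitable intervals would need further research.

```python
def required_wait_seconds(previous_attempts, step=30.0, cap=300.0):
    """Minimum wait before the next submission on the same problem.

    previous_attempts: submissions already made on this problem.
    The first submission and the first retry carry no wait; each later
    retry adds `step` seconds, up to `cap` (all values hypothetical).
    """
    if previous_attempts < 0:
        raise ValueError("previous_attempts must be non-negative")
    return min(step * max(previous_attempts - 1, 0), cap)


def may_submit(previous_attempts, seconds_since_last, step=30.0, cap=300.0):
    """True if enough time has passed since the last submission."""
    return seconds_since_last >= required_wait_seconds(previous_attempts, step, cap)
```

A linear schedule with a ceiling is only one option; a system could equally use a fixed cap on attempts, as the text also suggests, or vary the schedule by question type.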

Future research
In the future, more work will be done to establish a better regression model. The measurement of student performance on individual topics will need to be improved beyond a 4-point scale. To generalise the model, subjects other than chemistry should be tested. The resulting model should be tested for its success at predicting student success. Interventions, too, should be tested for those students identified as needing assistance, based on the observations in this paper. Other variables of interest to the model may include determining how quickly students were repeating questions and determining what effect procrastination has on student success. Learning why students failed to complete the online homework assignment, regardless of the unlimited number of attempts and lack of penalty for wrong answers, would also provide valuable information.
A potentially powerful extension of this work would be to couple the identification of an academically at-risk student with interventions already proven to improve student outcomes (e.g. Lizzio & Wilson, 2013). Identification of at-risk students through online homework systems means students may be identified sooner than results from exams, which would provide extra time for such interventions to be effective.

Table 1
Summary of online homework usage profiles for determining likely student success on specific chemistry topics

Table 2
Correlations between OWL Usage and Students' Final Fall (CHEM 101) Grades (N = 842)
Notes to Table 2. 1. Bias corrected and accelerated bootstrap 95% confidence intervals are reported in brackets. 2. Significance level (two tailed) showing the chance that this result was a random occurrence.

Table 3
Partial Correlations between OWL Usage and Students' Final Fall (CHEM 101) Grades (N = 841); total time accessing OWL factored out
Notes to Table 3. 1. Bias corrected and accelerated bootstrap 95% confidence intervals are reported in brackets. 2. Significance level (two tailed) showing the chance that this result was a random occurrence.

Table 4
Coefficients and associated errors for linear model for predicting student success on stoichiometry