ASSESSING THE SCIENCE KNOWLEDGE OF UNIVERSITY STUDENTS: PERILS, PITFALLS AND POSSIBILITIES

Science content knowledge is internationally regarded as a fundamentally important learning outcome for graduates of bachelor level science degrees: the Science Threshold Learning Outcomes (TLOs) recently adopted in Australia as a nationally agreed framework include “Science Knowledge” as TLO 2. Science knowledge is commonly assessed using traditional examinations, tests and/or quizzes, but such forms of assessment can be problematic. A key contributing issue is the emphasis on “content” in many science curricula. Frequently, a focus on transmission of knowledge is mirrored in an over-reliance on traditional ways of teaching and an overdependence upon summative assessment: students respond by relying on rote learning at the expense of developing a deep understanding of science concepts. The challenge is, therefore, to design teaching approaches that foster active learning, and, critically, to match these with rigorously designed and meaningful assessment tasks that support higher level learning of science knowledge.


Introduction and Background
Science content knowledge is regarded as a core learning outcome for graduates of bachelor level science degrees, but meaningful assessment of science knowledge is still problematic, despite the wealth of available literature. There are two major areas of concern. The first is the degree to which "knowledge" is privileged as a major component of the typical undergraduate science curriculum; the second is the sometimes uncritical predominance of traditional forms of assessment as class sizes and, therefore, teaching academics' workloads continue to increase. This paper presents an overview of why a more effective and rigorous approach to assessment of science knowledge is mandatory and discusses some of the challenges and possibilities inherent in the design of appropriate assessment tasks. As Hanauer and Bauerle (2012) pointed out, assessment reform is the key to facilitating innovation in science education.

Articulating graduate learning outcomes
In recent years, there has been an international thrust towards overt documentation of expected standards for graduates in specified fields of higher education, coupled with expectations that such learning can be evidenced, for external peer review and/or for formal quality assurance purposes. In the United Kingdom, for example, the Quality Assurance Agency for Higher Education (QAA) publishes Subject Benchmark Statements which describe the nature and characteristics of a degree in a given subject area and set out expectations of standards for the abilities and capabilities that graduates will have acquired. However, the Subject Benchmark Statements provide non-prescriptive guidance as to how programs should be delivered to ensure that graduates meet those expectations. Similarly, Tuning, a movement that began in Europe but has since been taken up in the United States of America, Latin America, Africa and Russia, is a process that aims to articulate what a student knows and is able to do in a given discipline at the point of graduation. Tuning documentation provides frameworks that establish clear learning expectations for students in given subject areas. Such documents can be used both as the basis for curriculum design and for quality assurance purposes.
In Australia, there has recently been considerable impetus towards defining and describing threshold (i.e. pass level) learning outcomes for graduates in particular discipline areas. The Australian Learning and Teaching Council (ALTC)-sponsored Learning and Teaching Academic Standards (LTAS) Project and later spin-off projects funded by the Office for Learning and Teaching (OLT) have provided a suite of nationally agreed threshold learning outcomes (TLOs) for graduates of a range of disciplines and at a range of educational levels. The TLOs for a particular discipline provide important insights into the qualities, attributes and skills that are considered germane to a tertiary level qualification in that discipline in the current environment.

The place of "knowledge" in an undergraduate science curriculum
In an era when it has become fashionable to dismiss the acquisition of knowledge as being of lesser importance than the acquisition of skills, the nationally endorsed Threshold Learning Outcomes (TLOs) for bachelor-level degrees in Science (Jones, Yates & Kelder, 2011) demonstrate that content knowledge still holds a central place in the core science curriculum. Science Threshold Learning Outcome 2 is Scientific Knowledge (Jones et al., 2011, p. 11), and is framed as follows:
Upon completion of a bachelor degree in science, graduates will:
2. Exhibit depth and breadth of scientific knowledge by:
2.1 demonstrating well-developed knowledge in at least one disciplinary area
2.2 demonstrating knowledge in at least one other disciplinary area.
The Good Practice Guide for Science Threshold Learning Outcome 2 (Jones, 2013) provided some discussion of how the phrase "demonstrating well-developed knowledge in at least one disciplinary area" should be interpreted. Jones (2013) suggested that students should acquire a body of knowledge that reflects "a coherent knowledge and understanding of the core principles and concepts of that disciplinary area" (p. 4) but noted that the flexibility of science bachelor degree programmes, both within and between institutions, means that it is not usually appropriate to mandate particular areas of knowledge at graduate level.
It is difficult to find a direct international comparison for the Australian Science TLOs, which were framed so as to apply across all bachelor degrees in science. Other national or internationally agreed frameworks such as the QAA Benchmark Statements or the European Tuning Reference Points for the design and delivery of degree programmes do not provide over-arching documents for science but, instead, focus at the level of specific disciplines or subject areas. However, some useful comparisons can still be made. For example, the Subject Benchmark Statement for Biosciences (2007), while acknowledging the vast diversity of degrees under this disciplinary banner, stated that the subject knowledge common to all biosciences degree programs will include "engagement with the essential facts, major concepts, principles and theories associated with the chosen discipline" (p. 12), and that the "teaching and learning strategy should be designed to encourage a progressive acquisition of subject knowledge" (p. 16).
Similarly, the QAA Subject Benchmark Statement for Chemistry (2007) included amongst the Chemistry-related cognitive abilities and skills that Chemistry graduates will demonstrate "the ability to demonstrate knowledge and understanding of essential facts, concepts, principles and theories" (p. 11). The benchmark standards articulated in this document require that "a basic knowledge and understanding of the content covered in the course is evident" (p. 13) in all pass-level graduates. As another example, the Tuning Reference Point for Earth Sciences (2009) requires that bachelor level (first cycle) graduates have acquired the key competency: "a broad knowledge and understanding of the essential features, processes, history and materials of System Earth" (p. 19). An examination of Tuning Reference Points documents for other disciplines within science shows that they include similar statements.
At the more general level, in a recent call to arms regarding current cultural norms around science education, Anderson et al. (2011) contended that a university-level science education should ensure that students acquire "broad content knowledge" (p. 152), as well as developing analytical skills and an understanding of the scientific research process, inspiring curiosity and preparing students for lifelong learning. There is, therefore, an internationally held consensus view that a tertiary-level science education must include the acquisition of discipline-specific knowledge.

Assessment of science knowledge
If we acknowledge that content is a core learning outcome of a university science degree, then effective strategies for meaningful assessment of students' science knowledge are required. This is particularly relevant as Australia moves towards full implementation of a national standards framework as the regulatory instrument for the quality assurance of degree programmes (Krause, Barrie & Scott, 2012). As universities are increasingly called upon to monitor and assure academic standards, the standard of assessment of learner outcomes must continue to be improved (Coates, 2012). From the educational perspective, Boud and Associates (2010) stressed that assessment is central to curriculum design because it frames how and what students learn. In particular, they proposed that assessment is most effective when it is designed to focus students on learning, when it is recognised as a learning activity that requires students to engage in appropriate tasks, and when students benefit from useful and informative feedback. How is this premise currently applicable to the assessment of science knowledge, and are there significant issues that need to be addressed?
The "content-heavy" science curriculum
The prominence that content per se is traditionally given in an undergraduate science curriculum is increasingly recognised as being problematic. Science courses are frequently criticised for being content-heavy. Worryingly, science curricula in general have remained typically static, with the emphasis being on content itself rather than students' ability to apply that content knowledge (Matthews & Hodgson, 2011; Stokstad, 2001). The sheer pace of discovery in modern science means that students are frequently overwhelmed and discouraged by the volume of content in their undergraduate courses while teachers struggle to incorporate new material within already crowded curricula (Hoskins & Stevens, 2009). Indeed, a survey of life sciences faculty members showed that they believed that the (perceived) need to cover content militated against the teaching of science process skills, despite their rating the acquisition of such skills as being very important for their students (Coil, Wenderoth, Cunningham, & Dirks, 2010).
This continued focus on transmission of knowledge is mirrored in an over-reliance on traditional ways of teaching and an over-dependence upon summative assessment. Furthermore, most textbooks, particularly those for early year levels, present scientific information from a "step-by-step accumulation of knowledge" viewpoint (ignoring the fact that scientific progress is not necessarily linear) (Hoskins, 2008, p. A40), and therefore foster a rote-learning approach. This is despite compelling evidence that encouraging undergraduate students to be actively engaged in their own learning results in higher levels of understanding and knowledge retention than traditional lectures and laboratory classes (DeHaan, 2005; Stokstad, 2001).

Approaches to assessment of science knowledge
How is science knowledge usually assessed? James (2003), speaking about the Australian higher education system in general, noted that there was a strong emphasis on summative assessment using final examinations coupled with a tendency to over-assess in an attempt to cover prescribed subject content matter. In science undergraduate courses, summative assessments that are "heavy on the testing of content knowledge" continue to dominate (Hanauer & Bauerle, 2012, p. 36).
Two recent Australian studies have probed the ways in which science undergraduates' knowledge is assessed. A survey of students in the final year of a Bachelor of Science or Biomedical Science at two Australian research-intensive universities by Hodgson, Varsavsky and Matthews (2013) revealed that examinations were a dominant form of assessment and that students rated them the most important (88.8%) of all forms of assessment as a method of assessing scientific knowledge. It is worth noting that these students were not asked to comment on their perception of the quality of the assessment type but upon its prominence across their program of study. After surveying Chemistry teaching at twelve Australian universities, Schultz, Mitchell Crow and O'Brien (2013) reported that examinations constitute, on average, about 50% of total assessment (i.e. they represent 50% of the percentage marks awarded) in undergraduate Chemistry courses, with practicals being the next most important (mean 28%). The survey suggested that multiple choice testing represents an average of 31% (range 15-49%) of assessment at first year level but is little used in second or third year Chemistry courses. It also showed that online assessment is more likely to be used in first year classes. Similar patterns of assessment practice are likely in other science disciplines.
As further examples, documents on learning, teaching and assessment produced by Tuning Europe for a range of disciplinary areas show that summative written (open or closed book) and/or oral examinations are considered to be the cornerstones of assessment in undergraduate science courses. It appears, therefore, that traditional examinations continue to be a very significant component of assessment in undergraduate science courses. It can be asked, however, whether they provide meaningful assessment of students' science knowledge.

Traditional examinations as assessment tasks
Traditional forms of knowledge assessment such as unseen closed book examinations have been heavily criticised for being driven by the (perceived) needs of teachers rather than students, and for producing "passive consumers" (Falchikov, 2005, p. 37). The types of questions set in an examination strongly influence students' study strategies. Examinations, together with tests and quizzes (generally in-class assessment tasks of shorter duration than formally invigilated examinations), often encourage surface learning and discourage retention of knowledge across courses or year levels. If students are only tested on factual recall, then they will learn at that level (Crowe, Dirks, & Wenderoth, 2008; Hanauer & Bauerle, 2012). Furthermore, students rarely receive any useful feedback on their performance in examinations (Hanauer & Bauerle, 2012), so they are unable to make further sense of what they have learned, or not learned. Additionally, traditional examinations favour students who happen to be skilled at dealing with time-constrained assessment tasks (Race, 1999), and only represent a snapshot of an individual's performance on that day, which may be influenced by a number of external factors such as their current state of health (Race, Brown, & Smith, 2005).
The validity of such assessment can also be brought into question: do examinations merely assess whether students can write about what they have read and have been able to remember (Race et al., 2005)? Even those students who perform well in traditional examinations may in fact have a poor grasp of key (or threshold) concepts (Boud, 1990). Hughes and Magin (1996) explained this apparent contradiction using a framework originally devised by Biggs and Collis (1982), who considered that students progress through five stages of ascending complexity as their understanding of unfamiliar material grows and as they move from incompetence to expertise. This Structure of the Observed Learning Outcomes (SOLO) Taxonomy defines five levels of understanding:
1. Prestructural: a lack of coherent grasp of the material, although isolated facts or skills may be acquired.
2. Unistructural: a single relevant aspect may be mastered.
3. Multistructural: several elements are mastered separately.
4. Relational: several relevant aspects are integrated into a theoretical structure.
5. Extended Abstract: the stage of expertise in which the material is mastered within its own domain and in relation to other knowledge domains.
Thus, the development of knowledge and understanding is considered to take place along a continuum with students moving from simple unstructured knowledge and understanding to the complex structured and sophisticated knowledge that provides the basis for expert performance (Hughes & Magin, 1996). An emphasis upon testing recall of factual knowledge will not therefore differentiate between students at different stages along this spectrum. Hughes and Magin (1996) presented some useful case studies, four of which are from undergraduate science or engineering, that provide practical examples of strategies with which to assess higher level understanding.
Bloom's Taxonomy (Bloom, Krathwohl, & Masia, 1956) is another very useful framework that can be employed in assessment design for undergraduate science courses (Momsen et al., 2013). Bloom's Taxonomy is often reconstructed pictorially (see, for example, Lord & Baviskar, 2007) as a hierarchical triangle: knowledge (the base level), comprehension, application, analysis, synthesis, and evaluation (the highest level). Elements of the first level assist with understanding of the next level and so on. The revised version of Bloom's Taxonomy (see Krathwohl, 2002) is particularly relevant in a discussion of science knowledge because it presents the "knowledge dimension" with four (rather than the original three) sub-categories: factual knowledge, conceptual knowledge, procedural knowledge and metacognitive knowledge.
Students often have difficulty demonstrating their competence at the higher cognitive levels of Bloom's Taxonomy, which require deep conceptual understanding of disciplinary knowledge (Crowe et al., 2008). Most test or examination questions focus on the lower levels of knowledge and comprehension. Knowledge involving understanding, application and attitudes is rarely assessed (Lord & Baviskar, 2007). Yet assessment that only requires recall and summarisation of factual knowledge per se may not discriminate between students who are at different stages of mastery of the material, as described above for the SOLO taxonomy. Similarly, examinations may disadvantage students if teaching approaches are not aligned to the cognitive challenge of the examination questions: if classroom activities focus mainly on facts and details, but the examination is aimed at a higher cognitive level, then students will tend to perform poorly because they have not been given the opportunity to practise working at that level and to develop a deep understanding of the material (Crowe et al., 2008).
The scope of this paper does not allow a critique of current approaches to undergraduate science teaching. However, it is critical that instruction and assessment be aligned, and that both send clear non-conflicting messages about learning expectations and the nature of knowledge in the relevant science discipline (Momsen et al., 2013). Crowe et al. (2008) provide an excellent example of a science undergraduate teaching program designed to enhance student learning (of biology) through the implementation of Bloom's Taxonomy. Their "Blooming Biology Tool" (BBT) is used to determine the level of Bloom's Taxonomy assessed by questions on biology-related topics, and versions of the BBT are available for both staff and students. Application of the BBT assists teaching academics to develop better questions and more appropriate learning tasks and helps students develop their own metacognitive skills. Case studies involving an undergraduate physiology course, a biology workshop and a cell biology laboratory class demonstrated that implementation of a BBT-based approach enhanced students' mastery of the subject material.

Multiple choice questions for testing science knowledge
Multiple choice questions (MCQs) are frequently employed as an "effective and efficient method of assessing students' content learning" (Fellenz, 2004, p. 703) and there is an extensive literature on the design of MCQs, their advantages and disadvantages (see, for example, Haladyna, Downing, & Rodriguez, 2002). Such tests are easy to mark (with automated electronic marking possible), which is an attraction for the assessment of very large classes, and they have the potential advantage that test performance is not reliant on (English) writing ability.
Overwhelmingly, MCQs are used to test recall of factual information (Fellenz, 2004), but Lord and Baviskar (2007) point out that multiple choice questions can be constructed so as to assess higher levels of learning, thus moving science students towards understanding, rather than merely recall, of information. In contrast, Crowe et al. (2008) strongly supported the use of short essay or similar questions in examinations in order to ensure that higher order cognitive skills are being assessed. However, when Palmer and Devitt (2007) analysed both MCQs and modified essay questions (MEQs) used for summative testing in a clinical medical undergraduate course and classified them according to a modified Bloom's Taxonomy, they found that, while over 50% of both MCQs and MEQs tested only factual recall, the MCQs used in these tests were actually better at testing higher order skills than the MEQs.
Such studies highlight the need for teaching academics to be more informed about effective assessment design and more critical of their own approaches to assessment of science knowledge, particularly at undergraduate level. As one example, Schultz (2011) described an innovative approach to assessment of science knowledge. Non-multiple choice randomised assignments aligned with the teaching activities and intended learning outcomes are delivered to a very large (> 300 students) first year class via a learning management system. Advantages of this tool include instant marking, automated mark entry and avoidance of cheating. Importantly, the students can take advantage of unlimited practice questions and receive formative feedback on their learning. Moreover, the questions are designed specifically to address the higher levels of Bloom's Taxonomy.

Other strategies for assessing science knowledge
As an assessment strategy, examinations focus primarily on assessing content knowledge, whether at the level of factual retention or at higher cognitive levels. As noted, they are often the most significant item(s) of assessment in core (i.e. compulsory) science units in terms of percentage marks allocated per task. Other commonly employed assessment tasks include practical reports, review essays, posters and oral presentations. Such tasks are usually designed specifically to develop and demonstrate students' acquisition of discipline-specific skills, communication skills or understanding of the processes of science as articulated in the Science TLOs (Jones et al., 2011), but relevant marking rubrics may include criteria relating to the demonstration or application of content knowledge. For example, a universal rubric for assessing students' scientific reasoning skills via scientific writing tasks (Timmerman, Strickland, Johnson, & Payne, 2011, p. 519) requires that the Introduction to the written piece be assessed against the criterion "Accuracy", with the descriptor: "Content knowledge is accurate, relevant and provides appropriate background including defining critical terms."
One may argue that such assessment tasks cannot assess the full spectrum of content knowledge covered in core science curricula, and it is important to remember that the Science TLOs, including TLO 2 Science Knowledge, apply integratively to the entire degree program (Jones et al., 2011) rather than to single units of study. However, a carefully planned assessment regime can help to move students' thinking and their understanding of content knowledge to higher levels of Bloom's Taxonomy. For example, first year biology students exposed to an innovative guided inquiry curriculum with a diverse suite of assessment tasks (including oral and poster presentations) self-reported using less memorisation and recall and more application and judgment/evaluation than their predecessors in a traditionally taught biology course (Goldey et al., 2012). Such curricula are often put in place in order to improve students' scientific literacy, quantitative and/or communication skills and may not be core or mandatory units. It is telling, therefore, that such curricula may have profound impacts upon students' disciplinary knowledge and conceptual understanding of science.
In summary, assessment that focuses primarily on the lower cognitive levels militates against the development of the critical thinking and problem-solving skills that science graduates must acquire (Momsen et al., 2013). Assessment programs for science undergraduate courses therefore need to provide constructively aligned assessment, include both summative and formative assessment, allow assessment of a range of knowledge types, use a range of different assessment tools and address both higher order thinking and the application of knowledge to real-world contexts (Hanauer & Bauerle, 2012). Importantly, such strategies need to be applied from the very first year of a science undergraduate program (Stokstad, 2001).

Conclusion and Recommendations
The Science discipline has already made important and significant moves towards a shared understanding of desired learning outcomes for Australian bachelor level graduates. This paper has focused on identifying some of the key issues around effective and defensible assessment of Science TLO 2. As Coates (2012) pointed out, "developing assessments of performance that simultaneously provide sound information to students, institutions and systems remains a major challenge for higher education" (p. 14).
Academics in science-related disciplines must take collective responsibility for developing, sharing and peer-critiquing best practice in assessment of their students' science knowledge. To achieve this, the still common impediment of strongly content-driven and traditionally taught undergraduate curricula must be overcome. In the light of these challenges, the following recommendations are made:
1: that science academics take individual responsibility for ensuring that undergraduate science curricula are founded upon teaching approaches that foster active learning.
2: that science academics take individual responsibility for designing meaningful assessment tasks that support higher level learning (Fellenz, 2004) and provide incontrovertible evidence of student achievement.
3: that science faculties have in place mechanisms for constructive discussions about the curriculum among academics teaching into all levels of an undergraduate degree program, and a process for internal, formative, peer review of assessment practices and outcomes.
4: that the national science disciplinary community initiate and develop a system of external peer review of assessment of graduate learning outcomes for bachelor-level degrees in science.