Measurement is the process of determining the level of performance. This module presents basic ideas for obtaining valid, reliable, and efficient measurements, and illustrates how these are central to proper assessment, evaluation, and research.
Table 1 Principles of Measurement
Seminal Concepts
Measurement is important because people care about quality. Quality describes how good something is in the context of meeting human needs. Quality is a holistic combination of the inherent or distinctive attributes of a person, product, process, organization, etc. Some examples of quality in higher-education contexts include
Quality of knowledge
In a specific knowledge area (e.g. hydrology, statistics, western history), quality involves an individual’s depth, breadth, and connections among the ideas and facts that comprise the knowledge area.
Quality of performance
In a specific performance area (e.g. teamwork, running a project, playing an oboe), quality describes how good the performance is.
Quality of a product
For a given product (e.g. technical report or journal paper, an original song), quality describes how good the product is.
Quality of an organization
For an academic unit (department, math tutoring center, etc.), quality describes how effectively this unit meets the needs of key stakeholders. For a university, quality describes how effectively it meets the needs of its students.
Quality occurs on a scale that spans from low to medium to high to exceptional. Measurement is the process of assigning a number or qualitative scale to indicate level of quality. Tools for making measurements have varied forms and names. Some common labels are scoring guides, rubrics, and measures. Here, we use the label measure to mean any tool that is used for the purposes of making a measurement.
Validity refers to how well the measurement process actually measures what it claims to measure. For example, a measurement of student writing should indicate the quality of the writing and should not be influenced by extraneous factors such as how much writing the student has produced or whether the student approached the task the way the teacher preferred.
Reliability refers to the repeatability of a measurement. That is, the more reliable a measurement, the more likely it is that the measurer will arrive at the same number or qualitative score if the measurement is repeated. In general, before the validity of a measurement process can be established, its reliability must be established. When multiple people use a measurement process, the level of consistency in their judgments is termed inter-rater reliability.
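As a minimal illustration, the sketch below computes percent agreement and Cohen’s kappa for two hypothetical raters who scored the same ten papers on a four-level scale; the rater scores are invented for the example and are not drawn from the module.

```python
from collections import Counter

# Hypothetical scores from two raters on the same ten papers,
# using a four-level qualitative scale mapped to 1-4.
rater_a = [3, 4, 2, 3, 1, 4, 3, 2, 2, 4]
rater_b = [3, 4, 2, 2, 1, 4, 3, 2, 3, 4]

n = len(rater_a)

# Percent agreement: the fraction of papers given the same score by both raters.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Agreement expected by chance, based on each rater's score distribution.
count_a, count_b = Counter(rater_a), Counter(rater_b)
expected = sum(count_a[s] * count_b[s] for s in set(rater_a) | set(rater_b)) / n**2

# Cohen's kappa corrects the observed agreement for chance agreement.
kappa = (observed - expected) / (1 - expected)

print(f"Percent agreement: {observed:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

Low agreement between raters signals that the measure, its criteria, or the rater training needs attention before validity can even be considered.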
Quality in learning, assessment, evaluation, and research is enhanced by quality in measurement. To attain quality in measurement in multiple contexts, we suggest adherence to the principles summarized in Table 1.
Rationale for Measurement
Assessment, evaluation, and research are three important processes in higher education. Although each is different, all three involve measurement. Table 2 summarizes the significant differences among measurement, assessment, evaluation, and research. All three of these processes benefit from conscious attention to the principles of measurement articulated in Table 1.
Measurement targets should be meaningful to three different audiences: students, practitioners in the field, and researchers. Students respond best to explicit learning targets that involve authentic challenges connected with knowledge mastery, reasoning proficiency, product realization, and professional expectations (Stiggins, 1996). Practitioners expect to see course outcomes that support the diverse roles within the discipline or profession and in the workplace. Researchers depend on a clearly conceptualized cognitive model that reflects the latest understanding of how learners represent knowledge and develop expertise in the domain (Pellegrino, Chudowsky, & Glaser, 2001). Researchers also expect alignment among the cognitive model, the methods used to observe performance, and the protocol for interpreting results. Educators vary both in their motivation for collecting data and in their skill in interpreting and reporting it. It is important to address the challenge of serving all three audiences with learning and growth that can be validly measured. The following sections explore the varying uses of measurement.
Role in Assessment
Assessment is a process of measuring and analyzing a performance, a work product, or a learning skill to provide high-quality, timely feedback that gives assessees clear and meaningful directives and insights to help them improve their future performance (4.1.1 Overview of Assessment and 4.1.4 Assessment Methodology). Before a performance can be measured for assessment purposes, the criteria must be clearly defined and expectations or measures of each criterion must be set. The measurer will find it easier to provide specific feedback that will be effective for strengthening future performance if he or she narrows the focus to three to five performance expectations. If the goal is to “grow” a performance, a work product, or a learning skill, assessment must occur early (and often) to allow students ample time to refine and improve. For example, if a central course outcome is to improve student writing, it will be important for instructors to conduct multiple “formative” measurements of performance on steps in the process of preparing a research paper. In this case, an instructor might use an analytic writing rubric for a research paper as the assessment tool to measure and collect data that provides feedback to the student. At intermediate times throughout the semester, instructors can measure specific performance expectations, providing both student and instructor with assessment data that can strengthen writing quality.
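One way to picture such an analytic rubric is as a small data structure. The sketch below uses hypothetical criteria and level descriptors (not taken from any particular rubric) to turn a single formative measurement into criterion-by-criterion feedback for the student.

```python
# Hypothetical analytic writing rubric: each criterion has level
# descriptors keyed by score (1 = beginning ... 4 = exemplary).
writing_rubric = {
    "thesis":       {4: "focused and arguable", 3: "clear", 2: "broad", 1: "missing"},
    "evidence":     {4: "well chosen and cited", 3: "relevant", 2: "thin", 1: "absent"},
    "organization": {4: "logical throughout", 3: "mostly logical", 2: "uneven", 1: "unclear"},
}

def formative_feedback(scores):
    """Turn one measurement (criterion -> level) into criterion-level feedback."""
    return [
        f"{criterion}: level {level} ({writing_rubric[criterion][level]})"
        for criterion, level in scores.items()
    ]

# One intermediate measurement of a student's research-paper draft.
draft_scores = {"thesis": 3, "evidence": 2, "organization": 3}
for line in formative_feedback(draft_scores):
    print(line)
```

Because the feedback is organized by criterion, both student and instructor can see exactly which aspects of the writing to strengthen before the next draft.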
Role in Evaluation
Evaluation is the process of measuring the quality of a performance (e.g., a work product or the use of a process) to make a judgment or to determine whether, or to what level, standards have been met (1.4.6 Overview of Evaluation and 1.4.7 Evaluation Methodology). Evaluation is used in many academic arenas, such as graded assignments and exams, grade point average (GPA), promotion and tenure, or grant acquisition. Measurements that are used to make judgments are often based on external standards (e.g., accrediting standards, agency policies, accountability for funding). Before any performance can be measured for evaluation purposes, the performance expectations (standards based on the measure) must be clear for each criterion of quality. Furthermore, the evaluation should be unbiased and be documented in a permanent record (e.g., transcript, personnel file, grant record). In the case of a research paper, the final grade may be assigned using information from a score sheet associated with a writing rubric. The more a measurement tool requires an evaluator to explain his or her judgments about whether standards have been met, the less effective that measurement tool is for evaluation.
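An evaluative measurement, by contrast, compares scores against fixed standards and documents the judgment. The brief sketch below uses hypothetical criteria and cut-off levels to record whether each standard was met and to produce the overall result that would go into a permanent record.

```python
# Hypothetical performance standards: the minimum rubric level required
# for each criterion to count as "met" in the final evaluation.
standards = {"thesis": 3, "evidence": 3, "organization": 3}
final_scores = {"thesis": 4, "evidence": 3, "organization": 2}

# The evaluative judgment compares each criterion against its standard
# and documents the overall result.
judgments = {c: final_scores[c] >= standards[c] for c in standards}
overall = "standards met" if all(judgments.values()) else "standards not met"

print(judgments)  # {'thesis': True, 'evidence': True, 'organization': False}
print(overall)    # standards not met
```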
Role in Research
The purpose of measurement in research is to validate new knowledge within or across disciplines (2.5.2 Research Methodology). Researchers begin with questions about a void in the existing body of knowledge; they then form hypotheses regarding relationships among measurable variables. Theory should be used to frame research questions and to guide methods for collecting reliable and valid data. In research, measurement falls into two categories: descriptive and experimental. If the researcher is attempting to answer a question descriptively, the appropriate tools include surveys, interviews or focus groups, conversational analysis, observation, ethnographies, or meta-analysis (Olds, Moskal, & Miller, 2005). If the researcher’s study is experimental in nature, the proper methods include randomized controlled trials, matched groups, baseline data, post-testing, and longitudinal designs. Each of these research designs or techniques requires certain kinds of measures that will result in data that can be appropriately analyzed to provide a basis for interpretation (Shavelson & Towne, 2002). Inferences drawn from the measurement should directly relate the evidence obtained to the hypothesis being investigated. The quality of a measure is very important because limitations, biases, and alternative interpretations will affect validity. Researchers want to know whether their findings can be generalized to a broader population or to multiple settings. The consistency of the measurement and the validity of the data are evidenced by the ability of other researchers to replicate the results.
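As a hedged illustration of how an experimental design with baseline and post-testing yields analyzable data, the sketch below computes a simple effect size (Cohen’s d) from invented pre- and post-test scores for a treatment section and a comparison section.

```python
import statistics

# Hypothetical pre/post scores for a treatment section and a comparison section.
treatment_gain = [post - pre for pre, post in [(62, 78), (70, 85), (55, 72), (68, 80)]]
comparison_gain = [post - pre for pre, post in [(60, 66), (72, 75), (58, 65), (66, 70)]]

# Cohen's d: standardized difference between mean gains, using a pooled SD.
mean_t, mean_c = statistics.mean(treatment_gain), statistics.mean(comparison_gain)
sd_t, sd_c = statistics.stdev(treatment_gain), statistics.stdev(comparison_gain)
n_t, n_c = len(treatment_gain), len(comparison_gain)
pooled_sd = (((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)) ** 0.5

cohens_d = (mean_t - mean_c) / pooled_sd
print(f"Mean gain (treatment):  {mean_t:.1f}")
print(f"Mean gain (comparison): {mean_c:.1f}")
print(f"Cohen's d:              {cohens_d:.2f}")
```

The numbers here are illustrative only; in a real study the sample size, sampling plan, and the validity of the underlying measure determine whether such an analysis supports any inference at all.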
Peer review and publication of research are essential for disseminating new knowledge to other practitioners as well as to the public.
Performance Measurement
Many educators are reluctant to apply measurement instruments and techniques to complex and integrated performances. Tasks like these are commonly referred to as constructed-response outcomes; they include learning portfolios, reflective journals, self-growth papers, capstone reports, project reports, and experiential narratives. Learning portfolios can include multiple performance artifacts, such as a sequence of art works produced during a course and accompanied by reflective journals and interpretive analyses. It is much easier to design constructed-response outcomes like portfolios than it is to create reliable and valid measures for assessing or evaluating their quality. To assess and/or to evaluate these complex outcomes, instructors often use custom-designed rubrics.
Educators’ historic reluctance to adopt complex integrated performance outcomes stems in part from their assumptions about reliability and validity in measuring them. For many, selected-response instruments, such as multiple-choice and matching, are perceived to be more reliable and valid as well as easier to use. Instructors cannot measure performances that involve critical thinking, quality teaching, or service-learning projects by counting “correct” answers (Wiggins, 1998); these require qualitative judgments. As a result, some instructors opt to take advantage of the comfort that comes from using traditional selected-response measurement instruments, and so spend most of their in-class time “covering the content” to align with “the test.” But selected-response tests are often not authentic measures of intended outcomes. For example, when one applies for a driver’s license, the simple indicators of the driving test and written test do not represent, and are not intended to represent, all key driving performances. Table 3 is a guide for selecting measurement tools for the five types of learning outcomes described in the Learning Outcomes (2.4.5) module: competency, movement, accomplishment, experience, and integrated performance.
A competency is a collection of knowledge, skills, and attitudes needed to perform a specific task effectively and efficiently at a defined level. A common question about a competency outcome is: What can the learner do at what level in a specific situation? Movement is documented growth in a transferable process or learning skill. A common question about a movement outcome is: What does increased performance look like? Accomplishments are significant work products or performances that are externally valued or affirmed by an outside expert. A common question about an accomplishment outcome is: How well does student work compare with work products of practitioners in the field?
Experiences are interactions, emotions, responsibilities, and shared memories that clarify one’s position in relation to oneself, a community, or discipline. A common question about an experience outcome is: How has this experience changed the learner? Integrated performance is the synthesis of prior knowledge, skills, processes, and attitudes with current learning needs to address a difficult challenge within a strict time frame and set of performance expectations. A common question about integrated performance is: How prepared are students to respond to a real-world challenge?
Over the last decade, rubrics have received considerable attention in education as tools for performance measurement (Arter & McTighe, 2001). Rubrics provide explicit statements that describe different levels of performance and are worded in a way that covers the essence of what to look for when conducting qualitative measurements. Rubrics should reflect the best thinking about what constitutes good performance, a work product, or a learning skill. As discussed in Fundamentals of Rubrics (1.4.2), rubrics can be analytic (with an extensive set of factors and multiple scales) or holistic (with just a single scale). However, rubrics are only as robust as the clarity of purpose for measurement.
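The structural difference between the two forms can be sketched simply: an analytic rubric carries one scale per criterion, whereas a holistic rubric carries a single scale for the whole performance. The criteria and descriptors below are hypothetical.

```python
# Analytic rubric: several criteria, each measured on its own scale.
analytic_levels = {4: "exemplary", 3: "proficient", 2: "developing", 1: "beginning"}
analytic_score = {"thesis": 3, "evidence": 2, "organization": 4}

# Holistic rubric: a single scale describing the whole performance at once.
holistic_rubric = {
    4: "insightful, well supported, and clearly organized",
    3: "clear, with adequate support and organization",
    2: "vague focus or uneven support",
    1: "no discernible focus or support",
}
holistic_score = 3

print("Analytic:", {c: analytic_levels[s] for c, s in analytic_score.items()})
print("Holistic:", f"level {holistic_score} ({holistic_rubric[holistic_score]})")
```

Analytic rubrics give richer diagnostic information for assessment; holistic rubrics are faster to apply and are often sufficient for evaluation against a single standard.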
Concluding Thoughts
Measurement is foundational to classroom assessment, grading, program evaluation, and educational research. In the physical sciences, quality measurement is a central event; in education, measurement involves a series of linked decisions that are more qualitative in nature. In both, the goal is to align outcomes, performance tasks, measurement methods, and data analysis. Educators in all disciplines must learn to apply their measurement skills to the multiple uses of measurement in education. Regardless of the discipline or profession, best practices include clear communication of purpose, well-selected targets for measurement, sound methods for data collection, and sampling to reduce bias and distortion. Faculty will become better teachers and researchers if they learn to seek consensus with their colleagues about what processes matter most in teaching and learning, and what tools measure learner growth most efficiently and effectively.
References
Arter, J., & McTighe, J. (2001). Scoring rubrics in the classroom: Using performance criteria for assessing and improving student performance. Thousand Oaks, CA: Corwin Press.
Olds, B. M., Moskal, B. M., & Miller, R. L. (2005). Assessment in engineering education: Evolution, approaches and future collaborations. Journal of Engineering Education, 94, 13-26.
Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
Shavelson, R. J., & Towne, L. (Eds.). (2002). Scientific research in education. National Research Council. Washington, DC: National Academy Press.
Stiggins, R. J. (1996). Student-centered classroom assessment (2nd ed.). Old Tappan, NJ: Prentice Hall.
Wiggins, G. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco: Jossey-Bass.
Table 2 Comparison of Measurement, Assessment, Evaluation, and Research

| Categories | Measurement | Assessment | Evaluation | Research |
| --- | --- | --- | --- | --- |
| Purpose | To assign a number or qualitative level to indicate level of quality | To improve quality | To determine whether standards for quality are met | To produce new knowledge that builds on existing knowledge |
| Nature | Objective/unbiased | Non-judgmental (collaborative) | Judgmental (not collaborative) | Inquiry-based (collaborative) |
| Performer | Measurer | Assessor | Evaluator | Researcher |
| Beneficiary | Stakeholders in use of measurement | Assessee | External decision-makers | Community of scholars and practitioners |
| Results | A number or grade | Action plan | Documented level of final performance; part of a permanent record; brings closure | Contribution to existing knowledge |
| Important characteristics | Calibrated; reliable; scaled appropriately (with range and units) | Criteria-based; assessee-centered | Unbiased; criteria-based | Theory-driven; designed to control bias; can be tested; involves a high level of expertise; uses accepted methods |
Table 3 Selecting Measurement Tools for Five Types of Learning Outcomes

| Outcome Type | Example | Task/Instrument |
| --- | --- | --- |
| Competency | Applying knowledge in a specific context at a specific level | Checklist or selected-response exam with answer key |
| Movement | Exercising transferable skills in a continuum with no upper bound (e.g., problem-solving, communication, teamwork) | Reflective essay with analytic rubric |
| Accomplishments | Creating something with external value (project work, community service, artistic creation, thesis) | Portfolio with scorecard or peer review form |
| Experiences | Responding to and internalizing a situation | Personal journal with holistic rubric |
| Integrated performances | Deploying working expertise in response to an authentic challenge (e.g., internship interview, student teaching observation, final presentation, leadership situation) | Performance appraisal with rating form |