Measurement is the process of determining the level of performance. This module presents basic ideas for obtaining valid, reliable, and efficient measurements, and illustrates how these are central to proper assessment, evaluation, and research.
Table 1 Principles of Measurement
Seminal Concepts
Measurement is important because people care about quality. Quality describes how good something is in the context of meeting human needs. Quality is a holistic combination of the inherent or distinctive attributes of a person, product, process, organization, etc. Some examples of quality in higher-education contexts include
Quality of knowledge
In a specific knowledge area (e.g. hydrology, statistics, western history), quality involves an individual’s depth, breadth, and connections among the ideas and facts that comprise the knowledge area.
Quality of performance
In a specific performance area (e.g. teamwork, running a project, playing an oboe), quality describes how good the performance is.
Quality of a product
For a given product (e.g. technical report or journal paper, an original song), quality describes how good the product is.
Quality of an organization
For an academic unit (department, math tutoring center, etc.), quality describes how effectively this unit meets the needs of key stakeholders. For a university, quality describes how effectively it meets the needs of its students.
Quality occurs on a scale that spans from low to medium to high to exceptional. Measurement is the process of assigning a number or qualitative scale to indicate level of quality. Tools for making measurements have varied forms and names. Some common labels are scoring guides, rubrics, and measures. Here, we use the label measure to mean any tool that is used for the purposes of making a measurement.
Validity refers to how well the measurement process actually measures what it claims to measure. For example, a measurement of student writing should indicate the quality of the writing and should not be influenced by extraneous factors such as how much writing the student has produced or whether the student approached the task the way the teacher preferred.
Reliability refers to the repeatability of a measurement. That is, the more reliable a measurement, the more likely it is that the measurer will arrive at the same number or qualitative score if the measurement is repeated. In general, before the validity of a measurement process can be established, its reliability must be established. When multiple people use a measurement process, the level of consistency in their judgments is termed inter-rater reliability.
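As a minimal illustration, the sketch below computes percent agreement and Cohen’s kappa for two hypothetical raters who scored the same ten papers on a four-level scale; the rater scores are invented for the example and are not drawn from the module.

```python
from collections import Counter

# Hypothetical scores from two raters on the same ten papers,
# using a four-level qualitative scale mapped to 1-4.
rater_a = [3, 4, 2, 3, 1, 4, 3, 2, 2, 4]
rater_b = [3, 4, 2, 2, 1, 4, 3, 2, 3, 4]

n = len(rater_a)

# Percent agreement: the fraction of papers given the same score by both raters.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Agreement expected by chance, based on each rater's score distribution.
count_a, count_b = Counter(rater_a), Counter(rater_b)
expected = sum(count_a[s] * count_b[s] for s in set(rater_a) | set(rater_b)) / n**2

# Cohen's kappa corrects the observed agreement for chance agreement.
kappa = (observed - expected) / (1 - expected)

print(f"Percent agreement: {observed:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

Low agreement between raters signals that the measure, its criteria, or the rater training needs attention before validity can even be considered.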
Quality in learning, assessment, evaluation, and research is enhanced by quality in measurement. To attain quality in measurement in multiple contexts, we suggest adherence to the principles summarized in Table 1.
Rationale for Measurement
Assessment, evaluation, and research are three important processes in higher education. Although each is different, all three involve measurement. Table 2 summarizes the significant differences among measurement, assessment, evaluation, and research. All three of these processes benefit from conscious attention to the principles of measurement articulated in Table 1.
Measurement targets should be meaningful to three different audiences: students, practitioners in the field, and researchers. Students respond best to explicit learning targets that involve authentic challenges connected with knowledge mastery, reasoning proficiency, product realization, and professional expectations (Stiggins, 1996). Practitioners expect to see course outcomes that support the diverse roles within the discipline or profession and in the workplace. Researchers depend on a clearly conceptualized cognitive model that reflects the latest understanding of how learners represent knowledge and develop expertise in the domain (Pellegrino, Chudowsky, & Glaser, 2001). Researchers also expect alignment among the cognitive model, the methods used to observe performance, and the protocol for interpreting results. Educators vary both in their motivation for collecting data and in their skill in interpreting and reporting it. It is important to address the challenge of serving all three audiences with learning and growth that can be validly measured. The following sections explore the varying uses of measurement.
Role in Assessment
Assessment is a process of measuring and analyzing a performance, a work product, or a learning skill to provide high-quality, timely feedback that gives assessees clear and meaningful directives and insights to help them improve their future performance (4.1.1 Overview of Assessment and 4.1.4 Assessment Methodology). Before a performance can be measured for assessment purposes, the criteria must be clearly defined and expectations or measures of each criterion must be set. The measurer will find it easier to provide specific feedback that will be effective for strengthening future performance if he or she narrows the focus to three to five performance expectations. If the goal is to “grow” a performance, a work product, or a learning skill, assessment must occur early (and often) to allow students ample time to refine and improve. For example, if a central course outcome is to improve student writing, it will be important for instructors to conduct multiple “formative” measurements of performance on steps in the process of preparing a research paper. In this case, an instructor might use an analytic writing rubric for a research paper as the assessment tool to measure and collect data that provides feedback to the student. At intermediate times throughout the semester, instructors can measure specific performance expectations, providing both student and instructor with assessment data that can strengthen writing quality.
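One way to picture such an analytic rubric is as a small data structure. The sketch below uses hypothetical criteria and level descriptors (not taken from any particular rubric) to turn a single formative measurement into criterion-by-criterion feedback for the student.

```python
# Hypothetical analytic writing rubric: each criterion has level
# descriptors keyed by score (1 = beginning ... 4 = exemplary).
writing_rubric = {
    "thesis":       {4: "focused and arguable", 3: "clear", 2: "broad", 1: "missing"},
    "evidence":     {4: "well chosen and cited", 3: "relevant", 2: "thin", 1: "absent"},
    "organization": {4: "logical throughout", 3: "mostly logical", 2: "uneven", 1: "unclear"},
}

def formative_feedback(scores):
    """Turn one measurement (criterion -> level) into criterion-level feedback."""
    return [
        f"{criterion}: level {level} ({writing_rubric[criterion][level]})"
        for criterion, level in scores.items()
    ]

# One intermediate measurement of a student's research-paper draft.
draft_scores = {"thesis": 3, "evidence": 2, "organization": 3}
for line in formative_feedback(draft_scores):
    print(line)
```

Because the feedback is organized by criterion, both student and instructor can see exactly which aspects of the writing to strengthen before the next draft.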
Role in Evaluation
Evaluation is the process of measuring the quality of a performance (e.g., a work product or the use of a process) to make a judgment or to determine whether, or to what level, standards have been met (1.4.6 Overview of Evaluation and 1.4.7 Evaluation Methodology). Evaluation is used in many academic arenas, such as graded assignments and exams, grade point average (GPA), promotion and tenure, or grant acquisition. Measurements that are used to make judgments are often based on external standards (e.g., accrediting standards, agency policies, accountability for funding). Before any performance can be measured for evaluation purposes, the performance expectations (standards based on the measure) must be clear for each criterion of quality. Furthermore, the evaluation should be unbiased and be documented in a permanent record (e.g., transcript, personnel file, grant record). In the case of a research paper, the final grade may be assigned using information from a score sheet associated with a writing rubric. The more a measurement tool requires an evaluator to explain his or her judgments about whether standards have been met, the less effective that measurement tool is for evaluation.
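An evaluative measurement, by contrast, compares scores against fixed standards and documents the judgment. The brief sketch below uses hypothetical criteria and cut-off levels to record whether each standard was met and to produce the overall result that would go into a permanent record.

```python
# Hypothetical performance standards: the minimum rubric level required
# for each criterion to count as "met" in the final evaluation.
standards = {"thesis": 3, "evidence": 3, "organization": 3}
final_scores = {"thesis": 4, "evidence": 3, "organization": 2}

# The evaluative judgment compares each criterion against its standard
# and documents the overall result.
judgments = {c: final_scores[c] >= standards[c] for c in standards}
overall = "standards met" if all(judgments.values()) else "standards not met"

print(judgments)  # {'thesis': True, 'evidence': True, 'organization': False}
print(overall)    # standards not met
```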
Role in Research
The purpose of measurement in research is to validate new knowledge within or across disciplines (2.5.2 Research Methodology). Researchers begin with questions about a void in the existing body of knowledge; they then form hypotheses regarding relationships among measurable variables. Theory should be used to frame research questions and to guide methods for collecting reliable and valid data. In research, measurement falls into two categories: descriptive and experimental. If the researcher is attempting to answer a question descriptively, the appropriate tools include surveys, interviews or focus groups, conversational analysis, observation, ethnographies, or meta-analysis (Olds, Moskal, & Miller, 2005). If the researcher’s study is experimental in nature, the proper methods include randomized controlled trials, matched groups, baseline data, post-testing, and longitudinal designs. Each of these research designs or techniques requires certain kinds of measures that will result in data that can be appropriately analyzed to provide a basis for interpretation (Shavelson & Towne, 2002). Inferences drawn from the measurement should directly relate the evidence obtained to the hypothesis being investigated. The quality of a measure is very important because limitations, biases, and alternative interpretations will affect validity. Researchers want to know whether their findings can be generalized to a broader population or to multiple settings. The consistency of the measurement and the validity of the data are evidenced by the ability of other researchers to replicate the results.
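As a hedged illustration of how an experimental design with baseline and post-testing yields analyzable data, the sketch below computes a simple effect size (Cohen’s d) from invented pre- and post-test scores for a treatment section and a comparison section.

```python
import statistics

# Hypothetical pre/post scores for a treatment section and a comparison section.
treatment_gain = [post - pre for pre, post in [(62, 78), (70, 85), (55, 72), (68, 80)]]
comparison_gain = [post - pre for pre, post in [(60, 66), (72, 75), (58, 65), (66, 70)]]

# Cohen's d: standardized difference between mean gains, using a pooled SD.
mean_t, mean_c = statistics.mean(treatment_gain), statistics.mean(comparison_gain)
sd_t, sd_c = statistics.stdev(treatment_gain), statistics.stdev(comparison_gain)
n_t, n_c = len(treatment_gain), len(comparison_gain)
pooled_sd = (((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)) ** 0.5

cohens_d = (mean_t - mean_c) / pooled_sd
print(f"Mean gain (treatment):  {mean_t:.1f}")
print(f"Mean gain (comparison): {mean_c:.1f}")
print(f"Cohen's d:              {cohens_d:.2f}")
```

The numbers here are illustrative only; in a real study the sample size, sampling plan, and the validity of the underlying measure determine whether such an analysis supports any inference at all.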
Peer review and publication of research are essential for disseminating new knowledge to other practitioners as well as to the public.
Performance Measurement
Many educators are reluctant to apply measurement instruments and techniques to complex and integrated performances. Tasks like these are commonly referred to as constructed-response outcomes; they include learning portfolios, reflective journals, self-growth papers, capstone reports, project reports, and experiential narratives. Learning portfolios can include multiple performance artifacts, such as a sequence of art works produced during a course and accompanied by reflective journals and interpretive analyses. It is much easier to design constructed-response outcomes like portfolios than it is to create reliable and valid measures for assessing or evaluating their quality. To assess and/or to evaluate these complex outcomes, instructors often use custom-designed rubrics.
Educators’ historic reluctance to adopt complex integrated performance outcomes stems in part from their assumptions about reliability and validity in measuring them. For many, selected-response instruments, such as multiple-choice and matching, are perceived to be more reliable and valid as well as easier to use. Instructors cannot measure performances that involve critical thinking, quality teaching, or service-learning projects by counting “correct” answers (Wiggins, 1998); these require qualitative judgments. As a result, some instructors opt to take advantage of the comfort that comes from using traditional selected-response measurement instruments, and so spend most of their in-class time “covering the content” to align with “the test.” But selected-response tests are often not authentic measures of intended outcomes. For example, when one applies for a driver’s license, the simple indicators of the driving test and written test do not represent, and are not intended to represent, all key driving performances. Table 3 is a guide for selecting measurement tools for the five types of learning outcomes described in the Learning Outcomes (2.4.5) module: competency, movement, accomplishment, experience, and integrated performance.
A competency is a collection of knowledge, skills, and attitudes needed to perform a specific task effectively and efficiently at a defined level. A common question about a competency outcome is: What can the learner do at what level in a specific situation? Movement is documented growth in a transferable process or learning skill. A common question about a movement outcome is: What does increased performance look like? Accomplishments are significant work products or performances that are externally valued or affirmed by an outside expert. A common question about an accomplishment outcome is: How well does student work compare with work products of practitioners in the field?
Experiences are interactions, emotions, responsibilities, and shared memories that clarify one’s position in relation to oneself, a community, or discipline. A common question about an experience outcome is: How has this experience changed the learner? Integrated performance is the synthesis of prior knowledge, skills, processes, and attitudes with current learning needs to address a difficult challenge within a strict time frame and set of performance expectations. A common question about integrated performance is: How prepared are students to respond to a real-world challenge?
Over the last decade, rubrics have received considerable attention in education as tools for performance measurement (Arter & McTighe, 2001). Rubrics provide explicit statements that describe different levels of performance and are worded in a way that covers the essence of what to look for when conducting qualitative measurements. Rubrics should reflect the best thinking about what constitutes good performance, a work product, or a learning skill. As discussed in Fundamentals of Rubrics (1.4.2), rubrics can be analytic (with an extensive set of factors and multiple scales) or holistic (with just a single scale). However, rubrics are only as robust as the clarity of purpose for measurement.
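The structural difference between the two forms can be sketched simply: an analytic rubric carries one scale per criterion, whereas a holistic rubric carries a single scale for the whole performance. The criteria and descriptors below are hypothetical.

```python
# Analytic rubric: several criteria, each measured on its own scale.
analytic_levels = {4: "exemplary", 3: "proficient", 2: "developing", 1: "beginning"}
analytic_score = {"thesis": 3, "evidence": 2, "organization": 4}

# Holistic rubric: a single scale describing the whole performance at once.
holistic_rubric = {
    4: "insightful, well supported, and clearly organized",
    3: "clear, with adequate support and organization",
    2: "vague focus or uneven support",
    1: "no discernible focus or support",
}
holistic_score = 3

print("Analytic:", {c: analytic_levels[s] for c, s in analytic_score.items()})
print("Holistic:", f"level {holistic_score} ({holistic_rubric[holistic_score]})")
```

Analytic rubrics give richer diagnostic information for assessment; holistic rubrics are faster to apply and are often sufficient for evaluation against a single standard.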
Concluding Thoughts
Measurement is foundational to classroom assessment, grading, program evaluation, and educational research. In the physical sciences, quality measurement is a central event; in education, measurement involves a series of linked decisions that are more qualitative in nature. In both, the goal is to align outcomes, performance tasks, measurement methods, and data analysis. Educators in all disciplines must learn to apply their measurement skills to the multiple uses of measurement in education. Regardless of the discipline or profession, best practices include clear communication of purpose, well-selected targets for measurement, sound methods for data collection, and sampling to reduce bias and distortion. Faculty will become better teachers and researchers if they learn to seek consensus with their colleagues about what processes matter most in teaching and learning, and what tools measure learner growth most efficiently and effectively.
References
Arter, J., & McTighe, J. (2001). Scoring rubrics in the classroom: Using performance criteria for assessing and improving student performance. Thousand Oaks, CA: Corwin Press.
Olds, B. M., Moskal, B. M., & Miller, R. L. (2005). Assessment in engineering education: Evolution, approaches and future collaborations. Journal of Engineering Education, 94, 13-26.
Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
Shavelson, R. J., & Towne, L. (Eds.). (2002). Scientific research in education. National Research Council. Washington, DC: National Academy Press.
Stiggins, R. J. (1996). Student-centered classroom assessment (2nd ed.). Old Tappan, NJ: Prentice Hall.
Wiggins, G. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco: Jossey-Bass.
Table 2 Comparison of Measurement, Assessment, Evaluation, and Research

| Categories | Measurement | Assessment | Evaluation | Research |
| --- | --- | --- | --- | --- |
| Purpose | To assign a number or qualitative level to indicate level of quality | To improve quality | To determine whether standards for quality are met | To produce new knowledge that builds on existing knowledge |
| Nature | Objective/unbiased | Non-judgmental (collaborative) | Judgmental (not collaborative) | Inquiry-based (collaborative) |
| Performer | Measurer | Assessor | Evaluator | Researcher |
| Beneficiary | Stakeholders in use of measurement | Assessee | External decision-makers | Community of scholars and practitioners |
| Results | A number or grade | Action plan | Documented level of final performance; part of a permanent record; brings closure | Contribution to existing knowledge |
| Important characteristics | Calibrated; reliable; scaled appropriately (with range and units) | Criteria-based; assessee-centered | Unbiased; criteria-based | Theory-driven; designed to control bias; can be tested; involves a high level of expertise; uses accepted methods |
Table 3 Selecting Measurement Tools for Five Types of Learning Outcomes

| Outcome Type | Example | Task/Instrument |
| --- | --- | --- |
| Competency | Applying knowledge in a specific context at a specific level | Checklist or selected-response exam with answer key |
| Movement | Exercising transferable skills in a continuum with no upper bound (e.g., problem-solving, communication, teamwork) | Reflective essay with analytic rubric |
| Accomplishments | Creating something with external value (project work, community service, artistic creation, thesis) | Portfolio with scorecard or peer review form |
| Experiences | Responding to and internalizing a situation | Personal journal with holistic rubric |
| Integrated performances | Deploying working expertise in response to an authentic challenge (e.g., internship interview, student teaching observation, final presentation, leadership situation) | Performance appraisal with rating form |