A2i Assessments Technical Manual

REPORTING ON RELIABILITY & VALIDITY

Test-Retest Reliability

Test-retest reliability is a measure of the consistency of a test or assessment. If an assessment has high test-retest reliability, then a student who takes the test multiple times within a short period will receive relatively consistent scores. Test-retest reliability is therefore a helpful metric for evaluating whether differences in scores reflect real differences between individuals rather than measurement error. The metric assumes a relatively unchanging ability, or multiple tests taken in close temporal proximity (i.e., back to back within a short window), but neither of these assumptions or testing protocols applies to Letters2Meaning (L2M). L2M is designed to capture reading ability, a trait that changes quickly in K–3 students, and teachers use it in the classroom to understand longer-term growth patterns among students, so it is very rare for a student to take L2M more often than about every six weeks. Despite these limitations, we can simulate student response behavior using item response theory, a class of statistical models that lets us approximate how a student of a given reading ability would perform on our assessment. Because L2M is a computer-adaptive test, the specific questions and their order differ from student to student and from test to test. By incorporating the same logic and rules used in the real-world assessment, we can simulate entire tests for students, which gives us the ability to evaluate how a student of a given grade and reading level would perform on L2M. Using 1,000 iterations of 58,000 simulated students across Grades K–3, we found that the test-retest reliability of L2M is approximately 0.965 (on a scale of 0 to 1), indicating that scores are consistent and that most variation in scores arises from true differences in student reading ability.
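To make the simulation approach concrete, the following is a minimal, illustrative Python sketch, not the L2M simulation engine itself. It assumes a fixed-form Rasch (1PL) model rather than L2M's adaptive item selection, uses hypothetical student and item counts, and estimates test-retest reliability as the correlation between ability estimates from two independently simulated administrations:

import numpy as np

rng = np.random.default_rng(0)

N_STUDENTS = 5_000   # illustrative; the study above used 58,000 simulated students
N_ITEMS = 40         # hypothetical fixed-form test length

# Hypothetical true abilities and item difficulties
theta = rng.normal(0.0, 1.0, N_STUDENTS)
b = rng.normal(0.0, 1.0, N_ITEMS)

def simulate_responses(theta, b, rng):
    """Simulate dichotomous (right/wrong) responses under a Rasch (1PL) model."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(float)

def estimate_theta(x, b, n_iter=20):
    """Maximum-likelihood ability estimates via Newton-Raphson."""
    t = np.zeros(x.shape[0])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(t[:, None] - b[None, :])))
        grad = (x - p).sum(axis=1)            # score residual
        info = (p * (1.0 - p)).sum(axis=1)    # test information
        t = np.clip(t + grad / info, -4.0, 4.0)  # clip handles perfect scores
    return t

# Two independent simulated administrations for the same students
t1 = estimate_theta(simulate_responses(theta, b, rng), b)
t2 = estimate_theta(simulate_responses(theta, b, rng), b)

# Test-retest reliability = correlation of scores across administrations
print(f"simulated test-retest reliability: {np.corrcoef(t1, t2)[0, 1]:.3f}")

Correlating ability estimates across two administrations in this way isolates measurement error, since the students' true abilities are held fixed between the simulated "sessions."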

Validity

Content Validity

Content validity measures the degree to which assessment items represent the topic, construct, or behavior the test is designed to measure. Researchers use multiple methods to systematically determine how accurately a test measures the intended content and whether each item contributes accurately to the overall score. The A2i assessment items were authored using specific protocols developed by Dr. Carol Connor and reviewed by expert researchers. Experienced teachers, early literacy instructors, graduate students, and research scientists authored the items using these guidelines. Once drafted, feedback from teachers and rigorous statistical testing were used to fine-tune the items and ensure content validity. The items designed by this collaborative team fit general guidelines about age and developmental appropriateness, and they come from individuals representing a diverse range of experiences and backgrounds, minimizing potential bias. Importantly, expert opinion is only the starting point for maximizing content validity: we leverage a powerful branch of statistics that lets us measure key attributes of the items themselves. Regular analyses enable us to identify the relevant age and developmental status of each item, the degree to which it correlates with other items designed to measure reading ability, and whether students across demographic groups all answer the question in the same way. Thus, we use concrete, empirical guidelines to continually evaluate and improve our item bank.
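As an illustration of these empirical item analyses, the Python sketch below computes two classical item statistics (difficulty and corrected item-total correlation) plus a deliberately crude check of whether two demographic groups answer an item the same way after matching on total score. The function names and the matching approach are hypothetical simplifications; operational differential item functioning (DIF) analyses typically use Mantel-Haenszel or IRT-based methods instead:

import numpy as np

def item_statistics(responses):
    """Classical item analysis on a students-by-items 0/1 response matrix.

    Returns per-item difficulty (proportion correct) and the corrected
    item-total correlation (item vs. total score excluding that item)."""
    totals = responses.sum(axis=1)
    difficulty = responses.mean(axis=0)
    item_total_r = np.empty(responses.shape[1])
    for j in range(responses.shape[1]):
        rest = totals - responses[:, j]  # total score without item j
        item_total_r[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, item_total_r

def naive_dif_check(responses, group, item):
    """Crude DIF screen: compare an item's correct rate across two groups
    (coded 0/1 in `group`) after matching students on total score."""
    totals = responses.sum(axis=1)
    gaps = []
    for s in np.unique(totals):
        mask = totals == s
        a = responses[mask & (group == 0), item]
        b = responses[mask & (group == 1), item]
        if len(a) and len(b):
            gaps.append(a.mean() - b.mean())
    return float(np.mean(gaps))  # near 0 suggests comparable functioning

A low or negative item-total correlation flags an item that may not be measuring the same construct as the rest of the test, and a large matched-group gap flags an item that may function differently across demographic groups; both are signals for review rather than automatic removal.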
