A2i Assessments Technical Manual


Table of Contents

A2i Overview
Assessment Background
    A2i Online Assessments
    Use
    Rationale
    The Importance of a Computer Adaptive Approach to Assessment
Description of the Assessments
    Letters2Meaning (L2M)
        Duration
        Items
        Item Types
    Word Match Game (WMG)
        Duration
        Item Types
A2i Assessment Score Types
Reporting on Reliability & Validity
    Reliability
        Internal Consistency Reliability
        Test-Retest Reliability
    Validity
        Content Validity
        Concurrent Validity
        Construct Validity
        Predictive Validity
A2i Assessment Data Displays
Assessment Implementation Details
    Assessment Best Practices for Teachers
    Technology Requirements
References


Overview

A2i is a system for literacy instruction that significantly improves student reading outcomes. Meeting the Every Student Succeeds Act (ESSA) Tier 1 “strong” evidence requirements, A2i leverages technology-based and professional development components (online assessments, data-driven recommendations, and aligned instructional planning) to raise literacy outcomes in Grades K–3. The system uses data from brief online assessments that measure decoding, comprehension, and vocabulary to make individualized recommendations for each student. Specifically, A2i assessments provide educators with the number of minutes each student needs across meaning-focused or code-focused instruction as well as teacher-managed or child-managed instruction. A2i data is also used to create student groups, provide individualized instructional lesson plans, and monitor student growth. With robust, ongoing professional learning over a three-year implementation, A2i supports both student and teacher growth.

Theory of Change

ESSA encourages districts and schools to adopt evidence-based programs with a well-specified logic model explaining how the product or solution will likely improve outcomes. Figure 1 shows the inputs needed (e.g., professional development support, technology, and online assessments) to successfully launch A2i and documents the targeted activities (minute recommendations to better personalize and differentiate small-group instruction) that lead to short-term changes (improved performance on standardized reading assessments) and long-term changes in students (reading at or above grade level by the end of Grade 3).

This document provides technical insight into one of the components of A2i: the A2i online assessments.

[Figure 1 depicts the logic model as a flow. Inputs: Professional Development (professional learning communities or A2i literacy huddles, 30-min. grade-level meetings with LOS; in-class or individualized classroom coaching, 45-min. one-on-one sessions; on-demand support and flex days) and A2i Technology (online student assessment, software platform, instruction and grouping recommendations for individual students, lesson-planning tools). These inputs build teacher Knowledge and Belief (knowledge of A2i-recommended instructional strategies; belief that they can implement the program components), which drives Behavior (use of A2i system-generated instruction recommendations and other resources; use of differentiated small-group instruction), leading to Short-Term outcomes (improved reading performance after one year of exposure) and Long-Term outcomes (reading at or above grade level by the end of Grade 3).]

Figure 1. The A2i Theory of Change logic model.


Assessment Background

A2i Online Assessments

A2i's computer adaptive online assessments are designed to measure students' decoding skills, reading comprehension, and vocabulary. Decoding and reading comprehension are measured through the Letters2Meaning (L2M) assessment. Vocabulary and word knowledge are measured using the Word Match Game (WMG) assessment:

•  Letters2Meaning is taken on a computer and assesses a student's letter-name identification, letter-sound identification, word reading, spelling, and composition skills. Students begin the assessment on different tasks, depending on their grade level. The assessment items become more difficult as students answer correctly; if students answer incorrectly, the assessment selects easier questions. This process helps determine a student's true ability level as quickly as possible. Tasks on L2M include selecting the correct letter by name or sound, selecting narrated words from a word bank, selecting letters to build words, and selecting words to generate sentences.

•  The Word Match Game is a semantic matching task (e.g., find the two words that go together) that students take individually on a computer. It assesses a student's vocabulary and word knowledge. As students listen to the audio naming three displayed words (e.g., kitten, cat, tree), the words appear in separate boxes that flash on the screen as each word is spoken. The student then picks the two words that go together (e.g., kitten, cat). Students are given three training items and are then presented items based on whether the previous item was answered correctly or incorrectly. As with Letters2Meaning, a student who answers a question incorrectly receives an easier question, and a student who answers correctly gets a more difficult one.

Use

The A2i assessments were developed primarily as formative assessments of reading, measuring a student's ability in vocabulary, decoding, and comprehension. As formative assessments, their primary use case is to collect data at multiple timepoints throughout the school year. The National Council on Measurement in Education (NCME) explains that “formative assessment practices are those that provide teachers and students with information about learning as it develops—not just at the end of a project, unit, or year. The information is formative because it enables adjustments that deepen learning. Teachers use formative assessments to make adjustments to instruction, and students use the feedback from formative assessments to make revisions to their work and their approaches to it.”

The actionable information provided by A2i is delivered, using the assessment scores and patented algorithms, as customized “minutes of instruction” across the two key areas of literacy development: code-focused instruction and meaning-focused instruction. In addition to driving instructional recommendations, the A2i assessments serve as a valid and reliable way to track student growth across the year and measure progress. These assessments also allow educators to track growth longitudinally, across Grades K–3, as foundational skills are acquired and students learn to read. Both assessments are administered via a computer and take approximately five to 20 minutes.


Rationale

Teachers know that students come into the classroom with a wide variety of literacy skills. To account for these differences and effectively differentiate instruction, teachers need a clear picture of each student's literacy profile. This logic was confirmed by data when reading researcher Dr. Carol Connor and colleagues identified four types of instruction that, when combined with information about students' literacy profiles (i.e., assessment data), could be used to inform effective individualization for any kindergarten through third-grade student. Consistent with the Simple View of Reading (Gough & Tunmer, 1986), they divided the content of instruction into two types: meaning-focused and code-focused. They also observed that either instruction type could be delivered by the teacher (teacher-managed) or completed by one or more students (child-managed). By identifying both the content of instruction and who was managing the students' learning, all reading instruction could be sorted into one of four quadrants (Figure 2). Although this is a relatively simple framework, it can predict a student's reading growth trajectory with surprising accuracy simply from how much of each type of instruction the child received in these four areas across a school year. Although all children need code-focused, meaning-focused, child-managed, and teacher-managed instruction, the specific amount of time required for each student or group of students depends upon the students' decoding, vocabulary, and comprehension skills. A2i plays a critical role in determining how many minutes a student should receive of code-focused and meaning-focused instruction. To do this, the A2i assessments create a literacy profile for each student by measuring their decoding and comprehension ability (Letters2Meaning) and their vocabulary level (Word Match Game).

[Figure 2 is a two-by-two grid crossing the content of instruction (Code-Focused vs. Meaning-Focused) with who manages the learning (Teacher-Managed vs. Child-Managed).]

Figure 2. The four types of reading instruction as identified by Dr. Carol Connor and colleagues.


The scores produced by these assessments drive the four grade-level specific algorithms in A2i; a unique set of algorithms is used for each grade level (K–3) to account for developmental differences. The algorithms use a student's current grade, the time of the school year, and A2i assessment data (i.e., the student's literacy profile) to make precise recommendations about the number of instructional minutes that will be most effective for each child in each quadrant: code-focused, meaning-focused, child-managed, and teacher-managed instruction. In some cases within A2i, both the L2M and WMG scores are used to determine the instructional recommendation minutes, but for some grade levels and instruction types, data from just one assessment is enough. For example, in first grade, the teacher-managed, meaning-focused recommendation is calculated using both the L2M and WMG scores. However, by second grade, only the L2M score is needed to calculate the instructional minutes for teacher-managed, meaning-focused time. This is why it's critical that both L2M and WMG produce a quick and accurate score and that students take both assessments in each assessment window, so that both data points are up to date. The graph in Figure 3 demonstrates how the instructional minute recommendations (y-axis) vary for students entering first grade with different decoding/comprehension scores on the L2M assessment (x-axis).

[Figure 3 plots recommended minutes of each instruction type (Teacher-Managed, Code-Focused; Teacher-Managed, Meaning-Focused; Child-Managed, Code-Focused; Child-Managed, Meaning-Focused) against students' reading grade equivalent in the autumn.]

Figure 3. Recommended amounts of the four types of instruction for first-graders with reading skills varying from kindergarten to third-grade levels at the beginning of first grade.
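The patented A2i algorithms themselves are not published in this manual, but the shape of the computation can be illustrated. The sketch below is a hypothetical stand-in with invented coefficients, not the actual A2i model: it maps a grade level, the point in the school year, and a student's literacy profile (L2M GE and WMG AE scores) to minutes in each of the four quadrants, echoing the first-grade case above in which the teacher-managed, meaning-focused minutes draw on both scores.

```python
# Hypothetical illustration only -- NOT the patented A2i algorithms.
# It shows the shape of the computation: grade, point in the school
# year, and the student's literacy profile map to minutes in each of
# the four instructional quadrants. All coefficients are invented.
from dataclasses import dataclass

@dataclass
class LiteracyProfile:
    l2m_ge: float   # L2M decoding/comprehension Grade Equivalent (roughly -1 to 7)
    wmg_ae: float   # WMG vocabulary Age Equivalent, in years

def recommend_minutes(grade: int, month: float, profile: LiteracyProfile) -> dict:
    """Toy rule: students reading below grade level get more teacher-managed,
    code-focused time; stronger readers shift toward child-managed,
    meaning-focused time. month is 0.0-0.9 on the GE scale."""
    reading_gap = (grade + month) - profile.l2m_ge       # behind (+) or ahead (-)
    vocab_gap = (grade + 5.5 + month) - profile.wmg_ae   # expected age minus AE
    return {
        "teacher-managed/code-focused":    max(5, round(15 + 10 * reading_gap)),
        "teacher-managed/meaning-focused": max(5, round(12 + 4 * vocab_gap)),
        "child-managed/code-focused":      max(5, round(10 + 4 * reading_gap)),
        "child-managed/meaning-focused":   max(5, round(10 - 5 * reading_gap)),
    }

# A first-grader in autumn reading at a kindergarten level (L2M GE = 0.0):
print(recommend_minutes(1, 0.0, LiteracyProfile(l2m_ge=0.0, wmg_ae=6.0)))
```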


The Importance of a Computer Adaptive Approach to Assessment

The online A2i assessments were created under a larger umbrella of assessments called computer adaptive tests (often abbreviated as CATs). A computer adaptive test uses technology to deliver a customized set of questions to the test-taker. This approach is advantageous because it allows an assessment to be delivered to groups with very diverse characteristics and proficiency levels while still reporting reliable scores in a relatively short amount of time. In practice, using a CAT means that each student might see different questions on the same test, because the individual items (questions) are selected based on how the student answered previous questions, along with additional information such as their current grade level or previous performance on the same assessment. The specific test items on the A2i assessments are described in the next section.
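As a concrete illustration of this adaptive logic, the sketch below implements a simple step-up/step-down item selector: a correct answer moves the student to a harder item, an incorrect answer to an easier one. This is a generic CAT illustration with an invented item bank, not A2i's actual item-selection engine (which, as described above, also weighs grade level and prior performance).

```python
# Generic CAT illustration -- not A2i's actual item-selection engine.
# Items are ordered easiest to hardest; a correct answer steps the
# student up to a harder item, an incorrect answer steps down.
def run_adaptive_test(items_by_difficulty, answer_item, start_index, n_items=30):
    """items_by_difficulty: items sorted easiest first.
    answer_item: callback returning True if the student answers correctly.
    start_index: entry point (e.g., chosen from the student's grade level)."""
    index, administered = start_index, []
    for _ in range(n_items):
        item = items_by_difficulty[index]
        correct = answer_item(item)
        administered.append((item, correct))
        # Step up on success, down on failure, staying inside the bank.
        if correct:
            index = min(index + 1, len(items_by_difficulty) - 1)
        else:
            index = max(index - 1, 0)
    return administered

# Example: a simulated student who reliably answers items below difficulty 40.
item_bank = list(range(100))   # stand-in item bank, difficulties 0-99
result = run_adaptive_test(item_bank, lambda d: d < 40, start_index=20)
print(result[-5:])             # the test settles and oscillates near difficulty 40
```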


Description of the Assessments

There are two computer adaptive assessments directly embedded in the A2i platform: Letters2Meaning (decoding and reading comprehension) and the Word Match Game (vocabulary and word knowledge). Students see a unique combination of items every time they take the test and, because students only take the items needed to obtain an accurate score, the duration of each assessment is short. For effective A2i implementation, it is recommended that the assessments be taken every six to eight weeks, with a minimum of three administrations per year. The data generated helps educators plan lessons, monitor the effectiveness of their instruction, and identify students who are not making expected growth.

Letters2Meaning (L2M)

L2M Duration

For most students, L2M takes under 20 minutes to complete. Students with advanced reading skills are the most likely to have slightly longer assessment times. In general, the expected length of the assessment can be estimated based on a student's grade:

•  Kindergarten students take 8.13 minutes on average (and over 90 percent of students finish in under 13.36 minutes).
•  Grade 1 students take 8.96 minutes on average (and over 90 percent of students finish in under 14.38 minutes).
•  Grade 2 students take 9.74 minutes on average (and over 90 percent of students finish in under 16.58 minutes).
•  Grade 3 students take 12.2 minutes on average (and over 90 percent of students finish in under 22.35 minutes).

L2M Items

L2M assesses a student's letter/sound recognition, spelling, word reading, and basic comprehension ability. This assessment produces a single score (Grade Equivalent, or GE) representing the student's overall decoding/comprehension performance. The score can be interpreted as proficiency measured by grade and month; for example, a GE of 2.3 represents a score typical of a second-grade student three months into the school year. The multiple item types in L2M follow a developmental sequence that increases in complexity as students move further into the test.

•  Item Type 1: Identify Letter Names—students hear a letter name and must select the corresponding letter from a set of five options. Letters in each set are all capitalized or all lowercase.
•  Item Type 2: Identify Letter Sounds—students hear a letter sound and must select the letter that makes that sound from a set of five options. Letters in each set are all capitalized or all lowercase.


•  Item Type 3: Decode Words—students hear a word read aloud and must select the corresponding word on their screen from a bank of seven options.
•  Item Type 4: Spell (Encode) Words—students hear a word read aloud and must spell that word using a drag-and-drop interface and a pool of letters.
•  Item Type 5: Construct Sentences—students receive a set of words and associated punctuation and must organize them into a grammatically correct and meaningful sentence. Item difficulty increases as the number of words and sentence complexity increase.

L2M Item Types

The Letters2Meaning assessment is a decoding and comprehension measure with 653 items that make up five subtests (item types). The majority of the assessment includes narration, with the exception of the final section involving moving words into sentences. Students can proceed at their own pace and repeat the audio as needed. The first few questions under each item type are practice items that provide additional feedback; students move through these based on their performance. The item types below are outlined in their most common order. Students might not receive every type when they complete the assessment.

Item Type 1 asks students to match letters with letter names. The narration for this question is: Click on the letter “C.” Here the student has already selected their answer by clicking on one of the letter options. Some items will appear in the following layout, with capital and lowercase letters appearing in random order. Progress is tracked on the bar in the bottom left-hand corner.

Item Type 2 asks students to match letter sounds with letters. The narration for this question is: Click on the letter that makes the sound /d/, dog, /d/. Here the student has selected “Listen Again” to hear the narration again and has already selected their answer by clicking on one of the letter options.


Item Type 3 asks students to identify words. The narration for this question is: Click on the word “Mom.” Here the student has already selected their answer by clicking on one of the word options.

Item Type 4 asks students to spell a narrated word with the letters available. The narration for this question is: Rearrange the letters so that they spell the word “dog.” Item Type 4 allows students to click or drag letters onto the response line to submit their answers, and to rearrange letters they have selected before submitting the response. Students do not need to use all of the letters presented to correctly spell many of the words.

Item Type 5 asks students to combine words to make sentences. The initial narration for this question is: Make a sentence using all of the words. After the initial items, the student completes the remaining sentences without auditory prompting. Item Type 5 requires students to use all of the words. Once the student has determined the correct order, using all words, the “Next” arrow will appear.


Word Match Game (WMG)

WMG Duration

The WMG assessment is consistently shorter than L2M, and most students finish in under 10 minutes. Test duration is relatively consistent across the grades, with slightly longer times for younger students:

•  Kindergarten students take 5.22 minutes on average (and over 90 percent of students finish in under 8.02 minutes).
•  Grade 1 students take 4.37 minutes on average (and over 90 percent of students finish in under 6.43 minutes).
•  Grade 2 students take 4.30 minutes on average (and over 90 percent of students finish in under 6.39 minutes).
•  Grade 3 students take 4.62 minutes on average (and over 90 percent of students finish in under 7.23 minutes).

WMG Item Types

The WMG is designed to assess students' vocabulary and word knowledge gains. It is adaptive, administered online, and can be taken on a tablet, Chromebook, or laptop. To begin, students take three practice items where they receive feedback. During this portion of the test, three printed words appear on the screen and a box flashes around each word as it is read. Students then select the two words that go together. Once they have completed the training items, they click an arrow to proceed to the next item. Each item includes narration of each word to reduce the impact of reading ability on the assessment score, and students can repeat the narration as many times as needed before selecting a response. The WMG provides teachers with Grade Equivalent (GE) and Age Equivalent (AE) scores after a student completes an assessment.

As mentioned above, the A2i assessments serve two primary purposes. They drive the research-based algorithms in A2i that translate children's present language and literacy skills into recommended minutes of literacy instruction in four areas, and they provide a means for teachers to monitor children's growth in literacy skills over time.

The WMG is a vocabulary assessment with 241 items and one item type. The entire assessment includes narration, so students can proceed at their own pace and repeat the audio as needed. The first three items are practice items that provide additional feedback. This assessment also includes experimental items, so students might complete a few items that are not included in their final reported score. Items appear in the following layout, with three words appearing in random order for each question. Progress is tracked on the bar below, and the assessment ends on a dark screen that includes graphics of fireworks.


First, students listen to the words displayed and get directions about the assessment tasks. Words are read aloud and highlighted during the narration portion.

Following the directions, students can select the two “words that go together” or “Listen again” at the top.

A2i Assessment Score Types

The primary score produced by the L2M assessment is the GE score. A GE score indicates the grade level at which a student's performance matches the average performance of students at that grade level. It is designed to provide a clear way to communicate a student's reading or literacy abilities in relation to typical grade-level expectations. The specific grade-level expectations are calculated using results from a standardized literacy test collected from a representative sample of students; the GE test results are compared to a large sample of students from the relevant grade levels. For the A2i assessments, this was done using data from the Woodcock-Johnson III Tests of Achievement and the Northwest Evaluation Association Measure of Academic Progress (NWEA MAP) reading assessment. It is important to note that a GE score does not necessarily mean that a student has mastered all the skills and knowledge up to their GE level, but rather that their performance on the test aligns with what is typically seen at that grade level.
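As an illustration of how a GE score can be linked to norms, the sketch below interpolates an underlying scale score against a small norming table of grade-by-grade averages. The table values are invented for illustration; this is not A2i's actual norming data or procedure.

```python
# Illustration of norm-referenced GE linking -- the norm table below is
# invented for illustration and is NOT A2i's actual norming data.
# Each entry maps a Grade Equivalent to the average underlying scale
# score (e.g., a Rasch theta) of students at that point in school.
NORM_TABLE = [
    (0.0, -3.0),   # start of kindergarten
    (1.0, -1.5),   # start of Grade 1
    (2.0, -0.2),   # start of Grade 2
    (3.0,  0.8),   # start of Grade 3
    (3.9,  1.6),   # end of Grade 3
]

def scale_score_to_ge(theta: float) -> float:
    """Linearly interpolate a scale score against the norm table."""
    if theta <= NORM_TABLE[0][1]:
        return NORM_TABLE[0][0]
    if theta >= NORM_TABLE[-1][1]:
        return NORM_TABLE[-1][0]
    for (ge_lo, th_lo), (ge_hi, th_hi) in zip(NORM_TABLE, NORM_TABLE[1:]):
        if th_lo <= theta <= th_hi:
            frac = (theta - th_lo) / (th_hi - th_lo)
            return round(ge_lo + frac * (ge_hi - ge_lo), 1)

print(scale_score_to_ge(0.1))   # 2.3 -> typical of a second-grader in month 3
```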


The WMG assessment produces a GE and an AE score. Similar to the GE, an AE score indicates the age level at which a student's performance matches the average performance of students at that age. It is most often used to communicate a student's vocabulary or literacy abilities in relation to typical age-based expectations. We recommend focusing on the AE score when reporting on student achievement and growth on the WMG, because this score type reflects how the skill develops: vocabulary development begins at birth and is strongly correlated with a student's age. Because of this developmental connection between language, vocabulary levels, and age, the WMG provides an AE score.

Understanding the GE and AE score types

There are a number of other reasons why A2i uses the GE and AE scores for reporting assessment results. First is a logistical consideration: mathematically, the GE and AE values are needed to accurately run the A2i recommendation algorithms. Think of it like this: the A2i recommendation algorithms are essentially complex algebraic equations, and they can only operate if the correct data for each variable is entered. If an equation was created to operate with numerical inputs from approximately 0 to 7 (like the A2i GE scores) and it receives a score of 3.2 as input, it will operate correctly. In contrast, if you used standardized scores, which usually range from about 80 to 120, these scores could technically still be entered into the algorithm equations, but the output would be uninterpretable and A2i would not produce accurate instructional minute recommendations. The A2i algorithm equations were designed to work specifically with GE and AE scores because student performance was originally captured using the GE and AE values calculated from the Woodcock-Johnson III Tests of Achievement. Once the assessment portion of A2i was moved online and made automatic, the A2i assessments were designed to generate the same score types.

A secondary benefit of GE and AE scores is interpretability. They are easy for teachers to understand and connect back to performance level. For example, a third-grade teacher who has a student beginning the school year at a second-grade reading level (GE = 2.0) can immediately recognize that the student is performing about a year behind what would be considered on level for third grade.

Finally, the GE and AE scores generated by the A2i assessments were created using national norms, meaning that the scores produced by the assessments align to a large and representative body of students. The GE and AE scores on the A2i assessments don't just represent how an individual student is doing relative to peers in their class or school; they provide information on how a student's performance compares to students across the United States.


In addition to overall performance, the GE and AE scores provided by A2i offer a useful way to measure student growth. In part, this is because the CAT framework allows all students across Grades K–3 to see items from the same test bank. This means that the scores from L2M and WMG are on the same scales and can be compared across students and grades. Essentially, the GEs and AEs form a continuous sliding scale on which any K–3 student's progress and growth can be accurately tracked. We have also confirmed that the “sliding scales” of GE and AE scores measured by the A2i assessments have equal intervals, meaning each “step” on the GE and AE scale represents the same amount of growth relative to one another. It's important to note that not all GE and AE scores share these characteristics, but once the correct conditions are established, scores can be added, subtracted, or averaged and used in data visualizations to illustrate growth. For a more detailed explanation of the GE growth scale, please view the Predictive Validity section.

Figure 4. An Equal Interval Scale (left) vs an Unequal Interval Scale (right).
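Because the scales are equal interval, growth arithmetic is straightforward. The following is a minimal illustration; all names and GE values are invented.

```python
# Equal-interval scores support ordinary arithmetic on growth.
# All names and GE values below are invented for illustration.
fall_ge   = {"Ana": 2.0, "Ben": 2.7, "Cam": 1.4}
spring_ge = {"Ana": 2.8, "Ben": 3.5, "Cam": 1.8}

# Subtracting two GE scores gives growth in years (0.1 = one month).
growth = {name: round(spring_ge[name] - fall_ge[name], 1) for name in fall_ge}
print(growth)   # {'Ana': 0.8, 'Ben': 0.8, 'Cam': 0.4}

# Ana and Ben both grew 0.8 GE (eight months) and are directly
# comparable even though they started at different levels; growth can
# also be averaged across a class because the intervals are equal.
print(round(sum(growth.values()) / len(growth), 2))   # class average: 0.67
```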

Here are a few specific examples of how to understand and apply the GE score:

On Level
•  If a student is just entering first grade and has a GE score of 1.0, the student is reading at their expected grade level, based on national norms.
•  If a first-grade student has received five months of instruction and has a GE score of 1.5, the student is reading at their expected grade level, based on national norms. If the student remains on track, they should end the first-grade year performing at a GE of 1.9, indicating that they have the reading ability of a first-grader who has received nine months of effective instruction.

Below Level
•  If a student is just entering first grade and has a GE score of 0.0, the student is reading one year behind their expected grade level, based on national norms.
•  If a student has a negative GE score, they are reading at a PreK level, based on national norms. For example, a GE of -0.4 means that a student is reading at the level of a student who will enter kindergarten in four months' time.


Above Level
•  If a student is just entering first grade and has a GE score of 1.6, the student is reading above their expected grade level, based on national norms.

As mentioned above, in addition to the GE score, the WMG provides an AE score. The AE score can be interpreted in a similar way to the GE score: a student with an AE vocabulary score of 6.5 has the vocabulary of a child six and one-half years old, based on national norms.

A2i was created to support instruction for all students receiving English and language arts instruction in the general education classroom. This is why each student receives a unique, customized value, called the target outcome, to represent their assessment goal for the end of each school year. The target outcome in A2i is determined by the student's first assessment score each school year. Based on this starting point, students are given a target goal for the end of the school year (over the estimated nine months of instruction). If a student begins the year at or below grade level in the fall, the target outcome reflects a goal of reaching grade level. If a student enters the year above level, the target is set nine months beyond their starting point, with the A2i recommendations also reflecting the instructional needs of this advanced student.
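Read as a rule, the target outcome described above can be sketched as follows. This is a plausible reading of the text, not A2i's exact computation.

```python
# Sketch of the target-outcome rule as described above -- a plausible
# reading of the text, not A2i's exact computation. K is grade 0.
NINE_MONTHS = 0.9   # nine months of instruction on the GE scale

def target_outcome(grade: int, first_ge_of_year: float) -> float:
    """Return the end-of-year GE goal given the first score of the year."""
    if first_ge_of_year <= grade:
        # At or below level: the goal is reaching grade level by year end.
        return grade + NINE_MONTHS
    # Above level: the goal is nine months of growth from the starting point.
    return round(first_ge_of_year + NINE_MONTHS, 1)

print(target_outcome(1, 0.4))   # below-level first-grader -> 1.9
print(target_outcome(1, 1.6))   # above-level first-grader -> 2.5
```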


Reporting on Reliability & Validity

In the context of assessments, reliability refers to the consistency and dependability of the results obtained from a measurement tool or evaluation. It ensures that the outcomes of an assessment are stable and repeatable, which gives us confidence in the accuracy and precision of the results. Reliability is also important because it allows us to make meaningful interpretations based on the assessment results. Validity refers to whether an assessment measures what it intends to measure. Without reliability, it is challenging to establish validity, because inconsistent or unreliable measurements might not accurately capture the construct being assessed. Below, we discuss the reliability and validity of the A2i assessments using several different approaches.

Reliability

Internal Consistency Reliability

To be useful, assessment results should be reliable—stable, accurate, and dependable. A test's accuracy is estimated by a number called the standard error of measurement (SEM). Reliability can be evaluated for computer adaptive tests in a number of ways. One straightforward approach is to evaluate the relationship between a student's “true ability” (the student's exact ability level at the time of testing) and their actual score on a given assessment (Kim, 2012). However, it is impossible to measure a real person's exact proficiency level without any error, so this approach is only possible when student test-takers are simulated (since the simulation defines the “true ability”). Using this type of assessment simulator, the Letters2Meaning test was found to have a high squared-correlation reliability of 0.904 (values range from 0 to 1). For reference, a score of 1.0 represents perfect reliability; for most practical purposes, values higher than 0.7 are considered reliable. Conversely, scores closer to 0.0 indicate lower reliability, and one should be cautious about interpreting results from tests with very low reliability.

Another approach to calculating reliability for this type of assessment is to use the observed relationship between student scores and assessment-specific error measurements. This approach yields reliability estimates of 0.93 to 0.94, depending on the specific methodology chosen.

The Word Match Game (WMG) reliability is lower than for L2M: based on the same ratio of scores to error measurements, the reliability for the WMG is currently 0.32. The current reliability estimates are primarily driven by the accuracy of the item-level difficulty estimation. The L2M items were recalibrated in 2020 using a large dataset (n > 5,000), and this update, combined with improved scoring model parameters, raised the L2M reliability to the values reported above. This work has not yet been completed for the WMG, and as such, a lower reliability is currently reflected for this assessment. The current WMG items were originally calibrated on a smaller dataset (n < 1,000) for a study conducted over the 2015–2016 school year. Psychometric analysis of the WMG did reveal that the total test information was greater than 2.0 throughout the range of Rasch theta scores, suggesting that computer adaptive administration of the WMG can produce reliable individual scores throughout the full range of student abilities. (Information values consistently greater than 2.0 correspond to a reliability greater than 0.7.)
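The squared-correlation simulation approach described above can be illustrated with a small Rasch-model example: generate simulees with known abilities, simulate their item responses, and compute the squared correlation between true ability and the resulting score. This is a generic sketch of the method, not the actual L2M simulator or its reported values.

```python
# Generic sketch of simulation-based reliability -- not the actual L2M
# simulator. Simulees with known "true ability" answer Rasch items;
# reliability is estimated as the squared correlation between true
# ability and the resulting test score.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items = 5000, 40
true_theta = rng.normal(0.0, 1.0, n_students)    # known true abilities
item_b = np.linspace(-2.0, 2.0, n_items)         # item difficulties

# Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))
p_correct = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - item_b[None, :])))
responses = rng.random((n_students, n_items)) < p_correct

# Use the raw sum score as a simple stand-in for the CAT ability estimate.
scores = responses.sum(axis=1)
r = np.corrcoef(true_theta, scores)[0, 1]
print(f"squared-correlation reliability: {r ** 2:.3f}")   # roughly 0.9 here
```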


Test-Retest Reliability

Test-retest reliability is a measure of the consistency of a test or assessment: if an assessment has high test-retest reliability, a student who takes the test multiple times within a short period will receive relatively consistent scores. Test-retest reliability can be a helpful metric for evaluating whether differences in scores reflect real differences between individuals rather than measurement error. Test-retest reliability assumes a relatively unchanging ability or multiple tests taken in close temporal proximity (i.e., back to back in a short amount of time), but neither of these assumptions or testing protocols applies to Letters2Meaning. L2M is an assessment designed to capture reading ability, a trait that changes quickly in K–3 students, and it is used as a classroom tool by teachers to understand longer-term growth patterns among students (so it is very rare that a student will take L2M more frequently than about every six weeks). In spite of these limitations, we can simulate student response behavior using an advanced class of statistical tools called item response theory. In essence, these tools let us approximate how a student of a given reading ability might perform on our assessment. Because L2M is a computer adaptive test, the specific questions and their order differ from student to student and from test to test. By incorporating the same logic and rules used in the real-world assessments, we can simulate entire tests, which lets us evaluate how a student of a given grade and reading level would perform on L2M. Using 1,000 iterations of 58,000 simulated students across Grades K–3, we found that the test-retest reliability of L2M is approximately 0.965 (on a scale of 0 to 1), indicating consistent scores for students and that most variation in scores arises from true differences in student reading ability.
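In the same spirit, test-retest reliability can be approximated by simulating two independent administrations for each simulee and correlating the two sets of scores. Again, this is a generic Rasch sketch, not the L2M simulation itself.

```python
# Generic sketch of simulated test-retest reliability -- not the L2M
# simulator. Each simulee "takes" two independent tests of the same
# items; the correlation between the two score sets estimates
# test-retest reliability.
import numpy as np

rng = np.random.default_rng(1)
n_students, n_items = 5000, 40
theta = rng.normal(0.0, 1.0, n_students)
item_b = np.linspace(-2.0, 2.0, n_items)
p_correct = 1.0 / (1.0 + np.exp(-(theta[:, None] - item_b[None, :])))

def simulate_scores():
    """One full administration: draw item responses, return sum scores."""
    return (rng.random((n_students, n_items)) < p_correct).sum(axis=1)

first, second = simulate_scores(), simulate_scores()
print(f"test-retest correlation: {np.corrcoef(first, second)[0, 1]:.3f}")
```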

Validity

Content Validity

Content validity measures the degree to which assessment items represent the topic, construct, or behavior the test is designed to measure. Researchers use multiple methods to systematically determine how accurately a test is measuring the intended content and whether each item is accurately contributing to the overall score. The A2i assessment items were authored using specific protocols developed by Dr. Carol Connor and reviewed by expert researchers. Experienced teachers, early literacy instructors, graduate students, and research scientists authored the items using these guidelines. Once drafted, feedback from teachers and rigorous statistical testing was used to fine-tune the items and ensure content validity. The items designed by this collaborative team fit general guidelines about age and developmental appropriateness, and come from individuals representing a diverse swath of experiences and backgrounds, minimizing potential bias. Importantly, we leverage a powerful branch of statistics that lets us measure key attributes of the items themselves, so that expert opinion is only the starting point for maximizing content validity. Regular analyses enable us to identify the relevant age and developmental status of each item, the degree to which it correlates with other items designed to measure reading ability, and whether students across demographic groups all answer the question in the same way. Thus, we use concrete, empirical guidelines to continually evaluate and improve our item bank.


Concurrent Validity

Concurrent validity measures the degree to which scores from a new test compare to those from a well-established test. This requires both variables to be measured at approximately the same time (i.e., concurrently) to prevent external factors from affecting the variables of interest. Because the L2M assessment was specifically designed to capture a student's letter-naming, decoding, encoding, and comprehension ability, the components of literacy common to many reading-focused assessments can be used to determine concurrent validity. Specifically, the degree to which a particular metric (such as the GE score from an L2M assessment) actually captures the underlying variable of interest (in this case, literacy) is referred to as validity. Positive correlations between different assessments demonstrate that they are measuring the same underlying characteristic and provide evidence of concurrent validity. Correlation coefficients at and above 0.70 are commonly taken to indicate strong relationships (Moore et al., 2013).

Data collected from 3,954 third-grade students at the end of the 2020–21 school year was used to evaluate correlations between the A2i L2M assessment and eight third-party literacy assessments. This analysis uncovered robust correlations* between L2M and each of these external assessments. These correlations indicate that L2M assessments capture important information about student literacy and demonstrate high concurrent validity for L2M across varied conditions, locations, and populations.

Figure 5. Concurrent validity of the L2M assessment demonstrated by correlation with third-party assessments. L2M Grade Equivalent scores are significantly correlated with the scores produced by third-party literacy assessments, and L2M captures this information more efficiently (i.e., in less time than any other assessment listed, except STAR Reading, which also takes approximately 15 minutes).

[Chart: Correlations Between A2i and Third-Party Assessments, plotted by assessment.]

*These correlations were conducted using student-specific growth models to predict student performance on L2M at the time of the third-party assessments. These growth models pool information from our large database of student assessments and effectively smooth out much of the unavoidable noise that occurs from assessment to assessment. However, these growth models have not yet been incorporated into the A2i system, so these correlations represent best case concurrent validity.


Construct Validity

Construct validity refers to the degree to which an assessment or measurement tool measures the attribute (i.e., construct) it is intended to measure. In the case of A2i's L2M assessment, which is designed to measure student reading ability, correlations with other reading assessments can provide evidence of construct validity. In particular, the convergence between L2M scores and a number of independently administered and validated third-party assessments (see above) provides strong evidence that L2M is measuring the same underlying construct—that is, reading ability. Additionally, because L2M is designed to capture information about student reading ability—a complex trait that includes comprehension, fluency, vocabulary knowledge, etc.—and is correlated with a variety of additional reading assessments, there is evidence that it captures this construct's complexity effectively.

Predictive Validity

Predictive validity describes the extent to which an assessment can predict a student's future performance. Though individual student growth trajectories frequently differ from one another, the change in each student's performance over time allows us to make inferences about the rate at which they are growing—an inference that is sound only for tests with equal-interval measurements. A2i's L2M assessment produces scores that are meant to measure a student's learning progress and are easy to understand. These GE scores are designed to be on a scale where a difference of one point represents one year of learning. As can be seen in the plot below, the growth trajectory of scores from kindergarten to third grade is on an equal-interval scale. This attribute of the assessment means that student growth is directly comparable. For example, a student who grew from a beginning-of-year GE of 2.0 to an end-of-year GE of 2.8 (i.e., eight months of growth) would be comparable to a student who grew from 2.7 to 3.5 (also eight months of growth). However, for students reading at PreK levels, progress is faster and follows a curved trajectory.

Figure 6. Equal interval scoring for L2M. The L2M assessment provides scores that are approximately equal interval for the range of reading abilities between the start of kindergarten (GE = 0.0) and the end of the third grade (GE = 3.9). This attribute enables predictive validity, such that measuring student growth rates will enable users to effectively predict future student success.

[Plot: L2M theta growth trajectory against Grade Equivalent score.]


A2i Assessment Data Displays

There are four main data visualizations on the teacher dashboard. One reflects a user's own usage of the platform, and the other three display aggregated visualizations of the assessment data for a classroom. The teacher also has access to individual student data on the Progress Monitor page. A description of each of these data displays is included below, along with a screenshot of the actual visualization.

Teacher Usage Report

This display reports a user's weekly usage time within A2i. The graph can be used to align login patterns with the ideal teacher usage patterns seen in our research studies, where software usage averaged about 10–20 minutes per week on A2i. Analysis of A2i teacher usage has revealed that users who consistently spent time on the A2i platform had better student outcomes (Connor et al., 2013).

Average Growth Rate

The second dashboard display provides a visualization of the classroom-level growth rate. This panel displays the average performance of students on the L2M assessment across the school year, as well as a Research Classrooms comparison line as a reference point. A2i was designed to support the needs of all students, regardless of where they begin the school year in reading ability, and, with the right amount and type of instruction, to ensure that all students make gains. This graph allows a teacher to track that progress at a classroom level before focusing on specific A2i groups or individual students.


Assessment Tracker

The Assessment Tracker displays how many students within a class have been assessed and when, allowing for easy confirmation that an entire class has new data in A2i. The count can be toggled between a daily, weekly, or monthly display using the buttons on the top-left side of the panel. A list of the students who have not tested is also available in a pop-up window.

Students on Track

This feature provides a quick and easy-to-read visual display that identifies the percentage of students in a classroom who are on track to meet their A2i target outcome by the end of the year. This donut chart is automatically updated after each assessment round. Each label represents the following:

•  On track: These students are on pace to end the year reading on grade level or to make nine months of growth (if they began the year reading above level). The end-of-year goal for each student is calculated by A2i and displayed as the student's target outcome.
•  Approaching target: These students are up to two months behind the growth required to reach their target outcome.
•  Not on track: These students are more than two months behind their growth goal and are not yet on track to reach their target outcome.
•  Not enough scores: These students have only taken the L2M assessment once. At least two L2M assessment scores are required to calculate a student's rate of growth.
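Read literally, the label rules above amount to a small classification function. The sketch below is one plausible implementation; the names and the 0.2 GE threshold for "two months" are our reading of the text, not A2i's internal code.

```python
# One plausible reading of the on-track labels above -- not A2i's
# internal code. Two months on the GE scale is assumed to be 0.2.
TWO_MONTHS = 0.2

def on_track_label(l2m_scores: list[float], expected_ge_now: float) -> str:
    """l2m_scores: the student's L2M GE scores so far this year.
    expected_ge_now: the GE the student should have reached by now to
    stay on pace for their target outcome."""
    if len(l2m_scores) < 2:
        return "Not enough scores"           # growth needs two data points
    gap = expected_ge_now - l2m_scores[-1]   # how far behind pace (+)
    if gap <= 0:
        return "On track"
    if gap <= TWO_MONTHS:
        return "Approaching target"
    return "Not on track"

print(on_track_label([1.0], 1.3))        # Not enough scores
print(on_track_label([1.0, 1.4], 1.3))   # On track
print(on_track_label([1.0, 1.2], 1.5))   # Not on track (0.3 behind pace)
```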


Progress Monitor Page

The Progress Monitor page allows teachers to track student growth and progress toward their target outcomes and end-of-year goals. It is organized by the student groups that a teacher is using and displays each student's L2M scores from their assessments. These scores are color-coded to reflect student progress.

Scores initially appear with no color code when a student has not yet completed two assessment rounds, since progress (as measured by change over time) can't be tracked until then.

A teacher can hover over a student's name to see their most recent assessment score and end-of-year target outcome, which reflects nine months' worth of growth for a student (or more if they come in below grade level). Hovering over each individual assessment score allows the teacher to compare it with the target outcome for that month to see whether a student is on track. On the right side of this page are two GE growth columns where individual student growth can be seen at a glance. These columns reflect student progress so far this year and show how much students are projected to grow throughout the academic year at their current trajectory. Data from these columns helps a teacher compare and analyze growth, celebrate successes, and focus planning and instructional decisions.

Back to Top

Scholastic.com/A2i

Page 22

TECHNICAL MANUAL

Assessment Implementation Details

Assessment Best Practices for Teachers

It's important to prepare students to take the assessment, especially when they are new to the procedure or have not taken assessments on a computer. Certain steps can be taken before, during, and after the assessment windows to encourage students to try their best and to limit the distractions that can affect scores.

Before the assessment window, it is best to align expectations for all students. Modeling the testing procedures can be very helpful, including demonstrating how to access the assessment online and how to set up and use any technology (including headphones, mouse, or touchscreens). It can also be helpful to use the video directions associated with each assessment to walk students through the audio they should expect to hear and to review the item types they might encounter while taking the assessment. Though the actual L2M and WMG assessment items are only in English, there are student-facing instructional videos in both English and Spanish. These videos describe how to take the test and review the question prompts as well as the item format students will see when they log in.

During the assessment window, setting up assessments in the classroom as a small-group center is typically the most effective approach for completing testing. Students can complete one assessment at a time, typically allowing an entire class to complete both the L2M and WMG over two or three days. Teachers can review scores in real time as students complete the assessments and ensure that data is populating as expected.

After the assessment window, once all students have completed the L2M and WMG assessments, the data visualizations in A2i populate at the classroom, school, and district levels. This is a good time to congratulate students on completing the assessments and to share data with parents/caregivers, administrators, or the students themselves. In addition to updating the data reports and growth information, the new data will also influence the A2i instructional recommendations and student grouping. Teachers should set aside planning time after each assessment round to update their literacy block to account for these changing needs as students grow.


