Teaching Item Bank
Multiple-choice items with answer keys and rationales.
20 items.
Random assignment to conditions primarily protects which type of validity?
- External validity
- Internal validity
- Construct validity
- Statistical conclusion validity
Random assignment equates groups in expectation, so pre-existing differences are unlikely to be systematic confounds; this supports causal (internal-validity) claims.
A researcher studies whether income predicts well-being using survey data with no manipulation. This is best described as:
- A true experiment
- An observational (correlational) design
- A randomized controlled trial
- A factorial experiment
With no manipulation or random assignment, the design is observational and supports association but not strong causal claims.
Selecting participants because they scored extremely low and retesting them later risks which threat?
- Maturation
- Regression to the mean
- Instrumentation
- Demand characteristics
Extreme scores tend to move toward the mean on retest regardless of any intervention.
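As a rough illustration of this rationale, the sketch below simulates a test and a retest with no intervention in between, then selects the lowest scorers on the first test. The score model (a stable true score plus independent error on each occasion), the means and standard deviations, and the 10% selection cut are all assumptions chosen for the example.

```python
import random
import statistics

def simulate_retest(n=10_000, true_sd=10.0, error_sd=10.0, cutoff_pct=0.10, seed=0):
    """Return (mean test-1 score, mean test-2 score) for the lowest scorers on test 1.

    Assumed model: observed score = stable true score + independent random error.
    Nothing happens between the two occasions.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        true_score = rng.gauss(50.0, true_sd)
        test1 = true_score + rng.gauss(0.0, error_sd)
        test2 = true_score + rng.gauss(0.0, error_sd)
        people.append((test1, test2))

    # Select participants because they scored extremely low on the first test.
    people.sort(key=lambda pair: pair[0])
    selected = people[: int(n * cutoff_pct)]

    mean_test1 = statistics.fmean(t1 for t1, _ in selected)
    mean_test2 = statistics.fmean(t2 for _, t2 in selected)
    return mean_test1, mean_test2

low_mean_test1, low_mean_test2 = simulate_retest()
print(f"selected group, test 1: {low_mean_test1:.1f}")
print(f"selected group, test 2: {low_mean_test2:.1f}")
```

Under these assumptions the selected group's retest mean moves back toward the population mean of 50 even though no treatment was given, which is why an untreated comparison group is needed to separate real change from regression.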
The degree to which findings generalize to other people and settings is:
- Internal validity
- External validity
- Face validity
- Reliability
External validity concerns generalization across populations, settings, and times.
Differential attrition across conditions is a threat mainly because it can:
- Increase statistical power
- Make groups non-equivalent by the end of the study
- Improve construct validity
- Eliminate confounds
If dropout differs by condition, the groups are no longer equivalent, reintroducing confounding.
A p-value of .03 means:
- There is a 3% chance the null hypothesis is true
- Assuming the null is true, data this extreme (or more) occur 3% of the time
- The effect is large
- The result will replicate 97% of the time
A p-value is computed assuming the null is true; it is the probability of data at least as extreme as observed.
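A small sketch of that definition, assuming a test statistic that follows a standard normal distribution when the null is true (a z-test). The function name and the example value z = 2.17 are illustrative choices, not part of the item.

```python
from statistics import NormalDist

def two_sided_p_from_z(z):
    """Probability, under a standard normal null, of a statistic at least as extreme as z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

p_value = two_sided_p_from_z(2.17)
print(f"p = {p_value:.3f}")
```

With z = 2.17 the result is about .03. The calculation conditions on the null throughout, so it says nothing directly about the probability that the null itself is true.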
Failing to reject a false null hypothesis is a:
- Type I error
- Type II error
- Sampling error
- Measurement error
A Type II error is a false negative: failing to detect a true effect. Its probability is denoted beta.
Statistical power is defined as:
- 1 − alpha
- 1 − beta
- The p-value
- The effect size
Power = 1 − beta, the probability of detecting a true effect of a given size.
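To make the definition concrete, here is a minimal sketch of power for a two-sided, two-sample comparison using a normal (z) approximation with equal group sizes. The approximation, the function name, and the example values (d = 0.5, 64 per group) are assumptions; exact t-based power differs slightly.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample_z(d, n_per_group, alpha=0.05):
    """Approximate power to detect standardized effect d with a two-sided z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the z statistic when the true effect is d.
    shift = d * sqrt(n_per_group / 2.0)
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

power = power_two_sample_z(d=0.5, n_per_group=64)
beta = 1.0 - power
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

Under this approximation, a medium effect with 64 participants per group gives power of roughly .80, and setting d to 0 returns alpha, the false-positive rate.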
Which Cohen's d is conventionally considered a 'medium' effect?
- 0.20
- 0.50
- 0.80
- 1.00
Cohen's conventions: d ≈ .2 small, .5 medium, .8 large.
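For reference, a short sketch of how d is commonly computed for two independent groups, using the pooled standard deviation. The scores are made-up illustration data, not results from any study.

```python
from math import sqrt
from statistics import fmean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (fmean(group_a) - fmean(group_b)) / sqrt(pooled_var)

# Hypothetical scores for illustration only.
treatment = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]
d = cohens_d(treatment, control)
print(f"d = {d:.2f}")
```

Here the means differ by 2 points and the pooled SD is about 3.16, so d is about 0.63, between the conventional medium and large benchmarks.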
A 95% confidence interval is best interpreted as:
- A 95% probability the true value lies in this specific interval
- A range of plausible parameter values; 95% of such intervals capture the parameter over repeated sampling
- The range containing 95% of the data
- The standard error times 95
The 95% refers to the long-run capture rate of the procedure across repeated samples.
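The long-run reading can be checked with a small simulation. This sketch assumes normally distributed scores with a known standard deviation, so it uses a z-based interval for the mean; the population values, sample size, and number of repetitions are arbitrary choices for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage_rate(true_mean=100.0, sd=15.0, n=25, n_samples=2000, level=0.95, seed=1):
    """Fraction of repeated-sample intervals that capture the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sd / sqrt(n)  # known-sigma interval for the mean
    hits = 0
    for _ in range(n_samples):
        sample_mean = fmean(rng.gauss(true_mean, sd) for _ in range(n))
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_samples

rate = coverage_rate()
print(f"proportion of intervals capturing the parameter: {rate:.3f}")
```

Each individual interval either contains the fixed parameter or does not; the capture rate close to .95 belongs to the procedure across many samples.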
Cronbach's alpha is an index of:
- Criterion validity
- Internal-consistency reliability
- Test difficulty
- External validity
Alpha summarizes how consistently a set of items measures the same thing.
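A minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The response matrix is invented for illustration.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one row per respondent, each row a list of k item scores."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))  # transpose to one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

# Hypothetical responses: 5 respondents x 3 items.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

For these made-up data alpha is about .94, because respondents who score high on one item tend to score high on the others.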
Reliability and validity are related such that:
- A valid test must be reliable
- A reliable test must be valid
- They are unrelated
- Validity caps reliability
Reliability sets a ceiling on validity; a measure cannot be valid for a purpose if it is not reliable.
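One common way to express the ceiling, under classical test theory assumptions, is that the observed correlation between a measure and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below only evaluates that bound; the function name and example reliability are illustrative.

```python
from math import sqrt

def max_validity(reliability_x, reliability_y=1.0):
    """Upper bound on the observed x-y correlation under classical test theory."""
    return sqrt(reliability_x * reliability_y)

ceiling = max_validity(0.49)
print(f"maximum attainable validity coefficient: {ceiling:.2f}")
```

A measure with reliability .49 could correlate at most about .70 with even a perfectly reliable criterion, and a reliability of zero leaves no room for validity at all.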
Convergent and discriminant evidence are forms of:
- Content validity
- Construct validity
- Inter-rater reliability
- Face validity
Convergent evidence (the measure relates to measures of similar constructs) and discriminant evidence (it is largely unrelated to measures of different constructs) both support construct validity.
An item-total correlation near zero suggests the item:
- Is too difficult
- Does not cohere with the rest of the scale
- Has high discrimination
- Is perfectly reliable
Low item-total correlation indicates the item is not measuring the same construct as the scale.
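As an illustration, the sketch below computes a corrected item-total correlation, that is, the correlation between one item and the sum of the remaining items. The "corrected" variant and the response matrix are assumptions for the example; the fourth item was constructed to be unrelated to the others.

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_total(item_scores, item_index):
    """Correlate one item with the total of all other items."""
    item = [row[item_index] for row in item_scores]
    rest = [sum(row) - row[item_index] for row in item_scores]
    return pearson_r(item, rest)

# Hypothetical responses: 5 respondents x 4 items; the last item does not track the others.
responses = [
    [1, 2, 1, 3],
    [2, 2, 3, 4],
    [3, 4, 3, 1],
    [4, 4, 5, 4],
    [5, 5, 4, 3],
]
r_first = corrected_item_total(responses, 0)
r_last = corrected_item_total(responses, 3)
print(f"item 1: {r_first:.2f}, item 4: {r_last:.2f}")
```

The first item correlates strongly with the rest of the scale, while the last sits near zero and would be a candidate for revision or removal.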
A T-score has a mean and standard deviation of:
- 0 and 1
- 50 and 10
- 100 and 15
- 5 and 2
T-scores are standardized to M = 50, SD = 10.
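The conversion is a linear rescaling of a z-score: T = 50 + 10z. A minimal sketch, with a hypothetical norm mean and SD supplied by the caller:

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T-score (M = 50, SD = 10) via its z-score."""
    z = (raw - norm_mean) / norm_sd
    return 50.0 + 10.0 * z

# A raw score one standard deviation above the norm mean.
print(t_score(115, norm_mean=100, norm_sd=15))  # 60.0
```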
A score above a depression screener's cut-off indicates:
- A confirmed diagnosis
- That a fuller clinical evaluation may be warranted
- Treatment is unnecessary
- The person is malingering
Screeners flag the possible need for further assessment; they do not diagnose.
Specificity of a screening test refers to:
- The proportion of true cases correctly identified
- The proportion of non-cases correctly identified
- The total accuracy
- The base rate of the disorder
Specificity is the true-negative rate: the proportion of non-cases correctly screened out.
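A short sketch contrasting specificity with sensitivity, computed from the four cells of a screening table. The counts are invented for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of true cases the test correctly identifies."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of non-cases the test correctly screens out."""
    return true_neg / (true_neg + false_pos)

# Hypothetical screening results: 50 true cases and 200 non-cases.
sens = sensitivity(true_pos=40, false_neg=10)
spec = specificity(true_neg=180, false_pos=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these counts sensitivity is .80 and specificity is .90; neither depends on the base rate, which is why the base-rate option in the item is a distractor.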
Combining self-report with a performance task is an example of:
- Mono-method assessment
- Multi-method assessment
- Criterion contamination
- Norm referencing
Using different methods reduces shared-method bias, so conclusions are stronger when the methods converge.
Reporting a confidence range around a score rather than a single number reflects attention to:
- Measurement error
- Random assignment
- Demand characteristics
- Publication bias
Every observed score contains measurement error; a confidence range communicates that uncertainty.
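One conventional way to build such a range uses the standard error of measurement from classical test theory, SEM = SD × sqrt(1 − reliability). The sketch below centers the band on the observed score, which is an assumption; some texts center it on the estimated true score instead. The example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def score_band(observed, sd, reliability, level=0.95):
    """Confidence band around an observed score based on the SEM."""
    sem = sd * sqrt(1.0 - reliability)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return observed - z * sem, observed + z * sem

low, high = score_band(observed=110, sd=15, reliability=0.91)
print(f"95% band: {low:.1f} to {high:.1f}")
```

With SD = 15 and reliability = .91 the SEM is 4.5, so an observed score of 110 carries a 95% band of roughly 101 to 119.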
Norm-referenced interpretation is valid only when:
- The test is brand new
- The examinee resembles the standardization sample
- The score is above average
- No cut-off is used
Norms generalize only to populations resembling the sample on which they were established.