Adaptive testing
|
Journal Articles
|
Barrada, J. R. (2012).
Tests adaptativos informatizados: Una perspectiva general [Computerized adaptive testing: A general perspective]. Anales de Psicología, 28, 289-302.
[Abstract]
[PDF-Spanish]
Computerized adaptive testing (CAT) adapts the items administered to each examinee according to the responses to the previous items. In this way, more accurate trait level estimates can be obtained, or test length can be reduced. In recent years, several CATs have been developed in Spain and it is probable that, given the advantages of this technique, more will become available soon. The goal of this work is to offer an updated view of this topic. To do so, the basic structure of a CAT is presented and the different steps composing it are discussed. Special emphasis is given to item selection, the fundamental part for the adaptability of the test, from the perspective of the four objectives that must be satisfied by a CAT: (a) accuracy, (b) item bank security, (c) content balance, and (d) test maintenance.
Olea, J., Barrada, J. R., Abad, F. J., Ponsoda, V., & Cuevas, L. (2012).
Computerized adaptive testing: The capitalization on chance problem. Spanish Journal of Psychology, 15, 424-441.
[Abstract]
[PDF]
This paper describes several simulation studies that examine the effects of capitalization on chance in the selection of items and the ability estimation in CAT, employing the 3-parameter logistic model. In order to generate different estimation errors for the item parameters, the calibration sample size was manipulated (N = 500, 1000 and 2000 subjects) as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small sample calibration conditions. For broad ranges of θ, the overestimation of the precision (asymptotic Se) reaches levels of 40%, something that does not occur with the RMSE(θ). The problem is greater as the item bank size to test length ratio increases. Potential solutions were tested in a second study, where two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.
Barrada, J. R., Abad, F. J., & Olea, J. (2011).
Varying the valuating function and the presentable bank in computerized adaptive testing. Spanish Journal of Psychology, 14, 500-508.
[Abstract]
[PDF]
In computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to keep item bank security at a maximum, the valuating function that seems most convenient is the matching criterion, valuating the distance between the estimated trait level and the point where the maximum of the information function is located. Recently, it has been proposed not to keep the same valuating function constant for all the items in the test. In this study we expand the idea of combining the matching criterion with the Fisher information function. We also manipulate the number of strata into which the bank is divided. We find that the manipulation of the number of items administered with each function makes it possible to move from the pole of high accuracy and low security to the opposite pole. It is possible to greatly improve item bank security with much smaller losses in accuracy by selecting several items with the matching criterion. In general, it seems more appropriate not to stratify the bank.
Olea, J., Abad, F. J., Ponsoda, V., Barrada, J. R., & Aguado, D. (2011).
eCAT-Listening: Design and psychometric properties of a computerized adaptive test on English Listening. Psicothema, 23, 802-807.
[Abstract]
[PDF]
In this study, eCAT-Listening, a new computerized adaptive test for the evaluation of English Listening, is described. Item bank development, anchor design for data collection, and the study of the psychometric properties of the item bank and the adaptive test are described. The calibration sample comprised 1,576 participants. The psychometric guarantees are good: the bank is unidimensional, the items are satisfactorily fitted to the 3-parameter logistic model, and an accurate estimation of the trait level is obtained. As validity evidence, a high correlation was obtained between the estimated trait level and a latent factor made up of the diverse criteria selected. The analysis of the trait level estimation by means of a simulation led us to fix the test length at 20 items, with a maximum exposure rate of .40.
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2010).
A method for the comparison of item selection rules in computerized adaptive testing. Applied Psychological Measurement, 34, 438-452.
[Abstract]
[PDF]
In a typical study of the relative efficiency of two competing item selection rules in computerized adaptive testing, the common result is that they simultaneously differ in accuracy and security, making it difficult to reach a conclusion on which is the more appropriate rule. This study proposes a strategy to conduct a global comparison of two or more selection rules. A plot showing the performance of each selection rule for several maximum exposure rates is obtained and the whole plot is compared with other rule plots. The strategy has been applied in a simulation study with fixed length CATs for the comparison of 6 item selection rules: point Fisher information, Fisher information weighted by likelihood, Kullback-Leibler weighted by likelihood, maximum information stratification method with blocking, progressive method and proportional method. Our results show that there is no optimal rule for any overlap value or RMSE. The fact that a rule, for a given level of overlap, has lower RMSE than another does not imply that this pattern holds for another overlap rate. A fair comparison of the rules requires extensive manipulation of the maximum exposure rates. The best methods were Kullback-Leibler weighted by likelihood, proportional method, and maximum information stratification method with blocking.
Abad, F. J., Olea, J., Aguado, D., Ponsoda, V., & Barrada, J. R. (2010).
Deterioro de parámetros de los ítems en tests adaptativos informatizados: estudio con eCAT [Item parameter drift in computerized adaptive testing: Study with eCAT]. Psicothema, 22, 340-347.
[Abstract]
[PDF-Spanish]
This study describes the parameter drift analysis conducted on eCAT (a Computerized Adaptive Test to assess the written English level of Spanish speakers). The original calibration of the item bank (N = 3224) was compared to a new calibration obtained from the data provided by most eCAT operative administrations (N = 7254). A Differential Item Functioning (DIF) study was conducted between the original and the new calibrations. The impact that the new parameters have on the trait level estimates was obtained by simulation. Results show that parameter drift is found especially for the a and c parameters, that a substantial number of bank items show DIF, and that the parameter change has a moderate impact on high-level-English theta estimates. It is therefore recommended to replace the original estimates with the new set.
Olea, J., Abad, F. J., & Barrada, J. R. (2010).
Tests informatizados y otros nuevos tipos de tests [Computerized tests and other new types of testing]. Papeles del Psicólogo, 31, 94-107.
[Abstract]
[PDF-Spanish]
[PDF-English]
The paper provides a short description of some test types that are gaining considerable interest in both research and applied areas. The main feature of a computerized adaptive test is that, despite the examinees receiving different sets of items, their test scores are in the same metric and can be directly compared. Four other test types are considered: a) model-based tests (a model or theory is available to explain the item response process and this makes the prediction of item difficulties possible), b) ipsative tests (the examinee has to select one among two or more options with similar social desirability, so these tests can help to control faking or other examinee response biases), c) behavioral tests (personality traits are measured from non-verbal responses rather than from self-reports), and d) situational tests (the examinee faces a conflictive situation and has to select the option that best describes what he or she will do). The paper evaluates these types of tests, comments on their pros and cons and provides some specific examples.
Barrada, J. R., Abad, F. J., & Veldkamp, B. P. (2009).
Comparison of methods for controlling maximum exposure rates in computerized adaptive testing. Psicothema, 21, 313-320.
[Abstract]
[PDF]
This paper has two objectives: (a) to provide a clear description of three methods for controlling the maximum exposure rate in computerized adaptive testing, the Sympson-Hetter method, the restricted method and the item-eligibility method, showing how all can be interpreted as methods for constructing the variable sub-bank of items from which each examinee receives the items in his or her test; (b) to indicate the theoretical and empirical limitations of each method and to compare their performance. With the three methods, we obtain basically indistinguishable results in overlap rate and RMSE (differences in the third decimal place). The restricted method is the best method for controlling exposure rate, followed by the item-eligibility method. The worst is the Sympson-Hetter method. The restricted method presents problems of sequential overlap rate. Our advice is to use the item-eligibility method, as it saves time and satisfies the objectives of restricting maximum exposure.
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2009).
Item selection rules in Computerized Adaptive Testing: Accuracy and security. Methodology, 5, 7-17.
[Abstract]
[PDF]
The item selection rule (ISR) most commonly used in CATs is to select the item with maximum Fisher information for the current trait estimation (PFI). Alternative ISRs have been proposed. Fisher information considered in an interval (FI*I), Fisher information weighted with the likelihood function (FI*L), Kullback-Leibler information in an interval (KL*I) and Kullback-Leibler weighted with the likelihood function (KL*L) have shown a greater precision of trait estimation at the early stages of CAT. A new ISR is proposed, Fisher information by interval with geometric mean (FI*IG), which tries to rectify some detected problems in FI*I. We evaluate accuracy and item bank security for these six ISRs. FI*IG is the only ISR that simultaneously outperforms PFI in both variables. For the other ISRs there seems to be a trade-off between accuracy and security, with PFI being the one with the worst accuracy and greatest security, and the ISRs using the likelihood function the reverse.
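The PFI rule mentioned in this abstract can be illustrated with a short Python sketch. It assumes the 3-parameter logistic model written without the 1.7 scaling constant, and a toy four-item bank; the function names (`p_3pl`, `fisher_info`, `select_pfi`) and the item parameters are hypothetical and are not taken from the paper.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def fisher_info(theta, a, b, c):
    """Fisher information of a 3PL item at trait level theta (no 1.7 constant)."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_pfi(theta_hat, bank, administered):
    """Return the index of the not-yet-administered item with maximum
    Fisher information at the current trait estimate (point Fisher information)."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_info(theta_hat, *bank[i]))


# Toy bank of (a, b, c) triples; the values are illustrative only.
bank = [(1.0, -1.0, 0.2), (1.8, 0.0, 0.2), (0.6, 0.1, 0.2), (1.5, 1.5, 0.2)]
first = select_pfi(0.0, bank, set())
second = select_pfi(0.0, bank, {first})
```

With a trait estimate of 0, the highly discriminating item located at b = 0 is chosen first, which is the behavior that concentrates exposure on a few items.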
Barrada, J. R., Veldkamp, B. P., & Olea, J. (2009).
Multiple maximum exposure rates in computerized adaptive testing. Applied Psychological Measurement, 33, 58-73.
[Abstract]
[PDF]
Computerized adaptive testing is subject to security problems, as the item bank content remains operative over long periods and administration time is flexible for examinees. Spreading the content of a part of the item bank could lead to an overestimation of the examinees' trait level. The most common way of reducing this risk is to impose a maximum exposure rate (rmax) that no item should exceed. Several methods have been proposed with this aim. All of these methods establish a single value of rmax throughout the test. This study presents a new method, the multiple-rmax method, that defines as many values of rmax as the number of items presented in the test. In this way, it is possible to impose a high degree of randomness in item selection at the beginning of the test, leaving the administration of items with the best psychometric properties to the moment when the trait level estimation is most accurate. The implementation of the multiple-rmax method is described and is tested in simulated item banks and in an operative bank. Compared with a single maximum exposure method, the new method has a more balanced usage of the item bank and delays the possible distortion of trait estimation due to security problems, with either no or only slight decrements of measurement accuracy.
Barrada, J. R., Olea, J., & Abad, F. J. (2008).
Rotating item banks versus restriction of maximum exposure rates in computerized adaptive testing. Spanish Journal of Psychology, 11, 618-625.
[Abstract]
[PDF]
If examinees were to know, beforehand, part of the content of a computerized adaptive test their estimated trait levels would then have a marked positive bias. One of the strategies to avoid this consists of dividing a large item bank into several sub-banks and rotating the sub-bank employed (Ariel, Veldkamp & van der Linden, 2004). This strategy permits substantial improvements in exposure control at little cost to measurement accuracy. However, we do not know whether this option provides better results than using the master bank with greater restriction in the maximum exposure rates (Sympson & Hetter, 1985). In order to investigate this issue, we worked with several simulated banks of 2100 items, comparing them, for RMSE and overlap rate, with the same banks divided in two, three... up to seven sub-banks. By means of extensive manipulation of the maximum exposure rate in each bank, we found that the option of rotating banks slightly outperformed the option of restricting maximum exposure rate of the master bank by means of the Sympson-Hetter method.
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2008).
Incorporating randomness in Fisher information for improving item exposure control in CATs. British Journal of Mathematical and Statistical Psychology, 61, 493-513.
[Abstract]
[PDF]
The most commonly employed item-selection rule in a Computerized Adaptive Test is that of selecting the item with the maximum Fisher information for the estimated trait level. This means a highly unbalanced distribution of item exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness. As the test progresses, the weight of information in the selection increases. In the present work we study, for two selection rules, the progressive method (Revuelta & Ponsoda, 1998) and the proportional method (Segall, 2004a), different functions that define the weight of the random component according to the position in the test of the item to be administered. The functions were tested in simulated item banks and in an operative bank. We found that both the progressive method and the proportional method tolerate a high weight of the random component with minimal or zero losses of accuracy, while bank security and maintenance are improved.
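The idea of shifting from randomness to information as the test advances can be sketched in Python as follows. This is a minimal illustration in the spirit of the progressive method, not the implementation studied in the paper: the linear weight in `random_weight` is an assumption (the paper compares several weighting functions that are not reproduced here), and the names `random_weight` and `select_progressive` are hypothetical. Item information values are passed in precomputed.

```python
import random


def random_weight(position, test_length):
    """Weight of the random component for the item at a given position.

    Assumed linear: 1.0 for the first item, 0.0 for the last one.
    """
    if test_length <= 1:
        return 0.0
    return 1.0 - (position - 1) / (test_length - 1)


def select_progressive(infos, administered, position, test_length, rng):
    """Pick the unused item maximizing a weighted sum of a random value
    and the item's information at the current trait estimate.

    infos: list of information values, one per item in the bank.
    """
    w = random_weight(position, test_length)
    candidates = [i for i in range(len(infos)) if i not in administered]
    max_info = max(infos[i] for i in candidates)

    def value(i):
        # The random part is drawn on the same scale as the information.
        return w * rng.uniform(0.0, max_info) + (1.0 - w) * infos[i]

    return max(candidates, key=value)


rng = random.Random(0)
infos = [0.1, 0.9, 0.4]
last_pick = select_progressive(infos, set(), position=3, test_length=3, rng=rng)
```

At the last position the random weight is zero, so the rule reduces to maximum-information selection; at the first position the choice is fully random.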
Barrada, J. R., Olea, J., & Ponsoda, V. (2007).
Methods for restricting maximum exposure rate in computerized adaptative testing. Methodology, 3, 14-23.
[Abstract]
[PDF]
The Sympson-Hetter (1985) method provides a means of controlling maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that mark the probability of administration of an item on being selected. This method presents two main problems: it requires a long computation time for calculating the parameters and the maximum exposure rate is slightly above the fixed limit. Van der Linden (2003) presented two alternatives which appear to solve both of the problems. The impact of these methods on measurement accuracy has not been tested yet. We show how these methods over-restrict the exposure of some highly discriminating items and, thus, the accuracy is decreased. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods offer an empirical maximum exposure rate clearly above the goal. A new method, based on the initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with van der Linden's two methods. This option, when used with Sympson-Hetter, speeds the convergence of the control parameters without decreasing the accuracy.
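The administration step that the control parameters govern can be sketched in Python. This is only an illustration under stated assumptions: the control parameters `k` are taken as already computed (the iterative simulations that set them are not shown), the candidate items are assumed to arrive ranked by the selection rule, and the fallback used when every candidate is rejected is a choice made for this sketch. The function name `administer_with_control` is hypothetical.

```python
import random


def administer_with_control(ranked_items, k, rng):
    """Walk down the items ranked by the selection rule and administer the
    first one that passes its probability check.

    ranked_items: item ids, most preferred first.
    k: dict mapping item id to its control parameter, i.e. the probability
       of administration given that the item has been selected.
    Returns (administered_item, rejected_items).
    """
    rejected = []
    for item in ranked_items:
        if rng.random() < k[item]:
            return item, rejected
        # A rejected item is set aside for this examinee.
        rejected.append(item)
    # Assumption of this sketch: if everything was rejected, fall back
    # to the least preferred item instead of returning nothing.
    return ranked_items[-1], rejected[:-1]


rng = random.Random(0)
item, rejected = administer_with_control([7, 3, 5], {7: 0.0, 3: 1.0, 5: 1.0}, rng)
```

An item with a control parameter of 1.0 is always administered when selected, while one with 0.0 is never administered unless the fallback is reached.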
Barrada, J. R., Mazuela, P., & Olea, J. (2006).
Maximum information stratification method for controlling item exposure in computerized adaptive testing. Psicothema, 18, 156-159.
[Abstract]
[PDF]
The proposal for increasing the security in Computerized Adaptive Tests that has received most attention in recent years is the a-stratified method (AS - Chang and Ying, 1999): at the beginning of the test only items with low discrimination parameters (a) can be administered, with the values of the a parameters increasing as the test goes on. With this method, distribution of the exposure rates of the items is less skewed, while efficiency is maintained in trait-level estimation. The pseudo-guessing parameter (c), present in the three-parameter logistic model, is considered irrelevant, and is not used in the AS method. The Maximum Information Stratified (MIS) model incorporates the c parameter in the stratification of the bank and in the item-selection rule, improving accuracy by comparison with the AS, for item banks with a and b parameters correlated and uncorrelated. For both kinds of banks, the blocking b methods (Chang, Qian and Ying, 2001) improve the security of the item bank.
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2006).
Estrategias de selección de items en un Test Adaptativo Informatizado para la evaluación de inglés escrito [Item selection rules in a Computerized Adaptive Test for the assessment of written English]. Psicothema, 18, 828-834.
[Abstract]
[PDF-Spanish]
e-CAT is a Computerized Adaptive Test for the evaluation of written English knowledge, using the item selection rule most commonly employed: the maximum Fisher information criterion. Some of the problems of this criterion have a negative impact on the estimation accuracy and on the item bank security. In this study, the performance of this item selection rule is compared, by means of simulation, with two other rules: selecting the item with maximum Fisher information in an interval (Veerkamp & Berger, 1997) and a new criterion, called "maximum Fisher information in an interval with geometric mean". In general, this new rule shows smaller measurement error and smaller item overlap rates. It thus seems advisable, as it allows the simultaneous improvement of estimation accuracy and the maintenance of the item bank security of e-CAT.
|
|
Submitted papers
|
Barrada, J. R., Abad, F. J., & Olea, J. (2011).
Optimal number of strata for the stratified methods in computerized adaptive testing. Manuscript submitted for publication.
[Abstract]
[PDF]
Test security can be a major problem in computerized adaptive testing, as examinees can share information about the items they receive. Of the different item selection rules proposed to alleviate this risk, stratified methods are those that have received most attention. In these methods, only items with low discrimination can be presented at the beginning of the test and the mean information of the items increases as the test goes on. To do so, the item bank must be divided into several strata according to the information of the items. To date, there is no clear guidance about the optimal number of strata into which the item bank should be split. In this study, we simulate conditions with different numbers of strata, from 1 (no stratification) to a number of strata equal to test length (maximum level of stratification), while manipulating the maximum exposure rate that no item should surpass (rmax) over its whole domain. In this way, we can plot the relation between test security and accuracy, making it possible to determine the number of strata that leads to better security while holding measurement accuracy constant. Our data indicate that the best option is to stratify into as many strata as possible.
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2011).
Item bank disclosure in computerized adaptive testing: What makes an item selection rule safer? Manuscript submitted for publication.
[Abstract]
[PDF]
A computerized adaptive test is considered more secure the lower the overestimation of the examinee's trait level due to item pre-knowledge. The common measures of test security have been the overlap rate between examinees and the distribution of item exposure rates. We explain that lower overlap rates or more homogeneous distributions of usage of the items may not lead to safer CATs. Instead of these variables, we show that the probability of item pre-knowledge of the first items administered and the overlap rate for high trait levels are better variables for assessing test security. If low values are present for these two variables, there are many different routes to obtain a high estimated trait level and it is thus harder for an examinee with item pre-knowledge to follow one of these routes. This is illustrated in three different studies where item bank disclosure is simulated. In these studies we compare the point Fisher information, the progressive method and the alpha-stratified selection rules. The alpha-stratified method, the option leading to lower overlap rate and more homogeneous item exposure distribution when there is no bank disclosure, is not the selection rule offering higher test security.
|
Scales
|
Journal Articles
|
López-Guimerà, G., Fauquet, J., Sánchez-Carracedo, D., Barrada, J. R., Saldaña, C., & Masnou-Roig, A. (In press).
Psychometric properties of the Perception of Teasing Scale in a Spanish adolescent sample: POTS-S. Eating and Weight Disorders.
[Abstract]
[PDF]
The present study examines the psychometric properties of the Spanish version of the Perception of Teasing Scale (POTS-S). Participants were 1,559 adolescents. They completed a translated version of the POTS and versions validated in the Spanish population of the Rosenberg Self-Esteem Scale, the subscales Body Dissatisfaction and Drive for Thinness of the Eating Disorders Inventory-2, and the Children’s Eating Attitudes Test. The results showed that the POTS-S retains the original structure of two factors, weight and competency, with satisfactory fit indices. The POTS-S constitutes a shorter questionnaire than the original version; specifically, it consists of 9 items instead of 11. The POTS-S showed good internal consistency and satisfactory test-retest stability. The relationships between the weight subscale and the variables related to eating and weight were statistically significant. As regards the competency subscale, the correlations were all lower than those for the weight subscale, except in the case of the self-esteem variable. The POTS-S showed good psychometric properties, indicating its suitability as an instrument for assessing the perception of teasing in Spanish adolescents.
Sánchez-Carracedo, D., Barrada, J. R., López-Guimerà, G., Fauquet, J., Almenara, C. A., & Trepat, E. (2012).
Analysis of the factor structure of the Sociocultural Attitudes Towards Appearance Questionnaire (SATAQ-3) in Spanish secondary-school students through exploratory structural equation modeling. Body Image, 9, 163-171.
[Abstract]
[PDF]
The aims of the present study were: (1) to assess the factor structure of the SATAQ-3 in Spanish secondary-school students by means of exploratory factor analysis (EFA), confirmatory factor analysis (CFA) and exploratory structural equation modeling (ESEM) models; and (2) to study its invariance by sex and school grade. ESEM is a technique proposed for the analysis of internal structure and which addresses some of the limitations of EFA and CFA. Participants were 1,559 boys and girls in grades seven to ten. The results support the four-factor solution of the original version, and reveal that the best fit was obtained with ESEM, excluding item 20 and with correlated uniqueness between reverse-keyed items. Our version shows invariance by sex and grade. The differences between scores of different groups are in the expected direction, and support the validity of the questionnaire. We recommend a version excluding item 20 and without reverse-keyed items.
Prieto, G., Torres, M. T., Francés, L., Falguera, G., Vila, L., Manresa, J. M., Casamitjana, R., Barrada, J. R., Acera, A., Guix, D., Torrent, A., Grau, J., Torán, P., & the IODEGEST study group. (2011).
Nutritional status of iodine in pregnant women in Catalonia (Spain): Study on hygiene-dietetic habits and iodine in urine. BMC Pregnancy and Childbirth, 11, 17.
[Abstract]
[PDF]
Background
It is a priority to achieve an adequate nutritional status of iodine during pregnancy, since iodine deficiency in this population may have repercussions on the mother during both gestation and post partum as well as on the foetus, the neonate and the child at different ages. According to the WHO, iodine deficiency is the most frequent cause of mental retardation and irreversible cerebral lesions around the world. However, few studies have been published on the nutritional status of iodine in the pregnant population within the Primary Care setting, a health care level which plays an essential role in the education and control of pregnant women. Therefore, the aims of the present study are: (1) to determine the hygiene-dietetic habits related to the intake of foods rich in iodine and smoking during pregnancy; and (2) to determine the prevalence of iodine deficiency and the factors associated with its appearance during pregnancy.
Methods/design
We will perform a cluster randomised, controlled, multicentre trial. Randomisation unit: Primary Care Team. Study population: 898 pregnant women over the age of 17 years attending consultation with a midwife during the first trimester of pregnancy in the participating primary care centres. Outcome measures: consumption of iodine-rich foods and iodine deficiency. Points of assessment: each trimester of the gestation. Intervention: group education during the first trimester of gestation on healthy hygiene-dietetic habits and the importance of an adequate iodine nutritional status. Statistical analysis: descriptive analysis of all variables will be performed as well as multilevel logistic regression. All analyses will be carried out on an intention-to-treat basis and will be adjusted for potential confounding factors and variables of clinical importance.
Discussion
Evidence of generalised iodine deficiency during pregnancy could lead to the promotion of preventive interventions, such as improved and intensified health care educational programmes for pregnant women.
|
Psychology of attention
|
Journal Articles
|
Arend, I., Botella, J., & Barrada, J. R. (2003).
Emotional load and the formation of illusory conjunctions in the time domain. Psicothema, 15, 446-451.
[Abstract]
[PDF]
The effect of the emotional load of stimuli on their processing has been interpreted in terms of overautomaticity or as a permanently lowered threshold for recognition. This special characteristic of emotional stimuli has been used to study how our cognitive system processes information. In the present research the emotional/neutral manipulation has been used to test the model of Botella, Barriopedro & Suero (2001) for the formation of illusory conjunctions in the time domain. Results of three experiments are analyzed within this context. It is concluded that, in general, they support the general architecture of the model.
|
|
Book chapters
|
Barrada, J. R., & Botella, J. (2001).
Efectos aditivos de dos distractores en un paradigma de compatibilidad con presentaciones PRSV [Additive effects of two distractors in a compatibility paradigm with RSVP presentations].
In C. Méndez, D. Ponte, L. Jimenez, & M. J. Sampedro (Eds.), La Atención: Un enfoque pluridisciplinar, Vol. 2 (pp. 251-260).
Valencia: Promolibro.