Reshape survey response data for conjoint analysis (single task set)
Source: R/reshape_projoint.R
reshape_projoint.Rd
Takes a wide survey data frame (e.g., from read_Qualtrics) and reshapes it so that each row corresponds to a single respondent–task–profile. Supports arbitrary ordering of base tasks and a single repeated task per respondent. The repeated base task is inferred from the first base outcome in .outcomes, and the repeated outcome must be the last element of .outcomes.
Usage
reshape_projoint(
  .dataframe,
  .outcomes,
  .choice_labels = c("A", "B"),
  .alphabet = "K",
  .idvar = "ResponseId",
  .repeated = TRUE,
  .flipped = TRUE,
  .covariates = NULL,
  .fill = FALSE
)
Arguments
- .dataframe
A data frame, preferably from read_Qualtrics.
- .outcomes
Character vector of outcome column names in the asked order. If a repeated task is used, its outcome must be the last element.
- .choice_labels
Character vector (default c("A","B")) giving the two labels that appear at the end of the outcome strings.
- .alphabet
Single character (default "K") indicating the Qualtrics prefix.
- .idvar
Character (default "ResponseId") indicating the respondent id column.
- .repeated
Logical (default TRUE) indicating whether a repeated task is present.
- .flipped
Logical (default TRUE) indicating whether the repeated task flips profiles before agreement is computed.
- .covariates
Optional character vector of respondent-level covariate column names to carry through.
- .fill
Logical (default FALSE). If TRUE, fills agree within respondent across tasks as described under “Filling agreement”.
Details
Scope and assumptions
- One set of conjoint tasks with exactly two profiles per task (profiles 1 and 2).
- For multi-set designs, call reshape_projoint() once per set and bind the results.
Expected input (Qualtrics K-codes)
- Wide columns named K-<task>-<attribute> (attribute names) and K-<task>-<profile>-<attribute> (level names), with <task> in 1..n and <profile> in 1,2.
- Rows with missing K-1-1 are dropped as empty tables (server hiccup safeguard).
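The naming scheme above can be illustrated with a small base-R sketch. It is not the package's internal code: the column names, the toy data frame, and the helper parse_kcode() are hypothetical, and treating "missing" as NA is an assumption.

```r
# Hypothetical column names following the K-<task>-<attribute> and
# K-<task>-<profile>-<attribute> patterns.
cols <- c("ResponseId", "K-1-1", "K-1-2", "K-1-1-1", "K-1-2-1",
          "K-2-1", "K-2-1-2", "choice1")

# Three-part codes hold attribute names; four-part codes hold level names.
is_attr  <- grepl("^K-[0-9]+-[0-9]+$", cols)
is_level <- grepl("^K-[0-9]+-[0-9]+-[0-9]+$", cols)

# Hypothetical helper: split a K-code into its integer parts (prefix dropped).
parse_kcode <- function(code) {
  as.integer(strsplit(code, "-", fixed = TRUE)[[1]][-1])
}

# Task ids are the first integer of every K-code.
kcodes   <- cols[is_attr | is_level]
task_ids <- sort(unique(vapply(kcodes, function(x) parse_kcode(x)[1], integer(1))))

# Toy data: drop respondents whose first table cell (K-1-1) is missing.
# Assumption: "missing" means NA here.
df <- data.frame(ResponseId = c("r1", "r2"),
                 `K-1-1` = c("Age", NA),
                 check.names = FALSE)
df_kept <- df[!is.na(df[["K-1-1"]]), , drop = FALSE]
```

Running a check like this on your own column names before reshaping can help spot columns that do not follow the expected pattern.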
Outcome columns (.outcomes)
- List all choice variables in the order asked. If you include a repeated task, its outcome must be the last element.
- For base tasks (all but the last element), the function extracts the base task id by reading the digits in each outcome name (e.g., "choice4", "Q4", "task04" -> task 4).
- The set of base task ids extracted from .outcomes must exactly match the set of task ids present in the K-codes; otherwise an error is thrown.
- The repeated base task is inferred as the digits in the first base outcome (i.e., the first element of .outcomes, excluding the final repeated outcome).
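The rules above can be sketched in base R as follows. This is an illustration under assumptions, not the function's actual implementation: the outcome names, kcode_task_ids, and the helper extract_task_id() are hypothetical. In this sketch, a name with several digit groups (e.g., "Q12_choice4") would have its digits concatenated; how the package treats such names is not stated here.

```r
# Hypothetical outcome names: base tasks asked in the order 2, 1, 3,
# followed by the repeated task as the last element.
outcomes <- c("choice2", "choice1", "choice3", "choice2_repeated")

# Base outcomes are all but the last element.
base_outcomes <- outcomes[-length(outcomes)]

# Hypothetical helper: keep only the digits of a name and convert to integer.
extract_task_id <- function(x) as.integer(gsub("[^0-9]", "", x))

base_ids <- extract_task_id(base_outcomes)

# The repeated base task is taken from the first base outcome.
repeated_base_id <- base_ids[1]

# Task ids assumed to have been found in the K-code columns.
kcode_task_ids <- c(1L, 2L, 3L)

# The two sets must match exactly.
if (!setequal(base_ids, kcode_task_ids)) {
  stop("Task ids in the outcomes do not match the task ids in the K-codes.")
}

# The examples from the text all map to task 4.
example_ids <- extract_task_id(c("choice4", "Q4", "task04"))
```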
Choice parsing
The selected profile is parsed from the last character of each outcome string and matched to .choice_labels. Ensure outcomes end with these labels (e.g., "Candidate A" / "Candidate B"). If outcomes are numeric or differently formatted, pre-process or adjust .choice_labels accordingly.
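As a minimal sketch of the parsing rule described above (base R only, with hypothetical response values; the package's own code may differ), the last character of each response is matched against the labels. The second part shows one possible way to pre-process numeric outcomes so that they end in the default labels.

```r
choice_labels <- c("A", "B")

# Hypothetical raw responses; NA stands for an unanswered task.
responses <- c("Candidate A", "Candidate B", NA, "Candidate B")

# Take the last character of each string and look it up in the labels.
last_char        <- substr(responses, nchar(responses), nchar(responses))
selected_profile <- match(last_char, choice_labels)

# Pre-processing numeric outcomes (1 = first profile, 2 = second profile)
# into strings that end with the expected labels.
numeric_responses <- c(1, 2, 2, 1)
recoded <- paste("Candidate", choice_labels[numeric_responses])
```

Because only the last character is read in this sketch, labels longer than one character would need a different matching rule.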
Output
A projoint_data object with:
- $labels: map from human-readable attribute/level to stable ids (attribute_id = "att1", "att2", ..., level_id = "attX:levelY").
- $data: tibble with one row per id–task–profile, attribute columns (named att*) storing level_id, selected (1 if that profile was chosen; 0 otherwise), agree (1/0/NA for repeated-task agreement after flip logic), and any .covariates. id is coerced to character; attribute columns are factors.
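To make the "agreement after flip logic" wording concrete, here is a hedged sketch of how such an indicator could be computed. It assumes that flipping means profile 1 of the base task is shown as profile 2 in the repeated task (and vice versa); compute_agree() is a hypothetical helper, not a function exported by the package.

```r
# base_choice / repeated_choice: selected profile (1 or 2, NA if unanswered).
compute_agree <- function(base_choice, repeated_choice, flipped = TRUE) {
  # Assumption: when profiles are flipped, map the repeated choice back to
  # the base-task numbering (1 becomes 2, 2 becomes 1) before comparing.
  if (flipped) repeated_choice <- 3L - repeated_choice
  as.integer(base_choice == repeated_choice)
}

# Four hypothetical respondents; the last one skipped the base task.
agree <- compute_agree(base_choice     = c(1L, 2L, 1L, NA),
                       repeated_choice = c(2L, 2L, 1L, 1L))
```

Under this assumption, choosing profile 1 in the base task and profile 2 in the flipped repeated task counts as agreement, because both refer to the same underlying profile.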
Filling agreement
If .fill = TRUE, agree is filled within respondent across tasks in task order, propagating the observed repeated-task agreement to all tasks for that respondent. This assumes IRR is respondent-specific and independent of table content.
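The filling step can be pictured with a toy base-R example. This is a sketch under assumptions rather than the package's implementation: the data frame toy and the helper fill_agree() are hypothetical, and a respondent with no observed agreement is assumed to stay NA.

```r
# Toy data: respondent r1 has an observed agreement on task 1;
# respondent r2 has none.
toy <- data.frame(
  id    = rep(c("r1", "r2"), each = 3),
  task  = rep(1:3, times = 2),
  agree = c(1, NA, NA, NA, NA, NA)
)

# Hypothetical helper: spread the first observed value over the whole group,
# leaving the group untouched if nothing was observed.
fill_agree <- function(x) {
  observed <- x[!is.na(x)]
  if (length(observed) == 0) return(x)
  rep(observed[1], length(x))
}

# Apply the helper within each respondent.
toy$agree_filled <- ave(toy$agree, toy$id, FUN = fill_agree)
```

After this step every row for r1 carries the same agreement value, which is what makes the respondent-specific IRR assumption explicit.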
Diagnostics
- dplyr::count(reshaped$data, task, profile) should show exactly two rows per task.
- If pj_estimate() later reports “No rows match the specified attribute/level”, construct QoIs from reshaped$labels (use the exact attX:levelY ids).
Examples
# \donttest{
# Base tasks asked in numeric order; repeated task corresponds to task 1
data(exampleData1)
outcomes <- c(paste0("choice", 1:8), "choice1_repeated_flipped")
reshaped <- reshape_projoint(exampleData1, outcomes)
dplyr::count(reshaped$data, task, profile) # expect two rows (profiles 1 and 2) per task
#> # A tibble: 16 × 3
#>     task profile     n
#>    <dbl>   <dbl> <int>
#>  1     1       1   400
#>  2     1       2   400
#>  3     2       1   400
#>  4     2       2   400
#>  5     3       1   400
#>  6     3       2   400
#>  7     4       1   400
#>  8     4       2   400
#>  9     5       1   400
#> 10     5       2   400
#> 11     6       1   400
#> 12     6       2   400
#> 13     7       1   400
#> 14     7       2   400
#> 15     8       1   400
#> 16     8       2   400
# }