Reshapes survey response data for conjoint analysis (single task set)
reshape_projoint.Rd
This function takes a wide survey data frame (e.g., from read_Qualtrics) and reshapes it so that each row corresponds to a single respondent–task–profile. It supports arbitrary ordering of base tasks as asked to respondents and a single repeated task per respondent. The repeated base task is inferred from the first base outcome in .outcomes, and the repeated outcome must be the last element of .outcomes.
Usage
reshape_projoint(
.dataframe,
.outcomes,
.choice_labels = c("A", "B"),
.alphabet = "K",
.idvar = "ResponseId",
.repeated = TRUE,
.flipped = TRUE,
.covariates = NULL,
.fill = FALSE
)
Arguments
- .dataframe
A data frame, preferably from read_Qualtrics.
- .outcomes
Character vector of outcome column names in the **asked order**. If a repeated task is used, its outcome must be the **last element**.
- .choice_labels
Character vector (default c("A", "B")) giving the two labels that appear at the end of the outcome strings (e.g., "Candidate A", "Candidate B").
- .alphabet
Single character (default "K") indicating the Qualtrics prefix used for conjoint tables.
- .idvar
Character (default "ResponseId") indicating the respondent id column.
- .repeated
Logical (default TRUE) indicating whether a repeated task is present.
- .flipped
Logical (default TRUE) indicating whether the repeated task flips profiles (1 <-> 2) before agreement is computed.
- .covariates
Optional character vector of respondent-level covariate column names to carry through.
- .fill
Logical (default FALSE). If TRUE, fills agree within respondent across tasks as described under "Filling agreement".
Details
**Scope and assumptions**
* One set of conjoint tasks with exactly two profiles per task (profiles 1 and 2).
* For multi-set designs, call reshape_projoint() once per set and bind the results.
**Expected input (Qualtrics K-codes)**
* Wide columns named K-<task>-<attribute> (attribute names) and K-<task>-<profile>-<attribute> (level names), where <task> is in 1..n and <profile> is 1 or 2.
* Rows with missing K-1-1 are dropped as empty tables (server hiccup safeguard).
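As a rough illustration of the naming pattern above, the following base R sketch enumerates the column names one would expect for a small design. The numbers of tasks and attributes are hypothetical, and the sketch only builds the names; it does not reproduce how reshape_projoint() itself detects or validates them.

```r
# Hypothetical design: 2 tasks, 3 attributes (both values are illustrative).
n_tasks <- 2
n_attributes <- 3

# Attribute-name columns follow K-<task>-<attribute>.
grid_att <- expand.grid(attribute = seq_len(n_attributes),
                        task = seq_len(n_tasks))
attribute_cols <- paste("K", grid_att$task, grid_att$attribute, sep = "-")

# Level-name columns follow K-<task>-<profile>-<attribute>, profile in {1, 2}.
grid_lev <- expand.grid(attribute = seq_len(n_attributes),
                        profile = 1:2,
                        task = seq_len(n_tasks))
level_cols <- paste("K", grid_lev$task, grid_lev$profile,
                    grid_lev$attribute, sep = "-")

expected_cols <- c(attribute_cols, level_cols)
```

A check such as setdiff(expected_cols, names(my_data)), where my_data stands in for your own data frame, can then flag columns that appear to be missing before reshaping.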
**Outcome columns (.outcomes)**
* List all choice variables in the **order they were asked**. If you include a repeated task, its outcome column must be the **last element**.
* For base tasks (all but the last element), the function extracts the base task id by reading the **digits** in each outcome name (e.g., "choice4", "Q4", "task04" -> task 4).
* The set of base task ids extracted from .outcomes must **exactly match** the set of task ids present in the K-codes; otherwise an error is thrown.
* The **repeated base task** is inferred as the **digits in the first base outcome** (i.e., the first element of .outcomes, excluding the final repeated outcome).
* The repeated outcome’s own name does **not** need to contain digits; only its **position** (last) matters.
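The rules above can be approximated in a few lines of base R. This is a minimal sketch of the described behavior, not the package's internal code; the outcome names and the K-code column names below are made up for illustration.

```r
# Illustrative outcome names in asked order; the last one is the repeated task.
outcomes <- c("Q2", "Q1", "Q3", "task04", "Q2_repeat")

# Base outcomes are all elements except the last.
base_outcomes <- outcomes[-length(outcomes)]

# Keep only the digits in each base outcome name ("task04" -> 4).
base_task_ids <- as.integer(gsub("[^0-9]", "", base_outcomes))

# The repeated base task is taken from the first base outcome ("Q2" -> 2).
repeated_base_task <- base_task_ids[1]

# Task ids present in some illustrative K-code column names.
k_cols <- c("K-1-1", "K-2-1", "K-3-1", "K-4-1", "K-1-1-1")
k_task_ids <- unique(as.integer(sub("^K-([0-9]+)-.*$", "\\1", k_cols)))

# The two sets of task ids should match exactly.
ids_match <- setequal(base_task_ids, k_task_ids)
```

Note that in this sketch an outcome name containing more than one group of digits (for example "Q2_v1") would collapse to a single number (21), so outcome names with exactly one numeric part are the safest choice.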
**Choice parsing**
* The selected profile is parsed from the **last character** of each outcome string and matched to .choice_labels. Ensure outcomes end with these labels (e.g., "Candidate A"/"Candidate B"). If outcomes are numeric or differently formatted, pre-process them or adjust .choice_labels accordingly.
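To make the parsing rule concrete, here is a small base R sketch that takes the last character of each response and looks it up in the label vector. The response strings are invented, and the sketch only mirrors the rule as documented rather than the function's actual implementation.

```r
choice_labels <- c("A", "B")

# Illustrative raw responses; the NA stands for a skipped task.
responses <- c("Candidate A", "Candidate B", "Candidate B", NA)

# Last character of each response string.
last_char <- substr(responses, nchar(responses), nchar(responses))

# Position in choice_labels gives the selected profile (1 or 2).
selected_profile <- match(last_char, choice_labels)
```

Under this rule, a response that does not end in one of the labels (say, "Neither") would map to NA, which is why outcomes should be checked or recoded before reshaping.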
**Output**
* Returns a projoint_data object with:
  * $labels: a data frame mapping human-readable attribute/level to stable ids attribute_id = "att1", "att2", ... and level_id = "attX:levelY".
  * $data: a tibble with one row per id–task–profile, attribute columns (named by att*) storing level_id, selected (1 if that profile was chosen within the task, 0 otherwise), agree (1/0/NA for repeated-task agreement after flip logic), and any columns specified in .covariates. id is coerced to character; attribute columns are factors.
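The "flip logic" behind agree can be pictured with the following sketch. It assumes that, when profiles are flipped, profile 1 of the base task appears as profile 2 in the repeated task and vice versa; compute_agree() is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper: 1 if the repeated choice is consistent with the
# base choice, 0 if not, NA if either choice is missing.
compute_agree <- function(base_choice, repeated_choice, flipped = TRUE) {
  # With flipped profiles, translate the repeated choice back to base-task
  # numbering (1 -> 2, 2 -> 1) before comparing.
  repeated_in_base_terms <- if (flipped) 3L - repeated_choice else repeated_choice
  as.integer(base_choice == repeated_in_base_terms)
}

base_choice     <- c(1L, 2L, 1L, NA)
repeated_choice <- c(2L, 2L, 1L, 1L)

agree_flipped   <- compute_agree(base_choice, repeated_choice, flipped = TRUE)
agree_unflipped <- compute_agree(base_choice, repeated_choice, flipped = FALSE)
```

The first respondent picks profile 1 and then profile 2: that counts as agreement when the profiles were flipped, but as disagreement when they were not, which is why .flipped must match the survey design.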
**Filling agreement**
* If .fill = TRUE, agree is forward/backward filled **within respondent** in task order, propagating the observed repeated-task agreement to all tasks for that respondent. This relies on the assumption that intra-respondent reliability (IRR) is respondent-specific and independent of table content.
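The effect of filling can be sketched with base R on a toy data frame. Because each respondent has at most one observed agreement value, filling forward and backward within respondent amounts to copying that value to every row for the respondent. The data and the fill_agree() helper are invented for illustration and are not the package's implementation.

```r
# Toy long-format data: two respondents, three tasks each, one observed
# agreement value per respondent.
toy <- data.frame(
  id    = rep(c("r1", "r2"), each = 3),
  task  = rep(1:3, times = 2),
  agree = c(1, NA, NA, NA, 0, NA)
)

# Hypothetical helper: copy the single observed value to all rows of a
# respondent; leave the vector untouched if nothing was observed.
fill_agree <- function(x) {
  observed <- x[!is.na(x)]
  if (length(observed) == 0) return(x)
  rep(observed[1], length(x))
}

# Apply the helper within each respondent.
toy$agree_filled <- ave(toy$agree, toy$id, FUN = fill_agree)
```

Comparing toy$agree with toy$agree_filled makes the trade-off visible: the filled column has no missing values, but every added value is an assumption rather than an observation.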
**Recommendation**
* Leave .fill = FALSE by default. Consider .fill = TRUE only when sample size or subgroup sparsity makes the single repeated-task observation per respondent inadequate, and you are willing to assume that intra-respondent reliability (IRR) is respondent-specific and independent of the conjoint table contents. When using .fill = TRUE, always compute standard errors clustered at the respondent level, and report a sensitivity check comparing results with .fill = FALSE.
**Common diagnostics**
* After reshaping, dplyr::count(reshaped$data, task, profile) should show exactly two rows per task (profiles 1 and 2).
* If pj_estimate() later reports "No rows match the specified attribute/level", construct QOIs from reshaped$labels (use the exact attX:levelY ids).
Examples
library(projoint)
data("exampleData1")
# Example 1: Base tasks asked in numeric order, repeated = task 1
outcomes <- c(paste0("choice", 1:8), "choice1_repeated_flipped")
reshaped <- reshape_projoint(exampleData1, outcomes)
dplyr::count(reshaped$data, task, profile) # should be 2 per task
#> # A tibble: 16 × 3
#>     task profile     n
#>    <dbl>   <dbl> <int>
#>  1     1       1   400
#>  2     1       2   400
#>  3     2       1   400
#>  4     2       2   400
#>  5     3       1   400
#>  6     3       2   400
#>  7     4       1   400
#>  8     4       2   400
#>  9     5       1   400
#> 10     5       2   400
#> 11     6       1   400
#> 12     6       2   400
#> 13     7       1   400
#> 14     7       2   400
#> 15     8       1   400
#> 16     8       2   400
# Example 2: Arbitrary task order (e.g., respondents saw tasks {2,1,3,4,5}); repeated is last
# The repeated base task is inferred from the FIRST base outcome ("Q2" -> task 2).
# outcomes2 <- c("Q2","Q1","Q3","Q4","Q5","Q2_repeat")
# reshaped2 <- reshape_projoint(exampleData1, outcomes2, .flipped = TRUE)