
This function takes a wide survey data frame (e.g., from read_Qualtrics) and reshapes it so that each row corresponds to a single respondent–task–profile. Base tasks may have been asked in any order, and each respondent may have at most one repeated task. The repeated base task is inferred from the first base outcome in .outcomes, and the repeated outcome must be the last element of .outcomes.

Usage

reshape_projoint(
  .dataframe,
  .outcomes,
  .choice_labels = c("A", "B"),
  .alphabet = "K",
  .idvar = "ResponseId",
  .repeated = TRUE,
  .flipped = TRUE,
  .covariates = NULL,
  .fill = FALSE
)

Arguments

.dataframe

A data frame, preferably from read_Qualtrics.

.outcomes

Character vector of outcome column names in the **asked order**. If a repeated task is used, its outcome must be the **last element**.

.choice_labels

Character vector (default c("A","B")) giving the two labels that appear at the end of the outcome strings (e.g., "Candidate A", "Candidate B").

.alphabet

Single character (default "K") indicating the Qualtrics prefix used for conjoint tables.

.idvar

Character (default "ResponseId") indicating the respondent id column.

.repeated

Logical (default TRUE) indicating whether a repeated task is present.

.flipped

Logical (default TRUE) indicating whether the repeated task flips profiles (1 <-> 2) before agreement is computed.

.covariates

Optional character vector of respondent-level covariate column names to carry through.

.fill

Logical (default FALSE). If TRUE, fills agree within respondent across tasks as described under "Filling agreement".

Value

A projoint_data object with elements $labels and $data; see Details.

Details

**Scope and assumptions**

* One set of conjoint tasks with exactly two profiles per task (profiles 1 and 2).
* For multi-set designs, call reshape_projoint() once per set and bind the results.

**Expected input (Qualtrics K-codes)**

* Wide columns named K-<task>-<attribute> (attribute names) and K-<task>-<profile>-<attribute> (level names), where <task> is in 1..n and <profile> is 1 or 2.
* Rows with missing K-1-1 are dropped as empty tables (server hiccup safeguard).
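The naming pattern above can be illustrated with a small base R sketch. The column names below are hypothetical, and the parsing is only an illustration of the K-code convention, not the package's internal implementation.

```r
# Hypothetical column names following the K-<task>-<attribute> and
# K-<task>-<profile>-<attribute> patterns described above.
cols <- c("ResponseId", "K-1-1", "K-1-2", "K-1-1-1", "K-1-2-1", "K-2-1-2")

# Keep only K-code columns, then split each name on "-".
k_cols <- grep("^K-[0-9]+(-[0-9]+){1,2}$", cols, value = TRUE)
parts <- strsplit(k_cols, "-", fixed = TRUE)

parsed <- data.frame(
  column = k_cols,
  task = vapply(parts, function(p) as.integer(p[2]), integer(1)),
  # Four parts means a level column, which carries a profile index;
  # three parts means an attribute-name column with no profile.
  profile = vapply(
    parts,
    function(p) if (length(p) == 4) as.integer(p[3]) else NA_integer_,
    integer(1)
  ),
  attribute = vapply(parts, function(p) as.integer(p[length(p)]), integer(1)),
  stringsAsFactors = FALSE
)
parsed
```

Inspecting a table like this before reshaping is one way to confirm that every task has both profiles and the expected number of attributes.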

**Outcome columns (.outcomes)**

* List all choice variables in the **order they were asked**. If you include a repeated task, its outcome column must be the **last element**.
* For base tasks (all but the last element), the function extracts the base task id by reading the **digits** in each outcome name (e.g., "choice4", "Q4", "task04" -> task 4).
* The set of base task ids extracted from .outcomes must **exactly match** the set of task ids present in the K-codes; otherwise an error is thrown.
* The **repeated base task** is inferred as the **digits in the first base outcome** (i.e., the first element of .outcomes, excluding the final repeated outcome).
* The repeated outcome’s own name does **not** need to contain digits; only its **position** (last) matters.
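The rules above can be sketched in base R as follows. The outcome names and the set of K-code task ids are hypothetical, and the digit extraction shown here is an assumed approximation of what the function does internally.

```r
# Hypothetical outcome names: base tasks in asked order, repeated outcome last.
outcomes <- c("Q2", "Q1", "task03", "choice4", "Q5", "my_repeat")

base_outcomes <- outcomes[-length(outcomes)]    # all but the last element
repeated_outcome <- outcomes[length(outcomes)]  # position, not name, matters

# Strip every non-digit character, then convert to integer ("task03" -> 3).
base_task_ids <- as.integer(gsub("[^0-9]", "", base_outcomes))

# The repeated base task is the one named by the first base outcome.
repeated_task_id <- base_task_ids[1]

# Task ids that would be found in the K-codes (assumed here for illustration).
k_task_ids <- 1:5
stopifnot(setequal(base_task_ids, k_task_ids))
```

Under this sketch, a name with several digit groups (say, a hypothetical "Q2_v1") would yield 21, so it is safest to keep exactly one number in each base outcome name.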

**Choice parsing** * The selected profile is parsed from the **last character** of each outcome string and matched to .choice_labels. Ensure outcomes end with these labels (e.g., "Candidate A"/"Candidate B"). If outcomes are numeric or differently formatted, pre-process or adjust .choice_labels accordingly.
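A minimal sketch of this parsing in base R, using hypothetical responses. The numeric recoding at the end assumes that 1 and 2 denote the first and second profile, which should be checked against the survey export.

```r
choice_labels <- c("A", "B")  # mirrors the default .choice_labels

# Hypothetical responses for one outcome column.
responses <- c("Candidate A", "Candidate B", "Candidate B", NA)

# Take the last character of each response and look it up in the labels.
last_char <- substr(responses, nchar(responses), nchar(responses))
selected_profile <- match(last_char, choice_labels)  # 1 = first label, 2 = second

# Numeric outcomes (assumed coding: 1 = first profile, 2 = second) can be
# recoded to label-terminated strings before reshaping.
numeric_responses <- c(1, 2, 2, NA)
recoded <- ifelse(
  is.na(numeric_responses),
  NA_character_,
  paste("Candidate", choice_labels[numeric_responses])
)
```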

**Output** * Returns a projoint_data object with:

  • $labels: a data frame mapping human-readable attribute/level to stable ids attribute_id = "att1","att2",... and level_id = "attX:levelY".

  • $data: a tibble with one row per id–task–profile, attribute columns (named by att*) storing level_id, selected (1 if that profile was chosen within the task, 0 otherwise), agree (1/0/NA for repeated-task agreement after flip logic), and any columns specified in .covariates. id is coerced to character; attribute columns are factors.
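One way to read the "flip logic" behind agree is sketched below in base R. This is an interpretation of the .flipped description rather than the package's own code, and the choice vectors are hypothetical.

```r
# Profile chosen (1 or 2) in the original base task and in its repeat,
# for four hypothetical respondents.
original_choice <- c(1, 2, 1, NA)
repeated_choice <- c(2, 2, 1, 1)
flipped <- TRUE  # mirrors .flipped: profiles were swapped in the repeat

# Undo the swap so both choices refer to the same profile numbering.
repeated_aligned <- if (flipped) 3 - repeated_choice else repeated_choice

# 1 = same profile chosen both times, 0 = different, NA = not observed.
agree <- as.integer(original_choice == repeated_aligned)
```

With flipped profiles, choosing profile 1 originally and profile 2 in the repeat counts as agreement, because both refer to the same underlying profile.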

**Filling agreement** * If .fill = TRUE, agree is forward/backward filled **within respondent** in task order, propagating the observed repeated-task agreement to all tasks for that respondent. This relies on the assumption that intra-respondent reliability (IRR) is respondent-specific and independent of table content.
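The effect of filling can be illustrated on toy data in base R. The data frame below is hypothetical and simplified to one row per respondent and task; the sketch shows the idea of within-respondent propagation, not the function's actual implementation.

```r
# Hypothetical long data: agree is observed only on the repeated base task.
toy <- data.frame(
  id = rep(c("r1", "r2"), each = 3),
  task = rep(1:3, times = 2),
  agree = c(1, NA, NA, NA, 0, NA)
)

# Within each respondent, copy the single observed value to every task.
# With one observed value per respondent this matches a forward/backward fill.
toy$agree_filled <- ave(toy$agree, toy$id, FUN = function(a) {
  observed <- a[!is.na(a)]
  if (length(observed) == 0) a else rep(observed[1], length(a))
})
toy
```

Respondents with no observed agreement keep NA on every task, so filling does not invent agreement values where none were measured.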

**Recommendation** * Leave .fill = FALSE by default. Consider .fill = TRUE only when sample size or subgroup sparsity makes the single repeated-task observation per respondent inadequate, and you are willing to assume that intra-respondent reliability (IRR) is respondent-specific and independent of the conjoint table contents. When using .fill = TRUE, always compute standard errors clustered at the respondent level, and report a sensitivity check comparing results with .fill = FALSE.

**Common diagnostics**

* After reshaping, dplyr::count(reshaped$data, task, profile) should show exactly two rows per task (profiles 1 and 2).
* If pj_estimate() later reports "No rows match the specified attribute/level", construct QOIs from reshaped$labels (use the exact attX:levelY ids).

Examples

library(projoint)
data("exampleData1")

# Example 1: Base tasks asked in numeric order, repeated = task 1
outcomes <- c(paste0("choice", 1:8), "choice1_repeated_flipped")
reshaped <- reshape_projoint(exampleData1, outcomes)
dplyr::count(reshaped$data, task, profile)  # expect two rows (profiles 1 and 2) per task
#> # A tibble: 16 × 3
#>     task profile     n
#>    <dbl>   <dbl> <int>
#>  1     1       1   400
#>  2     1       2   400
#>  3     2       1   400
#>  4     2       2   400
#>  5     3       1   400
#>  6     3       2   400
#>  7     4       1   400
#>  8     4       2   400
#>  9     5       1   400
#> 10     5       2   400
#> 11     6       1   400
#> 12     6       2   400
#> 13     7       1   400
#> 14     7       2   400
#> 15     8       1   400
#> 16     8       2   400

# Example 2: Arbitrary task order (e.g., respondents saw tasks {2,1,3,4,5}); repeated is last
# The repeated base task is inferred from the FIRST base outcome ("Q2" -> task 2).
# outcomes2 <- c("Q2","Q1","Q3","Q4","Q5","Q2_repeat")
# reshaped2 <- reshape_projoint(exampleData1, outcomes2, .flipped = TRUE)