Preferences Object — preferences • prefio

Create a preferences object for representing ordinal preference datasets.

Usage

preferences(
  data,
  format = c("long", "ordering", "ranking"),
  id = NULL,
  rank = NULL,
  item = NULL,
  item_names = NULL,
  frequencies = NULL,
  aggregate = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for preferences
[(x, i, j, ..., by.rank = FALSE, as.ordering = FALSE)

as.preferences(x, ...)

# S3 method for grouped_preferences
as.preferences(x, aggregate = FALSE, verbose = TRUE, ...)

# S3 method for default
as.preferences(
  x,
  format = c("long", "ranking", "ordering"),
  id = NULL,
  item = NULL,
  rank = NULL,
  item_names = NULL,
  aggregate = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for matrix
as.preferences(
  x,
  format = c("long", "ranking"),
  id = NULL,
  item = NULL,
  rank = NULL,
  item_names = NULL,
  aggregate = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for aggregated_preferences
as.preferences(x, ...)

# S3 method for preferences
format(x, width = 40L, ...)

Arguments

data

A data frame or matrix in one of three formats:

"ordering": A data frame with list-valued columns. Each row represents an ordering of the items from first to last, with tied items given as a vector of items in a single entry.
"ranking": Each row assigns a rank to each item, with columns representing items. Note that rankings will be converted to 'dense' rankings in the output (see Details).
"long": Three columns: an id column grouping the rows which correspond to a single set of preferences, an item column specifying (either by index or by name) the item each row refers to, and a rank column specifying the rank for the associated item.

format

The format of the data: one of "ordering", "ranking", or "long" (see above). By default, data is assumed to be in "long" format.

id

For data in long-format: the column representing the preference set grouping.

rank

For data in long-format: the column representing the rank for the associated item.

item

For data in long-format: the column representing the items, either by name or by index. If items are given by index, the item_names parameter should also be passed; otherwise the items will be named as integers.

item_names

The names of the full set of items. When loading data using integer-valued indices in place of item names, the item_names character vector should be in the correct order.

frequencies

An optional integer vector containing the number of occurrences of each preference. If provided, the method will return an aggregated_preferences object with the corresponding frequencies.

aggregate

If TRUE, aggregate the preferences via aggregate.preferences before returning. This returns an aggregated_preferences object.

verbose

If TRUE, diagnostic messages will be sent to stdout.

...

Unused.

x

The preferences object to subset.

i

The index of the preference-set to access.

j

The item names or indices to project onto, e.g. if j = 1 the preferences will be projected only onto the first item; if by.rank = TRUE j corresponds to the rank of the items to subset to, e.g. if j = 1 then preferences will be truncated to only contain their highest-preference.

by.rank

When FALSE, the index j corresponds to items; when TRUE, it corresponds to ranks.

as.ordering

When FALSE, returns a preferences object: internally, row \(i\) contains the ranking assigned to each item in preference \(p_i\). When TRUE, returns a data frame where columns group the items by rank.

width

The width in number of characters to format each preference, truncating by "..." when they are too long.

Value

By default, a preferences object, which is a data frame with list-valued columns corresponding to preferences on the items. This may be an ordering on subsets of the items in the case of ties, or a potentially-partial strict ordering. In the case of partial or tied preferences, some entries may be empty lists.

Details

Ordinal preferences can order every item, or they can order a subset. Some ordinal preference datasets will contain ties between items at a given rank. Hence, there are four distinct types of preferential data:

soc: Strict Orders - Complete List
soi: Strict Orders - Incomplete List
toc: Orders with Ties - Complete List
toi: Orders with Ties - Incomplete List

The data type is determined automatically and stored alongside the preferences as the attribute attr(preferences, "preftype"). If every preference ranks every item, the data type will be "soc" or "toc"; otherwise it will be "soi" or "toi". Similarly, if no preference contains a tie, the data type will be "soc" or "soi"; otherwise it will be "toc" or "toi".
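
As a rough illustration of how the four types defined above differ, the following base R sketch classifies a ranking matrix. The helper classify_preftype is hypothetical and is not part of prefio; it assumes rows are preference sets, columns are items, and NA marks an unranked item.

```r
# Hypothetical helper, not part of prefio: classify a ranking matrix
# (rows = preference sets, columns = items, NA = item not ranked).
classify_preftype <- function(rankings) {
  # Complete: every preference set ranks every item.
  complete <- !anyNA(rankings)
  # Tied: at least one row assigns the same rank to two or more items.
  tied <- any(apply(rankings, 1, function(r) anyDuplicated(r[!is.na(r)]) > 0))
  paste0(if (tied) "t" else "s", "o", if (complete) "c" else "i")
}

strict_complete <- rbind(c(1, 2, 3), c(3, 1, 2))
tied_incomplete <- rbind(c(1, 1, 2), c(1, NA, 2))

classify_preftype(strict_complete) # "soc"
classify_preftype(tied_incomplete) # "toi"
```

The first letter records whether any tie is present and the last letter whether every item is ranked, so the two checks are independent of each other.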

A set of preferences can be represented either by ranking or by ordering. These correspond to the two ways you can list a set of preferences in a vector:

ordering: The items are listed in order of most preferred to least preferred, allowing for multiple items being in the same place in the case of ties.
ranking: A rank is assigned to each item. Conventionally, ranks are integers in increasing order (with larger values indicating lower preference), but they can be any ordinal values. Any given rankings will be converted to 'dense' rankings: positive integers from 1 to some maximum rank, with no gaps between ranks.
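
The conversion to 'dense' rankings can be sketched in base R as follows. The helper to_dense is hypothetical, written only to illustrate the idea, and is not necessarily how prefio implements it; it assumes NA marks an unranked item.

```r
# Hypothetical helper, not part of prefio: densify a single ranking vector.
to_dense <- function(r) {
  # Matching against the sorted distinct ranks maps the smallest rank to 1,
  # the next smallest to 2, and so on. NA (unranked) stays NA.
  match(r, sort(unique(r)))
}

# Ranks 2, 2, 3 (a tie followed by a third item) become 1, 1, 2.
to_dense(c(2, 2, 3))

# Gaps are closed and unranked items are left untouched.
to_dense(c(2, 2, 3, NA, 7))
```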

When reading preferences from an ordering matrix, the index on the items is the order passed to the item_names parameter. When reading from a rankings matrix, if no item_names are provided, the order is inferred from the named columns.

A preferences object can also be read from a long-format matrix with three columns: id, item and rank. The id variable groups the rows of the matrix which correspond to a single set of preferences, while the item:rank pairs indicate how each item is ranked. When reading a matrix in this format without an item_names parameter, the order of the items is determined automatically.
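
To make the relationship between the long and ranking formats concrete, here is a minimal base R sketch that pivots clean long-format data into a ranking matrix. It assumes no missing values and no duplicated id and item pairs, and it orders items alphabetically, which is an assumption for illustration rather than a statement of how prefio orders them.

```r
# Illustrative long-format data: set 1 ranks A, B, C; set 2 ranks only B, C.
long <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  item = c("A", "B", "C", "B", "C"),
  rank = c(1, 2, 3, 1, 2)
)

items <- sort(unique(long$item)) # assumed item order (alphabetical)
ids <- sort(unique(long$id))

# One row per preference set, one column per item, NA where an item is unranked.
rankings <- matrix(
  NA_real_,
  nrow = length(ids), ncol = length(items),
  dimnames = list(NULL, items)
)

# Fill each (id, item) cell with its rank using two-column matrix indexing.
rankings[cbind(match(long$id, ids), match(long$item, items))] <- long$rank
rankings
```

Each id becomes one row, and any item absent from a preference set is left as NA, giving an incomplete ranking for that row.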

Examples

# create rankings from data in long form

# Example long-form data
x <- data.frame(
  id = c(rep(1:4, each = 4), 5, 5, 5),
  item = c(
    LETTERS[c(1:3, 3, 1:4, 2:5, 1:2, 1)], NA,
    LETTERS[3:5]
  ),
  rank = c(4:1, rep(NA, 4), 3:4, NA, NA, 1, 3, 4, 2, 2, 2, 3)
)

# * Set #1 has two different ranks for the same item (item C
# has rank 1 and 2). Only the highest rank will be used.
# * All ranks are missing in set #2, a technically valid partial ordering
# * Some ranks are missing in set #3, a perfectly valid partial ordering
# * Set #4 has inconsistent ranks for one item (item A has rank 1 and 4),
# and a rank with a missing item.
# * Set #5 is not a dense ranking. It will be converted to be dense and then
# inferred to be a regular partial ordering with ties.
split(x, x$rank)
#> $`1`
#>    id item rank
#> 4   1    C    1
#> 13  4    A    1
#> 
#> $`2`
#>    id item rank
#> 3   1    C    2
#> 16  4 <NA>    2
#> 17  5    C    2
#> 18  5    D    2
#> 
#> $`3`
#>    id item rank
#> 2   1    B    3
#> 9   3    B    3
#> 14  4    B    3
#> 19  5    E    3
#> 
#> $`4`
#>    id item rank
#> 1   1    A    4
#> 10  3    C    4
#> 15  4    A    4
#> 

# Creating a preferences object with this data will attempt to resolve these
# issues automatically, sending warnings when assumptions need to be made.
preferences(x, id = "id", item = "item", rank = "rank")
#> Dropping rows containing `NA`.
#> Duplicated rankings per item detected: only the highest ranks will be used.
#> [1] [C > B > A] [B > C]     [A > B]     [C = D > E]

# Convert an existing matrix of rankings to a preferences object.
rnk <- matrix(c(
  1, 2, 0, 0,
  4, 1, 2, 3,
  2, 1, 1, 1,
  1, 2, 3, 0,
  2, 1, 1, 0,
  1, 0, 3, 2
), nrow = 6, byrow = TRUE)
colnames(rnk) <- c("apple", "banana", "orange", "pear")

rnk <- as.preferences(rnk, format = "ranking")

# Convert an existing data frame of orderings to a preferences object.
e <- character() # short-hand for empty ranks
ord <- preferences(
  as.data.frame(
    rbind(
      list(1, 2, e, e), # apple, banana
      list("banana", "orange", "pear", "apple"),
      list(c("banana", "orange", "pear"), "apple", e, e),
      list("apple", "banana", "orange", e),
      list(c("banana", "orange"), "apple", e, e),
      list("apple", "pear", "orange", e)
    )
  ),
  format = "ordering",
  item_names = c("apple", "banana", "orange", "pear")
)

# Access the first three sets of preferences
ord[1:3, ]
#> [1] [apple > banana]                 [banana > orange > pear > apple]
#> [3] [banana = orange = pear > apple]

# Truncate preferences to the top 2 ranks
ord[, 1:2, by.rank = TRUE]

# Exclude pear from the rankings
ord[, -4]
#> [1] [apple > banana]          [banana > orange > apple]
#> [3] [banana = orange > apple] [apple > banana > orange]
#> [5] [banana = orange > apple] [apple > orange]         

# Get the highest-ranked items and return as a data.frame of orderings
ord[, 1, by.rank = TRUE, as.ordering = TRUE]

# Convert the preferences to a ranking matrix
as.matrix(ord)
#>      apple banana orange pear
#> [1,]     1      2     NA   NA
#> [2,]     4      1      2    3
#> [3,]     2      1      1    1
#> [4,]     1      2      3   NA
#> [5,]     2      1      1   NA
#> [6,]     1     NA      3    2
#> attr(,"preftype")
#> [1] "toi"

# Get the rank of apple in the third preference-set
as.matrix(ord)[3, 1]
#> apple 
#>     2 

# Get all the ranks assigned to apple as a vector
as.matrix(ord)[, "apple"]
#> [1] 1 4 2 1 2 1