Read orderings from .soc, .soi, .toc or .toi files storing ordinal preference data in the format defined by PrefLib: A Library for Preferences into a preferences object.
Usage
read_preflib(
file,
from_preflib = FALSE,
preflib_url = "https://raw.githubusercontent.com/PrefLib/PrefLib-Data/main/datasets/"
)
Arguments
- file
A preferential data file, conventionally with extension .soc, .soi, .toc or .toi according to data type.
- from_preflib
A logical which, when TRUE, will attempt to source the file from PrefLib by adding the database HTTP prefix.
- preflib_url
The URL which will be prepended to file, if from_preflib is TRUE.
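As a rough illustration of how from_preflib and preflib_url interact, the sketch below builds the path that would be read. It assumes the prefix is joined to file by plain concatenation; build_preflib_path() is a hypothetical helper written for this example and is not part of the package.

```r
# Hypothetical helper (not part of the package): mimics the documented
# behaviour of prepending `preflib_url` to `file` when `from_preflib` is TRUE.
# Assumption: the two parts are joined by simple concatenation.
build_preflib_path <- function(
  file,
  from_preflib = FALSE,
  preflib_url = "https://raw.githubusercontent.com/PrefLib/PrefLib-Data/main/datasets/"
) {
  if (isTRUE(from_preflib)) paste0(preflib_url, file) else file
}

# Local file: the path is returned unchanged
local_path <- build_preflib_path("my_data.soc")

# PrefLib file: the database prefix is prepended
remote_path <- build_preflib_path(
  "00004 - netflix/00004-00000138.soc",
  from_preflib = TRUE
)
```

Under this assumption, a mirror of the PrefLib data could be used by passing a different preflib_url while keeping the same relative file path.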
Value
A tibble with two columns: preferences and frequency. The preferences column contains all the preferential orderings in the file, and the frequency column gives the relative frequency of each ordering.
Details
Note that PrefLib refers to the items being ordered as "alternatives".
The file types supported are:
- .soc: Strict Orders - Complete List
- .soi: Strict Orders - Incomplete List
- .toc: Orders with Ties - Complete List
- .toi: Orders with Ties - Incomplete List
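The mapping from extension to data type listed above can be sketched as a small lookup in base R. The names preflib_types and describe_preflib_file() are hypothetical and exist only for this illustration; the package may determine the data type differently.

```r
# Illustrative lookup table built from the list of supported file types
preflib_types <- c(
  soc = "Strict Orders - Complete List",
  soi = "Strict Orders - Incomplete List",
  toc = "Orders with Ties - Complete List",
  toi = "Orders with Ties - Incomplete List"
)

# Hypothetical helper: describe a file by its extension
describe_preflib_file <- function(file) {
  ext <- tolower(tools::file_ext(file))
  if (!ext %in% names(preflib_types)) {
    stop("Unsupported extension: .", ext)
  }
  unname(preflib_types[ext])
}

cities_type <- describe_preflib_file("00034-00000001.soi")
```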
The numerically coded orderings and their frequencies are read into a tibble, storing all original metadata in a "preflib" attribute.
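To make the idea of numerically coded orderings concrete, here is a minimal base R sketch. It assumes data lines of the form "count: ordering" for a strict order file, uses made-up item names, and stores them in a "preflib" attribute whose structure is invented for the example. It is not the package's parser, and it returns a plain data frame rather than a tibble with a preferences column.

```r
# Assumed example data lines for a strict order file: "<count>: <ordering>"
lines <- c("3: 1,2,3", "2: 2,1,3", "1: 3,1,2")

# Split each line into its count and its comma-separated ordering
parts <- strsplit(lines, ":", fixed = TRUE)
frequency <- as.integer(vapply(parts, `[`, character(1), 1))
orderings <- lapply(parts, function(p) {
  as.integer(strsplit(trimws(p[2]), ",", fixed = TRUE)[[1]])
})

# Hypothetical item names standing in for the file's metadata
alternatives <- c("A", "B", "C")

# Decode each numeric ordering into a readable string
readable <- vapply(
  orderings,
  function(o) paste(alternatives[o], collapse = " > "),
  character(1)
)

result <- data.frame(preferences = readable, frequency = frequency)

# Keep the metadata alongside the data (structure assumed for illustration)
attr(result, "preflib") <- list(alternatives = alternatives)
```

With a real result from read_preflib(), the stored metadata can be inspected in the same way with attr(x, "preflib").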
A PrefLib file may be corrupt, in the sense that the ordered alternatives do not match their names. In this case, the file will still be read, but with a warning.
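One way such a mismatch could be detected is sketched below. This is an assumption about what "do not match" means (an ordering refers to an alternative code that has no name), and check_alternatives() is a hypothetical helper, not the check the package performs.

```r
# Hypothetical check: every alternative code used in an ordering
# should correspond to a named alternative.
check_alternatives <- function(orderings, alternative_names) {
  used <- sort(unique(unlist(orderings)))
  known <- seq_along(alternative_names)
  unknown <- setdiff(used, known)
  if (length(unknown) > 0) {
    # Warn rather than stop, mirroring the documented "read with a warning"
    warning(
      "Orderings refer to unnamed alternatives: ",
      paste(unknown, collapse = ", ")
    )
  }
  invisible(length(unknown) == 0)
}

# Consistent data: all codes 1..3 have names, so no warning is raised
ok <- check_alternatives(list(c(1L, 2L, 3L), c(2L, 1L, 3L)), c("A", "B", "C"))
```

Calling the helper with an ordering such as c(1L, 4L) and only three names would raise the warning and return FALSE invisibly.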
Note
The Netflix and cities datasets used in the examples are from Bennett and Lanning (2007) and Caragiannis et al. (2017) respectively. These data sets require a citation for re-use.
References
Mattei, N. and Walsh, T. (2013) PrefLib: A Library of Preference Data. Proceedings of Third International Conference on Algorithmic Decision Theory (ADT 2013). Lecture Notes in Artificial Intelligence, Springer.
Bennett, J. and Lanning, S. (2007) The Netflix Prize. Proceedings of The KDD Cup and Workshops.
Examples
# Can take a little while depending on speed of internet connection
# \donttest{
# strict complete orderings of four films on Netflix
netflix <- read_preflib("00004 - netflix/00004-00000138.soc", from_preflib = TRUE)
head(netflix)
#> # A tibble: 6 × 2
#> preferences
#> <prefrncs>
#> 1 [Beverly Hills Cop > Mean Girls > Mission: Impossible II > The Mummy Returns]
#> 2 [Mean Girls > Beverly Hills Cop > Mission: Impossible II > The Mummy Returns]
#> 3 [Beverly Hills Cop > Mean Girls > The Mummy Returns > Mission: Impossible II]
#> 4 [Mean Girls > Beverly Hills Cop > The Mummy Returns > Mission: Impossible II]
#> 5 [Beverly Hills Cop > Mission: Impossible II > Mean Girls > The Mummy Returns]
#> 6 [The Mummy Returns > Beverly Hills Cop > Mean Girls > Mission: Impossible II]
#> # ℹ 1 more variable: frequency <int>
levels(netflix$preferences)
#> [1] "Mean Girls" "Beverly Hills Cop" "The Mummy Returns"
#> [4] "Mission: Impossible II"
# strict incomplete orderings of 6 random cities from 36 in total
cities <- read_preflib("00034 - cities/00034-00000001.soi", from_preflib = TRUE)
# }