prefio



Working with preferential data in R


Floyd Everest, Heather Turner and Damjan Vukcevic

What is preferential data?

[Figure: A sample ballot for ranking Australian House of Representatives candidates, courtesy of the AEC.]

  • Each entry is a ranking of items drawn from a common set.

  • e.g., ballots for the House of Representatives!

  • Shows up in fields such as “computational social choice, recommender systems, data mining, machine learning, combinatorial optimization, to name just a few” (PrefLib)

  • Other kinds of preferences can be incomplete or even include ties

What does tabular preferential data look like?

Wangaratta: (Charles, Beatriz, Allie), Geelong: (Beatriz, Allie)

"Lucky" long format.
ID  VoterLocation  Candidate  Rank
 2  Wangaratta     Allie         3
 2  Wangaratta     Beatriz       2
 2  Wangaratta     Charles       1
 3  Geelong        Allie         2
 3  Geelong        Beatriz       1

"Unlucky" wide format.

ID  VoterLocation  Allie  Beatriz  Charles
 2  Wangaratta         3        2        1
 3  Geelong            2        1       NA
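
The two tables above hold the same ballots, so one can be reshaped into the other. Here is a minimal sketch using tidyr (not prefio itself). It assumes the tibble and tidyr packages are installed, and the object names `long`, `wide` and `long_again` are made up for illustration.

```r
library(tibble)
library(tidyr)

# The long-format example from the slide: one row per (ballot, candidate).
long <- tribble(
  ~ID, ~VoterLocation, ~Candidate, ~Rank,
  2, "Wangaratta", "Allie",   3,
  2, "Wangaratta", "Beatriz", 2,
  2, "Wangaratta", "Charles", 1,
  3, "Geelong",    "Allie",   2,
  3, "Geelong",    "Beatriz", 1
)

# Long -> wide: one row per ballot, one column per candidate.
# Candidates that a voter left unranked become NA.
wide <- pivot_wider(long, names_from = Candidate, values_from = Rank)

# Wide -> long: drop the NA cells so unranked candidates disappear again.
long_again <- pivot_longer(
  wide,
  cols = c(Allie, Beatriz, Charles),
  names_to = "Candidate",
  values_to = "Rank",
  values_drop_na = TRUE
)

print(wide)
print(long_again)
```

The wide format needs an NA for every candidate a voter skipped, which is part of why it is the "unlucky" one when ballots are incomplete.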

What does tabular preferential data look like?

  • Wide format data from the most recent Federal election (2025 Senators for ACT)

  • Long format data from the most recent NSW legislative assembly election (2023 member for Albury)

Let’s collect some votes!

Rank these café orders from most to least preferred

  • Latte
  • Iced latte
  • Flat white
  • Black coffee
  • Cold brew
  • Matcha
  • Tea (coffee has too much caffeine)

vote.floydeverest.com

The history of prefio

  • I wanted to make NSW election data accessible
  • I wanted it to integrate with the PrefLib database
  • I “stole” Heather’s code from PlackettLuce for loading PrefLib data and handling preferences
  • I adapted it to mesh nicely with the tidyverse ecosystem

A “quick” demo

Let’s take a quick look at the 2023 New South Wales legislative assembly election.

list.files("nswla_data/")
 [1] "Albury.zip"              "Auburn.zip"             
 [3] "Badgerys Creek.zip"      "Ballina.zip"            
 [5] "Balmain.zip"             "Bankstown.zip"          
 [7] "Barwon.zip"              "Bathurst.zip"           
 [9] "Bega.zip"                "Blacktown.zip"          
[11] "Blue Mountains.zip"      "Cabramatta.zip"         
[13] "Camden.zip"              "Campbelltown.zip"       
[15] "candidates.xlsx"         "Canterbury.zip"         
[17] "Castle Hill.zip"         "Cessnock.zip"           
[19] "Charlestown.zip"         "Clarence.zip"           
[21] "Coffs Harbour.zip"       "Coogee.zip"             
[23] "Cootamundra.zip"         "Cronulla.zip"           
[25] "Davidson.zip"            "Drummoyne.zip"          
[27] "Dubbo.zip"               "East Hills.zip"         
[29] "Epping.zip"              "Fairfield.zip"          
[31] "Gosford.zip"             "Goulburn.zip"           
[33] "Granville.zip"           "Hawkesbury.zip"         
[35] "Heathcote.zip"           "Heffron.zip"            
[37] "Holsworthy.zip"          "Hornsby.zip"            
[39] "Keira.zip"               "Kellyville.zip"         
[41] "Kiama.zip"               "Kogarah.zip"            
[43] "Lake Macquarie.zip"      "Lane Cove.zip"          
[45] "Leppington.zip"          "Lismore.zip"            
[47] "Liverpool.zip"           "Londonderry.zip"        
[49] "Macquarie Fields.zip"    "Maitland.zip"           
[51] "Manly.zip"               "Maroubra.zip"           
[53] "Miranda.zip"             "Monaro.zip"             
[55] "Mount Druitt.zip"        "Murray.zip"             
[57] "Myall Lakes.zip"         "Newcastle.zip"          
[59] "Newtown.zip"             "North Shore.zip"        
[61] "Northern Tablelands.zip" "Oatley.zip"             
[63] "Orange.zip"              "Oxley.zip"              
[65] "Parramatta.zip"          "Penrith.zip"            
[67] "Pittwater.zip"           "Port Macquarie.zip"     
[69] "Port Stephens.zip"       "Prospect.zip"           
[71] "Riverstone.zip"          "Rockdale.zip"           
[73] "Ryde.zip"                "Shellharbour.zip"       
[75] "South Coast.zip"         "Strathfield.zip"        
[77] "Summer Hill.zip"         "Swansea.zip"            
[79] "Sydney.zip"              "Tamworth.zip"           
[81] "Terrigal.zip"            "The Entrance.zip"       
[83] "Tweed.zip"               "Upper Hunter.zip"       
[85] "Vaucluse.zip"            "Wagga Wagga.zip"        
[87] "Wahroonga.zip"           "Wakehurst.zip"          
[89] "Wallsend.zip"            "Willoughby.zip"         
[91] "Winston Hills.zip"       "Wollondilly.zip"        
[93] "Wollongong.zip"          "Wyong.zip"              

A “quick” demo

library(tidyverse)
library(prefio)

# Load the entire election with readr + purrr
election <- list.files("nswla_data/", pattern = "\\.zip$", full.names = TRUE) |>
  map(function(x) {
    read_delim(x,
               delim = "\t",
               show_col_types = FALSE,
               col_select = c(District, BPNumber, CandidateName, PrefCounted))
  }) |>
  list_rbind() |>
  print(n = 4)
# A tibble: 12,203,709 × 4
  District BPNumber CandidateName PrefCounted
  <chr>    <chr>    <chr>               <dbl>
1 Albury   9996     CLANCY Justin           1
2 Albury   9997     CLANCY Justin           1
3 Albury   9998     CLANCY Justin           1
4 Albury   9999     CLANCY Justin           1
# ℹ 12,203,705 more rows

A “quick” demo

# Format the preferences with prefio
election <- election |>
  long_preferences(vote,
                   id_cols = c(District, BPNumber),
                   item_col = CandidateName,
                   rank_col = PrefCounted) |>
  print(n = 5)
# A tibble: 4,701,930 × 3
  District BPNumber            vote
  <chr>    <chr>         <prefrncs>
1 Albury   1        [CLANCY Justin]
2 Albury   10       [CLANCY Justin]
3 Albury   100      [CLANCY Justin]
4 Albury   1000     [CLANCY Justin]
5 Albury   10000    [CLANCY Justin]
# ℹ 4,701,925 more rows
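
The row count drops because each ballot paper's long-format rows collapse into a single ordered preference. Below is a rough dplyr sketch of that idea. It is not how prefio stores preferences (it uses a dedicated column type rather than strings), and the district, ballot numbers and candidate names are hypothetical.

```r
library(dplyr)

# Hypothetical long-format rows: one row per (ballot paper, ranked candidate).
long <- tibble::tribble(
  ~District, ~BPNumber, ~CandidateName, ~PrefCounted,
  "Demo",    "1",       "Candidate B",  2,
  "Demo",    "1",       "Candidate A",  1,
  "Demo",    "1",       "Candidate C",  3,
  "Demo",    "2",       "Candidate B",  1,
  "Demo",    "3",       "Candidate C",  1,
  "Demo",    "3",       "Candidate A",  2
)

# Sort each ballot by rank, then collapse it into one row per ballot paper.
ballots <- long |>
  arrange(District, BPNumber, PrefCounted) |>
  summarise(
    vote = paste(CandidateName, collapse = " > "),
    .by = c(District, BPNumber)
  )

print(ballots)
```

Six long-format rows become three ballots here, which is the same kind of reduction as in the output above.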

A “quick” demo

election |>
  group_by(vote) |>
  summarise(frequency = n()) |>
  slice_sample(n = 5) |>
  knitr::kable(
    format = "html",
    table.attr = 'style="font-size: 0.6em; width: 80%;"'
  )
vote frequency
[DIXON Jamie > DAVIS Kenneth > WILLMOTT Mia > HOMER Chris > WATSON Anna > BARNES Mikayla > GRANATA Rita] 5
[GREGORY Desiree > BETTS Peta > BULJUBASIC Mirsad (Max) > CARLE Adrian > FARRELL Kevin > KING Amelia > ADAMSON Greg > LANDINI David > DALTON Helen] 1
[DUROUX Brett > NOVAK Debrah > ANKERSMIT Leon > WILLIAMSON Richie > LEVI Nicki > KELLER George] 1
[WHAN Steven > THALER Andrew > HOLGATE James > GOLDIE Jenny > TANSON Josie > PRYOR Chris] 1
[BRUCE Sophie-Anne > CONDIE Jenna > PALMER Michelle > MARSCHALL Richard > KEIGHTLEY Greg > DOYLE Trish] 12

A “quick” demo

Compute the winner of every district so we can compare against the official results (this runs much faster than I previously thought possible).

# Compute the instant-runoff (IRV) winner in each district
election |>
  summarise(
    Winner = pref_irv(vote)$winner,
    .by = District
  )
# A tibble: 93 × 2
   District       Winner         
   <chr>          <chr>          
 1 Albury         CLANCY Justin  
 2 Auburn         VOLTZ Lynda    
 3 Badgerys Creek DAVIES Tanya   
 4 Ballina        SMITH Tamara   
 5 Balmain        SHETTY Kobi    
 6 Bankstown      DIB Jihad      
 7 Barwon         BUTLER Roy     
 8 Bathurst       TOOLE Paul     
 9 Bega           HOLLAND Michael
10 Blacktown      BALI Stephen   
# ℹ 83 more rows

Where to next?

  • I’d like to integrate this back into PlackettLuce, where it was born.

  • It would be great to directly produce visualisations for preferences in the future.

  • There are likely many features needed to make this useful to people other than me.

  • If you work with preferential data, give it a go and let me know of any features which may be useful to add!

The results

Democratically confirmed best coffee order

votes <- read_csv("voter_app/data/responses.csv") |>
  long_preferences(vote,
                   id_cols = c(device_hash, timestamp),
                   rank_col = rank,
                   item_col = item) |>
  pull(vote)

pref_irv(votes)$winner
[1] "Latte"

Distribution of preferences

Candidate                           Round 1  Round 2  Round 3  Round 4  Round 5
Black coffee                             10       10       10
Flat white                               16       16       19       24       29
Iced latte                                7        8
Latte                                    12       12       15       18       39
Cold brew                                 2
Tea (coffee has too much caffeine)       11       11       12       13
Matcha                                   10       11       12       13
🎉 Thanks for listening!