Create a grouped hash table instead of using split()

Takes a data frame column to group by and returns a hash table. The keys are the unique values of that column, and each value is the vector of row numbers where its key occurs. On Unix-like systems the work is parallelized across n.cores cores (by default, all but one); on Windows it runs on a single core. It is intended as a direct and much faster replacement for split(df, df$group_by).
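To make the key-to-row-numbers mapping concrete, here is a minimal base R sketch of the same idea. It is not how hashcol() is implemented; it uses a plain named list rather than a hash, and `group_col` is a hypothetical toy column standing in for `df$group_by`.

```r
# Hypothetical toy column; stands in for df$group_by
group_col <- c("b", "a", "b", "c", "a")

# Map each unique value to the row numbers where it occurs.
# split() on seq_along() returns indices rather than copies of the data.
row_index <- split(seq_along(group_col), group_col)

names(row_index)
#> [1] "a" "b" "c"
row_index[["b"]]
#> [1] 1 3
```

Storing only row numbers keeps the structure small; the rows themselves can be fetched later by indexing the original data frame.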

hashcol(X, n.cores = detectCores() - 1)

Arguments

X	A data frame column to group by, e.g. `df$id`.
n.cores	An integer giving the number of cores to run the process on. On Unix-like operating systems the default is one less than the total number of available cores; on Windows the only option (currently) is 1.

Details

Checks the OS and chooses the appropriate package for mclapply. The parallelsugar package could be used on Windows (but currently is not), while parallel is used on everything else.

WARNING FOR WINDOWS USERS: not parallelized; lapply is run instead of mclapply.
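The OS check described above could look roughly like the following. This is a sketch under assumptions, not the package's actual code: `apply_by_os`, `keys`, and `col` are hypothetical names, and only `.Platform$OS.type`, `parallel::detectCores()`, and `parallel::mclapply()` are real base R / parallel facilities.

```r
library(parallel)

# Hypothetical helper: plain lapply on Windows, forked mclapply elsewhere
apply_by_os <- function(x, fun, n.cores = NULL) {
  if (is.null(n.cores)) {
    cores <- detectCores()            # may be NA on some platforms
    n.cores <- if (is.na(cores)) 1L else max(1L, cores - 1L)
  }
  if (.Platform$OS.type == "windows") {
    lapply(x, fun)                    # no forking on Windows
  } else {
    mclapply(x, fun, mc.cores = n.cores)
  }
}

# Toy usage: find the positions of each key in a column
col  <- c("a", "b", "a")
keys <- unique(col)
res  <- apply_by_os(keys, function(k) which(col == k), n.cores = 1)
```

With `n.cores = 1` both branches behave like lapply, which is a convenient way to check results before enabling more cores.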

Examples

asd <- data.frame(
    id               = rep(letters, times = 5)
  , service          = sample(
      c('ps1', 'ps2', 'ps3', 'ps4', 'ps5', 'ps6', 'ps7')
    , size    = 26 * 5
    , replace = TRUE
    )
  , stringsAsFactors = FALSE
  )
h <- hashcol(asd$id, n.cores = 1)
h
#> <hash> containing 26 key-value pair(s).
#>   a :   1  27  53  79 105
#>   b :   2  28  54  80 106
#>   c :   3  29  55  81 107
#>   d :   4  30  56  82 108
#>   e :   5  31  57  83 109
#>   f :   6  32  58  84 110
#>   g :   7  33  59  85 111
#>   h :   8  34  60  86 112
#>   i :   9  35  61  87 113
#>   j :  10  36  62  88 114
#>   k :  11  37  63  89 115
#>   l :  12  38  64  90 116
#>   m :  13  39  65  91 117
#>   n :  14  40  66  92 118
#>   o :  15  41  67  93 119
#>   p :  16  42  68  94 120
#>   q :  17  43  69  95 121
#>   r :  18  44  70  96 122
#>   s :  19  45  71  97 123
#>   t :  20  46  72  98 124
#>   u :  21  47  73  99 125
#>   v :  22  48  74 100 126
#>   w :  23  49  75 101 127
#>   x :  24  50  76 102 128
#>   y :  25  51  77 103 129
#>   z :  26  52  78 104 130

hash::keys(h)
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"
hash::values(h)
#>        a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r
#> [1,]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#> [2,]  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44
#> [3,]  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70
#> [4,]  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
#> [5,] 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
#>        s   t   u   v   w   x   y   z
#> [1,]  19  20  21  22  23  24  25  26
#> [2,]  45  46  47  48  49  50  51  52
#> [3,]  71  72  73  74  75  76  77  78
#> [4,]  97  98  99 100 101 102 103 104
#> [5,] 123 124 125 126 127 128 129 130

h[hash::keys(h)[26]] # key value pair
#> <hash> containing 1 key-value pair(s).
#>   z :  26  52  78 104 130
h[[hash::keys(h)[26]]] # value accessor method; same as next line
#> [1]  26  52  78 104 130
hash::values(h)[ , 26] # value accessor method; same as previous line
#> [1]  26  52  78 104 130
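
The stored row numbers are meant to index back into the original data frame. The sketch below shows that step with a plain named list (`idx`) as a stand-in for the hash, so that it runs without hashcol(); the `value` column is a hypothetical addition for illustration. With the hash from the example above, `asd[h[["z"]], ]` should select the same rows.

```r
# Same id layout as the example above; `value` is a hypothetical column
asd <- data.frame(
  id    = rep(letters, times = 5),
  value = seq_len(26 * 5),
  stringsAsFactors = FALSE
)

# Stand-in for hashcol(asd$id): key -> row numbers
idx <- split(seq_len(nrow(asd)), asd$id)

# Pull all rows belonging to group "z"
z_rows <- asd[idx[["z"]], ]
z_rows$value
#> [1]  26  52  78 104 130
```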
