Takes a data frame column to group by and returns a hash table. The keys are the unique values of the group-by column, and each value is the vector of row numbers where that key occurs. The work is parallelized across n.cores CPU cores (Unix-like systems only) and is intended as a direct, much faster replacement for split(df, df$group_by).
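To make the key-to-row-numbers mapping concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: `index_by_value` is a hypothetical helper, it runs serially, and it uses a hashed environment as a stand-in for the hash object that hashcol returns.

```r
# Hypothetical helper (not the package's hashcol): maps each unique value
# of a vector to the positions (row numbers) where it occurs.
index_by_value <- function(x) {
  # split() groups the positions 1..length(x) by the matching value of x
  groups <- split(seq_along(x), x)
  # list2env() copies the named list into a hashed environment for keyed lookup
  list2env(groups, envir = new.env(hash = TRUE))
}

ids <- rep(c("a", "b", "c"), times = 2)
idx <- index_by_value(ids)

sort(ls(idx))  # the keys: "a" "b" "c"
idx[["b"]]     # row numbers where ids == "b": 2 5
```

Storing row numbers rather than copies of the rows keeps the table small; the rows themselves can be pulled from the data frame on demand.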
hashcol(X, n.cores = detectCores() - 1)
Argument | Description |
---|---|
X | A data frame column you want to group by. IE: |
n.cores | An integer giving the number of cores to run the process on. On Unix-like operating systems the default is one less than the total number of available CPU cores; on Windows the only option (currently) is 1. |
Checks the OS and chooses the correct package to use for mclapply. The parallelsugar package can be used on Windows (but currently is not), while parallel is used for everything else.
WARNING FOR WINDOWS USERS: not parallelized; only runs lapply instead of mclapply.
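The OS check described above might look roughly like the following sketch. `os_lapply` is a hypothetical wrapper name, and the exact test the package performs is an assumption; only `.Platform$OS.type`, `lapply`, and `parallel::mclapply` are known base R facilities.

```r
library(parallel)

# Hypothetical wrapper: apply `f` to every element of `xs`.
# On Windows it falls back to serial lapply(); elsewhere it forks
# worker processes with mclapply().
os_lapply <- function(xs, f, n.cores = 1L) {
  if (.Platform$OS.type == "windows") {
    lapply(xs, f)
  } else {
    mclapply(xs, f, mc.cores = n.cores)
  }
}

ids  <- c("a", "b", "a")
keys <- unique(ids)

# For each key, find the row numbers where it occurs
rows <- os_lapply(keys, function(k) which(ids == k), n.cores = 2L)
names(rows) <- keys
```

Both branches return a plain list in the same order as `xs`, so the calling code does not need to know which one ran.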
parallel, mclapply, hash
asd <- data.frame(
  id = rep(letters, times = 5)
  , service = sample(
    c('ps1', 'ps2', 'ps3', 'ps4', 'ps5', 'ps6', 'ps7')
    , size = 26 * 5
    , replace = TRUE
  )
  , stringsAsFactors = FALSE
)
h <- hashcol(asd$id, n.cores = 1)
h
#> <hash> containing 26 key-value pair(s).
#>   a : 1 27 53 79 105
#>   b : 2 28 54 80 106
#>   c : 3 29 55 81 107
#>   d : 4 30 56 82 108
#>   e : 5 31 57 83 109
#>   f : 6 32 58 84 110
#>   g : 7 33 59 85 111
#>   h : 8 34 60 86 112
#>   i : 9 35 61 87 113
#>   j : 10 36 62 88 114
#>   k : 11 37 63 89 115
#>   l : 12 38 64 90 116
#>   m : 13 39 65 91 117
#>   n : 14 40 66 92 118
#>   o : 15 41 67 93 119
#>   p : 16 42 68 94 120
#>   q : 17 43 69 95 121
#>   r : 18 44 70 96 122
#>   s : 19 45 71 97 123
#>   t : 20 46 72 98 124
#>   u : 21 47 73 99 125
#>   v : 22 48 74 100 126
#>   w : 23 49 75 101 127
#>   x : 24 50 76 102 128
#>   y : 25 51 77 103 129
#>   z : 26 52 78 104 130

#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"

#>        a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r
#> [1,]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#> [2,]  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44
#> [3,]  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70
#> [4,]  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
#> [5,] 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
#>        s   t   u   v   w   x   y   z
#> [1,]  19  20  21  22  23  24  25  26
#> [2,]  45  46  47  48  49  50  51  52
#> [3,]  71  72  73  74  75  76  77  78
#> [4,]  97  98  99 100 101 102 103 104
#> [5,] 123 124 125 126 127 128 129 130

#> <hash> containing 1 key-value pair(s).
#>   z : 26 52 78 104 130

#> [1] 26 52 78 104 130

#> [1] 26 52 78 104 130
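The row numbers stored under a key can be used to subset the original data frame, which is how the table can stand in for split(). The sketch below illustrates this with a plain named list built by base R's split() in place of the hash object, so `rows_by_id` is an assumption for illustration and not the output of hashcol itself.

```r
set.seed(1)  # make the sampled column reproducible

asd <- data.frame(
  id = rep(letters, times = 5),
  service = sample(c("ps1", "ps2", "ps3"), size = 26 * 5, replace = TRUE),
  stringsAsFactors = FALSE
)

# Stand-in for the hash table: key -> row numbers
rows_by_id <- split(seq_len(nrow(asd)), asd$id)

# Pull only the rows for one group, on demand
z_rows <- asd[rows_by_id[["z"]], ]

# The same group as produced by split() on the whole data frame
z_split <- split(asd, asd$id)[["z"]]

# Compare the two results
all.equal(z_rows, z_split)
```

Looking up one group this way touches only that group's rows, whereas split(df, df$group_by) materializes a data frame for every group up front.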