Transform and analyze workforce data in meaningful ways for human resources (HR) analytics. This vignette demonstrates two functions, hierarchy and hierarchyStats, which convert standard employee and supervisor relationship data into useful formats for robust analytics, summary statistics, and span of control metrics. Install the package from CRAN by running install.packages("hR").
The examples in this vignette use the sample workforceHistory data set, which reflects an artificial organization’s historical workforce/employment data. The sample is reduced to a data.table containing one row per active employee and contractor, so that the functions in the following sections operate on the current hierarchy only.
# Attach the packages used in this vignette
library(hR)
library(data.table)
data("workforceHistory")
# Reduce to DATE <= today to exclude future-dated records
dt = workforceHistory[DATE<=Sys.Date()]
# Reduce to max DATE and SEQ per person
dt = dt[dt[,.I[which.max(DATE)],by=.(EMPLID)]$V1]
dt = dt[dt[,.I[which.max(SEQ)],by=.(EMPLID,DATE)]$V1]
# Only consider workers who are currently active
# This provides a reliable 'headcount' data set that reflects today's active workforce
dt = dt[STATUS=="Active"]
# Exclude the CEO because she does not have a supervisor
CEO = dt[TITLE=="CEO",EMPLID]
dt = dt[EMPLID!=CEO]
# Show the prepared table
# This represents an example, active workforce
print(dt[,.(EMPLID,NAME,TITLE,SUPVID)])
#> EMPLID NAME TITLE SUPVID
#> <int> <char> <char> <int>
#> 1: 131356 George Analyst 199827
#> 2: 199827 Pablo Director 111355
#> 3: 534441 Rebekah Analyst 199827
#> 4: 199901 Enrique Associate 199827
#> 5: 268831 Hillary Intern 131356
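Before building a hierarchy, it can help to check that the prepared identifiers form a clean tree. The following is a minimal base R sketch, not part of the hR package; the vectors ee and supv are illustrative names whose values are copied from the table printed above.

```r
# IDs copied from the prepared table printed above (illustrative names)
ee   <- c(131356L, 199827L, 534441L, 199901L, 268831L)
supv <- c(199827L, 111355L, 199827L, 199827L, 131356L)

# Each person should appear exactly once
stopifnot(anyDuplicated(ee) == 0)

# No one should be listed as their own supervisor
stopifnot(all(ee != supv))

# Supervisors who are not in the employee vector sit at the top of the tree
tops <- setdiff(supv, ee)
print(tops)
```

For this sample the only such ID is 111355, the excluded CEO, which suggests a single tree with one root.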
The hierarchy convenience function transforms a standard set of unique employee and supervisor identifiers (employee IDs, email addresses, etc.) into a long or wide format that can be used to aggregate employee data by a particular line of leadership (e.g. include everyone who rolls up to Susan, the CEO).
When format = "long", the function returns a long data.table consisting of one row per employee for every supervisor above them, up to the top of the tree. Levels are numbered from the top, so Level 1 is the most senior supervisor.
hLong = hierarchy(dt$EMPLID,dt$SUPVID,format="long")
print(hLong)
#> Employee Level Supervisor
#> <char> <int> <char>
#> 1: 131356 1 111355
#> 2: 131356 2 199827
#> 3: 199827 1 111355
#> 4: 199901 1 111355
#> 5: 199901 2 199827
#> 6: 268831 1 111355
#> 7: 268831 2 199827
#> 8: 268831 3 131356
#> 9: 534441 1 111355
#> 10: 534441 2 199827
# Who reports up through Susan? (direct and indirect reports)
print(hLong[Supervisor==CEO])
#> Employee Level Supervisor
#> <char> <int> <char>
#> 1: 131356 1 111355
#> 2: 199827 1 111355
#> 3: 199901 1 111355
#> 4: 268831 1 111355
#> 5: 534441 1 111355
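To make the long format concrete, the following base R sketch rebuilds the same rows by walking up each employee's chain of supervisors. It is only an illustration of the idea under stated assumptions, not the package's implementation; chain_up and long_sketch are hypothetical names, and the IDs are copied from the prepared table.

```r
# IDs copied from the prepared table (illustrative names)
ee   <- c(131356L, 199827L, 534441L, 199901L, 268831L)
supv <- c(199827L, 111355L, 199827L, 199827L, 131356L)

# Hypothetical helper: collect every supervisor above one employee
chain_up <- function(id, ee, supv) {
  chain <- integer(0)
  current <- id
  while (current %in% ee) {
    current <- supv[match(current, ee)]
    chain <- c(chain, current)
    # Guard against circular reporting lines
    if (length(chain) > length(ee)) stop("cycle detected in hierarchy")
  }
  rev(chain)  # top of the tree first, so Level 1 is the top
}

# One row per employee for every supervisor above them
long_sketch <- do.call(rbind, lapply(ee, function(id) {
  ch <- chain_up(id, ee, supv)
  data.frame(
    Employee   = rep(as.character(id), length(ch)),
    Level      = seq_along(ch),
    Supervisor = as.character(ch),
    stringsAsFactors = FALSE
  )
}))

print(long_sketch[order(long_sketch$Employee, long_sketch$Level), ])
```

Counting rows per Supervisor in a table like this gives the number of direct and indirect reports for each leader.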
When format = "wide", the function returns a wide data.table with a column for every level in the hierarchy, starting from the top of the tree (i.e. “Supv1” is the top person in the hierarchy).
hWide = hierarchy(dt$EMPLID,dt$SUPVID,format="wide")
print(hWide)
#> Employee Supv1 Supv2 Supv3
#> <char> <char> <char> <char>
#> 1: 199827 111355 <NA> <NA>
#> 2: 131356 111355 199827 <NA>
#> 3: 534441 111355 199827 <NA>
#> 4: 199901 111355 199827 <NA>
#> 5: 268831 111355 199827 131356
# Who reports up through Pablo? (direct and indirect reports)
print(hWide[Supv2==199827])
#> Employee Supv1 Supv2 Supv3
#> <char> <char> <char> <char>
#> 1: 131356 111355 199827 <NA>
#> 2: 534441 111355 199827 <NA>
#> 3: 199901 111355 199827 <NA>
#> 4: 268831 111355 199827 131356
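Filtering on a single column such as Supv2 works when the supervisor's level is known in advance. When it is not, one option is to search every Supv column. The base R sketch below rebuilds the wide table from the printed output as a plain data.frame; hWide_sketch and reports_to are hypothetical names, not part of hR.

```r
# Wide table copied from the printed output above (illustrative name)
hWide_sketch <- data.frame(
  Employee = c("199827", "131356", "534441", "199901", "268831"),
  Supv1    = rep("111355", 5),
  Supv2    = c(NA, "199827", "199827", "199827", "199827"),
  Supv3    = c(NA, NA, NA, NA, "131356"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: everyone who reports up through `id` at any level
reports_to <- function(wide, id) {
  supv_cols <- grep("^Supv", names(wide), value = TRUE)
  # NA cells compare as NA; na.rm = TRUE treats them as non-matches
  hit <- rowSums(wide[, supv_cols, drop = FALSE] == id, na.rm = TRUE) > 0
  wide$Employee[hit]
}

print(reports_to(hWide_sketch, "199827"))  # reports under Pablo
print(reports_to(hWide_sketch, "131356"))  # reports under George
```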
The hierarchyStats function computes summary statistics and span of control metrics from a standard set of unique employee and supervisor identifiers (employee IDs, email addresses, etc.). It returns a list containing each metric and the span of control table.
hStats = hierarchyStats(dt$EMPLID,dt$SUPVID)
# Total Levels:
print(hStats$levelsCount$value)
#> [1] 4
# Total Individual Contributors:
print(hStats$individualContributorsCount$value)
#> [1] 3
# Total People Managers:
print(hStats$peopleManagersCount$value)
#> [1] 3
# Median Direct Reports:
print(hStats$medianDirectReports$value)
#> [1] 1
# Median Span of Control (Direct and Indirect Reports):
print(hStats$medianSpanOfControl$value)
#> [1] 4
# Span of Control Table
print(hStats$spanOfControlTable)
#> Key: <Employee>
#> Employee directReports spanOfControl
#> <char> <int> <int>
#> 1: 111355 1 5
#> 2: 131356 1 1
#> 3: 199827 3 4
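The figures above can be cross-checked by hand. The following base R sketch derives direct reports from the supervisor vector and span of control from the Supervisor column of the long table; it shows one way to arrive at the same numbers and is not a description of how hierarchyStats works internally. All object names are illustrative, and the values are copied from the outputs shown earlier.

```r
# Values copied from the prepared table and the long-format output
ee   <- c(131356L, 199827L, 534441L, 199901L, 268831L)
supv <- c(199827L, 111355L, 199827L, 199827L, 131356L)
long_supervisor <- c("111355", "199827", "111355", "111355", "199827",
                     "111355", "199827", "131356", "111355", "199827")

# Direct reports: how often each ID appears as an immediate supervisor
direct <- table(as.character(supv))
# Span of control: how often each ID appears anywhere above an employee
span <- table(long_supervisor)

soc <- data.frame(
  Employee      = names(span),
  directReports = as.integer(direct[names(span)]),
  spanOfControl = as.integer(span),
  stringsAsFactors = FALSE
)
print(soc)

# Summary figures comparable to those reported above
people_managers <- length(unique(supv))
individual_contributors <- sum(!(ee %in% supv))
median_direct <- median(soc$directReports)
median_span <- median(soc$spanOfControl)
```

Note that the CEO counts as a people manager here even though she was removed from the employee vector, because her ID still appears as a supervisor.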