reshape                package:stats                R Documentation

_R_e_s_h_a_p_e _G_r_o_u_p_e_d _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     This function reshapes a data frame between 'wide' format with
     repeated measurements in separate columns of the same record and
     'long' format with the repeated measurements in separate records.

_U_s_a_g_e:

     reshape(data, varying = NULL, v.names = NULL, timevar = "time",
             idvar = "id", ids = 1:NROW(data),
             times = seq_along(varying[[1]]),
             drop = NULL, direction, new.row.names = NULL,
             sep = ".",
             split = if (sep == "") {
                 list(regexp = "[A-Za-z][0-9]", include = TRUE)
             } else {
                 list(regexp = sep, include = FALSE, fixed = TRUE)}
             )

_A_r_g_u_m_e_n_t_s:

    data: a data frame

 varying: names of sets of variables in the wide format that correspond
          to single variables in long format ('time-varying').  This is
          canonically a list of vectors of variable names, but it can
          optionally be a matrix of names, or a single vector of names.
          In each case, the names can be replaced by indexes which are
          interpreted as referring to 'names(data)'.  See below for
          more details and options.

 v.names: names of variables in the long format that correspond to
          multiple variables in the wide format. See below for details.

 timevar: the variable in long format that differentiates multiple
          records from the same group or individual.

   idvar: names of one or more variables in long format that identify
          multiple records from the same group/individual.  These
          variables may also be present in wide format.

     ids: the values to use for a newly created 'idvar' variable in
          long format.

   times: the values to use for a newly created 'timevar' variable in
          long format. See below for details.

    drop: a vector of names of variables to drop before reshaping

direction: character string, either '"wide"' to reshape to wide format,
          or '"long"' to reshape to long format.

new.row.names: character or 'NULL': a non-null value will be used for
          the row names of the result.

     sep: A character vector of length 1, indicating a separating
          character in the variable names in the wide format. This is
          used for guessing 'v.names' and 'times' arguments based on
          the names in 'varying'. If 'sep==""', the split is just
          before the first numeral that follows an alphabetic
          character.

   split: A list with three components, 'regexp', 'include', and
          (optionally) 'fixed'. This allows an extended interface to
          variable name splitting. See below for details.

_D_e_t_a_i_l_s:

     The arguments to this function are described in terms of
     longitudinal data, as that is the application motivating the
     function.  A 'wide' longitudinal dataset will have one record for
     each individual with some time-constant variables that occupy
     single columns and some time-varying variables that occupy a
     column for each time point.  In 'long' format there will be
     multiple records for each individual, with some variables being
     constant across these records and others varying across the
     records.  A 'long' format dataset also needs a 'time' variable
     identifying which time point each record comes from and an 'id'
     variable showing which records refer to the same person.
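
     As a minimal sketch of the two layouts, the toy data frame below
     (hypothetical values, not taken from any real dataset) holds one
     time-constant variable, 'sex', and one time-varying measurement,
     'bp', recorded at two time points:

```r
# Hypothetical wide data: one record per individual.
wide <- data.frame(id   = 1:2,
                   sex  = c("F", "M"),      # time-constant
                   bp.1 = c(120, 135),      # bp at time 1
                   bp.2 = c(118, 140))      # bp at time 2

# Long format: one record per individual per time point, with a
# 'time' column identifying the time point and 'id' identifying
# which records belong together.
long <- reshape(wide, direction = "long",
                varying = c("bp.1", "bp.2"), idvar = "id")
long
```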

     If the data frame 'a' resulted from a previous call to 'reshape',
     the operation can be reversed simply by 'reshape(a)'.  In that case
     the 'direction' argument is optional and the other arguments are
     taken from attributes stored on the data frame.
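
     A small round trip illustrates this, using made-up data.  The
     attribute name checked here, '"reshapeWide"', is what current
     versions of R store and should be treated as an implementation
     detail rather than a documented interface:

```r
# Hypothetical long data: three individuals, two time points each.
long <- data.frame(id   = rep(1:3, each = 2),
                   time = rep(1:2, times = 3),
                   x    = c(5, 6, 7, 8, 9, 10))

wide <- reshape(long, idvar = "id", timevar = "time",
                direction = "wide")

# The reshaping information travels with the result as an attribute.
has_info <- "reshapeWide" %in% names(attributes(wide))

# No 'direction' or other arguments needed to go back to long format.
back <- reshape(wide)
```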

     If 'direction="wide"' and no 'varying' or 'v.names' arguments are
     supplied it is assumed that all variables except 'idvar' and
     'timevar' are time-varying. They are all expanded into multiple
     variables in wide format.
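
     For instance, in the following sketch (toy values chosen only for
     illustration) neither 'varying' nor 'v.names' is given, so both
     'x' and 'y' are treated as time-varying and each is expanded into
     one column per time point:

```r
# Hypothetical long data with two measured variables.
long <- data.frame(id   = rep(1:2, each = 2),
                   time = rep(1:2, times = 2),
                   x    = c(1, 2, 3, 4),
                   y    = c(10, 20, 30, 40))

wide <- reshape(long, idvar = "id", timevar = "time",
                direction = "wide")

# Column names are built from the variable name, 'sep' and the time.
names(wide)
```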

     If 'direction="long"' the 'varying' argument can be a vector of
     column names (or a corresponding index). The function will attempt
     to guess the 'v.names' and 'times' from these names.  The default
     expects variable names like 'x.1', 'x.2', where 'sep="."' means
     that each name is split at the dot and the dot is dropped. For
     alphabetic names followed directly by numeric times, such as 'x1',
     'x2', use 'sep=""'.
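
     The two naming schemes can be compared side by side.  The data
     frames below are invented for the purpose; both calls should
     recover a variable named 'x' with times 1 and 2:

```r
# Names of the form 'x.1', 'x.2': the default sep = "." applies.
d_dot <- data.frame(id = 1:2, x.1 = c(5, 6), x.2 = c(7, 8))
long_dot <- reshape(d_dot, direction = "long", idvar = "id",
                    varying = c("x.1", "x.2"))

# Names of the form 'x1', 'x2': no separator, so use sep = "".
d_nosep <- data.frame(id = 1:2, x1 = c(5, 6), x2 = c(7, 8))
long_nosep <- reshape(d_nosep, direction = "long", idvar = "id",
                      varying = c("x1", "x2"), sep = "")
```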

     Variable name splitting as described above is only attempted in
     the case where 'varying' is an atomic vector. If it is a list or a
     matrix, 'v.names' and 'times' will generally need to be specified,
     although they will default to, respectively, the first variable
     name in each set, and sequential times.
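
     A sketch of the list form, with 'v.names' and 'times' given
     explicitly.  The column names and the 'visit' labels are
     hypothetical and chosen so that no guessing from names would be
     possible with the default 'sep':

```r
# Hypothetical wide data: two variables measured at visits "a" and "b".
wide <- data.frame(id = 1:2,
                   height_a = c(150, 160), height_b = c(152, 161),
                   weight_a = c(50, 60),   weight_b = c(51, 62))

# One list element per long-format variable; within each element the
# columns are given in time order.
long <- reshape(wide, direction = "long", idvar = "id",
                varying = list(c("height_a", "height_b"),
                               c("weight_a", "weight_b")),
                v.names = c("height", "weight"),
                times = c("a", "b"), timevar = "visit")
long
```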

     Also, guessing is not attempted if 'v.names' is given explicitly.
     Notice that the variables in 'varying' are then expected in an
     order like 'x.1','y.1','x.2','y.2', that is, grouped by time
     rather than by variable.
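
     The ordering matters because, with explicit 'v.names', a plain
     vector in 'varying' is assigned to the variables by position
     rather than by name.  A minimal sketch with made-up values:

```r
# Hypothetical wide data with two variables at two time points.
wide <- data.frame(id  = 1:2,
                   x.1 = c(1, 2), y.1 = c(10, 20),
                   x.2 = c(3, 4), y.2 = c(30, 40))

# 'varying' lists all variables for time 1, then all for time 2.
long <- reshape(wide, direction = "long", idvar = "id",
                varying = c("x.1", "y.1", "x.2", "y.2"),
                v.names = c("x", "y"))
long
```

     Supplying the same names as 'x.1','x.2','y.1','y.2' would pair
     the columns with the wrong variables.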

     The 'split' argument should not usually be necessary. The
     'split$regexp' component is passed to either 'strsplit()' or
     'regexpr()', where the latter is used if 'split$include' is 'TRUE',
     in which case the splitting occurs after the first character of
     the matched string. In the 'strsplit()' case, the separator is not
     included in the result, and it is possible to specify fixed-string
     matching using 'split$fixed'.
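
     As an illustration of the 'strsplit()' case, the sketch below
     assumes hypothetical column names of the form 'wt_t1', 'wt_t2',
     where the multi-character string '_t' separates the variable name
     from the time:

```r
# Hypothetical wide data whose names use "_t" as the separator.
wide <- data.frame(id = 1:2, wt_t1 = c(50, 60), wt_t2 = c(51, 62))

# include = FALSE: the matched separator is dropped from the result.
# fixed = TRUE: "_t" is matched as a literal string, not a regexp.
long <- reshape(wide, direction = "long", idvar = "id",
                varying = c("wt_t1", "wt_t2"),
                split = list(regexp = "_t", include = FALSE,
                             fixed = TRUE))
long
```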

_V_a_l_u_e:

     The reshaped data frame with added attributes to simplify
     reshaping back to the original form.

_S_e_e _A_l_s_o:

     'stack', 'aperm'; 'relist' for reshaping the result of 'unlist'.

_E_x_a_m_p_l_e_s:

     summary(Indometh)
     wide <- reshape(Indometh, v.names="conc", idvar="Subject",
                     timevar="time", direction="wide")
     wide

     reshape(wide, direction="long")
     reshape(wide, idvar="Subject", varying=list(2:12),
             v.names="conc", direction="long")

     ## times need not be numeric
     df <- data.frame(id=rep(1:4,rep(2,4)),
                      visit=I(rep(c("Before","After"),4)),
                      x=rnorm(4), y=runif(4))
     df
     reshape(df, timevar="visit", idvar="id", direction="wide")
     ## warns that y is really varying
     reshape(df, timevar="visit", idvar="id", direction="wide", v.names="x")

     ##  unbalanced 'long' data leads to NA fill in 'wide' form
     df2 <- df[1:7,]
     df2
     reshape(df2, timevar="visit", idvar="id", direction="wide")

     ## Alternative regular expressions for guessing names
     df3 <- data.frame(id=1:4, age=c(40,50,60,50), dose1=c(1,2,1,2),
                       dose2=c(2,1,2,1), dose4=c(3,3,3,3))
     reshape(df3, direction="long", varying=3:5, sep="")

     ## an example that isn't longitudinal data
     state.x77 <- as.data.frame(state.x77)
     long <- reshape(state.x77, idvar="state", ids=row.names(state.x77),
                     times=names(state.x77), timevar="Characteristic",
                     varying=list(names(state.x77)), direction="long")

     reshape(long, direction="wide")

     reshape(long, direction="wide", new.row.names=unique(long$state))

     ## multiple id variables
     df3 <- data.frame(school=rep(1:3,each=4), class=rep(9:10,6),
                       time=rep(c(1,1,2,2),3),
                       score=rnorm(12))
     wide <- reshape(df3, idvar=c("school","class"), direction="wide")
     wide
     ## transform back
     reshape(wide)

