Find the best transformations of the variables for a linear regression model.

bestfit(X, ...)

# S3 method for default
bestfit(X, y, t = list(), p = list(), response, ...)

# S3 method for formula
bestfit(formula, data, subset, transf = c("rsqrt", "log",
  "sqrt"), ...)

Arguments

X

a design matrix for a regression model

...

not used

y

the vector of the response variable

t

The transformed data

p

The combinations of transformed variables to be tested

response

The name of the response variable

formula

A standard linear regression formula, with no transformations applied to the variables.

data

A data frame containing the variables in the model.

subset

a specification of the rows to be used: defaults to all rows. This can be any valid indexing vector (see [.data.frame) for the rows of data or if that is not supplied, a data frame made up of the variables used in formula.

transf

A family of functions used to transform the variables in the data frame, in order to find the best combination of transformations to apply to the data. These are usually functions of the Box-Cox family.
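The search over combinations of transformations can be sketched in base R as follows. This is only an illustration of the idea, not the package's implementation: it uses the built-in `mtcars` data, and it assumes that `rsqrt` means `1 / sqrt(x)`. The names `transf`, `vars` and `grid` are chosen here for the example.

```r
# Candidate transformations; `rsqrt` is ASSUMED to mean 1 / sqrt(x).
transf <- list(
  identity = identity,
  rsqrt    = function(x) 1 / sqrt(x),
  log      = log,
  sqrt     = sqrt
)

# Built-in data set used purely for illustration: response first, then predictors.
vars <- c("mpg", "wt", "hp")

# One row per combination of transformations (4^3 = 64 combinations).
grid <- expand.grid(rep(list(names(transf)), length(vars)),
                    stringsAsFactors = FALSE)
names(grid) <- vars

# Fit one linear model per combination and record its adjusted R2.
grid$adj_R2 <- apply(grid[vars], 1, function(combo) {
  d <- as.data.frame(
    Map(function(v, f) transf[[f]](mtcars[[v]]), vars, combo)
  )
  summary(lm(mpg ~ ., data = d))$adj.r.squared
})

# Show the five combinations with the highest adjusted R2.
head(grid[order(-grid$adj_R2), ], 5)
```

The number of fits grows exponentially with the number of variables, which is why a short list of transformations is the practical choice.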

Value

a vector with the adjusted R2 of each fit
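For reference, the adjusted R2 of a single `lm` fit can be reproduced by hand. The sketch below uses the built-in `mtcars` data and a model chosen only for illustration; it shows the standard formula for a model with an intercept, not code from this package.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

n  <- nobs(fit)              # number of observations
p  <- length(coef(fit)) - 1  # number of predictors, excluding the intercept
r2 <- summary(fit)$r.squared

# Adjusted R2 penalizes R2 for the number of predictors in the model.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Agrees with the value reported by summary.lm().
all.equal(adj_r2, summary(fit)$adj.r.squared)
```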

Details

bestfit is a generic function for finding the best transformations of the variables for a linear regression model. It has a default method, which takes a design matrix, and a formula method.

Examples

best_fit <- bestfit(valor ~ ., data = centro_2015@data)
print(best_fit, n = 20)
#> Call:
#> bestfit.formula(formula = valor ~ ., data = centro_2015@data)
#>
#> Best 20 fits:
#>     id valor area_total quartos   suites garagens dist_b_mar    adj_R2
#> 443  1 rsqrt       sqrt   rsqrt identity     sqrt      rsqrt 0.9480455
#> 395  2 rsqrt   identity   rsqrt identity     sqrt      rsqrt 0.9477222
#> 955  3 rsqrt       sqrt   rsqrt     sqrt     sqrt      rsqrt 0.9474578
#> 907  4 rsqrt   identity   rsqrt     sqrt     sqrt      rsqrt 0.9472744
#> 439  5 rsqrt       sqrt     log identity     sqrt      rsqrt 0.9471142
#> 951  6 rsqrt       sqrt     log     sqrt     sqrt      rsqrt 0.9468425
#> 391  7 rsqrt   identity     log identity     sqrt      rsqrt 0.9466028
#> 903  8 rsqrt   identity     log     sqrt     sqrt      rsqrt 0.9465023
#> 411  9 rsqrt        log   rsqrt identity     sqrt      rsqrt 0.9460101
#> 407 10 rsqrt        log     log identity     sqrt      rsqrt 0.9455580
#> 442 11 rsqrt       sqrt   rsqrt identity     sqrt        log 0.9454184
#> 954 12 rsqrt       sqrt   rsqrt     sqrt     sqrt        log 0.9450922
#> 959 13 rsqrt       sqrt    sqrt     sqrt     sqrt      rsqrt 0.9449867
#> 923 14 rsqrt        log   rsqrt     sqrt     sqrt      rsqrt 0.9449182
#> 447 15 rsqrt       sqrt    sqrt identity     sqrt      rsqrt 0.9447919
#> 919 16 rsqrt        log     log     sqrt     sqrt      rsqrt 0.9447690
#> 438 17 rsqrt       sqrt     log identity     sqrt        log 0.9446014
#> 950 18 rsqrt       sqrt     log     sqrt     sqrt        log 0.9445764
#> 394 19 rsqrt   identity   rsqrt identity     sqrt        log 0.9445607
#> 906 20 rsqrt   identity   rsqrt     sqrt     sqrt        log 0.9443802
#> ...
s <- summary(best_fit)
# There may still be outliers:
out <- car::outlierTest(s$fit)
outliers <- match(names(out$p), rownames(centro_2015@data))
# There are two ways to handle them:
# Recalling bestfit via update with a subset argument ...
best_fit <- update(best_fit, subset = -outliers)
#> Error in eval(substitute(subset), data, env): object 'outliers' not found
# Or assigning a subset argument directly into summary.bestfit
s <- summary(best_fit, fit = 1, subset = -outliers)
# The latter takes less computational effort, since it only updates the
# lm call of the chosen fit. The former is more precise, since it runs
# bestfit again without the outliers.
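The same refit-without-outliers pattern can be tried on a plain `lm` object, which may help when checking how a negative `subset` behaves. The sketch below is an analogy using base R and the built-in `mtcars` data, not the `bestfit` workflow itself; flagging the two largest studentized residuals is an arbitrary rule chosen for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Row positions of the two observations with the largest absolute
# studentized residuals (illustrative rule, not a formal outlier test).
outliers <- order(abs(rstudent(fit)), decreasing = TRUE)[1:2]

# A negative index drops those rows; update() re-runs the original lm call.
refit <- update(fit, subset = -outliers)

nobs(fit) - nobs(refit)  # number of observations removed
```

Because `update()` re-evaluates the stored call, the object passed to `subset` must be visible in the environment where that call is evaluated.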