Best fit models

Finds the best transformations of the variables for a linear regression model.

bestfit(X, ...)

# S3 method for default
bestfit(X, y, t = list(), p = list(), response, ...)

# S3 method for formula
bestfit(formula, data, subset, transf = c("rsqrt", "log",
  "sqrt"), ...)

Arguments

X	A design matrix for a regression model.
...	Not used.
y	The vector of the response variable.
t	The transformed data.
p	The combinations of transformed variables to be tested.
response	The name of the response variable.
formula	A standard linear regression formula, with no transformations applied to the variables.
data	A data frame containing the variables in the model.
subset	A specification of the rows to be used; defaults to all rows. This can be any valid indexing vector (see `[.data.frame`) for the rows of data or, if that is not supplied, a data frame made up of the variables used in `formula`.
transf	A family of functions used to transform the variables in the data frame in order to find the best combination of transformations to apply to the data; usually functions of the Box-Cox family.
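The default `transf` names presumably refer to simple power transformations. The base-R sketch below assumes that `rsqrt` is the reciprocal square root 1/sqrt(x) (an assumption, not confirmed by this page). It checks that each default, plus `identity` (which appears in the example output), is an affine rescaling of a Box-Cox transformation (x^lambda - 1)/lambda, with log as the lambda = 0 case. An affine rescaling of a variable leaves a linear model's fit unchanged, so choosing among these functions amounts to choosing a Box-Cox power. The objects `transforms`, `box_cox` and `lambdas` are hypothetical names for illustration.

```r
# Hypothetical definitions of the default transformations
# (assumption: "rsqrt" is the reciprocal square root; not taken from the package).
transforms <- list(
  rsqrt    = function(x) 1 / sqrt(x),
  log      = function(x) log(x),
  sqrt     = function(x) sqrt(x),
  identity = function(x) x
)

# Box-Cox transformation with power lambda; lambda = 0 is the log limit.
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Box-Cox power that each named transformation corresponds to.
lambdas <- c(rsqrt = -0.5, log = 0, sqrt = 0.5, identity = 1)

x <- c(0.5, 1, 2, 4, 9)

# |correlation| of 1 means the two versions differ only by an affine rescaling,
# so they produce the same linear-regression fit.
check <- sapply(names(lambdas), function(nm) {
  abs(cor(box_cox(x, lambdas[[nm]]), transforms[[nm]](x)))
})
print(round(check, 6))
```

For `rsqrt` the correlation itself is -1, because (x^-0.5 - 1)/(-0.5) = 2 - 2/sqrt(x) reverses the sign; that is why the absolute value is taken.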

Value

A vector with the adjusted R2 of each fit.
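Assuming the adjusted R2 follows the usual definition (the `adj.r.squared` component returned by `summary.lm()`), a fit with $n$ observations, $p$ predictors and an intercept has

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$

When every candidate fit uses the same rows and the same number of predictors, the factor $(n-1)/(n-p-1)$ is constant across the fits, so ranking them by $\bar{R}^2$ gives the same order as ranking them by $R^2$.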

Details

bestfit is a generic function, with methods for a design matrix (default) and for a formula, for finding the best transformations of the variables for a linear regression model.
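The search can be pictured as an exhaustive grid: every combination of candidate transformations is applied to the response and the predictors, an `lm()` is fitted, and the fits are ranked by adjusted R2. The base-R sketch below illustrates that idea only; it is not the package's implementation. `search_transformations()` and its arguments are hypothetical, `rsqrt` is assumed to be 1/sqrt(x), and all variables must be strictly positive for `log` and `rsqrt`.

```r
# Hypothetical exhaustive search over transformation combinations
# (an illustration of the idea, not bestfit's actual code).
search_transformations <- function(data, response, predictors,
                                   transf = list(
                                     identity = function(x) x,
                                     rsqrt    = function(x) 1 / sqrt(x),  # assumed meaning
                                     log      = function(x) log(x),
                                     sqrt     = function(x) sqrt(x))) {
  vars <- c(response, predictors)
  # One row per combination; each column holds a transformation name.
  grid <- expand.grid(rep(list(names(transf)), length(vars)),
                      stringsAsFactors = FALSE)
  names(grid) <- vars
  fml <- as.formula(paste(response, "~", paste(predictors, collapse = " + ")))
  grid$adj_R2 <- apply(grid[vars], 1, function(combo) {
    # Apply to each variable the transformation named in this combination.
    d <- as.data.frame(Map(function(v, f) transf[[f]](data[[v]]), vars, combo))
    summary(lm(fml, data = d))$adj.r.squared
  })
  # Best fits first; row names keep each combination's original index.
  grid[order(grid$adj_R2, decreasing = TRUE), ]
}

# Usage on a built-in data set: 4^3 = 64 candidate fits.
res <- search_transformations(mtcars, "mpg", c("wt", "hp"))
head(res, 5)
```

The number of fits grows as (number of transformations)^(number of variables), which is why a data set with six variables and four candidate functions already yields 4^6 = 4096 fits.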

Examples

best_fit <- bestfit(valor ~ ., data = centro_2015@data)
print(best_fit, n = 20)
#> Call:
#> bestfit.formula(formula = valor ~ ., data = centro_2015@data)
#> 
#> Best 20 fits:
#>     id valor area_total quartos   suites garagens dist_b_mar    adj_R2
#> 443  1 rsqrt       sqrt   rsqrt identity     sqrt      rsqrt 0.9480455
#> 395  2 rsqrt   identity   rsqrt identity     sqrt      rsqrt 0.9477222
#> 955  3 rsqrt       sqrt   rsqrt     sqrt     sqrt      rsqrt 0.9474578
#> 907  4 rsqrt   identity   rsqrt     sqrt     sqrt      rsqrt 0.9472744
#> 439  5 rsqrt       sqrt     log identity     sqrt      rsqrt 0.9471142
#> 951  6 rsqrt       sqrt     log     sqrt     sqrt      rsqrt 0.9468425
#> 391  7 rsqrt   identity     log identity     sqrt      rsqrt 0.9466028
#> 903  8 rsqrt   identity     log     sqrt     sqrt      rsqrt 0.9465023
#> 411  9 rsqrt        log   rsqrt identity     sqrt      rsqrt 0.9460101
#> 407 10 rsqrt        log     log identity     sqrt      rsqrt 0.9455580
#> 442 11 rsqrt       sqrt   rsqrt identity     sqrt        log 0.9454184
#> 954 12 rsqrt       sqrt   rsqrt     sqrt     sqrt        log 0.9450922
#> 959 13 rsqrt       sqrt    sqrt     sqrt     sqrt      rsqrt 0.9449867
#> 923 14 rsqrt        log   rsqrt     sqrt     sqrt      rsqrt 0.9449182
#> 447 15 rsqrt       sqrt    sqrt identity     sqrt      rsqrt 0.9447919
#> 919 16 rsqrt        log     log     sqrt     sqrt      rsqrt 0.9447690
#> 438 17 rsqrt       sqrt     log identity     sqrt        log 0.9446014
#> 950 18 rsqrt       sqrt     log     sqrt     sqrt        log 0.9445764
#> 394 19 rsqrt   identity   rsqrt identity     sqrt        log 0.9445607
#> 906 20 rsqrt   identity   rsqrt     sqrt     sqrt        log 0.9443802
#> ...
s <- summary(best_fit)

# There may still be outliers:
out <- car::outlierTest(s$fit)
outliers <- match(names(out$p), rownames(centro_2015@data))

# There are two ways to handle them:

# Re-running bestfit via update() with a subset argument ...
best_fit <- update(best_fit, subset = -outliers)
#> Error in eval(substitute(subset), data, env): object 'outliers' not found

# Or passing a subset argument directly to summary.bestfit
s <- summary(best_fit, fit = 1, subset = -outliers)

# The latter takes less computational effort, since it only refits the
# lm call of the chosen fit. The former is more thorough, since it runs
# the whole bestfit search again without the outliers, so the ranking of
# the fits may also change.
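The cheaper route, refitting only the chosen model without the flagged rows, can be illustrated on a plain `lm` fit with base R's `update()`. This is a sketch on the built-in `mtcars` data, not a call into bestfit. As a stand-in for `car::outlierTest()`, it drops the single observation with the largest absolute studentized residual, the case that test reports as most extreme.

```r
# A plain linear model standing in for "the chosen fit".
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Observation with the largest absolute studentized residual.
worst <- which.max(abs(rstudent(fit)))

# Cheap route: refit only this model, excluding the flagged row.
fit2 <- update(fit, subset = -worst)

c(before = nobs(fit), after = nobs(fit2))
```

Only one `lm()` is refitted here, whereas re-running the full search would refit every candidate combination.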
