Find best transformations of the parameters for Linear Regression.
    bestfit(X, ...)

    # S3 method for default
    bestfit(X, y, t = list(), p = list(), response, ...)

    # S3 method for formula
    bestfit(formula, data, subset, transf = c("rsqrt", "log", "sqrt"), ...)
Argument | Description |
---|---|
X | a design matrix for a regression model |
... | not used |
y | the vector of the response variable |
t | The transformed data |
p | The combinations of transformed variables to be tested |
response | The name of the response variable |
formula | A standard linear regression formula, with no transformation in the parameters. |
data | A data frame containing the variables in the model. |
subset | a specification of the rows to be used: defaults to all rows. This can be any valid indexing vector (see `[.data.frame`) for the rows of `data` or, if that is not supplied, a data frame made up of the variables used in `formula`. |
transf | A family of functions used to transform the variables in the data frame, in order to find the best combination of transformations to apply to the data. These are usually functions of the Box-Cox family. |
A vector with the adjusted R2 of each fit.
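For reference, the adjusted R2 of a single fit can be read from `summary()` of an `lm` object in base R. The sketch below uses the built-in `mtcars` data purely for illustration (it is not the package's example data) and checks that value against the textbook formula for a model with an intercept.

```r
# Illustrative only: mtcars stands in for the user's data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

n <- nrow(mtcars)           # number of observations
k <- length(coef(fit)) - 1  # number of regressors, excluding the intercept

# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
adj_r2_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - k - 1)

# Compare the hand-computed value with the one reported by summary.lm
all.equal(adj_r2_manual, s$adj.r.squared)
```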
bestfit is a generic function for finding best transformations of the parameters for Linear Regression.
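The idea of searching over transformations can be sketched in base R as follows. This is not the package's implementation: the helper `grid_search_transf`, the definition of `rsqrt` as `1 / sqrt(x)`, and the use of `mtcars` are all assumptions made for illustration. The sketch fits one `lm` per combination of transformations and ranks the fits by adjusted R2.

```r
# Hypothetical helper, not part of the package: try every combination of
# transformations and rank the resulting lm fits by adjusted R2.
grid_search_transf <- function(data, response,
                               transf = list(identity = identity,
                                             rsqrt = function(x) 1 / sqrt(x), # assumed meaning of "rsqrt"
                                             log = log,
                                             sqrt = sqrt)) {
  vars <- c(response, setdiff(names(data), response))

  # One row per combination: one transformation name per variable.
  combos <- expand.grid(rep(list(names(transf)), length(vars)),
                        stringsAsFactors = FALSE)
  names(combos) <- vars

  combos$adj_R2 <- apply(combos[vars], 1, function(tr) {
    # Apply the chosen transformation to each column of the data.
    d <- as.data.frame(Map(function(f, x) transf[[f]](x), tr, data[vars]))
    names(d) <- vars
    fit <- lm(as.formula(paste(response, "~ .")), data = d)
    summary(fit)$adj.r.squared
  })

  # Best fits first.
  combos[order(combos$adj_R2, decreasing = TRUE), ]
}

# Illustrative data: all three variables are strictly positive,
# so log and sqrt are well defined.
res <- grid_search_transf(mtcars[c("mpg", "wt", "hp")], response = "mpg")
head(res, 5)
```

With four candidate transformations and three variables this fits 4^3 = 64 models; the count grows exponentially with the number of variables, which is why rerunning the whole search is much more expensive than refitting a single chosen model.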
    best_fit <- bestfit(valor ~ ., data = centro_2015@data)
    print(best_fit, n = 20)
    #> Call:
    #> bestfit.formula(formula = valor ~ ., data = centro_2015@data)
    #>
    #> Best 20 fits:
    #>     id valor area_total quartos   suites garagens dist_b_mar    adj_R2
    #> 443  1 rsqrt       sqrt   rsqrt identity     sqrt      rsqrt 0.9480455
    #> 395  2 rsqrt   identity   rsqrt identity     sqrt      rsqrt 0.9477222
    #> 955  3 rsqrt       sqrt   rsqrt     sqrt     sqrt      rsqrt 0.9474578
    #> 907  4 rsqrt   identity   rsqrt     sqrt     sqrt      rsqrt 0.9472744
    #> 439  5 rsqrt       sqrt     log identity     sqrt      rsqrt 0.9471142
    #> 951  6 rsqrt       sqrt     log     sqrt     sqrt      rsqrt 0.9468425
    #> 391  7 rsqrt   identity     log identity     sqrt      rsqrt 0.9466028
    #> 903  8 rsqrt   identity     log     sqrt     sqrt      rsqrt 0.9465023
    #> 411  9 rsqrt        log   rsqrt identity     sqrt      rsqrt 0.9460101
    #> 407 10 rsqrt        log     log identity     sqrt      rsqrt 0.9455580
    #> 442 11 rsqrt       sqrt   rsqrt identity     sqrt        log 0.9454184
    #> 954 12 rsqrt       sqrt   rsqrt     sqrt     sqrt        log 0.9450922
    #> 959 13 rsqrt       sqrt    sqrt     sqrt     sqrt      rsqrt 0.9449867
    #> 923 14 rsqrt        log   rsqrt     sqrt     sqrt      rsqrt 0.9449182
    #> 447 15 rsqrt       sqrt    sqrt identity     sqrt      rsqrt 0.9447919
    #> 919 16 rsqrt        log     log     sqrt     sqrt      rsqrt 0.9447690
    #> 438 17 rsqrt       sqrt     log identity     sqrt        log 0.9446014
    #> 950 18 rsqrt       sqrt     log     sqrt     sqrt        log 0.9445764
    #> 394 19 rsqrt   identity   rsqrt identity     sqrt        log 0.9445607
    #> 906 20 rsqrt   identity   rsqrt     sqrt     sqrt        log 0.9443802
    #> ...

    s <- summary(best_fit)

    # There may still be outliers:
    out <- car::outlierTest(s$fit)
    outliers <- match(names(out$p), rownames(centro_2015@data))

    # There are two ways to handle them:

    # Recalling bestfit via update with a subset argument ...
    best_fit <- update(best_fit, subset = -outliers)
    #> Error in eval(substitute(subset), data, env): object 'outliers' not found

    # Or assigning a subset argument directly into summary.bestfit
    s <- summary(best_fit, fit = 1, subset = -outliers)

    # The latter takes less computational effort, since it only updates the
    # lm call of the chosen fit. The former is more precise, since it runs
    # bestfit again without the outliers.