Input to a regression problem

Author: name hidden by the user, 23 April 2013, 12:35, assignment


INPUT TO A REGRESSION PROBLEM

Simple regression:

$$(x_1, Y_1),\ (x_2, Y_2),\ \dots,\ (x_n, Y_n)$$

Multiple regression:

$$
\begin{aligned}
&\big((x1)_1,\ (x2)_1,\ (x3)_1,\ \dots,\ (xK)_1,\ Y_1\big),\\
&\big((x1)_2,\ (x2)_2,\ (x3)_2,\ \dots,\ (xK)_2,\ Y_2\big),\\
&\big((x1)_3,\ (x2)_3,\ (x3)_3,\ \dots,\ (xK)_3,\ Y_3\big),\\
&\qquad\vdots\\
&\big((x1)_n,\ (x2)_n,\ (x3)_n,\ \dots,\ (xK)_n,\ Y_n\big)
\end{aligned}
$$

The variable Y is designated as the “dependent variable.” The only distinction between the two situations above is whether there is just one x predictor or many. The predictors are called “independent variables.”

There is a certain awkwardness about giving generic names for the independent variables in the multiple regression case. In this notation, x1 is the name of the first independent variable, and its values are (x1)₁, (x1)₂, (x1)₃, … , (x1)_n . In any application, this awkwardness disappears, as the independent variables will have application-based names such as SALES, STAFF, RESERVE, BACKLOG, and so on. Then SALES would be the first independent variable, and its values would be SALES₁, SALES₂, SALES₃, … , SALES_n .

The listing for the multiple regression case suggests that the data are found in a spreadsheet. In application programs such as Minitab, the dependent variable and the independent variables may appear in any spreadsheet columns, in any order. Microsoft Excel requires that you identify the independent variables by blocking off a section of the spreadsheet; this means that the independent variables must appear in consecutive columns.
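To make the spreadsheet layout concrete, here is a minimal Python sketch: one row per observation i = 1, …, n, one column per independent variable, and Y stored alongside. The column names reuse SALES, STAFF, and RESERVE from the text; the numbers are made-up illustrative values, not data from this document.

```python
# Hypothetical data in spreadsheet form (made-up values for illustration).
# Columns:  SALES, STAFF, RESERVE,    Y
rows = [
    (120.0, 14.0, 3.5, 41.0),
    (135.0, 15.0, 3.1, 44.5),
    (150.0, 18.0, 4.0, 49.0),
    (110.0, 12.0, 2.8, 38.0),
    (160.0, 19.0, 4.4, 52.5),
]
predictor_names = ["SALES", "STAFF", "RESERVE"]

# Split each row into its K predictor values and its Y value.
X = [list(row[:-1]) for row in rows]  # n rows, K columns
Y = [row[-1] for row in rows]         # n values of the dependent variable

n = len(rows)
K = len(predictor_names)

# The values of the first independent variable, SALES_1, ..., SALES_n,
# are simply column 0 of X.
sales = [x[0] for x in X]
print(n, K, sales)
```

Keeping the predictors in adjacent columns, as here, is also the arrangement Excel's requirement of consecutive columns calls for.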

MINDLESS COMPUTATIONAL POINT OF VIEW

The output from a regression exercise is a “fitted regression model.”

Simple regression:

$$\hat{Y} = b_0 + b_1 x$$

Multiple regression:

$$\hat{Y} = b_0 + b_1 (x1) + b_2 (x2) + b_3 (x3) + \dots + b_K (xK)$$
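The formulas above do not say how the b's are obtained. The usual choice, assumed here rather than stated in this section, is least squares: choose the b's that minimize the sum of squared differences between the observed Y's and the fitted values. For simple regression this has a closed form, $b_1 = \sum (x_i - \bar{x})(Y_i - \bar{Y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{Y} - b_1 \bar{x}$, sketched below with a hypothetical helper `fit_simple`.

```python
def fit_simple(x, y):
    """Least-squares estimates (b0, b1) for the fitted line Y-hat = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, Y) pairs of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-products and squared deviations about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data lying exactly on Y = 2 + 3x, so the fit recovers b0 = 2, b1 = 3.
b0, b1 = fit_simple([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # 2.0 3.0
```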

Many statistical summaries are also produced. These include R², the standard error of estimate, t statistics for the b’s, an F statistic for the whole regression, leverage values, path coefficients, and on and on. This work is generally done by a computer program, and we’ll give a separate document listing and explaining the output.
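To show where a few of these summaries come from, the sketch below fits a multiple regression by least squares (an assumption about the fitting criterion) and computes three of them: R² = 1 − SSE/SST, the standard error of estimate s = √(SSE/(n − K − 1)), and the overall F = [(SST − SSE)/K] / [SSE/(n − K − 1)], where SSE is the sum of squared differences between the Y's and the fitted values and SST is the sum of squared deviations of the Y's about their mean. It uses only the Python standard library, and the names `solve` and `fit_multiple` are hypothetical. Solving the normal equations directly is fine for small examples; statistical packages typically use more numerically stable matrix decompositions.

```python
import math


def solve(a, rhs):
    """Solve the square linear system a x = rhs (Gaussian elimination, partial pivoting)."""
    m = len(a)
    aug = [list(row) + [v] for row, v in zip(a, rhs)]  # augmented copy; inputs untouched
    for col in range(m):
        # Swap in the row with the largest entry in this column, for stability.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: the predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        known = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - known) / aug[r][r]
    return x


def fit_multiple(X, Y):
    """Least-squares b0..bK plus R^2, standard error of estimate s, and overall F."""
    n, K = len(X), len(X[0])
    if n <= K + 1:
        raise ValueError("need more observations than coefficients")
    # Design matrix: a leading 1 in each row carries the intercept b0.
    D = [[1.0] + [float(v) for v in row] for row in X]
    p = K + 1
    # Normal equations (D'D) b = D'Y.
    dtd = [[sum(D[i][r] * D[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    dty = [sum(D[i][r] * Y[i] for i in range(n)) for r in range(p)]
    b = solve(dtd, dty)

    fitted = [sum(bj * dj for bj, dj in zip(b, row)) for row in D]
    y_bar = sum(Y) / n
    sse = sum((y - yh) ** 2 for y, yh in zip(Y, fitted))  # variation left unexplained
    sst = sum((y - y_bar) ** 2 for y in Y)                 # total variation about Y-bar
    if sst == 0:
        raise ValueError("all Y values are equal, so R^2 is undefined")
    df_error = n - K - 1
    r_squared = 1.0 - sse / sst
    s = math.sqrt(sse / df_error)
    f_stat = math.inf if sse == 0 else ((sst - sse) / K) / (sse / df_error)
    return b, r_squared, s, f_stat


# Made-up data with K = 2 predictors and n = 6 observations.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 4]]
Y = [2.3, 4.4, 5.1, 7.2, 8.6, 10.4]
b, r2, s, f = fit_multiple(X, Y)
print("b:", b)
print("R^2:", r2, " s:", s, " F:", f)
```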

WHY DO PEOPLE DO REGRESSIONS?

A cheap answer is that they want to explore the relationships among the variables.

A slightly better answer is that we would like to use the framework of the methodology to get a yes-or-no answer to this question: Is there a significant relationship between variable Y and one or more of the predictors? Be aware that the word significant has a very special technical meaning in statistics.

A simple but honest answer pleads curiosity.

The most valuable (and correct) use of regression is in making predictions; see the next section. Only a small minority of regression exercises actually end up making a prediction, however.

HOW DO WE USE REGRESSIONS TO MAKE PREDICTIONS?

The prediction situation is one in which we have new values of the predictor variables but do not yet have the corresponding Y.

Simple regression: We have a new x value, call it x_new, and the predicted (or fitted) value for the corresponding Y value is

$$\hat{Y}_{\text{new}} = b_0 + b_1 x_{\text{new}}.$$

Multiple regression: We have new predictors, call them (x1)_new, (x2)_new, (x3)_new, …, (xK)_new. The predicted (or fitted) value for the corresponding Y value is

$$\hat{Y}_{\text{new}} = b_0 + b_1 (x1)_{\text{new}} + b_2 (x2)_{\text{new}} + b_3 (x3)_{\text{new}} + \dots + b_K (xK)_{\text{new}}$$
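Once the b's are in hand, a prediction is plain arithmetic. A minimal sketch, with a hypothetical function `predict` and made-up coefficients; simple regression is the special case K = 1.

```python
def predict(b, x_new):
    """Return Y-hat_new = b0 + b1*(x1)_new + ... + bK*(xK)_new.

    b is the list [b0, b1, ..., bK]; x_new is [(x1)_new, ..., (xK)_new].
    """
    if len(b) != len(x_new) + 1:
        raise ValueError("need exactly one new value per predictor")
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x_new))


# Hypothetical fitted coefficients b0 = 1, b1 = 2, b2 = -0.5.
print(predict([1.0, 2.0, -0.5], [3.0, 4.0]))  # 1 + 2*3 - 0.5*4 = 5.0
# Simple regression (K = 1) with b0 = 2, b1 = 3 and x_new = 10.
print(predict([2.0, 3.0], [10.0]))           # 2 + 3*10 = 32.0
```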

CAN I PERFORM REGRESSIONS WITHOUT ANY UNDERSTANDING OF THE UNDERLYING MODEL AND WHAT THE OUTPUT MEANS?

Yes, many people do. In fact, we’ll be able to come up with rote directions that will work in the great majority of cases. Of course, these rote directions will sometimes mislead you. And wisdom still works better than ignorance.

WHAT’S THE REGRESSION MODEL?

The model says that Y is a linear function of the predictors, plus statistical noise.

Simple regression:

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

Multiple regression:

$$Y_i = \beta_0 + \beta_1 (x1)_i + \beta_2 (x2)_i + \beta_3 (x3)_i + \dots + \beta_K (xK)_i + \varepsilon_i$$

The coefficients (the β’s) are nonrandom but unknown quantities. The noise terms ε₁, ε₂, ε₃, …, ε_n are random and unobserved. Moreover, we assume that these ε’s are statistically independent, each with mean 0 and (unknown) standard deviation σ.

The model is simple, except for the details about the ε’s. We’re just saying that each data point is obscured by noise of unknown magnitude. We assume that the noise terms are not out to deceive us by lining up in perverse ways; formally, this is expressed by assuming that the noise terms are independent.

Sometimes we also assume that the noise terms are taken from normal populations, but this assumption is rarely crucial.
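The model can be read as a recipe for generating data. The sketch below draws simple-regression data from it using Python's `random.Random.gauss`, so each noise term is an independent normal draw with mean 0 and standard deviation σ. Normal noise is only one choice; the model itself asks only for independence, mean 0, and a common σ. The function name `simulate_simple` and the parameter values are hypothetical.

```python
import random


def simulate_simple(beta0, beta1, sigma, xs, seed=None):
    """Return Y_i = beta0 + beta1*x_i + eps_i for each x_i in xs.

    Each eps_i is a fresh, independent draw from a normal distribution with
    mean 0 and standard deviation sigma (normality is an extra choice here).
    """
    rng = random.Random(seed)  # private generator, reproducible via seed
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_simple(beta0=2.0, beta1=3.0, sigma=1.5, xs=xs, seed=42)
print(ys)  # each Y_i scatters around the line 2 + 3*x_i
```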

WHO GIVES ANYONE THE RIGHT TO MAKE A REGRESSION MODEL? DOES THIS MEAN THAT WE CAN JUST SAY SOMETHING AND IT AUTOMATICALLY IS CONSIDERED AS TRUE?

Good questions. Merely claiming that a model is correct does not make it correct. A model is a mathematical abstraction of reality. Models are selected on the basis of simplicity and credibility. The regression model used here has proved very effective. A careful user of regression will make a number of checks to determine if the regression model is believable. If the model is not believable, remedial action must be taken.

HOW CAN WE TELL IF A REGRESSION MODEL IS BELIEVABLE? AND WHAT’S THIS REMEDIAL ACTION STUFF?

Patience, please. It helps to examine some successful regression exercises before moving on to these questions.

THERE SEEMS TO BE SOME PARALLEL STRUCTURE INVOLVING THE MODEL AND THE FITTED MODEL.

It helps to see these things side-by-side.

Simple regression:

The model is

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

The fitted model is

$$\hat{Y} = b_0 + b_1 x$$

Multiple regression:

The model is

$$Y_i = \beta_0 + \beta_1 (x1)_i + \beta_2 (x2)_i + \beta_3 (x3)_i + \dots + \beta_K (xK)_i + \varepsilon_i$$

The fitted model is

$$\hat{Y} = b_0 + b_1 (x1) + b_2 (x2) + b_3 (x3) + \dots + b_K (xK)$$

The Roman letters (the b’s) are estimates of the corresponding Greek letters (the β’s).
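One way to see that the b's are estimates of the β's is to simulate: fix known β's, generate many data sets from the model, fit each by least squares (an assumption; this section does not name the fitting method), and average the b's. A single fit misses the β's somewhat, but under the model's assumptions the least-squares b's are unbiased, so their averages settle near the true values. A minimal standard-library sketch; the function names and parameter values are hypothetical.

```python
import random


def fit_simple(xs, ys):
    """Least-squares (b0, b1) for the fitted line Y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def average_estimates(beta0, beta1, sigma, xs, reps, seed=0):
    """Simulate `reps` data sets from Y_i = beta0 + beta1*x_i + eps_i, refit each, average the b's."""
    rng = random.Random(seed)
    total_b0 = total_b1 = 0.0
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        b0, b1 = fit_simple(xs, ys)
        total_b0 += b0
        total_b1 += b1
    return total_b0 / reps, total_b1 / reps


xs = [float(i) for i in range(1, 21)]                            # fixed x values 1, ..., 20
one_b0, one_b1 = average_estimates(2.0, 3.0, 1.5, xs, reps=1)     # a single fit
avg_b0, avg_b1 = average_estimates(2.0, 3.0, 1.5, xs, reps=2000)  # average of 2000 fits
print(one_b0, one_b1)
print(avg_b0, avg_b1)
```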

WHAT ARE THE FITTED VALUES?

In any regression, we can “predict” or retro-fit the Y values that we’ve already observed, in the spirit of the PREDICTIONS section above.

Simple regression:

The model is

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

The fitted model is

$$\hat{Y} = b_0 + b_1 x$$

The fitted value for point i is

$$\hat{Y}_i = b_0 + b_1 x_i$$

Multiple regression:

The model is

$$Y_i = \beta_0 + \beta_1 (x1)_i + \beta_2 (x2)_i + \beta_3 (x3)_i + \dots + \beta_K (xK)_i + \varepsilon_i$$

The fitted model is

$$\hat{Y} = b_0 + b_1 (x1) + b_2 (x2) + b_3 (x3) + \dots + b_K (xK)$$

The fitted value for point i is

$$\hat{Y}_i = b_0 + b_1 (x1)_i + b_2 (x2)_i + b_3 (x3)_i + \dots + b_K (xK)_i$$

Indeed, one way to assess the success of the regression is the closeness of these fitted values, namely Ŷ₁, Ŷ₂, Ŷ₃, …, Ŷ_n, to the actual observed Y values Y₁, Y₂, Y₃, …, Y_n.
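Computing fitted values and comparing them with the observed Y's is mechanical. The sketch below takes hypothetical coefficients b₀ = 2 and b₁ = 3 (not the least-squares fit for these made-up data), forms Ŷ_i = b₀ + b₁x_i, and reports the differences Y_i − Ŷ_i, often called residuals. Their sum of squares is one common single-number summary of closeness; it is the quantity that least squares makes as small as possible.

```python
def fitted_and_residuals(b0, b1, xs, ys):
    """Return fitted values Y-hat_i = b0 + b1*x_i and residuals Y_i - Y-hat_i."""
    fitted = [b0 + b1 * x for x in xs]
    residuals = [y - yh for y, yh in zip(ys, fitted)]
    return fitted, residuals


# Made-up observations and hypothetical coefficients b0 = 2, b1 = 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.5, 7.5, 11.5, 13.5]
fitted, residuals = fitted_and_residuals(2.0, 3.0, xs, ys)
sse = sum(r * r for r in residuals)  # sum of squared residuals
print(fitted)     # [5.0, 8.0, 11.0, 14.0]
print(residuals)  # [0.5, -0.5, 0.5, -0.5]
print(sse)        # 1.0
```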

THIS IS LOOKING COMPUTATIONALLY HOPELESS.

Indeed it is. These calculations should only be done by computer. Even a careful, well-intentioned person is going to make arithmetic errors if attempting this by hand. You should also be aware that computer programs seem to compete in adopting the latest innovations. Many of these innovations are passing fads, so don’t feel too bad about not being up-to-the-minute on the latest changes.

The notation used here in the models is not universal. Here are some other possibilities.

Notation here	Other notation
Y_i	y_i
x_i	X_i
β₀ + β₁x_i	α + βx_i
ε_i	e_i or r_i
(x1)_i, (x2)_i, (x3)_i, …, (xK)_i	x_i1, x_i2, x_i3, …, x_iK
b_j	β̂_j
