
Least Median of Squares and Regression through the Origin

Supporting files online at

http://www.wabash.edu/econexcel/LMSOrigin







By


Humberto Barreto

Department of Economics

Wabash College

Crawfordsville, IN 47933

[email protected]

and

David Maharry

Department of Mathematics and Computer Science

Wabash College

Crawfordsville, IN 47933

[email protected]


The authors thank Michael Axtell, Frank Howland, and anonymous referees

for suggestions and criticisms.


Comments welcome

Do not quote without the authors’ permission

January 2005


Abstract

An exact algorithm is provided for finding the Least Median of Squares (LMS) line for a bivariate regression with no intercept term. It is shown that the popular PROGRESS routine will not, in general, find the LMS slope when the intercept is suppressed.



A Microsoft Excel workbook that provides the code in Visual Basic is made available at www.wabash.edu/econexcel/LMSOrigin


Keywords: LMS, Robust Regression, PROGRESS



1. Introduction


Rousseeuw [1984] introduced Least Median of Squares (LMS) as a robust regression procedure. Instead of minimizing the sum of squared residuals, coefficients are chosen so as to minimize the median of the squared residuals. Unlike conventional least squares (LS), there is no closed-form solution with which to calculate the LMS line, since the median is an order, or rank, statistic. General non-linear optimization algorithms perform poorly because the median-of-squared-residuals surface is so bumpy that local, rather than global, minima are often reported as the solution.
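
To make the objective concrete, the Python sketch below (illustrative only; the supporting workbook implements the method in Visual Basic, and the names here are ours) evaluates the LMS criterion for a candidate slope b in the zero-intercept case:

    import numpy as np

    def median_squared_residual(b, x, y):
        # LMS objective for the no-intercept line y_hat = b * x:
        # the median of the squared residuals at slope b.
        residuals = np.asarray(y, dtype=float) - b * np.asarray(x, dtype=float)
        return np.median(residuals ** 2)

    # Scanning a coarse grid of slopes shows how bumpy this objective is,
    # which is why generic optimizers tend to stall in local minima:
    # values = [median_squared_residual(b, x, y) for b in np.linspace(-5, 5, 1001)]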


Although a closed-form solution does not exist and brute force optimization is not reliable, several algorithms are available for fitting the LMS line (or hyperplane). Perhaps the most popular approach is called PROGRESS (from Program for RObust reGRESSion). The program itself is explained in Rousseeuw and Leroy [1987] and the most recent version is available at http://www.agoras.ua.ac.be/. Several software packages, such as SAS/IML (version 6.12 or greater), have an LMS routine based on PROGRESS.


This paper focuses on the special problem of finding the LMS fitted line through the origin in the bivariate case. The next section presents the model and defines the LMS line. Section 3 shows that the PROGRESS algorithm gives an incorrect solution, in general, when the intercept is restricted to zero. Section 4 presents an analytical, exact method for finding the minimum median squared residual for the bivariate, zero intercept case. Finally, a simple example is provided to illustrate the algorithm and show why PROGRESS fails in the zero-intercept case.


Among these intersection points, the algorithm chooses the one with the smallest deviation. The fifth point, at a slope of 2.4, produces the smallest deviation, with a value of 0.64. This is the solution to the problem: the ‘best’ straight line through these 5 data points is the one with a slope of 2.4, and all other slopes produce a larger median squared deviation.


It is possible for more than two parabolas to intersect at a point, but the parabola that becomes the new median can be determined by ordering the intersecting parabolas by their slope and curvature at the point of intersection. When the number of data points is even, it is necessary to follow two parabolas, representing the (n/2)th and (n/2+1)th smallest squared deviations, since the median is the average of these two values.


When there are n data points, the worst-case efficiency of this algorithm is O(n² log n). It requires determining the intersections of each of the parabolas with the current median parabola in order to choose the next intersection to use. Because each parabola might become the median parabola at some point during the algorithm, one may have to determine on the order of n² intersections, and these intersections have to be ordered.
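
A brute-force Python sketch of this idea for the zero-intercept, bivariate case is given below (illustrative only; it is not the O(n² log n) sweep described above and not the workbook code): every candidate slope at which the minimum of the median can occur is collected, namely the vertices of the individual parabolas, the pairwise intersections, and, for an even number of points, the vertices of averaged pairs, and the median squared residual is evaluated at each. This is exact but runs in O(n³).

    import itertools
    import numpy as np

    def lms_slope_through_origin(x, y):
        # Exact LMS slope for the no-intercept model y = b*x, by enumeration.
        # Each squared residual (y_i - b*x_i)^2 is a parabola in b; the minimum
        # of their median can only occur at a parabola vertex, at an intersection
        # of two parabolas, or (for even n) at the vertex of an averaged pair.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        candidates = list(y[x != 0] / x[x != 0])   # slopes through single points

        for i, j in itertools.combinations(range(len(x)), 2):
            # Intersections: |y_i - b*x_i| = |y_j - b*x_j| has up to two roots.
            if x[i] != x[j]:
                candidates.append((y[i] - y[j]) / (x[i] - x[j]))
            if x[i] != -x[j]:
                candidates.append((y[i] + y[j]) / (x[i] + x[j]))
            # Vertex of the averaged pair, needed when n is even and the median
            # is the mean of the two middle squared residuals.
            denom = x[i] ** 2 + x[j] ** 2
            if denom > 0:
                candidates.append((x[i] * y[i] + x[j] * y[j]) / denom)

        med = lambda b: np.median((y - b * x) ** 2)
        best = min(candidates, key=med)
        return best, med(best)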


Figure 2 can also be used to give another view of how the PROGRESS algorithm works in the zero-intercept case. For each slope that makes the straight line pass through a data point, giving that point a squared residual of zero, the PROGRESS algorithm computes the median squared deviation over the 5 data points. It then chooses the slope that yields the minimum of this set of deviations. The discussion in the previous paragraphs makes clear why this approach fails: the global minimum of the median squared residual will not, in general, occur at a slope for which some individual observation has a squared residual of zero. PROGRESS provides the correct result only when a majority of the data points lie on a straight line through the origin.
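
For contrast, a minimal sketch of the PROGRESS-style search just described (names ours), which considers only slopes that pass exactly through a data point:

    import numpy as np

    def progress_style_slope(x, y):
        # Zero-intercept PROGRESS strategy: try only slopes b = y_i / x_i that
        # drive one observation's residual to zero, and keep the slope with the
        # smallest median squared residual.  The true LMS minimum generally lies
        # between such slopes, so this can return a suboptimal line.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        slopes = y[x != 0] / x[x != 0]
        med = lambda b: np.median((y - b * x) ** 2)
        best = min(slopes, key=med)
        return best, med(best)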


6. Conclusion


When applying Least Median of Squares, coefficients are chosen so as to minimize the median of the squared residuals. Because the median is not sensitive to extreme values, LMS can outperform conventional least squares when the data are contaminated. This paper makes two contributions to the LMS literature:

  1. PROGRESS, the standard algorithm for fitting the LMS estimator, does not, in general, find the true LMS fit when the intercept is suppressed. Any computations based on the estimated slope (such as regression diagnostics and estimated standard errors) are therefore also wrong.

  2. For a bivariate regression with a zero intercept, an exact algorithmic method based on keeping track of the median squared residual is demonstrated.


References


Barreto, Humberto (2001) “An Introduction to Least Median of Squares,” unpublished manuscript, http://www.wabash.edu/econexcel/LMSOrigin (LMSIntro.doc).


Rousseeuw, Peter J. (1984) “Least Median of Squares Regression,” Journal of the American Statistical Association, 79 (388), 871-880.


Rousseeuw, Peter J. and Annick M. Leroy (1987) Robust Regression and Outlier Detection, John Wiley & Sons: New York.


1 “In simple regression (p=2), it follows from (Steele and Steiger 1986) that if all 2-subsets are used and their intercept is adjusted each time, we obtain the exact LQS.” Rousseeuw and Hubert [1997], p. 9.


