Assignment 2
The
data is taken from the Places Rated Almanac, by Richard Boyer and David
Savageau, copyrighted and published by Rand McNally. The nine rating criteria
used by Places Rated Almanac are:
- Climate & Terrain
- Housing
- Health Care & Environment
- Crime
- Transportation
- Education
- The Arts
- Recreation
- Economics
For all but two of the above criteria, the
higher the score, the better. For Housing and Crime, the lower the score the
better. The scores are computed using the following component statistics for
each criterion (see the Places Rated Almanac for details):
- Climate & Terrain: very hot and very cold
months, seasonal temperature variation, heating- and cooling-degree days,
freezing days, zero-degree days, ninety-degree days.
- Housing: utility bills, property taxes,
mortgage payments.
- Health Care & Environment: per capita
physicians, teaching hospitals, medical schools, cardiac rehabilitation
centers, comprehensive cancer treatment centers, hospices,
insurance/hospitalization costs index, flouridation of drinking water, air
pollution.
- Crime: violent crime rate, property crime
rate.
- Transportation: daily commute, public
transportation, Interstate highways, air service, passenger rail service.
- Education: pupil/teacher ratio in the
public K-12 system, effort index in K-12, accademic options in higher
education.
- The Arts: museums, fine arts and public
radio stations, public television stations, universities offering a degree
or degrees in the arts, symphony orchestras, theatres, opera companies,
dance companies, public libraries.
- Recreation: good restaurants, public golf
courses, certified lanes for tenpin bowling, movie theatres, zoos,
aquariums, family theme parks, sanctioned automobile race tracks,
pari-mutuel betting attractions, major- and minor- league professional
sports teams, NCAA Division I football and basketball teams, miles of ocean
or Great Lakes coastline, inland water, national forests, national parks, or
national wildlife refuges, Consolidated Metropolitan Statistical Area
access.
- Economics: average household income
adjusted for taxes and living costs, income growth, job growth.
In addition latitude and longitude, population
and state and case number are also given. Use principal components analysis to
identify the major components of variation in the ratings amongst cities.
In particular
- How many principal components are needed?
- Interpret your principal components.
- If you could only use a few variables from
the original dataset, which would they be?
- Identify unusual cities.
To identify the unusual cities, the
identify() function
is useful. here is an example of its use:> plot(places[,1],places[,2])
> identify(places[,1],places[,2],row.names(places))
Now click on the plot with the right mouse button
to identify points and use the middle button to finish.