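A principal component analysis can be run on the data to find the uncorrelated linear combinations of the variables (the principal components) that explain most of the variability in the data, and to see which variables contribute most to each component.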
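For reference, the quantities reported in the summary follow from the eigendecomposition of the sample covariance matrix. The notation here is introduced for this note rather than taken from the text: X is the centered n × p data matrix with rows x_i, and λ_k and v_k are the k-th eigenvalue and eigenvector. The n − 1 divisor is an assumption; some implementations divide by n, which rescales every λ_k but leaves the proportions unchanged.

```latex
\begin{aligned}
S &= \frac{1}{n-1} X^{\top} X = V \Lambda V^{\top},
  && \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p),\quad \lambda_1 \ge \dots \ge \lambda_p \ge 0,\\
z_{ik} &= x_i^{\top} v_k
  && \text{(score of observation } i \text{ on component } k\text{)},\\
\pi_k &= \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j},
  && \sigma_k = \sqrt{\lambda_k}.
\end{aligned}
```

Here V = [v_1 ... v_p] plays the role of the rotation matrix, π_k is the proportion of variance and σ_k the standard deviation of component k.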
Summary:
Component   Eigenvalue (variance)   Proportion of variance   Std. deviation
PC1         4.2282                  0.9246                   2.0563
PC2         0.2427                  0.0531                   0.4926
PC3         0.0782                  0.0171                   0.2797
PC4         0.0238                  0.0052                   0.1544
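As an arithmetic check on the summary table, the short Python sketch below recomputes the proportions as λ_k / Σλ_j, the standard deviations as √λ_k, and the cumulative proportion, using only the printed eigenvalues. The proportions match to four decimals. The standard deviations for the third and fourth components come out one unit lower in the last digit (0.2796 and 0.1543), presumably because the printed eigenvalues are themselves rounded.

```python
import math
from itertools import accumulate

# Eigenvalues (component variances) copied from the summary table.
eigenvalues = [4.2282, 0.2427, 0.0782, 0.0238]

total = sum(eigenvalues)  # total variance = sum of the eigenvalues

# Proportion of variance explained by each component: lambda_k / sum(lambda).
proportions = [lam / total for lam in eigenvalues]

# Standard deviation of each component's scores: sqrt(lambda_k).
std_devs = [math.sqrt(lam) for lam in eigenvalues]

# Running total of the proportions (cumulative variance explained).
cumulative = list(accumulate(proportions))

for k, (p, s, c) in enumerate(zip(proportions, std_devs, cumulative), start=1):
    print(f"PC{k}: proportion={p:.4f}  sd={s:.4f}  cumulative={c:.4f}")
```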
The principal component analysis command returns a record that can be queried for the principal components, the rotation matrix, and the proportion of variance explained by each component. The variance details are the same as those displayed by the summarize option above.
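To make the contents of such a record concrete, here is a pure-Python sketch of a PCA that returns one. It is not the implementation behind the command above: the hypothetical `PCAResult` field names, the n − 1 covariance divisor, and the use of the cyclic Jacobi eigenvalue method for the symmetric covariance matrix are all assumptions made for illustration, and the input data is made up. Eigenvector signs are arbitrary, so loadings may come out negated relative to other software.

```python
import math
from typing import NamedTuple


class PCAResult(NamedTuple):
    """Hypothetical record with fields like those described above."""
    eigenvalues: list  # variance of each component, largest first
    rotation: list     # p x p loadings; column k holds component k
    scores: list       # n x p principal component scores
    proportion: list   # share of total variance explained per component


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def jacobi_eigen(S, max_sweeps=50, tol=1e-20):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations."""
    p = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:  # off-diagonal part is negligible: A is diagonal
            break
        for k in range(p - 1):
            for l in range(k + 1, p):
                if abs(A[k][l]) < 1e-15:
                    continue
                # Angle that zeroes A[k][l] in J^T A J.
                theta = 0.5 * math.atan2(2 * A[k][l], A[l][l] - A[k][k])
                c, s = math.cos(theta), math.sin(theta)
                J = [[float(i == j) for j in range(p)] for i in range(p)]
                J[k][k], J[l][l], J[k][l], J[l][k] = c, c, s, -s
                A = matmul(transpose(J), matmul(A, J))
                V = matmul(V, J)  # accumulate eigenvectors as columns
    return [A[i][i] for i in range(p)], V


def pca(X):
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    C = [[row[j] - means[j] for j in range(p)] for row in X]  # centered data
    S = [[sum(r[a] * r[b] for r in C) / (n - 1) for b in range(p)]  # covariance
         for a in range(p)]
    values, V = jacobi_eigen(S)
    order = sorted(range(p), key=lambda k: values[k], reverse=True)
    eigenvalues = [values[k] for k in order]
    rotation = [[V[i][k] for k in order] for i in range(p)]  # reorder columns
    total = sum(eigenvalues)
    return PCAResult(
        eigenvalues=eigenvalues,
        rotation=rotation,
        scores=matmul(C, rotation),
        proportion=[lam / total for lam in eigenvalues],
    )


# Usage on made-up data (6 observations, 3 variables).
X = [[2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 0.8],
     [1.9, 2.2, 1.1], [3.1, 3.0, 0.3], [2.3, 2.7, 0.9]]
result = pca(X)
print(result.eigenvalues)
print(result.proportion)
```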
For example, the rotation matrix, that is, the loadings of the variables on each component, can be returned using the rotation option.
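The rotation matrix holds the eigenvectors of the covariance matrix, one component per column, so rotating the covariance matrix by it yields a diagonal matrix of component variances. A small check on a hypothetical 2 × 2 covariance matrix (not the iris data), whose loadings and eigenvalues (3 and 1) are known in closed form:

```python
import math

# Hypothetical 2 x 2 covariance matrix (not the iris data).
S = [[2.0, 1.0],
     [1.0, 2.0]]

# Closed-form loadings: column k is the eigenvector of component k.
r = 1 / math.sqrt(2)
rotation = [[r, r],
            [r, -r]]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# rotation^T S rotation should be diag(eigenvalues).
D = matmul(transpose(rotation), matmul(S, rotation))
print(D)  # diagonal entries close to 3 and 1, off-diagonal entries 0
```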
A ScreePlot is useful for visualizing the variance explained by each component.
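A scree plot is the component variances drawn against component number. As a text-only sketch (the helper `text_scree` is hypothetical and is not the ScreePlot command), the eigenvalues from the summary table can be drawn as bars scaled to the largest one, each labelled with its proportion of the total variance:

```python
# Eigenvalues from the summary table.
eigenvalues = [4.2282, 0.2427, 0.0782, 0.0238]


def text_scree(values, width=40):
    """Return one text bar per component, the largest bar being `width` chars."""
    total = sum(values)
    largest = max(values)
    lines = []
    for k, lam in enumerate(values, start=1):
        bar = "#" * round(width * lam / largest)
        lines.append(f"PC{k} {bar:<{width}} {lam / total:6.2%}")
    return lines


for line in text_scree(eigenvalues):
    print(line)
```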
The scree plot shows that the first component accounts for 92.46% of the total variance, while the second accounts for only 5.31%. Together, the first two components explain 97.77% of the variance, which suggests that a single component may be enough to summarize the data.
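One common rule of thumb, which is not stated in the text, is to keep the smallest number of components whose cumulative proportion reaches a chosen threshold. The threshold is a judgment call; the sketch below applies a few illustrative values to the proportions in the summary table (`components_needed` is a hypothetical helper):

```python
from itertools import accumulate

# Proportions of variance from the summary table.
proportions = [0.9246, 0.0531, 0.0171, 0.0052]


def components_needed(props, threshold):
    """Smallest k whose first k components explain at least `threshold`."""
    for k, cum in enumerate(accumulate(props), start=1):
        if cum >= threshold:
            return k
    return len(props)


for t in (0.90, 0.95, 0.99):
    print(f"{t:.0%} of variance -> {components_needed(proportions, t)} component(s)")
```

With a 90% threshold one component suffices, consistent with the reading of the scree plot; stricter thresholds pull in the second or third component.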
A Biplot can also be used to show the first two components and the observations on the same diagram, with the first principal component on the x-axis and the second on the y-axis.
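Biplots can be scaled in several ways, and the text does not say which scaling the Biplot command uses. The sketch below assumes the covariance-biplot convention: observation points are the PC1/PC2 scores divided by √λ_k, and variable arrows are the loadings multiplied by √λ_k. With this scaling the inner product of a point and an arrow approximates the centered data value, and the cosine of the angle between two arrows approximates the correlation of the two variables, which is why nearly parallel arrows indicate highly correlated variables. The data, loadings, and the helper `biplot_coordinates` are hypothetical.

```python
import math


def biplot_coordinates(centered, rotation, eigenvalues):
    """Covariance-biplot coordinates for the first two components.

    points: each observation's PC1/PC2 scores divided by sqrt(lambda_k).
    arrows: each variable's PC1/PC2 loadings multiplied by sqrt(lambda_k).
    """
    sd = [math.sqrt(eigenvalues[k]) for k in range(2)]
    points = [
        tuple(sum(x * rotation[j][k] for j, x in enumerate(row)) / sd[k]
              for k in range(2))
        for row in centered
    ]
    arrows = [tuple(rotation[j][k] * sd[k] for k in range(2))
              for j in range(len(rotation))]
    return points, arrows


# Hypothetical centered data; its sample covariance is
# [[10/3, 8/3], [8/3, 10/3]], with eigenvalues 6 and 2/3.
centered = [[2.0, 1.0], [-2.0, -1.0], [1.0, 2.0], [-1.0, -2.0]]
r = 1 / math.sqrt(2)
rotation = [[r, r], [r, -r]]  # PC1 along (1, 1), PC2 along (1, -1)
eigenvalues = [6.0, 2.0 / 3.0]

points, arrows = biplot_coordinates(centered, rotation, eigenvalues)
(a1, b1), (a2, b2) = arrows
cos_angle = (a1 * a2 + b1 * b2) / (math.hypot(a1, b1) * math.hypot(a2, b2))
print(f"cosine between the two variable arrows: {cos_angle:.3f}")  # correlation 0.8
```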
From the biplot, it can be observed that petal width and petal length are highly correlated and that their variability is attributable primarily to the first component. The first component also explains a large part of the variability in sepal length, whereas the variability in sepal width is attributable mainly to the second component.
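The attribution read off the biplot can be made quantitative. For a PCA of the covariance matrix, the variance of variable j splits across components as Σ_k V_jk² λ_k, and each term divided by that total is the squared correlation between variable j and component k. A minimal sketch; the helper `variance_shares` is hypothetical, and the example uses a made-up covariance matrix [[2, 1], [1, 2]] (eigenvalues 3 and 1) because the iris loadings are not reproduced here.

```python
import math


def variance_shares(rotation, eigenvalues):
    """Fraction of each variable's variance carried by each component.

    Row j of the result holds rotation[j][k]**2 * eigenvalues[k] divided by
    variable j's total variance, i.e. the squared correlation between
    variable j and component k.
    """
    shares = []
    for row in rotation:
        parts = [v * v * lam for v, lam in zip(row, eigenvalues)]
        total = sum(parts)  # equals variable j's variance when all components are kept
        shares.append([part / total for part in parts])
    return shares


# Hypothetical loadings and eigenvalues for the covariance matrix [[2, 1], [1, 2]].
r = 1 / math.sqrt(2)
rotation = [[r, r], [r, -r]]
eigenvalues = [3.0, 1.0]

shares = variance_shares(rotation, eigenvalues)
print(shares)  # each variable: about 75% from PC1, 25% from PC2
```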