Homework 7

Answer the following questions (10 points each):

1. Find all well-separated clusters in the set of points shown below:

2. Many partitional clustering algorithms that automatically determine the number of clusters claim that this is an advantage. List two situations in which this is not the case.

3. Identify the clusters in Figure 7.36 using the center-, contiguity-, and density-based definitions. Also indicate the number of clusters for each case and give a brief indication of your reasoning. Note that darkness or the number of dots indicates density. If it helps, assume center-based means K-means, contiguity-based means single link, and density-based means DBSCAN.

4. Suppose that for a data set

– there are m points and K clusters,

– half the points and clusters are in “more dense” regions,

– half the points and clusters are in “less dense” regions, and

– the two regions are well-separated from each other.

For the given data set, which of the following should occur in order to minimize the squared error when finding K clusters:

a. Centroids should be equally distributed between more dense and less dense regions.

b. More centroids should be allocated to the denser region.

c. More centroids should be allocated to the less dense region.

Note: Do not get distracted by special cases or bring in factors other than density. However, if you feel the true answer is different from any given above, justify your response.

5. Hierarchical clustering is sometimes used to generate K clusters, K > 1, by taking the clusters at the Kth level of the dendrogram. (Root is at level 1.) By looking at the clusters produced in this way, we can evaluate the behavior of hierarchical clustering on different types of data and clusters, and also compare hierarchical approaches to K-means.

The following is a set of one-dimensional points: {6, 12, 18, 24, 30, 42, 48}.

a. For each of the following sets of initial centroids, create two clusters by assigning each point to the nearest centroid, and then calculate the total squared error for each set of two clusters. Show both the clusters and the total squared error for each set of centroids.

i. {18, 45}

ii. {15, 40}

b. Do both sets of centroids represent stable solutions; i.e., if the K-means algorithm were run on this set of points using the given centroids as the starting centroids, would there be any change in the clusters generated?

c. What are the two clusters produced by single link?

d. Which technique, K-means or single link, seems to produce the “most natural” clustering in this situation? (For K-means, take the clustering with the lowest squared error.)

e. What definition(s) of clustering does this natural clustering correspond to? (Well-separated, center-based, contiguous, or density.)

f. What well-known characteristic of the K-means algorithm explains the previous behavior?
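The computations in question 5, parts (a) and (b), can be checked numerically. The following is a minimal sketch in plain Python (no external libraries, using the point set and candidate centroids given in the question): it assigns each point to its nearest centroid, sums the squared distances to get the total squared error, and applies one K-means centroid-update step so you can see whether the centroids would move.

```python
# Numeric check for question 5(a)-(b): nearest-centroid assignment,
# total squared error (SSE), and one K-means centroid update.

points = [6, 12, 18, 24, 30, 42, 48]

def assign(points, centroids):
    """Group each point with its nearest centroid."""
    clusters = {c: [] for c in centroids}
    for p in points:
        nearest = min(centroids, key=lambda c: abs(p - c))
        clusters[nearest].append(p)
    return clusters

def total_sse(clusters):
    """Sum of squared distances from each point to its centroid."""
    return sum((p - c) ** 2 for c, pts in clusters.items() for p in pts)

def updated_centroids(clusters):
    """K-means update step: each new centroid is the mean of its cluster."""
    return [sum(pts) / len(pts) for pts in clusters.values()]

for centroids in ([18, 45], [15, 40]):
    clusters = assign(points, centroids)
    print("centroids", centroids,
          "-> clusters", list(clusters.values()),
          "| SSE =", total_sse(clusters),
          "| updated centroids =", updated_centroids(clusters))
```

If the updated centroids coincide with the starting centroids for a given set, that starting set is a stable solution in the sense of part (b); comparing the two SSE values addresses the "lowest squared error" criterion in part (d).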