In many clustering problems, the "proper" number of clusters to look for is assumed to be a property of the structure of the data set (the points to be clustered). One approach is to cluster the data set many times, assuming that 2, then 3, then 4, etc. clusters exist, and then to compare the answers from each of these clustering subproblems using some sort of "error" measure. A common-sense property of "natural" clusters is that the distances between points within a cluster are less than the distances between points from different clusters. This suggests an error measure such as the total within-cluster distance between data points, or between data points and their respective cluster centroids (Solomon and Bezdek, 1980).
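As an illustration of the second error measure mentioned above, the following minimal sketch sums the Euclidean distance from each point to its cluster centroid. The function name `within_cluster_error`, the choice of Euclidean distance, and the sample points are assumptions made for this example; they are not taken from the cited work.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def within_cluster_error(points, labels):
    """Total distance between each point and the centroid of its cluster.

    points: list of equal-length coordinate tuples
    labels: list of cluster identifiers, one per point
    """
    # Group the points by their assigned cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster members.
        centroid = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
        total += sum(dist(m, centroid) for m in members)
    return total


# Hypothetical data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
natural = within_cluster_error(points, [0, 0, 1, 1])    # pairs kept together
mixed = within_cluster_error(points, [0, 1, 0, 1])      # pairs split apart
singletons = within_cluster_error(points, [0, 1, 2, 3])  # one point per cluster
```

In this sample the "natural" grouping gives a smaller error than the mixed one, and placing every point in its own cluster drives the error to zero.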
This criterion cannot be used exclusively, though, because these error measures decrease steadily as the number of clusters increases. To see this, consider the extreme case, where there are n data points and one looks for n clusters. Each cluster will contain exactly one point, and all within-cluster distances and all distances between points and cluster centroids will be zero. A compromise is to use the least number of clusters possible for which the error does not become excessive. In general, what counts as "excessive" must be decided subjectively, since without prior knowledge of the structure of the data set, it is impossible to predetermine an acceptable error level.
Within the present context, the physical (hydraulic) situation allows a slightly different approach. It is not desirable to use more clusters than necessary, since that means additional valves and expense. Furthermore, the minimum possible number of valves (clusters) can be computed in advance.
Suppose, for example, that the water source can supply water at a flow rate not to exceed 10 GPM, and the sum of the individual flow rates of the sprinklers in an area is 25 GPM. It is easy to see that since 25 ÷ 10 = 2.5, at least 3 valves must be used. This concept can be formalized as follows.
If the maximum water supply flow rate is $Q$, and the sum of individual sprinkler flow rates is $\sum q_i$, then there must be at least $\sum q_i / Q$ valves. Let $N$ equal the integer portion of $\sum q_i / Q$. (For example, if $\sum q_i / Q = 5.83$, the integer portion would be 5, and $N = 5$.) If $\sum q_i / Q$ should be an integer, it may be possible to get by with $N$ valves. If $\sum q_i / Q$ is not an integer, then $N + 1$ valves will be required as a minimum.
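The rule above amounts to rounding the ratio of total sprinkler demand to supply capacity up to the next whole number. A minimal sketch follows; the function name `min_valves` and its parameter names are hypothetical, and the result is only a lower bound, since the text notes that the minimum "may" suffice rather than that it always will.

```python
import math


def min_valves(sprinkler_flows, supply_limit):
    """Lower bound on the number of valves (clusters).

    sprinkler_flows: individual sprinkler flow rates q_i (e.g. in GPM)
    supply_limit:    maximum supply flow rate Q, in the same units
    """
    if supply_limit <= 0:
        raise ValueError("supply_limit must be positive")
    ratio = sum(sprinkler_flows) / supply_limit
    # ceil(ratio) equals N when the ratio is an integer and N + 1 otherwise,
    # where N is the integer portion of the ratio.
    return math.ceil(ratio)


# The example from the text: 25 GPM of total demand on a 10 GPM supply.
example = min_valves([5, 5, 5, 5, 5], 10)   # ratio 2.5
# A case where the ratio is a whole number.
exact = min_valves([10, 10], 10)            # ratio 2.0
```

With non-integer flow rates, floating-point rounding can push a ratio that should be a whole number slightly above it, so real inputs may call for a small tolerance or exact arithmetic (for example `fractions.Fraction`).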