Theory
This section presents a general overview of the clugen algorithm. A complete description of the algorithm's theoretical framework is available in the article "Generating multidimensional clusters with support lines" (an open version is available on arXiv).
Clugen is an algorithm for generating multidimensional clusters. Each cluster is supported by a line segment, the position, orientation and length of which guide where the respective points are placed. For brevity, line segments will be referred to as lines.
Given an \(n\)-dimensional direction vector \(\mathbf{d}\) (and a number of additional parameters, which will be discussed shortly), the clugen algorithm works as follows (\(^*\) means the algorithm step is stochastic):
- Normalize \(\mathbf{d}\).
- \(^*\)Determine cluster sizes.
- \(^*\)Determine cluster centers.
- \(^*\)Determine lengths of cluster-supporting lines.
- \(^*\)Determine angles between \(\mathbf{d}\) and cluster-supporting lines.
- For each cluster:
    - \(^*\)Determine direction of the cluster-supporting line.
    - \(^*\)Determine distance of point projections from the center of the cluster-supporting line.
    - Determine coordinates of point projections on the cluster-supporting line.
    - \(^*\)Determine points from their projections on the cluster-supporting line.
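
The steps above can be illustrated with a minimal two-dimensional sketch that uses only the Python standard library. It is not the implementation of clugen(): the function name `clugen_sketch_2d`, its parameter names, and every probability distribution used for the stochastic steps (random weights for sizes, uniform centers, Gaussian lengths, angles, projections and lateral offsets) are assumptions chosen for illustration. The article describes the distributions the algorithm actually uses.

```python
import math
import random


def clugen_sketch_2d(num_clusters, num_points, direction, angle_disp,
                     cluster_sep, llength, llength_disp, lateral_disp, rng):
    """Hypothetical 2D walk-through of the listed steps (not the real clugen())."""
    # Step 1: normalize the average direction d.
    norm = math.hypot(direction[0], direction[1])
    d = (direction[0] / norm, direction[1] / norm)
    base_angle = math.atan2(d[1], d[0])

    # Step 2 (stochastic): cluster sizes. Assumption: random weights, floored,
    # with the remainder given to the last cluster so sizes sum to num_points.
    weights = [rng.random() + 0.5 for _ in range(num_clusters)]
    total = sum(weights)
    sizes = [int(num_points * w / total) for w in weights]
    sizes[-1] += num_points - sum(sizes)

    # Step 3 (stochastic): cluster centers. Assumption: uniform in [-0.5, 0.5)
    # per dimension, scaled by num_clusters times the average separation.
    centers = [
        tuple(num_clusters * sep * (rng.random() - 0.5) for sep in cluster_sep)
        for _ in range(num_clusters)
    ]

    # Step 4 (stochastic): line lengths. Assumption: |N(llength, llength_disp)|.
    lengths = [abs(rng.gauss(llength, llength_disp)) for _ in range(num_clusters)]

    # Step 5 (stochastic): angles between d and each line.
    # Assumption: |N(0, angle_disp)|, in radians.
    angles = [abs(rng.gauss(0.0, angle_disp)) for _ in range(num_clusters)]

    clusters = []
    for size, center, length, angle in zip(sizes, centers, lengths, angles):
        # Step 6.1 (stochastic): line direction. In 2D the only freedom left
        # is the side of d on which the line falls, so pick a random sign.
        theta = base_angle + rng.choice((-1.0, 1.0)) * angle
        u = (math.cos(theta), math.sin(theta))  # unit vector along the line
        v = (-u[1], u[0])                       # unit vector perpendicular to it

        points = []
        for _ in range(size):
            # Step 6.2 (stochastic): signed distance of the projection from
            # the line center. Assumption: N(0, length / 6).
            t = rng.gauss(0.0, length / 6.0)
            # Step 6.3: coordinates of the projection on the line.
            proj = (center[0] + t * u[0], center[1] + t * u[1])
            # Step 6.4 (stochastic): final point. Assumption: offset the
            # projection perpendicular to the line by N(0, lateral_disp).
            off = rng.gauss(0.0, lateral_disp)
            points.append((proj[0] + off * v[0], proj[1] + off * v[1]))

        clusters.append(
            {"center": center, "direction": u, "length": length, "points": points}
        )
    return clusters


# Example call with a fixed seed so the run is reproducible.
clusters = clugen_sketch_2d(
    num_clusters=4, num_points=200, direction=(1.0, 1.0),
    angle_disp=math.pi / 16, cluster_sep=(10.0, 10.0),
    llength=10.0, llength_disp=1.5, lateral_disp=1.0,
    rng=random.Random(123),
)
```

Only step 1 and the projection-coordinate step are deterministic. Every other step draws from the random generator, which is why a seeded `random.Random` instance is passed in explicitly.
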
Figure 1 provides a stylized overview of the algorithm's steps.
The example in Figure 1 was generated with the following parameters. The exact meaning of each is described in the documentation for the clugen() function and further discussed in the article mentioned above:
Parameter values | Description |
---|---|
\(n=2\) | Number of dimensions. |
\(c=4\) | Number of clusters. |
\(p=200\) | Total number of points. |
\(\mathbf{d}=\begin{bmatrix}1 & 1\end{bmatrix}^T\) | Average direction. |
\(\theta_\sigma=\pi/16=11.25^{\circ}\) | Angle dispersion. |
\(\mathbf{s}=\begin{bmatrix}10 & 10\end{bmatrix}^T\) | Average cluster separation. |
\(l=10\) | Average line length. |
\(l_\sigma=1.5\) | Line length dispersion. |
\(f_\sigma=1\) | Cluster lateral dispersion. |
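
As a quick check on the values in the table, the short sketch below derives a few quantities from them: the normalized direction, the angle dispersion in degrees, and the average number of points per cluster. The dictionary keys are hypothetical names chosen for this illustration and are not necessarily the argument names of clugen().

```python
import math

# Table values; the key names are illustrative assumptions.
params = {
    "num_dims": 2,              # n
    "num_clusters": 4,          # c
    "num_points": 200,          # p
    "direction": (1.0, 1.0),    # d
    "angle_disp": math.pi / 16, # theta_sigma, in radians
    "cluster_sep": (10.0, 10.0),# s
    "llength": 10.0,            # l
    "llength_disp": 1.5,        # l_sigma
    "lateral_disp": 1.0,        # f_sigma
}

# Normalizing d = [1, 1]^T gives a unit vector along the main diagonal.
norm = math.hypot(*params["direction"])
unit_direction = tuple(x / norm for x in params["direction"])

# pi/16 radians expressed in degrees.
angle_disp_deg = math.degrees(params["angle_disp"])

# With p = 200 and c = 4, clusters hold 50 points on average.
avg_points_per_cluster = params["num_points"] / params["num_clusters"]

print(unit_direction, angle_disp_deg, avg_points_per_cluster)
```

Individual cluster sizes will generally differ from the average, because determining cluster sizes is one of the stochastic steps.
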
Additionally, all optional parameters (not listed above) were left at their default values. The complete list of parameters is presented in the clugen() function documentation.