Algorithms¶

This section explains the underlying algorithms and mathematical foundations of the Clustermatch Correlation Coefficient (CCC) and its GPU implementation.

Clustermatch Correlation Coefficient (CCC)¶

The Clustermatch Correlation Coefficient is based on the simple yet powerful idea of clustering data points and then computing the Adjusted Rand Index (ARI) between the two clusterings. This approach allows CCC to capture both linear and non-linear relationships effectively.

Mathematical Foundation¶

Given two variables X and Y with n observations each, CCC works as follows:

Quantile Clustering: For each variable, create multiple partitions using quantile-based clustering with different numbers of clusters k
Partition Generation: Generate partitions for k = 2, 3, …, √n clusters (customizable)
ARI Computation: Calculate the Adjusted Rand Index between all partition pairs
Maximization: Return the maximum ARI value across all partition combinations

The ARI is defined as:

\[ARI = \frac{\text{RI} - E[\text{RI}]}{\max(\text{RI}) - E[\text{RI}]}\]

where RI is the Rand Index and E[RI] is its expected value under random partitioning.

Quantile Clustering Algorithm¶

The quantile clustering procedure partitions data into k clusters based on percentiles:

Rank Computation: Convert data values to ranks using rank(data)
Percentile Calculation: Compute percentiles that divide data into k equal-sized groups
Partition Assignment: Assign each data point to a cluster based on its percentile position

def get_perc_from_k(k):
    """Get percentiles for k clusters"""
    return [(1.0 / k) * i for i in range(1, k)]

def run_quantile_clustering(data, k):
    """Perform quantile clustering on 1D data"""
    data_rank = rank(data)
    data_perc = data_rank / len(data)
    percentiles = [0.0] + get_perc_from_k(k) + [1.0]
    # Assign clusters based on percentile boundaries
    return partition

Adjusted Rand Index (ARI)¶

The ARI measures the similarity between two clusterings, corrected for chance:

\[ARI(X, Y) = \frac{2(ad - bc)}{(a + b)(b + d) + (a + c)(c + d)}\]

where a, b, c, d are counts from the contingency table of the two partitions.

ARI Properties:

Range: [-1, 1] where 1 indicates perfect agreement
Chance Correction: Expected value is 0 for random partitions
Symmetric: ARI(X, Y) = ARI(Y, X)

GPU Implementation Details¶

CCC-GPU leverages CUDA to accelerate the most computationally intensive parts of the algorithm:

CUDA Kernel Architecture¶

The GPU implementation uses several specialized CUDA kernels:

ARI Computation Kernel: Parallelizes ARI calculations across feature pairs
Max-Finding Kernel: Uses CUB primitives for efficient maximum value detection
Partition Processing: Handles multiple partition comparisons in parallel

template <typename T>
__global__ void findMaxAriKernel(const T *aris,
                                 uint8_t *max_parts,
                                 T *cm_values,
                                 const int n_partitions,
                                 const int reduction_range)

Memory Management¶

The GPU implementation employs sophisticated memory management:

Device Memory: Stores partition data and intermediate results on GPU
Thrust Containers: Uses thrust::device_vector for RAII memory management
Memory Coalescing: Optimizes memory access patterns for maximum bandwidth
Batch Processing: Processes data in batches to handle memory constraints

Performance Optimizations¶

Several optimizations contribute to the significant speedups:

Parallel ARI Computation: Each thread block handles one feature comparison
CUB Primitives: Leverages highly-optimized reduction operations
Memory Hierarchy: Strategic use of shared memory and registers
Warp-Level Primitives: Exploits SIMD execution within warps

Algorithmic Complexity¶

Time Complexity¶

For n samples and m features:

CPU Implementation: O(m² × k² × n log n) where k is max clusters
GPU Implementation: O(m² × k² × n log n / p) where p is parallelization factor
Memory Complexity: O(m × k × n) for storing all partitions

The GPU implementation achieves 16x-74x speedups depending on dataset size and hardware.

Space Complexity¶

The algorithm requires storage for:

Original Data: O(m × n)
Partitions: O(m × k × n) for all clustering variations
ARI Matrix: O(m² × k²) for pairwise comparisons
Results: O(m²) for final correlation matrix

Categorical Data Support¶

CCC naturally handles categorical data through specialized encoding:

Categorical Encoding: Each unique category becomes a cluster label
Single Partition: Categorical features generate only one meaningful partition
Mixed Data Types: Seamlessly combines numerical and categorical features

def get_feature_type_and_encode(feature_data):
    """Detect and encode feature data type"""
    is_numerical = feature_data.dtype.kind in ("f", "i", "u")
    if is_numerical:
        return feature_data, True
    # Encode categorical as integers
    return np.unique(feature_data, return_inverse=True)[1], False

Implementation Considerations¶

Numerical Stability¶

The implementation includes several measures for numerical stability:

Singleton Detection: Handles features with zero variance (all same values)
NaN Handling: Proper treatment of invalid computations
Precision Control: Uses appropriate floating-point precision for calculations

Error Handling¶

Robust error handling for edge cases:

Insufficient Data: Requires minimum number of samples for clustering
GPU Memory Limits: Graceful handling of memory constraints
CUDA Errors: Proper error checking and cleanup

The algorithm is designed to be both mathematically sound and computationally efficient, making it suitable for large-scale genomics and machine learning applications.