
Machine Learning

Computers can learn, much as human beings do, from observations, examples, images, sensor readings, data, and experience. Machine learning is a field of artificial intelligence concerned with the design and implementation of algorithms that let computers adapt their behavior based on these and many other sources. My main research in machine learning is on pattern recognition, dimensionality reduction, representation learning, and clustering.

The main focus of my research in this area is to devise efficient algorithms for classification, clustering, feature selection, dimensionality reduction, visualization, and performance evaluation, with applications to interactomics, transcriptomics, and multi-omics data integration. Over the past 15 years, I have contributed to many application areas, mainly transcriptomics and interactomics. In fundamental pattern recognition, my work has been in statistical pattern recognition. A summary of my contributions is given below. Some items are summarized in only a sentence or two; more information can be found in the corresponding publications.

Optimal 1D Clustering + KPCA

We have recently proposed a nonlinear dimensionality reduction and clustering approach based on kernel PCA (KPCA) and one-dimensional, optimal multi-level thresholding. The algorithm first uses kernel PCA to project the data onto a one-dimensional space. An optimal multi-level thresholding algorithm is then applied to the 1D data, together with indices of validity, to discover the optimal clustering. Because the algorithm runs in quadratic time, a wide range of kernel parameters can be explored, which makes it more efficient than combining k-means with other nonlinear dimensionality reduction methods, or even with KPCA. Results on synthetic data (figure on the right) and on images of the 26-letter alphabet from MNIST show Silhouette scores higher than 0.98, which makes it a very competitive approach for clustering high-dimensional, complex data.
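The pipeline described above can be sketched roughly as follows. This is a minimal illustration and not the published implementation. It makes several assumptions that the text does not state: an RBF kernel with a hypothetical parameter gamma, a thresholding criterion that minimizes the within-cluster sum of squared errors, a fixed number of clusters k, and the Silhouette score as the validity index. The function names kpca_1d, optimal_1d_clusters, and silhouette_1d are made up for this sketch.

```python
import numpy as np


def kpca_1d(X, gamma):
    """Project the rows of X onto the first RBF kernel principal component."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * dist2)                 # RBF kernel matrix (assumed kernel)
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kc = J @ K @ J                             # kernel centered in feature space
    w, V = np.linalg.eigh(Kc)                  # eigenvalues in ascending order
    return V[:, -1] * np.sqrt(max(w[-1], 0.0))


def optimal_1d_clusters(z, k):
    """Split 1D values into k contiguous groups with minimal total SSE.

    Dynamic programming over the sorted values, O(k * n^2) time.
    Returns one label per input value; labels increase with z.
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(z)")
    order = np.argsort(z)
    s = z[order]
    p1 = np.concatenate(([0.0], np.cumsum(s)))       # prefix sums
    p2 = np.concatenate(([0.0], np.cumsum(s ** 2)))  # prefix sums of squares

    def sse(i, j):
        # Sum of squared deviations from the mean for s[i:j], in O(1).
        t = p1[j] - p1[i]
        return p2[j] - p2[i] - t * t / (j - i)

    cost = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                v = cost[c - 1, i] + sse(i, j)
                if v < cost[c, j]:
                    cost[c, j] = v
                    back[c, j] = i

    # Walk the back-pointers to recover the group boundaries (thresholds).
    labels_sorted = np.empty(n, dtype=int)
    j = n
    for c in range(k, 0, -1):
        i = back[c, j]
        labels_sorted[i:j] = c - 1
        j = i
    labels = np.empty(n, dtype=int)
    labels[order] = labels_sorted
    return labels


def silhouette_1d(z, labels):
    """Mean Silhouette score of a labeling of 1D values (needs >= 2 clusters)."""
    z = np.asarray(z, dtype=float)
    D = np.abs(z[:, None] - z[None, :])
    clusters = np.unique(labels)
    scores = []
    for i in range(len(z)):
        own = labels == labels[i]
        if own.sum() == 1:
            scores.append(0.0)                 # convention for singleton clusters
            continue
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))
```

Under these assumptions, a parameter search would call kpca_1d for each candidate gamma, run optimal_1d_clusters on the projection for each candidate k, and keep the combination with the highest silhouette_1d value. Because the clustering step works on sorted 1D values, each run is cheap enough to repeat across many kernel settings.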

Relevant publications:

Summary of Previous Contributions