A series of lectures within the course of Geometry and Topology for Data Analysis
Bio
I received my MSc in Mathematics from the University of Trento in 2017 with a thesis on Algebraic Topology. I then worked as a research assistant at the Fondazione Bruno Kessler, where I trained in Deep Learning methods for massive medical data. I also collaborated with the Immuno-Oncology Laboratory of the Ospedale Pediatrico Bambino Gesù (Rome) on the development of predictive models in precision oncology. I am currently a PhD candidate in Biomolecular Sciences and the Transdisciplinary Program in Computational Biology at the University of Trento, on a joint scholarship with Fondazione Bruno Kessler. The focus of my research project is to develop AI frameworks for high-dimensional datasets that enable accurate patient stratification in precision medicine, while leveraging TDA-based techniques for the unsupervised analysis and interpretability of predictive models.
Over the last decade, the accelerated development of technologies has revolutionised many scientific fields, leading to an explosion in the amount of complex, high-dimensional data. Translating these data into relevant information requires several challenging steps, including data curation, data visualisation and data analysis.
Data visualisation is a crucial part of the analysis process, but it can be very tricky, especially for high-dimensional datasets. Although there is no silver-bullet approach, several methods can be used to reduce the dimensionality of the data. Dimensionality reduction is not only necessary to visualise data in a “comfortable” space; it can also help to reduce noise before a data mining algorithm is applied.
Among the different dimensionality reduction approaches, the recently introduced UMAP algorithm (McInnes et al., 2018) is a state-of-the-art manifold-learning technique, built on a strong mathematical framework that draws on Riemannian geometry and algebraic topology.
Computational topology has been successfully applied to a wide range of disciplines, including data analysis, pattern recognition, and machine learning. Topological descriptors, such as persistence diagrams, can characterise the structure of a dataset and be used to build topology-based machine learning models.
In particular, an increasing number of Topological Data Analysis (TDA) tools have been adopted to improve machine learning and deep learning techniques. For example, TDA frameworks based on persistent homology and persistence landscapes have been integrated with convolutional neural networks to investigate the role of repeated clinical measures and biological variables in predicting response to drug treatments in clinical trials.
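To make the notion of a persistence diagram more concrete, the following minimal sketch computes the 0-dimensional diagram (connected components) of a Vietoris-Rips filtration on a toy point cloud. It is an illustration only, not the tooling used in the lectures: the function name `h0_persistence` and the toy points are hypothetical, only the Python standard library is used, and a component is assumed to die at the length of the edge that merges it into another one.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components (hypothetical helper).

    Assumption: Vietoris-Rips filtration in which an edge enters at its length,
    so every point is born at 0 and a component dies when an edge merges it.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges sorted by length: this is the filtration order.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j          # two components merge
            diagram.append((0.0, length))    # one of them dies here
    diagram.append((0.0, math.inf))          # the last component never dies
    return diagram


# Toy data: a tight cluster of three points and a distant pair.
toy_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
diagram = h0_persistence(toy_points)
for birth, death in diagram:
    print(f"birth={birth:.1f}  death={death}")
```

In this toy example the single long-lived finite bar corresponds to the gap between the two clusters, while the short bars reflect distances within each cluster. Higher-dimensional features (loops, voids) require a full persistent homology computation, which dedicated TDA libraries provide.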
In these seminars, several dimensionality reduction techniques will be described, including linear methods (e.g. PCA, MDS) and non-linear manifold learning methods (e.g. t-SNE, UMAP), along with the corresponding mathematical background. TDA-based approaches to data analysis and machine learning will also be illustrated, drawing in particular on the literature available in the context of precision medicine. Each seminar will be combined with a hands-on session providing practical examples to test the described algorithms on toy datasets, based on the Python language and TDA-specific libraries.
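As a flavour of the linear methods mentioned above, here is a minimal sketch of PCA restricted to the first principal component, computed by power iteration on the sample covariance matrix. It is not taken from the lecture materials: the function name `first_principal_component`, the iteration count and the toy data are hypothetical, and it assumes the starting vector is not orthogonal to the leading direction. In practice a numerical library would be used instead of plain Python lists.

```python
import math


def first_principal_component(data, n_iter=200):
    """Return (direction, scores) for the leading principal component.

    Hypothetical helper: power iteration on the sample covariance matrix.
    """
    n, d = len(data), len(data[0])

    # Centre the data: PCA is defined on mean-subtracted coordinates.
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    centered = [[row[k] - mean[k] for k in range(d)] for row in data]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]

    # Power iteration: repeatedly apply the matrix and renormalise.
    v = [1.0] * d  # assumption: not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Project each centred point onto the direction: 2-D data becomes 1-D.
    scores = [sum(r[k] * v[k] for k in range(d)) for r in centered]
    return v, scores


# Toy data lying close to the line y = 2x.
toy_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, scores = first_principal_component(toy_data)
print("direction:", direction)
print("1-D scores:", scores)
```

The returned direction is a unit vector close to the slope of the underlying line, and the scores are the one-dimensional coordinates of the points along it. Non-linear methods such as t-SNE and UMAP replace this single linear projection with an embedding that aims to preserve neighbourhood structure.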
Lectures will include examples and materials from the following textbooks/papers:
- Wang, Jianzhong. Geometric structure of high-dimensional data and dimensionality reduction. Vol. 5. Berlin Heidelberg: Springer (2012).
- McInnes, Leland, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).
- Joshi, Milan, and Dhananjay Joshi. A survey of Topological Data Analysis Methods for Big Data in Healthcare Intelligence. Int. J. Appl. Eng. Res. 14 (2019).
Lectures will take place on the following dates:
- Wednesday 5 May 2021 @ 15.30-17.30
- Thursday 6 May 2021 @ 08.30-10.30
- Wednesday 12 May 2021 @ 15.30-17.30
- Thursday 13 May 2021 @ 08.30-10.30
- Wednesday 19 May 2021 @ 15.30-17.30