Research Projects

Here is an overview of our recent research projects.

Safeguarding privacy in AI health systems

Advances in artificial intelligence (AI) have produced new classes of models (e.g., foundation and generative models) capable of leveraging health data in new, transformative ways. At the same time, the sensitive nature of health data, combined with the increasing complexity of these AI models, introduces novel privacy concerns (e.g., memorization of training samples), which require technical frameworks to quantify potential risks and minimize privacy harm. To address these challenges, we have designed rigorous approaches that enable the development of useful AI models while safeguarding the privacy of data contributors.

Relevant Publications

Privacy-protecting methods for data integration

Integrating health datasets from multiple sites is vital to enabling powerful analytics and advancing clinical practice. Privacy is a major challenge in the data integration process, as each site must ensure the confidentiality of its data. To achieve privacy, current approaches mainly rely on security-based primitives (e.g., encryption, secure multiparty computation (SMC)). We have advanced these approaches by leveraging dimensionality reduction and similarity-based techniques to perform scalable data linkage with rigorous privacy protection. Additionally, we have proposed methods to facilitate the integration and re-use of de-identified datasets.

Relevant Publications

Privacy-protecting data sharing and analysis

The analysis and sharing of health data are central to accelerating medical research. However, recent studies have shown that even the release of aggregate-level data (e.g., statistics) may lead to the disclosure of sensitive information (e.g., membership disclosure, phenotype inference). To address this risk, we have developed novel privacy-protecting methods that build on formal statistical disclosure control techniques (e.g., information theory, differential privacy) to provide privacy while retaining the usability of the data in emerging biomedical application settings.
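As a generic illustration of how differential privacy can protect an aggregate statistic, the sketch below applies the textbook Laplace mechanism to a counting query. It is not one of the methods developed in this project; the function names (`laplace_noise`, `private_count`) and the example values are hypothetical, and the code assumes that adding or removing one individual changes the count by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng=None):
    """Return a noisy count using the Laplace mechanism.

    Assumption: the query is a simple count, so its sensitivity is 1
    (one individual changes the result by at most 1). Smaller epsilon
    means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical example: number of patients with a given diagnosis.
    rng = random.Random(0)
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(private_count(120, eps, rng), 2))
```

Running the example with several values of epsilon shows the trade-off described above: a small epsilon yields a count that can deviate substantially from the true value, while a large epsilon stays close to it but offers weaker protection. A production release would also need to handle floating-point subtleties and track the privacy budget across queries, which this sketch omits.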

Relevant Publications

Knowledge discovery with biomedical data

Current information systems enable the collection of large biomedical datasets, which data-driven models can use to significantly advance patient care. To extract useful medical knowledge, we have proposed new pattern mining techniques that capture temporal correlations between medical events (e.g., symptoms for a disease, disease evolution) and are robust in the presence of noise and missing values. These patterns can be used to construct similarity measures for effective patient retrieval in large biomedical datasets and to improve prediction models for mortality studies.

Relevant Publications
