Research Projects
Here is an overview of our recent research projects.
Privacy-protecting methods for data integration
Integrating health datasets from multiple sites is vital for enabling powerful analytics and advancing clinical practice. Privacy is a major challenge in the data integration process, as each site must keep its data confidential. To achieve privacy, current approaches mainly rely on security-based primitives (e.g., encryption, secure multiparty computation (SMC)). We have advanced these approaches by leveraging dimensionality reduction and similarity-based techniques to perform scalable data linkage with rigorous privacy protection. Additionally, we have proposed methods to facilitate the integration and re-use of de-identified datasets.
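To make the similarity-based linkage idea concrete, here is a minimal sketch that compares two name strings by their overlapping character q-grams (Dice coefficient). It is an illustration only and not the method from the publications below: the function names `qgrams` and `dice_similarity` and the `#` padding character are assumptions, and the sketch omits the privacy protection step entirely (a real deployment would compare protected encodings rather than raw strings).

```python
def qgrams(text, q=2):
    """Return the set of character q-grams of `text`, padded with '#'.

    Hypothetical helper: padding lets the first and last characters
    contribute as many q-grams as the inner ones.
    """
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def dice_similarity(a, b, q=2):
    """Dice coefficient between the q-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a and not grams_b:
        return 1.0  # two empty inputs are treated as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # Records whose similarity exceeds a chosen threshold would be
    # considered candidate matches.
    print(dice_similarity("Smith", "Smyth"))
    print(dice_similarity("Smith", "Jones"))
```

Q-gram comparisons of this kind tolerate small typographical differences between records, which is why they are a common building block for approximate linkage.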
Relevant Publications
- Bonomi, L. and Jiang, X., 2018. Linking temporal medical records using non-protected health information data. Statistical methods in medical research, 27(11), pp.3304-3324
- Wang, S., Jiang, X., Singh, S., Marmor, R., Bonomi, L., Fox, D., Dow, M. and Ohno-Machado, L., 2017. Genome privacy: challenges, technical approaches to mitigate risk, and ethical considerations in the United States. Annals of the New York Academy of Sciences, 1387(1), p.73
- Bonomi, L., Xiong, L., Chen, R. and Fung, B.C., 2012, October. Frequent grams-based embedding for privacy preserving record linkage. In Proceedings of the 21st ACM international conference on Information and knowledge management (pp. 1597-1601)
Privacy-protecting data sharing and analysis
The analysis and sharing of health data are central to accelerating medical research. However, recent studies have shown that even the release of aggregate-level data (e.g., statistics) may lead to the disclosure of sensitive information (e.g., membership disclosure, phenotype inference). To address this risk, we have developed novel privacy-protecting methods that build on formal statistical disclosure control techniques (e.g., information theory, differential privacy) to provide privacy while retaining the usability of the data in emerging biomedical application settings.
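As a simple illustration of differential privacy applied to an aggregate statistic, the sketch below releases a patient count with Laplace noise. It shows the standard Laplace mechanism, not the specific methods from the publications below; the function names `laplace_noise` and `private_count` are assumptions, and the sensitivity of 1 assumes that adding or removing one patient changes the count by at most one.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng=None):
    """Release a count under epsilon-differential privacy (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    sensitivity = 1.0  # assumption: one patient changes the count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Smaller epsilon means stronger privacy and noisier released values.
    for eps in (0.1, 1.0, 10.0):
        print(eps, private_count(100, eps))
```

The noise scale grows as epsilon shrinks, which makes the trade-off between privacy and data usability explicit.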
Relevant Publications
- Bonomi, L., Huang, Y. and Ohno-Machado, L., 2020. Privacy challenges and research opportunities for genomic data sharing. Nature Genetics
- Bonomi, L., Jiang, X. and Ohno-Machado, L., 2020. Protecting patient privacy in survival analyses. Journal of the American Medical Informatics Association, 27(3), pp.366-375
- Bonomi, L., Fan, L. and Jin, H., 2016, February. An information-theoretic approach to individual sequential data sanitization. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (pp. 337-346)
Knowledge discovery with biomedical data
Current information systems enable the collection of large biomedical datasets, which data-driven models can use to significantly advance patient care. To extract useful medical knowledge, we have proposed new pattern mining techniques that capture temporal correlations between medical events (e.g., symptoms of a disease, disease evolution) and remain robust in the presence of noise and missing values. These patterns can be used to construct similarity measures for effective patient retrieval in large biomedical datasets and to improve prediction models for mortality studies.
Relevant Publications
- Bonomi, L., Fan, L. and Jiang, X., 2020. Noise-tolerant similarity search in temporal medical data. Journal of biomedical informatics, 13, p.103667
- Bonomi, L. and Jiang, X., 2018. Patient ranking with temporally annotated data. Journal of biomedical informatics, 78, pp.43-53
- Bonomi, L. and Jiang, X., 2018, June. Pattern Similarity in Time Interval Sequences. In 2018 IEEE International Conference on Healthcare Informatics (ICHI) (pp. 434-435). IEEE