Publications

An NSF REU Site Based on Trust and Reproducibility of Intelligent Computation: Experience Report

Published in EduHPC23 Workshop on Education for High Performance Computing, 2023

This paper presents an overview of an NSF Research Experiences for Undergraduates (REU) Site on Trust and Reproducibility of Intelligent Computation, delivered by faculty and graduate students in the Kahlert School of Computing at the University of Utah. The chosen themes bring together several concerns for the future of producing computational results that can be trusted: secure, reproducible, based on sound algorithmic foundations, and developed in the context of ethical considerations. The research areas represented by student projects include machine learning, high-performance computing, algorithms and applications, computer security, data science, and human-centered computing. In the first four weeks of the program, the entire student cohort spent their mornings in lessons from experts in these crosscutting topics and used one-of-a-kind research platforms operated by the University of Utah, namely the NSF-funded CloudLab and POWDER facilities; reading assignments, quizzes, and hands-on exercises reinforced the lessons. In the subsequent five weeks, lectures were less frequent as students branched into small groups to develop their research projects. The final week focused on a poster presentation and final report. By describing our experiences, we offer this program as a model for preparing a future workforce to integrate machine learning into trustworthy and reproducible applications.

Recommended citation: Hall, Mary, Ganesh Gopalakrishnan, Eric Eide, Johanna Cohoon, Jeff Phillips, Mu Zhang, Shireen Elhabian et al. "An NSF REU Site Based on Trust and Reproducibility of Intelligent Computation: Experience Report." In Proceedings of the SC23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 343-349. 2023. https://dl.acm.org/doi/abs/10.1145/3624062.3624100

Structural Cycle GAN for Virtual Immunohistochemistry Staining of Gland Markers in the Colon

Published in Machine Learning in Medical Imaging, 2023

With the advent of digital scanners and deep learning, diagnostic operations may move from the microscope to the desktop. Hematoxylin and eosin (H&E) staining is one of the most frequently used stains for disease analysis, diagnosis, and grading, but pathologists often need different immunohistochemical (IHC) stains to analyze specific structures or cells. Obtaining all of these stains (H&E and different IHCs) on a single specimen is a tedious and time-consuming task. Consequently, virtual staining has emerged as an essential research direction. Here, we propose a novel generative model, Structural Cycle-GAN (SC-GAN), for synthesizing IHC stains from H&E images, and vice versa. Our method expressly incorporates structural information in the form of edges (in addition to color data) and employs attention modules exclusively in the decoder of the proposed generator model. This integration enhances feature localization and preserves contextual information during the generation process. In addition, a structural loss is incorporated to ensure accurate structural alignment between the generated and input markers. To demonstrate the efficacy of the proposed model, experiments are conducted with two IHC markers emphasizing distinct structures of glands in the colon: the nucleus of epithelial cells (CDX2) and the cytoplasm (CK8/18). Quantitative metrics such as FID and SSIM are frequently used for the analysis of generative models, but they do not correlate explicitly with higher-quality virtual staining results. Therefore, we propose two new quantitative metrics that correlate directly with the virtual staining specificity of IHC markers.
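
The structural loss can be pictured as comparing edge maps of the generated and source images. Below is a minimal PyTorch sketch of one plausible formulation, assuming Sobel edges and an L1 penalty; the paper's exact edge extractor and weighting may differ.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Edge magnitude of a (B, C, H, W) batch, computed on a grayscale average."""
    gray = img.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)  # Sobel kernel for the other direction
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def structural_loss(generated, source):
    """Penalize disagreement between edge maps of generated and input images."""
    return F.l1_loss(sobel_edges(generated), sobel_edges(source))
```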

Recommended citation: Dubey, S., Kataria, T., Knudsen, B., Elhabian, S.Y. (2024). Structural Cycle GAN for Virtual Immunohistochemistry Staining of Gland Markers in the Colon. In: Cao, X., Xu, X., Rekik, I., Cui, Z., Ouyang, X. (eds) Machine Learning in Medical Imaging. MLMI 2023. Lecture Notes in Computer Science, vol 14349. Springer, Cham. https://doi.org/10.1007/978-3-031-45676-3_45

To Pretrain or Not to Pretrain? A Case Study of Domain-Specific Pretraining for Semantic Segmentation in Histopathology

Published in Medical Image Learning with Limited and Noisy Data, 2023

Annotating medical imaging datasets is costly, so fine-tuning (or transfer learning) is the most effective method for digital pathology vision applications such as disease classification and semantic segmentation. However, due to texture bias in models trained on real-world images, transfer learning for histopathology applications might result in underperforming models, which motivates the use of unlabeled histopathology data and self-supervised methods to discover domain-specific characteristics. Here, we test the premise that histopathology-specific pretrained models provide better initializations for pathology vision tasks, i.e., gland and cell segmentation. In this study, we compare the performance of gland and cell segmentation tasks with histopathology domain-specific and non-domain-specific (real-world image) pretrained weights. Moreover, we investigate the dataset size at which domain-specific pretraining produces significant gains in performance. In addition, we investigate whether domain-specific initialization improves out-of-distribution generalization to distinct datasets for the same task. The results indicate that the performance gain from domain-specific pretrained weights depends on both the task and the size of the training dataset. For limited dataset sizes, a significant improvement in gland segmentation performance is observed, whereas models trained on cell segmentation datasets exhibit no improvement.
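
In practice, the comparison boils down to which weights initialize the segmentation encoder before fine-tuning. Here is a hedged sketch of that setup, assuming a torchvision ResNet-50 backbone and a hypothetical self-supervised pathology checkpoint (`ckpt_path`); the paper's actual architectures and pretraining methods may differ.

```python
import torch
from torchvision import models

def build_backbone(init="imagenet", ckpt_path=None):
    """Return a ResNet-50 encoder with one of the compared initializations."""
    if init == "imagenet":
        # non-domain-specific: pretrained on real-world images
        return models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    backbone = models.resnet50(weights=None)
    if init == "histopathology" and ckpt_path is not None:
        # domain-specific: hypothetical self-supervised pathology checkpoint
        state = torch.load(ckpt_path, map_location="cpu")
        backbone.load_state_dict(state, strict=False)
    return backbone  # init="random" falls through to untrained weights
```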

Recommended citation: Kataria, T., Knudsen, B., Elhabian, S. (2023). To Pretrain or Not to Pretrain? A Case Study of Domain-Specific Pretraining for Semantic Segmentation in Histopathology. In: Xue, Z., et al. Medical Image Learning with Limited and Noisy Data. MILLanD 2023. Lecture Notes in Computer Science, vol 14307. Springer, Cham. https://doi.org/10.1007/978-3-031-44917-8_24

ADASSM: Adversarial Data Augmentation in Statistical Shape Models from Images

Published in Shape in Medical Imaging, 2023

Statistical shape models (SSM) are well-established as an excellent tool for identifying variations in the morphology of anatomy across an underlying population. Shape models use a consistent shape representation across all samples in a given cohort, which helps to compare shapes and identify the variations that can detect pathologies and help in formulating treatment plans. In medical imaging, computing these shape representations from CT/MRI scans requires time-intensive preprocessing operations, including but not limited to anatomy segmentation annotations, registration, and texture denoising. Deep learning models have demonstrated exceptional capabilities in learning shape representations directly from volumetric images, giving rise to highly effective and efficient Image-to-SSM networks. Nevertheless, these models are data-hungry, and due to the limited availability of medical data, deep learning models tend to overfit. Offline data augmentation techniques that use kernel density estimation (KDE)-based methods for generating shape-augmented samples have successfully aided Image-to-SSM networks in achieving accuracy comparable to traditional SSM methods. However, these augmentation methods focus on shape augmentation, whereas deep learning models exhibit image-based texture bias, resulting in sub-optimal models. This paper introduces a novel strategy for on-the-fly data augmentation for the Image-to-SSM framework by leveraging data-dependent noise generation, or texture augmentation. The proposed framework is trained as an adversary to the Image-to-SSM network, generating diverse and challenging noisy samples. Our approach achieves improved accuracy by encouraging the model to focus on the underlying geometry rather than relying solely on pixel values.
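
As a rough illustration of data-dependent noise, the following one-step, FGSM-style perturbation pushes images along the task loss gradient. This is only a sketch of the idea: ADASSM trains a dedicated generator network as the adversary rather than using a single gradient step.

```python
import torch

def adversarial_texture_augment(model, images, targets, loss_fn, eps=0.01):
    """Generate challenging noisy samples by ascending the task loss (sketch)."""
    images = images.clone().detach().requires_grad_(True)
    loss = loss_fn(model(images), targets)
    grad = torch.autograd.grad(loss, images)[0]
    # data-dependent noise: the perturbation direction depends on the image
    return (images + eps * grad.sign()).detach()
```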

Recommended citation: Karanam, M.S.T., Kataria, T., Iyer, K., Elhabian, S.Y. (2023). ADASSM: Adversarial Data Augmentation in Statistical Shape Models from Images. In: Wachinger, C., Paniagua, B., Elhabian, S., Li, J., Egger, J. (eds) Shape in Medical Imaging. ShapeMI 2023. Lecture Notes in Computer Science, vol 14350. Springer, Cham. https://doi.org/10.1007/978-3-031-46914-5_8

Automating Ground Truth Annotations for Gland Segmentation Through Immunohistochemistry

Published in Modern Pathology, 2023

The microscopic evaluation of glands in the colon is of utmost importance in the diagnosis of inflammatory bowel disease (IBD) and cancer. When properly trained, deep learning pipelines can provide a systematic, reproducible, and quantitative assessment of disease-related changes in glandular tissue architecture. The training and testing of deep learning models require large amounts of manual annotations, which are difficult, time-consuming, and expensive to obtain. Here, we propose a method for the automated generation of ground truth in digital hematoxylin and eosin (H&E) stained slides using immunohistochemistry (IHC) labels. The image processing pipeline generates annotations of glands in H&E histopathology images from colon biopsies by transferring gland masks from CK8/18, CDX2, or EpCAM IHC. The IHC gland outlines are transferred to co-registered H&E images for the training of deep learning models. We compare the performance of the deep learning models to manual annotations using an internal held-out set of biopsies as well as two public datasets. Our results show that EpCAM IHC provides gland outlines that closely match manual gland annotations (Dice = 0.89) and are robust to damage by inflammation. In addition, we propose a simple data sampling technique that allows models trained on data from several sources to be adapted to a new data source using just a few newly annotated samples. The best-performing models achieved average Dice scores of 0.902 and 0.89 on the GlaS and CRAG colon cancer public datasets, respectively, when trained with only 10% of annotated cases from either public cohort. Altogether, the performance of our models indicates that automated annotations using cell-type-specific IHC markers can safely replace manual annotations. Automated IHC labels from single-institution cohorts can be combined with small numbers of hand-annotated cases from multi-institutional cohorts to train models that generalize well to diverse data sources.
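
The Dice scores above measure the overlap between predicted and reference gland masks; for reference, a minimal NumPy implementation of the metric:

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two binary masks (1.0 = perfect overlap)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom else 1.0
```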

Recommended citation: Kataria, Tushar, Saradha Rajamani, Abdul Bari Ayubi, Mary Bronner, Jolanta Jedrzkiewicz, Beatrice Knudsen, and Shireen Y. Elhabian. "Automating Ground Truth Annotations for Gland Segmentation Through Immunohistochemistry." Modern Pathology (2023): 100331. https://www.sciencedirect.com/science/article/abs/pii/S0893395223002363

InfoSync: Information Synchronization across Multilingual Semi-structured Tables

Published in Findings of the Association for Computational Linguistics: ACL 2023, 2023

Information synchronization of semi-structured data across languages is challenging. For example, Wikipedia tables in one language need to be synchronized with those in others. To address this problem, we introduce a new dataset, InfoSync, and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (~3.5K pairs) are manually annotated. The proposed method includes (1) information alignment, which maps rows across tables, and (2) information update, which fills in missing or outdated information in aligned tables across languages. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 532 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method.
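
The alignment F1 reported above can be computed by treating alignment as set overlap between predicted and gold row pairs. A small sketch, assuming pairs are hashable (row, row) tuples:

```python
def alignment_f1(predicted_pairs, gold_pairs):
    """F1 over predicted vs. gold sets of aligned row pairs."""
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    tp = len(predicted & gold)  # correctly predicted alignments
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```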

Recommended citation: Khincha, Siddharth, Chelsi Jain, Vivek Gupta, Tushar Kataria, and Shuo Zhang. "InfoSync: Information Synchronization across Multilingual Semi-structured Tables." In Findings of the Association for Computational Linguistics: ACL 2023, pp. 2536-2559. 2023. https://aclanthology.org/2023.findings-acl.159/

Analysis of Ringing Artifact in Image Fusion Using Directional Wavelet Transforms

Published in IJERT, 2020

In the field of multi-data analysis and fusion, image fusion plays a vital role in many applications. With the invention of new sensors, the demand for high-quality image fusion algorithms has grown tremendously. Wavelet-based fusion is a popular choice for many image fusion algorithms because of its ability to decouple different features of information. However, it suffers from ringing artifacts in the output. This paper presents an analysis of ringing artifacts in image fusion using directional wavelets (curvelets, contourlets, non-subsampled contourlets, etc.). We compare the performance of various fusion rules for directional wavelets available in the literature. The experimental results suggest that ringing artifacts are present in all types of wavelets, with the extent of the artifact varying with the type of wavelet, the fusion rule used, and the levels of decomposition.
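
For readers unfamiliar with wavelet-based fusion, the basic recipe decomposes both inputs, merges coefficients with a fusion rule, and inverts the transform. Below is a sketch with PyWavelets, using a separable wavelet and the common average-approximation / max-abs-detail rule (directional transforms such as curvelets and contourlets require their own toolboxes):

```python
import numpy as np
import pywt

def fuse(a, b, wavelet="db4", level=3):
    """Fuse two grayscale images: average approximations, max-abs details."""
    ca = pywt.wavedec2(a, wavelet, level=level)
    cb = pywt.wavedec2(b, wavelet, level=level)
    fused = [(ca[0] + cb[0]) / 2.0]  # approximation band: average rule
    for da, db in zip(ca[1:], cb[1:]):
        # detail bands (H, V, D): keep the larger-magnitude coefficient
        fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)
                           for x, y in zip(da, db)))
    return pywt.waverec2(fused, wavelet)
```

The hard switching between coefficients in the max-abs rule is one source of the discontinuities that appear as ringing after reconstruction.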

Recommended citation: Vanmali, Ashish V., Tushar Kataria, Samrudha G. Kelkar, and Vikram M. Gadre. "Analysis of Ringing Artifact in Image Fusion Using Directional Wavelet Transforms." In Proceedings of Vidyavardhini's National Conference 2020 (Technical Advancements for Social Upliftments), International Journal of Engineering Research & Technology (IJERT), Vol. 9, Issue 3, Feb 2021, pp. 495-502, ISSN: 2278-0181. https://www.ijert.org/analysis-of-ringing-artifact-in-image-fusion-using-directional-wavelet-transforms

Ringing artifacts in wavelet based image fusion: Analysis, measurement and remedies

Published in Information Fusion, 2020

We perform a thorough analysis of the ringing phenomenon, experimenting with different types of images and different wavelet families with varying filter lengths and levels of decomposition, to obtain deeper insights into the ringing artifacts. It is experimentally shown that wavelet-based fusion modifies the intra- and inter-scale dependencies, with the inter-scale dependency being the dominant factor causing the ringing artifacts. These ringing artifacts are also localized in the Fourier domain. Subsequently, a quantitative measure using structural dissimilarity is proposed to measure the ringing artifacts due to wavelet-based fusion. Two possible solutions to compensate for the ringing artifacts are then proposed. In the first strategy, a filtering-based method is proposed to reduce these ringing artifacts, taking advantage of their localized nature. Furthermore, the intra- and inter-scale dependencies are modeled using order-zero entropy. A second strategy using the inter-scale dependency is then proposed to reduce the ringing artifacts. Experimental results show that both methods reduce the ringing artifacts significantly and have further scope for improvement. Another key contribution of this work is guidance on the selection of the wavelet filter and its levels of decomposition for the fusion process.
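
The proposed measure builds on structural dissimilarity (DSSIM). Here is a generic sketch using scikit-image; the paper's measure is tailored to ringing, so treat this as the base quantity only:

```python
from skimage.metrics import structural_similarity

def dssim(fused, reference):
    """Structural dissimilarity: 0 means identical, larger means more distortion."""
    ssim = structural_similarity(
        fused, reference,
        data_range=reference.max() - reference.min())
    return (1.0 - ssim) / 2.0
```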

Recommended citation: Vanmali, Ashish V., Tushar Kataria, Samrudha G. Kelkar, and Vikram M. Gadre. "Ringing artifacts in wavelet based image fusion: Analysis, measurement and remedies." Information Fusion 56 (2020): 39-69. https://www.sciencedirect.com/science/article/pii/S1566253517304748

Image hallucination at different times of day using locally affine model and kNN template matching from time-lapse images

Published in Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing (ACM), 2016

Image hallucination has many applications in areas such as image processing, computational photography, and image fusion. In this paper, we present an image hallucination technique based on template (patch) matching from a database of time-lapse images and a learned locally affine model. Template-based techniques suffer from blocky artifacts, so we propose two approaches for imposing consistency criteria across neighbouring patches in the form of regularization. We validate our color transfer technique by hallucinating a variety of natural images at different times of the day. We compare the proposed approach with other state-of-the-art example-based color transfer techniques and show that the images obtained using our approach look more plausible and natural.
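
The core of the method pairs each input patch with its nearest neighbour in the time-lapse database and fits a local affine color map. A minimal sketch, assuming patches are flattened RGB arrays in [0, 1]; the cross-patch regularization described above is omitted:

```python
import numpy as np

def nearest_patch(query, database):
    """Index of the database patch closest to the query in L2 distance (k=1)."""
    dists = ((database - query.ravel()) ** 2).sum(axis=1)
    return int(dists.argmin())

def fit_local_affine(src_colors, dst_colors):
    """Least-squares affine map from source to target patch colors (Nx3 each)."""
    X = np.hstack([src_colors, np.ones((len(src_colors), 1))])  # homogeneous
    M, *_ = np.linalg.lstsq(X, dst_colors, rcond=None)
    return M  # apply with: np.clip(X @ M, 0.0, 1.0)
```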

Recommended citation: Patel, Nikunj, and Tushar Kataria. "Image hallucination at different times of day using locally affine model and kNN template matching from time-lapse images." In Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing, pp. 1-8. 2016. https://dl.acm.org/doi/10.1145/3009977.3010038