Recent Publications


pub-19

ROI-Packing: Efficient Region-Based Compression for Machine Vision

Abstract:

This paper introduces ROI-Packing, an efficient image compression method tailored specifically for machine vision. By prioritizing regions of interest (ROI) critical to end-task accuracy and packing them efficiently while discarding less relevant data, ROI-Packing achieves significant compression efficiency without requiring retraining or fine-tuning of end-task models. Comprehensive evaluations across five datasets and two popular end-tasks—object detection and instance segmentation—demonstrate up to a 44.10% reduction in bitrate without compromising end-task accuracy, along with an 8.88% improvement in accuracy at the same bitrate compared to the state-of-the-art Versatile Video Coding (VVC) codec standardized by the Moving Picture Experts Group (MPEG).
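
The region-packing idea can be illustrated with a minimal Python sketch (hypothetical helper and variable names; the actual packing layout and side-information signaling are defined in the paper): crop the ROIs the end task cares about, pack the crops into a compact canvas, and send only that canvas to a standard codec.

    import numpy as np

    def pack_rois(image, rois, pad=4):
        """Crop each ROI and stack the crops into one compact canvas.

        image : HxWx3 uint8 array
        rois  : list of (x0, y0, x1, y1) boxes relevant to the end task
        Illustrative packing only; ROI-Packing's layout and metadata differ.
        """
        crops = [image[y0:y1, x0:x1] for (x0, y0, x1, y1) in rois]
        width = max(c.shape[1] for c in crops)
        height = sum(c.shape[0] + pad for c in crops)
        canvas = np.zeros((height, width, 3), dtype=image.dtype)
        meta, y = [], 0
        for box, c in zip(rois, crops):
            canvas[y:y + c.shape[0], :c.shape[1]] = c
            meta.append({"box": box, "row": y})   # needed to restore positions
            y += c.shape[0] + pad
        return canvas, meta  # canvas goes to the codec; meta travels as side info

The packed canvas can then be encoded with any conventional codec (e.g., VVC), with the metadata used to map decoded regions back to their original positions for the vision task.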

Authors:

Md Eimran Hossain Eimon, Alena Krause, Ashan Perera, Juan Merlos, Hari Kalva, Velibor Adzic, Borko Furht

Conference / Journal

2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR)

pub-18

Content Adaptive Multi-Scale Feature Layer Filtering

Abstract:

This paper presents a content-adaptive feature layer filtering method for intermediate feature compression in split inference systems using multi-scale neural networks. The proposed encoder-side optimization removes redundant feature layers based on object size information derived from the input image. Early layers, which contain high spatial resolution within their feature maps, are suited to detecting small objects. These early layers are then pruned when large objects dominate the scene and their contribution becomes negligible. This reduces redundancy and improves compression efficiency. The method requires no retraining of the task network and remains compatible with conventional codecs by spatially packing the retained features. Aligned with the MPEG Feature Coding for Machines (FCM) framework, this approach enables more efficient collaborative intelligence by reducing bandwidth during intermediate feature transmission. Experimental results on object detection and segmentation tasks show up to a 43% bitrate reduction without compromising task accuracy.
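
As a rough illustration of the encoder-side decision (hypothetical layer names and threshold; the paper derives its rule from object size information), a sketch might keep or drop the finest pyramid levels depending on the smallest object present:

    def select_feature_layers(layers, boxes, small_obj_area=64 * 64):
        """Encoder-side layer selection (illustrative threshold, not the paper's).

        layers : dict such as {"p2": f2, "p3": f3, "p4": f4, "p5": f5} from a
                 multi-scale backbone, highest resolution first
        boxes  : estimated object boxes (x0, y0, x1, y1) in the input image
        High-resolution early layers mainly help small objects, so they are
        pruned when only large objects are present in the scene.
        """
        smallest = min(((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes),
                       default=0)
        keep = dict(layers)
        if smallest > small_obj_area:      # no small objects to detect
            keep.pop("p2", None)           # drop the finest-scale layers
            keep.pop("p3", None)
        return keep  # retained layers are spatially packed and passed to the codec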

Authors:

Juan Merlos, Md Eimran Hossain Eimon, Ashan Perera, Hari Kalva, Velibor Adzic, Borko Furht

Conference / Journal

2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR)

pub-17

PAVEN: A Perceptual Algorithm for Versatile video Encoding using Neural networks

Abstract:

This work introduces the Perceptual Algorithm for Versatile video Encoding using Neural Networks (PAVEN), a subjective video coding algorithm designed to reduce the bit rate in videos encoded with the Versatile Video Coding (VVC) standard without compromising subjective video quality. The algorithm uses a deep learning model trained by the authors to account for the specific characteristics of video signals. The trained model outperforms others in the literature by more accurately identifying areas of the frames where viewers are most likely to focus their attention. The output of the deep learning model is further processed to merge all disjoint areas and adapt the result to the Coding Tree Unit (CTU) size in VVC, allowing for greater compression in less important areas. The results show an average reduction in bit rate of 7% while maintaining the same subjective video quality, validated through viewer interviews using the Mean Opinion Score (MOS) metric.
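
The final step, mapping the saliency output onto the VVC coding grid, can be sketched as follows (illustrative CTU size, threshold, and QP offset; the paper's merging and adaptation procedure is more involved):

    import numpy as np

    def ctu_qp_offsets(saliency, ctu=128, qp_step=6, thresh=0.5):
        """Turn a per-pixel saliency map into per-CTU QP offsets (sketch only).

        saliency : HxW map in [0, 1] produced by the attention model
        ctu      : CTU size used by the VVC encoder (128 in common configs)
        CTUs whose mean saliency falls below `thresh` receive a positive QP
        offset, i.e. coarser quantization where viewers are unlikely to look.
        """
        h, w = saliency.shape
        rows, cols = -(-h // ctu), -(-w // ctu)   # ceiling division
        offsets = np.zeros((rows, cols), dtype=int)
        for r in range(rows):
            for c in range(cols):
                block = saliency[r * ctu:(r + 1) * ctu, c * ctu:(c + 1) * ctu]
                if block.mean() < thresh:
                    offsets[r, c] = qp_step       # compress harder here
        return offsets  # handed to the encoder as a delta-QP map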

Authors:

Fernández-Lagos, P., Ríos, B., Kalva, H., Cebrián-Márquez, G., Vigueras, G., & Diaz-Honrubia, A. J.

Conference / Journal

Engineering Applications of Artificial Intelligence, Volume 159, Part B, 8 November 2025

pub-18

FCM-RT: Real-Time Feature Coding for Machines

Abstract:

As numerous edge devices start implementing intelligent components, the challenges of energy consumption, bandwidth efficiency, and privacy gain significance. One proposed solution relies on the paradigm of split inference, which optimizes the delegation of the computational load between edge and remote devices. We developed and implemented a standard-compliant split inference system with an encoder and decoder capable of real-time streaming and processing. Our system outperforms state-of-the-art video compression implementations, achieving an average bitrate reduction of 83% while preserving privacy. We demonstrate the system's real-time performance on consumer devices, with interactive visualizations of object detection and segmentation, incorporating real-time metrics. Demo video: https://youtu.be/bmCbUo_ZWWU
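
The split-inference setup behind the demo can be sketched with a toy PyTorch model (not the actual FCM-RT network or codec; layer sizes and the split point are placeholders):

    import torch
    import torch.nn as nn

    # Toy split-inference sketch: the edge device runs the first part, the
    # intermediate features are compressed and streamed, and the server runs
    # the rest. Only the feature bitstream leaves the device, aiding privacy.
    backbone = nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
    )
    edge_part, server_part = backbone[:4], backbone[4:]  # split point is a design choice

    frame = torch.randn(1, 3, 224, 224)     # stand-in for a camera frame
    features = edge_part(frame)             # computed on the edge device
    # here `features` would be quantized, packed, and encoded (e.g., with the
    # FCM codec) before transmission over the network
    result = server_part(features)          # inference completed on the server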

Authors:

Ashan Perera, Md Eimran Hossain Eimon, Juan Merlos, Velibor Adzic, Hari Kalva, Borko Furht

Conference / Journal

Proceedings of the 33rd ACM International Conference on Multimedia, October 27, 2025, pp. 13525-13527

pub-16

OD-VIRAT: A Large-Scale Benchmark for Object Detection in Realistic Surveillance Environments

Abstract:

Realistic human surveillance datasets are crucial for training and evaluating computer vision models under real-world conditions, facilitating the development of robust algorithms for human and human-interacting object detection in complex environments. These datasets need to offer diverse and challenging data to enable a comprehensive assessment of model performance and the creation of more reliable surveillance systems for public safety. To this end, we present two visual object detection benchmarks named OD-VIRAT Large and OD-VIRAT Tiny, aiming at advancing visual understanding tasks in surveillance imagery. The video sequences in both benchmarks cover 10 different scenes of human surveillance recorded from significant height and distance. The proposed benchmarks offer rich annotations of bounding boxes and categories, where OD-VIRAT Large has 8.7 million annotated instances in 599,996 images and OD-VIRAT Tiny has 288,901 annotated instances in 19,860 images. This work also focuses on benchmarking state-of-the-art object detection architectures, including RTMDet, YOLOX, RetinaNet, DETR, and Deformable-DETR, on this object detection-specific variant of the VIRAT dataset. To the best of our knowledge, this is the first work to examine the performance of these recently published state-of-the-art object detection architectures on realistic surveillance imagery under challenging conditions such as complex backgrounds, occluded objects, and small-scale objects. The proposed benchmarking and experimental settings will help provide insights concerning the performance of the selected object detection models and set the base for developing more efficient and robust object detection architectures.

Authors:

Ullah, H., Khan, A., Munir, A., & Kalva, H.

Conference / Journal

arXiv:2507.12396, July 2025

pub-1

Unpacking Generative AI in Education: Computational Modeling of Teacher and Student Perspectives in Social Media Discourse

Abstract:

Generative AI (GAI) technologies are quickly reshaping the educational landscape. As adoption accelerates, understanding how students and educators perceive these tools is essential. This study presents one of the most comprehensive analyses to date of stakeholder discourse dynamics on GAI in education using social media data. Our dataset includes 1,199 Reddit posts and 13,959 corresponding top-level comments. We apply sentiment analysis, topic modeling, and author classification. To support this, we propose and validate a modular framework that leverages prompt-based large language models (LLMs) for analysis of online social discourse, and we evaluate this framework against classical natural language processing (NLP) models. Our GPT-4o pipeline consistently outperforms prior approaches across all tasks. For example, it achieved 90.6% accuracy in sentiment analysis against gold-standard human annotations. Topic extraction uncovered 12 latent topics in the public discourse with varying sentiment and author distributions. Teachers and students convey optimism about GAI's potential for personalized learning and productivity in higher education. However, key differences emerged: students often voice distress over false accusations of cheating by AI detectors, while teachers generally express concern about job security, academic integrity, and institutional pressures to adopt GAI tools. These contrasting perspectives highlight the tension between innovation and oversight in GAI-enabled learning environments. Our findings suggest a need for clearer institutional policies, more transparent GAI integration practices, and support mechanisms for both educators and students. More broadly, this study demonstrates the potential of LLM-based frameworks for modeling stakeholder discourse within online communities.
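
The prompt-based classification step of such a pipeline can be sketched as follows (generic prompt and label set; the paper's actual prompts, schema, and validation against human annotations differ):

    from openai import OpenAI  # requires the openai package and an API key

    client = OpenAI()

    def classify_sentiment(post: str) -> str:
        """Label one Reddit post as positive, negative, or neutral (sketch)."""
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": "Classify the sentiment of the following post about "
                            "generative AI in education as positive, negative, "
                            "or neutral. Reply with a single word."},
                {"role": "user", "content": post},
            ],
            temperature=0,
        )
        return resp.choices[0].message.content.strip().lower()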

Authors:

DeVito, P., Vallala, A., Mcmahon, S., Hinda, Y., Thaw, B., Zhuang, H., & Kalva, H.

Conference / Journal

arXiv preprint arXiv:2506.16412

pub-2

Efficient Feature Compression for Machines with Global Statistics Preservation

Abstract:

The split-inference paradigm divides an artificial intelligence (AI) model into two parts. This necessitates the transfer of intermediate feature data between the two halves, where effective compression of the feature data becomes vital. In this paper, we employ Z-score normalization to efficiently recover the compressed feature data at the decoder side. To examine its efficacy, the proposed method is integrated into the latest Feature Coding for Machines (FCM) codec standard under development by the Moving Picture Experts Group (MPEG). Our method supersedes the existing scaling method used by the current draft, both reducing the overhead bits and improving the end-task accuracy. To further reduce the overhead in certain circumstances, we also propose a simplified method. Experiments show that the proposed method achieves a 17.09% reduction in bitrate on average across different tasks, and up to 65.69% for object tracking, without sacrificing task accuracy.
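
The normalization itself is simple to sketch (illustrative code; the FCM codec defines the exact quantization and how the statistics are signaled): only the global mean and standard deviation travel as side information, and the decoder inverts the mapping to restore the feature range.

    import numpy as np

    def encode_features(f):
        """Z-score normalize a feature tensor before quantization and coding."""
        mu, sigma = float(f.mean()), float(f.std())
        z = (f - mu) / sigma
        return z, mu, sigma           # z goes through the codec; (mu, sigma) is overhead

    def decode_features(z_hat, mu, sigma):
        """Invert the normalization at the decoder to recover the features."""
        return z_hat * sigma + mu

    f = np.random.randn(256, 64, 64).astype(np.float32) * 3.7 + 1.2
    z, mu, sigma = encode_features(f)
    f_rec = decode_features(z, mu, sigma)   # matches f up to quantization error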

Authors:

Eimon, M. E. H., Choi, H., Racapé, F., Ulhaq, M., Adzic, V., Kalva, H., & Furht, B.

Conference / Journal

2025 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 1-5). IEEE

pub-3

Feature Compression for Machines with Range-Based Channel Truncation and Frame Packing

Abstract:

This paper proposes a method that enhances the compression performance of the current model under development for the upcoming MPEG standard on Feature Compression for Machines (FCM) [1]. By truncating low-activation feature channels and signaling these truncations in the bitstream, the method reduces bitrate while preserving task accuracy. Experimental results show an average BD-rate reduction of 10.59% across datasets and tasks, demonstrating gains in bitrate efficiency.
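
A rough sketch of the truncation step (illustrative keep ratio and criterion; the paper specifies the range computation and how truncations are signaled in the bitstream):

    import numpy as np

    def truncate_channels(features, keep_ratio=0.75):
        """Drop feature channels with the smallest activation range (sketch).

        features : C x H x W array of intermediate features
        Returns the retained channels and the kept indices, which the decoder
        needs to restore the original channel layout.
        """
        ranges = features.max(axis=(1, 2)) - features.min(axis=(1, 2))
        order = np.argsort(ranges)[::-1]                  # widest-range channels first
        kept = np.sort(order[:int(len(order) * keep_ratio)])
        return features[kept], kept                       # kept indices are signaled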

Authors:

Merlos, J., Racapé, F., Choi, H., Ulhaq, M., & Kalva, H.

Conference / Journal

2025 Data Compression Conference (DCC), pp. 392-392, IEEE

pub-4

Comparative Analysis of VCM and AhG8 for Machine Vision Applications

Abstract:

This paper presents a performance review of standardization efforts for optimized video compression techniques targeting machine vision. As these efforts are expected to share similar use cases, there is interest in a formal comparison. The activities considered include MPEG's ongoing Video Coding for Machines (VCM) standard and JVET's contributions in AhG8. Multiple simulations across bitrates, encoding configurations, datasets, and machine tasks are performed for a fair evaluation. Empirical results show that VCM outperforms AhG8 noticeably in most cases, and significantly in select cases.

Authors:

Perera, A., Adzic, V., Kalva, H., & Furht, B.

Conference / Journal

SSRN 2025

pub-5

Enabling Next-Generation Consumer Experience with Feature Coding for Machines

Abstract:

As consumer devices become increasingly intelligent and interconnected, efficient data transfer solutions for machine tasks have become essential. This paper presents an overview of the latest Feature Coding for Machines (FCM) standard, part of MPEG-AI and developed by the Moving Picture Experts Group (MPEG). FCM supports AI-driven applications by enabling the efficient extraction, compression, and transmission of intermediate neural network features. By offloading computationally intensive operations to base servers with high computing resources, FCM allows low-powered devices to leverage large deep learning models. Experimental results indicate that the FCM standard maintains the same level of accuracy while reducing bitrate requirements by 75.90% compared to remote inference.

Authors:

Eimon, M. E. H., Merlos, J., Perera, A., Kalva, H., Adzic, V., & Furht, B.

Conference / Journal

2025 IEEE International Conference on Consumer Electronics (ICCE), pp. 1-4. IEEE, 2025

pub-6

Feature Weighted Linguistics Classifier for Predicting Learning Difficulty Using Eye Tracking

Abstract:

This article presents a new approach to predict learning difficulty in applications such as e-learning using eye movement and pupil response. We have developed 12 eye response features based on psycholinguistics, contextual information processing, anticipatory behavior analysis, recurrence fixation analysis, and pupillary response. A key aspect of the proposed approach is the temporal analysis of the feature response to the same concept. Results show that variations in eye response to the same concept over time are indicative of learning difficulty. A Feature Weighted Linguistics Classifier (FWLC) was developed to predict learning difficulty in real time. The proposed approach predicts learning difficulty with an accuracy of 90%.

Authors:

Saurin S. Parikh, Hari Kalva

Conference / Journal

ACM Trans. Appl. Percept. 17, 2, Article 5 (April 2020)

pub-7

Multi-View Clustering for Fast Intra Mode Decision in HEVC


Authors:

R. Jillani, S. F. Hussain, H. Kalva

Conference / Journal

2020 IEEE International Conference on Consumer Electronics (ICCE)

pub-8

Video Analytics: Challenges, Algorithms, and Applications

Abstract:

The papers in this special section focus on the topic of video analytics. Also known as video content analysis, video analytics refers to the capability of automatically analyzing video to extract knowledge/information and detect and determine temporal and spatial events. The algorithms designed for these analytics can be implemented as software on general-purpose machines, or as hardware in specialized video processing units. Video analytics is still an emerging technology with techniques that are continuously being developed to help make widespread implementation feasible in the years ahead. Such analytics has typically been used in semantic categorization and retrieval of video databases. A goal of this special issue is to focus on video analytics beyond categorization and retrieval. With increasing hardware capability and advances in algorithms used, real-time video analytics is now being used in a wide range of domains including entertainment, health-care, retail, automotive, transport, home automation, emotion analysis, aesthetics, inappropriate content detection, safety and security. For instance, video analytics are increasingly being deployed for real-time alerts in situation monitoring systems such as traffic surveillance (vehicle counting), counting people in lines (some hospitals are using this to get more nurses from a less busy department to serve the waiting patients) and in manufacturing (for monitoring and counting). From the sensing aspect, 3-D cameras such as RGB-D and LiDAR (Light Detection and Ranging) cameras are becoming more and more affordable, enabling additional areas of research and applications, such as self-driving cars employing video analytics on LiDAR captured data for path planning as well as obstacle detection.

Authors:

B. Prabhakaran, H. Kalva

Conference / Journal

IEEE Transactions on Multimedia, vol. 20, no. 5, pp. 1037-1037, May 2018

pub-9

High Bit-Depth Medical Image Compression With HEVC

Abstract:

Efficient storing and retrieval of medical images has a direct impact on reducing costs and improving access in cloud-based health care services. JPEG 2000 is currently the commonly used compression format for medical images shared using the DICOM standard. However, new formats such as high efficiency video coding (HEVC) can provide better compression efficiency compared to JPEG 2000. Furthermore, JPEG 2000 is not suitable for efficiently storing image series and 3-D imagery. Using HEVC, a single format can support all forms of medical images. This paper presents the use of HEVC for diagnostically acceptable medical image compression, focusing on compression efficiency compared to JPEG 2000. Diagnostically acceptable lossy compression and the complexity of high bit-depth medical image compression are studied. Based on an established medically acceptable compression range for JPEG 2000, this paper establishes an acceptable HEVC compression range for medical imaging applications. Experimental results show that using HEVC can increase the compression performance, compared to JPEG 2000, by over 54%. Along with this, a new method for reducing the computational complexity of HEVC encoding for medical images is proposed. Results show that HEVC intra encoding complexity can be reduced by over 55% with a negligible increase in file size.

Authors:

S. S. Parikh, D. Ruiz, H. Kalva, G. Fernández-Escribano and V. Adzic

Conference / Journal

IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 2, pp. 552-560, March 2018

pub-10

Pupil response to quality and content transitions in videos

Abstract:

Real-time human eye recognition and tracking systems with human-computer interaction mechanisms are being adopted to advance user experience in smart devices and consumer electronic systems. Eye tracking systems measure eye gaze and pupil response non-intrusively. This paper presents an analysis of pupil response to video structure and content. A consumer device that can assess user response to a video can provide better experiences with approaches such as real-time content adaptation. The first set of experiments involved presenting different video content to subjects and measuring eye response with an eye tracker. Results show that pupil constrictions and the magnitude of constrictions vary with content. Significant changes in video and scene cuts led to sharp constrictions. User response to videos can provide insights that can improve subjective quality assessment metrics. In a second set of experiments, this paper also analyzes the pupillary response to quality changes in videos. The results of these tests show pupil constrictions for noticeable changes in perceived quality. Using real-time eye tracking systems for video analysis and quality evaluation can open a new class of applications for consumer electronic systems.

Authors:

D. Pappusetty, H. Kalva and H. S. Hock

Conference / Journal

IEEE Transactions on Consumer Electronics, vol. 63, no. 4, pp. 410-418, November 2017

pub-11

Using pupillary response to assess video quality

Abstract:

Pupil response can be measured non-intrusively using an eye tracker and offers a potentially new approach to understanding video structure and content. An analysis of pupil response to quality variations in a video is reported in this paper. Experiments were conducted under free viewing conditions and the pupillary response of subjects was analyzed. Video clips encoded with AVC/H.264 at various qualities and durations were used to assess user response. Results show pupillary constrictions at points of quality transitions.

Authors:

D. Pappusetty, V. V. R. Chinta and H. Kalva

Conference / Journal

IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2017, pp. 64-65

pub-12

Fast CU partitioning algorithm for HEVC intra coding using data mining

Abstract:

The international standard of High Efficiency Video Coding (HEVC) improves the compression ratio by over 50% compared to H.264/AVC, for the same perceptual quality. HEVC adopts flexible coding unit (CU) partitioning by applying recursive CU splitting into four sub-CUs, up to four depth levels, which causes a significant complexity increase. Intra-prediction coding in HEVC achieves high coding performance through the exhaustive evaluation of all available CU sizes, with up to 35 prediction modes for each CU, selecting the one with the lowest rate-distortion cost. This work presents a novel CU size classifier comprising an offline-trained decision tree with three hierarchical nodes. The decision rules computed in each node are based on the content texture properties of CUs as well as the inter-sub-CU statistics of the same depth level. Our approach can reduce the number of CU sizes to be checked by the Rough Mode Decision and Rate Distortion Optimization stages of intra-prediction coding. The experimental results show that the proposed algorithm can achieve over 50% coding time reduction, with no quality penalty in terms of the Peak Signal to Noise Ratio and only a small bit rate increase (2%) compared to the HEVC reference model. A performance comparison with state-of-the-art proposals shows that this algorithm surpasses the best proposal in terms of time reduction, for the same coding performance penalty.
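
A toy version of the split decision (hand-picked thresholds instead of the offline-trained decision tree, and only two texture features) might look like this:

    import numpy as np

    def split_cu(block, var_thresh=40.0, grad_thresh=12.0):
        """Decide whether to split an NxN luma CU based on its texture (sketch).

        A flat, low-gradient block is coded as-is; a textured block is split
        into four sub-CUs, and the encoder repeats the check recursively.
        """
        variance = float(np.var(block))
        grad = float(np.mean(np.abs(np.diff(block, axis=0))) +
                     np.mean(np.abs(np.diff(block, axis=1))))
        return variance > var_thresh or grad > grad_thresh   # True -> split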

Authors:

Ruiz, D., Fernández-Escribano, G., Adzic, V. et al.

Conference / Journal

Multimed Tools Appl 76, 861–894 (2017)

pub-13

A Fast Splitting Algorithm for an H.264/AVC to HEVC Intra Video Transcoder

Abstract:

The High Efficiency Video Coding (HEVC) standard roughly doubles the rate-distortion compression performance of its predecessor, H.264/AVC, at the cost of high computational complexity. Moreover, intra sequences are commonly used in video editing and post-production, making migration from H.264/AVC to HEVC necessary. This paper proposes a fast intra transcoding algorithm from H.264/AVC to HEVC.

Authors:

A. J. Díaz-Honrubia, J. L. Martínez, P. Cuenca and H. Kalva

Conference / Journal

2016 Data Compression Conference (DCC), Snowbird, UT, USA, 2016, pp. 588-588

pub-14

Content dependent intra mode selection for medical image compression using HEVC

Abstract:

This paper presents a method for complexity reduction in medical image encoding that exploits the structure of medical images. The amount of texture detail and structure in medical images depends on the modality used to capture the image and the body part captured by that image. The proposed approach was evaluated using the Computed Radiography (CR) modality, commonly known as x-ray imaging, and three body parts. The proposed method reduces the number of CU partitions evaluated as well as the number of intra prediction modes for each evaluated partition. Evaluation using the HEVC reference software (HM) 16.4 and lossless intra coding shows an average reduction of 52.47% in encoding time with a negligible penalty of up to a 0.22% increase in compressed file size.

Authors:

S. Parikh, D. Ruiz, H. Kalva and G. Fernández-Escribano

Conference / Journal

2016 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2016, pp. 561-564

pub-15

Image retargeting for wearable devices

Abstract:

Small displays on wearable devices pose new challenges to presenting images. Image retargeting methods developed for mobile devices assume larger displays and are not adequate. This paper reports new approaches to image retargeting for wearable devices. Key to the proposed approach is identifying and presenting regions of interest that allow users to comprehend content and context. The system was implemented and evaluated on an Android watch. Subjective evaluation shows that the proposed approach is effective and improves the user experience.
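
The ROI-driven retargeting can be sketched as a crop-and-resize step (hypothetical margin and display size; the paper's method also selects which regions to present):

    from PIL import Image  # requires Pillow

    def retarget_for_watch(image: Image.Image, roi, target=(320, 320), margin=0.15):
        """Crop around the region of interest and resize for a small display.

        roi : (x0, y0, x1, y1) box of the main subject in the image
        A margin is kept around the ROI so some context survives the crop.
        """
        x0, y0, x1, y1 = roi
        dx, dy = (x1 - x0) * margin, (y1 - y0) * margin
        box = (int(max(0, x0 - dx)), int(max(0, y0 - dy)),
               int(min(image.width, x1 + dx)), int(min(image.height, y1 + dy)))
        return image.crop(box).resize(target)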

Authors:

J. Bhatt, D. Pappusetty, H. Kalva and M. Naik

Conference / Journal

2016 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2016, pp. 55-57

 

For the complete list of publications, please visit: