Current Issue: January-March; Volume: 2025; Issue Number: 1; Articles: 5 Articles
The constant increase in multimedia Internet traffic in the form of video streaming requires new solutions for efficient video coding to save bandwidth and network resources. HTTP adaptive streaming (HAS), the most widely used solution for video streaming, allows the client to adaptively select the bitrate according to the transmission conditions. For this purpose, multiple representations of the same video content are generated on the video server, each containing the video sequence encoded at a different bitrate with the resolution adjusted to achieve the best Quality of Experience (QoE). This set of bitrate–resolution pairs is called a bitrate ladder. In addition to the traditional one-size-fits-all scheme for the bitrate ladder, context-aware solutions have recently been proposed that enable optimum bitrate–resolution pairs for video sequences of different complexity. However, these solutions use only spatial resolution for optimization, while the selection of the optimal combination of spatial and temporal resolution for a given bitrate has not been sufficiently investigated. This paper proposes bitrate ladder optimization that considers the spatiotemporal features of video sequences and uses the optimal spatial and temporal resolution for the complexity of the video content. Optimization along two dimensions of resolution significantly increases the complexity of the problem, and intensive encoding at all spatial and temporal resolutions over a wide range of bitrates, for each video sequence, is not feasible in real time. To reduce this complexity, we propose data augmentation using a neural network (NN)-based model. To train the NN model, we used seven video sequences of different content complexity, encoded with the HEVC encoder at five different spatial resolutions (SR) up to 4K. All video sequences were also encoded at four frame rates up to 120 fps, representing different temporal resolutions (TR). The Structural Similarity Index Measure (SSIM) is used as an objective video quality metric. After data augmentation, we propose NN models that estimate optimal TR and bitrate values as switching points to a higher SR. These results can be further used as input parameters for the bitrate ladder construction for video sequences of a certain complexity....
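The idea of a switching point between two spatial resolutions can be illustrated with a minimal sketch. It does not reproduce the NN models of the paper; it assumes that rate-quality points (bitrate, SSIM) are already available for two resolutions and finds, by plain linear interpolation, the lowest candidate bitrate at which the higher resolution is at least as good as the lower one. The function names and all sample values are hypothetical.

```python
from bisect import bisect_left


def interpolate(points, x):
    """Linearly interpolate y at x from (x, y) points sorted by x.

    Values outside the measured range are clamped to the nearest endpoint.
    """
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, x)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def switching_bitrate(low_res_curve, high_res_curve, candidates):
    """Return the lowest candidate bitrate at which the higher resolution
    matches or exceeds the lower one in quality, or None if it never does."""
    for bitrate in sorted(candidates):
        if interpolate(high_res_curve, bitrate) >= interpolate(low_res_curve, bitrate):
            return bitrate
    return None


# Hypothetical (bitrate in kbps, SSIM) measurements, for illustration only.
curve_1080p = [(1000, 0.90), (2000, 0.94), (4000, 0.96), (8000, 0.97)]
curve_2160p = [(1000, 0.84), (2000, 0.92), (4000, 0.965), (8000, 0.985)]

switch = switching_bitrate(curve_1080p, curve_2160p, range(1000, 8001, 500))
print(switch)
```

Repeating the search for each adjacent pair of resolutions would give one candidate rung boundary per pair; the same comparison could be run across frame rates, under the assumption that a comparable quality score is available for each one.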
The density of fog is directly related to visibility and is one of the decision-making criteria for airport flight management and highway traffic management. Estimating fog density from images and videos has been a popular research topic in recent years. However, image-based fog density estimation results should be further evaluated and analyzed by combining them with weather information from other sensors. Because acquisition methods differ, the data obtained by different sensors often need to be aligned in time. In this paper, we propose a method for aligning video and visibility data based on temporal consistency. After data alignment, the fog density estimation results based on images and videos can be analyzed, and incorrect estimation results can be efficiently detected and corrected. The experimental results show that the new method effectively combines videos and visibility for fog density estimation....
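As a rough illustration of the alignment problem, the sketch below pairs each video frame timestamp with the nearest visibility reading, provided the two are close enough in time. This is generic nearest-timestamp matching, not the temporal-consistency method of the paper, and the function name, tolerance, and sample values are assumptions made for the example.

```python
from bisect import bisect_left


def align_nearest(frame_times, sensor_readings, max_gap):
    """Pair each frame time with the nearest sensor reading.

    frame_times: frame timestamps in seconds.
    sensor_readings: (timestamp, visibility) tuples sorted by timestamp.
    max_gap: largest accepted time difference in seconds; frames with no
             reading inside this window are paired with None.
    """
    sensor_times = [t for t, _ in sensor_readings]
    aligned = []
    for ft in frame_times:
        i = bisect_left(sensor_times, ft)
        # Only the readings just before and just after the frame can be nearest.
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(sensor_times)]
        value = None
        if neighbours:
            best = min(neighbours, key=lambda j: abs(sensor_times[j] - ft))
            if abs(sensor_times[best] - ft) <= max_gap:
                value = sensor_readings[best][1]
        aligned.append((ft, value))
    return aligned


# Hypothetical data: frame times in seconds, visibility in metres.
frames = [0, 30, 60, 200]
visibility = [(5, 800.0), (65, 650.0), (125, 400.0)]

pairs = align_nearest(frames, visibility, max_gap=20)
print(pairs)
```

Frames that end up paired with None have no trustworthy sensor value and would be left out of any comparison between image-based estimates and measured visibility.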
Facial biometrics are widely used to reliably and conveniently recognize people in photos, in videos, or from real-time webcam streams. It is therefore of fundamental importance to detect synthetic faces in images in order to reduce the vulnerability of biometrics-based security systems. Furthermore, manipulated images of faces can be intentionally shared on social media to spread fake news about the targeted individual. This paper shows that fake-face detection models may rely mainly on the information contained in the background when dealing with generated faces, which reduces their effectiveness. Specifically, a classifier is trained to separate fake images from real ones, using their representation in a latent space. Subsequently, the faces are segmented and the background is removed, and the detection procedure is performed again, revealing a significant drop in classification accuracy. Finally, an explainability tool (SHAP) is used to highlight the salient areas of the image, showing that the background and face contours crucially influence the classifier decision....
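The background-removal step of such an experiment can be sketched as follows. The sketch assumes that a binary face mask has already been produced by some segmentation model (not shown) and that classifier predictions are available as plain label lists; the helper names and the toy data are hypothetical and only show how the masked input and the accuracy comparison fit together.

```python
def remove_background(image, face_mask, fill=0):
    """Keep pixels where face_mask is truthy and replace the rest with fill.

    image and face_mask are equally sized 2-D lists (rows of pixel values).
    """
    if len(image) != len(face_mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(image, face_mask)
    ):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else fill for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, face_mask)
    ]


def accuracy(predictions, labels):
    """Fraction of predictions that equal the ground-truth labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy 3x3 grayscale image and a mask marking the face region.
image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
mask = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
face_only = remove_background(image, mask)
print(face_only)

# Hypothetical classifier outputs (1 = fake, 0 = real) before and after masking.
labels = [1, 0, 1, 1]
drop = accuracy([1, 0, 1, 1], labels) - accuracy([1, 1, 0, 1], labels)
print(drop)
```

A large positive drop on real data would suggest that the classifier depended on background cues; the toy numbers here carry no such meaning.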
360° videos and virtual reality (VR) are among the defining applications for a new era of immersive multimedia technologies. However, streaming high-quality 360° videos constitutes a challenge for service providers and network operators, especially in multi-user scenarios. The massive data rates and the unique features of omnidirectional videos necessitate the development of novel streaming techniques. In this paper, we propose an optimized multi-user tiled 360° video streaming framework. Specifically, we formulate the problem of tile quality selection in multi-user bandwidth-constrained communications and propose an algorithm that assigns the quality levels of transmitted tiles. The proposed framework is designed to maximize the perceived quality within the users’ viewports by considering 360° video quality assessment (360° VQA) metrics and tile viewing percentages. To simulate our solution in practical settings, we employ a long short-term memory (LSTM) model to perform viewport prediction and base the assessment on real-life viewing data. Simulation results show that the proposed framework significantly improves the quality of the delivered viewports and remains robust as the number of users increases....
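Tile quality selection under a bandwidth budget can be illustrated with a simple greedy heuristic. This is not the algorithm proposed in the paper; it is a minimal sketch that assumes every tile has a viewing weight (for example, the share of users predicted to see it), that all tiles share the same list of quality levels with known bitrates and quality scores, and that upgrades are granted in order of weighted quality gain per extra unit of bitrate. All names and numbers are hypothetical.

```python
def assign_tile_qualities(view_weights, level_bitrates, level_scores, budget):
    """Greedily pick a quality level per tile under a total bitrate budget.

    view_weights: one weight per tile (higher means more viewers see it).
    level_bitrates: bitrate of each quality level, in increasing order.
    level_scores: quality score of each level, in increasing order.
    budget: total bitrate available for all tiles.
    Returns (levels, used_bitrate).
    """
    n = len(view_weights)
    levels = [0] * n  # every tile starts at the lowest quality
    used = level_bitrates[0] * n
    if used > budget:
        raise ValueError("budget cannot cover the lowest quality for every tile")

    while True:
        best_tile, best_ratio, best_extra = None, 0.0, 0
        for tile in range(n):
            nxt = levels[tile] + 1
            if nxt >= len(level_bitrates):
                continue  # tile is already at the top level
            extra = level_bitrates[nxt] - level_bitrates[levels[tile]]
            if used + extra > budget:
                continue  # upgrade does not fit in the remaining budget
            gain = view_weights[tile] * (level_scores[nxt] - level_scores[levels[tile]])
            ratio = gain / extra
            if ratio > best_ratio:
                best_tile, best_ratio, best_extra = tile, ratio, extra
        if best_tile is None:
            return levels, used  # no affordable upgrade improves quality
        levels[best_tile] += 1
        used += best_extra


# Hypothetical setup: 4 tiles, 3 quality levels (bitrates in Mbps).
levels, used = assign_tile_qualities(
    view_weights=[0.9, 0.5, 0.1, 0.0],
    level_bitrates=[1, 2, 4],
    level_scores=[0.6, 0.8, 0.9],
    budget=10,
)
print(levels, used)
```

Tiles that nobody is predicted to view stay at the lowest level, so the budget is concentrated where it affects the viewports. A greedy pass like this is not guaranteed to be optimal; it is only meant to make the shape of the problem concrete.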
Artistic image transformation is a computer technique that converts images into artistic styles and is widely applied in art creation, design, entertainment, and cultural heritage. It offers innovative ways for artists to express themselves, provides designers with more choices and inspiration, enhances visual esthetics, and enables creative implementations in movies, games, and virtual reality. Additionally, it aids in the restoration and preservation of ancient artworks, allowing a deeper appreciation of classical art. Traditional image transformation methods, though effective for simple effects, lack the flexibility and expressiveness of deep learning–based approaches. To enhance the effectiveness and efficiency of artistic image transformation, this paper employs generative adversarial networks (GANs), which use an adversarial training mechanism between a generator and a discriminator to produce high-quality and realistic image transformations. This study introduces spectral normalization (SNGAN) to further improve GAN performance by constraining the spectral norm of the discriminator’s weight matrix, which prevents gradient issues during training and thus improves convergence and image quality. Experimental results on the CHAOS dataset indicate that the proposed SNGAN model achieves the lowest mean absolute error (MAE) of 0.3420, the highest peak signal-to-noise ratio (PSNR) of 32.1423, and a structural similarity index (SSIM) of 0.6696, closely matching the best result. Additionally, the SNGAN model demonstrates the shortest training time, highlighting its efficiency. These results confirm that the proposed method achieves more realistic and efficient artistic image transformations compared to traditional methods and other deep learning algorithms....
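The core operation behind spectral normalization, dividing a weight matrix by its largest singular value, can be sketched in plain Python. The sketch estimates the spectral norm by power iteration on a small matrix stored as nested lists; it is an illustration of the general technique, not the training code of the paper, and the function names and the sample matrix are assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def normalize(vector):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def spectral_norm(weights, iterations=50):
    """Estimate the largest singular value of weights by power iteration.

    Assumes the all-ones start vector is not orthogonal to the leading
    singular direction; otherwise the estimate would be wrong.
    """
    transposed = [list(col) for col in zip(*weights)]
    u = normalize([1.0] * len(weights))
    v = None
    for _ in range(iterations):
        v = normalize(matvec(transposed, u))
        u = normalize(matvec(weights, v))
    return sum(a * b for a, b in zip(u, matvec(weights, v)))


def spectrally_normalize(weights, iterations=50):
    """Return weights divided by their estimated spectral norm."""
    sigma = spectral_norm(weights, iterations)
    return [[x / sigma for x in row] for row in weights]


# Hypothetical 2x2 weight matrix with singular values 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(W)
W_sn = spectrally_normalize(W)
print(sigma, W_sn)
```

After normalization the largest singular value of the matrix is approximately 1, which bounds how strongly the layer can amplify its input and is the property that the constraint on the discriminator relies on.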