Plenary Papers

Ilya Chizhov

A new theory on perceptual lossy compression for audio signals

Biography: Ph.D. in audio compression; Dr.Sc. thesis in audio compression in progress at MIPT. Head of the Speech/Audio AI Compression Team, Moscow Research Centre. More than 25 years of experience in audio compression.

Research interests: perceptual compression of audio signals, psychoacoustics, AI-based speech compression, digital signal processing, machine/deep learning, AI-based speech enhancement, personalized noise reduction, text-to-speech. Led more than 50 academic and industrial projects in speech/audio compression and speech enhancement. Several Russian and international patents and patent applications. More than 30 scientific publications. Has received many company awards, including the highest individual award.

Abstract:

The report is devoted to the development of a new theory of lossy audio data compression based on psychoacoustic principles of human sound perception.

Modern lossy audio compression methods typically consist of a sequence of rather complicated procedures whose optimality, in terms of perceptual equality of the original and reconstructed signals, has not been proven by any research group.

The construction of a new Theory of Lossy Audio Data Compression (TLAC) that ensures perceptual equality of the original and reconstructed audio signals at any bit rate supported by modern standardized audio codecs is an important scientific problem, essential for the field of audio signal coding.

The report will consider modern approaches to both lossy and lossless audio signal compression, with the main attention paid to lossy compression. We will consider both modern perceptual compression formats, such as Dolby AC-4 and MPEG-4 AAC, and sound and speech compression methods based on artificial intelligence (SoundStream, EnCodec, Lyra V2).

The most important part of the report will be devoted to TLAC: the main principles of this new theory will be considered, and the possibility of achieving perceptual equality between the original and reconstructed audio signals will be proven.

Sergey Buzykanov

Challenges in modern AI models for multimedia data processing

Biography:

Ph.D.'03, D.Sc.'13, Associate Professor.
Has worked as a Professor at MPEI and as an Associate Professor at MIPT and RSREU.
Main research interests: AI, NLP, digital signal processing, computer vision.
Has published more than 70 papers.

Abstract:

Artificial intelligence is the main trend in modern research on multimedia processing systems. In compression of images and video streams, natural language processing, and data filtering and transformation, the use of AI makes it possible to significantly improve system performance. Hundreds of research articles are published worldwide in each of these areas every year, proposing new processing methods and new tasks for AI. However, the main method of improving the quality of neural network results is to increase network size and the number of adjustable parameters. This approach is extensive and leads to high computational and hardware costs and, consequently, to a high cost of implementing such networks, with the market monopolized by large players that have access to their own computational resources. In this regard, there is a list of fundamental tasks whose solution would make it possible to significantly reduce the cost of AI systems, implement them on mobile devices without direct access to the Internet, and make AI a widely available, cheap service for solving the user's everyday tasks.

The presentation discusses a list of fundamental tasks in multimedia processing, studies of cognitive processes, and methods of building and training neural networks that are especially relevant at the current stage of AI development and are of interest for scientific research conducted in cooperation between academic institutions and manufacturing companies.

Tchobanou Mikhail

Challenges of modern imaging pipelines in smartphones

Biography:

Doctor of Science in digital signal processing, Full Professor. Works at the Moscow Research Centre. More than 35 years of experience in image processing. In-depth results in color image processing and perceptual aspects of the human visual system.

Research interests: image and color processing, multirate multi-dimensional signal processing, perceptual image enhancement, personalized color processing, optimization of the color processing pipeline. Led more than 20 academic and industrial projects in image/color processing. Several Russian and international patents and patent applications. More than 200 scientific publications. Has received many company awards.

Abstract:

At the current stage of the LED technology revolution, high-resolution displays for smartphones, tablets and TVs offer the brightness range necessary to display HDR content, typically thousands of nits (cd/m²), and modern high-quality RGB LEDs are capable of a color gamut close to Rec. 2020, so that vivid colors can be faithfully reproduced.

High Dynamic Range (HDR) offers a huge step up in image quality on LED screens. HDR and Wide Color Gamut (WCG) technologies substantially improve the user's viewing experience (UX): HDR allows 500-10000 nits of peak luminance, while WCG provides a wider and more vivid range of colors. Evaluation and optimization of HDR and WCG pipeline performance is in great demand for color management tasks, device characterization and device calibration of HDR and WCG devices such as cameras, displays, smartphones, tablets, notebooks, projectors and smart TVs, as well as for video coding applications.

Pipeline performance evaluation and optimization (including NN training) can be done via a color difference metric (CDM), which is the foundation of the corresponding metrological support. Other problems related to camera imaging pipeline development include accurate and efficient spectral characterization of cameras, removal of inconsistencies between multiple cameras, development of solutions for multispectral sensors with an extended spectrum, and in-depth human studies for perceptually optimized color processing.

Simone Bianco, Marco Buzzelli, Raimondo Schettini

A Unifying Framework for Color Constancy: bridging the gap between low-level statistics methods and deep learning methods

Biography:

Simone Bianco obtained his PhD in Computer Science at DISCo (Dipartimento di Informatica, Sistemistica e Comunicazione) of the University of Milano-Bicocca, Italy, in 2010. He obtained his BSc and MSc degrees in Mathematics from the University of Milano-Bicocca, Italy, in 2003 and 2006, respectively.
He is currently an Associate Professor, and his teaching and research interests include computer vision, artificial intelligence, machine learning, and optimization algorithms applied in multimodal and multimedia applications.

He holds the Italian National Academic Qualification as a Full Professor of computer engineering (09/H1) and computer science (01/B1). He is on Stanford University’s World Ranking Scientists List for his achievements in Artificial Intelligence and Image Processing.

He is also the R&D Manager of Imaging and Vision Solutions, a spin-off of the University of Milano-Bicocca, and a member of the European Laboratory for Learning and Intelligent Systems.

Abstract:

This talk presents a unifying framework for computational color constancy that integrates Convolutional Neural Networks (CNNs) to optimize traditional white balance algorithms, focusing on color constancy in digital images. Color constancy is essential for accurate color rendering across varying lighting conditions and has historically relied on approaches based on low-level statistics.

Most recent approaches are instead based on deep learning and exploit Convolutional Neural Networks with several million trainable parameters, which provide accurate results at the cost of very expensive inference. This talk will present a convolutional framework that extends low-level image statistics to a learnable, deep neural architecture, achieving superior accuracy in illuminant estimation and up to 30× computational speed improvements.

Experimental results demonstrate that CNNs can enhance both the precision and performance of traditional white balance methods, making them more suitable for real-time applications.

Kovalenko D.L., Voruyeu A.V., Kulichenko V.N.

Techniques for Passive and Active WiFi Network Surveys

Biography:

Dmitry Kovalenko

Vice Rector for Research, Educational institution "Francisk Skorina Gomel State University"

Candidate of Physical and Mathematical Sciences

Coordinator of the international projects Erasmus+ "Improvement of Master's level education in physical sciences at Belarusian universities" (acronym "PHYSICS") (2015-2018), Erasmus+ "Development of student-directed practice-oriented education in modeling cyber-physical systems" (acronym "CybPhys") (2019-2020), and Erasmus+ "Training based on the best practices of EU countries in the field of radiation protection and nuclear safety culture for the Belarusian academic community" (acronym "Radium") (2020-2023).

Co-organizer of the international scientific conferences "Inter-Academia 2021" and "Problems of interaction of radiation with matter", and of the annual Republican scientific conference of students, undergraduates and postgraduates "Topical issues of physics and engineering".

Voruyeu Andrei

Head of the Department of Automated Information Processing Systems of Educational institution "Francisk Skorina Gomel State University"

Candidate of Technical Sciences

National expert in the competence "Network and System Administration" at the Republican professional skills competitions WorldSkills Belarus 2018, WorldSkills Belarus 2020 and ProfSkills Belarus 2023, at WorldSkills Asia 2018 in Abu Dhabi (UAE), and at the 45th International Championship WorldSkills Kazan 2019.

Vladimir Kulinchenko

Senior Lecturer of the Department of Automated Information Processing Systems of Educational institution "Francisk Skorina Gomel State University"

Responsible executor of the scientific task "Digital and space technologies, security of man, society and state", subprogram "Digital technologies and space informatics", R&D "Diagnostics and multifactor security survey of WiFi wireless networks (IEEE 802.11 standard) of enterprises and organizations".

Abstract:

The wireless communications ecosystem includes a large number of participants, including end-users, device manufacturers, service providers, national regulators, and others. Stable operation of networked devices as part of complex hybrid devices and facilities makes it possible to expand their functionality and automate the collection of experimental data. The limiting factor in the application of wireless technologies is the reliability of the communication channel. This work considers approaches to organizing passive and active surveys of WiFi networks under intensive use of the wireless environment.

Pozhar Vitold, Machikhin Alexander

Acousto-optical techniques and instruments for spectral analysis and imaging spectroscopy

Scientific Technological Center of Unique Instrumentation, Russian Academy of Sciences (STC UI RAS)

Biography:

Pozhar Vitold, Head of the Department of Acousto-optical Information Systems, Professor at Moscow Bauman State Technical University, Professor at National Research Nuclear University “MEPhI” (Moscow). Born in 1958, graduated from Moscow Institute of Physics and Technology (MIPT, Dolgoprudny, Moscow Region), Candidate of Science (PhD) since 1987, Doctor of Physics and Mathematics since 2005.

Machikhin Alexander, Head of the Acousto-optical Spectroscopy Laboratory at STC UI RAS (Moscow), graduated from Moscow Bauman State Technical University (2007), Associate Professor at National Research University “MPEI” (Moscow), Candidate of Science (PhD) since 2011, Doctor of Technology since 2019.

Basic fields of activity: acousto-optics, optics, acoustics, radiophysics, spectrometry, optical gas analysis, environment monitoring, spectral imaging, differential spectroscopy, metrology.

Abstract:

Nowadays, instruments and technologies for detecting and processing spatial-spectral data form the basis for various industrial, biomedical, agricultural and other specialized systems capable of solving many urgent problems. Among the different technical platforms implementing hyperspectral technologies, acousto-optical devices occupy a unique place due to their combination of exceptional features. The report presents a family of original acousto-optical instruments and techniques developed at STC UI RAS for spectral and hyperspectral analysis as well as for spectral imaging. Some examples of their practical applications are discussed.

Özgür Gürbüz

Resource allocation for THz drone communications with realistic antennas and mobility patterns

Biography:

Özgür Gürbüz received her B.S. and M.S. degrees in Electrical and Electronics Engineering from Middle East Technical University in 1992 and 1995, respectively. She received her Ph.D. degree in Electrical and Computer Engineering from Georgia Institute of Technology in 2000. From 2000 until 2002 she worked as a researcher and systems/algorithms engineer for Cisco Systems, in the Wireless Access and Wireless Networking Business Units. In September 2002, Dr. Gürbüz joined the Faculty of Engineering and Natural Sciences at Sabanci University, where she is now a Professor. Her research interests are in the field of wireless communications and networks, specifically the design of link and higher layer network algorithms/protocols for emerging physical layer techniques, including full-duplex communication, cooperative communication, MIMO, and smart antennas. Recently, she has been working on full-duplex communication, digital self-interference cancellation, applications of machine learning in wireless communications/networks, and THz communications. She is a member of IEEE and the IEEE Communications Society.

Abstract:

This work considers Terahertz (THz) drone communications by applying various resource allocation schemes with practical THz antennas within the frequency range of 0.75-4.4 THz under realistic mobility and misalignment scenarios, ensuring a more accurate representation of real-world conditions. Through numerical simulations, we show that capacities on the order of Tbps are achievable at ranges of up to 100 meters when drones are in motion and subject to alignment and moderate misalignment. However, when exposed to actual mobility traces, the performance of all resource allocation schemes drops significantly, sometimes by up to six orders of magnitude, due to occasional reverse orientations of antennas. Consequently, active beam control solutions are needed to maintain the performance of THz drone networks. These findings highlight both the significant strides made in THz technology and the remaining challenges for integrating THz band drones into 6G networks.