Tutorials

The role of the tutorials is to provide a platform for more intensive scientific exchange amongst researchers interested in a particular topic and to serve as a meeting point for the community. Tutorials complement the depth-oriented technical sessions by providing participants with broad overviews of emerging fields. A tutorial can be scheduled for 1.5 or 3 hours.



Tutorial on
First Person (Egocentric) Vision: From Augmented Perception to Interaction and Anticipation


Instructor

Antonino Furnari
Mathematics and Computer Science, University of Catania
Italy
 
Brief Bio
Antonino Furnari is an Assistant Professor at the University of Catania. He received his PhD in Mathematics and Computer Science in 2017 from the University of Catania; he holds one patent and has authored more than 50 papers in international book chapters, journals, and conference proceedings. Antonino Furnari is involved in the organization of several international events, including the Assistive Computer Vision and Robotics (ACVR) workshop series (since 2016), the International Computer Vision Summer School (ICVSS) (since 2017), the Egocentric Perception, Interaction and Computing (EPIC) workshop series (since 2018), and the EGO4D workshop series (since 2022). Since 2018, he has been involved in the collection, release, and maintenance of the EPIC-KITCHENS dataset series, and in particular in the egocentric action anticipation and action detection challenges. Since 2021, he has been involved in the collection and benchmarking of the EGO4D dataset. He has been a co-founder of NEXT VISION s.r.l., an academic spin-off of the University of Catania, since 2021. His research interests concern Computer Vision, Pattern Recognition, and Machine Learning, with a focus on First Person Vision. More information is available at http://www.antoninofurnari.it/.
Abstract

Wearable devices equipped with sensing, processing, and display abilities, such as Microsoft HoloLens, Google Glass, and Magic Leap One, make it possible to perceive the world from the user’s point of view. Due to their intrinsic portability and their ability to mix the real and digital worlds, such devices constitute the third wave of computing, after personal computers and smartphones, in which the user plays a central role. These wearable devices are therefore ideal candidates for implementing personal intelligent assistants which can understand our behavior and augment our abilities. While sensing in this context can go beyond the collection of RGB images and include dedicated depth sensors and IMUs, Computer Vision plays a fundamental role in the egocentric perception pipelines of such systems. Unlike standard “third person vision”, which assumes that the processed images and video are acquired from a static point of view neutral to the events, first person (egocentric) vision assumes images and video to be acquired from the non-static and rather “personal” point of view of the user by means of a wearable device. These unique properties make first person (egocentric) vision different from standard third person vision. Most notably, the visual information collected using wearable cameras always “tells something” about the user, revealing what they do, what they pay attention to, and how they interact with the world. In this tutorial, we will discuss the challenges and opportunities behind first person (egocentric) vision. We will cover the historical background and seminal works, present the main technological tools (including devices and algorithms) which can be used to analyze first person visual data, and discuss challenges and open problems.

Keywords

wearable, first person vision, egocentric vision, augmented reality, visual localization, action recognition, action anticipation

Aims and Learning Objectives

The participants will understand the main advantages of first person (egocentric) vision over third person vision for analyzing the user’s behavior and building personalized applications. Specifically, the participants will learn about: 1) the main differences between third person and first person (egocentric) vision, including the way in which the data is collected and processed; 2) the devices which can be used to collect data and provide services to the users; 3) the algorithms which can be used to manage first person visual data, for instance to perform localization, indexing, and action/activity recognition.
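
To make the third point concrete, the sketch below shows one common shape of such an algorithm: a per-frame CNN encoder followed by a temporal model that classifies the ongoing action from a short clip captured by a wearable camera. It is a minimal illustration in PyTorch under invented names (EgoActionRecognizer) and a toy label count, not code from the tutorial; trained on future labels instead of current ones, the same structure can also serve action anticipation.

import torch
import torch.nn as nn
from torchvision import models

class EgoActionRecognizer(nn.Module):
    """Toy pipeline: per-frame CNN features -> LSTM -> action logits."""
    def __init__(self, num_actions=10, hidden=512):
        super().__init__()
        backbone = models.resnet18(weights=None)  # load pretrained weights in practice
        backbone.fc = nn.Identity()               # keep the 512-d frame descriptor
        self.backbone = backbone
        self.temporal = nn.LSTM(512, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_actions)

    def forward(self, clip):                      # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1)).view(b, t, -1)
        out, _ = self.temporal(feats)
        return self.classifier(out[:, -1])        # predict from the last observed frame

clip = torch.randn(1, 8, 3, 224, 224)             # 8 frames from a wearable camera
logits = EgoActionRecognizer()(clip)              # (1, 10) scores over toy actions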

Target Audience

First year PhD students, graduate students, researchers, practitioners.

Prerequisite Knowledge of Audience

Fundamentals of Computer Vision and Machine Learning (including Deep Learning)

Detailed Outline

The tutorial will cover the following topics:

- Outline of the tutorial;
- Definitions, motivations, history and research trends of First Person (egocentric) Vision;
- Differences between third person and first person vision;
- First Person Vision datasets;
- Wearable devices to acquire/process first person visual data;
- Fundamental tasks for first person vision systems (a retrieval-based localization sketch follows this outline):
  - Localization;
  - Hand/Object detection;
  - Attention;
  - Action/Activity recognition;
  - Action anticipation;
- Technological tools (devices and algorithms) which can be used to build first person vision applications;
- Challenges and open problems;
- Conclusions and insights for research in the field.
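
As referenced in the outline above, localization is often bootstrapped by image retrieval: the wearer's current view is matched against a gallery of location-tagged views by descriptor similarity. The following is a toy sketch with random descriptors and invented room labels, intended only to show the mechanics, not a method presented in the tutorial.

import numpy as np

def localize(query_desc, gallery_descs, gallery_labels):
    """Return the location label of the most similar gallery view."""
    q = query_desc / np.linalg.norm(query_desc)
    g = gallery_descs / np.linalg.norm(gallery_descs, axis=1, keepdims=True)
    similarities = g @ q                          # cosine similarity to every view
    return gallery_labels[int(np.argmax(similarities))]

rng = np.random.default_rng(0)
gallery = rng.standard_normal((100, 128))         # descriptors of mapped views
labels = np.array([f"room_{i % 5}" for i in range(100)])  # invented place tags
print(localize(rng.standard_normal(128), gallery, labels))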

Secretariat Contacts
e-mail: visigrapp.secretariat@insticc.org

Tutorial on
Stereoscopic 3D Media Retargeting and Recomposition


Instructor

Md Baharul Islam
Computer Science, American University of Malta
Malta
 
Brief Bio
Md Baharul Islam is an Assistant Professor in the Computer Science and Game Development program at the American University of Malta (AUM). Prior to joining the AUM, he was a Postdoctoral Research Fellow at the AI and Augmented Vision Lab of the Miller School of Medicine, University of Miami, United States. Dr. Islam has more than 10 years of working experience, including teaching and cutting-edge research in image processing and computer vision. His current research interests lie in the area of 3D processing and AR/VR-based vision rehabilitation. Dr. Islam has secured three gold medals and four best paper awards at national and international scientific and technological competitions and conferences. His Ph.D. thesis was selected as the best Ph.D. thesis by the IEEE SPS Research Excellence Award 2018. He has authored or co-authored more than 35 international peer-reviewed research papers, including journal articles, conference proceedings, books, and book chapters. Dr. Islam has served as a program/technical committee member of about 10 international conferences, symposiums, and workshops. He received the best reviewer award from CGMIP 2019, where he served as program co-chair in 2019 and 2020. He is also an editorial board member of two international journals in the United States. Dr. Islam has been an IEEE Senior Member since 2018.
Abstract

Due to the availability and affordability of stereoscopic devices, such as stereoscopic cameras, lenses, and stereo-enabled displays, research interest in stereoscopic image and video manipulation has grown rapidly in recent years. This growth has stimulated the need for stereoscopic post-processing tools that allow users to modify their stereoscopic media without violating stereoscopic properties (e.g. disparity, vertical alignment). However, the modification of stereoscopic images/videos is non-trivial compared to conventional 2D image/video modification due to the additional depth information that must be considered in order to avoid 3D visual fatigue. In this tutorial, we present a set of methods that can retarget and recompose stereoscopic media to enhance the 3D viewing experience. First, we present a warping-based method that can resize a given stereoscopic image or video to the target image/video scale. The main challenge of this work is to preserve the stereoscopic properties in the retargeted images/videos, particularly the horizontal disparity, and to minimize the vertical drift between the left and right stereoscopic 3D images and videos. Second, we present a hybrid method for stereoscopic 3D image retargeting and recomposition that can modify the subject-background relationship in the retargeted and recomposed images. This method ensures better object protection and scene consistency compared to traditional warping-based methods. Experimental results demonstrate that these methods outperform the state of the art and thus generate artifact-free and more aesthetically pleasing retargeted and recomposed images/videos.
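
Horizontal disparity, the property these methods most need to preserve, can be estimated from a rectified stereo pair with standard block matching. The snippet below is a minimal sketch using OpenCV's stock StereoBM matcher; the file names are placeholders, and real retargeting pipelines typically use denser, higher-quality correspondences.

import cv2

left = cv2.imread("left_view.png", cv2.IMREAD_GRAYSCALE)   # placeholder paths
right = cv2.imread("right_view.png", cv2.IMREAD_GRAYSCALE)

# Block matching over a 64-pixel search range with 15x15 comparison windows.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right)      # 16-bit fixed point, scaled by 16
disparity = disparity.astype("float32") / 16.0  # per-pixel disparity in pixels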

Keywords

Depth Remapping, Disparity, Depth Comfort Range, Image Warping, Media Resizing, Optimization, Stereoscopic 3D Imaging, Stereoscopic Image Recomposition, 3D Image Aesthetics

Aims and Learning Objectives

- Understand the basic concept of the stereoscopic 3D vision system and its properties;
- Learn a set of methods for stereoscopic 3D image and video resizing within the target image/video scale without violating stereoscopic 3D properties;
- Know a hybrid method for stereoscopic 3D image recomposition based on a set of photographic composition rules;
- Remap the depth/disparity within the comfort depth range for a better 3D viewing experience (a toy remapping sketch follows this list).
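
The last objective can be illustrated with the simplest possible remapping operator: an affine map that compresses an image's disparity range into an assumed comfort interval. The tutorial's actual remapping is optimization-based; this sketch only conveys the idea, and the comfort bounds are invented for illustration.

import numpy as np

def remap_disparity(d, comfort_min=-20.0, comfort_max=20.0):
    """Affinely map disparities (in pixels) into [comfort_min, comfort_max]."""
    lo, hi = float(d.min()), float(d.max())
    if hi == lo:                                  # flat scene: place it mid-range
        return np.full_like(d, (comfort_min + comfort_max) / 2.0)
    return comfort_min + (d - lo) * (comfort_max - comfort_min) / (hi - lo)

d = np.random.uniform(-60.0, 45.0, size=(480, 640))  # raw disparity map
remapped = remap_disparity(d)                        # now within the comfort range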

Target Audience

Postgraduate students, researchers, scientists, and academics in the field of computer science, particularly image processing and computer vision.

Prerequisite Knowledge of Audience

- Basic knowledge of low-level image processing and computer vision;
- Fundamental knowledge of 2D/3D photography, stereoscopic 3D media processing, and the human visual system;
- Basic knowledge of the image processing and computer vision toolboxes in MATLAB (preferred but not mandatory).

Detailed Outline

- Introduction to the Stereoscopic 3D Vision System;
- Overview of Recent Research on Stereoscopic Media Processing;
- Stereoscopic 3D Photo Composition;
- Warping-based Image and Video Retargeting Methods;
- Stereoscopic Properties Preservation (a toy energy sketch follows this outline);
- Stereoscopic 3D Image Segmentation;
- Error Minimization for the Preservation of the Stereoscopic Quality;
- Stereoscopic 3D Image Recomposition;
- Experimental Results and Empirical User Studies;
- Comparison with State-of-the-Art Methods;
- Demonstration of These Methods;
- Concluding Remarks;
- Question and Answer with Interactive Discussion.
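
As referenced in the outline above, the warping-based methods balance two quadratic penalties: deviation of each point's horizontal disparity from its original value, and any vertical drift between the two views. The toy energy below, evaluated over synthetic point correspondences, sketches that idea; it is not the actual objective optimized by the tutorial's methods.

import numpy as np

def stereo_preservation_energy(pts_left, pts_right, target_disparity):
    """Toy quadratic energy over matched (x, y) points in the two warped views."""
    disparity = pts_left[:, 0] - pts_right[:, 0]
    disparity_term = np.sum((disparity - target_disparity) ** 2)     # keep depth
    vertical_term = np.sum((pts_left[:, 1] - pts_right[:, 1]) ** 2)  # no drift
    return disparity_term + vertical_term

rng = np.random.default_rng(1)
pl = rng.uniform(0, 640, size=(50, 2))            # warped left-view points
pr = pl - np.array([12.0, 0.0])                   # ideal right view: pure x-shift
print(stereo_preservation_energy(pl, pr, target_disparity=12.0))  # -> 0.0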

Secretariat Contacts
e-mail: visigrapp.secretariat@insticc.org
