Key-frame Analysis and Extraction for Automatic Summarization of Real-time Videos


Student thesis: Doctoral Thesis, Doctor of Philosophy

Original language: English
Award date: 2019

Abstract

Recent advances in multimedia technologies and internet services have made capturing and uploading videos so easy that enormous quantities of new videos become available online every second. Watching an entire video is tedious and time-consuming, and storing and managing video collections on this scale is difficult. This creates a pressing need for briefer representations of videos that allow them to be browsed more effectively and efficiently while retaining their essential information, not only for entertainment but also for commercial applications. This need has given rise to the research area known as video summarization, in which a summary can be either a set of key frames or a set of video skims. Over the past few decades, research in video summarization has flourished and branched into new areas within the prevailing Big Data era.
Although various studies have been conducted, major challenges remain in effective feature extraction, similarity measurement and determination of an appropriate summary length, which motivates the development of effective and efficient automatic video summarization techniques. The evaluation of video summaries is also challenging, because there is no consistent evaluation framework and the current evaluation methods are quite subjective. The aim of our research is to develop automatic video summarization and evaluation methods that exploit various image processing and pattern recognition techniques to extract distinct key frames from a video, providing a succinct summary with a significantly reduced file size.
To this end, we propose two automatic video summarization techniques in this thesis. First, we present a local search algorithm for automatic video summarization that combines color and motion vector features via a sliding window approach. It is evaluated using the Open Video Project database, the YouTube database and the Kodak Home Video database, and achieves a high average accuracy rate (65%) with a low average error rate (45%) on the Open Video Project database. Subsequently, we develop a novel global search algorithm for automatic video summarization, which uses a Distinct Frame Patch index and an Appearance based Linear clustering approach to extract key frames from a given video. The validity of this method is evaluated using the same three databases, and it achieves an even higher average accuracy rate (71%) with a much lower average error rate (41%) on the Open Video Project database.
Further, we propose two methods for automatically evaluating the video summaries produced by existing summarization techniques. The first method is based on a simple and robust two-way search that uses Pearson's correlation coefficient for the initial establishment of matched frames, followed by compatibility modelling to discard the false and weak matches obtained from the two-way search. The second method improves the IMage Euclidean Distance (IMED) by considering only the neighboring pixels, in the manner of a kernel of size n × n, rather than all the pixels, leading to an Efficient IMage Euclidean Distance (EIMED). Finally, we focus on human consistency evaluation of static video summaries, in which the user summaries are evaluated against one another using our compatibility modelling method.
The contributions of our work are four-fold. First, we propose two automatic video summarization techniques that outperform the state-of-the-art techniques in terms of both accuracy and efficiency on the Open Video Project database.
Second, we propose two techniques for automatic video summary evaluation using the Open Video Project database, with which we discover that the reported performance evaluations of the state-of-the-art techniques are overly optimistic and probably misleading. Third, we investigate human consistency by evaluating user summaries from three databases (the Open Video Project database, the YouTube database and the SumMe database). We show that the level of agreement varies significantly between users; this determines the maximum agreement level of the users for a given dataset and places an upper limit on the best overall performance that automatic video summarization techniques can achieve. Fourth, we create static video summaries from the available video skims of the SumMe database.
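
The two-way search described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `pearson` and `two_way_matches`, the grayscale-array frame representation and the `min_corr` threshold are all assumptions made for illustration, and the fixed threshold is only a crude stand-in for the compatibility modelling step, which the abstract does not specify in detail. A pair of frames is kept only when each frame is the other's best match under Pearson's correlation coefficient.

```python
import numpy as np


def pearson(a, b):
    """Pearson's correlation coefficient between two equally sized frames."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A constant frame has zero variance; treat it as uncorrelated.
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def two_way_matches(auto_frames, user_frames, min_corr=0.8):
    """Return (i, j) pairs where automatic key frame i and user key frame j
    each select the other as their best match (the two-way search).

    min_corr is a hypothetical threshold used here in place of the
    compatibility modelling step; it only drops weak mutual matches.
    """
    if len(auto_frames) == 0 or len(user_frames) == 0:
        return []
    # corr[i, j] = correlation between automatic frame i and user frame j.
    corr = np.array([[pearson(a, u) for u in user_frames] for a in auto_frames])
    best_user = corr.argmax(axis=1)  # forward search: auto -> user
    best_auto = corr.argmax(axis=0)  # backward search: user -> auto
    return [
        (i, int(j))
        for i, j in enumerate(best_user)
        if best_auto[j] == i and corr[i, j] >= min_corr
    ]
```

Because Pearson's correlation is unchanged by a positive rescaling or a constant offset of the pixel values, this sketch pairs a user frame with a brightened or dimmed copy of itself, while an unrelated frame will usually fall below the threshold.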