Intelligent Image Features Extraction
in Knowledge Discovery Systems

Summary

As image databases grow dramatically in size, the usefulness of the information they hold depends on how well it can be accessed and searched and how well knowledge can be extracted from it. Our ability to generate data currently far outstrips our ability to explore and analyze it, let alone understand it.

Recently, there has been growing and renewed interest in intelligent image management systems based on domain knowledge and applications. Although some researchers have developed knowledge-based approaches over the last decade or so, these rely almost exclusively on low-level features such as texture, colour and intensity, and have little capability for interpreting high-level features. They also operate mainly in the pixel domain, so compressed images must be decompressed before any analysis or processing. This can slow an application down to the point of being impractical.

This project will create intelligent methods for solving many difficult high-level image feature extraction and analysis problems in which local and global properties, as well as spatial relations, must be taken into consideration. All the techniques will be implemented to work in both the pixel domain and the compressed domain (avoiding the inverse transform in decompression), thus speeding up the whole process. We hope this research will lead to a better understanding of images and to the development of efficient methods for other image processing, pattern recognition and computer vision problems. The research results will have a wide range of useful applications, including but not restricted to human face image recognition, medical and biological image analysis, image and video compression, and content-based image indexing and retrieval.

First, intelligent image analysis and feature extraction techniques will be investigated, combining principles from the fields of multiresolution wavelet analysis, compression, computer vision, texture analysis and information retrieval. Next, by applying and extending those techniques, we aim to solve a class of difficult image feature extraction problems. By combining various image analysis and signal processing techniques, we hope to develop new high-level feature extraction methods that improve on current state-of-the-art retrieval and classification methods. Using the extracted features as the first step and input to data mining systems should lead to superior knowledge discovery systems.
Presumptions

The main problem addressed by this research is the current human inability to explore and analyze, let alone understand, the huge amount of data produced every day.

The project hypothesis is that efficient image database retrieval is possible only with a system that can automatically extract relevant features directly from the images stored in the database.

More precisely, the research will focus on large image databases and image analysis (with applications such as automatic facial recognition systems and image and video compression) and on scientific data (with applications such as intelligent content-based medical and biological image indexing and retrieval). Many images in these areas are currently stored in unstructured, non-indexed form and are hard for humans to analyze because of the sheer size of the databases. Through relevance feedback from domain experts in areas such as medicine and biology, and through intelligent and evolutionary learning algorithms, semi-automatic systems that help analyze a given problem will be developed. A fully automatic approach would consist of unsupervised extraction of relevant features, and this research will address that kind of solution as well. We hope that the derived algorithms for intelligent image feature extraction, combined with knowledge discovery systems, will generalize successfully to broader areas of interest. Developing algorithms that help humans analyze images even beyond the abilities of the human visual system is a relatively new research area that will remain an important challenge for the research community for many years to come. Addressing these issues therefore requires highly research-oriented and collaborative activities such as those proposed in this project.
Purpose and Aim

Given the huge scientific and non-scientific image databases that exist today, it is naive to expect any human to analyze them, understand them and extract knowledge from them. Fortunately, however intellectually complex the task of knowledge discovery from such data may seem, the process includes computational parts that can be automated and are actually executed better by computers than by humans. We will focus our research on intelligent image feature extraction, typically through hierarchical or multiresolution analysis. Using knowledge and experience from face recognition and compression algorithms, we aim to improve those algorithms and develop new ones that generalize to various scientific image analysis problems. Wavelets and other image analysis techniques have proved useful in a wide array of domains, such as classification of tissues in computed tomography and content-based image retrieval in medical databases. Results from image and video retrieval in general will help solve some of the high-level problems in other areas.
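
As a rough illustration of what multiresolution analysis can provide as features, the sketch below applies a plain Haar wavelet decomposition to a small grayscale image and collects the energy of each detail subband at each level. It is a minimal sketch, not the project's method: the Haar filter, the function names (haar_step, wavelet_energy_features) and the list-of-lists image format are assumptions chosen for brevity, and the image sides are assumed to be divisible by 2 at every level.

```python
def haar_step(image):
    """One level of an orthonormal 2D Haar transform.

    `image` is a list of rows with even height and width. Returns four
    subbands, each half the input size: the local average and three
    detail bands (top-vs-bottom, left-vs-right, diagonal differences).
    """
    approx, top_bottom, left_right, diagonal = [], [], [], []
    for r in range(0, len(image), 2):
        a_row, tb_row, lr_row, dg_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            p, q = image[r][c], image[r][c + 1]
            s, t = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2.0)   # low-pass: scaled local average
            tb_row.append((p + q - s - t) / 2.0)  # top row minus bottom row
            lr_row.append((p - q + s - t) / 2.0)  # left column minus right column
            dg_row.append((p - q - s + t) / 2.0)  # diagonal difference
        approx.append(a_row)
        top_bottom.append(tb_row)
        left_right.append(lr_row)
        diagonal.append(dg_row)
    return approx, top_bottom, left_right, diagonal


def energy(band):
    """Sum of squared values in a subband."""
    return sum(v * v for row in band for v in row)


def wavelet_energy_features(image, levels):
    """Detail-band energies per level, followed by the final approximation energy."""
    features = []
    current = image
    for _ in range(levels):
        current, tb, lr, dg = haar_step(current)
        features.extend([energy(tb), energy(lr), energy(dg)])
    features.append(energy(current))
    return features


# Usage: a 4x4 image with a dark left half and a bright right half.
image = [[0, 0, 4, 4] for _ in range(4)]
features = wavelet_energy_features(image, levels=2)
```

Because each 2x2 block is transformed orthonormally, the feature values sum to the total energy of the image, so the vector describes how that energy is distributed across scales and orientations.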
Research Plan

The work builds on existing image analysis techniques, such as those used in multiresolution wavelet analysis, compression, computer vision, texture analysis and information retrieval. Apart from feature extraction techniques, we will also study image preprocessing techniques for denoising and normalization. Dimensionality reduction techniques, such as Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Linear Discriminant Analysis (LDA), will be addressed as well, so that multi-dimensional data can be handled. Since most images currently stored in databases are held in some compressed format, we will also work on implementing existing methods and our newly developed methods directly in the compressed domain.
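
To make the dimensionality reduction step concrete, the following sketch finds only the leading principal component of a small set of feature vectors and projects the data onto it. It is a simplification under stated assumptions: power iteration stands in for a full eigendecomposition, the function names are hypothetical, and the fixed starting vector would fail in the unlikely case that it is exactly orthogonal to the leading direction.

```python
import math


def center(data):
    """Subtract the per-dimension mean from every sample."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data], means


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def leading_component(cov, iterations=200):
    """Approximate the dominant eigenvector of `cov` by power iteration."""
    d = len(cov)
    v = [float(i + 1) for i in range(d)]          # arbitrary non-zero start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                           # degenerate case: no variance seen
            break
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    """One-dimensional scores: dot product of each sample with `direction`."""
    return [sum(x * u for x, u in zip(row, direction)) for row in centered]


# Usage: four 2D feature vectors that all lie on one line.
data = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
centered, means = center(data)
direction = leading_component(covariance(centered))
scores = project(centered, direction)
```

In this toy case a single score per sample carries all the information in the original two dimensions. With real image features, several components would normally be kept.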

The timetable of the research plan for this project is described below, as well as in the illustrated timetable:


1. State-of-the-art investigation and implementation of the experimental system, targeting domain specific problems to be solved (with domain experts) - 1 YEAR

The first step of this research will be to investigate the underlying mathematics and implement the low-level image analysis techniques currently in use, thus forming the experimental system needed for further research. In consultation with domain experts, the main problem areas to be addressed will be identified.

2. Design of analysis tools for addressed applications; Development and implementation of new techniques in pixel and compressed domain - 2 YEARS

The second step is the design of analysis tools, resulting in new feature extraction techniques (as input to recognition, classification/retrieval or knowledge discovery systems) for every application addressed. We will investigate alternative representations and their expressive power, emphasizing multiresolution wavelet analysis and other high-level signal processing techniques. Relevance feedback from domain experts will play an important part at this stage. The extracted features must be relevant to the problem, insensitive to small changes in the data (such as noise) and invariant to scaling, rotation and translation. In addition, discriminating features must be selected through appropriate dimensionality reduction techniques to make computation feasible. New image representations suitable for storage and/or further computation may be created. We will investigate how well these new intelligent feature extraction methods and image representations serve some standard and some as yet unsolved tasks from different areas. To improve the newly developed ideas, new image preprocessing methods may be investigated as well. For specific application areas (such as face recognition, medical and biological image analysis, image and video compression, and content-based image indexing and retrieval), the team will consult other experts in those areas where necessary to determine the main problems that the new image feature extraction techniques should solve. In parallel with this pixel-domain research, the feasibility of implementing all the methods directly in the compressed domain will be investigated.

3. Integration of new feature extraction techniques into existing data mining and knowledge discovery systems; Interdisciplinary work and intensive cooperation with domain experts - 1 YEAR

The third step combines the newly extracted features with state-of-the-art data mining and knowledge discovery systems and applies them together to specific tasks in recognition and classification systems, image database retrieval, and medical and biological imaging. At this stage, the newly developed feature extraction methods may be refined on the basis of input from domain experts or feedback from one of the steps of the system or algorithm. Interdisciplinary work and intensive cooperation with various domain experts will be one of the strong points of this project. Finally, the newly developed techniques will also be implemented directly in the compressed domain (where possible and needed).

4. Building prototype-like implementation of a new feature extraction system; Detailed testing and validation of the results - 1 YEAR

The fourth step, and the final output of the project, will be a prototype-like implementation of a new feature extraction system for each application area addressed. The system will be tested in detail and the results rigorously validated.
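
Steps 2 and 3 call for methods that work directly in the compressed domain. As a minimal sketch of why the inverse transform can sometimes be skipped, the example below assumes an orthonormal 8x8 block DCT of the kind used in JPEG-style coders and reads the mean and variance of a pixel block straight from its coefficients. The function names are hypothetical, and a real codec would still require entropy decoding and dequantization before the coefficients are available.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (each row is a basis vector)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dct2(block):
    """2D DCT-II of a square block, computed as C * block * C^T."""
    n = len(block)
    c = dct_matrix(n)
    # Transform along columns, then along rows.
    tmp = [[sum(c[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]


def block_stats_from_dct(coeffs):
    """Mean and population variance of the pixel block, without an inverse DCT."""
    n = len(coeffs)
    dc = coeffs[0][0]
    mean = dc / n                      # for an orthonormal n x n DCT, DC = n * mean
    total_energy = sum(v * v for row in coeffs for v in row)
    variance = (total_energy - dc * dc) / (n * n)  # Parseval: AC energy = squared deviations
    return mean, variance


# Usage: the forward transform here stands in for a decoder's coefficient output.
block = [[(8 * i + j) % 17 for j in range(8)] for i in range(8)]
mean, variance = block_stats_from_dct(dct2(block))
```

Block means and variances are simple low-level descriptors, but the same idea of working on transform coefficients is what would allow richer features to be computed without full decompression.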

WEB-SITE SETUP

A web-site for the project will be set up by the proposer as soon as the project has been approved (see illustrated timetable). In the first year, the senior researcher of the project will be responsible for maintaining this web-site. It will be used for communication within the project, to convey the aims and objectives of the project to the scientific community, and to disseminate results to the wider research community and to professionals working in areas closely related to this project. Objectives, motivation, appropriate technical documentation, the meeting schedule and software demonstrations will be included on the web-site.

MID-TERM AND FINAL SEMINAR

One of the main means of disseminating results is the organization of a Mid-Term Seminar and a Final Seminar (see illustrated timetable), where the results of the research carried out within the project will be publicly presented and which will give rise to reports. The Mid-Term Seminar will be organized around the middle of the third year of the project, and the Final Seminar at its end. At both seminars, building on successful past experience, the research efforts and results will be presented to the identified target audiences. These seminars will also feature internationally recognized invited speakers. Moreover, they will help to evaluate and discuss the R&D and emerging know-how with other researchers, and to gather reactions and criticism from the national and international scientific community regarding the research results achieved.

OTHER DISSEMINATION ACTIVITIES

A number of dissemination activities will be initiated in order to inform the wider research community of the work of the project. These will include collaborative submissions to high-impact-factor journals, conferences and workshops, either on specific activities within the project or on the work of the project as a whole. In addition, results will be disseminated through other common publications, presentations, participation in exhibitions, web pages, Internet newsgroups, etc.
Application

The research results will have a wide range of useful applications, including but not restricted to human face image recognition, medical and biological imaging and knowledge discovery, advanced image and video compression algorithms, and content-based image indexing and retrieval.

This project is part of the interdisciplinary and inter-institutional program "Computational Knowledge Discovery Methods in Scientific Applications", which consists of seven (7) different projects. Collaboration between the projects in this program is illustrated here:


Based on this collaboration, three possible areas of applications are envisaged within our project:

  1. Feature extraction by our project combined with the data mining system of project P1 could lead to a superior knowledge discovery approach. An example of a joint effort to solve a difficult problem is the classification of mammography images for breast cancer, but other domains will be tested as well.
  2. Collaboration with project P2 is possible through sharing methodology for effective data compression and retrieval using a novel approach to indexing, especially where the data are images or video. Inter-project collaboration in human locomotion and jaw movement analysis is also possible.
  3. Collaboration between our project and project P4 will concentrate on extracting reliable features from images obtained in 1D and 2D SDS-PAGE experiments and on building predictive models based on these features. The final goal of the collaboration will be a new software solution combining image feature extraction and the learning of predictive models for specific problems.
Previous Research

This research will use the experience and tools developed in our former projects:

  • 0036015: "Multimedia Communication Systems" (2002-2005), Ministry of Science, Education and Sports, Republic of Croatia, where part of the project was oriented towards the advanced image analysis and compression techniques, multiresolution wavelet analysis techniques, texture classification and face recognition systems;
  • 036041: "Research and Development of Cable TV Services in Croatia" (2000-2002), Ministry of Science and Technology, Republic of Croatia, where novel methods of intermodulation products counting and noise reduction in CATV and other multicarrier systems were investigated;
  • 036015: "Multimedia Communication Technologies" (1996-2002), Ministry of Science and Technology, Republic of Croatia, where advanced image compression techniques for multimedia communications, and methods for subjective and objective image quality evaluation were investigated;
  • 2-07-277: "Research of Advanced Technologies for Croatian Broadcasting Systems" (1991-1996), Ministry of Science and Technology, Republic of Croatia, where video signal processing algorithms for broadband network transmission were investigated.

Research work was also conducted within the framework of the European Commission's COST research activities in the field of telecommunications and information technologies, services and applications (COST-TIST). COST is one of the longest-running instruments supporting co-operation among scientists and researchers across Europe. Participants in this proposed project were involved in the following international COST projects:

  • COST 279: "Analysis and Design of Advanced Multiservice Networks Supporting Mobility, Multimedia and Internetworking" (11/2001 - 06/2005);
  • COST 264: "Enabling Networked Multimedia Group Communication" (09/1998 - 09/2002);
  • COST 257: "Impacts of New Services on the Architecture and Performance of Broadband Networks" (09/1996 - 09/2000);
  • COST 237: "Multimedia Telecommunications Services" (02/1992 - 05/1998);
  • COST 242: "Methods for the Performance Evaluation and Design of Multiservice Broadband Networks" (05/1992 - 05/1996).
 
© 2007-2014 VCL