Leader: Hakan Erdogan
  • Saygin Topkaya
  • Berkay Yilmaz
  • Hakan Erdogan
  • Harun Karabalkan (alumni)
Project TUBITAK 107E015: Novel Approaches in Audio-Visual Speech Recognition

Contact: Hakan Erdogan
Database Description

1. What is SUTAV DB?

SUTAV DB (Sabanci University Turkish Audio Visual DataBase) is an audio-visual database that can be used for audio-visual recognition applications such as visual speech recognition and audio-visual security systems. It is part of the project "Novel Approaches in Audio-Visual Speech Recognition (Project 107E015)", supported by TUBITAK.

SUTAV DB is a rapidly growing database that is maintained as part of the Sabanci University Project102 freshman course. Every semester, freshman students record a number of individuals, in separate sessions, speaking a wide variety of statements (names, sentences, numbers, etc.).

The database was started in 2006 and already contains a large number of recorded sessions. Hundreds of new videos are added every year.

2. Recording Process and Format

Sessions of SUTAV DB are recorded at Sabanci University Computer Vision and Pattern Analysis Laboratory. The following setup is used for recordings:
- SONY DCR-HC23E - MiniDV Camcorder
- RODE - NTG-1 Directional Condenser Microphone
- M-Audio Fast Track Pro Audio/MIDI Interface
Videos are saved in DV AVI format, with a video resolution of 720 x 576 and 44.1 kHz audio.
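As a rough illustration, the format stated above can be written down as a small record and used to sanity-check the metadata of a recording. This is only a sketch: the `RecordingFormat` class and the `matches_sutav_format` function are hypothetical names introduced here and are not part of any SUTAV DB tooling. How the metadata is read from an actual file is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordingFormat:
    """Hypothetical container for the basic properties of one recording."""
    container: str
    width: int
    height: int
    audio_rate_hz: int


# Values taken from the format description above.
SUTAV_FORMAT = RecordingFormat(
    container="DV AVI", width=720, height=576, audio_rate_hz=44_100
)


def matches_sutav_format(meta: RecordingFormat) -> bool:
    """Return True if the given metadata equals the documented format."""
    return meta == SUTAV_FORMAT
```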

3. Recorded Data

In the recording process, a head rotation video of the test subject is taken first, for use in 3D applications. The test subjects then say their own name and the names of some previous test subjects. This part of the recording is intended for audio-visual security or speaker recognition applications.

The main richness of SUTAV DB lies in the videos that follow. Subjects are recorded speaking the numbers from 1 to 10, and then random number sets partitioned into groups of four. The test subjects then speak the names of some cities of Turkiye, and finally speak random Turkish sentences chosen from different independent sources.

Together, these statements cover a broad range of Turkish, which provides a good source for studying different semantic properties of the Turkish language.
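The page does not describe how the random number sets are produced. The sketch below shows one possible way to generate prompts in groups of four. It assumes that the numbers are drawn from 1 to 10 with repetition allowed; the function name `random_number_groups` and all of its parameters are hypothetical.

```python
import random


def random_number_groups(n_groups, group_size=4, low=1, high=10, seed=None):
    """Return n_groups lists, each holding group_size random integers.

    The range low..high (inclusive) and sampling with repetition are
    assumptions; the actual prompt selection procedure is not documented.
    """
    rng = random.Random(seed)
    return [
        [rng.randint(low, high) for _ in range(group_size)]
        for _ in range(n_groups)
    ]


if __name__ == "__main__":
    # Print three example prompt groups, one per line.
    for group in random_number_groups(3, seed=0):
        print(" ".join(str(n) for n in group))
```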

4. Database Statistics

As of the end of the 2008-2009 Fall semester, a total of 169 subjects have been recorded, 104 of whom are male and 65 female. The outline of one recording session is below:

Every subject is recorded three times, speaking the same statements each time, so researchers can investigate differences between recordings of the same subject. However, the statements for each subject are selected from a large pool, so there is great diversity between subjects.
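The figures above can be checked with a few lines of arithmetic. The session count below is an upper bound that assumes all 169 subjects have completed all three recordings; the page does not state the actual number of recorded sessions.

```python
# Subject counts as stated for the end of the 2008-2009 Fall semester.
subjects_male = 104
subjects_female = 65
subjects_total = 169
sessions_per_subject = 3

# The gender breakdown should add up to the stated total.
assert subjects_male + subjects_female == subjects_total

# Upper bound on sessions, assuming every subject has all three recordings.
max_sessions = subjects_total * sessions_per_subject

# Share of male and female subjects in the database.
male_share = subjects_male / subjects_total
female_share = subjects_female / subjects_total

print(f"At most {max_sessions} sessions")
print(f"Male: {male_share:.1%}, female: {female_share:.1%}")
```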

5. How To Obtain and Legal Issues

Contact Prof. Hakan Erdogan
