Results 1-9 of 9 for query: authorDesc:"TOMASZ MĄKA"

Audio features for speech detection in adverse conditions


  The effectiveness of automatic audio and speech processing systems relies on the properties of the source signal, namely the acquisition environment and the type of signal (audio event, music clip, speech, etc.), and it varies with the recording conditions and equipment. The task of identifying a particular type of signal in an audio stream is currently widely studied in the literature [4]. Each class of signal has to be treated independently, depending on the application. For example, automatic speech recognition (ASR), speaker identification, verification and separation systems assume speech as the input signal and can give unpredictable results for other signals, diminishing overall accuracy. Similarly, in audio coding systems it is important to use separate, dedicated coders for speech and music for better coding efficiency [4, 13]. Therefore, the regions obtained in the identification process for a defined class of signal are employed in the segmentation and extraction of the input audio stream for the final application. The accuracy of the detected regions depends on the type of features describing the signal properties and on the influence of external conditions such as noise, coding effects, etc. [3, 12]. One of the critical problems in speech/audio applications is determining whether voice is present in a signal. Algorithms for such tasks are called VAD (Voice Activity Detection) and play a major role in voice transmission systems. The main goal of VAD is to indicate the regions of voice occurrence in the input audio stream. As a result, they contribute to lower bandwidth usage [3, 5, 9]. Noise robustness is one of the key elements of VAD efficiency, but predicting detection accuracy in different types of noise is difficult. Despite many existing approaches to speech detection, the problem of the sensitivity of audio features to different types of noise still exists [...]
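
The goal of VAD stated in the abstract above, marking the regions of voice occurrence in an input stream, can be illustrated with a minimal sketch. It assumes a plain short-time energy threshold, which is not the method of the paper. The function names, the frame length of 160 samples, the threshold of 0.01 and the synthetic 8 kHz test signal are all illustrative assumptions.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_voice_regions(samples, frame_len=160, threshold=0.01):
    """Return (start_sample, end_sample) pairs for runs of active frames.

    A frame counts as active when its energy exceeds the threshold
    (an assumed, fixed value; real detectors adapt it to the noise level).
    """
    energies = frame_energies(samples, frame_len)
    regions = []
    region_start = None
    for i, energy in enumerate(energies):
        active = energy > threshold
        if active and region_start is None:
            region_start = i * frame_len          # a region opens here
        elif not active and region_start is not None:
            regions.append((region_start, i * frame_len))  # region closes
            region_start = None
    if region_start is not None:                  # signal ended while active
        regions.append((region_start, len(energies) * frame_len))
    return regions


# Synthetic stand-in for "silence, voiced segment, silence" at 8 kHz.
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
signal = silence + tone + silence

print(detect_voice_regions(signal))  # [(800, 2400)]
```

A fixed energy threshold of this kind is exactly what breaks down in noise, which is the sensitivity problem the abstract raises.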

Properties of feature contours for audio classification tasks


  Ongoing technological advances stimulate the development of fully automated speech/audio interaction systems in everyday life. Such systems include various modules to handle different acoustic streams such as speech, music, background noise, events, etc. Additionally, there are systems that use extra information from video streams or data captured from dedicated sensors to improve their efficiency in human-machine interaction and interactive multimodal tasks [1]. However, the final effectiveness is still highly dependent on the audio processing stage. The processing flow involves two basic steps: segmentation and classification [2-3]. In the segmentation phase, the input signal is divided into regions from which audio information important from the interaction point of view can be extracted. Then, audio classification is performed for every region. Due to the high sensitivity to variable external audio conditions, the selection of features with good separability properties for a given set of classes is crucial [4-5]. Nevertheless, in many systems the feature set is selected arbitrarily [6]. A typical audio classification phase works in the following manner: first, the input signal is divided into short frames; then, for each frame, a set of audio features is computed, forming a feature vector. Every vector is labeled with a signal class and then used in the learning phase, where a database is created. Finally, the obtained database is exploited in the classification process [7]. The main obstacle in such an approach is the impossibility of modifying the features (for adaptation to changing conditions) without repeating the learning phas[...]
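
The typical classification flow described in the abstract above (frames, feature vectors, a labeled database, classification) can be sketched as follows. This is only an illustration under stated assumptions: the two features (energy and zero-crossing rate), the nearest-neighbour lookup, the class labels and the synthetic tones are hypothetical choices, not those of the paper.

```python
import math


def split_frames(samples, frame_len):
    """Divide the signal into short non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def feature_vector(frame):
    """Compute an assumed two-element feature vector for one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zero_crossing_rate = crossings / (len(frame) - 1)
    return (energy, zero_crossing_rate)


def build_database(labeled_signals, frame_len):
    """Learning phase: store one (feature vector, label) pair per frame."""
    database = []
    for label, samples in labeled_signals:
        for frame in split_frames(samples, frame_len):
            database.append((feature_vector(frame), label))
    return database


def classify(frame, database):
    """Classification phase: label of the nearest stored feature vector."""
    vec = feature_vector(frame)
    _, label = min(database, key=lambda entry: math.dist(entry[0], vec))
    return label


def tone(freq_hz, n_samples, rate_hz=8000):
    """Synthetic sine tone used as stand-in training and test material."""
    return [0.5 * math.sin(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]


# Hypothetical two-class training set.
database = build_database(
    [("low_tone", tone(100, 2000)), ("high_tone", tone(1000, 2000))],
    frame_len=200,
)

print(classify(tone(120, 200), database))
print(classify(tone(900, 200), database))
```

Because the database stores feature vectors rather than raw audio, changing `feature_vector` invalidates every stored entry, which is the obstacle the abstract points out.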

Analyzing feature trajectories for audio content discrimination


  The field of audio information indexing and retrieval relies on audio analysis tasks such as segmentation and classification. Its growing popularity and numerous applications are connected with the availability of huge audio resources on the Internet. The large variability of audio content in such resources has a direct influence on the effectiveness and usability of audio management systems. Categorizing or searching audio data by listening is a difficult task, so an automatic and effective mechanism is required to explore the data. A typical audio retrieval system involves three steps: segmentation, feature extraction and classification. The final efficiency depends on all of these steps and on the audio acquisition environment. The segmentation and classification processes are based on the variability of features, so the choice of an audio feature set with good discrimination properties is important. It is still difficult to decide how large the feature set should be and what kind of features should be selected. Therefore, a feature selection step is introduced in audio analysis systems to determine the set of audio features for a selected number of classes. The type of source audio data determines the specificity of the classification stage. The most common audio classification task is to determine the speech regions in an input signal. In such a case, binary discrimination between speech and non-speech parts is performed. Automatic systems for audio classification usually categorize sound segments into several groups. For example, in broadcast news material, discrimination between speech and music is performed at the first stage. At subsequent stages, an analysis of mixtures (speech with background music or environmental noise) and specific signals (silence, noise) is often considered. Audio segmentation, as the first process performed in automatic systems for audio classification, is mostly based on Bayesian Information Criterion (BIC) se[...]

Stream-based cores mapping strategies dedicated to Networks on Chip architecture


Multi-Processor Systems on Chips (MPSoCs) are often considered suitable for streaming multimedia applications [5]. However, contemporary bus-based MPSoCs do not scale well enough to sustain the foreseen growth in the number of intellectual property (IP) cores in a single chip [2]. This is one of the reasons for the ubiquity of packet-based Network-on-Chip (NoC) architectures in [...]

Multi-core realization of audio decoders utilizing on-chip networks


Multi-Processor Systems on Chips (MPSoCs) are viewed as suitable for the future generation of streaming multimedia applications [1]. As modern bus-based and point-to-point (P2P) MPSoCs do not scale well enough to sustain the foreseen growth in the number of intellectual property (IP) cores in a single chip [5], packet-based Network-on-Chip (NoC) architectures are perceived as the necessity o[...]

Features extraction system for automatic speech recognition core mapping into an irregular Network on Chip


  As speech technologies become more and more prevalent, they require efficient algorithms in their back end. This efficiency is needed to meet speech processing time constraints, since human speech communication is characterized by its dynamic behaviour. The most popular example of a speech processing system is Automatic Speech Recognition (ASR). An ASR system requires at least two modules to accomplish speech recognition in its basic form: feature extraction, which extracts descriptor values for the input speech data, and data classification [1]. ASR processing is typically computationally intensive, but it can be split into stages to be implemented in separate computational units. Thanks to this property, such systems can benefit from parallel and distributed processing, working in a pipeline-like way and transmitting to each other streams of a relatively large, but usually fixed, amount of data. In these applications, it is usually required to keep an assumed quality of service level and meet real-time constraints [10]. Multi-Processor Systems on Chips (MPSoCs) are often considered suitable hardware implementations of these applications [5]. Although each processing unit of an MPSoC can realize a single stage of streaming application processing, it is still problematic to connect these units together. The simplest point-to-point (P2P) connections require too much space, whereas bus-based connections result in a large number of conflicts and consequently, despite various arbitration techniques, decrease the overall performance of the whole system [7]. Moreover, neither P2P nor bus-based realizations scale well with the constantly increasing number of independent Intellectual Property (IP) cores (i.e., computational units) required by contemporary devices dealing with a number of various algorithms in a single system [2]. To overcome these obstacles, the packet-based Network-on-Chip (NoC) paradigm for designs of chips realizing distributed computation has[...]

Flexible packet scheduling algorithm utilization for on-chip networks


Using contemporary technology, it is possible to implement a large number of computational units inside a single chip. Such a chip is usually referred to as a System on Chip (SoC). A typical system of this type is composed of computational elements, I/O interfaces, memory, and communication infrastructure. Due to the continuously increasing complexity of these systems and the low scalability of internal connections, system cores are connected with a Network on Chip (NoC) [1]. These networks are an alternative to the state-of-the-art point-to-point or bus-based connections and, similarly to buses, they offer a universal interface for connecting SoC elements. However, NoCs guarantee better electrical parameters of transmitted signals, higher bandwidth and are capable of tr[...]

The impact of the type of sung vowels on the singing quality


  An assessment of singing voice quality can be considered on many levels and is thus a complex process. Such a process is influenced by technical skills and medical aspects. The quality of singing may be affected by the technical skills of the singer. Some shortcomings can be observed in the intonation while singing; others may occur in timbre as the effect of a "clamped throat", or a metallic or roaring sound. Similar symptoms, induced by physiological diseases (like hoarseness or sudden jumps in pitch), may be reported in the case of technical imperfections. The evaluation of the quality of the singing voice cannot be treated separately from the quality of singing; many articles omit this connection. Long-term training experience is helpful in achieving a high quality of singing. It leads to diminishing or even avoiding mistakes when producing unusual melodic sets which are difficult for the vocalist. From a medical point of view, the quality of voice is examined for diseases of the voice apparatus. There are various diseases that are common in singers. One of them is vocal cord nodules [1], caused by unskilful singing with the effect of clenched around-laryngeal muscles. Micro damages, abrasions, wounds and small scars of the vocal cords, as well as nodules, are problems caused by a permanent lack of singing technique. Unskilful use of the voice can lead to the loss of any ability to produce sound in the musical sense. If this situation persists, it will lead to a complete loss of voice. Regurgitation of the vocal cords may be another ailment [2], which is manifested by soundlessness in speech and singing similar to a hoarse whisper. Some other problems may be caused by congestion and swelling of the vocal cords due to overload, alcohol, or respiratory disease. An inability to reach high notes effortlessly is a symptom of these disorders. Dried vocal cords (because of poor body hydration) or nicotine can cause a sudden change from closed to open [...]

Time constraints analysis of data-driven multimedia algorithms in Networks on Chip


  Modern multimedia algorithms are usually data-dominated and require an immense amount of computation [14, 15]. However, these algorithms can be divided into stages that are implementable in separate units of computation (cores). Thanks to this property, they can use parallel and distributed computing, working in a pipeline-like manner and transferring their streams in the form of a relatively large, but usually fixed, amount of data. In these applications, it is usually required to maintain the established quality of service and satisfy real-time constraints [15]. Multi-Processor Systems on Chips (MPSoCs) are often considered the best tailored hardware for these applications [10]. Each MPSoC processing unit can compute one processing step, but it is still a problem to connect these units together. The simplest point-to-point (P2P) connections require too much space, and bus-based connections linking a large number of cores result in many conflicts and consequently, in spite of various arbitration techniques, decrease the overall system performance [11]. In addition, neither P2P nor bus-based designs scale well with the growing number of independent intellectual property (IP) cores required by contemporary hardware to deal with a number of different algorithms in a single system [2]. To overcome these obstacles, the Network-on-Chip (NoC) design paradigm for implementing distributed computing systems has been introduced [4]. The growing popularity of this approach can be attributed to fewer conflicts in a chip with many cores in comparison with the traditional approaches. It is reported that the NoC architecture offers high throughput and reliable communication, but requires additional mechanisms to overcome the problems typical for packet-switched communication, such as packet deadlock or starvation. Due to the differences from traditional computer networks, popular techniques for dealing with these [...]
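
The pipeline-like processing and real-time constraints mentioned in the abstract above can be made concrete with a small timing sketch. It rests on a textbook pipeline model rather than on the analysis in the paper: each stage runs on its own core, the listed stage time is assumed to include the transfer to the next stage, and network contention is ignored. The stage names and all numbers are hypothetical.

```python
def pipeline_latency(stage_times_ms):
    """Time for one data block to pass through all stages in sequence."""
    return sum(stage_times_ms)


def pipeline_period(stage_times_ms):
    """Steady-state interval between outputs: set by the slowest stage."""
    return max(stage_times_ms)


def meets_real_time(stage_times_ms, frame_interval_ms):
    """True when the pipeline can accept one block per frame interval."""
    return pipeline_period(stage_times_ms) <= frame_interval_ms


# Hypothetical four-stage decoder, one stage per core (times in ms).
stages = {"parse": 3.0, "decode": 8.0, "filter": 5.0, "output": 2.0}
times = list(stages.values())

print(pipeline_latency(times))       # 18.0
print(pipeline_period(times))        # 8.0
print(meets_real_time(times, 10.0))  # True
print(meets_real_time(times, 6.0))   # False
```

In this simplified model the slowest stage alone decides whether the constraint holds. Packet conflicts in a real network add variable transfer delays that this sketch leaves out.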
