
Inventi Impact - Audio, Speech & Music Processing

Patent Watch

  • AUDIO AND VIDEO TRANSMISSION AND RECEPTION IN BUSINESS AND ENTERTAINMENT ENVIRONMENTS

    Transmitters convert audio and/or video signals into wireless communication signals. Smartphones, personal data assistants and/or other audio/video electronic devices are programmed to detect the wireless communication signals and display them to users. A user selects a desired signal, and the smartphone receives the signal and converts it into an audio and/or video signal, which is recorded and/or output. The transmitters are connected to or integrated into microphones, televisions, phones, or other video/audio/media sources, converting audio and/or video received from these sources. This allows for ultra-high-quality private broadcasts and reception. The programming on the smartphone, personal data assistant and/or other audio/video electronic device also allows identification, commenting and bookmarking of received audio/video, transfer of the audio/video to other devices, transcription of the audio, and other manipulation of the received signals. When specified, the device application requires payment before enabling a recording option, protecting the copyrights of a performer/presenter.

  • BROADCASTING SIGNAL RECEIVER AND METHOD FOR TRANSMITTING/RECEIVING BROADCASTING SIGNAL

    A broadcasting signal receiver and a method for transmitting/receiving a broadcasting signal are disclosed. An identifier of a cell is configured in second program table information or signaling information of the broadcasting signal. If the cell is changed, channel information of the changed cell can be obtained from the second program table information, in which transmission channel information of each cell for a broadcasting program is configured. Accordingly, the broadcasting signal receiver can continue to output the program even when the cell changes.

  • Method for storing and transmitting audio data in an audio and video data stream

    A method for storing and transmitting audio data in an audio and video data stream. Audio and video data is received by a television reception apparatus. At least part of the received audio and video data is transmitted to a processing and storage apparatus. Audio data is extracted from the transmitted audio and video data by the processing and storage apparatus, and the extracted audio data is stored on a storage medium of the processing and storage apparatus. The stored audio data is provided in a motor vehicle information and entertainment system.

  • LIGHTWEIGHT AUDIO SYSTEM FOR AUTOMOTIVE APPLICATIONS AND METHOD

    A lightweight radio/CD player for vehicular application is virtually "fastenerless" and includes a case and frontal interface formed of polymer-based material that is molded to provide details to accept audio devices such as playback mechanisms (if desired) and radio receivers, as well as the circuit boards required for electrical control and display. The case and frontal interface are of composite structure, including an insert-molded electrically conductive wire mesh screen that has been pre-formed to contour with the molding operation. The wire mesh provides EMC, RFI, BCI and ESD shielding and grounding of the circuit boards via exposed wire mesh pads and adjacent ground clips. The PCB architecture is bifurcated into a first board carrying common circuit components in a surface-mount configuration suitable for high-volume production, and a second board carrying application-specific circuit components in a wave-soldered stick-mount configuration. The major components and subassemblies are self-fixturing during the final assembly process, eliminating the need for dedicated tools, fixtures and assembly equipment. The major components and subassemblies self-interconnect by integral guide and connection features effecting "slide lock" and "snap lock" self-interconnection. The radio architecture includes improved push buttons employing a 4-bar living-hinge linkage and front-loaded decorative trim buttons.

  • AUDIO HOLE SUPPRESSION METHOD AND APPARATUS FOR A TWO-WAY RADIO

    A channel scanning technique and apparatus provides audio hole suppression in two-way radio communications. Upon detecting the absence of a carrier signal on a priority channel during a priority scan mode of operation, a training waveform is constructed upon returning to the home-channel. The training waveform is applied to audio shaping filters within an audio lineup to suppress transients and minimize or eliminate the occurrence of audio pops at a speaker output thereby reducing the audio hole.

  • SPEECH INPUT DEVICE, SPEECH RECOGNITION SYSTEM AND SPEECH RECOGNITION METHOD

    A device for speech input includes a speech input unit configured to convert a speech of a user to a speech signal; an angle detection unit configured to detect an angle of the speech input unit; a distance detection unit configured to detect a distance between the speech input unit and the user; and an input switch unit configured to control on and off of the speech input unit based on the angle and the distance.

  • SYSTEM AND METHOD OF DICTATION FOR A SPEECH RECOGNITION COMMAND SYSTEM

    In embodiments of the present invention, a system and computer-implemented method for enabling dictation may include parsing standard reports in order to identify a plurality of logical phrases in the report used for discrete sections and descriptions. In the report method, the phrases may be parsed and identifier words throughout the report may be compared to eliminate ambiguities. The method may then involve constructing text macros that follow the parsed text, thereby enabling the user to speak the identifiers to indicate full, formatted text. Finally, the report method may involve constructing a mnemonic document so both beginner and experienced users can easily read the identifiers out loud to produce a report. The result of the method is an intuitive, notes-style way to use speech commands to quickly produce a standard, formatted report.

  • Mixed lossless audio compression

    A mixed lossless audio compression has application to a unified lossy and lossless audio compression scheme that combines lossy and lossless audio compression within a same audio signal. The mixed lossless compression codes a transition frame between lossy and lossless coding frames to produce seamless transitions. The mixed lossless coding performs a lapped transform and inverse lapped transform to produce an appropriately windowed and folded pseudo-time domain frame, which can then be losslessly coded. The mixed lossless coding also can be applied for frames that exhibit poor lossy compression performance.

  • Techniques for accommodating primary content (pure voice) audio and secondary content remaining audio capability in the digital audio production process

    The invention enables the inclusion of voice and remaining audio information at different parts of the audio production process. In particular, the invention embodies special techniques for VRA-capable digital mastering, accommodation of PCPV/PCA and/or SCRA signals in audio CODECs, VRA-capable encoders and decoders, and VRA in DVD and other digital audio file formats. The invention facilitates an end-listener's voice-to-remaining audio (VRA) adjustment upon the playback of digital audio media formats by focusing on new configurations of multiple parts of the entire digital audio system, thereby enabling a new technique intended to benefit audio end-users (end-listeners) who wish to control the ratio of the primary vocal/dialog content of an audio program relative to the remaining portion of the audio content in that program. The invention facilitates storage of VRA audio programs on optical storage media, authoring systems for VRA-capable DVDs, playback hardware integrated into VRA-capable optical disc apparatus, and VRA playback hardware for use with non-VRA capable optical disc playback apparatus.
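
    A minimal Python sketch of the end-listener adjustment described above (not the patented VRA mastering or CODEC techniques): given a separately decoded primary voice stem and a remaining-audio stem, playback mixes them at a user-chosen ratio. The function name, the `vra_db` parameter and the peak-normalization step are illustrative assumptions.

    ```python
    import numpy as np

    def vra_mix(voice: np.ndarray, remaining: np.ndarray, vra_db: float) -> np.ndarray:
        """Mix the primary voice stem with the remaining audio at a user-set
        voice-to-remaining-audio ratio given in dB (positive = louder dialog)."""
        gain = 10.0 ** (vra_db / 20.0)                 # dB ratio -> linear gain
        mixed = gain * voice + remaining               # gain applied to the voice stem only
        peak = np.max(np.abs(mixed))
        return mixed / peak if peak > 1.0 else mixed   # simple clip protection

    # Example: raise dialog 6 dB above the remaining audio.
    voice = np.sin(2 * np.pi * 220 * np.linspace(0, 1, 48000))
    rest = 0.5 * np.random.randn(48000)
    out = vra_mix(voice, rest, vra_db=6.0)
    ```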

  • Spatial noise suppression for a microphone array

    A noise reduction system and a method of noise reduction includes a microphone array comprising a first microphone, a second microphone, and a third microphone. Each microphone has a known position and a known directivity pattern. An instantaneous direction-of-arrival (IDOA) module determines a first phase difference quantity and a second phase difference quantity. The first phase difference quantity is based on phase differences between non-repetitive pairs of input signals received by the first microphone and the second microphone, while the second phase difference quantity is based on phase differences between non-repetitive pairs of input signals received by the first microphone and the third microphone. A spatial noise reduction module computes an estimate of a desired signal based on a priori spatial signal-to-noise ratio and an a posteriori spatial signal-to-noise ratio based on the first and second phase difference quantities.
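
    A hedged sketch of the two phase-difference quantities the abstract refers to: per-frequency-bin phase differences between the first/second and first/third microphone signals, from which the spatial signal-to-noise ratios would be derived. The frame length and names are assumptions, not the patented IDOA module.

    ```python
    import numpy as np

    def pairwise_phase_differences(x1, x2, x3, n_fft=512):
        """Return per-bin phase differences (mic1 vs mic2, mic1 vs mic3) for one frame."""
        X1, X2, X3 = (np.fft.rfft(x, n_fft) for x in (x1, x2, x3))
        d12 = np.angle(X1 * np.conj(X2))   # first phase difference quantity
        d13 = np.angle(X1 * np.conj(X3))   # second phase difference quantity
        return d12, d13

    # One frame of synthetic three-microphone input.
    frame = np.random.randn(3, 512)
    delta12, delta13 = pairwise_phase_differences(*frame)
    ```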

  • Audio processor

    An audio processor of a loud speech communication system including a speaker and a microphone is provided. The audio processor includes: an adaptive filter wherein an amount of update in a learning event is set to an arbitrary value, and a filter coefficient is serially determined corresponding to the set amount of update; a semi-fixed filter adapted to an echo cancellation process of an audio input signal input from the microphone; adaptive filter assessment unit that calculates a length of an update vector based on the filter coefficient determined by the adaptive filter and a length of an update vector based on a filter coefficient set in the semi-fixed filter and that performs assessment of the filter coefficients in accordance with the update vectors; and coefficient specifying unit that sets an optimal filter coefficient among the filter coefficients into the semi-fixed filter in accordance with the result of the assessment of the filter coefficients performed by the adaptive filter assessment unit.

  • Correlation-based method for ambience extraction from two-channel audio signals

    A method of ambience extraction includes analyzing an input signal to determine the time-dependent and frequency-dependent amount of ambience in the input signal, wherein the amount of ambience is determined based on a signal model and correlation quantities computed from the input signals and wherein the ambience is extracted using a multiplicative time-frequency mask. Another method of ambience extraction includes compensating a bias in the estimation of a short-term cross-correlation coefficient. In addition, systems having various modules for implementing the above methods are disclosed.
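
    A minimal sketch of the correlation-driven time-frequency masking idea, assuming STFT inputs and a simple "one minus correlation magnitude" mask; the recursive smoothing constant and the mask mapping are assumptions rather than the patented signal model or bias compensation.

    ```python
    import numpy as np

    def ambience_mask(L, R, alpha=0.9, eps=1e-12):
        """L, R: complex STFTs (bins x frames). Returns a multiplicative mask in
        [0, 1] that is large where the short-term inter-channel correlation is small."""
        phi_ll = np.zeros(L.shape)
        phi_rr = np.zeros(L.shape)
        phi_lr = np.zeros(L.shape, dtype=complex)
        mask = np.zeros(L.shape)
        for t in range(L.shape[1]):
            # recursive short-term (cross-)power estimates
            phi_ll[:, t] = alpha * phi_ll[:, t - 1] + (1 - alpha) * np.abs(L[:, t]) ** 2
            phi_rr[:, t] = alpha * phi_rr[:, t - 1] + (1 - alpha) * np.abs(R[:, t]) ** 2
            phi_lr[:, t] = alpha * phi_lr[:, t - 1] + (1 - alpha) * L[:, t] * np.conj(R[:, t])
            corr = np.abs(phi_lr[:, t]) / np.sqrt(phi_ll[:, t] * phi_rr[:, t] + eps)
            mask[:, t] = 1.0 - corr        # weak correlation -> mostly ambience
        return mask
    ```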

  • Network system and audio signal processor

    An audio network system that performs transport of audio signals among nodes by cascading a plurality of nodes each including two sets of transmission I/Fs and reception I/Fs, and circulating among the nodes in each fixed period an audio transport frame generated by a master node, is configured such that the master node generates the audio transport frame in an S-th period based on the audio transport frame in an (S-k)-th period, and each of the other nodes delays the audio signals written by another node after the audio transport frame is generated by the master node until the audio transport frame is transmitted to the self node, by k period(s) with respect to the other audio signals, for use in signal processing.

  • Digital audio processing system and method

    A digital audio processing system includes an input to receive a phase component of a signal. The digital audio processing system includes symbol recognition logic to adjust a sample of the phase component using an offset value. The symbol recognition logic maps the adjusted sample to a nearest predetermined phase value of a plurality of predetermined phase values. The symbol recognition logic determines a symbol using a difference between the nearest predetermined phase value and a prior nearest predetermined phase value. The prior nearest predetermined phase value corresponds to a prior sample of the phase component of the signal. The offset value is based on a detected error of the prior sample of the phase component of the signal. The digital audio processing system also includes an output to provide a second signal that indicates the symbol.
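
    A hedged sketch of the symbol-recognition steps described: offset-adjust the phase sample, snap it to the nearest of a set of predetermined phase values, derive the symbol from the difference to the previous snapped value, and carry the detected error forward as the next offset. The four-phase constellation and the names are assumptions.

    ```python
    import numpy as np

    PHASES = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])   # predetermined phase values

    def detect_symbols(phase_samples):
        symbols, offset, prev = [], 0.0, None
        for phi in phase_samples:
            adjusted = phi + offset                                    # adjust sample using the offset
            idx = np.argmin(np.abs(np.angle(np.exp(1j * (adjusted - PHASES)))))
            nearest = PHASES[idx]                                      # nearest predetermined phase value
            offset = nearest - adjusted                                # detected error feeds the next offset
            if prev is not None:
                diff = np.mod(nearest - prev, 2 * np.pi)               # difference to prior nearest phase
                symbols.append(int(round(diff / (np.pi / 2))) % 4)     # map the difference to a symbol
            prev = nearest
        return symbols

    syms = detect_symbols([0.1, 1.5, 3.2, 4.8])
    ```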

  • Multi-channel audio encoding and decoding

    An audio encoder and decoder use architectures and techniques that improve the efficiency of multi-channel audio coding and decoding. The described strategies include various techniques and tools, which can be used in combination or independently. For example, an audio encoder performs a pre-processing multi-channel transform on multi-channel audio data, varying the transform so as to control quality. The encoder groups multiple windows from different channels into one or more tiles and outputs tile configuration information, which allows the encoder to isolate transients that appear in a particular channel with small windows, but use large windows in other channels. Using a variety of techniques, the encoder performs flexible multi-channel transforms that effectively take advantage of inter-channel correlation. An audio decoder performs corresponding processing and decoding. In addition, the decoder performs a post-processing multi-channel transform for any of multiple different purposes.

  • Sound quality control device and sound quality control method

    According to one embodiment, a sound quality control device includes: a time domain analysis module configured to perform a time-domain analysis on an audio-input signal; a frequency domain analysis module configured to perform a frequency-domain analysis on a frequency-domain signal; a first calculation module configured to calculate first speech/music scores based on the analysis results; a compensation filtering processing module configured to generate a filtered signal; a second calculation module configured to calculate second speech/music scores based on the filtered signal; a score correction module configured to generate one of corrected speech/music scores based on a difference between the first speech/music score and the second speech/music score; and a sound quality control module configured to control a sound quality of the audio-input signal based on the one of the corrected speech/music scores.

  • Personalized sound system hearing profile selection process

    A method of generating a personalized sound system hearing profile for a user. The method begins by selecting an initial profile, based on selected factors of user input. In an embodiment, the initial profile is selected based on demographic factors. Then the system identifies one or more alternate profiles, each having a selected relationship with the initial profile. The relationship between alternate profiles and the initial profile can be based on gain as a function of frequency, one alternate profile having a higher sensitivity at given frequencies and the other a lower sensitivity. The next step links at least one audio sample with the initial and alternate profiles and then plays the selected samples for the user. The system then receives identification of the preferred sample from the user; and selects a final profile based on the user's preference. An embodiment offers multiple sound samples in different modes, resulting in the selection of multiple final profiles for the different modes. Finally, the system may apply the final profile to the sound system.
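
    A minimal sketch of the selection flow under stated assumptions: the initial profile comes from a demographic guess (here age), the alternates shift gain-versus-frequency up and down, and the listener's preferred sample decides the final profile. The band layout, gain values and `step_db` are illustrative.

    ```python
    import numpy as np

    BANDS_HZ = [250, 1000, 4000, 8000]                       # gain is defined per band

    def initial_profile(age: int) -> np.ndarray:
        """Illustrative demographic starting point: more high-frequency gain with age."""
        return np.array([0.0, 2.0, 4.0, 6.0]) * (age / 60.0)

    def alternates(profile: np.ndarray, step_db: float = 3.0):
        return profile + step_db, profile - step_db          # higher / lower sensitivity

    def select_final_profile(age: int, preferred_index: int) -> np.ndarray:
        base = initial_profile(age)
        higher, lower = alternates(base)
        candidates = [base, higher, lower]                   # one audio sample would be played per profile
        return candidates[preferred_index]                   # the listener's preferred sample decides

    final = select_final_profile(age=55, preferred_index=1)  # listener preferred the brighter-sounding sample
    ```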

  • MULTIPLE CHANNEL AUDIO SYSTEM SUPPORTING DATA CHANNEL REPLACEMENT

    An audio processing system is disclosed. The audio processing system may include a processor and a transmitter and may allow a surround sound system to utilize the transmission bandwidth efficiently by adaptively transmitting a supplementary data from a secondary source in addition to the audio signals of a primary source. The processor may determine a first number of channels available for audio in the transmitter and a second number of channels available in a remote receiver that is capable of receiving the audio. The processor may cause a secondary source to adaptively communicate a combination of data from a plurality of supplementary sources in some of a plurality of channels of the audio based upon the first number of channels and the second number of channels.

  • KEYBOARD HAVING VIDEO AND AUDIO RECORDING FUNCTION

    A keyboard for recording video and audio is disclosed. The keyboard, which has a video and audio recording function, includes a universal serial bus hub coupling with a computer system; a keyboard controller, coupled with the universal serial bus hub, for controlling a keypad matrix; and a video and audio processing module coupling with the universal serial bus hub. The video and audio processing module comprises an analog-to-digital converter for receiving an analog video signal and an analog audio signal and converting them to a digital video signal and a digital audio signal, and an encoding controller for encoding the digital video signal and the digital audio signal into a formatted file according to a recording signal from the computer system and transmitting the formatted file to the computer system via the universal serial bus hub.

  • METHOD AND SYSTEM FOR NOISE CANCELLATION AND AUDIO ENHANCEMENT BASED ON CAPTURED DEPTH INFORMATION

    A monoscopic camera comprising one or more image sensors and a depth sensor may generate video based on two-dimensional image data captured via the one or more image sensors and corresponding depth information captured via the depth sensor. The camera may process corresponding audio for the generated video based on the captured depth information. The audio processing may comprise mitigating noise in the corresponding audio, enhancing voice quality in the corresponding audio, and/or enhancing audio quality of the corresponding audio. The camera may be operable to determine, based on the captured depth information, one or more sound paths between a source of the corresponding audio and a microphone utilized to capture the corresponding audio emanating from the source. The processing of the audio may comprise removing portions of the captured audio arriving at the microphone via one or more reflection paths.

  • AUDIO PROCESSING APPARATUS, AUDIO PROCESSING METHOD, AND PROGRAM

    An audio processing apparatus is disclosed which includes: a first signal generation portion configured to generate an audio signal of which the frequency is varied over time; an operation portion; a storage portion configured to store characteristic information in accordance with the frequency and an amplitude of the audio signal in effect when the operation portion is operated; a reproduction portion configured to reproduce audio data; and a correction portion configured to correct a reproduced signal from the reproduction portion based on the characteristic information stored in the storage portion.

  • PROCESSOR EXTENSIONS FOR ACCELERATING SPECTRAL BAND REPLICATION

    Enhancements to hardware architectures (e.g., a RISC processor or a DSP processor) to accelerate spectral band replication (SBR) processing are described. In some embodiments, instruction extensions configure a reconfigurable processor to accelerate SBR and other audio processing. In addition to the instruction extensions, execution units (e.g., multiplication and accumulation units (MACs)) may operate in parallel to reduce the number of audio processing cycles. Performance may be further enhanced through the use of source and destination units which are configured to work with the execution units and quickly fetch and store source and destination operands.

  • AUDIO PROCESSING WITH TIME ADVANCED INSERTED PAYLOAD SIGNAL

    An audio processing apparatus for modifying a primary audio signal includes a modulator that increases or decreases a level of a noise signal generated by a noise generator, in response to an increase or a decrease of a detected signal level of the primary audio signal, to generate a modulated noise signal. The apparatus further includes a combiner that combines the primary audio signal and the modulated noise signal. The modulator operates, with respect to a signal delayer, to time-advance a decrease in the level of said noise signal based on a corresponding decrease in the signal level of the primary audio signal.

  • AUDIO PROCESSING DEVICE, AUDIO PROCESSING METHOD, AND PROGRAM

    There is provided an audio processing device including an estimation unit configured to estimate a user's representative perceived position of a stereoscopic image from a difference between a left-eye image and a right-eye image of the stereoscopic image displayed on a display device, and an audio controller configured to control audio output of an audio output device in accordance with the representative perceived position estimated by the estimation unit.

  • Audio Processing Apparatus and Related Method

    An audio processing apparatus including an audio phase detecting device and an adjusting device is provided. After detecting a phase relationship between a first channel signal and a second channel signal, the audio phase detecting device generates a phase control signal. The adjusting device is coupled to the audio phase detecting device and used for selectively adjusting the first channel signal according to the phase control signal.

  • SPEECH PROCESSING SYSTEM AND METHOD

    A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speech input signal, wherein adapting the acoustic model to the mismatched speaker input comprises: relating speech from the mismatched speaker input to the speech used to train the acoustic model using: a mismatch function f for primarily modelling differences between the environment of the speaker and the environment under which the acoustic model was trained; and a speaker transform F for primarily modelling differences between the speaker of the mismatched speaker input, such that: y=f(F(x,v),u) where y represents the speech from the mismatched speaker input, x is the speech used to train the acoustic model, u represents at least one parameter for modelling changes in the environment and v represents at least one parameter used for mapping differences between speakers; and jointly estimating u and v.
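
    A toy sketch of the composition y = f(F(x, v), u) defined above, with a per-dimension scaling as the speaker transform F, an additive bias as the mismatch function f, and a naive joint grid search standing in for the joint estimation of u and v. All of these concrete choices are assumptions, not the patented models.

    ```python
    import numpy as np

    def F(x, v):                      # speaker transform (toy: per-dimension scaling)
        return v * x

    def f(z, u):                      # environment mismatch function (toy: additive bias)
        return z + u

    def predict(x, u, v):
        return f(F(x, v), u)          # y = f(F(x, v), u)

    def joint_estimate(x, y, u_grid, v_grid):
        """Jointly pick (u, v) minimizing the squared prediction error over a grid."""
        best = min(((np.sum((y - predict(x, u, v)) ** 2), u, v)
                    for u in u_grid for v in v_grid), key=lambda t: t[0])
        return best[1], best[2]

    x = np.array([1.0, 2.0, 3.0])                 # stands in for training-condition speech features
    y = predict(x, u=0.5, v=1.2)                  # mismatched observation
    u_hat, v_hat = joint_estimate(x, y, np.linspace(0.0, 1.0, 11), np.linspace(0.8, 1.5, 8))
    ```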

  • SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT FOR SPEECH PROCESSING

    According to one embodiment, a speech processing device includes an utterance error occurrence determination information storage unit that stores utterance error occurrence determination information; a related word information storage unit that stores related word information including words; an utterance error occurrence determining unit that compares each of the divided words with the conditions, gives the corresponding error pattern to a word that matches a condition, and determines that a word which does not match any condition does not cause an utterance error; and a phoneme string generating unit that generates a phoneme string of the utterance error. When the error pattern associated with one of the conditions is a speech error, the utterance error occurrence determining unit further obtains an incorrectly spoken word from the related word information, and the phoneme string generating unit generates a phoneme string of the incorrectly spoken word.

  • SPEECH RECOGNITION AND VOICE TRAINING DATA STORAGE AND ACCESS METHODS AND APPARATUS

    Embodiments include a speech recognition system and a personal speech profile data (PSPD) storage device that is physically distinct from the speech recognition system. In the speech recognition system, a PSPD interface receives voice training data, which is associated with an individual, from the PSPD storage device. A speech input module produces a digital speech signal derived from an utterance made by a system user. A speech processing module accesses voice training data stored on the PSPD storage device through the PSPD interface, and executes a speech processing algorithm that analyzes the digital speech signal using the voice training data, in order to identify one or more recognized terms from the digital speech signal. A command processing module initiates execution of various applications based on the recognized terms. Embodiments may be implemented in various types of host systems, including an aircraft cockpit-based system.

  • SPEECH AUDIO PROCESSING

    A speech processing engine is provided that in some embodiments, employs Kalman filtering with a particular speaker's glottal information to clean up an audio speech signal for more efficient automatic speech recognition.

  • AUDIO PROCESSING SYSTEM

    The audio processing system disclosed in the invention comprises an audio processor and an audio amplifier. The audio processor receives a data signal to generate a processed signal, and comprises at least one gain control circuit and at least one operational amplifier. The gain control circuit generates a gain signal according to a volume control signal, a reference signal, and a feedback signal. The operational amplifier couples to the gain control circuit and amplifies the data signal by the gain signal to generate a processed signal. The audio amplifier couples to the audio processor to receive and amplify the processed signal, wherein an amplified signal is generated.

  • AUDIO PROCESSING APPARATUS AND METHOD, AND PROGRAM

    An audio processing apparatus includes an audio signal acquisition unit which acquires an audio signal of a musical piece, a feature value extraction unit which extracts a predetermined type of feature value from the audio signal acquired by the audio signal acquisition unit in time series, a change point detection unit which detects a change point in which the amount of change of the feature value extracted in time series by the feature value extraction unit is changed to be greater than a predetermined threshold value, a hook analysis unit which analyzes a hook place of the audio signal based on the feature value extracted by the feature value extraction unit in block units with the change point detected by the change point detection unit as a boundary, and a hook information output unit which outputs the hook place analyzed by the hook analysis unit as hook information.

  • METHOD OF DETERMINING PARAMETERS IN AN ADAPTIVE AUDIO PROCESSING ALGORITHM AND AN AUDIO PROCESSING SYSTEM

    A method and an audio processing system determine a system parameter, e.g. step size, in an adaptive algorithm, e.g. an adaptive feedback cancellation algorithm, so as to provide an alternative scheme for feedback estimation in a multi-microphone audio processing system. A feedback part of the system's open-loop transfer function is estimated and separated into a transient part and a steady-state part, which can be used to control the adaptation rate of the adaptive feedback cancellation algorithm by adjusting the system parameter, e.g. the step-size parameter, of the algorithm when desired system properties, such as a steady-state value or a convergence rate of the feedback, are specified. The method can be used for different adaptation algorithms such as LMS, NLMS, RLS, etc. in hearing aids, headsets, handsfree telephone systems, teleconferencing systems, public address systems, etc.
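
    A hedged sketch of one way to realize the step-size control described, using NLMS as the adaptive algorithm: while the coefficient updates are large (transient part) the filter adapts quickly, and once they settle (steady-state part) the step size is reduced. The thresholds and the settling test are assumptions.

    ```python
    import numpy as np

    def nlms_feedback_canceller(x, d, taps=32, mu_fast=0.5, mu_slow=0.05, eps=1e-8):
        """x: loudspeaker (reference) signal, d: microphone signal.
        Returns the estimated feedback path and the error (feedback-reduced) signal."""
        w = np.zeros(taps)
        mu = mu_fast
        e = np.zeros(len(x))
        for n in range(taps, len(x)):
            u = x[n - taps:n][::-1]                    # most recent reference samples
            e[n] = d[n] - w @ u                        # error = mic minus estimated feedback
            delta = mu * e[n] * u / (u @ u + eps)      # NLMS coefficient update
            w += delta
            if np.linalg.norm(delta) < 1e-3:           # small updates -> steady-state part
                mu = mu_slow                           # reduce the adaptation rate
        return w, e

    x = np.random.randn(4000)
    d = np.convolve(x, [0.0, 0.3, -0.1], mode="full")[:4000] + 0.01 * np.random.randn(4000)
    w_hat, err = nlms_feedback_canceller(x, d)
    ```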

  • AUDIO PROCESSING IN A MULTI-PARTICIPANT CONFERENCE

    A first computing device distributes audio signals to several computing devices of participants in a communication session. In some embodiments, the first computing device serves as a central distributor for receiving audio signals from other computing devices, compositing the audio signals and distributing the composited audio signals to the other computing devices. The first computing device prioritizes the received audio signals based on a set of criteria and selects several highly prioritized audio signals. The first computing device generates composite audio signals using only the selected audio signals. The first computing device sends each computing device the composited audio signal for the device. In some cases, the first computing device sends a selected audio signal to another computing device without mixing the signal with any other audio signal.
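
    A minimal sketch, assuming short-term energy as the prioritization criterion and a fixed top-N selection: each participant receives a composite built only from the selected other participants' streams. The names and the averaging rule are illustrative.

    ```python
    import numpy as np

    def composite_per_participant(streams: dict, top_n: int = 3) -> dict:
        """streams: {participant_id: samples}. One composite per participant,
        mixed only from the top-N prioritized streams, excluding the participant's own."""
        energy = {pid: float(np.mean(s ** 2)) for pid, s in streams.items()}
        selected = sorted(streams, key=energy.get, reverse=True)[:top_n]   # prioritized selection
        mixes = {}
        for pid, s in streams.items():
            others = [streams[o] for o in selected if o != pid]
            mixes[pid] = sum(others) / len(others) if others else np.zeros_like(s)
        return mixes

    calls = {name: np.random.randn(8000) * level
             for name, level in [("alice", 1.0), ("bob", 0.5), ("carol", 0.1), ("dave", 0.05)]}
    out = composite_per_participant(calls)
    ```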

  • AUDIO PROCESSING BASED ON SCENE TYPE

    A digital camera system providing processed audio signals, comprising: an image sensor for capturing a digital image; an optical system for forming an image of a scene onto the image sensor; a microphone for capturing an audio signal; a data processing system; a storage memory for storing captured images and audio signals; and a program memory communicatively connected to the data processing system and storing instructions configured to cause the data processing system to implement a method for providing processed audio signals, wherein the instructions include: capturing one or more digital images of a scene using the image sensor and capturing a corresponding audio signal using the microphone; determining a scene type corresponding to the captured digital images; processing the captured audio signal responsive to the determined scene type; and recording the captured digital images together with the processed audio signal in the storage memory.
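
    A hedged sketch of the scene-dependent dispatch: the determined scene type selects the audio processing applied before recording. The scene labels and the per-scene processing choices are illustrative assumptions.

    ```python
    import numpy as np

    def process_for_scene(audio: np.ndarray, scene: str) -> np.ndarray:
        """Apply scene-dependent processing to the captured audio signal."""
        if scene == "concert":
            return audio                                      # preserve full dynamics
        if scene == "interview":
            return np.clip(audio * 2.0, -1.0, 1.0)            # emphasize speech level
        if scene == "sports":
            smoother = np.ones(5) / 5.0
            return np.convolve(audio, smoother, mode="same")  # soften crowd/wind noise
        return audio                                          # default: pass through

    processed = process_for_scene(np.random.randn(48000) * 0.1, scene="interview")
    ```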

  • Voice audio processing method and device

    The invention relates to a method for audio switching and conferencing. The method comprises: providing a plurality of audio channels comprising at least one active audio channel, the active audio channel comprising at least one of an input audio stream and an output audio stream; converting the input audio streams from the at least one audio channel into input data; providing audio channel communication requests between parties of the at least one active audio channel; determining a set of Boolean values depending on the parties of the audio channels; determining output data for the respective active audio channels by combining the elements of the Boolean set and the input data; and encoding the output data into output audio streams for the respective active audio channels. In this way an efficient and consistent method for audio switching and conferencing is obtained, which reduces the complexity of software and/or hardware, supports a large number of telephone calls or simultaneous conferences between multiple groups, and enables simple implementation of special functions such as eavesdropping and microphone functions.

  • AUDIO PROCESSING DEVICE AND AUDIO PROCESSING METHOD

    There is provided a sound processing apparatus and a sound processing method which are capable of reproducing discrete data with a high-quality sound matching users' preferences. In a sound processing means 2, since an interpolation value reflecting a value of a variable parameter α, by which the value of a control sampling function c_0(t) is multiplied, can be calculated, an analog signal obtained through the interpolation performed in a sampling function s_N(t) can be regulated in accordance with the variable parameter α by changing the value of the variable parameter α. In this way, by allowing the user to appropriately change the variable parameter α in accordance with various conditions including music reproduction environments, sound sources, musical tones and so on, it becomes possible to reproduce high-quality-sound music in which the frequency characteristics of the analog signal have changed and a high quality desired by the user is obtained.

  • TERMINAL APPARATUS AND SPEECH PROCESSING PROGRAM

    A terminal apparatus configured to obtain positional information indicating a position of another apparatus; to obtain positional information indicating a position of the terminal apparatus; to obtain a first direction, which is a direction to the obtained position of the another apparatus and calculated using the obtained position of the terminal apparatus; to obtain a second direction, which is a direction in which the terminal apparatus is oriented; to obtain inclination information indicating whether the terminal apparatus is inclined to the right or to the left; to switch an amount of correction for a relative angle between the first direction and the second direction in accordance with whether the obtained inclination information indicates an inclination to the right or an inclination to the left; and to determine an attribute of speech output from a speech output unit in accordance with the relative angle corrected by the amount of correction.
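
    A minimal sketch of the angle arithmetic described: compute the bearing to the other apparatus (first direction), subtract the terminal's own heading (second direction), and apply a correction whose sign follows the left/right inclination. The correction amount and coordinate convention are assumptions.

    ```python
    import math

    def corrected_relative_angle(self_pos, other_pos, heading_deg, inclined_right,
                                 correction_deg=15.0):
        """Signed relative angle (degrees) between the direction to the other
        apparatus and the terminal's facing direction, corrected for inclination."""
        dx = other_pos[0] - self_pos[0]
        dy = other_pos[1] - self_pos[1]
        bearing = math.degrees(math.atan2(dx, dy)) % 360.0        # first direction
        rel = (bearing - heading_deg + 180.0) % 360.0 - 180.0     # relative to the second direction
        rel += correction_deg if inclined_right else -correction_deg
        return rel

    angle = corrected_relative_angle((0.0, 0.0), (10.0, 10.0), heading_deg=90.0, inclined_right=True)
    ```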

  • JOINT FACTOR ANALYSIS SCORING FOR SPEECH PROCESSING SYSTEMS

    Method, system, and computer program product are provided for Joint Factor Analysis (JFA) scoring in speech processing systems. The method includes: carrying out an enrolment session offline to enrol a speaker model in a speech processing system using JFA, including: extracting speaker factors from the enrolment session; estimating first components of channel factors from the enrolment session. The method further includes: carrying out a test session including: calculating second components of channel factors strongly dependent on the test session; and generating a score based on speaker factors, channel factors, and test session Gaussian mixture model sufficient statistics to provide a log-likelihood ratio for a test session.

  • AUDIO PROCESSING APPARATUS AND METHOD OF CONTROLLING THE AUDIO PROCESSING APPARATUS

    An audio processing apparatus includes first and second audio pickup units. The second audio pickup unit includes an audio resistor provided to cover a sound receiving portion to suppress external wind introduction while passing an external audio. A first filter attenuates a signal having a frequency lower than a first cutoff frequency of the output signal of a first A/D converter. A second filter attenuates a signal having a frequency higher than a second cutoff frequency of the output signal of a second A/D converter. A third filter is provided between the first audio pickup unit and the first A/D converter to attenuate a signal having a frequency lower than a third cutoff frequency for suppressing the wind noise.

  • Multi-Way Analysis for Audio Processing

    It is disclosed to determine, for a direction being at least associated with a value of a first direction component and with a value of a second direction component, at least one weighting factor for each basis function of a set of basis functions, each of the basis functions being associated with an audio transfer characteristic, wherein said determining is at least based on a first set of gain factors, associated with the first direction component, and on a second set of gain factors, associated with the second direction component.
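
    A minimal sketch under a separability assumption: the weighting factor for each basis function is formed by combining a gain tied to the first direction component (e.g., azimuth) with a gain tied to the second (e.g., elevation). The element-wise product is an assumed combination rule, not necessarily the disclosed one.

    ```python
    import numpy as np

    def weighting_factors(first_gains: np.ndarray, second_gains: np.ndarray) -> np.ndarray:
        """One weighting factor per basis function, combining the gain sets for
        the two direction components element-wise."""
        return first_gains * second_gains

    g_first = np.array([0.9, 0.4, 0.1])        # first set of gain factors (first direction component)
    g_second = np.array([0.8, 0.6, 0.3])       # second set of gain factors (second direction component)
    w = weighting_factors(g_first, g_second)   # weighting factor for each basis function
    ```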

  • Background Audio Processing

    Some embodiments provide a media-editing application for identifying a set of background audio segments of a composite presentation, generating additional background audio segments based on several identified background audio segments, detecting regions of the composite presentation that lack background audio, and inserting generated background audio into the detected regions of the composite presentation.

  • SYSTEM FOR MULTICHANNEL MULTITRACK AUDIO AND AUDIO PROCESSING METHOD THEREOF

    A multichannel multitrack audio system and an audio processing method are provided. The audio processing method down-mixes and encodes a first audio object constituting the audio from multiple channels to a lower number of channels. Thus, by down-mixing the audio objects from the multiple channels to the lower number of channels, the method generates multichannel multi-object audio and reproduces the generated multichannel multi-object audio. An abrupt increase in data can be addressed in processing the multichannel multi-object audio.

  • AUDIO PROCESSING APPARATUS

    An audio processing section generates at least a left-front audio signal, a right-front audio signal, a second left audio signal that is the same as the left-front audio signal, and a second right audio signal that is the same as the right-front audio signal, based on a left-front audio signal and a right-front audio signal supplied from an audio selector, outputs these signals to a main output terminal, and supplies the second left audio signal and the second right audio signal also to a switch section. When a first mode is selected, the switch section selects the audio signal from the audio processing section and supplies it to a ZONE2 output terminal. On the other hand, when modes other than the first mode are selected, the switch section selects an audio signal from the audio selector and supplies the selected audio signal to the ZONE2 output terminal.

  • SPEECH PROCESSING RESPONSIVE TO A DETERMINED ACTIVE COMMUNICATION ZONE IN A VEHICLE

    A system for and method of speech processing for a vehicle. Speech is received from at least one vehicle occupant via a plurality of microphones corresponding to the plurality of zones in the vehicle, wherein the microphones convert the speech into speech signals. At least one active communication zone is determined in which the at least one vehicle occupant corresponding to the active communication zone is speaking. Speech processing is modified in response to the determined active communication zone.

  • SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT

    According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.

  • METHOD AND APPARATUS FOR MULTI-CHANNEL AUDIO PROCESSING USING SINGLE-CHANNEL COMPONENTS

    Processing multi-channel audio streams using one or more arrangements of single-channel components. Components that only process the near-end, or capture stream, such as noise suppression (NS) components, are limited in how they can be suitably arranged for processing multi-channel streams. However, components that process the near-end stream using one or more inputs from the far-end, or render stream, such as acoustic echo cancellation (AEC) and automatic gain control (AGC) components, are arranged in one or more of the ways suitable for use with multiple channels.

  • Time Scaling of Audio Frames to Adapt Audio Processing to Communications Network Timing

    Methods and apparatus for coordinating audio data processing and network communication processing in a communication device by using time scaling for either inbound or outbound audio data processing, or both. In particular, time scaling of audio data is used to adapt the timing for audio data processing to the timing for modem processing, by dynamically adjusting a collection of audio samples to fit the container size required by the modem. Speech quality can be preserved while recovering and/or maintaining correct synchronization between audio processing and communication processing circuits. In an example method, it is determined that a completion time for processing a first audio data frame falls outside a pre-determined timing window. Responsive to this determination, a subsequent audio data frame is time-scaled to control the completion time for processing the subsequent audio data frame.
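
    A minimal sketch of the core operation, assuming simple linear-interpolation resampling as the time-scaling step and an illustrative timing window: when the previous frame's completion time falls outside the window, the next frame is stretched or compressed to the container size the modem expects.

    ```python
    import numpy as np

    def time_scale_to_container(frame: np.ndarray, container_size: int) -> np.ndarray:
        """Stretch or compress an audio frame so it holds exactly container_size samples."""
        src = np.linspace(0.0, 1.0, num=len(frame))
        dst = np.linspace(0.0, 1.0, num=container_size)
        return np.interp(dst, src, frame)                 # linear-interpolation time scaling

    def adapt_next_frame(frame, container_size, completion_ms, window_ms=(18.0, 22.0)):
        """Time-scale the subsequent frame if the previous completion time missed the window."""
        if not (window_ms[0] <= completion_ms <= window_ms[1]):
            return time_scale_to_container(frame, container_size)
        return frame

    out = adapt_next_frame(np.random.randn(944), container_size=960, completion_ms=24.0)
    ```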

  • SPEECH PROCESSING SYSTEM AND METHOD

    A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are updated as further segments of speech relating to a speaker profile are processed.

  • METHOD OF PROVIDING DYNAMIC SPEECH PROCESSING SERVICES DURING VARIABLE NETWORK CONNECTIVITY

    A client device for providing dynamic speech processing services during variable network connectivity with a network server includes a connection monitor that monitors network connectivity between the client device and the network server. The device further includes a simplified speech processor that processes speech data and is initiated based on an assessment from the connection monitor that the network connectivity is impaired. The device further includes a speech data storage that stores processed speech data from the simplified speech processor and a transmitter that is configured to transmit the stored speech data to the network.

  • Gain Control Device for an Amplifier and Related Methods, and an Audio Processing Device

    A gain control device for an amplifier and related methods, and an audio processing device are described herein. In one aspect, a gain control device for an amplifier includes: a receiver module configured to receive a control signal; a gain control module configured to control the gain of said amplifier based on said control signal. In another aspect, an audio processing device includes: a microphone; an audio player; and an amplifier, wherein said amplifier includes a gain control device configured to receive a control signal and to control the gain of said amplifier based on the received control signal. In another aspect, a method for controlling the gain of an amplifier includes: receiving a control signal; controlling the gain of said amplifier based on said control signal. The described methods and device reduce the application cost and size.

  • AUDIO PROCESSING DEVICE

    An audio processing device includes a first audio collecting unit configured to convert audio vibration into an electric signal and acquire an audio signal; a shielding unit having a predetermined resonant frequency that shields the first audio collecting unit from the influence of airflow outside the device; and an acquiring unit configured to acquire, as a first audio signal, an audio signal in a predetermined frequency band lower than the resonant frequency of the shielding unit from among the audio signal acquired by the first audio collecting unit that is shielded from the influence of the airflow outside the device by the shielding unit.

  • SYSTEM AND METHOD FOR MONAURAL AUDIO PROCESSING BASED PRESERVING SPEECH INFORMATION

    A method, system and machine readable medium for noise reduction is provided. The method includes: (1) receiving a noise corrupted signal; (2) transforming the noise corrupted signal to a time-frequency domain representation; (3) determining probabilistic bases for operation, the probabilistic bases being priors in a multitude of frequency bands calculated online; (4) adapting longer term internal states of the method; (5) calculating present distributions that fit data; (6) generating non-linear filters that minimize entropy of speech and maximize entropy of noise, thereby reducing the impact of noise while enhancing speech; (7) applying the filters to create a primary output in a frequency domain; and (8) transforming the primary output to the time domain and outputting a noise suppressed signal.

  • AUDIO PROCESSING DEVICE, SYSTEM, USE AND METHOD

    An audio processing device includes a) an input unit for converting a time domain input signal to a number N_I of input frequency bands and b) an output unit for converting a number N_O of output frequency bands to a time domain output signal. A signal processing unit processes the input signal in a number N_P of processing channels, smaller than the number N_I of input frequency bands. A frequency band allocation unit allocates input frequency bands to processing channels. A frequency band redistribution unit redistributes processing channels to output frequency bands, and a control unit dynamically controls the allocation of input frequency bands to processing channels and the redistribution of processing channels to output frequency bands.
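
    A minimal sketch of the band allocation and redistribution, assuming an even split of input bands over processing channels and a per-channel mean as a stand-in for the actual processing; the real device controls these mappings dynamically.

    ```python
    import numpy as np

    def allocate(n_bands: int, n_channels: int) -> np.ndarray:
        """Map each of n_bands frequency bands to one of n_channels processing channels."""
        return np.floor(np.arange(n_bands) * n_channels / n_bands).astype(int)

    def process_and_redistribute(input_bands: np.ndarray, n_proc: int, n_output: int) -> np.ndarray:
        """N_I input bands -> N_P processing channels -> N_O output bands."""
        alloc = allocate(len(input_bands), n_proc)                                    # input bands -> channels
        channels = np.array([input_bands[alloc == c].mean() for c in range(n_proc)])  # stand-in processing
        redist = allocate(n_output, n_proc)                                           # output bands -> channels
        return channels[redist]

    out_bands = process_and_redistribute(np.random.rand(64), n_proc=16, n_output=32)
    ```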

  • SYSTEMS AND METHODS FOR SPEECH PROCESSING

    Systems and methods described herein modify audio content on an electronic device. Embodiments can be configured to detect a mode of the electronic device to determine whether the device is in a telephone mode; receive a speech signal from a speech source while the device is in the telephone mode; and process the speech signal to improve the perceived quality of the speech at a recipient when the electronic device is in a telephone mode; wherein processing the speech signal to improve the perceived quality of the speech comprises decreasing the signal level of audio content outside of a determined frequency band relative to the signal level of the audio content within the determined frequency band; and wherein the determined frequency band is a frequency band associated with a vocal range of the anticipated speech content.
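
    A minimal sketch of the band-emphasis step described, assuming a fixed 300–3400 Hz band and a flat out-of-band attenuation applied in the FFT domain; the real system would determine the band from the anticipated vocal range.

    ```python
    import numpy as np

    def emphasize_band(x: np.ndarray, fs: int, lo=300.0, hi=3400.0, atten_db=12.0) -> np.ndarray:
        """Lower the level of content outside [lo, hi] Hz relative to content inside it."""
        X = np.fft.rfft(x)
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        gain = np.where((freqs >= lo) & (freqs <= hi), 1.0, 10.0 ** (-atten_db / 20.0))
        return np.fft.irfft(X * gain, n=len(x))

    fs = 16000
    speech = np.random.randn(fs)              # stand-in for the received speech signal
    processed = emphasize_band(speech, fs)
    ```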

  • SPEECH PROCESSING APPARATUS, SPEECH PROCESSING METHOD AND PROGRAM

    The present invention relates to a speech processing apparatus, a speech processing method and a program which, when multichannel audio signals are downmixed and coded, prevent delay and an increase in the computation amount upon decoding of the audio signals. An inverse multiplexing unit (101) acquires coded data on which a BC parameter is multiplexed. An uncorrelated frequency-time transform unit (102) performs IMDCT transform and IMDST transform of frequency spectrum coefficients of a monaural signal (X_M) obtained from this coded data to generate the monaural signal (X_M), which is a time domain signal, and a signal (X_D') which is substantially uncorrelated with this monaural signal (X_M). The stereo synthesis unit (103) generates a stereo signal by synthesizing the monaural signal (X_M) and the signal (X_D') using the BC parameter. The present invention is applicable to, for example, a speech processing apparatus which decodes a downmixed and coded stereo signal.

  • HEARING AID ALGORITHMS

    The invention relates to a method of operating an audio processing device. The invention further relates to an audio processing device, to a software program and to a medium having instructions stored thereon. The object of the present invention is to provide improvements in the processing of sounds in listening devices. The problem is solved by a method comprising a) receiving an electric input signal representing an audio signal; b) providing an event-control parameter indicative of changes related to the electric input signal and for controlling the processing of the electric input signal; c) storing a representation of the electric input signal or a part thereof; d) providing a processed electric output signal with a configurable delay based on the stored representation of the electric input signal or a part thereof and controlled by the event-control parameter. The invention may e.g. be used in hearing instruments, headphones or headsets or active ear plugs.

  • MULTIFUNCTIONAL ELECTRONIC ACCESSORY

    An electronic accessory for an electronic device includes an audio processing circuit. The audio processing circuit includes a first switch circuit, an echo cancellation circuit and an amplification circuit. The echo cancellation circuit removes echoes from a first voice signal from a user in a call and a second voice signal from a caller, outputs the first voice signal to the electronic device via the first switch circuit, and outputs the second voice signal to the amplification circuit. The amplification circuit amplifies the second voice signal. The amplification circuit further directly receives and amplifies a third voice signal from multimedia content stored in the electronic device via the first switch circuit.

  • APPARATUS FOR PROCESSING AN AUDIO SIGNAL AND METHOD THEREOF

    A method of processing an audio signal is disclosed. The present invention includes a method for processing an audio signal, comprising: receiving, by an audio processing apparatus, spectral data including a current block, and substitution type information indicating whether to apply a shape prediction scheme to the current block; when the substitution type information indicates that the shape prediction scheme is applied to the current block, receiving lag information indicating an interval between spectral coefficients of the current block and the predictive shape vector of a current frame or a previous frame; and obtaining spectral coefficients by substituting for a spectral hole included in the current block using the predictive shape vector.

  • METHODS AND APPARATUS FOR GENERATING, UPDATING AND DISTRIBUTING SPEECH RECOGNITION MODELS

    Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability. Voice dialing, telephone control and/or other services are provided by the speech processing facility in response to speech recognition results.

  • Method and Device for Providing Speech-to-Text Encoding and Telephony Service

    A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.

  • System and Method for Dynamic Noise Adaptation for Robust Automatic Speech Recognition

    A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.
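
    A minimal sketch of the selection/re-weighting idea: posterior weights for the DNA and null-noise models are computed from their likelihoods and used to blend their outputs. The likelihoods and outputs are placeholders; this is not the patent's actual DNA math.

```python
import numpy as np

def blend_models(loglik_dna, loglik_null, out_dna, out_null, prior_dna=0.5):
    """Bayesian re-weighting of two noise-compensation models.

    loglik_* : log-likelihood of the observed frame under each model
    out_*    : each model's compensated feature vector for the frame
    """
    log_post = np.array([np.log(prior_dna) + loglik_dna,
                         np.log(1.0 - prior_dna) + loglik_null])
    log_post -= np.max(log_post)              # numerical stability
    w = np.exp(log_post)
    w /= w.sum()                              # posterior model probabilities
    return w[0] * out_dna + w[1] * out_null   # modified (blended) output
```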

  • MARKUP ASSISTANCE APPARATUS, METHOD AND PROGRAM

    According to one embodiment, a markup assistance apparatus includes an acquisition unit, a first calculation unit, a detection unit and a presentation unit. The acquisition unit acquires a feature amount for each tag, each of the tags being used to control text-to-speech processing of a markup text. The first calculation unit calculates, for each character string, a variance of the feature amounts of the tags assigned to that character string in the markup text. The detection unit detects a first character string, assigned a first tag and having a variance not less than a first threshold value, as a first candidate including a tag to be corrected. The presentation unit presents the first candidate.
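
    A small sketch of the detection rule, assuming each tag can be reduced to a scalar feature amount (e.g. a pitch or rate setting): strings whose assigned tags have a feature variance at or above the threshold are flagged as correction candidates. Data layout and names are assumptions.

```python
import numpy as np

def find_correction_candidates(tagged_text, threshold):
    """tagged_text: dict mapping a character string to the list of
    feature amounts of the tags assigned to it across the markup text.
    Returns the strings whose tag features vary by at least 'threshold'."""
    candidates = []
    for string, features in tagged_text.items():
        if len(features) > 1 and np.var(features) >= threshold:
            candidates.append(string)
    return candidates

# usage: "hello" is tagged inconsistently, so it is flagged
print(find_correction_candidates(
    {"hello": [1.0, 1.1, 3.5], "world": [0.9, 1.0]}, threshold=0.5))
```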

  • System and Method of Dynamically Modifying a Spoken Dialog System to Reduce Hardware Requirements

    A system and method for providing a scalable spoken dialog system are disclosed. The method comprises receiving information which may be internal to the system or external to the system and dynamically modifying at least one module within a spoken dialog system according to the received information. The modules may be one or more of an automatic speech recognition, natural language understanding, dialog management and text-to-speech module or engine. Dynamically modifying the module may improve hardware performance or improve a specific caller's speech processing accuracy, for example. The modification of the modules or hardware may also be based on an application or a task, or based on a current portion of a dialog.

  • AUDIO PROCESSING COMPRESSION SYSTEM USING LEVEL-DEPENDENT CHANNELS

    Disclosed herein, among other things, are methods and apparatus for a level-dependent compression system for hearing assistance devices, such as hearing aids. The present subject matter includes a hearing assistance device having a buffer for receiving time domain input signals and a frequency analysis module to convert the time domain input signals into a plurality of subband signals. A power detector is adapted to receive the subband signals and to provide subband versions of the input signals. A nonlinear gain stage applies gain to the plurality of subband versions of the input signals, and a frequency synthesis module processes the subband signals from the nonlinear gain stage to create a processed output signal. The device also includes a filter for filtering the signals, and a level-dependent compression module. The level-dependent compression module is adapted to provide bandwidth control to the plurality of subband signals produced by the frequency analysis stage.
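
    A toy sketch of the power detector and nonlinear gain stage only, assuming per-frame subband magnitudes are already available from the analysis filterbank; the threshold/ratio compressor law is a stand-in for the patent's level-dependent rules.

```python
import numpy as np

def compress_subbands(subband_frames, threshold_db=-40.0, ratio=3.0, smooth=0.9):
    """subband_frames: array of shape (n_frames, n_bands) holding the
    magnitude of each subband signal per frame. Returns per-frame,
    per-band linear gains from a simple downward compressor."""
    n_frames, n_bands = subband_frames.shape
    level = np.zeros(n_bands)                    # smoothed power detector state
    gains = np.ones((n_frames, n_bands))
    for t in range(n_frames):
        power = subband_frames[t] ** 2
        level = smooth * level + (1.0 - smooth) * power
        level_db = 10.0 * np.log10(level + 1e-12)
        over = np.maximum(0.0, level_db - threshold_db)
        gain_db = -over * (1.0 - 1.0 / ratio)    # compress only above threshold
        gains[t] = 10.0 ** (gain_db / 20.0)
    return gains
```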

  • AUDIO PROCESSING DEVICE, AUDIO PROCESSING METHOD, RECORDING MEDIUM, AND PROGRAM

    An audio processing device includes a plurality of loudspeakers outputting audio for each band, a correction filter correcting an audio signal including a plurality of bands in accordance with characteristics of the plurality of loudspeakers, and a plurality of band division filters dividing the audio signal corrected by the correction filter into the bands of the loudspeakers so that the phase difference of the phase characteristics is approximately 0 degrees or approximately 180 degrees. The correction filter is an inverse filter set with an impulse response based on the audio output for each band from the plurality of loudspeakers via the plurality of band division filters.

  • AUDIO PROCESSING METHOD, SYSTEM, AND CONTROL SERVER

    An audio processing method includes: after a terminal accesses the control server, the control server obtains the audio capabilities of the terminal through capability negotiation; and the control server forwards coded audio data to each terminal according to those audio capabilities. An audio processing system and a control server are also disclosed. In the embodiments of the present disclosure, the audio data does not need to undergo audio coding and decoding every time it passes through a control server, so the number of coding and decoding operations performed by the control server is reduced drastically. In particular, when only one control server exists, the audio delay between terminals derives only from network transmission, coding at the sending terminal and decoding at the receiving terminal, and the control server only extracts and reassembles the packets of the audio data.

  • SYSTEMS AND METHODS FOR SPEECH PROCESSING

    Systems and methods described herein modify audio content on an electronic device. Embodiments can be configured to detect a mode of the electronic device to determine whether the device is in a telephone mode; receive a speech signal from a speech source while the device is in the telephone mode; and process the speech signal to improve the perceived quality of the speech at a recipient when the electronic device is in the telephone mode. Processing the speech signal to improve the perceived quality of the speech comprises decreasing the signal level of audio content outside of a determined frequency band relative to the signal level of the audio content within the determined frequency band, wherein the determined frequency band is a frequency band associated with the vocal range of the anticipated speech content.
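
    A short sketch of the band-weighting step, assuming a fixed 300–3400 Hz "vocal" band and a flat attenuation outside it, applied in the frequency domain; the band edges and attenuation value are illustrative, not figures from the patent.

```python
import numpy as np

def emphasize_vocal_band(x, fs, low=300.0, high=3400.0, atten_db=-12.0):
    """Attenuate spectral content outside [low, high] Hz relative to the
    content inside the band, then return the modified time signal."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    outside = (freqs < low) | (freqs > high)
    spectrum[outside] *= 10.0 ** (atten_db / 20.0)   # lower everything outside the band
    return np.fft.irfft(spectrum, n=len(x))
```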

  • AUDIO PROCESSING APPARATUS

    A ZONE4 left audio signal Z4L is output without releasing a connection of a surround left speaker SSL. When determining that a playback of the ZONE4 left audio signal Z4L is set by a user manipulation, a controller 2 controls switches S11b and S13b in an on state, and controls switches S11a, S13a, and S13c in an off state. Accordingly, the ZONE4 left audio signal Z4L is supplied from a DSP to an amplifier 12a through the switch S11b, amplified by the amplifier 12a, and supplied to a surround back left/ZONE4 left SP terminal 14b through the switch S13b. A surround left SP terminal 14a is an output terminal dedicated to a surround left audio signal, so that the ZONE4 left audio signal Z4L can be output without releasing the connection of the surround left speaker SSL.

  • Multi-Channel Audio Processing

    A method including: receiving at least a first input audio channel and a second input audio channel; and using an inter-channel prediction model to form at least an inter-channel direction of reception parameter.

  • AUDIO PROCESSING DEVICE AND AUDIO PROCESSING METHOD

    An audio processing device includes a reverb property estimating unit that estimates a reverb property at each frequency on the basis of a first audio signal and a second audio signal representing sounds of the first audio signal output by an audio output unit and collected by an audio input unit, a gain calculating unit that determines an attenuating ratio for the component of the first audio signal at each frequency such that the larger the reverb property at that frequency is, the larger the attenuating ratio becomes, and a correcting unit that attenuates the first audio signal at each frequency in accordance with the attenuating ratio determined for that frequency.
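
    A minimal sketch of the gain-calculation and correction steps, assuming the reverb property has already been estimated as a per-bin scalar in [0, 1] (larger meaning more reverberant); the linear mapping from reverb measure to attenuating ratio is a placeholder monotone function.

```python
import numpy as np

def correct_for_reverb(spectrum, reverb, max_atten=0.8):
    """spectrum: complex spectrum of the first audio signal (one frame).
    reverb:   per-bin reverb measure in [0, 1] (0 = dry, 1 = very wet).
    The attenuating ratio grows with the reverb measure, so more
    reverberant frequencies are attenuated more strongly."""
    atten_ratio = max_atten * np.clip(reverb, 0.0, 1.0)  # monotone in reverb
    return spectrum * (1.0 - atten_ratio)
```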

  • INFORMATION PROCESSOR, AUDIO PROCESSOR, AUDIO PROCESSING SYSTEM, PROGRAM, AND VIDEO GAME PROGRAM

    An information processor includes a generator that is adapted to generate original sound data indicating an original sound as a basis for a processed sound to be heard by a user and a parameter indicating a content of processing on the original sound data, an original sound data output section that is adapted to output the original sound data generated by the generator from any channel of a plurality of channels, and a control signal output section that is adapted to output a control signal including the parameter and a correspondence relationship between the parameter and the channel for the original sound data to be processed by means of the parameter.

  • SPEECH PROCESSING RESPONSIVE TO ACTIVE NOISE CONTROL MICROPHONES

    Speech processing for a vehicle, including receiving speech from a user via at least one speech microphone that converts the speech into a speech signal, receiving vehicle noise via at least one active noise control microphone that converts the noise into a vehicle noise signal, and processing the speech signal in response to the vehicle noise signal to reduce vehicle noise in the speech signal.

  • SPEECH PROCESSING DEVICE AND SPEECH PROCESSING METHOD

    A speech processing device which can accurately extract a conversation group from among a plurality of speakers, even when a conversation group of three or more people is present. The device (400) comprises: a spontaneous speech detection unit (420) and a direction-specific speech detection unit (430) which separately detect, from a sound signal, the uttered speech of each speaker; a conversation establishment level calculation unit (450) which calculates a conversation establishment level, for every pairing of two speakers, for each separated segment of the time period being evaluated, on the basis of the detected uttered speech; an extended-period characteristic amount calculation unit (460) which calculates, for each pairing, an extended-period characteristic amount of the conversation establishment level over the time period being evaluated; and a conversation-partner determination unit (470) which extracts the conversation group that forms a conversation on the basis of the calculated extended-period characteristic amounts.

  • VOICE ACTIVITY DETECTION IN PRESENCE OF BACKGROUND NOISE

    In speech processing systems, compensation is made for sudden changes in the background noise in the average signal-to-noise ratio (SNR) calculation. SNR outlier filtering may be used, alone or in conjunction with weighting the average SNR. Adaptive weights may be applied on the SNRs per band before computing the average SNR. The weighting function can be a function of noise level, noise type, and/or instantaneous SNR value. Another weighting mechanism applies a null filtering or outlier filtering which sets the weight in a particular band to be zero. This particular band may be characterized as the one that exhibits an SNR that is several times higher than the SNRs in other bands.
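
    A compact sketch of the averaging step, assuming per-band (linear) SNRs are already computed for the frame: a band whose SNR is several times the median of the other bands gets zero weight (the null/outlier filtering described above), and the remaining bands are averaged. The adaptive noise-level/noise-type weighting is omitted here and the weights are otherwise uniform.

```python
import numpy as np

def average_snr(band_snr, outlier_factor=4.0):
    """band_snr: per-band linear SNR values for one frame.
    Returns a weighted average SNR with outlier bands nulled out."""
    snr = np.asarray(band_snr, dtype=float)
    weights = np.ones_like(snr)
    for k in range(len(snr)):
        others = np.delete(snr, k)
        if snr[k] > outlier_factor * np.median(others):
            weights[k] = 0.0                    # outlier / null filtering
    weights /= weights.sum() + 1e-12
    return float(np.sum(weights * snr))
```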

  • Speech Processing in Telecommunication Networks

    Systems and methods for speech processing in telecommunication networks are described. In some embodiments, a method may include receiving speech transmitted over a network, causing the speech to be converted to text, and identifying the speech as predetermined speech in response to the text matching a stored text associated with the predetermined speech. The stored text may have been obtained, for example, by subjecting the predetermined speech to a network impairment condition. The method may further include identifying terms within the text that match terms within the stored text (e.g., despite not being identical to each other), calculating a score between the text and the stored text, and determining that the text matches the stored text in response to the score meeting a threshold value. In some cases, the method may also identify one of a plurality of speeches based on a selected one of a plurality of stored texts.
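
    A minimal sketch of the matching step, using a standard string-similarity ratio in place of whatever scoring the patent actually applies; the threshold value and the use of difflib are assumptions for illustration only.

```python
import difflib

def matches_predetermined_speech(recognized_text, stored_texts, threshold=0.8):
    """Return the stored text whose similarity score to the recognized
    text meets the threshold (the best such match), or None otherwise."""
    best, best_score = None, 0.0
    for stored in stored_texts:
        score = difflib.SequenceMatcher(
            None, recognized_text.lower(), stored.lower()).ratio()
        if score >= threshold and score > best_score:
            best, best_score = stored, score
    return best
```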

  • AUDIO PROCESSING APPARATUS AND AUDIO PROCESSING METHOD

    An audio processing apparatus includes a compositional pattern determining module and an audio processor. The compositional pattern determining module is configured to estimate a compositional pattern of an input video signal. The audio processor is configured to perform audio processing according to the estimation.

  • AUDIO PROCESSING DEVICE, AUDIO PROCESSING METHOD AND PROGRAM

    An audio processing device includes: a directivity adjustment unit that adjusts the directivity, and the sharpness thereof, of audio picked up by a plurality of microphones; and a howling suppression adjustment unit that adjusts the intensity of howling suppression applied to the audio picked up by the plurality of microphones, wherein the directivity adjustment unit adjusts the directivity and its sharpness in preference to the howling suppression performed by the howling suppression adjustment unit.

  • AUDIO PROCESSING APPARATUS

    The number of digital-analog converters is decreased in an audio processing apparatus that outputs an audio signal to not only a device disposed in a main room but also a device disposed in a sub-room. The audio processing apparatus includes a digital-analog conversion unit for outputting an extended left audio signal and an extended right audio signal in an analog format, or for outputting a sub-room left audio signal and a sub-room right audio signal in the analog format; and an input switch unit that switches between a supply of a combination of the extended left audio signal and the extended right audio signal to the digital-analog conversion unit and a supply of a combination of the sub-room left audio signal and the sub-room right audio signal to the digital-analog conversion unit.

  • SPEECH PROCESSING APPARATUS, CONTROL METHOD THEREOF, STORAGE MEDIUM STORING CONTROL PROGRAM THEREOF, AND VEHICLE, INFORMATION PROCESSING APPARATUS, AND INFORMATION PROCESSING SYSTEM INCLUDING THE SPEECH PROCESSING APPARATUS

    An apparatus of this invention is a speech processing apparatus that acquires pseudo speech from a mixture sound including desired speech and noise. The speech processing apparatus includes a first microphone that inputs a first mixture sound including desired speech and noise and outputs a first mixture signal, a second microphone that is open to the same sound space as the first microphone, inputs a second mixture sound including the desired speech and the noise at a ratio different from the first mixture sound, and outputs a second mixture signal, a sound insulator that is disposed between the first microphone and the second microphone, and a noise suppression circuit that suppresses an estimated noise signal based on the first mixture signal and the second mixture signal and outputs a pseudo speech signal. With this arrangement, it is possible, in a single sound space where desired speech and noise mix, to correctly estimate the noise and reconstruct pseudo speech close to the desired speech.

  • METHOD OF PROVIDING DYNAMIC SPEECH PROCESSING SERVICES DURING VARIABLE NETWORK CONNECTIVITY

    A user device provides dynamic speech processing services during variable network connectivity with a network server. The user device includes a connection determiner that monitors a level of network connectivity between the user device and the network server, a simplified speech processor that processes speech data and is initiated based on a determination by the connection determiner that the level of network connectivity between the user device and the network server is impaired, a memory that stores processed speech data processed by the simplified speech processor, and a transmitter configured to transmit the stored processed speech data. The connection determiner determines when the level of network connectivity between the user device and the network server is no longer impaired.

  • FACILITY FOR PROCESSING VERBAL FEEDBACK AND UPDATING DIGITAL VIDEO RECORDER (DVR) RECORDING PATTERNS

    A method, a system and a computer program product for using speech/voice recognition technology to update digital video recorder (DVR) program recording patterns based on program viewer/listener feedback. A speech controlled pattern modification (SCPM) utility utilizes a DVR recording sub-system integrated with speech processing functionality to compare control phrases with phrases uttered by a viewer. If a control phrase matches a phrase uttered by the viewer, the SCPM utility modifies the DVR recording patterns according to a set of pre-programmed governing rules. For example, the SCPM utility may avoid modifying the recording patterns for programs within a list of "favorite" programs but may modify the recording patterns for programs excluded from the list. The SCPM utility determines the priority of the uttered phrases by identifying users and retrieving a preset priority level of the identified users. The priority level is then used to control changes to the recording patterns.

  • SPEECH PROCESSING APPARATUS, CONTROL METHOD THEREOF, STORAGE MEDIUM STORING CONTROL PROGRAM THEREOF, AND VEHICLE, INFORMATION PROCESSING APPARATUS, AND INFORMATION PROCESSING SYSTEM INCLUDING THE SPEECH PROCESSING APPARATUS

    An apparatus of this invention is a speech processing apparatus that acquires pseudo speech from a mixture sound including desired speech and noise. The speech processing apparatus includes a first microphone that inputs a first mixture sound including desired speech and noise and outputs a first mixture signal, a second microphone that is open to the same sound space as the first microphone, inputs a second mixture sound including the desired speech and the noise at a ratio different from the first mixture sound, and outputs a second mixture signal, a first sound collector including a concave surface that collects the first mixture sound to the first microphone, a second sound collector including a concave surface that collects the second mixture sound to the second microphone and disposed in a direction different from the first sound collector, and a noise suppression circuit that suppresses an estimated noise signal based on the first mixture signal and the second mixture signal and outputs a pseudo speech signal. With this arrangement, it is possible, in a single sound space where desired speech and noise mix, to collect the desired speech and the noise, correctly estimate the noise, and reconstruct pseudo speech close to the desired speech.

  • SPEAKER ADAPTATION

    A method for speaker adaptation includes receiving a plurality of media files, each associated with a call center agent of a plurality of call center agents and receiving a plurality of terms. Speech processing is performed on at least some of the media files to identify putative instances of at least some of the plurality of terms. Each putative instance is associated with a hit quality that characterizes a quality of recognition of the corresponding term. One or more call center agents for performing speaker adaptation are determined, including identifying call center agents that are associated with at least one media file that includes one or more putative instances with a hit quality below a predetermined threshold. Speaker adaptation is performed for each identified call center agent based on the media files associated with the identified call center agent and the identified instances of the plurality of terms.

  • AUDIO PROCESSING DEVICE, AUDIO PROCESSING METHOD, PROGRAM AND INTEGRATED CIRCUIT

    An audio processing device including a feature calculation unit, a boundary calculation unit and a judgment unit, detects points of change of audio features from an audio signal in an AV content. The feature calculation unit calculates, for each unit section of the audio signal, section feature data expressing features of the audio signal in the unit section. The boundary calculation unit calculates, for each target unit section among the unit sections of the audio signal, a piece of boundary information relating to at least one boundary of a similarity section. The similarity section consists of consecutive unit sections, inclusive of the target unit section, which each have similar section feature data. The judgment unit calculates a priority of each boundary indicated by one or more of the pieces of boundary information and judges whether the boundary is a scene change point based on the priority.

  • VIDEO OR AUDIO PROCESSING FORMING AN ESTIMATED QUANTILE

    A method of video or audio processing receives a sequence of sample values, each corresponding with a location in video or audio content; forms an initial estimated quantile value; and then modifies the estimated value in dependence upon a count of the results of comparisons between the sample values within a fixed-length interval of the sequence and the estimated value, to form an estimated quantile of the sequence of sample values.
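
    A small sketch of one possible update rule, assuming a target quantile q: within each fixed-length interval the samples are compared with the current estimate, and the estimate is nudged up or down according to how the count of smaller samples compares with q. The step size and its scaling by the block spread are illustrative choices, not the patent's rule.

```python
import numpy as np

def estimate_quantile(samples, q=0.5, interval=64, step=0.05):
    """Running quantile estimate of a sample sequence. The estimate is
    revised once per fixed-length interval, based on a count of the
    comparisons between the interval's samples and the estimate."""
    samples = np.asarray(samples, dtype=float)
    estimate = samples[0]                       # initial estimated quantile
    for start in range(0, len(samples) - interval + 1, interval):
        block = samples[start:start + interval]
        below = np.count_nonzero(block < estimate)
        # if fewer than q*interval samples fall below, the estimate is too low
        estimate += step * (q * interval - below) / interval * np.std(block)
    return estimate
```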

  • SPEECH PROCESSING SYSTEM

    A text to speech method, the method comprising: receiving input text; dividing said inputted text into a sequence of acoustic units; converting said sequence of acoustic units to a sequence of speech vectors using an acoustic model, wherein said model has a plurality of model parameters describing probability distributions which relate an acoustic unit to a speech vector; and outputting said sequence of speech vectors as audio, the method further comprising determining at least some of said model parameters by: extracting expressive features from said input text to form an expressive linguistic feature vector constructed in a first space; and mapping said expressive linguistic feature vector to an expressive synthesis feature vector which is constructed in a second space.

  • SENDER-RESPONSIVE TEXT-TO-SPEECH PROCESSING

    A method of speech synthesis including receiving a text input sent by a sender, processing the text input responsive to at least one distinguishing characteristic of the sender to produce synthesized speech that is representative of a voice of the sender, and communicating the synthesized speech to a recipient user of the system.

  • DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS

    In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered.
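
    A short sketch of the replacement-and-rescoring loop. The confusion sets are illustrative, and `lm_logprob` is a hypothetical stand-in for whatever language model the system uses; it is not a real library call.

```python
def lm_logprob(words):
    """Placeholder language model: score a word sequence. A real system
    would query an n-gram or neural LM here (assumption)."""
    return -float(len(words))                 # dummy score

def find_suspicious_words(result_words, confusion_sets, margin=1.0):
    """Flag positions where swapping in a confusable word makes the
    sentence at least 'margin' log-prob more likely than the original."""
    base = lm_logprob(result_words)
    alerts = []
    for i, word in enumerate(result_words):
        for conf_set in confusion_sets:
            if word in conf_set:
                for alt in conf_set - {word}:
                    candidate = result_words[:i] + [alt] + result_words[i + 1:]
                    if lm_logprob(candidate) - base >= margin:
                        alerts.append((i, word, alt))
    return alerts

# usage with a toy confusion set of dosage units
sets = [{"milligrams", "micrograms"}]
print(find_suspicious_words("take five milligrams daily".split(), sets))
```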

  • IMAGE PICKUP DEVICE AND METHOD OF PICKING UP IMAGE USING THE SAME

    An image pickup device includes a plurality of image pickup units, an image processing unit which processes images input through the plurality of image pickup units, a plurality of microphones which are spaced apart from each other, an audio processing unit which senses the voice of a photographer using the plurality of microphones, and a control unit which, when the voice of the photographer is sensed through the audio processing unit, controls the image processing unit to combine an image from the image pickup unit corresponding to the location of the photographer with an image from the image pickup unit currently performing photographing.

  • DISPLAY APPARATUS, DISPLAY SYSTEM, AND CONTROL METHOD THEREOF

    Exemplary embodiments disclose a display apparatus, a display system, and a control method thereof, the display apparatus including: an image processing device configured to process an image signal; an audio processing device configured to process an audio signal; a communication device configured to conduct wireless communication with a peripheral and configured to transmit the audio signal processed by the audio processing device to an audio output device; and a controller configured to operate in a discoverable mode for wireless communication access of the communication device, and configured to control the communication device to conduct pairing of the audio output device configured to transmit a preset message and the display apparatus for the wireless communication, when the preset message is received from the audio output device.

  • CONTENT REPRODUCTION APPARATUS AND CONTENT PROCESSING METHOD THEREFOR

    A content reproduction apparatus that adopts a content processing method includes a video processor, a video analyzer, and an audio processor for processing audio data and video data input thereto. The video analyzer analyzes video characteristics of the video data such as resolution, compressive distortion, and real frame rate. The video processor processes the video data in accordance with video processing determined based on the analyzed video characteristics. The audio processor processes the audio data in accordance with audio processing, such as dynamic range compression and/or frequency component extension/enhancement, determined based on the analyzed video characteristics of the video data. Thus, it is possible to reproduce sound in an articulate manner depending on the video quality, i.e. whether the video was shot at a professional or nonprofessional level.