G10L  13/00		Speech synthesis; Text to speech systems	13364
G10L  13/02		Methods for producing synthetic speech; Speech synthesisers	8016
G10L  13/027		Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text <b>G10L13/08</b>)	1500
G10L  13/033		Voice editing, e.g. manipulating the voice of the synthesiser	2481
G10L  13/04		Details of speech synthesis systems, e.g. synthesiser structure or memory management	6777
G10L  13/047		Architecture of speech synthesisers	1978
G10L  13/06		Elementary speech units used in speech synthesisers; Concatenation rules	3474
G10L  13/07		Concatenation rules	997
G10L  13/08		Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination	8709
G10L  13/10		Prosody rules derived from text; Stress or intonation	3497
G10L  15/00		Speech recognition (<b>G10L17/00</b> takes precedence)	22197
G10L  15/01		Assessment or evaluation of speech recognition systems	1530
G10L  15/02		Feature extraction for speech recognition; Selection of recognition unit	14032
G10L  15/04		Segmentation; Word boundary detection	6407
G10L  15/05		Word boundary detection	598
G10L  15/06		Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (<b>G10L15/14</b> takes precedence)	16383
G10L  15/065		Adaptation	909
G10L  15/07		to the speaker	1426
G10L  15/08		Speech classification or search	9802
G10L  15/10		using distance or distortion measures between unknown speech and reference templates	9106
G10L  15/12		using dynamic programming techniques, e.g. dynamic time warping [DTW]	877
G10L  15/14		using statistical models, e.g. Hidden Markov Models [HMM] (<b>G10L15/18</b> takes precedence)	3901
G10L  15/16		using artificial neural networks	8646
G10L  15/18		using natural language modelling	12415
G10L  15/183		using context dependencies, e.g. language models	3234
G10L  15/187		Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams	1182
G10L  15/19		Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules	556
G10L  15/193		Formal grammars, e.g. finite state automata, context free grammars or word networks	542
G10L  15/197		Probabilistic grammars, e.g. word n-grams	1010
G10L  15/20		Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise or of stress induced speech (<b>G10L21/02</b> takes precedence)	6688
G10L  15/22		Procedures used during a speech recognition process, e.g. man-machine dialog	45157
G10L  15/24		Speech recognition using non-acoustical features	1914
G10L  15/25		using position of the lips, movement of the lips or face analysis	1476
G10L  15/26		Speech to text systems (<b>G10L15/08</b> takes precedence)	31796
G10L  15/28		Constructional details of speech recognition systems	8244
G10L  15/30		Distributed recognition, e.g. in client-server systems, for mobile phones or network applications	6957
G10L  15/32		Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems	2289
G10L  15/34		Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing	899
G10L  17/00		Speaker identification or verification techniques	8285
G10L  17/02		Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction	4656
G10L  17/04		Training, enrolment or model building	4119
G10L  17/06		Decision making techniques; Pattern matching strategies	2204
G10L  17/08		Use of distortion metrics or a particular distance between probe pattern and reference templates	975
G10L  17/10		Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems	347
G10L  17/12		Score normalisation	211
G10L  17/14		Use of phonemic categorisation or speech recognition prior to speaker recognition or verification	1011
G10L  17/16		Hidden Markov models [HMM]	164
G10L  17/18		Artificial neural networks; Connectionist approaches	2122
G10L  17/20		Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions	481
G10L  17/22		Interactive procedures; Man-machine interfaces	3894
G10L  17/24		the user being prompted to utter a password or a predefined phrase	1478
G10L  17/26		Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices	2545
G10L  19/00		Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis (in musical instruments <b>G10H</b>)	16574
G10L  19/002		Dynamic bit allocation (for perceptual audio coders <b>G10L19/032</b>)	601
G10L  19/005		Correction of errors induced by the transmission channel, if related to the coding algorithm	1408
G10L  19/008		Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing	4639
G10L  19/012		Comfort noise or silence coding	611
G10L  19/018		Audio watermarking, i.e. embedding inaudible data in the audio signal	2046
G10L  19/02		using spectral analysis, e.g. transform vocoders or subband vocoders	7676
G10L  19/022		Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring	772
G10L  19/025		Detection of transients or attacks for time/frequency resolution switching	280
G10L  19/028		Noise substitution, e.g. substituting non-tonal spectral components by noisy source (comfort noise for discontinuous speech transmission <b>G10L19/012</b>)	169
G10L  19/03		Spectral prediction for preventing pre-echo; Temporary noise shaping [TNS], e.g. in MPEG2 or MPEG4	166
G10L  19/032		Quantisation or dequantisation of spectral components	1006
G10L  19/035		Scalar quantisation	814
G10L  19/038		Vector quantisation, e.g. TwinVQ audio	1511
G10L  19/04		using predictive techniques	2611
G10L  19/06		Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients	1585
G10L  19/07		Line spectrum pair [LSP] vocoders	304
G10L  19/08		Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters	1719
G10L  19/083		the excitation function being an excitation gain (<b>G10L25/90</b> takes precedence)	163
G10L  19/087		using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC	164
G10L  19/09		Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor	334
G10L  19/093		using sinusoidal excitation models	128
G10L  19/097		using prototype waveform decomposition or prototype waveform interpolative [PWI] coders	38
G10L  19/10		the excitation function being a multipulse excitation	531
G10L  19/107		Sparse pulse excitation, e.g. by using algebraic codebook	145
G10L  19/113		Regular pulse excitation	28
G10L  19/12		the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders	1540
G10L  19/125		Pitch excitation, e.g. pitch synchronous innovation CELP [PSI-CELP]	194
G10L  19/13		Residual excited linear prediction [RELP]	88
G10L  19/135		Vector sum excited linear prediction [VSELP]	65
G10L  19/16		Vocoder architecture	3522
G10L  19/18		Vocoders using multiple modes	608
G10L  19/20		using sound class specific coding, hybrid encoders or object based coding	714
G10L  19/22		Mode decision, i.e. based on audio signal content versus external parameters	585
G10L  19/24		Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding	1034
G10L  19/26		Pre-filtering or post-filtering	1586
G10L  21/00		Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (<b>G10L19/00</b> takes precedence)	5401
G10L  21/003		Changing voice quality, e.g. pitch or formants	2472
G10L  21/007		characterised by the process used	1577
G10L  21/01		Correction of time axis	312
G10L  21/013		Adapting to target pitch	1214
G10L  21/02		Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems <b>H04B3/20</b>; echo suppression in hands-free telephones <b>H04M9/08</b>)	6605
G10L  21/0208		Noise filtering	12378
G10L  21/0216		characterised by the method used for estimating noise	5326
G10L  21/0224		Processing in the time domain	1443
G10L  21/0232		Processing in the frequency domain	4515
G10L  21/0264		characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques	1595
G10L  21/0272		Voice signal separating	3433
G10L  21/028		using properties of sound source	1446
G10L  21/0308		characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques	916
G10L  21/0316		by changing the amplitude	1298
G10L  21/0324		Details of processing therefor	355
G10L  21/0332		involving modification of waveforms	424
G10L  21/034		Automatic adjustment	1169
G10L  21/0356		for synchronising with other signals, e.g. video signals	189
G10L  21/0364		for improving intelligibility	1019
G10L  21/038		using band spreading techniques	761
G10L  21/0388		Details of processing therefor	685
G10L  21/04		Time compression or expansion	2361
G10L  21/043		by changing speed	793
G10L  21/045		using thinning out or insertion of a waveform	222
G10L  21/047		characterised by the type of waveform to be thinned out or inserted	226
G10L  21/049		characterised by the interconnection of waveforms	125
G10L  21/055		for synchronising with other signals, e.g. video signals	531
G10L  21/057		for improving intelligibility	407
G10L  21/06		Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids (<b>G10L15/26</b> takes precedence)	1401
G10L  21/10		Transforming into visible information	2767
G10L  21/12		by displaying time domain information	258
G10L  21/14		by displaying frequency domain information	302
G10L  21/16		Transforming into a non-visible representation (devices or methods enabling ear patients to replace direct auditory perception by another kind of perception <b>A61F11/04</b>)	182
G10L  21/18		Details of the transformation process	158
G10L  25/00		Speech or voice analysis techniques not restricted to a single one of groups <b>G10L15/00 to G10L21/00</b> (muting semiconductor-based amplifiers when some special characteristics of a signal are sensed by a speech detector, e.g. sensing when no signal is present, <b>H03G3/34</b>)	4524
G10L  25/03		characterised by the type of extracted parameters	4962
G10L  25/06		the extracted parameters being correlation coefficients	842
G10L  25/09		the extracted parameters being zero crossing rates	282
G10L  25/12		the extracted parameters being prediction coefficients	560
G10L  25/15		the extracted parameters being formant information	701
G10L  25/18		the extracted parameters being spectral information of each sub-band	6492
G10L  25/21		the extracted parameters being power information	2510
G10L  25/24		the extracted parameters being the cepstrum	4939
G10L  25/27		characterised by the analysis technique	3362
G10L  25/30		using neural networks	11177
G10L  25/33		using fuzzy logic	54
G10L  25/36		using chaos theory	24
G10L  25/39		using genetic algorithms	53
G10L  25/45		characterised by the type of analysis window	1314
G10L  25/48		specially adapted for particular use	3712
G10L  25/51		for comparison or discrimination	14988
G10L  25/54		for retrieval	1135
G10L  25/57		for processing of video signals	1704
G10L  25/60		for measuring the quality of voice signals	2192
G10L  25/63		for estimating an emotional state	6799
G10L  25/66		for extracting parameters related to health condition (detecting or measuring for diagnostic purposes <b>A61B5/00</b>)	1832
G10L  25/69		for evaluating synthetic or decoded voice signals	770
G10L  25/72		for transmitting results of analysis	391
G10L  25/75		for modelling vocal tract parameters	252
G10L  25/78		Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems <b>H04M9/10</b>)	5898
G10L  25/81		for discriminating voice from music	470
G10L  25/84		for discriminating voice from noise	1943
G10L  25/87		Detection of discrete points within a voice signal	1497
G10L  25/90		Pitch determination of speech signals	3098
G10L  25/93		Discriminating between voiced and unvoiced parts of speech signals (<b>G10L25/90</b> takes precedence)	1648
G10L  99/00		Subject matter not provided for in other groups of this subclass	66
