Custom Speech Synthesizer

A speech synthesizer is a text-to-speech service that converts text into speech approximating a human voice. It works together with the Read Aloud add-on to provide text-to-speech capabilities for reading page content aloud.

Tip: depending on the speech technology used, some generated voices may sound unnatural or artificial, while others can sound very close to a real human voice.

To better demonstrate how to use the SDK with different text-to-speech technologies, this guide covers:

The speech synthesizer API

The PDFTextToSpeechSynthesis interface specification

typescript
interface PDFTextToSpeechSynthesis {
    status: PDFTextToSpeechSynthesisStatus;
    supported(): boolean;
    pause(): void;
    resume(): void;
    stop(): void;
    play(utterances: IterableIterator<Promise<PDFTextToSpeechUtterance>>, options?: ReadAloudOptions): Promise<void>;
    updateOptions(options: Partial<ReadAloudOptions>): void;
}

1. The status property

status is an enum representing the current read-aloud state, defined as follows:

typescript
enum PDFTextToSpeechSynthesisStatus {
    playing, paused, stopped,
}

Tip: the initial value of status is stopped.

2. The supported(): boolean method

This method detects whether the current client environment supports PDFTextToSpeechSynthesis. If a third-party speech service runs on the backend, it only needs to check whether the client supports the HTML <audio> element.

Tip: the client here can be a browser, or another environment such as Electron or Apache Cordova.

Code example:

typescript
class CustomPDFTextToSpeechSynthesis {
    supported(): boolean {
        return typeof window.HTMLAudioElement === 'function';
    }
    // .... other methods
}

3. The pause(), resume() and stop() methods

These three methods control the read-aloud state. Through them, PDFTextToSpeechSynthesis can:

  • pause the speech playback
  • resume it
  • stop it
  • update the status property accordingly
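
The bookkeeping these three methods perform can be sketched as follows. This is a minimal illustration, not the SDK implementation: the Status constants mirror PDFTextToSpeechSynthesisStatus, and the guards (pause only while playing, resume only while paused) are an assumption about reasonable behavior.

```javascript
// Plain constants mirroring PDFTextToSpeechSynthesisStatus for illustration;
// a real integration would use
// UIExtension.PDFViewCtrl.readAloud.PDFTextToSpeechSynthesisStatus.
const Status = { playing: 0, paused: 1, stopped: 2 };

class SynthesisStateSketch {
    constructor() {
        this.status = Status.stopped; // initial value is stopped
    }
    pause() {
        // Assumption: only a playing synthesis can be paused.
        if (this.status === Status.playing) {
            this.status = Status.paused;
            // ...pause the underlying audio/speech backend here
        }
    }
    resume() {
        // Assumption: only a paused synthesis can be resumed.
        if (this.status === Status.paused) {
            this.status = Status.playing;
            // ...resume the underlying backend here
        }
    }
    stop() {
        this.status = Status.stopped;
        // ...cancel the underlying backend here
    }
}
```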

4. The updateOptions(options: Partial<ReadAloudOptions>) method

This method updates PDFTextToSpeechSynthesis while reading is in progress, for example to change the speech volume.
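
A minimal sketch of this pattern: keep the options currently in effect on the instance and merge each partial update into them, so the next utterance (or, where the backend supports it, the current one) picks up the change. The default values below are illustrative assumptions.

```javascript
// Sketch of updateOptions(): merge a Partial<ReadAloudOptions> into the
// options currently in effect. Defaults here are illustrative only.
class OptionsSketch {
    constructor() {
        this.playingOptions = { rate: 1, pitch: 1, volume: 1 };
    }
    updateOptions(options) {
        // Only the provided keys are overwritten; the rest stay in effect.
        Object.assign(this.playingOptions, options);
    }
}
```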

5. The play() method

typescript
play(utterances: IterableIterator<Promise<PDFTextToSpeechUtterance>>, options?: ReadAloudOptions): Promise<void>

Parameters:

  • utterances: an IterableIterator containing the text to be read, along with the page index and coordinates it comes from; it can be traversed with for...of
  • options: an optional parameter containing:
    • the playback rate
    • the pitch
    • the volume
    • an external parameter (an options object passed through to a third-party speech synthesis service)
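
To illustrate the shape play() consumes, here is a hypothetical helper that wraps plain strings as an IterableIterator of promises, and a consumer that walks it with for await...of. The { text, pageIndex } shape is illustrative; real utterances also carry coordinate information.

```javascript
// Hypothetical helper: wrap strings as an IterableIterator<Promise<...>>
// like the one play() receives. The utterance shape is an assumption.
function* toUtterances(texts) {
    let pageIndex = 0;
    for (const text of texts) {
        yield Promise.resolve({ text, pageIndex });
    }
}

// play() can consume such an iterator with for await...of,
// which awaits each promised utterance in order:
async function collectTexts(utterances) {
    const result = [];
    for await (const utterance of utterances) {
        result.push(utterance.text);
    }
    return result;
}
```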

Customizing PDFTextToSpeechSynthesis

Approach 1: implement the PDFTextToSpeechSynthesis interface

Tip: this demo only runs in Chrome, Firefox and Chromium-based Edge.

html

<html>
</html>
<script>
    const PDFTextToSpeechSynthesisStatus = UIExtension.PDFViewCtrl.readAloud.PDFTextToSpeechSynthesisStatus;

    class CustomPDFTextToSpeechSynthesis {
        constructor() {
            this.playingOptions = {};
            this.status = PDFTextToSpeechSynthesisStatus.stopped;
        }

        supported() {
            return typeof window.speechSynthesis !== 'undefined';
        }

        pause() {
            this.status = PDFTextToSpeechSynthesisStatus.paused;
            window.speechSynthesis.pause();
        }

        resume() {
            this.status = PDFTextToSpeechSynthesisStatus.playing;
            window.speechSynthesis.resume();
        }

        stop() {
            this.status = PDFTextToSpeechSynthesisStatus.stopped;
            window.speechSynthesis.cancel();
        }

        /**
         * @param {IterableIterator<Promise<PDFTextToSpeechUtterance>>} utterances
         * @param {ReadAloudOptions} options
         *
         */
        async play(utterances, options) {
            for await (const utterance of utterances) {
                const nativeSpeechUtterance = new window.SpeechSynthesisUtterance(utterance.text);
                const {pitch, rate, volume} = Object.assign(
                        {}, this.playingOptions, options || {}
                );
                if (typeof pitch === 'number') {
                    nativeSpeechUtterance.pitch = pitch;
                }
                if (typeof rate === 'number') {
                    nativeSpeechUtterance.rate = rate;
                }
                if (typeof volume === 'number') {
                    nativeSpeechUtterance.volume = volume;
                }
                await new Promise((resolve, reject) => {
                    nativeSpeechUtterance.onend = resolve;
                    nativeSpeechUtterance.onabort = resolve;
                    nativeSpeechUtterance.onerror = reject;
                    speechSynthesis.speak(nativeSpeechUtterance);
                });
            }
        }

        updateOptions(options) {
            Object.assign(this.playingOptions, options);
        }
    }

    var libPath = window.top.location.origin + '/lib';
    var pdfui = new UIExtension.PDFUI({
        viewerOptions: {
            libPath: libPath,
            jr: {
                licenseSN: licenseSN,
                licenseKey: licenseKey
            }
        },
        renderTo: document.body,
        appearance: UIExtension.appearances.ribbon,
        addons: [
            libPath + '/uix-addons/read-aloud'
        ]
    });
    pdfui.getReadAloudService().then(function (service) {
        service.setSpeechSynthesis(new CustomPDFTextToSpeechSynthesis());
    });

</script>

Approach 2: use AbstractPDFTextToSpeechSynthesis to customize the speech synthesizer

html

<html>
</html>
<script>
    const PDFTextToSpeechSynthesisStatus = UIExtension.PDFViewCtrl.readAloud.PDFTextToSpeechSynthesisStatus;
    const AbstractPDFTextToSpeechSynthesis = UIExtension.PDFViewCtrl.readAloud.AbstractPDFTextToSpeechSynthesis;
    const CustomPDFTextToSpeechSynthesis = AbstractPDFTextToSpeechSynthesis.extend({
        init() {
        },
        supported() {
            return typeof window.speechSynthesis !== 'undefined';
        },
        doPause() {
            window.speechSynthesis.pause();
        },
        doResume() {
            window.speechSynthesis.resume();
        },
        doStop() {
            window.speechSynthesis.cancel();
        },
        /**
         * @param {string} text
         * @param {ReadAloudOptions | undefined} options
         */
        async speakText(text, options) {
            const nativeSpeechUtterance = new window.SpeechSynthesisUtterance(text);
            const {pitch, rate, volume} = Object.assign(
                    {}, this.playingOptions, options || {}
            );
            if (typeof pitch === 'number') {
                nativeSpeechUtterance.pitch = pitch;
            }
            if (typeof rate === 'number') {
                nativeSpeechUtterance.rate = rate;
            }
            if (typeof volume === 'number') {
                nativeSpeechUtterance.volume = volume;
            }
            await new Promise((resolve, reject) => {
                nativeSpeechUtterance.onend = resolve;
                nativeSpeechUtterance.onabort = resolve;
                nativeSpeechUtterance.onerror = reject;
                speechSynthesis.speak(nativeSpeechUtterance);
            });
        }
    })
    const libPath = window.top.location.origin + '/lib';
    const pdfui = new UIExtension.PDFUI({
        viewerOptions: {
            libPath: libPath,
            jr: {
                licenseSN: licenseSN,
                licenseKey: licenseKey
            }
        },
        renderTo: document.body,
        appearance: UIExtension.appearances.ribbon,
        addons: [
            libPath + '/uix-addons/read-aloud'
        ]
    });
    pdfui.getReadAloudService().then(function (service) {
        service.setSpeechSynthesis(new CustomPDFTextToSpeechSynthesis());
    });

</script>

Differences between the two customization approaches: PDFTextToSpeechSynthesis vs. AbstractPDFTextToSpeechSynthesis

Approach 1 customizes the speech synthesizer by implementing the PDFTextToSpeechSynthesis interface. It requires managing state transitions manually and traversing the utterances list with for await...of. Each item in the utterances list is a text block obtained from a PDFPage. In some cases a text block may contain incomplete words or sentences; the blocks then need to be merged into complete words or sentences for better speech synthesis. This merging can be implemented inside the play() method.
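
One simple way to sketch that merge step: buffer incoming text blocks and only emit a chunk once it ends with sentence-ending punctuation, so the synthesizer never receives a half sentence. The punctuation set below is an assumption and should be tuned per language.

```javascript
// Illustrative merge of page text blocks into sentence-sized chunks.
// SENTENCE_END is an assumed heuristic, not part of the SDK.
function mergeTextBlocks(blocks) {
    const SENTENCE_END = /[.!?。！？]\s*$/;
    const merged = [];
    let buffer = '';
    for (const block of blocks) {
        buffer += block;
        if (SENTENCE_END.test(buffer)) {
            merged.push(buffer);
            buffer = '';
        }
    }
    if (buffer) {
        merged.push(buffer); // flush trailing partial text as-is
    }
    return merged;
}
```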

Approach 2 customizes the speech synthesizer by extending the AbstractPDFTextToSpeechSynthesis abstract class. It does not require managing state transitions or traversing the utterances list manually, but it must correctly call the window.SpeechSynthesisUtterance interface to generate and play speech from the received text and options. The received text blocks are merged automatically by AbstractPDFTextToSpeechSynthesis. However, it is currently hard to guarantee that the merged text blocks form complete words or sentences in every language environment, so if you have strict requirements on reading every word and sentence correctly, Approach 1 is recommended.

Integrating a third-party TTS service

This section uses @google-cloud/text-to-speech as an example.

Server

For the Google Cloud Text-to-Speech SDKs in each development language, see https://cloud.google.com/text-to-speech/docs/quickstarts
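
On the server, each request maps the received text and options onto a Google Cloud synthesizeSpeech request. The sketch below only builds the request payload (the input/voice/audioConfig field names follow the Google Cloud Text-to-Speech v1 API); the mapping of the client-side rate and pitch parameters onto speakingRate and pitch is an assumption, and wiring this into textToSpeechClient.synthesizeSpeech() behind an HTTP endpoint is omitted.

```javascript
// Build a request body in the shape accepted by Google Cloud TTS
// synthesizeSpeech(). The rate/pitch mapping is an assumption about
// how the client-side query parameters are interpreted server-side.
function buildSynthesizeSpeechRequest(text, options) {
    options = options || {};
    return {
        input: { text: text },
        voice: {
            languageCode: options.lang || 'en-US',
            name: options.voice // undefined lets the service pick a default voice
        },
        audioConfig: {
            audioEncoding: 'MP3',
            speakingRate: typeof options.rate === 'number' ? options.rate : 1,
            pitch: typeof options.pitch === 'number' ? options.pitch : 0
        }
    };
}
```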

Client

javascript
var readAloud = UIExtension.PDFViewCtrl.readAloud;
var PDFTextToSpeechSynthesisStatus = readAloud.PDFTextToSpeechSynthesisStatus;
var AbstractPDFTextToSpeechSynthesis = readAloud.AbstractPDFTextToSpeechSynthesis;
var SPEECH_SYNTHESIS_URL = '<server url>'; // the server API address

var ThirdpartyPDFTextToSpeechSynthesis = AbstractPDFTextToSpeechSynthesis.extend({
    init: function () {
        this.audioElement = null;
    },
    supported: function () {
        return typeof window.HTMLAudioElement === 'function' && document.createElement('audio') instanceof window.HTMLAudioElement;
    },
    doPause: function () {
        if (this.audioElement) {
            this.audioElement.pause();
        }
    },
    doStop: function () {
        if (this.audioElement) {
            this.audioElement.pause();
            this.audioElement.currentTime = 0;
            this.audioElement = null;
        }
    },
    doResume: function () {
        if (this.audioElement) {
            this.audioElement.play();
        }
    },
    onCurrentPlayingOptionsUpdated: function () {
        if (!this.audioElement) {
            return;
        }
        var options = this.currentPlayingOptions;
        if (this.status === PDFTextToSpeechSynthesisStatus.playing) {
            if (options.volume >= 0 && options.volume <= 1) {
                this.audioElement.volume = options.volume;
            }
        }
    },
    speakText: function (text, options) {
        var audioElement = document.createElement('audio');
        this.audioElement = audioElement;
        if (options.volume >= 0 && options.volume <= 1) {
            audioElement.volume = options.volume;
        }
        return this.speechSynthesis(text, options).then(function (src) {
            return new Promise(function (resolve, reject) {
                audioElement.src = src;
                audioElement.onended = function () {
                    resolve();
                };
                audioElement.onabort = function () {
                    resolve();
                };
                audioElement.onerror = function (e) {
                    reject(e);
                };
                audioElement.play();
            }).finally(function () {
                URL.revokeObjectURL(src);
            });
        });
    },
    // If the server API request method or parameter form is not consistent with the following implementation, it will need to be adjusted accordingly.
    speechSynthesis: function (text, options) {
        var url = SPEECH_SYNTHESIS_URL + '?' + this.buildURIQueries(text, options);
        return fetch(url).then(function (response) {
            if (response.status >= 400) {
                return response.json().then(function (json) {
                    return Promise.reject(json.error);
                });
            }
            return response.blob();
        }).then(function (blob) {
            return URL.createObjectURL(blob);
        });
    },
    buildURIQueries: function (text, options) {
        var queries = [
            'text=' + encodeURIComponent(text)
        ];
        if (!options) {
            return queries.join('&');
        }
        if (typeof options.rate === 'number') {
            queries.push('rate=' + options.rate);
        }
        if (typeof options.pitch === 'number') {
            queries.push('pitch=' + options.pitch);
        }
        if (typeof options.lang === 'string') {
            queries.push('lang=' + encodeURIComponent(options.lang));
        }
        if (typeof options.voice === 'string') {
            queries.push('voice=' + encodeURIComponent(options.voice));
        }
        if (typeof options.external !== 'undefined') {
            queries.push('external=' + encodeURIComponent(JSON.stringify(options.external)));
        }
        return queries.join('&');
    }
});

Using the custom speech synthesizer

javascript
pdfui.getReadAloudService().then(function (service) {
    service.setSpeechSynthesis(new ThirdpartyPDFTextToSpeechSynthesis());
});