Custom Speech Synthesizer

A speech synthesizer is a text-to-speech service that converts text into speech approximating a human voice. It works together with the Read Aloud add-on to provide text-to-speech capabilities for reading page content aloud.

Tip: depending on the speech technology used, some generated voices may sound unnatural or artificial, while others can sound very close to a real human voice.

To better demonstrate how to use the SDK with different text-to-speech technologies, this guide covers:

The speech synthesizer API

The PDFTextToSpeechSynthesis interface specification

typescript
interface PDFTextToSpeechSynthesis {
    status: PDFTextToSpeechSynthesisStatus;
    supported(): boolean;
    pause(): void;
    resume(): void;
    stop(): void;
    play(utterances: IterableIterator<Promise<PDFTextToSpeechUtterance>>, options?: ReadAloudOptions): Promise<void>;
    updateOptions(options: Partial<ReadAloudOptions>): void;
}

1. The status property

status is an enum representing the current read-aloud state, defined as follows:

typescript
enum PDFTextToSpeechSynthesisStatus {
    playing, paused, stopped,
}

Tip: the initial value of status is stopped.

2. The supported(): boolean method

This method detects whether the current client environment supports PDFTextToSpeechSynthesis. If a third-party speech service runs on the backend, it only needs to check whether the client supports the HTML <audio> element.

Tip: the client here can be a browser, or another environment such as Electron or Apache Cordova.

Code example:

typescript
class CustomPDFTextToSpeechSynthesis {
    supported(): boolean {
        return typeof window.HTMLAudioElement === 'function';
    }
    // .... other methods
}

3. The pause(), resume() and stop() methods

These three methods control the read-aloud state. Through them, PDFTextToSpeechSynthesis can:

  • pause the speech playback
  • resume it
  • stop it
  • update the status property accordingly
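
The bookkeeping these three methods perform can be sketched as follows. This is a minimal illustration, not the SDK implementation: the Status constants mirror PDFTextToSpeechSynthesisStatus, and the guards (pause only while playing, resume only while paused) are an assumption about reasonable behavior.

```javascript
// Plain constants mirroring PDFTextToSpeechSynthesisStatus for illustration;
// a real integration would use
// UIExtension.PDFViewCtrl.readAloud.PDFTextToSpeechSynthesisStatus.
const Status = { playing: 0, paused: 1, stopped: 2 };

class SynthesisStateSketch {
    constructor() {
        this.status = Status.stopped; // initial value is stopped
    }
    pause() {
        // Assumption: only a playing synthesis can be paused.
        if (this.status === Status.playing) {
            this.status = Status.paused;
            // ...pause the underlying audio/speech backend here
        }
    }
    resume() {
        // Assumption: only a paused synthesis can be resumed.
        if (this.status === Status.paused) {
            this.status = Status.playing;
            // ...resume the underlying backend here
        }
    }
    stop() {
        this.status = Status.stopped;
        // ...cancel the underlying backend here
    }
}
```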

4. The updateOptions(options: Partial<ReadAloudOptions>) method

This method updates PDFTextToSpeechSynthesis while reading is in progress, for example to change the speech volume.
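
A minimal sketch of this pattern: keep the options currently in effect on the instance and merge each partial update into them, so the next utterance (or, where the backend supports it, the current one) picks up the change. The default values below are illustrative assumptions.

```javascript
// Sketch of updateOptions(): merge a Partial<ReadAloudOptions> into the
// options currently in effect. Defaults here are illustrative only.
class OptionsSketch {
    constructor() {
        this.playingOptions = { rate: 1, pitch: 1, volume: 1 };
    }
    updateOptions(options) {
        // Only the provided keys are overwritten; the rest stay in effect.
        Object.assign(this.playingOptions, options);
    }
}
```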

5. The play() method

typescript
play(utterances: IterableIterator<Promise<PDFTextToSpeechUtterance>>, options?: ReadAloudOptions): Promise<void>

Parameters:

  • utterances: an IterableIterator containing the text to be read, along with the page index and coordinates it comes from; it can be traversed with for...of
  • options: an optional parameter containing:
    • the playback rate
    • the pitch
    • the volume
    • an external parameter (an options object passed through to a third-party speech synthesis service)
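
To illustrate the shape play() consumes, here is a hypothetical helper that wraps plain strings as an IterableIterator of promises, and a consumer that walks it with for await...of. The { text, pageIndex } shape is illustrative; real utterances also carry coordinate information.

```javascript
// Hypothetical helper: wrap strings as an IterableIterator<Promise<...>>
// like the one play() receives. The utterance shape is an assumption.
function* toUtterances(texts) {
    let pageIndex = 0;
    for (const text of texts) {
        yield Promise.resolve({ text, pageIndex });
    }
}

// play() can consume such an iterator with for await...of,
// which awaits each promised utterance in order:
async function collectTexts(utterances) {
    const result = [];
    for await (const utterance of utterances) {
        result.push(utterance.text);
    }
    return result;
}
```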

Customizing PDFTextToSpeechSynthesis

Approach 1: implement the PDFTextToSpeechSynthesis interface

Tip: this demo only runs in Chrome, Firefox and Chromium-based Edge.

html

<html>
</html>
<script>
    const PDFTextToSpeechSynthesisStatus = UIExtension.PDFViewCtrl.readAloud.PDFTextToSpeechSynthesisStatus;

    class CustomPDFTextToSpeechSynthesis {
        constructor() {
            this.playingOptions = {};
            this.status = PDFTextToSpeechSynthesisStatus.stopped;
        }

        supported() {
            return typeof window.speechSynthesis !== 'undefined';
        }

        pause() {
            this.status = PDFTextToSpeechSynthesisStatus.paused;
            window.speechSynthesis.pause();
        }

        resume() {
            this.status = PDFTextToSpeechSynthesisStatus.playing;
            window.speechSynthesis.resume();
        }

        stop() {
            this.status = PDFTextToSpeechSynthesisStatus.stopped;
            window.speechSynthesis.cancel();
        }

        /**
         * @param {IterableIterator<Promise<PDFTextToSpeechUtterance>>} utterances
         * @param {ReadAloudOptions} options
         *
         */
        async play(utterances, options) {
            for await (const utterance of utterances) {
                const nativeSpeechUtterance = new window.SpeechSynthesisUtterance(utterance.text);
                const {pitch, rate, volume} = Object.assign(
                        {}, this.playingOptions, options || {}
                );
                if (typeof pitch === 'number') {
                    nativeSpeechUtterance.pitch = pitch;
                }
                if (typeof rate === 'number') {
                    nativeSpeechUtterance.rate = rate;
                }
                if (typeof volume === 'number') {
                    nativeSpeechUtterance.volume = volume;
                }
                await new Promise((resolve, reject) => {
                    nativeSpeechUtterance.onend = resolve;
                    nativeSpeechUtterance.onabort = resolve;
                    nativeSpeechUtterance.onerror = reject;
                    speechSynthesis.speak(nativeSpeechUtterance);
                });
            }
        }

        updateOptions(options) {
            Object.assign(this.playingOptions, options);
        }
    }

    var libPath = window.top.location.origin + '/lib';
    var pdfui = new UIExtension.PDFUI({
        viewerOptions: {
            libPath: libPath,
            jr: {
                licenseSN: licenseSN,
                licenseKey: licenseKey
            }
        },
        renderTo: document.body,
        appearance: UIExtension.appearances.ribbon,
        addons: [
            libPath + '/uix-addons/read-aloud'
        ]
    });
    pdfui.getReadAloudService().then(function (service) {
        service.setSpeechSynthesis(new CustomPDFTextToSpeechSynthesis());
    });

</script>

Approach 2: use AbstractPDFTextToSpeechSynthesis to customize the speech synthesizer

html

<html>
</html>
<script>
    const PDFTextToSpeechSynthesisStatus = UIExtension.PDFViewCtrl.readAloud.PDFTextToSpeechSynthesisStatus;
    const AbstractPDFTextToSpeechSynthesis = UIExtension.PDFViewCtrl.readAloud.AbstractPDFTextToSpeechSynthesis;
    const CustomPDFTextToSpeechSynthesis = AbstractPDFTextToSpeechSynthesis.extend({
        init() {
        },
        supported() {
            return typeof window.speechSynthesis !== 'undefined';
        },
        doPause() {
            window.speechSynthesis.pause();
        },
        doResume() {
            window.speechSynthesis.resume();
        },
        doStop() {
            window.speechSynthesis.cancel();
        },
        /**
         * @param {string} text
         * @param {ReadAloudOptions | undefined} options
         */
        async speakText(text, options) {
            const nativeSpeechUtterance = new window.SpeechSynthesisUtterance(text);
            const {pitch, rate, volume} = Object.assign(
                    {}, this.playingOptions, options || {}
            );
            if (typeof pitch === 'number') {
                nativeSpeechUtterance.pitch = pitch;
            }
            if (typeof rate === 'number') {
                nativeSpeechUtterance.rate = rate;
            }
            if (typeof volume === 'number') {
                nativeSpeechUtterance.volume = volume;
            }
            await new Promise((resolve, reject) => {
                nativeSpeechUtterance.onend = resolve;
                nativeSpeechUtterance.onabort = resolve;
                nativeSpeechUtterance.onerror = reject;
                speechSynthesis.speak(nativeSpeechUtterance);
            });
        }
    })
    const libPath = window.top.location.origin + '/lib';
    const pdfui = new UIExtension.PDFUI({
        viewerOptions: {
            libPath: libPath,
            jr: {
                licenseSN: licenseSN,
                licenseKey: licenseKey
            }
        },
        renderTo: document.body,
        appearance: UIExtension.appearances.ribbon,
        addons: [
            libPath + '/uix-addons/read-aloud'
        ]
    });
    pdfui.getReadAloudService().then(function (service) {
        service.setSpeechSynthesis(new CustomPDFTextToSpeechSynthesis());
    });

</script>

Differences between the two customization approaches: PDFTextToSpeechSynthesis vs. AbstractPDFTextToSpeechSynthesis

Approach 1 customizes the speech synthesizer by implementing the PDFTextToSpeechSynthesis interface. It requires managing state transitions manually and traversing the utterances list with for await...of. Each item in the utterances list is a text block obtained from a PDFPage. In some cases a text block may contain incomplete words or sentences; the blocks then need to be merged into complete words or sentences for better speech synthesis. This merging can be implemented inside the play() method.
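
One simple way to sketch that merge step: buffer incoming text blocks and only emit a chunk once it ends with sentence-ending punctuation, so the synthesizer never receives a half sentence. The punctuation set below is an assumption and should be tuned per language.

```javascript
// Illustrative merge of page text blocks into sentence-sized chunks.
// SENTENCE_END is an assumed heuristic, not part of the SDK.
function mergeTextBlocks(blocks) {
    const SENTENCE_END = /[.!?。！？]\s*$/;
    const merged = [];
    let buffer = '';
    for (const block of blocks) {
        buffer += block;
        if (SENTENCE_END.test(buffer)) {
            merged.push(buffer);
            buffer = '';
        }
    }
    if (buffer) {
        merged.push(buffer); // flush trailing partial text as-is
    }
    return merged;
}
```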

Approach 2 customizes the speech synthesizer by extending the AbstractPDFTextToSpeechSynthesis abstract class. It does not require managing state transitions or traversing the utterances list manually, but it must correctly call the window.SpeechSynthesisUtterance interface to generate and play speech from the received text and options. The received text blocks are merged automatically by AbstractPDFTextToSpeechSynthesis. However, it is currently hard to guarantee that the merged text blocks form complete words or sentences in every language environment, so if you have strict requirements on reading every word and sentence correctly, Approach 1 is recommended.

Integrating a third-party TTS service

This section uses @google-cloud/text-to-speech as an example.

Server

For the Google Cloud Text-to-Speech SDKs in each development language, see https://cloud.google.com/text-to-speech/docs/quickstarts
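
On the server, each request maps the received text and options onto a Google Cloud synthesizeSpeech request. The sketch below only builds the request payload (the input/voice/audioConfig field names follow the Google Cloud Text-to-Speech v1 API); the mapping of the client-side rate and pitch parameters onto speakingRate and pitch is an assumption, and wiring this into textToSpeechClient.synthesizeSpeech() behind an HTTP endpoint is omitted.

```javascript
// Build a request body in the shape accepted by Google Cloud TTS
// synthesizeSpeech(). The rate/pitch mapping is an assumption about
// how the client-side query parameters are interpreted server-side.
function buildSynthesizeSpeechRequest(text, options) {
    options = options || {};
    return {
        input: { text: text },
        voice: {
            languageCode: options.lang || 'en-US',
            name: options.voice // undefined lets the service pick a default voice
        },
        audioConfig: {
            audioEncoding: 'MP3',
            speakingRate: typeof options.rate === 'number' ? options.rate : 1,
            pitch: typeof options.pitch === 'number' ? options.pitch : 0
        }
    };
}
```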

Client

javascript
var readAloud = UIExtension.PDFViewCtrl.readAloud;
var PDFTextToSpeechSynthesisStatus = readAloud.PDFTextToSpeechSynthesisStatus;
var AbstractPDFTextToSpeechSynthesis = readAloud.AbstractPDFTextToSpeechSynthesis;
var SPEECH_SYNTHESIS_URL = '<server url>'; // the server API address

var ThirdpartyPDFTextToSpeechSynthesis = AbstractPDFTextToSpeechSynthesis.extend({
    init: function () {
        this.audioElement = null;
    },
    supported: function () {
        return typeof window.HTMLAudioElement === 'function' && document.createElement('audio') instanceof window.HTMLAudioElement;
    },
    doPause: function () {
        if (this.audioElement) {
            this.audioElement.pause();
        }
    },
    doStop: function () {
        if (this.audioElement) {
            this.audioElement.pause();
            this.audioElement.currentTime = 0;
            this.audioElement = null;
        }
    },
    doResume: function () {
        if (this.audioElement) {
            this.audioElement.play();
        }
    },
    onCurrentPlayingOptionsUpdated: function () {
        if (!this.audioElement) {
            return;
        }
        var options = this.currentPlayingOptions;
        if (this.status === PDFTextToSpeechSynthesisStatus.playing) {
            if (options.volume >= 0 && options.volume <= 1) {
                this.audioElement.volume = options.volume;
            }
        }
    },
    speakText: function (text, options) {
        var audioElement = document.createElement('audio');
        this.audioElement = audioElement;
        if (options.volume >= 0 && options.volume <= 1) {
            audioElement.volume = options.volume;
        }
        return this.speechSynthesis(text, options).then(function (src) {
            return new Promise(function (resolve, reject) {
                audioElement.src = src;
                audioElement.onended = function () {
                    resolve();
                };
                audioElement.onabort = function () {
                    resolve();
                };
                audioElement.onerror = function (e) {
                    reject(e);
                };
                audioElement.play();
            }).finally(function () {
                URL.revokeObjectURL(src);
            });
        });
    },
    // If the server API request method or parameter form is not consistent with the following implementation, it will need to be adjusted accordingly.
    speechSynthesis: function (text, options) {
        var url = SPEECH_SYNTHESIS_URL + '?' + this.buildURIQueries(text, options);
        return fetch(url).then(function (response) {
            if (response.status >= 400) {
                return response.json().then(function (json) {
                    return Promise.reject(json.error);
                });
            }
            return response.blob();
        }).then(function (blob) {
            return URL.createObjectURL(blob);
        });
    },
    buildURIQueries: function (text, options) {
        var queries = [
            'text=' + encodeURIComponent(text)
        ];
        if (!options) {
            return queries.join('&');
        }
        if (typeof options.rate === 'number') {
            queries.push('rate=' + options.rate);
        }
        if (typeof options.pitch === 'number') {
            queries.push('pitch=' + options.pitch);
        }
        if (typeof options.lang === 'string') {
            queries.push('lang=' + encodeURIComponent(options.lang));
        }
        if (typeof options.voice === 'string') {
            queries.push('voice=' + encodeURIComponent(options.voice));
        }
        if (typeof options.external !== 'undefined') {
            queries.push('external=' + encodeURIComponent(JSON.stringify(options.external)));
        }
        return queries.join('&');
    }
});

Using the custom speech synthesizer

javascript
pdfui.getReadAloudService().then(function (service) {
    service.setSpeechSynthesis(new ThirdpartyPDFTextToSpeechSynthesis());
});