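Voice Interaction and Control
The previous sections covered speech recognition, speech synthesis, and text interaction with a large language model. This section combines the three into full voice interaction, and also uses voice commands to make the robot perform actions.
Voice interaction example
Turn on the OriginMan power switch and enter the following command in its terminal:
ros2 launch originman_llm_chat originman_llm_chat.launch.py
Once the nodes are running, say something to OriginMan, for example "你好,你是谁?" ("Hello, who are you?"). The terminal output will look like this: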
root@ubuntu:~# ros2 launch originman_llm_chat originman_llm_chat.launch.py
[INFO] [launch]: All log files can be found below /root/.ros/log/2025-02-28-13-31-04-432778-ubuntu-38680
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [text_to_speech_node-1]: process started with pid [38693]
[INFO] [llm_chat_node-2]: process started with pid [38695]
[INFO] [asr_node-3]: process started with pid [38697]
[text_to_speech_node-1] [INFO] [1740720667.721629199] [text_to_speech_node]: 使用示例:ros2 topic pub /tts_input std_msgs/msg/String "data: '请告诉我今天的天气。'"
[text_to_speech_node-1] [INFO] [1740720667.724207952] [text_to_speech_node]: 文本转语音节点已启动,等待输入/tts_input话题数据...
[llm_chat_node-2] [INFO] [1740720668.992896394] [llm_chat_node]: OpenAI 交互节点已启动,等待输入/text_input话题数据...
[llm_chat_node-2] [INFO] [1740720668.995342605] [llm_chat_node]: 使用示例:ros2 topic pub --once /text_input std_msgs/msg/String "data: '你好,今天天气如何?'"
[asr_node-3] [INFO] [1740720670.683385618] [asr_node]: ASR 节点已启动,开始录音和识别...
[asr_node-3] [INFO] [1740720670.690166792] [asr_node]: 开始录音和识别...
[asr_node-3] [INFO] [1740720670.694668880] [asr_node]: 收到监听状态,开始聆听中...
[asr_node-3] [INFO] [1740720673.730837496] [asr_node]: 录制音频: test.wav
[asr_node-3] [INFO] [1740720673.756912233] [asr_node]: 采样率: 16000, 时长: 3.0 秒
[asr_node-3] [INFO] [1740720674.881400056] [asr_node]: 识别到用户语音: '你好,你是谁?'
[asr_node-3] [INFO] [1740720674.884098559] [asr_node]: 已发布文本到 /text_input: '你好,你是谁?'
[llm_chat_node-2] [INFO] [1740720674.885190268] [llm_chat_node]: 接收到文本: '你好,你是谁?'
[llm_chat_node-2] 2025-02-28 13:31:15,772 - INFO - HTTP Request: POST http://59.110.158.57:3000/api/v1/chat/completions "HTTP/1.1 200 OK"
[asr_node-3] [INFO] [1740720675.891859882] [asr_node]: 开始录音和识别...
[asr_node-3] [INFO] [1740720675.892257591] [asr_node]: 收到监听状态,开始聆听中...
[llm_chat_node-2] 2025-02-28 13:31:16,562 - INFO - AI 响应: 我是OriginMan,帅气的人形机器人。有什么问题尽管问,让我用幽默的方式为你解答!
[llm_chat_node-2] [INFO] [1740720676.566284933] [llm_chat_node]: 已发送 TTS 消息: '我是OriginMan,帅气的人形机器人。有什么问题尽管问,让我用幽默的方式为你解答!'
[text_to_speech_node-1] [INFO] [1740720676.567079142] [text_to_speech_node]: 接收到文本: '我是OriginMan,帅气的人形机器人。有什么问题尽管问,让我用幽默的方式为你解答!'
[text_to_speech_node-1] [INFO] [1740720676.569749478] [text_to_speech_node]: 开始播报语音...
[asr_node-3] [INFO] [1740720676.570593437] [asr_node]: 收到播报状态,正在说话,暂停录音...
[text_to_speech_node-1] [INFO] [1740720676.573927483] [text_to_speech_node]: 文本分割为 2 个句子
[text_to_speech_node-1] [INFO] [1740720676.582458200] [text_to_speech_node]: 开始合成文本: '有什么问题尽管问,让我用幽默的方式为你解答。'
[text_to_speech_node-1] [INFO] [1740720676.583595951] [text_to_speech_node]: 开始合成文本: '我是OriginMan,帅气的人形机器人。'
[text_to_speech_node-1] 2025-02-28 13:31:16,993 - INFO - Websocket connected
[text_to_speech_node-1] 2025-02-28 13:31:16,996 - INFO - Websocket connected
[text_to_speech_node-1] [INFO] [1740720678.380090152] [text_to_speech_node]: 播放第 1 段音频
[asr_node-3] [INFO] [1740720678.945347378] [asr_node]: 录制音频: test.wav
[asr_node-3] [INFO] [1740720679.005670400] [asr_node]: 采样率: 16000, 时长: 3.0 秒
[asr_node-3] [INFO] [1740720679.829108107] [asr_node]: 未识别到任何内容
[asr_node-3] [INFO] [1740720680.839732554] [asr_node]: 正在播报,暂停录音...
[asr_node-3] [INFO] [1740720681.850556125] [asr_node]: 正在播报,暂停录音...
[text_to_speech_node-1] [INFO] [1740720682.299188975] [text_to_speech_node]: 播放第 2 段音频
[asr_node-3] [INFO] [1740720682.861295195] [asr_node]: 正在播报,暂停录音...
[asr_node-3] [INFO] [1740720683.872126640] [asr_node]: 正在播报,暂停录音...
[asr_node-3] [INFO] [1740720684.882228874] [asr_node]: 正在播报,暂停录音...
[text_to_speech_node-1] [INFO] [1740720685.698440778] [text_to_speech_node]: 所有音频段播放完成
[text_to_speech_node-1] [INFO] [1740720685.704874118] [text_to_speech_node]: 播报结束,进入聆听状态...
[asr_node-3] [INFO] [1740720685.706222703] [asr_node]: 收到监听状态,开始聆听中...
[asr_node-3] [INFO] [1740720685.904607204] [asr_node]: 收到监听状态,开始聆听中...
[asr_node-3] [INFO] [1740720685.905819122] [asr_node]: 开始录音和识别...
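The log shows that OriginMan has completed the full loop: ASR recognizes the speech and publishes it to /text_input, the LLM generates a text reply, and TTS plays the reply while ASR pauses recording.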
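Two behaviors are visible in the log above: the reply is split into sentences before synthesis, and recording pauses while the robot is speaking. The following is a minimal pure-Python sketch of that turn-taking idea, not the actual node code. The names `split_sentences`, `ListenGate`, and `speak`, the state strings "speaking" and "listening", and the punctuation set are all assumptions made for illustration.

```python
import re

# Sentence-ending punctuation, full-width and ASCII (an assumed rule set).
_SENTENCE = re.compile(r"[^。!?!?]+[。!?!?]*")


def split_sentences(text):
    """Split a reply into sentences so each one can be synthesized separately."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


class ListenGate:
    """Half-duplex gate: recording is allowed only while the robot is silent."""

    def __init__(self):
        self.speaking = False

    def on_state(self, state):
        # "speaking" / "listening" are hypothetical state strings.
        self.speaking = (state == "speaking")

    def may_record(self):
        return not self.speaking


def speak(text, gate, play):
    """Close the gate, play every sentence in order, then reopen the gate."""
    gate.on_state("speaking")
    try:
        for sentence in split_sentences(text):
            play(sentence)
    finally:
        # Reopen the gate even if playback raises, so listening can resume.
        gate.on_state("listening")


if __name__ == "__main__":
    gate = ListenGate()
    played = []
    speak("我是OriginMan。有什么问题尽管问!", gate, played.append)
    print(played, gate.may_record())
```

In a real ROS 2 setup, the gate state would presumably be driven by a status topic published by the TTS node. Here it is a plain method call so the sketch runs without ROS.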
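Voice control example
OriginMan also supports voice-controlled actions. Run the following command:
ros2 launch originman_audio_control audio_control.launch.py
You can then give OriginMan spoken commands such as 开怀大笑 (laugh heartily), 鞠躬 (bow), or 仰卧起坐 (sit-ups).
Taking 开怀大笑 as an example: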
root@ubuntu:~# ros2 launch originman_audio_control audio_control.launch.py
[INFO] [launch]: All log files can be found below /root/.ros/log/2025-02-28-13-34-55-172981-ubuntu-39392
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [asr_node-1]: process started with pid [39405]
[INFO] [audio_control_node-2]: process started with pid [39407]
[audio_control_node-2] [INFO] [1740720896.542396669] [audio_control_node]: 动作控制节点已启动,执行初始站立动作...
[audio_control_node-2] [INFO] [1740720897.048742408] [audio_control_node]: 执行动作组: stand
[asr_node-1] [INFO] [1740720900.783398244] [asr_node]: ASR 节点已启动,开始录音和识别...
[asr_node-1] [INFO] [1740720900.791614501] [asr_node]: 开始录音和识别...
[asr_node-1] [INFO] [1740720900.792100168] [asr_node]: 收到监听状态,开始聆听中...
[audio_control_node-2] [INFO] [1740720902.056489656] [audio_control_node]: 动作控制节点已就绪,等待语音命令...
[asr_node-1] [INFO] [1740720915.662410345] [asr_node]: 开始录音和识别...
[asr_node-1] [INFO] [1740720918.716207620] [asr_node]: 录制音频: test.wav
[asr_node-1] [INFO] [1740720918.749889649] [asr_node]: 采样率: 16000, 时长: 3.0 秒
[asr_node-1] [INFO] [1740720919.839121135] [asr_node]: 识别到用户语音: '开怀大笑。'
[asr_node-1] [INFO] [1740720919.841752929] [asr_node]: 已发布文本到 /text_input: '开怀大笑。'
[audio_control_node-2] [INFO] [1740720919.843013013] [audio_control_node]: 收到语音命令: '开怀大笑。'
[audio_control_node-2] [INFO] [1740720919.846050474] [audio_control_node]: 模糊匹配成功: '开怀大笑。' -> '开怀大笑'
[asr_node-1] [INFO] [1740720920.865806773] [asr_node]: 收到监听状态,开始聆听中...
[audio_control_node-2] [INFO] [1740720928.365244782] [audio_control_node]: 执行动作组: chest
OriginMan now performs the 开怀大笑 (laugh heartily) action.
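The log reports a fuzzy match from '开怀大笑。' to '开怀大笑' before the action group `chest` runs. One way such matching could work is sketched below with Python's standard `difflib`. This is an illustrative sketch, not the node's actual implementation. Only the pair 开怀大笑 -> `chest` comes from the log; the other action-group names, the function name `match_command`, and the 0.6 cutoff are assumptions.

```python
import difflib

# Spoken command -> action group. Only "开怀大笑" -> "chest" appears in the
# log; the other two action-group names are hypothetical placeholders.
ACTIONS = {
    "开怀大笑": "chest",
    "鞠躬": "bow",
    "仰卧起坐": "sit_ups",
}


def match_command(text, commands=ACTIONS, cutoff=0.6):
    """Return the known command that best matches the ASR text, or None."""
    # ASR output often ends with punctuation, e.g. '开怀大笑。'.
    cleaned = text.strip().strip("。.!!??,, ")
    if cleaned in commands:
        return cleaned
    # Accept a command embedded in a longer phrase, e.g. '请鞠躬'.
    for command in commands:
        if command in cleaned:
            return command
    # Fall back to similarity matching to tolerate small recognition errors.
    close = difflib.get_close_matches(cleaned, list(commands), n=1, cutoff=cutoff)
    return close[0] if close else None


if __name__ == "__main__":
    command = match_command("开怀大笑。")
    print(command, ACTIONS.get(command))
```

A higher cutoff reduces accidental triggers at the cost of rejecting more misrecognized commands. The right value depends on the ASR quality and on how similar the command phrases are to each other.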
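Attention
OriginMan must be connected to the network before you run these examples. For the steps, see Network Configuration and Remote Development Methods (网络配置与远程开发方法).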
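Before launching the nodes, it can help to confirm that the robot can actually reach the remote service. The sketch below is a simple TCP reachability check using only the standard library. The function name `can_reach` is hypothetical, and which host and port to test depends on your own configuration.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_reach("59.110.158.57", 3000)` tests the endpoint that appears in the sample log above. Whether that endpoint applies to your setup is an assumption. A `False` result suggests revisiting the network configuration first.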