
WebRTC Audio 04 - Key Classes

news 2025/9/18 4:03:34

WebRTC Audio 01 - Device Management
WebRTC Audio 02 - Device Management on Windows
WebRTC Audio 03 - Real-Time Communication Framework
WebRTC Audio 04 - Key Classes (this article)

I. Introduction:

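While reading WebRTC's audio code, we run into many key classes that are quite abstract, and without understanding them the code quickly becomes baffling. Examples include PeerConnection, Call, AudioState, Channel, and Stream. This article does its best to explain them.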

II. Key Class Relationship Diagram:

[Figure: relationship diagram of the key classes]

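Each PeerConnection owns one Call, which manages the session with the remote peer; as the diagram shows, the Call is mainly responsible for creating the various send and receive Streams.
The PeerConnectionFactory, by contrast, is normally a single, globally shared instance;
the CompositeMediaEngine belongs to the PeerConnectionFactory, so the MediaEngine is unique as well;
consequently WebRtcVoiceEngine is also unique, and so is the AudioState inside it, which handles the send and receive streams coming from the Calls of all the PeerConnections.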
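To make the ownership relations above concrete, here is a minimal C++ sketch. All class names (ToyFactory, ToyVoiceEngine, ToyAudioState, ToyPeerConnection, ToyCall) are hypothetical stand-ins rather than WebRTC's API, and std::shared_ptr stands in for rtc::scoped_refptr. One factory owns one voice engine and its AudioState, and every PeerConnection the factory creates owns its own Call, which points at that single shared AudioState.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical toy classes for illustration only; these are NOT WebRTC's types.
struct ToyAudioState {
  int registered_calls = 0;  // how many Calls currently route audio through us
};

struct ToyVoiceEngine {
  // Created once per engine, loosely mirroring WebRtcVoiceEngine::Init().
  std::shared_ptr<ToyAudioState> audio_state = std::make_shared<ToyAudioState>();
};

class ToyCall {
 public:
  explicit ToyCall(std::shared_ptr<ToyAudioState> state)
      : audio_state_(std::move(state)) {
    ++audio_state_->registered_calls;
  }
  ~ToyCall() { --audio_state_->registered_calls; }
  ToyCall(const ToyCall&) = delete;
  ToyCall& operator=(const ToyCall&) = delete;
  const std::shared_ptr<ToyAudioState>& audio_state() const { return audio_state_; }

 private:
  std::shared_ptr<ToyAudioState> audio_state_;  // shared with every other Call
};

struct ToyPeerConnection {
  std::unique_ptr<ToyCall> call;  // each PeerConnection owns exactly one Call
};

class ToyFactory {
 public:
  std::unique_ptr<ToyPeerConnection> CreatePeerConnection() {
    auto pc = std::make_unique<ToyPeerConnection>();
    pc->call = std::make_unique<ToyCall>(engine_.audio_state);
    return pc;
  }
  const ToyAudioState& audio_state() const { return *engine_.audio_state; }

 private:
  ToyVoiceEngine engine_;  // one engine (and therefore one AudioState) per factory
};
```

For example, after two CreatePeerConnection() calls the two Calls are distinct objects, yet both point at the same AudioState, whose registered_calls count is 2.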

III. PeerConnection:

1. Responsibilities:

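Think of it as a "Socket Plus". Suppose we have a mesh topology with three endpoints, C1, C2 and C3, all of which can talk to each other P2P. When they join the same meeting, C1 has to create two PeerConnections: one to communicate with C2 and another to communicate with C3.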
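The arithmetic behind the mesh example can be sketched with two small helpers (hypothetical function names, not part of WebRTC): each client needs one PeerConnection per remote peer, so with n clients each one creates n - 1 of them, and the mesh as a whole contains n(n - 1)/2 peer-to-peer links, each backed by one PeerConnection on either side.

```cpp
#include <cassert>
#include <cstddef>

// Number of PeerConnections one client creates in a full mesh of n clients:
// one per remote peer.
constexpr std::size_t PeerConnectionsPerClient(std::size_t n) {
  return n == 0 ? 0 : n - 1;
}

// Distinct P2P links in the mesh: every unordered pair of clients, n(n - 1)/2.
// (Across all clients there are twice as many PeerConnection objects.)
constexpr std::size_t MeshLinks(std::size_t n) {
  return n * PeerConnectionsPerClient(n) / 2;
}

static_assert(PeerConnectionsPerClient(3) == 2, "C1 talks to C2 and C3");
static_assert(MeshLinks(3) == 3, "C1-C2, C1-C3, C2-C3");
```

The quadratic growth of MeshLinks is why a pure mesh only suits small meetings.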

Because PeerConnection sits at the very top of the core layer and has to interact with the session-layer API, it first of all implements the PeerConnectionInterface API.

PeerConnection is solely responsible for:

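Managing the session state machine (signaling state);
Creating and initializing lower-level objects such as PortAllocator and BaseChannels;
Owning and managing the life cycle of the RtpSender/RtpReceiver and track objects;
Tracking the current and pending local/remote session descriptions;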

It is jointly responsible for:

Parsing and interpreting SDP;
Generating offers and answers based on the current state;
The ICE state machine;
Generating stats;
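The "session state machine (signaling state)" listed above can be illustrated with a deliberately simplified offer/answer model. This is a toy, not PeerConnection's implementation: the state names are modeled on the W3C RTCSignalingState values, but the pranswer states, re-offers and rollback are left out, and ApplyDescription is a hypothetical helper.

```cpp
#include <cassert>
#include <optional>

// Simplified signaling states (subset of RTCSignalingState; the pranswer
// states, re-offers and rollback are deliberately omitted in this toy).
enum class SignalingState { kStable, kHaveLocalOffer, kHaveRemoteOffer, kClosed };
enum class SdpType { kOffer, kAnswer };
enum class Side { kLocal, kRemote };

// Returns the next state if applying an SDP of `type` on `side` is legal in
// this toy model, or std::nullopt if the transition is invalid.
std::optional<SignalingState> ApplyDescription(SignalingState s, Side side,
                                               SdpType type) {
  switch (s) {
    case SignalingState::kStable:
      if (type != SdpType::kOffer) return std::nullopt;  // an offer must come first
      return side == Side::kLocal ? SignalingState::kHaveLocalOffer
                                  : SignalingState::kHaveRemoteOffer;
    case SignalingState::kHaveLocalOffer:
      // We sent an offer: only the remote answer completes the exchange.
      if (side == Side::kRemote && type == SdpType::kAnswer)
        return SignalingState::kStable;
      return std::nullopt;
    case SignalingState::kHaveRemoteOffer:
      // We received an offer: only our local answer completes the exchange.
      if (side == Side::kLocal && type == SdpType::kAnswer)
        return SignalingState::kStable;
      return std::nullopt;
    case SignalingState::kClosed:
      return std::nullopt;
  }
  return std::nullopt;
}
```

Rejecting out-of-order descriptions up front is what lets the rest of the stack assume that the pending/current local and remote descriptions are always consistent.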

2. When it is created:

When a call starts, three main things have to happen (see Conductor::InitializePeerConnection):

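CreatePeerConnectionFactory (create the PeerConnection factory object);
CreatePeerConnection (create the PeerConnection itself);

Finally, AddTracks (add tracks to the PeerConnection; tracks are covered later, for now just think of a track as one audio or video stream).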

These steps also appeared in the article on the audio architecture. Here is the code:

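bool Conductor::InitializePeerConnection() {
  // network_thread / worker_thread: nullptr lets the factory create them internally;
  // signaling_thread: nullptr means the calling thread becomes the signaling thread;
  // default_adm: nullptr means WebRTC creates its default audio device module.
  // Pass your own objects instead if you want to control these yourself.
  peer_connection_factory_ = webrtc::CreatePeerConnectionFactory(
      nullptr /* network_thread */, nullptr /* worker_thread */,
      nullptr /* signaling_thread */, nullptr /* default_adm */,
      webrtc::CreateBuiltinAudioEncoderFactory(),
      webrtc::CreateBuiltinAudioDecoderFactory(),
      webrtc::CreateBuiltinVideoEncoderFactory(),
      webrtc::CreateBuiltinVideoDecoderFactory(), nullptr /* audio_mixer */,
      nullptr /* audio_processing */);
  if (!peer_connection_factory_) {
    // Without a factory nothing else can be created.
    main_wnd_->MessageBox("Error", "Failed to initialize PeerConnectionFactory", true);
    DeletePeerConnection();
    return false;
  }

  // Create the PeerConnection object.
  if (!CreatePeerConnection(/*dtls=*/true)) {
    main_wnd_->MessageBox("Error", "CreatePeerConnection failed", true);
    DeletePeerConnection();
    return false;  // don't call AddTracks() on a deleted PeerConnection
  }

  // Add the tracks to the PeerConnection.
  AddTracks();

  return peer_connection_ != nullptr;
}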

The CreatePeerConnection call above is what actually creates the PeerConnection.

IV. Call:

1. Responsibilities:

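A Call represents the session with one remote endpoint and manages the overall flow and state of that call. A Call object can contain multiple send/receive streams; all of these streams belong to the same remote endpoint and share bitrate estimates. Its specific responsibilities are: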

Internal responsibilities within WebRTC:

Creating/destroying AudioReceiveStream and AudioSendStream;
Creating/destroying VideoSendStream and VideoReceiveStream;

Functionality exposed to the application (through PeerConnection):

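Setting the send bitrate (maximum, minimum and initial bitrate; the initial bitrate serves as the encoder's starting parameter and as the prior for bandwidth estimation);
Providing access to transport statistics (estimated available send bandwidth, estimated available receive bandwidth, the delay introduced by pacing, the RTT estimate, and the maximum padding bitrate);
Providing a callback for every packet that is sent;
In addition, it holds the PacketReceiver object, so every received RTP/RTCP packet also passes through Call.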
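The stream-management and packet-receiving roles of Call described above can be sketched as follows. ToyCall and ToyAudioReceiveStream are hypothetical, and routing incoming packets purely by SSRC is a simplification of what the real Call does with RTP/RTCP traffic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical toy receive stream: it only counts the packets delivered to it.
struct ToyAudioReceiveStream {
  explicit ToyAudioReceiveStream(uint32_t ssrc) : remote_ssrc(ssrc) {}
  uint32_t remote_ssrc;
  int packets_received = 0;
};

// Hypothetical toy Call: creates/destroys receive streams and, acting as the
// packet receiver, routes each incoming RTP packet to a stream by SSRC.
class ToyCall {
 public:
  ToyAudioReceiveStream* CreateAudioReceiveStream(uint32_t ssrc) {
    auto& slot = receive_streams_[ssrc];
    assert(!slot && "one receive stream per SSRC");
    slot = std::make_unique<ToyAudioReceiveStream>(ssrc);
    return slot.get();
  }

  void DestroyAudioReceiveStream(ToyAudioReceiveStream* stream) {
    receive_streams_.erase(stream->remote_ssrc);  // frees the stream
  }

  // Returns false when no stream is registered for the packet's SSRC.
  bool DeliverRtpPacket(uint32_t ssrc) {
    auto it = receive_streams_.find(ssrc);
    if (it == receive_streams_.end()) return false;
    ++it->second->packets_received;
    return true;
  }

 private:
  std::map<uint32_t, std::unique_ptr<ToyAudioReceiveStream>> receive_streams_;
};
```

Because the Call owns the streams and sees every incoming packet, it is the natural place to demultiplex traffic; a packet for an unknown SSRC is simply reported as undeliverable here.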

2. When it is created:

Like most WebRTC modules, Call is created through the factory pattern: first the factory is created, then the Call itself.

Creating the factory:

// File: .\api\create_peerconnection_factory.cc
rtc::scoped_refptr<PeerConnectionFactoryInterface> CreatePeerConnectionFactory(
    /* ... */) {
  // Bundle all the arguments together so they are easy to pass down.
  PeerConnectionFactoryDependencies dependencies;
  dependencies.call_factory = CreateCallFactory();  // the CallFactory is created here
  // ...
}

So call_factory is created, stored in dependencies, and passed on down the chain.

Creating the Call object:

It is created in CreatePeerConnection -> CreatePeerConnectionOrError:

RTCErrorOr<rtc::scoped_refptr<PeerConnectionInterface>>
PeerConnectionFactory::CreatePeerConnectionOrError(
    const PeerConnectionInterface::RTCConfiguration& configuration,
    PeerConnectionDependencies dependencies) {
  // ...
  // Create the Call on the worker thread.
  std::unique_ptr<Call> call = worker_thread()->Invoke<std::unique_ptr<Call>>(
      RTC_FROM_HERE,
      [this, &event_log] { return CreateCall_w(event_log.get()); });

  // Create the PeerConnection and hand it the Call.
  auto result = PeerConnection::Create(context_, options_, std::move(event_log),
                                       std::move(call), configuration,
                                       std::move(dependencies));
  // ... (omitted: error handling and wrapping result into the proxy result_proxy)
  return result_proxy;
}

So CreateCall_w is invoked on the worker thread to create a Call object, which is then passed in when the PeerConnection is created; from that point on, the PeerConnection holds the Call.

Let's look at CreateCall_w:

// File: pc\peer_connection_factory.cc
std::unique_ptr<Call> PeerConnectionFactory::CreateCall_w(RtcEventLog* event_log) {
  // The first part mainly sets up parameters for the send/receive streams (omitted).
  // Ask the factory to create the concrete Call object.
  return std::unique_ptr<Call>(context_->call_factory()->CreateCall(call_config));
}

Then the factory creates the Call:

// File: call\call_factory.cc
Call* CallFactory::CreateCall(const Call::Config& config) {
  RTC_DCHECK_RUN_ON(&call_thread_);
  absl::optional<webrtc::BuiltInNetworkBehaviorConfig> send_degradation_config =
      ParseDegradationConfig(true);
  absl::optional<webrtc::BuiltInNetworkBehaviorConfig>
      receive_degradation_config = ParseDegradationConfig(false);

  if (send_degradation_config || receive_degradation_config) {
    return new DegradedCall(std::unique_ptr<Call>(Call::Create(config)),
                            send_degradation_config, receive_degradation_config,
                            config.task_queue_factory);
  }

  if (!module_thread_) {
    module_thread_ = SharedModuleThread::Create(
        ProcessThread::Create("SharedModThread"), [this]() {
          RTC_DCHECK_RUN_ON(&call_thread_);
          module_thread_ = nullptr;
        });
  }

  // Create the Call through its static Create method.
  return Call::Create(config, module_thread_);
}

And with that, the Call object has been created.
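The creation chain above (CreateCallFactory() stored in the dependencies, then CallFactory::CreateCall() optionally wrapping the new Call in a DegradedCall) is the factory pattern combined with a decorator. Here is a minimal sketch with hypothetical Toy* types, where a boolean flag stands in for ParseDegradationConfig() and the "degradation" is only reflected in a description string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplified config; the real Call::Config has many more fields.
struct ToyCallConfig {
  bool simulate_degraded_network = false;  // stands in for ParseDegradationConfig()
};

class ToyCallInterface {
 public:
  virtual ~ToyCallInterface() = default;
  virtual std::string Describe() const = 0;
};

class ToyRealCall : public ToyCallInterface {
 public:
  std::string Describe() const override { return "Call"; }
};

// Decorator: wraps another call (here it only changes the description).
class ToyDegradedCall : public ToyCallInterface {
 public:
  explicit ToyDegradedCall(std::unique_ptr<ToyCallInterface> inner)
      : inner_(std::move(inner)) {}
  std::string Describe() const override {
    return "Degraded(" + inner_->Describe() + ")";
  }

 private:
  std::unique_ptr<ToyCallInterface> inner_;
};

// The creator only depends on this interface, so another factory can be injected.
class ToyCallFactoryInterface {
 public:
  virtual ~ToyCallFactoryInterface() = default;
  virtual std::unique_ptr<ToyCallInterface> CreateCall(const ToyCallConfig& config) = 0;
};

class ToyCallFactory : public ToyCallFactoryInterface {
 public:
  std::unique_ptr<ToyCallInterface> CreateCall(const ToyCallConfig& config) override {
    std::unique_ptr<ToyCallInterface> call = std::make_unique<ToyRealCall>();
    if (config.simulate_degraded_network) {
      return std::make_unique<ToyDegradedCall>(std::move(call));
    }
    return call;
  }
};
```

Because PeerConnectionFactory only sees the factory interface, an embedder or a test can swap in a different factory without touching the creation code, and the decorator lets a degraded network be simulated without changing the Call class itself.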

V. AudioState:

1. Responsibilities:

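WebRtcVoiceEngine, introduced earlier, has an AudioState member variable. This variable is very important: it manages the AudioTransport module (which in turn contains the AudioMixer and AudioProcessing) and the ADM module. There are actually two AudioState classes: webrtc::AudioState in the call module, which mainly declares the interface (including the static Create() method), and its implementation webrtc::internal::AudioState in the audio module, which is what the voice engine ends up holding.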

To summarize:

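Despite its name, AudioState does not manage "audio state"; it is better understood as an audio context;
AudioState mainly manages two modules, the ADM and AudioTransport;
the ADM manages the audio hardware;
AudioTransport performs mixing and 3A processing (AEC, AGC, ANS) through the Mixer and AudioProcessing modules;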

2. Definition:

// File: audio\audio_state.h
class AudioState : public webrtc::AudioState {
 public:
  // ...
  AudioProcessing* audio_processing() override;
  AudioTransport* audio_transport() override;

  void SetPlayout(bool enabled) override;
  void SetRecording(bool enabled) override;
  void SetStereoChannelSwapping(bool enable) override;

  AudioDeviceModule* audio_device_module() {
    RTC_DCHECK(config_.audio_device_module);
    return config_.audio_device_module.get();
  }

  void AddReceivingStream(webrtc::AudioReceiveStream* stream);
  void RemoveReceivingStream(webrtc::AudioReceiveStream* stream);

  void AddSendingStream(webrtc::AudioSendStream* stream,
                        int sample_rate_hz,
                        size_t num_channels);
  void RemoveSendingStream(webrtc::AudioSendStream* stream);

 private:
  // Transports mixed audio from the mixer to the audio device and
  // recorded audio to the sending streams.
  AudioTransportImpl audio_transport_;

  // Null audio poller is used to continue polling the audio streams if audio
  // playout is disabled so that audio processing still happens and the audio
  // stats are still updated.
  std::unique_ptr<NullAudioPoller> null_audio_poller_;

  std::unordered_set<webrtc::AudioReceiveStream*> receiving_streams_;
  struct StreamProperties {
    int sample_rate_hz = 0;
    size_t num_channels = 0;
  };
  std::map<webrtc::AudioSendStream*, StreamProperties> sending_streams_;
};

As you can see, it mainly performs operations such as adding and removing Streams:

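AddReceivingStream: adds an audio receive stream that needs handling to the AudioState;
AddSendingStream: adds an audio send stream that needs handling to the AudioState;
AudioTransportImpl:
- RecordedDataIsAvailable: receives the captured (recorded) data;
- NeedMorePlayData: feeds more data to the speaker;
WebRtcVoiceEngine holds the AudioState and also holds the ADM, the Mixer and AudioProcessing (which mainly does 3A processing), but the state of these three is maintained by AudioState (WebRtcVoiceEngine is the trio's parent, which brought them into the world, while AudioState is their department manager, which hands out the work);
Because AudioState holds the AudioDeviceModule, it acts as the bridge between the underlying hardware and the upper-layer application: if the application wants to control capture and playout on the device, it has to go through AudioState, which in turn drives the hardware via the AudioDeviceModule;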
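The two AudioTransport directions listed above can be sketched as follows. This is a hypothetical toy (ToyAudioTransport, ToySendStream, ToyReceiveStream and the simplified signatures are not WebRTC's): captured audio fans out to every sending stream, and playout audio is mixed from every receiving stream. Audio processing is omitted, and plain summation with saturation is a simplification of what the real AudioMixer does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for send/receive streams.
struct ToySendStream {
  std::vector<int16_t> last_captured;  // what would be handed to the encoder
};
struct ToyReceiveStream {
  std::vector<int16_t> decoded;  // what the decoder produced for this frame
};

class ToyAudioTransport {
 public:
  void AddSendingStream(ToySendStream* s) { sending_.push_back(s); }
  void AddReceivingStream(ToyReceiveStream* s) { receiving_.push_back(s); }

  // Capture path (cf. RecordedDataIsAvailable): the same microphone frame is
  // delivered to every sending stream (3A processing omitted in this toy).
  void OnRecordedData(const std::vector<int16_t>& frame) {
    for (ToySendStream* s : sending_) s->last_captured = frame;
  }

  // Playout path (cf. NeedMorePlayData): mix all receiving streams into one
  // frame for the speaker, saturating the sum to the int16 range.
  std::vector<int16_t> NeedMorePlayData(std::size_t samples) const {
    std::vector<int32_t> acc(samples, 0);
    for (const ToyReceiveStream* r : receiving_) {
      for (std::size_t i = 0; i < samples && i < r->decoded.size(); ++i) {
        acc[i] += r->decoded[i];
      }
    }
    std::vector<int16_t> out(samples);
    for (std::size_t i = 0; i < samples; ++i) {
      out[i] = static_cast<int16_t>(std::clamp<int32_t>(acc[i], INT16_MIN, INT16_MAX));
    }
    return out;
  }

 private:
  std::vector<ToySendStream*> sending_;
  std::vector<ToyReceiveStream*> receiving_;
};
```

For example, receive frames {100, 30000} and {-50, 10000} mix to {50, 32767}: the second sample overflows int16 and is clamped.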

3. When it is created:

The call stack is as follows (innermost frame first):

WebRtcVoiceEngine::Init()
CompositeMediaEngine::Init()
ChannelManager::Init()
ConnectionContext::Create()
PeerConnectionFactory::Create()
CreateModularPeerConnectionFactory()
CreatePeerConnectionFactory()

In other words, it is created in the audio engine's initialization function. Here is the source (non-essential code removed):

// File: media\engine\webrtc_voice_engine.cc
void WebRtcVoiceEngine::Init() {
  // Set up AudioState.
  {
    webrtc::AudioState::Config config;
    if (audio_mixer_) {
      config.audio_mixer = audio_mixer_;
    } else {
      config.audio_mixer = webrtc::AudioMixerImpl::Create();
    }
    config.audio_processing = apm_;
    config.audio_device_module = adm_;
    if (audio_frame_processor_)
      config.async_audio_processing_factory =
          new rtc::RefCountedObject<webrtc::AsyncAudioProcessing::Factory>(
              *audio_frame_processor_, *task_queue_factory_);
    audio_state_ = webrtc::AudioState::Create(config);
  }
}

// File: audio\audio_state.cc
rtc::scoped_refptr<AudioState> AudioState::Create(
    const AudioState::Config& config) {
  // Create an AudioState object whose lifetime is managed by reference counting.
  return new rtc::RefCountedObject<internal::AudioState>(config);
}

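Note: the AudioState object is managed by a smart pointer. If you are not familiar with WebRTC's smart pointers, see my other article: WebRTC Basic Classes - Smart Pointers (RefCountedObject and scoped_refptr), CSDN blog.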

4. Relationship between Call and AudioState:

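As described above, a Call represents one call session, manages its overall flow and state, and provides the Audio/Video Send/Receive Streams. But AudioState also has Add/Remove Stream operations, so how are the two related? Call is mainly responsible for establishing the connection, communicating with the remote end, handling the media streams, and tracking call state, but it cannot touch the hardware itself. AudioState, on the other hand, holds the ADM pointer (config_.audio_device_module). So Call relies on AudioState's methods to actually capture, process and play audio.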
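This division of labor can be sketched with a fake device. The sketch is a hypothetical toy, loosely modeled on the Add/Remove*Stream methods in the definition shown earlier: registering the first receiving (or sending) stream starts playout (or recording), and removing the last one stops it. FakeAdm, ToyAudioState and ToyStream are illustrative names, not WebRTC types. Because the AudioState is shared, streams coming from two different Calls keep playout running until both are gone.

```cpp
#include <cassert>
#include <set>

// Hypothetical fake device module: only tracks whether playout/recording run.
struct FakeAdm {
  bool playing = false;
  bool recording = false;
};

struct ToyStream {};  // identity only; stands in for a send or receive stream

// Toy AudioState: the only object that touches the device. Streams created by
// any Call register here; the first stream starts the device, the last stops it.
class ToyAudioState {
 public:
  explicit ToyAudioState(FakeAdm* adm) : adm_(adm) {}

  void AddReceivingStream(const ToyStream* s) {
    receiving_.insert(s);
    if (playout_enabled_) adm_->playing = true;
  }
  void RemoveReceivingStream(const ToyStream* s) {
    receiving_.erase(s);
    if (receiving_.empty()) adm_->playing = false;
  }
  void AddSendingStream(const ToyStream* s) {
    sending_.insert(s);
    if (recording_enabled_) adm_->recording = true;
  }
  void RemoveSendingStream(const ToyStream* s) {
    sending_.erase(s);
    if (sending_.empty()) adm_->recording = false;
  }
  // Application-level switch: playout runs only if enabled AND there is
  // something to play.
  void SetPlayout(bool enabled) {
    playout_enabled_ = enabled;
    adm_->playing = enabled && !receiving_.empty();
  }

 private:
  FakeAdm* adm_;
  bool playout_enabled_ = true;
  bool recording_enabled_ = true;
  std::set<const ToyStream*> receiving_;
  std::set<const ToyStream*> sending_;
};
```

In this toy the application never touches FakeAdm directly; every change in device state goes through ToyAudioState, which mirrors the "Call asks AudioState, AudioState drives the ADM" relationship described above.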

VI. Stream and Channel:

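When analyzing media negotiation earlier, I mentioned that WebRTC has many kinds of Channels and Streams, which is very confusing; if you don't sort them out, reading the code becomes utterly baffling. So we need to understand them: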

1. API layer:

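At the API layer (the Web API and the native API are the same in this respect) we see two concepts, Stream and Track. A Track represents one media source: in an audio/video conference, one audio source is an audioTrack and one video source is a videoTrack. A Stream bundles audio Tracks and video Tracks together into a single stream (MediaStream).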
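The API-layer relationship can be modeled with two tiny structs. These are hypothetical toys; in the native API the corresponding interfaces are MediaStreamInterface, AudioTrackInterface and VideoTrackInterface, which carry far more than an id and a kind.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class TrackKind { kAudio, kVideo };

// A track is one media source (one microphone, one camera).
struct ToyTrack {
  std::string id;
  TrackKind kind;
};

// A stream is just a labeled bundle of tracks, possibly of both kinds.
struct ToyMediaStream {
  std::string id;
  std::vector<ToyTrack> tracks;

  std::size_t CountKind(TrackKind k) const {
    std::size_t n = 0;
    for (const auto& t : tracks) {
      if (t.kind == k) ++n;
    }
    return n;
  }
};
```

A typical conference stream would then be ToyMediaStream{"stream_1", {{"mic", TrackKind::kAudio}, {"camera", TrackKind::kVideo}}}: one audio track and one video track under a single stream id.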

2. Media engine layer:

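The engine layer in turn has two concepts, Stream and Channel, for example WebRtcVoiceMediaChannel and WebRtcAudioSendStream. Since an engine-layer Channel essentially represents a connection to one codec, channels are managed by category: audio with audio, video with video. The audio or video units contained in such a Channel are called streams; for example, AudioSendStream and AudioReceiveStream bundled together form a VoiceMediaChannel.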

Roughly speaking, an engine-layer Channel corresponds to an API-layer Track.
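The engine-layer grouping, one channel per media type holding any number of send and receive streams, can be sketched like this. ToyVoiceMediaChannel is hypothetical; keying streams by SSRC matches how RTP identifies streams, but the real WebRtcVoiceMediaChannel keeps much richer per-stream state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

enum class Direction { kSend, kRecv };

// Hypothetical engine-layer channel for audio only; a video channel would be
// a separate type, since the engine keeps the two media kinds apart.
class ToyVoiceMediaChannel {
 public:
  // Returns false if a stream with this SSRC already exists in that direction.
  bool AddStream(Direction dir, uint32_t ssrc) {
    return (dir == Direction::kSend ? send_ssrcs_ : recv_ssrcs_).insert(ssrc).second;
  }
  std::size_t Count(Direction dir) const {
    return (dir == Direction::kSend ? send_ssrcs_ : recv_ssrcs_).size();
  }

 private:
  std::set<uint32_t> send_ssrcs_;  // one entry per send stream
  std::set<uint32_t> recv_ssrcs_;  // one entry per receive stream
};
```

The point of the shape is that direction is a property of each stream inside the channel, not of the channel itself.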

3. Call layer:

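The Call layer is PeerConnection's right-hand man and is mainly responsible for getting the actual call running. It also has the concepts of Stream and Channel, but, unlike the engine layer, here a Stream contains a Channel (my guess at the reason: the Call layer is more business-oriented, and one business stream carries only one kind of media in one direction). Audio and video differ here: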

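Audio:
1. A Stream has a direction: it is either Send or Receive;
2. A Stream contains a Channel, and the Channel also has a direction;
Video:
1. There are only Streams, with no concept of a Channel, and Streams also have a direction: either Send or Receive;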
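The asymmetry of the Call layer can be shown with a small sketch. The types are hypothetical; in the WebRTC source the audio streams hold voe::ChannelSend / voe::ChannelReceive objects, while, as described above, the video streams have no such channel layer. Here the audio send stream delegates to the channel it owns, whereas the video send stream does the work itself.

```cpp
#include <cassert>
#include <memory>

// Hypothetical send-only channel: the direction is fixed by the type.
struct ToyChannelSend {
  int frames_encoded = 0;
  void ProcessAndEncode() { ++frames_encoded; }
};

class ToyAudioSendStream {
 public:
  ToyAudioSendStream() : channel_(std::make_unique<ToyChannelSend>()) {}
  // The stream delegates the actual work to the channel it owns.
  void SendAudioData() { channel_->ProcessAndEncode(); }
  const ToyChannelSend& channel() const { return *channel_; }

 private:
  std::unique_ptr<ToyChannelSend> channel_;  // exactly one channel per stream
};

class ToyVideoSendStream {
 public:
  // No channel layer: the stream does the work itself.
  void SendFrame() { ++frames_encoded_; }
  int frames_encoded() const { return frames_encoded_; }

 private:
  int frames_encoded_ = 0;
};
```

Both toy streams are one-directional by type, which matches the observation that Call-layer streams are either Send or Receive.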

4. Summary:

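The engine layer manages audio and video separately, and one Channel contains multiple streams, which can be either send or recv. In the Call layer, an audio Stream contains a Channel, but a video Stream does not.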
