Understanding the Signaling Tsunami
The recent selection of Tekelec as the Diameter supplier for T-Mobile's LTE deployment has once again brought forth the discussion of control plane congestion and operators' readiness to address it. I have been working in this area for a long time and have seen the impact firsthand in several customer outages. 3GPP and the infrastructure vendors have come a long way since the first smartphones changed user behavior and the pattern of network congestion. It is not just an improvement in the ‘plumbing’ of smart pipes but the overall end-to-end change that has helped mitigate many problems, from both a signaling and a performance perspective. QoE (quality of experience) plays a big role in network planning and management today, as users now demand more from a handset or tablet than ever before. Control of the signaling plane gives the operator better leverage over the network elements that handle customer management and, along with it, quality of service.
Understanding user behavior with Smartphones
Users consume far more content than they produce. Combined with the popularity of multimedia services and the limited processing power of smartphones, this results in far more traffic flowing downstream to smartphones than upstream from smartphones into the cloud. Accordingly, communication channels are typically asymmetric, reserving more bandwidth for the downlink than for the uplink. In spite of this asymmetric allocation, mobile networks are seeing significant traffic pressure on the downlink due to the sheer number of applications using the network and the multimedia-heavy nature of the traffic many of them generate.
In addition, a number of symmetric applications such as file sharing or Peer-to-Peer (P2P), mobile Voice over IP (VoIP) and gaming contribute traffic on the uplink as well as the downlink. However, P2P traffic has been declining over the last several years as first audio and then video content became readily available through legitimate storefronts. Also, relative to other traffic types, mobile VoIP and mobile gaming are low-bit-rate services, which do not affect overall traffic much despite a very large number of concurrent sessions.
As smartphones become more capable, services such as video conferencing, User Generated Content (UGC) uploads, P2P applications, video surveillance and augmented reality are gaining popularity. Further, as voice services transition to VoIP in LTE networks, voice traffic will also add to the uplink traffic mix. Together, these services are making network capacity requirements more symmetric and putting significant pressure on uplink capacity. Foreseeing this trend, HSPA and LTE have increased uplink throughput relative to the downlink compared with earlier technologies, and LTE also allows more flexible spectrum allocation to accommodate evolving traffic patterns.
Nevertheless, until LTE is widely deployed, accelerated growth in uplink traffic will require immediate and targeted solutions on deployed HSPA and HSPA+ networks.
Machine-to-Machine (M2M) devices are expected to grow into a significant share of the devices on the network. These M2M applications have diverse network requirements, ranging from heavy signaling and low throughput in the case of geo-tracking, to low signaling and high downlink throughput in the case of near video on demand (NVOD), to low signaling and high uplink throughput in the case of webcams. Further, M2M devices may be deployed widely and in large numbers because of their low cost and low maintenance, resulting in rapid growth in M2M traffic. In addition, such widely distributed devices will need remote management using over-the-air (OTA) software updates, further adding to the traffic demands on the network.
As 4G service is widely deployed and next-generation services begin to take off, many new types of mobile devices are being introduced to the market. Behind this ever-increasing capability is the mobile operating system, which is also evolving at a rapid pace, including frequent updates and upgrades that end users download to their devices. With the rapid growth of smartphones and other devices on UMTS/LTE networks, network congestion has become one of the biggest challenges, highlighted by the popularity of smartphones and M2M usage. However, the capacity crunch is due not only to increasing user data traffic but also to increasing signaling traffic. Because the applications that run on smartphones and M2M devices behave differently from traditional voice and simple data services, they pose a large signaling challenge to the networks that support them. For example, smartphones and M2M devices run a growing number of applications that send only a small amount of data each time, but send it relatively frequently.
Smartphones constantly query the network as their users move among cell sites, to push email, access social networking tools and perform other repetitive actions. These always-on applications also rely on keep-alive messages. As a result, while data traffic is growing, by many accounts signaling traffic is outpacing actual mobile data traffic by 30 to 50 percent, if not more. As another example, a web-based IM user may send a message and then wait a couple of seconds before the next one. To preserve battery life, the smartphone moves into idle mode in between. When the user sends another message seconds later, the device has to set up a signaling connection again. The radio network controller (RNC) spends a lot of its resources processing such signaling, which prevents it from doing other things, such as allocating additional resources for data. Several common smartphone behaviors can potentially bring a signaling storm to the network and need to be considered and handled within the ecosystem. Key behaviors include:
- Fast dormancy
- Heartbeats for always-on applications
- Constant push service
- Network (re-)attachment
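To make the IM example concrete, here is a minimal Python sketch that counts how many signaling connection setups a sequence of messages triggers. It assumes a simplified device that drops to idle after a fixed period without traffic (the hypothetical `idle_after_s` parameter) and needs a full setup for any packet sent from idle; real devices and networks use more states and timers.

```python
def count_setups(send_times_s, idle_after_s):
    """Count connection setups triggered by a sorted list of send times (seconds).

    Simplified model (an assumption, not a 3GPP procedure): the device goes idle
    once more than idle_after_s seconds pass without traffic, and any packet
    sent from idle needs a new signaling connection.
    """
    setups = 0
    last_activity = None
    for t in send_times_s:
        # First packet, or the gap was long enough for the device to go idle.
        if last_activity is None or t - last_activity > idle_after_s:
            setups += 1
        last_activity = t
    return setups


# An IM user sending a message every 8 seconds for one minute
im_messages = [0, 8, 16, 24, 32, 40, 48, 56]

print(count_setups(im_messages, idle_after_s=5))   # 8: goes idle between every message
print(count_setups(im_messages, idle_after_s=10))  # 1: stays connected throughout
```

The same message pattern costs eight setups or one depending only on how the idle timeout compares with the gap between messages, which is why keep-alive intervals and inactivity timers matter so much for signaling load.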
Fast Dormancy/URA_PCH in UMTS
It is essential to keep smartphone power consumption low while the phone is making frequent transmissions. The original idea in 3GPP was to provide the Radio Resource Control (RRC) states CELL_PCH and URA_PCH to allow very low smartphone power consumption. When a data transmission is over, the phone is moved quickly to a PCH state, so there is no need to move it to idle state to save power. Idle state is avoided because the next packet would then trigger a new packet connection (PS RAB) setup, leading to increased latency and increased signaling traffic in the network.
In practice, many networks are configured with relatively long inactivity timers for the CELL_DCH and CELL_FACH states, resulting in infrequent transitions to the PCH state. As a result, transmitting even a small packet may lead to high battery drain. To keep phone power consumption low, smartphone vendors have implemented a proprietary Fast Dormancy feature (that is, functionality not defined by 3GPP standards). With such Fast Dormancy, the mobile application informs the radio layers when the data transmission is over, and the phone then sends a Signaling Connection Release Indication (SCRI) to the RNC, simulating a failure in the signaling connection. Consequently, the RRC connection is released and the phone moves to idle state. This approach keeps phone power consumption low, but it causes frequent, unnecessary setup of packet connections, increasing the network signaling load. Phone vendors also differ in how they implement Fast Dormancy. In addition to the high signaling load, network counters show a large number of signaling connection failures, because this battery-saving method cannot be distinguished from a genuine signaling connection failure in the network.
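The trade-off between long network inactivity timers and Fast Dormancy can be sketched as a small Python model. The timer values, the `Policy` class and the simplification that Fast Dormancy sends the phone to idle immediately after every burst are all assumptions for illustration; the model only tracks which state the phone is in when the next burst arrives and how long it stayed in the higher-power CELL_DCH and CELL_FACH states.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    dch_timer_s: float    # hypothetical CELL_DCH inactivity timer set by the network
    fach_timer_s: float   # hypothetical CELL_FACH inactivity timer
    fast_dormancy: bool   # phone sends SCRI after each burst and drops to idle


def state_at_next_burst(gap_s: float, p: Policy) -> str:
    """State the phone is in when the next burst starts, gap_s after the last one."""
    if p.fast_dormancy:
        return "IDLE"  # simplification: SCRI releases the connection right away
    if gap_s <= p.dch_timer_s:
        return "CELL_DCH"
    if gap_s <= p.dch_timer_s + p.fach_timer_s:
        return "CELL_FACH"
    return "PCH"  # CELL_PCH or URA_PCH, depending on network configuration


def high_power_seconds(gap_s: float, p: Policy) -> float:
    """Seconds spent in CELL_DCH or CELL_FACH between two bursts."""
    if p.fast_dormancy:
        return 0.0
    return min(gap_s, p.dch_timer_s + p.fach_timer_s)


def needs_full_setup(gap_s: float, p: Policy) -> bool:
    """Only a burst starting from IDLE needs a new RRC connection and PS RAB."""
    return state_at_next_burst(gap_s, p) == "IDLE"


timers = Policy(dch_timer_s=10, fach_timer_s=30, fast_dormancy=False)
dormant = Policy(dch_timer_s=10, fach_timer_s=30, fast_dormancy=True)

# A keep-alive every 60 seconds
print(state_at_next_burst(60, timers), high_power_seconds(60, timers), needs_full_setup(60, timers))
# PCH 40 False
print(state_at_next_burst(60, dormant), high_power_seconds(60, dormant), needs_full_setup(60, dormant))
# IDLE 0.0 True
```

With the long timers the phone avoids a new connection but spends 40 seconds in the higher-power states per keep-alive; with Fast Dormancy that battery cost disappears, but every keep-alive becomes a full connection setup.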
RNC Control Plane Signaling
RANAP (Radio Access Network Application Part), the control-plane protocol used between the RNC and the core network (CN) over the Iu interface, has the following functions:
- Relocating serving RNC. This function enables the serving RNC functionality, as well as the related Iu resources (RAB(s) and signaling connection), to be moved from one RNC to another.
- Overall RAB management. This function is responsible for setting up, modifying and releasing RABs.
- Queuing the setup of RAB. The purpose of this function is to allow some requested RABs to be placed into a queue, and to indicate the queuing to the peer entity.
- Requesting RAB release. While the overall RAB management is a function of the CN, the RNC has the capability to request the release of RAB.
- Release of all Iu connection resources. This function is used to explicitly release all resources related to one Iu connection.
- Requesting the release of all Iu connection resources. While the Iu release is managed from the CN, the RNC has the capability to request the release of all Iu connection resources from the corresponding Iu connection.
- SRNS context forwarding function. This function is responsible for transferring SRNS context from the RNC to the CN for intersystem change in case of packet forwarding.
- Controlling overload in the Iu interface. This function allows adjusting the load in the control plane of the Iu interface.
- Resetting the Iu. This function is used for resetting an Iu interface.
- Sending the UE Common ID (permanent NAS UE identity) to the RNC. This function makes the RNC aware of the UE’s Common ID.
- Paging the user. This function provides the CN with the capability to page the UE.
- Controlling the tracing of the subscriber or user equipment activity. This function allows setting the trace mode for a given subscriber or user equipment. This function also allows the deactivation of a previously established trace.
- Transport of NAS information between UE and CN. This function has two sub-classes:
  - Transport of the initial NAS signaling message from the UE to CN. This function transparently transfers the NAS information.
  - Transport of NAS signaling messages between UE and CN. This function transparently transfers the NAS signaling messages on the existing Iu signaling connection.
- Controlling the security mode in the UTRAN. This function is used to send the security keys (ciphering and integrity protection) to the UTRAN and to set the operation mode for security functions.
- Controlling location reporting. This function allows the CN to set the mode in which the UTRAN reports the location of the UE.
- Location reporting. This function is used for transferring the actual location information from RNC to the CN.
- Data volume reporting function. This function is responsible for reporting unsuccessfully transmitted DL data volume over UTRAN for specific RABs.
- Reporting general error situations. This function allows reporting of general error situations, for which function specific error messages have not been defined.
- Location related data. This function allows the CN to either retrieve from the RNC deciphering keys (to be forwarded to the UE) for the broadcast assistance data, or request the RNC to deliver dedicated assistance data to the UE.
- Information Transfer. This function allows the CN to transfer information to the RNC.
- Uplink Information Exchange. This function allows the RNC to transfer information to, or request information from, the CN. For instance, the RNC can request MBMS-specific information from the CN, e.g., the Multicast Service list for a given UE or the IP Multicast Address and APN for one or several MBMS Bearer Services.
- MBMS RANAP overall function. This function comprises the following sub-functions:
  - MBMS RAB management. This function is responsible for setting up, updating and releasing the MBMS RAB as well as the MBMS Iu signaling connection corresponding to one MBMS Session. The MBMS RAB is defined for the CN PS domain only.
  - MBMS CN (PS domain) de-registration. This function makes the RNC aware that a given Multicast Service is no longer available.
  - MBMS UE linking/de-linking. This function makes the RNC aware that a given UE, with existing Iu-ps signaling connection, has joined/left some Multicast Service(s).
  - Requesting MBMS Service registration/de-registration. While the overall MBMS CN de-registration is a function of the CN (PS domain), the RNC has the capability to register for, or de-register from, a specific Multicast Service.
These functions are implemented by one or several RANAP elementary procedures, which are specified in 3GPP TS 25.413.
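To connect these functions to the Fast Dormancy problem, the Python sketch below tallies the RANAP messages on the Iu interface for one idle-to-connected-to-idle cycle of a packet-switched connection. The sequence of procedures and the message count assumed for each are a simplified illustration, not a normative call flow: actual flows vary with vendor and configuration (security mode control may be skipped, for example), and RRC and NBAP signaling inside the RAN is not counted.

```python
# Simplified PS reconnection cycle (an illustrative assumption):
# (RANAP elementary procedure, function it implements, RANAP messages assumed)
PS_RECONNECT_CYCLE = [
    ("Initial UE Message", "Transport of the initial NAS signaling message", 1),  # carries the NAS Service Request
    ("Security Mode Control", "Controlling the security mode in the UTRAN", 2),
    ("RAB Assignment", "Overall RAB management", 2),
    ("Iu Release Request", "Requesting the release of all Iu connection resources", 1),  # e.g. after an SCRI
    ("Iu Release", "Release of all Iu connection resources", 2),
]


def ranap_messages_per_cycle(cycle=PS_RECONNECT_CYCLE) -> int:
    """Total RANAP messages assumed for one reconnect/release cycle."""
    return sum(count for _, _, count in cycle)


def ranap_messages_per_hour(reconnects_per_hour: int, cycle=PS_RECONNECT_CYCLE) -> int:
    """RANAP messages per hour for one phone reconnecting at the given rate."""
    return reconnects_per_hour * ranap_messages_per_cycle(cycle)


for procedure, function, count in PS_RECONNECT_CYCLE:
    print(f"{procedure:<22} {count}  ({function})")

# One phone whose keep-alive every 60 s triggers a full reconnect each time
print(ranap_messages_per_hour(60))  # 480
```

Multiplying a per-cycle cost like this by the number of phones and their reconnect rate shows how per-UE procedures that are harmless in isolation add up at the RNC and SGSN.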
LTE NAS signaling
LTE is structured quite differently from HSPA, so LTE signaling is handled differently. LTE has a flat architecture without separate radio network controllers (RNCs), and the number of state transitions (that is, the number of steps from idle to active in the handset, each of which generates signaling) is minimized compared to 3G. These design changes were planned specifically so that signaling would not cause the kind of problems in the network that some operators have experienced with HSPA. As an industry, we’ve seen the evil knock-on effects uncontrolled signaling can have in 3G, and we’re determined not to see this issue repeat itself in LTE. LTE was developed to have a simpler architecture (fewer nodes) and less signaling (fewer messages) than UTRAN. Also, the number of states the phone can be in (corresponding to RRC states) is reduced from five in UTRAN (DETACHED, IDLE, CELL_PCH/URA_PCH, CELL_FACH, CELL_DCH) to only three in E-UTRAN (DETACHED, IDLE and CONNECTED). Furthermore, the area concept is simplified in LTE compared to UTRAN. In LTE only one area for idle mode mobility is defined: the Tracking Area (TA). In UTRAN, the Routing Area (RA) and UTRAN Registration Area (URA) are defined for PS traffic and the Location Area (LA) for CS traffic. In ECM-IDLE (EPS Connection Management IDLE), the network knows the phone’s position only at TA level, provided the phone is EMM-REGISTERED. In ECM-CONNECTED, the phone’s location is known at cell level by the eNB.
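The EMM/ECM combination described above can be captured in a few lines of Python. This is a minimal sketch of what the network knows about the phone's location in each state; the enum and function names are hypothetical, and treating every EMM-DEREGISTERED phone as untracked is a simplification.

```python
from enum import Enum
from typing import Optional


class Emm(Enum):
    DEREGISTERED = "EMM-DEREGISTERED"
    REGISTERED = "EMM-REGISTERED"


class Ecm(Enum):
    IDLE = "ECM-IDLE"
    CONNECTED = "ECM-CONNECTED"


def location_known_at(emm: Emm, ecm: Ecm) -> Optional[str]:
    """Granularity at which the network knows where the phone is."""
    if emm is Emm.DEREGISTERED:
        return None  # not registered: no location is maintained in this sketch
    if ecm is Ecm.IDLE:
        return "tracking area"  # tracked by the MME at TA level
    return "cell"  # the serving eNB knows the cell


for emm in Emm:
    for ecm in Ecm:
        print(emm.value, ecm.value, "->", location_known_at(emm, ecm))
```

Because an idle phone is tracked only per tracking area, moving between cells within that area generates no signaling; when downlink data arrives, the network finds the phone by paging across the area.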
However, it’s important to emphasize that we don’t have an end-to-end LTE smartphone environment in place yet, since by and large the end device is still missing: LTE smartphones have not yet been produced in bulk. Only when we see how device manufacturers have used LTE in their handsets can we be confident about what level of LTE signaling will be generated. In HSPA, signaling wasn’t an issue with smartphones at all before handset manufacturers developed Fast Dormancy software that improved battery life while greatly increasing signaling. Likewise, we’ll need to see exactly how LTE smartphones use the network before we can accurately predict LTE signaling volumes.
There are two signaling protocols that drive most of the communications in IMS networks: Session Initiation Protocol (SIP) and Diameter. SIP is the industry standard for session signaling in real-time communications such as Voice over IP (VoIP) and videoconferencing. Diameter is the industry standard for the authentication, authorization, policy and charging signaling that sits behind data sessions from mobile devices such as smartphones and tablets. The two protocols are designed to perform separate but complementary functions in IMS/LTE networks. For example, when a SIP session is initiated, Diameter messages work behind the scenes within the core network to authenticate that subscribers are who they say they are, confirm they are authorized to use certain network services or applications, and ensure they are charged correctly for using those services.
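Every Diameter message starts with the same fixed 20-byte header defined in the Diameter base protocol (RFC 3588, since updated by RFC 6733), and the application ID and command code in that header are what DSCs and other agents inspect first. The Python sketch below packs and parses that header with the standard `struct` module; the lookup tables list only a few well-known values (for example, S6a is application 16777251 and Update-Location is command 316), and the helper names are hypothetical.

```python
import struct
from typing import NamedTuple

HEADER_LEN = 20
FLAG_REQUEST = 0x80      # R bit: set for requests, clear for answers
FLAG_PROXIABLE = 0x40    # P bit
FLAG_ERROR = 0x20        # E bit
FLAG_RETRANSMIT = 0x10   # T bit

COMMAND_NAMES = {
    257: "Capabilities-Exchange",
    272: "Credit-Control",
    280: "Device-Watchdog",
    316: "Update-Location",
    318: "Authentication-Information",
}
APPLICATION_NAMES = {0: "Diameter common messages", 16777238: "Gx", 16777251: "S6a/S6d"}


class DiameterHeader(NamedTuple):
    version: int
    length: int          # total message length in bytes, header included
    flags: int
    command_code: int
    application_id: int
    hop_by_hop: int
    end_to_end: int

    @property
    def is_request(self) -> bool:
        return bool(self.flags & FLAG_REQUEST)


def build_header(command_code, application_id, flags, length, hop_by_hop, end_to_end, version=1) -> bytes:
    # Version shares the first 32-bit word with the 24-bit length;
    # the flags share the second word with the 24-bit command code.
    return struct.pack(">IIIII",
                       (version << 24) | length,
                       (flags << 24) | command_code,
                       application_id, hop_by_hop, end_to_end)


def parse_header(data: bytes) -> DiameterHeader:
    if len(data) < HEADER_LEN:
        raise ValueError("a Diameter header is 20 bytes")
    word1, word2, app_id, hbh, e2e = struct.unpack(">IIIII", data[:HEADER_LEN])
    return DiameterHeader(word1 >> 24, word1 & 0xFFFFFF, word2 >> 24,
                          word2 & 0xFFFFFF, app_id, hbh, e2e)


def describe(h: DiameterHeader) -> str:
    command = COMMAND_NAMES.get(h.command_code, str(h.command_code))
    app = APPLICATION_NAMES.get(h.application_id, str(h.application_id))
    return f"{command}-{'Request' if h.is_request else 'Answer'} on {app}"


raw = build_header(316, 16777251, FLAG_REQUEST | FLAG_PROXIABLE, HEADER_LEN, 0x1234, 0x5678)
print(describe(parse_header(raw)))  # Update-Location-Request on S6a/S6d
```

An answer reuses the request's hop-by-hop and end-to-end identifiers with the R bit cleared, which is how agents in the middle match answers to requests.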
Until now, Diameter traffic hasn’t been a big issue for service providers simply because of the limited number of IMS subscribers. But as service providers deploy more 4G/LTE networks to meet the mobile broadband demand of smartphones and tablets, the number of Diameter signaling messages in their networks will grow exponentially. Smartphones, for example, generate a lot of Diameter signaling messages in the core network: when they’re accessing an application, when they’re downloading data, when they’re roaming, even when they’re simply being turned on and off. Multiply dozens of Diameter signaling messages by millions of smartphones, and you now have potential chokepoints in the network that can disrupt service or even take down the network. Today, there are over 75 unique Diameter signaling interfaces assigned to specific IMS and LTE network elements. Some have likened the complexity of Diameter signaling to the problems originally presented by Signaling System 7 (SS7) in the first wave of mobile networks. Others have found a more recent parallel in the explosion of SIP traffic that appeared with the popularization of VoIP. And just as the need to handle large amounts of SIP traffic led to the development of the Session Border Controller (SBC), the anticipated increase in Diameter signaling traffic has resulted in a new product category, the Diameter Signaling Controller (DSC), and its subsets, the Diameter Routing Agent (DRA) and the Diameter Edge Agent (DEA).
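The "dozens of messages times millions of smartphones" argument is simple arithmetic. The Python sketch below turns it into a messages-per-second estimate; all input values in the example (10 million smartphones, 12 signaling events per subscriber per hour, 24 Diameter messages per event, a peak-to-average ratio of 2) are hypothetical placeholders to be replaced with an operator's own measurements.

```python
def average_diameter_mps(subscribers: int, events_per_hour: float, messages_per_event: float) -> float:
    """Average Diameter messages per second across the core network."""
    return subscribers * events_per_hour * messages_per_event / 3600


def peak_diameter_mps(average_mps: float, peak_to_average: float) -> float:
    """Busy-hour peak, assuming a fixed peak-to-average ratio."""
    return average_mps * peak_to_average


# Hypothetical inputs, for illustration only
avg = average_diameter_mps(subscribers=10_000_000, events_per_hour=12, messages_per_event=24)
peak = peak_diameter_mps(avg, peak_to_average=2)
print(f"average {avg:,.0f} msg/s, peak {peak:,.0f} msg/s")
# average 800,000 msg/s, peak 1,600,000 msg/s
```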
To overcome these challenges, service providers are deploying DSCs to provide critical congestion control, mediation and routing functions for Diameter signaling. These functions reduce costs, streamline networks and ensure resiliency for LTE and IMS networks. The management and mediation of Diameter signaling enables the seamless communication and control of AAA, charging, QoS and mobility information between network elements within LTE or IMS networks and across LTE network borders. Like session border controllers (SBCs), DSCs provide routing, security, interoperability and related signaling control functions, but for the Diameter protocol instead of SIP. DSCs fulfill the role of a Diameter agent as defined in the Diameter base protocol (RFC 3588), serving as proxy or relay agents between clients and servers. DSCs are intermediaries in the Diameter transaction and exert the control functions needed for it to complete successfully. DSCs also implement two major functional elements defined by 3GPP: the Diameter Routing Agent (DRA) and the Subscriber Location Function (SLF). These functions act as proxies in the middle of specific Diameter transactions and assist in completing the exchange of critical subscriber information. The DRA is associated with load balancing across clusters of PCRF servers, and the SLF is charged with discovering the appropriate HSS for a given subscriber. Both elements presuppose large deployments of PCRF and HSS elements. "Diameter router" is a commonly used but inadequate term for the DSC product category: in a complete solution, routing is only the starting point.
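The two 3GPP roles mentioned above can be sketched in Python. The DRA part keeps a binding table so that every Diameter session for a subscriber (for example, the Gx and Rx sessions for the same UE) reaches the same PCRF, choosing the least-loaded PCRF only when a subscriber is first seen; the SLF part picks an HSS by longest IMSI-prefix match. The class names, keying the binding on the IMSI, the least-loaded policy, the prefix-based SLF table and the test-network IMSIs are all assumptions for illustration, not a vendor's implementation.

```python
from collections import Counter


class DraBinding:
    """Hypothetical DRA: binds each subscriber to one PCRF for all of its sessions."""

    def __init__(self, pcrfs):
        self.pcrfs = list(pcrfs)
        self.binding = {}          # subscriber id -> bound PCRF
        self.sessions = Counter()  # subscriber id -> open Diameter sessions
        self.load = Counter()      # PCRF -> number of bound subscribers

    def route(self, subscriber_id: str) -> str:
        pcrf = self.binding.get(subscriber_id)
        if pcrf is None:
            # First session for this subscriber: pick the least-loaded PCRF
            # (min() returns the first listed PCRF on a tie).
            pcrf = min(self.pcrfs, key=lambda p: self.load[p])
            self.binding[subscriber_id] = pcrf
            self.load[pcrf] += 1
        self.sessions[subscriber_id] += 1
        return pcrf

    def session_ended(self, subscriber_id: str) -> None:
        if subscriber_id not in self.binding:
            return
        self.sessions[subscriber_id] -= 1
        if self.sessions[subscriber_id] == 0:
            # Last session closed: release the binding so the PCRF load drops.
            del self.sessions[subscriber_id]
            self.load[self.binding.pop(subscriber_id)] -= 1


def slf_lookup(imsi: str, hss_by_prefix: dict) -> str:
    """Hypothetical SLF: return the HSS whose IMSI prefix is the longest match."""
    matches = [prefix for prefix in hss_by_prefix if imsi.startswith(prefix)]
    if not matches:
        raise KeyError(f"no HSS provisioned for {imsi}")
    return hss_by_prefix[max(matches, key=len)]


dra = DraBinding(["pcrf-1", "pcrf-2"])
print(dra.route("001010123456789"))  # pcrf-1 (Gx session)
print(dra.route("001010999999999"))  # pcrf-2
print(dra.route("001010123456789"))  # pcrf-1 again (Rx session, same PCRF)

hss_map = {"00101": "hss-a", "0010101": "hss-b"}
print(slf_lookup("001010123456789", hss_map))  # hss-b
print(slf_lookup("001010999999999", hss_map))  # hss-a
```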
To address the challenges outlined earlier, the major features DSCs provide are:
- Broad Diameter interface support
- Dynamic and intelligent routing
- Load balancing across Diameter servers
- Overload control and denial of service (DoS) attack prevention
- Diameter protocol mediation and normalization
- Transport protocol and IP address interworking
- Aggregation of messages and reporting of key performance metrics
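Overload control and DoS prevention usually begin with limiting how fast each peer may send. The Python sketch below applies a token bucket per Origin-Host and rejects excess requests with the base-protocol result code DIAMETER_TOO_BUSY (3004), which tells the sender to try an alternate peer. Token bucket is one common rate-limiting technique chosen here for illustration; the rates, class names and per-Origin-Host keying are assumptions.

```python
import time

DIAMETER_TOO_BUSY = 3004  # Result-Code from the Diameter base protocol


class TokenBucket:
    def __init__(self, rate_per_s: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = float(burst)
        self.tokens = float(burst)   # start full so a peer can burst immediately
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class OverloadGuard:
    """Hypothetical per-peer ingress limiter in front of a Diameter server."""

    def __init__(self, rate_per_s: float, burst: int, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate_per_s, burst, clock
        self.buckets = {}  # Origin-Host -> TokenBucket

    def admit(self, origin_host: str):
        """Return None to forward the request, or a Result-Code to answer with."""
        bucket = self.buckets.get(origin_host)
        if bucket is None:
            bucket = self.buckets[origin_host] = TokenBucket(self.rate, self.burst, self.clock)
        return None if bucket.allow() else DIAMETER_TOO_BUSY


# Simulated clock so the example is deterministic
now = [0.0]
guard = OverloadGuard(rate_per_s=10, burst=5, clock=lambda: now[0])
print([guard.admit("mme1.example.net") for _ in range(7)])
# [None, None, None, None, None, 3004, 3004]
now[0] = 0.5  # half a second later, 5 tokens have been refilled
print(guard.admit("mme1.example.net"))  # None
```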
Data & VoLTE
LTE Data and VoLTE Roaming: Diameter signaling controllers secure the Diameter signaling border between visited and home service providers so roaming subscribers can access data and voice services. DSCs also enable IPX carriers and roaming hubs to evolve their business and support multilateral LTE roaming services.
This covers control of Diameter signaling across IP borders, including:
• Authentication between visited MME and home HSS using the S6a interface
• QoS and charging information between visited and home PCRF policy servers using S9
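At the roaming border, a Diameter Edge Agent typically routes on the Destination-Realm AVP, and for EPC roaming 3GPP TS 23.003 derives the home network realm from the subscriber's IMSI in the form epc.mnc<MNC>.mcc<MCC>.3gppnetwork.org (for example, epc.mnc015.mcc234.3gppnetwork.org for MCC 234 and MNC 15). The Python sketch below builds that realm and applies a simple realm-based routing table. The `EdgeAgent` class, the peer names and the "send unknown realms to the IPX" default are hypothetical, and the MNC length has to be supplied because it cannot be inferred from the IMSI alone.

```python
def epc_realm(imsi: str, mnc_digits: int) -> str:
    """EPC home network realm built from the IMSI's MCC and MNC."""
    if mnc_digits not in (2, 3):
        raise ValueError("an MNC has 2 or 3 digits")
    mcc = imsi[:3]
    mnc = imsi[3:3 + mnc_digits].zfill(3)  # 2-digit MNCs are padded to 3 digits
    return f"epc.mnc{mnc}.mcc{mcc}.3gppnetwork.org"


class EdgeAgent:
    """Hypothetical DEA routing table keyed on Destination-Realm."""

    def __init__(self, home_realm: str, local_peers: dict, roaming_peers: dict, default_ipx: str):
        self.home_realm = home_realm
        self.local_peers = local_peers      # interface -> server in the home network
        self.roaming_peers = roaming_peers  # partner realm -> directly connected peer
        self.default_ipx = default_ipx      # everything else goes via the IPX/hub

    def next_hop(self, destination_realm: str, interface: str) -> str:
        if destination_realm == self.home_realm:
            return self.local_peers[interface]  # e.g. S6a -> HSS, S9 -> home PCRF
        return self.roaming_peers.get(destination_realm, self.default_ipx)


home = epc_realm("001010123456789", mnc_digits=2)   # test network MCC 001, MNC 01
partner = epc_realm("234150999999999", mnc_digits=2)
dea = EdgeAgent(home, {"S6a": "hss-1", "S9": "h-pcrf-1"}, {partner: "partner-dea"}, "ipx-hub")

print(home)                                # epc.mnc001.mcc001.3gppnetwork.org
print(dea.next_hop(home, "S6a"))           # hss-1: inbound ULR from a visited MME
print(dea.next_hop(partner, "S9"))         # partner-dea
print(dea.next_hop(epc_realm("999990000000000", 3), "S6a"))  # ipx-hub
```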
Federated Service Delivery: Diameter signaling controllers manage Diameter traffic between a broadband provider and an MVNO or over-the-top/cloud provider, enabling new revenue-sharing business models while ensuring an optimal user experience. This covers the exchange of authentication, charging and QoS information across network borders.