OpenFlow 1.X Discussion

From OpenFlow Wiki

This is a placeholder page to keep features that may not make it into OpenFlow 1.0 but that are candidates for a future OpenFlow Version 1.X. This is a wish list; the exact scope of the future release (or releases) will change.

Current status (post 1.1)

This page was created following the OpenFlow 1.0 specification effort, and prior to the OpenFlow 1.1 specification effort. Now that OpenFlow 1.1 is done, most of the items described on this page are, IMHO, obsolete. I've added comments on each of them in light of the current advances in OpenFlow. Jean

Sophisticated Failover/Load Balancing

Proposer:

What: A combined primitive that allows load balancing and failover

Why: The vast majority of real-world OpenFlow deployments will run with redundant controllers. For controllers that handle large flow volumes, it would additionally be good to allow load balancing between different controllers. As these two techniques are related, it seems to make sense (and be easy enough) to combine the two. Now technically one can argue that connecting to multiple servers is not part of the OpenFlow protocol, but it seems close enough to include it in the OpenFlow Protocol spec.

How: An OpenFlow switch must be able to connect to multiple (at least 16) controllers in parallel. It must support the following failover/load balancing modes:

  • All: Send all messages to all controllers (changed name from Flood - would get confused with STP meaning)
  • Hash: Send each OpenFlow message to one controller. Which controller is determined by the hash of the OpenFlow message.

An OpenFlow Switch must accept messages from all controllers. Responses to messages must be sent to the controller the message came from if that controller is still available.

An OpenFlow Switch should check if a controller is alive by sending it echo messages at regular intervals.
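
The Hash mode combined with the echo-based liveness check could be sketched roughly as follows. This is only an illustration under stated assumptions: the proposal does not name a hash function (FNV-1a is used here), the names `struct controller`, `msg_hash` and `pick_controller` are hypothetical, and falling through to the next live controller when the hashed one is down is just one possible policy, not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONTROLLERS 16   /* minimum number of parallel connections in the proposal */

/* Hypothetical per-controller state; 'alive' would be maintained by the
 * echo request/reply logic. */
struct controller {
    int alive;
};

/* FNV-1a over the raw message bytes. The choice of hash is an assumption. */
static uint32_t msg_hash(const uint8_t *msg, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= msg[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns the index of the controller that should receive the message,
 * or -1 if no controller is alive. If the hashed controller is down, the
 * next live one (in index order) is used instead; this fallback is an
 * assumption, not part of the proposal. */
static int pick_controller(const struct controller *c, size_t n,
                           const uint8_t *msg, size_t len)
{
    assert(n > 0 && n <= MAX_CONTROLLERS);
    size_t start = msg_hash(msg, len) % n;
    for (size_t i = 0; i < n; i++) {
        size_t idx = (start + i) % n;
        if (c[idx].alive)
            return (int)idx;
    }
    return -1;
}
```

With this fallback, messages that hash to a live controller keep going to the same controller when another one fails; only the failed controller's share moves.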

Questions:

  • What happens if a switch uses the hashing method and receives no echo reply from a controller? Does it drop the controller from its active controller list and immediately forward to the others, changing the hash distribution? This could create consistency issues - in practice, I imagine those wishing to handle controller failures with minimal downtime would use the All option, and handle consistency and load balancing at the controller level.

Jean post-1.1 update: Current versions of Open vSwitch implement an alternate mechanism which I believe is better. The switch connects simultaneously to multiple controllers, and one of the controllers elects itself the master. The advantage is that both load balancing and fail-over are entirely dictated by the controllers, increasing flexibility and reducing complexity.

Topology Discovery

Proposer:

What: Perform topology discovery in OpenFlow switches (not controllers).

Why: There are two problems with making controllers responsible for topology discovery. However, for a small amount of extra work in the switch, we can make discovery cheap, reliable, and simpler. The problems:

  1. Discovery cost increases linearly as more controllers are added (e.g., when we run several guest controllers through the FlowVisor).
  2. Discovery packets don't reliably work when sent by the controller
    • For heavily loaded links, they might get dropped.
    • When there are many ports, the controller can't send discovery packets fast enough.

Benefits:

  1. Discovery cost is independent of the number of controllers.
  2. Discovery packets sent by the switch can be appropriately prioritized so they get through even on heavily loaded links.

How:

  1. Make the switch responsible for sending out LLDP packets at a regular interval. (each is stamped with the sender's datapath ID and outgoing port)
  2. Make the switch responsible for processing LLDP packets. (to learn what it is connected to).
    • Approach 1: This requires new messages (bad) but makes link changes induce minimally sized messages (good).
      • Link up. Whenever it detects a link to a new switch, it sends a (new) message to the controller to tell it about the new link. (Switch-->Controller)
      • Link down. Whenever it detects a link is down (no new LLDP packets received for some period of time) it sends a message to the controller to tell it about the loss of the link. (Switch-->Controller)
      • Get links. Ask a switch to send "Link up" messages for all known links. Useful when a controller first connects to the switch. (Controller-->Switch)
    • Approach 2: Use existing structures and messages (good) but updates have lots of extra info (bad)
      • Use ofp_port_status by extending ofp_phy_port (see below) to also include an array of switches it is connected to (datapath ID and port number).
      • Whenever it detects that a new link has been added or an old one has gone down, send an OFPT_PORT_STATUS message with code OFPPR_MODIFY.
  3. Nuances.
    • New switch configuration parameters:
      • LLDP send interval - how often to send an LLDP packet out each port
      • Link down timeout - how long to wait for an LLDP packet before expiring a connection to a switch
    • The "LLDP" packet format should be specified - it should simply contain the datapath ID and outgoing port. Perhaps a TTL too?
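
The per-port link tracking implied by the steps above (learn a neighbor from a received discovery packet, expire it after the link down timeout) could look something like the sketch below. All names (`struct link_state`, `link_rx`, `link_tick`, `enum link_event`) are hypothetical, time is assumed to be in seconds, and the strict "greater than" comparison against the timeout is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the neighbor learned on one local port. */
struct link_state {
    int      up;          /* non-zero while the link is considered up */
    uint64_t peer_dpid;   /* datapath ID carried in the discovery packet */
    uint16_t peer_port;   /* outgoing port carried in the discovery packet */
    uint32_t last_seen;   /* arrival time of the last discovery packet, seconds */
};

enum link_event { LINK_NONE, LINK_UP, LINK_DOWN };

/* Called when a discovery packet arrives on the port. Reports LINK_UP when
 * the neighbor is new or has changed, so that a message can be sent to the
 * controller. */
static enum link_event link_rx(struct link_state *l, uint64_t dpid,
                               uint16_t port, uint32_t now)
{
    int changed = !l->up || l->peer_dpid != dpid || l->peer_port != port;
    l->up = 1;
    l->peer_dpid = dpid;
    l->peer_port = port;
    l->last_seen = now;
    return changed ? LINK_UP : LINK_NONE;
}

/* Called periodically. Reports LINK_DOWN once no discovery packet has been
 * seen for more than 'timeout' seconds (the "Link down timeout" parameter). */
static enum link_event link_tick(struct link_state *l, uint32_t now,
                                 uint32_t timeout)
{
    assert(timeout > 0);
    if (l->up && now - l->last_seen > timeout) {
        l->up = 0;
        return LINK_DOWN;
    }
    return LINK_NONE;
}
```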

Extended ofp_port_status:

 struct ofp_conn_switch {
   uint64_t dpid;  /* datapath ID of the connected switch */
   uint16_t port;  /* port on the other switch we are connected to */
 };
 
 struct ofp_phy_port {
    uint16_t port_no;
    uint8_t hw_addr[OFP_ETH_ALEN];
    uint8_t name[OFP_MAX_PORT_NAME_LEN]; /* Null-terminated */
 
    uint32_t config;        /* Bitmap of OFPPC_* flags. */
    uint32_t state;         /* Bitmap of OFPPS_* flags. */
 
    /* Bitmaps of OFPPF_* that describe features.  All bits zeroed if
     * unsupported or unavailable. */
    uint32_t curr;          /* Current features. */
    uint32_t advertised;    /* Features being advertised by the port. */
    uint32_t supported;     /* Features supported by the port. */
    uint32_t peer;          /* Features advertised by peer. */
 
    uint32_t num_conns;              /* Number of switches connected to this port. */
    struct ofp_conn_switch conns[0]; /* Switches connected to this port. */
 };

Jean post-1.1 update: Topology discovery is not a performance-critical function, so moving it to the switch does not seem necessary. Priority queuing introduced in 1.0 will solve the packet priority problem. Multicast groups introduced in 1.1 make it cheap to send a packet to multiple ports. And Failover groups introduced in 1.1 enable quick traffic handover and reduce the need to perform topology discovery in a timely manner. Note that one way to have the switch handle everything would be for the controller to dump the existing LLDP database of the switch via SNMP.

Quality of Service

Proposer:

What: Add QoS primitives to OpenFlow.

Why: Everyone agrees that QoS is a critical feature. The open question for 0.9 is if we can define a QoS mechanism that is both general enough and works on existing hardware in the short time frame that we are targeting. If not, we would move this to a future release (e.g. 1.5 or 2.0).

How: The current proposal is:

  • Each flow entry would additionally contain a queue that a flow is mapped to
  • An OpenFlow switch assigns the packet that matches a flow entry to that queue
  • The OpenFlow protocol would have additional commands to set constraints for queues, specifically:
    • Maximum Data-rate (i.e. The queue is policed so as not to exceed a specified data rate. If the queue receives packets at too high a rate, packets are dropped.)
    • Minimum Data-rate (i.e. A promise that a queue will receive at least a specified minimum outgoing data rate).
    • Strict priorities. (i.e. Queues would have strict priority levels; e.g. 4 or 8. A queue at a given priority level is only served if all the queues with higher priority level are empty).
  • We would also like to have similar capabilities to throttle the traffic from the data path to the control path on the switch. This would allow for prioritization and graceful degradation if the control path is overloaded.
  • We assume there is a minimum number of queues that all compliant switches must support (e.g. 8). But the number of supported queues is expected to vary (i.e. depends on the switch, allowing room for vendors to compete/differentiate) and can be queried by the controller.
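
To make two of the queue constraints above concrete, here is a rough sketch of strict-priority queue selection and of a maximum data-rate policer. It is illustrative only: the structures and function names are hypothetical, a lower `priority` value is assumed to mean "served first", and the policer is a plain token bucket, which is one common way to enforce a maximum rate rather than anything the proposal mandates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue state; a lower 'priority' value is served first. */
struct queue {
    unsigned priority;
    size_t   backlog;     /* packets currently waiting */
};

/* Strict priority: return the index of the non-empty queue with the best
 * priority, or -1 if every queue is empty. */
static int next_queue(const struct queue *q, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (q[i].backlog == 0)
            continue;
        if (best < 0 || q[i].priority < q[(size_t)best].priority)
            best = (int)i;
    }
    return best;
}

/* Token-bucket policer for the maximum data-rate constraint (assumed design). */
struct policer {
    uint64_t bytes_per_ms;   /* configured maximum rate */
    uint64_t burst;          /* bucket depth in bytes */
    uint64_t tokens;         /* bytes currently available */
    uint64_t last_ms;        /* time of the previous call */
};

/* Returns 1 if the packet conforms to the rate, 0 if it must be dropped. */
static int police_accept(struct policer *p, uint64_t now_ms, uint32_t len)
{
    assert(now_ms >= p->last_ms);
    uint64_t tokens = p->tokens + (now_ms - p->last_ms) * p->bytes_per_ms;
    p->tokens = tokens > p->burst ? p->burst : tokens;
    p->last_ms = now_ms;
    if (p->tokens < len)
        return 0;
    p->tokens -= len;
    return 1;
}
```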

There was an alternative approach with virtual ports that was previously discussed. Our understanding is that this would not work well with existing hardware architectures.


Jean post-1.1 update: Priority queuing was added to 1.0, but the lack of a config protocol makes the configuration of those queues complex. A config protocol is considered for 1.2. Per-flow Rate Limiters are also considered for 1.2.

External Data Protocol

Proposer:

What: A separate protocol to set/get state on the switch

Why: There seems to be a need to access switch information via an external interface that is separate from OpenFlow. Information includes:

  • Port information
  • Statistics
  • Topology information
  • etc.

The consensus seems to be that the best way to do this is not via the OpenFlow protocol, but via a different protocol. Most frequently named candidates are SNMP and Netconf.

How: As this does not affect the OpenFlow protocol itself, this will likely be a different spec and should thus not affect the OpenFlow protocol spec 0.9.


Jean post-1.1 update: Such a config protocol is under consideration for 1.2.

Efficient representation of multiple ports

Proposer:

What: Bitmap-based port representation.

Why: For multicast and ECMP, you may want to select large numbers of ports. 16 bits per port * lots of ports is way less efficient, bandwidth-wise, than a packed bitmap representation.

How: Define a bitmap struct.
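
A bitmap struct along these lines might look like the sketch below. The struct name, the helper names and the 256-port limit are all assumptions made for illustration; the proposal does not fix any of them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PORT_BITMAP_MAX_PORTS 256   /* hypothetical upper bound on port numbers */

/* One bit per port: bit (port % 8) of byte (port / 8). */
struct port_bitmap {
    uint8_t bits[PORT_BITMAP_MAX_PORTS / 8];
};

static void port_bitmap_clear(struct port_bitmap *b)
{
    memset(b->bits, 0, sizeof b->bits);
}

static void port_bitmap_set(struct port_bitmap *b, uint16_t port)
{
    assert(port < PORT_BITMAP_MAX_PORTS);
    b->bits[port / 8] |= (uint8_t)(1u << (port % 8));
}

static int port_bitmap_test(const struct port_bitmap *b, uint16_t port)
{
    assert(port < PORT_BITMAP_MAX_PORTS);
    return (b->bits[port / 8] >> (port % 8)) & 1;
}
```

With these sizes the bitmap always occupies 32 bytes, the same as sixteen explicit 16-bit port numbers, so it only saves bandwidth when more than 16 ports are selected.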

Note: This may be premature optimization, and we should do profiling on the CPU cost and TCP bandwidth costs before doing any code.


Jean post-1.1 update: Those were introduced in 1.1.

Multiple VLAN Tag Support

What: Support matching on multiple VLAN tags

Why: Although not common, there are an increasing number of network deployments that have packets with multiple VLAN tags. Typically two (inner and outer) is enough, though in theory any number could occur.

How:


Jean post-1.1 update: Implemented in 1.1.

Add ofp_match to Packet-In Messages

What: Add the ofp_match structure to Packet-In messages.

Why:

How:


Jean post-1.1 update: Why? The vast majority of packet-ins do not match any entry in the table, and the controller knows what it set in the table, so this looks useless.

Add flag to disable Packet-In Messages for a Controller

What: Add a flag to disable packet-in messages to the controller.

Why:

How:


Jean post-1.1 update: Done in 1.1, per table setting.

Report flow-table entries affected by flow-mod

What: Report the number of flow-table entries affected by a flow-mod command. (Unsure if it should apply to adds only or all flow-mods.)

Why: Some flow-mods may result in multiple flow-table entries being consumed. Common examples include matches with a wild-carded input port -- numerous current switches expand the flow-match into a match per port.

How: Return an acknowledgement for flow-mod messages which includes the number of flow-table entries occupied/affected.


Jean post-1.1 update: Discussed during 1.1, rejected because it is of dubious value and does not really solve the problem at hand.

Define IPv4 Forward Action

What: Define an IPv4 forward action.

Why: If using OpenFlow in an L3 network, there is currently no way to decrement the TTL and recalc the checksum, so traceroute breaks.

How: Define an IPv4 forward action, based on RFC 1812.
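
The core of such an action, decrementing the TTL and fixing up the header checksum, could be sketched as below. This is not a definition of the proposed action: the function names are hypothetical, the header is assumed to be a raw IPv4 header in network byte order, and the checksum is updated incrementally in the style of RFC 1624 rather than recomputed from scratch. A real implementation following RFC 1812 would also have to generate an ICMP Time Exceeded message when the packet is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum over a header; used here only for verification.
 * Returns 0 when run over a header whose checksum field is correct. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement the TTL (byte 8) and update the checksum (bytes 10-11).
 * Returns 0 without touching the header if the TTL would reach zero,
 * meaning the packet must not be forwarded; returns 1 otherwise. */
static int ipv4_dec_ttl(uint8_t *hdr)
{
    assert(hdr != NULL);
    if (hdr[8] <= 1)
        return 0;

    /* The TTL shares a 16-bit word with the protocol field (byte 9). */
    uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    hdr[8]--;
    uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    uint16_t old_sum  = (uint16_t)((hdr[10] << 8) | hdr[11]);

    /* HC' = ~(~HC + ~m + m') in one's-complement arithmetic. */
    uint32_t sum = (uint32_t)(uint16_t)~old_sum
                 + (uint32_t)(uint16_t)~old_word
                 + new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    uint16_t new_sum = (uint16_t)~sum;

    hdr[10] = (uint8_t)(new_sum >> 8);
    hdr[11] = (uint8_t)(new_sum & 0xff);
    return 1;
}
```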


Jean post-1.1 update: TTL set and decrement were added in 1.1.

Automatic ejection of flows when ports go down

Proposer: Martin Casado

What: Automatically eject flows when ports go down.

Why:

How:


Jean post-1.1 update: Implemented as failover group in 1.1.

Specify fast-path/destination table for flow adds

Proposal to be replaced by a better mechanism

Proposer: Glen Gibb/Rob Sherwood

What: Enable the destination table to be specified when adding flows. This could be implemented either as "place this flow in the fast path" or as "place this flow in table X".

Why: A controller may want to guarantee that a newly added flow will be processed entirely in the fast-path.

How: Two possible approaches:

  1. Add an additional flag to flow-mod messages to require the flow be processed in the fast-path (only applicable when adding a flow). An error should be generated if the flow can't be processed in the fast-path.
  2. Add a field to flow-mod messages to allow the destination table to be specified (only applicable when adding a flow). An error should be generated if the flow can't be added to the specified table.

The first approach provides a simple mechanism to ensure that a particular flow is processed in hardware.

The second approach enables the controller to have much finer-grained control over where flows are placed. The disadvantage of this approach is that it requires controllers to understand the features of each table in the switch, or it requires table types to be defined in the OpenFlow specification.

Note: This is only relevant to switches that distinguish between fast-path and slow-path (i.e., it probably doesn't apply to software-only switches).

Discussion: After internal discussion we have concluded that this is not the best way to proceed. A better mechanism will be proposed for a future release.


Jean post-1.1 update: The multiple tables included in 1.1 are a superset of that proposal.

Deferred Spec Clarifications

Tagged and Untagged Matching

What: The matching and actions on VLAN tags should be clarified:

  • For matching, how do you indicate the difference between:
    • Requiring a VLAN tag, but wildcarding the VID; or
    • Requiring that there be no VLAN tag; or
    • Matching whether or not there is a VLAN tag?
  • For actions, clarify the use of the vlan_vid tag in ofp_action_vlan_vid to indicate result of action.

Why: A specific value of the vid_tag has a special meaning and other values are not specified as reserved or unused.

How: For matching, update Section 5.2.2 (in Appendix A -- why does an appendix get a section number?) with the following:

  • The high order 4 bits in the dl_vlan member are flags interpreted as follows:
    • Flags set to 0: Match on the VLAN id given in the lower 12 bits.
    • Flags set to 0xF: Match on packets that do not have a VLAN tag. For backwards compatibility, the VLAN id in the lower 12 bits should be 0xFFF.
    • All other values of flags are reserved and must not be used.
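
As a sketch of how a switch might interpret the proposed dl_vlan encoding (4 flag bits, 12 VLAN id bits), consider the following. The enum and function names are hypothetical; only the bit layout and the meaning of flag values 0 and 0xF come from the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum vlan_match_kind {
    VLAN_MATCH_VID,        /* flags == 0x0: match the VLAN id in the low 12 bits */
    VLAN_MATCH_UNTAGGED,   /* flags == 0xF: match packets without a VLAN tag */
    VLAN_MATCH_RESERVED    /* any other flag value: reserved, must not be used */
};

/* Decode dl_vlan. '*vid' is written only for VLAN_MATCH_VID. */
static enum vlan_match_kind decode_dl_vlan(uint16_t dl_vlan, uint16_t *vid)
{
    assert(vid != NULL);
    uint16_t flags = (uint16_t)(dl_vlan >> 12);
    if (flags == 0x0) {
        *vid = dl_vlan & 0x0fff;
        return VLAN_MATCH_VID;
    }
    if (flags == 0xF)
        return VLAN_MATCH_UNTAGGED;
    return VLAN_MATCH_RESERVED;
}
```

Note that the backwards-compatible value 0xFFFF decodes as "untagged", since its flag bits are 0xF.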

For actions, update Table 5 in the Set VLAN ID entry:

  • In Associated Data, indicate 4 bits of flags and 12 bits of VLAN id
  • In the Description:
    • Flags set to 0 indicates add a VLAN tag with the VLAN ID being the 12-bit VLAN id; note that 0 is a valid VLAN id and indicates the packet is "priority tagged". Note that 0xfff is technically an invalid value, but will be placed in the packet; different hardware may treat a packet with a VID of 0xfff differently.
    • Flags set to 0xf indicates send the packet without a VLAN tag; Note that this is equivalent to the action OFPAT_STRIP_VLAN.
    • Indicate that other values of flags are reserved

Discussion: For matching, there is currently no text about this. Is the value 0xffff in the dl_vlan member of ofp_match treated in the same way as implied in ofp_action_vlan_vid for actions?

  • Question: Is there a difference between stripping a tag and not adding a tag?

See openflow-spec discussion


Jean post-1.1 update: Fixed in 1.1.

Semantics of Exact Match

What: Non-addressing fields in the match can break the semantics of exact match. Ask Jean Tourrilhes for details.

Why: OpenFlow strongly separates two classes of entries, Exact Match and Wildcard. These are processed quite differently in most software implementations, with Exact Match being much more efficient than Wildcard, and the distinction has implications for prioritisation.

Up to now, the tuple space was only composed of addressing fields, which matches the basic definition of a packet flow within the network. The VLAN PCP and the IP ToS are orthogonal to addressing, and can change within a packet flow. This causes problems with the semantics of Exact Match.

The problem: for an exact match entry to be formed, those fields need to be known by the controller. The controller cannot assume the value from the first packet will remain constant. If another entity upstream (like a DiffServ switch) changes it, the controller may have to create many exact match entries.

How:


Jean post-1.1 update: 1.1 no longer special-cases exact matches, therefore this is irrelevant.

Disambiguate VLAN actions

What: The way the current VLAN actions operate is not optimal, can generate invalid packets and can be ambiguous in some contexts. The commands should be separated into OFPAT_ADD_VLAN_VID, OFPAT_MOD_VLAN_VID and OFPAT_MOD_VLAN_PCP. Ask Jean Tourrilhes for more details.

Why: First, the current semantics of OFPAT_SET_VLAN_VID do not allow QinQ. If a packet already has a VLAN header, and it needs to be encapsulated within another VLAN in a section of the OpenFlow network, it is not possible to do it. By splitting the action into an ADD and a MOD, it's possible to support QinQ and control precisely whether the current VLAN header is modified or a new one is added. Obviously, when doing QinQ, current OpenFlow would only match on the outer VLAN until it is stripped.

Second, because the current OFPAT_SET_VLAN_VID action can operate in two different modes, depending on the packet, debugging VLAN setup is made harder. Splitting it into two actions would make the behavior explicit, making it easier to catch errors.

Third, OFPAT_SET_VLAN_PCP should never add a VLAN header on its own. Currently, if the flow action list only has OFPAT_SET_VLAN_PCP and no OFPAT_SET_VLAN_VID, and if the packet did not already have a VLAN header, then a VLAN header is added with VLAN tag 0. This is a priority tag, and is legal as per IEEE 802.1Q, but not supported by most end-hosts (for example Linux will silently drop the packet). This action should only modify the tag if it exists, and can be used in conjunction with OFPAT_ADD_VLAN_VID when a true priority tag is needed.

How: Change OFPAT_SET_VLAN_VID and OFPAT_SET_VLAN_PCP into OFPAT_ADD_VLAN_VID, OFPAT_MOD_VLAN_VID and OFPAT_MOD_VLAN_PCP.
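
The intended semantics of the three split actions can be illustrated with a toy model that tracks only a packet's stack of VLAN tags. Everything here is hypothetical scaffolding (`struct pkt_vlans`, the function names, the limit of 4 tags, a new tag starting with PCP 0); the point is only that ADD always pushes a new outer tag, while the two MOD actions change an existing outer tag and leave untagged packets alone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_TAGS 4   /* arbitrary limit for this sketch */

/* Toy packet model: only the VLAN tag stack, tci[0] being the outermost tag.
 * TCI layout per IEEE 802.1Q: PCP in bits 15-13, DEI/CFI in bit 12, VID in bits 11-0. */
struct pkt_vlans {
    unsigned depth;
    uint16_t tci[MAX_TAGS];
};

/* OFPAT_ADD_VLAN_VID: always push a new outer tag (enables QinQ).
 * Returns 1 if the tag was added, 0 otherwise. */
static int add_vlan_vid(struct pkt_vlans *p, uint16_t vid)
{
    assert(p->depth <= MAX_TAGS);
    if (p->depth == MAX_TAGS || vid > 0x0fff)
        return 0;
    memmove(&p->tci[1], &p->tci[0], p->depth * sizeof p->tci[0]);
    p->tci[0] = vid;          /* new tag starts with PCP 0 (assumption) */
    p->depth++;
    return 1;
}

/* OFPAT_MOD_VLAN_VID: rewrite the VID of the existing outer tag only. */
static int mod_vlan_vid(struct pkt_vlans *p, uint16_t vid)
{
    if (p->depth == 0 || vid > 0x0fff)
        return 0;
    p->tci[0] = (uint16_t)((p->tci[0] & 0xf000) | vid);
    return 1;
}

/* OFPAT_MOD_VLAN_PCP: rewrite the PCP of the existing outer tag only;
 * never adds a header to an untagged packet. */
static int mod_vlan_pcp(struct pkt_vlans *p, uint8_t pcp)
{
    if (p->depth == 0 || pcp > 7)
        return 0;
    p->tci[0] = (uint16_t)((p->tci[0] & 0x1fff) | ((unsigned)pcp << 13));
    return 1;
}
```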

Discussion:

  • Unfortunately, it seems like there's a difference (at least in many hardware implementations) between ADD one tag (when the packet was not tagged) and ADD a second tag. Do we at least need to differentiate these as capabilities?
  • There are (at least) three issues at hand:
    • Dealing with the current spec's shortcomings described at the beginning of this entry, specifically that it is ambiguous and can generate invalid packets;
    • Supporting packets that have multiple VLAN tags, both in terms of matching the tags and the impact on where in the packet other fields end up because of the additional tags;
    • Providing actions for manipulating multiple VLAN tags.

The first of these should be dealt with in the 1.0 spec for sure. Can the author of the original please provide details of these problems and propose solutions? The second and third points currently appear to be lacking support for inclusion in 1.0.


Jean post-1.1 update: Fixed in 1.1.

Correct way to handle "invalid" matches

See openflow-spec mailing list thread


Jean post-1.1 update: Fixed in 1.1.

Add Flowchart for Flow Mod Types

Corner cases can be hard to understand; a flowchart would help to visually explain the differences.


Jean post-1.1 update: Text was improved in 1.1.

Clarify STP

In the spec: "packets received on ports that are disabled by spanning tree must follow the normal flow table processing path."

Does that mean the packets received on ports disabled by STP will be sent to the controller?

More discussion at [1]


Jean post-1.1 update: Fixed in 1.1.

Clarify stats: include FCS?

What: Which specific bytes to include in port counters is not defined.

Why: Say the FCS bytes are included in the port counters; then a hardware architecture may not be able to precisely support counters at all. If the hardware counters don't include the FCS, and there are no packet counters, there is no way to report the true FCS-inclusive byte count.

The Broadcom hardware is like this, and requires a choice of either byte or packet counters.

For internally-forwarded packets (such as those sent to a virtual port), there may be no FCS, which complicates things.
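
The dependency on packet counters mentioned above comes down to simple arithmetic, sketched here. The function name and parameters are hypothetical; the only fixed fact is that the Ethernet FCS is 4 octets per frame. Frames that never carried an FCS (such as internally forwarded ones) are assumed to be excluded from the frame count.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_FCS_LEN 4u   /* Ethernet frame check sequence, in octets */

/* Derive an FCS-inclusive byte count from a hardware byte counter that
 * excludes the FCS. This needs a count of the frames that carried an FCS;
 * without such a packet counter the correction cannot be computed. */
static uint64_t bytes_including_fcs(uint64_t hw_bytes_no_fcs,
                                    uint64_t frames_with_fcs)
{
    /* Guard against 64-bit overflow of the corrected total. */
    assert(frames_with_fcs <= (UINT64_MAX - hw_bytes_no_fcs) / ETH_FCS_LEN);
    return hw_bytes_no_fcs + (uint64_t)ETH_FCS_LEN * frames_with_fcs;
}
```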

How: Specify that tx/rx byte port counters either include (or don't) framing characters.

For example, in the case of Ethernet, specify that stats counts include the entire Ethernet frame including MAC header and FCS, but not the preamble, start of frame delimiter, or extension octets. E.g.:

rx_bytes: The total number of octets received on the port, including framing characters.


Jean post-1.1 update: No change.

Port status to non-existent port (spec clarification)

What: unspecified behavior when querying the port status of a non-existent port.

See: https://mailman.stanford.edu/pipermail/openflow-spec/2010-January/000874.html

It's not clear in the spec what the response should be when a specific port
is given for a port stats request, but the port doesn't exist.  The
reference implementation just returns a ofp_stats_reply without any
ofp_port_stats entries.  This is the same behavior I implemented in Open
vSwitch.  

I think it would be good to add some clarifying text to the spec, since
sending some sort of error would be a reasonable response, too.

--Justin

Jean post-1.1 update: No change.

Null-termination of strings (spec clarification)

What: Spec/comments in openflow.h are unclear about null-termination of strings.

See: https://mailman.stanford.edu/pipermail/openflow-spec/2010-February/000876.html

Why: Consistency

How: Update comments/spec to clearly state that all strings are null-terminated and null-padded to the right.
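
A helper that fills a fixed-size string field in the way this clarification describes (always null-terminated, null-padded to the right) might look like the sketch below. The function name `ofp_set_string` is hypothetical, and silently truncating names that are too long is an assumption; the value 16 for OFP_MAX_PORT_NAME_LEN is the one used in the 1.0 openflow.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OFP_MAX_PORT_NAME_LEN 16   /* as in the OpenFlow 1.0 openflow.h */

/* Copy 'src' into the fixed-size field 'dst' of 'dst_len' bytes. The result
 * is always null-terminated and every unused byte is set to 0. Names that
 * do not fit are truncated (an assumption of this sketch). */
static void ofp_set_string(char *dst, size_t dst_len, const char *src)
{
    assert(dst_len > 0);
    size_t n = strlen(src);
    if (n > dst_len - 1)
        n = dst_len - 1;
    memcpy(dst, src, n);
    memset(dst + n, 0, dst_len - n);
}
```

Unlike a bare strncpy, this guarantees termination even when the source is exactly as long as the field.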


Jean post-1.1 update: No change.

Best-effort output action vs min-rate enqueue (spec clarification)

What: The spec doesn't clarify that the output action sends packets to a default best-effort queue. A misunderstanding was reported where the output action was interpreted as a high-priority queue that could starve min-rate queues. See: https://mailman.stanford.edu/pipermail/openflow-spec/2010-February/000881.html

How: Add clarification text to the spec.


Jean post-1.1 update: No change.

Correction of flowchart showing parsing of header fields

What: The first box of the "Flowchart showing how header fields are parsed for matching" (figure 3) is incorrect for the VLAN field. The VLAN field is set by default to OFP_VLAN_NONE.

Why: Consistency

How: Update figure to indicate that the default value of the VLAN field is OFP_VLAN_NONE.


Jean post-1.1 update: Done in 1.1.

Inconsistent behavior between OFP_VLAN_NONE and VLAN PCP matching

What: The spec doesn't specify how using OFP_VLAN_NONE for the VLAN tag should interact with specifying a VLAN PCP tag.

Why: Logically it doesn't make sense for a packet not to have a VLAN VID (OFP_VLAN_NONE) but for it to have a PCP tag.

How: See https://mailman.stanford.edu/pipermail/openflow-spec/2010-March/000903.html


Jean post-1.1 update: Fixed in 1.1.

Incorrect reference to OFPAT_SET_DL_VLAN in spec

What: Section 5.3.1 on page 23 of the spec makes reference to OFPAT_SET_DL_VLAN. This should be OFPAT_SET_VLAN_VID.

Why: OFPAT_SET_DL_VLAN does not exist. (I suspect it did in a previous version but haven't looked back further than 0.8.9.)

How: s/OFPAT_SET_DL_VLAN/OFPAT_SET_VLAN_VID/

See: https://mailman.stanford.edu/pipermail/openflow-spec/2010-March/000907.html


Jean post-1.1 update: Fixed in 1.1.

Clarification of timeouts (spec clarification)

What: The OpenFlow 1.0.0 spec, section 5.3.3, briefly mentions:

[...]the entry must expire after idle_timeout seconds with no received traffic. [...] the entry must expire in hard_timeout seconds,[...]

But it doesn't strictly mention whether timeouts should be "greater than" or "greater than or equal to".

Why:

How: Clarify whether > or >= in the spec.
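
The two readings differ only at the instant when exactly the timeout value has elapsed, as the following sketch shows. The function names are hypothetical, times are in whole seconds, and a timeout of 0 is treated as "this timeout is disabled".

```c
#include <assert.h>
#include <stdint.h>

/* Reading A ("greater than"): the entry expires once strictly more than
 * 'timeout' seconds have elapsed since 'since'. */
static int expired_gt(uint32_t now, uint32_t since, uint16_t timeout)
{
    assert(now >= since);
    return timeout != 0 && now - since > timeout;
}

/* Reading B ("greater than or equal to"): the entry expires as soon as
 * 'timeout' seconds have elapsed since 'since'. */
static int expired_ge(uint32_t now, uint32_t since, uint16_t timeout)
{
    assert(now >= since);
    return timeout != 0 && now - since >= timeout;
}
```

For idle_timeout, 'since' would be the time of the last matching packet; for hard_timeout, the time the entry was installed.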

See:


Jean post-1.1 update: No change.

Clarification of invalid NW TOS values (spec clarification)

What: The spec specifies that the NW TOS is six bits wide (see page 7, Table 5). The TOS value is actually in the upper 6 bits. It is not specified how to handle the lower 2 bits -- should they be ignored? Should an error be generated?

Why: Different semantics from different controllers

How: Clarify how to handle non-zero values in the 2 LSBs
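
The two candidate behaviors (reject a value with non-zero low bits, or silently ignore those bits) can be sketched as follows. The function names are hypothetical; which behavior the spec should require is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: treat non-zero values in the 2 LSBs as an error.
 * Returns 1 if the value is acceptable, 0 if an error should be generated. */
static int nw_tos_valid(uint8_t nw_tos)
{
    return (nw_tos & 0x03) == 0;
}

/* Option 2: ignore the 2 LSBs and keep only the upper 6 bits. */
static uint8_t nw_tos_masked(uint8_t nw_tos)
{
    return (uint8_t)(nw_tos & 0xfc);
}

/* The 6-bit value carried in the upper bits. */
static uint8_t nw_tos_dscp(uint8_t nw_tos)
{
    return (uint8_t)(nw_tos >> 2);
}
```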


Jean post-1.1 update: New errors added to 1.1, see OFPBAC_BAD_ARGUMENT.