OpenFlow 0.9 desired features 2

From OpenFlow Wiki

Note: This page is outdated. For an up-to-date version of the 0.9 features, see the OpenFlow v0.9 page.


OpenFlow Protocol Changes

This is the list of requested changes for future versions of OpenFlow. It is a cut-down version of discussions from the openflow-spec list.

These suggestions are not currently implemented in the reference implementation. OpenFlow tries to strike a balance between providing flexibility and avoiding too much software complexity or hardware change, which may preclude some suggested changes. The most-desired changes are listed first.


Command Return Values

What: Optional success/failure code message to be returned when an OpenFlow message has finished executing on the switch.

Why: Faster handover, testing, and monitoring

Let's say you're making a mobility controller. You want this controller to support handover between APs with no dropped packets and a minimum of latency, and you want to modify the flow entry in the crossover switch right when you've confirmed that flow entries have been added to the switches downstream of the crossover switch. Or you're making a network monitor, and want an accurate view of the network, without having to constantly poll the switches. Or you're running an experiment where you want predictable performance, and want to verify that an entry has been inserted before beginning. Or you're writing a continuous testing program and want to send packets as soon as you're sure a flow has been added to hardware.

All of these are situations where the ability to request an ack when a command has completed, or a nack if it didn't, would be very useful. Of course, this is a potentially huge rat-hole: to cover all cases properly (request lost, ack lost, duplicated messages received), you could end up with something that looks like TCP.

How: Add a "tell me when complete" bit to request messages that don't already result in a reply, with a newly defined generic Status Reply message that returns the txid, original message type, and a success or failure code that depends on the original message type. This doesn't cover all reliability cases, but it would help with those listed above, with minimal change to the spec. For example an ack for a packet out might not get delivered, resulting in a duplicated packet. That seems like an OK tradeoff in favor of simplicity over completeness.
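
As a rough illustration of the proposal above, the following C sketch shows one possible shape for a generic Status Reply. Every name here (ofp_header_sketch, ofp_status_reply_sketch, OFPT_STATUS_REPLY_SKETCH, the status codes, and the notify_requested flag) is hypothetical and not taken from any OpenFlow spec. Fields are kept in host byte order for simplicity.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the OpenFlow message header (host byte order). */
struct ofp_header_sketch {
    uint8_t  version;
    uint8_t  type;    /* message type */
    uint16_t length;  /* total message length in bytes */
    uint32_t xid;     /* transaction ID chosen by the sender */
};

/* Hypothetical message type and status codes; values are illustrative. */
enum { OFPT_STATUS_REPLY_SKETCH = 0xF0 };
enum status_code_sketch { STATUS_OK = 0, STATUS_FAILED = 1 };

struct ofp_status_reply_sketch {
    struct ofp_header_sketch header; /* xid is copied from the request */
    uint8_t  orig_type;              /* type of the request being acked */
    uint8_t  pad;
    uint16_t code;                   /* meaning depends on orig_type */
};

/* Build a reply only when the request carried the "tell me when complete"
 * bit. Returns 1 if a reply was written to *out, 0 otherwise. */
int make_status_reply(const struct ofp_header_sketch *req, int notify_requested,
                      uint16_t code, struct ofp_status_reply_sketch *out)
{
    if (!notify_requested)
        return 0;
    memset(out, 0, sizeof *out);
    out->header.version = req->version;
    out->header.type = OFPT_STATUS_REPLY_SKETCH;
    out->header.length = (uint16_t)sizeof *out;
    out->header.xid = req->xid;
    out->orig_type = req->type;
    out->code = code;
    return 1;
}
```

A controller could then match replies to outstanding requests by xid and orig_type. Lost or duplicated acks are deliberately not handled, in line with the tradeoff described above.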


Selective Flow Expirations

What: Make flow expiration messages happen on a per-flow, rather than per-switch granularity.

Why: This could reduce flow expiration traffic for both single-owner OpenFlow networks and shared ones. In a single-owner OpenFlow network, you might only care about logging expiration events for specific flows. In a shared network, if anyone wants flow expirations then everyone has to receive them. With per-flow expirations, a virtualized controller sharing OpenFlow instances between two students does not need to enable expirations on a switch and filter those to specific students. In general, state should be per-flow rather than per-switch wherever possible, for flexibility and easier virtualization.

Another reason is to reduce processor load. The CPUs in embedded switches are not the fastest, and selectively sending flow expirations may reduce dropped packets.

How: Remove OFPC_SEND_FLOW_EXP in ofp_config_flags and add a newly defined send_flow_exp bit to each flow mod. When the switch software component is timing out flows, it checks this bit.
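
The timeout check described above might look roughly like the following C sketch. The struct layout, the FLOW_FLAG_SEND_FLOW_EXP name and value, and the function name are all assumptions made for illustration. Only idle timeouts are modeled, and "sending" a message is reduced to recording the flow's ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow flag bit set by the flow mod. */
#define FLOW_FLAG_SEND_FLOW_EXP 0x0001

struct flow_entry_sketch {
    uint32_t id;             /* stand-in for the flow's match */
    uint16_t flags;          /* per-flow flags from the flow mod */
    uint16_t idle_timeout_s; /* 0 means no idle timeout */
    uint64_t last_used_s;    /* time the flow last matched a packet */
    int      active;         /* 1 while the entry is in the table */
};

/* Remove idle flows. Only flows whose send_flow_exp bit is set are
 * recorded in notify[]; returns how many were recorded. */
size_t expire_idle_flows(struct flow_entry_sketch *flows, size_t n,
                         uint64_t now_s, uint32_t *notify, size_t cap)
{
    size_t sent = 0;
    for (size_t i = 0; i < n; i++) {
        struct flow_entry_sketch *f = &flows[i];
        if (!f->active || f->idle_timeout_s == 0)
            continue;
        if (now_s - f->last_used_s < f->idle_timeout_s)
            continue;
        f->active = 0; /* the flow is removed either way */
        if ((f->flags & FLOW_FLAG_SEND_FLOW_EXP) && sent < cap)
            notify[sent++] = f->id;
    }
    return sent;
}
```

The point of the sketch is that the decision to notify is made per entry, so a switch with many silent flows does no messaging work for them.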



Flow Mod Behavior

What: Define errors for flow mods

Why: The spec does not say what should happen if you try to add a flow that conflicts with an existing flow of the same priority.

How: Update spec with new error type, or define that the switch accepts the new entry on top of the old one.


ARP packet field matching

Currently flows cannot match on IP addresses and request types in ARP packets, but this is a useful and desirable feature that should be added.


Encryption action

Why: Networking hardware may include hardware-accelerated encryption capabilities.

How: Define new 'Encrypt' action.


Queue action

Why: Networking hardware may include hardware queues. Queues are required to support QoS management or network bandwidth isolation.

How: One approach is to treat a queue as if it were just another physical port. All vendor-specific queue configuration parameters would be accessed and changed through a vendor extension. This approach requires no spec changes.

The second approach is to find the least-common-denominator of queuing, by reading lots of switch datasheets and talking to vendors. This has the advantage of defining a standard interface for queuing, at the expense of spec bloat, greater work for vendors, and possibly defining the wrong interface.


Source Port aggregation

Why: On some platforms, like NetFPGA, flow entries are at a premium. If multiple entries are identical for everything except the source port, it would be possible to combine them. Since these entries are physically represented with a bitmask anyway, there could be a nice increase in usable flow entries. Requested by UCSD.

How: Not clear yet. Requires modifying the flow mod format, as well as data the switch sends to the controller about flow capabilities.
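
Since the mechanism is not settled, the following C sketch only illustrates the general idea of folding entries that differ solely in source port into one entry with a port bitmask. The table layout, the match_key and action_key stand-ins, the 32-port limit, and the agg_add name are all assumptions, not a proposed wire format.

```c
#include <stddef.h>
#include <stdint.h>

struct agg_entry_sketch {
    uint32_t match_key;    /* stand-in for every match field except in_port */
    uint32_t action_key;   /* stand-in for the action list */
    uint32_t in_port_mask; /* bit i set: entry applies to source port i */
};

/* Add a flow; if an entry with the same match and actions exists, just set
 * the port bit. Returns the new number of entries in the table. Ports >= 32
 * and inserts into a full table are ignored in this sketch. */
size_t agg_add(struct agg_entry_sketch *tbl, size_t n, size_t cap,
               uint32_t match_key, uint32_t action_key, unsigned in_port)
{
    if (in_port >= 32)
        return n;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].match_key == match_key && tbl[i].action_key == action_key) {
            tbl[i].in_port_mask |= UINT32_C(1) << in_port;
            return n;
        }
    }
    if (n == cap)
        return n;
    tbl[n].match_key = match_key;
    tbl[n].action_key = action_key;
    tbl[n].in_port_mask = UINT32_C(1) << in_port;
    return n + 1;
}
```

Two flows that match the same traffic from ports 1 and 3 would then occupy a single entry with mask 0xA instead of two entries.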


Topology Discovery

What: Perform topology discovery in OpenFlow switches (not controllers).

Why: There are two problems with making controllers responsible for topology discovery, listed below. For a small amount of extra work in the switch, we can make discovery cheaper, more reliable, and simpler. The problems:

  1. Discovery cost increases linearly as more controllers are added (e.g., when we run several guest controllers through the FlowVisor).
  2. Discovery packets don't reliably work when sent by the controller
    • For heavily loaded links, they might get dropped.
    • When there are many ports, the controller can't send discovery packets fast enough.

Benefits:

  1. Discovery cost is independent of the number of controllers.
  2. Discovery packets sent by the switch can be appropriately prioritized so they get through even on heavily loaded links.

How:

  1. Make the switch responsible for sending out LLDP packets at a regular interval. Each packet is stamped with the sender's datapath ID and outgoing port.
  2. Make the switch responsible for processing LLDP packets, so that it learns what it is connected to.
    • Approach 1: This requires new messages (bad) but makes link changes induce minimally sized messages (good).
      • Link up. Whenever it detects a link to a new switch, it sends a (new) message to the controller to tell it about the new link. (Switch-->Controller)
      • Link down. Whenever it detects a link is down (no new LLDP packets received for some period of time) it sends a message to the controller to tell it about the loss of the link. (Switch-->Controller)
      • Get links. Ask a switch to send "Link up" messages for all known links. Useful when a controller first connects to the switch. (Controller-->Switch)
    • Approach 2: Use existing structures and messages (good) but updates have lots of extra info (bad)
      • Use ofp_port_status by extending ofp_phy_port (see below) to also include an array of switches it is connected to (datapath ID and port number).
      • Whenever it detects that a new link has been added or an old one has gone down, it sends an OFPT_PORT_STATUS message with code OFPPR_MODIFY.
  3. Nuances.
    • New switch configuration parameters:
      • LLDP send interval - how often to send an LLDP packet out each port
      • Link down timeout - how long to wait for an LLDP packet before expiring a connection to a switch
    • The "LLDP" packet format should be specified - it should simply contain the datapath ID and outgoing port. Perhaps a TTL too?
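
The link up / link down logic in the steps above could be tracked on the switch with a small neighbor table, sketched here in C. The table size, struct names, and function names are assumptions for illustration. Sending the actual messages to the controller is left out, and the return values only indicate when a message would be triggered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LINKS_SKETCH 16

struct link_sketch {
    uint16_t local_port;   /* port the LLDP packet arrived on */
    uint64_t peer_dpid;    /* datapath ID stamped in the LLDP packet */
    uint16_t peer_port;    /* outgoing port stamped in the LLDP packet */
    uint64_t last_seen_ms; /* arrival time of the latest LLDP packet */
    int      up;           /* 1 while the link is considered alive */
};

struct link_table_sketch {
    struct link_sketch links[MAX_LINKS_SKETCH];
    uint64_t link_down_timeout_ms; /* the "link down timeout" parameter */
};

void link_table_init(struct link_table_sketch *t, uint64_t timeout_ms)
{
    memset(t, 0, sizeof *t);
    t->link_down_timeout_ms = timeout_ms;
}

/* Process one received LLDP packet. Returns 1 for a new link (would trigger
 * "link up"), 0 for a refresh of a known link, -1 if the table is full. */
int lldp_received(struct link_table_sketch *t, uint16_t local_port,
                  uint64_t peer_dpid, uint16_t peer_port, uint64_t now_ms)
{
    struct link_sketch *free_slot = NULL;
    for (size_t i = 0; i < MAX_LINKS_SKETCH; i++) {
        struct link_sketch *l = &t->links[i];
        if (!l->up) {
            if (!free_slot)
                free_slot = l;
            continue;
        }
        if (l->local_port == local_port && l->peer_dpid == peer_dpid &&
            l->peer_port == peer_port) {
            l->last_seen_ms = now_ms;
            return 0;
        }
    }
    if (!free_slot)
        return -1;
    *free_slot = (struct link_sketch){local_port, peer_dpid, peer_port, now_ms, 1};
    return 1;
}

/* Expire links with no LLDP packet for longer than the timeout. Returns the
 * number of links that went down (each would trigger "link down"). */
size_t expire_links(struct link_table_sketch *t, uint64_t now_ms)
{
    size_t down = 0;
    for (size_t i = 0; i < MAX_LINKS_SKETCH; i++) {
        struct link_sketch *l = &t->links[i];
        if (l->up && now_ms - l->last_seen_ms > t->link_down_timeout_ms) {
            l->up = 0;
            down++;
        }
    }
    return down;
}
```

The same table works for either approach: Approach 1 would turn the return values into dedicated messages, while Approach 2 would serialize the live entries for a port into the port status update.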

Extended ofp_phy_port (as carried in ofp_port_status):

 struct ofp_conn_switch {
   uint64_t dpid;  /* datapath ID of the connected switch */
   uint16_t port;  /* port on the other switch we are connected to */
 };
 
 struct ofp_phy_port {
    uint16_t port_no;
    uint8_t hw_addr[OFP_ETH_ALEN];
    uint8_t name[OFP_MAX_PORT_NAME_LEN]; /* Null-terminated */
 
    uint32_t config;        /* Bitmap of OFPPC_* flags. */
    uint32_t state;         /* Bitmap of OFPPS_* flags. */
 
    /* Bitmaps of OFPPF_* that describe features.  All bits zeroed if
     * unsupported or unavailable. */
    uint32_t curr;          /* Current features. */
    uint32_t advertised;    /* Features being advertised by the port. */
    uint32_t supported;     /* Features supported by the port. */
    uint32_t peer;          /* Features advertised by peer. */
 
    uint32_t num_conns;              /* Number of switches connected to this port. */
    struct ofp_conn_switch conns[0]; /* Switches connected to this port. */
 };
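
One detail worth checking in the structs above: a uint64_t followed by a uint16_t leaves the size of ofp_conn_switch up to the compiler's alignment rules. The following C sketch shows a variant with explicit padding and a compile-time size check. The _padded struct name, the 6-byte pad, and the helper function are assumptions, not part of the proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of ofp_conn_switch with explicit padding so the
 * on-the-wire size does not depend on compiler alignment rules. */
struct ofp_conn_switch_padded {
    uint64_t dpid;   /* datapath ID of the connected switch */
    uint16_t port;   /* port on the other switch we are connected to */
    uint8_t  pad[6]; /* round the entry up to 16 bytes */
};

/* Fail the build if the layout is not what the wire format expects. */
_Static_assert(sizeof(struct ofp_conn_switch_padded) == 16,
               "ofp_conn_switch_padded must be 16 bytes");

/* Number of bytes the variable-length conns[] array adds to a port
 * description that lists num_conns connected switches. */
size_t conn_array_len(uint32_t num_conns)
{
    return (size_t)num_conns * sizeof(struct ofp_conn_switch_padded);
}
```

A port connected to three switches would add conn_array_len(3), that is 48 bytes, to its description.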

Spec Questions

Spec Reorganization

It might be worthwhile to organize the spec the same way as the message types in 0x93: instead of controller-to-switch, async, and symmetric messages, organize it as Immutable, Switch Config, Async, Controller Command, and Stats messages.


Create Control Section

The Control protocol is starting to get bigger, and either needs more description in the current spec, or needs to be pulled out into a separate spec.

This section/doc would include the following:

  • Spanning Tree
  • In-band Control
  • Failure Method (none/learning switch etc)
  • Automatic Controller Discover Protocol

Can someone from Nicira write up a blurb on how these are handled?


OpenFlow Port

From Justin: Before the v0.9.0 release, we should decide on an official OpenFlow port. Currently, we're using 975 for plain TCP and 976 for SSL-wrapped OpenFlow. These ports are not assigned by IANA, but they fall in the reserved range below 1024, which means people who run controllers on Unix systems must be root.

I'd recommend that we choose new ports that are not reserved. If we want to continue to squat, it looks like there are some free ones in the 2000s. However, since we'd like widespread adoption, I think it would be good to request an official one. If someone at the consortium is so inclined, here's a link to the application:

 http://www.iana.org/cgi-bin/usr-port-number.pl

If we're going to request a TCP port from IANA, then we're probably going to be limited to one official port. New protocols are actively discouraged from using separate unencrypted and SSL ports; instead, they are supposed to negotiate encryption with some sort of STARTTLS command. Since I think the non-encrypted form should only be used for testing, and adding STARTTLS would be difficult, I'd recommend that we just squat on a port near the official one we get.

UPDATE: Since 0.8.9, we've switched to using port 6633. We should still see about getting this officially blessed from IANA.


Spec tied to Tests

If the spec could list the test(s) that cover each message format, this could help us get closer to 100% coverage.


Start Port Enumeration at 1

The current reference design begins assigning OpenFlow port identifiers at zero. A number of protocols such as SNMP and STP start counting ports at one. To increase compatibility, we should consider starting at one instead of zero. This requires no protocol change, and is only a spec change and implementation detail.


Normal Action Required?

The spec currently requires the "normal" action to be implemented for a Type 0 switch. The "normal" action is to behave as a regular L2 learning switch. The reference implementation does not support this. Support will likely be added to secchan in the near future, but should it be a requirement for 0.9.0?


Buffered Packet Behavior

The spec doesn't have a policy about how buffered packets should be handled in a switch. In the reference implementation, packets are held in a circular buffer and guaranteed not to be reused for one second or until the controller specifies an action, whichever happens first. The spec should specify requirements and recommendations for buffered packets. For example: Switches MUST gracefully handle not getting a response from the controller about a buffered packet. The switch SHOULD prevent a buffer from being reused until it has been handled by the controller or some amount of time has passed.
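
The reuse rule described above can be sketched in C as follows. This is not the reference implementation's code: the ring size, the struct and function names, and the choice to use the slot index as the buffer ID are assumptions. Packet data itself is omitted, and only the bookkeeping is shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_BUFFERS_SKETCH 4      /* illustrative ring size */
#define MIN_HOLD_MS_SKETCH 1000 /* a slot is protected for one second */

struct pkt_slot_sketch {
    int      in_use;    /* 1 until the controller handles the packet */
    uint64_t stored_ms; /* time the packet was buffered */
};

struct pkt_ring_sketch {
    struct pkt_slot_sketch slots[N_BUFFERS_SKETCH];
    size_t next; /* next slot to try, advancing circularly */
};

void ring_init(struct pkt_ring_sketch *r)
{
    memset(r, 0, sizeof *r);
}

/* Try to buffer a packet. Returns the buffer ID (slot index), or -1 if the
 * next slot is still protected, in which case the switch would have to
 * handle the packet without buffering it. */
int ring_store(struct pkt_ring_sketch *r, uint64_t now_ms)
{
    struct pkt_slot_sketch *s = &r->slots[r->next];
    if (s->in_use && now_ms - s->stored_ms < MIN_HOLD_MS_SKETCH)
        return -1;
    s->in_use = 1;
    s->stored_ms = now_ms;
    int id = (int)r->next;
    r->next = (r->next + 1) % N_BUFFERS_SKETCH;
    return id;
}

/* The controller specified an action for this buffer; free the slot. */
void ring_release(struct pkt_ring_sketch *r, int id)
{
    if (id >= 0 && id < N_BUFFERS_SKETCH)
        r->slots[id].in_use = 0;
}
```

A slot whose controller response never arrives simply becomes reusable after the hold time, which is one way to satisfy the MUST above.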


Clarify Matching Behavior for Flow Modification and Stats

The spec does not currently define how wildcard matching should behave for flow modification (specifically DELETE and MODIFY commands) and stats. A match will occur when a flow entry exactly matches or is more specific than the description in the flow_mod command. For example, if a flow delete command says to delete all flows with a destination port of 80, then a flow entry that is all wildcards will not be deleted. However, a flow delete command that is all wildcards will delete an entry that matches all port 80 traffic.
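
The rule described above, where an entry is selected when it exactly matches or is more specific than the flow_mod description, can be sketched in C with a reduced two-field match. The struct, the WC_* bit names, and the function name are hypothetical; a real match has many more fields.

```c
#include <stdint.h>

/* Wildcard bits for a reduced two-field match; a set bit means the field
 * is wildcarded. */
#define WC_IN_PORT 0x1u
#define WC_TP_DST  0x2u
#define WC_ALL     (WC_IN_PORT | WC_TP_DST)

struct match_sketch {
    uint32_t wildcards;
    uint16_t in_port;
    uint16_t tp_dst; /* transport destination port */
};

/* Returns 1 if `entry` is identical to or more specific than `desc`, so a
 * DELETE, MODIFY, or stats request described by `desc` applies to it. */
int entry_covered_by(const struct match_sketch *entry,
                     const struct match_sketch *desc)
{
    /* Fields the description pins to a specific value. */
    uint32_t desc_exact = ~desc->wildcards & WC_ALL;

    /* The entry must not wildcard any field the description specifies. */
    if (entry->wildcards & desc_exact)
        return 0;
    if ((desc_exact & WC_IN_PORT) && entry->in_port != desc->in_port)
        return 0;
    if ((desc_exact & WC_TP_DST) && entry->tp_dst != desc->tp_dst)
        return 0;
    return 1;
}
```

With this check, a delete for destination port 80 leaves an all-wildcards entry in place, while an all-wildcards delete removes the port 80 entry, matching the two examples in the paragraph above.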


Clarify Spanning Tree

Modify spec to make explicit that packets received on ports that are disabled by spanning tree must follow the normal flow table processing path.


Clarify Transaction ID in Error Messages

Add to section 5.4.4, error messages: "If the error message is in response to a specific message from the controller, e.g., OFPET_BAD_REQUEST, OFPET_BAD_ACTION, OFPET_FLOW_MOD_FAILED, then the transaction ID in the header should match that of the offending message."


Clarify Duration Field of Flow Expiration Messages

Fix an ambiguity regarding the "duration" field in the Flow Expiration message. The code and most of the spec indicate that it is the amount of time the flow has been in the flow table. However, on page 31, it states that it is the amount of time the flow received traffic.

Just in case that description wasn't clear, I'll give an example. Let's say a flow expired because the idle timer went off, after traffic was received for 45 seconds with the idle timer set to 30 seconds. In the "duration" field, we can return either 45 or 75. Right now, the code returns 75 ("the amount of time the flow was active"). The alternative is to send 45 ("the amount of time the flow received traffic").

Clearly, if the controller always sets the idle timeout to 30 seconds, it's trivial to derive one from the other. If the controller uses different idle timeouts, the controller will need to store the idle timeouts for each flow. And if a hard timeout expired, then you have no idea if you've received traffic the entire time or it stopped at some time before an idle timeout would have expired.
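
For the idle-timeout case, the derivation a controller would have to do is a single subtraction, shown in this small C sketch. The function name is made up for illustration, and as noted above the result is only meaningful when the flow expired because of its idle timeout, not a hard timeout.

```c
#include <stdint.h>

/* Given the time a flow spent in the table and the idle timeout that
 * expired it, return the time during which traffic was being received.
 * Returns 0 if the inputs are inconsistent (duration below the timeout). */
uint32_t traffic_seconds_from_duration(uint32_t duration_s,
                                       uint32_t idle_timeout_s)
{
    return duration_s >= idle_timeout_s ? duration_s - idle_timeout_s : 0;
}
```

In the example above, a reported duration of 75 seconds with a 30-second idle timeout gives 75 - 30 = 45 seconds of traffic.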

I can see an argument for both being useful. My suggestion is that we leave it as the amount of time the flow was active for 0.8.9 and fix the one bad reference in the spec. For 0.9.0, I think we should return both (along with a few other minor changes).

--Justin


Clarify Format for Strip VLAN Action

Clarify that the OFPAT_STRIP_VLAN action takes no argument and strips the VLAN tag if one is present. The header is the generic action header.
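
To make "strips the VLAN tag if one is present" concrete, here is a C sketch of the operation on a raw Ethernet frame. The function name and constants are assumptions, not reference implementation code. It recognizes only a single 802.1Q tag with TPID 0x8100 and leaves any other frame unchanged.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN_SKETCH 12     /* destination + source MAC addresses */
#define VLAN_TAG_LEN_SKETCH  4      /* TPID (2 bytes) + TCI (2 bytes) */
#define ETH_TYPE_VLAN_SKETCH 0x8100 /* 802.1Q tag protocol identifier */

/* Remove an 802.1Q tag in place if the frame carries one. Returns the new
 * frame length; untagged or too-short frames are returned unchanged. */
size_t strip_vlan(uint8_t *frame, size_t len)
{
    /* A tagged header needs addresses, the tag, and the inner EtherType. */
    if (len < ETH_ADDRS_LEN_SKETCH + VLAN_TAG_LEN_SKETCH + 2)
        return len;

    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != ETH_TYPE_VLAN_SKETCH)
        return len;

    /* Slide the inner EtherType and payload over the 4-byte tag. */
    memmove(frame + ETH_ADDRS_LEN_SKETCH,
            frame + ETH_ADDRS_LEN_SKETCH + VLAN_TAG_LEN_SKETCH,
            len - ETH_ADDRS_LEN_SKETCH - VLAN_TAG_LEN_SKETCH);
    return len - VLAN_TAG_LEN_SKETCH;
}
```

Because the action needs no parameters, nothing beyond the action type has to be carried in the message, which is why the generic action header is enough.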