Monday 26 April 2010

Faster Payments - ISO8583 Application Switch

Need to route ISO8583 transactions based on their message type?

A customer came to me with a requirement to interpret their ISO8583 messages and act on them accordingly. This is quite different from the recent post Faster Payments - ISO8583 Resilience. That article addressed ensuring the ISO8583 service was actually up and running before sending it transactions. This article helps you decide which IP Address to send a transaction to when there are multiple instances of the service. Also refer to: The Evolution of Application Delivery Part 4: Service Agility.

To better understand what we are doing here, let's reference the very handy Wikipedia article on ISO8583 - Financial transaction card originated messages. As you will see in the section titled "Message Type Indicator", there is a four-digit identifier used to determine:

  • Protocol Version: 1987, 1993, 2003
  • Message Class: Authorisation, Reversal, Reconciliation, etc
  • Message Function: Request, Response, Notification, etc
  • Message Origin: Acquirer, Issuer, etc.
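The MTI decoding above can be sketched in a few lines of Python. This is purely illustrative - the field names and the parse_mti helper are my own shorthand, not from the standard:

```python
# Map the first MTI digit to the ISO 8583 protocol version.
VERSIONS = {'0': '1987', '1': '1993', '2': '2003'}

def parse_mti(mti):
    """Split a four-digit MTI such as '0800' into its components."""
    if len(mti) != 4 or not mti.isdigit():
        raise ValueError("MTI must be exactly four digits")
    return {
        'version': VERSIONS.get(mti[0], 'reserved'),
        'class': mti[1],     # e.g. '8' = network management
        'function': mti[2],  # e.g. '0' = request
        'origin': mti[3],    # e.g. '0' = acquirer
    }
```

For the Echo Test message used later in this post, parse_mti('0800') reports a 1987-version network management request originated by the acquirer.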


    The Problem

    Why might I wish to route messages based on this information? Let's say you are in the process of upgrading or changing your Transaction Handler. You are interested in routing, or maybe mirroring, only Reconciliation messages to the new system and no others. The remainder of the messages, or all of them if you are mirroring, will still need to be passed to your existing service.



    The Solution


    By implementing F5's LTM (Application Delivery Controller), you can read the ISO8583 message and then act on its contents. LTM isn't natively aware of the 8583 standard, so you must tell it what to do. Below is an iRule (a TCL script that the LTM can interpret) to route or mirror your traffic as needed:


    when CLIENT_ACCEPTED {
      TCP::collect
    }
    
    ## This section collects the first 6 bytes and then uses
    ## 'switch' to find a match. It allows you to decide where
    ## to send a request based on its message type.
    ## You could also use this to simply log without
    ## performing a pool selection.
    
    when CLIENT_DATA {
    
      set clientData [TCP::payload 6]
    
      switch -glob $clientData {
        "??0800" { 
          pool pool_EchoTestMessage
          log local0. "This is a Network Management Message."
        }
    
        "??2202" { 
    # Need to mirror message to the new service for testing
          log local0. "Mirroring message to test: $clientData"
          pool pool_ISO8583_Service
    
    # Here's the cloning for testing the new service
          clone pool pool_NewServiceForTesting
        } 
      }
      TCP::release
    }
    
    when SERVER_CONNECTED {
      TCP::collect
    }
    
    ## This section allows you to perform an action based
    ## on the message type of the response from the server.
    ## You could log the message type or even manipulate
    ## it before sending it out.
    
    when SERVER_DATA {
    
      set serverData [TCP::payload 6]
      log local0. "Response Data raw: $serverData"

      switch -glob $serverData {
        "??0810" { 
          pool pool_EchoTestMessage
          log local0. "Network Management Message RESPONSE."
        }
      }
      TCP::release
    }
    

    At the beginning of this post I referenced the Service Agility post from last month. Applying the above rule delivers enormous flexibility and agility to your solution, breaking free from the static, fixed designs that hinder growth.
    Wednesday 7 April 2010

    The Evolution of Application Delivery Part 4: Service Agility

    The final article of the four-part series explains what companies should aim for today: Service Agility. Virtualisation is a key step on the path to agility, but it is possible to configure yourself into a virtual point of failure.

    The following diagram shows the same multi-tiered application illustrated in Parts 1-3, but deployed in a model that delivers Service Agility.

    True Service Agility exists with the elimination of static links between the elements of a service.

    Think DNS! DNS provides the ability to bind a meaningful name to a less-than-meaningful IP Address. It also brought us the ability to change the IP Address binding while maintaining the meaningful name. A very basic form of agility, but very slow and reliant on human intervention. ADC's deliver agility in real time on a per-user or per-transaction basis. A good ADC can even react to server responsiveness.
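The difference can be caricatured in a few lines of Python (purely illustrative - the names and addresses are made up):

```python
# DNS-style agility: one name bound to one address, changed by hand.
dns = {'www.example.com': '192.0.2.10'}
dns['www.example.com'] = '192.0.2.11'  # manual re-binding, slow to propagate

# ADC-style agility: pick a pool member per request, skipping any
# member that is not currently responsive.
pool = ['192.0.2.10', '192.0.2.11', '192.0.2.12']

def pick(responsive):
    """Return the first pool member that passes the responsiveness check."""
    for member in pool:
        if responsive(member):
            return member
    return None
```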

    Wednesday 31 March 2010

    The Evolution of Application Delivery Part 3: Multi Tiered Virtualisation

    Some of those conscientious companies noticed there was more work to be done and, consequently, deployed Application Delivery Controllers between the Web Servers and the Application Servers. Often, the plug-in still exists, but it only sees a single IP Address for the Application Server tier (see below). The ADC virtualises the Application.

    Virtualising each tier separately is much better than the previous designs when it comes to service resilience and availability, but we are not there yet. We have not delivered Service Agility. In the diagram above we have virtualised, yes. However, we are still at the mercy of a Web Server plug-in, and this is not a place I would want to be.

    Once you have virtualised a service you can then deliver:
    *) Rapid response to market demand: add a new server to the ADC pool without changing the Application
    *) Separate in-band monitoring of each tier
    *) Reduction in devices traversed, and latency incurred, between the users and the Application

    Faster Payments - ISO 8583 Resilience

    What is it?
    Today I was asked to provide a resilient solution for a Point-to-Point system using the Faster Payments Service. I have to admit that my knowledge of Faster Payments was limited to begin with, but a quick Google search revealed that it is none other than ISO8583 - http://en.wikipedia.org/wiki/ISO_8583 - and, the more I read, the more I realised this is BIG!

    From the WikiPedia article referenced above, "The vast majority of transactions made at Automated Teller Machines use ISO 8583 at some point in the communication chain, as do transactions made when a customer uses a card to make a payment in a store. In particular, both the MasterCard and Visa networks base their authorization communications on the ISO 8583 standard, as do many other institutions and networks".

    Two quick thoughts: 1) this is obviously quite important, 2) there must be a lot of organisations wanting to ensure the maximum uptime of services utilising ISO8583.

    What do we know?
    ISO8583 messages:
    1) Run over TCP/IP.
    2) Work on a Request/Response message acknowledgement system.
    3) Have very short payloads: 1 packet = 1 message.
    4) Use a numeric Message Type Indicator to identify the message type!

    The fourth point is what we can use to implement some service resilience.
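As a rough Python sketch, assuming the common framing where a two-byte length header precedes the ASCII message (the mti_of helper is my own, not part of any library):

```python
def mti_of(payload):
    """Return the four-character MTI from a raw ISO 8583 frame,
    assuming a 2-byte length header precedes the ASCII message."""
    if len(payload) < 6:
        raise ValueError("need at least 6 bytes of payload")
    return payload[2:6].decode('ascii')
```

An Echo Test frame beginning with a length header and '0800' yields '0800', which a monitor (or an application switch) can act on.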

    Delivering a resilient solution requires knowledge of the application/service being delivered. ASCII-based protocols like HTTP and SMTP are simple for different reasons: a) most Application Delivery solutions have interpreters for HTTP, b) if you don't have an SMTP interpreter, it's text-based messaging... a packet trace will reveal all.

    In the case of ISO8583, it's much more efficient: it uses a numeric value to identify the message type instead of the human-readable format used by HTTP. An example:

    Example 1: ASCII-based message: Try the following commands (NOTE: Press return twice after the 'GET' command):
    telnet www.google.com 80
    GET / HTTP/1.0

    As you will see from this example, you can communicate with the Web Server in the same way that your web browser does. Telnet creates a TCP/IP socket to the Web Server on TCP port 80, and the string "GET / HTTP/1.0" elicits a valid HTTP response.
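The same probe can be sketched in Python rather than telnet. This is a rough sketch of mine; the helper names are not from any product:

```python
import socket

def http_status(response_bytes):
    """Pull the numeric status code out of a raw HTTP response."""
    status_line = response_bytes.split(b'\r\n', 1)[0]
    return int(status_line.split()[1])

def http_probe(host, port=80, timeout=5):
    """Open a TCP socket and send the same request as the telnet example."""
    s = socket.create_connection((host, port), timeout=timeout)
    try:
        s.sendall(b'GET / HTTP/1.0\r\nHost: ' + host.encode() + b'\r\n\r\n')
        return s.recv(4096)
    finally:
        s.close()
```

Feeding the result of http_probe('www.google.com') to http_status should yield 200, just as the telnet session does.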

    The customer that raised the query has a number of F5 LTMs (Local Traffic Managers) installed. Adding resilience to their credit card transaction service is a matter of making the LTM aware of the service. This calls for an ISO8583 service monitor. The monitor will send a properly formatted ISO8583 'Echo Test' message and verify that it receives a properly formatted Echo Response:

    1) How to talk ISO8583:
    Unfortunately, you can't just open up a TCP socket and enter a string the way an HTTP monitor does. ISO8583 has its own set of special requirements. Fortunately, there is a Python library for generating ISO8583-formatted messages, which can be found here: http://code.google.com/p/iso8583py/

    Many thanks, "igorvc".


    Uncompress the files to /var/tmp, then (note the quotes - the directory name contains a space):
    cd "/var/tmp/ISO8583 Module-1.1/"
    python setup.py build
    mount -o rw,remount /usr
    python setup.py install
    mount -o ro,remount /usr



    2) Provide your Big-IP the tools to construct an ISO8583 'echo' message:
    a) Install a Python script onto your Big-IP in the /config/monitors/ directory, named mon_ISO8583.py, with the following contents (thanks again, "igorvc"):

    """

    (C) Copyright 2009 Igor V. Custodio

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

    """


    from ISO8583.ISO8583 import ISO8583
    from ISO8583.ISOErrors import *
    import socket
    import sys
    import time

    # Configure the client
    serverIP6 = sys.argv[1]
    # Strip an IPv4-mapped IPv6 prefix if present. Note that
    # str.lstrip() removes a set of characters, not a prefix,
    # so test for the prefix explicitly.
    serverIP4 = serverIP6
    if serverIP4.startswith("::ffff:"):
        serverIP4 = serverIP4[7:]
    serverPort = sys.argv[2]
    numberEcho = 1
    timeBetweenEcho = 0 # in seconds

    bigEndian = True
    #bigEndian = False

    s = None
    for res in socket.getaddrinfo(serverIP4, serverPort, socket.AF_UNSPEC, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        try:
            s = socket.socket(af, socktype, proto)
        except socket.error, msg:
            s = None
            continue
        try:
            s.connect(sa)
        except socket.error, msg:
            s.close()
            s = None
            continue
        break
    if s is None:
        print 'Could not connect :('
        sys.exit(1)

    for req in range(0, numberEcho):
        iso = ISO8583()
        iso.setMTI('0800')
        iso.setBit(3, '300000')
        iso.setBit(24, '045')
        iso.setBit(41, '11111111')
        iso.setBit(42, '222222222222222')
        iso.setBit(63, 'This is a Test Message')
        try:
            # getNetworkISO/setNetworkISO take the endianness as an
            # argument, so one code path covers both settings. The
            # monitor wrapper greps for "OK", so always print OK/ERR.
            message = iso.getNetworkISO(bigEndian)
            s.send(message)
            print 'Sending ... %s' % message
            ans = s.recv(2048)
            print "\nInput ASCII |%s|" % ans
            isoAns = ISO8583()
            isoAns.setNetworkISO(ans, bigEndian)
            v1 = isoAns.getBitsAndValues()
            for v in v1:
                print 'Bit %s of type %s with value = %s' % (v['bit'], v['type'], v['value'])

            if isoAns.getMTI() == '0810':
                print "OK"
            else:
                print "ERR"

        except InvalidIso8583, ii:
            print ii
            break

        time.sleep(timeBetweenEcho)

    print 'Closing...'
    s.close()



    b) Now create a wrapper script (LTM can't call Python directly) called mon_ISO8583.sh with the following contents:


    #!/bin/sh
    /usr/bin/python /config/monitors/mon_ISO8583.py ${1} ${2} | grep -i "OK" > /dev/null 2>&1

    if [ $? -eq 0 ]
    then
        echo "UP"
    fi



    c) Setup permissions
    Now we setup the permissions by executing:
    chmod 700 /config/monitors/mon_ISO8583.py
    chmod 700 /config/monitors/mon_ISO8583.sh



    3) Testing Comms
    Execute:  /config/monitors/mon_ISO8583.sh ipAddress tcpPort <-- sub IP and Port with real values



    You should receive "UP".

    If you didn't receive "UP", you can get a lot more detail, which might help in debugging, by executing the Python script itself:
    /usr/bin/python /config/monitors/mon_ISO8583.py ipAddress tcpPort <-- sub with real Addr/Port



    Once you have this up and working, you are ready to add the monitor to a Pool!!



    4) Adding the monitor to a pool:
    In your v10 Big-IP GUI, create a monitor of type 'external' and call it mon_ISO8583.

    In the field 'External Program' enter:
    mon_ISO8583.sh



    Apply the monitor to your pool. Your pool should now turn Green.

    You now have a solution that validates that the ISO8583 service is operating and responding to messages, ensuring that customer transactions can be fulfilled. This goes above and beyond TCP port checks, which can produce false positives.
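To make the false-positive point concrete, here is a rough Python sketch of mine (not BIG-IP code) contrasting a bare TCP port check with the application-level echo check built above. A port can accept connections while the application behind it is broken, so only the echo test proves the service works:

```python
import socket

def tcp_port_check(host, port, timeout=2):
    """Port-level check: does the port accept a TCP connection?
    This can report 'up' even when the application is broken."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except socket.error:
        return False

def iso8583_echo_check(send_echo):
    """Application-level check: send a 0800 Echo Test and require a
    0810 response. `send_echo` is a placeholder for something like
    the script above; it should return the response MTI."""
    try:
        return send_echo() == '0810'
    except Exception:
        return False
```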

    Wednesday 24 March 2010

    The Evolution of Application Delivery Part 2: The ADC is born

    In "Part 1: Unintelligent" we covered basic Load-balancing. This week we will explore early application delivery implementations.

    Some organisations identified that there were issues with Load-balancing and invested in Application Delivery Controllers (ADC's hereafter). There are many published articles touting the expression "Load-balancers are dead!" (Gartner: Load-balancers are Dead), and this is precisely what it means. Conscientious corporations rolled out ADC's that would:
    a) monitor the service itself e.g., send an out-of-band HTTP request and see if the response is valid - HTTP 200 OK
    b) distribute the web clients across the service based on the Connection table, or responsiveness of each individual server

    In the event of a server failure (the monitor detects a bad response) the ADC would cease distributing traffic to the failed device. This brought about an enormous improvement in service uptime, brand protection and resilience.
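The monitor decision in (a), and the pool behaviour it drives, can be sketched as follows (illustrative names of my own, not an actual ADC API):

```python
def healthy(raw_response):
    """Out-of-band monitor decision: only an HTTP 200 counts as up."""
    try:
        return raw_response.split(b'\r\n', 1)[0].split()[1] == b'200'
    except IndexError:
        return False

def update_pool(pool, server, raw_response):
    """Drop a server from rotation when its monitor probe fails,
    and restore it when the probe succeeds again."""
    if healthy(raw_response):
        pool.add(server)
    else:
        pool.discard(server)
```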

    Maturity in the ADC market added services like SSL Offload: the ADC terminated the encryption and passed the encapsulated payload back to the servers in 'clear', unencrypted form. Server CPU utilisation went down, SSL certificate management (and cost) was reduced, and the servers went back to running applications, with as much of the network function as possible being performed by the ADC.

    But there were still problems. The Applications themselves were receiving their Customer requests from a plug-in loaded into a web server. This plug-in delivered unintelligent, unmonitored, round-robin request distribution to the Application Server Tier. Didn't we just solve this problem for the Web Servers??

    Wednesday 17 March 2010

    The Evolution of Application Delivery Part 1: Unintelligent

    This is Part 1 of a 4-part series covering the Evolution of Application Delivery. I come across a lot of confusion as to what Application Delivery really is. Some think it's Load-balancing. They are wrong. :-) Hopefully this series will help better explain ADC's and their purpose.

    In the beginning we had basic, unintelligent TCP distribution, typically delivered through destination address re-writing. This was fine for spreading HTTP Requests across multiple servers when one server alone simply wasn't enough.

    The internet matured and soon we had the need to load-balance dynamic applications, which included anything from a CGI script residing on a Web Server to multi-tiered architectures like WebSphere or WebLogic systems. These multi-tiered applications were typically deployed behind a load-balancer, and the Web Server was loaded with a crude plug-in as depicted below.

    The aforementioned multi-tiered architecture delivered much-needed scalability, allowing organisations to build out sideways as needed. The problem with this architecture, especially in more recent times with growing expectations around application availability and responsiveness, is that it is still unintelligent.

    A load-balancer has no appreciation of the service running through the architecture. Consider the following: in the diagram above, what happens if one of the Web Servers starts returning errors? A failed disk could result in a 404 Page Not Found. The load-balancer would then upset 1/3 of your customers.

    Some of these early load-balancers would 'ping' the server to see that it was turned on, but sending an ICMP packet doesn't verify whether the response was good (HTTP 200 OK) or bad (404 Object Not Found).

    Tuesday 2 March 2010

    Load-balance or Virtualise your application??

    Most Web Applications today are behind some form of Load-balancer. But is that enough?

    The Problem
    What is a Load-balancer? A load-balancer unintelligently sprays traffic across a tier of servers in the hope of distributing load: 2+ servers can handle more load than 1. Load-balancers sound great, right? If a Load-balancer is all you have, then be afraid!

    Let's say you have 3 Web Servers behind a Load-balancer. The Load-balancer is distributing traffic across the servers using a round-robin algorithm: Server A, Server B, Server C, Server A, Server B, Server C... you get the picture. What happens if Server A begins to fail and starts returning errors? As the load-balancer is not aware of the issue (it only works at the network level), 1/3 of your customers are now failing to get the service you promised to deliver. Instead, they are receiving an ugly error page.

    The Solution
    Enter: Service Virtualisation. The right solution is to virtualise each tier using an Application Delivery Controller - Gartner ADC Market. A good ADC is capable of detecting the response error and, instead of passing it back to the customer, transparently redistributing the request to another server. Put another way, if Server B replies with a 404 Not Found, the ADC will try again with either Server A or Server C before responding to the customer.
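The retry behaviour can be sketched in a few lines of Python. This is a rough illustration of mine; `fetch` stands in for whatever actually forwards the request to a server:

```python
# Responses the ADC refuses to pass back while another server remains.
ERROR_CODES = (404, 500)

def deliver(request, servers, fetch):
    """Try each server in turn; pass an error response back to the
    customer only if every server returns one."""
    last = None
    for server in servers:
        status, body = fetch(server, request)
        if status not in ERROR_CODES:
            return status, body
        last = (status, body)
    return last
```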

    By implementing an ADC, customer traffic is no longer scattered across a Web Server tier like leaves in the wind. With an ADC, the Web Server tier is virtualised, separating the customer from whatever might go wrong by inserting a device to keep things honest and, more importantly, to protect your company brand.

    Sunday 28 February 2010

    In-band monitoring for HTTP.... protect the customer from error pages.

    Today's topic is in-band monitoring.
    1. The problem
    2. The cause
    3. The solution
    1. The Problem
    "404 - Page/Object Not Found", "500 - Server Error". Sound familiar? We receive these errors daily. Sometimes we receive them without noticing - ever noticed an image missing after a page loads?

    These are all examples of wasting precious customer time. Customers with high expectations!


    2. The cause
    Web Applications are made up of both Static and Dynamic elements. Static: a basic web server request to get 'logo.gif'. Dynamic: a request to a database to list all the transactions where the username is 'Bob'.

    Typically, a 404 is received when a static object is requested of a web server but the object does not exist. Maybe it wasn't copied to the server? Was it accidentally deleted? A permission problem? Does the server have a bad disk? There are a lot of reasons why this can happen, but all have the same result: a poor customer experience.

    A 500 Server Error is usually (I did say usually - not always) returned from the Application Server and could be the result of a hung process, a poorly coded database timeout, an over-utilised server, a code bug... the list goes on. Before I get flamed by some Application Developers: the code they write is complex, and the developers themselves are often under significant pressure from the business to deploy new functionality to remain competitive. Rapid time to market = rushed code.


    3. The solution
    In-band monitoring. Why assume that every response from the Web Server or Application Server is a response that you are willing to share with your customer? A 500 Server Error is a valid response, after all! With an in-band monitoring solution you have the option of rejecting a bad response and retrying the request with a different server.

    How does this sound in real time: if Server A returns a 500 or 404, then try again with Server B. The customer doesn't need to know.

    Friday 26 February 2010

    What is Maximum Uptime?

    For those unfamiliar, it has nothing to do with how long I can ride a unicycle. Nor has it anything to do with when I first learned to ski. Maximum Uptime is what we should all strive for in the arena of Internet Service Delivery.

    We live in a time when almost everything can be done on-line: paying bills, ordering food, buying a car and many other things. But the more society adopts on-line services, the greater the expectation that these services are always on. The web site must be up, the Point of Sale terminal must be able to authorise transactions, the email confirmation must be received. Furthermore, the accessibility of information has fuelled even greater expectations. Customers are fickle. They won't stand for delays. "Back in 5 mins" is NOT acceptable.

    The solution is to implement systems based on Maximum Uptime.