IS 651: Distributed Systems

IS 651: Distributed Systems

IS 651: Distributed Systems Distributed Communication Sisi Duan Assistant Professor Information Systems [email protected] Today

Socket RPC REST Abstracting Distributed Communication Common Communication Pattern Communication Mechanisms Many protocols are available

Sockets Remote Procedure Call (RPC) Distributed shared memory (later in the class) MPI

Socket Communication Your own protocol on top of a transmission protocol Remember the network layers from the first class? Whats TCP? Whats UDP? When is it best to use TCP v.s. UDP? Socket Communication TCP (Transmission Control Protocol)

Protocol built upon the IP networking protocol, which supports sequenced, reliable, two-way transmission over a connection (or session, stream) between two sockets More reliable More expensive UDP (User Datagram Protocol) Also protocol built on top of IP. Supports best-effort, transmission of single datagrams Its ok to lose, re-order, or duplicate messages

Low latency TCP Socket API TCP Socket Demo Python 2.7 (not too different in python 3) Java will be similar (links are provided in the reading list) Simple-client.py Simple-server.py (Available at class website under Reading)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) sock.bind((hostname, port)) sock.listen(1) conn,addr = sock.accept() server_addr=(hostname,portnum) sock.connect(server_addr) sock.sendall(msg)

buf = conn.recv(8092) sock.close() Socket API You need to manually implement a lot of things Very handy if you are a distributed systems developer RPC Remote Procedure Call

A type of client/server communication Ease of programming Hide complexity Standardize some low-level data packaging protocols RPC The application calls the remote procedure locally at the stub The stub intercepts calls that are for remote servers

Marshalling: pack the parameters into a message Make a system call to send the message Stubs are generated automatically by RPC frameworks (libraries), which also provide the RPC Runtime Programmers only write definitions for their data structures and protocols in an IDL RPC The RPC Runtime handles message

sending The interface definition language (IDL) handles message translation RPC hides heterogeneity among the computers and handles the communication across network RPC technologies XML/RPC Over HTTP, huge XML parsing overheads

SOAP Designed for web services via HTTP, huge XML overhead CORBA Relatively comprehensive, but quite complex and heavy Protocol buffers Lightweight, developed by Google

Thrift Lightweight, supports services, developed by Facebook Extensible Markup Language (XML) XML documents form a tree structure Well-formed XML v.s. Valid XML XML validation Document type definition (DTD) XML Schema

Well-formed XML It contains only properly encoded, legal Unicode characters None of the special syntax characters (<, &) appear except when performing their markup-delineation roles The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping The element tags are case-sensitive - the beginning and end tags must match exactly There is a single "root" element that contains all the other elements

Valid XML The declaration in line 1 is contains question mark characters and is called a processing instruction. It references the version and encoding for the XML document. Line 2 has a reference to an external DTD file that contains the DTD. It can replaced by embedding DTD content Line 3 is the root tag for the document. Note

that it contains an attribute. Any XML tag may have an attribute and it must be quoted. Note that even though item is repeated, it uses the same tag. Never create tags like item1, item2, etc. Document Type Definition (DTD) The declaration of the DTD in the XML document has the syntax where SYSTEM refers to that fact that the DTD is a private implementation for this document rather than a standard. It would change to PUBLIC if it was a standard. < DOCTYPE root-element SYSTEM "URI" >

DTDs do not have XML syntax. They have their own syntax. The !ELEMENT declares an element (also called a tag). The child elements of a tag are declared as an ordered list in parentheses. If an element can be repeated 1 or more times, it must have a plus sign (+) after it. The character star (*) means 0 or more and so makes elements optional. A leaf node of the hierarchy is declared #PCDATA which means parsed character data and it is the text of the content. The < and > are XML entities for the less than and greater than (< >) characters. XML markup characters cannot be used because they would confuse a parser, so these pre-defined entities must replace them. There are no data types in DTDs. Everything is text. The !ATTLIST declares an attribute for an element and typically declares it as CDATA which means character data. This means that the XML parser does not parse it.

One can require a document to have an attribute in order to be valid by using #REQUIRED. DTD Example Validation command: $>xmllint --noout --valid shiporder.xml XML Well-form check command xmllint --noout shiporder.xml xmllint --noout shiporder-not-well-formed.xml xmllint --noout shiporder-well-formed-not-valid.xml

Validation check command xmllint --noout --valid shiporder.xml xmllint --noout --valid shiporder-well-formed-not-valid.xml XML/RPC a HTTP POST request Data serialization: XML RPC call to add(17,13) results in this request

sample.add 17 13

XML/RPC Server side has to implement a function to process the request Sample Response 30

Problems with XML/RPC XML is very verbose, which affects performance The library doesnt support protocol versioning What happens if I want another param? What happens if I reverse order of x and y? Nothing, but what if function werent cumulative?

What if I forgot to add a param? In general, lack of types makes it difficult to build and maintain code There are other more complex and flexible XML/RPC libraries. SOAP Simple Object Access Protocol (SOAP) It is the messaging protocol for XML web services SOAP is described in XML following SOAP

schema SOAP is usually used with HTTP SOAP structure RPC-style SOAP VS. Document-style SOAP RPC-style SOAP The message corresponds to a procedure call RPC-style SOAP

Sample response Document-style SOAP Client just sends XML documents Service knows what to do XML/RPC Demo

Socket-based Approach Demo Client.py Server.py Other types of RPC Protocol Buffers

Efficient, binary serialization Support protocol evolution Support types (more compile-time errors) Usage: serializing data to non-relational databases Thrift Very similar to protocol buffers Supports both ordered lists and unordered sets

RPC Benefits Hides marshaling details (Do not need to implement the low-level details such as network-level communication, byte orders) Supports evolution of the communication components independently (e.g., protocol version numbers, rules of when to remove/rename arguments to procedure calls, etc.) Allows for efficient packaging RPC Problems

Latency (can be expensive) Pointer transfers local address space isnt shared with remote process. The programmer/RPC library has to make a decision of whether to follow pointers and serialize in depth, or to exclude that information. Typically, IDLs avoid this problem by requiring one to specify ones messages/data structures to avoid pointer confusion. RPC Key Challenges RPC semantics in the face of

Communication failures Delayed and lost messages Connection resets Expected packets never arrive Machine failures Server or client failures Did server fail before or after processing the request? Might be impossible to tell communication failures from machine failures

ATM Example A distributed system with an ATM machine and the bank back-end system (networked system) Client request: withdrawal of cash from the ATM machine Expected response: success or fail How? ATM Example Server

Check persons identity (password) Check whether the person has sufficient funds (talk to the back-end bank system to get a response of yes or no) Withdraw the amount from back account (write to the database) Wait wait wait Until the back-end bank says ok (RPC) Sends reply to the client What if the ATM does not get any response? What could be wrong?

ATM Example ATM cannot wait forever (since client wont wait) What is our goal? If the customer leaves, his/her balance should not change Can we guarantee that easily? Need timeout (what should be the timeout value?) ATM Example

A possible solution ATM keep trying to talk to the bank until it gets a response What if the ATM itself fails at some point? Alternatively ATM gives the money upon a time anyway and resolves the consistency issue with the back-end later Any issues? RPC Summary

RPC focuses on programming use and aims to Make distributed communication similar to local calls Support protocol evolution Make it hard to get it wrong Semantics are challenging Cant really hide the network and make it all look local Performance is often the concern

RPC Semantics Semantic of an RPC invocation? Network failures imply timeouts and retransmissions Server will see (a lot of) duplicates How should the server behave when it sees duplications? RPC Semantics At least once The RPC call, once issued by the client, is executed eventually at least once, but possibly multiple times

Easy to implement but raises issues How to implement? RPC Semantics At least once implementation Client keeps issuing the RPC multiple times until it gets a response from the server As long as the failures are temporary, the RPC will be executed at least once Good for some cases (Any example?) Email in a users inbox has been read (basically read requests)

Might be problematic for other types of request ATM.. RPC Semantics At most once The RPC call, once issued by a client, gets executed zero or one time. How to implement? RPC Semantics

At most once implementation Server remembers the requests that it has already seen and processed Ignore duplicated messages Key: unique id for the messages What if server crashes/restarts?

RPC Semantics Exactly once Ideal case This is when the RPC, once issued by the client, is invoked exactly once by the server. How to implement? At most once implementation Will it work?

Abstracting Communications Fair-loss-links If a correct process infinitely often sends a message m to a correct process q, then q delivers m an infinite number of times If a correct process p sends a message m a finite number of times to process q, then m cannot be delivered an infinite number of times by q No creation (related to authentication, ignored here) Stubborn links

If a correct process p sends a message m once to a correct process q, then q delivers m an infinite number of times Perfect links If a correct process p sends a message m to a correct process q, then q eventually delivers m No message is delivered by a process more than once Map to at least once, at most once, and exactly once? REST Representational State Transfer

For web services only REST defines a set of architectural principles by which you can design Web services that focus on a system's resources, including how resource states are addressed and transferred over HTTP by a wide range of clients written in different languages. There are no standards/specifications for REST Web service REST An HTTP REST Web service follows three basic design principles Use HTTP methods explicitly (HTTP is stateless)

Expose directory structure-like URIs Transfer XML, JavaScript Object Notation (JSON), or both HTTP Methods

POST: create, sending data GET: read, list, retrieve PUT: replace, update DELETE: delete Payload The web service makes available a URL to submit a purchase order (PO) The payload (HTTP entity body) should be in XML or JSON Other communication models

So far Point-to-point communication What else? Other communication models Broadcast Multicast

Reading List and Resources Recommended Socket Python https://docs.python.org/2/howto/sockets.html Java https://www.geeksforgeeks.org/socket-programming-in-java/ XML https://www.w3schools.com/xml/default.asp XML-RPC Java https://www.tutorialspoint.com/xml-rpc/index.htm Python2 https://docs.python.org/2/library/xmlrpclib.html

Python3 https://docs.python.org/3/library/xmlrpc.client.html SOAP https://www.w3schools.com/xml/xml_soap.asp Python library https://python-zeep.readthedocs.io/en/master/ Optional Cachin book Ch2 Protocol Buffer https://developers.google.com/protocol-buffers/docs/overview Thrift https://www.scribd.com/presentation/95866167/Thrift-Protobuf

Recently Viewed Presentations

  • IDENTIFYING AND UNDERSTANDING WORDS - Oneonta

    IDENTIFYING AND UNDERSTANDING WORDS - Oneonta

    Identifying and Understanding Words BY: Annalisa C. Dimeo EDUC 231 Table of Contents: Identifying and Understanding Words There's a Nightmare in My Closet, By Mercer Mayer Differences between identifying and understanding Five ways to call children's attention to sight words...
  • Writing a Character Analysis Essay - Central Bucks School ...

    Writing a Character Analysis Essay - Central Bucks School ...

    What is a Character Analysis? Character analysis is when you evaluate a character's traits, their role in the story, and the conflicts they experience.. When analyzing, you will want to think critically, ask questions, and draw conclusions about the character...
  • Defining Culture Essential Questions: What is culture? How

    Defining Culture Essential Questions: What is culture? How

    Defining Culture. Essential Questions: What is culture? How does it shape the way we see ourselves, others, and the world? 'Everyone Has Culture' Worksheet
  • Session ID: Session Title - download.psdn.com

    Session ID: Session Title - download.psdn.com

    What's New in OpenEdge® 10.1C? Jim Lundy Principal Product Manager Housekeeping Please set mobile phones to "stun" Fill out survey forms at the end Some Numbers to Consider 8 59 180 681 Some Numbers 8 Critical Features 59 Secondary features...
  • Approaches to Characterization of Indeterminate Lung Nodules Denise

    Approaches to Characterization of Indeterminate Lung Nodules Denise

    This is the ROC curve showing the individual model performance using semantic features (red), quantitative features (green), and combined semantic and quantitative features (blue). These data are very preliminary. We are now going back to confirm that:
  • fres.hcpss.org

    fres.hcpss.org

    Why is it important for my child to read a "Just Right Book?" Too Easy Book. Effortless reading, reader becomes easily distracted. The reader reads so fast they can't remember what they read.
  • Know Yourself, Be Yourself Awareness-Based Strategies for Success

    Know Yourself, Be Yourself Awareness-Based Strategies for Success

    "Know Yourself, Be Yourself: Using Self-Awareness to Succeed" featuring Janet Treer, president, The Treer Group, LLC To be an effective leader, it's been said that changing one's self on the inside will cause them to influence change in the external...
  • Delivering Data Futures at Edinburgh  Project Structure Project

    Delivering Data Futures at Edinburgh Project Structure Project

    Delivering Data Futures at Edinburgh - Project Structure. Investigate specific issues e.g. managing PhD Supervision records. Ensuring that Edinburgh is able to meet Data Futures requirements presents challenges in terms of systems development and in terms of policy and adapting...