IS 651: Distributed Systems Distributed Communication Sisi Duan Assistant Professor Information Systems [email protected] Today
Socket RPC REST Abstracting Distributed Communication Common Communication Pattern Communication Mechanisms Many protocols are available
Sockets Remote Procedure Call (RPC) Distributed shared memory (later in the class) MPI
Socket Communication Your own protocol on top of a transmission protocol Remember the network layers from the first class? Whats TCP? Whats UDP? When is it best to use TCP v.s. UDP? Socket Communication TCP (Transmission Control Protocol)
Protocol built upon the IP networking protocol, which supports sequenced, reliable, two-way transmission over a connection (or session, stream) between two sockets More reliable More expensive UDP (User Datagram Protocol) Also protocol built on top of IP. Supports best-effort, transmission of single datagrams Its ok to lose, re-order, or duplicate messages
Low latency TCP Socket API TCP Socket Demo Python 2.7 (not too different in python 3) Java will be similar (links are provided in the reading list) Simple-client.py Simple-server.py (Available at class website under Reading)
buf = conn.recv(8092) sock.close() Socket API You need to manually implement a lot of things Very handy if you are a distributed systems developer RPC Remote Procedure Call
A type of client/server communication Ease of programming Hide complexity Standardize some low-level data packaging protocols RPC The application calls the remote procedure locally at the stub The stub intercepts calls that are for remote servers
Marshalling: pack the parameters into a message Make a system call to send the message Stubs are generated automatically by RPC frameworks (libraries), which also provide the RPC Runtime Programmers only write definitions for their data structures and protocols in an IDL RPC The RPC Runtime handles message
sending The interface definition language (IDL) handles message translation RPC hides heterogeneity among the computers and handles the communication across network RPC technologies XML/RPC Over HTTP, huge XML parsing overheads
SOAP Designed for web services via HTTP, huge XML overhead CORBA Relatively comprehensive, but quite complex and heavy Protocol buffers Lightweight, developed by Google
Thrift Lightweight, supports services, developed by Facebook Extensible Markup Language (XML) XML documents form a tree structure Well-formed XML v.s. Valid XML XML validation Document type definition (DTD) XML Schema
Well-formed XML It contains only properly encoded, legal Unicode characters None of the special syntax characters (<, &) appear except when performing their markup-delineation roles The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping The element tags are case-sensitive - the beginning and end tags must match exactly There is a single "root" element that contains all the other elements
Valid XML The declaration in line 1 is contains question mark characters and is called a processing instruction. It references the version and encoding for the XML document. Line 2 has a reference to an external DTD file that contains the DTD. It can replaced by embedding DTD content Line 3 is the root tag for the document. Note
that it contains an attribute. Any XML tag may have an attribute and it must be quoted. Note that even though item is repeated, it uses the same tag. Never create tags like item1, item2, etc. Document Type Definition (DTD) The declaration of the DTD in the XML document has the syntax where SYSTEM refers to that fact that the DTD is a private implementation for this document rather than a standard. It would change to PUBLIC if it was a standard. < DOCTYPE root-element SYSTEM "URI" >
DTDs do not have XML syntax. They have their own syntax. The !ELEMENT declares an element (also called a tag). The child elements of a tag are declared as an ordered list in parentheses. If an element can be repeated 1 or more times, it must have a plus sign (+) after it. The character star (*) means 0 or more and so makes elements optional. A leaf node of the hierarchy is declared #PCDATA which means parsed character data and it is the text of the content. The < and > are XML entities for the less than and greater than (< >) characters. XML markup characters cannot be used because they would confuse a parser, so these pre-defined entities must replace them. There are no data types in DTDs. Everything is text. The !ATTLIST declares an attribute for an element and typically declares it as CDATA which means character data. This means that the XML parser does not parse it.
One can require a document to have an attribute in order to be valid by using #REQUIRED. DTD Example Validation command: $>xmllint --noout --valid shiporder.xml XML Well-form check command xmllint --noout shiporder.xml xmllint --noout shiporder-not-well-formed.xml xmllint --noout shiporder-well-formed-not-valid.xml
Validation check command xmllint --noout --valid shiporder.xml xmllint --noout --valid shiporder-well-formed-not-valid.xml XML/RPC a HTTP POST request Data serialization: XML RPC call to add(17,13) results in this request
XML/RPC Server side has to implement a function to process the request Sample Response 30
Problems with XML/RPC XML is very verbose, which affects performance The library doesnt support protocol versioning What happens if I want another param? What happens if I reverse order of x and y? Nothing, but what if function werent cumulative?
What if I forgot to add a param? In general, lack of types makes it difficult to build and maintain code There are other more complex and flexible XML/RPC libraries. SOAP Simple Object Access Protocol (SOAP) It is the messaging protocol for XML web services SOAP is described in XML following SOAP
schema SOAP is usually used with HTTP SOAP structure RPC-style SOAP VS. Document-style SOAP RPC-style SOAP The message corresponds to a procedure call RPC-style SOAP
Sample response Document-style SOAP Client just sends XML documents Service knows what to do XML/RPC Demo
Socket-based Approach Demo Client.py Server.py Other types of RPC Protocol Buffers
Efficient, binary serialization Support protocol evolution Support types (more compile-time errors) Usage: serializing data to non-relational databases Thrift Very similar to protocol buffers Supports both ordered lists and unordered sets
RPC Benefits Hides marshaling details (Do not need to implement the low-level details such as network-level communication, byte orders) Supports evolution of the communication components independently (e.g., protocol version numbers, rules of when to remove/rename arguments to procedure calls, etc.) Allows for efficient packaging RPC Problems
Latency (can be expensive) Pointer transfers local address space isnt shared with remote process. The programmer/RPC library has to make a decision of whether to follow pointers and serialize in depth, or to exclude that information. Typically, IDLs avoid this problem by requiring one to specify ones messages/data structures to avoid pointer confusion. RPC Key Challenges RPC semantics in the face of
Communication failures Delayed and lost messages Connection resets Expected packets never arrive Machine failures Server or client failures Did server fail before or after processing the request? Might be impossible to tell communication failures from machine failures
ATM Example A distributed system with an ATM machine and the bank back-end system (networked system) Client request: withdrawal of cash from the ATM machine Expected response: success or fail How? ATM Example Server
Check persons identity (password) Check whether the person has sufficient funds (talk to the back-end bank system to get a response of yes or no) Withdraw the amount from back account (write to the database) Wait wait wait Until the back-end bank says ok (RPC) Sends reply to the client What if the ATM does not get any response? What could be wrong?
ATM Example ATM cannot wait forever (since client wont wait) What is our goal? If the customer leaves, his/her balance should not change Can we guarantee that easily? Need timeout (what should be the timeout value?) ATM Example
A possible solution ATM keep trying to talk to the bank until it gets a response What if the ATM itself fails at some point? Alternatively ATM gives the money upon a time anyway and resolves the consistency issue with the back-end later Any issues? RPC Summary
RPC focuses on programming use and aims to Make distributed communication similar to local calls Support protocol evolution Make it hard to get it wrong Semantics are challenging Cant really hide the network and make it all look local Performance is often the concern
RPC Semantics Semantic of an RPC invocation? Network failures imply timeouts and retransmissions Server will see (a lot of) duplicates How should the server behave when it sees duplications? RPC Semantics At least once The RPC call, once issued by the client, is executed eventually at least once, but possibly multiple times
Easy to implement but raises issues How to implement? RPC Semantics At least once implementation Client keeps issuing the RPC multiple times until it gets a response from the server As long as the failures are temporary, the RPC will be executed at least once Good for some cases (Any example?) Email in a users inbox has been read (basically read requests)
Might be problematic for other types of request ATM.. RPC Semantics At most once The RPC call, once issued by a client, gets executed zero or one time. How to implement? RPC Semantics
At most once implementation Server remembers the requests that it has already seen and processed Ignore duplicated messages Key: unique id for the messages What if server crashes/restarts?
RPC Semantics Exactly once Ideal case This is when the RPC, once issued by the client, is invoked exactly once by the server. How to implement? At most once implementation Will it work?
Abstracting Communications Fair-loss-links If a correct process infinitely often sends a message m to a correct process q, then q delivers m an infinite number of times If a correct process p sends a message m a finite number of times to process q, then m cannot be delivered an infinite number of times by q No creation (related to authentication, ignored here) Stubborn links
If a correct process p sends a message m once to a correct process q, then q delivers m an infinite number of times Perfect links If a correct process p sends a message m to a correct process q, then q eventually delivers m No message is delivered by a process more than once Map to at least once, at most once, and exactly once? REST Representational State Transfer
For web services only REST defines a set of architectural principles by which you can design Web services that focus on a system's resources, including how resource states are addressed and transferred over HTTP by a wide range of clients written in different languages. There are no standards/specifications for REST Web service REST An HTTP REST Web service follows three basic design principles Use HTTP methods explicitly (HTTP is stateless)
POST: create, sending data GET: read, list, retrieve PUT: replace, update DELETE: delete Payload The web service makes available a URL to submit a purchase order (PO) The payload (HTTP entity body) should be in XML or JSON Other communication models
So far Point-to-point communication What else? Other communication models Broadcast Multicast
Reading List and Resources Recommended Socket Python https://docs.python.org/2/howto/sockets.html Java https://www.geeksforgeeks.org/socket-programming-in-java/ XML https://www.w3schools.com/xml/default.asp XML-RPC Java https://www.tutorialspoint.com/xml-rpc/index.htm Python2 https://docs.python.org/2/library/xmlrpclib.html
Identifying and Understanding Words BY: Annalisa C. Dimeo EDUC 231 Table of Contents: Identifying and Understanding Words There's a Nightmare in My Closet, By Mercer Mayer Differences between identifying and understanding Five ways to call children's attention to sight words...
What is a Character Analysis? Character analysis is when you evaluate a character's traits, their role in the story, and the conflicts they experience.. When analyzing, you will want to think critically, ask questions, and draw conclusions about the character...
What's New in OpenEdge® 10.1C? Jim Lundy Principal Product Manager Housekeeping Please set mobile phones to "stun" Fill out survey forms at the end Some Numbers to Consider 8 59 180 681 Some Numbers 8 Critical Features 59 Secondary features...
This is the ROC curve showing the individual model performance using semantic features (red), quantitative features (green), and combined semantic and quantitative features (blue). These data are very preliminary. We are now going back to confirm that:
"Know Yourself, Be Yourself: Using Self-Awareness to Succeed" featuring Janet Treer, president, The Treer Group, LLC To be an effective leader, it's been said that changing one's self on the inside will cause them to influence change in the external...
Delivering Data Futures at Edinburgh - Project Structure. Investigate specific issues e.g. managing PhD Supervision records. Ensuring that Edinburgh is able to meet Data Futures requirements presents challenges in terms of systems development and in terms of policy and adapting...
Ready to download the document? Go ahead and hit continue!