CP2PC Application Programming Interface
(Version 6.2)

Patrick Verkaik, Ihor Kuz


1. Introduction
---------------

This document contains the concepts and functions that are in the CP2PC
application programming interface (API). The position of the CP2PC API in the
CP2PC architecture is as follows:

                         Application
   ........................................................
   .                    User Interface                    .
   .                  ------------------                  .
   .                  |  Application   |                  .
   .                  |   Component    |                  .
   .                  ------------------                  .
   .                           |                          .
   .         -------------------------------------        .
   .         |                 |                 |        .
   .     CP2PC API         CP2PC API         CP2PC API    .  specific -
   .   -------------     -------------     -------------  .  to each  |
   .   | Gnutella  |     |   JXTA    |     |    CFS    |  .  P2P      |
   .   | Component |     | Component |     | Component |  .  network  -
   .   -------------     -------------     -------------  .
   ........................................................
             |                 |                 |
         Gnutella            JXTA               CFS
          network           network           network

This diagram shows the main parts of a CP2PC application. At the top is the
application-specific part, the 'application component'. It implements, amongst
other things, a user interface. The application interfaces with any number of
P2P networks through P2P network-specific components (three of which are
shown). Each network-specific component implements the CP2PC API, allowing the
application to remain independent of the particulars of any network-specific
component.

In this document, we use 'P2P network' to mean 'P2P file sharing network', and
'P2P component' is used as a synonym for 'P2P network-specific component'.

In previous documents:

  o We identified important P2P file sharing concepts and functions that occur
    in P2P networks. For this we studied a number of existing P2P networks.

  o We identified important P2P file sharing concepts and functions that are
    visible to the user through the user interface (the CP2PC user interface).
    For this we studied a number of existing P2P applications. These are
    summarised as the user interface specifications in [CP2PC-USER].

Based on these experiences, the specifications for the API are given in this
document.

In future documents:

  o We will provide mappings of the API to the existing P2P networks.

  o We will provide a mapping of the user interface specifications to the API.

This document also contains additional elements, such as rationale and notes
on how a particular concept might be mapped to a particular P2P network. This
kind of information is placed between square brackets ([]).


2. Requirements
---------------

The requirements for the API are as follows:

  o It must be possible to implement the API on top of each of the existing
    P2P networks that we have studied.

  o The API must be able to support an application that implements the user
    interface specs in [CP2PC-USER]. This requirement does not say to what
    extent the user interface specs are directly supported by the API.

  o The API must be as low-level as possible while still abstracting from P2P
    network particulars. In particular we would like to avoid introducing too
    many abstract concepts or objects into the API. By making the API
    low-level, we hope that the API will be usable by a diverse range of P2P
    file sharing applications. In addition a low-level API is most easily
    implemented by the P2P components.

    [ Additional 'utility' components can be developed on top of the API that
      implement higher level concepts. These are not shown in the above
      diagram but would presumably form the lowest level of the application
      component part in the diagram. ]

  o The API must support P2P components that run as separate processes,
    possibly on separate hosts, for the following reasons:

      o In some P2P networks, such as GrapeVine, peers are expected to be
        long-running processes (e.g. daemons). On the other hand, we expect
        users to want to shut down and restart their GUI process (including
        the application component) often.

      o To allow for P2P components that lack firewall support of themselves.
        To get around the lack of firewall support, such a P2P component might
        be run on the firewall host, while the GUI process (including the
        application component) runs on the user's host.

    A P2P component that is able to run as a separate process can help in
    addressing both these issues. Note that an alternative solution is to
    divide the P2P component in two layers, with the upper layer running in
    the same process as the application component, and the lower layer as a
    separate process.

  o The API must support asynchronous invocation of operations. By this we
    mean that:

      o The application component must be able to perform high-latency
        operations (such as download operations or searches) without the
        caller (i.e. calling thread) blocking for the entire duration of the
        operation.

      o The API must allow the application component to access intermediate
        results of high-latency operations before they complete (e.g.
        accessing intermediate results of a search at the time that they come
        in).

      o The API must allow the application component to terminate
        high-latency operations before they complete (e.g. to cancel a search
        of which a sufficient number of results have come in).

    [ Note that synchronous operations can be provided as a 'utility layer'
      within the application component, on top of the asynchronous CP2PC API.
      Alternatively, it is possible to provide a synchronous API, and layer an
      asynchronous utility layer on top of that. However, that makes it harder
      to access intermediate results and to terminate operations. We have
      observed that many other P2P APIs have asynchronous features. ]

  o The API should be interoperable with as many languages and runtime systems
    as possible. Both the application component and the P2P components can be
    implemented independently in various languages.

  o The API should be language neutral.
    The runtime layout of the API and its data structures should not be
    optimised for one language (in particular the language in which the API is
    defined) at the expense of other languages.

  o The API should be compatible with the Tristero framework. The work of the
    Tristero project appears to be complementary to ours.

  o The API must be easy to understand and easy to program to.

Note that there is no requirement that the full capabilities of a specific P2P
network be accessible through the API, nor does a specific P2P component have
to implement the full capabilities of the API.


3. Concepts
-----------

Before discussing the API, we first introduce a number of concepts used by the
API.


3.1. Concepts from User Interface Specifications
------------------------------------------------

A number of concepts are defined in the user interface specs [CP2PC-USER].
They are summarised below.

File
  - The basic element of shared data.

File Attributes
  - Attributes (metadata about a file).

Native File ID
  - A persistent, network-wide reference, identifying a file. It is specific
    to a particular P2P network, and can be exchanged between CP2PC and
    non-CP2PC users on the P2P network.

CP2PC File ID
  - A persistent, global reference, identifying a file. It is specific to
    CP2PC, and can be exchanged between CP2PC users (but is of little use to
    non-CP2PC users).

(File) Collection
  - A (possibly empty) group of files, which can be published together.

Native Collection ID
  - Identifies a collection, analogous to the native file ID.

CP2PC Collection ID
  - Identifies a collection, analogous to the CP2PC file ID.

Peer
  - A 'node' in one of the P2P networks.


3.2. File Handle
----------------

In addition to the native and CP2PC file IDs defined in the user interface
specifications, the API defines a file handle. Like native and CP2PC file IDs:

  o A file handle identifies an instance of a file on some P2P network.

  o Different instances of a file on a P2P network have different file
    handles.

  o A file handle is not necessarily a unique identifier.

  o A file handle is a string.

A file handle is created by a P2P component for use only by that P2P component
and only within the same application. It is a local, opaque reference. The P2P
component is free to place whatever information it desires, using whatever
encoding, in the file handle. The format of the file handle may be different
from one P2P component implementation to another, even when such P2P
components are for the same P2P network.

[ The reason for introducing a file handle is this. We need some way to refer
  to a published file (for later unpublishing), a file in a search result (for
  later downloading), etc. The native and CP2PC file IDs cannot always be used
  for this:

    o Not all networks support native and CP2PC file IDs (see user interface
      specs). Therefore not all P2P components define them.

    o A native or CP2PC file ID might not contain the information needed by a
      P2P component to e.g. unpublish a file. ]

A P2P component creates file handles (and returns them to the application
component) in the following cases:

  o When the application publishes a file by means of the P2P component. The
    returned file handle refers to the published file.

  o As part of search results that are returned to the application component.
    The file handles in the search results refer to files that are found on
    the P2P network.

  o The application component may request a P2P component to create a file
    handle from a native or CP2PC file ID (e.g. received by email from another
    peer).

In addition the application component may request a P2P component to create a
native or CP2PC file ID from a file handle (e.g. to send by email to another
peer).

Note that whereas native and CP2PC file IDs may be exchanged with other peers,
a file handle is confined to a single CP2PC application.
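
The conversions between a file handle and a CP2PC file ID can be sketched in C
as follows. This is only an illustration under stated assumptions, not part of
the API: it assumes a hypothetical Gnutella component whose file handle is
identical to its native file ID, and whose CP2PC file ID is formed by
prefixing the native file ID with "cp2pc:gnutella:". The function names
'handle_to_cp_file_id' and 'cp_file_id_to_handle' are invented for this
sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed prefix of a CP2PC file ID for the Gnutella network. */
#define CP2PC_GNUTELLA_PREFIX "cp2pc:gnutella:"

/*
 * Hypothetical helper: build a CP2PC file ID from a file handle.
 * Assumes the handle has the same representation as the native file ID.
 * Returns 0 on success, -1 if 'out' is too small to hold the result.
 */
int handle_to_cp_file_id(const char *handle, char *out, size_t out_size)
{
    assert(handle != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s%s", CP2PC_GNUTELLA_PREFIX, handle);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/*
 * Hypothetical helper: recover the file handle from a CP2PC file ID.
 * Returns a pointer into 'cp_id', or "" (the invalid handle) if the ID
 * does not belong to this network.
 */
const char *cp_file_id_to_handle(const char *cp_id)
{
    assert(cp_id != NULL);
    size_t prefix_len = strlen(CP2PC_GNUTELLA_PREFIX);
    if (strncmp(cp_id, CP2PC_GNUTELLA_PREFIX, prefix_len) != 0)
        return "";
    return cp_id + prefix_len;
}
```

The application component never inspects the handle itself; it only passes it
back to the P2P component that issued it. A real component is free to choose a
completely different handle encoding.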

To illustrate the difference between the various file identifiers, consider
the following example, in which a user wishes to publish a file on Gnutella,
have it available to other peers on the Gnutella network for a week, and after
a week withdraw (unpublish) it.

  1.  The user publishes a file on Gnutella.

      The user tells the application to publish the file on Gnutella. The
      application component tells the Gnutella P2P component to publish the
      file. In return, the Gnutella P2P component passes a file handle back to
      the application component, which the application component stores for
      future use.

      As a simple example, let's assume that the representation of the file
      handle is the same as that of the native file ID for the Gnutella
      network: "http://thehost:6346/get/548/filename/". (Note that the
      application component is not interested in the representation of the
      native file ID.)

  2a. The user sends a file ID to other Gnutella users (mostly non-CP2PC
      users).

      The user asks the application for a native file ID for the file just
      published. The application component asks the Gnutella P2P component to
      convert the file handle to a native file ID. In our example, the native
      file ID and file handle are the same, i.e.
      "http://thehost:6346/get/548/filename/".

      When the user has obtained the native file ID, it can be sent to other
      Gnutella users (e.g. through a mailing list). Gnutella users can
      download the file from Gnutella using the native file ID and standard
      Gnutella software.

  2b. The user sends a file ID to other CP2PC users.

      The user asks the application for a CP2PC file ID for the file. The
      application component asks the Gnutella P2P component to convert the
      file handle to a CP2PC file ID.

      When the user has obtained the CP2PC file ID, it can be sent to other
      CP2PC users. These users can download the file from Gnutella using the
      CP2PC file ID and their CP2PC application.

      Note: the native file ID representation cannot be used for this purpose,
      since the native file ID does not tell the CP2PC application which
      network the identified file is published on. However, the CP2PC file ID
      typically contains the native file ID as e.g. a substring. An example of
      a Gnutella CP2PC file ID is the following URL:
      "cp2pc:gnutella:http://thehost:6346/get/filename/".

  3.  After a week has passed, the user tells the application to unpublish the
      file from Gnutella.

      The application component passes the file handle to the Gnutella P2P
      component telling it to unpublish the file.


3.3. Collection Handle
----------------------

In addition to the native and CP2PC collection IDs defined in the user
interface specifications, the API defines a collection handle. The collection
handle is analogous to the file handle.


3.4. Operation ID
-----------------

An operation ID identifies an invocation of a CP2PC operation. It is returned
by asynchronous CP2PC operations (which are discussed in more detail later).

When the application component invokes an asynchronous operation in the CP2PC
API, the operation will return an operation ID. The application component can
subsequently use the operation ID to:

  o Get the status of the operation. For example, an application component can
    inquire how far a putFile operation has progressed.

  o Receive callback invocations from the P2P component. A P2P component makes
    callbacks to the application component, reporting events that have
    occurred. The callback invocation includes an operation ID that identifies
    what operation the event pertains to.

  o Terminate the operation. For example, a search operation may be terminated
    by the application component after having received a sufficient number of
    search results.


3.5. Attributes
---------------

We use attributes to represent metadata of many kinds (e.g. to describe
published files, progress of CP2PC operations, CP2PC configuration).

An attribute is represented as an RDF statement.
From [RDF]:

  "A specific resource together with a named property plus the value of that
  property for that resource is an RDF statement. These three individual parts
  of a statement are called, respectively, the subject, the predicate, and the
  object. The object of a statement (i.e., the property value) can be another
  resource or it can be a literal; ..."

An RDF statement can therefore be represented as a triple: (subject,
predicate, object). This is called an 'RDF triple'. We use the terms
'attribute', '(RDF) statement', and '(RDF) triple' interchangeably.

We distinguish the following categories of attributes:

  o File or collection attributes.

    File attributes describe a file that can be or is published on a P2P
    network. Examples are file size, file content type, file name. Collection
    attributes describe a collection in a similar manner.

  o Operational attributes.

    Operational attributes describe a particular invocation of a CP2PC
    operation. Examples are progress of a putFile operation, speed of a get
    operation.

Specific CP2PC attributes are specified in another document.


3.6. Local Databases
--------------------

Each P2P component contains a number of local databases of attributes:

  o The Local Attribute Database (LAD), containing operational and a number of
    other attributes.

  o The Local Monitoring Database (LMD), containing monitoring attributes.

  o The Local Configuration Database (LCD), containing configuration
    attributes.

In addition, temporary local databases may be added as necessary, in the
spirit of Tristero.

The application component can access these databases through a number of
interfaces that are part of the CP2PC API and are implemented by the P2P
component:

    --------------------------------
    |         Application          |
    |          Component           |
    --------------------------------
           ^        ^     ^     ^
           |        |     |     |
           v        v     v     v
    --------------------------------
    |            .     .     .     |
    |            .     .     .     |
    |    P2P     . LAD . LMD . LCD |
    | Component  .     .     .     |
    |            .     .     .     |
    --------------------------------

Of particular importance is the LAD.
All operational attributes and a subset of file/collection attributes are
stored in the LAD. The remainder of file/collection attributes are stored
(implicitly) in the P2P network. The LAD allows the application component and
the P2P component to communicate metadata and other attributes to one another.

The LMD and LCD are described later.

[ The use of the term 'database' does not imply that the attributes are backed
  by persistent storage. They can also be stored in volatile memory.

  The advantage of using separate interfaces and the LAD for metadata is that
  we can define metadata independently of the syntax and (to a lesser extent)
  semantics of the CP2PC operations. This makes it possible to reuse the CP2PC
  interface definitions in a context without metadata, or with a different
  structure of metadata (e.g. metadata represented in some other way than RDF
  statements). ]


4. CP2PC Application Programming Interface
------------------------------------------

In this section we specify the operations of the CP2PC API and the
accompanying data types. The purpose of this section is to:

  o Identify the methods (primitives) that are part of the API.

  o Specify the semantics of each method.

  o Specify what data each method takes as parameters and produces as result
    values.

  o Identify the data types that are part of the API.

  o Specify the structure of each data type.

Note that the API is implemented by each P2P component separately (as shown in
the diagram earlier).

Certain issues that have an impact on the API are not resolved in this
section. These include:

  o Error handling.

The syntax of the API is a C-mapping of XML-RPC. We use XML-RPC to address the
requirements of: language-independence, running P2P components as separate
processes, and compatibility with the Tristero framework.


4.1. Placement of Components in Processes
-----------------------------------------

The API allows components to run in one or more processes, and caters for the
following cases (in increasing order of complexity):

  o A P2P component running in the same process as the application component.

  o A P2P component running in a different process than the application
    component, but on the same host.

  o A P2P component running on a different host than the application
    component, but with the P2P component and application component having
    access to a shared file system.

  o A P2P component running on a different host than the application
    component, without the components having access to a shared file system.

However, the API does not address the issue of 'binding' together components
that are in different processes. An example of such binding is the application
component and a P2P component establishing a connection. In this specification
we assume that the P2P component and the application component are somehow
'bound' together at initialisation time.

In several places, the API specifies that the P2P component and the
application component make files available to one another by exchanging URIs
that point to these files. For example, in the 'asyncPutFile' methods the
application component passes a fileURI to the P2P component, and in the
'statGet' method (part of the 'asyncGet' operation) the P2P component returns
a fileURI to the application component. If the two components have access to a
shared file system, a 'file:' URI can be used in these cases. Otherwise, some
other URI, such as an 'http:' URI, must be used. Discovering whether a
component can pass 'file:' URIs to another component is part of binding and is
not addressed by this document.


4.2. Asynchronous Operations
----------------------------

The API allows operations to be performed asynchronously. We distinguish
between 'high-latency operations' and 'low-latency operations'.
A high-latency operation is allowed to perform wide-area communication (e.g.
access the P2P network), and other work that takes a comparable length of
time. A low-latency operation can only perform local operations (e.g. access
the local file system and local networks). The distinction is somewhat vague.

Many CP2PC operations may have high latency. To avoid blocking the caller
while a high-latency operation is performed, the CP2PC API provides
asynchronous calls for high-latency operations.

Most asynchronous operations follow the pattern described below. When invoked,
an asynchronous operation returns immediately and carries out its work
asynchronously (e.g. in a separate thread). While carrying out its work, the
operation may communicate with its caller (the application component) by
performing callbacks (methods defined by the caller).

The following example shows and explains the methods required for asynchronous
operations. The example is an asynchronous version of the following operation:

    string foo(string p1, int p2);

'foo' is a regular, synchronous, high-latency operation. It takes two
parameters (a string and an int) and returns a string.

The asynchronous version of 'foo' is this:

    operationID asyncFoo (string p1, int p2, boolean allow_cb);

    struct {
        string aResult;
        boolean completed;
    } statFoo (operationID opid);

Note that the operation has been mapped to two methods: 'asyncFoo' and
'statFoo'.

The 'asyncFoo' method starts the operation. It takes input parameters similar
to those of 'foo'. 'asyncFoo' returns immediately, and carries out any lengthy
(high latency) work asynchronously. An operation ID that identifies this
operation is returned. The 'allow_cb' parameter suppresses callbacks if it is
'false'.

The caller can obtain results of the operation (i.e. corresponding to output
parameters and the return value of 'foo') by invoking the 'statFoo' method.
The 'opid' parameter identifies the 'asyncFoo' operation whose results are
requested.
If the 'asyncFoo' operation has completed, then 'completed' is set to true,
and the other returned value(s) are the final results of the 'asyncFoo'
operation (in this example, the string 'aResult', corresponding to the string
returned by 'foo'). If the 'asyncFoo' operation has not yet completed, then
'completed' is set to false, and the other returned value(s) reflect possible
intermediate results of the 'asyncFoo' operation (if that is supported by the
operation).


4.2.1 Callbacks
---------------

In CP2PC there are two types of callbacks that may be performed. The first
informs the caller of changes in the operation's status (for example, a
progress change). This type of callback may be performed multiple times during
the operation's lifetime. The second type of callback informs the caller that
the operation has completed. This type of callback is performed exactly once
per operation (unless 'allow_cb' was set to false when invoking asyncFoo()).

The caller (application component) implements the following callback
interface, which is used by all asynchronous operations:

    void statusChanged (operationID opid);
    void statusChangeList (operationID opid, list deleted, list added);
    void completed (operationID opid);

All callback methods take the operation ID of the asynchronous operation as a
parameter. None have a return value.

The 'statusChanged' callback method is invoked when any of the (intermediate)
results or operational attributes of the 'asyncFoo' operation have changed.
Results (possibly intermediate results if the operation has not yet completed)
can be retrieved by invoking the 'statFoo' method. Operational attributes can
be retrieved by querying the LAD of this P2P component for any triples
containing the operation ID as a subject. It is up to the P2P component to
place these operational attributes into its LAD.

The 'statusChangeList' callback is an alternative to 'statusChanged'.
It has the same semantics, but additionally allows a list of changes to
triples to be passed. The details of this are further described in
'Subscribing to Database Changes'.

An asynchronous operation uses either 'statusChanged' or 'statusChangeList',
but not both. Unless specified otherwise, an asynchronous operation uses
'statusChanged'.

The 'completed' callback signals the completion of the 'asyncFoo' operation.
Following a 'completed' callback, the P2P component will make no more
callbacks to the caller for this operation ID.

Notes:

  o The 'asyncFoo' operation needs to know where to invoke callbacks (e.g.,
    which function or method to call, where to send an RPC to, etc.). As
    mentioned under 'Placement of Components', we assume that the P2P
    component and the application component are somehow 'bound' together at
    initialisation time, i.e. have been initialised with each other's
    addresses (e.g. reference to objects, URLs).

    [ There are a number of other ways to specify where an asynchronous
      operation should make its callbacks. One way is to include extra
      parameters to the 'asyncFoo' call. The details of these parameters
      differ depending on the implementation of the API. In a C
      implementation, the operation would take several function pointer
      parameters (each pointing to one of the callback operations). A Java
      implementation would take a reference to an object that implements the
      callback methods. An XML-RPC implementation would take a URL
      representing the address of a server to which the callback invocations
      should be sent. ]


4.2.2 Asynchronous 'Utility' Methods
------------------------------------

The CP2PC API also provides a number of standard methods that allow the caller
to e.g. cancel an asynchronous operation.
The following methods apply to all asynchronous CP2PC operations:

    void cancel (operationID opid);
    void wait (operationID opid);
    boolean isDone (operationID opid);

The caller can request that an asynchronous operation be terminated by
invoking 'cancel', passing the operation ID. The P2P component will make a
best effort to terminate the operation quickly, without causing inconsistency
in the P2P network or the P2P component. Preferably any changes that it has
made as part of the operation so far are undone. However, it is up to the P2P
component implementation to decide how much time it can reasonably take to
perform cancellation work, given that the intent of the 'cancel' method is to
terminate the operation quickly.

Note that 'cancel' itself has low latency, which means that usually it does
not perform cancellation work itself, nor does it wait for the operation to
terminate. Instead any high-latency work is performed asynchronously and the
final 'completed' callback invocation is delayed until the operation has been
fully cancelled.

A caller can wait for an asynchronous operation to complete (possibly
following a call to 'cancel') by invoking 'wait', passing the operation ID.
This is particularly useful when combined with suppressed callbacks
('allow_cb' is false), as it allows the application component to invoke
asynchronous operations in a synchronous way. If the operation has already
completed at the time that 'wait' is called, then 'wait' returns immediately.

The 'isDone' method tests whether an asynchronous operation has completed. It
returns true if it has, false otherwise.


4.3. Data Types
---------------

    /* The data types are defined in terms of XML-RPC types. */

    typedef string uri;
    typedef string pathname;
    typedef uri fileURI;  /* Note: not restricted to the 'file:' scheme. */

    typedef string handle;
    typedef handle fileHandle;
    typedef handle collectionHandle;
    /* we define the empty string ("") as an invalid file/collection handle */

    typedef string nativeID;
    typedef nativeID nativeFileID;
    typedef nativeID nativeCollectionID;

    /* CP2PC file ID */
    typedef uri cpID;
    typedef cpID cpFileID;
    typedef cpID cpCollectionID;

    typedef string operationID;

    /* 'list' is defined by Tristero. It is an array of variable length. */
    typedef list statement;      /* RDF triple: a list of length three */
    typedef list statementList;  /* a list of statements */

    typedef statement fileAttribute;
    typedef statement collectionAttribute;
    typedef statement operAttribute;  /* operational attribute */


4.4. Publishing Files
----------------------

    operationID asyncPutFile (fileURI furi, boolean immediate,
                              boolean allow_cb);

    struct {
        boolean completed;
        fileHandle fid;
    } statPutFile (operationID opid);

The 'asyncPutFile' operation publishes a file to the P2P network. The file
content is taken from 'furi', which can point to a file in the local file
system (e.g. a 'file:' URI) or somewhere else (e.g. an 'http:' URI).

'immediate' is a boolean which specifies whether the current contents of
'furi' must be used as file content, or whether the P2P component is allowed
to fetch the contents of 'furi' at a later time.

If 'immediate' is true, then the P2P component will publish the current
contents of the file identified by 'furi' to the P2P network. Changes to the
contents of the file identified by 'furi' (or the removal of this file) do not
affect the published content.

If 'immediate' is false, then the P2P component may upload a copy of the file
identified by 'furi' to the P2P network as part of this operation;
alternatively it may store 'furi' as a reference and retrieve data from the
file later. This behaviour is unspecified.

The file attributes with which the file is to be published must be entered in
the LAD of the P2P component prior to this call.
The 'asyncPutFile' operation looks for file attributes in the LAD with 'furi'
as the subject.

The file attributes include the name of the file under which it should be
published. This name need not be (lexically) related to 'furi'.

XXX define this attribute in a document on file/collection attributes.

If the P2P network supports native file IDs, a nativeFileID and cpFileID are
defined under which the file is published (to be used by other peers). The
nativeFileID and cpFileID should be related to the name specified by the
caller in the file attributes.

A fileHandle is returned (in 'fid') which can be used to unpublish the file
later and to obtain the nativeFileID and cpFileID (see section 'File and
collection IDs'). An invalid fileHandle (empty string) is returned if the file
ID has not been determined yet. If 'completed' is true then the returned
fileHandle must be valid.

Notes:

  o In some networks (such as CFS), publications may expire if they are not
    periodically refreshed. The P2P component for such a network is
    responsible for preventing those publications from expiring. This may, for
    example, be done by periodically refreshing the published information.
    Note that the P2P component can only do so while connected to the network.
    See also the 'asyncLeave' operation.

[ Alternatives for dealing with expiry of publications:

    o The P2P component could notify the application component of publications
      that are about to expire. It is then up to the application component to
      republish or refresh the publications (perhaps using a CP2PC operation
      yet to be defined), or to notify the user, or to take some other course
      of action.

    o The P2P component could ignore expiry entirely, and leave this to the
      application component. However, at the very least the application
      component should be able to determine of a particular P2P network (e.g.
      by querying a property of the P2P component) whether it expires its
      publications, and if so after what length of time publications expire.
      Possibly operations would need to be added to the API to query the
      expiry status of a particular publication, and to refresh a publication.

  The 'furi' argument will typically be a 'file:' URI if a file system is
  shared by the application component and the P2P component. However, we wish
  to allow for the case that these components do not have a shared file
  system. In such a case, an 'http:' or other scheme might be used. ]

XXX move to separate doc on attributes?

The following operational attributes are defined for this operation. If
supported by the P2P component, the P2P component will enter them into its LAD
(with subject 'opid'), possibly maintaining them to reflect current values
during the course of the operation:

  o 'progressDescription'
    A human-readable progress indication.

  o 'progressBytes'
    The number of bytes transferred (uploaded) so far.

  o 'progressFraction'
    The progress of the publication as a fraction, in the range of 0.0 (0%) to
    1.0 (100%).

  o P2P component-specific attributes.


4.5. Unpublishing Files
-----------------------

    operationID asyncDeleteFile (fileHandle fid, boolean allow_cb);

    struct {
        boolean completed;
    } statDeleteFile (operationID opid);

The 'asyncDeleteFile' operation unpublishes a file previously published (using
the 'asyncPutFile' operation) to the P2P network. The file to unpublish is
identified by 'fid'.

XXX move to separate doc on attributes?

The following operational attributes are defined for this operation. If
supported by the P2P component, the P2P component will enter them into its LAD
(with subject 'opid'), possibly maintaining them to reflect current values
during the course of the operation:

  o 'progressDescription'
    A human-readable progress indication.

  o 'progressFraction'
    The progress of the operation as a fraction, in the range of 0.0 (0%) to
    1.0 (100%).
o P2P component-specific attributes. 4.6. Publishing Collections --------------------------- typedef struct _collectionElement { fileURI furi; boolean immediate; } collectionElement; operationID asyncPutCollection (uri attrs_uri, list files, boolean allow_cb); // 'files' is a list of collectionElements struct { boolean completed; collectionHandle cid; list idlist; // list of fileHandles } statPutCollection (operationID opid); The 'asyncPutCollection' operation publishes a number of files as a collection to the P2P network. The collection attributes with which the collection is to be published must be entered in the LAD of the P2P component prior to this call. The 'asyncPutCollection' operation looks for collection attributes in the LAD with 'attrs_uri' as the subject. ('attrs_uri' has no other significance.) The collection attributes include the name of the collection under which it should be published. XXX define this attribute in a document on file/collection attributes. Both the individual files and the collection are said to be published. This operation makes the individual files available on the P2P network (e.g. for searching and downloading). Whether the collection is also available as a separate entity is specific to the P2P component. [ In CFS and GDN we expect this operation to create a CFS file system and GDN package containing the files. In Gnutella we expect the files to be published separately, and the collection either to be ignored or to be encoded in the native file ID or some attribute of the file. As specified in the user interface specs [CP2PC-USER], collections are allowed to be empty. This poses the following problems: o For an empty collection, a network such as Gnutella will not have any files in the network with which to associate collection state (such as collection attributes). However, a Gnutella P2P component can solve this problem by encoding collection state in the collection handle. 
o A network that supports explicit collection concepts (such as GDN and CFS) but that does not allow its collections to be empty. (We do not know of any such networks.) In this case, the P2P component can keep a collection 'alive' by creating a dummy file inside it. ] The files in the collection are conceptually published as follows: int i; collectionElement celt; for (i = 0; i < files.length; i++) { celt = files[i]; /* The 'publishName' file attribute associated with celt.furi * could be modified here, e.g. to include the 'publishName' * associated with attrs_uri. */ putFile_ret r = putFile (celt.furi, celt.immediate); result.idlist[i] = r.fid; /* Could incorporate this file's operational attributes into the * collection's operational attributes here. */ } Note that for clarity of the example, we assume an invocation of the synchronous operation 'putFile'. This operation does not actually exist. Note that the above code implies that file attributes with which each file is to be published are looked for in the LAD of the P2P component. These attributes have 'celt.furi' as the subject. A collectionHandle is returned (in 'cid') which can be used to unpublish the collection later and obtain the nativeCollectionID and cpCollectionID (see section 'File and collection IDs'). An invalid collectionHandle (empty string) is returned if the collection id has not been set yet. If 'completed' is true then the returned collectionHandle must be valid. Until 'completed' is true, the list of fileHandles returned may be incomplete (it may even be empty). Once 'completed' is true the list will be complete. The returned list never contains invalid file IDs (empty strings). If the P2P network supports native collection IDs, a nativeCollectionID and cpCollectionID are defined under which the collection is published (to be used by other peers). The nativeCollectionID and cpCollectionID should be related to the name specified by the caller in the collectionAttributes. 
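The completion rules for 'statPutCollection' can be illustrated with a small
polling sketch. It is not part of the API: 'FakeComponent' is a hypothetical
in-memory stand-in that "publishes" one file per poll, and the handle formats
("fh:", "ch:") are invented for the example. It only shows that 'cid' may be
an empty string and 'idlist' may be partial until 'completed' is true.

```python
from dataclasses import dataclass, field


@dataclass
class PutCollectionStatus:
    completed: bool = False
    cid: str = ""                      # empty string = invalid collectionHandle
    idlist: list = field(default_factory=list)


class FakeComponent:
    """Hypothetical stand-in for a P2P component (assumption, not CP2PC API)."""

    def __init__(self):
        self._ops = {}

    def asyncPutCollection(self, attrs_uri, files, allow_cb):
        opid = "op-%d" % (len(self._ops) + 1)
        self._ops[opid] = (list(files), PutCollectionStatus())
        return opid

    def statPutCollection(self, opid):
        pending, status = self._ops[opid]
        if pending:
            # Pretend one more file finished publishing since the last poll.
            furi, _immediate = pending.pop(0)
            status.idlist.append("fh:" + furi)
        else:
            status.cid = "ch:" + opid
            status.completed = True
        # Return a snapshot so earlier results are not mutated by later polls.
        return PutCollectionStatus(status.completed, status.cid,
                                   list(status.idlist))


def wait_for_collection(component, opid, max_polls=100):
    """Poll until 'completed'; only then rely on 'cid' and a full 'idlist'."""
    for _ in range(max_polls):
        status = component.statPutCollection(opid)
        if status.completed:
            return status
    raise TimeoutError(opid)
```

An application using callbacks ('allow_cb' true) would react to the callback
instead of polling; the validity rules for 'cid' and 'idlist' are the same.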
Notes:

o The note about expiry of published files under 'Publishing Files' applies
  here as well.

XXX move to document on file/collection attributes.

Note that for each file the caller specifies both a collection name (the
'publishName' attribute in the collection attributes) and a file name (the
'publishName' attribute associated with 'celt.furi'). These two names need
not have a lexical relationship with each other (e.g. one being a prefix of
the other). The two names should be combined in some way by each P2P
component to form a native file ID. The way in which this takes place is
specific to each P2P component.

As an example, consider a user that publishes a CD as a collection, and
specifies as the publishName attribute of the collection "Cheerful songs".
The individual tracks of the CD (the files in the collection) each have their
own publishName attribute:

    Happy birthday
    I do like to be beside the seaside
    It's a long way to Tipperary

The resulting combined names under which the tracks are published by a
particular P2P component might then be:

    Cheerful songs: Happy birthday
    Cheerful songs: I do like to be beside the seaside
    Cheerful songs: It's a long way to Tipperary

XXX move to separate doc on attributes?

The following operational attributes are defined for this operation. If
supported by the P2P component, the P2P component will enter them into its
LAD (with subject 'opid'), possibly maintaining them to reflect current
values during the course of the operation:

o 'progressDescription'
  A human-readable progress indication.

o 'progressBytes'
  The number of bytes transferred (uploaded) so far (over all files in the
  collection).

o 'progressFraction'
  The progress of the publication as a fraction, in the range of 0.0 (0%) to
  1.0 (100%) (over all files in the collection).

o 'progressFiles'
  The progress of the publication in terms of the number of files published
  so far.

o P2P component-specific attributes.

4.7.
Unpublishing Collections
-----------------------------

    operationID asyncDeleteCollection (collectionHandle cid, list idlist,
                                       boolean allow_cb);
        // 'idlist' is a list of fileHandles

    struct {
        boolean completed;
    } statDeleteCollection (operationID opid);

The 'asyncDeleteCollection' operation unpublishes a collection previously
published to the P2P network. The collection to unpublish is identified by
'cid'.

The files in the collection are also unpublished. The caller is responsible
for passing a list of the file handles that are in the collection ('idlist').
Whether this list of file handles is actually used is specific to the P2P
component.

[ In CFS and GDN we expect this operation to remove a CFS file system or GDN
  package, without considering the supplied list of file handles. In Gnutella
  we expect the individual files to be unpublished separately using the
  supplied list of file handles, and the collectionHandle to be ignored. ]

The files in the collection are conceptually unpublished as follows:

    int i;

    for (i = 0; i < idlist.length; i++) {
        deleteFile (idlist[i]);
        /* Could incorporate the file's operational attributes into the
         * collection's operational attributes here.
         */
    }

Note that for clarity of the example, we assume an invocation of the
synchronous operation 'deleteFile'. This operation does not actually exist.

XXX move to separate doc on attributes?

The following operational attributes are defined for this operation. If
supported by the P2P component, the P2P component will enter them into its
LAD (with the operation ID as the subject), possibly maintaining them to
reflect current values during the course of the operation:

o 'progressDescription'
  A human-readable progress indication.

o 'progressFraction'
  The progress of the operation as a fraction, in the range of 0.0 (0%) to
  1.0 (100%) (over all files in the collection).

o 'progressFiles'
  The progress of the operation in terms of the number of files unpublished
  so far.

o P2P component-specific attributes.

4.8.
Listing Collections
------------------------

    operationID asyncListCollection (collectionHandle cid, boolean allow_cb);

    struct {
        boolean completed;
        list files;     // list of fileHandles
    } statListCollection (operationID opid);

The 'asyncListCollection' operation makes a best effort to return a list of
files contained in a collection that exists on the P2P network. This
collection need not have been published by the caller. The collection is
identified by 'cid'.

Until 'completed' is true, the list of files in 'files' may be incomplete (it
may even be empty). Once 'completed' is true the list will be complete. The
returned files in 'files' may be missing or incomplete. The returned list
never contains invalid file IDs (empty strings).

XXX move to separate doc on attributes?

The following operational attributes are defined for this operation. If
supported by the P2P component, the P2P component will enter them into its
LAD (with subject 'opid'), possibly maintaining them to reflect current
values during the course of the operation:

o 'progressDescription'
  A human-readable progress indication.

o 'progressFraction'
  The progress of the operation as a fraction, in the range of 0.0 (0%) to
  1.0 (100%) (over all files in the collection).

o 'progressFiles'
  The progress of the operation in terms of the number of files seen so far.

o P2P component-specific attributes.

4.9. Adding a File to a Collection
----------------------------------

    operationID asyncPutCollectionFile (collectionHandle cid, fileURI furi,
                                        boolean immediate, boolean allow_cb);

    struct {
        boolean completed;
        fileHandle fid;
    } statPutCollectionFile (operationID opid);

The 'asyncPutCollectionFile' operation publishes a file as part of an
existing collection (i.e. a collection that has been previously published) to
the P2P network. The semantics of this operation are equivalent to having
published the file as part of the collection, at the time that the collection
was published (using 'asyncPutCollection').
Note that this remark implies that file attributes with which the file is to
be published are looked for in the LAD of the P2P component. These attributes
have 'furi' as the subject. Note also that this implies that 'fid' may be
invalid (an empty string) until 'completed' is true.

Notes:

o The note about expiry of published files under 'Publishing Files' applies
  here as well.

XXX move to separate doc on attributes?

The following operational attributes are defined for this operation. If
supported by the P2P component, the P2P component will enter them into its
LAD (with subject 'opid'), possibly maintaining them to reflect current
values during the course of the operation:

o Same as 'asyncPutFile'.

4.10. Removing a File from a Collection
---------------------------------------

    operationID asyncDeleteCollectionFile (collectionHandle cid,
                                           fileHandle fid, boolean allow_cb);

    struct {
        boolean completed;
    } statDeleteCollectionFile (operationID opid);

The 'asyncDeleteCollectionFile' operation unpublishes a file that is part of
an existing collection (i.e. a collection that has been previously
published).

[ In CFS and GDN we expect this operation to remove a file from a CFS file
  system or GDN package identified by the collectionHandle. In Gnutella we
  expect the individual file to be unpublished as though by the
  'asyncDeleteFile' operation, and the collectionHandle to be ignored. ]

XXX move to separate doc on attributes?

The following operational attributes are defined for this operation. If
supported by the P2P component, the P2P component will enter them into its
LAD (with the operation ID as the subject), possibly maintaining them to
reflect current values during the course of the operation:

o Same as 'asyncDeleteFile'.

4.11.
Downloading Files ----------------------- For downloading a file we use an asynchronous version of the Tristero Redirect.get() method: [TRISTERO-REDIRECTOR] operationID asyncGet (/* fileHandle */ string uri, boolean allow_cb); struct { boolean completed; /* fileURI */ string result; } statGet (operationID opid); In CP2PC context, 'uri' is a fileHandle and 'result' is a fileURI. Note that the fileHandle and fileURI types are strings. In the CP2PC API, the 'asyncGet' operation downloads a file published on the P2P network by some peer (possibly the application component). The file to download is specified by 'uri'. This is the file handle under which the file was published (e.g. the original file handle if the file was published by this application, or a file handle converted from a native or cp2pc file ID if the file was published by another peer). The P2P component stores the downloaded content in a local file. A reference to this file is returned in 'result'. The application component is not allowed to modify the content of the local file ('result'). An invalid fileURI (empty string) is returned if the file has not yet been created. If 'completed' is true then the returned fileURI must be valid. [ This operation is intended to perform the actual download from the P2P network, and to create a local copy of the file (local to the P2P component). Although 'result' may (for example) be an HTTP URI (e.g. in the case that the P2P and the application components do not share a file system), it must not be a reference into the P2P network. The reason that the local file must not be modified by the application component is to deal with the following situation. The application component publishes a file using 'asyncPutFile', passing "file:///home/britney/oops.mp3" for 'furi', and a false value for 'immediate'. 
  If the application component subsequently calls 'asyncGet' for the same
  file (perhaps the user is unwittingly downloading her own file as a result
  of a search), a clever P2P component is allowed to return
  "file:///home/britney/oops.mp3". From this situation it should be clear
  that the application component should not modify the file.

  Also, the application component should make a copy of the file available to
  the user if there is any danger that the user might try to modify the
  file. ]

[ CONSIDER:

      /* fileURI */ resumeGet (/* fileHandle */ string uri);

  Best-effort attempt to resume an interrupted get. ]

XXX move to separate doc on attributes?

The following operational attributes are defined for this operation. If
supported by the P2P component, the P2P component will enter them into its
LAD (with subject 'opid'), possibly maintaining them to reflect current
values during the course of the operation:

o 'progressDescription'
  A human-readable progress indication.

o 'progressBytes'
  The number of bytes transferred (downloaded) so far.

o 'progressFraction'
  The progress of the download as a fraction, in the range of 0.0 (0%) to
  1.0 (100%).

o P2P component-specific attributes.

4.12. File and Collection IDs
-----------------------------

    nativeFileID handle2native (fileHandle fid);
    cpFileID handle2cp (fileHandle fid);
    fileHandle native2handle (nativeFileID fid);
    fileHandle cp2handle (cpFileID fid);

    nativeCollectionID collection_handle2native (collectionHandle cid);
    cpCollectionID collection_handle2cp (collectionHandle cid);
    collectionHandle collection_native2handle (nativeCollectionID cid);
    collectionHandle collection_cp2handle (cpCollectionID cid);

These operations convert handles to/from native and CP2PC IDs.

    string handle2debug (fileHandle fid);
    string collection_handle2debug (collectionHandle cid);

These operations return a string with useful debugging information for a
given file/collection handle.
Note that although a handle is itself a string, the contents of the handle might not always be useful for debugging (see Concepts - File Handle). 4.13. Joining / Leaving ----------------------- /********************************* join ********************************/ operationID asyncJoin (boolean allow_cb); struct { boolean completed; } statJoin (operationID opid); /******************************** leave ********************************/ typedef enum _leaveStatus { LeaveRetains, LeaveDeletes, LeaveExpires } leaveStatus; operationID asyncLeave (boolean allow_cb); struct { boolean completed; leaveStatus leave_status; } statLeave (operationID opid); /****************************** retainFile *****************************/ operationID asyncRetainFile (fileHandle fid, boolean allow_cb); struct { boolean completed; boolean success; } statRetainFile (operationID opid); /*************************** retainCollection **************************/ typedef struct _retainCollectionResult { fileHandle fid; boolean file_success; } retainCollectionResult; operationID asyncRetainCollection (collectionHandle cid, list idlist, boolean allow_cb); // 'idlist' is a list of fileHandles struct { boolean completed; list rc_results; // list of retainCollectionResults boolean collection_success; } statRetainCollection (operationID opid); The 'asyncJoin' operation connects the P2P component to the P2P network. The 'asyncLeave' operation disconnects the P2P component from the P2P network. The 'asyncLeave' operation returns a value in 'leave_status' indicating whether current publications by the caller to the P2P network will remain published ('LeaveRetains'), are unpublished ('LeaveDeletes'), or will expire ('LeaveExpires'). This value is dependent on the P2P network and the P2P component. 
For example, an mnet P2P component will probably return 'LeaveRetains' (since
mnet publications are uploaded to the network and do not expire), a CFS P2P
component will probably return 'LeaveExpires' (since CFS publications are
uploaded to the network but expire), and a Gnutella P2P component will
probably return 'LeaveDeletes' (since Gnutella publications are not uploaded
to the network, but are 'served' by the P2P component while connected to the
network). 'leave_status' is only guaranteed to be set once 'completed' is
true.

Note that as mentioned under 'Publishing Files', a P2P component for a
network where publications expire is responsible for preventing the
expiration of those files.

After rejoining, there are several cases depending on the 'leave_status'
value returned by 'asyncLeave':

o LeaveRetains: The files and collections are still published.

o LeaveDeletes: The files and collections are no longer published. If the
  caller wishes the publications to be available again, they should be
  republished (see below).

o LeaveExpires: The files and collections may or may not be still available
  depending on whether they have expired. If they have expired, and the
  caller wishes them to be available again, they should be republished (see
  below).

XXX move to separate doc on attributes?

The following operational attributes are defined for the 'asyncJoin' and
'asyncLeave' operations. If supported by the P2P component, the P2P component
will enter them into its LAD (with subject 'opid'), possibly maintaining them
to reflect current values during the course of the operation:

o 'progressDescription'
  A human-readable progress indication.

o 'progressFraction'
  The progress of the operation as a fraction, in the range of 0.0 (0%) to
  1.0 (100%).

o P2P component-specific attributes.

To efficiently republish a file or collection after rejoining, the caller
should make use of the 'asyncRetainFile' and 'asyncRetainCollection'
operations.
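The rejoin cases can be sketched from the application component's point of
view as follows. This is an illustration only: 'retain_file' and 'put_file'
are hypothetical synchronous wrappers around 'asyncRetainFile' and
'asyncPutFile' (which are asynchronous in the API), and the 'published' map
from file handle to fileURI is assumed to be kept by the application.

```python
from enum import Enum


class LeaveStatus(Enum):
    LEAVE_RETAINS = "LeaveRetains"
    LEAVE_DELETES = "LeaveDeletes"
    LEAVE_EXPIRES = "LeaveExpires"


def republish_after_rejoin(leave_status, published, retain_file, put_file):
    """Decide, per file, what to do after rejoining the network.

    published    -- dict: fileHandle -> fileURI (kept by the application)
    retain_file  -- hypothetical wrapper: fileHandle -> bool ('success')
    put_file     -- hypothetical wrapper: fileURI -> None (full republish)
    Returns a dict: fileHandle -> action taken.
    """
    actions = {}
    for fid, furi in published.items():
        if leave_status is LeaveStatus.LEAVE_RETAINS:
            # Publications survived the leave; nothing to do.
            actions[fid] = "still published"
        elif retain_file(fid):
            # Cheap path: the component could revive the publication.
            actions[fid] = "retained"
        else:
            # Retain failed (e.g. expired or no longer served): publish anew.
            put_file(furi)
            actions[fid] = "republished"
    return actions
```

Note that a full republish yields a new file handle in the real API; the
sketch omits that bookkeeping.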
The 'asyncRetainFile' operation attempts to republish a file, and returns
true in 'success' if successful, or false if unsuccessful. If the
'asyncRetainFile' operation is unable to republish the file (i.e. 'success'
is false), the caller should instead republish the file using the
'asyncPutFile' operation. For a CFS-like network the P2P component will
typically verify whether the file has expired, and if not, start to refresh
it periodically (as it does for any other published file). It will return a
false 'success' if the file has expired, true otherwise. For a Gnutella-like
network, the P2P component will typically return a true 'success' if it is
still able to serve the file (e.g. has preserved the file on disk while
disconnected), and false otherwise. For an mnet-like network the P2P
component will typically always return a true 'success'. 'success' will only
be set once 'completed' is true.

The 'asyncRetainCollection' operation is similar to 'asyncRetainFile', but
attempts to republish a collection and all of the files in the collection. A
list of 'retainCollectionResult' entries is returned in 'rc_results', one for
each file in the collection, each indicating whether that file was
republished successfully. In addition, the returned 'collection_success'
indicates whether the collection itself (as a separate entity) was
republished successfully. 'collection_success' will only be set when
'completed' is true. If 'completed' is false then the returned list may be
incomplete.

Note that 'asyncRetainFile' and 'asyncRetainCollection' only need to be
called on rejoining in the case that the 'asyncLeave' operation returned
'LeaveDeletes' or 'LeaveExpires' in 'leave_status'.

[ We effectively do not distinguish between publication of files and
  availability of files: a file that is unavailable is not considered
  published.
  We do not attempt to hide from the application component whether files
  published by a P2P component continue to be available to the P2P network
  for the following reasons:

  o The user may want to know that published files are no longer available on
    the network. The application component should be able to let the user
    know.

  o In general, we like to avoid requiring complexity of P2P components.
    Hiding unavailability can be complex, so we do not require it. (A P2P
    component is free to implement this, though.) ]

XXX move to separate doc on attributes?

The following operational attributes are defined for 'asyncRetainFile'. If
supported by the P2P component, the P2P component will enter them into its
LAD (with subject 'opid'), possibly maintaining them to reflect current
values during the course of the operation:

o Same as 'asyncPutFile'.

XXX move to separate doc on attributes?

The following operational attributes are defined for 'asyncRetainCollection'.
If supported by the P2P component, the P2P component will enter them into its
LAD (with subject 'opid'), possibly maintaining them to reflect current
values during the course of the operation:

o Same as 'asyncPutCollection'.

4.14. Extensions to Tristero Search Interfaces
----------------------------------------------

Notation: We refer to a Tristero interface 'x' as 'Tristero.x', e.g.
'Tristero.Search', 'Tristero.SearchSet', etc.

This section contains extensions to the Tristero search interfaces in
[TRISTERO-SEARCH]. We use the Tristero search interfaces for two purposes:
accessing the local databases (Section 'Accessing the Local Databases'), and
performing searches in the P2P networks (Section 'Searching P2P Networks').
4.14.1 'getSubjects' and Friends -------------------------------- string getSubjects (string statementsURI); string getPredicates (string statementsURI); string getObjects (string statementsURI); Each of the above operations is an operation on a list of statements represented by a URI, and returns a list of strings represented by a URI. The result of 'getSubjects' consists of those strings that appear as a subject in any of the 'statementsURI' statements. The result contains no duplicate strings. The result of 'getPredicates' consists of those strings that appear as a predicate in any of the 'statementsURI' statements. The result contains no duplicate strings. The result of 'getObjects' consists of those strings that appear as an object in any of the 'statementsURI' statements. The result contains no duplicate strings. An example appears in 'Searching P2P Networks'. 4.14.2 The 'expand' Operations ------------------------------ string expandSubjects (string subjectsURI, string statementsURI); string expandPredicates (string predicatesURI, string statementsURI); string expandObjects (string objectsURI, string statementsURI); Each 'expand' operation is an operation on a list of strings and a list of statements, where both lists are represented by URIs. (The URI for the list of strings may have been returned by an operation such as getSubjects().) The result of this operation is a list of statements represented by a URI. The result of 'expandSubjects' consists of those statements in the 'statementsURI' list that have as their subject one of the strings in the 'subjectsURI' list. The result of 'expandPredicates' consists of those statements in the 'statementsURI' list that have as their predicate one of the strings in the 'predicatesURI' list. The result of 'expandObjects' consists of those statements in the 'statementsURI' list that have as their object one of the strings in the 'objectsURI' list. An example appears in 'Searching P2P Networks'. 
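The semantics of the 'get' and 'expand' operations can be illustrated on
plain in-memory lists. This is only a sketch of the set semantics: in the
API both arguments and results are URIs that represent (possibly unresolved)
lists, whereas here statements are assumed to be (subject, predicate, object)
tuples held directly in Python lists, and the function names are chosen for
the example.

```python
def _distinct(values):
    """Return the values without duplicates, keeping first-seen order."""
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


def get_subjects(statements):
    # Strings that appear as a subject in any statement, no duplicates.
    return _distinct(s for s, _p, _o in statements)


def get_predicates(statements):
    return _distinct(p for _s, p, _o in statements)


def get_objects(statements):
    return _distinct(o for _s, _p, o in statements)


def expand_subjects(subjects, statements):
    # Statements whose subject is one of the given strings.
    wanted = set(subjects)
    return [st for st in statements if st[0] in wanted]


def expand_predicates(predicates, statements):
    wanted = set(predicates)
    return [st for st in statements if st[1] in wanted]


def expand_objects(objects, statements):
    wanted = set(objects)
    return [st for st in statements if st[2] in wanted]
```

For instance, applying 'expand_subjects' to the output of 'get_subjects' over
the same statement list returns every statement in that list.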
4.14.3 The 'intersection' Operation
-----------------------------------

We extend the semantics of 'intersection' to not only work on statement
lists, but also on string lists. An example of why this is useful may be
found in 'Searching P2P Networks'.

4.14.4 Subscribing to Database Changes
--------------------------------------

    operationID subscribe (string searchURI, boolean ignoreExisting);

The 'subscribe' operation can be used to continually and incrementally
receive changes to the set of statements that match a search query
('searchURI'). It is an alternative mechanism to Tristero.Fetch for resolving
a search query. An example of its use can be found in 'Monitoring'.

'subscribe' is an asynchronous operation that does *not* follow the pattern
described in 'Asynchronous Operations'.

The 'subscribe' operation uses the 'statusChangeList' callback rather than
the 'statusChanged' callback. (See section on 'Callbacks'.) Each call to
'statusChangeList' summarises the changes that occurred to the set of
statements matching 'searchURI' since the previous call to
'statusChangeList', if there was such a call. If there was no previous call
to 'statusChangeList' then the call summarises the changes since some
starting point (see below).

Changes are summarised as follows. 'deleted' is a list of statements that
were removed, and 'added' is a list of statements that were added. 'deleted'
and 'added' are sets: neither contains duplicate statements and both are
unordered. Furthermore, the set of deleted statements is disjoint from the
set of added statements.

Although the order of changed statements passed to a single
'statusChangeList' invocation is irrelevant, the order of several
'statusChangeList' invocations is significant.

If 'ignoreExisting' is true, then the first call to 'statusChangeList'
summarises the changes since the time of subscription (i.e. the time that
'subscribe' is called).
If on the other hand 'ignoreExisting' is false, the first call to 'statusChangeList' passes the entire current set of matching statements in the 'added' parameter. The 'subscribe' operation can be terminated by calling 'cancel', passing the operationID that was returned by 'subscribe'. The 'wait' and 'isDone' methods are not defined for the 'subscribe' operation. The 'completed' callback signals the completion of the 'subscribe' operation. 4.14.5 Asynchronous Execution of a Search ----------------------------------------- operationID asyncExecuteSearch (string searchURI, boolean allow_cb); struct { boolean completed; string resultsURI; } statExecuteSearch (operationID opid); In CP2PC context, the operations in the Tristero.Search and Tristero.SearchSet interfaces as well as the 'getSubjects', 'expandSubjects' etc. operations merely prepare queries but do not execute (resolve) them. (Note that Tristero already allows for this behaviour, but does not mandate it.) A query is subsequently executed using an operation in the Tristero.Fetch interface. Therefore, in CP2PC context Tristero.Search and Tristero.SearchSet always have low latency. Tristero.Fetch however may have high or low latency depending on whether the query is on a local or a remote database. CP2PC adds the 'asyncExecuteSearch' as a means to execute a query asynchronously. The search query to execute is represented by 'searchURI'. The results are placed in a local 'results' database whose URI is returned as 'resultsURI'. The results may be read from the results database using Tristero.Fetch and 'resultsURI'. Since the results database is local, this is guaranteed to be a low-latency operation. Until the 'asyncExecuteSearch' invocation has completed ('completed' is still false) intermediate results may be available in the results database. If no intermediate results are available yet, the results database is empty, or an empty string is returned as 'resultsURI'. 
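As intermediate results accumulate, a change summary in the style described
under 'Subscribing to Database Changes' can be derived by comparing successive
snapshots of the matching statements. The sketch below is one possible way to
do this, not a prescribed implementation; 'ChangeTracker' and its method name
are hypothetical, and statements are assumed to be hashable tuples.

```python
class ChangeTracker:
    """Derives ('deleted', 'added') sets between successive snapshots."""

    def __init__(self, existing, ignore_existing):
        # With ignore_existing, statements present at subscription time are
        # the baseline; otherwise the first summary reports them as 'added'.
        self._last = set(existing) if ignore_existing else set()

    def next_change_list(self, current):
        current = set(current)
        deleted = self._last - current    # matched before, no longer match
        added = current - self._last      # match now, did not match before
        self._last = current
        # By construction the two sets are disjoint and free of duplicates.
        return deleted, added
```

Because each summary is relative to the previous one, summaries must be
applied in the order in which they were produced.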
The 'asyncExecuteSearch' operation uses the 'statusChangeList' callback
instead of the 'statusChanged' callback. (See section on 'Callbacks'.) See
'Subscribing to Database Changes' about the 'statusChangeList' callback.

XXX move to separate doc on attributes?

The following operational attributes are defined for this operation. If
supported by the P2P component, the P2P component will enter them into its
LAD (with subject 'opid'), possibly maintaining them to reflect current
values during the course of the operation:

o 'progressDescription'
  A human-readable progress indication.

o 'searchQuery'
  A human-readable string containing the P2P component's interpretation of
  the search query implied by 'searchURI'. For example, a Gnutella P2P
  component might interpret a structured query as separate keywords and
  return a string containing the keywords as a searchQuery attribute.

o P2P component-specific attributes.

An example appears in 'Searching P2P Networks'.

4.15. Accessing the Local Databases
-----------------------------------

The application component accesses the local databases (see Section 'Local
Databases') through Tristero search interfaces implemented by the P2P
component: Search, SearchSet, Fetch, Add and Update.

The application component reads attributes in a database using the Tristero
'Search', 'SearchSet' and 'Fetch' interfaces, and writes attributes using the
Tristero 'Add' and 'Update' interfaces.

Operations performed on the local databases are always low-latency (i.e.
local) operations. Therefore, Tristero.Fetch can be called safely without
invoking 'asyncExecuteSearch' first.

The 'database' parameter passed to each Tristero 'search' operation is:

o "cp2pc:lad" for the LAD.

o "cp2pc:monitor" for the LMD.

o "cp2pc:configuration" for the LCD.

4.16.
Searching P2P Networks
----------------------------

Searching P2P networks for files and collections is provided through the
collection of Tristero search interfaces as specified in [TRISTERO-SEARCH]
and modified in the section on 'Extensions to Tristero Search Interfaces'.

Searching is performed in three steps:

1. A search query is formulated using the Tristero Search and SearchSet
   interfaces. The query will be resolved against a (conceptual) database
   that represents the P2P network. This database consists of triples that
   each have a file/collection handle as subject. Additional triples may also
   be present (e.g. having a person as their subject), depending on the P2P
   network and P2P component. The 'database' parameter passed to each
   Tristero 'search' operation is "cp2pc:network".

   [ Note that the file/collection handles are local artifacts of the P2P
     component. This means that file/collection handles will not appear in a
     P2P network's database, whether it is conceptual or implemented as an
     actual (centralized or decentralized) database. ]

   The CP2PC API imposes no structure on the search query, nor any limits on
   how complex the search query is allowed to be. It is up to a P2P component
   to interpret the query in its own way. A P2P component may even choose not
   to perform the search if it does not understand the query.

   As an example, consider a user that searches for a file that is an audio
   file and contains the song "Oops I did it again" by the artist "Britney
   Spears". The user would also like the file to be encoded at 128Kbits/s or
   higher. After acquiring all this information from the user, the
   application component would perform the following calls on a P2P
   component.
       titlesURI    = search("", "title", "Oops I did it again", "cp2pc:network");
       authorsURI   = search("", "author", "Britney Spears", "cp2pc:network");
       typesURI     = search("", "type", "x-audio/*", "cp2pc:network");
       encodingsURI = search("", "encoding", "128", "cp2pc:network", ">=");

       filesURI1 = getSubjects (titlesURI);
       filesURI2 = getSubjects (authorsURI);
       filesURI3 = getSubjects (typesURI);
       filesURI4 = getSubjects (encodingsURI);

       filesURI = intersection (filesURI1, filesURI2);
       filesURI = intersection (filesURI, filesURI3);
       filesURI = intersection (filesURI, filesURI4);

       // 'filesURI' represents a query for the file/collection handles
       // that we are looking for. This may be sufficient in some cases.
       // However, let's assume that the user is also interested in
       // obtaining the attributes that we have searched on:

       titlesURI    = expandSubjects (filesURI, titlesURI);
       authorsURI   = expandSubjects (filesURI, authorsURI);
       typesURI     = expandSubjects (filesURI, typesURI);
       encodingsURI = expandSubjects (filesURI, encodingsURI);

       attributesURI = union (titlesURI, authorsURI);
       attributesURI = union (attributesURI, typesURI);
       attributesURI = union (attributesURI, encodingsURI);

2. The 'attributesURI' search query is optionally executed using
   asyncExecuteSearch(). Since searching a P2P network is a high-latency
   operation, this is recommended:

       opid = asyncExecuteSearch (attributesURI, true);

       // later, when the 'completed' callback is invoked:
       attributesURI = statExecuteSearch (opid).resultsURI;

3. The results are retrieved.

       files = fetchSubjects (attributesURI);

   Note that if step 2 was performed, this is a low-latency operation.

   'files' is a list of handles. Depending on whether each result is a file
   or a collection, the handle is either a file handle or a collection
   handle. Here we assume that they are all files.

In order to provide the user with more information about each file, the
application may then perform the following calls for each (unique) entry
in 'files':

      sname = search (files[i], "name", "", attributesURI);
      name = fetchObjects (sname) [0];
      stitle = search (files[i], "title", "", attributesURI);
      title = fetchObjects (stitle) [0];
      sauthor = search (files[i], "author", "", attributesURI);
      author = fetchObjects (sauthor) [0];
      sencoding = search (files[i], "encoding", "", attributesURI);
      encoding = fetchObjects (sencoding) [0];

Note that if step 2 was performed, these are all low-latency operations.

[ Whether search results describing collections will ever be returned
  remains open. Neither of the P2P networks with collections (CFS and GDN)
  supports searches. ]


4.17. Configuration
-------------------

    fileURI getConfigurationSchema();

Configuring the P2P component and controlling its behaviour.

Configuring the P2P component consists of setting attributes in the P2P
component's LCD (see 'Accessing the Local Databases'). Similarly,
configuration can be read from the LCD.

XXX move to separate doc on attributes?

Configuration attributes are likely to be highly specific to each P2P
component. We attempt to unify configuration as follows. Each P2P component
defines a configuration schema as an RDF schema in XML [RDFS]. The schema
describes which configuration attributes are recognised by the P2P
component, and what the type and encoding of each attribute's value are.
The application component can request the configuration schema from a P2P
component ('getConfigurationSchema') and, for example, convert it into a
form that is presented to the user. The results of the form can then be
turned into configuration attributes, which are set in the LCD.

This needs further study though:

o Are there general configuration settings that apply to all P2P
  components and might be treated separately?

o Can we dynamically control the P2P component in this way? An example is
  limiting bandwidth or resource usage.


4.18. Monitoring
----------------

    fileURI getMonitorSchema();

Monitoring the activity and resource usage of the P2P component.

Monitoring the P2P component consists of reading specific monitoring
attributes from the P2P component's LMD (see 'Accessing the Local
Databases'). In particular, the monitoring attributes can be subscribed to
using the 'subscribe' operation.

Monitoring attributes are likely to be highly specific to each P2P
component. We attempt to unify monitoring as follows. Each P2P component
defines a monitoring schema as an RDF schema in XML [RDFS]. The schema
describes which monitoring attributes are recognised by the P2P component,
and what the type and encoding of each attribute's value are. The
application component can request the monitoring schema from a P2P
component ('getMonitorSchema'), and use it to present the monitor status
in a meaningful way to the user.

This needs further study though:

o Are there general monitor attributes that apply to all P2P components
  and might be treated separately?


4.19. Miscellaneous Issues
--------------------------

4.19.1 Concurrency
------------------

It is possible for an application component to invoke CP2PC operations
concurrently. The behaviour of a P2P component under concurrent invocations
is as follows:

o Concurrent invocations must not cause inconsistency in the P2P network
  or the P2P component.

o If possible, the P2P component should execute concurrent invocations in
  a serialisable way. For example, the effect of a concurrent invocation
  of 'asyncPutFile' and 'asyncDeleteFile' can be either the effect of
  fully executing 'asyncPutFile' before 'asyncDeleteFile' (leaving no
  trace of the file in the P2P network and the P2P component), or the
  effect of fully executing 'asyncDeleteFile' before 'asyncPutFile'
  (leaving a fully uploaded copy of the file in the P2P network). However,
  the situation of a partially uploaded file should be avoided if
  possible.
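
[ One way a P2P component might obtain serialisable behaviour is to guard
  each operation with a single lock. The sketch below is an illustration
  under assumptions, not part of the API: 'SketchComponent', 'put_file' and
  'delete_file' are hypothetical synchronous stand-ins for 'asyncPutFile'
  and 'asyncDeleteFile', and the 'store' dictionary stands in for the P2P
  network. ]

```python
import threading


class SketchComponent:
    """Hypothetical P2P component that serialises its operations."""

    def __init__(self):
        self._lock = threading.Lock()
        self.store = {}  # handle -> list of uploaded chunks

    def put_file(self, handle, chunks):
        # The whole upload runs under the lock, so a concurrent delete
        # can never observe (or leave behind) a partially uploaded file.
        with self._lock:
            uploaded = []
            self.store[handle] = uploaded
            for chunk in chunks:
                uploaded.append(chunk)

    def delete_file(self, handle):
        with self._lock:
            self.store.pop(handle, None)


component = SketchComponent()
chunks = ["a", "b", "c"]

put = threading.Thread(target=component.put_file, args=("fh1", chunks))
delete = threading.Thread(target=component.delete_file, args=("fh1",))
put.start()
delete.start()
put.join()
delete.join()

# Either the delete ran last (no trace of the file) or the put ran last
# (a fully uploaded copy); which of the two happens is not specified.
final_state = component.store.get("fh1")
```

[ A single global lock is the simplest choice but also the most
  restrictive; a component could equally use one lock per handle. ]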

We do not require serialisability, nor do we specify in which way
concurrent invocations should be serialisable. This may depend on the
functionality offered by the P2P network.


5. Junk
-------

This section contains bits of text that we may wish to recycle.


5.1. Initialisation and Persistence
-----------------------------------

    void init (fileURI directory);
    void destroy();

These operations initialise and clean up the P2P component, typically at
the time the application process starts up or shuts down. In addition,
these operations provide 'hooks' for the P2P component to save (in
'destroy') and restore (in 'init') relevant state on persistent storage.

The 'init' operation is passed a directory name in the file system. The
directory name is chosen by the application component for sole use by this
P2P component. The P2P component can store files, and keep its persistent
state, etc., in this directory. P2P components that do not require file
storage can entirely ignore the 'directory' argument. However, a P2P
component that does require file storage should create the directory if it
does not already exist, and preferably only use this directory (and its
subdirectories) for file storage. It is up to the P2P component (rather
than the application component) to create the directory in 'init', if
necessary.

In 'destroy' the P2P component can do either of the following:

o It can use the directory to save any state that it needs to restore in
  its next run. The same directory name and directory contents will be
  passed to 'init' in the next run. We call such a P2P component
  'persistent'.

o It can remove the directory (and its contents) (if created). The same
  directory name will be passed to 'init' in the next run, but the
  directory will not exist, and the P2P component will start afresh. We
  call such a P2P component 'non-persistent'.
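
[ The two behaviours can be sketched as follows. This is a minimal
  illustration under assumptions: 'SketchComponent', the 'persistent' flag
  and the 'state.json' file name are hypothetical, and the directory is
  passed as a plain path rather than as a 'file:' URI. ]

```python
import json
import os
import shutil
import tempfile


class SketchComponent:
    """Hypothetical component showing persistent and non-persistent modes."""

    def __init__(self, persistent):
        self.persistent = persistent
        self.directory = None
        self.state = {}

    def init(self, directory):
        self.directory = directory
        # Create the directory, including any missing parent directories.
        os.makedirs(directory, exist_ok=True)
        state_file = os.path.join(directory, "state.json")
        if os.path.exists(state_file):
            with open(state_file) as f:
                self.state = json.load(f)  # restore state from the last run
        else:
            self.state = {}                # start afresh

    def destroy(self):
        if self.persistent:
            # Save state so that the next 'init' can restore it.
            with open(os.path.join(self.directory, "state.json"), "w") as f:
                json.dump(self.state, f)
        else:
            # Remove the directory and its contents.
            shutil.rmtree(self.directory, ignore_errors=True)


base = tempfile.mkdtemp()
directory = os.path.join(base, "cp2pc", "cfs")  # parents do not exist yet

# First run of a persistent component: state is saved in 'destroy'.
first = SketchComponent(persistent=True)
first.init(directory)
first.state["handles"] = ["fh1"]
first.destroy()

# Second run: the same directory is passed and the state is restored.
second = SketchComponent(persistent=True)
second.init(directory)
restored = second.state

# A non-persistent component removes the directory in 'destroy'.
third = SketchComponent(persistent=False)
third.init(directory)
third.destroy()
removed = not os.path.exists(directory)

shutil.rmtree(base, ignore_errors=True)  # clean up the temporary directory
```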

The application component may try to reuse file and collection handles
that it has obtained from the P2P component in one run of the P2P component
in a subsequent run. The P2P component should (but is not required to)
ensure that these handles do not become invalid from one run to the next.
Depending on the P2P component, this may be easy (e.g. if all state related
to the file or collection is encoded in the handle, independent of other
state) or more complex (saving internal tables to persistent storage). A
P2P component that does not ensure this must at the very least properly
detect file and collection handles that have become invalid.

Notes:

o 'init' does not join a P2P network, nor should 'destroy' be used to
  leave a network. To join a network, the 'asyncJoin' operation should be
  used; to leave a network, the 'asyncLeave' operation should be used. A
  network can be joined and left any number of times in between 'init' and
  'destroy', e.g.:

      init
      join
      leave
      join /* rejoin the network */
      leave
      destroy

o If multiple instances of a P2P component are running (for whatever
  reason) they are each given a separate directory name.

o It is possible that multiple parts of the directory name need to be
  created by the P2P component, not just the final part. For example, if
  the directory name is '/home/steen/cp2pc/cfs' it is possible that only
  '/home/steen' exists, in which case the P2P component should create
  '/home/steen/cp2pc' as well as '/home/steen/cp2pc/cfs'.


6. CHANGES
----------

Since version 6.1
-----------------

o Replaced the 'Tristero database' with a number of local databases.

o Asynchronous operations:

    o Made the operation-specific callback methods 'fooStatusChanged' and
      'fooCompleted', and 'cancelFoo' methods generic: 'statusChanged',
      'statusChangeList', 'completed' and 'cancel'.

    o Renamed 'getFooResult' to 'statFoo'.

    o Added 'wait' and 'isDone' methods.

    o Removed the 'done' return value of 'asyncFoo'.
      'done' indicated that 'asyncFoo' was able to perform the operation
      synchronously, and no callbacks would be made. Now callbacks are
      always made (unless 'allow_cb' is false).

    o Regarding the issue of how an application component and P2P
      component 'find' each other (e.g. their respective addresses, object
      references, function pointers, etc.), in particular how the P2P
      component knows where to invoke callbacks, the binding of the
      application component and the P2P component at initialisation time
      is the default. Alternatives (passing callback objects or function
      pointers with each operation) are mentioned in square brackets.

o Search-related:

    o Added 'expand', 'asyncExecuteSearch', 'getSubjects', and friends.
      These replace 'asyncFetch', 'intersectionBySubject' and friends.

    o Added 'subscribe' as a generalisation of (and to replace)
      'asyncMonitor'.

    o Redefined Tristero 'intersection' so that it operates not only on
      statement lists, but also on string lists.

o Configuration and monitoring are based on the local databases in
  combination with the search-related methods.

o Replaced 'getFile' with the Tristero.Redirector interface.

o Renamed 'internal file ID' and 'internal collection ID' to 'file handle'
  and 'collection handle'. Renamed 'internalFileID',
  'internalCollectionID', 'internal2native' and friends correspondingly.

o Added a section on concurrent invocations of the API (Section
  'Concurrency').

o Added a few notes about passing and returning 'file:' URIs (Section
  'Placement of Components').

o In 'Downloading Files' clarified why the local file must not be modified
  by the application component.

o Replaced "-1" with "" as the invalid file/collection handle.

o Added section numbers.


Since version 6.0
-----------------

o Added a section describing asynchronous operations.

o Removed synchronous variants of file and collection access operations.


Since version 5
---------------

o Added authors.

o Added the requirement that the individual P2P components can run as
  separate processes.

o We no longer require the P2P components to be 'as stateless as
  possible'. This is mainly because the process of separately registering
  a file's attributes before publishing it requires a component to store
  state.

o The API has been modified to conform more to the Tristero framework.
  This implies that separate interfaces are used for setting and getting
  file attributes and for putting and getting files. Likewise it involves
  the introduction of Tristero interfaces such as the Tristero search
  interfaces.

o Due to Tristero compatibility the API is now specified using XML-RPC
  data types and operations. This includes changing C arrays (*'s) to
  Tristero 'list' and removing the n_... parameters/results that were used
  to tell how many elements an array contained.

o Changed names of publish and download methods:

      publish/download/unpublish -> putFile/getFile/deleteFile

o Use URIs that allow protocols other than 'file:' (e.g. HTTP) as
  parameters to methods that transfer files. The implementation of the
  method is responsible for accessing the file using that URI.

o File and operational attributes are represented by RDF triples. Includes
  an introduction to and explanation of RDF triples.

o Removed specifics of attributes and placed them in a separate
  'cp2pc-attributes' document.

o Many methods are now asynchronous. The synchronous versions of these
  methods are now obsolete although they are still included in this
  document (for documentation purposes).

o In getFile the furi parameter is now OUT only (meaning that it is
  returned as a result of getGetFileResult).

o Temporarily removed init and perst since their semantics are unclear.

o In section 'Other' removed getFileAttributes and
  getCollectionAttributes. Their functionalities have been merged in with
  the 'Search' section operations.


Since version 4
---------------

o In the introduction, mentioned that 'P2P network' means 'P2P file
  sharing network'.

o In 'Internal File ID' and 'Data types', made the internal file ID a
  string.

o In 'Publishing Files' (and 'Publishing Collections' and 'Adding a file
  to a collection') added notes about P2P networks where files expire.

o Rewrote 'Joining' to deal with P2P networks where files expire. The
  'leave' operation returns 'leaveStatus' instead of 'bool'. Added the
  'retainFile' and 'retainCollection' operations.

o In 'Joining', reworded the note about why we do not attempt to hide from
  the application component whether files published by a P2P component
  continue to be available to the P2P network.

o In 'File and collection IDs' added the 'internal2debug' and
  'collection_internal2debug' operations.

o Added 'Initialisation and Persistence'.


Since version 3
---------------

o In 'Downloading Files', clarified the description of the 'fid' parameter
  of download().

o In 'Other', added comments about P2P networks that do not have extensive
  support for attributes.


Since version 2
---------------

o In 'Internal File ID', clarified internal file IDs.

o In 'Outline of API', added 'Error handling' to the unresolved issues.

o In 'Data types', added the 'bool' type. In 'Joining', replaced the 'int'
  return type with 'bool'.

o In 'Publishing Files', added a 'current_contents' parameter to
  publish(), and in 'Publishing Collections', added a 'current_contents'
  field to '_collectionElement'.

o Replaced the 'unsigned' data type by 'uint32'.

o In 'Publishing Collections', added comments about empty collections.

o In 'Publishing Collections', clarified the text explaining what happens
  when the caller specifies both a collection name and a filename.

o In 'Joining', added text concerning the availability of files when a
  user leaves a P2P network.

o In 'Configuration' and 'Monitoring' added '/* or some unicode type */'
  to getConfigurationSchema() and getMonitorSchema().


7. References
-------------

[CP2PC-USER]
    'CP2PC User Interface Specifications', Patrick Verkaik, Ihor Kuz,
    http://www.cs.vu.nl/pub/globe/cp2pc/notes/allnotes/cp2pc-ui-specs.

[RDF]
    'Resource Description Framework (RDF) Model and Syntax Specification',
    Ora Lassila, Ralph R. Swick, 22 February 1999, W3C Recommendation,
    http://www.w3.org/TR/REC-rdf-syntax.

[RDFS]
    'RDF Vocabulary Description Language 1.0: RDF Schema', Dan Brickley,
    R.V. Guha, 30 April 2002, W3C Working Draft,
    http://www.w3.org/TR/rdf-schema.

[TRISTERO-REDIRECTOR]
    'Tristero -- Components -- Redirector',
    http://tristero.sourceforge.net/redirector.html.

[TRISTERO-SEARCH]
    'Tristero -- Components -- Search Engine', Brandon Wiley,
    http://tristero.sourceforge.net/search.html.