Semantic Metadata Extraction and Annotation - A peer’s local knowledge can exist in various formats, including Web pages (unstructured), text documents (unstructured), XML (semi-structured), and RDF or OWL. For efficient collective knowledge composition, this data must be in a machine-processable format such as RDF or OWL. All data that is not already in such a format must therefore be processed and converted (metadata extraction). Once this is completed, the knowledge is suitable for sharing with other peers. The Semantic Web envisions making content machine processable, not just readable or consumable by human beings (Berners-Lee, Hendler, and Lassila, 2001). This is accomplished through ontologies, which comprise agreed terms and their relationships in different domains. Different peers can agree to use a common ontology to annotate their content and/or resolve their differences using ontology mapping techniques. Peers’ local knowledge is thus represented in a machine-processable format, with the goal of enabling the automatic composition of knowledge. Ontology-driven extraction of domain-specific semantic metadata has been a highly researched area. Both semi-automatic (Handschuh, Staab, and Studer, 2002) and automatic (Hammond, Sheth, and Kochut, 2002) techniques and tools have been developed, and significant work continues in this area (Vargas-Vera et al., 2002).

Knowledge Discovery and Composition - One approach to knowledge discovery is to consider relations in the Semantic Web that are expressed semantically in languages such as RDF(S). Anyanwu and Sheth (2003) have formally defined particular kinds of relations in the Semantic Web, namely Semantic Associations. Discovery and ranking of these kinds of relations have been addressed in a centralized system (Sheth et al., 2004; Aleman-Meza, Halaschek, Arpinar, and Sheth, 2003).
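As a rough illustration of relation discovery over RDF-style data, the following Python sketch enumerates bounded-length paths between two entities in a small set of triples. It covers only the simplest, path-based reading of an association between entities and is not the formal definition of Anyanwu and Sheth (2003). The triples, the function names `build_graph` and `find_associations`, and the `max_len` bound are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not taken from the paper.
TRIPLES = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("bob", "authorOf", "paper1"),
    ("carol", "cites", "paper1"),
]


def build_graph(triples):
    """Index triples as an adjacency list that keeps the edge labels."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
        # Add an inverse edge so a path may traverse a triple in either direction.
        graph[o].append((p + "^-1", s))
    return graph


def find_associations(graph, source, target, max_len=4):
    """Return all simple paths of at most max_len edges from source to target."""
    results = []

    def dfs(node, visited, path):
        if node == target:
            results.append(list(path))
            return
        if len(path) == max_len:
            return
        for pred, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append((node, pred, nxt))
                dfs(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    dfs(source, {source}, [])
    return sorted(results, key=len)  # shortest associations first


if __name__ == "__main__":
    for path in find_associations(build_graph(TRIPLES), "alice", "carol"):
        print(" ; ".join(f"{s} -{p}-> {o}" for s, p, o in path))
```

Sorting by path length is only a placeholder for ranking; the ranking work cited above relies on richer criteria than length.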
However, a P2P approach can be exploited to make the discovery of knowledge more dynamic, flexible, and scalable. Since different peers may have knowledge of related entities and relationships, they can be interconnected to solve a scientific problem and/or to discover new knowledge by composing the knowledge of otherwise isolated peers. Exploiting peers’ knowledge requires knowledge query languages. A vast amount of research has been aimed at the development of query languages and mechanisms for a variety of knowledge representation models. However, distributed dynamic systems such as P2P raise additional special considerations that must be addressed.

PEER-TO-PEER NETWORKS

Recently there has been a substantial amount of research in P2P networks. For example, P2P network topology has been an area of much interest. Basic peer networks include random coupling of peers over a transport network, such as Gnutella (http://www.gnutella.com; discussed by Ripeanu, 2001), and centralized server networks, such as Napster (http://www.napster.com). These networks suffer from drawbacks such as limited scalability, lack of search guarantees, and bottlenecks. Yang and Garcia-Molina (2003) discussed super-peer networks, which introduce hierarchy into the network: super-peers have additional capabilities and duties that may include indexing the content of other peers. Queries are broadcast among super-peers and then forwarded to leaf peers. Schlosser, Sintek, Decker and Nejdl (2002) proposed HyperCuP, a network in which a deterministic topology is maintained and known to all nodes in the network. Nodes therefore have at least an idea of what the network beyond their scope looks like. They can use this globally available information to reach locally optimal decisions while routing and broadcasting search messages.
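To illustrate how a deterministic, globally known topology can help with broadcasting, the sketch below simulates a broadcast on a complete hypercube in which every message is tagged with the dimension it travelled along and a receiver forwards only along higher dimensions. This is a standard hypercube broadcast rule used here as an assumption; it is not claimed to reproduce the exact protocol of Schlosser et al. (2002). The integer node identifiers and the function name `hypercube_broadcast` are hypothetical.

```python
from collections import deque


def hypercube_broadcast(origin, dims):
    """Simulate a broadcast on a complete hypercube with 2**dims nodes.

    Nodes are integers; the neighbor of a node in dimension d is node ^ (1 << d).
    Returns (received, messages), where received maps each reached node to the
    number of copies it got and messages is the total number of messages sent.
    """
    received = {}
    messages = 0
    # Each queue entry is (node, dimension the message arrived on).
    # The originator is given -1 so that it forwards along every dimension.
    queue = deque([(origin, -1)])
    while queue:
        node, arrived_on = queue.popleft()
        # Forward only along dimensions higher than the arrival dimension.
        for d in range(arrived_on + 1, dims):
            neighbor = node ^ (1 << d)
            received[neighbor] = received.get(neighbor, 0) + 1
            messages += 1
            queue.append((neighbor, d))
    return received, messages


if __name__ == "__main__":
    received, messages = hypercube_broadcast(origin=0, dims=3)
    print(sorted(received), messages)
```

Under this rule every node other than the originator receives exactly one copy, so a network of N nodes needs N - 1 messages, in contrast to flooding over a random topology, where duplicates are common.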
Content addressable networks (CAN) (Ratnasamy, Francis, Handley, Karp, and Shenker, 2001) have provided significant improvements for keyword search. If meta-information on a peer’s content is available, it can be used to organize the network so that queries are routed more accurately and searching is more efficient. Similarly, ontologies can be used to bootstrap the P2P network organization: peers and the content that they provide can be classified by relating their content to concepts in an ontology or concept hierarchy. The classification determines, to a certain extent, a peer’s location in the network. Peers routing queries can use their knowledge of this scheme to route and broadcast queries efficiently. Peer network layouts have also combined several of the ideas briefly mentioned here. In addition, Nejdl et al. (2003) proposed a super-peer based layout for RDF-based P2P networks. As in content addressable networks, super-peers index the metadata context of their leaf peers.

Efficient searching in P2P networks is very important as well. Typically, a P2P node broadcasts a search request to its neighboring peers, which propagate the request to their peers, and so on. This flooding approach can be improved dramatically. For example, Yang and Garcia-Molina (2003) have described techniques to increase search effectiveness. These include iterative deepening, directed Breadth First Search, and local indices, in which a node indexes the data held by peers within r hops of itself. Ramanathan, Kalogeraki, and Pruyne (2001) proposed a mechanism in which peers monitor which other peers frequently respond successfully to their requests for information. When a peer is known to frequently provide good results, other peers attempt to move closer to it in the network by creating a new connection with that peer. This leads to clusters of peers with similar interests, which limits the depth of the searches required to find good results. Nejdl et al. (2003) proposed using the semantic indices contained in super-peers to forward queries more efficiently. Yu and Singh (2003) proposed a vector-reputation scheme for query forwarding and reorganization of the network. Tang, Xu and Dwarkadas (2003) made use of data semantics in the pSearch project. To achieve efficient search, they rely on a distributed hash table to extend the latent semantic indexing (LSI) and vector space model (VSM) algorithms for use in P2P networks.
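The iterative deepening idea mentioned above can be sketched as a sequence of depth-limited floods that stops as soon as enough results are found. This is a simplified illustration, not the procedure of Yang and Garcia-Molina (2003): for brevity each round restarts the flood from the source rather than resuming at the previous frontier. The overlay in `NEIGHBORS`, the per-peer keywords in `CONTENT`, the depth `policy`, and the `wanted` threshold are all hypothetical.

```python
from collections import deque

# Hypothetical overlay (peer -> neighbors) and local content (peer -> keywords).
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "F"],
    "E": ["C"],
    "F": ["D"],
}
CONTENT = {
    "A": set(),
    "B": {"rdf"},
    "C": set(),
    "D": {"owl"},
    "E": {"owl"},
    "F": {"owl", "rdf"},
}


def flood(start, keyword, depth):
    """Breadth-first flood limited to `depth` hops; return peers holding keyword."""
    seen = {start}
    frontier = deque([(start, 0)])
    hits = []
    while frontier:
        peer, hops = frontier.popleft()
        if keyword in CONTENT[peer]:
            hits.append(peer)
        if hops < depth:
            for neighbor in NEIGHBORS[peer]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, hops + 1))
    return hits


def iterative_deepening(start, keyword, policy=(1, 2, 3), wanted=1):
    """Try successively deeper floods until at least `wanted` peers answer."""
    hits = []
    depth = 0
    for depth in policy:
        hits = flood(start, keyword, depth)
        if len(hits) >= wanted:
            break
    return depth, hits


if __name__ == "__main__":
    print(iterative_deepening("A", "owl"))
    print(iterative_deepening("A", "rdf", wanted=2))
```

The benefit is that queries satisfied by nearby peers never reach the rest of the network; the cost is repeated work when a query has to go deep, which is why a real implementation would resume from the previous frontier instead of restarting.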