Collective Knowledge Composition in a Peer-to-Peer Network

Semantic Metadata Extraction and Annotation - A peer’s local knowledge
can be in various formats, such as Web pages (unstructured), text documents
(unstructured), XML (semi-structured), RDF or OWL, etc. In the context of
efficient collective knowledge composition, this data must be in a machine
processable format, such as RDF or OWL. Thus, all data that is not in this format
must be processed and converted (metadata extraction). Once this is completed, the
knowledge will be suitable to be shared with other peers. The Semantic Web
envisions making content machine processable, not just readable or consumable by
human beings (Berners-Lee, Hendler, and Lassila, 2001). This is accomplished
through ontologies, which define agreed terms and their relationships in
different domains. Different peers can agree to use a common ontology to annotate
their content and/or resolve their differences using ontology mapping techniques.
Furthermore, peers’ local knowledge will be represented in a machine processable
format, with the goal of enabling the automatic composition of knowledge.
Ontology-driven extraction of domain-specific semantic metadata has been a
highly researched area. Both semi-automatic (Handschuh, Staab, and Studer, 2002)
and automatic (Hammond, Sheth, and Kochut, 2002) techniques and tools have been
developed, and significant work continues in this area (Vargas-Vera et al., 2002).
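
As a minimal sketch of the conversion step described above, the following Python turns a small semi-structured XML record into subject-predicate-object triples of the kind RDF encodes. The namespace URI, the PROPERTY_MAP table, and the record layout are all hypothetical and chosen only for illustration; the extraction tools cited above are considerably more involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology namespace (an assumption, not from the source text).
EX = "http://example.org/onto#"

# Hypothetical mapping from XML element names to ontology property names.
PROPERTY_MAP = {
    "title": "hasTitle",
    "author": "hasAuthor",
    "year": "publishedIn",
}


def xml_to_triples(xml_text):
    """Turn each <record id="..."> element into (subject, predicate, object) triples."""
    triples = []
    root = ET.fromstring(xml_text)
    for record in root.findall("record"):
        subject = EX + record.get("id")
        triples.append((subject, "rdf:type", EX + "Publication"))
        for child in record:
            prop = PROPERTY_MAP.get(child.tag)
            # Elements the ontology does not cover are skipped.
            if prop is not None and child.text:
                triples.append((subject, EX + prop, child.text.strip()))
    return triples


SAMPLE = """<records>
  <record id="p1">
    <title>Semantic Web</title>
    <author>Berners-Lee</author>
    <note>not covered by the ontology</note>
  </record>
</records>"""

if __name__ == "__main__":
    for triple in xml_to_triples(SAMPLE):
        print(triple)
```

Only the elements that the agreed ontology covers become annotations, which is what lets peers that share the ontology interpret each other's output.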

Knowledge Discovery and Composition - One of the approaches for knowledge
discovery is to consider relations in the Semantic Web that are expressed
semantically in languages like RDF(S). Anyanwu and Sheth (2003) have formally
defined particular kinds of relations in the Semantic Web, namely, Semantic
Associations. Discovery and ranking of these kinds of relations have been
addressed in a centralized system (Sheth et al., 2004; Aleman-Meza, Halaschek,
Arpinar, and Sheth, 2003). However, a P2P approach can be exploited to make the
discovery of knowledge more dynamic, flexible, and scalable. Since different peers
may have knowledge of related entities and relationships, they can be
interconnected in order to provide a solution for a scientific problem and/or to
discover new knowledge by means of composing knowledge of the otherwise
isolated peers.
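
The idea of composing knowledge from otherwise isolated peers can be sketched as follows. The code treats an association, in a deliberately simplified way, as a path of relations connecting two entities; this is only a rough reading and not the formal definition of Semantic Associations given by Anyanwu and Sheth (2003). The peer names, entities, and relations are hypothetical.

```python
from collections import deque


def merge_peer_graphs(peer_triples):
    """Union the (subject, predicate, object) triples contributed by every peer."""
    merged = set()
    for triples in peer_triples.values():
        merged.update(triples)
    return merged


def find_association(triples, source, target, max_hops=4):
    """Breadth-first search for a shortest relation path from source to target.

    Returns [entity, predicate, entity, ...] or None. Edges are followed in
    both directions, which is a simplifying assumption of this sketch.
    """
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))
        adjacency.setdefault(o, []).append((p, s))

    queue = deque([(source, [source])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if (len(path) - 1) // 2 >= max_hops:
            continue  # hop budget used up on this branch
        for predicate, neighbor in sorted(adjacency.get(node, [])):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [predicate, neighbor]))
    return None


# Hypothetical peers, each holding one fragment of the overall knowledge.
PEERS = {
    "peer_a": {("geneX", "encodes", "proteinY")},
    "peer_b": {("proteinY", "interactsWith", "drugZ")},
}

if __name__ == "__main__":
    print(find_association(PEERS["peer_a"], "geneX", "drugZ"))
    print(find_association(merge_peer_graphs(PEERS), "geneX", "drugZ"))
```

Neither peer alone connects geneX to drugZ; the path only appears once the two fragments are composed.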

In order to exploit peers’ knowledge, it is necessary to make use of knowledge
query languages. A vast amount of research has been aimed at the development of
query languages and mechanisms for a variety of knowledge representation models.
However, there are additional special considerations to be addressed in distributed
dynamic systems such as P2P.

PEER-TO-PEER NETWORKS

Recently there has been a substantial amount of research on P2P networks. For
example, P2P network topology has been an area of much interest. Basic peer
networks include random couplings of peers over a transport network, such as
Gnutella (http://www.gnutella.com) (discussed by Ripeanu, 2001), and centralized
server networks, such as the Napster (http://www.napster.com) architecture.
These networks suffer from drawbacks such as poor scalability, lack of search
guarantees, and bottlenecks. Yang and Garcia-Molina (2003) discussed super-peer
networks that introduce hierarchy into the network in which super-peers have
additional capabilities and duties in the network that may include indexing the
content of other peers. Queries are broadcast among super-peers, and these
queries are then forwarded to leaf peers. Schlosser, Sintek, Decker and Nejdl
(2002) proposed HyperCup, a network in which a deterministic topology is
maintained and known to all nodes in the network. Therefore, nodes at least
have an idea of what the network beyond their scope looks like. They can use this
globally available information to reach locally optimal decisions while routing and
broadcasting search messages. Content addressable networks (CAN) (Ratnasamy,
Francis, Handley, Karp, and Shenker, 2001) have provided significant
improvements for keyword search. If meta-information on a peer’s content is
available, this information can be used to organize the network in order to route
queries more accurately and for more efficient searching. Similarly, ontologies can
be used to bootstrap the P2P network organization: peers and the content that they
provide can be classified by relating their content to concepts in an ontology or
concept hierarchy. The classification determines, to a certain extent, a peer’s
location in the network. Peers routing queries can use their knowledge of this
scheme to route and broadcast queries efficiently.
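
The ontology-based organization just described can be illustrated with a small sketch. Each peer is classified under one concept of a concept hierarchy, and a query about a concept is sent only to peers classified under that concept or one of its subconcepts. The hierarchy, the peer names, and the function names are hypothetical and do not come from any of the cited systems.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
PARENT = {
    "Biology": "Science",
    "Physics": "Science",
    "Genetics": "Biology",
    "Proteomics": "Biology",
}

# Hypothetical classification of peers by the concept their content relates to.
PEER_CONCEPT = {
    "peer1": "Genetics",
    "peer2": "Proteomics",
    "peer3": "Physics",
}


def is_subconcept(concept, ancestor, parent):
    """Return True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = parent.get(concept)
    return False


def route_query(query_concept, peer_concept, parent):
    """Select the peers whose classification falls under the queried concept."""
    return sorted(
        peer
        for peer, concept in peer_concept.items()
        if is_subconcept(concept, query_concept, parent)
    )


if __name__ == "__main__":
    print(route_query("Biology", PEER_CONCEPT, PARENT))
    print(route_query("Physics", PEER_CONCEPT, PARENT))
```

A query about Biology reaches the Genetics and Proteomics peers but never the Physics peer, so the classification narrows the broadcast instead of flooding the whole network.
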
Peer network layouts have also combined several of the ideas briefly mentioned
here. In addition, Nejdl et al. (2003) proposed a super-peer-based layout for
RDF-based P2P networks. Similar to content addressable networks, super-peers
index the metadata context that the leaf peers have.

Efficient searching in P2P networks is very important as well. Typically, a P2P
node broadcasts a search request to its neighboring peers, which propagate the
request to their own peers, and so on. However, this can be dramatically improved.
For example, Yang and Garcia-Molina (2003) have described techniques to increase
search effectiveness. These include iterative deepening, directed Breadth First
Search, and local indices that a node maintains over the data held by peers within
r hops of itself. Ramanathan,
Kalogeraki, and Pruyne (2001) proposed a mechanism in which peers monitor
which other peers frequently respond successfully to their requests for information.
When a peer is known to frequently provide good results, other peers attempt to
move closer to it in the network by creating a new connection with that peer. This
leads to clusters of peers with similar interests, which limits the depth of
searches required to find good results. Nejdl et al. (2003) proposed using the
semantic indices contained in super-peers to forward queries more efficiently. Yu
and Singh (2003) proposed a vector-reputation scheme for query forwarding and
reorganization of the network. Tang, Xu and Dwarkadas (2003) made use of data
semantics in the pSearch project. In order to achieve efficient search, they rely on a
distributed hash table to extend the latent semantic indexing (LSI) and vector space
model (VSM) algorithms for use in P2P networks.
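
To make one of the search techniques above concrete, here is a minimal sketch of iterative deepening over a peer overlay: the query is first sent to a shallow depth and is extended to deeper peers only if too few results come back. The overlay, the per-peer keyword sets, and the depth schedule are hypothetical. For simplicity the sketch re-runs the bounded search from scratch at each depth, which a real implementation would try to avoid.

```python
def peers_within(overlay, start, depth):
    """Return the peers reachable from `start` in at most `depth` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for peer in frontier:
            for neighbor in overlay.get(peer, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    seen.discard(start)
    return seen


def iterative_deepening_search(overlay, content, start, keyword,
                               depths=(1, 2, 4), wanted=1):
    """Search at increasing depths; stop once `wanted` matching peers are found.

    Returns (last depth tried, sorted list of matching peers).
    """
    hits = []
    used_depth = None
    for depth in depths:
        used_depth = depth
        reached = peers_within(overlay, start, depth)
        hits = sorted(p for p in reached if keyword in content.get(p, ()))
        if len(hits) >= wanted:
            break
    return used_depth, hits


# Hypothetical overlay: a simple chain a - b - c - d.
OVERLAY = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
# Hypothetical keywords describing what each peer holds.
CONTENT = {"b": {"xml"}, "c": set(), "d": {"rdf"}}

if __name__ == "__main__":
    print(iterative_deepening_search(OVERLAY, CONTENT, "a", "xml"))
    print(iterative_deepening_search(OVERLAY, CONTENT, "a", "rdf"))
```

A query that a nearby peer can answer stops at the first depth, so the cost of a deep broadcast is paid only when shallow searches fail.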