Archive for October, 2011

UML for Modeling Complex Real-Time Systems

The embedded real-time software systems encountered in applications such
as telecommunications, aerospace, and defense tend to be large and
extremely complex. It is crucial that the software in such systems be designed with a
sound architecture. A good architecture not only simplifies construction of the initial
system, but even more importantly, readily accommodates changes forced by a
steady stream of new requirements. In this paper, we describe a set of constructs that
facilitate the design of software architectures in this domain. The constructs, derived
from field-proven concepts originally defined in the ROOM modeling language, are
specified using the Unified Modeling Language (UML) standard.

Modeling Structure

The structure of a system identifies the entities that are to be modeled and
the relationships between them (e.g., communication relationships, containment
relationships). UML provides two fundamental complementary diagram types for
capturing the logical structure of systems: class diagrams and collaboration diagrams.
Class diagrams capture universal relationships among classes, that is, those relationships
that exist among instances of the classes in all contexts. Collaboration diagrams
capture relationships that exist only within a particular context: a pattern of usage for
a particular purpose that is not inherent in the class itself. Collaboration diagrams
therefore include a distinction between the usage of different instances of the same
class, a distinction captured in the concept of role. In the modeling approach
described here, there is a strong emphasis on using UML collaboration diagrams to
explicitly represent the interconnections between architectural entities. Typically, the
complete specification of the structure of a complex real-time system is obtained
through a combination of class and collaboration diagrams.

Specifically, three principal constructs are used for modeling structure:

· capsules
· ports
· connectors
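
As a rough illustration of how the three constructs relate, the following Python sketch models a capsule as an object that interacts with its surroundings only through ports, and a connector as a link between two ports. The class names, method names, and the sender/receiver example are illustrative assumptions, not definitions taken from UML or ROOM.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Port:
    """A named boundary object through which a capsule sends and receives messages."""
    name: str
    owner: "Capsule"
    peer: Optional["Port"] = None  # set when a connector is attached

    def send(self, message: str) -> None:
        # A message can leave the capsule only through a connected port.
        if self.peer is None:
            raise RuntimeError(f"port {self.name!r} is not connected")
        self.peer.owner.receive(self.peer.name, message)


class Capsule:
    """An architectural entity that communicates only via its ports."""

    def __init__(self, name: str, port_names: List[str]) -> None:
        self.name = name
        self.ports: Dict[str, Port] = {p: Port(p, self) for p in port_names}
        self.log: List[str] = []

    def receive(self, port_name: str, message: str) -> None:
        # Record which port the message arrived on.
        self.log.append(f"{port_name}:{message}")


def connect(a: Port, b: Port) -> None:
    """A connector: a communication relationship between two ports."""
    a.peer, b.peer = b, a


# Hypothetical usage: two capsules joined by one connector.
sender = Capsule("sender", ["out"])
receiver = Capsule("receiver", ["in"])
connect(sender.ports["out"], receiver.ports["in"])
sender.ports["out"].send("hello")
```

In this sketch the capsules never reference each other directly; all coupling is expressed by the connector, which is the property the collaboration diagrams are meant to make explicit.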


Fiber deployment by incumbents will make additional broadband overbuilds less likely

Fiber optic cable deployment by incumbent telephone and cable companies will have a
significant impact on the prospects for last-mile broadband competition. Once a customer is
served by fiber cable, all non-mobile communications services could be provided over the single
fiber pathway: voice, super-high-speed data, and HDTV quality video. Once fiber is put in place
by one provider, the business case for additional high-speed last-mile facilities weakens. This
fact is readily discernible from the efforts of incumbents to block fiber-to-the-home projects that have
been pursued by municipalities. Both incumbent telephone companies and incumbent cable
operators have taken steps to disable the attempts of municipalities to deploy fiber. Thus, fiber
optic cable, either connected directly to the household, or terminated near the home (and using
existing metallic cable distribution to bridge the last few hundred feet), will provide a virtually
unlimited supply of bandwidth to any end-user. Once fiber is deployed, its vast capacity will
undermine the attractiveness of other technologies that cannot deliver the
extremely high bandwidth (e.g., 100 Mbps) that fiber can deliver to end users.

It is simply not reasonable to believe that capital markets will support numerous last-mile
overbuilds, using fiber optics, wireless, or broadband over power line technology, especially if
incumbent telephone and cable companies are well on their way to deploying fiber to,
or close to, the home. Alternative technologies have deployment or operational problems. For
example, broadband over power line (BPL) technology, which has the potential to share existing
electric company power distribution networks, is currently in the trial phase, but problems have
emerged with this technology, especially its generation of external interference, which
affects the radio transmissions of both public safety agencies and ham radio operators. The generation
of radio interference has been an unresolved issue in several BPL trials, and led to the termination
of at least one trial. BPL may offer some promise as an alternative last-mile facility if the
interference problems can be overcome. However, expected transmission speeds from BPL
(2 Mbps to 6 Mbps) are much lower than those available from fiber optics. Furthermore, BPL
will face a market where incumbents have already gained first-mover advantage by deploying
fiber. As was recently noted by one analyst: “By the time it (BPL) really arrives in the market,
terrestrial broadband will be almost fully saturated”.
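
To put the quoted speeds side by side, the short calculation below compares how long one file would take to transfer at the BPL rates (2 Mbps to 6 Mbps) and at the 100 Mbps fiber example cited earlier. The 700 MB file size is a hypothetical figure chosen for illustration, and protocol overhead is ignored.

```python
def transfer_seconds(size_megabytes: float, rate_mbps: float) -> float:
    """Time to move a file at a given line rate (1 byte = 8 bits, no overhead)."""
    return size_megabytes * 8 / rate_mbps


FILE_MB = 700  # hypothetical file size, not a figure from the text

# Line rates taken from the text above.
rates_mbps = {"BPL (low)": 2, "BPL (high)": 6, "fiber": 100}

times = {name: transfer_seconds(FILE_MB, rate) for name, rate in rates_mbps.items()}

for name, seconds in times.items():
    print(f"{name}: {seconds / 60:.1f} minutes")
```

Under these assumptions the fiber transfer finishes in under a minute, while the slower BPL rate needs more than three quarters of an hour.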

Fixed wireless services, such as WiMax service, may be deployed with lower levels of
investment and sunk costs than fiber, but suffer from other limitations, including the requirement
that high-frequency radio waves be utilized to provide the service. Higher frequency radio waves
are more likely to require a direct line of sight between points of transmission. Line-of-sight
wireless networks may be useful for network transport, but they are much more costly to
install as last-mile facilities. The very high frequencies at which WiMax operates, ranging
between 2 GHz and 11 GHz for the non-line-of-sight service and up to 66 GHz for the highest-speed
line-of-sight transmission, indicate that the spectrum is not optimal for last-mile facilities.
Finally, it is also notable that, due to the pending merger of AT&T and BellSouth, the resulting
company will control a significant number of WiMax licenses. Regulators may require the
divestiture of these licenses as a merger condition; if they do not, however, it is difficult to
imagine that the merged company will use the licenses to compete against its own fiber-based
broadband offering.


Integrated Data Management, Retrieval and Visualization System for Earth Science Datasets

This work develops an integrated data management, retrieval and visualization system for earth science
datasets with extensibility, scalability, uniformity, transparency and support for heterogeneous data. An XML-based
metadata mechanism is the foundation of data management in the system. A dynamically
generated query GUI makes it easy and convenient for scientists to access and retrieve
diverse datasets. Scientific visualization toolkits display huge amounts of data graphically to help
researchers better understand the data and gain valuable insights into the datasets under
investigation. The system helps earth scientists use, share and visualize data more efficiently.
Without knowing anything about the physical storage location, content, structure or format of
each dataset instance, and without writing a single line of code, scientists can now query
heterogeneous data easily, and view and understand the retrieved data in analytical and graphical ways.


1. Dynamically Generated Query GUIs

So that users can query the data system without background knowledge or training, the system
provides query GUIs for all of its datasets. The approach is to build a data query system
that dynamically creates a query GUI for each dataset based on the characteristics of that
dataset. These characteristics are described in XML metadata. By using stored metadata to
create the query interfaces, a standardized yet dynamic system is created that allows querying of
assorted datasets. In this way, the system eliminates the need to create custom programs for
different datasets. When new datasets are added to the system, query GUIs for them are
dynamically generated, provided their characteristics have been specified in the metadata.
The system therefore provides an extensible and scalable query GUI framework. Through the
dynamically generated query GUI, scientists can specify search conditions and customize the format
and the resolution type of the result files as needed.
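
The idea of deriving a query form from stored metadata can be sketched as follows. This is a minimal illustration only: the XML element and attribute names (`dataset`, `parameter`, `option`, `type`) and the widget names are assumptions made for the example, not the schema used by the system described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; element and attribute names are assumptions.
METADATA = """
<dataset name="sea_surface_temperature">
  <parameter name="start_date" type="date"/>
  <parameter name="region" type="choice">
    <option>global</option>
    <option>pacific</option>
  </parameter>
  <parameter name="resolution" type="choice">
    <option>daily</option>
    <option>monthly</option>
  </parameter>
</dataset>
"""


def build_query_form(metadata_xml: str) -> dict:
    """Turn a metadata description into a GUI-independent form description."""
    root = ET.fromstring(metadata_xml.strip())
    fields = []
    for param in root.findall("parameter"):
        field = {"name": param.get("name"), "widget": "text"}
        if param.get("type") == "choice":
            # A parameter with enumerated options becomes a drop-down list.
            field["widget"] = "dropdown"
            field["options"] = [opt.text for opt in param.findall("option")]
        elif param.get("type") == "date":
            field["widget"] = "date_picker"
        fields.append(field)
    return {"dataset": root.get("name"), "fields": fields}


form = build_query_form(METADATA)
```

Adding a dataset then means adding another metadata record; the function that builds the form does not change, which is the sense in which such a framework is extensible.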

2. Query Categories

For convenience, users can search for data by category, either of physical data
types or of data sources. Users may also define new categories by adding their definitions to the
metadata, and the system will then dynamically add them to the GUIs.

3. Data Search and Retrieval

Data retrieval is based on the metadata system, that is, on the descriptions of the physical
datasets. After a user submits a query request through the GUI, the query is used to search
the metadata catalogue, which is organized as a hierarchy. Once all data files that might satisfy
the query have been identified by searching the available metadata, the system retrieves the
files holding the actual data and extracts the relevant result data. If the requested resolution type differs
from that of the stored data, the system computes the data at the new resolution. It then organizes
the result data file in the format that the user specified.
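
The retrieval steps above can be sketched as a two-phase lookup: search the metadata first, then touch only the files that might match. The catalogue layout, file names, values, and the monthly-to-yearly averaging rule below are all hypothetical stand-ins chosen for the example.

```python
from statistics import mean

# Hypothetical hierarchical catalogue: category -> dataset -> file records.
CATALOGUE = {
    "temperature": {
        "sst": [
            {"file": "sst_2001.dat", "year": 2001, "resolution": "monthly"},
            {"file": "sst_2002.dat", "year": 2002, "resolution": "monthly"},
        ],
    },
    "precipitation": {
        "rain": [{"file": "rain_2001.dat", "year": 2001, "resolution": "monthly"}],
    },
}

# Stand-in for the physical files: twelve monthly values per file.
FILES = {
    "sst_2001.dat": [float(month) for month in range(1, 13)],
    "sst_2002.dat": [10.0] * 12,
    "rain_2001.dat": [0.0] * 12,
}


def find_files(category, dataset, first_year, last_year):
    """Search only the metadata to find files that may satisfy the query."""
    records = CATALOGUE.get(category, {}).get(dataset, [])
    return [r for r in records if first_year <= r["year"] <= last_year]


def retrieve(category, dataset, first_year, last_year, resolution):
    """Read the matching files and convert resolution where requested."""
    results = {}
    for record in find_files(category, dataset, first_year, last_year):
        values = FILES[record["file"]]
        if resolution == "yearly" and record["resolution"] == "monthly":
            values = [mean(values)]  # recompute at the coarser resolution
        results[record["year"]] = values
    return results


result = retrieve("temperature", "sst", 2001, 2001, "yearly")
```

Only `sst_2001.dat` is read for this query; the other files are excluded by the metadata search alone.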


Integrated Query and Search of Databases, XML, and the Web

The amount of information available online is proliferating at a tremendous rate. At one extreme,
traditional database systems are managing large amounts of structured, well-understood data that
can be queried via declarative languages such as SQL. At the other extreme, millions of unstructured
Web pages are being collected and indexed by search engines for keyword-based search. Recently,
XML, the eXtensible Markup Language, has emerged as a simple, practical way to model and
exchange semi-structured data across the Internet, without the rigid constraints of traditional database
systems.

The work described here aims to unify and integrate query techniques for traditional
databases, search engines, and XML. First, we describe our contributions to the Lore DBMS for
managing semi-structured data, focusing on ways to enhance system usability for effective querying
and searching. Next, we discuss algorithms and indexing techniques that enable effective keyword-based
search over traditional and semi-structured databases. We then describe how we have migrated
and enhanced our research on semi-structured data to support the subtle but important nuances of
XML. Finally, we describe a new platform that enables efficient combined querying over structured
traditional databases and existing Web search engines.

Keyword Search Over Semi-structured and Structured Databases

Keyword-based search is very useful for unstructured documents, and often is the only way to query
such data. Keyword search can also be very useful over more structured data, since it is inherently
simple for users to master and often is sufficient for the task at hand. However, some IR concepts and
algorithms must be reconsidered in a database setting. In particular, proximity search benefits from
a new approach in a database setting. Traditionally, proximity search in IR systems is implemented
using the “near” operator. If we search our document collection for “Harrison Ford” near “Carrie
Fisher”, we are looking for documents where those two names appear “close” to each other, where
closeness is measured by textual proximity. In this sense, proximity search is a relatively simple,
“intra-object” operation: we measure proximity along a single dimension (text) in each document.

Now, suppose that we have fully migrated our movie document collection to XML. Each movie
might begin with a MOVIE tag, followed by nested tags for that movie’s actors, producers, etc.
In this setting, we want to account for “structural proximity” in the database, while textual proximity
may not be relevant. For example, if Harrison Ford and Carrie Fisher both star in the same movie,
then they will both be sub-elements of a specific MOVIE element. In the textual representation,
however, there may be many other actors lexically in between these actors. Similarly, we may find
that the last actor listed for some movie X is textually close to the first actor listed for an adjacent
movie Y, but this doesn’t mean that the two actors are related in any way. Thus, we need to extend
the notion of proximity search to handle the structure inherent in a semi-structured database.
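
The difference between textual and structural proximity can be made concrete with a small sketch. It treats the XML document as an undirected graph of parent-child edges and measures distance as the number of edges on the shortest path. The tag names and the choice of edge count as the distance measure are assumptions made for this example; they are not the algorithms referred to in the text.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Small sample document; the tag names are assumptions for illustration.
DOC = """
<MOVIES>
  <MOVIE><TITLE>Star Wars</TITLE>
    <ACTOR>Harrison Ford</ACTOR><ACTOR>Mark Hamill</ACTOR><ACTOR>Carrie Fisher</ACTOR>
  </MOVIE>
  <MOVIE><TITLE>Blade Runner</TITLE>
    <ACTOR>Rutger Hauer</ACTOR><ACTOR>Harrison Ford</ACTOR>
  </MOVIE>
</MOVIES>
"""


def build_graph(root):
    """Undirected graph with one edge per parent-child relationship."""
    graph = {}
    for parent in root.iter():
        graph.setdefault(parent, [])
        for child in parent:
            graph[parent].append(child)
            graph.setdefault(child, []).append(parent)
    return graph


def find(root, text):
    """All elements whose text content equals the given string."""
    return [e for e in root.iter() if (e.text or "").strip() == text]


def distance(graph, sources, targets):
    """Fewest edges between any source element and any target element."""
    targets = set(targets)
    seen = set(sources)
    queue = deque((s, 0) for s in sources)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


root = ET.fromstring(DOC.strip())
graph = build_graph(root)
same_movie = distance(graph, find(root, "Harrison Ford"), find(root, "Carrie Fisher"))
adjacent_movies = distance(graph, find(root, "Carrie Fisher"), find(root, "Rutger Hauer"))
```

Here Carrie Fisher and Rutger Hauer sit next to each other in the text, yet their structural distance (4) is larger than that between Harrison Ford and Carrie Fisher (2), who share a MOVIE element despite another actor being listed between them.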

The algorithms and techniques for performing proximity search over a graph-structured (semi-structured) database are applicable to a traditional relational or object-oriented database as well. We can (logically) translate a relational database into a graph based on the schema and on primary/foreign key relationships. We can then use our proximity search techniques to measure the distance between database elements based on the graph representation. Viewing an object-oriented database as a graph is of course even simpler. By combining proximity search with traditional indexing techniques for identifying tables or attribute values that contain given keywords, we can provide keyword-based search (and browsing) for traditional databases.
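
A minimal sketch of that translation follows, assuming a hypothetical three-table movie schema. Rows become graph nodes, foreign-key references become edges, and a simple keyword index maps words to the rows that contain them. For brevity, the rows of the join table are modelled as edges rather than as nodes of their own; that is a simplification of this example.

```python
from collections import deque

# Hypothetical tables: primary key -> text attribute.
MOVIES = {1: "Star Wars", 2: "Blade Runner"}
ACTORS = {10: "Harrison Ford", 11: "Carrie Fisher", 12: "Rutger Hauer"}
# Join table of (movie_id, actor_id) foreign-key pairs.
CAST = [(1, 10), (1, 11), (2, 10), (2, 12)]


def build_graph():
    """One node per row; one undirected edge per foreign-key reference."""
    graph = {("movie", m): set() for m in MOVIES}
    graph.update({("actor", a): set() for a in ACTORS})
    for movie_id, actor_id in CAST:
        graph[("movie", movie_id)].add(("actor", actor_id))
        graph[("actor", actor_id)].add(("movie", movie_id))
    return graph


def build_keyword_index():
    """Map each lower-cased word to the rows whose text contains it."""
    index = {}
    for table, rows in (("movie", MOVIES), ("actor", ACTORS)):
        for key, text in rows.items():
            for word in text.lower().split():
                index.setdefault(word, set()).add((table, key))
    return index


def proximity(graph, index, word_a, word_b):
    """Shortest graph distance between rows matching the two keywords."""
    sources = index.get(word_a.lower(), set())
    targets = index.get(word_b.lower(), set())
    seen = set(sources)
    queue = deque((s, 0) for s in sources)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


graph = build_graph()
index = build_keyword_index()
co_stars = proximity(graph, index, "ford", "fisher")
linked_via_ford = proximity(graph, index, "fisher", "hauer")
```

The keyword index plays the role of the traditional indexing technique mentioned above, and the breadth-first search supplies the proximity measure over the graph.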


Ultimate Cluster Models with NAMCS and NHAMCS Public Use Files

Masked sample design variables were included for the first time on NAMCS and NHAMCS public use data files for survey year 2000. These design variables reflected the complex multi-stage sample design of the surveys and were intended for use with software such as SUDAAN that required such data for variance estimation. Following that release, NAMCS and NHAMCS public use files for 1993-1999 were re-released with masked design variables added.

Research was conducted comparing variance estimation for NAMCS and NHAMCS public use file data using different techniques, including SUDAAN’s with-replacement option, SUDAAN’s without-replacement option, generalized variance functions, and SAS PROC SURVEYMEANS. Multi-stage design variables were used to develop two new variables, CSTRATM and CPSUM, which could be used with analysis software employing an ultimate cluster design for estimating variance. The variances produced with these methods were compared with standard errors obtained for in-house files (which contain non-masked design variables), using SUDAAN’s without-replacement (WOR) option. This option takes into account the multiple sampling stages of the surveys.

The use of the masked design variables with the three software applications yielded more accurate standard error estimates than those derived using the generalized variance functions. Standard errors obtained using both full design SUDAAN and the two ultimate cluster designs with masked survey design variables tended to slightly overstate in-house standard errors, on average. This tendency resulted in conservative tests of significance for the data analyzed in the study.

The results support the adoption of the new CSTRATM and CPSUM variables for variance estimation in general, as they were found to yield acceptable results and can be used with a wide variety of software.
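
For readers unfamiliar with the ultimate cluster approach, the sketch below shows the standard with-replacement variance estimator for a weighted total: weighted values are summed within each first-stage cluster (PSU), and the variance is built from the spread of those PSU totals within each stratum, v = sum over strata of n/(n-1) times the sum of squared deviations of the PSU totals from their stratum mean. The record layout and the sample values are hypothetical; only the variable names CSTRATM and CPSUM come from the text, and this is not a reproduction of what SUDAAN or SAS computes internally.

```python
from collections import defaultdict
from math import sqrt


def ultimate_cluster_variance(records):
    """With-replacement variance estimate of a weighted total.

    Each record is (stratum, psu, weight, y), where stratum and psu would
    correspond to CSTRATM and CPSUM on the public use files.
    """
    # Step 1: weighted total for each PSU within each stratum.
    psu_totals = defaultdict(float)
    for stratum, psu, weight, y in records:
        psu_totals[(stratum, psu)] += weight * y

    # Step 2: group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (stratum, _psu), total in psu_totals.items():
        by_stratum[stratum].append(total)

    # Step 3: sum the between-PSU variability across strata.
    variance = 0.0
    for totals in by_stratum.values():
        n = len(totals)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_total = sum(totals) / n
        variance += n / (n - 1) * sum((t - mean_total) ** 2 for t in totals)
    return variance


# Hypothetical records: (CSTRATM, CPSUM, weight, y), with y = 1 if the
# visit has the characteristic being counted.
records = [
    (1, 1, 10.0, 1), (1, 1, 10.0, 0),
    (1, 2, 10.0, 1), (1, 2, 10.0, 1),
    (2, 1, 5.0, 0), (2, 1, 5.0, 1),
    (2, 2, 5.0, 1), (2, 2, 5.0, 1),
]

estimate = sum(weight * y for _s, _p, weight, y in records)
variance = ultimate_cluster_variance(records)
standard_error = sqrt(variance)
```

Because only the first-stage clusters enter the calculation, the estimator needs just a stratum identifier and a PSU identifier per record, which is why a single pair of variables can serve many software packages.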


Complex system reliability modelling with DOOBN

Complex manufacturing processes have to be dynamically modelled and controlled to optimise
diagnosis and maintenance policies. The methodology described here helps to develop Dynamic Object
Oriented Bayesian Networks (DOOBNs) that formalise such complex dynamic models. The goal is to have a general reliability
evaluation of a manufacturing process, from its implementation to its operating phase. The added value of this formalisation
methodology consists in using the a priori knowledge of both the system’s functioning and malfunctioning. Networks are built
on principles of adaptability and integrate uncertainties on the relationships between causes and effects. Thus, the purpose is to
evaluate, in terms of reliability, the impact of several decisions on the maintenance of the system. This methodology has been
tested, in an industrial context, to model the reliability of a water (immersion) heater system.
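
To give a feel for how a dynamic probabilistic model lets maintenance decisions be compared in terms of reliability, the sketch below propagates a belief over a single component's state through successive time slices. It is a deliberately reduced two-state model, not the DOOBN formalisation itself, and every probability in it is a hypothetical value chosen for illustration.

```python
# Hypothetical conditional probabilities P(next state | current state, decision).
TRANSITIONS = {
    "no_maintenance": {
        "ok": {"ok": 0.90, "failed": 0.10},
        "failed": {"ok": 0.00, "failed": 1.00},
    },
    "maintenance": {
        "ok": {"ok": 0.98, "failed": 0.02},
        "failed": {"ok": 0.50, "failed": 0.50},
    },
}


def step(belief, decision):
    """Propagate the state distribution through one time slice."""
    table = TRANSITIONS[decision]
    next_belief = {"ok": 0.0, "failed": 0.0}
    for state, p_state in belief.items():
        for new_state, p_transition in table[state].items():
            next_belief[new_state] += p_state * p_transition
    return next_belief


def reliability(decisions, belief=None):
    """P(component is ok) after each decision in the sequence."""
    belief = dict(belief or {"ok": 1.0, "failed": 0.0})
    history = []
    for decision in decisions:
        belief = step(belief, decision)
        history.append(belief["ok"])
    return history


never = reliability(["no_maintenance"] * 3)
always = reliability(["maintenance"] * 3)
```

Comparing `never` with `always` shows the kind of question such a model answers: how the probability of being operational evolves under different maintenance policies when the true state is only known probabilistically.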

One of the main challenges of the Extended Enterprise is to maintain and optimise, dynamically and throughout their life cycle, the quality of
the services delivered by industrial objects. The purpose is to design
decision-aiding systems that keep the system in operation. Nevertheless, most automated systems
do not provide the means to interpret the information intelligently when large process disturbances have to
be considered. Moreover, decisions may be taken without a perfect perception of the state of the system. This partial
perception argues in favour of using a probabilistic estimate of the system state. Artificial intelligence can be used to
support decision-aiding systems for manufacturing processes.


Collective Knowledge Composition in a Peer-to-Peer Network

Semantic Metadata Extraction and Annotation - A peer’s local knowledge
can be in various formats such as: Web pages (unstructured), text documents
(unstructured), XML (semi-structured), RDF or OWL, etc. In the context of
efficient collective knowledge composition, this data must be in a machine
processable format, such as RDF or OWL. Thus, all data that is not in this format
must be processed and converted (metadata extraction). Once this is completed, the
knowledge will be suitable to be shared with other peers. The Semantic Web
envisions making content machine processable, not just readable or consumable by
human beings (Berners-Lee, Hendler, and Lassila, 2001). This is accomplished
by the use of ontologies which involve agreed terms and their relationships in
different domains. Different peers can agree to use a common ontology to annotate
their content and/or resolve their differences using ontology mapping techniques.
Furthermore, peers’ local knowledge will be represented in a machine processable
format, with the goal of enabling the automatic composition of knowledge.
Ontology-driven extraction of domain-specific semantic metadata has been a
highly researched area. Both semi-automatic (Handschuh, Staab, and Studer, 2002)
and automatic (Hammond, Sheth, and Kochut, 2002) techniques and tools have been
developed, and significant work continues in this area (Vargas-Vera, et al., 2002).

Knowledge Discovery and Composition - One of the approaches for knowledge
discovery is to consider relations in the Semantic Web that are expressed
semantically in languages like RDF(S). Anyanwu and Sheth (2003) have formally
defined particular kinds of relations in the Semantic Web, namely, Semantic
Associations. Discovery and ranking of these kinds of relations have been
addressed in a centralized system (Sheth, et al., 2004; Aleman-Meza, Halaschek,
Arpinar, and Sheth, 2003). However, a P2P approach can be exploited to make the
discovery of knowledge more dynamic, flexible, and scalable. Since different peers
may have knowledge of related entities and relationships, they can be
interconnected in order to provide a solution for a scientific problem and/or to
discover new knowledge by means of composing knowledge of the otherwise
isolated peers.
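
The benefit of composing knowledge from otherwise isolated peers can be sketched with a few RDF-like triples. Each peer holds part of a chain of relations; only after the triples are merged does a path between the two entities of interest exist. The identifiers, predicates, and the use of a shortest path as the notion of association are simplifying assumptions for this example, not the formal definition of Semantic Associations.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object) held by two peers.
PEER_A = [("ex:geneX", "encodes", "ex:proteinY")]
PEER_B = [("ex:proteinY", "associated_with", "ex:diseaseZ")]


def compose(*peers):
    """Merge the peers' triples into one graph, traversable in both directions."""
    graph = {}
    for triples in peers:
        for subject, predicate, obj in triples:
            graph.setdefault(subject, []).append((predicate, obj))
            graph.setdefault(obj, []).append((f"inverse({predicate})", subject))
    return graph


def find_association(graph, start, goal):
    """Breadth-first search for a shortest chain of relations from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(node, predicate, neighbour)]))
    return None


alone = find_association(compose(PEER_A), "ex:geneX", "ex:diseaseZ")
combined = find_association(compose(PEER_A, PEER_B), "ex:geneX", "ex:diseaseZ")
```

With peer A's triples alone no association is found, while the composed graph yields a two-step chain linking the gene to the disease.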

In order to exploit peers’ knowledge, it is necessary to make use of knowledge
query languages. A vast amount of research has been aimed at the development of
query languages and mechanisms for a variety of knowledge representation models.
However, there are additional special considerations to be addressed in distributed
dynamic systems such as P2P.

Peer-to-Peer Networks

Recently there has been a substantial amount of research on P2P networks. For
example, P2P network topology has been an area of much interest. Basic peer
networks include random coupling of peers over a transport network, such as
Gnutella (discussed by Ripeanu, 2001), and centralized
server networks, such as that of the Napster architecture.
These networks suffer from drawbacks such as limited scalability, lack of search
guarantees, and bottlenecks. Yang and Garcia-Molina (2003) discussed super-peer
networks that introduce hierarchy into the network in which super-peers have
additional capabilities and duties in the network that may include indexing the
content of other peers. Queries are broadcast among super-peers, and these
queries are then forwarded to leaf peers. Schlosser, Sintek, Decker and Nejdl
(2002) proposed HyperCup, a network in which a deterministic topology is
maintained and known to all nodes in the network. Therefore, nodes at least
have an idea of what the network beyond their scope looks like. They can use this
globally available information to reach locally optimal decisions while routing and
broadcasting search messages. Content addressable networks (CAN) (Ratnasamy,
Francis, Handley, Karp, and Shenker, 2001) have provided significant
improvements for keyword search. If meta-information on a peer’s content is
available, this information can be used to organize the network in order to route
queries more accurately and for more efficient searching. Similarly, ontologies can
be used to bootstrap the P2P network organization: peers and the content that they
provide can be classified by relating their content to concepts in an ontology or
concept hierarchy. The classification determines, to a certain extent, a peer’s
location in the network. Peers routing queries can use their knowledge of this
scheme to route and broadcast queries efficiently.
Peer network layouts have also combined multiple ideas briefly mentioned
here. In addition, Nejdl et al. (2003) proposed a super-peer based layout for RDF-based
P2P networks. Similar to content addressable networks, super-peers index the
metadata context that the leaf peers have.
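
The super-peer arrangement described above can be sketched in a few lines: leaf peers register their content keywords with a super-peer, queries are flooded among super-peers only, and each super-peer answers from its index with the leaf peers the query should be forwarded to. The class, its methods, and the peer names are hypothetical and do not correspond to any of the cited systems.

```python
class SuperPeer:
    """A super-peer that indexes the content of its leaf peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}        # keyword -> set of leaf peer names
        self.neighbours = []   # other super-peers

    def register(self, leaf_name, keywords):
        """Record which keywords a leaf peer can answer."""
        for keyword in keywords:
            self.index.setdefault(keyword, set()).add(leaf_name)

    def query(self, keyword, seen=None):
        """Flood the query among super-peers; return the leaf peers to contact."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()  # already visited; stops the flood from looping
        seen.add(self.name)
        hits = set(self.index.get(keyword, ()))
        for neighbour in self.neighbours:
            hits |= neighbour.query(keyword, seen)
        return hits


# Hypothetical network: two super-peers, each with one leaf peer.
sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.neighbours.append(sp2)
sp2.neighbours.append(sp1)
sp1.register("leaf_a", ["rdf"])
sp2.register("leaf_b", ["rdf", "owl"])

rdf_peers = sp1.query("rdf")
owl_peers = sp1.query("owl")
```

Leaf peers without matching content never see the query, which is where the saving over plain flooding comes from.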

Efficient searching in P2P networks is very important as well. Typically, a P2P
node broadcasts a search request to its neighboring peers who propagate the request
to their peers and so on. However, this can be dramatically improved. For example,
Yang and Garcia-Molina (2003) have described techniques to increase search
effectiveness. These include iterative deepening, directed Breadth First Search, and
local indices that each node keeps over the data contained within r hops of itself. Ramanathan,
Kalogeraki, and Pruyne (2001) proposed a mechanism in which peers monitor
which other peers frequently respond successfully to their requests for information.
When a peer is known to frequently provide good results, other peers attempt to
move closer to it in the network by creating a new connection with that peer. This
leads to clusters of peers with similar interests, which limits the depth of
searches required to find good results. Nejdl et al. (2003) proposed using the
semantic indices contained in super-peers to forward queries more efficiently. Yu
and Singh (2003) proposed a vector-reputation scheme for query forwarding and
reorganization of the network. Tang, Xu and Dwarkadas (2003) made use of data
semantics in the pSearch project. In order to achieve efficient search, they rely on a
distributed hash table to extend LSI and VSM algorithms for their use in P2P
