Archive for October, 2011

UML for Modeling Complex Real-Time Systems

The embedded real-time software systems encountered in applications such
as telecommunications, aerospace, and defense typically tend to be large and
extremely complex. It is crucial in such systems that the software is designed with a
sound architecture. A good architecture not only simplifies construction of the initial
system, but even more importantly, readily accommodates changes forced by a
steady stream of new requirements. In this paper, we describe a set of constructs that
facilitate the design of software architectures in this domain. The constructs, derived
from field-proven concepts originally defined in the ROOM modeling language, are
specified using the Unified Modeling Language (UML) standard.

Modeling Structure

The structure of a system identifies the entities that are to be modeled and
the relationships between them (e.g., communication relationships, containment
relationships). UML provides two fundamental complementary diagram types for
capturing the logical structure of systems: class diagrams and collaboration diagrams.
Class diagrams capture universal relationships among classes: those relationships
that exist among instances of the classes in all contexts. Collaboration diagrams
capture relationships that exist only within a particular context: a pattern of usage for
a particular purpose that is not inherent in the class itself. Collaboration diagrams
therefore include a distinction between the usage of different instances of the same
class, a distinction captured in the concept of role. In the modeling approach
described here, there is a strong emphasis on using UML collaboration diagrams to
explicitly represent the interconnections between architectural entities. Typically, the
complete specification of the structure of a complex real-time system is obtained
through a combination of class and collaboration diagrams.

Specifically, we define three principal constructs for modeling structure:

· capsules
· ports
· connectors
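
As a rough illustration of how the three constructs relate, the following Python sketch models capsules that own ports, and connectors that join two ports. The class and method names (Capsule, Port, Connector, send, on_message) are assumptions made for this sketch; they are not taken from UML or ROOM.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Port:
    """A named interaction point owned by a capsule (hypothetical model)."""
    name: str
    owner: "Capsule"
    peer: Optional["Port"] = None  # set when a connector is attached

    def send(self, signal: str) -> None:
        # A port only forwards signals; it needs a connector to reach a peer.
        if self.peer is None:
            raise RuntimeError(f"port {self.name!r} is not connected")
        self.peer.owner.on_message(self.peer, signal)


@dataclass
class Capsule:
    """An architectural entity that communicates only through its ports."""
    name: str
    ports: dict = field(default_factory=dict)
    received: list = field(default_factory=list)

    def add_port(self, port_name: str) -> Port:
        port = Port(port_name, self)
        self.ports[port_name] = port
        return port

    def on_message(self, port: Port, signal: str) -> None:
        # Record which port the signal arrived on.
        self.received.append((port.name, signal))


class Connector:
    """A communication relationship between exactly two ports."""

    def __init__(self, a: Port, b: Port) -> None:
        a.peer, b.peer = b, a
        self.ends = (a, b)


sender = Capsule("sender")
receiver = Capsule("receiver")
out_port = sender.add_port("out")
in_port = receiver.add_port("in")
Connector(out_port, in_port)
out_port.send("ping")
print(receiver.received)
```

The point of the sketch is that capsules never call each other directly: all communication goes through a port, and a port can reach another capsule only once a connector has been attached.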


Fiber deployment by incumbents will make additional broadband overbuilds less likely

Fiber optic cable deployment by incumbent telephone and cable companies will have a
significant impact on the prospects for last-mile broadband competition. Once a customer is
served by fiber cable, all non-mobile communications services could be provided over the single
fiber pathway: voice, super-high-speed data, and HDTV quality video. Once fiber is put in place
by one provider, the business case for additional high-speed last-mile facilities weakens. This
fact is readily discernible from the efforts of incumbents to block fiber-to-the-home projects that have
been pursued by municipalities. Both incumbent telephone companies and incumbent cable
operators have taken steps to disable the attempts of municipalities to deploy fiber. Thus, fiber
optic cable, either connected directly to the household, or terminated near the home (and using
existing metallic cable distribution to bridge the last few hundred feet), will provide a virtually
unlimited supply of bandwidth to any end-user. Once fiber is deployed, its vast capacity will
undermine the attractiveness of other technologies which are not capable of delivering the
extremely high bandwidth (e.g., 100 Mbps) which fiber is capable of delivering to end users.

It is simply not reasonable to believe that capital markets will support numerous last-mile
overbuilds, using fiber optics, wireless, or broadband over power line technology, especially if
incumbent telephone and cable companies are well on their way to deploying fiber to,
or close to, the home. Alternative technologies have deployment or operational problems. For
example, broadband over power line (BPL) technology, which has the potential to share existing
electric company power distribution networks, is currently in the trial phase, but problems have
emerged with this technology, especially its generation of external interference, which
affects the radio transmissions of both public safety agencies and ham radio operators. The generation
of radio interference has been an unresolved issue in several BPL trials, and led to the termination
of at least one trial. BPL may offer some promise as an alternative last-mile facility if the
interference problems can be overcome. However, expected transmission speeds from BPL
(2Mbps to 6Mbps) are much lower than those available from fiber optics. Furthermore, BPL
will face a market where incumbents have already gained first-mover advantage by deploying
fiber. As was recently noted by one analyst: “By the time it (BPL) really arrives in the market,
terrestrial broadband will be almost fully saturated”.
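
To put the speed gap in concrete terms, the short Python calculation below compares transfer times at the BPL rates (2 Mbps and 6 Mbps) and the fiber rate (100 Mbps) quoted above. The 1 GB file size and the function name transfer_seconds are assumptions chosen for illustration, and the arithmetic ignores protocol overhead.

```python
def transfer_seconds(size_bytes: float, rate_mbps: float) -> float:
    """Time to move size_bytes over a link of rate_mbps (megabits per second)."""
    bits = size_bytes * 8
    return bits / (rate_mbps * 1_000_000)


FILE_BYTES = 1_000_000_000  # hypothetical 1 GB download

for label, rate in [("BPL low", 2), ("BPL high", 6), ("fiber", 100)]:
    seconds = transfer_seconds(FILE_BYTES, rate)
    print(f"{label:8s} {rate:>3} Mbps: {seconds:7.0f} s")
```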

Fixed wireless services, such as WiMax service, may be deployed with lower levels of
investment and sunk costs than fiber, but suffer from other limitations, including the requirement
that high-frequency radio waves be utilized to provide the service. Higher frequency radio waves
are more likely to require a direct line of sight between points of transmission. Constructing
line-of-sight wireless networks may be useful for network transport, but it is much more costly to
install as a last-mile facility. The very high frequencies in which WiMax operates, ranging
between 2GHz and 11GHz for the non-line-of-sight service, and up to 66GHz for the highest-speed
line-of-sight transmission, indicate that the spectrum is not optimal for last-mile facilities.
Finally, it is also notable that due to the pending merger of AT&T and BellSouth, the resulting
company will control a significant number of WiMax licenses. Regulators may require the
divestiture of these licenses as a merger condition; however, if they do not, it is difficult to
imagine that the licenses will be used by the merged company to compete against its fiber-based
broadband offering.


Integrated Data Management, Retrieval and Visualization System for Earth Science Datasets

This work develops an integrated data management, retrieval and visualization system for earth science
datasets, designed for extensibility, scalability, uniformity, transparency and heterogeneity. An XML-based
metadata mechanism is the foundation of data management in our system. Dynamically
generated query GUIs make it easy and convenient for scientists to access and retrieve
diverse datasets. Scientific visualization toolkits display huge amounts of data graphically to help
researchers better understand the data and gain valuable insights into the datasets under
investigation. This system helps earth scientists use, share and visualize data more efficiently.
Without knowing the physical storage location, content, structure or format of
each dataset instance, and without writing a single line of code, scientists can now query
heterogeneous data easily, and view and understand the retrieved data in analytical and graphical ways.

DATA QUERY AND RETRIEVAL

1. Dynamically Generated Query GUIs

To let users query the data system without background knowledge or training, the system
provides query GUIs for all of its datasets. The approach is to build a data query system
that dynamically creates a query GUI for each dataset based on that dataset's
characteristics. These characteristics are described in XML metadata. By using stored metadata to
create the query interfaces, a standardized yet dynamic system is created that allows querying of
assorted datasets. In this way, the system eliminates the need to create custom programs for
different datasets. When new datasets are added to the system, query GUIs for them are
dynamically generated, provided their characteristics have been specified in metadata.
The system therefore provides an extensible and scalable query GUI framework. Through
the dynamically generated query GUI, scientists can specify search conditions and customize the format
and the resolution type of the result files as needed.
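
The idea of deriving a query form from metadata can be sketched in a few lines of Python. The metadata layout below (a dataset element with parameter and option children) and the function name build_form_spec are assumptions for illustration; the text does not give the system's actual XML schema or GUI toolkit, so the sketch stops at a toolkit-neutral description of the form.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; the real schema is not given in the text.
METADATA_XML = """
<dataset name="sea_surface_temperature">
  <parameter name="start_date" type="date"/>
  <parameter name="end_date" type="date"/>
  <parameter name="resolution" type="choice">
    <option>daily</option>
    <option>monthly</option>
  </parameter>
  <parameter name="format" type="choice">
    <option>binary</option>
    <option>ascii</option>
  </parameter>
</dataset>
"""


def build_form_spec(metadata_xml: str) -> dict:
    """Turn a dataset's metadata into a description of its query form."""
    root = ET.fromstring(metadata_xml)
    fields = []
    for param in root.findall("parameter"):
        kind = param.get("type")
        widget = {"name": param.get("name")}
        if kind == "choice":
            widget["widget"] = "dropdown"
            widget["options"] = [opt.text for opt in param.findall("option")]
        elif kind == "date":
            widget["widget"] = "date_picker"
        else:
            widget["widget"] = "text_box"
        fields.append(widget)
    return {"dataset": root.get("name"), "fields": fields}


form = build_form_spec(METADATA_XML)
for field_spec in form["fields"]:
    print(field_spec)
```

Adding a dataset then means adding a metadata record rather than writing a new form: the same function produces the form description for any record that follows the assumed layout.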

2. Query Categories

For convenience, users can search for data by category of physical data
type or data source. Users may also define new categories by adding their definitions to the
metadata, and the system will then dynamically add them to the GUIs.

3. Data Search and Retrieval

Data retrieval is based on the metadata system, that is, the description of the physical
datasets. After the user submits a query request through the GUI, the query is used to search
the metadata catalogue, which is organized as a hierarchy. Once all data files that possibly satisfy
the query have been identified by searching the available metadata, the system retrieves the
files holding the actual data and extracts the relevant result data. If the requested resolution type
differs from that of the stored data, the system computes the data at the new resolution. It then writes
the result data file in the format that the user specified.
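
The retrieval steps above can be sketched as follows. The catalogue layout (data source, then dataset, then file entries), the field names, and the averaging used to change resolution are all assumptions for illustration; the text does not specify how the catalogue is stored or how resolutions are converted.

```python
# Hypothetical hierarchical catalogue: data source -> dataset -> file entries.
CATALOGUE = {
    "satellite": {
        "sst": [
            {"file": "sst_2001.dat", "start": 2001, "end": 2001},
            {"file": "sst_2002.dat", "start": 2002, "end": 2002},
        ],
    },
    "buoy": {
        "wind": [
            {"file": "wind_2001.dat", "start": 2001, "end": 2001},
        ],
    },
}


def find_files(catalogue, dataset, start, end):
    """Walk the hierarchy and return entries whose time span overlaps the query."""
    hits = []
    for datasets in catalogue.values():
        for entry in datasets.get(dataset, []):
            if entry["start"] <= end and entry["end"] >= start:
                hits.append(entry)
    return hits


def coarsen(values, factor):
    """Average consecutive groups of `factor` samples to get a coarser resolution."""
    groups = [values[i:i + factor] for i in range(0, len(values), factor)]
    return [sum(group) / len(group) for group in groups]


matches = find_files(CATALOGUE, "sst", 2002, 2003)
print([m["file"] for m in matches])
print(coarsen([1.0, 3.0, 5.0, 7.0], 2))
```

Only the metadata is searched in the first step; the data files themselves are opened after the candidate list is known.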


Integrated Query and Search of Databases, XML, and the Web

The amount of information available on-line is proliferating at a tremendous rate. At one extreme,
traditional database systems are managing large amounts of structured, well-understood data that
can be queried via declarative languages such as SQL. At the other extreme, millions of unstructured
Web pages are being collected and indexed by search engines for keyword-based search. Recently,
XML (the eXtensible Markup Language) has emerged as a simple, practical way to model and
exchange semistructured data across the Internet, without the rigid constraints of traditional database
systems.

The work described here moves towards unifying and integrating query techniques for traditional
databases, search engines, and XML. First, we describe our contributions to the Lore DBMS for
managing semistructured data, focusing on ways to enhance system usability for effective querying
and searching. Next, we discuss algorithms and indexing techniques that enable effective keyword-based
search over traditional and semistructured databases. We then describe how we have migrated
and enhanced our research on semistructured data to support the subtle but important nuances of
XML. Finally, we describe a new platform that enables efficient combined querying over structured
traditional databases and existing Web search engines.

Keyword Search Over Semistructured and Structured Databases

Keyword-based search is very useful for unstructured documents, and often is the only way to query
such data. Keyword search also can be very useful over more structured data, since it is inherently
simple for users to master and often is sufficient for the task at hand. However, some IR concepts and
algorithms must be reconsidered in a database setting. In particular, proximity search benefits from
a new approach in a database setting. Traditionally, proximity search in IR systems is implemented
using the “near” operator. If we search our document collection for “Harrison Ford” near “Carrie
Fisher”, we are looking for documents where those two names appear “close” to each other, where
closeness is measured by textual proximity. In this sense, proximity search is a relatively simple,
“intra-object” operation: we measure proximity along a single dimension (text) in each document.

Now, suppose that we have fully migrated our movie document collection to XML. Each movie
might begin with a <MOVIE> tag, followed by nested tags for that movie’s actors, producers, etc.
In this setting, we want to account for “structural proximity” in the database, while textual proximity
may not be relevant. For example, if Harrison Ford and Carrie Fisher both star in the same movie,
then they will both be subelements of a specific MOVIE element. In the textual representation,
however, there may be many other actors lexically in between these actors. Similarly, we may find
that the last actor listed for some movie X is textually close to the first actor listed for an adjacent
movie Y, but this doesn’t mean that the two actors are related in any way. Thus, we need to extend
the notion of proximity search to handle the structure inherent in a semistructured database.
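
A small Python sketch makes the difference between textual and structural proximity concrete. It measures the number of tree edges between two elements through their lowest common ancestor. The document, the ACTOR element name, and the distance measure are assumptions chosen for illustration; they are not the proximity algorithms of the work described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical movie collection; element names are assumed for illustration.
DOC = """
<MOVIES>
  <MOVIE title="X">
    <ACTOR>Harrison Ford</ACTOR>
    <ACTOR>Mark Hamill</ACTOR>
    <ACTOR>Carrie Fisher</ACTOR>
  </MOVIE>
  <MOVIE title="Y">
    <ACTOR>Sean Connery</ACTOR>
  </MOVIE>
</MOVIES>
"""

root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent map once.
parent = {child: node for node in root.iter() for child in node}


def path_to_root(elem):
    """Return the list of elements from elem up to the document root."""
    path = [elem]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def structural_distance(a, b):
    """Number of tree edges between a and b via their lowest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for j, node in enumerate(path_to_root(b)):
        if node in steps_from_a:
            return steps_from_a[node] + j
    raise ValueError("elements are not in the same tree")


def find_actor(name):
    return next(e for e in root.iter("ACTOR") if e.text == name)


ford = find_actor("Harrison Ford")
fisher = find_actor("Carrie Fisher")
connery = find_actor("Sean Connery")
same_movie = structural_distance(ford, fisher)      # both under movie X
across_movies = structural_distance(fisher, connery)  # adjacent in the text
print(same_movie, across_movies)
```

Carrie Fisher and Sean Connery sit next to each other in the text, yet the tree distance between them is larger than the distance between the two actors who share a MOVIE element.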

Moreover, algorithms and techniques for performing proximity search over
a graph-structured (semistructured) database are applicable to a traditional relational or object-oriented
database as well. We can (logically) translate a relational database into a graph based
on the schema and on primary/foreign key relationships. We can then use our proximity search
techniques to measure the distance between database elements based on the graph representation.
Viewing an object-oriented database as a graph is of course even simpler. By combining proximity
search with traditional indexing techniques for identifying tables or attribute values that contain
given keywords, we can provide keyword-based search (and browsing) for traditional databases.
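
The translation from relational tuples to a graph can be sketched as below. Each tuple becomes a node identified by (table, primary key), and each foreign-key reference becomes an edge. The tables, keys, and the use of plain breadth-first search as the distance measure are assumptions for illustration; the text does not spell out the actual proximity search techniques.

```python
from collections import deque

# Hypothetical tuples, identified as (table, primary key).
# Each pair below stands for one foreign-key reference between two tuples.
EDGES = [
    (("movie", 1), ("casting", 10)),
    (("casting", 10), ("actor", 100)),
    (("movie", 1), ("casting", 11)),
    (("casting", 11), ("actor", 101)),
    (("movie", 2), ("casting", 12)),
    (("casting", 12), ("actor", 102)),
]


def build_graph(edges):
    """Build an undirected adjacency map from foreign-key references."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


graph = build_graph(EDGES)
print(distance(graph, ("actor", 100), ("actor", 101)))  # co-stars in movie 1
print(distance(graph, ("actor", 100), ("actor", 102)))  # no shared movie
```

A keyword index would first map the search terms to nodes such as ("actor", 100); the graph distance then ranks how closely the matching tuples are related.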


Ultimate Cluster Models with NAMCS and NHAMCS Public Use Files

Masked sample design variables were included for the first time on NAMCS and NHAMCS public use data files for survey year 2000. These design variables reflected the complex multi-stage sample design of the surveys and were intended for use with software such as SUDAAN that required such data for variance estimation. Following that release, NAMCS and NHAMCS public use files for 1993-1999 were re-released with masked design variables added.

Research was conducted comparing variance estimation for NAMCS and NHAMCS public use file data using different techniques, including SUDAAN’s with-replacement option, SUDAAN’s without-replacement option, generalized variance functions, and SAS PROC SURVEYMEANS. Multi-stage design variables were used to develop two new variables, CSTRATM and CPSUM, which could be used with analysis software employing an ultimate cluster design for estimating variance. The variances produced with these methods were compared with standard errors obtained for in-house files (which contain non-masked design variables), using SUDAAN’s without-replacement (WOR) option. This option takes into account the multiple sampling stages of the surveys.
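
For readers unfamiliar with the ultimate cluster approach, the Python sketch below computes the textbook with-replacement variance estimate for a weighted total: within each stratum, the weighted totals of the first-stage units are compared with their stratum mean, and the squared deviations are scaled by n/(n-1). In the sketch, the stratum and PSU identifiers play the role that CSTRATM and CPSUM play on the public use files. The records are made-up numbers, not survey data, and the function is an illustration rather than the exact computation performed by SUDAAN or SAS.

```python
from collections import defaultdict
from math import sqrt


def ultimate_cluster_variance(records):
    """With-replacement variance estimate for a weighted total.

    records: iterable of (stratum, psu, weight, y) tuples.
    """
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in records:
        psu_totals[stratum][psu] += weight * y

    variance = 0.0
    for stratum, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        mean = sum(totals.values()) / n_h
        squared_dev = sum((t - mean) ** 2 for t in totals.values())
        variance += n_h / (n_h - 1) * squared_dev
    return variance


# Made-up records: (stratum, psu, weight, y), where y = 1 flags a visit of interest.
RECORDS = [
    (1, 1, 10.0, 1), (1, 1, 10.0, 0), (1, 2, 20.0, 1),
    (2, 1, 5.0, 1), (2, 2, 5.0, 1), (2, 2, 5.0, 1),
]

total = sum(w * y for _, _, w, y in RECORDS)
variance = ultimate_cluster_variance(RECORDS)
print(total, variance, sqrt(variance))
```

Only the first-stage units enter the formula, which is why a single stratum variable and a single PSU variable are enough for software that uses this design.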

The use of the masked design variables with the three software applications yielded more accurate standard error estimates than those derived using the generalized variance functions. Standard errors obtained using both full design SUDAAN and the two ultimate cluster designs with masked survey design variables tended to slightly overstate in-house standard errors, on average. This tendency resulted in conservative tests of significance for the data analyzed in the study.

The results support the adoption of the new CSTRATM and CPSUM variables for variance estimation in general, as they were found to yield acceptable results and can be used with a wide variety of software.


Complex system reliability modelling with DOOBN

Complex manufacturing processes have to be dynamically modelled and controlled to optimise
diagnosis and maintenance policies. The methodology described here helps develop Dynamic Object
Oriented Bayesian Networks (DOOBNs) to formalise such complex dynamic models. The goal is a general reliability
evaluation of a manufacturing process, from its implementation to its operating phase. The added value of this formalisation
methodology lies in using a priori knowledge of both the system’s functioning and malfunctioning. Networks are built
on principles of adaptability and integrate uncertainties on the relationships between causes and effects. Thus, the purpose is to
evaluate, in terms of reliability, the impact of several decisions on the maintenance of the system. This methodology has been
tested, in an industrial context, to model the reliability of a water (immersion) heater system.
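
To give a feel for how a dynamic Bayesian model links maintenance decisions to reliability, here is a deliberately tiny Python sketch: one state variable per time slice, whose conditional probability table depends on the decision taken in that slice. The states, the two decisions, and every probability are made-up assumptions; a real DOOBN would contain many interconnected objects, and this is not the model from the heater case study.

```python
STATES = ("ok", "degraded", "failed")

# Hypothetical tables P(state at t+1 | state at t, decision).
CPT = {
    "no_maintenance": {
        "ok":       {"ok": 0.90, "degraded": 0.08, "failed": 0.02},
        "degraded": {"ok": 0.00, "degraded": 0.80, "failed": 0.20},
        "failed":   {"ok": 0.00, "degraded": 0.00, "failed": 1.00},
    },
    "preventive": {
        "ok":       {"ok": 0.95, "degraded": 0.04, "failed": 0.01},
        "degraded": {"ok": 0.60, "degraded": 0.35, "failed": 0.05},
        "failed":   {"ok": 0.00, "degraded": 0.00, "failed": 1.00},
    },
}


def step(belief, decision):
    """Propagate the state distribution across one time slice."""
    table = CPT[decision]
    return {
        nxt: sum(belief[cur] * table[cur][nxt] for cur in STATES)
        for nxt in STATES
    }


def reliability(decisions, belief=None):
    """Probability that the system has not failed after the given decisions."""
    belief = belief or {"ok": 1.0, "degraded": 0.0, "failed": 0.0}
    for decision in decisions:
        belief = step(belief, decision)
    return 1.0 - belief["failed"]


print(reliability(["no_maintenance"] * 2))
print(reliability(["preventive"] * 2))
```

Comparing the two printed values is the simplest form of the question the methodology asks: how does a sequence of maintenance decisions change the estimated reliability?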

One of the main challenges of the Extended Enterprise is to maintain and to optimise the quality of
the services delivered by industrial objects in a dynamic way along their life cycle. The purpose is to design
decision-aiding systems that keep the system in operation. Nevertheless, most automated systems
do not provide the means to interpret information intelligently when large process disturbances have to
be considered. Moreover, decisions may be taken without a perfect perception of the state of the system. This partial
perception argues in favour of using a probabilistic estimate of the system state. Artificial intelligence can
help in decision-aiding systems for manufacturing processes.


Collective Knowledge Composition in a Peer-to-Peer Network

Semantic Metadata Extraction and Annotation - A peer’s local knowledge
can be in various formats such as: Web pages (unstructured), text documents
(unstructured), XML (semi-structured), RDF or OWL, etc. In the context of
efficient collective knowledge composition, this data must be in a machine
processable format, such as RDF or OWL. Thus, all data that is not in this format
must be processed and converted (metadata extraction). Once this is completed, the
knowledge will be suitable to be shared with other peers. The Semantic Web
envisions making content machine processable, not just readable or consumable by
human beings (Berners-Lee, Hendler, and Lassila, 2001). This is accomplished
by the use of ontologies which involve agreed terms and their relationships in
different domains. Different peers can agree to use a common ontology to annotate
their content and/or resolve their differences using ontology mapping techniques.
Furthermore, peers’ local knowledge will be represented in a machine processable
format, with the goal of enabling the automatic composition of knowledge.
Ontology-driven extraction of domain-specific semantic metadata has been a
highly researched area. Both semi-automatic (Handschuh, Staab, and Studer, 2002)
and automatic (Hammond, Sheth, and Kochut, 2002) techniques and tools have been
developed, and significant work continues in this area (Vargas-Vera, et al., 2002).

Knowledge Discovery and Composition - One of the approaches for knowledge
discovery is to consider relations in the Semantic Web that are expressed
semantically in languages like RDF(S). Anyanwu and Sheth (2003) have formally
defined particular kinds of relations in the Semantic Web, namely, Semantic
Associations. Discovery and ranking of these kinds of relations have been
addressed in a centralized system (Sheth, et al., 2004; Aleman-Meza, Halaschek,
Arpinar, and Sheth, 2003). However, a P2P approach can be exploited to make the
discovery of knowledge more dynamic, flexible, and scalable. Since different peers
may have knowledge of related entities and relationships, they can be
interconnected in order to provide a solution for a scientific problem and/or to
discover new knowledge by means of composing knowledge of the otherwise
isolated peers.

In order to exploit peers’ knowledge, it is necessary to make use of knowledge
query languages. A vast amount of research has been aimed at the development of
query languages and mechanisms for a variety of knowledge representation models.
However, there are additional special considerations to be addressed in distributed
dynamic systems such as P2P.

PEER-TO-PEER NETWORKS

Recently there has been a substantial amount of research in P2P networks. For
example, P2P network topology has been an area of much interest. Basic peer
networks include random coupling of peers over a transport network, such as
Gnutella (http://www.gnutella.com) (discussed by Ripeanu, 2001), and centralized
server networks, such as the Napster (http://www.napster.com) architecture.
These networks suffer from drawbacks such as limited scalability, lack of search
guarantees, and bottlenecks. Yang and Garcia-Molina (2003) discussed super-peer
networks that introduce hierarchy into the network in which super-peers have
additional capabilities and duties in the network that may include indexing the
content of other peers. Queries are broadcast among super-peers, and these
queries are then forwarded to leaf peers. Schlosser, Sintek, Decker and Nejdl
(2002) proposed HyperCup, a network in which a deterministic topology is
maintained and known by all nodes in the network. Therefore, nodes at least
have an idea of what the network beyond their scope looks like. They can use this
globally available information to reach locally optimal decisions while routing and
broadcasting search messages. Content addressable networks (CAN) (Ratnasamy,
Francis, Handley, Karp, and Shenker, 2001) have provided significant
improvements for keyword search. If meta-information on a peer’s content is
available, this information can be used to organize the network in order to route
queries more accurately and for more efficient searching. Similarly, ontologies can
be used to bootstrap the P2P network organization: peers and the content that they
provide can be classified by relating their content to concepts in an ontology or
concept hierarchy. The classification determines, to a certain extent, a peer’s
location in the network. Peers routing queries can use their knowledge of this
scheme to route and broadcast queries efficiently.
Peer network layouts have also combined multiple ideas briefly mentioned
here. In addition, Nejdl et al. (2003) proposed a super-peer based layout for RDF-based
P2P networks. Similar to content addressable networks, super-peers index the
metadata context that the leaf peers have.

Efficient searching in P2P networks is very important as well. Typically, a P2P
node broadcasts a search request to its neighboring peers who propagate the request
to their peers and so on. However, this can be dramatically improved. For example,
Yang and Garcia-Molina (2003) have described techniques to increase search
effectiveness. These include iterative deepening, directed Breadth First Search, and
local indices that a node keeps over the data within r hops of itself. Ramanathan,
Kalogeraki, and Pruyne (2001) proposed a mechanism in which peers monitor
which other peers frequently respond successfully to their requests for information.
When a peer is known to frequently provide good results, other peers attempt to
move closer to it in the network by creating a new connection with that peer. This
leads to clusters of peers with similar interests, which makes it possible to limit the depth of
searches required to find good results. Nejdl et al. (2003) proposed using the
semantic indices contained in super-peers to forward queries more efficiently. Yu
and Singh (2003) proposed a vector-reputation scheme for query forwarding and
reorganization of the network. Tang, Xu and Dwarkadas (2003) made use of data
semantics in the pSearch project. In order to achieve efficient search, they rely on a
distributed hash table to extend LSI and VSM algorithms for their use in P2P
networks.
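
The basic broadcast search and the idea behind iterative deepening can be sketched in Python as follows. The overlay topology, the content held by each peer, and the depth schedule are made-up assumptions. The sketch also restarts the broadcast from the origin at every depth, which is a simplification of the cited technique.

```python
from collections import deque

# Hypothetical overlay: peer -> neighbours, and peer -> locally held keys.
NEIGHBOURS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "F"],
    "E": ["C"],
    "F": ["D"],
}
CONTENT = {"A": set(), "B": set(), "C": {"x"}, "D": set(), "E": set(), "F": {"y"}}


def flood(origin, key, ttl):
    """Breadth-first broadcast limited to `ttl` hops; returns peers holding `key`."""
    seen = {origin}
    queue = deque([(origin, 0)])
    hits = []
    while queue:
        peer, hops = queue.popleft()
        if key in CONTENT[peer]:
            hits.append(peer)
        if hops == ttl:
            continue  # hop limit reached: do not forward any further
        for nxt in NEIGHBOURS[peer]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return hits


def iterative_deepening(origin, key, depths=(1, 2, 3)):
    """Retry with growing hop limits; stop at the first depth that finds results."""
    for ttl in depths:
        hits = flood(origin, key, ttl)
        if hits:
            return ttl, hits
    return None, []


print(iterative_deepening("A", "x"))
print(iterative_deepening("A", "y"))
```

A nearby result ends the search after a shallow broadcast, so distant peers are contacted only when the closer ones cannot answer.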


Gaps in CBMIR using Different Methods

Content-Based Medical Image Retrieval (CBMIR) is the application of CBIR technology in the medical field.
When CBMIR technology describes an image’s content, it typically extracts image characteristics such as color, texture,
shape and spatial relations to form the image’s low-level feature vector, which serves as the basis for indexing and matching.
Since there are certain gaps between the description these low-level features give of a medical image and the description a
doctor would give, using the low-level features directly as the retrieval basis often fails to produce satisfactory results. It is
therefore necessary to find some kind of mapping between an image’s low-level features and its high-level semantic
information; the gaps between the two are called semantic gaps.
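
A minimal Python sketch of low-level feature extraction and matching is given below. It uses a normalised grey-level histogram as the feature vector and histogram intersection as the similarity measure. The tiny images, the bin count, and the function names are assumptions for illustration; real CBMIR systems combine many more features.

```python
def grey_histogram(image, bins=4, max_level=256):
    """Normalised grey-level histogram of a 2-D list of pixel values."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            counts[pixel * bins // max_level] += 1
            total += 1
    return [c / total for c in counts]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank(query, database):
    """Return database image names ordered from most to least similar."""
    q = grey_histogram(query)
    scored = [
        (intersection(q, grey_histogram(img)), name)
        for name, img in database.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Made-up 2x2 images with grey levels in 0..255.
QUERY = [[0, 10], [200, 250]]
DATABASE = {
    "dark": [[0, 5], [10, 20]],
    "mixed": [[5, 240], [250, 30]],
}
print(rank(QUERY, DATABASE))
```

The sketch also shows where the semantic gap comes from: two images with the same histogram rank as identical, whatever they depict.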

Although the semantic gap and a second gap, the sensory gap (which describes the loss between the actual structure and
its representation in a (digital) image), might seem more tangible to bridge in the medical domain, there are many other
gaps to fill and limitations to overcome:

A. Color Gaps

In specialized fields, namely in the medical domain, absolute color or grey level features are often of very limited
expressive power unless exact reference points exist, as is the case for computed tomography images.

B. Texture Gaps

Partly due to the imprecise understanding and definition of what exactly visual texture actually is, texture measures have
an even larger variety than color measures. Some of the most common measures for capturing the texture of images are
wavelets and Gabor filters where the Gabor filters do seem to perform better and correspond well to the properties
of the human visual cortex for edge detection.

C. Local and Global Features Gaps

Both color and texture features can be used on a global image level or on a local level, on parts of the image. The
easiest way to use regional features is to use blocks of fixed size and location, the so-called partitioning of the image, for
local feature extraction.
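
Fixed-block partitioning is simple enough to show in a few lines of Python. The sketch below splits an image into blocks of a fixed size and computes one local feature (the mean grey level) per block. The image, the block size, and the choice of feature are assumptions for illustration.

```python
def block_features(image, block_h, block_w):
    """Mean grey level of each fixed-size block, scanned row by row."""
    rows, cols = len(image), len(image[0])
    features = []
    for top in range(0, rows, block_h):
        for left in range(0, cols, block_w):
            block = [
                image[r][c]
                for r in range(top, min(top + block_h, rows))
                for c in range(left, min(left + block_w, cols))
            ]
            features.append(sum(block) / len(block))
    return features


# Made-up 4x4 image with four visually distinct quadrants.
IMAGE = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
]
local = block_features(IMAGE, 2, 2)
global_mean = block_features(IMAGE, 4, 4)
print(local, global_mean)
```

The single global value hides the differences between the quadrants, whereas the four local values keep them.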

D. Segmentation and Shape Features Gaps

Fully automated segmentation of images into objects itself is an unsolved problem. Even in fairly specialized domains,
fully automated segmentation causes many problems and is often not easy to realize. In image retrieval, several systems
attempt to perform an automatic segmentation of the images in the collection for feature extraction. To segment images
effectively across varied image databases, the segmentation process has to be based on the color and
texture properties of the image regions.


Web threat delivery mechanisms

Web threats can be divided into two primary categories, based on delivery method: push and pull. Push-based
threats use spam, phishing, or other fraudulent means to lure a user to a malicious (often spoofed)
Web site, which then collects information and/or injects malware. Push attacks use phishing, DNS
poisoning (or pharming), and other means to appear to originate from a trusted source. Their creators
have researched their target well enough to spoof corporate logos, official Web site copy, and other
convincing evidence to increase the appearance of authenticity.

Precisely targeted push-based threats are often called “spear phishing” to reflect the focus of their data
gathering (“phishing”) attack. Spear phishing typically targets specific individuals and groups for financial
gain. In November 2006, a medical center fell victim to a spear phishing attack. Employees of the medical
center received an email telling them they had been laid off. The email also contained a link that claimed
to take the recipient to a career counseling site. Recipients who followed the link were infected by a
keylogging Trojan.

In other push-based threats, malware authors use social engineering such as enticing email subject lines
that reference holidays, popular personalities, sports, pornography, world events, and other popular topics
to persuade recipients to open the email and follow links to malicious sites or open attachments with
malware that accesses the Web.

Pull-based threats are often referred to as “drive-by” threats, since they can affect any visitor, regardless
of precautions. Pull threat developers infect legitimate Web sites, which unknowingly transmit malware to
visitors or alter search results to take users to malicious sites. Upon loading the page, the user’s browser
passively runs a malware downloader in a hidden HTML frame (IFRAME) without any user interaction.
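
On the defensive side, the hidden-frame pattern described above can be flagged with a simple scan of a page's HTML. The Python sketch below uses the standard library's html.parser to list iframes that are sized or styled to be invisible. The thresholds, the class name HiddenIframeFinder, and the sample page are assumptions for illustration; legitimate pages also use hidden frames, so a match is only a hint for further inspection.

```python
from html.parser import HTMLParser


class HiddenIframeFinder(HTMLParser):
    """Collect the src of iframes that are sized or styled to be invisible."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "iframe":
            return
        a = {k: (v or "") for k, v in attrs}
        style = a.get("style", "").replace(" ", "").lower()
        tiny = a.get("width") in ("0", "1") or a.get("height") in ("0", "1")
        hidden = "display:none" in style or "visibility:hidden" in style
        if tiny or hidden:
            self.suspicious.append(a.get("src", ""))


# Made-up page with one visible and two hidden frames.
PAGE = (
    '<html><body><p>News</p>'
    '<iframe src="http://example.invalid/x" width="0" height="0"></iframe>'
    '<iframe src="/video" width="640" height="360"></iframe>'
    '<IFRAME SRC="http://example.invalid/y" style="display: none"></IFRAME>'
    '</body></html>'
)

finder = HiddenIframeFinder()
finder.feed(PAGE)
print(finder.suspicious)
```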

Both push- and pull-based Web threat variants target infection at a regional or local level (for example, via
local language sites aimed at particular demographics), rather than using the mass infection technique of
many earlier malware approaches. These threats typically take advantage of Internet port 80, which is
almost always open to permit access to the information, communication, and productivity that the Web
affords to employees.

Case Study: “The Italian Job”

On June 15, 2007, a cyber criminal compromised nearly 6,000 Italian Web sites using
three Trojans (software applications that claim to do one thing, but actually contain
malicious code) that identified, stole, and uploaded personal information to a criminal
network. The attack, which became known as “The Italian Job,” affected roughly 15,000
users over six days. While the damage caused by identity theft and fraud could easily
reach millions of dollars, the cyber criminal who created the initial downloader used a
malware kit (MPack v.86) that cost roughly $700 (USD).
