Archive for October, 2011
UML for Modeling Complex Real-Time Systems
Posted by protogenist in Technology Research on October 30, 2011
The embedded real-time software systems encountered in applications such as telecommunications, aerospace, and defense tend to be large and extremely complex. In such systems it is crucial that the software is designed with a sound architecture. A good architecture not only simplifies construction of the initial system but, even more importantly, readily accommodates the changes forced by a steady stream of new requirements. In this paper, we describe a set of constructs that facilitate the design of software architectures in this domain. The constructs, derived from field-proven concepts originally defined in the ROOM modeling language, are specified using the Unified Modeling Language (UML) standard.
Modeling Structure
The structure of a system identifies the entities that are to be modeled and the relationships between them (e.g., communication relationships, containment relationships). UML provides two fundamental, complementary diagram types for capturing the logical structure of systems: class diagrams and collaboration diagrams. Class diagrams capture universal relationships among classes, that is, relationships that exist among instances of the classes in all contexts. Collaboration diagrams capture relationships that exist only within a particular context: a pattern of usage for a particular purpose that is not inherent in the class itself. Collaboration diagrams therefore distinguish between the usages of different instances of the same class, a distinction captured in the concept of role. In the modeling approach described here, there is a strong emphasis on using UML collaboration diagrams to explicitly represent the interconnections between architectural entities. Typically, the complete specification of the structure of a complex real-time system is obtained through a combination of class and collaboration diagrams. Specifically, there are three principal constructs for modeling structure:
· capsules
· ports
· connectors
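To make the three constructs concrete, here is a minimal Python sketch under a simplified reading of them: a capsule is an object with private state that communicates only by exchanging signals through its ports, a connector binds two ports whose protocols (sets of allowed signal names) match, and each capsule processes its queued messages one at a time. The class and method names (Capsule, Port, Connector, run_to_completion) are hypothetical; this is not the API of any UML or ROOM tool, and protocol roles such as conjugated ports are omitted.

```python
from collections import deque


class Port:
    """Boundary object: the only way a capsule exchanges signals with its environment."""

    def __init__(self, owner, name, protocol):
        self.owner = owner
        self.name = name
        self.protocol = frozenset(protocol)  # signal names this port may carry
        self.peer = None                     # set when a Connector binds this port

    def send(self, signal, data=None):
        if signal not in self.protocol:
            raise ValueError(f"signal {signal!r} is not in the protocol of port {self.name!r}")
        if self.peer is None:
            raise RuntimeError(f"port {self.name!r} is not connected")
        # Asynchronous delivery: queue the message at the receiving capsule.
        self.peer.owner.inbox.append((self.peer.name, signal, data))


class Connector:
    """Binds two ports that speak the same protocol."""

    def __init__(self, a, b):
        if a.protocol != b.protocol:
            raise ValueError("connected ports must share a protocol")
        a.peer, b.peer = b, a


class Capsule:
    """Active object with encapsulated state; subclasses override handle()."""

    def __init__(self, name):
        self.name = name
        self.ports = {}
        self.inbox = deque()

    def add_port(self, name, protocol):
        port = Port(self, name, protocol)
        self.ports[name] = port
        return port

    def handle(self, port_name, signal, data):
        pass  # default behaviour: ignore the signal

    def run_to_completion(self):
        # Process queued messages one at a time, in arrival order.
        while self.inbox:
            self.handle(*self.inbox.popleft())


class Controller(Capsule):
    def __init__(self, name):
        super().__init__(name)
        self.log = []

    def handle(self, port_name, signal, data):
        self.log.append((port_name, signal, data))


sensor, controller = Capsule("sensor"), Controller("controller")
Connector(sensor.add_port("out", {"reading"}), controller.add_port("in", {"reading"}))
sensor.ports["out"].send("reading", 42)
controller.run_to_completion()
print(controller.log)  # [('in', 'reading', 42)]
```

Because all traffic goes through ports, the Controller never holds a reference to the sensor capsule itself, so either side could be replaced by any capsule with a compatible port; the Connector object plays the role of the line drawn between two ports in a collaboration diagram.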
Fiber deployment by incumbents will make additional broadband overbuilds less likely
Posted by protogenist in Technology Research on October 28, 2011
Fiber optic cable deployment by incumbent telephone and cable companies will have a significant impact on the prospects for last-mile broadband competition. Once a customer is served by fiber, all non-mobile communications services could be provided over that single fiber pathway: voice, super-high-speed data, and HDTV-quality video. Once one provider has put fiber in place, the business case for additional high-speed last-mile facilities weakens. This is evident from the efforts of incumbents to block fiber-to-the-home projects pursued by municipalities: both incumbent telephone companies and incumbent cable operators have taken steps to thwart municipal attempts to deploy fiber. Fiber optic cable, whether connected directly to the household or terminated near the home (with existing metallic cable distribution bridging the last few hundred feet), will provide a virtually unlimited supply of bandwidth to any end user. Once fiber is deployed, its vast capacity will undermine the attractiveness of other technologies that cannot deliver the extremely high bandwidth (e.g., 100 Mbps) that fiber can deliver to end users. It is simply not reasonable to believe that capital markets will support numerous last-mile overbuilds, whether using fiber optics, wireless, or broadband over power line technology, especially if incumbent telephone and cable companies are well on their way to deploying fiber to, or close to, the home. Alternative technologies also have deployment or operational problems. For example, broadband over power line (BPL) technology, which has the potential to share existing electric company power distribution networks, is currently in the trial phase, but problems have emerged, especially the external interference it generates, which affects the radio transmissions of both public safety agencies and ham radio operators.
The generation of radio interference has been an unresolved issue in several BPL trials and led to the termination of at least one trial. BPL may offer some promise as an alternative last-mile facility if the interference problems can be overcome. However, expected BPL transmission speeds (2 Mbps to 6 Mbps) are much lower than those available from fiber optics. Furthermore, BPL will face a market in which incumbents have already gained a first-mover advantage by deploying fiber. As one analyst recently noted: “By the time it (BPL) really arrives in the market, terrestrial broadband will be almost fully saturated”. Fixed wireless services, such as WiMax, may be deployed with lower investment and sunk costs than fiber, but they suffer from other limitations, including the need to use high-frequency radio waves to provide the service. Higher-frequency radio waves are more likely to require a direct line of sight between points of transmission. Line-of-sight wireless networks may be useful for network transport, but they are much more costly to install as last-mile facilities. The very high frequencies at which WiMax operates, from 2 GHz to 11 GHz for non-line-of-sight service and up to 66 GHz for the highest-speed line-of-sight transmission, indicate that this spectrum is not optimal for last-mile facilities. Finally, it is notable that, due to the pending merger of AT&T and BellSouth, the resulting company will control a significant number of WiMax licenses. Regulators may require the divestiture of these licenses as a merger condition; if they do not, it is difficult to imagine the merged company using the licenses to compete against its own fiber-based broadband offering.
Integrated Data Management, Retrieval and Visualization System for Earth Science Datasets
Posted by protogenist in Technology Research on October 27, 2011
This work develops an integrated data management, retrieval and visualization system for earth science datasets that offers extensibility, scalability, uniformity, transparency and support for heterogeneous data. An XML-based metadata mechanism is the foundation of data management in the system. Dynamically generated query GUIs make it easy and convenient for scientists to access and retrieve diverse datasets. Scientific visualization toolkits display large amounts of data graphically, helping researchers understand the data better and gain valuable insights into the datasets under investigation. The system helps earth scientists use, share and visualize data more efficiently. Without knowing the physical storage location, content, structure or format of each dataset instance, and without writing a single line of code, scientists can now query heterogeneous data easily and view and understand the retrieved data analytically and graphically.
DATA QUERY AND RETRIEVAL
1. Dynamically Generated Query GUIs
To let users query the data system without background knowledge or training, the system provides query GUIs for all of its datasets. The approach is to build a data query system that dynamically creates a query GUI for each dataset based on that dataset's characteristics, which are described in XML metadata. Using the stored metadata to create the query interfaces yields a standardized yet dynamic system that supports querying assorted datasets, eliminating the need to write custom programs for different datasets. When new datasets are added, their query GUIs are generated dynamically as long as their characteristics have been specified in the metadata. The system therefore provides an extensible and scalable query GUI framework.
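A minimal sketch of metadata-driven GUI generation follows. It assumes a made-up XML metadata layout (the post does not give the system's schema): each parameter element declares a name and a type, and the generator maps types to widget descriptions that a GUI toolkit could then render. The element and attribute names, the widget labels, and the functions field_for and build_query_form are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata for one dataset; a real system's schema will differ.
METADATA = """
<dataset name="sea_surface_temperature">
  <parameter name="latitude" type="float" min="-90" max="90"/>
  <parameter name="longitude" type="float" min="-180" max="180"/>
  <parameter name="date" type="date"/>
  <parameter name="sensor" type="choice" values="AVHRR,MODIS"/>
  <output formats="netcdf,ascii" resolutions="daily,monthly"/>
</dataset>
"""


def field_for(param):
    """Map one <parameter> element to a widget description."""
    name, kind = param.get("name"), param.get("type")
    if kind == "float":
        return {"name": name, "widget": "range",
                "min": float(param.get("min")), "max": float(param.get("max"))}
    if kind == "choice":
        return {"name": name, "widget": "dropdown", "options": param.get("values").split(",")}
    return {"name": name, "widget": "text"}  # fallback for dates, strings, ...


def build_query_form(xml_text):
    root = ET.fromstring(xml_text.strip())
    fields = [field_for(p) for p in root.findall("parameter")]
    output = root.find("output")
    if output is not None:
        # Result format and resolution choices also come from the metadata.
        fields.append({"name": "format", "widget": "dropdown",
                       "options": output.get("formats").split(",")})
        fields.append({"name": "resolution", "widget": "dropdown",
                       "options": output.get("resolutions").split(",")})
    return {"dataset": root.get("name"), "fields": fields}


form = build_query_form(METADATA)
for field in form["fields"]:
    print(field["name"], field["widget"])
```

Under this design, adding a dataset means adding a metadata file rather than writing a new form, which is the extensibility argument the section makes.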
Through the dynamically generated query GUIs, scientists can specify search conditions and customize the format and resolution type of the result files as needed.
2. Query Categories
For convenience, users can search for data by category of physical data type or data source. Users may also define new categories by adding their definitions to the metadata; the system then adds them to the GUIs dynamically.
3. Data Search and Retrieval
Data retrieval is based on the metadata system, that is, the description of the physical datasets. After a user submits a query request through the GUI, the query is run against the metadata catalogue, which is organized hierarchically. Once all data files that may satisfy the query have been identified from the available metadata, the system retrieves the files holding the actual data and extracts the result data. If the stored resolution type differs from the one requested, the system computes the data at the requested resolution. It then writes the result data files in the format the user specified.
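The retrieval steps (search a hierarchical metadata catalogue, then convert the data to the requested resolution) can be sketched as below. This assumes a toy catalogue nested as category, then dataset, then file entries, and it uses simple block averaging of a 2-D grid as the resolution conversion; the post does not say which resampling method the system uses, and the catalogue contents, file names, and function names are hypothetical.

```python
# Hypothetical hierarchical catalogue: category -> dataset -> file entries.
CATALOGUE = {
    "ocean": {
        "sst": [{"file": "sst_2000.nc", "year": 2000},
                {"file": "sst_2001.nc", "year": 2001}],
    },
    "atmosphere": {
        "aerosol": [{"file": "aerosol_2001.nc", "year": 2001}],
    },
}


def search(catalogue, category=None, dataset=None, years=None):
    """Walk the hierarchy and return the files whose metadata match the query."""
    hits = []
    for cat, datasets in catalogue.items():
        if category is not None and cat != category:
            continue  # prune the whole sub-tree
        for name, entries in datasets.items():
            if dataset is not None and name != dataset:
                continue
            hits.extend(e["file"] for e in entries if years is None or e["year"] in years)
    return hits


def coarsen(grid, factor):
    """Block-average a 2-D grid (list of rows) by an integer factor."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    return [[sum(grid[r + i][c + j] for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(0, cols, factor)]
            for r in range(0, rows, factor)]


print(search(CATALOGUE, years={2001}))           # ['sst_2001.nc', 'aerosol_2001.nc']
print(coarsen([[1, 2, 3, 4], [5, 6, 7, 8]], 2))  # [[3.5, 5.5]]
```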
Integrated Query and search of database, XML, and the Web
Posted by protogenist in Technology Research on October 26, 2011
The amount of information available on-line is proliferating at a tremendous rate. At one extreme,
traditional database systems are managing large amounts of structured, well-understood data that
can be queried via declarative languages such as SQL. At the other extreme, millions of unstructured
Web pages are being collected and indexed by search engines for keyword-based search. Recently,
XML (the eXtensible Markup Language) has emerged as a simple, practical way to model and
exchange semistructured data across the Internet, without the rigid constraints of traditional database
systems.
This post describes work toward unifying and integrating query techniques for traditional
databases, search engines, and XML. First, we describe our contributions to the Lore DBMS for
managing semistructured data, focusing on ways to enhance system usability for effective querying
and searching. Next, we discuss algorithms and indexing techniques that enable effective keyword-based
search over traditional and semistructured databases. We then describe how we have migrated
and enhanced our research on semistructured data to support the subtle but important nuances of
XML. Finally, we describe a new platform that enables efficient combined querying over structured
traditional databases and existing Web search engines.
Keyword Search Over Semistructured And Structured Databases
Keyword-based search is very useful for unstructured documents, and often is the only way to query
such data. Keyword search also can be very useful over more structured data, since it is inherently
simple for users to master and often is sufficient for the task at hand. However, some information retrieval (IR) concepts and
algorithms must be reconsidered in a database setting. In particular, proximity search benefits from
a new approach in a database setting. Traditionally, proximity search in IR systems is implemented
using the “near” operator. If we search our document collection for “Harrison Ford” near “Carrie
Fisher”, we are looking for documents where those two names appear “close” to each other, where
closeness is measured by textual proximity. In this sense, proximity search is a relatively simple,
“intra-object” operation: we measure proximity along a single dimension (text) in each document.
Now, suppose that we have fully migrated our movie document collection to XML. Each movie
might begin with a MOVIE tag, followed by nested tags for that movie’s actors, producers, etc.
In this setting, we want to account for “structural proximity” in the database, while textual proximity
may not be relevant. For example, if Harrison Ford and Carrie Fisher both star in the same movie,
then they will both be subelements of a specific MOVIE element. In the textual representation,
however, there may be many other actors lexically in between these actors. Similarly, we may find
that the last actor listed for some movie X is textually close to the first actor listed for an adjacent
movie Y, but this does not mean that the two actors are related in any way. Thus, we need to extend
the notion of proximity search to handle the structure inherent in a semistructured database.
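The difference between textual and structural proximity can be illustrated with a small sketch. It assumes that structural distance is the number of parent/child edges between two elements in the XML tree; the Lore work operates on more general graphs, so tree distance is a simplification chosen here. The document, the placeholder names "Actor P" and "Actor Q", and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<MOVIES>
  <MOVIE title="X">
    <ACTOR>Harrison Ford</ACTOR>
    <ACTOR>Actor P</ACTOR>
    <ACTOR>Carrie Fisher</ACTOR>
  </MOVIE>
  <MOVIE title="Y">
    <ACTOR>Actor Q</ACTOR>
  </MOVIE>
</MOVIES>"""


def ancestors(elem, parents):
    """Return [elem, parent, grandparent, ..., root]."""
    chain = [elem]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


def tree_distance(a, b, parents):
    """Number of edges on the path a -> lowest common ancestor -> b."""
    depth_from_b = {e: i for i, e in enumerate(ancestors(b, parents))}
    for i, e in enumerate(ancestors(a, parents)):
        if e in depth_from_b:
            return i + depth_from_b[e]
    raise ValueError("elements are not in the same tree")


root = ET.fromstring(DOC)
parents = {child: parent for parent in root.iter() for child in parent}
actors = list(root.iter("ACTOR"))  # document (textual) order
by_name = {a.text: a for a in actors}


def text_distance(x, y):
    return abs(actors.index(by_name[x]) - actors.index(by_name[y]))


def struct_distance(x, y):
    return tree_distance(by_name[x], by_name[y], parents)


for pair in [("Harrison Ford", "Carrie Fisher"), ("Carrie Fisher", "Actor Q")]:
    print(pair, "text:", text_distance(*pair), "structure:", struct_distance(*pair))
```

Carrie Fisher is textually adjacent to Actor Q (distance 1) but four edges away structurally, while Harrison Ford is two actors away textually yet only two edges away through their shared MOVIE element.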
Our algorithms and techniques for performing proximity search over
a graph-structured (semistructured) database are applicable to a traditional relational or object-oriented
database as well. We can (logically) translate a relational database into a graph
based on the schema and on primary/foreign key relationships. We can then use our proximity search techniques to measure the distance between database elements based on the graph representation. Viewing an object-oriented database as a graph is of course even simpler. By combining proximity search with traditional indexing techniques for identifying tables or attribute values that contain given keywords, we can provide keyword-based search (and browsing) for traditional databases.
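A minimal sketch of that translation follows. It assumes that rows become graph nodes, each foreign-key reference becomes an undirected edge, a simple word-level inverted index maps keywords to rows, and proximity is the breadth-first hop count between the nearest pair of matching rows. The table layout, the max_hops cutoff, and all function names are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict, deque

# Hypothetical schema: actor, movie, and a cast table with two foreign keys.
TABLES = {
    "actor": {1: {"name": "Harrison Ford"}, 2: {"name": "Carrie Fisher"},
              3: {"name": "Kelly McGillis"}},
    "movie": {10: {"title": "Star Wars"}, 11: {"title": "Witness"}},
    "cast": {100: {"movie_id": 10, "actor_id": 1}, 101: {"movie_id": 10, "actor_id": 2},
             102: {"movie_id": 11, "actor_id": 1}, 103: {"movie_id": 11, "actor_id": 3}},
}
FOREIGN_KEYS = [("cast", "movie_id", "movie"), ("cast", "actor_id", "actor")]


def build_graph(tables, foreign_keys):
    """One node per (table, primary key); one undirected edge per foreign-key reference."""
    graph = defaultdict(set)
    for table, column, referenced in foreign_keys:
        for key, row in tables[table].items():
            a, b = (table, key), (referenced, row[column])
            graph[a].add(b)
            graph[b].add(a)
    return graph


def keyword_index(tables):
    """Inverted index from lower-cased words to the rows that contain them."""
    index = defaultdict(set)
    for table, rows in tables.items():
        for key, row in rows.items():
            for value in row.values():
                for word in str(value).lower().split():
                    index[word].add((table, key))
    return index


def proximity(graph, index, kw1, kw2, max_hops=6):
    """Fewest edges between any row matching kw1 and any row matching kw2 (None if too far)."""
    sources, targets = index.get(kw1.lower(), set()), index.get(kw2.lower(), set())
    queue, seen = deque((s, 0) for s in sources), set(sources)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        if dist < max_hops:
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None


graph, index = build_graph(TABLES, FOREIGN_KEYS), keyword_index(TABLES)
print(proximity(graph, index, "ford", "fisher"))      # 4: actor-cast-movie-cast-actor
print(proximity(graph, index, "fisher", "mcgillis"))  # None: 8 hops exceeds max_hops
```

Starting the breadth-first search from every matching row at once (a multi-source search) guarantees that the first target reached is at the minimum hop distance.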
Ultimate Cluster Models with NAMCS and NHAMCS Public Use Files
Posted by protogenist in Technology Research on October 25, 2011
Masked sample design variables were included for the first time on NAMCS and NHAMCS public use data files for survey year 2000. These design variables reflected the complex multi-stage sample design of the surveys and were intended for use with software such as SUDAAN that required such data for variance estimation. Following that release, NAMCS and NHAMCS public use files for 1993-1999 were re-released with masked design variables added.
Research was conducted comparing variance estimation for NAMCS and NHAMCS public use file data using different techniques, including SUDAAN’s with-replacement option, SUDAAN’s without-replacement option, generalized variance functions, and SAS PROC SURVEYMEANS. The multi-stage design variables were used to develop two new variables, CSTRATM and CPSUM, which can be used with analysis software that employs an ultimate cluster design for estimating variance. The variances produced with these methods were compared with standard errors obtained for in-house files (which contain non-masked design variables) using SUDAAN’s without-replacement (WOR) option, which takes into account the multiple sampling stages of the surveys.
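As a hedged illustration of what an ultimate cluster variance estimate computes, the sketch below implements the standard with-replacement estimator for a weighted total. It assumes each record carries a masked stratum (CSTRATM), a masked PSU (CPSUM), a survey weight and an analysis value: weighted values are summed into PSU totals z_hi, and the variance is the sum over strata h of n_h/(n_h - 1) times the squared deviations of the PSU totals from their stratum mean, where n_h is the number of PSUs in stratum h. The record layout, toy numbers and function name are hypothetical, and this does not reproduce the internals of SUDAAN or SAS PROC SURVEYMEANS.

```python
import math
from collections import defaultdict

# Hypothetical records: (CSTRATM, CPSUM, survey weight, analysis value y).
RECORDS = [
    (1, 1, 2.0, 3), (1, 1, 1.0, 4),
    (1, 2, 2.0, 7),
    (2, 1, 3.0, 2), (2, 2, 1.0, 6), (2, 3, 3.0, 3),
]


def ultimate_cluster_total(records):
    """Weighted total and its with-replacement (ultimate cluster) standard error."""
    psu_totals = defaultdict(float)
    for stratum, psu, weight, y in records:
        psu_totals[(stratum, psu)] += weight * y  # PSU codes are nested within strata
    by_stratum = defaultdict(list)
    for (stratum, _psu), z in psu_totals.items():
        by_stratum[stratum].append(z)
    variance = 0.0
    for stratum, z in by_stratum.items():
        n_h = len(z)
        if n_h < 2:
            raise ValueError(f"stratum {stratum} has one PSU; collapse strata first")
        mean = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((z_i - mean) ** 2 for z_i in z)
    return sum(psu_totals.values()), math.sqrt(variance)


total, se = ultimate_cluster_total(RECORDS)
print(total, se)  # 45.0 5.0
```

Only the first-stage (PSU) totals enter the formula, which is why software needs nothing beyond the stratum and PSU codes; later sampling stages are absorbed into the PSU-to-PSU variation. For means or percentages, the same structure is applied to linearized values, a step omitted here.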
The use of the masked design variables with the three software applications yielded more accurate standard error estimates than those derived using the generalized variance functions. Standard errors obtained using both full design SUDAAN and the two ultimate cluster designs with masked survey design variables tended to slightly overstate in-house standard errors, on average. This tendency resulted in conservative tests of significance for the data analyzed in the study.
The results support the adoption of the new CSTRATM and CPSUM variables for variance estimation in general, as they were found to yield acceptable results and can be used with a wide variety of software.
Complex system reliability modelling with DOOBN
Posted by protogenist in Technology Research on October 21, 2011
Complex manufacturing processes have to be modelled and controlled dynamically to optimise diagnosis and maintenance policies. This post describes a methodology for developing Dynamic Object Oriented Bayesian Networks (DOOBNs) to formalise such complex dynamic models. The goal is a general reliability evaluation of a manufacturing process, from its implementation to its operating phase. The added value of this formalisation methodology lies in using a priori knowledge of both the system’s functioning and its malfunctioning. The networks are built on principles of adaptability and integrate uncertainty about the relationships between causes and effects. The purpose is thus to evaluate, in terms of reliability, the impact of various maintenance decisions on the system. The methodology has been tested in an industrial context to model the reliability of a water (immersion) heater system. One of the main challenges of the Extended Enterprise is to maintain and optimise, dynamically and throughout their life cycle, the quality of the services delivered by industrial objects. The aim is to design decision-aiding systems that keep the system in operation. Nevertheless, most automated systems do not provide the means to interpret information intelligently when major process disturbances have to be considered. Moreover, decisions may have to be taken without a perfect perception of the state of the system. This partial perception argues in favour of a probabilistic estimate of the system state. Artificial intelligence can help in decision-aiding systems for manufacturing processes.
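The probabilistic reasoning behind such models can be sketched with the simplest dynamic Bayesian network: one hidden component-state node per time slice (working or failed), a transition model that depends on the maintenance decision, and one noisy alarm node per slice. That structure is a hidden Markov model rather than a full object-oriented DOOBN. All probabilities, the maintenance policies and the function names are hypothetical and are not taken from the water heater study.

```python
# States: index 0 = working, 1 = failed. All numbers are made up for illustration.
NO_MAINTENANCE = [[0.95, 0.05],  # P(next state | working)
                  [0.00, 1.00]]  # a failed heater stays failed
MAINTENANCE = [[1.0, 0.0],       # maintenance restores the working state
               [1.0, 0.0]]
P_ALARM = (0.1, 0.9)             # P(alarm | working), P(alarm | failed)


def predict(belief, transition):
    """One time slice forward: P(s') = sum over s of P(s) * P(s' | s)."""
    return [sum(belief[s] * transition[s][t] for s in range(2)) for t in range(2)]


def update(belief, alarm):
    """Condition the belief on a noisy alarm observation with Bayes' rule."""
    likelihood = [p if alarm else 1.0 - p for p in P_ALARM]
    joint = [b * lk for b, lk in zip(belief, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def reliability(maintain_at, steps, belief=(1.0, 0.0)):
    """P(working) after `steps` slices under a maintenance policy."""
    belief = list(belief)
    for t in range(1, steps + 1):
        belief = predict(belief, MAINTENANCE if maintain_at(t) else NO_MAINTENANCE)
    return belief[0]


def never(t):
    return False


def every_5(t):
    return t % 5 == 0


print(round(reliability(never, 12), 4), round(reliability(every_5, 12), 4))
print(update([0.9, 0.1], alarm=True))
```

Comparing the two reliabilities is the kind of decision evaluation described above: the model scores a maintenance policy by the probability that the heater is still working, and update() shows how an uncertain sensor reading shifts the estimated state when the true state cannot be observed directly.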
Collective Knowledge Composition in a Peer-to-Peer Network
Posted by protogenist in Technology Research on October 20, 2011
Semantic Metadata Extraction and Annotation
A peer’s local knowledge can be in various formats, such as Web pages (unstructured), text documents (unstructured), XML (semi-structured), RDF or OWL. For efficient collective knowledge composition, this data must be in a machine-processable format such as RDF or OWL. Thus, all data not already in such a format must be processed and converted (metadata extraction). Once this is done, the knowledge is suitable for sharing with other peers. The Semantic Web envisions making content machine-processable, not just readable or consumable by human beings (Berners-Lee, Hendler, and Lassila, 2001). This is accomplished through ontologies, which capture agreed-upon terms and their relationships in different domains. Different peers can agree to use a common ontology to annotate their content and/or resolve their differences using ontology mapping techniques. Furthermore, peers’ local knowledge will be represented in a machine-processable format, with the goal of enabling the automatic composition of knowledge. Ontology-driven extraction of domain-specific semantic metadata has been heavily researched. Both semi-automatic (Handschuh, Staab, and Studer, 2002) and automatic (Hammond, Sheth, and Kochut, 2002) techniques and tools have been developed, and significant work continues in this area (Vargas-Vera, et al., 2002).
Knowledge Discovery and Composition
One approach to knowledge discovery is to consider relations in the Semantic Web that are expressed semantically in languages like RDF(S). Anyanwu and Sheth (2003) have formally defined particular kinds of relations in the Semantic Web, namely Semantic Associations. Discovery and ranking of these kinds of relations have been addressed in a centralized system (Sheth, et al., 2004; Aleman-Meza, Halaschek, Arpinar, and Sheth, 2003).
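A heavily simplified sketch of association discovery: treat RDF statements as (subject, predicate, object) triples, ignore edge direction, and use breadth-first search to return the shortest chain of statements linking two resources. Anyanwu and Sheth's formal definitions distinguish several kinds of association, and the cited work also ranks them; neither is reproduced here. The triples, the ex: prefix and find_association are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical RDF statements as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:PaperA", "ex:author", "ex:Alice"),
    ("ex:PaperA", "ex:author", "ex:Bob"),
    ("ex:Bob", "ex:worksFor", "ex:LabX"),
    ("ex:Carol", "ex:worksFor", "ex:LabY"),
]


def find_association(triples, start, end):
    """Shortest chain of triples connecting start and end, ignoring edge direction."""
    neighbours = defaultdict(list)
    for s, p, o in triples:
        neighbours[s].append((o, (s, p, o)))
        neighbours[o].append((s, (s, p, o)))
    previous = {start: None}  # node -> (node we came from, triple used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while previous[node] is not None:
                node, triple = previous[node]
                path.append(triple)
            return path[::-1]
        for nxt, triple in neighbours[node]:
            if nxt not in previous:
                previous[nxt] = (node, triple)
                queue.append(nxt)
    return None  # no association found


for triple in find_association(TRIPLES, "ex:Alice", "ex:LabX"):
    print(triple)
```

The result links Alice to LabX through a co-authored paper and Bob's affiliation, knowledge that no single statement holds; in a P2P setting those statements could live on different peers.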
However, a P2P approach can be exploited to make knowledge discovery more dynamic, flexible, and scalable. Since different peers may have knowledge of related entities and relationships, they can be interconnected to solve a scientific problem and/or to discover new knowledge by composing the knowledge of otherwise isolated peers. To exploit peers’ knowledge, it is necessary to use knowledge query languages. A vast amount of research has been aimed at developing query languages and mechanisms for a variety of knowledge representation models. However, distributed dynamic systems such as P2P networks raise additional special considerations.
PEER-TO-PEER NETWORKS
Recently there has been a substantial amount of research in P2P networks; P2P network topology, for example, has attracted much interest. Basic peer networks include random coupling of peers over a transport network, as in Gnutella (http://www.gnutella.com) (discussed by Ripeanu, 2001), and centralized server networks such as the Napster (http://www.napster.com) architecture. These networks suffer from drawbacks such as poor scalability, lack of search guarantees, and bottlenecks. Yang and Garcia-Molina (2003) discussed super-peer networks, which introduce hierarchy into the network: super-peers have additional capabilities and duties that may include indexing the content of other peers. Queries are broadcast among super-peers and then forwarded to leaf peers. Schlosser, Sintek, Decker and Nejdl (2002) proposed HyperCup, a network in which a deterministic topology is maintained and known to all nodes. Nodes therefore have at least an idea of what the network beyond their scope looks like, and they can use this globally available information to reach locally optimal decisions when routing and broadcasting search messages.
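A toy model of the super-peer idea follows. It assumes each super-peer keeps an index from its leaf peers to the terms they offer, super-peers form their own overlay, and a query is broadcast once to every reachable super-peer, which answers from its index with the leaves the query should be forwarded to. The class and method names and the term sets are hypothetical; real systems such as those cited also handle TTLs, result routing and index maintenance.

```python
class SuperPeer:
    def __init__(self, name):
        self.name = name
        self.index = {}       # leaf peer -> set of terms it offers
        self.neighbours = []  # other super-peers

    def register(self, leaf, terms):
        self.index[leaf] = set(terms)

    def matching_leaves(self, term):
        return {leaf for leaf, terms in self.index.items() if term in terms}


def connect(a, b):
    a.neighbours.append(b)
    b.neighbours.append(a)


def search(entry, term):
    """Broadcast among super-peers (each visited once); return leaves to forward to."""
    seen, pending, hits = {entry.name}, [entry], set()
    while pending:
        peer = pending.pop()
        hits |= peer.matching_leaves(term)
        for other in peer.neighbours:
            if other.name not in seen:
                seen.add(other.name)
                pending.append(other)
    return hits


sp1, sp2, sp3 = SuperPeer("sp1"), SuperPeer("sp2"), SuperPeer("sp3")
connect(sp1, sp2)
connect(sp2, sp3)
sp1.register("leafA", {"rdf", "ontology"})
sp2.register("leafB", {"rdf"})
sp3.register("leafC", {"xml"})
print(sorted(search(sp1, "rdf")))  # ['leafA', 'leafB']
```

Leaf peers are never flooded: only super-peers see the broadcast, and a leaf is contacted only when an index says it holds matching content.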
Content addressable networks (CAN) (Ratnasamy, Francis, Handley, Karp, and Shenker, 2001) have provided significant improvements for keyword search. If meta-information about a peer’s content is available, it can be used to organize the network so that queries are routed more accurately and searches are more efficient. Similarly, ontologies can be used to bootstrap P2P network organization: peers and the content they provide can be classified by relating their content to concepts in an ontology or concept hierarchy. The classification determines, to a certain extent, a peer’s location in the network, and peers routing queries can use their knowledge of this scheme to route and broadcast queries efficiently. Peer network layouts have also combined several of the ideas briefly mentioned here. In addition, Nejdl et al. (2003) proposed a super-peer based layout for RDF-based P2P networks; similar to content addressable networks, super-peers index the metadata context that the leaf peers have. Efficient searching in P2P networks is very important as well. Typically, a P2P node broadcasts a search request to its neighboring peers, who propagate the request to their peers, and so on. However, this can be improved dramatically. For example, Yang and Garcia-Molina (2003) have described techniques to increase search effectiveness, including iterative deepening, directed breadth-first search, and local indices over the data contained within r hops of a node. Ramanathan, Kalogeraki, and Pruyne (2001) proposed a mechanism in which peers monitor which other peers frequently respond successfully to their requests for information. When a peer is known to provide good results frequently, other peers attempt to move closer to it in the network by creating a new connection to it. This leads to clusters of peers with similar interests, which limits the depth of search required to find good results. Nejdl et al. (2003) proposed using the semantic indices held by super-peers to forward queries more efficiently. Yu and Singh (2003) proposed a vector-reputation scheme for query forwarding and reorganization of the network. Tang, Xu and Dwarkadas (2003) made use of data semantics in the pSearch project: to achieve efficient search, they rely on a distributed hash table to extend the latent semantic indexing (LSI) and vector space model (VSM) algorithms for use in P2P networks.
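Iterative deepening, one of the techniques attributed above to Yang and Garcia-Molina, can be sketched as repeated breadth-first floods with growing depth limits that stop as soon as enough results are found. This naive version restarts each flood from scratch and counts one message per neighbour contacted; the published technique avoids resending the query to nodes already reached, and the graph, depth schedule and function names here are hypothetical.

```python
# Hypothetical overlay: a simple chain of five peers A-B-C-D-E.
GRAPH = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}


def bfs_within(graph, source, depth):
    """Peers reachable within `depth` hops and the number of messages sent."""
    frontier, seen, messages = [source], {source}, 0
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for nb in graph[node]:
                messages += 1  # one query message per neighbour contacted
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return seen, messages


def iterative_deepening(graph, source, has_result, depths=(1, 2, 4), wanted=1):
    """Flood with increasing depth limits; stop once `wanted` results are found."""
    total, hits, depth = 0, [], 0
    for depth in depths:
        reached, msgs = bfs_within(graph, source, depth)
        total += msgs
        hits = sorted(n for n in reached if n != source and has_result(n))
        if len(hits) >= wanted:
            break
    return hits, depth, total


print(iterative_deepening(GRAPH, "A", lambda n: n == "B"))  # (['B'], 1, 1)
print(iterative_deepening(GRAPH, "A", lambda n: n == "E"))  # (['E'], 4, 11)
```

When results are nearby, the first shallow flood already satisfies the query and the deeper, more expensive floods are never sent; when they are far away, the repeated shallow rounds are the price paid.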
Gaps in CBMIR using Different Methods
Posted by protogenist in Technology Research on October 19, 2011
Content-Based Medical Image Retrieval (CBMIR) is the application of CBIR technology in the medical field. When describing an image’s content, CBMIR typically extracts characteristics such as color, texture, shape and spatial relations to form a low-level feature vector, which serves as the basis for indexing and matching. Because there are gaps between how these low-level features describe a medical image and how a doctor describes it, using the low-level features directly as the basis for retrieval often fails to give satisfactory results. It is therefore necessary to find a mapping between an image’s low-level features and high-level semantic information; the mismatch between the two is called the semantic gap. Another gap is the sensory gap, which describes the loss between the actual structure and its representation in a (digital) image. Although these two gaps might seem more tangible to bridge in the medical domain, there are many other gaps to fill and limitations to overcome:
A. Color Gaps: In specialized fields, notably the medical domain, absolute color or grey-level features often have very limited expressive power unless exact reference points exist, as is the case for computed tomography images.
B. Texture Gaps: Partly because of the imprecise understanding and definition of what visual texture actually is, texture measures vary even more widely than color measures. Some of the most common measures for capturing image texture are wavelets and Gabor filters; Gabor filters seem to perform better and correspond well to the properties of the human visual cortex for edge detection.
C. Local and Global Features Gaps: Both color and texture features can be used at the global image level or locally, on parts of the image. The easiest way to use regional features is to use blocks of fixed size and location, a so-called partitioning of the image, for local feature extraction.
D. Segmentation and Shape Features Gaps: Fully automated segmentation of images into objects is itself an unsolved problem. Even in fairly specialized domains, fully automated segmentation causes many problems and is often hard to achieve. In image retrieval, several systems attempt to segment the images in the collection automatically for feature extraction. To segment images effectively across varied image databases, the segmentation has to be based on the color and texture properties of the image regions.
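The fixed-block partitioning mentioned under C can be sketched as follows. The sketch assumes grey-level images stored as lists of rows with values 0 to 255, a coarse normalized histogram per block, and L1 distance for matching. The grid size, bin count and function names are hypothetical. The toy example builds two images whose global histograms are identical but whose local features differ, which is the motivation for regional features.

```python
def block_features(image, grid=2, bins=4, levels=256):
    """Concatenate a normalized grey-level histogram for each block of a grid x grid partition.

    Rows and columns left over when the image size is not divisible by `grid` are ignored.
    """
    bh, bw = len(image) // grid, len(image[0]) // grid
    features = []
    for gr in range(grid):
        for gc in range(grid):
            pixels = [image[r][c]
                      for r in range(gr * bh, (gr + 1) * bh)
                      for c in range(gc * bw, (gc + 1) * bw)]
            hist = [0] * bins
            for value in pixels:
                hist[value * bins // levels] += 1
            features.extend(count / len(pixels) for count in hist)
    return features


def l1_distance(f, g):
    return sum(abs(x - y) for x, y in zip(f, g))


def make_image(bright_quadrant):
    """4x4 black image with one bright 2x2 quadrant: 'tl' (top-left) or 'br' (bottom-right)."""
    img = [[0] * 4 for _ in range(4)]
    r0, c0 = (0, 0) if bright_quadrant == "tl" else (2, 2)
    for r in range(r0, r0 + 2):
        for c in range(c0, c0 + 2):
            img[r][c] = 255
    return img


a, b = make_image("tl"), make_image("br")
print(l1_distance(block_features(a, grid=1), block_features(b, grid=1)))  # 0.0 (global)
print(l1_distance(block_features(a, grid=2), block_features(b, grid=2)))  # 4.0 (local)
```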
Web threat delivery mechanisms
Posted by protogenist in Technology Research on October 18, 2011
Web threats can be divided into two primary categories based on delivery method: push and pull. Push-based threats use spam, phishing, or other fraudulent means to lure a user to a malicious (often spoofed) Web site, which then collects information and/or injects malware. Push attacks use phishing, DNS poisoning (or pharming), and other means to appear to originate from a trusted source. Their creators research their targets well enough to spoof corporate logos, official Web site copy, and other convincing evidence to increase the appearance of authenticity. Precisely targeted push-based threats are often called “spear phishing” to reflect the focus of their data-gathering (“phishing”) attack. Spear phishing typically targets specific individuals and groups for financial gain. In November 2006, a medical center fell victim to a spear phishing attack. Employees of the medical center received an email telling them they had been laid off. The email also contained a link that claimed to take the recipient to a career counseling site. Recipients who followed the link were infected by a keylogging Trojan. In other push-based threats, malware authors use social engineering, such as enticing email subject lines that reference holidays, popular personalities, sports, pornography, world events, and other popular topics, to persuade recipients to open the email and follow links to malicious sites or to open attachments containing malware that accesses the Web. Pull-based threats are often referred to as “drive-by” threats, since they can affect any visitor, regardless of precautions. Pull-threat developers infect legitimate Web sites, which then unknowingly transmit malware to visitors, or alter search results to take users to malicious sites. Upon loading the page, the user’s browser passively runs a malware downloader in a hidden HTML frame (IFRAME) without any user interaction.
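The hidden-IFRAME technique described for pull-based threats suggests a simple, and easily evaded, static check: flag iframe elements that are zero or one pixel in size or hidden via the hidden attribute or inline CSS. The sketch below uses Python's standard html.parser module. The heuristics, the class name and the example URLs (under the reserved .example domain) are hypothetical, and real drive-by pages often inject frames through obfuscated script, which this check would not see.

```python
from html.parser import HTMLParser


class HiddenIframeFinder(HTMLParser):
    """Collects the src of <iframe> tags that look deliberately invisible."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":  # html.parser lower-cases tag and attribute names
            return
        a = {name: (value or "") for name, value in attrs}
        style = a.get("style", "").replace(" ", "").lower()
        tiny = a.get("width", "").strip() in {"0", "1"} or a.get("height", "").strip() in {"0", "1"}
        if tiny or "hidden" in a or "display:none" in style or "visibility:hidden" in style:
            self.suspicious.append(a.get("src", ""))


def find_hidden_iframes(html):
    finder = HiddenIframeFinder()
    finder.feed(html)
    finder.close()
    return finder.suspicious


PAGE = """<p>Local news</p>
<iframe src="http://bad.example/a" width="0" height="0"></iframe>
<iframe src="https://video.example/clip" width="560" height="315"></iframe>
<IFRAME SRC="http://bad.example/b" STYLE="display: none"></IFRAME>"""

print(find_hidden_iframes(PAGE))  # ['http://bad.example/a', 'http://bad.example/b']
```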
Both push- and pull-based Web threat variants target infection at a regional or local level (for example, via local-language sites aimed at particular demographics), rather than using the mass-infection technique of many earlier malware approaches. These threats typically take advantage of port 80 (HTTP), which is almost always open to permit the access to information, communication, and productivity that the Web affords employees.
Case Study: “The Italian Job”
On June 15, 2007, a cyber criminal compromised nearly 6,000 Italian Web sites using three Trojans (software applications that claim to do one thing but actually contain malicious code) that identified, stole, and uploaded personal information to a criminal network. The attack, which became known as “The Italian Job,” affected roughly 15,000 users over six days. While the damage caused by identity theft and fraud could easily reach millions of dollars, the cyber criminal who created the initial downloader used a malware kit (MPack v.86) that cost roughly $700 (USD).