Archive for November, 2011

Graphics Processing Units Using the Compute Unified Device Architecture

Graphics Processing Units (GPUs) have mainly been game- and video-centric devices.
Due to the increasing computational requirements of graphics-processing applications, GPUs have become
very powerful parallel processors, and this has attracted research interest in computing outside
the graphics community. Until recently, however, programming GPUs was limited to graphics libraries
such as OpenGL and Direct3D, and for many applications, especially those based on integer arithmetic,
the performance improvement over CPUs was minimal or even negative. The release of NVIDIA’s G80
series and ATI’s HD2000 series GPUs (which implemented the unified shader architecture), along with the
companies’ release of higher-level language support with Compute Unified Device Architecture (CUDA),
Close to Metal (CTM) and the more recent Open Computing Language (OpenCL), has, however, facilitated the
development of massively parallel general-purpose applications for GPUs. These general-purpose GPUs have
become a common target for numerically intensive applications given their ease of programming
(compared to previous-generation GPUs) and their ability to outperform CPUs in data-parallel applications,
commonly by orders of magnitude.

In addition to the common floating point processing capabilities of previous generation GPUs, starting
with the G80 series, NVIDIA’s GPU architecture added support for integer arithmetic, including 32-bit
addition/subtraction and bit-wise operations, scatter/gather memory access and different memory spaces.
Each GPU contains between 10 and 30 streaming multiprocessors (SMs), each equipped with eight scalar
processor (SP) cores, fast 16-way banked on-chip shared memory (16 KB/SM), a multithreaded instruction unit,
a large register file (8192 registers for G80-based GPUs, 16384 for the newer GT200 series), read-only caches for
constant (8 KB/SM) and texture memories (varying between 6 and 8 KB/SM), and two special function units
(for transcendentals).

CUDA is an extension of the C language that employs a new massively parallel programming model, single
instruction, multiple thread (SIMT). SIMT differs from SIMD in that the underlying vector size is hidden and the
programmer is restricted to writing scalar code that is parallel at the thread level. The programmer defines
kernel functions, which are compiled for and executed on the SPs of each SM in parallel: each lightweight
thread executes the same code, operating on different data. A number of threads (at most 512) are grouped
into a thread block, which is scheduled on a single SM and whose threads time-share the SPs. This additional
level of hierarchy allows threads within the same block to communicate using the on-chip shared memory and
to synchronize their execution using barriers. Moreover, multiple thread blocks can be executed simultaneously
on the GPU as part of a grid; a maximum of eight thread blocks can be scheduled per SM, and in order to hide
instruction and memory (among other) latencies, it is important that at least two blocks be scheduled on each SM.
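
The per-SM limits quoted above (eight blocks, a 16 KB shared memory, and a register file of 8192 or 16384 entries) can be combined into a rough estimate of how many thread blocks fit on one SM. The following Python sketch is only an illustration under simplifying assumptions: the function name and parameters are hypothetical, and it ignores hardware details not mentioned in the text, such as allocation granularity and any per-SM cap on resident threads.

```python
def blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block,
                  regs_per_sm=8192, smem_per_sm=16 * 1024, max_blocks=8):
    """Estimate how many thread blocks can be resident on one SM.

    Defaults follow the G80 figures in the text; pass regs_per_sm=16384
    for GT200. Simplified model: no allocation granularity, no limit on
    the number of resident threads per SM.
    """
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")

    limits = [max_blocks]  # hard cap of eight blocks per SM

    # Register-file limit: every thread in a block needs its own registers.
    regs_per_block = threads_per_block * regs_per_thread
    if regs_per_block > 0:
        limits.append(regs_per_sm // regs_per_block)

    # Shared-memory limit: each resident block gets its own allocation.
    if smem_per_block > 0:
        limits.append(smem_per_sm // smem_per_block)

    return min(limits)


if __name__ == "__main__":
    # Hypothetical kernel: 256 threads/block, 16 registers/thread, 4 KB shared memory.
    for name, regs in (("G80", 8192), ("GT200", 16384)):
        n = blocks_per_sm(256, 16, 4096, regs_per_sm=regs)
        verdict = "ok" if n >= 2 else "too few blocks to hide latency"
        print(name, n, verdict)
```

A result below two suggests that the kernel uses too many registers or too much shared memory per block to meet the two-blocks-per-SM guideline.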



Propositions on Independent Recovery Protocols

Considerable effort has been invested in the search for an ideal independent recovery protocol.
The results are mainly negative.

Prop. 1
Independent recovery protocols exist only for single-site failures. There exists no independent recovery
protocol which is resilient to multiple-site failures.

Prop. 2
There exists no nonblocking protocol that is resilient to a network partition if messages are lost when the
partition occurs.

Prop. 3
There exist nonblocking protocols which are resilient to a single network partition if all undeliverable messages
are sent back to the sender.

Prop. 4
There exists no nonblocking protocol which is resilient to multiple partitions.

Thus there is no general solution to this problem.
Practical solution: the largest partition terminates the transaction so that it is not blocked.

Problem: which partition is the largest?

Primary site approach and the majority approach

There are different methods for deciding which partition counts as the largest and may therefore terminate the group-level transaction.

Primary site approach:

A site is designated as the primary site, and the partition containing this primary site is allowed to terminate
the transaction. It is usual to assign the role of primary site to the coordinator. In this case all
transactions within this partition are terminated correctly.

If the primary site differs from the coordinator site, then a 3PC termination protocol should be used to
terminate all transactions of the group containing the primary site.

Majority approach:

Only the group containing the majority of sites can terminate the transaction. The sites in the groups may
vote for aborting or for committing. The majority of sites must agree on the abort or commit before the
transaction terminates.
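
The two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a full termination protocol: the function names are hypothetical, sites are plain strings, and "majority" is assumed to mean a strict majority of all sites in the system, not just of the sites inside the partition.

```python
def primary_site_may_terminate(partition, primary_site):
    """Primary site approach: only the partition holding the primary site may terminate."""
    return primary_site in partition


def majority_may_terminate(partition, total_sites):
    """Majority approach: the partition must hold a strict majority of all sites."""
    return 2 * len(partition) > total_sites


def majority_decision(votes, total_sites):
    """Return 'commit' or 'abort' once a strict majority of all sites agrees.

    votes maps site name -> 'commit' or 'abort'. Returns None when no
    majority exists, in which case the transaction stays blocked.
    """
    commits = sum(1 for v in votes.values() if v == "commit")
    aborts = sum(1 for v in votes.values() if v == "abort")
    if 2 * commits > total_sites:
        return "commit"
    if 2 * aborts > total_sites:
        return "abort"
    return None


if __name__ == "__main__":
    # Five sites split into partitions {A, B, C} and {D, E}; A is the primary site.
    big, small = {"A", "B", "C"}, {"D", "E"}
    print(primary_site_may_terminate(big, "A"), primary_site_may_terminate(small, "A"))
    print(majority_may_terminate(big, 5), majority_may_terminate(small, 5))
    print(majority_decision({"A": "commit", "B": "commit", "C": "commit"}, 5))
```

Because at most one partition can hold a strict majority of all sites, two partitions can never both terminate the same transaction under this rule.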


Coding in Feature Driven Development

The coding process in FDD is not as exciting and challenging as it is in XP (eXtreme Programming). This
is because, by the time coding starts, the features have already been discussed extensively during
Process One, the iteration kick-off meeting, and the design review meeting. Classes and methods are
defined by now, and their purpose is described in the code documentation. Coding often becomes
a mechanical process.

Unlike XP, FDD strongly discourages refactoring. The main argument against
refactoring here is that it takes time and does not bring any value to the customer. The
quality of code is addressed during code review meetings.

FDD encourages strong code ownership. The main idea is that every developer
knows the code they own and better understands the consequences of changes. FDD addresses the
problem of team members leaving from a different angle:

– Sufficient code documentation simplifies understanding somebody else’s code.
– Developers know what other people’s code does, since they reviewed the design.
– Developers will look at each other’s code during code review.


5 Reasons Why People Spam Your Blog

No aspect of the World Wide Web is immune to spam – not even the blogosphere. No matter how strong your anti-spam server is, you may get hit every once in a while. Of course, the type of spam seen on personal blogs differs from the normal spam that you might be used to in that, instead of arriving in your private inbox, these messages are displayed on your blog for the entire world to see. Furthermore, the professional spammers who distribute unsolicited commercial e-mail for a living have different reasons for spamming a personal blog than for sending unwanted junk mail to somebody’s inbox. So bloggers need a good anti-spam solution to protect their blogs.

#1: To advertise a website, product, or service. Perhaps the most generic reason for spamming a blog is for advertisement purposes. Through a blog it is easy to reach thousands of people every single day; this holds true for the owner of the blog as much as the ones who are spamming it.

#2: To get backlinks to their site. Many spammers simply leave a comment with nothing more than their website address, hoping to get as many clicks as possible.

#3: It is cheap when compared to other methods of spam. Even in the world of spam marketing, it takes money to make money – unless you’re spamming blogs, of course.

#4: The process can easily be automated to save time. Unlike some of the other spamming techniques, the entire process of spamming a blog can be automated.

#5: To collect e-mail addresses. Often a user’s e-mail address is listed in their online profile, or even right alongside their post. Spammers collect these addresses in order to send them unsolicited commercial e-mail at a later time.


Dynamic Coordination of Information Management Services for Processing Dynamic Web Content

Dynamic Web content provides us with time-sensitive and continuously changing data. To glean up-to-date information, users need to regularly browse, collect and analyze this Web content. Without proper tool support this information management task is tedious, time-consuming and error-prone, especially when the quantity of the dynamic Web content is large, when many information management services are needed to analyze it, and when the underlying services and network are not completely reliable. This work describes a multi-level, lifecycle (design-time and run-time) coordination mechanism that enables rapid, efficient development and execution of information management applications that are especially useful for processing dynamic Web content. Such a coordination mechanism brings dynamism to coordinating independent, distributed information management services. Dynamic parallelism spawns and merges multiple service execution branches based on available data, and dynamic run-time reconfiguration coordinates service execution to overcome faulty services and bottlenecks. These features enable information management applications to handle content and format changes in Web resources more efficiently, and enable the applications to be evolved and adapted to process dynamic Web content.

The coverage of individual Web sites that provide such dynamic content is often incomplete, since individually they are limited by time and resource constraints. To obtain a complete picture of time-sensitive or wide-ranging topics, people tend to access multiple Web sites. For example, during the terror attacks, since it was such a time-critical situation, no single news site could provide complete information to understand what was happening. People needed to browse different news sites to access different coverage and different opinions, then compile and assemble the information to understand the full situation. In addition, to understand different reactions from different parts of the world, news sources from various countries needed to be accessed. If a Web site is unresponsive due to congestion, people tend to switch to another Web site and come back later. People exhibit other forms of dynamism as well. For example, they will select a different set of information sources based on their topic area and geographic region of interest, and they will mentally filter and analyze the news articles based on the articles’ content, structure and format.

Any information management tool that supports this process of gleaning information from dynamic Web content should help alleviate the tedious and repetitive aspects, but should be flexible enough to allow users to incorporate the dynamic aspects of information analysis. This work describes a dynamic service coordination mechanism that brings dynamism to information management systems for processing dynamic Web content. This coordination mechanism allows users to incrementally develop information management applications at different abstraction levels throughout the design-time and run-time lifecycle, which is essential for processing dynamic Web content efficiently and correctly. This mechanism has been adopted by USC ISI’s GeoWorlds system, and has proven practically effective for developing information management applications that process dynamic Web content.

The characteristics of this class of information management tasks:

 1. The information is time-sensitive and continuously changing
 2. The information needs to be joined together from multiple sources
 3. Multiple complex analysis steps are needed to jointly process the information
 4. The analysis steps need to be reconfigured to adapt to specific tasks
 5. The tasks are repetitive and need to be performed periodically
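
The two run-time behaviours described earlier, spawning parallel branches per available data item and working around faulty services, can be illustrated with a small Python sketch. It is only a toy model and not the GeoWorlds mechanism: services are assumed to be ordinary Python callables, a faulty service is assumed to signal failure by raising an exception, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_fallback(services, item):
    """Try equivalent services in order; skip any that fail (run-time reconfiguration)."""
    for service in services:
        try:
            return service(item)
        except Exception:
            continue  # faulty service: move on to the next alternative
    raise RuntimeError("all services failed for item %r" % (item,))


def process_all(items, services):
    """Spawn one branch per available item and merge the results in input order."""
    workers = min(8, max(1, len(items)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda it: run_with_fallback(services, it), items))


if __name__ == "__main__":
    def flaky_summarizer(text):
        # Stand-in for an unreliable remote analysis service.
        raise ConnectionError("service unavailable")

    def backup_summarizer(text):
        # Stand-in for an alternative service offering the same function.
        return text.upper()

    print(process_all(["story one", "story two"], [flaky_summarizer, backup_summarizer]))
```

The number of branches follows the number of items available at run time, which is the sense in which the parallelism is dynamic.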


Polygon-Assisted JPEG and MPEG Compression of Synthetic Images

Real-time image compression and decompression hardware makes it possible for a high-performance
graphics engine to operate as a rendering server in a networked environment. If the client is a
low-end workstation or set-top box, then the rendering task can be split across the two devices.
We explore one strategy for doing this. For each frame, the server generates a high-quality
rendering and a low-quality rendering, subtracts the two, and sends the difference in compressed
form. The client generates a matching low-quality rendering, adds the decompressed difference image,
and displays the composite. Within this paradigm, there is wide latitude to choose what constitutes
a high-quality versus low-quality rendering. We have experimented with textured versus untextured
surfaces, fine versus coarse tessellation of curved surfaces, Phong versus Gouraud interpolated
shading, and antialiased versus nonantialiased edges. In all cases, our polygon-assisted compression
looks subjectively better for a fixed network bandwidth than compressing and sending the high-quality
rendering. We describe a software simulation that uses JPEG and MPEG-1 compression, and we show results
for a variety of scenes.

We consider an alternative solution that partitions the rendering task between client and server. We use
the server to render those features that cannot be rendered in real time on the client – typically
textures and complex shading. These are compressed using JPEG or MPEG and sent to the client. We use the
client to render those features that compress poorly using JPEG or MPEG – typically edges and smooth
shading. The two renderings are combined in the client for display on its screen. The resulting image is
subjectively better for the same bandwidth than can be obtained using JPEG or MPEG alone. Alternatively,
we can produce an image of comparable quality using less bandwidth.

Client-server relationship

The hardware consists of a high-performance workstation (henceforth called the server), a low-performance
workstation (henceforth called the client), and a network. To produce each frame of synthetic imagery,
these two machines perform the following three steps:

(1) On the server, compute a high-quality and low-quality rendering of the scene using one of the
partitioning strategies described.

(2) Subtract the two renderings, apply lossy compression to the difference image, and send it to the client.

(3) On the client, decompress the difference image, compute a low-quality rendering that matches the
low-quality rendering computed on the server, add the two images, and display the resulting composite image.
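
The three steps above can be sketched in Python on a single row of grayscale pixels. This is a toy model under stated assumptions: uniform quantization stands in for the lossy JPEG or MPEG stage, the low-quality rendering is assumed to be bit-identical on both machines, and all function names are hypothetical.

```python
def quantize(diff, step):
    """Stand-in for lossy compression: keep only the nearest multiple of step."""
    return [round(d / step) for d in diff]


def dequantize(codes, step):
    """Stand-in for decompression of the difference image."""
    return [c * step for c in codes]


def server_encode(high, low, step):
    """Steps 1 and 2: subtract the two renderings and compress the difference."""
    diff = [h - l for h, l in zip(high, low)]
    return quantize(diff, step)


def client_decode(codes, low, step):
    """Step 3: decompress, add the local low-quality rendering, clamp to 0..255."""
    diff = dequantize(codes, step)
    return [min(255, max(0, l + d)) for l, d in zip(low, diff)]


if __name__ == "__main__":
    high = [200, 180, 35, 90]   # high-quality rendering (server only)
    low = [190, 150, 40, 60]    # low-quality rendering (server and client)
    step = 8                    # coarser step = fewer bits, larger error

    codes = server_encode(high, low, step)
    composite = client_decode(codes, low, step)
    print(codes, composite)
```

In this model the per-pixel error of the composite is bounded by half the quantization step, provided the client's low-quality rendering matches the server's exactly. Any mismatch between the two low-quality renderings adds directly to the error.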

Depending on the partitioning strategy, there may be two geometric models describing the scene or one model
with two rendering options. The low-quality model may reside on both machines, or it may be transmitted from
server to client (or client to server) for each frame. If the model resides on both machines, this can be
implemented using display lists or two cooperating copies of the application program. The latter solution is
commonly used in networked visual simulation applications. To provide interactive performance, the server in
such a system would normally be a graphics workstation with hardware accelerated rendering. The client might
be a lower-end hardware-accelerated workstation, or it might be a PC performing rendering in software, or it
might be a set-top box utilizing a combination of software and hardware. Differencing and compression on the
server, and decompression and addition on the client, would most likely be performed in hardware, although
real-time software implementations are also beginning to appear. One important caveat regarding the selection
of client and server is that there are often slight differences in pixel values between equivalent-quality
renderings computed by high-performance and low-performance machines, even if manufactured by the same
vendor. If both renderings are antialiased, these differences are likely to be small.


Streaming Tetrahedral Volume Meshes

In a streaming mesh format, tetrahedra and the vertices they reference are stored in an interleaved fashion.

This makes it possible to start operating on the data immediately without having to first load all the vertices, as is common practice with standard indexed formats. Furthermore, streaming formats provide explicit information about when vertices are referenced for the last time. This makes it possible to complete operations on these vertices and free the corresponding data structures for immediate reuse. The width of a streaming mesh is the maximal number of vertices that need to be in memory simultaneously. Those are vertices that have already streamed in but have not been finalized yet. The width is the lower bound for the amount of memory needed for processing a streaming mesh since any mesh processing application has to store at least that many vertices simply to dereference the mesh.
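
The width defined above can be computed directly from the order of the tetrahedra. The Python sketch below is a minimal illustration with a hypothetical function name. It assumes the most compact interleaving, in which each vertex is streamed in just before the first tetrahedron that references it and is finalized right after the last one.

```python
def streaming_width(tetrahedra):
    """Maximal number of vertices that must be in memory at the same time.

    tetrahedra is a sequence of 4-tuples of vertex indices in file order.
    """
    # First pass: find the last tetrahedron that references each vertex.
    last_use = {}
    for i, tet in enumerate(tetrahedra):
        for v in tet:
            last_use[v] = i

    # Second pass: simulate the stream and track the active vertex set.
    active = set()
    width = 0
    for i, tet in enumerate(tetrahedra):
        active.update(tet)               # vertices stream in on first use
        width = max(width, len(active))
        for v in tet:
            if last_use[v] == i:         # finalized: never referenced again
                active.discard(v)
    return width


if __name__ == "__main__":
    coherent = [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
    incoherent = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 4, 5)]
    print(streaming_width(coherent), streaming_width(incoherent))
```

The second ordering keeps vertices alive across an unrelated tetrahedron, so its width is larger even though both meshes have three tetrahedra.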

The streaming approach to compression relies on the input meshes either being stored or produced in a streaming manner.

The set of example volume meshes that we use to test our compressor, however, does not fulfill these expectations at all.

Not only are these tetrahedral meshes distributed in conventional, non-streaming formats, they also come with absolutely “un-streamable” element orders, as illustrated by the layout diagrams. The horizontal axis represents the tetrahedra (in the order they occur in the file), and the vertical axis represents the vertices (also in the order they occur in the file). The few unclassified data sets that are currently used by the visualization community for performance measurements were created several years ago. Back then, the difficulty of using random-access in-core algorithms for producing larger and larger meshes was overcome simply by employing sufficiently powerful computer equipment. But only when there is enough main memory to hold the entire mesh is it possible to output meshes whose vertices and tetrahedra are ordered so “randomly” in the file.

In the near future we anticipate a new generation of meshing algorithms that produce and output volume mesh data in a more coherent fashion. This is a necessity if algorithms are to scale to increasingly large data sets. An algorithm for tetrahedral mesh refinement, for example, could be designed to sweep over the data set and restrict its operation at any time to the currently active set until it achieves the desired element quality. For a mesh generation algorithm operating in this manner, it is natural to output reasonably coherent meshes in a streaming manner. To stream legacy data stored in non-streaming formats or with highly incoherent layouts, we describe several conversion strategies.
