Archive for June, 2012
Responsibility, Resolution, and Residue of Computer Ethics
Posted by protogenist in Technology Research on June 28, 2012
There are many levels of relativity in value judgments. Some of our values
are relative to our being human. If we were angels or creatures from another
dimension, our core values might be different. And then, of course, different
cultures articulate the core human values differently. And different individuals
within a culture may differ in their assessments of values. Indeed, some
values of one individual may change over time. I have been arguing that
such relativity is compatible with rational discussion of ethical issues and
resolution of at least some ethical disputes. We are, after all, human beings,
not angels or creatures from another dimension. We share core values. This
provides us with a set of standards with which to assess policies even in
situations in which no previous policies exist and with which to assess other
value frameworks when disagreements occur.
Ethical responsibility begins by taking the ethical point of view. We must
respect others and their core values. If we can avoid policies that result in
significant harm to others, that would be a good beginning toward responsible
ethical conduct. Some policies are so obviously harmful that they are
readily rejected by our core-value standards. Selling computer software
which is known to malfunction in a way which is likely to result in death is
an obvious example. Other policies easily meet our standards. Building computer
interfaces which facilitate use by the disabled is a clear example. And
of course, some policies for managing computer technology will be disputed.
However, as I have been emphasizing, some of the ethical policies under
dispute may be subject to further rational discussion and resolution. The
major resolution technique, which I have been emphasizing, is the empirical
investigation of the actual consequences of proposed policies. For instance,
some people might propose a limitation on free speech on the Internet on
the grounds that such freedom would lead to an unstable society or to severe
psychological damage of some citizens. Advocates of free speech might
appeal to its usefulness in transmitting knowledge and its effectiveness
in calling attention to the flaws of government. To some extent these are
empirical claims that can be confirmed or disconfirmed, which in turn
may suggest compromises and modifications of policies.
Another resolution technique is to assume an impartial position when
evaluating policies. Imagine yourself as an outsider not being benefited or
harmed by a policy. Is it a fair policy? Is it a policy which you would advocate
if you were suddenly placed in a position in which you were affected by the
policy? It may be tempting to be the seller of defective software, but nobody
wants to be a buyer of defective software. And finally, analogies are sometimes
useful in resolving disagreements. If a computing professional would
not approve of her stockbroker’s withholding information from her about
the volatility of stock she is considering buying, it would seem by analogy
she should share information with a client about the instability of a computer
program which the client is considering purchasing.
All of these techniques for resolution can help form a consensus about
acceptable policies. But when the resolution techniques have gone as far as
they can, some residue of disagreement may remain. Even in these situations
alternative policies may be available which all parties can accept. But
a residue of ethical difference is not to be feared. Disputes occur in every
human endeavor and yet progress is made. Computer ethics is no different
in this regard. The chief threat to computer ethics is not the possibility that
a residue of disagreements about which policies are best will remain after
debates on the issues are completed, but a failure to debate the ethical issues
of computing technology at all. If we naively regard the issues of computer
ethics as routine or, even worse, as unsolvable, then we are in the greatest
danger of being harmed by computer technology. Responsibility requires
us to adopt the ethical point of view and to engage in ongoing conceptual
analysis and policy formulation and justification with regard to this
ever-evolving technology. Because the computer revolution now engulfs the
entire world, it is crucial that the issues of computer ethics be addressed on
a global level. The global village needs to conduct a global conversation
about the social and ethical impact of computing and what should be
done about it. Fortunately, computing may help us to conduct exactly that
conversation.
C14N Denial of Service
Posted by protogenist in Technology Research on June 26, 2012
Attack surface: Canonicalization
Attack impact: Denial of service
Description: C14N can be an expensive operation, requiring complex processing (Boyer '01), including entity expansion, normalization of whitespace and namespace declarations, and coalescing of adjacent text and CDATA nodes. This requires building a DOM and performing memory- and processor-intensive operations.
Exploit scenario: Attacker replaces the SignedInfo or XML content identified by a Reference with a very large set of XML data containing many namespace declarations, redundant adjacent text nodes, etc., leading to a denial of service condition.
Mitigation: Limit the total size of XML submitted for canonicalization.
Applies to XML Encryption? No
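The mitigation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the limit `MAX_C14N_BYTES`, the exception `XmlTooLargeError`, and the wrapper `canonicalize_bounded` are hypothetical names, and the standard library's `xml.etree.ElementTree.canonicalize` (Python 3.8+) stands in for whatever canonicalizer a real signature-validation stack would use.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical limit; a real deployment would tune this to its own documents.
MAX_C14N_BYTES = 64 * 1024


class XmlTooLargeError(ValueError):
    """Raised when input exceeds the configured canonicalization limit."""


def canonicalize_bounded(xml_text: str, max_bytes: int = MAX_C14N_BYTES) -> str:
    """Canonicalize xml_text only if its UTF-8 size is within max_bytes."""
    size = len(xml_text.encode("utf-8"))
    if size > max_bytes:
        # Reject before any parsing or tree building takes place.
        raise XmlTooLargeError(f"{size} bytes exceeds limit of {max_bytes}")
    return canonicalize(xml_text)
```

The check runs before the parser sees the data, so an oversized SignedInfo or Reference target is refused at a cost proportional to its length rather than to the canonicalization work. A size cap addresses only the scenario described here; it is not a general defense against hostile XML.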
Declustering
Posted by protogenist in Technology Research on June 21, 2012
Declustering a relation involves distributing the tuples of a relation among two or more disk
drives according to some distribution criteria such as applying a hash function to the key attribute
of each tuple. Declustering has its origins in the concept of horizontal partitioning initially
developed as a distribution mechanism for distributed DBMS [RIES78] (see Figure 1). One of the
key reasons for using declustering in a parallel database system is to enable the DBMS software to
exploit the I/O bandwidth by reading and writing multiple disks in parallel. By declustering the tuples
of a relation the task of parallelizing a scan operator becomes trivial. All that is required is to start a
copy of the operator on each processor or disk containing relevant tuples, and to merge their
outputs at the destination. For operations requiring a sequential scan of an entire relation, this
approach can provide the same I/O bandwidth as a RAID-style system [SALE84], [PATI88]
without needing any specialized hardware.
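The scan-and-merge pattern described above might look roughly like the following Python sketch. The names `scan_fragment` and `parallel_scan` are hypothetical, each in-memory list stands in for the fragment stored on one disk, and a thread pool only mimics the structure of the approach; it does not reproduce the I/O parallelism of a real shared-nothing system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def scan_fragment(fragment: Sequence, predicate: Callable) -> List:
    """One copy of the scan operator, run against a single fragment."""
    return [row for row in fragment if predicate(row)]


def parallel_scan(fragments: Sequence[Sequence], predicate: Callable) -> List:
    """Start one scan per fragment and merge the outputs at the destination."""
    workers = max(1, len(fragments))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda frag: scan_fragment(frag, predicate), fragments)
        # Merge step: concatenate the per-fragment results in fragment order.
        return [row for part in partials for row in part]
```

For example, `parallel_scan([[1, 4], [2, 5], [3, 6]], lambda r: r % 2 == 0)` scans three fragments independently and merges the matching rows.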
While tuples can simply be declustered in a round-robin fashion, more interesting
alternatives exist. One is to apply a hashing function to the key attribute of each tuple to distribute
the tuples among the disks. This distribution mechanism allows exact match selection operations
on the partitioning attribute to be directed to a single disk, avoiding the overhead of starting such
queries on multiple disks. On the other hand, range queries on the partitioning attribute must be
sent to all disks over which a relation has been declustered. A hash declustering mechanism is
provided by Arbre, Bubba, Gamma, and Teradata.
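As a rough illustration of round-robin and hash declustering, consider the Python sketch below. Everything in it is an assumption made for the example: rows are hypothetical `(key, payload)` pairs, lists stand in for disks, and CRC32 is an arbitrary choice of stable hash, not the function any of the named systems uses.

```python
import zlib
from typing import List, Sequence, Tuple

Row = Tuple[int, str]  # (key, payload); a hypothetical tuple layout


def hash_disk(key: int, n_disks: int) -> int:
    """Map a key to a disk with a stable hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_disks


def decluster_round_robin(rows: Sequence[Row], n_disks: int) -> List[List[Row]]:
    """Place the i'th row on disk i mod n."""
    disks: List[List[Row]] = [[] for _ in range(n_disks)]
    for i, row in enumerate(rows):
        disks[i % n_disks].append(row)
    return disks


def decluster_hash(rows: Sequence[Row], n_disks: int) -> List[List[Row]]:
    """Place each row on the disk selected by hashing its key."""
    disks: List[List[Row]] = [[] for _ in range(n_disks)]
    for row in rows:
        disks[hash_disk(row[0], n_disks)].append(row)
    return disks


def exact_match(disks: List[List[Row]], key: int) -> List[Row]:
    """Only the one disk that the hash selects is searched."""
    target = disks[hash_disk(key, len(disks))]
    return [row for row in target if row[0] == key]


def range_query(disks: List[List[Row]], low: int, high: int) -> List[Row]:
    """Hashing scatters neighbouring keys, so every disk must be scanned."""
    return [row for disk in disks for row in disk if low <= row[0] <= high]
```

The contrast between `exact_match` and `range_query` mirrors the trade-off in the text: the first touches a single disk, the second has no choice but to visit all of them.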
Figure 1: The three basic declustering schemes. Range declustering maps contiguous fragments of a
table to various disks. Round-robin declustering maps the i’th record to disk i mod n. Hashed
declustering maps each record to a disk location based on some hash function. Each of these schemes
spreads data among a collection of disks, allowing parallel disk access and parallel processing.
An alternative declustering strategy is to associate a distinct range of partitioning attribute
values with each disk by, for example, dividing the range of possible values into N units, one for
each of the N processors in the system (range partitioning). The advantage of range declustering is
that it can isolate the execution of both range and exact-match selection operations to the minimal
number of processors. Another advantage of range partitioning is that it can be used to deal with
non-uniformly distributed partitioning attribute values. A range-partitioning mechanism is provided
by Arbre, Bubba, Gamma, and Tandem.
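A minimal sketch of range partitioning, assuming integer keys and a sorted list of split points, is given below. The names `BOUNDARIES`, `range_disk`, `disks_for_range`, and `equal_depth_boundaries` are hypothetical, and the equal-depth heuristic is just one simple way to choose split points for skewed data, not the method of any system named above.

```python
from bisect import bisect_left
from typing import List, Sequence

# Hypothetical split points: disk 0 holds keys <= 100, disk 1 holds 101..200,
# disk 2 holds 201..300, and disk 3 holds everything above 300.
BOUNDARIES = [100, 200, 300]


def range_disk(key: int, boundaries: Sequence[int]) -> int:
    """Return the index of the disk whose key range contains key."""
    return bisect_left(boundaries, key)


def disks_for_range(low: int, high: int, boundaries: Sequence[int]) -> List[int]:
    """Return the minimal set of disks a query on [low, high] must visit."""
    first = range_disk(low, boundaries)
    last = range_disk(high, boundaries)
    return list(range(first, last + 1))


def equal_depth_boundaries(sample_keys: Sequence[int], n_disks: int) -> List[int]:
    """Pick split points so each disk gets roughly the same number of sample keys.

    Assumes the sample holds at least n_disks keys.
    """
    ordered = sorted(sample_keys)
    return [ordered[(i * len(ordered)) // n_disks - 1] for i in range(1, n_disks)]
```

With the split points above, `disks_for_range(150, 250, BOUNDARIES)` visits only disks 1 and 2, and an exact match such as `disks_for_range(42, 42, BOUNDARIES)` visits only disk 0. Choosing the split points from a sample of the data, as `equal_depth_boundaries` does, is how range partitioning can compensate for non-uniformly distributed values.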
While declustering is a simple concept that is easy to implement, it raises a number of new
physical database design issues. In addition to selecting a declustering strategy for each relation,
the number of disks over which a relation should be declustered must also be decided. While
Gamma declusters all relations across all disk drives (primarily to simplify the implementation), the
Bubba, Tandem, and Teradata systems allow a subset of the disks to be used. In general,
increasing the degree of declustering reduces the response time for an individual query and
(generally) increases the overall throughput of the system. For sequential scan queries, the
response time decreases because more processors and disks are used to execute the query. For
indexed selections on the partitioning attribute, the response time improves because fewer tuples
are stored at each node and hence the size of the index that must be searched decreases. However,
there is a point beyond which further declustering actually increases the response time of a query.
This point occurs when the cost of starting a query on a node becomes a significant fraction of the
actual execution time [COPE88, DEWI88, GHAN90a]. In general, full declustering is not always
a good idea, especially in very large configurations. Bubba [COPE88] refines the concept of
range-partitioning by considering the heat of its tuples when declustering a relation; the goal
being to balance the frequency with which each disk is accessed rather than the actual number of
tuples on each disk. In [COPE88] the effect of the degree of the declustering on the multiuser
throughput of Bubba is studied. In [GHAN90a] the impact of the alternative partitioning strategies
on the multiuser throughput of Gamma is evaluated.
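The trade-off between scan parallelism and startup cost can be made concrete with a toy model. The model below is an assumption for illustration only and is not taken from the cited studies: it supposes that starting a query costs a fixed `startup_seconds` per disk, paid one disk after another, and that the scan work of `scan_seconds` divides evenly across the disks.

```python
def response_time(n_disks: int, scan_seconds: float, startup_seconds: float) -> float:
    """Toy model: sequential per-disk startup plus an evenly divided scan."""
    return startup_seconds * n_disks + scan_seconds / n_disks


def best_degree(max_disks: int, scan_seconds: float, startup_seconds: float) -> int:
    """Return the degree of declustering with the lowest modeled response time."""
    return min(
        range(1, max_disks + 1),
        key=lambda n: response_time(n, scan_seconds, startup_seconds),
    )
```

Under this model a 100-second scan with 0.25 seconds of startup per disk is fastest at 20 disks, and spreading it over 40 disks makes it slower again. In the model the turning point falls where the total startup cost equals the per-disk scan time, which matches the qualitative claim that further declustering stops paying off once startup becomes a significant fraction of execution time.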
The XML Data Type
Posted by protogenist in Application Development on June 12, 2012
At the heart of DB2’s native XML support is the XML data type. XML is now a first-class data type
in DB2, just like any other SQL type. The XML data type can be used in a “create table” statement
to define one or more columns of type XML (Figure 1). Because the XML type has the same status as any
other type, tables can contain any combination of XML columns and relational columns. An XML-only
application may define tables that contain XML columns only. A column of type XML can hold one
well-formed XML document for every row of the table. The NULL value is used to indicate the
absence of an XML document. Though every XML document is logically associated with a row of a
table, XML and relational columns are stored differently. Relational and XML data are stored in
different formats that match their respective data models. The relational columns are stored in
traditional row structures while the XML data is stored in hierarchical structures. The two are
closely linked for efficient cross-access.
An XML schema is not required in order to define an XML column or to insert or query XML data. An
XML column can hold schema-less documents as well as documents for many different or evolving XML
schemas. Schema validation is optional on a per-document basis. Thus, the association between
schemas and documents is per document and not per column, which provides maximum flexibility.
Unlike a Varchar or a CLOB type, the XML type has no length associated with it. The XML storage
and processing architecture imposes no limit on the size of an XML document. Currently, only the
client-server communication protocol limits XML bind-in and bind-out to 2GB per document. This
limit is acceptable for all but a very few XML applications.
Figure 1: Table with a column of type “XML”
Values of type XML are processed in an internal representation that is not a string and not
directly comparable to strings. The XMLSERIALIZE function can be used to convert an XML value into
a string value which represents the same XML document. Similarly, the XMLPARSE function can be
used to convert a string value which represents an XML document into the corresponding XML value.
The XML type can be used not only as a column type but also as a data type for host variables in
languages such as C, Java, and COBOL. The XML type is also allowed for parameters and variables in
SQL stored procedures, user-defined functions (UDFs), and external stored procedures written in C
and Java. This is important for flexible application development.
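The distinction between an XML value and its string form can be illustrated by analogy in Python. This is not DB2 code: XMLPARSE and XMLSERIALIZE are SQL functions, while `xml_parse` and `xml_serialize` below are hypothetical stand-ins built on the standard library's `xml.etree.ElementTree`, used only to show that a parsed value is not a string and that differently written strings can represent the same document.

```python
import xml.etree.ElementTree as ET


def xml_parse(text: str) -> ET.Element:
    """Analogue of XMLPARSE: turn a string into a parsed (non-string) value."""
    return ET.fromstring(text)


def xml_serialize(value: ET.Element) -> str:
    """Analogue of XMLSERIALIZE: turn a parsed value back into a string."""
    return ET.tostring(value, encoding="unicode")


# Two strings that differ in quoting and spacing but describe the same document.
doc_a = '<order id="1"><item>pen</item></order>'
doc_b = "<order   id='1'><item>pen</item></order>"

parsed_a = xml_parse(doc_a)
parsed_b = xml_parse(doc_b)
```

Here `doc_a` and `doc_b` are unequal as strings, yet `xml_serialize(parsed_a)` and `xml_serialize(parsed_b)` produce the same text, which is why comparing XML values as raw strings is not meaningful.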


