Posts Tagged XML

Using XML Signature

A digital signature is a cryptographic value that enables a recipient to verify the source and validity of
an incoming message. XML Signature defines an XML syntax for digital signatures.

When you enable SOAP header processing for a particular virtual service, the ACE XML Gateway
validates XML signatures in incoming messages received at the interface defined by that virtual service.
If a signature does not match the element that is signed, the message is rejected.

Signature validity alone may not ensure message integrity: the signature could have been generated
using any certificate, including one issued by an untrusted source. If you use XML Signature as part of
your implementation strategy, you should also specify which Certificate Authorities (CAs) you want to
trust, and direct the ACE XML Gateway to accept only signatures generated with certificates issued
by those trusted CAs.

Enabling header processing causes signatures to be validated if present in an incoming message (and
causes messages with invalid signatures to be blocked), but it doesn’t require a message to have a
signature.

The final step in configuring XML Signature, therefore, is specifying the elements of the incoming
message that must be signed. In the policy configuration, you can require a signature covering one or
more of:

  • the message timestamp (a common practice in Web service implementations).
  • the first element below the SOAP body.
  • a particular element you specify by XPath. Each XPath expression you specify must resolve to a
    signed XML element whose signature must be valid for the ACE XML Gateway to accept the
    message.
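The "must be signed" policy above can be sketched as a coverage check. This is a minimal illustration under stated assumptions, not the ACE XML Gateway's implementation: it assumes a SOAP 1.1 envelope in which each signed element carries a WS-Security `wsu:Id` attribute that a `ds:Reference URI="#id"` points at, and that the cryptographic signature check and the trusted-CA check have already succeeded. The function names `signed_ids` and `missing_signatures` are hypothetical, and the `paths` argument uses ElementTree's limited XPath subset rather than full XPath.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the SOAP 1.1, XML Signature and WS-Security specifications.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
    "wsse": "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd",
    "wsu": "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd",
}
WSU_ID = "{%s}Id" % NS["wsu"]


def signed_ids(root):
    """Return the fragment IDs named by every ds:Reference URI="#id" in the message."""
    ids = set()
    for ref in root.iter("{%s}Reference" % NS["ds"]):
        uri = ref.get("URI", "")
        if uri.startswith("#"):
            ids.add(uri[1:])
    return ids


def missing_signatures(envelope_xml, require_timestamp=True,
                       require_body_child=True, paths=()):
    """List required elements that are absent or not covered by a signature reference.

    Hypothetical policy check: assumes the signatures were already validated.
    `paths` are ElementTree XPath-subset expressions using the prefixes in NS,
    evaluated relative to the soap:Envelope element.
    """
    root = ET.fromstring(envelope_xml)
    covered = signed_ids(root)
    required = {}
    if require_timestamp:
        required["timestamp"] = root.find(".//wsu:Timestamp", NS)
    if require_body_child:
        body = root.find("soap:Body", NS)
        # The first element below the SOAP body, if there is one.
        required["body child"] = body[0] if body is not None and len(body) else None
    for path in paths:
        required[path] = root.find(path, NS)
    # A requirement fails if the element is missing or its wsu:Id is not referenced.
    return [name for name, el in required.items()
            if el is None or el.get(WSU_ID) not in covered]
```

An empty result means every required element is covered by a signature; under this policy any non-empty result would cause the message to be rejected.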

MySQL’s Core Feature Set for Data Warehousing

MySQL contains a solid core feature set that is suitable for all data warehousing use cases. The following
are just some of the features in the MySQL database server that help enable data warehousing:

  • Data/Index partitioning – available in MySQL 5.1 and higher; supports range, hash, key, list, and
    composite partitioning. Partition “pruning” is available: MySQL examines only the
    partitions it needs to satisfy a particular query instead of an entire table or index. Partition
    management is also supported (ADD PARTITION, DROP PARTITION, etc.).
  • No practical storage limits – for example, a 110TB limit per tablespace
  • Automatic storage management – autogrowth data files, etc.
  • ANSI-SQL support for all datatypes – including BLOB and XML
  • Built-in Replication – simple and easy to configure
  • Main memory tables – keep all data resident in RAM; well suited to dimension tables
  • Support for a variety of indexes – B-tree, fulltext, clustered, hash, GIS
  • Multiple configurable data/index caches
  • Pre-loading of index data into index caches
  • Unique query cache – caches the result set together with the query, not just the data, and therefore
    provides near-instantaneous response times for repetitive queries like those used in data warehousing
  • Parallel data load – loads multiple files at the same time
  • Multi-insert DML – allows array-style processing via normal INSERT commands
  • Data compression – provides enormous storage savings
  • Read-only tables – protects sensitive data
  • Encryption – further protection for sensitive data
  • Cost-based optimizer – eliminates need for rule-based query writing
  • Wide platform support – no need for special hardware or operating systems
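Partition pruning, listed first above, can be illustrated for RANGE partitioning. The sketch below is a simplified model of the idea, not MySQL's optimizer code: each partition is described by its VALUES LESS THAN bound, so a range predicate on the partitioning key only needs the partitions whose bounds overlap it. The function names and the sample year bounds are hypothetical.

```python
import math
from bisect import bisect_right


def partition_for(value, bounds):
    """Return the index of the RANGE partition that holds `value`.

    `bounds` are the ascending VALUES LESS THAN limits of the partitions;
    math.inf as the last bound models VALUES LESS THAN MAXVALUE.
    """
    i = bisect_right(bounds, value)  # first partition whose limit is > value
    if i == len(bounds):
        raise ValueError(f"no partition accepts {value!r}")
    return i


def prune(lo, hi, bounds):
    """Indexes of the only partitions that can hold rows with lo <= key <= hi."""
    return list(range(partition_for(lo, bounds), partition_for(hi, bounds) + 1))


# Hypothetical table partitioned by year: p0 < 1991, p1 < 1996, p2 < 2001, p3 MAXVALUE
bounds = [1991, 1996, 2001, math.inf]
print(prune(1992, 1995, bounds))  # → [1]: only p1 needs to be scanned
```

The rest of the table is never touched for such a query, which is where the savings on large fact tables come from.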

C14N Denial of Service

Attack surface: Canonicalization

Attack impact: Denial of service

Description: C14N can be an expensive operation, requiring complex processing (Boyer ’01), including entity expansion and normalization of whitespace, namespace declarations, and coalescing of adjacent text and CDATA nodes. This requires building a DOM and performing memory- and processor-intensive operations.

Exploit scenario: Attacker replaces the SignedInfo or XML content identified by a Reference with a very large set of XML data containing many namespace declarations, redundant adjacent text nodes, etc., leading to a denial of service condition.

Mitigation: Limit the total size of XML submitted for canonicalization.

Applies to XML Encryption? No
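A minimal sketch of the mitigation, assuming the input arrives as a string: reject oversized input before any canonicalization work starts. The byte and namespace-declaration limits are hypothetical values, and counting namespace declarations is an extra heuristic beyond the size limit described above. Python's `xml.etree.ElementTree.canonicalize()` implements Canonical XML 2.0, not the C14N 1.0 algorithm commonly referenced by XML Signature, so it stands in here only to show where the guard sits.

```python
import xml.etree.ElementTree as ET

# Hypothetical limits; suitable values depend on the messages a service expects.
MAX_C14N_BYTES = 64 * 1024
MAX_NS_DECLS = 100


def guarded_canonicalize(xml_text, max_bytes=MAX_C14N_BYTES, max_ns_decls=MAX_NS_DECLS):
    """Canonicalize xml_text only if it stays within the configured limits."""
    size = len(xml_text.encode("utf-8"))
    if size > max_bytes:
        # This check runs before any parsing, so oversized input is rejected cheaply.
        raise ValueError(f"input is {size} bytes; C14N limit is {max_bytes}")
    # Extra heuristic: count namespace declarations in a size-bounded pre-parse.
    parser = ET.XMLPullParser(events=("start-ns",))
    parser.feed(xml_text)
    parser.close()
    ns_decls = sum(1 for _ in parser.read_events())
    if ns_decls > max_ns_decls:
        raise ValueError(f"{ns_decls} namespace declarations; limit is {max_ns_decls}")
    # Python's canonicalize() implements Canonical XML 2.0 (Python 3.8+).
    return ET.canonicalize(xml_text)
```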

, , , , , , , , , , , ,

Leave a Comment

The XML Data Type

At the heart of DB2’s native XML support is the XML data type. XML is now a first-class data type
in DB2, just like any other SQL type. The XML data type can be used in a “create table” statement
to define one or more columns of type XML (Figure 1). Because XML has the same status as any other
type, tables can contain any combination of XML columns and relational columns. An XML-only
application may define tables that contain XML columns only. A column of type XML can hold one
well-formed XML document for every row of the table. The NULL value is used to indicate the
absence of an XML document. Though every XML document is logically associated with a row of a
table, XML and relational columns are stored differently. Relational and XML data are stored in
different formats that match their respective data models. The relational columns are stored in
traditional row structures while the XML data is stored in hierarchical structures. The two are
closely linked for efficient cross-access.

An XML schema is not required in order to define an XML column or to insert or query XML data. An
XML column can hold schema-less documents as well as documents for many different or evolving XML
schemas. Schema validation is optional on a per-document basis. Thus, the association between
schemas and documents is per document and not per column, which provides maximum flexibility.
Unlike a Varchar or a CLOB type, the XML type has no length associated with it. The XML storage
and processing architecture imposes no limit on the size of an XML document. Currently, only the
client-server communication protocol limits XML bind-in and bind-out to 2GB per document. This
limit is acceptable for all but a very few XML applications.

Figure 1: Table with a column of type “XML”

Values of type XML are processed in an internal representation that is not a string and not
directly comparable to strings. The XMLSERIALIZE function can be used to convert an XML value into
a string value which represents the same XML document. Similarly, the XMLPARSE function can be
used to convert a string value which represents an XML document into the corresponding XML value.

The XML type can be used not only as a column type but also as a data type for host variables in
languages such as C, Java, and COBOL. The XML type is also allowed for parameters and variables in
SQL stored procedures, user-defined functions (UDFs), and external stored procedures written in C
and Java. This is important for flexible application development.

, , , , , , , , , , , , , , ,

Leave a Comment

A Comparison between Subtree Encryption and Server-Side Access Control

SUBTREE ENCRYPTION

Subtree encryption (element-wise) is a good and straightforward solution for XML Encryption,
and it will fit most situations. The encrypted entity can be transferred to
the client without the need for additional encryption on the transport layer (such as SSL).
The XML entities can be stored encrypted on the (potentially insecure and vulnerable)
web server. The decisions about access rights to different portions of the document can
be made by the document creator and applied immediately to the XML document.
Encryption has to be applied to each document individually, but by analogy with Extensible
Stylesheet Language Transformations (XSLT), it should be possible to apply an “encryption
policy stylesheet” to an XML document, allowing an automatic encryption process
based on a defined policy.

SERVER-SIDE ACCESS CONTROL

In contrast to this model, server-side access control offers much more flexibility in the resulting
document, because the confidentiality transformation is not constrained to
complete subtrees. Pruning sensitive or classified information prevents the requesting
client from accessing it, but the transfer to the client requires additional
encryption on the transport layer (such as SSL). The access
control processor needs to be secure and trustworthy, because this centralised element
has access to the complete information base. A disadvantage is the need to make
access control (AC) decisions online.

The access rights for a specific document have to be added to the ACL (access control
list) database. An advantage of this model is the ability to apply a specific ACL to a
large class of documents (based on DTD/Schema).
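The pruning step of server-side access control can be sketched as follows: the server holds one ACL for a class of documents and removes the subtrees a requester may not see before sending the result (which would still need an encrypted transport such as SSL). This is a hypothetical illustration: the ACL format, role names and element tags are invented, and rules are matched on bare tag names, whereas a real access control processor would more likely use XPath-based rules and could prune below subtree granularity.

```python
import xml.etree.ElementTree as ET

# Hypothetical ACL for one class of documents (e.g. one DTD/Schema):
# the element tags each role may NOT see.
ACL = {
    "public": {"salary", "ssn"},
    "hr": {"ssn"},
    "admin": set(),
}


def prune_for(role, xml_text, acl=ACL):
    """Return the document with every subtree denied to `role` removed."""
    denied = acl[role]
    root = ET.fromstring(xml_text)
    if root.tag in denied:
        raise PermissionError(f"role {role!r} may not read <{root.tag}>")
    # Snapshot elements and children first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in denied:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Because the same ACL applies to every document of the class, no per-document encryption step is needed, which mirrors the advantage noted above; the price is that this code must run on a trusted server for every request.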

Table 1: Comparison between the existing models (disadvantages are marked grey)

It would be desirable to have a model that combines the best of subtree encryption and server-side AC, one that:

  1. allows unencrypted (visible) content within an encrypted subtree
  2. does not need a trustworthy online access control processor (only encryption, no online
    transformations)
  3. does not need additional encryption on the transport layer

, , , , , , , , , , , , ,

Leave a Comment

XML Full Text Indexes

Full-text search is a common operation in document and content-centric XML applications. DB2’s existing
text search capabilities have been extended to work with the new XML column type. Full-text indexes
with awareness of XML document structures can be defined on any native XML column. The documents in an
XML column can be fully indexed or partially indexed, e.g. if it is known in advance that only a
certain part of each document will be subject to full-text search, such as a “description” or “comment”
element. Correspondingly, text search expressions can be applied to specific paths in a document.

The following statement defines a text index which fully indexes the documents in the XML column deptdoc
in our table dept in the database personneldb:

create index myIndex for text on dept (deptdoc) format xml connect to personneldb

The following query exploits this index but restricts the search to a specific element. The query
retrieves all documents where the element ‘/dept/comment’ contains the word “Brazil”:

select deptdoc from dept where
contains(deptdoc, 'sections("/dept/comment") "Brazil"') = 1

Text search in specific parts of the documents is a critical feature for many applications. Standard
text search features are also available, such as scoring and ranking of search results as well as
thesaurus-based synonym search.

For best performance of XML insert, update, and delete operations the text index is maintained
asynchronously, i.e. not within the context of a DML transaction. However, an “update index” command
is available to force synchronization of the text index.
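The toy index below illustrates two ideas from this section: storing each word together with the element path it occurs in, so a search can be restricted to a path such as /dept/comment, and deferring index maintenance until an explicit update. It is a hypothetical sketch, not DB2's text search implementation; the class and method names are invented, paths are simple tag paths, and scoring, ranking and thesaurus search are omitted.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


class PathTextIndex:
    """Toy full-text index keyed by (element path, word).

    Hypothetical sketch: changes are queued and applied only by update_index(),
    mimicking index maintenance outside the DML transaction.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # (path, lowercased word) -> doc ids
        self.pending = []                 # queued (doc_id, xml_text) changes

    def queue(self, doc_id, xml_text):
        """Record an inserted or updated document; the index is not touched yet."""
        self.pending.append((doc_id, xml_text))

    def update_index(self):
        """Force synchronization: apply every queued change to the index."""
        for doc_id, xml_text in self.pending:
            for ids in self.postings.values():  # drop postings of an older version
                ids.discard(doc_id)
            self._index(doc_id, ET.fromstring(xml_text), "")
        self.pending.clear()

    def _index(self, doc_id, elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        # An element's own text is its leading text plus the tails of its children.
        text = " ".join([elem.text or ""] + [child.tail or "" for child in elem])
        for word in re.findall(r"\w+", text):
            self.postings[(path, word.lower())].add(doc_id)
        for child in elem:
            self._index(doc_id, child, path)

    def contains(self, path, word):
        """Ids of documents whose element at `path` contains `word` (case-insensitive)."""
        return set(self.postings.get((path, word.lower()), ()))
```

Until `update_index()` runs, a search does not see queued documents, which is the trade-off asynchronous maintenance makes in exchange for faster inserts, updates and deletes.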

, , , , , , ,

Leave a Comment

Lore’s XML Data Model

In Lore’s new XML-based data model, an XML element is a pair (eid, value), where eid is a unique element identifier and value is either an atomic text string or a complex value containing the following four components:

1. A string-valued tag corresponding to the XML tag for that element.

2. An ordered list of attribute-name/atomic-value pairs, where each attribute-name is a string and each atomic-value has an atomic type drawn from integer, real, string, etc., or ID, IDREF, or IDREFS.

3. An ordered list of crosslink subelements of the form (label, eid), where label is a string. Crosslink subelements are introduced via an attribute of type IDREF or IDREFS.

4. An ordered list of normal subelements of the form (label, eid), where label is a string. Normal subelements are
introduced via lexical nesting within an XML document.

An XML document is mapped easily into our data model. Note that we ignore comments and whitespace
between tagged elements. As a base case, text between tags is translated into an atomic text
element; we do the same thing for CDATA sections, used in XML to escape text that might otherwise
be interpreted as markup. Otherwise, a document element is translated into a complex data element such that:

1. The tag of the data element is the tag of the document element.

2. The list of attribute-name/atomic-value pairs in the data element is derived directly from the document
element’s attribute list.

3. For each attribute value i of type IDREF in the document element, or component i of an attribute value of
type IDREFS, there is one crosslink subelement (label, eid) in the data element, where label is the
corresponding attribute name and eid identifies the unique data element whose ID attribute value matches i.

4. The subelements of the document element appear, in order, as the normal subelements of the data element.
The label for each data subelement is the tag of that document subelement, or Text if the document subelement
is atomic.
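The four mapping rules above can be sketched in a few lines. This is an illustrative reconstruction, not Lore's loader: ElementTree does not expose DTD attribute types, so the caller has to name the ID, IDREF and IDREFS attributes (an assumption of this sketch), eids are assigned in document order as &1, &2, …, and text is stripped of surrounding whitespace as a simplification. The function name `to_lore` is hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET


def to_lore(xml_text, idref_attrs=(), idrefs_attrs=(), id_attr="ID"):
    """Map an XML document to Lore-style data elements, returned as {eid: value}.

    Atomic values are strings. Complex values are dicts holding the tag, the
    ordered attribute-name/value pairs, and the crosslink and normal
    subelements as ordered lists of (label, eid) pairs.
    """
    counter = itertools.count(1)
    elements, by_id, pending = {}, {}, []

    def new_eid():
        return f"&{next(counter)}"

    def atomic(text):
        eid = new_eid()
        elements[eid] = text
        return eid

    def data_element(elem):
        eid = new_eid()
        value = {"tag": elem.tag, "attrs": list(elem.attrib.items()),
                 "crosslinks": [], "subelements": []}
        elements[eid] = value
        if id_attr in elem.attrib:
            by_id[elem.attrib[id_attr]] = eid
        for name, val in elem.attrib.items():
            if name in idref_attrs:
                pending.append((value, name, [val]))
            elif name in idrefs_attrs:
                pending.append((value, name, val.split()))  # one crosslink per component
        # Text (ElementTree merges CDATA into it) becomes atomic "Text" subelements;
        # whitespace-only text between tags is ignored, other text is stripped.
        if elem.text and elem.text.strip():
            value["subelements"].append(("Text", atomic(elem.text.strip())))
        for child in elem:
            value["subelements"].append((child.tag, data_element(child)))
            if child.tail and child.tail.strip():
                value["subelements"].append(("Text", atomic(child.tail.strip())))
        return eid

    data_element(ET.fromstring(xml_text))
    # Resolve crosslinks only after every ID value has been seen.
    for value, name, refs in pending:
        value["crosslinks"].extend((name, by_id[ref]) for ref in refs)
    return elements
```

Crosslinks are resolved in a second pass because an IDREF may point forward to an element that has not been visited yet; an IDREF with no matching ID raises a `KeyError` in this sketch.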

Note that multiple XML documents can be loaded into a single database, and any system of cross-document
links (e.g., XLink or XPointer) can be used provided information that uniquely identifies elements is not lost.

Figure 1:  XML document and its graph

Once one or more XML documents are mapped into our data model it is convenient to visualize the data as a
directed, labeled, ordered graph. The nodes in the graph represent the data elements and the edges represent
the element-subelement relationship. Each node representing a complex data element contains a tag and an
ordered list of attribute-name/atomic value pairs; atomic data element nodes contain string values. There are
two different types of edges in the graph: (i) normal subelement edges, labeled with the tag of the
destination subelement; (ii) crosslink edges, labeled with the attribute name that introduced the crosslink.
Note that the graph representation is isomorphic to the data model, so they can be discussed interchangeably.

It is useful to view the XML data in one of two modes: semantic or literal. Semantic mode is used when the
user or application wishes to view the database as an interconnected graph. The graph representing the
semantic mode omits attributes of type IDREF and IDREFS, and the distinction between subelement and
crosslink edges is gone. Literal mode is available when the user wishes to view the database as an XML
document. IDREF and IDREFS attributes are visible as textual strings, while crosslink edges are invisible.
In literal mode, the database is always a tree.

Figure 1 shows a small sample XML document and the graph representation in our data model. Element
identifiers (eids) appear within nodes and are written as &1, &2, etc. Attribute-name/atomic-value pairs
are shown next to the associated nodes (surrounded by {}), with IDREF attributes in italics. Subelement
edges are solid and crosslink edges are dashed. The ordering of subelements is left-to-right. We have
not shown the tag associated with each element since it is straightforward to deduce for this simple
database. (For example, node &3 has the tag Member and not Advisor.) In semantic mode, the database in
Figure 1 does not include the (italicized) IDREF attributes. In literal mode, the (dashed) crosslinks are
not included. Note that there is some structural heterogeneity in the data even though the sample data
was kept purposefully small.

, , , , , , , , , , , ,

Leave a Comment

System Architecture and XDP Interfaces

The XTC database engine (XTC server) adheres to the widely used five-layer DBMS architecture. In Figure 1, we concentrate on the representation and mapping of XML documents. The file-services layer operates on the bit patterns stored on external, non-volatile storage devices. In collaboration with the OS file system, the I/O managers store the physical data in extensible container files; their uniform block length is configurable to the characteristics of the XML documents to be stored. A buffer manager per container file handles fixing and unfixing of pages in main memory and provides a replacement algorithm for them, which can be optimized for the reference locality anticipated in the respective XDP applications.

Using pages as basic storage units, the record, index, and catalog managers form the access services. The record manager maintains the tree-connected nodes of XML documents as physically adjacent records in a set of pages. Each record is addressed by a unique lifetime ID managed within a B-tree by the index manager. This is essential for fine-grained concurrency control, which requires lock acquisition on uniquely identifiable nodes. The catalog manager provides the database metadata.

The node manager, implementing the navigational access layer, transforms the records from their internal physical representation into an external one, thereby managing lock acquisition to isolate concurrent transactions. The XML-services layer contains the XML manager, responsible for declarative document access, e.g., evaluation of XPath queries or XSLT transformations.
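The buffer manager's job of fixing and unfixing pages under a replacement policy can be sketched as follows. This is a toy model, not XTC's code: it assumes plain LRU as the replacement algorithm (the text only says XTC's algorithm can be tuned to the application's reference locality), the class name and the `read_page` callback are hypothetical, and writing dirty pages back to the container file is omitted.

```python
from collections import OrderedDict


class BufferManager:
    """Toy page buffer with fix/unfix and LRU replacement of unfixed pages.

    Hypothetical sketch: `read_page` stands in for the container-file I/O manager.
    """

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page
        self.frames = OrderedDict()  # page_no -> page data, least recently used first
        self.fix_count = {}          # page_no -> number of active fixes

    def fix(self, page_no):
        """Pin a page in main memory, loading it (and evicting a victim) if needed."""
        if page_no in self.frames:
            self.frames.move_to_end(page_no)  # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_no] = self.read_page(page_no)
            self.fix_count[page_no] = 0
        self.fix_count[page_no] += 1
        return self.frames[page_no]

    def unfix(self, page_no):
        """Release one fix; at zero fixes the page becomes a replacement candidate."""
        self.fix_count[page_no] -= 1

    def _evict(self):
        # LRU: drop the least recently used page that nobody currently has fixed.
        for victim in self.frames:
            if self.fix_count[victim] == 0:
                del self.frames[victim]
                del self.fix_count[victim]
                return
        raise RuntimeError("all buffer frames are fixed")
```

Fixed pages are never chosen as victims, which is what lets the record and node managers work on a page safely while it is pinned.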

Figure 1: XTC architecture overview

The agents of the interface layer make the functionality of the XML and node services
available to common internet browsers, FTP clients, and the XTC driver, thereby
providing declarative / set-oriented as well as navigational / node-oriented
interfaces. The XTC driver, linked to client-side applications, provides methods to
execute XPath-like queries and to manipulate documents via the SAX or DOM API. Each
API accesses the stored documents within a transaction started by the XTC
driver. Transactions can be processed in the well-known isolation levels
read uncommitted, read committed, repeatable read, and serializable.

, , , , , , , , , , ,

Leave a Comment

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: