Almarimi Abdelsalam, Pokorný Jaroslav
Schema Management for Data Integration: A Short Survey. In: Acta Polytechnica, Vol. 45, No. 1, 2005, pp. 24-27, The Czech Technical University in Prague, 2005.
Schema management is a basic problem in many database application domains such as data integration systems. Users need to access and manipulate data from several databases. In this context, in order to integrate data from distributed heterogeneous database sources, data integration systems must resolve several issues that arise in managing schemas. In this paper, we present a brief survey of the schema matching problem, which lies at the heart of schema integration. Moreover, we propose a technique for integrating and querying distributed heterogeneous XML schemas.
Almarimi Abdelsalam, Pokorný Jaroslav
A Mediation Layer for Heterogeneous XML Schemas. In: Int. Journal of Web Information Systems, Vol. 1, No. 1, March 2005, pp. 25-32, Troubador Publishing LTD.
This paper describes an approach for the mediation of heterogeneous XML schemas. Such an approach is proposed as a tool for an XML data integration system. A global XML schema is specified by the designer to provide a homogeneous view over heterogeneous XML data. An XML mediation layer is introduced to manage: (1) establishing appropriate mappings between the global schema and the schemas of the sources; (2) querying XML data sources in terms of the global schema. The XML data sources are described by the XML Schema language. The former task is performed through a semi-automatic process that generates local and global paths. A tree structure for each XML schema is constructed and represented in a simple form. This is in turn used for manually assigning indices that match local paths to corresponding global paths. By gathering all paths with the same indices, the equivalent local and global paths are grouped automatically, and an XML Metadata Document is constructed. For the latter task, an XML Query Translator is described which translates a global user query into local queries by using the mappings defined in the XML Metadata Document.
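The index-grouping step described above can be illustrated with a small sketch. Everything below (function names, data, the dictionary layout) is a hypothetical reconstruction for illustration, not the paper's actual implementation:

```python
# Hypothetical sketch of the path-mapping idea: local and global schema
# paths get indices assigned by hand; paths sharing an index are grouped
# into one entry of a metadata dictionary, which a query translator then
# uses to rewrite a global path into the equivalent local paths.
# The function names and the data are illustrative, not the paper's API.

from collections import defaultdict

def build_metadata(indexed_paths):
    """indexed_paths: (index, source, path) triples; source is 'global'
    or the name of a local data source."""
    groups = defaultdict(lambda: {"global": None, "local": []})
    for idx, source, path in indexed_paths:
        if source == "global":
            groups[idx]["global"] = path
        else:
            groups[idx]["local"].append((source, path))
    return dict(groups)

def translate(global_path, metadata):
    """Rewrite a global query path into local (source, path) pairs."""
    for entry in metadata.values():
        if entry["global"] == global_path:
            return entry["local"]
    return []

metadata = build_metadata([
    (1, "global", "/library/book/title"),
    (1, "src1",   "/books/item/name"),
    (1, "src2",   "/catalog/entry/title"),
])
print(translate("/library/book/title", metadata))
# [('src1', '/books/item/name'), ('src2', '/catalog/entry/title')]
```

Gathering paths by shared index is what makes the grouping automatic once the indices themselves have been assigned manually.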
Skopal Tomáš
On Fast Non-Metric Similarity Search by Metric Access Methods. Accepted to: 10th International Conference on Extending Database Technology (EDBT 2006), 26.3.-30.3.2006, Munich, Germany.
The retrieval of objects from a multimedia database employs a measure which defines a similarity score for every pair of objects. The measure should effectively follow the nature of similarity; hence, it should not be limited by the triangular inequality, which is regarded as a restriction in similarity modeling. On the other hand, the retrieval should be as efficient (or fast) as possible. The measure is thus often restricted to a metric, because then the search can be handled by metric access methods (MAMs). In this paper we propose a general method of non-metric search by MAMs. We show that the triangular inequality can be enforced for any semimetric (reflexive, non-negative and symmetric measure), resulting in a metric that preserves the original similarity orderings (retrieval effectiveness). We propose the TriGen algorithm for turning any black-box semimetric into an (approximated) metric, using only the distance distribution in a fraction of the database. The algorithm finds a metric for which the retrieval efficiency is maximized, considering any MAM.
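The core idea, enforcing the triangular inequality by modifying a semimetric while preserving similarity orderings, can be sketched roughly as follows. This is an illustrative toy, not the paper's TriGen algorithm; the choice of the modifier f(x) = x^(1/w) and the doubling search for w are assumptions made here:

```python
# A rough sketch of the general idea (not the paper's actual algorithm):
# apply an increasing concave modifier f, here f(x) = x**(1/w), to a
# semimetric. Such modifiers with f(0) = 0 preserve similarity orderings
# but "inflate" small distances, so raising w reduces the number of
# triangle-inequality violations measured on sampled triplets.

import itertools

def triangle_violations(dist, objects, w):
    f = lambda x: x ** (1.0 / w)
    bad = 0
    for a, b, c in itertools.permutations(objects, 3):
        if f(dist(a, b)) + f(dist(b, c)) < f(dist(a, c)) - 1e-12:
            bad += 1
    return bad

def find_modifier(dist, sample, max_w=64.0):
    """Double w until the sampled triplets satisfy the triangle inequality."""
    w = 1.0
    while w <= max_w:
        if triangle_violations(dist, sample, w) == 0:
            return w
        w *= 2.0
    return max_w

# A toy semimetric on numbers that violates the triangle inequality:
squared = lambda a, b: (a - b) ** 2
sample = [0.0, 1.0, 2.0, 3.5, 5.0]
w = find_modifier(squared, sample)
print(w)  # 2.0 -- f turns squared distance into |a-b|, which is a metric
```

In the toy case the answer is exact; on a real black-box semimetric the modifier found on a sample only approximates a metric, which is why the paper speaks of an (approximated) metric.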
Obdržálek David, Kulhánek Jiří
Generating and Handling of Differential Data in DataPile-Oriented Systems. Accepted to: The IASTED International Conference on Internet and Multimedia Systems and Applications (EuroIMSA 2006), 13.2.-15.2.2006, Innsbruck, Austria.
The basics of the DataPile structure for data handling systems have been theoretically designed and published. During the implementation of such a system, numerous problems arose which had not been addressed in the theoretical design phase. In a real production environment, the applications connected to the DataPile core need special treatment and set important requirements on the data synchronization process. This article concerns the generation of differential data distributed from the central DataPile storage to individual applications. It is shown that the synchronization part of a DataPile-structured system can be implemented and run efficiently despite the restrictions or limitations these individual applications impose.
Snášel Václav, Moravec Pavel, and Pokorný Jaroslav
Using BFA with WordNet Ontology Based Model for Web Retrieval. In: Proceedings of the First IEEE International Conference on Signal-Image Technology & Internet-Based Systems (SITIS'05), 27.11.-1.12.2005, Yaoundé, Cameroon, pp. 254-259.
In the area of information retrieval, the dimension of document vectors plays an important role. We may need to find a few words or concepts which characterize a document based on its contents, to overcome the problem of the "curse of dimensionality", which makes indexing of high-dimensional data problematic. To do so, we earlier proposed a WordNet and WordNet+LSI (Latent Semantic Indexing) based model for dimension reduction. While LSI works on the whole collection, another procedure of feature extraction (and thus dimension reduction) exists, using binary factorization. The procedure is based on the search for attractors in a Hopfield-like associative memory. Separation of true attractors (factors) from spurious ones is based on the calculation of their Lyapunov function. Applied to textual data, the procedure performed well and, moreover, showed sensitivity to the context in which the words were used. In this paper, we suggest that binary factorization may benefit from WordNet filtration.
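To make the Hopfield-style machinery concrete, here is a minimal, self-contained sketch of an associative memory with Hebbian weights and its Lyapunov (energy) function. It is a generic textbook construction, not the binary factorization procedure of the paper:

```python
# Illustrative sketch (not the paper's method): a Hopfield-style memory
# with Hebbian weights. Stored +-1 patterns become attractors of the
# update dynamics; the Lyapunov (energy) function E = -1/2 s^T W s never
# increases under asynchronous updates, so attractors can be compared
# (and spurious ones separated) by their energy.

import numpy as np

def train(patterns):
    n = patterns.shape[1]
    W = patterns.T @ patterns / n      # Hebbian learning on +-1 patterns
    np.fill_diagonal(W, 0.0)           # no self-connections
    return W

def energy(W, s):
    return -0.5 * s @ W @ s

def recall(W, s, steps=20):
    s = s.copy()
    for _ in range(steps):
        for i in range(len(s)):        # asynchronous updates
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s

p = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
              [1, 1, 1, 1, -1, -1, -1, -1]], dtype=float)
W = train(p)
noisy = p[0].copy()
noisy[0] *= -1                          # flip one bit
print(recall(W, noisy))                 # converges back to the first pattern
```

The stored pattern has lower energy than its noisy version, which is the property the Lyapunov function exploits when separating true attractors from spurious ones.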
Bednárek David, Obdržálek David, Yaghob Jakub, Zavoral Filip
Data Integration Using DataPile Structure. In: Proceedings of the 9th East-European Conference on Advances in Databases and Information Systems (ADBIS 2005), 12.9.-15.9.2005, Tallinn, Estonia, 2005, pp. 178-188.
One of the areas of data integration covers systems that maintain coherence among a heterogeneous set of databases. Such a system repeatedly collects data from the local databases, synchronizes them, and pushes the updates back. One of the key problems in this architecture is conflict resolution: when data in a less relevant data source changes, it should not cause any data change in a store with higher relevancy. To meet such requirements, we propose a DataPile structure with the following main advantages: effective storage of historical versions of data, straightforward adaptation to global schema changes, separation of data conversion and replication logic, and simple implementation of data relevance. Such mechanisms are chiefly used in projects with the following traits or requirements: integration of heterogeneous data from sources with different reliability, data coherence of databases whose schemas differ, data changes performed on local databases, and minimal load on the central database.
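A toy sketch of the two mechanisms highlighted above, historical versions plus data relevance, might look as follows. All names and the row layout are hypothetical, not the actual DataPile schema:

```python
# Illustrative sketch (hypothetical names, not the DataPile API): the
# central store keeps every historical version of each attribute as a row
# (entity, attr, value, version, relevance). The current state picks, per
# attribute, the row from the most relevant source (newest version among
# equals), so a less relevant source never overwrites a more relevant one.
# A differential update for an application is the set of changes between
# the snapshot it last received and the current state.

def current_state(pile):
    """pile: list of (entity, attr, value, version, relevance) rows."""
    best = {}
    for ent, attr, value, version, relevance in pile:
        key = (ent, attr)
        prev = best.get(key)
        # Higher relevance wins; versions only order rows of equal relevance.
        if prev is None or (relevance, version) > (prev[1], prev[2]):
            best[key] = (value, relevance, version)
    return {k: v[0] for k, v in best.items()}

def differential(snapshot, state):
    """Changes an application must apply to move from snapshot to state."""
    return {k: v for k, v in state.items() if snapshot.get(k) != v}

pile = [
    ("p1", "email", "a@x.cz", 1, 1),
    ("p1", "email", "b@x.cz", 2, 1),   # newer, same relevance: accepted
    ("p1", "email", "c@x.cz", 3, 0),   # newer but less relevant: ignored
]
state = current_state(pile)
print(differential({("p1", "email"): "a@x.cz"}, state))
# {('p1', 'email'): 'b@x.cz'}
```

Keeping all versions in the pile is what makes both the relevance rule and per-application differentials cheap to compute.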
Pokorný Jaroslav
Database architectures: current trends and their relationships to environmental data management. In: Proceedings of the 19th Conference EnviroInfo (Informatics for Environmental Protection, Networking Environmental Information), Masaryk University in Brno, 7.9.-9.9.2005, pp. 24-28.
Ever increasing environmental demands from customers, authorities and governmental organizations, as well as new business control functions, are integrated into environmental management systems (EMSs). With the production of huge data sets and their processing in real-time applications, the needs for environmental data management have grown significantly. Current trends in database development and the associated research meet these challenges. The paper discusses recent advances in database technologies and attempts to highlight them with respect to the requirements of EMSs.
Vojtáš Peter
Fuzzy Logic as an Optimization Task
In: Fuzzy Logic and Technology. (Ed.: Montseny E., Sobrevilla P.) - Barcelona, 2005, pp. 781-786 (ISBN: 84-7683-872-3)
Held: EUSFLAT - LFA 2005. Conference of the European Society for Fuzzy Logic and Technology /13./, Rencontres Francophones sur la Logique Floue et ses Applications /11./, 7.9.-9.9.2005, Barcelona, Spain.
Pokorný Jaroslav, Reschke J.
Exporting relational data into a native XML store. Presented at: ISD 2005 - International Conference on Information Systems Development, 14.8.-17.8.2005.
Chapter in the book: Advances in Information Systems Development - Bridging the Gap between Academia and Industry. Vol. 2, Edited by A.G. Nilsson et al, Springer Verlag, 2006, pp. 807-818 (ISBN 0-387-30834-2)
Pokorný Jaroslav
Towards the Semantic Web. In: Proceedings of the 20th annual conference Moderní databáze, 26.5.-27.5.2005, Hotel Amber, Roudnice n. L., KOMIX, pp. 15-24.
Current web search engines, based on techniques of text information retrieval, are unable to exploit the semantic knowledge within a web page and therefore cannot give satisfactory answers to user queries. A possible solution seems to be the so-called Semantic Web, described in a vision by Tim Berners-Lee at the end of the 1990s. The idea behind the Semantic Web is to extend web pages with markup that captures at least part of the meaning of the page content. This semantic markup means adding certain metadata that provide a formal semantics for the web content.
Semantic Web projects build on several technologies, the basic ones of which are already standardized or at least recommended. These include the languages XML, XML Schema, RDF and RDF Schema. These languages serve for writing metadata, some of which are organized into so-called ontologies. The next level of the Semantic Web makes use of languages of logic. The basis of processing in a web conceived this way is provided by software agents, i.e. programs that work autonomously and proactively.
The aim of the article is to introduce the technologies supporting the creation of the Semantic Web, to show its architecture, and to mention some projects already under way that aim at creating intelligent web information services, personalized web sites, and semantically enhanced search engines.
Pokorný Jaroslav
Digital Libraries in the Semantic Web Environment. In: Proceedings of the 10th seminar AKP (Automation of Library Processes), 3.5.-4.5.2005 (eds. D. Tkačíková, B. Ramajzlová), VIC ČVUT, 2005, pp. 64-73.
Digital libraries (DLs) contribute to the development of the Semantic Web and, at the same time, can make use of its technological elements. In this way it is possible to achieve better data management in DLs, easier integration of multiple DLs, and increased possibilities of interaction with other information sources. The idea behind the Semantic Web is to extend web pages with markup that captures at least part of the meaning of the page content. This semantic markup means adding certain metadata that provide a formal semantics for the web content. Semantic Web projects build on technologies that are being developed as standards. These include the languages XML, XML Schema, RDF and RDF Schema. These languages serve for writing metadata, part of which are organized into ontologies. The next level of the Semantic Web makes use of languages of logic. The basis of processing in a web conceived this way is provided by programs - software agents. The aim of the article is to introduce the Semantic Web technologies and to show their application in building DLs.
Skopal Tomáš, Pokorný Jaroslav, Snášel Václav
Nearest Neighbours Search using the PM-tree. In: Proceedings of the 10th International Conference on Database Systems for Advanced Applications (DASFAA 2005), 17.4.-20.4.2005, Beijing, China, LNCS 3453, Springer Verlag, 2005, pp. 803-815.
Snášel Václav, Moravec Pavel, Pokorný Jaroslav
WordNet Ontology Based Model for Web Retrieval. In: Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration (WIRI 2005), 8.4.-9.4.2005, Tokyo, Japan, in conjunction with IEEE ICDE 2005, pp. 231-236.
It is well known that ontologies will become a key component, as they allow making the semantics of Semantic Web content explicit. In spite of the big advantages that the Semantic Web promises, there are still several problems to solve. Those concerning ontologies include their availability, development and evolution. In the area of information retrieval, the dimension of document vectors plays an important role. Firstly, with higher index dimensions the indexing structures suffer from the "curse of dimensionality" and their efficiency rapidly decreases. Secondly, we may not use the exact words of a document when looking for it, and thus we miss some relevant documents. LSI is a numerical method which discovers latent semantics in documents by creating concepts from existing terms. In this paper we present a basic method of mapping LSI concepts onto a given ontology (WordNet), used both for improving retrieval recall and for dimension reduction. We offer experimental results for this method on a subset of the TREC collection, consisting of Los Angeles Times articles.
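The LSI step mentioned above can be sketched in a few lines (the WordNet mapping itself is the paper's contribution and is not reproduced here); the toy matrix and the choice of k are illustrative:

```python
# A minimal sketch of LSI: a truncated SVD of the term-by-document matrix
# projects documents into a small concept space, where documents about
# the same topic stay close even if their exact terms differ. The matrix
# and k below are toy values, not the paper's experimental setup.

import numpy as np

def lsi(term_doc, k):
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Documents expressed in the k-dimensional concept space:
    return (term_doc.T @ U[:, :k]) / s[:k]

# Toy term-document matrix: docs 0 and 1 share vocabulary, doc 2 differs.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
docs = lsi(A, k=2)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(docs[0], docs[1]) > cos(docs[0], docs[2]))  # True
```

The dimension reduction comes from k being much smaller than the vocabulary size; the paper's method then maps these concept dimensions onto WordNet entries.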
Pokorný Jaroslav, Smižanský J.
Page Content Rank: an Approach to the Web Content Mining. In: Proceedings of the IADIS International Conference Applied Computing, Volume 1, 22.2.-25.2.2005, Algarve, Portugal, IADIS Press, pp. 289-296.
Methods of web data mining can be divided into several categories according to the kind of mined information and the goals that particular categories pursue: Web structure mining (WSM), Web usage mining (WUM), and Web content mining (WCM). The objective of this paper is to propose a new WCM method of page relevance ranking based on exploring the page content. The method, which we call Page Content Rank (PCR), combines a number of heuristics that seem to be important for analysing the content of web pages. The page importance is determined on the basis of the importance of the terms the page contains. The importance of a term is specified with respect to a given query q, and it is based on the term's statistical and linguistic features. As a source set of pages for mining we use the set of pages returned by a search engine for the query q. PCR uses a neural network as its inner classification structure. We describe an implementation of the proposed method and compare its results with another existing classification system, the PageRank algorithm.
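The overall shape of such a content-based ranking pipeline can be sketched as below. The scoring formula, the query boost, and the aggregation are simplifications assumed here; the paper's actual heuristics and its neural-network classifier are not reproduced:

```python
# Hypothetical sketch of a content-based ranking pipeline: score each term
# of a page by simple statistical features relative to a query q, then
# rank pages by the aggregated importance of their terms. This is a
# simplification for illustration, not the PCR method itself.

import math
from collections import Counter

def term_scores(page_terms, query_terms, n_pages, doc_freq):
    tf = Counter(page_terms)
    scores = {}
    for term, freq in tf.items():
        idf = math.log((1 + n_pages) / (1 + doc_freq.get(term, 0)))
        boost = 2.0 if term in query_terms else 1.0  # query terms weigh more
        scores[term] = freq * idf * boost
    return scores

def page_content_rank(pages, query_terms):
    doc_freq = Counter(t for p in pages for t in set(p))
    totals = []
    for i, page in enumerate(pages):
        scores = term_scores(page, query_terms, len(pages), doc_freq)
        totals.append((sum(scores.values()), i))
    return [i for _, i in sorted(totals, reverse=True)]

pages = [
    ["xml", "schema", "xml", "integration"],
    ["cooking", "recipes", "pasta"],
]
print(page_content_rank(pages, {"xml", "schema"}))  # [0, 1]
```

In PCR, the hand-tuned boost above is replaced by a trained neural network that combines statistical and linguistic features of each term.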