Bartoň Stanislav, Zezula Pavel
Indexing Structure for Graph-Structured Data
In: Studies in Computational Intelligence, Volume: 165, Springer Berlin/Heidelberg, Berlin, 2008, pp. 167-188.
Bartoň Stanislav, Dohnal Vlastislav, Sedmidubský Jan, Zezula Pavel
Building Self-Organized Image Retrieval Network
In: Proceedings of 6th Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS-IR '08), ACM, USA, 2008.
(in_print)
Batko Michal, Skopal Tomáš, Lokoč Jakub
New Dynamic Construction Techniques for M-tree
In: Journal of Discrete Algorithms, Elsevier, Amsterdam, The Netherlands, 2008.
(in_print)
Since its introduction in 1997, the M-tree became a respected metric access method (MAM), while remaining, together with its descendants, still the only database-friendly MAM, that is, a dynamic structure persistent in paged index. Although there have been many other MAMs developed over the last decade, most of them require either static or expensive indexing. By contrast, the dynamic M-tree construction allows us to index very large databases in subquadratic time, and simultaneously the index can be maintained up-to-date (i.e., supports arbitrary insertions/deletions). In this article we propose two new techniques improving dynamic insertions in M-tree—the forced reinsertion strategies and so-called hybrid-way leaf selection. Both of the techniques preserve logarithmic asymptotic complexity of a single insertion, while they aim to produce more compact M-tree hierarchies (which leads to faster query processing). In particular, the former technique reuses the well-known principle of forced reinsertions, where the new insertion algorithm tries to re-insert the content of an M-tree leaf that is about to split in order to avoid that split. The latter technique constitutes an efficiency-scalable selection of suitable leaf node wherein a new object has to be inserted. In the experiments we show that the proposed techniques bring a clear improvement (speeding up both indexing and query processing) and also provide a tuning tool for indexing vs. querying efficiency trade-off. Moreover, a combination of the new techniques exhibits a synergic effect resulting in the best strategy for dynamic M-tree construction proposed so far.
Batko Michal, Novák David, Falchi Fabrizio, Zezula Pavel
Scalability Comparison of Peer-to-Peer Similarity Search Structures
In: Future Generation Computer Systems, Volume: 24, No: 8, Elsevier, Amsterdam, The Netherlands, 2008, pp. 834-848.
Batko Michal, Kohoutková Petra, Zezula Pavel
Combining Metric Features in Large Collections
In: 1st International Workshop on Similarity Search and Applications (SISAP 2008), IEEE Computer Society, Los Alamitos CA, Washington, Tokyo, 2008, pp. 79-86.
Batko Michal, Falchi Fabrizio, Lucchese Claudio, Novák David, Perego Raffaele, Rabitti Fausto, Sedmidubský Jan, Zezula Pavel
Crawling, Indexing, and Similarity Searching Images on the Web
In: Proceedings of the Sixteenth Italian Symposium on Advanced Database, 2008, pp. 382-389.
Bosch Sonja, Fellbaum Christiane, Pala Karel
Derivational Relations in English, Czech and Bantu Wordnet
In: Proc. of Fourth Global WordNet Conference, University of Szeged, Department of Informatics, 2008, pp. 74-90.
Presented at: GWC 2008, 22.-25.1.2008, Szeged,
Hungary.
Dohnal Vlastislav, Gennaro Claudio, Zezula Pavel
Efficiency and Scalability Issues in Metric Access Methods
In: Computational Intelligence in Medical Informatics, Springer Verlag, Berlin, Germany, 2008.
ISBN: 978-3-540-75766-5
The metric space paradigm has recently received attention as an
important model of similarity in the area of Bioinformatics. Numerous techniques have been proposed to solve similarity (range or
nearest-neighbor) queries on collections of data from metric domains. Though important representatives are outlined, this chapter is not trying to
substitute existing comprehensive surveys. The main objective is to explain and prove by experiments that similarity searching is typically an expensive
process which does not easily scale to very large volumes of data, thus distributed architectures able to exploit parallelism must be employed.
After a review of applications using the metric space approach in the field of Bioinformatics, the chapter provides an overview of methods used for
creating index structures able to speedup retrieval. In the metric space approach, only pair-wise distances between objects are quantified, so they
represent the level of dissimilarity. The key idea of index structures is to partition the data into subsets so that queries are evaluated without
examining entire collections -- minimizing both the number of distance computations and the number of I/O accesses. These objectives are obtained
by exploiting the property of metric spaces called the triangle inequality which states that if two objects are near a third object, they cannot be too
distant to one another. Unfortunately, computational costs are still high and the linear scalability of single-computer implementations prevents from
searching in large and ever growing data files efficiently. For these reasons, we describe very recent parallel and distributed similarity search
techniques and study performance of their implementations. Specifically, Section 12.1 presents the metric space approach and its applications in the
field of Bioinformatics. Section 12.2 describes some of the most popular centralized disk-based metric indexes. Consequently, Section
12.3 concentrates on parallel and distributed access methods which can deal with data collections that for practical purposes can be arbitrary large, which
is typical for Bioinformatics workloads. An experimental evaluation of the presented distributed approaches on real-life data sets is presented in 12.4.
The chapter concludes in Section 12.5.
Dohnal Vlastislav, Sedmidubský Jan, Zezula Pavel, Novák David
Similarity Searching: Towards Bulk-loading Peer-to-Peer Networks
In: 1st International Workshop on Similarity Search and Applications (SISAP 2008), IEEE, 2008, pp. 87-94.
Presented at: SISAP 2008 - Workshop at ICDE 2008, 11.-12.04.2008, Cancun,
Mexico.
Due to the exponential growth of digital data and its complexity,
we need a technique which allows us to search such collections efficiently.
A suitable solution is based on the peer-to-peer (P2P) network paradigm and
the metric-space model of similarity. When a large volume of data is being
inserted, the P2P network must expand to new peers in order to maintain its
efficiency. Thus, many peers must be split. During a peer split, the data is
halved and one half is migrated to a new peer. In this paper, we study the
problem of peer splits and propose a specialized algorithm for speeding it
up. In particular, we use the structured P2P network called the M-Chord.
Search performance within a single peer is enhanced by the M-tree. In
experimental evaluation, we compare the proposed algorithm with several
straightforward solutions on a real network organizing 10 million images.
Our algorithm provides a significant performance boost.
Falchi Fabrizio, Gennaro Claudio, Zezula Pavel
Nearest neighbor search in metric spaces through Content-Addressable Networks
In: Information Processing and Management, Volume: 44, No: 1, Elsevier, 2008, pp. 411-429.
Horák Aleš, Pala Karel, Rambousek Adam
The Global WordNet Grid Software Design
In: Proc. of Fourth Global WordNet Conference, University of Szeged, Department of Informatics, 2008, pp. 194-199.
Presented at: GWC 2008, 22.-25.1.2008, Szeged,
Hungary.
Horák Aleš
Computer Processing of Czech Syntax and Semantics
In:
Horák Aleš, Pala Karel, Rambousek Adam
Tools for Managing Multiligual Lexical Resources
In: Proc. of International Conference Inteligent Information Systems, Polish Academy of Sciences, 2008, pp. 451-460.
Presented at: International Conference Inteligent Information Systems, , Zakopane, Poland.
Ivanova K., Heid U., Schulte im Walde S., Kilgarriff A., Pomikálek Jan
Evaluating a German Sketch Grammar: A Case Study on Noun Phrase Case
In: Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), European Language Resources Association (ELRA), 2008.
Presented at: International Conference on Language Resources and Evaluation, , Marrakech, Morocco.
Nováček Vít
Automatic Knowledge Acquisition and Integration Technique: Application to Large Scale Taxonomy Extraction and Document Annotation
In: Proceedings of ICEIS 2007, Kluwer Academic Publishing, Artificial Intelligence and Decision Support Systems, London, 2008, pp. 160-172.
Enterprise Information Systems (ICEIS 2007, revised selected papers), Springer, 2008, pp. 160-172.
Presented at: ICEIS 2007, 12.-16.6.2007, Funchal,
Madeira - Portugal.
Novák David, Batko Michal, Zezula Pavel
Content-based Image Retrieval on the Web
In: Proceedings of the Poster and Demonstration Paper Track of the 1st Future Internet Symposium (FIS 2008), CEUR Workshop Proceedings, Vienna, 2008, pp. 1-3.
Novák David, Batko Michal, Zezula Pavel
Web-scale System for Image Similarity Search: When the Dreams Are Coming True
In: Proceedings of the Sixth International Workshop on Content-Based Multimedia Indexing (CBMI 2008), IEEE, London, 2008, pp. 446-453.
Pomikálek Jan, Rychlý Pavel
Detecting Co-Derivative Documents in Large Text Collections
In: Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), European Language Resources Association (ELRA), 2008, pp. 132-135.
Presented at: International Conference on Language Resources and Evaluation, , Marrakech, Morocco.
Sedmidubský Jan, Bartoň Stanislav, Dohnal Vlastislav, Zezula Pavel
Adaptive Approximate Similarity Searching through Metric Social Networks
In: 24th International Conference on Data Engineering (ICDE 2008), 2008, pp. 3.
Presented at: 24th International Conference on Data Engineering, 7.-12.4.2008, Cancún,
Mexico.
Exploiting the concepts of social networking represents a novel
approach to the approximate similarity query processing. We present a metric
social network where relations between peers, giving similar results, are
established on per-query basis. Based on the universal law of
generalization, a new query forwarding algorithm is proposed. The same
principle is used to manage query histories of individual peers with the
possibility to tune the tradeoff between the extent of the history and the
level of the query-answer approximation. All algorithms are tested on real
data and real network of computers.
Sedmidubský Jan, Bartoň Stanislav, Dohnal Vlastislav
mSN: Metric Social Network for Similarity Searching
SW prototype
This prototype implements the idea of social networking in metric
spaces. The metric social network is a peer-to-peer network in which users
can share their data without the need to send them to a centralized node.
Searching in this system is based on the notion of similarity which is
modelled using metric spaces. The architecture of the metric social network
is formally defined by using acquaintance and friendship relations. The
implementation builds on top of the MESSIF framework library.
Sedmidubský Jan, Dohnal Vlastislav, Bartoň Stanislav, Zezula Pavel
A Self-organized System for Content-based Search in Multimedia.
In: IEEE International Symposium on Multimedia (ISM 2008), Patrick Kellenberger, Los Alamitos, California, 2008.
(in_print)
Zezula Pavel, Dohnal Vlastislav, Batko Michal
File Organizations
In: Wiley Encyclopedia of Computer Science and Engineering, Wiley-Interscience, San Francisco, CA, USA, 2008, pp. 1-11.
Zezula Pavel, Batko Michal, Dohnal Vlastislav
Indexing Metric Spaces
In: Database Management and Information Retrieval, Springer-Verlag, New York, 2008, pp. 1-4.