Bartoň Stanislav, Zezula Pavel
Designing and Evaluating an Index for Graph Structured Data
In: Proceedings of ICDM MCD 2006, IEEE Press, Hong Kong, China, 2006, pp. 1-5.
Presented at: The Second International Workshop on Mining Complex Data - MCD'06 - In Conjunction with IEEE ICDM'06, 18.12.-22.12.2006, Hong Kong, China.
Bartoň Stanislav, Zezula Pavel
Mining Citation Graphs Employing an Index for Graph Structured Data
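In: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu [Intelligent Models, Algorithms, Methods and Tools for Building the Semantic Web], Ústav informatiky AV ČR [Institute of Computer Science, Academy of Sciences of the Czech Republic], Prague, 2006, pp. 1-9.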
ISBN: 80-903298-7-X
Presented at: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu - Seminář projektu programu Informační společnost [Seminar of a project of the Information Society programme], 5.10.-7.10.2006, Zadov, Czech Republic.
This paper presents a correlation between rho-operators and indirect
relations in citation analysis. The rho-operators were defined
to explore complex relationships in graph structured data. Various direct
and indirect relations identified in citation analysis are used to study the
semantics of the citation network. The rho-index was used to implement rho-path search in the citation network, and the results obtained are
evaluated and discussed.
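The indirect relations commonly used in citation analysis, such as bibliographic coupling (two papers cite the same work) and co-citation (two papers are cited by the same work), correspond to paths of length two in the citation graph. The following is a minimal sketch under stated assumptions: the graph `CITES` is a hypothetical toy example, and `find_paths` is a plain breadth-first enumeration of simple paths that ignores citation direction. It is neither the rho-index nor the formal rho-operator definitions from the paper.

```python
from collections import defaultdict, deque

# Hypothetical toy citation graph: paper -> set of papers it cites.
CITES = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "E": {"A", "B"},
    "C": set(),
    "D": set(),
}

def cited_by(cites):
    """Invert the citation relation: paper -> set of papers citing it."""
    inv = defaultdict(set)
    for src, targets in cites.items():
        for dst in targets:
            inv[dst].add(src)
    return inv

def bibliographic_coupling(cites, p, q):
    """Number of references shared by p and q (both cite the same paper)."""
    return len(cites.get(p, set()) & cites.get(q, set()))

def co_citation(cites, p, q):
    """Number of papers that cite both p and q."""
    inv = cited_by(cites)
    return len(inv[p] & inv[q])

def find_paths(cites, start, goal, max_len):
    """All simple paths from start to goal with at most max_len edges,
    ignoring citation direction (plain BFS sketch, not the rho-index)."""
    inv = cited_by(cites)

    def neighbours(node):
        return cites.get(node, set()) | inv[node]

    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 == max_len:
            continue  # path already has the maximum number of edges
        for nxt in sorted(neighbours(node)):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [nxt])
    return paths
```

On this toy graph, `find_paths(CITES, "A", "B", 2)` returns the two coupling paths through C and D and the co-citation path through E.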
Bartoň Stanislav, Zezula Pavel
rho-Index - Designing and Evaluating an Indexing Structure for Graph Structured Data
Technical Report: FIMU-RS-2006-07, FI MU, Brno, 2006, 24 p.
Bartoň Stanislav
Searching Indirect Relationships in Citation Analysis Using an Index for Graph Structured Data
In: 2nd Doctoral Workshop on Mathematical and Engineering Methods in Computer Science MEMICS 2006, 2006, pp. 9-16.
Presented at: 2nd Doctoral Workshop on Mathematical and Engineering Methods in Computer Science MEMICS 2006, 27.10.-29.10.2006, Mikulov, Czech Republic.
Batko Michal, Dohnal Vlastislav, Zezula Pavel
M-Grid: Similarity Searching in Grids
In: Proceedings of International Workshop on Information Retrieval in Peer-to-Peer Networks, ACM Press, Arlington, 2006, pp. 1-8.
Batko Michal, Novák David, Falchi Fabrizio, Zezula Pavel
On Scalability of the Similarity Search in the World of Peers
In: InfoScale '06: Proceedings of the 1st international conference on Scalable information systems, ACM Press, New York, NY, USA, 2006, pp. 1-12.
ISBN: 1-59593-428-6
Due to the increasing complexity of current digital data,
similarity search has become a fundamental computational
task in many applications. Unfortunately, its costs are still
high, and the linear scalability of single-server implementations
prevents efficient searching of large data volumes.
In this paper, we briefly describe four recent scalable
distributed similarity search techniques and study their performance
in executing queries on three different datasets.
Though all the methods employ parallelism to speed up
query execution, the experiments identified different advantages
for different objectives. The reported results
can be exploited for choosing the best implementations
for specific applications. They can also be used for designing
new and better indexing structures in the future.
Hanks Patrick
The Organization of the Lexicon: Semantic Types and Lexical Sets
In: Proceedings of Euralex Conference, University of Torino, 2006.
Presented at: The 12th EURALEX International Congress, 6.9.-9.9.2006, Torino, Italy.
This paper reports on a new kind of lexicon currently being developed as a resource for
natural language processing, language teaching, and other applications. This is a "Pattern Dictionary of English", based on detailed and extensive corpus analysis of each sense of each verb in the language. A pattern consists of a verb with its valencies, plus semantic values for each valency and other relevant clues, and is associated with an implicature that ties the meaning to the context rather than to the word in isolation. For each verb, all normal patterns are recorded. The semantic types in each argument slot are linked to actual words via a large ontology.
The dictionary is aimed primarily at the NLP community, but it also has relevance for language teaching. For NLP purposes, matching actual uses of verbs in previously unseen text to patterns in the pattern dictionary offers some hope of solving the "Word Sense Disambiguation (WSD) problem".
The paper discusses the relationship between A) words as they are actually used and B) semantic types and functions in a theoretical lexicon. An attempt will be made in the full paper to relate empirically observable, corpus-based facts about ordinary word use to the theoretical abstractions of the Generative Lexicon Theory of James Pustejovsky. Lexicography and linguistic theory are often uneasy bedfellows, but I shall suggest that in at least these two cases, there is a possibility of a harmonious and productive relationship.
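To make the notion of a pattern concrete, here is a minimal sketch that assumes a toy two-slot representation. Each pattern stores a verb, a semantic type for each argument slot, and an implicature, and words are linked to types through a tiny is-a hierarchy. The ontology `IS_A`, the `Pattern` class, the two patterns for "fire" and the `disambiguate` function are all hypothetical illustrations, not the actual format of the Pattern Dictionary of English.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: word or semantic type -> its parent type.
IS_A = {
    "gun": "Firearm", "rifle": "Firearm", "Firearm": "Artifact",
    "manager": "Human", "employee": "Human",
    "Human": "Entity", "Artifact": "Entity",
}

def has_type(word, sem_type):
    """True if word belongs to sem_type by following is-a links upward."""
    node = word
    while node is not None:
        if node == sem_type:
            return True
        node = IS_A.get(node)
    return False

@dataclass
class Pattern:
    verb: str
    subject: str      # semantic type expected in the subject slot
    obj: str          # semantic type expected in the object slot
    implicature: str  # meaning of the verb in this context

PATTERNS = [
    Pattern("fire", "Human", "Firearm", "Human discharges Firearm"),
    Pattern("fire", "Human", "Human", "Human dismisses Human from employment"),
]

def disambiguate(verb, subj, obj):
    """Implicatures of all patterns whose slot types match the given fillers."""
    return [p.implicature for p in PATTERNS
            if p.verb == verb and has_type(subj, p.subject) and has_type(obj, p.obj)]
```

With these patterns, `disambiguate("fire", "manager", "employee")` selects only the dismissal reading, while an object such as "rifle" selects the firearm reading.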
Hlaváčková D., Horák Aleš, Kadlec V.
Exploitation of the VerbaLex Verb Valency Lexicon in the Syntactic Analysis of Czech
In: Text, Speech and Dialogue - Proceedings of the 9th International Conference, TSD 2006, LNCS 4188, Springer-Verlag, Berlin / Heidelberg, 2006, pp. 79-86.
Presented at: Ninth International Conference on TEXT, SPEECH and DIALOGUE TSD 2006, 11.9.-15.9.2006, Brno, Czech Republic.
Horák Aleš, Pala Karel, Rambousek Adam, Povolný Martin
DEBVisDic – First Version of New Client-Server Wordnet Browsing and Editing Tool
In: Proceedings of the Third International Wordnet Conference, 2006.
Presented at: The 3rd International WordNet Conference (GWC-06), 22.1.-26.1.2006, Jeju Island, Korea.
In this paper, we present DEBVisDic, a new wordnet development
tool. It is built on DEBii, a recently developed platform
for client-server XML databases. This platform can
support many possible applications, of which we
concentrate on the new, complete reimplementation of
VisDic, one of the most widespread wordnet editors and browsers.
We argue for the benefits the new DEBii platform
brings to wordnet editing and to XML
databases in general. In the paper, we describe
the state of the implementation and the internals and
interfaces of the DEBVisDic tool. We also discuss
its functionality and how it differs from
other dictionary writing systems.
Nováček Vít, Smrž Pavel
Ontology Acquisition for Automatic Building of Scientific Portals
In: Proceedings of SOFSEM 2006: Theory and Practice of Computer Science, LNCS 3831, Springer-Verlag, Berlin, 2006, pp. 493-500.
ISBN: 3-540-31198-X
Presented at: SOFSEM 2006: Theory and Practice of Computer Science, 21.1.-27.1.2006, Měřín, Czech Republic.
Ontologies are commonly considered one of the essential parts of the Semantic Web vision, providing a theoretical basis and implementation framework for conceptual integration and information sharing among various domains. In this paper, we present the main principles of a new ontology acquisition framework applied to the semi-automatic generation of scientific portals. The extracted ontological relations play a crucial role in structuring the information on the portal pages, in automatically classifying the presented documents, and in personalisation at the presentation level.
Nováček Vít
Motivations of Extensive Incorporation of Uncertainty in OLE Ontologies
In: Proceedings of SOFSEM 2006, Volume: II, ICS AS CR, Prague, 2006, pp. 145-154.
ISBN: 80-903298-4-5
Presented at: SOFSEM 2006: Theory and Practice of Computer Science, 21.1.-27.1.2006, Měřín, Czech Republic.
Recently, the significance of uncertain information representation has become obvious in the Semantic Web community. This paper presents ongoing research on uncertainty handling in automatically created ontologies and proposes a specific framework. The research is related to OLE (Ontology LEarning), a project aimed at bottom-up generation and merging of domain-specific ontologies. The formal systems that underlie the uncertainty representation are briefly introduced. We then discuss a universal internal format of uncertain conceptual structures in OLE. The proposed format serves as a basis for inference tasks performed over an ontology. These topics are outlined as motivations for our future work.
Nováček Vít, Smrž Pavel
Empirical Merging of Ontologies - A Proposal of Universal Uncertainty Representation Framework
In: The Semantic Web: Research and Applications - Proceedings of ESWC'06 - 3rd European Semantic Web Conference, LNCS 4011, Springer-Verlag, Berlin, 2006, pp. 65-79.
ISBN: 3-540-34544-2
Presented at: ESWC'06 - 3rd European Semantic Web Conference, 11.6.-14.6.2006, Budva, Montenegro.
The significance of uncertainty representation has recently become obvious in the Semantic Web community. This paper presents our research on uncertainty handling in automatically created ontologies. A new framework for uncertain information processing is proposed. The research is related to OLE (Ontology LEarning), a project aimed at bottom-up generation and merging of domain-specific ontologies. The formal systems that underlie the uncertainty representation are briefly introduced. We then discuss the universal internal format of uncertain conceptual structures in OLE and offer an example of its use. The proposed format serves as a basis for empirical improvement of initial knowledge acquisition methods as well as for general explicit inference tasks.
Nováček Vít, Smrž Pavel, Pomikálek Jan
Text Mining for Semantic Relations as a Support Base of a Scientific Portal Generator
In: Proceedings of LREC 2006 - 5th International Conference on Language Resources and Evaluation, ELRA, Paris, 2006, pp. 1338-1343.
ISBN: 2-9517408-2-4
Presented at: LREC 2006 - 5th International Conference on Language Resources and Evaluation, 24.5.-26.5.2006, Genoa, Italy.
Current Semantic Web implementation efforts pose a number of challenges. One of the biggest is the development and evolution of specific resources, the ontologies, as a basis for representing the meaning of the web. This paper deals with the automatic acquisition of semantic relations from the text of scientific publications (journal articles, conference papers, project descriptions, etc.). We also describe the process of building the corresponding ontological resources and their application to the semi-automatic generation of scientific portals. The extracted relations and ontologies are crucial for structuring the information on the portal pages, for automatically classifying the presented documents, and for personalisation at the presentation level. Besides a general description of the portal generating system, we also give a detailed overview of the extraction of semantic relations in the form of a domain-specific ontology. The overview presents the architecture of the ontology extraction system, describes the methods used for mining semantic relations, and analyses selected results and examples.
Nováček Vít
Ontology Learning
In: Diploma Thesis, Faculty of Informatics, Masaryk University, Brno, 2006, pp. 1-65.
Ontology learning is one of the essential topics of the upcoming Semantic Web, an important area of current computer science and artificial intelligence. As the Semantic Web idea comprises a semantically annotated descendant of the current World Wide Web together with related tools and resources, the need for vast and reliable knowledge repositories is obvious. Ontologies offer a well-defined, straightforward and standardised form of such repositories. There are many possible uses of ontologies, from automatic annotation of web resources to domain representation and reasoning tasks. However, creating ontologies manually is very expensive, time-consuming and prone to subjectivity, so a framework for automatic acquisition of ontologies would be very advantageous. In this work we present such a framework, called OLE (an acronym for Ontology LEarning), and the current results of its application. The main relevant topics, state-of-the-art methods and techniques related to ontology acquisition are discussed as part of the theoretical background for the presentation of the OLE framework and its results. Moreover, we also describe preliminary results of ongoing research in the area of uncertain (fuzzy) ontology representation, which will provide natural and reasonable instruments for dealing with inconsistencies in empirical data as well as for reasoning. The main future milestones of the research are discussed as well.
Nováček Vít
Ontology Acquisition Supported by Imprecise Conceptual Refinement - New Results and Reasoning Perspectives
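In: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu [Intelligent Models, Algorithms, Methods and Tools for Building the Semantic Web], Ústav informatiky AV ČR [Institute of Computer Science, Academy of Sciences of the Czech Republic], Prague, 2006, pp. 91-101.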
ISBN: 80-903298-7-X
Presented at: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu - Seminář projektu programu Informační společnost [Seminar of a project of the Information Society programme], 5.10.-7.10.2006, Zadov, Czech Republic.
The significance of uncertainty representation has recently become
obvious in the Semantic Web community. This paper presents
new results of our research on uncertainty handling in ontologies created
automatically by means of Human Language Technologies. The research
is related to OLE (Ontology LEarning), a project aimed at bottom-up generation and merging of domain-specific ontologies. It builds on a
proposed expressive fuzzy knowledge representation framework called
ANUIC. We discuss current achievements in taxonomy acquisition and
outline some interesting applications of the framework regarding non-traditional reasoning perspectives.
Novák David, Zezula Pavel
M-Chord: A Scalable Distributed Similarity Search Structure
In: InfoScale '06: Proceedings of the 1st international conference on Scalable information systems, ACM Press, New York, NY, USA, 2006, pp. 1-10.
ISBN: 1-59593-428-6
The need for retrieval based not on attribute values but on the data content itself has recently led to the rise of
metric-based similarity search. The computational complexity
of such retrieval and the large volumes of processed
data call for distributed processing, which makes it possible to achieve
scalability. In this paper, we propose M-Chord, a distributed
data structure for metric-based similarity search.
The structure builds on the idea of the vector index
method iDistance in order to transform the issue of similarity
searching into the problem of interval search in one
dimension. The proposed peer-to-peer organization, based
on the Chord protocol, distributes the storage space and
parallelizes the execution of similarity queries. Promising
features of the structure are validated by experiments on a
prototype implementation with two real-life datasets.
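The iDistance transformation mentioned above can be sketched in a few lines. This is a minimal single-machine sketch that rests on several assumptions. The two pivots `PIVOTS`, the separation constant `C` and the 2-D Euclidean data are hypothetical, and the sketch omits the distribution of the one-dimensional key space over Chord peers, which is the actual contribution of M-Chord. Each object x is assigned to its nearest pivot p_i and mapped to the key i*C + d(p_i, x). By the triangle inequality, a range query R(q, r) then only needs to scan the interval [i*C + max(0, d(p_i, q) - r), i*C + d(p_i, q) + r] in each partition, followed by a refinement step that uses the real distance.

```python
import bisect
import math

# Hypothetical pivots (reference points) and separation constant C.
# C must exceed every pivot-to-object distance so that partitions never overlap.
PIVOTS = [(0.0, 0.0), (10.0, 10.0)]
C = 100.0

def dist(a, b):
    """Euclidean distance; any metric satisfying the triangle inequality works."""
    return math.dist(a, b)

def idistance_key(x):
    """Map x to one dimension: nearest pivot index i, key = i*C + d(p_i, x)."""
    i = min(range(len(PIVOTS)), key=lambda j: dist(PIVOTS[j], x))
    return i * C + dist(PIVOTS[i], x)

def build_index(objects):
    """Sorted (key, object) pairs standing in for the one-dimensional key space."""
    return sorted((idistance_key(o), o) for o in objects)

def range_query(index, q, r):
    """Return all indexed objects within distance r of q."""
    keys = [k for k, _ in index]
    result = []
    for i, p in enumerate(PIVOTS):
        dq = dist(p, q)
        # Triangle inequality: |d(p_i, x) - d(p_i, q)| <= d(q, x) <= r.
        lo = i * C + max(0.0, dq - r)
        hi = i * C + dq + r
        start = bisect.bisect_left(keys, lo)
        # Stop before the next partition, whose keys start at (i + 1) * C.
        end = min(bisect.bisect_right(keys, hi), bisect.bisect_left(keys, (i + 1) * C))
        for _, obj in index[start:end]:
            if dist(obj, q) <= r:  # refinement with the real distance
                result.append(obj)
    return result
```

The keys of partition i stay below (i + 1)*C, so each object falls into at most one scanned interval and needs no deduplication. The interval filter only prunes candidates and never drops a true answer.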
Pala Karel
Word Sketches and Semantic Roles
In: Proceedings of Corpus Linguistic Conference 2000, Saint-Petersburg State University, 2006.
ISBN: 5-288-04181-4
Sojka Petr
Towards Digital Mathematical Library
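In: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu [Intelligent Models, Algorithms, Methods and Tools for Building the Semantic Web], Ústav informatiky AV ČR [Institute of Computer Science, Academy of Sciences of the Czech Republic], Prague, 2006, pp. 110-113.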
ISBN: 80-903298-7-X
Presented at: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu - Seminář projektu programu Informační společnost [Seminar of a project of the Information Society programme], 5.10.-7.10.2006, Zadov, Czech Republic.
This paper describes a prototype of the OCR math engine
built in the DML-CZ project. The solution is based on a combination of the FineReader and InftyReader programs. The achieved error rate (counting
not only character errors, but also errors in recognising the structure
of mathematical notation) decreased from an initial 12% to under 1%.
Sojka Petr, Choi Key-Sun, Fellbaum Christiane, Vossen Piek
Proceedings of the Third International WordNet Conference, GWC 2006
In: Proceedings of the Third International Wordnet Conference, 2006.
Sojka Petr, Kopeček Ivan, Pala Karel
Text, Speech and Dialogue - Proceedings of the 9th International Conference, TSD 2006
In: Text, Speech and Dialogue - Proceedings of the 9th International Conference, TSD 2006, LNCS 4188, Springer-Verlag, Berlin / Heidelberg, 2006.
ISBN: 978-3-540-39090-9
Zezula Pavel, Amato Giuseppe, Dohnal Vlastislav, Batko Michal
Similarity Search - The Metric Space Approach
In: Advances in Database Systems, Volume: 32, Springer, 2006, 220 p.
ISBN: 0-387-29146-6
Zezula Pavel
P2P Similarity Search Structures
In: Proceedings of the 14th Italian Symposium on Advanced Database Systems, peQuod, Ancona, Italy, 2006, pp. 1-12.
Presented as an invited talk: SEBD 2006 Fourteenth Italian Symposium on Advanced Database Systems, 18.6.-21.6.2006, Ancona, Italy.
Zezula Pavel
Scalable Similarity Search in Computer Networks
In: Advances in Databases and Information Systems, LNCS 4152, Springer-Verlag, Berlin, 2006, p. 3.
ISBN: 3-540-37899-5
Presented as an invited talk: Tenth East-European Conference on Advances in Databases and Information Systems, 3.9.-7.9.2006, Thessaloniki, Greece.
Zezula Pavel, Dohnal Vlastislav, Novák David
Towards Scalability of Similarity Searching
In: Global Data Management, (Ed. Baldoni R., Cortese G., Davide F., Melpignano A.), Volume: 8 of Emerging Communication: Studies in New Technologies and Practices in Communication, IOS Press, Amsterdam, The Netherlands, 2006.
ISBN: 1-58603-629-7
With the increasing number of applications that base searching on similarity rather than on exact matching, novel index structures are needed to speed up the execution of similarity queries. An important stream of research in this direction uses the metric space as a model of similarity. We explain the principles and survey the most important representatives of index structures. We put most emphasis on distributed similarity search architectures, which try to solve the difficult problem of scalability of similarity searching. The current achievements are demonstrated by practical experiments. Future research directions are outlined in the conclusions.