SAPIR is a peer-to-peer multimedia information retrieval system that can index structured and unstructured text, still and moving images, speech, and music. The system's feature extraction component, which is responsible for analyzing documents to prepare them for indexing, is implemented using UIMA. Compound documents are handled using an architecture of (potentially nested) splitters and mergers within the UIMA processing chain. For example, the moving image from a video is processed by splitting it into multiple still frames, which are processed by the same analysis engine used for still photos, and then merging the results to form a representation of a video. The output of the feature extraction module is a document description in a representation based on the MPEG-7 standard.
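The splitter and merger arrangement described above can be illustrated with a minimal sketch. This is plain Python rather than UIMA code, and every name in it (`analyze_image`, `split_video`, `merge_frame_features`, the `mean_intensity` feature, and the dictionary layout of a video) is a hypothetical stand-in chosen for illustration, not part of SAPIR.

```python
def analyze_image(image):
    """Hypothetical still-image analyzer: returns one toy feature per image.

    An image is assumed to be a non-empty list of pixel intensities.
    """
    return {"mean_intensity": sum(image) / len(image)}


def split_video(video):
    """Splitter: break a compound object (video) into simple objects (frames)."""
    return video["frames"]


def merge_frame_features(frame_features):
    """Merger: combine per-frame results into one description of the video."""
    return {"frame_count": len(frame_features), "frames": frame_features}


def analyze_video(video):
    """Reuse the still-image analyzer on every frame, then merge the results."""
    return merge_frame_features([analyze_image(f) for f in split_video(video)])


video = {"frames": [[0, 10], [20, 40]]}  # two tiny assumed frames
description = analyze_video(video)
```

The point of the design is reuse: the video path adds only a splitter and a merger, while the per-frame work is done by the same analyzer that handles still photos.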
In this paper, we present an experiment aimed at merging named entity annotations provided by different annotators. This work has been performed as part of the Infom@gic project, whose goal is the integration and validation of knowledge engineering and information analysis applications, and which is supported by the pole of competitiveness Cap Digital "Image, MultiMédia et Vie Numérique." We first describe the four annotators, which provide named entity annotations that conform to guidelines defined in the Infom@gic project. Then we present an algorithm for merging the different annotations. It uses information about the compatibility of various annotations and can point out conflicts, and thus yields annotations that are more reliable than those of any single annotator. We conclude by describing and interpreting the merging results obtained on a manually annotated reference corpus.
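As a rough illustration of compatibility-aware merging, the sketch below groups annotations by text span, keeps a label when all proposed labels are mutually compatible, and reports a conflict otherwise. It is not the algorithm from the paper: the label names, the `COMPATIBLE` table, the exact-span matching, and the majority choice are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical compatibility table: unordered label pairs treated as compatible.
COMPATIBLE = {frozenset({"Organization", "Company"})}


def compatible(a, b):
    """Two labels are compatible if identical or listed in the table."""
    return a == b or frozenset({a, b}) in COMPATIBLE


def merge_annotations(annotator_outputs):
    """Merge annotations given as lists of (start, end, label) tuples.

    Returns (merged, conflicts). Each merged entry is
    (start, end, label, votes); each conflict is (start, end, labels).
    """
    by_span = defaultdict(list)
    for output in annotator_outputs:
        for start, end, label in output:
            by_span[(start, end)].append(label)

    merged, conflicts = [], []
    for (start, end), labels in sorted(by_span.items()):
        if all(compatible(a, b) for a in labels for b in labels):
            # Most frequent label wins; ties go to the first one seen.
            best = max(labels, key=labels.count)
            merged.append((start, end, best, len(labels)))
        else:
            conflicts.append((start, end, sorted(set(labels))))
    return merged, conflicts


outputs = [
    [(0, 5, "Person"), (10, 15, "Organization")],
    [(0, 5, "Person"), (10, 15, "Company")],
    [(0, 5, "Location")],
]
merged, conflicts = merge_annotations(outputs)
```

In this toy run the span (10, 15) is merged because its two labels are declared compatible, while the span (0, 5) is flagged as a conflict because one annotator disagrees incompatibly.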
In an information retrieval system, a thesaurus can be used for query expansion, i.e. adding words to queries in order to improve recall. We propose a semi-automatic and interactive approach for the creation and maintenance of domain-specific thesauri for query expansion. Domain-specific thesauri are especially required in highly technical domains where the use of general thesauri for query expansion introduces more noise than useful results. Our semi-automatic approach to thesaurus creation constitutes a good compromise between fully manual approaches, which produce high-quality thesauri but at a prohibitively high cost, and fully automatic approaches, which are cheap but produce thesauri of limited quality. This article describes our approach and the architecture of the system implementing it, named Cannelle. It exploits user query logs and natural language processing to identify valuable synonymy candidates, and allows editors to interactively explore and validate these candidates in the context of a domain-specific searchable knowledge base. We evaluated the system in the domain of online troubleshooting, where the proposed method clearly yielded an improvement in the quality of the search results obtained.
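Query expansion with a thesaurus, as defined at the start of this abstract, can be sketched in a few lines. The thesaurus contents and the function name `expand_query` are hypothetical, and the sketch uses simple whitespace tokenization; it does not attempt to reproduce how Cannelle derives or validates its synonymy candidates.

```python
def expand_query(query, thesaurus):
    """Return the query terms plus their thesaurus synonyms, without duplicates.

    `thesaurus` maps a lowercase term to a list of synonyms (assumed format).
    """
    expanded = []
    for term in query.lower().split():
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical domain-specific entry for a troubleshooting knowledge base.
thesaurus = {"screen": ["display", "monitor"]}
terms = expand_query("Screen flickers", thesaurus)
```

Matching any of the expanded terms can raise recall, which is also why a poorly suited general thesaurus adds noise: every spurious synonym widens the match.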
Content-based search in audio-visual collections requires media-specific analysis to extract low-level features that can be efficiently indexed and searched. We present the SAPIR media framework for analyzing digital content and representing the extracted features in a common schema. To handle complex media such as video, the framework includes splitters that break compound objects into simple objects, which are then processed by image and speech analyzers. The extracted features are then merged into a common representation. We report on the use of this framework in the SAPIR demo.
We have developed an interactive query refinement tool that helps users search a knowledge base for solutions to problems with electronic equipment. The system is targeted towards non-technical users, who are often unable to formulate precise problem descriptions on their own. Two distinct but interrelated functionalities support the refinement of a vague, non-technical initial query into a more precise problem description: a synonymy mechanism that allows the system to match non-technical words in the query with corresponding technical terms in the knowledge base, and a novel refinement mechanism that helps the user build up successively longer and more precise problem descriptions starting from the seed of the initial query. A natural language parser is used both in the application of context-sensitive synonymy rules and in the construction of the refinement tree.
Millions of facts are stored within the biological literature. Most of these facts represent small advances in knowledge about an established theory, but a small fraction offer new insight into a biological phenomenon. We propose a method based on computational linguistic tools for identifying this small fraction of facts (extraction) and exposing knowledge that may be important in future developments (prediction). The method is based on finding linguistic cues indicating that the authors of biological articles have identified a problem with, or a break from, conventional knowledge.
In their framework for ontological analysis, Guarino and Welty provide a number of insights that are useful for guiding the design of taxonomic hierarchies. However, the formal statements of these insights as logical schemata are flawed in a number of ways, including inconsistent notation that makes the intended semantics of the logic unclear, false claims of logical consequence, and definitions that provably result in the triviality of some of their property features. This paper makes a negative contribution, by demonstrating these flaws in a rigorous way, but also makes a positive contribution wherever possible, by identifying the underlying intuitions that the faulty definitions were intended to capture, and attempting to formalize those intuitions in a more accurate way.
We propose a logic of belief in which the expansion of beliefs beyond what has been explicitly learned is modeled as a finite computational process. The logic does not impose a particular computational mechanism; rather, the mechanism is a parameter of the logic, and we show that as long as the mechanism meets a particular set of constraints, the resulting logic has certain desirable properties. Chief among these is the property that one can reason soundly about another agent's beliefs by simulating its computational mechanism with one's own. The EPILOG system, a computer program designed for narrative understanding, serves as a case study for the application of the model and the implementation of simulative inference about belief.
We propose a logic of belief in which the expansion of beliefs beyond what has been explicitly learned is modeled as a finite computational process. The logic does not impose a particular computational mechanism; rather, the mechanism is a parameter of the logic, and we show that as long as the mechanism meets a particular set of constraints, the resulting logic has certain desirable properties. Chief among these is the property that one can reason soundly about another agent's beliefs by simulating its computational mechanism with one's own. We also give a detailed comparison of our model with Konolige's deduction model, another model of belief in which the believer's reasoning mechanism is a parameter.
One can gain efficiency in an inference system by using special-purpose representations for reasoning about certain predicates, but some such representations make it impossible for the system to keep track of the reasons for which it holds each of its beliefs. We illustrate the potential conflict with some examples, and distill some general principles for designing representations that can support both efficient special-purpose inference and some form of reason maintenance.
If one has attributed certain initial beliefs to an agent, it is sometimes possible to reason about further beliefs the agent must hold by observing what conclusions one's own reasoning mechanism draws when given the initial beliefs as premises. This technique is called simulative inference. In an earlier paper, we described a logic of belief in which the reasoning that generates beliefs is modeled explicitly as a computational process. We used this logic to characterize a class of computational inference mechanisms for which simulative inference is sound, under the assumption that the observer and the observed have similar mechanisms. In this paper, we present a different form of simulative inference, and show that unlike the earlier form, it is sound even for some mechanisms that perform defeasible inference.
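The basic idea of simulative inference can be sketched as follows. The sketch assumes, purely for illustration, that the shared reasoning mechanism is bounded forward chaining over simple if-then rules; the rule format, the `max_steps` bound, and the function names are hypothetical and are not taken from the papers. It covers only the monotonic case, not the defeasible form discussed above.

```python
def infer(premises, rules, max_steps=100):
    """The observer's own mechanism: bounded forward chaining.

    `rules` is a list of (body, conclusion) pairs, where `body` is a tuple
    of atoms that must all be believed for the conclusion to be added.
    """
    beliefs = set(premises)
    for _ in range(max_steps):  # finite computation by construction
        new = {
            conclusion
            for body, conclusion in rules
            if set(body) <= beliefs and conclusion not in beliefs
        }
        if not new:
            break
        beliefs |= new
    return beliefs


def simulate_beliefs(attributed_beliefs, rules):
    """Simulative inference: run one's own mechanism on the beliefs
    attributed to another agent, and ascribe the conclusions to that agent.

    Assumes the observed agent reasons with a similar mechanism.
    """
    return infer(attributed_beliefs, rules)


rules = [
    (("rain",), "wet_ground"),
    (("wet_ground", "cold"), "ice"),
]
ascribed = simulate_beliefs({"rain", "cold"}, rules)
```

Whether ascribing these conclusions is sound depends on the constraints the mechanism satisfies, which is the question the logic is used to answer.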