[Silva-general] indexing of .doc/.pdf files

Marc Petitmermet petitmermet at mat.ethz.ch
Wed May 20 10:29:38 CEST 2009


On 20 May 2009, at 08:51, Kit BLAKE wrote:

> On 19 May 2009, at 18:04, Marc Petitmermet wrote:
>> how can i tell silva which silva file (.doc/.pdf) to index and which
>> not? is it possible to de-index a file afterwards? i know exactly
>> which files should be indexed and which not (the final accepted
>> version and not any other revisions). or is there some kind of filter
>> which excludes all data from unwanted silva files from the search
>> results (some metadata would be nice: index/don't index like the
>> "hide from tables of content")? if there is no easy way i probably
>> just copy the final version to a different directory and have silva
>> reindex the catalogue and then restrict silva find to this folder.
>
> All content gets indexed. If there are multiple versions of the same  
> asset (.doc/.pdf) you should move the older versions somewhere else.  
> There are various ways you can control the search. One is by setting  
> a path and restricting the search to a folder or branch of the site.
>
> You could also put the old versions in a (sub)folder and give it an  
> access restriction. Public visitors who are not logged in won't see  
> the results, but if you're logged in you will. Then set the folder  
> to not show up in navigation.
>
> Interesting idea about the metadata, but it's better to control that  
> with queries. Silva Find is not just for public use, it can also be  
> really useful for authoring tasks.

we intend to use silva find for version 3.0 of our report center, where
students submit their reports [1]. i was told that assistants
sometimes suspect that they have read the same thing before in an
older report. with silva find they should be able to verify at least
simple copy/paste cases by hand. having the content of the reports
properly indexed would also allow for more sophisticated plagiarism
analysis. if somebody has some nice formulas, howtos or papers to
read, i would be very much interested...
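
one common starting point for this kind of check is the jaccard
similarity of word shingles (overlapping runs of k words): for two
reports with shingle sets A and B, J(A, B) = |A n B| / |A u B|, where
n is intersection and u is union. a value near 1 points to large
copied passages, a value near 0 to unrelated text. the sketch below
is only an illustration and assumes the report text has already been
extracted to plain strings. it does not use silva or silva find, and
the function names and the default k=5 are made up for this example.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    # Lowercase and keep only word characters so that punctuation and
    # capitalisation changes do not hide a copied passage.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat it as one single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(text_a, text_b, k=5):
    """Shingle-based similarity of two plain-text documents, between 0 and 1."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))
```

calling similarity(old_report, new_report) for each pair of reports
and looking at the highest scores by hand would be one way to use it.
a smaller k catches shorter copied fragments but also raises more
false alarms on common phrases.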

regards,
marc


[1] http://www.mat.ethz.ch/silva_cms/reportcenter


