Friday, February 5, 2010

RE: Why are you for killing libraries?

Comment on blog posting: Why are you for killing libraries? http://www.librarything.com/thingology/index.php
It would be naive to believe that the ebook will not have much effect on bookstores or libraries once there is an ebook reader that is the equivalent of the iPod, which will probably happen sooner rather than later.

I am a librarian, and although I love books, I look forward to the future. But the future holds danger. I compare the situation of libraries to that of the newspaper world: just as journalists are in danger of losing everything, librarians face the same danger.

But this is where librarians can learn from the journalists. There is general agreement that journalists are still needed. Many of them are now beginning to differentiate the field of "journalism" from "working for a newspaper." Since newspapers are in real danger, does that mean that the field of journalism is too? How can someone survive as a journalist without having to work for a newspaper? Some are deciding that it may be possible.

In the same way, there is the field of librarianship, with its ethics and values, its skills and methods, and this differs from our task of managing a physical collection. If libraries do not survive, or remain only as museums of "physical curiosities" (which is a possibility), does that sound the death knell for librarianship as a field? I don't think so, provided that librarians reorient themselves.

I think that librarians and journalists can survive and even thrive in the new environment, but I don't know if newspapers and libraries will survive. If not, it will be a sad time, indeed.

Still, it should be a wild ride for everybody!

RE: [RDA-L] Utility of ISBD/MARC vs. URIs (Was: Systems ...)

Bernhard Eversberg wrote:
<snip>
J. McRee Elrod wrote:
>
>> imposes structure where it isn't helpful (e.g., where it was based on obsolete card design).
>
> Every word of your post rang true, until I reached that last sentence. Insofar as the old unit card structure is reflected in the choice and order of elements of the ISBD, it is *very* helpful.

Mac, I wasn't targeting ISBD here, and I'm as convinced as you are about its usefulness and importance. (We only want to get rid of punctuation at the end of subfields.)

Rather, I was getting at the innumerable rules that concern the arrangement of entries and tracings and whether or not an added entry was necessary, and how to control these things. Most of the indicators that concerned card production are not helpful any more but add to the confusion that governs opinions about MARC. Also, stuff like the omission of leading articles in uniform titles, which came into being *only* because that field lacks the indicator.

</snip>

In addition, I think it's important to consider how best to focus our (most probably) ever-decreasing resources in a truly shared, open environment. Let us imagine for the moment that we can get ONIX or DC copy for every single resource we catalog (that will be quite some time in the future, if ever, but let's just imagine) and that the cataloger updates the record. Efficiency will probably still dictate that there be copy catalogers who concentrate on the simple updates and complex catalogers who do more. How will it look if the copy catalogers report that this week they have added filing indicators to 200 records and 245$b to 300 records? :-)

Joking aside, I think we have to get to the kernel of what our users need, and we need to accept that once projects such as Google Books come online, fewer and fewer people will search our local catalogs separately. They will come to our catalogs (if at all) from Google Books, where they will find the full text plus a mashup of our metadata mixed in with who knows what, and they will come to us only to see whether a library near them has a physical copy, even though they will be able to read the book online. Only time will tell how long it will be before people stop caring so much about the physical book. (As an aside, I just bought a Sony ebook reader, and although I am definitely a bookman, I absolutely love it! For the first time, I can actually enjoy reading a book I have taken from the web! I have shown it to people, and most want one too.)

I admit this is a terrifying scenario (for me, at least), but it is one that is both logical and easy to predict. Once it is accepted, however, we can begin to consider exactly what catalogers can provide our patrons that the Googles and the Yahoos cannot. I think there is an awful lot we can do, and we can prove that we are still necessary.

But I don't know how much of it will resemble what we have always done. Is browsing alphabetically by title *really* so important to people that we must devote resources to it? Would those resources be better used in adding new materials? I don't know, but I have my own opinions. The situation has become so serious that today we must make the case that people need something, e.g. browsing alphabetized lists of book titles, so badly that we must devote staff time to redoing records that are otherwise correct. No longer can we rely on simply continuing current practices. Of course, this goes for all of MARC and the cataloging rules, but one must start somewhere.
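To make concrete what maintaining such a title browse involves: in MARC 21, the second indicator of field 245 gives the number of leading characters (for example "The ") to skip when filing a title. Below is a minimal sketch of sorting on that indicator. The titles and indicator values are made up for illustration, and this is not how any particular catalog implements browsing.

```python
from dataclasses import dataclass


@dataclass
class TitleField:
    """A simplified MARC 245 field: the title text plus its second indicator."""
    text: str
    nonfiling: int  # 245 second indicator: leading characters to skip (0-9)

    def filing_key(self) -> str:
        # Skip the nonfiling characters (e.g. "The ") and compare case-insensitively.
        return self.text[self.nonfiling:].casefold()


def title_browse(titles: list[TitleField]) -> list[str]:
    """Return display titles in alphabetical filing order."""
    return [t.text for t in sorted(titles, key=TitleField.filing_key)]


# Hypothetical sample titles with their nonfiling indicators.
titles = [
    TitleField("The tragedy of Romeo and Juliet", 4),
    TitleField("Jython essentials", 0),
    TitleField("A midsummer night's dream", 2),
]
print(title_browse(titles))
```

If the indicator on "The tragedy of Romeo and Juliet" is left at 0, that title files under T. Correcting records like that one at a time is exactly the staff time whose value is in question here.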

Tuesday, February 2, 2010

FW: [RDA-L] Systems v Cataloging was: RDA and granularity

Bernhard Eversberg wrote:

<snip>
Some metadata creators are inclined to follow no rules except their own, not disclosing what these are. But OK, we should not be pointing fingers at them but try very hard to make sense of everything they might come up with, creating a grand mashup (resisted to write hotchpotch.)

If that is so, and if metadata creators are not interested in getting the most out of our stuff either, why do we keep following extremely complex rules requiring innumerable elements? Dumb down RDA and MARC so we have only one element for keyword indexable text, and a few indispensable codes and dates. Wouldn't that immensely ease the job of creating the mashup? After all, what more is Google doing, and who except us is saying that's not good enough?
</snip>

I think that each group sincerely believes its own standard to be better than anyone else's. (I believe it!) So long as everyone holds onto such ideas, there can be no change, and the result will be that a separate metadata record will forever be made and remade by each metadata community (or, taken to a reductio ad absurdum, by each library/bibliographic agency). This is the situation as it has always been, but before the WWW it was practically impossible to know about and share records with all of the other bibliographic agencies. Those difficulties have now been overcome. This situation becomes uncomfortable, however: earlier we honestly could not see the records produced by others, but today we must either pretend not to see them or willfully ignore them. I don't believe this serves anybody very well.

The practice of cataloging is based on the principle of "consistency," which can turn cataloging into the most conservative of endeavors. By following the principle of consistency, catalogers ensure that the records they make today work with the older records, some of them made 100 or more years ago. If you don't keep this in mind, you can end up hiding the earlier records, or at least making them incomprehensible. Of course, lots of practices have changed tremendously, but the basic idea is for everything to work together. Can the principle of consistency be retained in an open, shared, cooperative environment? I think it can.

Perhaps I'm a dreamer, but since it seems that the general public wants reliable metadata (see the Language Log discussion about the metadata in Google Books), I still think that it's not too late, so long as catalogers are willing to adapt to some different practices. If we could simply record the rules that pertain to each separate bit of metadata, e.g. that these page numbers follow the rules of the FAO of the UN, or of CERN, AACR2, Dewey, etc., it would go a long way toward making the information more understandable.

I emphasize that this would be for librarians, who need this level of detail for their work of maintaining the collection, and not for users, who rarely need anything like this.
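One way to picture recording the rules that pertain to each bit of metadata is to carry a rules label alongside every value. What follows is a minimal sketch under that assumption. The element names and rule labels are hypothetical, and the sample values echo the CERN and LC records for Jython essentials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One bit of metadata together with the rules it was recorded under."""
    element: str  # hypothetical element name, e.g. "extent"
    value: str
    rules: str    # hypothetical label for the governing rules


record = [
    Statement("extent", "xx, 277 p.", "AACR2"),
    Statement("extent", "277 p", "CERN local practice"),
    Statement("date", "c2002", "AACR2"),
    Statement("date", "2002", "CERN local practice"),
]


def values_by_rules(stmts, element):
    """Group the values recorded for one element by the rules that produced them."""
    grouped = {}
    for s in stmts:
        if s.element == element:
            grouped.setdefault(s.rules, []).append(s.value)
    return grouped


print(values_by_rules(record, "extent"))
```

A librarian can then see at a glance that two extents differ because they follow different rules, not necessarily because the items differ. Users would never need to see this layer.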

[RDA-L] Systems v Cataloging was: RDA and granularity

Daniel CannCasciato wrote:

<snip>
Karen Coyle wrote in part:

" all of the needs are user needs . . . "

Brava!
</snip>

Pardon me, but this is not correct. If we are to manage the collection (whatever "the collection" happens to be), we will need tools, and some of these tools will be designed for library use and not for the users.

There's nothing strange about this: for example, there are many things on an automobile that the general public does not need to understand in order to drive the car safely and correctly. Still, just because I do not understand them, I do not conclude that they are unnecessary. Some of the things may be there for no other reason than to make it easier (and cheaper) for the mechanics to do maintenance. Good! If I insist on knowing what all of these strange things are, I can learn what they are there for, but it is highly presumptuous to conclude that they are unnecessary.

For this reason, something like the number of pages is useful and vital primarily for librarians to manage a collection. What do I mean by this? If a selector is deciding whether to buy a copy of a certain text, e.g. yet another copy of Romeo and Juliet, he or she first needs to know if there is already a copy in the collection. The paging must describe the item well enough so that the selector does not have to march into the stacks to check how many pages the item *really* has. If the selector ends up buying an additional copy of something already in the collection, everybody gets mad because of the waste of money, staff time, and shelf space. But very few patrons, i.e. only the extreme specialists of our general reading public, really care much about how many pages something has.

There are many other areas of the record like this: the publishing/copyright/printing date(s), statement of responsibility, series statement, arguably the series tracing, many of the notes, and so on.

The traditional catalog serves many functions for many people, and one of its primary functions is as an inventory tool. It remains to be seen whether, e.g., the incredibly complex system of subject headings is there for users, or more for librarians to ensure reliable retrieval.

In today's mashup world, where all kinds of metadata will be thrown together in ways we cannot predict, it is our task to figure out some way for all of this to make sense. See, for example, the current thread on the NGC4LIB list about CERN making its bibliographic data, which is non-ISBD, openly available. I am sure that other libraries will follow, and Anglo-American libraries will eventually be forced to do the same. Sooner or later, our metadata, based on different standards, will *HAVE* to interoperate with CERN's metadata and with many other standards.

But let's face it: this is already happening in our catalogs right now, since they contain various bibliographic standards other than the current flavor of AACR2. Our catalogs have always managed to contain AACR2, AACR1, non-ISBD, Cutter rules, Dewey rules, ALA rules, and on and on. If RDA is implemented, there will be yet another standard.

Looked at in this way, the new environment may not be all that much different from what we have today.

Again, I think these are the directions we should take instead of coming up with yet another new set of rules that few metadata creators will follow.

Monday, February 1, 2010

FW: [NGC4LIB] The CERN Library publishes its book catalog as Open Data

Concerning The CERN Library publishing its book catalog as Open Data
The whole dataset can be downloaded from http://cern.ch/bookdata
This is really great and I hope that other libraries will follow.

*But* the question will be how to incorporate all of this together in a coherent way. The standards of CERN are quite different from Anglo-American standards. Below is a record taken at random, together with the record of the same item in LC. After a quick look, I see that: the CERN record has no size; in the LC record the place of publication reflects the AACR2 practice of adding a place within the country of the cataloging agency; there are differences in the date of publication vs. date of copyright; the CERN record has no statement of responsibility and no edition statement; and the paging itself is different. These last points are important for AACR2's determination of copy vs. new edition. CERN's subjects reflect its narrower collecting focus vs. LCSH's broader focus, e.g. "Python" vs. "Python (Computer program language)." Noel Rappin's name lacks the date of birth found in the NAF. There are several other differences, including some that stem from differing cataloging philosophies.

None of this is meant to find fault; rather, while the sharing is great, it is only a first step. How can we use these records in the best, most efficient way for our own purposes and for our users? Of course, some of these problems can be solved with URIs, but I don't believe everything can. Do we just settle for a mashup, or can we do something else?

Jim Weinheimer

<record>
<controlfield tag="001">984645</controlfield>
<controlfield tag="005">20071109101316.0</controlfield>
<datafield tag="020" ind1=" " ind2=" ">
<subfield code="a">0596002475</subfield>
<subfield code="u">print version, paperback</subfield>
</datafield>
<datafield tag="020" ind1=" " ind2=" ">
<subfield code="a">9780596002475</subfield>
<subfield code="u">print version, paperback</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">eng</subfield>
</datafield>
<datafield tag="080" ind1=" " ind2=" ">
<subfield code="a">004.438.Jython</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Pedroni, Samuele</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Jython</subfield>
<subfield code="b">Essentials</subfield>
</datafield>
<datafield tag="246" ind1=" " ind2=" ">
<subfield code="a">Rapid Scripting in Java</subfield>
<subfield code="i">Cover title</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="a">Beijing</subfield>
<subfield code="b">O'Reilly</subfield>
<subfield code="c">2002</subfield>
</datafield>
<datafield tag="300" ind1=" " ind2=" ">
<subfield code="a">277 p</subfield>
</datafield>
<datafield tag="490" ind1=" " ind2=" ">
<subfield code="a">O'Reilly &amp; Asociates books</subfield>
</datafield>
<datafield tag="650" ind1="1" ind2="7">
<subfield code="2">SzGeCERN</subfield>
<subfield code="a">Computing and Computers</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
<subfield code="9">CERN</subfield>
<subfield code="a">Jython</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
<subfield code="9">CERN</subfield>
<subfield code="a">Java</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
<subfield code="9">CERN</subfield>
<subfield code="a">Python</subfield>
</datafield>
<datafield tag="690" ind1="C" ind2=" ">
<subfield code="a">BOOK</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Rappin, Noel</subfield>
</datafield>
<datafield tag="916" ind1=" " ind2=" ">
<subfield code="d">200609</subfield>
<subfield code="s">h</subfield>
<subfield code="w">200638</subfield>
</datafield>
<datafield tag="960" ind1=" " ind2=" ">
<subfield code="a">21</subfield>
</datafield>
<datafield tag="961" ind1=" " ind2=" ">
<subfield code="c">20080407</subfield>
<subfield code="h">2044</subfield>
<subfield code="l">CER01</subfield>
<subfield code="x">20060920</subfield>
</datafield>
<datafield tag="963" ind1=" " ind2=" ">
<subfield code="a">PUBLIC</subfield>
</datafield>
<datafield tag="970" ind1=" " ind2=" ">
<subfield code="a">002647668CER</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">BOOK</subfield>
</datafield>
<datafield tag="964" ind1=" " ind2=" ">
<subfield code="a">0001</subfield>
</datafield>
</record>

LC Control No.: 2003266066
LCCN Permalink: http://lccn.loc.gov/2003266066
000 01438cam a22003494a 450
001 13108602
005 20090729142230.0
008 030303s2002 ch a b 001 0 eng
010 __ |a 2003266066
015 __ |a GBA2-Y6751
020 __ |a 0596002475
035 __ |a (OCoLC)ocm49044531
040 __ |a UKM |c UKM |d CUS |d TXA |d CUY |d DAY |d DLC
042 __ |a pcc
050 00 |a QA76.73.J38 |b P43 2002
082 04 |a 005.133 |2 21
100 1_ |a Pedroni, Samuele.
245 10 |a Jython essentials / |c Samuele Pedroni and Noel Rappin ; foreword by Jim Hugunin.
250 __ |a 1st ed.
260 __ |a Beijing ; |a Sebastopol, CA : |b O'Reilly, |c c2002.
300 __ |a xx, 277 p. : |b ill. ; |c 23 cm.
500 __ |a "Rapid scripting in Java"--Cover.
504 __ |a Includes bibliographical references (p. xvi-xvii) and index.
650 _0 |a Java (Computer program language)
650 _0 |a Jython (Computer program language)
650 _0 |a Python (Computer program language)
700 1_ |a Rappin, Noel, |d 1971-
856 42 |3 Publisher description |u http://www.loc.gov/catdir/enhancements/fy0715/2003266066-d.html
856 42 |3 Contributor biographical information |u http://www.loc.gov/catdir/enhancements/fy0912/2003266066-b.html
906 __ |a 7 |b cbc |c pccadap |d 2 |e ncip |f 20 |g y-gencatlg
925 0_ |a acquire |b 2 shelf copy |x policy default
955 __ |a ps05 2003-03-03 to ASCD |c jf05 2003-03-11 to subj. |d jf09 2003-03-11 to sl |e jf12 2003-03-12 to Dewey |a jf16 2003-07-11 copy2 to BCCD
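Some of the differences between these two records can be found mechanically. The following Python sketch uses only the standard library's xml.etree.ElementTree to read a trimmed excerpt of the CERN record and compare it with the corresponding LC fields, which are transcribed by hand. Stripping trailing punctuation is a rough assumption about ISBD punctuation, not a complete normalization.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the CERN record (no MARCXML namespace, as published).
CERN_XML = """<record>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Jython</subfield>
<subfield code="b">Essentials</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="a">Beijing</subfield>
<subfield code="b">O'Reilly</subfield>
<subfield code="c">2002</subfield>
</datafield>
<datafield tag="300" ind1=" " ind2=" ">
<subfield code="a">277 p</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Rappin, Noel</subfield>
</datafield>
</record>"""

# The same fields from the LC record, transcribed as (tag, code) -> values.
LC_FIELDS = {
    ("245", "a"): ["Jython essentials /"],
    ("245", "c"): ["Samuele Pedroni and Noel Rappin ; foreword by Jim Hugunin."],
    ("250", "a"): ["1st ed."],
    ("260", "a"): ["Beijing ;", "Sebastopol, CA :"],
    ("260", "b"): ["O'Reilly,"],
    ("260", "c"): ["c2002."],
    ("300", "a"): ["xx, 277 p. :"],
    ("300", "c"): ["23 cm."],
    ("700", "a"): ["Rappin, Noel,"],
    ("700", "d"): ["1971-"],
}


def subfields(xml_text):
    """Collect MARCXML datafield subfields into a (tag, code) -> [values] map."""
    out = {}
    for field in ET.fromstring(xml_text).iter("datafield"):
        for sf in field.iter("subfield"):
            out.setdefault((field.get("tag"), sf.get("code")), []).append(sf.text)
    return out


def norm(values):
    """Drop trailing ISBD-style punctuation (a rough approximation)."""
    return [v.rstrip(" /:;,.") for v in values]


cern = subfields(CERN_XML)
only_in_lc = sorted(set(LC_FIELDS) - set(cern))
only_in_cern = sorted(set(cern) - set(LC_FIELDS))
differing = [k for k in sorted(set(cern) & set(LC_FIELDS))
             if norm(cern[k]) != norm(LC_FIELDS[k])]

print("Only in LC:", only_in_lc)
print("Only in CERN:", only_in_cern)
for key in differing:
    print(key, cern[key], "vs", LC_FIELDS[key])
```

The fields present only in the LC excerpt are the statement of responsibility, the edition statement, the size, and Noel Rappin's birth date. What still differs after punctuation is stripped reflects differences of practice: the title/subtitle split, the second place of publication, "c2002" vs. "2002", and the "xx," paging. A comparison like this can find the differences, but it cannot decide which practice a given catalog should follow.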

Wednesday, January 20, 2010

Posting to: human? More or less, by Glen Lowry.

A very interesting post, especially since I happen to be a librarian. In my experience, the moment you put something on the web, it begins to fall apart, to deconstruct. The ability to hyperlink from one bit of information to another bit of information tends to turn what had appeared to be a coherent unity, e.g. a newspaper, a journal, a book, or a library, into a myriad of separate bits of “information” reorganized in unpredictable and even bizarre ways. Some decry these bits as totally chaotic but others do not. I personally think that a debate over whether it is chaos or not is useless since I don’t see any possibility of anything stopping it for a long time to come. It is much more productive to figure out how to adapt to it.

Just as journalists have identified closely with their newspapers, librarians have identified very closely with their libraries. Therefore, they believe that a lessening in the importance of the “library” translates into a lessening of importance for themselves as well. But now, with the decline of newspapers, we are beginning to see that it is important to distinguish “journalism” as an endeavor from “the reporter who works at a newspaper,” and I believe it will be just as important to distinguish “librarianship” from “a librarian who works in a specific library.” While I have no doubt that “journalism” and “librarianship” can survive and even flourish in the new environment in all kinds of novel ways, I don’t know if “newspapers” or “libraries” will be able to adapt. This is a further example of the trend toward deconstruction that affects people’s lives and careers. I don’t know if this is such a bad thing, although it is definitely disruptive.

Since the very nature of the “newspaper” has disintegrated beyond all practical recognition with tools like Google News, I believe it is safe to predict that “libraries” will disintegrate as well as projects such as Google Books (among many others) come online. It will be increasingly important for “those who practice librarianship,” i.e. experts in information retrieval divorced from any single source, to remain flexible and adaptable. It will be plenty of work to create new tools and techniques, but there’s nothing wrong with that.

Tuesday, January 19, 2010

LIBER Quarterly Article on Europeana

Posting to NGC4LIB

LIBER Quarterly has recently published a very interesting article (http://liber.library.uu.nl/publish/issues/2009-2/index.html?000472) in which the creators of the digital library Europeana, wanting an outsider's view of their project, asked Ricky Erway of OCLC to provide it. This was quite a courageous act by Europeana, and I commend them.

The entire article is quite thought-provoking and I am still reading it, but Ms. Erway has some interesting things to say about metadata in a shared, internationalized environment. I take the liberty of quoting the entire section:

<quote>
Metadata
My theories on metadata are

1. We do not need another standard.

2. People will use standards, but not in standard ways. Surprising choices are made even in using plain old Dublin Core. Having to hunt for or transform data, based on site-specific rules, does not easily scale.

3. People say they want to be told what to do, but they will not do it, because their situation or collection is unique.

4. No one likes their own metadata.

5. Mapping is a mythical grail.

What follows is a gross generalization (to which I have found no exceptions): Librarians want metasearch or federated searching. They do not like their own implementation. They blame the deficiency on metadata mapping. If they just had a better crosswalk, it would be better. So they change their software, retool with better mapping, and they still do not like it.

The reason is that a butterfly specimen has entirely different metadata than a painting of a butterfly. Who is the creator and what is the title or subject of a butterfly specimen? What is the Latin name or habitat of an impressionistic rendition of a butterfly? Just how many fields can be mapped between these two records?

My recommendation is to require a very small set of common elements and allow the rest to aid free text searching. Europeana's adoption of OAI-PMH and Dublin Core is a good thing. It precludes the development of yet another approach and adopts one that others may already be using. Requiring some very basic elements makes some advanced searches or filtering possible. If participants are allowed to leave required elements empty, it will render those documents not discoverable. Allowing data beyond what is required will allow for better retrieval, but just through free text searching. That's pretty much what users do anyway, type words in a box. Google manages to make it work.

User-generated information is intriguing. Access points that users use can be added to the ones we use. And we may get very rich information from experts. But there is a management headache when the data being augmented is in an aggregation. How do you coordinate giving enriched records back to contributors? If you do not or if they do not incorporate them into their catalog, then how do you coordinate updates from the contributor to the records that have been enhanced?

</quote>


I don't know whether I agree with this or not. I do agree that metadata creation/cataloging can get far too theoretical and thereby lose a sense of practicality; her "Who is the creator and what is the title or subject of a butterfly specimen? What is the Latin name or habitat of an impressionistic rendition of a butterfly?" is a good example of this tendency.

Still, I consider this more a matter of practitioners losing focus on the very purpose of the catalog: a "can't see the forest for the trees" syndrome. For example, perhaps we should consider that it is relatively unimportant whether somebody/something is a creator or contributor or editor or web manager or whatever. The question should be: would somebody want to find this particular resource by searching for this particular entity? If so, would they want to search for this entity in a way that has something to do with the creation of the intellectual aspects of the resource, with its publication and dissemination, or with the various ways of describing it, perhaps the titles and subjects? Naturally there are difficulties with this: is a conference a name, a title, or an "event"? How do dates of creation, editing, etc. relate to dates of publication or issue? But such issues should not divert us from the essence of the matter, and many times these discussions are purely theoretical, with little or no impact on how people search, retrieve and understand metadata.

In my experience, a fundamental idea is being lost among the public: that a well-organized catalog truly allows searching for *concepts*. For example, she writes: "Allowing data beyond what is required will allow for better retrieval, but just through free text searching. That's pretty much what users do anyway, type words in a box. Google manages to make it work." I cannot agree with this. When people type, e.g., "wwi" into a box, they do not necessarily realize that they are searching the *text* "wwi" and not the *concept* of the war that took place from 1914 to 1918. When I have pointed this out to people, they are shocked that by typing "wwi" into the box they miss, by definition, anything from before 1938, because nobody called it WWI until there was a WWII. Once people realize this, they are not so happy with Google's results, but unless you have worked at this for a long time, as professional catalogers have, you will never realize it. And of course, when you consider the totality of languages and how languages change, it is a far more complex and subtle matter than any single person can understand. Non-textual materials (music, videos, images, etc.) involve entire realms of other considerations as well.
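The difference between searching text and searching concepts can be shown in a few lines. In this sketch the vocabulary, the concept ID, and the records are all hypothetical. The point is only that a concept search maps "wwi" to the same concept as the older names for the war before retrieving, while a keyword search matches the literal string.

```python
# Hypothetical vocabulary: each variant name maps to one concept ID.
VOCABULARY = {
    "wwi": "C-WW1",
    "world war i": "C-WW1",
    "great war": "C-WW1",
    "european war": "C-WW1",
}

# Hypothetical records: title text plus the concept IDs a cataloger assigned.
RECORDS = [
    {"title": "The European war of 1914", "concepts": {"C-WW1"}},
    {"title": "Letters from the Great War", "concepts": {"C-WW1"}},
    {"title": "WWI in photographs", "concepts": {"C-WW1"}},
]


def keyword_search(term):
    """Match the literal text only, the way a plain search box does."""
    t = term.casefold()
    return [r["title"] for r in RECORDS if t in r["title"].casefold()]


def concept_search(term):
    """Map the term to a concept first, then retrieve everything under it."""
    concept = VOCABULARY.get(term.casefold())
    if concept is None:
        return keyword_search(term)  # no known concept: fall back to text
    return [r["title"] for r in RECORDS if concept in r["concepts"]]


print(keyword_search("wwi"))  # finds only the title containing "WWI"
print(concept_search("wwi"))  # finds all three records about the 1914-1918 war
```

The hard part, of course, is not these few lines of code but building and maintaining the vocabulary and assigning concepts to records. That is the work catalogers have always done.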

Also, the conclusion that "Google manages to make it work" does not follow, in my opinion. Google manages to *make people happy* with the results of the search, but that does not mean it really works the way people expect, as the WWI example above demonstrates. "Customer satisfaction" may be the correct goal for a company such as Google, but it is definitely not a satisfactory goal for doctors or lawyers, who are ethically compelled to tell you the truth whether it makes you happy or not. I like to think that librarians belong with the latter group rather than with the corporate business world that follows the motto "Let the buyer beware."

Nor can I agree completely with her comment: "People say they want to be told what to do, but they will not do it, because their situation or collection is unique." I think people do want to be told what to do, and since people are scared today, they may be more willing to cooperate than ever, but they will not be dictated to, and therefore cooperation will not be 100%. Cooperation involves a vast amount of give and take among all the groups involved, and that includes us. Plus, cooperation requires an element of trust that seems to be lacking at the moment.

So, her recommendation "to require a very small set of common elements and allow the rest to aid free text searching" is absolutely necessary, and I agree with it, but it does not obviate the need for a genuine organization of materials. How that can be done in a world of diminishing resources, demands for higher productivity, and genuinely shared workflows remains to be seen.
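Erway's recommendation of a small required core, with everything else feeding free-text search, can be sketched briefly. The required elements here are a hypothetical choice, since she names no specific ones. The records borrow her butterfly example, and their values are made up.

```python
# Hypothetical required core; Erway does not name specific elements.
REQUIRED = ("title", "date", "type")


def ingest(record):
    """Check the required core and build a free-text word set from every value."""
    missing = [e for e in REQUIRED if not record.get(e)]
    if missing:
        raise ValueError(f"missing required elements: {missing}")
    core = {e: record[e] for e in REQUIRED}
    words = set()
    for value in record.values():
        words.update(str(value).casefold().split())
    return core, words


def search(index, words=(), **filters):
    """Filter on core elements only; everything else is reachable only as free text."""
    hits = []
    for rec_id, (core, text) in index.items():
        core_ok = all(core.get(k) == v for k, v in filters.items())
        text_ok = all(w.casefold() in text for w in words)
        if core_ok and text_ok:
            hits.append(rec_id)
    return hits


index = {
    "specimen-1": ingest({"title": "Vanessa cardui", "date": "1902",
                          "type": "specimen", "habitat": "meadow"}),
    "painting-1": ingest({"title": "Butterfly in a garden", "date": "1890",
                          "type": "painting", "medium": "oil on canvas"}),
}

print(search(index, type="painting"))      # filter on a core element
print(search(index, words=["meadow"]))     # non-core data found via free text
print(search(index, words=["butterfly"]))  # the specimen itself is missed
```

The last search shows the limit described above. Free text finds the painting of a butterfly but not the butterfly specimen, unless something organizes both under the same concept.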