Thursday, November 6, 2008

System Requirements

I heard Arctos will only run on X. I don't have that! I have Y, and I LIKE it!!

Alternatively: I don't want to be in a shared environment! I want to build my own system, and Arctos doesn't work like that/the developers will egg my house/the license doesn't allow this.

You've heard lots of things, haven't you? They're mostly wrong. Here's the real deal.

You'll need a database. Oracle will let you re-use all of our DDL code, but we've run Arctos on other things. Oracle will run on Windows, Linux, Solaris, and probably some other stuff.

You'll need a CFML interpreter. ColdFusion is probably required to enjoy all of Arctos' functionality, but the basics should work under several interpreters. CF will run on Linux or Windows, and has been unofficially ported to a few other things.

That is all. The developers aren't likely to talk to you unless you're running the core code under Oracle and ColdFusion in a reasonable environment, but they won't accept your money either.

There are other niceties - Apache is more stable than ColdFusion's built-in webserver, for example. You'll want a reasonable amount of RAM - as in all things, more is better. You'll want some disk space, especially if you'll host Media locally. A decent network connection is nice.

Anything else is simply not required, but do not expect blazing performance or fabulous stability in an untuned or poorly-hosted environment. Your knockoff implementation might not work like you think it should, but that won't be the fault of Arctos or its developers.

Basic specimen data requirements

ZOMG! I've just got a dead rat! I don't know or care what a Project or Loan is! This is too complicated!

Arctos has very minimal requirements for entering new data, all of which can be some form of "unknown." You must have an Identification ("unidentifiable" is OK), a Collector ("unknown" is OK), Geography ("no higher geography recorded" is OK), a Locality ("No specific locality recorded" is OK), and an Event ("between January first, 4.5 billion BC and the present" is OK). Everything else is optional. We would encourage Curators and Directors to have higher standards, but we will not encode more extensive requirements.
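For the dead-rat crowd, here is a minimal sketch of what "everything can be unknown" looks like as a data structure. The class and field names are hypothetical illustrations, not Arctos table or column names, and the completeness check is an assumption about how such a rule might be written, not the actual Arctos logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MinimalSpecimenRecord:
    """Hypothetical record: the five required items, each defaulting to 'unknown'."""

    identification: str = "unidentifiable"
    collector: str = "unknown"
    higher_geography: str = "no higher geography recorded"
    locality: str = "No specific locality recorded"
    event: str = "between January first, 4.5 billion BC and the present"

    def missing_required(self) -> List[str]:
        # A required item is "missing" only if it is blank; "unknown" is a valid value.
        return [name for name, value in vars(self).items() if not value.strip()]


# The only thing we know about the rat is (roughly) what it is.
rat = MinimalSpecimenRecord(identification="Rattus")
print(rat.missing_required())  # prints []
```

The point is that the record is complete the moment it exists; anything beyond these five items is up to the collection.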

Taxonomy and friends

Arctos somehow owns my data! I need to use a particular format for loan numbers/taxon concept/scientific name/font/screen brightness/interpretation of the number two, and Arctos won't let me! This is hijacking! Help!!!

Our place in the community

Arctos is an attempt to empower Curators. Arctos, as a system, concept, or implementation, has neither the capacity nor inclination to change how its users do business. We have our own electrons and are not interested in stealing yours. We understand that different collections and institutions will have differing ideas about how things should be done, and we embrace that. We do attempt to provide the tools to build upon the efforts of others and to share your data and results with others. We believe we are better at supporting the freedom to accurately record your data while supporting data standardization efforts than any other natural history management system.

Arctos is open-source. You are free to take the code, under the terms of our license, and do with it what you will. You are also free, under certain restrictions (you cannot interfere with how others do business, nor severely and negatively impact system performance), to submit your code for incorporation into the Arctos project.

All Data Definition Language code is freely available.

As an Arctos participant, you are entitled to download data or receive a regular copy of the Oracle backup files. We think this is unnecessary, but we will accommodate your needs as we can. Arctos' host, AlasConnect, also hosts Golden Valley Electric Company and most of Fairbanks' electronic medical data. The Department of Homeland Security conducts regular audits. Specimen data are at least as secure as your electric bill, the software that regulates your power, and your latest medical images.

Media stored in the iRODS system is archived at two supercomputer centers.

We firmly believe that your data are as secure as electronic data can be.

Taxonomy

Arctos continually attempts to provide access to external taxon authorities while allowing curatorial users - you know, those folks who create taxonomy - the tools and flexibility they need to do their jobs. So, while we may suggest taxon concepts from IPNI, uBio, ITIS, or other resources, we will never limit your choices to just those things.

Formatting

Identifiers (loan and accession numbers, field numbers, and soon catalog numbers) may be entered in almost any format. We encourage standardization, and can provide additional tools (such as incrementers) for standardized data, but there are very few actual requirements.
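To make "incrementer" concrete, here is a minimal sketch of one, assuming an identifier format that ends in a number. The function name and the sample identifiers are hypothetical, and this is not the Arctos implementation; it only shows why a standardized format makes this kind of tool possible.

```python
import re


def next_identifier(identifier: str) -> str:
    """Increment the trailing run of digits, preserving any zero padding."""
    match = re.search(r"(\d+)$", identifier)
    if match is None:
        # Free-form identifiers are allowed, but they cannot be auto-incremented.
        raise ValueError(f"no trailing number to increment in {identifier!r}")
    digits = match.group(1)
    incremented = str(int(digits) + 1).zfill(len(digits))
    return identifier[: match.start()] + incremented


print(next_identifier("UAM-0099"))  # prints UAM-0100
```

An identifier with no trailing number, such as "2008.017.Mamm", is still a perfectly good identifier; it just raises ValueError here, because there is nothing sensible to increment.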

Rumors and hearsay

Arctos is based on proprietary software. Isn't that expensive, scary, and perhaps even evil?

Database

Arctos is based on what works. Arctos has used Oracle, Sybase, and PostgreSQL as the underlying RDBMS at various times. Oracle is by far the most feature-rich and stable platform that we've tried. That said, the entire Arctos community is pro open-source. Find an RDBMS that does what we want at least as well as Oracle and we're in. Briefly, that must include:
  • User and role management: We use Oracle's users and roles to their full extent.
  • Virtual Private Databases: We must be able to confidently let a user do what they want to their data in a shared environment.
  • Integrity constraints: There are thousands of rules about relationships among records, data value rules, and triggers that make Arctos what it is.
  • Data Consistency: We are absolutely certain that Oracle will never do strange things to our data. We are equally certain that we can recover - to the minute, if necessary - if something does happen.
All of those things could be engineered into PostgreSQL. Doing so would require extensive developer commitment, and would still result in an untested and one-off solution. Oracle natively provides these attributes and is cheap by comparison, both in terms of dollars and data security.

The question is, what are your data worth, and do you wish to make that investment in Enterprise-grade software or custom coding?

Application Environment

We explored several alternatives to ColdFusion. Versata, WebObjects, Oracle Forms, and PHP have been scrutinized most closely. All of them, and many other application servers/languages, could be made to run Arctos. ColdFusion has proven reliable and flexible, and it's what we know best. In addition to the basic job of communicating between a web page and a database, ColdFusion provides native support for a GUI report interface and PDF creation. PDF creation has become commonplace. Comparable GUI reporters (Crystal Reports is probably the closest) are expensive and potentially clunky.

ColdFusion simply ingests CFML (ColdFusion Markup Language) and returns Java to any J2EE application server. This also enables us to easily extend ColdFusion with other Java applications.

Additionally, ColdFusion allows us to use different languages as we need them. We've coded extensively in JavaScript, and have several extensions that use PHP.

We currently believe ColdFusion provides the best "bang for the buck." Moving to any other language would require extensive rewrites and would certainly introduce a myriad of bugs.