Friday, October 2, 2009

Customizations

Complaint: Arctos isn't customizable enough. The screens are mostly fixed, and collections can't change labels at will.

Response: I don't think you've thought the problem all the way through! Two things are at issue here: a concept called normalization, and biodiversity standards.

Any database that requires extensive customization to support a diversity of collections is neglecting one or both of those things.

Normalization allows us to use the data itself as a label. Arctos's Attributes screen, for example:
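As a rough sketch of the idea in code (Python here; the record layout and names such as `attribute_type` are made up for illustration and are not Arctos's actual schema), the label shown on screen comes straight from the stored row, so a new kind of attribute needs no new label and no screen change:

```python
from collections import namedtuple

# Hypothetical normalized attribute row: the "label" is just data.
Attribute = namedtuple("Attribute", ["attribute_type", "attribute_value", "units"])


def render_attributes(attributes):
    """Build display lines, taking each label from the row's own attribute_type."""
    lines = []
    for attr in attributes:
        value = attr.attribute_value
        if attr.units:
            value = f"{value} {attr.units}"
        lines.append(f"{attr.attribute_type}: {value}")
    return lines


# Illustrative rows for two very different collections; same code renders both.
mammal = [Attribute("sex", "female", None), Attribute("total length", "182", "mm")]
plant = [Attribute("flower color", "yellow", None)]

for line in render_attributes(mammal) + render_attributes(plant):
    print(line)
```

The same function serves a mammal collection and a herbarium because nothing about either is hard-coded into the screen.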



While no existing standards tell us what to label each field in a database, common terminology is something worth striving for. When there's nothing to follow, we in the Arctos community strive to lead.

Tuesday, May 12, 2009

Arctos v. Specify: A comparison

| Feature | Arctos | Specify |
|---|---|---|
| Description | Enterprise software, hardware, backups (one in Fairbanks, one in Austin, one in San Diego), professional sysadmin. | Software. User responsible for hardware setup, sysadmin, backups, etc. |
| Cost | Software freely available to noncommercial enterprises. Hosting, development, and administrative costs are shared and negotiable. | Free software; requires hardware, someone to maintain it, a defensible backup strategy, and network access. |
| Development Model | Release early, release often. Let the users intimately guide every aspect of progress. Formalized issue tracking, Steering Committee, Advisory Committee. | Release infrequently. May consider user input from the Specify Forum. |
| Front End | ColdFusion (system works under PHP, Java, et al.) | Java (including business rules) |
| Data Model | Highly normalized; easily “pluggable” and expandable. 83 tables, 836 columns | Denormalized. 143 tables, 2400 columns (as of 1 May 2009) |
| Back End | Oracle - an enterprise-class RDBMS known for its concurrency management and stability. | MySQL - a lightweight open-source RDBMS designed for fast query access. Not designed for archival usage. Limited concurrency management. |
| Business Rules | In DB, where they’re always enforced. | In application layer, where they may be bypassed (by DB updates, add-on applications, or application bugs) |
| Permissions | In DB, where they’re always enforced. May be used to define Virtual Private Databases. | In application layer. |
| Security | Independent layers in application and DB. Professionally managed and audited. | In application layer, determined by system administrator. |
| Bulk Import | No practical record limit. | 2000-row limit. |
| Interfaces | Intuitive customizable web applications. | “Roll your own” queries against tables. |
| Taxonomy | Formal separation of taxonomy and determinations. Accommodates composite taxonomy (hybrids, multiple taxa in one object) through identification formulae. | Determinations treated as taxonomy. |
| Object Tracking | Individual Specimen Parts are tracked and loaned. | Cataloged Items are tracked and loaned. |
| Online Access | Integral | Coming soon? Limited to query only - limited data available? |
| Batch edits | Most data; many access points | None |
| DiGIR/Tapir | Integral and automatic. Live data served. | Coming soon? Manually maintained cache. |
| Media | Relate to any “node.” Stored anywhere on Internet, or uploaded to server. (100K+ images, 3.8 TB at TACC) | Stored on local filesystem or MorphBank. |
| System Requirements | Reasonably modern browser and Internet access | “Lowest common denominator” |
| Publications/Citations | Inherent | Unclear |
| Living Collections | No apparent obstacles, but untested. | Possible future development if the community wants to develop a separate schema. |
| Business Model | Short-term NSF and institutional (MVZ, UAM, MCZ, & MSB) support. | Short-term NSF and institutional support. |
| Data Quality | Defined and enforced by the Arctos community. | Left to individual operators. |
| Customizations | User-customizable search and results. Collection-specific appearance and CSS. | Operator-customizable search and results. |
| Mapping | BerkeleyMapper, Google Maps, Google Earth, download KML. Uncertainty represented as error circles. | Point mapping via downloadable KML |
| Saved queries | Save, name, email dynamic queries | Save static results sets. Email to agents with email addresses. |
| Taxon-specific attributes | User-definable, infinitely expandable determinations. Allows adding any biological collection. | Predefined assertions. |
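
The Business Rules row is the one people most often ask about, so here is a minimal sketch of the difference. It uses Python with SQLite standing in for the real back end purely for convenience, and the `specimen_part` table and its `disposition` values are hypothetical. The point is only that a rule declared in the database is checked no matter which client issues the write:

```python
import sqlite3


def make_db():
    """Create an in-memory database whose rule lives in the schema itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE specimen_part ("
        " part_id INTEGER PRIMARY KEY,"
        " part_name TEXT NOT NULL,"
        " disposition TEXT NOT NULL"
        "   CHECK (disposition IN ('in collection', 'on loan', 'missing')))"
    )
    return conn


def try_update(conn, part_id, disposition):
    """Simulate a client that skips any application checks and writes directly."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE specimen_part SET disposition = ? WHERE part_id = ?",
                (disposition, part_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The database refused the write; no application code was involved.
        return False


conn = make_db()
with conn:
    conn.execute("INSERT INTO specimen_part VALUES (1, 'skull', 'in collection')")
print(try_update(conn, 1, "on loan"))         # a valid value is accepted
print(try_update(conn, 1, "lost somewhere"))  # an invalid value is rejected
```

If the same rule lived only in application code, the second update would succeed for any script or add-on that connected to the database directly.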

Tuesday, January 13, 2009

Specify: Competition?

I expect you've seen that Specify is now cross-platform:

http://specify6.specifysoftware.org/

I've been browsing their screenshots & they've got some nice additions. I expect this is our main competitor, and given that it's 'free' to users and has so much advertising etc. behind it, I expect it to become even more entrenched than it is now.


I don't really consider Specify competition - they support a completely different paradigm.

They are not very dynamic, offer little in the way of user customizations, and aren't very responsive to user needs. They provide software, but you need to maintain your own system - which typically means dicey consumer hardware and craptacular backup strategies. They don't change anything quickly, and seem to have an effective pre-release cycle, so they do release stable software.

Arctos is (potentially and increasingly) centralized, meaning you get Enterprise-caliber hardware and software, system-wide tech support from those of us who wrote it, and planned and tested backup strategies, all for a fraction of the actual cost (and perhaps even much more for much less sometime soon). We'll listen to your ideas, implement them if they're not too wacky, and even get you set up to write code if you want. We follow a release early/release often strategy (which mostly means we're too poor to pay for actual testers, but also means that you're likely to see requested changes very quickly).

Specify is software. Arctos is a system (from which you can use the software if you so desire). To implement Specify and get Arctos-like reliability, you'd need a pair of servers ($5000 each), on-site backup ($2000 + tapes [$50 per day - as many as you can afford, but at least 30]), off-site backup (Amazon's S3 would be one option - something less than $500/month), a firewall, and ideally security folks, systems administration folks (software and hardware support), and database folks. Then you're still left running MySQL as a backend. That's a fabulous little database for things like retrieval speed, but I would NEVER trust it to maintain archival or sensitive data, especially in an online environment. It was not designed for that, doesn't support the tools necessary to do so securely, and will never be comparable to Enterprise-class software like Oracle (they could have at least picked PostgreSQL!). Specify's ability to run on much less does not mean that doing so is a good idea. I think fulfilling our public trust obligations demands more (and, increasingly, so does NSF). Free does not necessarily imply inexpensive when system costs are considered.

Arctos's to-user costs are significantly less than Specify's. We're currently asking for $5000 per participant per year - the cost of one decent Dell server or the ColdFusion license. We've reached a size where we can ask for additional support to further reduce that cost while simultaneously increasing resources. There is a proposal in the pipeline that would do just that, in a fairly dramatic fashion, while also bringing a major museum (and some very smart developers and users there) closer to the core Arctos development team.
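
To put rough numbers on that comparison, here's a back-of-the-envelope tally using only the figures quoted above. It assumes tapes cost $50 apiece and you buy the minimum of 30, takes the $500/month ceiling for off-site storage, and leaves out the firewall and all staff costs because no figures are given for them. Treat it as a sketch, not a quote:

```python
# Figures taken from the text above; the interpretation of each is an assumption.
SERVER_COST = 5000          # per server
SERVER_COUNT = 2            # "a pair of servers"
ONSITE_BACKUP_UNIT = 2000   # on-site backup hardware
TAPE_COST = 50              # assumed: one $50 tape per daily backup
MIN_TAPES = 30              # "at least 30"
OFFSITE_MONTHLY_MAX = 500   # "something less than $500/month", so an upper bound
ARCTOS_ANNUAL_FEE = 5000    # per participant per year


def self_hosted_hardware():
    """One-time hardware outlay for the self-hosted setup."""
    return SERVER_COST * SERVER_COUNT + ONSITE_BACKUP_UNIT + TAPE_COST * MIN_TAPES


def self_hosted_first_year():
    """Hardware plus a year of off-site backup at the quoted ceiling."""
    return self_hosted_hardware() + OFFSITE_MONTHLY_MAX * 12


print("Self-hosted hardware:", self_hosted_hardware())
print("Self-hosted first year (upper bound on off-site):", self_hosted_first_year())
print("Arctos first year:", ARCTOS_ANNUAL_FEE)
```

Even before anyone is paid to run it, the self-hosted hardware alone comes to more than two years of the Arctos fee under these assumptions.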

We also differ in strategy. Specify maintains a fairly traditional schema, and their tables are increasingly wide (non-normalized). Arctos tends towards leaner normalized tables and more sophisticated coding to make that work. The payoff for us is extensibility - Gordon just spent 3 days in Chicago discussing with various DB folks something we implemented in around 20 minutes. The cost is programming - it's more difficult to write code to normalized structures (one of the original model architects once told me it was impossible - while staring at Arctos, interestingly enough).

Specify's new stratigraphy extension is a pretty good example of the differing development philosophies. They've added three wide tables, which allow you to record values for a finite and pre-determined set of geological attributes. We've added one very narrow table, which allows you to define any number of terms and values, along with who, when, and why. Our model is infinitely flexible, even in the absence of a programmer, and records determinations or opinions. Theirs is simpler to implement, requires code changes to support new types of data, and records assertions.
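
Here's a minimal sketch of the narrow-table approach, using Python and SQLite for convenience. The table name `geology_attribute`, its columns, and the sample values are all hypothetical stand-ins, not the actual Arctos schema or real data:

```python
import sqlite3


def make_geology_db():
    """One narrow table: term, value, and who/when/why for each determination."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE geology_attribute ("
        " locality_id INTEGER NOT NULL,"
        " attribute_type TEXT NOT NULL,"   # the term, stored as data
        " attribute_value TEXT NOT NULL,"
        " determiner TEXT,"                # who
        " determined_date TEXT,"           # when (ISO date string)
        " remark TEXT)"                    # why
    )
    return conn


def record_determination(conn, locality_id, attr_type, value, who, when, why):
    """Add one opinion; a new attr_type needs no schema or code change."""
    with conn:
        conn.execute(
            "INSERT INTO geology_attribute VALUES (?, ?, ?, ?, ?, ?)",
            (locality_id, attr_type, value, who, when, why),
        )


def determinations(conn, locality_id, attr_type):
    """Return every recorded opinion for one term, oldest first."""
    rows = conn.execute(
        "SELECT attribute_value, determiner FROM geology_attribute"
        " WHERE locality_id = ? AND attribute_type = ?"
        " ORDER BY determined_date",
        (locality_id, attr_type),
    )
    return rows.fetchall()


conn = make_geology_db()
# Made-up sample data: two differing opinions on the same locality.
record_determination(conn, 1, "formation", "Morrison", "A. Smith", "2008-06-01", "field ID")
record_determination(conn, 1, "formation", "Cloverly", "B. Jones", "2009-01-10", "revised from core sample")
# A term nobody planned for, added without touching the table definition.
record_determination(conn, 1, "magnetic polarity", "reversed", "B. Jones", "2009-01-10", None)
print(determinations(conn, 1, "formation"))
```

In a wide table, "magnetic polarity" would have meant a new column plus code changes, and the second opinion on the formation would have overwritten the first.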

Some Museum folks aren't much interested in collaborating with peers, tracking non-traditional data (usage and such), participating in community initiatives (MaNIS et al.), having broad exposure for their data, or generally being on the frontlines of information accessibility. Those people will probably never be happy in Arctos, and, assuming they can provide long-term data and hardware security, Specify is probably where they should be. The half-dozen recovering Specify users I personally know, admittedly all very much proactive front-lines folks, have few nice things to say about Specify. In fairness, I probably wouldn't know the people who remain happy with Specify.

It's worth mentioning that Specify and Arctos share a basic model, and Jim Beach was involved in the early development efforts of that model. There's been much divergence, but the core tables still share names and the occasional field.

All that said, I'm not above blatantly stealing good ideas from anyone, and our development strategy nicely equips us to do so. Let me know if you see something you think we need in their screenshot gallery.