Archive for the 'Tech' Category

Oct 23 2007

Just another place to look for email?

Published by Mark Reichenbach under Tech

As if there weren’t enough places to look for discoverable data, Dan Farber and the folks at ZDNet take a look at a new consumer electronics offering that brings the Net into your car.

Dash brings the Internet inside the car, by ZDNet’s Dan Farber: At the Web 2.0 Summit, Dash Navigation showed off what it called the world’s first connected navigation platform. The GPS device uses Wi-Fi and the cellular network to connect to the Internet. What’s unique is that the Dash device can mash up with Web services. For example, with a Zillow feed users can see what houses […]

No responses yet

Oct 15 2007

ICANN & IDN - Masters of Our Domain

Published by Mark Reichenbach under Tech

New York - October 15, 2007

By Mark V. Reichenbach for On the Mark 

Today ICANN expanded its Internationalized Domain Name effort, adding 11 more foreign languages to the test.

Read 10/15/2007 Announcement

While ICANN issued a press release on October 9 and a brief news story ran on NPR, I’ve heard little about the test and suspect other users in the US and UK have not seen much press on this topic either.

However, on the international technology stage this story is one we need to be aware of. Many questions and problems must be addressed, and hopefully ICANN and others are on the ball with respect to these issues. Kim Davies of ICANN has outlined the challenges in a post on the ICANN Blog.

We talked earlier here at On the Mark about Unicode and its future use in e-discovery software and platforms.

What challenges will confront e-discovery software developers as IDN registrations explode and these new domain names work their way into the data we’re processing?

What challenges await multinational corporations?

On the Mark is quite fortunate to have Jonathan Redgrave of RDRW shed some much-needed light:

The reality of a multi-cultural and multi-lingual Internet brings all of the world’s languages (including slang and idioms) directly into focus in terms of electronic discovery. Just as the total volume of data being pulled into litigation in the United States has increased dramatically with the electronic age, the sheer amount (and relative percentage) of data that is non-English- (and non-Roman-character-) based will also increase in United States litigation in the coming years, presenting even more challenges to the legal profession. The ICANN announcement serves as an apt reminder that: (a) e-discovery processes must be designed to anticipate this increasingly multi-lingual and multi-cultural base of potentially discoverable electronically stored information; and (b) e-discovery software solutions must have sufficient capabilities and flexibility to address new data locations, types, forms and reference indicators, including the full implementation of Internationalized Domain Names.
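For software developers, one concrete piece of that “full implementation” is that the same internationalized domain can show up in collected data in two spellings: the Unicode form a user sees, and the ASCII-compatible “xn--” form that DNS and many mail headers carry. Here’s a minimal sketch, assuming Python’s built-in `idna` codec, which implements the older IDNA 2003 rules (RFC 3490); the function names are hypothetical, for illustration only.

```python
# Sketch only: relies on Python's built-in "idna" codec (IDNA 2003, RFC 3490).

def to_ascii_domain(domain: str) -> str:
    """Any spelling -> lowercase ASCII-compatible form (xn-- labels for non-ASCII)."""
    return domain.encode("idna").decode("ascii").lower()


def to_unicode_domain(domain: str) -> str:
    """Any spelling -> Unicode form, suitable for showing to a reviewer."""
    return to_ascii_domain(domain).encode("ascii").decode("idna")


# The same domain as it might appear in a message body and in a mail header.
for seen in ["b\u00fccher.de", "XN--BCHER-KVA.DE"]:
    print(seen, "->", to_ascii_domain(seen), "|", to_unicode_domain(seen))
```

Indexing every domain under its `to_ascii_domain(...)` key would let a search for either spelling find both, while `to_unicode_domain(...)` gives reviewers something readable.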

What will corporations need to consider in order to protect their Internet Domain Name rights?

I suspect we will see another wave of domain-name “land grabs,” this time in foreign languages and character sets. Are the GoDaddys, VeriSigns and other domain-name registrars of the world simply salivating at the thought of a boom ahead?

And when that happens, where and how will the inevitable disputes be resolved?

Jonathan Redgrave:

“While it is unclear if there will be a new ‘land grab’ or ‘gold rush’ for new domain names that essentially translate existing owned Internet ‘real estate’ (such as trademarks), domestic and international corporations should be attentive to the changing Internet not only to protect what they already have but also to think about how adapting to such changes may increase their reach to more people in their native languages.”

The Masters of Our Domain

All Seinfeld jokes aside, the characters (and symbols) we’re talking about here may have different meanings and interpretations in different languages, and some characters and symbols have multiple meanings even in their native language. Add something as simple as a missing font (a machine without the correct font installed), and viewing and accessing these domain names may range from difficult to nearly impossible.
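The font problem has a flip side: characters from different scripts can look identical on screen while being entirely different code points, and therefore entirely different domain names. The sketch below is a rough heuristic of my own, not ICANN policy. It flags domain labels that mix scripts, using the first word of each letter’s Unicode character name as a stand-in for its script. Real confusable detection is specified in Unicode Technical Standard #39, and legitimate Japanese names mix Han, Hiragana and Katakana, which this simple check would also flag.

```python
import unicodedata


def label_scripts(label: str) -> set[str]:
    """Rough script set for one label: first word of each letter's Unicode name
    (e.g. 'LATIN', 'CYRILLIC'). An approximation, not the Unicode Script property."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(domain: str) -> bool:
    """True if any dot-separated label mixes letters from more than one script."""
    return any(len(label_scripts(label)) > 1 for label in domain.split("."))


latin = "example.com"
lookalike = "\u0435xample.com"  # first letter is U+0435 CYRILLIC SMALL LETTER IE

print(latin == lookalike)          # False: different code points
print(lookalike.encode("idna"))    # an xn-- label, i.e. a different registrable name
print(is_mixed_script(lookalike))  # True
```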

Using the existing criteria, I can see how two entities may both have equal standing to claim the rights to a name.

Will ICANN re-evaluate its rules and criteria with respect to an organization’s right to ownership?

It’s far too early to tell, and we encourage readers to return to On the Mark as we continue to monitor developments and bring the clarity of expert voices in our community to this site.

 

6 responses so far

Oct 10 2007

Unicode Down The Road

Published by Mark Reichenbach under Tech

Now that I’m officially in the blogosphere, I was out giving a little “link love” (blog-speak) and came across George Rudoy’s blog entry on EDDUpdate, which touches on a future consideration we should all have on our radar screens.

Blogs should be brief, so follow the link above to read his original post and the follow-up comments that spurred this blog entry.

While On The Mark readers are mostly technical and industry insiders, a quick baseline may help those not quite up to speed on Unicode.

What is Unicode? That’s a fair question. And a long answer. I’ll take the easy path and suggest you familiarize yourself with the efforts of The Unicode Consortium.

Think of all the different written languages in the world, composed of text or symbols, and then think of all the problems computers, programs and people have integrating them. Some very smart people identified a need and devised a standard that made sense across the entire spectrum. Here’s a snippet of their definition:

 “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”
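To make that definition concrete, here’s a small Python sketch (the helper name `describe` is just for illustration) that prints the number Unicode assigns to characters from several scripts. It then shows that one character keeps its number while its byte representation changes with the encoding, which is exactly where e-discovery processing tends to trip.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the character's code point in U+XXXX notation plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Latin A, Latin e-acute, Cyrillic Zhe, Hebrew Alef, Chinese "middle"
for ch in ["A", "\u00e9", "\u0416", "\u05d0", "\u4e2d"]:
    print(describe(ch))  # e.g. U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Same character, same code point, different bytes depending on the encoding.
e_acute = "\u00e9"
for encoding in ["utf-8", "utf-16-be", "latin-1"]:
    print(encoding, e_acute.encode(encoding).hex())  # c3a9, 00e9, e9
```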

Scintillating stuff, right? OK, so here’s where it gets interesting for us. Just as corporations are worldwide and diverse, so are the e-discovery factors that make it so challenging. The old, tried-and-true ways we process and review legal documents are not going to cut it once you put multilingual, multinational equations into play. Throw into the mix EU privacy regulations, US privacy laws and any other laws and regulations that may apply to your data scenario, and you can see this is a complex, multifaceted equation.

Just as George Rudoy’s post was informative, the comments (not mine) made to his post are insightful. I’ll paste some here for continuity and ease of access.

Georgy Pados of Shearman and Sterling opined:

“While no software package is perfect, more will be required to deal with multiple language data than recognizing Unicode characters and throwing analytics on top of it.”
“It’s key to have the different components play together hand in hand throughout the entire process.”

“Auto-categorization: how sophisticated is it really? The majority of cases bring mixed/multi-character-set challenges: can a threshold be set in the software so the language categorization algorithms order and score the results?

(Can you set the language score order if only the parent email is in Italian but the attachment is in English? What if the email thread is partly Italian/German (30%/60%) but the attachment drafts are 100% English?)”

“How does the system handle scanned paper / OCR-extracted content? Can you seamlessly incorporate OCR results into the indexer (do you have Unicode OCR/OWR engines plugged into the indexer)?”

On The Mark: Georgy makes strong points here and gives a concrete example.
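Georgy’s 30%/60% example can be sketched in a few lines. This is a hypothetical model, not any vendor’s feature. Each part of a document family carries per-language scores (fractions of its text, entered by hand here where a language-identification engine would normally supply them), a configurable threshold drops minor languages, and the family roll-up keeps each language’s highest score so the all-English attachment drafts are not buried under the Italian/German thread.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """One item in a document family (parent email, attachment, ...).

    language_scores maps a language code to the fraction (0.0 to 1.0) of this
    part's text detected in that language. Hypothetical, hand-entered values.
    """
    name: str
    language_scores: dict[str, float]


def ranked_languages(scores: dict[str, float], threshold: float) -> list[tuple[str, float]]:
    """Languages scoring at or above the threshold, highest score first."""
    kept = [(lang, s) for lang, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def family_languages(parts: list[Part], threshold: float) -> list[tuple[str, float]]:
    """Roll up a family: each language keeps its highest score across all parts."""
    combined: dict[str, float] = {}
    for part in parts:
        for lang, s in part.language_scores.items():
            combined[lang] = max(combined.get(lang, 0.0), s)
    return ranked_languages(combined, threshold)


thread = Part("email thread", {"it": 0.30, "de": 0.60})
attachment = Part("attachment draft", {"en": 1.00})

print(ranked_languages(thread.language_scores, threshold=0.25))  # [('de', 0.6), ('it', 0.3)]
print(ranked_languages(thread.language_scores, threshold=0.50))  # [('de', 0.6)]
print(family_languages([thread, attachment], threshold=0.25))
# [('en', 1.0), ('de', 0.6), ('it', 0.3)]
```

Taking the maximum is only one possible roll-up policy; weighting by text length, or tagging the whole family with the parent’s primary language, are equally defensible depending on how review teams are staffed.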

Another comment, by my learned colleague Chuck William, CTO of MetaLINCS:

Full support for the languages of the world involves many complex technical issues.

First, there are all of the different character encoding systems used around the world. These all need to be recognized and transformed to Unicode, wherever these encodings might occur in content and metadata.
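Here’s a minimal Python sketch of that first step. It assumes a simple fallback order that I chose for illustration (the charset the container declares, then UTF-8, then Windows-1252); production tools use real charset detection. The final NFC normalization matters too: the same “é” can arrive precomposed or as “e” plus a combining accent, and without normalization the two will not match in a search.

```python
import unicodedata
from typing import Optional


def to_unicode(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode bytes to text, then NFC-normalize. Heuristic fallback order:
    1. the charset declared by the container (e.g. a MIME Content-Type header),
    2. UTF-8,
    3. Windows-1252, replacing any bytes it cannot map."""
    candidates = [declared] if declared else []
    candidates.append("utf-8")
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
            break
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name or bytes invalid in this encoding
    else:
        text = raw.decode("cp1252", errors="replace")
    # NFC makes "e" + combining acute and precomposed "é" compare equal.
    return unicodedata.normalize("NFC", text)


sjis_bytes = "日本語".encode("shift_jis")
print(to_unicode(sjis_bytes, declared="shift_jis"))  # 日本語
print(to_unicode(sjis_bytes))                         # no label: falls through to mojibake
```

Without the declared `shift_jis` label, the Japanese bytes fail UTF-8 decoding and fall through to Windows-1252, producing exactly the kind of garbled text reviewers complain about.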

The next challenge is tokenizing the content, i.e., breaking it up into words, numbers and other sensible units. As Westerners, we think this is easy: just look for the spaces and other punctuation. But take a look at languages like Chinese and Japanese, where there are typically no spaces, and you’ll begin to appreciate the problem.

Then we get into the whole area of linguistic analysis, starting from something simple like stemming (e.g., “going” –> “go”), moving into more complex features like identifying noun phrases (which form the basis for many notions of “concept” used in EDD). These functions are all language-specific.
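A deliberately tiny Python illustration of why these functions are language-specific: a toy English suffix-stripper (far cruder than a real algorithm such as Porter’s) is registered under a language code, and text in any other language is only lowercased rather than mangled by English rules. All names here are hypothetical.

```python
def english_stem(word: str) -> str:
    """Toy English stemmer: strips -ing, -ed and plural -s. Illustration only."""
    w = word.lower()
    # (suffix, minimum length of what must remain after stripping it)
    for suffix, min_stem in (("ing", 2), ("ed", 3)):
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            stem = w[: -len(suffix)]
            # "running" -> "runn" -> "run": undo a doubled final consonant.
            if len(stem) >= 3 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    # Plural "s", but leave "-ss" words such as "class" alone.
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


# Stemmers are language-specific, so they are looked up by language code.
STEMMERS = {"en": english_stem}


def stem(word: str, language: str) -> str:
    stemmer = STEMMERS.get(language)
    # No stemmer for this language: only lowercase the word.
    return stemmer(word) if stemmer else word.lower()


print(stem("going", "en"))     # go
print(stem("andando", "it"))   # andando (no Italian rules registered)
```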

A good system needs to identify the language(s) in which each document has been written, both for internal reasons such as applying the correct linguistic analysis rules, and for the benefit of users who may wish to search, review or translate content in a specific language and so need to identify the documents using that language.
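A first approximation can be sketched by counting which Unicode script each letter belongs to, again using the first word of the character name as a rough script label. This is an assumption-laden shortcut, not real language identification: Italian, German and English all come out as LATIN, so telling them apart, as in Georgy’s example, needs a statistical language identifier on top.

```python
import unicodedata
from collections import Counter
from typing import Optional


def script_profile(text: str) -> dict[str, float]:
    """Fraction of letters per rough script label, most frequent first."""
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()
    )
    total = sum(counts.values())
    return {script: n / total for script, n in counts.most_common()} if total else {}


def dominant_script(text: str) -> Optional[str]:
    """Most frequent script label, or None if the text has no letters."""
    profile = script_profile(text)
    # Dicts keep insertion order, so the first key is the most frequent script.
    return next(iter(profile), None)


print(dominant_script("Ciao, come stai?"))  # LATIN, same as English or German would give
print(script_profile("Tokyo 東京 とうきょう"))
```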

Integration with the UI is another challenging area. Consider query term highlighting and entering queries in a different language. Right-to-left languages like Arabic and Hebrew present unique challenges in this area.
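One small piece of the right-to-left puzzle, sketched in Python: deciding a paragraph’s base direction from its first “strong” character, following rule P2 of the Unicode Bidirectional Algorithm (UAX #9) in simplified form, since isolate sequences are not skipped here. A UI that highlights a Hebrew or Arabic query term inside mixed text needs this kind of information before it can lay the line out correctly. The function name is my own.

```python
import unicodedata
from typing import Optional


def base_direction(text: str) -> Optional[str]:
    """Direction of the first strongly directional character (simplified UAX #9 P2).

    Bidi class 'L' is left-to-right; 'R' (e.g. Hebrew) and 'AL' (Arabic letters)
    are right-to-left. Digits, spaces and punctuation are weak or neutral and
    are skipped.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None  # no strong character; UAX #9 then defaults to left-to-right


print(base_direction("Hello"))                                 # ltr
print(base_direction("\u05e9\u05dc\u05d5\u05dd world"))        # rtl (Hebrew first)
print(base_direction("2007: \u0645\u0631\u062d\u0628\u0627"))  # rtl (digits are weak)
```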

Fortunately, these are all well-understood technical issues and good products exist that address them effectively.

On The Mark: Chuck also makes strong points.

Our industry is witnessing an evolution in the development of enterprise software for corporations and law firms alike, and few companies have demonstrated leadership in the thoughtful and proper implementation of Unicode foreign-language detection, functionality and foreign-data analytics for in-house installation. Although I’ve checked my “evangelist” hat at the door for this post, know that I am quite confident in one particular offering “in this space.”

One response so far

Oct 04 2007

A Burger Blog and E. coli Printing for Microchips

Published by Mark Reichenbach under General, Tech

Here are two quick items that both seem slightly ridiculous but may interest some readers here.

Blogging on burgers? Why not? This guy probably has little interest in e-discovery unless it comes with cheese and a side of fries, but here you go: click the link for Adam Kuban’s “Serious Eats.”

The second item comes out of Duke University, and I found it on Roland Piquepaille’s blog.

Pocket protector or not, this is interesting (and tiny) stuff.

“Researchers at Duke University have developed a new printing technique using catalysts to create microdevices such as labs-on-a-chip. Their inkless printing technique uses enzymes from E. coli bacteria and has an accuracy of less than 2 nanometers. While they’re now using enzymes to stamp nanopatterns without ink, the research team is already working with non-enzymatic catalysts. And it added that ‘future versions of the inkless technique could be used to build complex nanoscale devices with unprecedented precision.’”

No responses yet
